
Message added by Mayank Gupta, September 23, 2022

Lindley's Paradox states that small errors in the null hypothesis are magnified when large data sets are analyzed, leading to false but highly statistically significant results.

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Rahul Arora on 22nd Sep 2022.

Applause for all the respondents - Rahul Arora, Rakesh Chandra, Subham De Sarkar, M Vijayakumar Elangovan.

Lindley's Paradox


September 20, 2022

Q 506. There is a golden rule of sampling: the larger the sample size, the better. However, if your sample size is too large, it leads to Lindley's Paradox: small errors in the null hypothesis are magnified when large data sets are analyzed, leading to false but highly statistically significant results. Illustrate this paradox by providing examples. What are the ways to avoid it?

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

All questions so far can be seen here - https://www.benchmarksixsigma.com/forum/lean-six-sigma-business-excellence-questions/
Please visit the forum home page at https://www.benchmarksixsigma.com/forum/ to respond to the latest question open till the next Tuesday/ Friday evening 5 PM as per Indian Standard Time. Questions launched on Tuesdays are open till Friday and questions launched on Friday are open till Tuesday.
When you respond to this question, your answer will not be visible till it is reviewed. Only non-plagiarised (plagiarism below 5-10%) responses will be approved. If you have doubts about plagiarism, please check your answer with a plagiarism checker tool like https://smallseotools.com/plagiarism-checker/ before submitting.
The best answer is always shown at the top among responses and the author finds honorable mention in our Business Excellence dictionary at https://www.benchmarksixsigma.com/forum/business-excellence-dictionary-glossary/ along with the related term.

Solved by Rahul.Arora2


September 22, 2022

Solution

Lindley’s Paradox, first described by Sir Harold Jeffreys and named after Dennis Lindley, who later drew attention to it, showcases the conflict between the frequentist and Bayesian approaches to hypothesis testing. It refers to the fact that as the sample size increases (while the p-value is kept constant, e.g. p < 0.05), p-values and Bayes factors come into conflict: the p-value suggests that the null hypothesis (Ho) should be rejected, whereas the Bayes factor indicates that Ho out-predicts the alternative hypothesis (Ha). Ultimately, Ho is rejected under the frequentist approach and simultaneously accepted under the Bayesian approach.
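To see this effect numerically, here is a minimal Python sketch under assumptions that are not part of the answer itself: a test of a normal mean with known σ, Ho: µ = 0, and under Ha a normal prior µ ~ N(0, τ²) with τ = σ. The test statistic z is held at 1.96, so the two-sided p-value stays at about 0.05 for every sample size, while the Bayes factor BF01 = p(data|Ho) / p(data|Ha) is computed in closed form for that model.

```python
import math

def bayes_factor_01(z, n, tau_over_sigma=1.0):
    """BF01 = p(data | Ho) / p(data | Ha) for a test of a normal mean.

    Illustrative model (assumed): known sigma, Ho: mu = 0, and under Ha a
    normal prior mu ~ N(0, tau^2). z is the usual statistic sqrt(n) * xbar / sigma.
    """
    r = n * tau_over_sigma ** 2  # r = n * tau^2 / sigma^2
    return math.sqrt(1 + r) * math.exp(-0.5 * z ** 2 * r / (1 + r))

def two_sided_p(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

Z = 1.96  # held fixed, so p stays at about 0.05 for every n
for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}  p = {two_sided_p(Z):.3f}  BF01 = {bayes_factor_01(Z, n):.2f}")
```

Under these assumptions BF01 is below 1 at n = 10 but climbs past 100 by n = 1,000,000: the same "significant" p-value turns into increasingly strong evidence for Ho as the sample grows.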

Let us try to understand this concept through an example:-

Suppose a bank that processes loan applications receives home loan applications. The bank generally receives all kinds of loan applications in two types of batches: one type contains 25% home loan applications and the other contains 50% home loan applications. The bank now wants to figure out which of these two batches the received applications belong to.

To do that, let’s say the bank takes a random sample of 48 applications and observes that 36 of them, i.e. 75%, are home loan applications. Since 75% is much closer to 50% than to 25%, we would intuitively conclude that the applications belong to the second batch, the one containing 50% home loan applications.

Now let us apply hypothesis testing, starting with the first hypothesis, i.e. testing whether the applications belong to the first batch, which contains 25% home loan applications. Let us calculate the population parameters µ and σ for the number of home loan applications in a sample of 48.

µ = np = 48*0.25 = 12

σ = sqrt(np(1-p)) = sqrt(48*0.25*(1-0.25)) = sqrt(48*0.25*0.75) = sqrt(9) = 3

At roughly the 99% confidence level, using a band of µ ± 3σ (strictly, ±3σ covers about 99.7% under the normal approximation; an exact 99% band would be µ ± 2.58σ), the range is 12 +/- 3*3, i.e. from 3 to 21. The observed count of 36 is nowhere close to this range, so we reject the null hypothesis that the applications belong to the batch containing 25% home loan applications.

Now let us also test the hypothesis whether the applications received belong to the second batch containing 50% home loan applications.

Let us again calculate the population parameters µ and σ.

µ = np = 48*0.50 = 24

σ = sqrt(np(1-p)) = sqrt(48*0.50*(1-0.50)) = sqrt(48*0.50*0.50) = sqrt(12) ≈ 3.5

Using the same µ ± 3σ band, the range is 24 +/- 3*3.5, i.e. from 13.5 to 34.5, which does not include the sample result of 36. This again leads us to reject the null hypothesis that the applications belong to the second batch, i.e. the one containing 50% home loan applications.
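The two band calculations can be reproduced with a short Python sketch (normal approximation, µ ± 3σ as in the answer). It uses the unrounded σ = √12 ≈ 3.46 for the 50% batch, so the second band comes out as about 13.6 to 34.4 rather than 13.5 to 34.5; the conclusion is unchanged.

```python
from math import sqrt

N, OBSERVED = 48, 36  # sample size and observed home-loan count from the example

def three_sigma_band(n, p):
    """Normal-approximation band mu +/- 3*sigma for a Binomial(n, p) count."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu - 3 * sigma, mu + 3 * sigma

for p in (0.25, 0.50):
    lo, hi = three_sigma_band(N, p)
    verdict = "do not reject" if lo <= OBSERVED <= hi else "reject"
    print(f"Ho: p = {p:.2f}  band {lo:.1f} to {hi:.1f}  observed {OBSERVED}  -> {verdict} Ho")
```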

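Based on these results, both batches have been rejected as the source of the received applications, even though the applications must come from one of them and the 50% batch is by far the better fit to the data. This conflict between what the significance tests say and what a direct comparison of the two hypotheses says is the underlying premise of Lindley’s paradox.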
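As a cross-check, the Python sketch below (an illustrative addition, not part of the original answer) computes exact one-sided binomial tail probabilities for both hypotheses and the likelihood ratio of the two batches. Assuming equal prior probabilities for the two batches, that likelihood ratio is also the Bayes factor and the posterior odds in favour of the 50% batch.

```python
from math import comb, exp, log

N, K = 48, 36  # sample size and observed home-loan count from the example

def upper_tail(k, n, p):
    """Exact one-sided binomial tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def log_likelihood(k, n, p):
    """Log of the binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

for p0 in (0.25, 0.50):
    print(f"Ho: p = {p0:.2f}  P(X >= {K}) = {upper_tail(K, N, p0):.2e}")

# How many times more probable the observed 36/48 is under the 50% batch
# than under the 25% batch (binomial coefficients cancel in the ratio).
lr = exp(log_likelihood(K, N, 0.50) - log_likelihood(K, N, 0.25))
print(f"Likelihood ratio, 50% batch vs 25% batch: {lr:.3g}")
```

Both tail probabilities fall below 0.01, so both batches are rejected, yet the observed count is more than a million times more likely under the 50% batch than under the 25% batch.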

Let us now also see the different ways in which we can mitigate it:-

One approach is to lower the alpha level as a function of the sample size: any choice of alpha that makes the critical value of the test statistic (the critical difference divided by its standard error) increase as the sample size increases, rather than staying fixed, helps avoid the paradox.

Another approach is to set the Bayes factor, i.e. the ratio of the probability of the data under the alternate and null hypotheses, p(data|Ha) / p(data|Ho), to 1, which implies equal evidence for both hypotheses. Next, adjust the alpha level so that the Bayes factor at the critical value of the test statistic is not less than 1; a statistically significant result then never corresponds to evidence in favour of Ho.
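Both mitigation ideas can be sketched together in Python under an illustrative normal-mean model (an assumption, not from the answer: known σ, Ho: µ = 0, Ha: µ ~ N(0, τ²) with τ = σ). For this model the critical |z| at which the Bayes factor equals 1 has the closed form z² = (1 + r)/r × ln(1 + r), with r = nτ²/σ²; converting it to a two-sided alpha gives an alpha that shrinks as n grows.

```python
import math

def critical_z_for_bf1(n, tau_over_sigma=1.0):
    """|z| at which BF01 = 1 under the assumed normal-prior model.

    Model (assumed): known sigma, Ho: mu = 0, Ha: mu ~ N(0, tau^2).
    Solving sqrt(1 + r) * exp(-z^2 * r / (2 * (1 + r))) = 1 gives
    z^2 = (1 + r) / r * ln(1 + r), with r = n * tau^2 / sigma^2.
    """
    r = n * tau_over_sigma ** 2
    return math.sqrt((1 + r) / r * math.log(1 + r))

def alpha_for_n(n, tau_over_sigma=1.0):
    """Two-sided alpha whose critical value is critical_z_for_bf1(n)."""
    return math.erfc(critical_z_for_bf1(n, tau_over_sigma) / math.sqrt(2))

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7,}  critical z = {critical_z_for_bf1(n):.2f}  alpha = {alpha_for_n(n):.4f}")
```

Under these assumptions the matching alpha is roughly 0.10 at n = 10, about 0.03 at n = 100 and well below 0.01 by n = 10,000, which is the "alpha as a function of sample size" idea in concrete form.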

September 22, 2022

The larger the sample size, the better:

· A larger sample more closely approximates the population. Since the primary goal of inferential statistics is to generalize from a sample to a population, a larger sample leaves less to be inferred.

· A small sample carries a greater risk: by chance alone it may be quite unrepresentative of the whole population, so sample estimates vary more when the sample size is small.

· The standard error depends directly on the sample size: to calculate the standard error of the mean, we divide the standard deviation by the square root of the sample size.

· If the sample size is large enough, the sampling distribution of the mean will be approximately normal (central limit theorem), and a normal sampling distribution lets us make better inferences about the population from the sample.

· A larger sample gives the test more power and a smaller standard error.
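The standard-error relationship can be checked with a tiny Python sketch; the population standard deviation of 10 is a hypothetical value chosen only for illustration.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n), not sd / n."""
    return sd / sqrt(n)

SD = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400, 1_600):
    print(f"n = {n:>5,}: standard error = {standard_error(SD, n):.2f}")
```

Each fourfold increase in n halves the standard error (2.00, 1.00, 0.50, 0.25), which is why the gains from ever-larger samples diminish.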

Lindley’s paradox highlights the conflict between Bayesian and frequentist evidence in hypothesis testing.

With a large sample we become more confident about our estimate, and the confidence interval narrows.

When the sample size is very large, even a very small departure from the null hypothesis produces a highly statistically significant result.
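A minimal Python sketch of that effect, assuming a one-sample proportion z-test with Ho: p = 0.5 and a hypothetical observed proportion of 0.501, a difference too small to matter in most practical settings. Only the sample size changes between rows.

```python
from math import erfc, sqrt

P0 = 0.5       # null hypothesis proportion
P_HAT = 0.501  # hypothetical observed proportion, practically the same as P0

def two_sided_p(p_hat, p0, n):
    """Two-sided p-value of a one-sample proportion z-test (normal approximation)."""
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    return erfc(abs(z) / sqrt(2))

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    print(f"n = {n:>10,}: p = {two_sided_p(P_HAT, P0, n):.2e}")
```

The p-value falls from about 0.84 at n = 10,000 to below 0.05 at n = 1,000,000 and is vanishingly small at n = 10,000,000, although the observed difference never changes.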

One way to avoid the paradox is to check that the Bayesian and frequentist evidence do not conflict before drawing a conclusion, and to interpret the analysis in light of how large the sample size is.

September 23, 2022

Lindley paradox

The Lindley paradox is a perplexing situation in statistics in which the frequentist and Bayesian approaches to a hypothesis testing problem give different results for certain choices of the prior distribution.

Consider the result x of an experiment, which can be explained by two hypotheses, Ho and H1, together with some prior distribution π that represents the uncertainty as to which hypothesis is more accurate before x is taken into account.

The Lindley paradox appears when:

1. A frequentist test of Ho finds that the result x is "significant", i.e. there is enough evidence to reject Ho at the 5% level, and

2. The posterior probability of Ho given x is high, indicating strong evidence that Ho agrees with x better than H1 does.

These outcomes are possible when Ho is highly specific, H1 is more diffuse, and neither Ho nor H1 is strongly favoured by the prior distribution, as shown in the example below.

The paradox proposed by Lindley is illustrated by the next numerical example.

In a particular city, 49,581 boys and 48,870 girls were born over a specific time frame. The observed proportion of male births is therefore x = 49,581/98,451 ≈ 0.5036. The number of male births is assumed to be a binomial variable with parameter θ, the proportion of male births. We want to find out whether θ is 0.5 or some other value. In other words, the null hypothesis is Ho: θ = 0.5 and the alternative is H1: θ ≠ 0.5.

The frequentist method of testing Ho is to calculate a p-value, i.e. the probability of observing a fraction of boys at least as large as x, assuming Ho is true. Because the number of births is large, we can use a normal approximation for the number of male births X: X ~ N(µ, σ²), where µ = nθ = 98,451 × 0.5 = 49,225.5 and σ² = nθ(1 − θ) = 98,451 × 0.5 × 0.5 = 24,612.75.

P(X ≥ 49,581 | Ho) ≈ 0.0117 (one-sided; z = (49,581 − 49,225.5)/√24,612.75 ≈ 2.27)

A frequentist would typically conduct a two-sided test, for which the p-value would be p ≈ 2 × 0.0117 ≈ 0.0235, because we would have been equally surprised had we observed 49,581 female births, i.e. x ≈ 0.4964. In both cases (one-sided and two-sided) the p-value is below the significance level α of 5%, so the frequentist approach rejects Ho as not agreeing with the observed data.
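The Bayesian half of this example can be sketched in Python. The priors are assumptions chosen for illustration: Ho and H1 are taken as equally likely beforehand, and under H1 the proportion θ is given a uniform prior on (0, 1), which makes the probability of the observed count under H1 exactly 1/(n + 1).

```python
from math import erfc, exp, lgamma, log, sqrt

BOYS, GIRLS = 49_581, 48_870  # births from the example
n, k = BOYS + GIRLS, BOYS

# Frequentist side: two-sided p-value from the normal approximation under Ho: theta = 0.5
mu, var = n * 0.5, n * 0.5 * 0.5
z = (k - mu) / sqrt(var)
p_two_sided = erfc(abs(z) / sqrt(2))

# Bayesian side, with assumed priors: P(Ho) = P(H1) = 0.5, and theta ~ Uniform(0, 1)
# under H1, which makes P(k | H1) = 1 / (n + 1) exactly.
log_m0 = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + n * log(0.5)  # log P(k | Ho)
log_m1 = -log(n + 1)                                                     # log P(k | H1)
bf01 = exp(log_m0 - log_m1)        # Bayes factor in favour of Ho
posterior_h0 = bf01 / (1 + bf01)   # posterior P(Ho | k) with equal prior odds

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
print(f"BF01 = {bf01:.1f}, P(Ho | data) = {posterior_h0:.3f}")
```

Under these assumptions the two-sided p-value is about 0.023, while the posterior probability of Ho comes out at about 0.95: the frequentist test rejects Ho even though the Bayesian analysis favours it, which is the pattern described in the two conditions listed above.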

September 23, 2022

Lindley’s Paradox concerns the situation where a test comparing the null and alternate hypotheses gives a significant result, leading us to reject the null hypothesis even though a Bayesian comparison of the same hypotheses favours it.

The example is binomial: a survey of people who feel positive about the government. We take the null hypothesis as Ho: p = 0.5 and the alternate as Ha: p ≠ 0.5.

We observed 20K cases, of which 9.8K were positive. In this case the p-value is about 0.0047 (z ≈ −2.83), so at the 5% significance level (95% confidence) the null hypothesis is rejected.
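A quick Python check of this example (an illustrative addition): the first part recomputes the two-sided p-value with the normal approximation, and the second computes a Bayes factor under assumed priors (equal prior odds for Ho and Ha, and a uniform prior on p under Ha, which gives P(k | Ha) = 1/(n + 1)).

```python
from math import erfc, exp, lgamma, log, sqrt

n, k, p0 = 20_000, 9_800, 0.5  # survey size, positive responses, null proportion

# Frequentist side: two-sided one-sample proportion z-test (normal approximation)
z = (k / n - p0) / sqrt(p0 * (1 - p0) / n)
p_value = erfc(abs(z) / sqrt(2))

# Bayesian side, with assumed priors: equal prior odds, p ~ Uniform(0, 1) under Ha
log_m0 = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
          + k * log(p0) + (n - k) * log(1 - p0))  # log P(k | Ho)
log_m1 = -log(n + 1)                                # log P(k | Ha)
bf01 = exp(log_m0 - log_m1)                         # Bayes factor in favour of Ho

print(f"z = {z:.2f}, p = {p_value:.4f}, BF01 = {bf01:.2f}")
```

Under these assumptions the p-value is about 0.0047, yet BF01 is about 2, i.e. the data are roughly twice as likely under Ho as under Ha: a small-scale instance of Lindley's paradox.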

Lindley’s paradox can happen when:

• The sample size is large

• Ho is precise (a single point value)

• Ha is relatively diffuse, i.e. not a strong, one-sided opposite of Ho

There are multiple ways to address false positives in such sampling:

• In the above example, it is not clear whether the survey sample needs to be segregated, for example by gender. We need to take samples that represent the true population.

• Take a sub-sample from each sector to better represent the true population.

• When handling a large sample, the null hypothesis should not be a precise point value but rather a range of values.

• The alternate hypothesis should be a strong, clearly opposed, one-sided alternative.
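The sub-sampling idea in the second point is essentially stratified sampling with proportional allocation. A minimal Python sketch follows; the sector names and sizes are hypothetical, and `stratified_sample` is an illustrative helper, not a library function.

```python
import random
from collections import Counter

def stratified_sample(population, key, fraction, seed=0):
    """Hypothetical helper: draw the same fraction from every stratum.

    population: list of records; key: function returning a record's stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for record in population:
        strata.setdefault(key(record), []).append(record)
    sample = []
    for members in strata.values():
        size = max(1, round(fraction * len(members)))  # proportional allocation
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical sampling frame: 6,000 respondents from one sector, 4,000 from another
population = [("manufacturing", i) for i in range(6_000)] + [("services", i) for i in range(4_000)]
sample = stratified_sample(population, key=lambda record: record[0], fraction=0.05)
print(Counter(sector for sector, _ in sample))
```

With a 5% sampling fraction each sector contributes in proportion to its size (300 and 200 here), so neither group is over- or under-represented by chance.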


September 23, 2022


Going by the description, example and the methods to avoid the Paradox, Rahul Arora's answer is selected as the best answer this week.
