Benchmark Six Sigma Forum

Message added by Mayank Gupta:

Lindley's Paradox states that small errors in the null hypothesis are magnified when large data sets are analyzed, leading to false but highly statistically significant results.

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Rahul Arora on 22nd Sep 2022.

 

Applause for all the respondents - Rahul Arora, Rakesh Chandra, Subham De Sarkar, M Vijayakumar Elangovan.

Featured Replies

Q 506. There is a golden rule of sampling: the larger the sample size, the better. However, if the sample size is too large, it leads to Lindley's Paradox: small errors in the null hypothesis are magnified when large data sets are analyzed, leading to false but highly statistically significant results. Illustrate this paradox with examples. What are the ways to avoid it? 

 

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

Solved by Rahul.Arora2

  • Solution
Lindley’s Paradox, first discussed by Sir Harold Jeffreys and later named after Dennis Lindley, showcases the conflict between the frequentist and Bayesian approaches to hypothesis testing. As the sample size increases while the p-value is held constant (e.g. p < 0.05), p-values and Bayes factors can disagree: the p-value suggests that the null hypothesis (Ho) should be rejected, whereas the Bayes factor indicates that Ho out-predicts the alternative hypothesis (Ha). Ho is then rejected under the frequentist approach and favoured under the Bayesian approach at the same time.
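A minimal sketch of this effect, under assumptions that are not part of the answer: a normal model with known σ, a point null Ho: θ = θ0, and, under Ha, a normal prior θ ~ N(θ0, τ²) with τ = σ chosen purely for illustration. The test statistic is held at z = 1.96, so the two-sided p-value stays at about 0.05 for every n, while the Bayes factor in favour of Ho, BF01 = p(data|Ho) / p(data|Ha), keeps growing. The function name `bf01_normal` is hypothetical.

```python
import math

def bf01_normal(z: float, n: int, tau_over_sigma: float = 1.0) -> float:
    """Bayes factor p(data|Ho) / p(data|Ha) for a normal mean with known sigma.

    Assumes Ho: theta = theta0 and, under Ha, theta ~ N(theta0, tau^2).
    z is the usual test statistic (xbar - theta0) / (sigma / sqrt(n)).
    """
    r = n * tau_over_sigma ** 2  # prior variance / sampling variance of xbar
    return math.sqrt(1 + r) * math.exp(-0.5 * z ** 2 * r / (1 + r))

Z_CRIT = 1.96  # fixed test statistic: two-sided p is about 0.05 at every n

for n in (10, 100, 10_000, 1_000_000):
    bf = bf01_normal(Z_CRIT, n)
    post_h0 = bf / (1 + bf)  # posterior P(Ho | data) with equal prior odds (assumed)
    print(f"n={n:>9}: BF01={bf:8.2f}, P(Ho|data)={post_h0:.3f}")
```

Under these assumptions BF01 is below 1 at n = 10, rises above 1 by n = 100 and keeps growing, so a result that is "significant at 5%" increasingly favours Ho as the sample grows.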
 
Let us try to understand this concept through an example:
 
Suppose a bank that processes loan applications receives applications for home loans. The bank regularly receives loan applications of all kinds in two types of batches: one containing 25% home loan applications and the other containing 50% home loan applications. The bank now wants to figure out which of these two batches a set of received applications belongs to.
 
To do that, the bank takes a random sample of 48 applications and observes that 36 of them, i.e. 75%, are home loan applications. Going by this result, we would intuitively conclude that the applications belong to the second batch, the one containing 50% home loan applications, since 75% is closer to 50% than to 25%.
 
Now let us apply hypothesis testing, starting with the first hypothesis: the applications belong to the first batch, which contains 25% home loan applications. Under this hypothesis, the number of home loan applications in a sample of n = 48 is binomial with p = 0.25, so its mean µ and standard deviation σ are:
µ = np = 48*0.25 = 12
σ = sqrt(np(1-p)) = sqrt(48*0.25*(1-0.25)) = sqrt(48*0.25*0.75) = sqrt(9) = 3
 
Now, at roughly the 99% confidence level (0.01 significance level), taking the range as µ ± 3σ (strictly, ±3σ covers about 99.7% under the normal approximation), the range is 12 ± 3*3, i.e. from 3 to 21. The observed count of 36 is nowhere close to this range, so we reject the null hypothesis that the applications belong to the batch containing 25% home loan applications.
 
Now let us also test the hypothesis that the applications belong to the second batch, containing 50% home loan applications.
Again we calculate µ and σ, now with p = 0.50:
µ = np = 48*0.50 = 24
σ = sqrt(np(1-p)) = sqrt(48*0.50*(1-0.50)) = sqrt(48*0.50*0.50) = sqrt(12) ≈ 3.46
 
Using the same µ ± 3σ rule, the range is 24 ± 3*3.46, i.e. about 13.6 to 34.4. It does not include the sample result of 36, so we again reject the null hypothesis that the applications belong to the second batch, i.e. the one containing 50% home loan applications.
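The two range checks can be reproduced in a short script. This is a sketch using the same normal approximation to the binomial and the same ±3σ rule; the function name `three_sigma_band` is illustrative, and σ is left unrounded, so the second range comes out as about 13.6 to 34.4.

```python
import math

N_SAMPLED = 48  # applications sampled
N_HOME = 36     # home loan applications observed

def three_sigma_band(n: int, p: float) -> tuple[float, float]:
    """Return the mean +/- 3 sigma band for a Binomial(n, p) count (normal approximation)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu - 3 * sigma, mu + 3 * sigma

for p in (0.25, 0.50):
    lo, hi = three_sigma_band(N_SAMPLED, p)
    verdict = "reject" if not lo <= N_HOME <= hi else "retain"
    print(f"Ho: p={p:.2f} -> band [{lo:.1f}, {hi:.1f}], observed {N_HOME}: {verdict} Ho")
```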
 
Based on these results, the possibility of the received applications belonging to either batch is rejected, even though they must have come from one of the two. This is the underlying premise of Lindley’s paradox as illustrated here.
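A Bayesian reading of the same data compares the two batches directly instead of testing each one in isolation. A minimal sketch, assuming the bank considers the two batch types equally likely beforehand (an assumption, not stated in the answer): the Bayes factor is the ratio of the exact binomial probabilities of observing 36 home loans out of 48 under each batch. The helper `binom_pmf` is illustrative.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, k = 48, 36
like_25 = binom_pmf(k, n, 0.25)  # P(data | 25% batch)
like_50 = binom_pmf(k, n, 0.50)  # P(data | 50% batch)

bayes_factor = like_50 / like_25               # evidence for the 50% batch over the 25% batch
posterior_50 = like_50 / (like_50 + like_25)   # equal prior probabilities assumed

print(f"P(data|25%) = {like_25:.3e}")
print(f"P(data|50%) = {like_50:.3e}")
print(f"Bayes factor (50% vs 25%) = {bayes_factor:.3e}")
print(f"P(50% batch | data) = {posterior_50:.10f}")
```

Both frequentist tests reject their null hypotheses, yet the data favour the 50% batch over the 25% batch by a factor of roughly 5 × 10^8 (exactly 2^48 / 3^12 here, since the binomial coefficients cancel), which matches the intuition that 75% is closer to 50%.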
 
Let us now also see the different ways in which we can mitigate the paradox:
 
  • One approach is to lower the alpha level as a function of the sample size: choose alpha so that the critical value, measured in standard errors, increases as the sample size increases, instead of staying fixed (e.g. at 1.96 for a two-sided test at alpha = 0.05).

 

  • Another approach uses the Bayes factor, BF = p(data|Ha) / p(data|Ho), the ratio of the probability of the data under the alternative and null hypotheses. A Bayes factor of 1 means the data provide equal evidence for both hypotheses. The alpha level is then adjusted so that, at the critical value of the test statistic, the Bayes factor is not less than 1, i.e. a result that just reaches significance never favours Ho.
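Both approaches can be sketched together under assumptions that are not part of the answer: a normal-mean test with known σ, a point null Ho: θ = θ0, and a normal prior θ ~ N(θ0, τ²) under Ha with τ = σ chosen for illustration. With r = nτ²/σ², the Bayes factor equals 1 exactly when z² = (1 + r)·ln(1 + r)/r, so the sketch solves for that critical z and converts it into a two-sided alpha. The code works with BF01 = p(data|Ho) / p(data|Ha), the reciprocal of the Bayes factor defined above; both equal 1 at the same point. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def bf01_normal(z: float, n: int, tau_over_sigma: float = 1.0) -> float:
    """Bayes factor p(data|Ho) / p(data|Ha); Ha puts a N(theta0, tau^2) prior on theta."""
    r = n * tau_over_sigma ** 2
    return math.sqrt(1 + r) * math.exp(-0.5 * z ** 2 * r / (1 + r))

def calibrated_alpha(n: int, tau_over_sigma: float = 1.0) -> float:
    """Two-sided alpha whose critical z makes the Bayes factor exactly 1."""
    r = n * tau_over_sigma ** 2
    z_crit = math.sqrt((1 + r) * math.log(1 + r) / r)
    return 2 * (1 - NormalDist().cdf(z_crit))

for n in (10, 100, 1_000, 10_000):
    a = calibrated_alpha(n)
    z = NormalDist().inv_cdf(1 - a / 2)  # critical value implied by this alpha
    print(f"n={n:>6}: alpha={a:.4f}, critical z={z:.2f}, "
          f"BF01 at critical z={bf01_normal(z, n):.3f}")
```

Under these assumptions alpha shrinks and the critical z grows as n increases, so this calibration is also one concrete way to lower alpha as a function of sample size.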

The larger the sample size, the better:

·         A larger sample approximates the population more closely. Since the primary goal of inferential statistics is to generalize from a sample to a population, a larger sample leaves less to infer.

·         A small sample carries a greater risk: by chance it can be quite unrepresentative of the whole population, so results vary more from sample to sample.

·         The standard error depends directly on the sample size: it is the standard deviation divided by the square root of the sample size, SE = σ/√n.

·         If the sample size is large enough, the sampling distribution of the mean will be approximately normal, and when the sampling distribution is normal we can make better inferences about the population from the sample.

·         A larger sample size gives more statistical power and a smaller standard error.
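A quick simulation sketch of the standard-error point, assuming an arbitrary normal population (mean 50, standard deviation 10) purely for illustration: the spread of sample means across repeated samples tracks σ/√n, so quadrupling n roughly halves the standard error. The helper `simulated_se` is illustrative.

```python
import random
import statistics

random.seed(1)           # fixed seed so reruns give the same numbers
MU, SIGMA = 50.0, 10.0   # illustrative population parameters (assumed)
REPEATS = 2_000          # simulated samples per sample size

def simulated_se(n: int, repeats: int = REPEATS) -> float:
    """Standard deviation of the sample mean across repeated samples of size n."""
    means = [statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
             for _ in range(repeats)]
    return statistics.stdev(means)

for n in (25, 100, 400):
    print(f"n={n:>3}: simulated SE={simulated_se(n):.3f}, "
          f"sigma/sqrt(n)={SIGMA / n ** 0.5:.3f}")
```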

 

Lindley’s paradox highlights the conflict between Bayesian and frequentist evidence in hypothesis testing.

With a large sample we become more confident about our estimate, and the confidence interval becomes narrower.

When the sample size is very large, even a small error in the null hypothesis produces a highly statistically significant result.
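To see how a very large sample turns a tiny departure from the null hypothesis into a highly significant result, assume the true proportion is 0.51 while Ho states 0.50 (values chosen only for illustration), and compute the two-sided z-test p-value when the observed proportion equals the true one. The p-value uses the identity 2·(1 − Φ(|z|)) = erfc(|z|/√2), which stays accurate for very large z.

```python
import math

P0 = 0.50      # proportion under Ho
P_TRUE = 0.51  # assumed true proportion: a one-point difference

def two_sided_p(p_hat: float, n: int, p0: float = P0) -> float:
    """Normal-approximation z-test for a single proportion."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"n={n:>9}: p-value={two_sided_p(P_TRUE, n):.3g}")
```

The effect size never changes; only n does, yet the p-value falls from far above 0.05 to astronomically small values.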

 

One way to avoid the paradox is to check that the Bayesian and frequentist evidence in a hypothesis test do not conflict, and to interpret the analysis in light of the large sample size.

Lindley paradox

The Lindley paradox is a counterintuitive situation in statistics in which, depending on the choice of prior distribution, the frequentist and Bayesian approaches to a hypothesis testing problem give different results.

 

Consider the result x of an experiment, which can be explained by either of two hypotheses, Ho and H1, together with a prior distribution π representing uncertainty about which hypothesis is more accurate before x is taken into account.

The Lindley paradox appears when:

1. A frequentist test of Ho finds that the result x is "significant", i.e. there is enough evidence to reject Ho at, say, the 5% level; and

 

2. The posterior probability of Ho given x is high, indicating strong evidence that Ho agrees with x better than H1 does.

 

These outcomes are possible when Ho is highly specific, H1 is more diffuse, and neither Ho nor H1 is strongly favoured by the prior distribution, as shown in the example below.

 

The following numerical example illustrates the paradox described by Lindley.

In a particular city, 49,581 boys and 48,870 girls were born over a specific time frame. The observed proportion of male births is therefore $x = 49581/98451 \approx 0.5036$. We assume the number of male births is a binomial variable with parameter $\theta$, the proportion of male births, and we want to find out whether $\theta$ is 0.5 or some other value. In other words, the null hypothesis is Ho: $\theta = 0.5$ and the alternative is H1: $\theta \neq 0.5$.

 

The frequentist method for testing Ho is to calculate a p-value: the probability, assuming Ho is true, of observing at least as many boys as were actually observed. Because the number of births is large, we can use a normal approximation for the number of male births, $X \sim N(\mu, \sigma^2)$, where $\mu = n\theta = 98451 \times 0.5 = 49225.5$ and $\sigma^2 = n\theta(1-\theta) = 98451 \times 0.5 \times 0.5 = 24612.75$.

 


 

 

The one-sided p-value is $P(X \ge 49581) \approx 0.0117$. A frequentist would typically conduct a two-sided test, for which the p-value is $p \approx 2 \times 0.0117 \approx 0.0235$, because we would have been equally surprised to observe 49,581 female births, i.e. $x \approx 0.4964$. In both cases (one-sided and two-sided) the p-value is below the significance level $\alpha$ of 5%, so the frequentist approach rejects Ho: it does not agree with the observed data.
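The frequentist numbers can be checked in a few lines. The Bayesian side of the paradox is added here under assumed priors that the answer does not state: equal prior probability for Ho and H1, and a uniform prior on θ under H1, in which case the probability of the observed count under H1 is exactly 1/(n + 1).

```python
import math
from statistics import NormalDist

BOYS, GIRLS = 49_581, 48_870
n = BOYS + GIRLS                      # 98,451 births

# Frequentist: normal approximation to Binomial(n, 0.5) under Ho
mu = n * 0.5                          # 49,225.5
sigma = math.sqrt(n * 0.5 * 0.5)      # sqrt(24,612.75)
z = (BOYS - mu) / sigma
p_one_sided = 1 - NormalDist().cdf(z)
p_two_sided = 2 * p_one_sided

# Bayesian (assumed priors): P(Ho) = P(H1) = 0.5, theta ~ Uniform(0, 1) under H1
log_like_h0 = (math.lgamma(n + 1) - math.lgamma(BOYS + 1) - math.lgamma(GIRLS + 1)
               + n * math.log(0.5))   # log of the exact binomial probability under Ho
like_h0 = math.exp(log_like_h0)
like_h1 = 1 / (n + 1)                 # binomial likelihood averaged over the uniform prior
posterior_h0 = like_h0 / (like_h0 + like_h1)

print(f"z = {z:.3f}, one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
print(f"P(Ho | data) = {posterior_h0:.3f}")
```

Under these priors the posterior probability of Ho comes out around 0.95, even though Ho is rejected at the 5% level: exactly the conflict described in conditions 1 and 2.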

 

Lindley’s Paradox concerns the situation in which a test comparing the null and alternative hypotheses gives a significant result, leading us to reject the null hypothesis.

Consider a binomial example: a survey of whether people feel positive about the government. We take the null hypothesis Ho: p = 0.5 and the alternative Ha: p ≠ 0.5.

We observed 20,000 cases, of which 9,800 were positive. In this case the two-sided p-value is about 0.0047 (z ≈ -2.83), which is below 0.05, so at the 5% significance level (95% confidence) the null hypothesis is rejected.
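The survey p-value can be recomputed with the normal approximation to the binomial, using the counts given above (20,000 respondents, 9,800 positive) and Ho: p = 0.5. This is a sketch of the standard one-proportion z-test, not necessarily the exact method the respondent used.

```python
import math

n, positive = 20_000, 9_800
p0 = 0.5

p_hat = positive / n                          # observed proportion, 0.49
se = math.sqrt(p0 * (1 - p0) / n)             # standard error under Ho
z = (p_hat - p0) / se
p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value, 2 * (1 - Phi(|z|))

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, two-sided p = {p_value:.4f}")
```

With these counts the two-sided p-value is close to 0.0047, well below 0.05, although the observed proportion differs from 0.5 by only one percentage point.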

Lindley’s paradox can occur when:

            the sample size is large,

            Ho is precise, and

            Ha is relatively diffuse, i.e. not strongly opposed to Ho or not one-sided.

There are multiple ways to address such false positives in sampling:

            In the example above, it is not clear whether the survey sample needs to be segregated by gender. We need samples that represent the true population.

            Take a sub-sample from each sector to better represent the true population.

            When handling a large sample, the null hypothesis should not be stated as a single precise value but more broadly.

            The alternative hypothesis should be a strong, one-sided contrast to the null hypothesis.

 

 

  • Author

Going by the description, example and the methods to avoid the Paradox, Rahul Arora's answer is selected as the best answer this week. 
