Lindley’s Paradox, named after Dennis Lindley and discussed earlier by Sir Harold Jeffreys, showcases the conflict between the frequentist and Bayesian approaches to hypothesis testing. It refers to the fact that as the sample size increases (holding the p-value fixed, e.g. just below 0.05), p-values and Bayes factors can point in opposite directions: the p-value suggests that the null hypothesis (Ho) should be rejected, while the Bayes factor indicates that Ho out-predicts the alternative hypothesis (Ha). The same data would therefore lead to Ho being rejected under the frequentist approach and favored under the Bayesian approach.
Let us try to understand this concept through an example:
Suppose a bank that processes loan applications receives a set of applications. The bank regularly receives loan applications of all kinds in two types of batch: one in which 25% of the applications are home loan applications, and a second in which 50% are home loan applications. The bank wants to figure out which of these two batches the received applications belong to.
To do that, let’s say the bank takes a random sample of 48 applications and observes that 36 of them, i.e. 75%, are home loan applications. At first glance, 75% is closer to 50% than to 25%, so we might guess that the applications belong to the second batch, i.e. the one containing 50% home loan applications.
Now let us apply hypothesis testing, starting with the first hypothesis, i.e. that the applications belong to the first batch, which contains 25% home loan applications. Let us calculate the parameters under this hypothesis, i.e. the mean µ and standard deviation σ of the number of home loan applications in a sample of 48.
µ = np = 48*0.25 = 12
σ = sqrt(np(1-p)) = sqrt(48*0.25*(1-0.25)) = sqrt(48*0.25*0.75) = 3
Now using a band of three standard deviations around the mean (roughly 99.7% coverage, which is slightly stricter than the 0.01 significance level), the range is 12 +/- 3*3, i.e. from 3 to 21. The observed count of 36 home loan applications is nowhere close to this range, so we reject the null hypothesis that the applications belong to the batch containing 25% home loan applications.
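The band calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a full test procedure: the function name `three_sigma_band` and its parameter `k` are assumptions introduced here, and the band relies on the normal approximation to the binomial.

```python
import math

def three_sigma_band(n, p, k=3.0):
    """Return (low, high): mean -/+ k standard deviations of a Binomial(n, p) count."""
    mu = n * p                           # expected number of home loan applications
    sigma = math.sqrt(n * p * (1 - p))   # standard deviation of that count
    return mu - k * sigma, mu + k * sigma

n, observed = 48, 36                     # sample size and observed home loan count
low, high = three_sigma_band(n, 0.25)    # hypothesis: 25% batch
inside = low <= observed <= high

print(f"band: {low:.1f} to {high:.1f}, observed {observed} inside band: {inside}")
```

With n = 48 and p = 0.25 this reproduces the range from 3 to 21, and the observed count of 36 falls outside it.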
Now let us also test the hypothesis that the applications received belong to the second batch, which contains 50% home loan applications.
Let us again calculate the parameters µ and σ under this hypothesis.
µ = np = 48*0.50 = 24
σ = sqrt(np(1-p)) = sqrt(48*0.50*(1-0.50)) = sqrt(48*0.50*0.50) = sqrt(12) ≈ 3.5
Now using the same band of three standard deviations around the mean, the range is 24 +/- 3*3.5, i.e. from 13.5 to 34.5. This does not include the sample result of 36, which again leads us to reject the null hypothesis that the applications received belong to the second batch, i.e. the one containing 50% home loan applications.
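The two band checks rely on the normal approximation to the binomial. As a cross-check, the exact probability of seeing 36 or more home loan applications out of 48 can be computed under each hypothesis. This is only a sketch: the helper `upper_tail` is a name introduced here, and a one-sided upper tail is an assumption, since the text does not say which tail it has in mind.

```python
import math

def upper_tail(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )

n, observed = 48, 36
tail_25 = upper_tail(n, observed, 0.25)  # hypothesis: 25% batch
tail_50 = upper_tail(n, observed, 0.50)  # hypothesis: 50% batch

print(f"P(X >= {observed} | p = 0.25) = {tail_25:.3e}")
print(f"P(X >= {observed} | p = 0.50) = {tail_50:.3e}")
print("both below 0.01:", tail_25 < 0.01 and tail_50 < 0.01)
```

Both tail probabilities fall below 0.01, so the exact calculation agrees with the band checks in rejecting each hypothesis, although the tail under the 50% batch is much larger than the tail under the 25% batch.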
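Based on these results, both hypotheses get rejected: the frequentist tests rule out each batch in turn, even though the sample is far more compatible with the 50% batch than with the 25% batch. This tension between rejecting a hypothesis on its own and weighing it against its alternative is the underlying premise of Lindley’s paradox.
Let us now look at the different ways in which we can mitigate this: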
One approach is to lower the alpha level as a function of the sample size. Any rule for choosing alpha can work here, as long as it makes the ratio of the critical value to the standard error increase as the sample size increases.
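As a rough illustration of lowering alpha with the sample size, the sketch below uses one hypothetical schedule, alpha_n = alpha_0 * sqrt(n_0 / n). The schedule itself, the baseline alpha_0 = 0.05, and the reference sample size n_0 = 100 are assumptions chosen for illustration, not values prescribed by the text. The point is only that the two-sided critical z-value grows as n grows.

```python
import math
from statistics import NormalDist

def scaled_alpha(n, alpha0=0.05, n0=100):
    """Hypothetical schedule: shrink alpha in proportion to sqrt(n0 / n), capped at alpha0."""
    return min(alpha0, alpha0 * math.sqrt(n0 / n))

def critical_z(alpha):
    """Two-sided critical value of the standard normal distribution for a given alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

sizes = [100, 1_000, 10_000, 100_000]
z_values = [critical_z(scaled_alpha(n)) for n in sizes]

for n, z in zip(sizes, z_values):
    print(f"n = {n:>7}: alpha = {scaled_alpha(n):.5f}, critical z = {z:.3f}")
```

Under this assumed schedule, a result has to lie further from the null value, in standard-error units, before it counts as significant in a larger sample.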
Another approach is based on the Bayes factor, which is the ratio of the probability of the data under the alternative hypothesis to its probability under the null hypothesis, i.e. p(data|Ha) / p(data|Ho). A Bayes factor of 1 implies equal evidence for both the null and the alternative hypothesis. The idea is to adjust the alpha level so that a result exactly at the critical value of the test statistic never provides more evidence for Ho than for Ha, i.e. the Bayes factor as defined here is not less than 1 at that point.
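To make the Bayes factor concrete, the sketch below computes p(data|Ha) / p(data|Ho) for the bank example. Treating the 25% batch as Ho and the 50% batch as Ha is an assumption made here for illustration, and the helper `binom_pmf` is a name introduced for this sketch. Because both hypotheses are single fixed proportions, the Bayes factor reduces to a simple likelihood ratio.

```python
import math

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, observed = 48, 36

# Assumed roles: Ho = 25% home loan batch, Ha = 50% home loan batch.
likelihood_h0 = binom_pmf(n, observed, 0.25)
likelihood_ha = binom_pmf(n, observed, 0.50)

bayes_factor = likelihood_ha / likelihood_h0   # p(data|Ha) / p(data|Ho)

print(f"p(data|Ho) = {likelihood_h0:.3e}")
print(f"p(data|Ha) = {likelihood_ha:.3e}")
print(f"Bayes factor (Ha over Ho) = {bayes_factor:.3e}")
```

A Bayes factor well above 1 here means the observed 36 out of 48 is far better predicted by the 50% batch than by the 25% batch, even though each batch is rejected when tested on its own.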