Benchmark Six Sigma Forum

A full-screen app on your home screen with push notifications, badges and more.

To install this app on iOS and iPadOS
  1. Tap the Share icon in Safari
  2. Scroll the menu and tap Add to Home Screen.
  3. Tap Add in the top-right corner.
To install this app on Android
  1. Tap the 3-dot menu (⋮) in the top-right corner of the browser.
  2. Tap Add to Home screen or Install app.
  3. Confirm by tapping Install.
Message added by Mayank Gupta,

Berkson's Paradox is a fallacy where two variables seem to be correlated to each other but in reality they are not. This is usually a result of systematically observing some events more than others (ascertainment bias). Berkson's paradox is also known as Berkson's bias or Collider bias.

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Kiran Kumar Gadhamsetty and Rahul Arora.

 

Applause for all the respondents - Chandra Shekar, Kiran Kumar Gadhamsetty, Rahul Arora, Sohan Subhash Mirajkar, Mohamed Asif, Kaviraj Rajasekar, Piyush Jain.

Featured Replies

Q 489. One would assume a positive relationship between smoking cigarettes and COVID-19 severity. However, according to one study (the European Commission review by Wenzel, 2020), the relationship is negative. While counterintuitive, this is one of the recent examples of Berkson's paradox. Explain the paradox, citing a few more examples. What are the methods of preventing this paradox?

 

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

Solved by Kiran Kumar Gadhamsetty

Berkson’s paradox or Berkson's fallacy is a counter-intuitive result in probability & statistics.

For example, let's say we have two independent events A and B. By the definition of independence, the conditional probability of event A given B is the same as the probability of event A:

    P(A|B) = P(A)

 

that is, knowing that event-B occurred gives us no information about the probability that A has occurred.

Berkson's paradox states that, if we restrict ourselves to the cases where at least one of the events A or B occurs, the knowledge that B has occurred makes it less likely that A has also occurred:

    P(A | B, A or B) < P(A | A or B)

 

The reason that this result is counter-intuitive is that A and B are two independent events. That is P(A|B)=P(A).  But they become negatively dependent on each other when we restrict ourselves to the cases where A or B occurs. 

Berkson's paradox is a form of selection bias: by restricting ourselves to the cases where A or B occurs, we ignore the cases where neither A nor B occurs.
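The inequality above can be checked exactly for any pair of independent events. The following is a minimal sketch; the function name `berkson_probabilities` and the values P(A) = 1/2, P(B) = 1/4 are illustrative assumptions, not taken from the post.

```python
from fractions import Fraction

def berkson_probabilities(p_a, p_b):
    """Return (P(A | B, A or B), P(A | A or B)) for independent A and B."""
    p_union = p_a + p_b - p_a * p_b   # P(A or B) under independence
    # B already implies (A or B), so conditioning on both is just P(A|B) = P(A).
    p_a_given_b_and_union = p_a
    # A also implies (A or B), so P(A and (A or B)) = P(A).
    p_a_given_union = p_a / p_union
    return p_a_given_b_and_union, p_a_given_union

# Assumed illustrative probabilities.
left, right = berkson_probabilities(Fraction(1, 2), Fraction(1, 4))
print(left, right)  # 1/2 4/5
```

Within the restricted set, learning that B occurred pulls the probability of A back down from 4/5 to its unconditional value of 1/2, which is what makes the two events look negatively related.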

 

Examples

For example, suppose a stamp collector has 1000 postage stamps, of which 300 are pretty and 100 are rare, with 30 being both pretty and rare. 10% of all stamps are rare and 10% of pretty stamps are rare. So prettiness tells nothing about rarity. 

 

The stamp collector puts the 370 stamps that are pretty or rare on display. Just over 27% of the stamps on display are rare, but still only 10% of the pretty stamps on display are rare. An observer who considers only the stamps on display will see a spurious negative relationship between prettiness and rarity as a result of the selection bias, because all 70 of the not-pretty stamps on display are rare. That is, not-prettiness strongly indicates rarity in the display, but not in the total collection of stamps.
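The arithmetic of the stamp example can be reproduced in a few lines. This is only a sketch using the counts stated above; the variable names are made up for illustration.

```python
# Counts from the stamp example.
total, pretty, rare, both = 1000, 300, 100, 30

on_display = pretty + rare - both             # pretty or rare: 370 stamps
rare_rate_all = rare / total                  # 10% of all stamps are rare
rare_rate_pretty = both / pretty              # 10% of pretty stamps are rare
rare_rate_display = rare / on_display         # just over 27% of displayed stamps

# Stamps on display that are not pretty must be rare, otherwise
# they would not have been selected for the display.
not_pretty_on_display = on_display - pretty   # 70 stamps
rare_rate_not_pretty_display = (rare - both) / not_pretty_on_display

print(on_display, round(rare_rate_display, 4), rare_rate_not_pretty_display)
```

On display, rarity is 10% among pretty stamps but 100% among not-pretty ones, even though the two traits are unrelated in the full collection.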

  • Solution

Berkson's paradox describes a situation where the correlation between two variables observed in a sample study runs against our intuition. This happens because of biased sample selection.

 

For example, regular exercise keeps a person active. However, if the study is performed on hospitalized patients, the result can turn out to be counterintuitive.

 

As another example, regular investments in mutual funds provide a positive return. But if the study is performed only on poorly performing funds, the result might be negative returns.

 

Sample selection needs to be done from a general population instead of a biased population to prevent occurrence of Berkson's paradox.

Berkson's Paradox, also known as Berkson's Bias or Collider Bias, is a particular kind of selection bias caused by systematically observing some events more than others. It appears to show a correlation between two independent events when, in reality, no such correlation exists. The apparent dependence between two events, say A and B (the probability of A seeming to change in the presence of B), arises because cases where neither event occurs are excluded from the sample taken for study.
 
This principle was illustrated by Joseph Berkson in 1946 with a case study that linked diabetes with cholecystitis among patients admitted to a hospital. There seemed to be no correlation between the two diseases in data collected from the overall population; however, because the samples in this study were taken from hospitalized patients, the data indicated a misleading positive association between the two diseases. Let us work through an example to understand why such an association was observed:
 
Let's say the population is 100 people, that the two diseases, diabetes and cholecystitis, are independent events, and that the population is evenly distributed among the four categories shown below:
  • High cholecystitis & low diabetes : 25
  • High cholecystitis & high diabetes : 25
  • Low cholecystitis & low diabetes : 25
  • Low cholecystitis & high diabetes : 25
Since the data was collected from the hospital, the category with low cholecystitis & low diabetes would not appear in the study, i.e. only the data for the other three categories would be reported. The probability that a patient with low diabetes is diagnosed with cholecystitis is then:
P(High cholecystitis | Low Diabetes) = 25/25 = 100% (since the category with low cholecystitis & low diabetes is not in the study)
Now let us calculate the probability that a patient with high diabetes is diagnosed with cholecystitis:
P(High cholecystitis | High Diabetes) = 25/(25+25) = 50% (since both categories of patients with high diabetes are in the study)
Thus we reach the false conclusion that patients with high diabetes have a lower risk of cholecystitis.
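The two conditional probabilities above, and their counterparts in the full population, can be computed from the four category counts. The sketch below assumes a hypothetical helper `p_high_chol_given` and a dictionary keyed by (cholecystitis level, diabetes level).

```python
def p_high_chol_given(diabetes, counts):
    """P(high cholecystitis | diabetes level) from counts keyed by (cholecystitis, diabetes)."""
    high = counts.get(("high", diabetes), 0)
    low = counts.get(("low", diabetes), 0)
    return high / (high + low)

# Even split across the four categories, as in the example above.
population = {
    ("high", "low"): 25,
    ("high", "high"): 25,
    ("low", "low"): 25,
    ("low", "high"): 25,
}

# Hospital sample: the low/low category never shows up.
hospital = {k: v for k, v in population.items() if k != ("low", "low")}

for name, table in [("population", population), ("hospital", hospital)]:
    print(name, p_high_chol_given("low", table), p_high_chol_given("high", table))
```

In the full population both probabilities are 50%, so the difference seen in the hospital data comes entirely from the missing category.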
 
Let us now take another example in order to see the effect of Berkson’s Paradox:-
 
Let us consider two seemingly dependent events i.e. Diligence & Academic Results. Now logically there should be a positive relationship between these two variables i.e. the more diligent you are the better your academic results would be. Thus there will be an uneven distribution of population as well as shown below:-
 
  • Lazy & Good Results : 20
  • Hardworking & Good Results : 30
  • Lazy & Poor Results : 30
  • Hardworking & Poor Results : 20
 
Now let's say the data is collected from a top school, so lazy students with poor results are not in the sample. Here the probability of lazy students getting good results is given as:
 
P(Good Results | Lazy) = 20/20 = 100% (since we have not considered the category of lazy students having poor results in the study)
 
Also the probability of hardworking students getting good results:-
 
P(Good Results | Hardworking) = 30/(30+20) = 60% (since both categories of hardworking students are considered in the study)
 
Thus this study also gives us a wrong impression that the lazy students have better chances of getting good results which is not the case had we collected the data from the entire population of schools.  
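To see how the direction of the relationship flips, the same comparison can be made on the full population and on the top-school sample. This is a sketch only; `p_good` and `lazy_advantage` are names invented for illustration.

```python
from collections import Counter

# Counts from the example above, keyed by (effort, results).
counts = Counter({
    ("lazy", "good"): 20,
    ("hardworking", "good"): 30,
    ("lazy", "poor"): 30,
    ("hardworking", "poor"): 20,
})

def p_good(effort, table):
    """P(good results | effort level); a missing category counts as zero."""
    good = table[(effort, "good")]
    return good / (good + table[(effort, "poor")])

def lazy_advantage(table):
    """P(good | lazy) - P(good | hardworking); positive means lazy students look better."""
    return p_good("lazy", table) - p_good("hardworking", table)

# Top-school sample: lazy students with poor results are never observed.
top_school = Counter(counts)
del top_school[("lazy", "poor")]

print(round(lazy_advantage(counts), 2), round(lazy_advantage(top_school), 2))
```

In the full population lazy students do worse (40% against 60%), while in the selected sample they appear to do better (100% against 60%).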
 
Preventing Berkson’s Bias:
 
The following common strategies should be adopted to prevent Berkson's Bias:
 
  • Select the correct target population, e.g. if the target population is students, then including locals who didn't attend college will introduce bias.
  • Select random samples from the target population, e.g. to study the effect of sleep on college student grades, ensure a balance of students enrolled in early-morning courses and night courses rather than taking students from only one type of course.
  • Perform a pilot study before a full-blown study, as this quickly gives an idea of the appropriateness of the selection design.
  • Create a standard method of selecting samples from the population and measuring data so that everyone involved in the study is calibrated.

Berkson first described this paradox in 1946. His original research paper demonstrated that two diseases which have no real-life relationship can be what he called 'spuriously associated' in hospital-based case-control studies. The idea was not widely accepted until 1979, when David Sackett of McMaster University provided strong evidence that Berkson's paradox does exist.

 

Berkson's paradox, also called Berkson's fallacy or Berkson's bias, is the counter-intuitive idea that events which seem to be correlated are actually not correlated.

 

Examples of Berkson’s paradox:

1. Take two events which are completely independent, like lung cancer and diabetes. If a study selects for both the presence of lung cancer and diabetes, then the presence of cancer makes it more likely to have diabetes as well. Intuitively, this correlation makes no sense, but the data seems to back this counter-intuitive notion up, showing that there is, in fact, a connection.

 

2. For example, someone may observe from experience that fast food restaurants in their area which serve good burgers also tend to serve bad french fries and vice versa; but because they would likely not eat anywhere where both burgers and french fries were bad, they might fail to allow for the large number of restaurants in this category which weaken or even contradict the correlation.

 

The most effective way to prevent Berkson’s bias in research studies is to collect a simple random sample from a population. That means that every member of the given population of interest has an equal chance of being included in the sample.
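The effect of the sampling scheme can be illustrated with a small simulation. This is a hedged sketch: the population is synthetic, the trait probabilities (0.3 and 0.1), the sample sizes and the helper `rate_gap` are all assumptions made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: traits A and B are assigned independently.
population = [(rng.random() < 0.3, rng.random() < 0.1) for _ in range(100_000)]

def rate_gap(sample):
    """Estimate P(A | B) - P(A | not B) from a list of (a, b) pairs."""
    with_b = [a for a, b in sample if b]
    without_b = [a for a, b in sample if not b]
    return sum(with_b) / len(with_b) - sum(without_b) / len(without_b)

# Simple random sample: every member has the same chance of inclusion.
random_sample = rng.sample(population, 10_000)

# Biased sample: only members with at least one of the traits are observed.
selected_sample = [p for p in population if p[0] or p[1]]

print(round(rate_gap(random_sample), 3))    # close to zero
print(round(rate_gap(selected_sample), 3))  # strongly negative
```

The simple random sample preserves the independence of the two traits up to sampling noise, while the selected sample shows a large spurious negative gap.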

Berkson’s paradox is a special case of collider bias.

In simple terms, this bias results from conditioning on a common effect of at least two causes.

 

In simpler terms:

This happens when two variables appear to be negatively correlated in the sample data even though they are positively correlated (or uncorrelated) in the overall population.

 

For instance, consider two ancestors, exposure (E) and disease (D), and their common descendant (C).

Here conditioning on C leads to a distortion in the association between E and D. That is Berkson's fallacy. 
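The distortion can be quantified with a small exact calculation. In the sketch below, E and D are independent and C depends on both. All probabilities are assumed values chosen for illustration, and the odds ratio is used here as one convenient measure of association; it is not mentioned in the post.

```python
from itertools import product

p_e, p_d = 0.2, 0.1  # assumed marginal probabilities of exposure and disease

# Assumed P(C = 1 | E, D): the common effect is more likely if either cause is present.
p_c = {(0, 0): 0.01, (1, 0): 0.10, (0, 1): 0.20, (1, 1): 0.28}

def joint(e, d, condition_on_c):
    """Weight of the cell (E=e, D=d), optionally restricted to C = 1."""
    p = (p_e if e else 1 - p_e) * (p_d if d else 1 - p_d)  # E and D are independent
    return p * p_c[(e, d)] if condition_on_c else p

def odds_ratio(condition_on_c):
    """Odds ratio between E and D; 1 means no association."""
    w = {(e, d): joint(e, d, condition_on_c) for e, d in product((0, 1), repeat=2)}
    return (w[1, 1] * w[0, 0]) / (w[1, 0] * w[0, 1])

print(round(odds_ratio(False), 3))  # whole population
print(round(odds_ratio(True), 3))   # only cases with C = 1
```

Without conditioning the odds ratio is 1, as expected for independent variables. Restricting to C = 1 pushes it well below 1 with these assumed numbers, which is the negative association that collider bias produces.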

 

[Figure: causal diagram in which exposure (E) and disease (D) both point to the common descendant (C)]

 

In the example below, conditioning on the collider 'hospitalization' reverses the association between smoking and COVID-19.

[Figure: diagram with hospitalization as a collider between smoking and COVID-19]

 

This is very similar to Berkson's original work in 1946, where he observed a negative correlation between cholecystitis and diabetes in patients, in spite of diabetes being a risk factor for cholecystitis.

 

One of the best methods to prevent the bias is to collect simple random samples from the population, which in itself reduces errors in data gathering. Define the population properly, and then examine statistically whether the sample is an unbiased representation of that population.

 

Berkson’s Paradox – Generally visualized as a statistical Illusion.

This paradox, also known as Berkson's bias, is a condition where two metrics that are negatively correlated or even uncorrelated in the general population appear to be positively correlated in a specific subpopulation. The situation may arise from selection bias on the part of the analyst/observer during data collection. In other words, Berkson's Paradox is a counter-intuitive connection between two traits in statistical data. The false observation rests on the inappropriate assumption that the "cause" is related to the "effect", combined with bias in data collection. Let's take school dropouts as an example.

 

Example: We are often impressed by coverage of school dropouts who became multi-billionaires, and people respond by saying "universities are useless!" or "academics are useless!". Their argument is that graduates are struggling to find jobs while high school or university dropouts are building successful businesses around the world. Let us take this case to understand the paradox.

[Figure 1: General population. The X axis is education level and the Y axis is success.]

 

We bias ourselves toward the belief that less-educated people find great success in life. As a result, we ignore a fair proportion of the population (shaded in Fig. 2) when collecting data, and proceed with the analysis using only people who are successful or educated (Fig. 2).

[Figure 2: The ignored population, unsuccessful and less educated, is shaded.]

Because of the bias in our data collection, we end up seeing 100% of less-educated people finding success (Figure 3, green shade), but only a fraction of highly educated people finding success (Figure 4, green shade). We are therefore forced to conclude that education seems to make people less successful in life.

[Figure 3: People with a low education level.]
[Figure 4: People with a high education level.]

 

In a scenario where part of the population's data is ignored due to selection bias, the takeaway will be that highly successful people are less educated.


 

On the other hand, if we consider the entire population without selection bias (as shown in Fig.1) then the results will show that success in life is not even correlated with the level of education.

 


This indicates how a narrow mindset/selection bias can lead to strange conclusions in analysis. Hence, before using statistics to support a claim, we should double-check whether we collected enough data and looked widely enough to avoid Berkson's paradox/bias.

 

 

Berkson's Paradox, also known as Berkson's fallacy or Berkson's bias, is the counter-intuitive idea that events which seem to be correlated actually are not.
 
Take two events, A and B, which are fully independent (for illustration, lung cancer and diabetes). If a study selects for both the presence of A (lung cancer) and B (diabetes), the presence of diabetes will make the presence of lung cancer more likely. Intuitively, this makes no sense, but the data seems to back this counter-intuitive notion up, showing that there is, in fact, a connection.
Berkson wrote about the paradox in 1946. His original paper showed that two conditions, which have no real relationship, can be what he called 'spuriously associated' in hospital-based case-control studies. However, the idea wasn't widely accepted until 1979, when David Sackett of McMaster University provided strong evidence that Berkson's paradox does, in fact, exist.

 

To understand this, consider a particular children's hospital during an influenza scare. We're going to prove the counter-intuitive idea that having influenza offers some protection against appendicitis.

 

10% of the general public has influenza.

In the hospital, full of sick children, the odds are of course higher; 30 percent of the children may have been admitted for influenza.
Now suppose 10% of the children were admitted for appendicitis.
There will be some overlap; we assume a child with appendicitis is just as likely to get flu as any other child, and a child with flu can still have appendicitis. The percent of appendicitis cases with influenza would be 10% of 10% (0.10 * 0.10 = 0.01), or 1% of hospital cases.

 

[Figure: Reasons for hospital admissions per 100 children: influenza (blue), appendicitis (red), both (red/blue).]


If you choose one hospital child at random, he has a 30% chance of having influenza and a 10% chance of having appendicitis. That is to say, 10 out of 100 children will have appendicitis, and 30 out of 100 will have influenza.
Now let's calculate a new chance: a non-influenza child's chance of having appendicitis.
You're choosing from all the children in the yellow boxed area (70 children) below.

 

 

[Figure: the 70 non-influenza children, outlined by a yellow box]

 

 

 

This is what we know:

The thirty influenza cases outside of the yellow box include the appendicitis/influenza (red/blue) children. In our illustration, that's just one child.
Out of the 100 children, there were 10 total appendicitis cases, so there will be 9 among the seventy non-influenza cases we're picking from now.
So we can calculate the new chance: a non-influenza child has a 9/70 = 12.9/100, or 12.9%, chance of having appendicitis. That's higher than the 10% rate of appendicitis among all children.
So even though these two events are entirely independent, the in-hospital statistics make it look like having influenza is some small insurance against appendicitis.
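The hospital arithmetic above can be checked in a few lines. This sketch uses the counts from the example (100 children, 30 with influenza, 10 with appendicitis, 1 with both); the appendicitis rate among the influenza children is not stated in the text and is derived here from those same counts.

```python
# Counts from the hospital example above.
children = 100
flu = 30
appendicitis = 10
both = 1  # overlap assumed in the example

non_flu = children - flu                       # 70 children in the yellow box
appendicitis_non_flu = appendicitis - both     # 9 appendicitis cases among them

rate_overall = appendicitis / children         # 10% among all children
rate_non_flu = appendicitis_non_flu / non_flu  # about 12.9% among non-influenza children
rate_flu = both / flu                          # derived: about 3.3% among influenza children

print(round(rate_overall * 100, 1), round(rate_non_flu * 100, 1), round(rate_flu * 100, 1))
```

Within the hospital, influenza children show a lower appendicitis rate than non-influenza children, which is the spurious protective effect described above.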

There are two winners to this question - Kiran Kumar Gadhamsetty (for providing some interesting examples) and Rahul Arora (for highlighting multiple ways to prevent this bias). Well done!
