Benchmark Six Sigma Forum

Message added by Mayank Gupta,

The Chi Square Test is a family of hypothesis tests that compare the observed distribution of your data with the distribution expected under the null hypothesis.

 

An application-oriented question on the topic, along with the responses, can be seen below. The best answers were provided by Sandip Mittra and Shiva Kumar V.

 

Applause for all the respondents - Gaurav Mathur, Sandip Mittra, Gopal Menon, Afzal Wadood, Shiva Kumar V.

Chi Square Test

Featured Replies

Q 430. The Chi Square Test can be used for three types of comparisons: as a test of homogeneity, as a test of goodness of fit, or as a test of independence. Elaborate on the difference between the three with suitable examples.

 

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

Solved by Sandip Mittra

The chi-squared test (χ²), also known as Pearson's Chi-Square test, is used to analyse categorical data. It compares experimentally observed results with those expected theoretically under some hypothesis.

 

Chi-Square Formula

$\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed value and E is the expected value
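As a minimal sketch of the formula above (Python, standard library only; the function name `chi_square_statistic` is an illustrative choice, not something from the original post), the statistic is the sum over all categories of the squared difference divided by the expected value:

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts taken from the medical-college example that appears in this thread:
# observed 350 male / 650 female, expected 300 male / 700 female.
statistic = chi_square_statistic([350, 650], [300, 700])
print(round(statistic, 2))  # 11.9
```

The result is 0 when observed and expected counts agree exactly, and it grows as they diverge.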

 

The chi-square test of goodness of fit checks how well an observed distribution fits a theoretical one. It compares the distribution of observations across classes with an expected distribution.

The chi-square test of independence is used to test for a relationship between two categorical variables. Expected values are computed under the assumption that the two variables are independent. It applies when two categorical variables are recorded for the same sample and we want to check whether there is any relationship between them.

 

Chi-square test of homogeneity tests to see whether different columns (or rows) of data in a table come from the same population or not.

  • Solution

Chi-Square Test

The Chi-Square Test measures the difference between observed and expected values under an assumed hypothesis. It was developed by Karl Pearson in 1900 and is one of the most widely used tests of significance. It is a non-parametric test: it does not assume that the underlying variable follows any particular distribution.

 

There are 3 applications of Chi-Square Tests

1.       Goodness of fit

2.       Test for independence

3.       Test of homogeneity

 

Goodness of Fit

The Chi-square goodness of fit test is used to compare a randomly collected sample containing a single categorical variable to a larger population. For example, if we want to test whether 70% of medical students are female, we use the goodness of fit test.

This is measured using the below formula.

 

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Where:

O = Observed Value

E = Expected value

If χ² (Calculated) > χ² (Tabulated) with (n-1) degrees of freedom, where n is the number of categories, the null hypothesis is rejected; otherwise we fail to reject it.

 

Let’s try to understand this with an example.

 

In a medical college of 1000 students, 650 are female. Is this consistent with the theory that 70% of medical students are female?

 

 

Category | Observed | Expected
Male     | 350      | 300
Female   | 650      | 700
Total    | 1000     | 1000

 

The hypotheses for a Chi-square goodness of fit test are as follows:

Null Hypothesis (H0): The collected data is consistent with the population distribution.

Alternative Hypothesis (HA): The collected data is not consistent with the population distribution.

 

 

Category | Observed (O) | Expected (E) | (O-E) | (O-E)^2 | (O-E)^2/E
Male     | 350          | 300          | 50    | 2500    | 8.333333
Female   | 650          | 700          | -50   | 2500    | 3.571429
Total    | 1000         | 1000         |       |         | 11.90476

 

χ² (Calculated) = 11.9

χ² (Tabulated) with (n-1) = 2-1 = 1 degree of freedom at the 5% significance level = 3.84

Decision:

If χ² (Calculated) > χ² (Tabulated) with (n-1) degrees of freedom, the null hypothesis is rejected; otherwise we fail to reject it.

11.9 > 3.84, and therefore the null hypothesis is rejected: the observed split is not consistent with the 70% theory.
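The goodness of fit example above can be reproduced with a short Python sketch. The function name `goodness_of_fit` and the constant name are illustrative assumptions; the critical value 3.841 is the standard chi-square table value for 1 degree of freedom at the 5% significance level, which the post rounds to 3.8.

```python
def goodness_of_fit(observed, expected_proportions, critical_value):
    """Chi-square goodness of fit test against hypothesised proportions.

    Returns (statistic, degrees_of_freedom, reject_null).
    """
    total = sum(observed)
    # Expected count per category = sample size x hypothesised proportion.
    expected = [total * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # (n - 1), with n = number of categories
    return statistic, df, statistic > critical_value


# Tabulated chi-square value for df = 1 at the 5% significance level.
CRITICAL_DF1_ALPHA_05 = 3.841

# Observed: 350 male, 650 female. Theory: 30% male, 70% female.
statistic, df, reject = goodness_of_fit(
    [350, 650], [0.30, 0.70], CRITICAL_DF1_ALPHA_05
)
print(f"chi-square = {statistic:.1f}, df = {df}, reject H0: {reject}")
# chi-square = 11.9, df = 1, reject H0: True
```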

 

Test for independence

 

This test helps us to identify if there is an association between two categorical variables within the same population.

 

Let us understand this with an example:

 

Table with Observed Values

Place / Qualification | Middle school | High School | Bachelors | Masters | PhD | Total
City                  | 18            | 36          | 21        | 9       | 21  | 105
Village               | 12            | 36          | 45        | 36      | 6   | 135
Total                 | 30            | 72          | 66        | 45      | 27  | 240

 

The hypotheses for a Chi-square test of independence are as follows:

Null Hypothesis (H0): There is no association between the qualification and the place the students come from.

Alternative Hypothesis (HA): There is an association between the qualification and the place the students come from.

 

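Table with Expected Values

Each expected value is (row total x column total) / grand total, shown here rounded to the nearest whole number.

Place / Qualification | Middle school     | High School       | Bachelors | Masters | PhD
City                  | (105x30)/240 = 13 | (105x72)/240 = 32 | 29        | 20      | 12
Village               | 17                | 41                | 37        | 25      | 15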
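The expected values in this table follow the rule (row total x column total) / grand total. A minimal Python sketch of that step, using the observed City/Village counts from this answer (the function name `expected_counts` is an illustrative choice):

```python
def expected_counts(table):
    """Expected counts under independence for a table of observed counts.

    Each cell is (row total x column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Columns: Middle school, High School, Bachelors, Masters, PhD
observed = [
    [18, 36, 21, 9, 21],  # City
    [12, 36, 45, 36, 6],  # Village
]

expected = expected_counts(observed)
for label, row in zip(["City", "Village"], expected):
    print(label, [f"{e:.2f}" for e in row])
```

The unrounded values (for example 13.125 rather than 13 for City / Middle school) are the ones to carry into the chi-square calculation; rounding them first would shift the final statistic slightly.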

 

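Cell                    | Observed Value (O) | Expected Value (E) | (O-E) | (O-E)^2 | (O-E)^2/E
City / Middle school    | 18                 | 13                 | 5     | 23.8    | 1.8
City / High School      | 36                 | 32                 | 5     | 20.3    | 0.6
City / Bachelors        | 21                 | 29                 | -8    | 62.0    | 2.1
City / Masters          | 9                  | 20                 | -11   | 114.2   | 5.8
City / PhD              | 21                 | 12                 | 9     | 84.4    | 7.1
Village / Middle school | 12                 | 17                 | -5    | 23.8    | 1.4
Village / High School   | 36                 | 41                 | -5    | 20.3    | 0.5
Village / Bachelors     | 45                 | 37                 | 8     | 62.0    | 1.7
Village / Masters       | 36                 | 25                 | 11    | 114.2   | 4.5
Village / PhD           | 6                  | 15                 | -9    | 84.4    | 5.6
Total                   |                    |                    |       |         | 31.2

(E and (O-E) are shown rounded to whole numbers; (O-E)^2 and (O-E)^2/E are computed from the unrounded expected values, e.g. E = 31.5 for City / High School.)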

 

Degrees of freedom = (Columns - 1) x (Rows - 1)
                   = (5-1) x (2-1)
                   = 4 x 1 = 4

χ² (Calculated) = 31.2

χ² (Tabulated) with 4 degrees of freedom at the 5% significance level = 9.49

Decision:

If χ² (Calculated) > χ² (Tabulated), the null hypothesis is rejected; otherwise we fail to reject it.

31.2 > 9.49, and therefore the null hypothesis is rejected: there is an association between qualification and place.
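Putting the steps of the independence example together, the following Python sketch recomputes the statistic and degrees of freedom from the observed table. The function name `chi_square_contingency` is an illustrative assumption; 9.488 is the standard tabulated value for 4 degrees of freedom at the 5% significance level.

```python
def chi_square_contingency(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # Expected count under the null hypothesis of independence.
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Columns: Middle school, High School, Bachelors, Masters, PhD
observed = [
    [18, 36, 21, 9, 21],  # City
    [12, 36, 45, 36, 6],  # Village
]

# Tabulated chi-square value for df = 4 at the 5% significance level.
CRITICAL_DF4_ALPHA_05 = 9.488

statistic, df = chi_square_contingency(observed)
decision = "reject H0" if statistic > CRITICAL_DF4_ALPHA_05 else "fail to reject H0"
print(f"chi-square = {statistic:.1f}, df = {df}: {decision}")
# chi-square = 31.2, df = 4: reject H0
```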

 

Test of homogeneity

 

We perform this test to check whether a single categorical variable has the same distribution in two or more populations. The basic difference between the two tests is that the test for independence looks at two categorical variables within the same population, whereas the test of homogeneity looks at a single categorical variable across different populations.

 

Let us see whether the TV watching patterns of males and females are the same.

The hypotheses for a Chi-square test of homogeneity are as follows:

Null Hypothesis (H0): The distribution of TV watching patterns is the same for males and females.

Alternative Hypothesis (HA): The distribution of TV watching patterns is not the same for males and females.

 

Table with Observed Values

Gender / Programme type | Movies | Sports | Serials | Other | Total
Male                    | 72     | 84     | 49      | 45    | 250
Female                  | 91     | 86     | 88      | 35    | 300
Total                   | 163    | 170    | 137     | 80    | 550

 

Table with Expected Values

Gender / Programme type | Movies | Sports | Serials | Other
Male                    | 74     | 77     | 62      | 36
Female                  | 89     | 93     | 75      | 44

 

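Cell             | Observed Value (O) | Expected Value (E) | (O-E) | (O-E)^2 | (O-E)^2/E
Male / Movies    | 72                 | 74                 | -2    | 4.4     | 0.1
Male / Sports    | 84                 | 77                 | 7     | 45.3    | 0.6
Male / Serials   | 49                 | 62                 | -13   | 176.2   | 2.8
Male / Other     | 45                 | 36                 | 9     | 74.6    | 2.1
Female / Movies  | 91                 | 89                 | 2     | 4.4     | 0.0
Female / Sports  | 86                 | 93                 | -7    | 45.3    | 0.5
Female / Serials | 88                 | 75                 | 13    | 176.2   | 2.4
Female / Other   | 35                 | 44                 | -9    | 74.6    | 1.7
Total            |                    |                    |       |         | 10.1

(E and (O-E) are shown rounded to whole numbers; (O-E)^2 and (O-E)^2/E are computed from the unrounded expected values.)
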

Degrees of freedom = (Columns - 1) x (Rows - 1)
                   = (4-1) x (2-1) = 3

χ² (Calculated) = 10.1

χ² (Tabulated) with 3 degrees of freedom at the 5% significance level = 7.8

Decision:

If χ² (Calculated) > χ² (Tabulated), the null hypothesis is rejected; otherwise we fail to reject it.

10.1 > 7.8, and therefore the null hypothesis is rejected. The TV watching pattern is not the same for males and females.
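The homogeneity calculation uses the same arithmetic as the independence test. As a sketch that goes one small step further than the post, the Python below also splits the statistic by programme type to show which category contributes most to the difference. The function name `contributions_by_column` is an illustrative assumption; 7.815 is the standard tabulated value for 3 degrees of freedom at the 5% significance level.

```python
def contributions_by_column(table, column_labels):
    """Return each column's share of the chi-square statistic as a dict."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    contributions = {}
    for j, label in enumerate(column_labels):
        share = 0.0
        for i, row in enumerate(table):
            # Expected count if both populations share one distribution.
            e = row_totals[i] * col_totals[j] / grand_total
            share += (row[j] - e) ** 2 / e
        contributions[label] = share
    return contributions


labels = ["Movies", "Sports", "Serials", "Other"]
observed = [
    [72, 84, 49, 45],  # Male
    [91, 86, 88, 35],  # Female
]

# Tabulated chi-square value for df = 3 at the 5% significance level.
CRITICAL_DF3_ALPHA_05 = 7.815

contributions = contributions_by_column(observed, labels)
statistic = sum(contributions.values())
df = (len(observed) - 1) * (len(labels) - 1)

print(f"chi-square = {statistic:.1f}, df = {df}")
print("reject H0" if statistic > CRITICAL_DF3_ALPHA_05 else "fail to reject H0")
for label, share in contributions.items():
    print(f"{label}: {share:.2f}")
```

With these counts, Serials accounts for the largest share of the statistic, followed by Other, which suggests where the male and female viewing patterns differ most.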

 

 

 

 

The Chi Square goodness of fit test is used to determine whether the sample data for a variable is a true representation of the population. For example, Cadbury's GEMS are chocolates coated in various colors. If one wants to find out whether the share of green GEMS in a sample (in this example, say 20% of the total count of GEMS in the sample) is a true representation of the population, the goodness of fit test can be applied. Here, the distribution of one variable in the sample is compared against the distribution expected for the entire population.

 

The Chi Square test of homogeneity is used to check whether a categorical variable has the same distribution in two populations. For example, this test can be used to compare whether GEMS Mini crackers and normal GEMS have the same color mix. In this case the distribution is not known in advance for either product, so data has to be collected as 2 separate samples from 2 separate populations.

 

The Chi Square test for association or independence is used to check whether 2 variables are associated in some way. To understand this better, take a packet of GEMS: each chocolate may be green or not, and whole or broken. This test can be used to determine whether there is some association between the green color and whether a GEMS is whole or broken. In this case, historic data on the distribution of the variables is not expected to be available. However, the sample is collected from a single population.

 

The Chi Square test is a hypothesis test performed on categorical data in various scenarios. To perform the test, we first calculate the square of the difference between the observed and expected values and divide it by the expected value. Summing this over all categories gives the calculated Chi Square value (X^2). Then we look up the tabulated Chi Square value for the applicable degrees of freedom.

 

If Calculated X^2 > Tabulated X^2, then we reject the null hypothesis (in favour of the alternate hypothesis).
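The decision rule can be sketched as a small lookup in Python. The dictionary below holds standard tabulated chi-square values at the 5% significance level for 1 to 5 degrees of freedom; the 5% level, the names `CRITICAL_VALUES_ALPHA_05` and `decide`, and the limited range of the table are assumptions made for illustration.

```python
# Standard tabulated chi-square values at the 5% significance level,
# keyed by degrees of freedom (only df 1 to 5 are included here).
CRITICAL_VALUES_ALPHA_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def decide(calculated, df, table=CRITICAL_VALUES_ALPHA_05):
    """Compare a calculated chi-square value with the tabulated one."""
    if df not in table:
        raise KeyError(f"no tabulated value stored for df = {df}")
    return "reject H0" if calculated > table[df] else "fail to reject H0"


# Calculated values reported in the worked examples in this thread.
print(decide(11.9, 1))  # reject H0
print(decide(31.2, 4))  # reject H0
print(decide(10.1, 3))  # reject H0
```

A calculated value below the tabulated one, such as 2.0 with 3 degrees of freedom, would give "fail to reject H0".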

 

Below are the types of comparison where we can apply the Chi Square test.

 

  1. Test of Homogeneity: Homogeneity refers to how similar things are. So if we want to check whether 2 distributions are similar, we can perform the test of homogeneity. For example, we can use the test to check whether males and females have the same or similar preferences for various subjects in graduation studies (Math, Humanities, Science, etc.).

  2. Test of goodness of fit: This is used to check how similar the observed values are to the values expected under an assumed distribution. For example, we have certain expectations for the sales of our company's product on different days of the week, based on some sales distribution we have assumed. We then record the observed sales for the days of the week. Now we can use the test to check whether the observed sales distribution is similar to the expected one, that is, whether the expected distribution is the right fit for the observed values.

  3. Test of Independence: This is used to test whether 2 variables are associated with each other or are independent. It is one of the non-parametric tests. Suppose we want to check whether education level and gender are related to each other, in other words whether the gender of a person influences that person's education level. Here we can use the Chi Square test of independence to check the relationship.

Aspect | Test of goodness of fit | Test of homogeneity | Test of independence
Purpose | A hypothesis test of whether population data with an unknown distribution fits a known distribution | A hypothesis test of whether 2 population data sets with unknown distributions have the same distribution as each other | A hypothesis test of whether two variables are dependent or independent
Data used for testing | A single qualitative variable, survey question or outcome | A single qualitative survey question or experiment given to two different populations | Two qualitative survey questions or experiments
Test | Whether the population is uniform, normal, or the same as another with a known distribution | Whether the 2 data sets have the same distribution as each other | Whether the two variables are related or unrelated
H0 | The population fits the given distribution | The two populations follow the same distribution | The two variables are independent
H1 | The population doesn't fit the given distribution | The two populations don't follow the same distribution | The two variables are dependent

This was a difficult one and I am pleasantly surprised to see some great answers from participants. There are two winners for this question - Sandip and Shiva. Well done!
