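Kappa is defined as the ratio of the proportion of times that the appraisers agree (corrected for chance agreement) to the maximum proportion of times that the appraisers could agree (also corrected for chance agreement).
Kappa ranges from -1 to 1.
The larger the kappa, the stronger the agreement in that category.
For instance, a kappa value of 1 represents perfect agreement, a value of 0 means the agreement is no better than chance, and a negative value means the agreement is worse than chance.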
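As a minimal sketch of the definition above, the snippet below computes Cohen’s kappa for two appraisers from the observed agreement and the agreement expected by chance. The function name `cohen_kappa` and the pass/fail ratings are illustrative assumptions, not data from a real study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same items (nominal labels)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items on which the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions per category.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten parts by two appraisers.
appraiser_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass", "fail", "pass"]
appraiser_2 = ["pass", "pass", "fail", "fail", "fail", "fail", "pass", "pass", "pass", "pass"]

kappa = cohen_kappa(appraiser_1, appraiser_2)
print(round(kappa, 3))  # observed 0.80, chance 0.52, kappa about 0.583
```

Here the appraisers agree on 8 of 10 parts, but more than half of that agreement would be expected by chance alone, which is why kappa is well below 0.80.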
The tables below list commonly accepted interpretations of the values of these reliability measures:
Cohen’s kappa Value Interpretation:
0.91 - 1.00 - Almost perfect
0.80 - 0.90 - Strong
0.60 - 0.79 - Moderate
0.40 - 0.59 - Weak
0.21 - 0.39 - Minimal
0.00 - 0.20 - None
Krippendorff’s alpha Value Interpretation:
0.80 - 1.00 - Reliable value
0.67 - 0.79 - Acceptable for tentative conclusions
0.00 - 0.66 - Not acceptable
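The two tables can be encoded as simple lookups. This is only a sketch: the function names are hypothetical, values that fall between two printed bands (for example 0.395) are assigned by the lower bound of each band, and negative values are treated as the lowest band, which is an assumption since the tables start at 0.00.

```python
def interpret_cohen_kappa(kappa):
    """Map a Cohen's kappa value to the label from the table above."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    # Lower bound of each band, checked from highest to lowest.
    bands = [
        (0.91, "Almost perfect"),
        (0.80, "Strong"),
        (0.60, "Moderate"),
        (0.40, "Weak"),
        (0.21, "Minimal"),
    ]
    for lower_bound, label in bands:
        if kappa >= lower_bound:
            return label
    return "None"  # assumption: negative values also fall in the lowest band


def interpret_krippendorff_alpha(alpha):
    """Map a Krippendorff's alpha value to the label from the table above."""
    if alpha > 1.0:
        raise ValueError("alpha cannot exceed 1")
    if alpha >= 0.80:
        return "Reliable value"
    if alpha >= 0.67:
        return "Acceptable for tentative conclusions"
    return "Not acceptable"  # assumption: includes negative values


print(interpret_cohen_kappa(0.58))         # Weak
print(interpret_krippendorff_alpha(0.70))  # Acceptable for tentative conclusions
```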
Take Away:
Practitioners should first examine the marginal distributions and should not interpret a kappa value uncritically, whether it is high or low, because prevalence, odds, rater independence, the impact on diagnosis, and other factors can significantly influence the kappa statistic.
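To illustrate why the marginal distribution matters, the sketch below compares two hypothetical 2x2 agreement tables for two raters. Both tables have the same observed agreement of 90%, but in the second one almost every item is rated "yes" by both raters. The counts and the function name `kappa_from_2x2` are assumptions made for illustration.

```python
def kappa_from_2x2(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Return (observed agreement, Cohen's kappa) for a 2x2 table of two raters."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    p_o = (both_yes + both_no) / n

    # Marginal proportion of "yes" ratings for each rater.
    a_yes = (both_yes + a_yes_b_no) / n
    b_yes = (both_yes + a_no_b_yes) / n

    # Agreement expected by chance from the marginals.
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return p_o, (p_o - p_e) / (1 - p_e)


# Balanced prevalence: "yes" and "no" are equally common.
balanced_agreement, balanced_kappa = kappa_from_2x2(45, 5, 5, 45)

# Skewed prevalence: nearly every item is rated "yes".
skewed_agreement, skewed_kappa = kappa_from_2x2(89, 5, 5, 1)

print(balanced_agreement, round(balanced_kappa, 3))  # 0.9 and about 0.8
print(skewed_agreement, round(skewed_kappa, 3))      # 0.9 and about 0.113
```

The raters agree equally often in both cases, yet kappa drops from about 0.80 to about 0.11 because chance agreement is very high when one category dominates.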
Kappa statistics represent the degree of absolute agreement among ratings. Popular kappa statistics include:
Cohen’s kappa – Measures assessment agreement between two raters
Fleiss’s kappa – Generalization of Cohen’s kappa (>2 raters)
Most statistical tools, such as Minitab, calculate Fleiss’s kappa by default for Attribute Agreement Analysis (AAA).
Fleiss’s kappa is based on the observed agreement corrected for the agreement expected by chance.
In contrast, Krippendorff’s alpha is based on the observed disagreement corrected for the disagreement expected by chance.
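The agreement-based construction of Fleiss’s kappa can be sketched as follows. The input is a table of counts in which each row is a rated item, each column is a category, and each cell holds the number of raters who chose that category. The function name `fleiss_kappa` and the example counts are illustrative assumptions, and this is not a reproduction of any particular tool’s implementation.

```python
def fleiss_kappa(counts):
    """Fleiss's kappa from a subjects-by-categories table of rating counts."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("each subject needs at least two ratings")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must have the same number of ratings")
    n_categories = len(counts[0])

    # Overall proportion of all ratings that fall in each category.
    p_category = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]

    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    observed = sum(p_subject) / n_subjects      # mean observed agreement
    expected = sum(p * p for p in p_category)   # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical data: 4 items, 3 raters, 2 categories (accept, reject).
ratings_table = [
    [3, 0],
    [0, 3],
    [2, 1],
    [3, 0],
]
fleiss = fleiss_kappa(ratings_table)
print(fleiss)  # observed 5/6, expected 5/9, kappa about 0.625
```

Note that the function rejects tables in which items have different numbers of ratings, which reflects the complete-data requirement of this statistic.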
Key Differences:
Fleiss’s kappa:
Cannot handle missing values
Assumes an infinite sample size when computing the expected agreement
Best suited for Nominal data
Krippendorff’s alpha:
Can handle missing values
Uses the actual sample size when computing the expected disagreement
Can handle all data types
Fleiss’s kappa and Krippendorff’s alpha can be recommended equally when the data are nominal and there are no missing values.
However, Krippendorff’s alpha is preferred in the following situations:
When data are missing
When the level of measurement is higher than nominal (ordinal, interval, or ratio)
When the distribution of disagreements is biased (even a strong bias has no distorting effect)
When different participants are rated by different numbers of raters (typically with more than two raters; alpha can be applied at any scale level)
When observation ratios cannot be obtained consistently by pair counting in small samples
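A minimal sketch of Krippendorff’s alpha for nominal data is shown below to illustrate how missing ratings and unequal numbers of raters per item are handled. It uses the coincidence-matrix formulation, covers the nominal level only, and marks missing ratings with `None`. The function name and the example ratings are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists, one inner list per rated item, None for a missing rating.
    """
    # Coincidence matrix: each ordered pair of ratings within an item
    # contributes 1 / (m - 1), where m is the number of ratings for that item.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # an item with fewer than two ratings cannot be paired
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1 / (m - 1)

    # Marginal totals per category and the total number of pairable ratings.
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("at least one item needs two or more ratings")

    # Observed and expected disagreement (both left unscaled by 1/n, which cancels).
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - observed / expected


# Hypothetical ratings by up to three raters; None marks a missing rating.
items = [
    ["a", "a", "a"],
    ["a", "a", None],
    ["b", "b", "b"],
    ["a", "b", None],
    ["b", None, None],  # only one rating, so this item is ignored
]
alpha = krippendorff_alpha_nominal(items)
print(alpha)  # about 0.625
```

Unlike the complete-data table that Fleiss’s kappa requires, items here may carry two or three ratings each, and the expected disagreement is computed from the actual number of pairable ratings (n - 1 in the denominator) rather than from an infinite-sample assumption.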
Summary Table:
Final Take Away:
Before diving into the reliability data, practitioners should select an index of intercoder reliability that fits the context, the properties of the data, and the assumptions of the index, including the level of measurement of each variable and the number of coders.
Krippendorff’s alpha is usually more difficult and complex to compute than Fleiss’s kappa; however, it gives a more dependable reliability estimate, particularly when the research conditions are not ideal.