Fleiss' Kappa
This is a way to measure agreement between 3 or more raters, i.e. agreement between 3 or more dependent categorical samples. It is intended for nominal (unordered) categories. Ordinal data such as Likert-scale ratings can be used, but the ordering of the categories is ignored.
It can be used for any number of raters, and it allows each rater to rate different items. It assumes the raters are chosen at random from a larger pool.
Scenarios for use:
- 5 raters, picked at random from a pool, give pass/fail verdicts on samples that are also picked at random from a pool (e.g. destructive tests).
Because the raters are assumed to be randomly sampled, it is not suitable if all raters are required to rate all samples.

Cohen's Kappa
Like Fleiss' Kappa, this is a way to measure inter-rater reliability, but only for these scenarios:
- 2 raters each rate the same trial once, or
- 1 rater rates the same items on 2 trials (e.g. to measure agreement of a new method with an old one, or consistency over time).
It can be used for only 2 raters, and only when both raters rate identical items. It assumes the raters are chosen deliberately and are fixed.
Scenarios for use:
- 2 raters give pass/fail for the same 20 interview candidates.
- 2 machines each measure pass/fail for the same attribute of an item.
Conversely, it is not suitable if not every sample can be rated by both raters, e.g. because the test is costly or destructive.
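Both statistics share the form κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. They differ in how these two terms are computed. The Python below is a minimal sketch of the standard formulas, not a reference implementation. The function names `cohen_kappa` and `fleiss_kappa` and all of the ratings are hypothetical. `cohen_kappa` takes the two raters' labels item by item, so both raters must rate identical items. `fleiss_kappa` takes only a count of ratings per category for each item. It does not record which rater gave which rating, so different raters may rate different items, but it assumes every item received the same number of ratings.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for 2 fixed raters who labelled the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same non-empty list of items")
    n_items = len(rater_a)
    # Observed agreement: share of items both raters put in the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n_items
    # Chance agreement: product of each rater's OWN category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n_items**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from an items x categories table of rating counts.

    counts[i][j] is how many ratings put item i in category j. Rater identity
    is not recorded, but every item must receive the same number of ratings.
    """
    if not counts:
        raise ValueError("need at least one item")
    n = sum(counts[0])  # ratings per item
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_items, n_categories = len(counts), len(counts[0])
    # Per-item agreement: share of rater pairs that agree on item i.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / n_items
    # Chance agreement: category proportions POOLED over all ratings.
    total = n_items * n
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical data: 2 raters give pass/fail for the same 10 interview candidates.
rater_a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "fail", "pass"]
print(round(cohen_kappa(rater_a, rater_b), 3))  # 0.583

# Hypothetical data: 4 samples, each rated by 5 raters drawn from a pool.
# Each row is [pass count, fail count] for one sample.
samples = [[5, 0], [4, 1], [2, 3], [0, 5]]
print(round(fleiss_kappa(samples), 3))  # 0.495
```

With exactly 2 ratings per item, the Fleiss formula reduces to Scott's pi rather than Cohen's kappa. It pools both raters' category proportions when estimating chance agreement. Cohen's kappa instead uses each rater's own proportions, which fits its assumption that the 2 raters are fixed.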