Inter-Rater Reliability

"Statistics: Keep it Simple and it will be Useful"


"When you can measure what you are speaking about, and express it in numbers, you know something about it.  But when you cannot -- your knowledge is of a meager and unsatisfactory kind." Lord Kelvin (1824-1907).

 

The numbers needed to shape this knowledge come from the basic information provided by data collectors. We know that this data contains errors (particularly when one believes it does not). We often speak about these errors, but to know something about them we need to quantify them too. This is something many researchers do not do. Those who decide to carry out this quantification exercise must know that proper use of statistical principles is the key to success.

 

It is impossible to quantify all possible types of errors in data. However, two types of errors known for their dramatic impact on data quality have caught my attention:

  • When the basic information is gathered by many different data collectors, they often do not fully agree on how to implement the various data collection procedures. This can lead to serious measurement errors. To learn something about this type of error, researchers often assess the extent of agreement between raters with an inter-rater reliability (IRR) statistic, also referred to as an interrater agreement statistic.

Much of the statistical thinking that has been done on this topic is unknown to most researchers, and what is known to researchers to date is not backed by a sound methodology. Our goal is to provide a more rigorous treatment of this problem on this site. In addition to learning about the serious limitations of the ubiquitous kappa statistic, researchers will find new and more reliable tools for evaluating the extent of agreement between raters. Many other interrater agreement indices are in fact available to researchers.

  • The second type of error that concerns us is due to sampling. Researchers tend to over-generalize results obtained from an experiment that is limited in scope. Such a generalization is often referred to as inference, and it can be complex. Inference must include proper weighting of the data as well as a careful assessment of the magnitude of the sampling error. Here we enter the domain of statistical inference. Although the kappa statistic has received considerable attention in the literature, the treatment of the inferential aspects of its use remains incomplete. Such a treatment is even more needed for other inter-rater reliability coefficients, which have received much less attention. Future developments of this site will provide researchers with basic tools for streamlining the process of generalizing research findings.
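As a concrete illustration of the first type of error, the sketch below computes the observed agreement and Cohen's kappa for two raters, using the standard textbook definition of kappa (observed agreement corrected for the agreement expected by chance from each rater's marginal proportions). The function name `cohen_kappa` and the ratings are hypothetical, made up for illustration only. The result is a point estimate: it carries no standard error, so on its own it says nothing about the sampling error discussed in the second item.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Return (observed agreement, Cohen's kappa) for two raters.

    Both arguments are sequences of category labels, one per subject,
    listed in the same subject order.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of subjects")

    n = len(ratings_a)

    # Observed agreement: share of subjects placed in the same category.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (p_obs - p_chance) / (1.0 - p_chance)
    return p_obs, kappa


# Hypothetical ratings of 10 subjects by two raters (illustration only).
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

p_obs, kappa = cohen_kappa(rater_1, rater_2)
print(f"observed agreement = {p_obs:.2f}, kappa = {kappa:.3f}")
```

With these made-up ratings the two raters agree on 8 of 10 subjects, yet kappa is noticeably lower than 0.8 because part of that agreement is attributed to chance.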

 
