
Building a Web App to Calculate Cohen’s Kappa Coefficient

Mixing streamlit with scikit-learn

Image by Charles Deluvio from Unsplash

Introduction

In healthcare research, there are often multiple groups or individuals looking into the same condition or variable. They collect data independently, but expect, and often need, consistency between the people reporting the findings. Well-designed procedures must therefore systematically measure agreement among the various data collectors. The extent of agreement among data collectors is known as interrater reliability.

Historically, reliability between two raters has been measured with simple percent agreement: the number of agreements divided by the total number of scores. This, however, does not control for the possibility that both scorers agreed by chance.

Cohen’s Kappa Coefficient was therefore developed to adjust for this possibility.

Putting it simply, Cohen’s Kappa is a way to measure reliability between two raters (judges, observers), correcting for the probability of agreement occurring by chance.
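
To make the chance correction concrete, here is a minimal sketch in plain Python that compares percent agreement with Cohen's Kappa, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function names and the ratings are made up for illustration and are not part of the app.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave the same label."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1,
    e.g. when both raters always give the same single label.
    """
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # if each rated independently at their own observed label frequencies.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings from two raters on ten cases.
rater_a = ["yes", "yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no"]
rater_b = ["yes", "yes", "no", "no", "no", "yes", "yes", "yes", "yes", "no"]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")  # 0.80
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")       # 0.58
```

The raters agree on 8 of 10 cases, but because both say "yes" 60% of the time, 52% agreement would be expected by chance alone, which is why kappa comes out well below the raw 80%.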

For more reading on Cohen’s Kappa Coefficient, view the following:

Cohen’s kappa – Wikipedia

Interrater reliability: the kappa statistic

Cohen’s Kappa


Building a Simple Kappa Statistic App

My hope with this application was to create a simple way to input data and calculate Cohen’s Kappa Coefficient. To do so, I used Streamlit, an open-source framework for rapidly creating data science apps in pure Python, and scikit-learn, an open-source library for machine learning and data analysis.

Streamlit: The fastest way to build and share data apps

scikit-learn

Below is a tutorial on building a simple kappa statistic app with Streamlit and scikit-learn.

Setting the Stage

Let’s begin by importing the libraries we need. Most importantly, we use scikit-learn (sklearn) for our functions and streamlit for our user interface.

import pandas as pd
import streamlit as st
import sklearn
from sklearn import metrics
import os
import numpy as np

from urllib.error import URLError
import matplotlib.pyplot as plt

import base64

Title and Text

Streamlit makes it easy to add text to our app.

To do so, we use streamlit.title and streamlit.text. We add additional text to our sidebar using streamlit.sidebar.

st.title("Kappa Stat Calculator")
st.text("Measuring interrater reliability")
st.sidebar.header("About")
st.sidebar.text("""The kappa stat calculator uses the
power of scikit-learn to quickly
calculate cohen's kappa statistic
between two raters.
Upload a csv file with columns
specifying your raters names or ids.
""")
Title and text

File Upload

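Next, we make it possible to upload a file into our app using streamlit.file_uploader. The uploader returns a file-like object (or None until a file is chosen), so we read it into a data frame with pandas.

uploaded_file = st.file_uploader("Choose a file", type="csv")
df = pd.read_csv(uploaded_file) if uploaded_file is not None else pd.DataFrame()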
File upload
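
As a rough sketch of the upload step, the helper below turns an uploaded CSV into a pandas data frame. It assumes the object returned by streamlit.file_uploader behaves like an in-memory binary file, so an io.BytesIO stands in for it here. The name load_ratings and the sample contents are hypothetical.

```python
import io

import pandas as pd


def load_ratings(uploaded_file):
    """Return a DataFrame read from an uploaded CSV, or None if nothing was uploaded."""
    if uploaded_file is None:
        return None
    return pd.read_csv(uploaded_file)


# Stand-in for an upload: in the app this would be the value
# returned by st.file_uploader("Choose a file").
fake_upload = io.BytesIO(b"person1,person2\n1,1\n0,1\n1,1\n")
ratings = load_ratings(fake_upload)
print(ratings.shape)  # (3, 2)
```

Returning None when no file has been chosen lets the rest of the app skip the table, chart, and calculation until there is data to show.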

Splitting into Columns

For this next part, we split the screen into two columns to maintain easy usability and interaction. This can be done with streamlit.columns (named streamlit.beta_columns in older Streamlit releases).

col1, col2 = st.columns(2)

Left Column

The left column will display our data frame, using pandas to work with our data. We use streamlit.dataframe and add df.style.highlight_max, which highlights the largest value in each column, to make exploring the data easier and expose potential discrepancies.

with col1:
    st.dataframe(df.style.highlight_max(axis=0))
Dataframe with Columns of Raters

Right Column

The right column will display a graph for further data exploration at a glance. We can view a line chart with streamlit.line_chart.

with col2:
    st.line_chart(df)
Altair line graph showing dataframes

Input Fields

Next, we create text input fields to specify columns in our dataframe. In this case, our data has two raters named person1 and person2.

person1 = st.sidebar.text_input("Enter column name for person 1")
person2 = st.sidebar.text_input("Enter column name for person 2")
Text input fields
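
Because the column names are typed by hand, it can help to check them against the data frame before calculating anything. The helper below is one possible way to do that. The name missing_columns and the sample data are assumptions for illustration.

```python
import pandas as pd


def missing_columns(df, names):
    """Return the entered names that are not columns of df."""
    return [name for name in names if name not in df.columns]


# Hypothetical data with two rater columns.
ratings = pd.DataFrame({"person1": [1, 0, 1], "person2": [1, 1, 1]})

print(missing_columns(ratings, ["person1", "person2"]))  # []
print(missing_columns(ratings, ["person1", "rater2"]))   # ['rater2']
```

In the app, a non-empty result could be reported back to the user instead of letting the column lookup fail.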

Function

To calculate Cohen’s Kappa statistic, we use scikit-learn’s sklearn.metrics.cohen_kappa_score on the two rater columns named in the input fields, and display a button with streamlit.button.

y1, y2 = df[person1], df[person2]
kap = sklearn.metrics.cohen_kappa_score(y1, y2, labels=None, weights=None, sample_weight=None)
Streamlit button with scikit-learn function
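
The calculation itself can be tried outside the app. The sketch below wraps sklearn.metrics.cohen_kappa_score in a small helper that takes a data frame and two column names. The helper name kappa_from_columns, the choice to drop rows with a missing rating, and the sample ratings are assumptions rather than the app's actual code.

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score


def kappa_from_columns(df, col_a, col_b, weights=None):
    """Cohen's kappa between two rater columns of df.

    Rows where either rater has no rating are dropped first (an assumption;
    the original app does not say how missing values are handled).
    weights is passed through and may be None, "linear", or "quadratic".
    """
    pair = df[[col_a, col_b]].dropna()
    return cohen_kappa_score(pair[col_a], pair[col_b], weights=weights)


# Hypothetical ratings: 1 = condition present, 0 = absent.
ratings = pd.DataFrame({
    "person1": [1, 1, 1, 0, 0, 1, 0, 1, 1, 0],
    "person2": [1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
})

kappa = kappa_from_columns(ratings, "person1", "person2")
print(round(kappa, 3))  # 0.583
```

For ordinal ratings, passing weights="linear" or weights="quadratic" penalizes large disagreements more than small ones.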

Results and Celebration!

Our function will show the Cohen’s Kappa Statistic below the button using streamlit.write.

st.sidebar.write('Result: %s' % kap)
Kappa Statistic

We then celebrate with streamlit.balloons!!!

Takeaways

This article provided a brief overview of how to create a tool for a common healthcare research metric. I hope you came away with a better understanding of Cohen’s Kappa, Streamlit, and scikit-learn.

The source code for this project can be found here.

More stories from the author:

Making it rain with raincloud plots

Introducing OpenHAC— an open source toolkit for digital biomarker analysis and machine learning

Thank you for reading!!

