
Understanding ROC Curves

Building a basic intuition for the receiver operating characteristic curve with Python

Photo by Isaac Smith on Unsplash

If you Google "ROC curve machine learning", you get a Wikipedia definition like this:

A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.

Another common description is that the ROC Curve reflects the sensitivity of the model across different classification thresholds. When I was starting out in Machine Learning, these definitions would always confuse me.

In this article, I will share how I learned to disentangle my "beginner-like" confusions and develop a good enough intuition about the ROC curve.


What the Heck Is the ROC Curve?

One way to understand the ROC curve is that it describes a relationship between the model’s sensitivity (the true-positive rate, or TPR) and its specificity (shown via the false-positive rate, since specificity = 1 - FPR).

Now, let’s disentangle each concept here.

The TPR, also known as the sensitivity of the model, is the number of correct classifications of the "positive" class divided by all the positive examples in the dataset. Mathematically:

TPR = TP / (TP + FN), where TP is the count of true positives and FN the count of false negatives,

while the FPR is the number of false positives (negative examples misclassified as positive) divided by all the negative examples in the dataset. Mathematically:

FPR = FP / (FP + TN), where FP is the count of false positives and TN the count of true negatives.
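To make the two ratios concrete, here is a minimal NumPy sketch; the label and prediction arrays below are made up purely for illustration:

import numpy as np

# hypothetical ground-truth labels and hard predictions for illustration
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # correctly predicted positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # missed positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # negatives flagged as positive
tn = np.sum((y_pred == 0) & (y_true == 0))  # correctly predicted negatives

tpr = tp / (tp + fn)  # sensitivity: 3 / 4 = 0.75
fpr = fp / (fp + tn)  # 1 - specificity: 1 / 4 = 0.25
print(tpr, fpr)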

So, in essence, you are comparing how the model’s sensitivity changes with respect to its false-positive rate across different threshold scores, where each threshold is the decision boundary above which the model classifies an input as positive.


Decoupling the Issue with the Threshold Scores

The intuition that eluded me in the beginning was the role of the threshold score. One good starting point is to build a mental picture:

From this classic visualization, the first intuition to take away is that the ideal model keeps the true positive rate as high as possible while keeping the false positive rate as low as possible.

The threshold corresponds to some value T (for example, a probability between 0 and 1) that serves as the decision boundary for the classifier, and it controls the trade-off between TPR and FPR.
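As a quick illustration (the probability scores below are made up), moving T changes which inputs get labelled positive, and therefore how many true and false positives you collect:

import numpy as np

proba = np.array([0.12, 0.45, 0.58, 0.71, 0.93])  # hypothetical model scores

for T in (0.3, 0.5, 0.7):
    # everything at or above the threshold is classified as positive
    pred = (proba >= T).astype(int)
    print(f"T={T}: {pred}")
# lowering T labels more inputs as positive, raising both TPR and FPR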

Let’s write some code to get a visual of all of these components.


Visualizing the ROC Curve

The steps to visualize this will be:

  1. Import our dependencies
  2. Draw some fake data with the drawdata package for Jupyter notebooks
  3. Import the fake data to a pandas dataframe
  4. Fit a logistic regression model on the data
  5. Get predictions of the logistic regression model in the form of probability values
  6. Set different threshold scores
  7. Visualize the ROC curve
  8. Draw some final conclusions

1. Import our dependencies

from drawdata import draw_scatter
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

2. Draw some fake data with the drawdata package for Jupyter notebooks

draw_scatter()

Output:

3. Import the fake data to a pandas dataframe

df = pd.read_csv("./data.csv")

4. Fit a logistic regression model on the data

def get_fp_tp(y, proba, threshold):
    """Return the number of false positives and true positives."""
    # source: https://towardsdatascience.com/roc-curve-explained-50acab4f7bd8
    # Classify into classes based on the threshold
    pred = pd.Series(np.where(proba >= threshold, 1, 0),
                     dtype="category")
    pred = pred.cat.set_categories([0, 1])
    # Create the confusion matrix (rows: actual class, columns: predicted class)
    confusion_matrix = (pred.groupby([y, pred]).size().unstack()
                            .rename(columns={0: "pred_0",
                                             1: "pred_1"},
                                    index={0: "actual_0",
                                           1: "actual_1"}))
    false_positives = confusion_matrix.loc["actual_0", "pred_1"]
    true_positives = confusion_matrix.loc["actual_1", "pred_1"]
    return false_positives, true_positives

# train test split on the fake generated dataset
X = df[["x", "y"]].values
Y = df["z"].values
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
y_test = np.array([1 if p=="a" else 0 for p in y_test])
y_train = np.array([1 if p=="a" else 0 for p in y_train])

# create the model
lgr = LogisticRegression()
lgr.fit(X_train, y_train)

5. Get predictions of the logistic regression model in the form of probability values

y_hat = lgr.predict_proba(X_test)[:,1]

6. Set different threshold scores

thresholds = np.linspace(0,1,100)

7. Visualize the ROC curve

# defining fpr and tpr
tpr = []
fpr = []
# defining positives and negatives
positives = np.sum(y_test==1)
negatives = np.sum(y_test==0)

# looping over threshold scores and getting the number of false positives and true positives
for th in thresholds:
    fp,tp = get_fp_tp(y_test, y_hat, th)
    tpr.append(tp/positives)
    fpr.append(fp/negatives)

plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',label='Random', alpha=.8)
plt.plot(fpr,tpr, label="ROC Curve",color="blue")
plt.text(0.5, 0.5, "varying threshold scores (0-1)", rotation=0, size=12,ha="center", va="center",bbox=dict(boxstyle="rarrow"))
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
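As a sanity check (not part of the walkthrough above), you can compare the hand-rolled curve with scikit-learn’s built-in roc_curve and roc_auc_score, reusing the y_test and y_hat arrays defined earlier:

from sklearn.metrics import roc_curve, roc_auc_score

# sklearn picks the thresholds for you from the unique predicted scores
fpr_sk, tpr_sk, thresholds_sk = roc_curve(y_test, y_hat)
auc = roc_auc_score(y_test, y_hat)

plt.plot(fpr_sk, tpr_sk, label=f"sklearn ROC (AUC = {auc:.2f})", color="green")
plt.plot([0, 1], [0, 1], linestyle="--", color="r", label="Random")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()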

8. Draw some final conclusions

As we lower the threshold score, both the true-positive and false-positive rates increase. A good model is one where some threshold pushes the true positive rate as close as possible to 1 while keeping the false positive rate as low as possible.

But, how could we choose the best classification threshold?

A simple method is to take the threshold with the maximal sum of the true positive rate and the true negative rate (1 - FPR), which amounts to maximizing TPR - FPR (Youden’s J statistic).

Another criterion could be to simply choose the point closest to the top-left corner of your ROC space. However, that implies that the true positive rate and the true negative rate have the same weight ([source](https://stats.stackexchange.com/questions/123124/how-to-determine-the-optimal-threshold-for-a-classifier-and-generate-roc-curve)), which is not necessarily true in cases like cancer screening, where the cost of a false negative (a missed disease) is far higher than the cost of a false positive.
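For illustration, here is a minimal sketch of both criteria, reusing the tpr, fpr, and thresholds arrays computed in step 7 (the exact values will depend on the dataset you drew):

tpr_arr, fpr_arr = np.array(tpr), np.array(fpr)

# Youden's J: maximize TPR + (1 - FPR) - 1, i.e. the TPR-FPR gap
j_scores = tpr_arr - fpr_arr
best_j = thresholds[np.argmax(j_scores)]

# Closest point to the top-left corner (0, 1) of ROC space
distances = np.sqrt(fpr_arr ** 2 + (1 - tpr_arr) ** 2)
best_corner = thresholds[np.argmin(distances)]

print(f"Youden's J threshold: {best_j:.2f}")
print(f"Closest-to-corner threshold: {best_corner:.2f}")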


Final Thoughts on ROC Curves

I think taking some time to digest evaluation metrics is extremely beneficial in the long run for your Machine Learning journey. In this article you learned:

  • A basic intuition on how ROC curves work
  • How classification thresholds affect the relationship between the sensitivity and specificity of a model
  • An intuition on how to use the ROC curve to set an optimal classification threshold



If you liked this post, follow me on Medium, subscribe to my newsletter, connect with me on Twitter, LinkedIn, Instagram, or join Medium! Thanks and see you next time! 🙂

