An interesting and intuitive view of AUC and the ROC curve

You don’t have to learn TPR, FPR, and a bunch of other jargon before learning AUC.

Shiu-Tang Li
Towards Data Science

--

AUC, or area under ROC curve, is a metric widely used to evaluate model performance.

A bunch of resources about AUC are available online. They usually start by explaining true positives, sensitivity, type I and II errors, FPR, and so on. Many of them are nice and explain the concepts in detail, but some of these concepts may confuse readers without an analytics background. What I’d like to point out here is that one can actually learn AUC well without knowing all these technical terms.

Let me start with a simple example showing how.

Say a credit card company builds a risk score model to evaluate whether a customer can pay the bill on time. The data they have is labeled: 1 means the customer did not pay on time (risky), and 0 means the customer paid on time. This is what the result looks like for a sample of 20 records (assuming there are no tied scores):
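
The original table isn’t reproduced here, so below is a hypothetical stand-in with the same shape (invented scores and labels, not the data from the figure): five 1s, fifteen 0s, no ties. The later sketches reuse this `records` list.

```python
# Hypothetical sample of 20 scored records (invented for illustration,
# not the data from the original figure): five 1s, fifteen 0s, no ties.
# Each pair is (risk score, label).
records = [
    (98, 1), (95, 1), (92, 0), (90, 1), (87, 0),
    (85, 0), (82, 0), (80, 1), (78, 0), (75, 0),
    (72, 0), (70, 1), (68, 0), (65, 0), (62, 0),
    (60, 0), (55, 0), (50, 0), (45, 0), (40, 0),
]
```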

Let’s see how we can draw an ROC curve from these 20 records. We begin by drawing a square with (0,0) on the bottom left corner and (1,1) on the top right corner:

Then, since these 20 records contain five 1s and fifteen 0s, we partition the y-axis into 5 equal parts and the x-axis into 15 equal parts:

Then, going from the highest score to the lowest, the corresponding labels are 1, 1, 0, 1, 0, 0, …. Let’s replace each 1 with ‘Up’ and each 0 with ‘Right’, and we’ll get a new sequence: Up, Up, Right, Up, Right, …. Starting from (0,0), draw your curve by following these ‘Up’s and ‘Right’s, one move per record. This is the ROC curve, and the area below it is the AUC.
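
Here’s a minimal sketch of that construction in Python, using the hypothetical `records` above: walk the labels from high score to low, move Up for a 1 and Right for a 0, then integrate the staircase to get the AUC.

```python
def roc_points(records):
    """Sort by score (high to low), then walk Up for 1s, Right for 0s."""
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    n_pos = sum(label for _, label in ordered)   # number of 1s
    n_neg = len(ordered) - n_pos                 # number of 0s
    x, y = 0.0, 0.0
    points = [(x, y)]
    for _, label in ordered:
        if label == 1:
            y += 1 / n_pos   # 'Up'
        else:
            x += 1 / n_neg   # 'Right'
        points.append((x, y))
    return points

def auc_from_points(points):
    """Area under the staircase via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

curve = roc_points(records)
print(auc_from_points(curve))  # the AUC of the hypothetical sample
```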

Let’s compare this with a perfect model. In a perfect model, any ‘1’ will have a higher score than any ‘0’. As a result, ranking from high scores to low scores, the corresponding labels would be 1,1,1,1,1,0,0,0,0,0, … and the ROC would be

and the AUC is 1. Some observations:

  • To make the AUC high, you want the ‘Up’s to show up before the ‘Right’s.
  • This means the ‘1’s need to come before the ‘0’s.
  • And this means the model is giving the targets (records with label 1) higher scores, so the model is better.
  • AUC is between 0 and 1.
  • AUC is a ranking metric: what matters is the order of the scores, not their values (see the sketch below).
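
To see the ranking property concretely, here’s a small check (assuming scikit-learn is available) on the hypothetical `records` from earlier: applying any strictly increasing transform to the scores leaves the AUC unchanged.

```python
# AUC depends only on the order of the scores, not their values.
import numpy as np
from sklearn.metrics import roc_auc_score

scores = np.array([s for s, _ in records], dtype=float)
labels = np.array([lab for _, lab in records])

auc_raw = roc_auc_score(labels, scores)
auc_warped = roc_auc_score(labels, np.log(scores) * 7 + 1000)  # same order

print(np.isclose(auc_raw, auc_warped))  # True: monotone transforms preserve AUC
```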

Before I dive deeper into the other properties of ROC, I’d like to show a few special cases.

More examples

1. Tied scores. Let’s change the previous example a little bit to get tied scores. In this group (score 70), we have three 0s and a single 1. The revision we have to make is to combine the three ‘Right’s and one ‘Up’ into a single diagonal move (three units right, one unit up).

and the new ROC is

So, if all the scores are tied, the ROC would be exactly the line x=y (AUC = 0.5).

2. Random guesses. When scores are assigned at random, the ROC curve fluctuates around the line x=y, and the AUC is around 0.5.
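
Both special cases are easy to verify numerically. A sketch with simulated labels (assuming numpy and scikit-learn): all-tied scores give an AUC of exactly 0.5, and random scores land near 0.5.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=10_000)

print(roc_auc_score(labels, np.ones_like(labels)))     # exactly 0.5 (all tied)
print(roc_auc_score(labels, rng.random(len(labels))))  # ~0.5 (random guesses)
```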

3. Reversed scores. If we reverse the score order in the example at the very beginning, we’ll get a new ROC curve symmetric to the original one about the point (0.5, 0.5) [not symmetric about the line x=y!]. The new AUC is one minus the original AUC. This is why people say that when we have a model with AUC < 0.5, we can perform this score-reversal trick to get a better model.
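
The reversal trick as a quick sketch on the hypothetical `records` from earlier: negating the scores flips the ranking, and the new AUC equals one minus the old one.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

scores = np.array([s for s, _ in records], dtype=float)
labels = np.array([lab for _, lab in records])

auc = roc_auc_score(labels, scores)
auc_reversed = roc_auc_score(labels, -scores)  # reversed ranking
print(np.isclose(auc_reversed, 1 - auc))       # True
```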

Now let’s see what other info we can extract from the ROC curve.

Properties

1. The higher the secant line slope, the higher the tag ratio (number of 1s over total number of records in a group). Concretely, a group with k ones and m zeros moves the curve up by k/(total 1s) and right by m/(total 0s), so the secant slope is (k/m) · (total 0s)/(total 1s), which grows with the group’s tag ratio. In the example below, two score groups are selected, and the group with higher scores does have a higher tag ratio, and hence a steeper secant line.
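
A small sketch of that slope computation on the hypothetical `records` from earlier (the group boundaries here are chosen arbitrarily for illustration):

```python
def secant_slope(group_labels, n_pos_total, n_neg_total):
    """Slope of the secant line over one score group: rise over run."""
    k = sum(group_labels)       # 1s in the group
    m = len(group_labels) - k   # 0s in the group
    return (k / n_pos_total) / (m / n_neg_total) if m else float("inf")

# e.g. compare the top-5 vs bottom-5 records of the hypothetical sample
ordered = sorted(records, key=lambda r: r[0], reverse=True)
top = [lab for _, lab in ordered[:5]]
bottom = [lab for _, lab in ordered[-5:]]
print(secant_slope(top, 5, 15), secant_slope(bottom, 5, 15))  # 4.5 vs 0.0
```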

2. Concavity. Usually, the ROC curve for the results generated by good ML algorithms is concave downwards over the entire domain. But if, unfortunately, you have a curve that is concave upwards on some subinterval, you can reverse the scores within that interval to boost performance (and it is suggested that you also check the modeling data).
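
One way to implement that interval-reversal fix, as a sketch (the interval bounds here are hypothetical): reflect the scores inside [lo, hi] so their order flips while their position relative to everything outside is preserved. On this toy sample the curve is already fine over that stretch, so reversal lowers the AUC; on a genuinely concave-upwards stretch the same operation raises it.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def reverse_interval(scores, lo, hi):
    """Flip the score ordering inside [lo, hi], leave everything else alone."""
    fixed = scores.astype(float)
    mask = (fixed >= lo) & (fixed <= hi)
    fixed[mask] = lo + hi - fixed[mask]  # reflect within the interval
    return fixed

labels = np.array([lab for _, lab in records])
scores = np.array([s for s, _ in records], dtype=float)
print(roc_auc_score(labels, scores),
      roc_auc_score(labels, reverse_interval(scores, 60, 80)))
```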

3. High score region / low score region. The green box below contains the info of the records with higher scores, and the orange box contains the info of the records with lower scores. In the green region, we hope the secant line slope is larger; in the orange region, the smaller the slope, the better.

AUC is not the only way to measure the performance of a ranking model. Sometimes we only care whether the model does a good job in the high score region.

A model with a low AUC can still have value. For example, the AUC for the ROC curve below is only ~0.64, but it has nice performance on high-scoring records.
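
If the high score region is what matters, one option is a partial AUC that only integrates up to a small false positive rate. scikit-learn’s roc_auc_score exposes this through its max_fpr argument (note it returns a standardized partial AUC rather than the raw area). A sketch on the hypothetical `records` from earlier:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([lab for _, lab in records])
scores = np.array([s for s, _ in records], dtype=float)

print(roc_auc_score(labels, scores))               # full AUC
print(roc_auc_score(labels, scores, max_fpr=0.2))  # high-score region only
```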

4. Model ensemble. When we have multiple models on the same dataset, and the shapes of the ROC curves are different (some are good in high score regions, some are good in low score regions), we can often find value by combining the models (e.g., bagging or stacking).
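
A minimal sketch of the mechanics with two simulated, independently noisy models (all data here invented; in practice the gains come from models whose ROC shapes differ as described): rank-transform each model’s scores so neither scale dominates, then average. The blend typically matches or beats the better single model.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=1_000)
# Two noisy models over the same records (simulated)
model_a = labels + rng.normal(0, 1.2, size=labels.shape)
model_b = labels + rng.normal(0, 1.2, size=labels.shape)

blend = rankdata(model_a) + rankdata(model_b)  # simple rank averaging
for name, s in [("A", model_a), ("B", model_b), ("A+B", blend)]:
    print(name, roc_auc_score(labels, s))
```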
