
The first time I saw a ROC Curve (Receiver Operating Characteristic), I was confused. Why do data scientists and statisticians have so many names for the same thing (for example, recall is the same thing as sensitivity which is the same thing as the true positive rate)? And what exactly is this curve trying to tell me?
But after thinking about and using them a bit more, I realized that the ROC Curve is basically a cost/benefit curve for your model. More specifically, a ROC Curve is the Data Science cousin of the efficient frontier curve from financial markets.
The Efficient Frontier
Let me put my Finance hat back on for one second. One of the core concepts of finance is that there is a tradeoff between risk and return – the more your money is at risk, the more you should be compensated for it (through higher expected returns). While the definition of risk depends heavily on your temperament and investment philosophy, the underlying intuition makes a lot of sense: if you are not getting paid for it, why take on the additional risk?

While you may not necessarily agree with the rank order of the assets in the previous plot, it gets the point across. However, that’s not the whole story. Nobody holds a portfolio of just stocks or just bonds. We hold diversified mixes of investment assets. So how does the risk and return plot change when we start putting stocks, bonds, gold, etc. together in the same portfolio? That is, what would the plot look like if we calculated a risk and return value for every portfolio we could possibly hold and then put all of them on one scatter plot? It would look something like this:

Look familiar? It looks exactly like a ROC Curve! The curved line in the previous graph, which we call the efficient frontier, is the lynchpin of the investment world. It’s probably worthy of its own series of blog posts but the key points to note here are:
- The portfolios represented by the blue dots are dominated by the portfolios on the curved green line (the efficient frontier). This means that a rational investor would never want to hold one of the inferior portfolios below the frontier because, for the same amount of risk, they could earn a higher return on the efficient frontier. And for a given set of investment assets, there is only one efficient frontier.
- Why does the efficient frontier line have a curved shape? Because there are diminishing returns to taking on more risk. I won’t be able to do it justice in this bullet but the reason that returns diminish is that as you move further out on the risk curve (as you move right on the X axis), your portfolio becomes increasingly concentrated in stocks and other risky assets. And these risky assets are highly correlated so you start to lose the benefits of portfolio diversification. We will see later that ROC Curves behave in a similar manner.
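To make the "plot every portfolio" idea concrete, here is a rough sketch that simulates random long-only portfolios and keeps the ones nothing else beats. Everything in it is an assumption for illustration: the three assets, their expected returns, volatilities, and correlations are made-up numbers, and the helper names are hypothetical. It only approximates the frontier by sampling, which is not how a real optimizer would do it.

```python
import math
import random

# Made-up annual figures for three hypothetical assets (illustrative only).
EXPECTED_RETURN = [0.03, 0.06, 0.09]
VOLATILITY = [0.04, 0.15, 0.20]
# Made-up correlation matrix (symmetric, ones on the diagonal).
CORRELATION = [
    [1.0, 0.1, 0.2],
    [0.1, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]


def portfolio_stats(weights):
    """Return (risk, expected return) for weights that sum to 1."""
    ret = sum(w * r for w, r in zip(weights, EXPECTED_RETURN))
    variance = 0.0
    for i, wi in enumerate(weights):
        for j, wj in enumerate(weights):
            variance += wi * wj * VOLATILITY[i] * VOLATILITY[j] * CORRELATION[i][j]
    return math.sqrt(variance), ret


def random_weights(rng, n):
    """Draw n non-negative weights that sum to 1."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


def efficient_subset(points):
    """Keep the portfolios that no other portfolio beats on both risk and return."""
    frontier = []
    best_return = float("-inf")
    # Walk from lowest to highest risk (higher return first on ties) and
    # keep a portfolio only if it beats every lower-risk return seen so far.
    for risk, ret in sorted(points, key=lambda p: (p[0], -p[1])):
        if ret > best_return:
            frontier.append((risk, ret))
            best_return = ret
    return frontier


rng = random.Random(0)  # fixed seed so the sketch is repeatable
portfolios = [portfolio_stats(random_weights(rng, 3)) for _ in range(2000)]
frontier = efficient_subset(portfolios)
print(f"{len(frontier)} of {len(portfolios)} sampled portfolios are on the frontier")
```

Plot `portfolios` as a scatter and `frontier` as a line and you get the shape described above: a cloud of dominated portfolios sitting under a curved upper edge.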
Back to Data Science
So why all this talk of finance and the efficient frontier? Isn’t this a data science post? It is, but cost and benefit are most straightforward and easily understood when they impact our wallets. It should be obvious why the efficient frontier is attractive – portfolios along it give you the most bang for your buck. ROC Curves work the same way – the curve with the most area under it represents the model that gives us the most bang for our buck. That model is basically your data science efficient frontier.
In the case of ROC Curves, what is benefit (the bang) and what is cost (the buck)? Let’s take a look at an actual ROC Curve.

Benefit is the True Positive Rate of your model and cost is the False Positive Rate (I will explain this in plain English later, just bear with me for a second). Just like with portfolio risk and return, we want to find the model that gives us the highest True Positive Rate for a given False Positive Rate. In the ROC Curve chart above, random forest has the highest area under the curve (0.847). And as you can see, for any False Positive Rate, it produces the highest True Positive Rate (just put your finger at any point along the X axis and then move it straight up; the last line you cross is the model that gives you the most bang). The area under the curve comprehensively summarizes a model’s bang for your buck across the various levels of buck (a.k.a. cost).
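If you want to see where the curve and its area come from, here is a minimal sketch in plain Python. The labels and scores are made up for illustration (they are not the data behind the chart above), and `roc_points` and `auc` are hypothetical helper names, not functions from any library. The curve is traced by lowering the threshold one score at a time, and the area is summed with the trapezoid rule.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points from sweeping the threshold from high to low."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Rank observations from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last observation in a group of tied scores.
        last_of_group = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if last_of_group:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels (1 = good investment) and model scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

Each step to the right on the curve is a false positive (more buck) and each step up is a true positive (more bang), so a model that ranks the good investments first climbs quickly before it drifts right.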
The Confusion Matrix
Now let’s unpack what the costs and benefits mean with an example. In binary classification problems, we are trying to classify observations into one of two groups such as good investment vs. bad investment. The following 2 by 2 matrix, known as the confusion matrix, shows the results of a simple stock prediction model that tries to guess whether a stock will go up or down over the next 12 months (the data is fictitious and for illustrative purposes only).

The numbers in red represent incorrect predictions while the ones in green are correct ones. The cool thing about the ROC Curve is that it captures all four of these numbers in one chart and visually shows the tradeoffs between them.
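For readers who like to see the tally spelled out, here is a small sketch of how the four cells get counted. The label lists are constructed to match the fictitious counts used in this post's calculations (30 and 10 for the good investments, 20 and 40 for the bad ones); `confusion_counts` is a hypothetical helper name.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count the four outcomes of a binary good/bad classifier."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[("good", "good")],   # predicted good, was good
        "false_negative": counts[("good", "bad")],   # predicted bad, was good
        "false_positive": counts[("bad", "good")],   # predicted good, was bad
        "true_negative": counts[("bad", "bad")],     # predicted bad, was bad
    }


# 40 stocks that turned out good, followed by 60 that turned out bad.
actual = ["good"] * 40 + ["bad"] * 60
# Predictions arranged to reproduce the post's fictitious counts.
predicted = ["good"] * 30 + ["bad"] * 10 + ["good"] * 20 + ["bad"] * 40

matrix = confusion_counts(actual, predicted)
print(matrix)
```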
The Bang/Benefit
Let’s calculate our benefit (the True Positive Rate) first:
True Positive Rate = 30 / (10 + 30) = 75%
What does that 75% mean? There were 30 + 10 = 40 good investments in all and we correctly predicted and capitalized on 30, or 75%, of them. Not bad! And even though True Positive Rate is the bang/benefit of our model, notice that its calculation includes the 10 stocks we incorrectly predicted to be bad investments (Predicted = Bad, Actual = Good). So the True Positive Rate (our bang) is a number that reflects both our correct predictions and our opportunity cost (from not investing when it was actually a good stock).
Opportunity costs are no fun so how can we reduce those? In the previous ROC Curve, look at the orange line (Naive Bayes). What happens if we are willing to incur maximum cost (~100% False Positive Rate), predicting everything to be a good investment? In that case you catch every good investment but incur a significant cost.
Also note that in the example ROC Curve above, Random Forest is able to reduce opportunity cost to nearly zero while Naive Bayes is unable to do the same without incurring maximum cost. In general, we have to accept at least a few missed opportunities. Finally, notice that the ROC Curve exhibits the same diminishing returns as the efficient frontier. This means that each subsequent attempt to reduce missed opportunities will cost you more.
The Buck/Cost
There is a tradeoff between reducing missed opportunities (a.k.a. false negatives) and increasing our model’s False Positive Rate (the cost). Let’s calculate the cost of our previous example first:
False Positive Rate = 20 / (40 + 20) = 33%
This means that there were 40 + 20 = 60 bad investments and, sadly, we got stuck with 20, or 33%, of them. In our example, these are real losses, not just missed opportunities. We invested our money and got hit with losses 20 times. Ouch!
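Here are the bang and the buck side by side as a quick sketch, using the same fictitious counts as the calculations in this post. The function names are my own, and the miss rate is just the flip side of the True Positive Rate (the opportunity cost expressed as a share of all good investments).

```python
def true_positive_rate(tp, fn):
    """Share of actual good investments we correctly predicted (the bang)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Share of actual bad investments we wrongly predicted as good (the buck)."""
    return fp / (fp + tn)


# Fictitious counts from the example in this post.
tp, fn, fp, tn = 30, 10, 20, 40

tpr = true_positive_rate(tp, fn)
fpr = false_positive_rate(fp, tn)
miss_rate = 1 - tpr  # good investments we passed on

print(f"True Positive Rate:  {tpr:.0%}")
print(f"False Positive Rate: {fpr:.0%}")
print(f"Miss rate:           {miss_rate:.0%}")
```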
Let’s dive deeper into the cost vs. benefit tradeoff represented by the ROC Curve. Assuming my model has real signal (that it is not just producing random noise), I can use a probability threshold to adjust how often it turns on. For example, if I set the threshold to 90% (only predict a Good Investment when the model spits out a probability of 90% or more), it would almost never turn on – but when it does, we would expect to make money more often than not.
On the other hand, if I set the threshold to 10%, our model would turn on all the time. We would invest in plenty of stocks, but many more of these (relative to the 90% threshold case) would be money losers. These money losers are the false positives, the costs we incur in our quest to capture as many good investments (true positives) as we can.
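The two thresholds can be compared with a short sketch. The labels and predicted probabilities below are made up for illustration, and `rates_at_threshold` is a hypothetical helper, so treat the exact numbers as a toy rather than anything a real model produced.

```python
def rates_at_threshold(y_true, probabilities, threshold):
    """Return (TPR, FPR) when we predict 'good' for any probability >= threshold."""
    tp = fp = fn = tn = 0
    for label, p in zip(y_true, probabilities):
        predicted_good = p >= threshold
        if predicted_good and label == 1:
            tp += 1
        elif predicted_good and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


# Made-up labels (1 = good investment) and model probabilities.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
probabilities = [0.95, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.05]

strict = rates_at_threshold(y_true, probabilities, 0.9)  # rarely turns on
loose = rates_at_threshold(y_true, probabilities, 0.1)   # almost always turns on

print("threshold 90%: TPR, FPR =", strict)
print("threshold 10%: TPR, FPR =", loose)
```

In this toy data the strict threshold makes no bad investments but catches only one of the four good ones, while the loose threshold catches all four good ones and three of the four bad ones along with them. Each threshold is one point on the ROC Curve.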
Tying it All Together
How to properly tune the threshold of your model is a topic for another day. It depends heavily on the individual characteristics of your model, the average cost of a False Positive vs. that of a False Negative, etc.
But the one key point I want you to take away from all this is that every classification model can be framed as a tradeoff between:
- Opportunity Cost: The higher the benefit/True Positive Rate of your model’s predictions, the fewer opportunities you miss out on and the lower your opportunity cost. But when you lower opportunity cost you end up casting a wider net and along with more true positives you also catch more…
- False Positives: A higher false positive rate means your model takes more actions but is also more frequently wrong when it does turn on. These false positives are your cost incurred.
And the ROC Curve is a visualization of how effectively a particular model trades off between cost and benefit. That is why the model with the highest area under the curve plays the role of the efficient frontier: across the range of costs, it delivers the most benefit for the amount of cost incurred. Strictly speaking, one model dominates another only when its curve sits above the other’s at every False Positive Rate, as Random Forest’s does in the chart above.