Fitting data sets to models is the basis of machine learning and data science. In this way we are able to create predictive tools that help people make important decisions.
This is typically done by minimizing the difference between the model's predictions and the actual data during the model training, validation, and testing phases.
To do this, we must be able to identify the difference between the model's predictions and the actual values of the data. To fit the model to the data and create a valuable prediction tool, we must be able to rapidly calculate that difference with a function that we can then minimize.
This is the goal of the cost function.
What is a cost function?
In data science, the cost function calculates the difference between your model predictions and your data set.
According to Wikipedia, a cost function is "a function that maps an event or values of one or more variables onto a real number intuitively representing some 'cost' associated with the event". In more practical terms, it's a function that calculates a quantity you then want to either minimize or maximize.
In data science, this commonly means identifying the difference between your model's predictions and the training data set. In this way you can quantify the error in your model's predictions, and use optimization approaches to minimize that error.
In the name cost function, "function" refers to the fact that it's a function calculating a value, and "cost" refers to the penalty that the function calculates.
Note that the notion of a cost function is very flexible, and it can be customized to match a situation. It’s about writing an equation that captures what you really care about.
How are cost functions used?
One example of cost function use is a person simulating the energy performance of a building. They want to minimize energy consumption, and write a cost function accordingly. But the minimum energy consumption would be achieved by a building that never heats or cools the space! Clearly not acceptable.
They can add a term to the cost function that also penalizes the result for the number of hours when the occupants are uncomfortable. Then the result of the optimization will be a design that uses the minimum amount of energy while still maintaining comfort in the space. This cost function might look something like:
Cost = Energy_Consumption + 1,000,000 * Uncomfortable_Hours
In this way a single hour of predicted discomfort for occupants is as bad as one million units of energy consumption. Now the result of the optimization will be the most energy-efficient way to heat and cool the space, rather than simply avoiding heating and cooling altogether to save energy.
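As a rough illustration, that cost function could be written as a small Python function. This is a minimal sketch; the variable names are hypothetical and not taken from any particular simulation tool.

```python
# Minimal sketch of the building-energy cost function described above.
# The names energy_consumption and uncomfortable_hours are illustrative.

def building_cost(energy_consumption, uncomfortable_hours):
    """Energy use plus a heavy penalty for every predicted uncomfortable hour."""
    return energy_consumption + 1_000_000 * uncomfortable_hours

# A design that saves 500 units of energy but causes one uncomfortable hour
# still scores far worse than a slightly less efficient, always-comfortable one.
print(building_cost(energy_consumption=10_500, uncomfortable_hours=0))  # 10500
print(building_cost(energy_consumption=10_000, uncomfortable_hours=1))  # 1010000
```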
What are some examples of cost functions in model fitting?
That was a fairly industry-specific example. Some more generic data science examples include (each is sketched in code after this list):
- Calculating the sum of the error between model predictions and data points, as discussed in Understanding the Fundamentals of Linear Regression,
- Calculating the sum of the squared error between model predictions and data points, as discussed in Understanding Multiple Regression,
- The Mean Absolute Error (MAE), which expresses the average absolute difference between the predictions and the data points across the set, and
- The Root Mean Square Error (RMSE), which calculates the square root of the average squared error across the data set.
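For concreteness, here is a small sketch of how these four quantities could be computed with NumPy. The arrays are made-up example values standing in for your data and your model's predictions.

```python
import numpy as np

# y_true stands in for the actual data points, y_pred for the model's predictions.
y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

errors = y_pred - y_true

sum_error = np.sum(errors)               # signed errors can cancel each other out
sum_squared_error = np.sum(errors ** 2)  # squaring removes that cancellation
mae = np.mean(np.abs(errors))            # Mean Absolute Error
rmse = np.sqrt(np.mean(errors ** 2))     # Root Mean Square Error

print(sum_error, sum_squared_error, mae, rmse)
```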
Remember that you can design your cost function as needed to study what’s most important in your application. This is important because all cost functions have different strengths and weaknesses.
Thinking back to the building energy simulation example, a cost function that only examined energy consumption would have different impacts from one that only looked at uncomfortable hours. One without the energy consumption term has no pressure to ensure that the building is energy efficient. And one without the uncomfortable hours term has no pressure to ensure that the building keeps the occupants comfortable.
Combining the two into a single cost function forces the model to care about both.
How do I balance a cost function with multiple terms?
Since the model is capable of examining both, it becomes important to ensure that each term is given the right amount of consideration in the cost function. This is done by adding weighting factors, or multipliers, to each term that tell the cost function how much you care about that term.
In the building energy simulation example we had a weighting factor of 1 for the energy consumption, and a weighting factor of 1,000,000 for the uncomfortable hours. This basically forces the model to ensure there are 0 hours of discomfort.
But what if a few hours of discomfort are OK to save energy? Then you can change your weighting factors to give the model results a different balance of caring about energy consumption and uncomfortable hours.
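As a sketch of that idea, the building cost function from before can be rewritten with explicit, adjustable weighting factors. The parameters energy_weight and comfort_weight are hypothetical knobs chosen for illustration.

```python
# The same building cost, now with explicit weighting factors as parameters.
def building_cost(energy_consumption, uncomfortable_hours,
                  energy_weight=1.0, comfort_weight=1_000_000):
    return (energy_weight * energy_consumption
            + comfort_weight * uncomfortable_hours)

# Lowering comfort_weight lets the optimizer trade a few uncomfortable hours
# for large energy savings; the default weight effectively forbids discomfort.
print(building_cost(9_000, 3, comfort_weight=500))  # 10500
print(building_cost(10_000, 0))                     # 10000
```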
A more direct data science example of this is combining MAE and RMSE in a cost function used to fit a model to a data set. MAE has the effect of getting the model as close to the data set as possible on average, across the entire data set. "On average" is the important phrase here: it can lead to models that are extremely close at most points, and extremely inaccurate at a few points. RMSE has the opposite impact. Since it uses the average of the squared errors, it pressures the model to have small errors at each and every point. It avoids models with a small handful of extremely inaccurate points, but can sacrifice average model accuracy to achieve that goal.
To balance the strengths and weaknesses of these two approaches, a model fitting cost function could combine the MAE and RMSE. You would then apply weighting factors to the two terms to balance them as needed for your application.
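A minimal sketch of such a combined cost function, assuming NumPy arrays of actual values and predictions, and hypothetical weighting factors w_mae and w_rmse:

```python
import numpy as np

def combined_cost(y_true, y_pred, w_mae=1.0, w_rmse=1.0):
    """Weighted blend of MAE and RMSE; the weights are chosen per application."""
    errors = y_pred - y_true
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    return w_mae * mae + w_rmse * rmse

y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])
print(combined_cost(y_true, y_pred))            # equal emphasis on both terms
print(combined_cost(y_true, y_pred, w_rmse=5))  # penalize large individual errors more
```

During model fitting, an optimizer would then adjust the model's parameters to minimize this combined value.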
Wrapping it up
In order to create useful prediction tools, we must be able to create mathematical models that closely fit the data set. In order to create models that closely fit the data set, we must be able to quantify how different our model's predictions are from the actual data.
This is the role of the cost function. Cost functions are formulas that calculate a "cost" associated with a situation. In the case of model fitting, the cost function typically calculates the error in the model.
Cost functions are flexible, and can include multiple terms. This enables you to combine different important factors into a single cost function. One example is looking at the energy consumption of a building. Combining the terms of energy consumption and uncomfortable hours creates a cost function that forces the model to minimize energy consumption while also making sure people stay comfortable. In model fitting, a cost function might combine both the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE).
Cost functions with multiple terms need a way to balance those terms according to how much you value one over the other. Weighting factors are multipliers applied to each term that increase or decrease its influence accordingly. They can be adjusted as needed to find the desired balance in the cost function, and in the results of the minimization.