If you are reading this article, you are working or plan to work in the field of Data Science. Thus, I assume that you are well aware of the potential and importance of the field.
Data science is such a broad field that it can exist in any process where we can collect data. The tools and techniques vary depending on the task, but the ultimate goal is the same: creating value from data.
Machine learning is a major subfield of data science. It involves algorithms that discover the rules, patterns, or relationships in data without being given explicit instructions. We provide the data to a machine learning algorithm and expect it to extract valuable insight from it.
In this article, I will briefly explain key terms and concepts that you are likely to encounter in machine learning. In a sense, it can be considered as a glossary for machine learning.
Supervised learning
Supervised learning involves labelled observations or data points. A supervised learning algorithm models the relationship between independent variables (i.e. features) and a dependent variable (i.e. target or label) given a set of observations.
Consider a model that predicts the price of a house based on its age, location, and size. In this case, age, location, and size are the features, and the price is the target.
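Here is a minimal sketch of this idea with scikit-learn. The feature values, prices, and the choice of a linear model are all illustrative assumptions, not a recipe:

```python
# A minimal supervised learning sketch (made-up data for illustration).
# Features: age (years), location score, size (square meters); target: price.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([
    [5, 8, 120],   # age, location score, size
    [20, 6, 90],
    [1, 9, 150],
    [35, 4, 70],
])
y = np.array([300_000, 180_000, 420_000, 110_000])  # prices (the target)

model = LinearRegression()
model.fit(X, y)  # learn the relationship between features and target

print(model.predict([[10, 7, 100]]))  # predict the price of a new house
```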
Unsupervised learning
Unsupervised learning does not involve labels for observations. An unsupervised learning algorithm finds the underlying structure or patterns among a set of observations.
An example can be a retail business dividing its customers into groups based on their shopping behavior. There is no label associated with the customers. Instead, an unsupervised learning algorithm is expected to find such groups.
Reinforcement learning
Reinforcement learning works based on an action-reward principle. An agent learns to reach a goal by iteratively calculating the reward of its actions.
It can be viewed as learning from interaction. In that sense, reinforcement learning is similar to how we learn from our mistakes. An agent interacts with the environment to reach its goal and evaluates the outcomes of its actions. Common applications of reinforcement learning are computer games and robotics.
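To make the action-reward loop concrete, here is a tiny sketch of a multi-armed bandit, one of the simplest reinforcement learning settings. The actions, reward probabilities, and learning rate are made up for illustration:

```python
# A minimal action-reward loop (hypothetical environment).
# The agent picks actions, receives rewards, and updates its value estimates.
import random

actions = ["left", "right"]
values = {a: 0.0 for a in actions}         # estimated reward per action
true_reward = {"left": 0.2, "right": 0.8}  # unknown to the agent

for step in range(1000):
    # epsilon-greedy: mostly exploit the best-known action, sometimes explore
    if random.random() < 0.1:
        action = random.choice(actions)
    else:
        action = max(values, key=values.get)
    reward = 1.0 if random.random() < true_reward[action] else 0.0
    # nudge the running estimate toward the observed reward
    values[action] += 0.1 * (reward - values[action])

print(values)  # the agent learns that "right" pays off more often
```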
Classification
Classification is a supervised learning technique that deals with discrete (or categorical) target variables. For instance, detecting whether an email is spam is a classification task. It is also called binary classification since the target variable has only two possible values: spam or not spam. If the target variable contains more than two values (i.e. classes), it is known as multi-class classification.
There are several evaluation metrics for classification algorithms. Some of the commonly used ones are classification accuracy, log loss, precision, recall, and AUC.
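These metrics are straightforward to compute with scikit-learn. The labels and probabilities below are toy values for illustration:

```python
# Computing common classification metrics with scikit-learn (toy values).
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual classes (1 = spam)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # predicted classes
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]  # predicted probabilities

print(accuracy_score(y_true, y_pred))   # fraction of correct predictions
print(precision_score(y_true, y_pred))  # of predicted spam, how many were spam
print(recall_score(y_true, y_pred))     # of actual spam, how many were caught
print(roc_auc_score(y_true, y_prob))    # AUC uses probabilities, not labels
```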
Regression
Regression is a supervised learning technique in which the target variable is continuous. A typical example can be house price prediction.
In regression tasks, machine learning algorithms are evaluated based on how close the predicted values are to the actual values. Some of the commonly used evaluation metrics for regression are mean absolute error (MAE), mean absolute percentage error (MAPE), and mean squared error (MSE).
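These are also available in scikit-learn (mean_absolute_percentage_error requires a recent version). A toy example:

```python
# Computing common regression metrics with scikit-learn (toy values).
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)

y_true = [250_000, 180_000, 420_000]  # actual house prices
y_pred = [240_000, 195_000, 400_000]  # model predictions

print(mean_absolute_error(y_true, y_pred))             # MAE
print(mean_absolute_percentage_error(y_true, y_pred))  # MAPE
print(mean_squared_error(y_true, y_pred))              # MSE
```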
Clustering
Clustering is an unsupervised learning technique. It groups observations in a way that observations in the same group are more similar to each other than to the observations in other groups. The groups are called clusters.
Unlike classification, the observations (i.e. data points) in clustering do not have labels. Grouping customers into clusters based on their shopping behavior is an example of clustering.
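Here is a minimal sketch of that customer-grouping idea using k-means in scikit-learn. The spending figures are invented for illustration; note that no labels are passed to the algorithm:

```python
# Clustering customers with k-means (made-up data for illustration).
# Each row: average basket size, visits per month. No labels are provided.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([
    [20, 2], [25, 3], [22, 2],   # occasional small-basket shoppers
    [90, 8], [95, 10], [85, 9],  # frequent big-basket shoppers
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # the algorithm assigns a cluster to each customer
print(labels)                   # e.g. [0 0 0 1 1 1]
```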
Overfitting
Machine learning algorithms learn by training with data. The algorithms are so powerful that they can learn each and every detail of the training data. However, this is not what we want.
The main purpose of training a model is to be able to use it on new, previously unseen data (e.g. test data). There will always be small differences between the training data and the test data. Thus, a model should generalize well instead of memorizing the details of the training data.
Overfitting arises when a model fits the training data so well that it cannot generalize to new observations. Well-generalized models perform better on new observations. A model that is more complex than necessary is highly likely to overfit.
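The gap between training and test performance is easy to demonstrate. In this sketch on synthetic data, an unrestricted decision tree memorizes noisy training data while a depth-limited one generalizes better; the exact scores are illustrative and will vary:

```python
# Demonstrating overfitting: an overly complex model memorizes the training
# set but generalizes poorly (synthetic data, illustrative results).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print(deep.score(X_train, y_train), deep.score(X_test, y_test))       # ~1.0 vs lower
print(shallow.score(X_train, y_train), shallow.score(X_test, y_test)) # smaller gap
```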
Underfitting
In a sense, underfitting is the opposite of overfitting. An underfit model fails to capture enough detail in the training data. Thus, it does not perform well on either training or test data.
Bias
Bias is a measure of how far the average prediction is from the actual values. Bias arises when we try to model a complex relationship with a very simple model.
The predictions of a model with high bias are very similar to one another. Since the model is not sensitive to the variations within the data, its accuracy is low on both training data and test data. Models with high bias are likely to result in underfitting.
Variance
Variance is the opposite of bias in terms of sensitivity to changes within the data. A model with high variance is sensitive to even small changes or noise in the training data. Such a model is likely to capture outliers as well.
Models with high variance perform well on training data but fail on test data. Thus, they result in overfitting.
L1 regularization
Excessive model complexity causes overfitting. Thus, controlling the complexity is a solution to overfitting.
Regularization controls the complexity by penalizing the higher-order terms in the model. If a regularization term is added, the model tries to minimize both the loss and its own complexity.
L1 regularization forces the weights of uninformative features to zero by subtracting a small amount from each weight at every iteration, eventually making the weight exactly zero. Lasso regression uses L1 regularization.
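This zeroing-out effect is easy to observe with scikit-learn's Lasso on synthetic data where only a few features carry signal; the data and alpha value are illustrative:

```python
# L1 regularization (Lasso) can drive uninformative weights exactly to zero.
# Synthetic data: only 3 of the 10 features carry signal.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)  # most coefficients are exactly 0.0
```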
L2 regularization
It is another technique to control model complexity. Ridge regression uses L2 regularization.
L2 regularization forces weights toward zero, but it does not make them exactly zero. It acts like a force that removes a small percentage of each weight at every iteration. Therefore, the weights never become equal to zero.
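Running Ridge on the same kind of synthetic data shows the contrast with Lasso: the weights shrink, but none of them become exactly zero. Again, the data and alpha value are illustrative:

```python
# L2 regularization (Ridge) shrinks weights toward zero without zeroing them out.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=0)

ridge = Ridge(alpha=10.0).fit(X, y)
print(ridge.coef_)  # coefficients shrink, but none are exactly zero
```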
Log loss
Log loss is a commonly used evaluation metric for classification models. Classification algorithms usually output a probability value for an observation belonging to each class. The class label is assigned as the one with the highest probability.
Log loss evaluates a model based on these probability values instead of the assigned labels. Thus, it provides a more nuanced and thorough evaluation.
For instance, -log(0.9) is equal to 0.10536 and -log(0.8) is equal to 0.22314. Thus, being 90% sure results in a lower log loss than being 80% sure. However, being 90% or 80% sure does not make any difference in terms of classification accuracy.
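You can verify these numbers and compute the averaged metric as follows; the labels and probabilities are toy values:

```python
# Quick check of the log-loss values mentioned above (numpy + scikit-learn).
import numpy as np
from sklearn.metrics import log_loss

for p in (0.9, 0.8):
    print(f"-log({p}) = {-np.log(p):.5f}")  # 0.10536 and 0.22314

# Averaged over several predictions, as scikit-learn computes it:
y_true = [1, 1, 0]               # actual classes
y_prob = [0.9, 0.8, 0.3]         # predicted probability of class 1
print(log_loss(y_true, y_prob))  # mean negative log-probability of the true class
```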
Conclusion
We have covered some of the key terms and concepts in machine learning. There is much more to learn, but the basics are always important: they lay the ground for what comes next.
Having a comprehensive understanding of the basics will be helpful on your journey in machine learning.
Thank you for reading. Please let me know if you have any feedback.