
1. Introduction
Machine learning is a sub-field of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
For the process of learning (model fitting) we need some observations or data (also known as samples or examples) in order to explore potential underlying patterns hidden in our data. These learned patterns are nothing more than functions or decision boundaries.
These patterns are learned by the systems (computer systems) automatically, without a human having to write explicit rules for them.
2. The main machine learning categories
Machine learning algorithms are usually categorized as supervised or unsupervised, with semi-supervised and reinforcement learning as two further families.
2.1 Supervised machine learning algorithms/methods

For this family of models, the researcher needs to have at hand a dataset with some observations and the labels/classes of the observations. For example, the observations could be images of animals and the labels the name of the animal (e.g. cat, dog etc).
These models learn from the labeled dataset and are then used to predict future events. For the training procedure, the input is a known training dataset with its corresponding labels, and the learning algorithm produces an inferred function that makes predictions about new, unseen observations given to the model. After sufficient training, the model is able to provide targets for any new input. During training, the learning algorithm compares its output with the correct intended output (ground truth label) and uses the errors to modify itself accordingly (e.g. via back-propagation).
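To make the "compare with the ground truth and correct yourself" idea concrete, here is a minimal sketch using a perceptron, which is a much simpler update rule than back-propagation. The toy points, the labels (+1 / -1), the learning rate and the function names are all assumptions made up for illustration.

```python
# Toy labeled dataset: (features, label). Labels are +1 or -1.
# The points and labels are invented for this illustration.
train = [
    ((2.0, 1.0), 1),
    ((3.0, 2.5), 1),
    ((2.5, 0.5), 1),
    ((-1.0, -1.5), -1),
    ((-2.0, 0.0), -1),
    ((-1.5, -2.5), -1),
]


def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


def fit_perceptron(data, epochs=20, lr=0.1):
    """Learn a linear decision boundary by correcting misclassified examples."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            if predict(weights, bias, x) != y:
                # The prediction disagrees with the ground truth label,
                # so nudge the boundary towards the correct side.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                errors += 1
        if errors == 0:
            # Every training example is classified correctly; stop early.
            break
    return weights, bias


weights, bias = fit_perceptron(train)

# Predictions for two new, unseen observations.
print(predict(weights, bias, (2.2, 1.8)))
print(predict(weights, bias, (-1.8, -1.0)))
```

The learned function here is just a straight line separating the two classes; richer models follow the same fit-then-predict pattern with a more flexible function.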
Supervised models can be further grouped into classification and regression cases:
- Classification: A classification problem is when the output variable is a category, e.g. "disease" / "no disease".
- Regression: A regression problem is when the output variable is a real continuous value, e.g. stock price prediction.
Some examples of models that belong to this family are the following: SVC (support vector classifier), LDA (linear discriminant analysis), SVR (support vector regression), linear regression, random forests etc.
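For the regression case, a minimal sketch is fitting a straight line by ordinary least squares. The data points and the `fit_line` helper are hypothetical and only meant to show how a continuous output is learned from labeled pairs.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented training data: inputs and their continuous targets.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)

# Predict the target for a new, unseen input.
print(slope * 6.0 + intercept)
```

The only difference from the classification setting is the type of the output: a real number instead of a category.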
2.2 Unsupervised machine learning algorithms/methods

For this family of models, the researcher needs to have at hand a dataset with some observations, without needing the labels/classes of the observations.
Unsupervised learning studies how systems can infer a function that describes a hidden structure in unlabeled data. The system doesn’t predict a "right" output; instead, it explores the data and draws inferences about its hidden structure.
Unsupervised models can be further grouped into clustering and association cases:
- Clustering: A clustering problem is where you want to unveil the inherent groupings in the data, such as grouping animals based on some characteristics/features e.g. number of legs.
- Association: An association rule learning problem is where you want to discover association rules, such as "people that buy X also tend to buy Y".
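The "people that buy X also tend to buy Y" idea is usually quantified with support and confidence. Below is a minimal sketch of these two measures; the shopping baskets and the helper names are made up for illustration, and a real association rule miner (e.g. Apriori) does considerably more work to search over candidate rules.

```python
# Invented shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of how often the consequent appears given the antecedent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# How common is {bread, butter}, and how reliable is the rule bread -> butter?
print(support({"bread", "butter"}, transactions))
print(confidence({"bread"}, {"butter"}, transactions))
```

A rule is usually considered interesting only when both its support and its confidence exceed thresholds chosen by the analyst.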
Some examples of models that belong to this family are the following: PCA, K-means, DBSCAN, mixture models etc.
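As an illustration of clustering, here is a minimal K-means sketch. The points are invented, and using the first k points as the starting centroids is an assumption made only to keep the example deterministic; practical implementations typically use smarter or randomized initialization.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    # Assumption for this sketch: start from the first k points.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # New centroid = mean of the assigned points (keep the old one if empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Unlabeled observations: no classes are given, only features.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Note that the algorithm never sees a label: the two groups emerge purely from the distances between the observations.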
2.3 Semi-supervised machine learning algorithms/methods
This family is between the supervised and unsupervised learning families. The semi-supervised models use both labeled and unlabeled data for training.
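One common way to combine the two kinds of data is self-training (pseudo-labeling): fit a model on the few labeled points, use it to label the unlabeled points, then refit on everything. The sketch below uses a nearest-centroid classifier; the data, the class names and the helpers are hypothetical, and unlike most practical variants it keeps every pseudo-label instead of only the confident ones.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(labeled):
    """One centroid per class, computed from (point, label) pairs."""
    by_class = {}
    for point, label in labeled:
        by_class.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_class.items()}


def predict(centroids, point):
    """Label of the closest class centroid."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))


# Invented data: very few labeled observations, more unlabeled ones.
labeled = [((1.0, 1.0), "cat"), ((9.0, 9.0), "dog")]
unlabeled = [(1.5, 2.0), (2.0, 1.0), (8.0, 8.5), (7.5, 9.0), (3.0, 3.0)]

# Step 1: fit on the labeled data only.
centroids = fit_centroids(labeled)
# Step 2: pseudo-label the unlabeled data with the current model.
pseudo = [(p, predict(centroids, p)) for p in unlabeled]
# Step 3: refit on labeled and pseudo-labeled data together.
centroids = fit_centroids(labeled + pseudo)

print([label for _, label in pseudo])
print(predict(centroids, (4.0, 4.0)))
```

The refitted centroids reflect the shape of the unlabeled data as well, which is the benefit semi-supervised methods aim for when labels are scarce.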
2.4 Reinforcement machine learning algorithms/methods

This family of models consists of algorithms that learn from feedback given as rewards or penalties. If an action leads to a bad outcome, the penalty is high and the reward low. If an action leads to a good outcome, the penalty is low and the reward high.
Trial-and-error search and delayed reward are the most relevant characteristics of reinforcement learning. This family of models allows the automatic determination of the ideal behavior within a specific context in order to maximize the desired performance.
Reward feedback is required for the model to learn which action is best, and this is known as "the reinforcement signal".
One example of a model that belongs to this family is Q-learning.
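To show trial-and-error search and delayed reward in action, here is a minimal tabular Q-learning sketch. The environment is a hypothetical 5-state corridor in which only reaching the last state is rewarded; the state layout, the hyperparameters (`alpha`, `gamma`, `epsilon`) and the function names are assumptions chosen for illustration.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == GOAL
    # Delayed reward: the agent is only rewarded when it reaches the goal.
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q_row, rng):
    """Best-valued action for one state, with ties broken at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate towards
            # reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()

# Greedy policy for every non-goal state (1 means "move right").
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy moves right in every state, even though the states far from the goal never receive a direct reward: the reinforcement signal is propagated backwards through the Q-table.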
3. Summary
Supervised: All the observations in the dataset are labeled and the algorithms learn to predict the output from the input data.
Unsupervised: All the observations in the dataset are unlabeled and the algorithms learn the inherent structure of the input data.
Semi-supervised: Some of the observations in the dataset are labeled but most of them are usually unlabeled. So, a mixture of supervised and unsupervised methods is usually used.
Using Machine Learning (ML) models we are able to analyze massive quantities of data. Data patterns that would be impossible for a human being to identify can be accurately extracted using these ML models, in some cases within seconds. However, most of the time, accurate results (good models) require a lot of time and resources for model training (the procedure during which the model learns a function or a decision boundary).
That’s all folks! Hope you liked this article!
Get in touch with me
- LinkedIn: https://www.linkedin.com/in/serafeim-loukas/
- ResearchGate: https://www.researchgate.net/profile/Serafeim_Loukas
- EPFL profile: https://people.epfl.ch/serafeim.loukas
- Stack Overflow: https://stackoverflow.com/users/5025009/seralouk