Introduction
In this article, I’ll walk you through the concept of a Support Vector Machine (SVM) with some intuitive examples, keeping the technicalities aside as far as possible.
A Classification Example
The Linear Case
Suppose we have the following set of data points with two classes – blue squares and orange circles:

Now we want to determine the optimal decision boundary that partitions the data. The decision boundary will lie between the above two classes of data, and can take the form of:

or:

or even:

Now, which one is optimal? Note that a poorly chosen decision boundary may result in a larger number of misclassifications when new data are introduced.
Let’s work with the last example above. Suppose that we define arbitrary separation lines between the two classes:

The point(s) which lie on the green colored separation lines are known as ‘support vectors’. They are the data points that the margin pushes up against, i.e. the points of each class closest to the other class. In other words, the SVM algorithm is only concerned with the support vectors; all other training points have little influence on the resulting boundary. To be precise, we want the SVM algorithm to focus on the extreme data points that lie close to the other class, since they are the ones that define the margins. The maximum margin is the sum of the distances from the closest support vectors on either side to the decision boundary hyperplane. Note that the red line is called a hyperplane in the case of an SVM because an SVM can handle multi-dimensional data, and the data points are known as vectors because they have coordinates within this multi-dimensional space.
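To make this concrete, here is a minimal sketch of a hard-margin linear SVM in Scikit-learn; the data points below are made up purely for illustration and are not the ones from the figures:

```python
# A minimal sketch of a linear SVM on two toy classes using Scikit-learn's SVC.
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable classes (say, blue squares vs. orange circles)
X = np.array([[1, 2], [2, 3], [2, 1], [3, 2],    # class 0
              [6, 5], [7, 7], [8, 6], [7, 5]])   # class 1
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A large C keeps the margin "hard", so the boundary is driven by the extreme points only
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("Support vectors:\n", clf.support_vectors_)          # the points the margin pushes against
print("Hyperplane coefficients:", clf.coef_, clf.intercept_)
```

The `support_vectors_` attribute returns exactly the extreme points discussed above; moving any of the other training points (without crossing the margin) would leave the fitted hyperplane unchanged.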
But what happens when the data points are not linearly separable (i.e. nonlinear), unlike in the above example? That’s when the nonlinear SVM comes in.
The Nonlinear Case
Consider the following example where the data points have some sort of quadratic (nonlinear) pattern:

To separate such data with a linear boundary, we can apply a transformation that maps the data points into a new feature space where they become linearly separable. Seen in the original space, the decision boundary (red line) is no longer a straight line but a parabola.
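As a rough sketch of this idea (with made-up points), adding a squared feature turns a parabola-shaped boundary in the original space into a flat, linear one in the mapped space:

```python
# Data separated by the parabola y = x^2 in the original 2-D space become
# linearly separable once we map (x1, x2) -> (x1, x2, x1^2).
import numpy as np

X = np.array([[-2, 5], [0, 1], [2, 5],      # circles: lie above the parabola
              [-2, 2], [0, -1], [2, 2]])    # squares: lie below it
y = np.array([1, 1, 1, 0, 0, 0])

X_mapped = np.column_stack([X, X[:, 0] ** 2])   # append x1^2 as a third coordinate

# In the mapped space the plane x2 - x1^2 = 0 separates the classes perfectly
print((X_mapped[:, 1] - X_mapped[:, 2] > 0).astype(int))  # recovers y exactly
```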
However, such transformations can be computationally expensive and may take a long time to run, especially with large datasets. Fortunately, there is a technique called the ‘kernel trick’. The name comes from the use of kernel functions, which enable operations in high-dimensional feature spaces without ever computing the coordinates of the data points in those spaces; instead, they compute only the dot products between pairs of vectors in the feature space. Using this ‘trick’, all points are effectively mapped into a higher-dimensional space without the transformation ever being carried out explicitly. The gist of the approach is to turn a nonlinear problem into a linear one, that’s about it!
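A tiny NumPy example illustrates the trick for a degree-2 polynomial kernel: the kernel value equals the dot product we would get from the explicit (and more expensive) feature mapping, yet it never constructs the mapped coordinates:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 polynomial feature map for a 2-D input x = (x1, x2)
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     x1 ** 2,
                     x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    # Degree-2 polynomial kernel: K(x, z) = (x . z + 1)^2
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(z)))  # explicit mapping, then dot product -> 4.0
print(poly_kernel(x, z))       # kernel computes the same value directly -> 4.0
```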
Some popular kernel functions include the linear kernel, polynomial kernel, and radial basis function (RBF) kernel. A downside to the kernel trick in SVMs is that choosing the right kernel is non-trivial, and some hyperparameter tuning has to be performed. Fortunately for us, popular Python libraries like Scikit-learn provide SVM and hyperparameter-tuning modules with intuitive interfaces. Extensive documentation on the Python implementations can be found [here](https://scikit-learn.org/stable/modules/svm.html#svm) for SVMs and [here](https://scikit-learn.org/stable/modules/grid_search.html) for hyperparameter tuning in Scikit-learn.
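As a sketch of how this looks in practice, the snippet below fits an SVM on a toy dataset and lets GridSearchCV pick the kernel and hyperparameters; the grid of candidate values is just an illustrative choice:

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# A toy nonlinear dataset
X, y = datasets.make_moons(n_samples=200, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

param_grid = {
    "kernel": ["linear", "poly", "rbf"],  # candidate kernel functions
    "C": [0.1, 1, 10],                    # regularization strength
    "gamma": ["scale", 0.1, 1],           # kernel coefficient for 'poly' and 'rbf'
}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test accuracy:", search.best_estimator_.score(X_test, y_test))
```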
What About The Regression Case?
To perform regression with SVMs, all that is needed is to reverse the objective: instead of trying to fit the largest possible margin between the two classes (Figure 5) as in classification, for regression we try to fit as many instances as possible within the space defined by the margin:

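A minimal sketch of SVM regression with Scikit-learn's SVR on made-up 1-D data is shown below; the `epsilon` parameter controls the width of the margin “tube” that we try to fit the instances into:

```python
import numpy as np
from sklearn.svm import SVR

# Noisy samples from a sine curve, just for illustration
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

# A wider epsilon tube tolerates more points inside it, giving fewer support vectors
reg = SVR(kernel="rbf", C=10, epsilon=0.1)
reg.fit(X, y)

print("Number of support vectors:", len(reg.support_vectors_))
print("Prediction at x = 2.5:", reg.predict([[2.5]]))
```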
What’s Next
In later articles, I will cover more on SVMs, including their advantages and disadvantages, along with more practical examples. So stay tuned!
References
[1] Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York: Springer.
[2] Hyperparameter tuning approaches. Retrieved from: https://scikit-learn.org/stable/modules/grid_search.html.
[3] Support Vector Machines in Scikit-learn. Retrieved from: https://scikit-learn.org/stable/modules/svm.html#svm.
Remarks
- Figures 1 to 7 were hand-drawn by me, so they need not be cited.