
Support Vector Machine (SVM) is a widely used supervised machine learning algorithm. It is mostly used for classification tasks but is suitable for regression as well.
In this post, we dive deep into two important hyperparameters of SVMs, C and gamma, and explain their effects with visualizations. I will assume you have a basic understanding of the algorithm and focus on these hyperparameters.
SVM separates data points that belong to different classes with a decision boundary. When determining the decision boundary, a soft margin SVM (soft margin means allowing some data points to be misclassified) tries to solve an optimization problem with the following goals:
- Maximize the distance of the decision boundary to the classes (i.e., to the support vectors)
- Maximize the number of points that are correctly classified in the training set

There is obviously a trade-off between these two goals, and it is controlled by C, which adds a penalty for each misclassified data point.
If C is small, the penalty for misclassified points is low, so a decision boundary with a large margin is chosen at the expense of a greater number of misclassifications.
If C is large, SVM tries to minimize the number of misclassified examples due to the high penalty, which results in a decision boundary with a smaller margin. The penalty is not the same for all misclassified examples; it is directly proportional to the distance from the decision boundary.
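As a quick sketch of how this penalty behaves, the hinge loss used by soft-margin SVMs is zero for points on the correct side of the margin and grows linearly with the amount by which the margin is violated (the decision values below are made up purely for illustration):
# hinge loss: zero inside the margin on the correct side,
# growing linearly with the distance by which the margin is violated
def hinge_loss(y_true, decision_value):
    return max(0, 1 - y_true * decision_value)

print(hinge_loss(+1, 2.5))   # correctly classified, outside the margin -> 0
print(hinge_loss(+1, -0.5))  # misclassified -> 1.5
print(hinge_loss(+1, -2.0))  # misclassified, further from the boundary -> 3.0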
This will become clearer with the examples. Let’s first import the libraries and create a synthetic dataset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.svm import SVC
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           random_state=42)
plt.figure(figsize=(10,6))
plt.title("Synthetic Binary Classification Dataset", fontsize=18)
plt.scatter(X[:,0], X[:,1], c=y, cmap='cool')

We will first train a linear SVM, which only requires tuning C. Then we will train an SVM with an RBF kernel and also tune the gamma parameter.
To plot the decision boundaries, we will be using the function from the SVM chapter of the Python Data Science Handbook by Jake VanderPlas.
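That function is not reproduced verbatim here; the sketch below follows the same idea: evaluate the classifier's decision_function on a grid, draw the decision boundary and the margins at the -1, 0, and +1 levels, and circle the support vectors.
def plot_svc_decision_function(model, ax=None, plot_support=True):
    # plot the decision boundary and margins of a 2D SVC
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    # evaluate the decision function on a 30x30 grid
    xx = np.linspace(xlim[0], xlim[1], 30)
    yy = np.linspace(ylim[0], ylim[1], 30)
    YY, XX = np.meshgrid(yy, xx)
    xy = np.vstack([XX.ravel(), YY.ravel()]).T
    P = model.decision_function(xy).reshape(XX.shape)
    # decision boundary at level 0, margins at levels -1 and +1
    ax.contour(XX, YY, P, colors='k', levels=[-1, 0, 1],
               alpha=0.5, linestyles=['--', '-', '--'])
    # circle the support vectors
    if plot_support:
        ax.scatter(model.support_vectors_[:, 0],
                   model.support_vectors_[:, 1],
                   s=300, linewidth=1, facecolors='none', edgecolors='k')
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)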
We can now create two linear SVM classifiers with different C values.
clf = SVC(C=0.1, kernel='linear').fit(X, y)
plt.figure(figsize=(10,6))
plt.title("Linear kernel with C=0.1", fontsize=18)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='cool')
plot_svc_decision_function(clf)

Just change the C value to 100 to produce the following plot.
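Only the C argument (and the title) changes:
clf = SVC(C=100, kernel='linear').fit(X, y)
plt.figure(figsize=(10,6))
plt.title("Linear kernel with C=100", fontsize=18)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='cool')
plot_svc_decision_function(clf)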

When we increase the C value, the margin gets smaller. Thus, models with low C values tend to generalize better. The difference becomes clearer with larger datasets.
The effect of these hyperparameters is limited with linear kernels; it becomes much more visible with non-linear kernels.
Gamma is a hyperparameter used with non-linear SVMs. One of the most commonly used non-linear kernels is the radial basis function (RBF). The gamma parameter of the RBF kernel controls how far the influence of a single training point reaches.
Low values of gamma indicate a large similarity radius, which results in more points being grouped together. For high values of gamma, points need to be very close to each other in order to be considered in the same group (or class). Therefore, models with very large gamma values tend to overfit.
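To make this concrete, the RBF kernel scores the similarity of two points as exp(-gamma * ||x - x'||²), so increasing gamma shrinks the similarity radius. A minimal sketch with two arbitrary points:
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([[0.0, 0.0]])
x2 = np.array([[1.0, 1.0]])  # squared distance between the points is 2

for gamma in [0.01, 1, 5]:
    # similarity = exp(-gamma * 2); it shrinks quickly as gamma grows
    print(gamma, rbf_kernel(x1, x2, gamma=gamma)[0, 0])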
Let’s plot the predictions of three SVMs with different gamma values.
clf = SVC(C=1, kernel='rbf', gamma=0.01).fit(X, y)
y_pred = clf.predict(X)
plt.figure(figsize=(10,6))
plt.title("Predictions of RBF kernel with C=1 and Gamma=0.01", fontsize=18)
plt.scatter(X[:, 0], X[:, 1], c=y_pred, s=50, cmap='cool')
plot_svc_decision_function(clf)

Just change the gamma values to produce the following plots.
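Only the gamma argument (and the title) changes:
for gamma in [1, 5]:
    clf = SVC(C=1, kernel='rbf', gamma=gamma).fit(X, y)
    y_pred = clf.predict(X)
    plt.figure(figsize=(10,6))
    plt.title("Predictions of RBF kernel with C=1 and Gamma={}".format(gamma), fontsize=18)
    plt.scatter(X[:, 0], X[:, 1], c=y_pred, s=50, cmap='cool')
    plot_svc_decision_function(clf)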


As the gamma value increases, the model starts to overfit. The data points need to be very close to be grouped together because the similarity radius shrinks as gamma increases.
The training accuracies of the RBF kernel on this dataset with gamma values 0.01, 1, and 5 are 0.89, 0.92, and 0.93, respectively. The steady increase indicates that the models fit the training set more and more closely, i.e., they start to overfit as gamma increases.
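These scores are measured on the training set itself; one way to compute them is a sketch like the following (exact numbers may differ slightly from the ones reported above):
from sklearn.metrics import accuracy_score

for gamma in [0.01, 1, 5]:
    clf = SVC(C=1, kernel='rbf', gamma=gamma).fit(X, y)
    train_acc = accuracy_score(y, clf.predict(X))  # accuracy on the training data
    print("gamma={}: {:.2f}".format(gamma, train_acc))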
Gamma vs C parameter
For a linear kernel, we just need to optimize the C parameter. However, if we want to use an RBF kernel, both C and gamma need to be optimized simultaneously. If gamma is large, the effect of C becomes negligible. If gamma is small, C affects the model just like it affects a linear model. Typical ranges for C and gamma are as follows, although specific optimal values depend on the application:
0.0001 < gamma < 10
0.1 < C < 100
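Since C and gamma interact, a common way to tune them together is a cross-validated grid search. Below is a minimal sketch using scikit-learn's GridSearchCV; the grid values are just examples picked from the typical ranges above:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.0001, 0.001, 0.01, 0.1, 1, 10],
}
# 5-fold cross-validated search over all C/gamma combinations
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)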