How to Find Linear (SVMs) and Quadratic Classifiers using MATLAB

A quick guide (with images) to using YALMIP to find separating hyperplanes and quadric hyper-surfaces for data classification

Ewe Zi Yi
Towards Data Science


A pretty smart looking elliptic hyperboloid

Ever wondered if linear classifiers can be generalised to other shapes that aren’t just a single boring-looking flat plane (or hyperplanes for that matter)?

Well yes! Welcome to the world of quadratic classifiers where data points of two or more classes are separated by a quadric surface!

Now before we go into all that, let’s take a look at what linear classifiers are, and how we can model them with MATLAB and YALMIP, an optimisation package developed by Johan Löfberg. I’ll then move over to quadratic classifiers and how you can model them in the second half of this article.

*Some parts of this article may appear a little too mathematical, but I’ll keep them as simple as possible so that we can spend more time on the programming.

We All Begin with Data

We start off with some random data points, each living in 3-dimensional Euclidean space. These points come from 2 different classes (X and Y), with 5 points from each class.

% Define some key problem parameters
nDimension = 3;
nVariable = 5;
% Generate some random numbers (each column is one data point)
X = randn(nDimension, nVariable);
Y = randn(nDimension, nVariable);

*You can always generalise the problem to higher dimensions, more classes, and more data points but we’ll keep these values small in this article to keep things simple and easy to visualise.

Random data points that we have just generated in 3D-space

The idea then is to find a classifier, using information from the data points which we are given, to split the entire space (a 3D Euclidean space in this case) into two, where all points lying on one side belong to the class X, and those on the other in the second class Y.

But how do we choose a border to separate the entire space? A quick and intuitive way is to use a plane.

Separating Hyperplanes

A hyperplane refers to a subspace that has a dimension that is one less than that of the space it exists in. In other words, a hyperplane would have n-1 dimensions in an n-dimension space.

For example, in a 2D-space, a hyperplane would be a 1D-line, whereas, in a 3D-space, a hyperplane would simply be a 2D-flat plane.

Visualisations of what a hyperplane is (Image: DeepAI)

Going back to our problem, we’d like to construct a hyperplane to separate the entire space into two. In particular, we want the hyperplane (defined solely by the vector a and the scalar b) to satisfy the following equations:

Equations that define a separating hyperplane
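Concretely, writing the i-th point of X as x_i and the j-th point of Y as y_j (notation chosen here; it matches the constraint code further below), these equations are, in LaTeX form:

a^\top x_i + b \le -1, \quad i = 1, \dots, 5

a^\top y_j + b \ge 1, \quad j = 1, \dots, 5

Any point x with a^\top x + b < 0 then falls on the X side of the hyperplane, and any point with a^\top x + b > 0 on the Y side.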

where a and b, a vector and a scalar respectively, are to be determined.

As you can see, this assumes that the two classes can be separated by a boundary that is linear in the coordinates of the data points.

However, as our instinct would probably have told us by now, it’s not always possible to identify a hyperplane that would perfectly separate the entire space in such a way that only the points belonging to X lie on one side and those belonging to Y on the other.

Nonetheless, even when no perfect hyperplane exists, we’d still like to find the best hyperplane that somewhat splits the space into two, even if it means having some points end up on the wrong side.

For this we’ll need to introduce some error variables (one for each data point) in the equations to allow for some leeway in defining our hyperplane:

Equations for a less than perfect hyperplane
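With one error variable per point (written here as \varepsilon^x_i for the points x_i of X and \varepsilon^y_j for the points y_j of Y), the relaxed constraints read:

a^\top x_i + b \le -(1 - \varepsilon^x_i), \quad \varepsilon^x_i \ge 0

a^\top y_j + b \ge 1 - \varepsilon^y_j, \quad \varepsilon^y_j \ge 0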

Similar to a and b, these error variables are up to us to determine, and are what we call decision variables in optimisation parlance.

With our problem now defined this way, the best hyperplane can then be said to be the one that minimises the sum of these errors. Hence, the goal of our problem can be succinctly rewritten in the form:

A linear programme that when solved, provides a separating hyperplane
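Putting the objective and the constraints together (with x_i the points of X, y_j the points of Y, and \varepsilon the error variables), the full linear programme is:

\begin{aligned}
\min_{a,\; b,\; \varepsilon} \quad & \sum_{i=1}^{5} \varepsilon^x_i + \sum_{j=1}^{5} \varepsilon^y_j \\
\text{s.t.} \quad & a^\top x_i + b \le -(1 - \varepsilon^x_i), \quad i = 1, \dots, 5 \\
& a^\top y_j + b \ge 1 - \varepsilon^y_j, \quad j = 1, \dots, 5 \\
& \varepsilon^x_i \ge 0, \quad \varepsilon^y_j \ge 0
\end{aligned}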

As some of you must have noticed by now, the problem we have defined above is called a linear programme because its objective function (the minimisation function) and its constraints (all the other equations/inequalities) are all linear.

Linear programming with MATLAB

Going back to where we left off from MATLAB, we’d like to use YALMIP to solve the linear programme that we have defined, in order to obtain a separating hyperplane.

We begin by defining the decision variables in our linear programme. This requires creating an sdpvar object for each of them, which YALMIP will then recognise as decision variables:

% Hyperplane variables
a = sdpvar(nDimension, 1, 'full');
b = sdpvar(1);
% Error variables for points in X and in Y
xError = sdpvar(1, nVariable);
yError = sdpvar(1, nVariable);

Next, we move on to define the constraints in our problem:

% Error variables should be above 0
constraints = [xError >= 0, yError >=0];
% Hyperplane constraints
constraints = [constraints, a'*X+b <= -(1-xError), a'*Y+b >= 1-yError];

Lastly, we need to specify our objective function:

% Minimise error values for all points in X and in Y
objective = sum(xError) + sum(yError);

With our problem defined, all that’s left is to solve the problem! We do this by invoking the optimize() function.

% Solve the linear programme
diagnosis = optimize(constraints, objective);
disp(diagnosis.info); % Success/failure report

Retrieving the values of the decision variables and the optimal objective value is easy; their optimal values are stored in the same sdpvar objects that we created earlier.

However, we need to call the value() function to convert them from sdpvar objects into actual numerical values:

a = value(a);
disp('a =');
disp(a);
b = value(b);
disp('b =');
disp(b);
objective = value(objective);
disp('Optimal objective =');
disp(objective);

If a perfect hyperplane can indeed be found for the random data points that you have generated, the optimal objective value will be exactly 0.

Otherwise, an objective value above 0 indicates that the hyperplane found does not perfectly separate the 3D-space into 2 half-spaces containing only data points from X or from Y on each side.

Plotting the hyperplane

Where’s the fun in solving the problem if we don’t get to see anything?

Since the example that I’m using is pretty simple — 3-dimensional with only 10 data points from 2 different classes — it’s easy (and feasible) to plot the results we’ve obtained from solving the linear programme onto a graph.

Let’s begin by plotting the hyperplane first.

Recall the two equations the hyperplane needs to satisfy? Well, in order to plot the actual separating hyperplane, we need only plot the general equation of the hyperplane instead:

General equation of a hyperplane
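In LaTeX form, this is the set of points x satisfying:

a^\top x + b = 0

or, written out coordinate-wise in 3D (using our solved a and b): a_1 x + a_2 y + a_3 z + b = 0.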

To do so, we’ll generate some dummy x-coordinate and y-coordinate values, and then compute their respective z-coordinate values by solving the equation above. In MATLAB, this would look something like this:

% Generate x and y dummy data points
[xValuesDummy,yValuesDummy] = meshgrid(-4:0.1:4);
% Solve for z (this assumes a(3) is nonzero)
zValuesDummy = -1/a(3)*(a(1)*xValuesDummy + a(2)*yValuesDummy + b);
% Plot the hyperplane
surf(xValuesDummy, yValuesDummy, zValuesDummy, 'FaceAlpha', 0.5, 'FaceColor', 'blue')
% Holds the figure before displaying
hold on;

Next, we need to retrieve the coordinate values of each data point along each axis and store them in their corresponding arrays. We’d also like to select different colours to plot the data points from the two classes.

% Retrieve values of each data point along each axis
xValues = [X(1,:) Y(1,:)];
yValues = [X(2,:) Y(2,:)];
zValues = [X(3,:) Y(3,:)];
% Assign one colour value per point: 1 for class X, 10 for class Y
% (Colour(:) stacks column-wise, matching the [X Y] concatenation order)
Colour = repmat([1,10],nVariable,1);
colour = Colour(:);
% Plot the data points
scatter3(xValues.', yValues.', zValues.', 100, colour,'filled');

Once this is done, you should be able to produce nice 3D plots like these:

A perfect separating hyperplane
A hyperplane that fails to separate the data points into 2 separate half-spaces

Now, what if you don’t want to use a hyperplane to split the space into two halves, especially when your data doesn’t seem to be linearly distributed?

Well, we can try to find a non-linear relationship between our data points and one such way is to consider a quadratic function that separates them!

Separating Quadric Surfaces

What are quadric surfaces? Well simply put, they are generalised forms of 2D conic sections (ellipses, hyperbolas and parabolas).

I hope you find these more cool-looking than hyperplanes (Image: Saeid Pashazadeh & Mohsen Sharifi)

As you might have guessed, these shapes may be better suited to certain datasets, so let’s try them out!

Similar to how we came up with the equations to define a separating hyperplane, we need to look for a symmetric matrix A, a vector b, and a scalar c that satisfy these quadratic equations:

Equations that define a separating quadric surface
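With the same notation as before (x_i the points of X, y_j the points of Y), these are, in LaTeX form:

x_i^\top A\, x_i + b^\top x_i + c \le -1, \quad i = 1, \dots, 5

y_j^\top A\, y_j + b^\top y_j + c \ge 1, \quad j = 1, \dots, 5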

Likewise, we need to include error variables to allow our model to fit datasets that aren’t separable by a quadric surface:

Equations for a not-so-perfect separating quadric surface
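As in the hyperplane case, each point x_i of X and y_j of Y gets its own error variable, giving:

x_i^\top A\, x_i + b^\top x_i + c \le -(1 - \varepsilon^x_i), \quad \varepsilon^x_i \ge 0

y_j^\top A\, y_j + b^\top y_j + c \ge 1 - \varepsilon^y_j, \quad \varepsilon^y_j \ge 0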

With these equations, we can now define our new problem, just as in the hyperplane case:

Problem to solve to find a quadric surface
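Written out in LaTeX form (same notation as before), the problem is:

\begin{aligned}
\min_{A,\; b,\; c,\; \varepsilon} \quad & \sum_{i=1}^{5} \varepsilon^x_i + \sum_{j=1}^{5} \varepsilon^y_j \\
\text{s.t.} \quad & x_i^\top A\, x_i + b^\top x_i + c \le -(1 - \varepsilon^x_i) \\
& y_j^\top A\, y_j + b^\top y_j + c \ge 1 - \varepsilon^y_j \\
& \varepsilon^x_i \ge 0, \quad \varepsilon^y_j \ge 0, \quad A = A^\top
\end{aligned}

Note that although the surface is quadratic in x, the problem is still linear in the decision variables A, b, c and the errors (the data points are constants), so this remains a linear programme.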

Solving with MATLAB

Now that our decision variables no longer look like those that we have used in our hyperplane problem, let’s see how we should define them so that they can be solved using YALMIP:

% Quadric surface variables
A = sdpvar(nDimension, nDimension, 'symmetric');
b = sdpvar(nDimension, 1);
c = sdpvar(1);
% Error variables for points in X and in Y
xError = sdpvar(1, nVariable);
yError = sdpvar(1, nVariable);

Next, we have to define the constraints in our problem:

% Error variables should be above 0
constraints = [xError >= 0, yError >=0];
% Quadric surface constraints (only the diagonal entries of the
% nVariable x nVariable matrices X'*A*X and Y'*A*Y are needed)
constraints = [constraints, ...
    diag(X'*A*X)' + b'*X + c <= -(1-xError), ...
    diag(Y'*A*Y)' + b'*Y + c >= 1-yError];

Lastly, we specify the objective function of the problem:

% Minimise error values for all points in X and in Y
objective = sum(xError) + sum(yError);

Now, it’s time to solve the problem using the same function optimize(), which we used in the hyperplane problem:

% Solve the problem
diagnosis = optimize(constraints, objective);
disp(diagnosis.info); % Success/failure report

Upon successful completion of the algorithm, we retrieve the optimal decision variables and the optimal objective value.

A perfect separating quadric surface is found!

Plotting the quadric surface

A hyperboloid that separates the two data classes

Unlike how we have plotted hyperplanes in the previous section, we’ll need to adopt a slightly different approach to plot quadric surfaces.

We begin by generating dummy x, y and z values across the entire plot area, and then evaluate the left-hand side of the general quadric surface equation at each point. The function isosurface() then draws the surface where this expression equals zero: lhs, the left-hand side of the equation, is passed as the 4th argument, while 0, the right-hand side, is the 5th argument.

General equation of a quadric surface
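In LaTeX form, this is the set of points x satisfying:

x^\top A\, x + b^\top x + c = 0

where A, b and c are the optimal values we just solved for.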
% Generate x, y and z dummy data points
[xValuesDummy,yValuesDummy,zValuesDummy]= meshgrid(-5:0.05:5);
q1 = A(1,1)*xValuesDummy+A(2,1)*yValuesDummy+A(3,1)*zValuesDummy;
q2 = A(1,2)*xValuesDummy+A(2,2)*yValuesDummy+A(3,2)*zValuesDummy;
q3 = A(1,3)*xValuesDummy+A(2,3)*yValuesDummy+A(3,3)*zValuesDummy;
lhs = q1.*xValuesDummy + q2.*yValuesDummy + q3.*zValuesDummy ...
    + b(1)*xValuesDummy + b(2)*yValuesDummy + b(3)*zValuesDummy + c;
% Plot the surface where lhs equals 0
isosurface(xValuesDummy,yValuesDummy,zValuesDummy,lhs,0);
hold on;

Last but not least, we’ll plot the individual data points found in both classes:

% Plot data points
xValues = [X(1,:) Y(1,:)];
yValues = [X(2,:) Y(2,:)];
zValues = [X(3,:) Y(3,:)];
Colour = repmat([1,10],nVariable,1);
colour = Colour(:);
scatter3(xValues', yValues', zValues', 25, colour,'filled');

And voilà you are done! This should allow you to generate a variety of quadric surfaces that will separate your data points in eye-catching plots such as these:

2-sheet hyperboloid
Hyperbolic paraboloid (or saddle)
Hyperbolic cylinder

Of course, don’t be too alarmed if any of the quadric surfaces generated fail to perfectly separate all the data points. Just like in the hyperplane case, it’s not always possible to find a perfect solution to fit every possible dataset.

Conclusion

I’ve just shown you how MATLAB and YALMIP can be used to not only find but also plot separating hyperplanes and quadric surfaces for data classification.

It’s important to keep in mind, however, that the examples I have presented are really simple and can certainly be generalised to higher dimensions, more classes, and more data points. Though such problems can still be solved easily, it becomes difficult to visualise the data in the same way once it has more than three dimensions.

I hope you’ve had fun playing around with the various functions and plots and thanks a lot for reading my article if you managed to make it all the way here!

