
Our dataset
Using PCA, we can reduce our features from n dimensions down to 2 or 3, which can then be plotted. We will start by looking at the dataset as downloaded from Kaggle.
We can see 4 attributes, which is great for making predictions, but four dimensions cannot be plotted directly as a visualisation.

Understanding
When we use PCA, our main aim is to take our multi-dimensional features (x1, x2, x3, … xn) down to two dimensions (z1, z2). Figure 1 below is a simplified example of converting a 2-dimensional input (x1, x2) down to a single dimension (Z).
Note how (similar to gradient descent) we calculate a projection error while projecting x1 & x2 onto Z.

The goal of PCA is to find a direction (a vector) onto which to project the data so as to minimize the projection error.
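To make the projection idea concrete, here is a minimal sketch (my addition, using made-up toy data) that projects a 2-dimensional cloud of points onto its first principal direction and measures the resulting projection error.
% Toy illustration: project 2-D points onto one principal direction
X_toy = [1 1.1; 2 1.9; 3 3.2; 4 3.9; 5 5.1];       % hypothetical 2-D samples
X_toy = bsxfun(@minus, X_toy, mean(X_toy));        % centre the data
[coef_toy, ~] = pca(X_toy);                        % columns are principal directions
u = coef_toy(:, 1);                                % direction minimising projection error
Z_toy = X_toy * u;                                 % the 1-D "Z" values
X_approx = Z_toy * u';                             % map back into 2-D
proj_error = mean(sum((X_toy - X_approx).^2, 2))   % average squared projection error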
Now that we have a basic understanding, let's get coding…
Loading our dataset
Let's load the data and be sure to normalise; PCA is sensitive to feature scales, and in general we normalise for better ML performance. Let's also use grp2idx to convert the Y labels into numbers.
%% Load our data
clear;
tbl = readtable('IRIS.csv');                 % read the Kaggle Iris CSV into a table
[m, n] = size(tbl);                          % m samples, n columns (last column is the label)
X = tbl{:, 1:n-1};                           % numeric features
[y, labels] = grp2idx(tbl{:, n});            % map species names to 1, 2, 3
[X_norm, mu, sigma] = featureNormalize(X);   % zero mean, unit standard deviation
Run PCA
The pca function returns the principal component coefficients, also known as loadings, for an m-by-n data matrix (here our X_norm). Armed with these coefficients, we can now compute our projection "Z".
%% use pca to reduce to 2 dimensions
K = 2;                       % target number of dimensions
[coef, S] = pca(X_norm);     % coef: loadings, S: scores (the projected data)
As in figure 3, our coefficient matrix will be 4×4 since we have 4 features.

Now let's create a coef_reduce with only our preferred number of columns (K). Simple matrix multiplication then reduces our features from 4 to 2.
% calculate Z with 2 features
coef_reduce = coef(:, 1:K);   % keep the first K principal directions
Z = X_norm * coef_reduce;     % project the normalised data onto them
And that's it: we had an X_norm matrix with 4 features, but have now reduced that to 2 features, stored in the Z matrix.
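As a quick sanity check (my addition), we can confirm the shapes, and also note that because X_norm is already centred, Z should match the first K columns of the score matrix S that pca returned. The row count of 150 assumes the standard Iris file.
% Hypothetical sanity check
disp(size(Z))                    % expected: 150 2 for the standard Iris data
max(max(abs(Z - S(:, 1:K))))     % expected to be close to 0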

Plotting the result
Plotting now becomes a fairly standard process in MATLAB. Since we have multiple classes, let's create a palette so each category gets a different colour.
%% Create colour palette
figure;
palette = hsv(numel(labels));   % one colour per species (3 for Iris)
colors = palette(y, :);         % look up each sample's colour by its label
% Plot the data, with different Y labels being different colors
scatter(Z(:,1), Z(:,2), 30, colors);
title('Iris dataset plotted in 2D, using PCA for dimensionality reduction');
And Kaboom… we have plotted our dataset.
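If you also want a legend mapping colours back to species names, one optional variation (not part of the original plot) is gscatter from the Statistics and Machine Learning Toolbox, which we already rely on for pca and grp2idx; it colours by group and adds the legend for you.
% Optional: per-species colours with an automatic legend
figure;
gscatter(Z(:,1), Z(:,2), labels(y));
title('Iris dataset plotted in 2D, using PCA for dimensionality reduction');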

Conclusion
So, as we can see, plotting multi-dimensional datasets is not too hard. Naturally, you can't interpret individual features as you would when plotting 2 features, but we can get a good idea of whether our dataset labelling makes sense. This is very useful when working with classification or clustering algorithms such as logistic regression or K-means.
PS. Below is the featureNormalize function.
function [X_norm, mu, sigma] = featureNormalize(X)
% Normalise each feature to zero mean and unit standard deviation
mu = mean(X);
X_norm = bsxfun(@minus, X, mu);             % subtract the column means
sigma = std(X_norm);
X_norm = bsxfun(@rdivide, X_norm, sigma);   % divide by the column standard deviations
end
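One practical note, added here rather than in the original post: mu, sigma and coef_reduce together define the learned mapping, so any new sample should be pushed through the same transform rather than re-normalised or re-fitted from scratch. The X_new below is a hypothetical unseen measurement.
% Project a hypothetical new sample with the already-fitted parameters
X_new = [5.1 3.5 1.4 0.2];                                       % one unseen Iris measurement
X_new_norm = bsxfun(@rdivide, bsxfun(@minus, X_new, mu), sigma); % reuse mu and sigma
Z_new = X_new_norm * coef_reduce;                                % lands in the same 2-D space as Z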