# Andrew Ng’s Machine Learning Course in Python (Anomaly Detection)

This is the last part of Andrew Ng’s Machine Learning Course python implementation and I am very excited to finally complete the series. To give you guys some perspective, it took me a month to convert these codes to python and writes an article for each assignment. If any of you were hesitating to do your own implementation, be it in Python, R or Java, I strongly recommend you to go for it. Coding these algorithms from scratch not only reinforce the concepts taught, you will also get to practice your data science programming skills in the language you are comfortable with.

With that said, let’s dive into the last programming assignment

In this part of the assignment, we will implement an anomaly detection algorithm using the Gaussian model to detect anomalous behavior in a 2D dataset first and then a high-dimensional dataset.

Loading relevant libraries and the dataset

import numpy as np

import matplotlib.pyplot as plt

from scipy.io import loadmat

mat = loadmat("ex8data1.mat")

X = mat["X"]

Xval = mat["Xval"]

yval = mat["yval"]

Visualizing the data

plt.scatter(X[:,0],X[:,1],marker="x")

plt.xlim(0,30)

plt.ylim(0,30)

plt.xlabel("Latency (ms)")

plt.ylabel("Throughput (mb/s)")

To estimate parameters (mean and variance) for the Gaussian model

def estimateGaussian(X):

"""

This function estimates the parameters of a Gaussian distribution using the data in X

"""

m = X.shape[0]

#compute mean

sum_ = np.sum(X,axis=0)

mu = 1/m *sum_

# compute variance

var = 1/m * np.sum((X - mu)**2,axis=0)

return mu,var

mu, sigma2 = estimateGaussian(X)

Multivariate Gaussian Distribution is an optional lecture in the course and the code to compute the probability density is given to us. However, in order for me to proceed on with the assignment, I need to write the `multivariateGaussian `

function from scratch.

def multivariateGaussian(X, mu, sigma2):

"""

Computes the probability density function of the multivariate gaussian distribution.

"""

k = len(mu)

sigma2=np.diag(sigma2)

X = X - mu.T

p = 1/((2*np.pi)**(k/2)*(np.linalg.det(sigma2)**0.5))* np.exp(-0.5* np.sum(X @ np.linalg.pinv(sigma2) * X,axis=1))

return p

p = multivariateGaussian(X, mu, sigma2)

Some of the interesting functions we had utilized here are from numpy linear algebra class. The official documentation can be found here.

Once we estimate the Gaussian parameters and obtain the probability density of the data, we can visualize the fit.

plt.figure(figsize=(8,6))

plt.scatter(X[:,0],X[:,1],marker="x")

X1,X2 = np.meshgrid(np.linspace(0,35,num=70),np.linspace(0,35,num=70))

p2 = multivariateGaussian(np.hstack((X1.flatten()[:,np.newaxis],X2.flatten()[:,np.newaxis])), mu, sigma2)

contour_level = 10**np.array([np.arange(-20,0,3,dtype=np.float)]).T

plt.contour(X1,X2,p2[:,np.newaxis].reshape(X1.shape),contour_level)

plt.xlim(0,35)

plt.ylim(0,35)

plt.xlabel("Latency (ms)")

plt.ylabel("Throughput (mb/s)")

I had not explained the process of creating a contour plot before as most are quite straight forward. If you have difficulties following along, this article here might help. In simpler terms, we first create a meshgrid around the data region and compute the Z-axis. `plt.contour`

then create the contour plot using the 3 axes (X, Y, Z).

Now to select a threshold that will flag an example as anomalies.

def selectThreshold(yval, pval):

"""

Find the best threshold (epsilon) to use for selecting outliers

"""

best_epi = 0

best_F1 = 0

stepsize = (max(pval) -min(pval))/1000

epi_range = np.arange(pval.min(),pval.max(),stepsize)

for epi in epi_range:

predictions = (pval<epi)[:,np.newaxis]

tp = np.sum(predictions[yval==1]==1)

fp = np.sum(predictions[yval==0]==1)

fn = np.sum(predictions[yval==1]==0)

# compute precision, recall and F1

prec = tp/(tp+fp)

rec = tp/(tp+fn)

F1 = (2*prec*rec)/(prec+rec)

if F1 > best_F1:

best_F1 =F1

best_epi = epi

return best_epi, best_F1

pval = multivariateGaussian(Xval, mu, sigma2)

epsilon, F1 = selectThreshold(yval, pval)

print("Best epsilon found using cross-validation:",epsilon)

print("Best F1 on Cross Validation Set:",F1)

In case you have not noticed, F1 score is used here instead of accuracy as the dataset is highly unbalanced. To learn more about the various way to evaluate a machine learning’s model performance, this article sums up the topic quite well.

Visualizing the optimal threshold

plt.figure(figsize=(8,6))

# plot the data

plt.scatter(X[:,0],X[:,1],marker="x")

# potting of contour

X1,X2 = np.meshgrid(np.linspace(0,35,num=70),np.linspace(0,35,num=70))

p2 = multivariateGaussian(np.hstack((X1.flatten()[:,np.newaxis],X2.flatten()[:,np.newaxis])), mu, sigma2)

contour_level = 10**np.array([np.arange(-20,0,3,dtype=np.float)]).T

plt.contour(X1,X2,p2[:,np.newaxis].reshape(X1.shape),contour_level)

# Circling of anomalies

outliers = np.nonzero(p<epsilon)[0]

plt.scatter(X[outliers,0],X[outliers,1],marker ="o",facecolor="none",edgecolor="r",s=70)

plt.xlim(0,35)

plt.ylim(0,35)

plt.xlabel("Latency (ms)")

plt.ylabel("Throughput (mb/s)")

As for high dimensional dataset, we just have to follow the exact same steps as before

mat2 = loadmat("ex8data2.mat")

X2 = mat2["X"]

Xval2 = mat2["Xval"]

yval2 = mat2["yval"]

# compute the mean and variance

mu2, sigma2_2 = estimateGaussian(X2)

# Training set

p3 = multivariateGaussian(X2, mu2, sigma2_2)

# cross-validation set

pval2 = multivariateGaussian(Xval2, mu2, sigma2_2)

# Find the best threshold

epsilon2, F1_2 = selectThreshold(yval2, pval2)

print("Best epsilon found using cross-validation:",epsilon2)

print("Best F1 on Cross Validation Set:",F1_2)

print("# Outliers found:",np.sum(p3<epsilon2))

The second part of the assignment involved implementing a collaborative filtering algorithm to build a recommender system for movie ratings.

Loading and visualization of the movie ratings dataset

mat3 = loadmat("ex8_movies.mat")

mat4 = loadmat("ex8_movieParams.mat")

Y = mat3["Y"] # 1682 X 943 matrix, containing ratings (1-5) of 1682 movies on 943 user

R = mat3["R"] # 1682 X 943 matrix, where R(i,j) = 1 if and only if user j give rating to movie i

X = mat4["X"] # 1682 X 10 matrix , num_movies X num_features matrix of movie features

Theta = mat4["Theta"] # 943 X 10 matrix, num_users X num_features matrix of user features

# Compute average rating

print("Average rating for movie 1 (Toy Story):",np.sum(Y[0,:]*R[0,:])/np.sum(R[0,:]),"/5")

The print statement will print: `Average rating for movie 1 (Toy Story): 3.8783185840707963 /5`

plt.figure(figsize=(8,16))

plt.imshow(Y)

plt.xlabel("Users")

plt.ylabel("Movies")

Going into the algorithm proper, we start with computing the cost function and gradient

def cofiCostFunc(params, Y, R, num_users, num_movies, num_features, Lambda):

"""

Returns the cost and gradient for the collaborative filtering problem

"""

# Unfold the params

X = params[:num_movies*num_features].reshape(num_movies,num_features)

Theta = params[num_movies*num_features:].reshape(num_users,num_features)

predictions = X @ Theta.T

err = (predictions - Y)

J = 1/2 * np.sum((err**2) * R)

#compute regularized cost function

reg_X = Lambda/2 * np.sum(Theta**2)

reg_Theta = Lambda/2 *np.sum(X**2)

reg_J = J + reg_X + reg_Theta

# Compute gradient

X_grad = err*R @ Theta

Theta_grad = (err*R).T @ X

grad = np.append(X_grad.flatten(),Theta_grad.flatten())

# Compute regularized gradient

reg_X_grad = X_grad + Lambda*X

reg_Theta_grad = Theta_grad + Lambda*Theta

reg_grad = np.append(reg_X_grad.flatten(),reg_Theta_grad.flatten())

return J, grad, reg_J, reg_grad

Similar to the previous approach, the assignment requires us to compute the cost function, gradient, regularized cost function and then regularized gradient in separate steps. The code block above will allows you to follow the assignment step by step as long as you use the correct indexing.

To test our cost function,

# Reduce the data set size to run faster

num_users, num_movies, num_features = 4,5,3

X_test = X[:num_movies,:num_features]

Theta_test= Theta[:num_users,:num_features]

Y_test = Y[:num_movies,:num_users]

R_test = R[:num_movies,:num_users]

params = np.append(X_test.flatten(),Theta_test.flatten())

# Evaluate cost function

J, grad = cofiCostFunc(params, Y_test, R_test, num_users, num_movies, num_features, 0)[:2]

print("Cost at loaded parameters:",J)

J2, grad2 = cofiCostFunc(params, Y_test, R_test, num_users, num_movies, num_features, 1.5)[2:]

print("Cost at loaded parameters (lambda = 1.5):",J2)

Once we get our cost function and gradient going, we can start training our algorithm.

Loading of the movie list

# load movie list

movieList = open("movie_ids.txt","r").read().split("\n")[:-1]

# see movie list

np.set_printoptions(threshold=np.nan)

movieList

You can enter your own movie preference at this step but I used the exact same ratings as the assignment to keep it consistent.

# Initialize my ratings

my_ratings = np.zeros((1682,1))

# Create own ratings

my_ratings[0] = 4

my_ratings[97] = 2

my_ratings[6] = 3

my_ratings[11]= 5

my_ratings[53] = 4

my_ratings[63]= 5

my_ratings[65]= 3

my_ratings[68] = 5

my_ratings[82]= 4

my_ratings[225] = 5

my_ratings[354]= 5

print("New user ratings:\n")

for i in range(len(my_ratings)):

if my_ratings[i]>0:

print("Rated",int(my_ratings[i]),"for index",movieList[i])

To prepare our data before inputting into the algorithm, we need to normalize the ratings, set some random initial parameters, and use an optimizing algorithm to update the parameters.

def normalizeRatings(Y, R):

"""

normalized Y so that each movie has a rating of 0 on average, and returns the mean rating in Ymean.

"""

m,n = Y.shape[0], Y.shape[1]

Ymean = np.zeros((m,1))

Ynorm = np.zeros((m,n))

for i in range(m):

Ymean[i] = np.sum(Y[i,:])/np.count_nonzero(R[i,:])

Ynorm[i,R[i,:]==1] = Y[i,R[i,:]==1] - Ymean[i]

return Ynorm, Ymean

def gradientDescent(initial_parameters,Y,R,num_users,num_movies,num_features,alpha,num_iters,Lambda):

"""

Optimize X and Theta

"""

# unfold the parameters

X = initial_parameters[:num_movies*num_features].reshape(num_movies,num_features)

Theta = initial_parameters[num_movies*num_features:].reshape(num_users,num_features)

J_history =[]

for i in range(num_iters):

params = np.append(X.flatten(),Theta.flatten())

cost, grad = cofiCostFunc(params, Y, R, num_users, num_movies, num_features, Lambda)[2:]

# unfold grad

X_grad = grad[:num_movies*num_features].reshape(num_movies,num_features)

Theta_grad = grad[num_movies*num_features:].reshape(num_users,num_features)

X = X - (alpha * X_grad)

Theta = Theta - (alpha * Theta_grad)

J_history.append(cost)

paramsFinal = np.append(X.flatten(),Theta.flatten())

return paramsFinal , J_history

Once again, I chose batch gradient descent as my optimizing algorithm. One thing going through all the programming assignment in python taught me is that you would rarely go wrong with gradient descent. At this point, the code for gradient descent should be fairly familiar to you.

Y = np.hstack((my_ratings,Y))

R =np.hstack((my_ratings!=0,R))

# Normalize Ratings

Ynorm, Ymean = normalizeRatings(Y, R)

num_users = Y.shape[1]

num_movies = Y.shape[0]

num_features = 10

# Set initial Parameters (Theta,X)

X = np.random.randn(num_movies, num_features)

Theta = np.random.randn(num_users, num_features)

initial_parameters = np.append(X.flatten(),Theta.flatten())

Lambda = 10

# Optimize parameters using Gradient Descent

paramsFinal, J_history = gradientDescent(initial_parameters,Y,R,num_users,num_movies,num_features,0.001,400,Lambda)

Plotting of cost function to ensure gradient descent is working

plt.plot(J_history)

plt.xlabel("Iteration")

plt.ylabel("$J(\Theta)$")

plt.title("Cost function using Gradient Descent")

To make predictions on movies that you had not rated

# unfold paramaters

X = paramsFinal[:num_movies*num_features].reshape(num_movies,num_features)

Theta = paramsFinal[num_movies*num_features:].reshape(num_users,num_features)

# Predict rating

p = X @ Theta.T

my_predictions = p[:,0][:,np.newaxis] + Ymean

import pandas as pd

df = pd.DataFrame(np.hstack((my_predictions,np.array(movieList)[:,np.newaxis])))

df.sort_values(by=[0],ascending=False,inplace=True)

df.reset_index(drop=True,inplace=True)

print("Top recommendations for you:\n")

for i in range(10):

print("Predicting rating",round(float(df[0][i]),1)," for index",df[1][i])

Finally, I had come to an end for Andrew Ng’s Machine Learning Course in Python. I hope this is as beneficial to you as it is for me writing it and I thank all of you for the supports.

The Jupyter notebook will be uploaded to my GitHub at (https://github.com/Benlau93/Machine-Learning-by-Andrew-Ng-in-Python).

For other python implementation in the series,

- Linear Regression
- Logistic Regression
- Regularized Logistic Regression
- Neural Networks
- Support Vector Machines
- Unsupervised Learning

Thank you for reading.