# Andrew Ng’s Machine Learning Course in Python (Anomaly Detection)

This is the last part of Andrew Ng’s Machine Learning Course python implementation and I am very excited to finally complete the series. To give you guys some perspective, it took me a month to convert these codes to python and writes an article for each assignment. If any of you were hesitating to do your own implementation, be it in Python, R or Java, I strongly recommend you to go for it. Coding these algorithms from scratch not only reinforce the concepts taught, you will also get to practice your data science programming skills in the language you are comfortable with.

With that said, let’s dive into the last programming assignment

In this part of the assignment, we will implement an anomaly detection algorithm using the Gaussian model to detect anomalous behavior in a 2D dataset first and then a high-dimensional dataset.

`import numpy as npimport matplotlib.pyplot as pltfrom scipy.io import loadmat`
`mat = loadmat("ex8data1.mat")X = mat["X"]Xval = mat["Xval"]yval = mat["yval"]`

Visualizing the data

`plt.scatter(X[:,0],X[:,1],marker="x")plt.xlim(0,30)plt.ylim(0,30)plt.xlabel("Latency (ms)")plt.ylabel("Throughput (mb/s)")`

To estimate parameters (mean and variance) for the Gaussian model

`def estimateGaussian(X):    """     This function estimates the parameters of a Gaussian distribution using the data in X    """        m = X.shape[0]        #compute mean    sum_ = np.sum(X,axis=0)    mu = 1/m *sum_        # compute variance    var = 1/m * np.sum((X - mu)**2,axis=0)        return mu,var`
`mu, sigma2 = estimateGaussian(X)`

Multivariate Gaussian Distribution is an optional lecture in the course and the code to compute the probability density is given to us. However, in order for me to proceed on with the assignment, I need to write the `multivariateGaussian `function from scratch.

`def multivariateGaussian(X, mu, sigma2):    """    Computes the probability density function of the multivariate gaussian distribution.    """    k = len(mu)        sigma2=np.diag(sigma2)    X = X - mu.T    p = 1/((2*np.pi)**(k/2)*(np.linalg.det(sigma2)**0.5))* np.exp(-0.5* np.sum(X @ np.linalg.pinv(sigma2) * X,axis=1))    return p`
`p = multivariateGaussian(X, mu, sigma2)`

Some of the interesting functions we had utilized here are from numpy linear algebra class. The official documentation can be found here.

Once we estimate the Gaussian parameters and obtain the probability density of the data, we can visualize the fit.

`plt.figure(figsize=(8,6))plt.scatter(X[:,0],X[:,1],marker="x")X1,X2 = np.meshgrid(np.linspace(0,35,num=70),np.linspace(0,35,num=70))p2 = multivariateGaussian(np.hstack((X1.flatten()[:,np.newaxis],X2.flatten()[:,np.newaxis])), mu, sigma2)contour_level = 10**np.array([np.arange(-20,0,3,dtype=np.float)]).Tplt.contour(X1,X2,p2[:,np.newaxis].reshape(X1.shape),contour_level)plt.xlim(0,35)plt.ylim(0,35)plt.xlabel("Latency (ms)")plt.ylabel("Throughput (mb/s)")`

I had not explained the process of creating a contour plot before as most are quite straight forward. If you have difficulties following along, this article here might help. In simpler terms, we first create a meshgrid around the data region and compute the Z-axis. `plt.contour` then create the contour plot using the 3 axes (X, Y, Z).

Now to select a threshold that will flag an example as anomalies.

`def selectThreshold(yval, pval):    """    Find the best threshold (epsilon) to use for selecting outliers    """    best_epi = 0    best_F1 = 0        stepsize = (max(pval) -min(pval))/1000    epi_range = np.arange(pval.min(),pval.max(),stepsize)    for epi in epi_range:        predictions = (pval<epi)[:,np.newaxis]        tp = np.sum(predictions[yval==1]==1)        fp = np.sum(predictions[yval==0]==1)        fn = np.sum(predictions[yval==1]==0)                # compute precision, recall and F1        prec = tp/(tp+fp)        rec = tp/(tp+fn)                F1 = (2*prec*rec)/(prec+rec)                if F1 > best_F1:            best_F1 =F1            best_epi = epi            return best_epi, best_F1`
`pval = multivariateGaussian(Xval, mu, sigma2)epsilon, F1 = selectThreshold(yval, pval)print("Best epsilon found using cross-validation:",epsilon)print("Best F1 on Cross Validation Set:",F1)`

In case you have not noticed, F1 score is used here instead of accuracy as the dataset is highly unbalanced. To learn more about the various way to evaluate a machine learning’s model performance, this article sums up the topic quite well.

Visualizing the optimal threshold

`plt.figure(figsize=(8,6))`
`# plot the dataplt.scatter(X[:,0],X[:,1],marker="x")`
`# potting of contourX1,X2 = np.meshgrid(np.linspace(0,35,num=70),np.linspace(0,35,num=70))p2 = multivariateGaussian(np.hstack((X1.flatten()[:,np.newaxis],X2.flatten()[:,np.newaxis])), mu, sigma2)contour_level = 10**np.array([np.arange(-20,0,3,dtype=np.float)]).Tplt.contour(X1,X2,p2[:,np.newaxis].reshape(X1.shape),contour_level)`
`# Circling of anomaliesoutliers = np.nonzero(p<epsilon)[0]plt.scatter(X[outliers,0],X[outliers,1],marker ="o",facecolor="none",edgecolor="r",s=70)`
`plt.xlim(0,35)plt.ylim(0,35)plt.xlabel("Latency (ms)")plt.ylabel("Throughput (mb/s)")`

As for high dimensional dataset, we just have to follow the exact same steps as before

`mat2 = loadmat("ex8data2.mat")X2 = mat2["X"]Xval2 = mat2["Xval"]yval2 = mat2["yval"]`
`# compute the mean and variancemu2, sigma2_2 = estimateGaussian(X2)`
`# Training setp3 = multivariateGaussian(X2, mu2, sigma2_2)`
`# cross-validation setpval2 = multivariateGaussian(Xval2, mu2, sigma2_2)`
`# Find the best thresholdepsilon2, F1_2 = selectThreshold(yval2, pval2)print("Best epsilon found using cross-validation:",epsilon2)print("Best F1 on Cross Validation Set:",F1_2)print("# Outliers found:",np.sum(p3<epsilon2))`

The second part of the assignment involved implementing a collaborative filtering algorithm to build a recommender system for movie ratings.

`mat3 = loadmat("ex8_movies.mat")mat4 = loadmat("ex8_movieParams.mat")Y = mat3["Y"] # 1682 X 943 matrix, containing ratings (1-5) of 1682 movies on 943 userR = mat3["R"] # 1682 X 943 matrix, where R(i,j) = 1 if and only if user j give rating to movie iX = mat4["X"] # 1682 X 10 matrix , num_movies X num_features matrix of movie featuresTheta = mat4["Theta"] # 943 X 10 matrix, num_users X num_features matrix of user features`
`# Compute average rating print("Average rating for movie 1 (Toy Story):",np.sum(Y[0,:]*R[0,:])/np.sum(R[0,:]),"/5")`

The print statement will print: `Average rating for movie 1 (Toy Story): 3.8783185840707963 /5`

`plt.figure(figsize=(8,16))plt.imshow(Y)plt.xlabel("Users")plt.ylabel("Movies")`

`def  cofiCostFunc(params, Y, R, num_users, num_movies, num_features, Lambda):    """    Returns the cost and gradient for the collaborative filtering problem    """        # Unfold the params    X = params[:num_movies*num_features].reshape(num_movies,num_features)    Theta = params[num_movies*num_features:].reshape(num_users,num_features)        predictions =  X @ Theta.T    err = (predictions - Y)    J = 1/2 * np.sum((err**2) * R)        #compute regularized cost function    reg_X =  Lambda/2 * np.sum(Theta**2)    reg_Theta = Lambda/2 *np.sum(X**2)    reg_J = J + reg_X + reg_Theta        # Compute gradient    X_grad = err*R @ Theta    Theta_grad = (err*R).T @ X    grad = np.append(X_grad.flatten(),Theta_grad.flatten())        # Compute regularized gradient    reg_X_grad = X_grad + Lambda*X    reg_Theta_grad = Theta_grad + Lambda*Theta    reg_grad = np.append(reg_X_grad.flatten(),reg_Theta_grad.flatten())        return J, grad, reg_J, reg_grad`

Similar to the previous approach, the assignment requires us to compute the cost function, gradient, regularized cost function and then regularized gradient in separate steps. The code block above will allows you to follow the assignment step by step as long as you use the correct indexing.

To test our cost function,

`# Reduce the data set size to run fasternum_users, num_movies, num_features = 4,5,3X_test = X[:num_movies,:num_features]Theta_test= Theta[:num_users,:num_features]Y_test = Y[:num_movies,:num_users]R_test = R[:num_movies,:num_users]params = np.append(X_test.flatten(),Theta_test.flatten())`
`# Evaluate cost functionJ, grad = cofiCostFunc(params, Y_test, R_test, num_users, num_movies, num_features, 0)[:2]print("Cost at loaded parameters:",J)`
`J2, grad2 = cofiCostFunc(params, Y_test, R_test, num_users, num_movies, num_features, 1.5)[2:]print("Cost at loaded parameters (lambda = 1.5):",J2)`

Once we get our cost function and gradient going, we can start training our algorithm.

`# load movie listmovieList = open("movie_ids.txt","r").read().split("\n")[:-1]`
`# see movie listnp.set_printoptions(threshold=np.nan)movieList`

You can enter your own movie preference at this step but I used the exact same ratings as the assignment to keep it consistent.

`# Initialize my ratingsmy_ratings = np.zeros((1682,1))`
`# Create own ratingsmy_ratings[0] = 4 my_ratings[97] = 2my_ratings[6] = 3my_ratings[11]= 5my_ratings[53] = 4my_ratings[63]= 5my_ratings[65]= 3my_ratings[68] = 5my_ratings[82]= 4my_ratings[225] = 5my_ratings[354]= 5`
`print("New user ratings:\n")for i in range(len(my_ratings)):    if my_ratings[i]>0:        print("Rated",int(my_ratings[i]),"for index",movieList[i])`

To prepare our data before inputting into the algorithm, we need to normalize the ratings, set some random initial parameters, and use an optimizing algorithm to update the parameters.

`def normalizeRatings(Y, R):    """    normalized Y so that each movie has a rating of 0 on average, and returns the mean rating in Ymean.    """        m,n = Y.shape[0], Y.shape[1]    Ymean = np.zeros((m,1))    Ynorm = np.zeros((m,n))        for i in range(m):        Ymean[i] = np.sum(Y[i,:])/np.count_nonzero(R[i,:])        Ynorm[i,R[i,:]==1] = Y[i,R[i,:]==1] - Ymean[i]            return Ynorm, Ymean`
`def gradientDescent(initial_parameters,Y,R,num_users,num_movies,num_features,alpha,num_iters,Lambda):    """    Optimize X and Theta    """    # unfold the parameters    X = initial_parameters[:num_movies*num_features].reshape(num_movies,num_features)    Theta = initial_parameters[num_movies*num_features:].reshape(num_users,num_features)        J_history =[]        for i in range(num_iters):        params = np.append(X.flatten(),Theta.flatten())        cost, grad = cofiCostFunc(params, Y, R, num_users, num_movies, num_features, Lambda)[2:]                # unfold grad        X_grad = grad[:num_movies*num_features].reshape(num_movies,num_features)        Theta_grad = grad[num_movies*num_features:].reshape(num_users,num_features)        X = X - (alpha * X_grad)        Theta = Theta - (alpha * Theta_grad)        J_history.append(cost)        paramsFinal = np.append(X.flatten(),Theta.flatten())    return paramsFinal , J_history`

Once again, I chose batch gradient descent as my optimizing algorithm. One thing going through all the programming assignment in python taught me is that you would rarely go wrong with gradient descent. At this point, the code for gradient descent should be fairly familiar to you.

`Y = np.hstack((my_ratings,Y))R =np.hstack((my_ratings!=0,R))`
`# Normalize RatingsYnorm, Ymean = normalizeRatings(Y, R)`
`num_users = Y.shape[1]num_movies = Y.shape[0]num_features = 10`
`# Set initial Parameters (Theta,X)X = np.random.randn(num_movies, num_features)Theta = np.random.randn(num_users, num_features)initial_parameters = np.append(X.flatten(),Theta.flatten())Lambda = 10`
`# Optimize parameters using Gradient DescentparamsFinal, J_history = gradientDescent(initial_parameters,Y,R,num_users,num_movies,num_features,0.001,400,Lambda)`

Plotting of cost function to ensure gradient descent is working

`plt.plot(J_history)plt.xlabel("Iteration")plt.ylabel("\$J(\Theta)\$")plt.title("Cost function using Gradient Descent")`

To make predictions on movies that you had not rated

`# unfold paramatersX = paramsFinal[:num_movies*num_features].reshape(num_movies,num_features)Theta = paramsFinal[num_movies*num_features:].reshape(num_users,num_features)`
`# Predict ratingp = X @ Theta.Tmy_predictions = p[:,0][:,np.newaxis] + Ymean`
`import pandas as pddf = pd.DataFrame(np.hstack((my_predictions,np.array(movieList)[:,np.newaxis])))df.sort_values(by=[0],ascending=False,inplace=True)df.reset_index(drop=True,inplace=True)`
`print("Top recommendations for you:\n")for i in range(10):    print("Predicting rating",round(float(df[0][i]),1)," for index",df[1][i])`

Finally, I had come to an end for Andrew Ng’s Machine Learning Course in Python. I hope this is as beneficial to you as it is for me writing it and I thank all of you for the supports.

The Jupyter notebook will be uploaded to my GitHub at (https://github.com/Benlau93/Machine-Learning-by-Andrew-Ng-in-Python).

For other python implementation in the series,