
Implementing the Perceptron Algorithm in Python

Machine Learning From Scratch: Part 6

Image by Author

In this article, we are going to look at the Perceptron algorithm, the most basic single-layered neural network, used for binary classification. First, we will look at the Unit Step Function and see how the Perceptron uses it to classify, and then we will cover the perceptron update rule.

Finally, we will plot the decision boundary for our data. We will use data with only two features, and there will be two classes, since the Perceptron is a binary classifier. We will implement all the code in Python with NumPy, and visualize/plot using Matplotlib.

Logistic Regression From Scratch in Python

Implementing Polynomial Regression From Scratch in Python


Introduction

The Perceptron algorithm was inspired by the basic processing units in the brain, called neurons, and how they process signals. It was invented by Frank Rosenblatt, building on the McCulloch-Pitts neuron and the findings of Hebb (see the original Perceptron research paper).

The Perceptron algorithm is not widely used in practice. We study it mostly for historical reasons, and because it is the simplest single-layered neural network.

Neuron; source: Wikipedia

Perceptron

Let us try to understand the Perceptron algorithm using the following data as a motivating example.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Generate a linearly separable dataset with 2 features and 2 classes.
X, y = datasets.make_blobs(n_samples=150, n_features=2,
                           centers=2, cluster_std=1.05,
                           random_state=2)

# Plotting
fig = plt.figure(figsize=(10, 8))
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'r^')
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs')
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.title('Random Classification Data with 2 classes')
plt.show()
Image by Author

There are two classes, red and blue, and we want to separate them by drawing a straight line between them. Or, more formally, we want to learn a set of parameters theta to find an optimal hyperplane (a straight line for our data) that separates the two classes.

For Linear Regression, our hypothesis (y_hat) was theta.X. Then, for binary classification in Logistic Regression, we needed to output probabilities between 0 and 1, so we modified the hypothesis to sigmoid(theta.X). We applied the sigmoid function over the dot product of input features and parameters because we needed to squash the output between 0 and 1.
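To make the contrast concrete, here is a minimal sketch of those two earlier hypotheses (an illustration with made-up function names, assuming NumPy and that a bias column of 1s has already been appended to X):

import numpy as np

def linear_hypothesis(theta, X):
    # Linear Regression: y_hat = theta.X
    return X.dot(theta)

def sigmoid_hypothesis(theta, X):
    # Logistic Regression: y_hat = sigmoid(theta.X), squashed into (0, 1)
    return 1.0 / (1.0 + np.exp(-X.dot(theta)))

The Perceptron keeps the same dot product theta.X and only swaps the function applied on top of it.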

For the Perceptron algorithm, we apply a different function over theta.X: the Unit Step Function, which is defined as –

h_theta(x) = g(theta.X)   (source: Andrew Ng course)

where,

g(z) = 1, if z ≥ 0
g(z) = 0, if z < 0

Unlike Logistic Regression, which outputs a probability between 0 and 1, the Perceptron outputs values that are exactly 0 or 1.

This function says that if the output (theta.X) is greater than or equal to zero, the model will classify the example as 1 (the blue squares in our plot), and if the output is less than zero, the model will classify it as 0 (the red triangles). And that is how the perceptron algorithm classifies.

Let’s look at the Unit Step Function graphically –

Unit Step Function; source

We can see that for z ≥ 0, g(z) = 1, and for z < 0, g(z) = 0.

Let’s code the step function.

def step_func(z):
    # Unit Step Function: 1 for z >= 0, else 0.
    return 1.0 if (z >= 0) else 0.0
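A quick sanity check, including the boundary case:

print(step_func(2.3))   # 1.0
print(step_func(-0.7))  # 0.0
print(step_func(0.0))   # 1.0, since z = 0 maps to class 1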

Perceptron as a Neural Net

A Perceptron; Image by Author

We can visually understand the Perceptron by looking at the above image. For every training example, we first take the dot product of the input features and the parameters, theta. Then, we apply the Unit Step Function to make the prediction (y_hat).

And if the prediction is wrong, or in other words, if the model has misclassified that example, we update the parameters theta. We don’t update when the prediction is correct (i.e., the same as the true/target value y).

Let’s see what the update rule is.


Perceptron Update Rule

The perceptron update rule is very similar to the Gradient Descent update rule. The following is the update rule –

theta := theta + alpha * (y - y_hat) * x   (source: Andrew Ng course)

where alpha is the learning rate, y is the true label, y_hat is the prediction, and x is the feature vector of the example (with the bias term X0 = 1).

Note that even though the Perceptron algorithm may look similar to logistic regression, it is actually a very different type of algorithm, since it is difficult to endow the perceptron’s predictions with meaningful probabilistic interpretations, or derive the perceptron as a maximum likelihood estimation algorithm.

(From Andrew Ng Course)
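Since y and y_hat are both either 0 or 1, the term (y - y_hat) can take only three values, which makes the rule easy to reason about:

# y = 1, y_hat = 0  ->  (y - y_hat) = +1  ->  theta += lr * x_i  (nudges the boundary toward classifying x_i as 1)
# y = 0, y_hat = 1  ->  (y - y_hat) = -1  ->  theta -= lr * x_i  (nudges it the other way)
# y == y_hat        ->  (y - y_hat) =  0  ->  no update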

Let’s code it. See comments (#).

def perceptron(X, y, lr, epochs):

    # X --> Inputs.
    # y --> labels/target.
    # lr --> learning rate.
    # epochs --> Number of iterations.

    # m --> number of training examples
    # n --> number of features
    m, n = X.shape

    # Initializing parameters (theta) to zeros.
    # +1 in n+1 for the bias term.
    theta = np.zeros((n+1,1))

    # Empty list to store how many examples were 
    # misclassified at every iteration.
    n_miss_list = []

    # Training.
    for epoch in range(epochs):

        # variable to store #misclassified.
        n_miss = 0

        # looping for every example.
        for idx, x_i in enumerate(X):

            # Inserting 1 for the bias term, X0 = 1.
            x_i = np.insert(x_i, 0, 1).reshape(-1,1)

            # Calculating prediction/hypothesis.
            y_hat = step_func(np.dot(x_i.T, theta))

            # Updating if the example is misclassified.
            if (np.squeeze(y_hat) - y[idx]) != 0:
                theta += lr*((y[idx] - y_hat)*x_i)

                # Incrementing by 1.
                n_miss += 1

        # Appending number of misclassified examples
        # at every iteration.
        n_miss_list.append(n_miss)

    return theta, n_miss_list
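As a quick convergence check, we can plot the returned n_miss_list; on linearly separable data, the misclassification count should eventually drop to zero:

theta, n_miss_list = perceptron(X, y, 0.5, 100)

plt.figure(figsize=(8, 5))
plt.plot(n_miss_list)
plt.xlabel("epoch")
plt.ylabel("# misclassified")
plt.title("Perceptron training convergence")
plt.show()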

Plotting Decision Boundary

We know that the model predicts –

y_hat = 1 when theta.X ≥ 0

y_hat = 0 when theta.X < 0

So, theta.X = 0 is going to be our decision boundary. For our two-feature data this expands to theta0 + theta1*x1 + theta2*x2 = 0 (with X0 = 1 for the bias), which rearranges to the line x2 = m*x1 + c, where m = -theta1/theta2 and c = -theta0/theta2.

The following code for plotting the decision boundary works only when X has exactly two features.

See comments (#).

def plot_decision_boundary(X, y, theta):

    # X --> Inputs
    # y --> labels/target
    # theta --> parameters

    # The boundary is theta0*X0 + theta1*X1 + theta2*X2 = 0 (with X0 = 1).
    # Solving for X2 gives the line x2 = m*x1 + c,
    # with m = -theta1/theta2 and c = -theta0/theta2.
    x1 = np.array([min(X[:,0]), max(X[:,0])])
    m = -theta[1]/theta[2]
    c = -theta[0]/theta[2]
    x2 = m*x1 + c

    # Plotting
    fig = plt.figure(figsize=(10,8))
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "r^")
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs")
    plt.xlabel("feature 1")
    plt.ylabel("feature 2")
    plt.title('Perceptron Algorithm')
    plt.plot(x1, x2, 'y-')
    plt.show()

Training and Plotting

theta, miss_l = perceptron(X, y, 0.5, 100)
plot_decision_boundary(X, y, theta)
Image by Author

We can see from the decision boundary plot above that we are able to separate the red and blue classes perfectly. Meaning, we get an accuracy of 100% on this linearly separable data.
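To check that number directly, here is a small sketch with a hypothetical predict helper (not part of the code above) that adds the bias column and thresholds theta.X:

def predict(X, theta):
    # Add the bias column of 1s, then apply the step rule to theta.X.
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])
    return (X_b.dot(theta) >= 0).astype(float).ravel()

accuracy = (predict(X, theta) == y).mean()
print(f"Training accuracy: {accuracy:.0%}")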


Limitations of Perceptron Algorithm

  1. It is only a linear classifier and can never separate data that are not linearly separable (see the sketch after this list).
  2. The algorithm can be used only for binary classification problems.
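A classic illustration of the first limitation is XOR: no straight line separates its two classes, so the perceptron never converges on it. A minimal sketch, reusing the perceptron function from above:

# XOR data: not linearly separable.
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

theta_xor, miss = perceptron(X_xor, y_xor, 0.5, 50)
print(miss[-5:])  # the per-epoch misclassification count never reaches 0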

Thanks for reading. For questions, comments, or concerns, talk to me in the responses section. More ML From Scratch is coming soon.

Check out the Machine Learning from scratch series –

