Let’s talk about NumPy — for Data Science Beginners

Published in Towards Data Science, Apr 15, 2018 (7 min read)

NumPy (Numerical Python) is a linear algebra library for Python. It is a very important library: almost every data science or machine learning package in Python, such as SciPy (Scientific Python), Matplotlib (plotting library), and Scikit-learn, depends on it to some extent.

NumPy is very useful for performing mathematical and logical operations on arrays. It provides an abundance of useful features for operations on n-dimensional arrays and matrices in Python.

This course covers the basic things to know about NumPy as a beginner in data science. These include how to create NumPy arrays, use broadcasting, access values, and manipulate arrays. More importantly, you will learn NumPy’s benefits over Python lists: arrays are more compact, faster to read from and write to, more convenient, and more efficient.
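To make the convenience claim concrete, here is a minimal sketch comparing a plain list with an array. The `prices` data is hypothetical and only there for illustration.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]  # hypothetical sample data

# With a plain list, element-wise math needs an explicit loop or comprehension.
doubled_list = [p * 2 for p in prices]

# With a NumPy array, the same operation is a single vectorized expression.
doubled_arr = np.array(prices) * 2

print(doubled_list)  # [20.0, 40.0, 60.0]
print(doubled_arr)   # [20. 40. 60.]
```

Note that `prices * 2` on the plain list would repeat the list rather than double its values, which is one reason arrays are more convenient for numerical work.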

In this course, we will be using the Jupyter notebook as our editor.

Let’s Go!

Installing NumPy

If you have Anaconda, you can simply install NumPy from your terminal or command prompt using:

conda install numpy

If you do not have Anaconda on your computer, install NumPy from your terminal using:

pip install numpy

Once you have NumPy installed, launch your Jupyter notebook and get started. Let’s begin with NumPy arrays.

NumPy Arrays

A NumPy array is simply a grid that contains values of the same type. NumPy arrays come in two forms: vectors and matrices. Vectors are strictly one-dimensional (1-d) arrays, while matrices are multidimensional. In some cases, a matrix can still have only one row or one column.

Let’s start by importing NumPy in your Jupyter notebook.

import numpy as np

Creating numpy arrays from python lists

Say we have a Python list:

my_list = [1, 2, 3, 4, 5]

We can simply create a NumPy array called my_numpy_list and display the result:

my_numpy_list = np.array(my_list)
my_numpy_list  #This line shows the resulting array

What we have just done is cast a Python list into a one-dimensional array. To get a 2-dimensional array, we have to cast a list of lists, as shown below.

second_list = [[1,2,3], [5,4,1], [3,6,7]]
new_2d_arr = np.array(second_list)
new_2d_arr  #This line shows the resulting array

We have successfully created a 2-d array that has 3 rows and 3 columns.
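If you want to confirm the dimensions yourself, the sketch below inspects the array with the `ndim`, `shape`, and `dtype` attributes. The `mixed` array is an extra illustration, not part of the original example; it shows what "values of the same type" means in practice.

```python
import numpy as np

second_list = [[1, 2, 3], [5, 4, 1], [3, 6, 7]]
new_2d_arr = np.array(second_list)

print(new_2d_arr.ndim)    # 2 (number of dimensions)
print(new_2d_arr.shape)   # (3, 3) (rows, columns)

# Mixing integers and floats: NumPy upcasts everything to a single type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64
```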

Creating NumPy array using arange() built-in function.

Similar to the python built-in range() function, we will create a NumPy array using arange().

my_list = np.arange(10)
#OR
my_list = np.arange(0,10)

This generates 10 values, from 0 up to but not including 10.

It is important to note that the arange() function can also take 3 arguments. The third argument signifies the step size of the operation. For example, to get all even numbers from 0 to 10, simply add a step size of 2 like below.

my_list = np.arange(0,11,2)
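As a quick check of both forms, the following sketch prints the arrays. The variable names `first_ten` and `evens` are chosen here for illustration.

```python
import numpy as np

first_ten = np.arange(10)        # stop value 10 is excluded
evens = np.arange(0, 11, 2)      # start, stop, step

print(first_ten)  # [0 1 2 3 4 5 6 7 8 9]
print(evens)      # [ 0  2  4  6  8 10]
```

The stop value is 11 rather than 10 in the second call because the stop itself is never included.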

We can also generate a one-dimensional array of seven zeros.

my_zeros = np.zeros(7)

We can also generate a one-dimensional array of five ones.

my_ones = np.ones(5)

Similarly, we could generate a two-dimensional array of zeros having 3 rows and 5 columns

two_d = np.zeros((3,5))

Creating NumPy array using linspace() built-in function.

The linspace() function returns numbers evenly spaced over a specified interval. Say we want 15 evenly spaced points from 1 to 3; we can use:

lin_arr = np.linspace(1, 3, 15)

This gives us a one dimensional vector.

Unlike the arange() function, which takes the third argument as the step size, linspace() takes the third argument as the number of data points to be created.
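The difference is easiest to see side by side. This small sketch uses the interval 0 to 1, chosen only to keep the output short.

```python
import numpy as np

by_count = np.linspace(0, 1, 5)    # 5 points, endpoint included by default
by_step = np.arange(0, 1, 0.25)    # step of 0.25, stop value excluded

print(by_count)  # [0.   0.25 0.5  0.75 1.  ]
print(by_step)   # [0.   0.25 0.5  0.75]
```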

Creating an identity matrix in NumPy

Identity matrices are very useful when dealing with linear algebra. An identity matrix is a two-dimensional square matrix, meaning the number of rows equals the number of columns. Its defining feature is that the diagonal entries are 1 and everything else is 0. The eye() function is usually called with a single argument, the number of rows (and columns). Here’s how to create one.

my_matrx = np.eye(6)    #6 is the number of columns/rows you want
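One reason identity matrices matter: multiplying any matrix by the identity leaves it unchanged. Here is a minimal sketch, where `m` is a hypothetical sample matrix and `@` is Python’s matrix multiplication operator.

```python
import numpy as np

identity = np.eye(3)
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical sample matrix

product = m @ identity             # matrix multiplication, not element-wise
print(np.array_equal(product, m))  # True
```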

Generating an array of random numbers in NumPy

We can generate an array of random numbers using rand(), randn() or randint() functions.

Using random.rand(), we can generate an array of random numbers, of the shape we pass to it, drawn from a uniform distribution over 0 to 1 (0 inclusive, 1 exclusive).

For example, say we want a one-dimensional array of 4 values that are uniformly distributed from 0 to 1, we can do this:

my_rand = np.random.rand(4)

And if we want a two-dimensional array of 5 rows and 4 columns:

my_rand = np.random.rand(5, 4)
my_rand

Using randn(), we can generate random samples from the standard normal (Gaussian) distribution, which is centered around 0. For example, let’s generate 7 random numbers:

my_randn = np.random.randn(7)
my_randn

When you plot a histogram of a large number of such samples, the result approximates a normal distribution curve.

Similarly, to generate a two-dimensional array of 3 rows and 5 columns, do this:

np.random.randn(3,5)
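Without plotting anything, you can still sanity-check that randn() samples behave like a standard normal distribution. The sketch below assumes a large sample of 10,000 values and an arbitrary seed so that the run is repeatable; neither number comes from the tutorial itself.

```python
import numpy as np

np.random.seed(42)                 # arbitrary seed, only for repeatability
samples = np.random.randn(10000)   # many samples, so the statistics settle

sample_mean = samples.mean()       # should be close to 0
sample_std = samples.std()         # should be close to 1

print(sample_mean)
print(sample_std)
```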

Lastly, we can use the randint() function to generate an array of integers. The randint() function can take up to 3 arguments: the low (inclusive), the high (exclusive), and the size of the array.

np.random.randint(20) #generates a random integer exclusive of 20
np.random.randint(2, 20) #generates a random integer including 2 but excluding 20
np.random.randint(2, 20, 7) #generates 7 random integers including 2 but excluding 20
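The inclusive/exclusive bounds are a common source of off-by-one mistakes. As an illustration, this sketch simulates dice rolls; the dice scenario and the seed are assumptions made for the example.

```python
import numpy as np

np.random.seed(0)                      # arbitrary seed, only for repeatability
rolls = np.random.randint(1, 7, 1000)  # simulated dice rolls: values 1 to 6

print(rolls.min() >= 1)  # True, because low is inclusive
print(rolls.max() <= 6)  # True, because high (7) is exclusive
```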

Converting one-dimensional array to two-dimensional

First, we generate a 1-d array of 25 random numbers

arr = np.random.rand(25)

Then convert it to a 2-d array using the reshape() function

arr.reshape(5,5)

Note: reshape() can convert to any number of rows and columns, as long as their product equals the number of elements. In the example above, arr contains 25 elements, so it can be reshaped to a 5x5 matrix (or, for example, 1x25 or 25x1), but not to 4x6. Also note that reshape() returns a new array and leaves arr itself unchanged.
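The rule about matching element counts can be tried out directly. This sketch uses np.arange(25) instead of random values so the behavior is easy to follow; the `failed` flag is a helper introduced only for this example.

```python
import numpy as np

arr = np.arange(25)

print(arr.reshape(5, 5).shape)    # (5, 5)
print(arr.reshape(1, 25).shape)   # (1, 25)
print(arr.reshape(-1, 5).shape)   # (5, 5), -1 lets NumPy infer that dimension

failed = False
try:
    arr.reshape(4, 6)             # 4 * 6 = 24, which is not 25
except ValueError as err:
    failed = True
    print("reshape failed:", err)

print(arr.shape)                  # (25,), the original array is unchanged
```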

Locating the maximum and minimum values of a NumPy Array

Using max() and min(), we can get the maximum or minimum values in an array.

arr_2 = np.random.randint(0, 20, 10)
arr_2.max() #This gives the highest value in the array
arr_2.min() #This gives the lowest value in the array

Using the argmax() and argmin() functions, we can locate the index of the maximum or minimum values in an array.

arr_2.argmax() #This shows the index of the highest value in the array 
arr_2.argmin() #This shows the index of the lowest value in the array

Say you have a large array and want to figure out its shape, for example whether it is one-dimensional or two-dimensional. Simply use the shape attribute.

arr.shape
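Because arr_2 is random, its maximum and minimum change on every run. For a predictable illustration, here is the same set of calls on a small fixed array; `scores` is a hypothetical name and its values are made up.

```python
import numpy as np

scores = np.array([7, 19, 3, 12])  # hypothetical fixed values

print(scores.max())      # 19
print(scores.argmax())   # 1 (index of the value 19)
print(scores.min())      # 3
print(scores.argmin())   # 2 (index of the value 3)
print(scores.shape)      # (4,) a one-dimensional array with 4 elements
```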

Indexing/Selecting elements or groups of elements from a NumPy array

Indexing NumPy arrays is similar to indexing Python lists. You simply pass in the index you want.

my_array = np.arange(0,11)
my_array[8]  #This gives us the value of element at index 8

To get a range of values in an array, we will use the slice notation ‘:’ just like in Python.

my_array[2:6] #This returns everything from index 2 to 6(exclusive)
my_array[:6] #This returns everything from index 0 to 6(exclusive)
my_array[5:] #This returns everything from index 5 to the end of the array.

Similarly, we can select elements in a 2-d array using either the double bracket [][] notation or the single bracket [,] notation.

Using the double bracket notation, we will grab the value ‘60’ from the 2-d array below:

two_d_arr = np.array([[10,20,30], [40,50,60], [70,80,90]])
two_d_arr[1][2] #The value 60 is in row index 1, column index 2

Using the single bracket notation, we will grab the value ‘20’ from the array above:

two_d_arr[0,1]

We can also go further to grab subsections of a 2-d array using the slice notation. Let’s grab some elements in some corners of the array:

two_d_arr[:1, :2]           # This returns [[10, 20]]
two_d_arr[:2, 1:]           # This returns ([[20, 30], [50, 60]])
two_d_arr[:2, :2]           #This returns ([[10, 20], [40, 50]])

We can also index an entire row or column. To grab any row, simply use its index number like this:

two_d_arr[0]    #This grabs row 0 of the array ([10, 20, 30])
two_d_arr[:2] #This grabs everything before row 2 ([[10, 20, 30], [40, 50, 60]])
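Grabbing a column works the same way, with a bare ‘:’ in the row position to mean "all rows". A short sketch using the same two_d_arr:

```python
import numpy as np

two_d_arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

print(two_d_arr[:, 1])              # [20 50 80] (column index 1)
print(two_d_arr[:, -1])             # [30 60 90] (last column)
print(two_d_arr[1:, :2].tolist())   # [[40, 50], [70, 80]] (bottom-left corner)
```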

We can also perform conditional and logical selections on arrays using & (AND), | (OR), <, > and == operators to compare the values in the array with the given value. Here’s how:

new_arr = np.arange(5,15)
new_arr > 10 #This returns True where the elements are greater than 10
#[False, False, False, False, False, False,  True,  True,  True, True]

Now we can print out the actual elements for which the above condition was True, using:

bool_arr = new_arr > 10
new_arr[bool_arr]  #This returns elements greater than 10 [11, 12, 13, 14]
new_arr[new_arr>10] #A shorter way to do what we have just done

Using a combination of conditional and Logical & (AND) operators, we can get elements that are greater than 6 but less than 10.

new_arr[(new_arr>6) & (new_arr<10)]

Our expected result is: ([7, 8, 9])
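The other operators mentioned above work the same way. Here is a brief sketch with | (OR) and ==, plus one extra condition using the modulo operator that is not in the original text. Each comparison needs its own parentheses, because & and | bind more tightly than < and >.

```python
import numpy as np

new_arr = np.arange(5, 15)

print(new_arr[(new_arr < 7) | (new_arr > 12)])  # [ 5  6 13 14]
print(new_arr[new_arr == 10])                   # [10]
print(new_arr[new_arr % 2 == 0])                # [ 6  8 10 12 14]
```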

Broadcasting

Broadcasting is a quick way to change the values of a NumPy array.

my_array[0:3] = 50
#Result is: [50, 50, 50, 3, 4,  5,  6,  7,  8,  9, 10]

In this example, we changed the values of the elements at indices 0 to 2 (the slice 0:3 excludes index 3) from their initial values to 50.

Performing arithmetic operations on NumPy Arrays

arr = np.arange(1,11)
arr * arr              #Multiplies each element by itself
arr - arr              #Subtracts each element from itself
arr + arr              #Adds each element to itself
arr / arr              #Divides each element by itself

We can also perform scalar operations on an array. NumPy makes it possible through broadcasting.

arr + 50              #This adds 50 to every element in that array
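Broadcasting is not limited to scalars. As a rough sketch, a one-dimensional array can also be stretched across the rows of a two-dimensional array, provided the trailing dimensions match. The names `grid` and `row` are made up for this example.

```python
import numpy as np

grid = np.zeros((2, 3))       # 2 rows, 3 columns
row = np.array([1, 2, 3])     # 3 values, matching the 3 columns

result = grid + row           # row is added to every row of grid
print(result.tolist())        # [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
```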

NumPy also lets you apply universal functions such as square roots, exponentials, and trigonometric functions to arrays, along with aggregate functions such as sum and standard deviation.

np.sqrt(arr)     #Returns the square root of each element
np.exp(arr)     #Returns the exponential of each element
np.sin(arr)     #Returns the sine of each element
np.cos(arr)     #Returns the cosine of each element
np.log(arr)     #Returns the natural logarithm of each element
np.sum(arr)     #Returns the sum total of elements in the array
np.std(arr)     #Returns the standard deviation of the array
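To see the difference between element-wise functions and aggregates, here is a small sketch on fixed inputs. The sample values are chosen so the results come out as round numbers.

```python
import numpy as np

values = np.array([1, 4, 9])
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical sample data

print(np.sqrt(values))   # [1. 2. 3.] (one result per element)
print(np.sum(data))      # 40 (a single number for the whole array)
print(np.std(data))      # 2.0 (population standard deviation by default)
```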

We can also grab the sum of columns or rows in a 2-d array:

mat = np.arange(1,26).reshape(5,5)
mat.sum()         #Returns the sum of all the values in mat
mat.sum(axis=0)   #Returns the sum of each column in mat
mat.sum(axis=1)   #Returns the sum of each row in mat
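The axis argument is easy to mix up, so here is the same idea on a smaller 2x3 matrix where the sums can be checked by hand. The matrix is chosen for illustration only.

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6]])

print(mat.sum())         # 21
print(mat.sum(axis=0))   # [5 7 9] (one sum per column: 1+4, 2+5, 3+6)
print(mat.sum(axis=1))   # [ 6 15] (one sum per row: 1+2+3, 4+5+6)
```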

Congratulations, we have come to the end of the NumPy tutorial!

If you completed this lesson, then you have covered a lot of ground. Keep practicing, so that your newfound knowledge stays fresh.

Got questions, got stuck or just want to say hi? Kindly use the comment box. If this tutorial was helpful to you in some way, show me some 👏.

Written by Ehi Aigiomawu