A Quick Introduction to the NumPy Library

Adi Bronshtein
Towards Data Science
7 min read · Apr 26, 2017

After my last blog post about Pandas, I thought it might be a good idea to take a step back and write a post about the NumPy library. NumPy (short for Numerical Python) is “the fundamental package for scientific computing with Python” and it is the library that Pandas, Matplotlib and Scikit-learn build on top of. You might think “what’s the point of using NumPy when I could be using these libraries?” but I think that NumPy is often underrated and, when used right, it can be quite a powerful tool for numerical operations in Python.

Installation and Getting Started

NumPy does not come with Python by default so it needs to be installed. As I recommended for the Pandas installation, the easiest way to get NumPy (along with a ton of other packages) is to install Anaconda. If you don’t want all these packages and would rather install just NumPy, you can download the version for your operating system from this page.

After you’ve downloaded and installed NumPy, you need to import it every time you want to use it in your Python IDE (Integrated Development Environment) like Jupyter Notebook or Spyder (both of them come with Anaconda by default). As a reminder, importing a library means loading it into memory so that it’s there for you to work with. To import NumPy you need to write the following code:

import numpy as np

And you’re ready to go! Usually you will add the second part (‘as np’) so you can use NumPy by writing ‘np.command’ instead of having to write ‘numpy.command’ every time you want to use it. It’s not a huge difference but hey, every keystroke counts! Remember, you will need to do it every time you start a new Jupyter Notebook, a Spyder file etc.

Working with NumPy

Creating NumPy Arrays, Loading and Saving Files

NumPy works with Python objects called multi-dimensional arrays. Arrays are basically collections of values, and they have one or more dimensions. The NumPy array data structure is also called ndarray, short for n-dimensional array. An array with one dimension is called a vector and an array with two dimensions is called a matrix. Datasets are usually built as matrices and it is much easier to open those with NumPy instead of working with lists of lists, for example.

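Turning a list into a NumPy array is pretty simple (here my_list stands for any Python list or list of lists; I’m avoiding the name list because it would shadow Python’s built-in):

numpy_array = np.array(my_list)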

And printing/displaying the array will look like this:

array([[  7.4  ,   0.7  ,   0.   , ...,   0.56 ,   9.4  ,   5.   ],
[ 7.8 , 0.88 , 0. , ..., 0.68 , 9.8 , 5. ],
[ 7.8 , 0.76 , 0.04 , ..., 0.65 , 9.8 , 5. ],
...,
[ 6.3 , 0.51 , 0.13 , ..., 0.75 , 11. , 6. ],
[ 5.9 , 0.645, 0.12 , ..., 0.71 , 10.2 , 5. ],
[ 6. , 0.31 , 0.47 , ..., 0.66 , 11. , 6. ]])
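
Here is a minimal sketch of the same idea that you can run as-is. The list of lists (rows) is made up for illustration and is not the dataset shown above:

```python
import numpy as np

# Hypothetical sample data: one inner list per row.
rows = [[7.4, 0.7, 0.0], [7.8, 0.88, 0.0], [7.8, 0.76, 0.04]]

# Convert the list of lists into a 2-dimensional NumPy array.
numpy_array = np.array(rows)

print(numpy_array.shape)  # (3, 3)
print(numpy_array.ndim)   # 2
```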

Another option is to open a CSV file using the np.genfromtxt() function:

numpy_array = np.genfromtxt("file.csv", delimiter=";", skip_header=1)

The arguments inside the parentheses are the file name (and path, if needed); the delimiter, set to ‘;’ here so the file is parsed correctly (you can use a different character, like ‘,’ for example); and skip_header, which when set to 1 loads the CSV into an array without the header row. If you leave skip_header out (the default is zero), the header row is read as data too, and its non-numeric entries show up as nan.
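
To try this without a file on disk, here is a small sketch that feeds np.genfromtxt() an in-memory text buffer. The CSV contents and column names are invented for the example, and io.StringIO is only standing in for a real "file.csv":

```python
import io
import numpy as np

# Hypothetical CSV contents: a header row followed by two data rows.
csv_text = "acidity;sugar;quality\n7.4;1.9;5\n7.8;2.6;5\n"

# Skip the header row so only the numeric rows are loaded.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=";", skip_header=1)
print(data.shape)  # (2, 3)

# Without skip_header, the header row is parsed as data and becomes nan.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=";")
print(with_header.shape)  # (3, 3)
```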

You can also save NumPy arrays to files by using np.savetxt(). For example, np.savetxt('file.txt',arr,delimiter=' ') will save to a text file and np.savetxt('file.csv',arr,delimiter=',') will save to a CSV file.
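
As a quick sanity check, here is a sketch that saves an array and loads it straight back. It writes to a temporary directory (my choice for the example, so nothing is left behind on your disk), and the array values are made up:

```python
import os
import tempfile
import numpy as np

arr = np.array([[1.5, 2.0], [3.25, 4.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.csv")
    # Save as comma-separated text, then read the same file back.
    np.savetxt(path, arr, delimiter=",")
    loaded = np.genfromtxt(path, delimiter=",")

print(np.array_equal(arr, loaded))  # True
```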

Another cool feature is the ability to create different arrays like random arrays: np.random.rand(3,4) will create a 3x4 array of random numbers between 0 and 1 while np.random.rand(7,6)*100 will create a 7x6 array of random numbers between 0 and 100; you can also define the size of the array in a different way: np.random.randint(10,size=(3,2)) creates an array the size of 3x2 with random integers between 0 and 9. Remember that the upper bound (10) is not included in the range when you use this syntax.
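
A short sketch of these three calls, with the value ranges checked rather than printed (the numbers are random, so your output will differ from run to run):

```python
import numpy as np

uniform = np.random.rand(3, 4)                 # floats in [0, 1)
scaled = np.random.rand(7, 6) * 100            # floats in [0, 100)
integers = np.random.randint(10, size=(3, 2))  # integers from 0 to 9

print(uniform.shape, scaled.shape, integers.shape)  # (3, 4) (7, 6) (3, 2)
print(integers.max() <= 9)                          # True: 10 is excluded
```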

It’s also possible to create an array of all zeros: np.zeros((4,3)) (4x3 array of all zeros; note that the shape is passed as a tuple) or of ones: np.ones(4) (a one-dimensional array of four ones; use np.ones((4,1)) if you want a 4x1 column). You can use the command np.full((3,2),8) to create a 3x2 array full of 8s. You can, of course, change each and every one of these numbers to get the array you want.
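
Here is a small sketch of the constant-array helpers. One detail worth seeing in action is that a multi-dimensional shape goes in as a single tuple, and that np.ones(4) and np.ones((4, 1)) are not the same shape:

```python
import numpy as np

zeros = np.zeros((4, 3))     # 4x3 array of 0.0
ones = np.ones(4)            # 1-dimensional array of four 1.0 values
column = np.ones((4, 1))     # 4x1 column of 1.0 values
eights = np.full((3, 2), 8)  # 3x2 array where every element is 8

print(zeros.shape)   # (4, 3)
print(ones.shape)    # (4,)
print(column.shape)  # (4, 1)
print(eights.shape)  # (3, 2)
```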

Working with and Inspecting Arrays

Now that you have your array loaded, you can check its size (number of elements) by typing array.size and its shape (the dimensions — rows and columns) by typing array.shape. You can use array.dtype to get the data types of the array (floats, integers etc — see more in the NumPy documentation) and if you need to convert the datatype you can use the array.astype(dtype) command. If you need to convert a NumPy array to a Python list, there is a command for that too: array.tolist().
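
For example, on a small made-up array (the values are arbitrary) these attributes and methods look roughly like this:

```python
import numpy as np

arr = np.array([[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]])

print(arr.size)   # 6 elements in total
print(arr.shape)  # (2, 3): 2 rows, 3 columns
print(arr.dtype)  # float64

# astype returns a new array; converting floats to int drops the decimals.
as_int = arr.astype(int)
print(as_int.tolist())  # [[1, 2, 3], [4, 5, 6]]
```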

Indexing and Slicing

Indexing and slicing NumPy arrays works very similarly to working with Python lists: array[5] will return the element at index 5, and array[2,5] will return the element at row 2, column 5 (what you would write as [2][5] with a list of lists). You can also select a range of elements by using a colon (:). array[0:5] will return the first five elements (index 0–4) and array[0:5,4] will return the first five elements of the column at index 4. You can use array[:2] to get elements from the beginning until index 2 (not including index 2) or array[2:] to return from index 2 until the end of the array. array[:,1] will return the elements at index 1 on all rows.
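
A small sketch to make the indexing concrete. I’m using np.arange() to build arrays whose values are easy to follow; the names vector and matrix are just for this example:

```python
import numpy as np

vector = np.arange(10)                # 0, 1, ..., 9
matrix = np.arange(24).reshape(4, 6)  # 4 rows, 6 columns, values 0..23

at_five = vector[5]         # 5
first_five = vector[0:5]    # 0, 1, 2, 3, 4
single = matrix[2, 5]       # row 2, column 5 -> 17
col_slice = matrix[0:3, 4]  # rows 0-2 of column 4 -> 4, 10, 16
head = matrix[:2]           # rows 0 and 1
tail = matrix[2:]           # rows 2 and 3
second_col = matrix[:, 1]   # column 1 of every row -> 1, 7, 13, 19

print(single, col_slice, second_col)
```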

Assigning values to a NumPy array is, again, very similar to doing so in Python lists: array[1]=4 will assign the value 4 to the element on index 1 (in a two-dimensional array, that sets every element of row 1 to 4); you can target a single element of a two-dimensional array: array[1,5]=10, or use slicing when assigning values: array[:,10]=10 will change the entire 11th column to the value 10.
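
Here is a sketch of those assignments on a small array of zeros (smaller than the one implied above, so I use column indexes that fit a 3x4 shape):

```python
import numpy as np

arr = np.zeros((3, 4))

arr[1] = 4      # on a 2-D array this sets the whole row at index 1
arr[1, 3] = 10  # a single element: row 1, column 3
arr[:, 2] = 7   # the entire column at index 2

print(arr.tolist())
# [[0.0, 0.0, 7.0, 0.0], [4.0, 4.0, 7.0, 10.0], [0.0, 0.0, 7.0, 0.0]]
```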

Sorting and Reshaping

array.sort() can be used to sort your NumPy array in place. You can pass different arguments inside the parentheses to define what you want to sort on (by using the argument ‘order=string/list of strings’, for example; see more examples in the documentation). array.sort(axis=0) will sort along a specific axis of the array (down each column for axis=0, across each row for axis=1). two_d_arr.flatten() will flatten a 2 dimensional array to a 1 dimensional array. array.T will transpose an array, meaning columns will become rows and vice versa. array.reshape(x,y) will return your array reshaped to the size you set with x and y (the total number of elements has to stay the same). array.resize((x,y)) will change the shape of the array itself to x and y and fill new values with zeros.
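
A sketch of these on a tiny made-up array. I copy the array before sorting because sort() works in place, and I pass refcheck=False to resize() only so the example also runs in interactive sessions, where the default reference check can raise an error:

```python
import numpy as np

original = np.array([[9, 1, 8], [3, 7, 2]])

by_row = original.copy()
by_row.sort()        # default: sort along the last axis (within each row)

by_col = original.copy()
by_col.sort(axis=0)  # sort down each column

flat = original.flatten()          # 1-D copy: 9, 1, 8, 3, 7, 2
transposed = original.T            # shape (3, 2)
reshaped = original.reshape(3, 2)  # same 6 values, new shape

grown = np.array([1, 2, 3])
grown.resize((2, 3), refcheck=False)  # in place; new slots are filled with 0

print(by_row.tolist())  # [[1, 8, 9], [2, 3, 7]]
print(by_col.tolist())  # [[3, 1, 2], [9, 7, 8]]
print(grown.tolist())   # [[1, 2, 3], [0, 0, 0]]
```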

Combining and Splitting

You can use np.concatenate((array1,array2),axis=0) to combine two NumPy arrays; this will add array 2 as rows to the end of array 1, while np.concatenate((array1,array2),axis=1) will add array 2 as columns to the end of array 1. np.split(array,2) will split the array into two equal sub-arrays and np.hsplit(array,5) will split the array horizontally (column-wise) into five equal sub-arrays; to split at the 5th index instead, pass the index in a list: np.hsplit(array,[5]).
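
A small sketch of combining and splitting, with made-up arrays. Note the difference between passing an integer to np.hsplit() (number of equal sections) and passing a list (the index to split at):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

stacked_rows = np.concatenate((a, b), axis=0)  # shape (4, 2)
stacked_cols = np.concatenate((a, b), axis=1)  # shape (2, 4)

halves = np.split(np.arange(6), 2)  # two arrays: 0, 1, 2 and 3, 4, 5

wide = np.arange(12).reshape(2, 6)
three_parts = np.hsplit(wide, 3)      # three arrays of shape (2, 2)
left, right = np.hsplit(wide, [5])    # split at column index 5

print(stacked_cols.tolist())      # [[1, 2, 5, 6], [3, 4, 7, 8]]
print(left.shape, right.shape)    # (2, 5) (2, 1)
```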

Adding and Removing Elements

There are, of course, commands to add and remove elements from NumPy arrays:

  • np.append(array,values) will return a new array with values appended to the end of array (the result is flattened unless you pass an axis)
  • np.insert(array, 3, values) will insert values into array before index 3
  • np.delete(array, 4, axis=0) will delete the row on index 4 of array
  • np.delete(array, 5, axis=1) will delete the column on index 5 of array
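
Here is a sketch of these commands on small made-up arrays (so the indexes are smaller than in the list above). All three functions return a new array and leave the original untouched:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([10, 20, 30, 40])

flat_append = np.append(arr, [7, 8, 9])           # no axis: result is 1-D
row_append = np.append(arr, [[7, 8, 9]], axis=0)  # adds a third row

inserted = np.insert(vector, 3, 99)  # 10, 20, 30, 99, 40

without_row = np.delete(arr, 1, axis=0)  # drops the row at index 1
without_col = np.delete(arr, 2, axis=1)  # drops the column at index 2

print(flat_append.shape, row_append.shape)  # (9,) (3, 3)
print(without_col.tolist())                 # [[1, 2], [4, 5]]
```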

Descriptive Statistics

You can use NumPy methods to get descriptive statistics on NumPy arrays:

  • np.mean(array,axis=0) will return the mean along a specific axis (0 or 1)
  • array.sum() will return the sum of the array
  • array.min() will return the minimum value of the array
  • array.max(axis=0) will return the maximum value along a specific axis
  • np.var(array) will return the variance of the array
  • np.std(array,axis=1) will return the standard deviation along a specific axis
  • np.corrcoef(array) will return the correlation coefficients of the array (each row is treated as a variable)
  • np.median(array) will return the median of the array elements
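
To see what the axis argument does, here is a sketch on a 2x3 array with made-up values. Correlation is computed with the function np.corrcoef(), which treats each row as one variable:

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

col_means = np.mean(arr, axis=0)  # one mean per column: 2.5, 3.5, 4.5
total = arr.sum()                 # 21.0
smallest = arr.min()              # 1.0
col_max = arr.max(axis=0)         # one max per column: 4, 5, 6
variance = np.var(arr)            # variance over all six elements
row_std = np.std(arr, axis=1)     # one standard deviation per row
corr = np.corrcoef(arr)           # 2x2 matrix, rows are the variables
middle = np.median(arr)           # 3.5

print(col_means, total, middle)
```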

Doing Math with NumPy

Any tutorial to NumPy would not be complete without the numerical and mathematical operations you can do with NumPy! Let’s go over them:

np.add(array,1) will add 1 to each element in the array and np.add(array1,array2) will add array 2 to array 1, element by element. The same is true of np.subtract(), np.multiply(), np.divide() and np.power(); all these commands work in exactly the same way as described above.
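
A quick sketch with two small made-up arrays. The usual Python operators (+, -, *, /, **) do the same element-by-element work, which I check on the last line:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

plus_one = np.add(a, 1)        # 2, 3, 4
summed = np.add(a, b)          # 11, 22, 33
difference = np.subtract(b, a) # 9, 18, 27
product = np.multiply(a, b)    # 10, 40, 90
quotient = np.divide(b, a)     # 10.0, 10.0, 10.0
squared = np.power(a, 2)       # 1, 4, 9

print(np.array_equal(summed, a + b))  # True
```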

You can also get NumPy to return different values from the array, like:

  • np.sqrt(array) will return the square root of each element in the array
  • np.sin(array) will return the sine of each element in the array
  • np.log(array) will return the natural log of each element in the array
  • np.abs(arr) will return the absolute value of each element in the array
  • np.array_equal(arr1,arr2) will return True if the arrays have the same elements and shape
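
Here is a sketch of the functions in this list, with inputs I picked so the results are easy to check by eye:

```python
import numpy as np

roots = np.sqrt(np.array([1.0, 4.0, 9.0]))     # 1, 2, 3
sines = np.sin(np.array([0.0, np.pi / 2]))     # 0, 1
logs = np.log(np.array([1.0, np.e]))           # 0, 1
magnitudes = np.abs(np.array([-1, 2, -3]))     # 1, 2, 3

# Same elements and same shape -> True; same elements, other shape -> False.
same = np.array_equal(np.array([1, 2]), np.array([1, 2]))
different_shape = np.array_equal(np.array([1, 2]), np.array([[1, 2]]))

print(same, different_shape)  # True False
```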

It is possible to round the values in an array: np.ceil(array) will round up to the nearest integer, np.floor(array) will round down to the nearest integer and np.round(array) will round to the nearest integer (values exactly halfway between two integers, like 2.5, go to the nearest even one).
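
A short sketch of the three rounding functions on made-up values, including a negative number and two halfway cases, since those are where the results tend to surprise people:

```python
import numpy as np

values = np.array([1.2, 2.5, 3.5, -1.7])

rounded_up = np.ceil(values)     # 2, 3, 4, -1
rounded_down = np.floor(values)  # 1, 2, 3, -2
nearest = np.round(values)       # 1, 2, 4, -2 (2.5 -> 2 and 3.5 -> 4)

print(rounded_up.tolist())    # [2.0, 3.0, 4.0, -1.0]
print(rounded_down.tolist())  # [1.0, 2.0, 3.0, -2.0]
print(nearest.tolist())       # [1.0, 2.0, 4.0, -2.0]
```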

This is just the tip of the iceberg when it comes to what you can do with NumPy! I do hope that this blog post did help you see the possibilities and how powerful NumPy can be when working on data with Python. If you liked this tutorial, feel free to check out my Pandas tutorial (shamelessly promoting myself :-P). As always, thank you for reading! I would appreciate any comments, notes, corrections, questions or suggestions — if there’s anything you’d like me to write about, please don’t hesitate to let me know. See you on the next blog post!
