27 NumPy Operations for beginners

Data manipulation for beginners.

Parijat Bhatt
Towards Data Science


Source: scipy-lectures.org

Introduction

In my previous article on 21 Pandas operations for absolute beginners, I discussed a few important operations that can help someone new get started with data analysis. This article is meant to serve a similar purpose for NumPy. As a brief intro, NumPy is a powerful library for numerical computing on arrays, covering everything from finding the mean of an array to fast Fourier transforms and signal analysis. In that sense, it’s very similar to MATLAB.

You will need to import NumPy as ‘np’ (import numpy as np) and then use it to perform the operations.

Operations:

1. Converting a list to an n-dimensional NumPy array

numpy_array = np.array(list_to_convert)

2. Use of np.newaxis and np.reshape

np.newaxis is used to create new dimensions of size 1. For example:

a = [1,2,3,4,5] ####a is a list
a_numpy = np.array(a)

If you print a_numpy.shape, you get (5,). In order to make this a column vector or a row vector, one could do

col_vector = a_numpy[:,np.newaxis] ####shape is (5,1) now
row_vector = a_numpy[np.newaxis,:] ####shape is (1,5) now

Similarly, np.reshape can be used to reshape any array, as long as the total number of elements stays the same. For example:

a = np.arange(0,15) ####array of numbers from 0 to 14
b = a.reshape(3,5)
b would become:
[[0,1,2,3,4],
[5,6,7,8,9],
[10,11,12,13,14]]
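To check the shapes yourself, here is a small sketch of both tricks. The variable names are just my own choices for illustration.

```python
import numpy as np

a_numpy = np.array([1, 2, 3, 4, 5])

# A new axis at the end gives 5 rows and 1 column.
col_vector = a_numpy[:, np.newaxis]
# A new axis at the front gives 1 row and 5 columns.
row_vector = a_numpy[np.newaxis, :]

# np.arange(15) holds 0..14, which fits exactly into 3 rows of 5.
b = np.arange(15).reshape(3, 5)

print(col_vector.shape)  # (5, 1)
print(row_vector.shape)  # (1, 5)
print(b.shape)           # (3, 5)
```

Reshaping only works when the element count matches, so 15 elements cannot be reshaped to (4,5).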

3. Converting any data type to NumPy array

Use np.asarray. For example:

a = [(1,2), [3,4,(5,6)], (6,7,8)]
b = np.asarray(a, dtype=object) ####recent NumPy versions need dtype=object for ragged input like this
b:
array([(1, 2), list([3, 4, (5, 6)]), (6, 7, 8)], dtype=object)

4. Get an n-dimensional array of zeros.

a = np.zeros(shape,dtype=type_of_zeros)
The type of the zeros can be int or float, as required. For example:
a = np.zeros((3,4), dtype = np.float16)

5. Get an n-dimensional array of ones.

Similar to np.zeros:

a = np.ones((3,4), dtype=np.int32)

6. np.full and np.empty

np.full is used to get an array filled with one specific value, while np.empty creates an array without initializing its entries, so it holds whatever values happen to be in memory. For example:

1. np.full(shape_as_tuple,value_to_fill,dtype=type_you_want)
a = np.full((2,3),1,dtype=np.float16)
a would be:
array([[1., 1., 1.],
[1., 1., 1.]], dtype=float16)
2. np.empty(shape_as_tuple,dtype=type_you_want)
a = np.empty((2,2),dtype=np.int16)
a would be something like:
array([[25824, 25701],
[ 2606, 8224]], dtype=int16)
The integers here are arbitrary leftovers from memory and will differ from run to run, so fill the array before reading from it.
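A minimal sketch of how I would use the two together: np.full when the fill value is known up front, and np.empty as a scratch buffer that gets overwritten before it is read. The names filled and scratch are mine.

```python
import numpy as np

# Every entry is set to the requested value.
filled = np.full((2, 3), 1, dtype=np.float16)

# np.empty only allocates memory; the contents are not meaningful yet.
scratch = np.empty((2, 2), dtype=np.int16)

# Overwrite every entry before using the array.
scratch[:] = 0

print(filled)
print(scratch)
```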

7. Getting an array of evenly spaced values with np.arange and np.linspace

Both can be used to create an array with evenly spaced elements.

linspace:

np.linspace(start,stop,num=50,endpoint=bool_value,retstep=bool_value)
endpoint specifies whether the stop value should be included, and retstep tells whether you would like the step size to be returned as well. 'num' is the number of values to be returned, where 50 is the default. For example:
np.linspace(1,2,num=5,endpoint=False,retstep=True)
This means return 5 values starting at 1 and ending before 2, along with the step size. The output would be:
(array([1. , 1.2, 1.4, 1.6, 1.8]), 0.2)
##### Tuple of numpy array and step-size

arange:

np.arange(start=where_to_start,stop=where_to_stop,step=step_size)

If only one number is provided as an argument, it’s treated as the stop, and if two are provided, they are taken to be the start and the stop. The stop value itself is not included. Notice the spelling here: it is arange with a single ‘r’, not arrange.
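Here is a short sketch that puts the two side by side. The variable names are only illustrative.

```python
import numpy as np

# linspace: you choose how many values you want.
values, step = np.linspace(1, 2, num=5, endpoint=False, retstep=True)
with_stop = np.linspace(1, 2, num=5)  # endpoint=True is the default

# arange: you choose the step size; the stop value is excluded.
one_arg = np.arange(5)         # stop only
two_args = np.arange(2, 5)     # start and stop
stepped = np.arange(0, 10, 3)  # start, stop and step

print(values, step)
print(with_stop)
print(one_arg, two_args, stepped)
```

As a rule of thumb, I reach for linspace when I know the number of points and for arange when I know the step.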

8. Finding the shape of the NumPy array

array.shape

9. Knowing the dimensions of the NumPy array

x = np.array([1,2,3])
x.ndim will produce 1

10. Finding the number of elements in the NumPy array

x = np.ones((3,2,4),dtype=np.int16)
x.size will produce 24

11. Get the memory space occupied by an n-dimensional array

x.nbytes
For the array x from the previous operation, the output will be 24 * memory occupied by a 16-bit integer = 24 * 2 = 48 bytes.

12. Finding the data type of elements in the NumPy array

x = np.ones((2,3), dtype=np.int16)
x.dtype will produce
dtype('int16')
This works best when all elements in the array are of one type; otherwise typecasting happens and the result may be difficult to interpret.
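Operations 8 to 12 are all simple attribute lookups, so they can be checked together in one go. This is just a sketch on a single example array; itemsize (bytes per element) is an extra attribute I use here to explain where nbytes comes from.

```python
import numpy as np

x = np.ones((3, 2, 4), dtype=np.int16)

print(x.shape)     # (3, 2, 4)
print(x.ndim)      # 3
print(x.size)      # 24
print(x.itemsize)  # 2 bytes per int16 element
print(x.nbytes)    # 48, which is size * itemsize
print(x.dtype)     # int16
```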

13. How to create a copy of NumPy array

Use np.copy

y = np.array([[1,3],[5,6]])
x = np.copy(y)
If,
x[0][0] = 1000
Then,
x is
1000 3
5 6
y is
1 3
5 6
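To see why np.copy matters, it helps to compare it with plain assignment, which only gives the same array a second name. A small sketch, where alias is a name I made up:

```python
import numpy as np

y = np.array([[1, 3], [5, 6]])

alias = y       # no copy: both names point to the same data
x = np.copy(y)  # independent copy

x[0, 0] = 1000   # changes only the copy
alias[1, 1] = -1  # changes y as well

print(x)
print(y)
```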

14. Get transpose of an n-d array

Use array_name.T

x = np.array([[1,2],[3,4]])
x
1 2
3 4
x.T is
1 3
2 4

15. Flatten an n-d array to get a one-dimensional array

Use np.reshape and np.ravel:

np.reshape: This is a really neat trick. While reshaping, if you provide -1 as one of the dimensions, it’s inferred from the number of elements. For example, if an array of shape (1,3,4) is reshaped to (-1,2,2), then the first dimension’s length is calculated to be 3, since 12 elements divided by 2*2 is 3. So,

If x is:
1 2 3
4 5 9
Then x.reshape(-1) produces:
array([1, 2, 3, 4, 5, 9])

np.ravel

x = np.array([[1, 2, 3], [4, 5, 6]])
x.ravel() produces
array([1, 2, 3, 4, 5, 6])
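Both approaches, plus the -1 inference on the (1,3,4) example, can be sketched like this. The variable names are mine.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

flat_reshape = x.reshape(-1)  # -1 means "work out this length for me"
flat_ravel = x.ravel()

# 12 elements reshaped to (-1, 2, 2): the first dimension is inferred as 3.
z = np.ones((1, 3, 4)).reshape(-1, 2, 2)

print(flat_reshape)  # [1 2 3 4 5 6]
print(flat_ravel)    # [1 2 3 4 5 6]
print(z.shape)       # (3, 2, 2)
```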

16. Change axes of an n-d array or swap dimensions

Use np.moveaxis and np.swapaxes.

x = np.ones((3,4,5))
np.moveaxis(x,axes_to_move_as_list, destination_axes_as_list)
For example:
np.moveaxis(x,[1,2],[0,-2])
This means you want to move axis 1 to position 0 and axis 2 to the second-to-last position. So, the new shape would be (4,5,3).

The operation is not in place, so don’t forget to store the result in another variable.

np.swapaxes:

x = np.array([[1,2],[3,4]])
x.shape is (2,2) and x is
1 2
3 4
np.swapaxes(x,0,1) will produce
1 3
2 4
If x = np.ones((3,4,5)), and
y = np.swapaxes(x,0,2)
y.shape will be
(5,4,3)
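A quick sketch to confirm the shapes for both functions. Note that both are called as np functions with the array as the first argument; moved and swapped are just names I picked.

```python
import numpy as np

x = np.ones((3, 4, 5))

# Axis 1 goes to position 0, axis 2 goes to the second-to-last position.
moved = np.moveaxis(x, [1, 2], [0, -2])

# Axes 0 and 2 trade places; axis 1 stays where it is.
swapped = np.swapaxes(x, 0, 2)

print(moved.shape)    # (4, 5, 3)
print(swapped.shape)  # (5, 4, 3)
print(x.shape)        # (3, 4, 5), the original is untouched
```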

17. Convert NumPy array to list

x = np.array([[3,4,5,9],[2,6,8,0]])
y = x.tolist()
y will be
[[3, 4, 5, 9], [2, 6, 8, 0]]

The NumPy docs mention that using list(x) will also work if x is a 1-d array.

18. Change the data type of elements in the NumPy array.

Use ndarray.astype

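x = np.array([0,1,2.0,3.0,4.2],dtype=np.float32)
x.astype(np.int16) will produce
array([0, 1, 2, 3, 4], dtype=int16)
x.astype(bool) will produce
array([False, True, True, True, True])
Note that casting to an integer type drops the fractional part (4.2 becomes 4), and the built-in bool is used here because the np.bool alias is not available in every NumPy version.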
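A runnable version of the same idea, as a sketch. I pass the built-in bool as the target type since that works across NumPy versions, and astype returns a new array, so x keeps its original type.

```python
import numpy as np

x = np.array([0, 1, 2.0, 3.0, 4.2], dtype=np.float32)

as_int = x.astype(np.int16)  # the fractional part of 4.2 is dropped
as_bool = x.astype(bool)     # zero becomes False, everything else True

print(as_int)   # [0 1 2 3 4]
print(as_bool)  # [False  True  True  True  True]
print(x.dtype)  # float32
```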

19. Get indices of non-zero elements

Use n-dim_array.nonzero()

x = np.array([0,1,2.0,3.0,4.2],dtype=np.float32)
x.nonzero() will produce
(array([1, 2, 3, 4]),)
It's important to note that x has shape (5,), so only one array of indices is returned. If x were, say,
x = np.array([[0,1],[3,5]])
x.nonzero() would produce
(array([0, 1, 1]), array([1, 0, 1]))
So, the indices are actually (0,1), (1,0), (1,1).
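The tuple returned by nonzero() has one index array per dimension, so pairing them up gives the (row, col) positions. A small sketch of that pairing, with names of my own choosing:

```python
import numpy as np

x = np.array([[0, 1], [3, 5]])

rows, cols = x.nonzero()

# Pair the row and column indices to get one (row, col) tuple per element.
pairs = list(zip(rows.tolist(), cols.tolist()))

# The same index arrays can be used to pull out the non-zero values.
values = x[rows, cols]

print(pairs)   # [(0, 1), (1, 0), (1, 1)]
print(values)  # [1 3 5]
```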

20. Sort NumPy array

Use np.ndarray.sort(axis=axis_you_want_to_sort_by). It sorts the array in place.

x = np.array([[4,3],[3,2]])
x is
4 3
3 2
x.sort(axis=1) #sort each row
3 4
2 3
Starting again from the original x,
x.sort(axis=0) #sort each col
3 2
4 3
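One thing to keep in mind is that the sort method changes the array in place and returns None. If you want to keep the original, np.sort returns a sorted copy instead. A sketch showing both, with illustrative names:

```python
import numpy as np

x = np.array([[4, 3], [3, 2]])

# np.sort leaves x alone and returns sorted copies.
rows_sorted = np.sort(x, axis=1)  # sort within each row
cols_sorted = np.sort(x, axis=0)  # sort within each column

# The method version modifies the array itself and returns None.
in_place = x.copy()
result = in_place.sort(axis=1)

print(rows_sorted)  # [[3 4] [2 3]]
print(cols_sorted)  # [[3 2] [4 3]]
print(result)       # None
```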

21. Compare NumPy arrays to values

Comparing will produce NumPy n-dimensional arrays of boolean type. For example:

x = np.array([[0,1],[2,3]])
x==1 will produce
array([[False, True],
[False, False]])

If you want to count the number of ones in x, you could just do

(x==1).astype(np.int16).sum()

It should output 1
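The cast to int16 is optional, because True counts as 1 when a boolean array is summed. A minimal sketch, which also uses np.count_nonzero as an alternative way to count:

```python
import numpy as np

x = np.array([[0, 1], [2, 3]])

mask = x == 1  # boolean array with the same shape as x

count_by_sum = int(mask.sum())               # True is counted as 1
count_by_func = int(np.count_nonzero(mask))  # counts the True entries

print(mask)
print(count_by_sum, count_by_func)  # 1 1
```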

22. Multiply two NumPy matrices

Use numpy.matmul to take matrix product of 2-D matrices:

a = np.eye(2) #identity matrix of size 2
a
1 0
0 1
b = np.array([[1,2],[3,4]])
b
1 2
3 4
np.matmul(a,b) will give
1 2
3 4

If we supply a 1-D array, then the output can be very different, as the array is treated as a row or column vector and broadcasting may be used. We discuss broadcasting below. Also, there is another function called np.multiply which performs element-wise multiplication. For the previous two matrices, the output of np.multiply(a,b) would be:

1 0
0 4
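The difference between the two is easy to see side by side. In this sketch I also use the @ and * operators, which are the shorthand forms of np.matmul and np.multiply.

```python
import numpy as np

a = np.eye(2)  # identity matrix of size 2
b = np.array([[1, 2], [3, 4]])

matrix_product = np.matmul(a, b)  # same as a @ b
elementwise = np.multiply(a, b)   # same as a * b

print(matrix_product)  # [[1. 2.] [3. 4.]]
print(elementwise)     # [[1. 0.] [0. 4.]]
```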

23. Dot product of two arrays

np.dot(matrix1, matrix2)

a = np.array([[1,2,3],[4,8,16]])
a:
1 2 3
4 8 16
b = np.array([5,6,11]).reshape(-1,1)
b:
5
6
11
np.dot(a,b) produces
50
244
Just like the product of any matrix with a column vector would.

The dot product of a row vector with a column vector will produce:

if a is array([[1, 2, 3, 4]])
and b is:

array([[4],
[5],
[6],
[7]])
np.dot(a,b) gives:
array([[60]])
a's shape was (1,4) and b's shape was (4,1), so the result has shape (1,1).
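Both dot products can be checked by hand, so here is a sketch with the arithmetic written out in the comments. The variable names are mine.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 8, 16]])
b = np.array([5, 6, 11]).reshape(-1, 1)

# Row 0: 1*5 + 2*6 + 3*11 = 50
# Row 1: 4*5 + 8*6 + 16*11 = 244
matrix_times_column = np.dot(a, b)

row = np.array([[1, 2, 3, 4]])
col = np.array([[4], [5], [6], [7]])

# 1*4 + 2*5 + 3*6 + 4*7 = 60
row_times_col = np.dot(row, col)

print(matrix_times_column)  # [[ 50] [244]]
print(row_times_col)        # [[60]]
```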

24. Get cross-product of two numpy vectors

Recall the vector cross product from physics; for example, torque is the cross product of the position vector and the force. The result is a vector perpendicular to both inputs.

x = [1,2,3]
y = [4,5,6]
z = np.cross(x, y)
z is:
array([-3, 6, -3])
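A quick sanity check on a cross product is that the result is perpendicular to both inputs, so its dot product with each of them should be zero. A small sketch of that check:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

z = np.cross(x, y)

print(z)             # [-3  6 -3]
print(np.dot(z, x))  # 0
print(np.dot(z, y))  # 0
```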

25. Getting gradient of an array

Use np.gradient. NumPy calculates the gradient using central differences at the interior points and one-sided differences at the two ends, which follows from a Taylor series expansion. You may read more about it at this post.

x = np.array([5, 10, 14, 17, 19, 26], dtype=np.float16)
np.gradient(x) will be:
array([5. , 4.5, 3.5, 2.5, 4.5, 7. ], dtype=float16)
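To make the numbers less mysterious, here is a sketch that rebuilds the same result by hand for unit spacing: one-sided differences at the two ends and central differences in between. This is my own reconstruction for the default settings, not NumPy's actual implementation.

```python
import numpy as np

x = np.array([5, 10, 14, 17, 19, 26], dtype=np.float64)

grad = np.gradient(x)

# Ends: simple difference with the single neighbour.
first = x[1] - x[0]
last = x[-1] - x[-2]
# Interior: (next - previous) / 2 for every point that has two neighbours.
interior = (x[2:] - x[:-2]) / 2

manual = np.concatenate(([first], interior, [last]))

print(grad)    # [5.  4.5 3.5 2.5 4.5 7. ]
print(manual)  # [5.  4.5 3.5 2.5 4.5 7. ]
```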

26. How to slice NumPy array?

For a single element:
x[r][c] (or, equivalently, x[r,c]) where r,c are the row and col number of the element.
For slicing more than one element, suppose x is:
2 4 9
3 1 5
7 8 0
and you want 2,4 and 7,8. Then do
x[list_of_rows,list_of_cols], which would be
x[[0,0,2,2],[0,1,0,1]] produces
array([2, 4, 7, 8])
If one of the rows or cols is contiguous, it's easier to do it with a slice:
x[[0,2],0:2] produces
array([[2, 4],
[7, 8]])
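The three indexing styles can be tried out on the same 3x3 array. This is only a sketch, with variable names of my own.

```python
import numpy as np

x = np.array([[2, 4, 9],
              [3, 1, 5],
              [7, 8, 0]])

single = x[1, 2]  # same element as x[1][2]

# One (row, col) pair per requested element; the result is 1-D.
picked = x[[0, 0, 2, 2], [0, 1, 0, 1]]

# Rows 0 and 2, columns 0 and 1; the result keeps its 2-D layout.
block = x[[0, 2], 0:2]

print(single)  # 5
print(picked)  # [2 4 7 8]
print(block)   # [[2 4] [7 8]]
```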

27. Broadcasting

Any article on NumPy would be incomplete if it doesn’t include broadcasting. It’s an important concept that helps NumPy to vectorize operations and thus make the computation faster. Understanding a few rules will help one to dissect broadcasting well.

From NumPy docs:

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

1. they are equal, or

2. one of them is 1

Another thing to keep in mind is,

If the dimensions match, the output will have the maximum length in each dimension. If one of the dimensions has length 1, the value in that dimension will be repeated.

Suppose there are two arrays A and B of shapes (3,4,5) and (4,1) respectively, and you want to add the two arrays. Since they don’t have the same shape, NumPy will try to broadcast the values. It starts by comparing the lengths in the last dimension of the two, which are 5 and 1. These values aren’t equal, but since one of them is 1, it will be repeated and the final output will have length 5 in the last dimension.

The 2nd last dimension has the same length, 4, for the two.

The 3rd last dimension, or the 1st dimension in A, has length 3 while B doesn’t have anything. When a dimension is missing in one of the arrays, NumPy prepends 1 to its shape. So, B’s shape becomes (1,4,1). Now the lengths are compatible, as they are 3 and 1, and the values in B are repeated 3 times. The final output will have the shape (3,4,5).

For another example, take
a:
3 5 8
4 5 6
9 7 2
b:
b = [2,3,4]
a's shape: (3,3)
b's shape: (3,)
The last dimension has the same length 3 for the two, and then 1 is prepended to the shape of b as it doesn't have anything in that dimension. So, b becomes [[2,3,4]], having shape (1,3). Now the 1st dimensions are compatible and the values are repeated for b. Finally, b can be seen as
2 3 4
2 3 4
2 3 4
So, a+b produces
5 8 12
6 8 10
11 10 6
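Both examples can be reproduced in a few lines. In this sketch I use np.broadcast_to to show what b looks like after stretching, and I add one incompatible pair of shapes of my own, (3,4) and (3,), to show what happens when the rules are not met.

```python
import numpy as np

# Shapes (3,4,5) and (4,1) broadcast to (3,4,5).
A = np.ones((3, 4, 5))
B = np.arange(4).reshape(4, 1)
total = A + B

a = np.array([[3, 5, 8], [4, 5, 6], [9, 7, 2]])
b = np.array([2, 3, 4])

# b is treated as shape (1,3) and repeated along the first dimension.
b_stretched = np.broadcast_to(b, a.shape)
summed = a + b

# Trailing lengths 4 and 3 are neither equal nor 1, so this fails.
try:
    np.ones((3, 4)) + np.ones((3,))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(total.shape)      # (3, 4, 5)
print(b_stretched)
print(summed)
print(mismatch_raised)  # True
```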

Check out these posts on broadcasting for more: first and second

Wrap-up

Thank you for reading. I hope this article helps anyone who needs to get started with NumPy. I have found these operations to be very helpful, and it’s always good to have them at our fingertips. The operations are pretty basic, and there may be different ways in NumPy to achieve the same objective. I have mostly used the NumPy docs as reference, except for a couple of other posts, for which links have been provided. Also, if you are interested, check out my posts on 21 Pandas operations for absolute beginners and Spark installation.

Contact

If you liked this post, please share it with others who might find it useful. I really love data science, and if you are interested in it too, let’s connect on LinkedIn or follow me here on Towards Data Science.
