
Everything in data science starts with data, which comes in various formats. Numbers, images, texts, x-rays, sound, and video recordings are just some examples of data sources. Whatever format the data comes in, it needs to be converted to an array of numbers to be analyzed. Hence, it is crucial to store and modify arrays of numbers effectively in data science.
NumPy (Numerical Python) is a scientific computing package that provides numerous ways to create and operate on arrays of numbers. It forms the basis of many widely used Python libraries related to data science such as Pandas and Matplotlib.
In this post, I will go over 20 commonly used operations on NumPy arrays. These operations can be grouped under 4 main categories:
- Creating arrays
- Manipulating arrays
- Combining arrays
- Linear algebra with arrays
We first need to import NumPy:
import numpy as np
Creating arrays
1. Random integers in a specific range

np.random.randint draws random integers. When a single bound is passed, it is the upper bound of the range (exclusive) and the lower bound defaults to 0, but we can also specify both bounds. The size parameter sets the shape of the output array.

We created a 3×2 array of integers between 2 and 10.
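As a minimal sketch of both call styles described above (the variable names and the exact bounds are illustrative assumptions, and the drawn values differ on every run):

```python
import numpy as np

# One bound: values are drawn from [0, 5); 5 itself is never returned.
a = np.random.randint(5, size=4)

# Two bounds: values are drawn from [2, 10), arranged in 3 rows and 2 columns.
b = np.random.randint(2, 10, size=(3, 2))

print(a.shape)  # (4,)
print(b.shape)  # (3, 2)
```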
2. Random floats between 0 and 1

The result is a 1-dimensional array of floats in the interval [0, 1), which is useful for generating random noise data.
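One way to produce such an array is np.random.random. The sketch below assumes a length of 5 and an illustrative variable name; the values change on every run.

```python
import numpy as np

# Five floats drawn uniformly from the half-open interval [0.0, 1.0).
noise = np.random.random(5)

print(noise.shape)  # (5,)
```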
3. Sample from a standard normal distribution
np.random.randn() is used to create a sample from a standard normal distribution (i.e. zero mean and unit variance).

We created an array with 100 floats.
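A minimal sketch of that call, with an illustrative variable name. The sample statistics only approximate 0 and 1 and vary from run to run, so no exact output is shown for them.

```python
import numpy as np

# 100 samples from the standard normal distribution (mean 0, variance 1).
sample = np.random.randn(100)

print(sample.shape)  # (100,)
# Sample mean and standard deviation: close to 0 and 1, but not exact.
print(sample.mean(), sample.std())
```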
4. Matrix with ones and zeros
A matrix can be considered as a 2-dimensional array. We can create a matrix with zeros or ones with np.zeros and np.ones, respectively.

We just need to specify the shape of the matrix as a tuple.
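For example, with shapes chosen purely for illustration:

```python
import numpy as np

zeros = np.zeros((3, 4))  # 3 rows, 4 columns, every entry 0.0
ones = np.ones((2, 2))    # 2 rows, 2 columns, every entry 1.0

print(zeros.shape, zeros.dtype)  # (3, 4) float64
print(ones.sum())                # 4.0
```

Both functions return floats unless a different dtype is requested.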
5. Identity matrix
An identity matrix is a square matrix (n×n) that has ones on the diagonal and zeros in every other position. np.eye or np.identity can be used to create one.
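A short sketch showing that both functions give the same 3×3 result (the size 3 is an arbitrary choice):

```python
import numpy as np

i3 = np.eye(3)       # 3x3 identity matrix
j3 = np.identity(3)  # same matrix built with np.identity

print(np.array_equal(i3, j3))  # True
print(i3.shape)                # (3, 3)
```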

6. Arange
The arange function creates arrays of evenly spaced sequential values within a specified interval. We can specify the start value, stop value, and step size; the stop value itself is not included.

The default start value is zero and the default step size is one.
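The three call styles can be sketched as follows, with bounds picked only for illustration:

```python
import numpy as np

a = np.arange(5)         # start defaults to 0, step defaults to 1
b = np.arange(5, 10)     # explicit start; stop (10) is excluded
c = np.arange(5, 15, 3)  # explicit step size of 3

print(a)  # [0 1 2 3 4]
print(b)  # [5 6 7 8 9]
print(c)  # [ 5  8 11 14]
```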

7. Array with only one value
We can create an array that has the same value at every position using np.full.

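We need to specify the shape and the value to fill it with. The data type can also be set using the dtype parameter. When dtype is omitted, NumPy infers it from the fill value, so an integer fill value gives an integer array and a float fill value gives a float array.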
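A minimal sketch of np.full with and without an explicit dtype (shapes and fill values are illustrative). When dtype is left out, NumPy takes it from the fill value:

```python
import numpy as np

a = np.full((2, 3), 5)               # integer fill value -> integer array
b = np.full((2, 3), 5.0)             # float fill value -> float64 array
c = np.full((2, 3), 5, dtype=float)  # integer fill value, explicit float dtype

print(a.dtype.kind, b.dtype, c.dtype)  # i float64 float64
```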
Manipulating arrays
Let’s first create a 2-dimensional array:
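As a stand-in, here is a sketch that assumes A has 3 rows and 4 columns, matching the shape quoted in the Reshape section. The values are illustrative.

```python
import numpy as np

# Hypothetical contents for A; only the (3, 4) shape comes from the text.
A = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

print(A.shape)  # (3, 4)
print(A.size)   # 12
```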

8. Ravel
The ravel function flattens the array (i.e. converts it to a 1-dimensional array).

By default, an array is flattened by adding row after row. It can be changed to column-wise by setting the order parameter to 'F' (Fortran-style).
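Both orders can be compared on a small array. This sketch uses an illustrative 3×4 array A:

```python
import numpy as np

A = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

by_rows = np.ravel(A)             # default order: row after row
by_cols = np.ravel(A, order='F')  # Fortran-style: column after column

print(by_rows)  # [ 0  1  2  3  4  5  6  7  8  9 10 11]
print(by_cols)  # [ 0  4  8  1  5  9  2  6 10  3  7 11]
```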
9. Reshape
As the name suggests, it reshapes an array. The shape of A is (3,4) and the size is 12.

Reshaping must preserve the size, which is the product of the lengths of all dimensions.

We don’t have to specify the size in every dimension. We can let NumPy figure out one dimension by passing -1.
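A minimal sketch of these rules, assuming an illustrative 3×4 array A. A target shape whose product is not 12 raises a ValueError.

```python
import numpy as np

A = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

B = A.reshape(2, 6)   # 2 * 6 == 12, so the size is preserved
C = A.reshape(4, -1)  # NumPy infers the missing dimension: 12 / 4 == 3

print(B.shape)  # (2, 6)
print(C.shape)  # (4, 3)

# 5 * 3 == 15 does not match the size of A, so this fails.
try:
    A.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)
```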

10. Transpose
Transposing a matrix switches its rows and columns.
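For instance, using an illustrative 3×4 array A, the first column of A becomes the first row of its transpose:

```python
import numpy as np

A = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

T = np.transpose(A)  # A.T gives the same result

print(T.shape)  # (4, 3)
print(T[0])     # [0 4 8]
```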

11. Vsplit
Splits an array into multiple sub-arrays vertically.

We split a 4×3 array into 2 sub-arrays with a shape of 2×3.
We can access a particular sub-array after splitting.

We split a 6×3 array into 3 sub-arrays and get the first one.
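Both splits can be sketched as below. The array contents are assumptions chosen so the result is easy to check; only the shapes come from the text.

```python
import numpy as np

B = np.arange(12).reshape(4, 3)  # 4x3 array
top, bottom = np.vsplit(B, 2)    # two sub-arrays of shape 2x3

print(top.shape, bottom.shape)  # (2, 3) (2, 3)

C = np.arange(18).reshape(6, 3)  # 6x3 array
first = np.vsplit(C, 3)[0]       # vsplit returns a list; take the first piece

print(first)
# [[0 1 2]
#  [3 4 5]]
```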
12. Hsplit
It is similar to vsplit but works horizontally.

If we apply hsplit to a 6×3 array to get 3 sub-arrays, the resulting arrays will each have a shape of (6,1).
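A short sketch of that case, with illustrative contents for the 6×3 array:

```python
import numpy as np

C = np.arange(18).reshape(6, 3)  # 6x3 array
cols = np.hsplit(C, 3)           # list of 3 sub-arrays, one per column

print(len(cols), cols[0].shape)  # 3 (6, 1)
```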

Combining arrays
We may need to combine arrays in some cases. NumPy provides functions and methods to combine arrays in many different ways.
13. Concatenate
It joins arrays along an existing axis, similar to the concat function of pandas.

We can convert these arrays to column vectors using the reshape function and then concatenate vertically.
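A minimal sketch, assuming two illustrative 1-dimensional arrays x and y. It shows the plain concatenation first, then the column-vector version along each axis:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

flat = np.concatenate((x, y))  # 1-D arrays are joined end to end

xc = x.reshape(-1, 1)  # column vector of shape (3, 1)
yc = y.reshape(-1, 1)

stacked = np.concatenate((xc, yc), axis=0)  # one on top of the other
side = np.concatenate((xc, yc), axis=1)     # side by side

print(flat)           # [1 2 3 4 5 6]
print(stacked.shape)  # (6, 1)
print(side.shape)     # (3, 2)
```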

14. Vstack
It is used to stack arrays vertically (rows on top of each other).

It also works with higher dimensional arrays.
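As a sketch with illustrative inputs, vstack turns two 1-dimensional arrays into the rows of a matrix, and stacks 2-dimensional arrays as long as they have the same number of columns:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
v = np.vstack((x, y))  # each 1-D array becomes one row

m = np.ones((2, 3))
n = np.zeros((3, 3))
v2 = np.vstack((m, n))  # 2 rows on top of 3 rows

print(v.shape)   # (2, 3)
print(v2.shape)  # (5, 3)
```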

15. Hstack
Similar to vstack but works horizontally (column-wise).
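A comparable sketch for hstack, again with illustrative inputs. The 2-dimensional arrays must have the same number of rows.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
h = np.hstack((x, y))  # 1-D arrays are joined end to end

m = np.ones((2, 2))
n = np.zeros((2, 3))
h2 = np.hstack((m, n))  # 2 columns next to 3 columns

print(h)         # [1 2 3 4 5 6]
print(h2.shape)  # (2, 5)
```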

Linear algebra with NumPy arrays (numpy.linalg)
Linear algebra is fundamental in the field of data science. NumPy, one of the most widely used scientific computing libraries, provides numerous linear algebra operations.
16. Det
Returns the determinant of a matrix.

A matrix must be square (i.e. the number of rows is equal to the number of columns) to calculate the determinant. For a higher-dimensional array, the last two dimensions must be equal, and the determinant is computed for each matrix in the stack.
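A small sketch with an illustrative 2×2 matrix whose determinant is easy to verify by hand, followed by a stack of two matrices. The result is rounded because it is computed in floating point.

```python
import numpy as np

M = np.array([[4.0, 7.0],
              [2.0, 6.0]])

d = np.linalg.det(M)  # 4*6 - 7*2 = 10
print(round(d, 6))    # 10.0

# A stack of two 2x2 matrices has shape (2, 2, 2); det returns one value per matrix.
stack = np.array([M, 2 * M])
dets = np.linalg.det(stack)
print(dets.shape)  # (2,)
```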
17. Inv
Calculates the inverse of a matrix.

The inverse of a matrix is the matrix that gives the identity matrix when multiplied with the original matrix. Not every matrix has an inverse. If matrix A has an inverse, then it is called invertible or non-singular.
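Both cases can be sketched with illustrative matrices: an invertible one, checked against the identity matrix, and a singular one whose second row is twice the first, for which np.linalg.inv raises LinAlgError.

```python
import numpy as np

M = np.array([[4.0, 7.0],
              [2.0, 6.0]])
M_inv = np.linalg.inv(M)

# M multiplied by its inverse gives the identity (up to rounding error).
print(np.allclose(np.matmul(M, M_inv), np.eye(2)))  # True

# Singular matrix: the rows are linearly dependent, so no inverse exists.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.inv(S)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False

print(invertible)  # False
```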
18. Eig
Computes the eigenvalues and right eigenvectors for a square matrix.
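A minimal sketch with an illustrative symmetric matrix whose eigenvalues are 1 and 3. The order in which they are returned is not guaranteed, so the check below verifies the defining relation M·v = λ·v for each pair instead.

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

values, vectors = np.linalg.eig(M)

# Column vectors[:, i] is the eigenvector that belongs to values[i].
for i in range(len(values)):
    v = vectors[:, i]
    print(np.allclose(np.matmul(M, v), values[i] * v))  # True
```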

19. Dot
Calculates the dot product of two vectors, which is the sum of the products of elements at matching positions. The first element of the first vector is multiplied by the first element of the second vector, and so on.
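For example, with two illustrative vectors:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

d = np.dot(u, v)  # 1*4 + 2*5 + 3*6
print(d)          # 32
```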

20. Matmul
It performs matrix multiplication.
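A short sketch with two illustrative 2×2 matrices. Each entry of the result is the dot product of a row of the first matrix with a column of the second.

```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[5, 6],
              [7, 8]])

R = np.matmul(P, Q)  # the @ operator (P @ Q) is equivalent

print(R)
# [[19 22]
#  [43 50]]
```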

We have covered the basic yet fundamental operations of NumPy. There are more advanced operations in NumPy, but it is always better to understand the basics first.
Thank you for reading. Please let me know if you have any feedback.