
15 Numpy Functionalities That Every Data Scientist Must Know

Discussion on fifteen of the most essential Numpy functions that developers must learn

A large portion of machine learning involves mathematical operations. Since math is an integral part of most data science projects, beginner data scientists should delve deeper into the subject. One of the most useful tools that Python, one of the best programming languages for data science, offers for this purpose is the NumPy library.

Numerical Python (NumPy) is a core part of most machine learning and data science projects. NumPy arrays are used in computer vision, where images are represented and processed as arrays. They are also heavily used in natural language processing, where text is vectorized into arrays so that it can be used to train ML or DL models.

With NumPy arrays, you can carry out most mathematical calculations with relative ease. It is the go-to library for most tasks related to linear algebra and similar operations. For integration and differentiation, however, there is another Python library worth checking out. The link below shows how to use sympy to simplify integral and differential calculus.

Best Library To Simplify Math For Machine Learning!

This article constitutes a beginner’s guide to fifteen must-know numpy functionalities that will be extremely beneficial for a variety of operations. We will try to cover most of the essential ones, but there is so much more to explore. So, let us get started with all the basic functions to learn as a data science enthusiast.


1. Creating Arrays:

import numpy as np

x = [1, 2, 3, 4, 5]
y = np.array(x)
y

Output: array([1, 2, 3, 4, 5])

The first step for every data scientist who plans to master numpy is knowing how to create an array. Once you can create arrays, you can manipulate them to perform numerous operations. Arrays can be created in multiple ways. One way is to store numeric values in a Python list and then convert the list into an array with numpy.

An array can also be created by passing the values directly, in square brackets, to the np.array() or np.asarray() functions. Nesting the brackets produces multi-dimensional arrays. Once you create these arrays, a variety of operations and manipulations can be performed on them. In the next few sections, let us see what actions are usually performed on arrays.
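As a small sketch of the two functions mentioned above (the variable names are only illustrative), the snippet below builds a two-dimensional array from a nested list and shows one practical difference: np.array() copies its input by default, while np.asarray() hands back the same object when the input is already an array.

```python
import numpy as np

# A nested list becomes a two-dimensional array.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.ndim)       # 2

# np.array copies by default; np.asarray reuses an existing ndarray.
copied = np.array(matrix)
same = np.asarray(matrix)
print(copied is matrix)  # False
print(same is matrix)    # True
```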


2. Shape of Arrays:

a = np.array([[1, 2, 3], [2, 3, 4]])
print(a.shape)

Output: (2, 3)

An essential property of an array is its shape. The shape of a numpy array determines which calculations and manipulations you can perform on it. The shape is available as soon as the array is created: the .shape attribute returns a tuple with the size of the array along each dimension. In the example above, (2, 3) means two rows and three columns.

An important concept to understand in numpy is the N-dimensional array (ndarray). An ndarray can have any number of dimensions, and all of its items have the same size and type. N-dimensional arrays are most commonly used for performing a variety of mathematical operations.
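To make the idea of dimensions a little more concrete, here is a short sketch with illustrative arrays that inspects the number of dimensions, the shape, the total number of items, and the shared item type.

```python
import numpy as np

vector = np.array([1, 2, 3])   # one dimension
cube = np.zeros((2, 3, 4))     # three dimensions

print(vector.ndim, vector.shape)  # 1 (3,)
print(cube.ndim, cube.shape)      # 3 (2, 3, 4)
print(cube.size)                  # 24 items in total
print(cube.dtype)                 # float64, shared by every item
```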


3. Array Indexing:

a = np.array([[1, 2, 3], [2, 3, 4]])
print("The middle elements are:", a[0][1], "and", a[1][1])

Output: The middle elements are: 2 and 3

Similar to the indexing of lists, arrays can be indexed to read or modify a particular element (or elements) at a specific location. With array indexing, we can access any required element as long as we know its position.

In the above example, we retrieve the middle element of each row from an array of shape (2, 3). Indexing starts at zero, so the first element has index 0. The row index and the column index are specified within square brackets, and the element at that position is returned.
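Besides the chained brackets used above, numpy also accepts the row and column index together in a single pair of brackets. The sketch below reuses the same array and also shows negative indices, selecting a whole column, and assigning through an index.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

print(a[0, 1])    # 2, the same element as a[0][1]
print(a[-1, -1])  # 4, last row and last column
print(a[:, 1])    # [2 3], the middle column

a[0, 0] = 10      # an index can also be used to modify an element
print(a[0].tolist())  # [10, 2, 3]
```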


4. Array Slicing:

a = np.array([[1, 2, 3], [2, 3, 4]])
print(a[1:2])

Output: [[2 3 4]]

Another useful operation that numpy arrays share with lists is slicing. With this technique, we obtain only the required elements from a numpy array. In the example code block shown above, the slice 1:2 skips the first row and returns only the second row. Note that the result is still two-dimensional, with shape (1, 3).
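A few more slices on the same array are sketched below. One point that differs from lists: a basic slice of a numpy array is a view, so writing into the slice also changes the original array.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

print(a[:, 1:])   # the last two columns of every row
print(a[0, ::2])  # [1 3], every second element of the first row

# A basic slice is a view onto the same data, not a copy.
view = a[1:2]
view[0, 0] = 99
print(a[1, 0])    # 99
```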

I would recommend exploring more slicing operations on numpy arrays on your own. Try out numerous slices and observe the results. I also recommend checking out one of my previous articles on mastering lists in Python from the link below, as it will help you build a better intuition for indexing and slicing.

Mastering Python Lists For Programming!


5. Multiplication of Arrays:

a = np.array([1, 2, 3])
b = np.array([[2],[1], [0]])
print(np.matmul(a, b))

Output: [4]

With numpy arrays, matrix multiplication is simple to compute. In the above example, a has shape (3,) and b has shape (3, 1). np.matmul treats the one-dimensional a as a 1 x 3 row vector, multiplies it with the 3 x 1 column b, and then drops the added dimension, so the result is a one-element array of shape (1,). Many such calculations are possible with numpy arrays.
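If both operands are written as explicit two-dimensional matrices, the shapes follow the usual rule for matrix multiplication. The sketch below uses illustrative names and the @ operator, which numpy supports as a shorthand for matrix multiplication.

```python
import numpy as np

row = np.array([[1, 2, 3]])      # shape (1, 3)
col = np.array([[2], [1], [0]])  # shape (3, 1)

print(np.matmul(row, col))  # [[4]]
print((row @ col).shape)    # (1, 1)
print((col @ row).shape)    # (3, 3)
```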


6. Dot Products:

a = np.array([1, 2, 3])
b = np.array([[2],[1], [0]])
print(np.dot(a, b))

Output: [4]

Another significant computation that is possible with numpy arrays is the dot product of two arrays. For two vectors, the dot product is the sum of the products of their corresponding elements. Dot products are used everywhere in machine learning, for example when computing weighted sums of inputs or evaluating a cost function.

According to the NumPy documentation, matmul differs from dot in two important ways.

  • Multiplication by scalars is not allowed.
  • Stacks of matrices are broadcast together as if the matrices were elements.
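Both differences listed above can be checked with a short experiment. This is only a sketch with illustrative arrays: the first part passes a scalar to each function, and the second part multiplies two stacks of four matrices.

```python
import numpy as np

a = np.array([1, 2, 3])

# dot accepts a scalar operand, matmul does not.
print(np.dot(a, 2))  # [2 4 6]
scalar_rejected = False
try:
    np.matmul(a, 2)
except (ValueError, TypeError) as err:
    scalar_rejected = True
    print("matmul rejected the scalar:", type(err).__name__)

# matmul multiplies the stacks matrix by matrix; dot combines every
# matrix of the first stack with every matrix of the second one.
stack = np.ones((4, 2, 3))
other = np.ones((4, 3, 5))
print(np.matmul(stack, other).shape)  # (4, 2, 5)
print(np.dot(stack, other).shape)     # (4, 2, 4, 5)
```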

7. Sum of the elements in an array:

a = np.array([1, 2, 3])
print(np.sum(a))

Output: 6

Computing the sum of the elements in a numpy array is a common task that the sum function in this library handles directly. If you were to do the same with a list, you might write a for loop that iterates over all the elements and adds them up. A Python-level loop is slower than numpy's vectorized sum, especially on large arrays, so numpy arrays are preferable for such mathematical computations.
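For multi-dimensional arrays, np.sum() also accepts an axis argument. The sketch below, on an illustrative 2 x 3 array, sums everything, then each column, then each row.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

print(np.sum(a))          # 15, all elements
print(np.sum(a, axis=0))  # [3 5 7], one sum per column
print(np.sum(a, axis=1))  # [6 9], one sum per row
```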

As a bonus, I would like to mention another way of computing the sum of a list in a single line. It relies on the anonymous (lambda) functions available in Python. You can import the reduce function from the functools module and combine it with a lambda that adds two values. For this particular task, Python's built-in sum() function gives the same result.

from functools import reduce

a = [1, 2, 3]
# Named total so that the built-in sum() is not shadowed.
total = reduce(lambda x, y: x + y, a)
print(total)

If you want to learn more about the topic of understanding advanced functions with multiple codes and examples, feel free to check out the article below. It covers the specific concepts in further detail.

Understanding Advanced Functions In Python With Codes And Examples!


8. Mean:

a = np.array([1, 2, 3])
print(np.mean(a))

Output: 2.0

Numpy also lets developers compute the mean of an array with relative ease. The mean, or average, is computed by adding all the elements and dividing the sum by the total number of elements in the array. The mean is significant in several machine learning algorithms, such as linear regression, where it is used to compute the mean squared error.
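As a rough illustration of the mean squared error mentioned above, the sketch below uses made-up target and prediction values (y_true and y_pred are hypothetical names, not taken from any dataset) and averages the squared differences with np.mean().

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 9.0, 10.0])

# Squared errors are 1, 0, 4, and 1, so their mean is 6 / 4.
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 1.5
```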


9. Exponentiation:

x = 5
print(np.exp(x))

Output: 148.4131591025766

A significant operation in machine learning is exponentiation. The np.exp() function raises Euler's number e, which is approximately 2.718, to the given power. This number is the base of the natural logarithm, so the function appears in many mathematical operations. Another constant worth knowing is pi, which is available in numpy as np.pi.

A popular use case of exponentiation is defining the sigmoid function, sigmoid(x) = 1 / (1 + e^(-x)). The above image is a representation of it. To learn more about such activation functions, I would recommend checking out one of my previous articles on a popular activation function called the rectified linear unit from the link below.

Understanding ReLU: The Most Popular Activation Function in 5 Minutes!
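The sigmoid function mentioned in this section can be sketched in a couple of lines with np.exp(). This is a minimal version for illustration, not a numerically hardened implementation.

```python
import numpy as np

def sigmoid(x):
    """Map any real value (or array of values) into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))  # 0.5
# Because np.exp works element-wise, the function accepts arrays too.
print(sigmoid(np.array([-2.0, 0.0, 2.0])))
```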


10. Flattening an array:

a = np.array([[1, 2, 3], [2, 3, 4]])
a = a.flatten()
print(a.shape)

Output: (6,)

Whenever we work with multi-dimensional arrays, we might need to flatten them for specific tasks. The flatten function in numpy reduces an n-dimensional array to a one-dimensional array. Other related functions that data scientists should explore are np.expand_dims() and np.squeeze(), which add and remove dimensions of size one.
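The two related functions can be tried on a small illustrative array, as in this sketch: np.expand_dims() inserts an axis of length one, and np.squeeze() removes such axes again.

```python
import numpy as np

a = np.array([1, 2, 3])

expanded = np.expand_dims(a, axis=0)  # add a leading axis of length one
print(expanded.shape)                 # (1, 3)

squeezed = np.squeeze(expanded)       # drop every axis of length one
print(squeezed.shape)                 # (3,)
```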


11. Arange:

a = np.arange(5, 15, 2)
a

Output: array([ 5,  7,  9, 11, 13])

The arange function creates arrays of evenly spaced values within a given interval. You specify the starting point, the stopping point (which is excluded), and the step size, in that order. np.arange() always returns a one-dimensional array, so to obtain a multi-dimensional array of a desired shape, follow it with the reshape function.
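Combining arange with reshape might look like the sketch below. The only requirement is that the number of generated values matches the requested shape, here twelve values for a 3 x 4 grid.

```python
import numpy as np

# Twelve values, 0 through 11, arranged as 3 rows and 4 columns.
grid = np.arange(0, 12, 1).reshape(3, 4)

print(grid.shape)  # (3, 4)
print(grid[2, 3])  # 11, the last generated value
```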


12. Sort an array:

a = np.array([3, 1, 2, 5, 4])
np.sort(a)

Output: array([1, 2, 3, 4, 5])

When we have an array with shuffled values and want them arranged in ascending order, the sort function is quite useful. np.sort() returns a sorted copy and leaves the original array unchanged. While lists can be sorted as well, it is worth noting that the same is possible with numpy arrays.

You might encounter instances where you append more elements to a numpy array and, once all the values are collected, want them sorted together. In such use cases, the sort() function comes in quite handy.
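The append-then-sort scenario described above could look like this sketch. It also contrasts np.sort(), which returns a new array, with the .sort() method, which sorts the array in place.

```python
import numpy as np

a = np.array([3, 1, 2])
a = np.append(a, [5, 4])     # np.append returns a new, longer array

sorted_copy = np.sort(a)     # sorted copy, a itself is untouched
print(a.tolist())            # [3, 1, 2, 5, 4]
print(sorted_copy.tolist())  # [1, 2, 3, 4, 5]

a.sort()                     # in-place sort
print(a.tolist())            # [1, 2, 3, 4, 5]
print(a[::-1].tolist())      # [5, 4, 3, 2, 1], descending order
```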


13. Randomize arrays:

np.random.rand(2, 2)

Output: array([[0.21886868, 0.09692608],
       [0.60732111, 0.85815271]])

The numpy library also lets users create arrays filled with random values. This functionality is similar to the random library in Python. With numpy's random.rand() function, however, you can generate n-dimensional arrays of values drawn uniformly from the interval [0, 1), so the output shown above will differ on every run.

One of the places where random numpy operations are commonly used in machine learning and deep learning is the initialization of weights or biases. Usually, the best approach is to initialize such values randomly rather than with zeros. To understand the concept of randomness through an example of proportional sampling, I recommend checking out the article linked below.

Step By Step Guide: Proportional Sampling For Data Science With Python!
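The weight initialization mentioned in this section can be sketched as follows. This example uses np.random.default_rng(), numpy's newer random generator interface, instead of np.random.rand(); the layer size, the seed, and the 0.01 scaling factor are arbitrary choices made for illustration.

```python
import numpy as np

# A seeded generator makes the random values reproducible.
rng = np.random.default_rng(seed=42)

# Hypothetical layer with 3 inputs and 2 outputs: small random weights.
weights = rng.random((3, 2)) * 0.01  # uniform values in [0, 0.01)

print(weights.shape)  # (3, 2)
```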


14. Minimum, Maximum, And Absolute:

a = np.array([1, 2, -3, 4, 5])
print(np.min(a))
print(np.max(a))
print(np.abs(a))

Output: -3
5
[1 2 3 4 5]

Some other basic operations that we can perform on a numpy array are finding its minimum, maximum, and absolute values. The np.min() and np.max() functions are self-explanatory: they compute the minimum and maximum values in the given numpy array, respectively.

While these two functions return a single value, the absolute function returns an array in which every element has its negative sign removed. Other similar functions that I would recommend experimenting with are np.ceil() and np.floor(), which round each element up or down.
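The ceiling and flooring operations mentioned above work element-wise, just like np.abs(). A small sketch with illustrative values:

```python
import numpy as np

values = np.array([-1.5, 0.2, 2.7])

# floor rounds toward negative infinity, ceil toward positive infinity.
print(np.floor(values).tolist())  # [-2.0, 0.0, 2.0]
print(np.ceil(values).tolist())   # [-1.0, 1.0, 3.0]
```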

15. Trigonometric functions:

print(np.sin(np.pi/3.))
print(np.cos(np.pi/3.))

Output: 0.8660254037844386
0.5000000000000001

Apart from the tasks covered so far, you can also perform trigonometric operations with numpy. The trigonometric functions expect angles in radians. In the above example, we compute the sine and cosine of pi/3 radians, that is, sixty degrees. Feel free to explore other similar trigonometric functions.
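If your angles are given in degrees, np.deg2rad() converts them to radians first. The sketch below applies np.sin() to a whole array of illustrative angles and also tries np.tan().

```python
import numpy as np

angles_deg = np.array([0.0, 30.0, 60.0, 90.0])
angles_rad = np.deg2rad(angles_deg)  # degrees to radians

# The functions operate element-wise on the whole array.
sines = np.sin(angles_rad)
print(np.round(sines, 4))

print(np.round(np.tan(np.pi / 4), 4))  # 1.0
```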


Conclusion:

Without mathematics, there’s nothing you can do. Everything around you is mathematics. Everything around you is numbers. — Shakuntala Devi

Numpy is one of the best libraries that is available in Python for a wide range of tasks. While machines are not the best with textual or visual information, when these are converted into mathematical arrays with the help of numpy, the computation of many critical tasks becomes possible. Apart from improved compatibility, it also becomes easier to achieve certain tasks. Hence, numpy is one of the best libraries that data scientists must seek to master.

In this article, we covered fifteen of the most essential numpy functionalities that every data scientist must know about. Some of them, like shape, indexing, and slicing, are ways of inspecting and manipulating numpy arrays, while others, like sum(), mean(), and arange(), are functions for computing or generating values.

There is a lot more you can try out with numpy arrays. Make sure to explore and dig deeper into the subject. If you have any queries related to the points stated in this article, feel free to let me know in the comments below. I will try to get back to you with a response as soon as possible.

Check out some of my other articles that you might enjoy reading!

17 Must Know Code Blocks For Every Data Scientist

6 Best Projects For Image Processing With Useful Resources

Best PC Builds For Deep Learning In Every Budget Ranges

7 Best Free Tools For Data Science And Machine Learning

6 Best Programming Practices!

Thank you all for sticking on till the end. I hope all of you enjoyed reading the article. Wish you all a wonderful day!

