Everything in data science starts with data, and data comes in various formats. Numbers, images, text, x-rays, sound and video recordings are just some examples of data sources. Whatever format the data comes in, it needs to be converted to an array of numbers to be analyzed. Even complex computer vision models cannot see or process an image the way a human being does; the image is converted to an array of numbers before being analyzed. Hence, it is crucial to store and modify arrays of numbers efficiently in data science. NumPy (Numerical Python) is a scientific computing package that offers very functional ways to create and operate on arrays of numbers. So we should definitely learn more about NumPy than just "import numpy as np".

Python lists can hold mixed data types. For example, we can create a list with integers, floats, strings and boolean values. However, all the elements of a NumPy array must be of the same type. This may make lists sound preferable to NumPy arrays, which can be true for small lists. However, we almost always deal with very large arrays of numbers in real-life datasets, and NumPy arrays are much more efficient than lists in terms of storage and processing.
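As a small illustration of the single-type rule (a sketch with made-up values, assuming NumPy is imported as np), a list keeps each element's own type, while an array converts its elements to one common type:

```python
import numpy as np

# A Python list keeps every element's original type.
mixed = [1, 2.5, "three", True]
print([type(x).__name__ for x in mixed])  # ['int', 'float', 'str', 'bool']

# A NumPy array stores a single type, so the integers are upcast to floats.
arr = np.array([1, 2.5, 3])
print(arr.dtype)  # float64
```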
It is worth noting that the most commonly used Data Analysis tool, Pandas, is built on NumPy.
In this post, I will cover the basics of NumPy arrays with examples.
Creating NumPy Arrays
We can create an array from a Python list:
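A minimal sketch, assuming NumPy is installed and imported as np (the values are only illustrative):

```python
import numpy as np

# Build a one-dimensional array from a plain Python list.
a = np.array([1, 2, 3, 4])
print(a)        # [1 2 3 4]
print(type(a))  # <class 'numpy.ndarray'>
```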

The data type can be set explicitly with the dtype argument:
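For instance, the following sketch (with illustrative values) asks for 32-bit floats even though the list contains integers:

```python
import numpy as np

# Request a specific element type when creating the array.
b = np.array([1, 2, 3, 4], dtype='float32')
print(b)        # [1. 2. 3. 4.]
print(b.dtype)  # float32
```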

We can also create higher dimensional arrays using nested lists:
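A short sketch with illustrative values, where each inner list becomes one row of a two-dimensional array:

```python
import numpy as np

# Two inner lists of three values each give a 2x3 array.
c = np.array([[1, 2, 3], [4, 5, 6]])
print(c)
# [[1 2 3]
#  [4 5 6]]
print(c.shape)  # (2, 3)
```

The inner lists should all have the same length so that the result is a regular rectangular array.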

NumPy has built-in functions to create arrays from scratch so that we don’t need to pass in values. The zeros and ones functions create arrays full of zeros and ones, respectively. The full function lets us choose the value that fills the array. We just need to pass in the dimensions:
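A minimal sketch of the three functions; the shapes and the fill value 7 are arbitrary choices for illustration:

```python
import numpy as np

zeros = np.zeros((3, 4))           # 3x4 array of 0.0 (float64 by default)
ones = np.ones((2, 2), dtype=int)  # 2x2 array of integer 1s
sevens = np.full((2, 3), 7)        # 2x3 array where every element is 7

print(zeros.shape)  # (3, 4)
print(ones)
# [[1 1]
#  [1 1]]
print(sevens)
# [[7 7 7]
#  [7 7 7]]
```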

We can also create an array in a specific range with a specific increment using the arange function:
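A sketch with illustrative numbers, counting from 5 up to (but not including) 25 in steps of 4:

```python
import numpy as np

# arange(start, stop, step): stop itself is never included.
r = np.arange(5, 25, 4)
print(r)  # [ 5  9 13 17 21]
```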

The first two arguments specify the range and the third argument is the size of the increment. Please note that the upper limit of the range is not inclusive.
Another way to create an array in a range is the linspace function. With linspace, we select how many values we want in the specified range (not the size of the increment):
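A sketch with illustrative numbers, asking for five values between 0 and 1:

```python
import numpy as np

# linspace(start, stop, num): num evenly spaced values, stop included.
l = np.linspace(0, 1, 5)
print(l)  # [0.   0.25 0.5  0.75 1.  ]
```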

The values are equidistant and both the lower and upper limits are inclusive. Remember that the upper limit is not inclusive with the arange function.
We sometimes need random datasets for practice or experimental analysis. The random module of NumPy creates arrays with random numbers:

- random.random creates uniformly distributed random values between 0 and 1.
- The arguments of random.normal are the mean, the standard deviation and the size (shape) of the output, in that order.
- random.randint creates an array of integers in the specified range with specified dimensions.
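The three functions in the list above can be sketched as follows. The shapes and parameters are arbitrary, and the printed values will differ on every run because no seed is set:

```python
import numpy as np

# Uniform values in the half-open interval [0, 1), arranged as 3x3.
uniform = np.random.random((3, 3))

# Normal distribution with mean 0 and standard deviation 1, arranged as 3x3.
normal = np.random.normal(0, 1, (3, 3))

# Integers from 0 up to (but not including) 10, arranged as 2x5.
integers = np.random.randint(0, 10, (2, 5))

print(uniform)
print(normal)
print(integers)
```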
We created the arrays in the examples above, so we know their properties. We will not always know the properties beforehand. NumPy arrays provide the ndim, size and shape attributes to learn about an array:
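A brief sketch on a randomly filled 3x4 array (the shape is an arbitrary choice); note that these are read without parentheses:

```python
import numpy as np

a = np.random.randint(10, size=(3, 4))

print(a.ndim)   # 2        -> number of dimensions
print(a.size)   # 12       -> total number of elements
print(a.shape)  # (3, 4)   -> length along each dimension
```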

Accessing Array Elements
Accessing individual elements or slices of arrays works much like the same operations on Python lists:
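A sketch on a small illustrative array:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60])

print(a[0])    # 10          first element
print(a[-1])   # 60          last element
print(a[-2])   # 50          second to last element
print(a[:3])   # [10 20 30]  from the start up to index 3 (exclusive)
print(a[2:5])  # [30 40 50]  indices 2, 3 and 4
print(a[3:])   # [40 50 60]  from index 3 to the end
```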

The index of the first element is 0. If we count from the end, the index of the last element is -1, the second to last is -2 and so on. To access a slice of an array, we specify the lower and upper limits with a colon in between. If we don’t specify the lower limit (e.g. [:3]), the slice starts from the beginning. Similarly, if the upper limit is not specified, the slice runs to the end of the array. Please note that the specified upper limit is exclusive.
Selecting single elements or all the elements in a range are not the only options. In some cases, we may need to access every second or third element. The syntax for this task is a[start:end:step]. It becomes clearer with examples:
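A sketch of stepped slicing on the integers 0 through 9 (an illustrative array):

```python
import numpy as np

a = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

print(a[1:8:2])  # [1 3 5 7]  every second element from index 1 up to 8
print(a[::3])    # [0 3 6 9]  every third element of the whole array
print(a[5::2])   # [5 7 9]    every second element from index 5 to the end
```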

If the lower and upper values are not specified, the start and end of the array are taken as the lower and upper limits.
To access an element of a multi-dimensional array, either a comma-separated index or a nested index can be used:
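Both forms are shown in this sketch on an illustrative 3x3 array; they return the same element:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(m[1, 2])   # 6  comma-separated index: row 1, column 2
print(m[1][2])   # 6  nested index: take row 1, then element 2 of that row
```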

Operations on Arrays
Arrays are mutable so we can change the value of an element in an array:
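A short sketch with illustrative values. The second assignment is an extra observation that follows from the single-type rule: a float written into an integer array is truncated to fit the array's type.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

a[0] = 10
print(a)  # [10  2  3  4]

# The array holds integers, so 3.7 is stored as 3.
a[1] = 3.7
print(a)  # [10  3  3  4]
```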

We can change the dimensions of an array using the reshape function:
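A sketch that turns six values into a 2x3 array (illustrative sizes); the new shape has to hold exactly the same number of elements as the original:

```python
import numpy as np

a = np.arange(6)       # [0 1 2 3 4 5]
b = a.reshape(2, 3)    # 2 rows x 3 columns = 6 elements

print(b)
# [[0 1 2]
#  [3 4 5]]
print(b.shape)  # (2, 3)
```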

Arithmetic operations on arrays are vectorized, which means they are applied to every element without an explicit loop.
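A sketch with two small illustrative arrays:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print(a + 2)   # [3 4 5 6]      add 2 to every element
print(a * 2)   # [2 4 6 8]      double every element
print(a ** 2)  # [ 1  4  9 16]  square every element
print(a + b)   # [11 22 33 44]  element-by-element sum of two arrays
```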

NumPy also provides mathematical functions that work as vectorized operations. These functions are called ufuncs (universal functions).
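A sketch of a few common ufuncs on illustrative arrays; each one is applied to every element:

```python
import numpy as np

squares = np.array([1, 4, 9, 16])
signed = np.array([-1, 2, -3])

print(np.sqrt(squares))  # [1. 2. 3. 4.]
print(np.abs(signed))    # [1 2 3]
print(np.exp(signed))    # e raised to each element
print(np.add(squares, squares))  # [ 2  8 18 32]  same result as squares + squares
```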

There are many other ufuncs available in NumPy. You can always check the documentation on the official website of NumPy for more detail.
Note: The advantage of vectorized operations is the speed of execution. The examples I show here use small arrays. However, in real projects, we need to operate on very large arrays of numbers (e.g. 1 million elements), for which speed is crucial.
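One rough way to see the difference is to time the same computation on a list and on an array. This is only a sketch: the exact timings depend on the machine and the NumPy version, so no numbers are quoted here.

```python
import time
import numpy as np

n = 1_000_000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)  # int64 so the squares cannot overflow

# Square every element with a Python-level loop (list comprehension).
start = time.perf_counter()
squared_list = [v * v for v in values]
list_seconds = time.perf_counter() - start

# Square every element with a single vectorized operation.
start = time.perf_counter()
squared_arr = arr * arr
array_seconds = time.perf_counter() - start

print(f"list:  {list_seconds:.4f} s")
print(f"array: {array_seconds:.4f} s")
```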
I covered the basics but there is much more that NumPy offers which I’m planning to cover in a second post about NumPy. You can always visit the official website for accessing more detailed information.
Thank you for reading. Please let me know if you have any feedback.