So far we have covered Python programming in general: which operators to use and when, and how to simplify repetitive tasks or make decisions using control flow. Since in Hydrology (and Meteorology) we mostly work with large amounts of numbers, we need to look at tools that help us handle them. This article therefore covers Numpy, an incredibly popular library in Data Science circles.
This article is structured as follows:
- Introduction
- Creating arrays
- Shape and Reshaping
- Accessing elements and slicing an Array
- Maths and Statistics with Numpy
- Bonus content – Speed advantages of Numpy
- Conclusion
Enjoy the learning!
Introduction
Why use Numpy? When printed, a Python list of integers or floats looks almost the same as a Numpy array. Both can hold a bunch of numbers, both can be used for mathematical operations and statistical calculations, and the comparisons go on. So you might think Numpy is just a mathematical library with functionality similar to lists, but is it? Let me explain.
The data in a Numpy array is homogeneous, meaning all elements in an array have the same type and are stored compactly. A Python list, by contrast, stores pointers to separate Python objects, even when all of those objects happen to be of the same type. As a consequence, Numpy arrays typically use much less memory than regular lists. Also, most Numpy operations are implemented in C, so the cost of Python loops and dynamic type checking is avoided. This yields a significant increase in processing speed compared with a Python list.
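To get a feel for the memory difference, here is a minimal sketch comparing a list with an array holding the same numbers. The variable names and the choice of 1000 integers are my own assumptions for illustration, and the list figure is only a rough estimate (it adds the size of the list itself to the size of each integer object it points to).

```python
import sys
import numpy as np

values = list(range(1000))               # plain Python list of ints
arr = np.array(values, dtype=np.int64)   # the same numbers as a Numpy array

# The list stores pointers, and every int is a separate Python object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# The array stores the raw numbers: 1000 elements * 8 bytes each.
array_bytes = arr.nbytes

print("list :", list_bytes, "bytes (rough estimate)")
print("array:", array_bytes, "bytes")
```

The exact list size depends on your Python version and platform, so treat the numbers as an indication rather than a benchmark.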
More often than not, we encounter large datasets with tens of thousands of rows of data. Just think of hourly air temperature measurements for a county or region since measurements began there. If your weather service has been measuring hourly air temperature for the last 50 years, that’s more than 400 000 rows of data, for just one station.
How to install Numpy?
If you use Anaconda, Numpy is preinstalled in the base environment. However, it’s good practice to create a new environment for each new project. To install Numpy there, we open an Anaconda prompt and type:
conda install numpy
or
conda install -c anaconda numpy
If pip is being used, Numpy can be installed by typing:
pip install numpy
How to import Numpy?
When importing certain libraries, including Numpy, we follow a convention: we use well-established abbreviations for them. In the case of Numpy we use "np".
import numpy as np
The goal is that our code is easy to read and reproduce, and every Python programmer in the world knows what the following line does:
a = np.array([3,4])
Congrats! If you have imported Numpy and run the command above, you have successfully created your first Numpy array. Let’s see what happens if we print it out. Print gives us something that looks like a list, but it’s not. When we check the type, we see that it’s a "numpy.ndarray" (n-dimensional array).
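The print and type check described above can be sketched like this:

```python
import numpy as np

a = np.array([3, 4])

print(a)        # [3 4]  (note: no comma, unlike a list)
print(type(a))  # <class 'numpy.ndarray'>
```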

Vectors?
In the example we saw how to create a 1-dimensional array. If you remember vectors from Math, a 1-dimensional Numpy array is basically a vector. Since we gave two numbers, 3 and 4, this vector lies in 2-dimensional space (the geometric plane). It’s the same as in Math when you had a vector:
v = 3i + 4j

In Computer Science, vectors are just lists, where the length of the list (in our case, 2) is the number of dimensions of the vector. And in Data Science terms, a vector represents one or more features of an object. Think of meteorological measurements on Monday: you could measure air temperature, precipitation, wind speed, snow depth, etc. To learn more about vectors, I highly recommend this video by 3Blue1Brown.
Creating arrays
Above we already saw how to create a simple 1-D array in Numpy. Often our data comes in more dimensions: we have multiple features (like above), but also measurements for multiple days of the week. In this case, we need to add a second dimension to our arrays. Let’s see some 2-D arrays.

To create a 2-D array, we provide a list containing two lists. Think of this array as measurements on Monday (first list/row) and Tuesday (second list/row), where the 1st column is air temperature, the 2nd column precipitation, the 3rd column wind speed and the 4th column snow depth. The Excel screenshot should clarify things.
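A minimal sketch of such a 2-D array follows. The measurement values and units are made up for illustration (they are not real data), and weather_data is the array name used later in this article.

```python
import numpy as np

# Hypothetical values. Columns: air temperature (°C), precipitation (mm),
# wind speed (m/s), snow depth (cm).
weather_data = np.array([
    [12.0, 0.0, 3.1, 0.0],  # Monday
    [15.2, 2.4, 4.0, 0.0],  # Tuesday
])

print(weather_data)
print(weather_data.ndim)  # 2
```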

Numpy also provides some useful functions to create arrays of zeros or ones. Try out the following commands by yourself, and print out the results.
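For example, np.zeros and np.ones both take the desired shape as a tuple. The shape (2, 3) below matches the zeros array that is reshaped later in this article; the variable names are my own choice.

```python
import numpy as np

zeros = np.zeros((2, 3))  # 2 rows, 3 columns, filled with 0.0
ones = np.ones((2, 3))    # same shape, filled with 1.0

print(zeros)
print(ones)
```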
To demonstrate how to get the number of dimensions of your newly created array, I will use the np.ones function together with the ndim attribute.
So, our array has four dimensions, but what does a 4-dimensional array look like?

If we take a closer look, we can also determine the number of dimensions by counting the square brackets at the start or the end of the array, a handy hack 🙂
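The np.ones and ndim steps above might look like the sketch below. The shape (1, 2, 3, 4) and the name ones_4d are assumptions for illustration; any tuple of four numbers gives a 4-dimensional array.

```python
import numpy as np

ones_4d = np.ones((1, 2, 3, 4))  # hypothetical shape with four axes

print(ones_4d.ndim)  # 4
print(ones_4d)       # the printout begins with four opening brackets: [[[[
```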
Another useful function is arange. It is used to create an array of evenly spaced values. We need to specify the end number (int or float).
Numpy then assumes that the starting point is zero. We can also provide both the starting and the ending point.
And we can specify the step as follows:
Similarly, we can create an array with the linspace function, but instead of the step, linspace takes the number of elements in the array. Here we create an array with five elements between 7 and 12. Also, contrary to np.arange and most Python functions, the last number (the end point) is included here.
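The arange and linspace calls described above can be sketched as follows. Apart from the linspace example (five elements between 7 and 12), the concrete start, end and step values are my own picks.

```python
import numpy as np

up_to_five = np.arange(5)           # end only: 0, 1, 2, 3, 4
two_to_six = np.arange(2, 7)        # start and end: 2, 3, 4, 5, 6
even_numbers = np.arange(2, 11, 2)  # start, end, step: 2, 4, 6, 8, 10

# Five evenly spaced elements from 7 to 12, end point included:
# 7, 8.25, 9.5, 10.75, 12
five_points = np.linspace(7, 12, 5)

print(up_to_five, two_to_six, even_numbers, five_points)
```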
Shape and Reshaping
Earlier, we checked how many dimensions (or axes) our ones array had. But what if we are interested in how many elements there are along each dimension? The shape attribute comes in handy. Since we have 4 dimensions, we get a tuple of 4 numbers.

To count the number of elements in the whole array we use the size attribute.
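Here is a short sketch of both attributes, again assuming a hypothetical 4-dimensional ones array of shape (1, 2, 3, 4):

```python
import numpy as np

ones_4d = np.ones((1, 2, 3, 4))  # hypothetical shape with four axes

print(ones_4d.shape)  # (1, 2, 3, 4): number of elements along each axis
print(ones_4d.size)   # 24: total number of elements (1 * 2 * 3 * 4)
```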

To change the shape of an array, we use the .reshape() method. Care has to be taken, though: the reshaped array has to have the same size as the old one. Let me explain.

The original zeros array had the shape (2, 3), and since it has a size of 6 elements, we can reshape it into (3, 2), (6, 1) or (1, 6). I should mention that (6, 1) and (1, 6) are still 2-D arrays (a single column and a single row); to get a true 1-D array, we would reshape to (6,). As long as we preserve the array size, we are on the safe side.
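A minimal sketch of these reshapes follows. It also prints ndim for each result, so the number of dimensions can be checked directly, and it shows what happens when the sizes do not match. The shape (4, 2) is just an example of an invalid target.

```python
import numpy as np

zeros = np.zeros((2, 3))  # size 6

print(zeros.reshape(3, 2).shape)  # (3, 2)
print(zeros.reshape(6, 1).ndim)   # 2: one column, still two axes
print(zeros.reshape(1, 6).ndim)   # 2: one row, still two axes
print(zeros.reshape(6).ndim)      # 1: a true 1-D array of shape (6,)

# 4 * 2 = 8 elements do not fit a size-6 array, so Numpy raises an error.
error_message = None
try:
    zeros.reshape(4, 2)
except ValueError as err:
    error_message = str(err)
print(error_message)
```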
A handy "shortcut" to a 1-D array are the flatten() and ravel() methods. The difference is that flatten() always creates a 1-D copy of the original array, while ravel() returns a view of the original array whenever possible. So, with ravel(), changing some of the data in the newly created array will usually also change the data in the original array.
Which one to use depends on the specific task; most of the time I’ve used the flatten() method.
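The difference is easiest to see by modifying the results. This is a small sketch with made-up values; the array and variable names are my own.

```python
import numpy as np

original = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]

flat_copy = original.flatten()  # independent copy
flat_view = original.ravel()    # view on the same data (for this array)

flat_copy[0] = 100
print(original[0, 0])  # 0: the original is untouched

flat_view[0] = 200
print(original[0, 0])  # 200: the original changed as well
```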

Last but not least, let’s not forget the transpose() method. For a 2-D array, this method simply swaps the rows and columns.

In this case, the result is the same as before with reshape (for an array with distinct values, the order of the elements would differ). In the case of a multidimensional array, the order of all dimensions gets reversed. Let’s see.
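Both cases can be sketched as follows, assuming the (2, 3) zeros array from before and a hypothetical 4-dimensional array of shape (1, 2, 3, 4):

```python
import numpy as np

zeros = np.zeros((2, 3))
print(zeros.transpose().shape)    # (3, 2): rows and columns swapped

ones_4d = np.ones((1, 2, 3, 4))
print(ones_4d.transpose().shape)  # (4, 3, 2, 1): axis order reversed
```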

Accessing elements and slicing an Array
So far, we have seen how to create an array, inspect its dimensions, and reshape or flatten it. Let’s now turn our focus to extracting data from an array using indexing and slicing. To slice an array means to access its elements by providing the indices of the desired elements.
The general syntax of slicing involves the array name and square brackets, just like for Python lists:
array_name[start_index : end_index : step_size]

In our array the temperature measurements start at 1:00. To make things clear, we printed out the hourly temperatures at 12:00, 14:00, 16:00, 18:00 and 20:00.
If we don’t define the step size, every element in the specified range gets returned. For example, suppose we need the hourly temperatures from 7:00 to 12:00.

As with lists, in Numpy the start_index is inclusive, while the end_index is exclusive. Also, the first index in an array is always zero. So to access the temperature at 7:00, we use index 6. And since we need the measurements up to 12:00 (which is at index 11), we provide 12 as the end index (it is excluded).
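The two slices described above can be sketched as follows. The temperature values and the name hourly_temp are invented for illustration; only the indexing logic matters here.

```python
import numpy as np

# Hypothetical hourly temperatures (°C); index 0 holds the 1:00 reading.
hourly_temp = np.array([
    8.1, 7.8, 7.5, 7.2, 7.0, 7.4,        # 1:00 to 6:00
    8.3, 9.9, 11.6, 13.2, 14.7, 15.9,    # 7:00 to 12:00
    16.8, 17.4, 17.6, 17.1, 16.2, 15.0,  # 13:00 to 18:00
    13.6, 12.3, 11.2, 10.3, 9.5, 8.8,    # 19:00 to 24:00
])

# Start at index 11 (12:00), stop before index 20, take every 2nd element:
# readings at 12:00, 14:00, 16:00, 18:00 and 20:00.
every_two_hours = hourly_temp[11:20:2]

# No step: every reading from index 6 (7:00) up to index 11 (12:00).
morning = hourly_temp[6:12]

print(every_two_hours)
print(morning)
```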
Slicing 2-D arrays
When slicing a 2-D array, we need to specify the rows and columns of the elements we want. It can be a little tricky at first, but after trying a few examples, it soon gets really easy.
The syntax to slice a 2-D array is as follows:
array_name[row_start_index : row_end_index : row_step_size, column_start_index : column_end_index : column_step_size]
To see how slicing a 2-D array works, I will first extend our weather_data array to a full week, so we get an array of shape (7, 4).

Let’s say we want to know the precipitation for each day of the week. So we need to slice out all rows and the 2nd column.

In this case, to select all the rows we use a colon (:). To select the 2nd column we use the index value 1 (remember, indices start from zero).
Feel free to try out other possibilities. I would compare slicing with integrals in Math: there are certain rules to follow, but practice makes perfect.
Negative indices are also allowed, and work in the same manner as with Python lists. The last item in an array has the index -1.
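Here is a sketch of 2-D slicing on a full-week array. The numbers are hypothetical (the original values are not reproduced here), with columns for air temperature (°C), precipitation (mm), wind speed (m/s) and snow depth (cm).

```python
import numpy as np

# Hypothetical week of measurements, one row per day (Monday to Sunday).
weather_data = np.array([
    [12.0, 0.0, 3.1, 0.0],
    [15.2, 2.4, 4.0, 0.0],
    [14.1, 5.6, 2.2, 0.0],
    [16.8, 0.0, 1.5, 0.0],
    [13.5, 1.2, 3.8, 0.0],
    [14.9, 0.0, 2.9, 0.0],
    [17.3, 8.0, 5.1, 0.0],
])

precipitation = weather_data[:, 1]   # all rows, 2nd column
sunday = weather_data[-1]            # last row via a negative index
weekend_temp = weather_data[-2:, 0]  # last two rows, 1st column

print(weather_data.shape)  # (7, 4)
print(precipitation)
print(sunday)
print(weekend_temp)
```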
"Finding" data in an array
Another way of finding data in an array is to use a very popular Numpy function, np.where(). Called with only a condition, it returns the indices of the elements that meet that condition. Commonly, it’s used to find elements that are greater than, equal to or less than a number. The basic syntax of np.where() is as follows:
np.where(condition [, x, y])
x and y are optional parameters that supply the output values. Either we provide neither x nor y (we just need the indices of the elements that meet the condition), or we provide both; then the result takes its value from x where the condition is True, and from y where it is False. This is very similar to the IF() function in MS Excel.
Let’s say we want to print out the indices of the days that were warmer than 14.5 °C.

We have two important things here: first, we use the slicing learned above to select the first column (all rows, since we search all weekdays), and then we set the condition > 14.5.
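With a hypothetical week of data (the values below are made up, temperature in the first column), the search might look like this:

```python
import numpy as np

# Hypothetical week of measurements, one row per day (Monday to Sunday).
weather_data = np.array([
    [12.0, 0.0, 3.1, 0.0],
    [15.2, 2.4, 4.0, 0.0],
    [14.1, 5.6, 2.2, 0.0],
    [16.8, 0.0, 1.5, 0.0],
    [13.5, 1.2, 3.8, 0.0],
    [14.9, 0.0, 2.9, 0.0],
    [17.3, 8.0, 5.1, 0.0],
])

# Condition only: np.where returns a tuple with one index array per axis.
warm_days = np.where(weather_data[:, 0] > 14.5)

print(warm_days[0])  # indices 1, 3, 5 and 6 for this made-up data
```

Note that the result is a tuple, so we use warm_days[0] to get the index array itself.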
Let’s say we want to convert all temperature values greater than 14.5 °C to degrees Fahrenheit, and store the resulting array in the variable weather_b.

Again, we first provide a condition (> 14.5 °C), then the value to use if it is True (multiply the value by 1.8 and add 32), and the value to use if it is False (the existing value). Notice that we always slice the array, since we work only on the first column.
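A sketch of this conditional conversion is shown below, using made-up temperatures. The helper name temps_c is my own; weather_b is the name from the text.

```python
import numpy as np

# Hypothetical week of measurements, one row per day (Monday to Sunday).
weather_data = np.array([
    [12.0, 0.0, 3.1, 0.0],
    [15.2, 2.4, 4.0, 0.0],
    [14.1, 5.6, 2.2, 0.0],
    [16.8, 0.0, 1.5, 0.0],
    [13.5, 1.2, 3.8, 0.0],
    [14.9, 0.0, 2.9, 0.0],
    [17.3, 8.0, 5.1, 0.0],
])

temps_c = weather_data[:, 0]  # first column: air temperature in °C

# condition, value if True (converted to °F), value if False (unchanged)
weather_b = np.where(temps_c > 14.5, temps_c * 1.8 + 32, temps_c)

print(weather_b)
```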
We can now create a new array called weather_f (identical to weather_data), in which the temperature values are replaced with the values converted to Fahrenheit.

First, we make a copy of weather_data (remember the difference between flatten() and ravel()) to avoid changing the original weather_data array. Then we slice the new weather_f array (first column) and replace its values with the ones calculated in weather_b.
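The copy-and-replace step can be sketched as follows. I am assuming the copy is made with the array's copy() method, and the data is again hypothetical.

```python
import numpy as np

# Hypothetical week of measurements, one row per day (Monday to Sunday).
weather_data = np.array([
    [12.0, 0.0, 3.1, 0.0],
    [15.2, 2.4, 4.0, 0.0],
    [14.1, 5.6, 2.2, 0.0],
    [16.8, 0.0, 1.5, 0.0],
    [13.5, 1.2, 3.8, 0.0],
    [14.9, 0.0, 2.9, 0.0],
    [17.3, 8.0, 5.1, 0.0],
])
temps_c = weather_data[:, 0]
weather_b = np.where(temps_c > 14.5, temps_c * 1.8 + 32, temps_c)

weather_f = weather_data.copy()  # independent copy, not a view
weather_f[:, 0] = weather_b      # overwrite only the temperature column

print(weather_f[1])     # Tuesday, temperature now in °F
print(weather_data[1])  # original row, still in °C
```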
Maths and Statistics with Numpy
Finally, we’ve come to my favourite part of Numpy: mathematical and statistical operations. In my eyes, this is what makes Numpy so great, and superior to doing the same operations with plain lists: the simplicity and the speed advantage when dealing with a large amount of numerical data. Let’s first take a look at mathematical operations. I will provide an example with division, but the general syntax is the same for other operations, and can be looked up in the official Numpy documentation.
Our weather_data array contains precipitation data in millimetres; let’s convert those to metres. To convert millimetres to metres, we need to divide the value by 1000.

Again we use slicing to select the second column of the array, and divide the values by 1000. For practice, try to replace the precipitation values in the weather_f array with the ones converted to metres (you can make the change on the weather_f array directly).
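The division itself is a one-liner; Numpy applies it to every element of the slice. The data and the name precipitation_m below are hypothetical.

```python
import numpy as np

# Hypothetical week of measurements, one row per day (Monday to Sunday).
weather_data = np.array([
    [12.0, 0.0, 3.1, 0.0],
    [15.2, 2.4, 4.0, 0.0],
    [14.1, 5.6, 2.2, 0.0],
    [16.8, 0.0, 1.5, 0.0],
    [13.5, 1.2, 3.8, 0.0],
    [14.9, 0.0, 2.9, 0.0],
    [17.3, 8.0, 5.1, 0.0],
])

# Second column (precipitation in mm), divided element-wise by 1000.
precipitation_m = weather_data[:, 1] / 1000

print(precipitation_m)
```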
As for the Statistics example, I shall use the most common case: we need to calculate the average temperature, precipitation, wind speed and snow depth for the week. The Numpy function to calculate average values is called np.mean(). The basic syntax of the np.mean() function is as follows:
np.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
For our case, the important part is the axis parameter. Since our goal is to calculate the mean value of each column, we need to set axis to 0. Setting axis to 1 would yield the mean of each row instead.
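The effect of the axis parameter is easiest to see from the shapes of the results. Below is a sketch on a hypothetical (7, 4) array; the result names are my own.

```python
import numpy as np

# Hypothetical week of measurements, one row per day (Monday to Sunday).
weather_data = np.array([
    [12.0, 0.0, 3.1, 0.0],
    [15.2, 2.4, 4.0, 0.0],
    [14.1, 5.6, 2.2, 0.0],
    [16.8, 0.0, 1.5, 0.0],
    [13.5, 1.2, 3.8, 0.0],
    [14.9, 0.0, 2.9, 0.0],
    [17.3, 8.0, 5.1, 0.0],
])

column_means = np.mean(weather_data, axis=0)  # one value per column
row_means = np.mean(weather_data, axis=1)     # one value per row (day)

print(column_means.shape)  # (4,)
print(row_means.shape)     # (7,)
print(column_means)
```

The row-wise means mix different units here, so for this dataset only the column-wise result is meaningful.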

Other statistical functions have the same or a similar syntax, and can be looked up in the Statistics section of the official Numpy site.
Bonus content – Speed advantages of Numpy
I’ve mentioned that Numpy also has some speed advantages, which probably got you curious. Let’s see that in action. Is Numpy really that much faster than a for loop?
First, we will create a random array of floats; let’s say it’s hourly air temperature measured at some location in the US. The length of the array is 30 000.

We want to convert those numbers to degrees Celsius. Let’s measure the time needed using a for loop over a Python list, and then using Numpy.
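One possible way to set up such a comparison is sketched below. The random value range, the seed and the use of time.perf_counter are my own assumptions, not necessarily the original setup, and the timings will differ from machine to machine.

```python
import time
import numpy as np

# 30 000 hypothetical temperatures in °F, as an array and as a plain list.
rng = np.random.default_rng(seed=42)
temps_f = rng.uniform(low=-10.0, high=110.0, size=30_000)
temps_f_list = temps_f.tolist()

# Variant 1: for loop over a Python list.
start = time.perf_counter()
temps_c_list = []
for temp in temps_f_list:
    temps_c_list.append((temp - 32) / 1.8)
loop_seconds = time.perf_counter() - start

# Variant 2: one vectorised Numpy expression.
start = time.perf_counter()
temps_c_array = (temps_f - 32) / 1.8
numpy_seconds = time.perf_counter() - start

print(f"for loop: {loop_seconds * 1000:.2f} ms")
print(f"Numpy:    {numpy_seconds * 1000:.2f} ms")
```

A single run like this is noisy; for more reliable numbers, repeat the measurement several times (for example with the timeit module).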

So, the loop over a Python list took around 5 ms, while the same operation on a Numpy array took less than 1 ms. In this specific case (task), Numpy is around 5 times faster.
Also, please keep in mind that this test does not generalise completely: the result really depends on the speed of your computer (mostly the CPU) and the chosen task. Since this is not the main topic of the article, I’ll leave it to you to check other articles that cover the speed benefits of Numpy and, hopefully, to try it out yourself on your specific task with your data.
Conclusion
Congrats, we have covered the basics of Numpy. But please keep in mind that Numpy is much more than that, and practice is very important. Here, I’ve presented mainly the functions that I use very often in my tasks and projects, but there are a lot more functions and methods in Numpy.
Make sure you check the official Numpy site for more useful possibilities that may be more related to your projects or tasks. Don’t get discouraged: all of this syntax seems a little complex at first, but believe me, you will pick it up sooner than you think. 🙂
My next article will cover another popular library in Data Science applications, called Pandas. Stay tuned, and until next time keep practicing, because practice makes perfect.
For any questions or suggestions regarding this article or my other articles on Medium, feel free to contact me via LinkedIn.
Thank you for taking the time, Cheers! 🙂