
Overview of Your Journey
- Setting the Stage
- 1 – Quick Filtering
- 2 – Reshaping Yourself Out of Trouble
- 3 – Restructuring Your Shape
- 4 – Find Unique Values
- 5 – Combine Arrays
- Wrapping Up
Setting the Stage
When doing Data Science in Python, the package NumPy is omnipresent. Whether you are developing machine learning models with Scikit-Learn or plotting in Matplotlib, you’re sure to have a few NumPy arrays lying around in your code.
When I started with data science in Python, I had a poor grasp of what could be done with NumPy. Over the years, I have sharpened my NumPy skills and become a better data scientist because of it.
Being good at manipulating NumPy arrays can save your life… or at least an hour of frustrating searching. The five NumPy functions I give you here can help you when things get tough 🔥
Throughout the blog post, I assume you have installed NumPy and have already imported it with the alias np:
import numpy as np
I recommend having some prior experience with NumPy before reading this blog post. If you are completely new to NumPy, then you can check out NumPy’s Beginners Guide or this YouTube video series on NumPy.
1 – Quick Filtering
You can use the where function to quickly filter an array based on a condition. Say you have an audio signal represented as a one-dimensional array:
# Audio Signal (in Hz)
signal = np.array([23, 50, 900, 12, 1100, 10, 2746, 9, 8])
Let’s say that you want to filter out every value in signal below 20 Hz by setting it to zero. To do this efficiently in NumPy, you can write:
# Filter the signal
filtered_signal = np.where(signal >= 20, signal, 0)
# Print out the result
print(filtered_signal)
>>> [  23   50  900    0 1100    0 2746    0    0]
The where function takes three arguments:
- The first argument (in our example signal >= 20) gives the condition you want to use for the filtering.
- The second argument (in our example signal) specifies the values to use where the condition is satisfied.
- The third argument (in our example 0) specifies the values to use where the condition is not satisfied.
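Note that np.where keeps the array at its original length and writes 0 into the rejected positions. If you want to drop those values entirely, a Boolean mask is one common option. The sketch below compares the two approaches on the same signal:

```python
import numpy as np

# Audio signal (in Hz), same values as above
signal = np.array([23, 50, 900, 12, 1100, 10, 2746, 9, 8])

# Boolean mask: True wherever the value is at least 20 Hz
mask = signal >= 20

# np.where keeps the length and fills rejected positions with 0
zeroed = np.where(mask, signal, 0)

# Boolean indexing removes the rejected positions entirely
dropped = signal[mask]

print(zeroed.shape, dropped.shape)  # (9,) (5,)
print(dropped.tolist())  # [23, 50, 900, 1100, 2746]
```

Which one to use depends on whether later code expects the values to stay aligned with their original positions.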
As a second example, assume you have a Boolean array high_pitch indicating whether the pitch of each sound should be raised:
# Audio Signal (in Hz)
signal = np.array([23, 50, 900, 760, 12])
# Whether to raise the pitch of each sound
high_pitch = np.array([True, False, True, True, False])
To raise the pitch of signal wherever the corresponding entry of high_pitch is True, you can simply write:
# Creating a high-pitch signal
high_pitch_signal = np.where(high_pitch, signal + 1000, signal)
# Printing out the result
print(high_pitch_signal)
>>> [1023   50 1900 1760   12]
That was easy 😃
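The condition passed to np.where can be any Boolean array, so you can combine comparisons with & (and) and | (or). As an illustration, assume you want to keep only the values between 20 Hz and 2000 Hz. The upper bound is an arbitrary choice for this sketch:

```python
import numpy as np

signal = np.array([23, 50, 900, 12, 1100, 10, 2746, 9, 8])

# Parentheses are required: & binds more tightly than >= and <=
in_band = (signal >= 20) & (signal <= 2000)

# Keep in-band values and set everything else to 0
band_passed = np.where(in_band, signal, 0)
print(band_passed.tolist())  # [23, 50, 900, 0, 1100, 0, 0, 0, 0]
```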
2 – Reshaping Yourself Out of Trouble
Often you have an array with the correct elements but the wrong shape. More specifically, assume you have the following one-dimensional array:
my_array = np.array([5, 3, 17, 4, 3])
print(my_array.shape)
>>> (5,)
Here you can see that the array is one-dimensional. What if you want to feed my_array into another function that expects a two-dimensional input? This happens surprisingly often with libraries like Scikit-Learn! To do this, you can use the reshape function:
my_array = np.array([5, 3, 17, 4, 3]).reshape(5, 1)
print(my_array.shape)
>>> (5, 1)
Now my_array is properly two-dimensional. You can think of my_array as a matrix with five rows and a single column.
If you want to go back to my_array being one-dimensional, then you can write:
my_array = my_array.reshape(5)
print(my_array.shape)
>>> (5,)
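Hard-coding the length 5 in reshape(5, 1) breaks as soon as the array changes size. NumPy lets you pass -1 for one dimension and infers that dimension from the total number of elements. Here is a small sketch:

```python
import numpy as np

my_array = np.array([5, 3, 17, 4, 3])

# -1 tells NumPy to infer this dimension from the array's size
column = my_array.reshape(-1, 1)
print(column.shape)  # (5, 1)

# Flatten back to one dimension without knowing the length
flat = column.reshape(-1)
print(flat.shape)  # (5,)
```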
Pro Tip: As a shorthand, you can use the NumPy function squeeze to remove all dimensions that have length one. Hence you could have used the squeeze function instead of the reshape function above.
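Here is a short illustration of squeeze, together with its counterpart np.expand_dims, which inserts a length-one axis at a chosen position:

```python
import numpy as np

column = np.array([5, 3, 17, 4, 3]).reshape(5, 1)

# squeeze removes every axis of length one: (5, 1) -> (5,)
flat = np.squeeze(column)
print(flat.shape)  # (5,)

# expand_dims inserts a length-one axis: (5,) -> (5, 1) or (1, 5)
print(np.expand_dims(flat, axis=1).shape)  # (5, 1)
print(np.expand_dims(flat, axis=0).shape)  # (1, 5)
```

One caveat is that squeeze removes all length-one axes. A (1, 1) array therefore becomes a zero-dimensional array with shape ().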
3 – Restructuring Your Shape
You will sometimes need to reshuffle the dimensions you already have. An example will make this clear:
Say you have represented an RGB image of size 1280×720 (this is the size of YouTube thumbnails) as a NumPy array called my_image. Your image has the shape (720, 1280, 3). The number 3 comes from the fact that there are 3 colour channels: red, green, and blue.
How do you rearrange my_image so that the RGB channels populate the first dimension? You can do that easily with the moveaxis function:
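# Placeholder image with height 720, width 1280, and 3 colour channels
my_image = np.zeros((720, 1280, 3))
# Axis 0 (height) -> 1, axis 1 (width) -> 2, axis 2 (channels) -> 0
restructured = np.moveaxis(my_image, [0, 1, 2], [1, 2, 0])
print(restructured.shape)
>>> (3, 720, 1280)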
With this simple command you have restructured the image. The two lists in moveaxis specify the source and destination positions of the axes: the axis at each position in the first list ends up at the matching position in the second list.
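Since only the channel axis really changes position, you can also pass single integers to moveaxis. The sketch below uses a zero-filled placeholder image, an assumption standing in for real pixel data. It also checks that moveaxis returns a view of the original array rather than a copy:

```python
import numpy as np

# Placeholder RGB image (all zeros): height 720, width 1280, 3 colour channels
my_image = np.zeros((720, 1280, 3), dtype=np.uint8)

# Move only the channel axis (the last one, -1) to the front
channels_first = np.moveaxis(my_image, -1, 0)
print(channels_first.shape)  # (3, 720, 1280)

# moveaxis returns a view, so the pixel data is not copied
print(np.shares_memory(my_image, channels_first))  # True

# Moving the axis back restores the original layout
channels_last = np.moveaxis(channels_first, 0, -1)
print(channels_last.shape)  # (720, 1280, 3)
```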
Pro Tip: NumPy has other functions such as swapaxes and transpose that also deal with restructuring arrays. The moveaxis function is the one I find most flexible and use most of the time.
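To make the comparison concrete, the sketch below moves the channels first on a small placeholder "image" using all three functions and checks that the results agree. transpose takes the complete new order of the axes, while swapaxes exchanges exactly two axes at a time:

```python
import numpy as np

# Small placeholder "image": height 4, width 6, 3 channels
image = np.arange(4 * 6 * 3).reshape(4, 6, 3)

# Channels first with each function
a = np.moveaxis(image, -1, 0)                    # move a single axis
b = np.transpose(image, (2, 0, 1))               # give the full new axis order
c = np.swapaxes(np.swapaxes(image, 0, 2), 1, 2)  # two pairwise swaps

print(a.shape, b.shape, c.shape)  # (3, 4, 6) (3, 4, 6) (3, 4, 6)
print(np.array_equal(a, b), np.array_equal(a, c))  # True True
```

For a simple pairwise swap, such as transposing a matrix, swapaxes or the .T attribute is often shorter.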
Why Are Reshaping and Restructuring Different?

Many people think that reshaping with the reshape function and restructuring with the moveaxis function are the same. Yet, they work in different ways 😦
The best way to see this is with an example: Say that you have the matrix:
matrix = np.array([[1, 2], [3, 4], [5, 6]])
# The matrix looks like this:
1 2
3 4
5 6
If you use the moveaxis function to switch the two axes, then you get:
restructured_matrix = np.moveaxis(matrix, [0, 1], [1, 0])
# The restructured matrix looks like this:
1 3 5
2 4 6
However, if you use the reshape function, then you get:
reshaped_matrix = matrix.reshape(2, 3)
# The reshaped matrix looks like this:
1 2 3
4 5 6
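The reshape function reads the elements row by row (1, 2, 3, 4, 5, 6) and fills the new shape in that same order, starting a new row whenever the current one is full. The order of the elements never changes, only the shape does. The moveaxis function, by contrast, changes which elements end up next to each other.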
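One way to check this difference is to flatten each result with ravel, which reads the elements in the same row-by-row order that reshape uses. Here is a sketch with the matrix above:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4], [5, 6]])

reshaped = matrix.reshape(2, 3)                     # same element order, new shape
restructured = np.moveaxis(matrix, [0, 1], [1, 0])  # the two axes swapped

# reshape keeps the row-by-row order of the elements...
print(matrix.ravel().tolist())    # [1, 2, 3, 4, 5, 6]
print(reshaped.ravel().tolist())  # [1, 2, 3, 4, 5, 6]

# ...while moveaxis changes which element follows which
print(restructured.ravel().tolist())  # [1, 3, 5, 2, 4, 6]
```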
4 – Find Unique Values
The unique function is a sweet utility function for finding the unique elements of an array. Say that you have an array representing the favourite cities of people sampled from a poll:
# Favorite cities
cities = np.array(["Paris", "London", "Vienna", "Paris", "Oslo", "London", "Paris"])
Then you can use the unique function to get the unique values in the array cities:
unique_cities = np.unique(cities)
print(unique_cities)
>>> ['London' 'Oslo' 'Paris' 'Vienna']
Notice that the unique cities are returned in sorted (here alphabetical) order, not in the order they originally appeared in. For example, Oslo comes before Paris even though Paris appears first in cities.
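If you need the unique values in order of first appearance instead, one option is the return_index argument. It gives the index where each unique value first occurs, and sorting those indices recovers the original order. Here is a sketch:

```python
import numpy as np

cities = np.array(["Paris", "London", "Vienna", "Paris", "Oslo", "London", "Paris"])

# Sorted unique values plus the index of each value's first occurrence
unique_sorted, first_index = np.unique(cities, return_index=True)
print(first_index.tolist())  # [1, 4, 0, 2]

# Picking the first occurrences in index order keeps the original order
in_order = cities[np.sort(first_index)]
print(in_order.tolist())  # ['Paris', 'London', 'Vienna', 'Oslo']
```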
With polls, it is really common to draw bar charts. In those charts, the categories are the poll options, while the height of each bar represents the number of votes that option got. To get that information, you can use the optional argument return_counts as follows:
unique_cities, counts = np.unique(cities, return_counts=True)
print(unique_cities)
>>> ['London' 'Oslo' 'Paris' 'Vienna']
print(counts)
>>> [2 1 3 1]
The unique function saves you from writing a lot of annoying loops 😍
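Once you have the two arrays, pairing them up takes very little code. The sketch below turns them into a dictionary and uses argmax to pick the most popular city. The same two arrays are also what you would hand to a bar-chart function.

```python
import numpy as np

cities = np.array(["Paris", "London", "Vienna", "Paris", "Oslo", "London", "Paris"])
unique_cities, counts = np.unique(cities, return_counts=True)

# Pair each city with its number of votes (converted to plain Python types)
votes = {str(city): int(count) for city, count in zip(unique_cities, counts)}
print(votes)  # {'London': 2, 'Oslo': 1, 'Paris': 3, 'Vienna': 1}

# The index of the largest count gives the most popular city
print(unique_cities[np.argmax(counts)])  # Paris
```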
5 – Combine Arrays
Sometimes, you will be working with many arrays at the same time. Then it is often convenient to combine the arrays into a single "master" array. Doing this in NumPy is easy with the concatenate function.
Let’s say that you have two one-dimensional arrays:
array1 = np.arange(10)
array2 = np.arange(10, 20)
Then you can combine them into a longer one-dimensional array with concatenate:
# Pass the arrays together as one sequence (a tuple here)
long_array = np.concatenate((array1, array2))
print(long_array)
>>> [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
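concatenate is not limited to two arrays. It joins any sequence of arrays in order, and a list works just as well as a tuple, provided the shapes match everywhere except along the joining axis. Here is an example with a hypothetical third array of a different length:

```python
import numpy as np

array1 = np.arange(10)
array2 = np.arange(10, 20)
array3 = np.arange(20, 25)  # hypothetical third array, only 5 elements long

# A list of arrays works like a tuple; 1-D arrays may differ in length
combined = np.concatenate([array1, array2, array3])
print(combined.shape)          # (25,)
print(combined[-5:].tolist())  # [20, 21, 22, 23, 24]
```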
Combining Our Tools
What if you wanted to stack array1 and array2 on top of each other? You are then looking to create a two-dimensional array that looks like this:
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
You can first reshape array1 and array2 into two-dimensional arrays with the reshape function:
array1 = array1.reshape(10, 1)
array2 = array2.reshape(10, 1)
Now you can use the optional axis parameter in the concatenate function to place them side by side:
stacked_array = np.concatenate((array1, array2), axis=1)
print(stacked_array)
>>>
[[ 0 10]
 [ 1 11]
 [ 2 12]
 [ 3 13]
 [ 4 14]
 [ 5 15]
 [ 6 16]
 [ 7 17]
 [ 8 18]
 [ 9 19]]
Almost there… You can now use the moveaxis function to finish the job:
stacked_array = np.moveaxis(stacked_array, [0, 1], [1, 0])
print(stacked_array)
>>>
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]

Awesome! I hope this example showed you how some of the different tools you have just learned can come together.
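For this particular task, NumPy also has shortcuts. np.stack joins arrays along a new axis, and np.vstack stacks one-dimensional arrays as rows. The sketch below starts from the original one-dimensional arrays and checks both shortcuts against the reshape, concatenate, and moveaxis route:

```python
import numpy as np

array1 = np.arange(10)
array2 = np.arange(10, 20)

# The route from above: reshape to columns, concatenate, swap the axes
manual = np.moveaxis(
    np.concatenate((array1.reshape(10, 1), array2.reshape(10, 1)), axis=1),
    [0, 1],
    [1, 0],
)

# np.stack adds a new first axis; np.vstack treats 1-D arrays as rows
stacked = np.stack((array1, array2), axis=0)
vstacked = np.vstack((array1, array2))

print(stacked.shape)  # (2, 10)
print(np.array_equal(manual, stacked), np.array_equal(manual, vstacked))  # True True
```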
Wrapping Up
You should now feel comfortable using NumPy for a few tricky situations. If you need to learn more about NumPy, then check out the NumPy documentation.
Like my writing? Check out my blog posts Type Hints, Formatting with Black, Underscores in Python, and 5 Dictionary Tips for more Python content. If you are interested in data science, programming, or anything in between, then feel free to add me on LinkedIn and say hi ✋