
Introducing NumPy, Part 2: Indexing Arrays

Slicing and dicing like a pro

Quick Success Data Science

Indexing an array by DALL-E3

NumPy is Python’s foundational library for numerical calculations. With NumPy, the heavy lifting is handled by arrays, essentially tables of elements of the same data type. Arrays are optimized for performance, permitting faster mathematical and logical operations than traditional Python data types, like lists.

In Part 1, we covered how to create arrays, describe them, and access their attributes using dot notation. In this article, we’ll examine how to access the elements in arrays using indexes and slices, so you can extract the value of elements and change them using assignment statements. Array indexing uses square brackets [], just like Python lists.


Array Dimensions and Axes

As a refresher from Part 1, here is a graphical representation of a 1D, 2D, and 3D array, with the axes annotated. You’ll need to understand the axes’ directions to index properly.

Graphical representation of 1D, 2D, and 3D arrays (from Python Tools for Scientists) (This and several future links to my book are affiliate links)

Indexing and Slicing 1D Arrays

One-dimensional arrays are zero-indexed, so the first index is always 0. When indexing from the end of the array, the last element has an index of -1. The following figure describes the indexes of five elements in an array:

The indexes of a 1D array (from Python Tools for Scientists)

If you’re familiar with list indexing, you won’t have problems indexing 1D arrays. Let’s look at some examples in which we select elements using both positive and negative indexing:

In [1]: import numpy as np

In [2]: arr1d = np.array([15, 16, 17, 18, 19, 20])

In [3]: arr1d[0]
Out[3]: 15

In [4]: arr1d[-6]
Out[4]: 15

In [5]: arr1d[-1]
Out[5]: 20

To access every other element in the array, include a step value of 2:

In [6]: arr1d[::2]
Out[6]: array([15, 17, 19])
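
The step value is the third part of the general start:stop:step slice form. As a minimal sketch (the variable names here are illustrative and not part of the session above), the following shows a bounded slice, a stepped slice with a start offset, and a negative step that reverses the array:

```python
import numpy as np

arr = np.array([15, 16, 17, 18, 19, 20])

# General form is arr[start:stop:step]; the stop index is exclusive.
middle = arr[1:5]         # elements at indexes 1 through 4
every_other = arr[1::2]   # start at index 1, then take every second element
reversed_arr = arr[::-1]  # a negative step walks the array backward

print(middle)        # [16 17 18 19]
print(every_other)   # [16 18 20]
print(reversed_arr)  # [20 19 18 17 16 15]
```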

To access multiple elements at once, pass a list of indexes inside the square brackets, as follows:

In [7]: arr1d[[0, 2, 4]]
Out[7]: array([15, 17, 19])

After you’ve selected these elements, you can assign them a new value and change the values in the underlying array, like this:

In [8]: arr1d[[0, 2, 4]] = 0

In [9]: arr1d
Out[9]: array([ 0, 16,  0, 18,  0, 20])
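
One detail is worth checking for yourself: extracting elements with a list of indexes gives you a new array, whereas assigning through the same index list edits the source array in place. The sketch below uses illustrative names (arr, picked) to show the difference:

```python
import numpy as np

arr = np.array([15, 16, 17, 18, 19, 20])

picked = arr[[0, 2, 4]]  # extraction with an index list returns a new array
picked[:] = -1           # changes only the extracted copy
print(arr)               # [15 16 17 18 19 20]

arr[[0, 2, 4]] = 0       # assignment through the index list edits arr in place
print(arr)               # [ 0 16  0 18  0 20]
```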

You can also assign new values to a group of array elements with array slices. In this next example, we use slicing to change the first three elements to a value of 100:

In [10]: arr1d[:3] = 100

In [11]: arr1d
Out[11]: array([100, 100, 100,  18,   0,  20])

In the previous example, the value of 100 was propagated across the entire slice. This process is known as broadcasting. Because array slices are views of the source array rather than copies, any changes to the view will modify the original array. This is advantageous when working with very large arrays, as it keeps NumPy from making memory-intensive copies on the fly.

A view and the original array share the same data buffer. Any modifications made to the view will affect the original array and vice versa.
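
The “vice versa” part is easy to verify. In this small sketch (the names source and view are illustrative), a change made to the original array shows up in the view, and a change made to the view shows up in the original:

```python
import numpy as np

source = np.array([0, 1, 2, 3, 4])
view = source[3:]  # a view of the last two elements

source[4] = 99     # modify the original array...
print(view)        # [ 3 99]  ...and the view reflects it

view[0] = -7       # modify the view...
print(source)      # [ 0  1  2 -7 99]  ...and the original reflects it
```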

Note that this assignment behavior persists even when array slices are assigned to a variable:

In [12]: arr1d = np.array([0, 1, 2, 3, 4])

In [13]: a_slice = arr1d[3:]

In [14]: a_slice
Out[14]: array([3, 4])

In [15]: a_slice[0] = 666

In [16]: arr1d
Out[16]: array([  0,   1,   2, 666,   4])

In [17]: a_slice[:] = 42

In [18]: arr1d
Out[18]: array([ 0,  1,  2, 42, 42])

Because the slice is an array, it has its own set of indexes. Thus, a_slice[:] corresponds to arr1d[3:].

To make an actual copy rather than a view, call the copy() method, as shown here:

In [19]: a_slice = arr1d[1:3].copy()

In [20]: a_slice[:] = 55

In [21]: a_slice
Out[21]: array([55, 55])

In [22]: arr1d
Out[22]: array([ 0,  1,  2, 42, 42])

Now, the a_slice array is separate from arr1d, and changing its elements does not affect the source array.

Alternatively, you can first call the array function on the slice and then mutate the result:

In [23]: a_slice = np.array(arr1d[:])

In [24]: a_slice[:] = 55

In [25]: arr1d
Out[25]: array([ 0,  1,  2, 42, 42])

Changing the a_slice array did not affect arr1d, because the arrays represent separate objects.
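
If you’re ever unsure whether you’re holding a view or a copy, NumPy can tell you. The following sketch, with illustrative variable names, uses np.shares_memory() and the base attribute to compare a plain slice against the two copying approaches shown above:

```python
import numpy as np

arr = np.array([0, 1, 2, 42, 42])

a_view = arr[1:3]                # basic slice: a view
a_copy = arr[1:3].copy()         # explicit copy
another_copy = np.array(arr[:])  # np.array() copies its input by default

print(np.shares_memory(arr, a_view))        # True
print(np.shares_memory(arr, a_copy))        # False
print(np.shares_memory(arr, another_copy))  # False
print(a_view.base is arr)                   # True: the view points back to arr
print(a_copy.base is None)                  # True: the copy owns its own data
```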


Indexing and Slicing 2D Arrays

Two-dimensional arrays are indexed with a pair of values. These value pairs resemble Cartesian coordinates, except that the row index (the axis-0 value) comes before the column index (the axis-1 value), as shown in the following figure. Square brackets are used again.

Indexes of a 2D array (from Python Tools for Scientists)

Let’s create the 2D array in the previous figure to study this further (for a refresher on how to create arrays see Part 1):

In [26]: arr2d = np.arange(1, 10).reshape(3, 3)

In [27]: arr2d
Out[27]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In a 2D array, each value in the index pair references a 1D array (a whole row or column) rather than a single element. For example, specifying an integer index of 1 outputs the 1D array that comprises the second row of the 2D array:

In [28]: arr2d[1]
Out[28]: array([4, 5, 6])

Slicing a 2D array also works along 1D arrays. Here we slice over rows, taking the last two:

In [29]: arr2d[1:3]
Out[29]: 
array([[4, 5, 6],
       [7, 8, 9]])

This produced a 2D array of shape (2, 3), meaning 2 rows and 3 columns.

To obtain a whole column in the 2D array, use the following syntax:

In [30]: arr2d[:, 1]
Out[30]: array([2, 5, 8])

The colon (:) tells NumPy to take all the rows; the 1 selects only column 1, leaving you with only a 1D array from the center column of arr2d.

You can also extract a column with the following syntax, though in this case, rather than outputting a 1D array containing the column’s values, you generate a 2D array of shape (3, 1):

In [31]: arr2d[:, 1:2]
Out[31]: 
array([[2],
       [5],
       [8]])

In [32]: arr2d[:, 1:2].shape
Out[32]: (3, 1)

As a rule of thumb, if you slice a 2D array using a mixture of integer indexes and slices, you’ll get a 1D array. If you slice along both axes, you’ll get another 2D array. For a reference, see the next figure, which shows the results of using various expressions to sample a 2D array:

Example slices through a 2D array (from Python Tools for Scientists)
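
The rule of thumb above can be confirmed by checking the shape of each result. This is a minimal sketch that rebuilds the same 3×3 array and prints the shape (or number of dimensions) produced by each kind of expression:

```python
import numpy as np

arr2d = np.arange(1, 10).reshape(3, 3)

print(arr2d[1, :].shape)    # (3,)    integer + slice   -> 1D
print(arr2d[:, 1].shape)    # (3,)    slice + integer   -> 1D
print(arr2d[1:2, :].shape)  # (1, 3)  slice + slice     -> 2D
print(arr2d[:2, 1:].shape)  # (2, 2)  slice + slice     -> 2D
print(arr2d[1, 1].ndim)     # 0       integer + integer -> single element
```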

As with 1D arrays, 2D slices are views of the array that you can use to modify the values in the source array. In this example, we select the middle column in our original arr2d array and change all its elements to 42:

In [33]: a2_slice = arr2d[:, 1]

In [34]: a2_slice
Out[34]: array([2, 5, 8])

In [35]: a2_slice[:] = 42

In [36]: arr2d
Out[36]: 
array([[ 1, 42,  3],
       [ 4, 42,  6],
       [ 7, 42,  9]])

To select individual elements from 2D arrays, specify a pair of integers as the element’s indexes. For example, to obtain the element from the intersection of the second row and second column, enter the following:

In [37]: arr2d[1, 1]
Out[37]: 42

Note that this syntax is a less cumbersome version of the more traditional nested list syntax in which each index is surrounded by brackets:

In [38]: arr2d[1][1]
Out[38]: 42

Indexing and Slicing Higher-Dimensional Arrays

The key to indexing and slicing arrays with more than two dimensions is to think of them as a series of stacked arrays of a lower dimension. We’ll refer to these stacked arrays as plans. As with 2D arrays, the order in which you index 3D arrays is determined by their shape tuples.

Let’s start by looking at a 3D array with a shape of (2, 3, 4). You can think of the first value in the shape tuple as the number of 2D arrays within that 3D array. The next two numbers are treated as the shape tuple for these 2D arrays, representing their rows and columns, respectively. Here’s an example:

In [39]: arr3d = np.arange(24).reshape(2, 3, 4)

In [40]: arr3d
Out[40]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

When you look at the output, you should see two separate 2D arrays of shape (3, 4) stacked one atop the other. These are delineated by a space in the output and a new set of square brackets around the second 2D array.

Because the array contains two matrices, the 3D component to the shape tuple is 2. This number comes first, so you can think of the shape tuple as recording the number of plans, rows, and columns.

To see how this works, let’s use indexes to retrieve the value 20 in the array. We can use the array’s shape tuple (plans, rows, columns) to guide us:

In [41]: arr3d[1, 2, 0]
Out[41]: 20

First, we had to choose the second 2D array, which has an index of 1 because Python starts counting at 0. Next, we selected the third row using 2. Finally, we selected the first column using 0. The key is to work your way through the shape tuple in order. The number of dimensions in the array tells you how many indexes you’ll need (three for a 3D array, four for a 4D array, and so on).

Slicing also follows the order of the shape tuple. For example, to get a view of the arr3d array’s lower 2D array, you would enter 1 for the plan and then use the colon shorthand notation to select all of its rows and columns:

In [42]: arr3d[1, :, :]
Out[42]: 
array([[12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

For reference, the following figure shows some example slices through a 3D array, along with the resulting shapes:

Some example slices through a 3D array (from Python Tools for Scientists)

Notice how the first index references the highest dimension. For a 2D array made of rows and columns, the first axis (0) is for rows. In a 3D array, the first axis (0) is for plans, and the second axis (1) is for rows.
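
To get a feel for the (plans, rows, columns) ordering, it can help to print the shapes that different expressions return. The sketch below builds a fresh (2, 3, 4) array and samples it along each axis:

```python
import numpy as np

arr3d = np.arange(24).reshape(2, 3, 4)  # (plans, rows, columns)

print(arr3d[0].shape)          # (3, 4)     one plan: all its rows and columns
print(arr3d[:, 1, :].shape)    # (2, 4)     row 1 from every plan
print(arr3d[:, :, 2].shape)    # (2, 3)     column 2 from every plan
print(arr3d[:, 1:, :2].shape)  # (2, 2, 2)  slices on every axis keep 3 dimensions
print(arr3d[:, 1, 2])          # [ 6 18]    row 1, column 2 of each plan
```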

As always, changing the values of elements in a slice will change the source array, unless the slice is a copy:

In [43]: arr3d[0, :, :] = 0

In [44]: arr3d
Out[44]: 
array([[[ 0,  0,  0,  0],
        [ 0,  0,  0,  0],
        [ 0,  0,  0,  0]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

Before we move on, let’s practice indexing and slicing an array with more than three dimensions. For example, look at the following 4D array:

In [45]: arr4d = np.arange(24).reshape(2, 2, 2, 3)

In [46]: arr4d
Out[46]: 
array([[[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]]],


       [[[12, 13, 14],
         [15, 16, 17]],

        [[18, 19, 20],
         [21, 22, 23]]]])

Note how the array starts with four square brackets and uses two blank lines to separate the two stacked 3D arrays. Because we’re dealing with a 4D array, to select the 20 element, you will need to enter four indexes:

In [47]: arr4d[1, 1, 0, 2]
Out[47]: 20

Here, from left to right, you indexed a 4D array to a 3D array; a 3D array to a 2D array; a 2D array to a 1D array; and a 1D array to a single element. This might be more obvious in the next figure, which demonstrates stepping through these in order:

Indexing a 4D array down to a single element at [1, 1, 0, 2] (from Python Tools for Scientists)

This style of ordering will hold for any number of dimensions.
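
The same stepwise reduction can be written out in code. In this sketch, each intermediate result (step1 through step4 are illustrative names) drops one dimension until only the element 20 remains:

```python
import numpy as np

arr4d = np.arange(24).reshape(2, 2, 2, 3)

step1 = arr4d[1]  # 4D -> 3D, shape (2, 2, 3)
step2 = step1[1]  # 3D -> 2D, shape (2, 3)
step3 = step2[0]  # 2D -> 1D, shape (3,)
step4 = step3[2]  # 1D -> single element

print(step1.shape, step2.shape, step3.shape)  # (2, 2, 3) (2, 3) (3,)
print(step4)                                  # 20
print(step4 == arr4d[1, 1, 0, 2])             # True
```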


Boolean Indexing

In addition to using numerical indexing and slicing, you can select elements in arrays using conditions and Boolean operators. This lets you extract elements without any prior knowledge of where they are in the array. For example, you might have hundreds of monitor wells around a landfill, and you want to find all the wells that detect the pollutant toluene above a certain threshold value. With Boolean indexing, not only can you identify these wells, but you can also create a new array based on the output.

To illustrate, the following condition searches an array for any elements that are integers greater than or equal to four:

In [48]: arr1d = np.array([1, 2, 3, 4, 5])

In [49]: print(arr1d >= 4)
[False False False  True  True]

As you can see, NumPy returns an array of Boolean values containing True values where the condition is satisfied. This syntax works for ndarrays of any dimension.

NumPy can also use the Booleans behind the scenes, allowing you to slice an array based on a conditional:

In [50]: a_slice = arr1d[arr1d >= 4]

In [51]: a_slice
Out[51]: array([4, 5])
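
Returning to the monitor well scenario, here is a rough sketch of how a Boolean mask built from one array can pull matching entries out of another. The well IDs, readings, and threshold are all made up for illustration:

```python
import numpy as np

# Hypothetical data: toluene readings (in made-up units) for six wells.
well_ids = np.array([101, 102, 103, 104, 105, 106])
toluene = np.array([0.2, 1.7, 0.4, 2.3, 0.9, 1.1])
threshold = 1.0  # assumed cutoff, for illustration only

over_limit = toluene > threshold   # Boolean mask, one entry per well
print(well_ids[over_limit])        # [102 104 106]
print(toluene[over_limit])         # [1.7 2.3 1.1]
print(np.flatnonzero(over_limit))  # [1 3 5]  positions of the flagged wells
```

Because the mask has the same length as both arrays, it can index either one, which is what lets you identify the wells and collect their readings in a new array.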

Comparing two arrays also produces a Boolean array. In this example, we flag as True all the values in arr_2 that are greater than those in arr_1. Because the arrays hold random values, your output will differ:

In [52]: arr_1 = np.random.randn(3, 4)

In [53]: arr_2 = np.random.randn(3, 4)

In [54]: arr_2 > arr_1
Out[54]: 
array([[ True,  True, False,  True],
       [ True, False, False, False],
       [False, False,  True,  True]])

A common use of Boolean indexing is to partition a grayscale image into foreground and background segments, a process called thresholding. This produces a binary image based on a cutoff value. Here’s an example in which we create a 2D image array and then threshold on values above 4:

In [55]: img = np.array([
    ...:     [12, 13, 14, 4, 16, 1, 11, 10, 9],
    ...:     [11, 14, 12, 3, 15, 1, 10, 12, 11],
    ...:     [10, 12, 12, 1, 14, 3, 10, 12, 12],
    ...:     [ 9, 11, 16, 0, 4, 2, 3, 12, 10],
    ...:     [12, 11, 16, 14, 10, 2, 16, 12, 13],
    ...:     [10, 15, 16, 14, 14, 4, 16, 15, 12],
    ...:     [13, 17, 14, 10, 14, 1, 14, 15, 10]])

In [56]: img_thresh = (img > 4).astype(int)

Remember that True evaluates to 1, and False evaluates to 0. This lets us convert a Boolean array to a numerical array by tacking on the astype() method and passing it the integer data type.

After thresholding, the 0 values in the new array should form the number 4:

In [57]: print(img_thresh)
[[1 1 1 0 1 0 1 1 1]
 [1 1 1 0 1 0 1 1 1]
 [1 1 1 0 1 0 1 1 1]
 [1 1 1 0 0 0 0 1 1]
 [1 1 1 1 1 0 1 1 1]
 [1 1 1 1 1 0 1 1 1]
 [1 1 1 1 1 0 1 1 1]]
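
As an alternative sketch (not the approach used above), np.where() can build the same kind of binary array while letting you choose the two output values explicitly. The small array here is a stand-in, not the article’s image:

```python
import numpy as np

img_small = np.array([[12, 4, 16],
                      [3, 15, 1],
                      [9, 0, 11]])  # small stand-in image for illustration

# np.where(condition, x, y) picks x where the condition is True and y elsewhere.
binary = np.where(img_small > 4, 1, 0)
print(binary)
# [[1 0 1]
#  [0 1 0]
#  [1 0 1]]

same = (img_small > 4).astype(int)
print(np.array_equal(binary, same))  # True
```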

To assign values based on a Boolean array, you index the source array based on a conditional and then assign a value. Here, we assign 0 to all the elements in the array with a value less than 5:

In [58]: img[img < 5] = 0

In [59]: img
Out[59]: 
array([[12, 13, 14,  0, 16,  0, 11, 10,  9],
       [11, 14, 12,  0, 15,  0, 10, 12, 11],
       [10, 12, 12,  0, 14,  0, 10, 12, 12],
       [ 9, 11, 16,  0,  0,  0,  0, 12, 10],
       [12, 11, 16, 14, 10,  0, 16, 12, 13],
       [10, 15, 16, 14, 14,  0, 16, 15, 12],
       [13, 17, 14, 10, 14,  0, 14, 15, 10]])

Likewise, you can change entire rows, columns, and plans in an array using ordinary indexing. For example, img[0] = 0 changes all the elements in the first row of the img array to 0.

The use of Booleans in arrays involves a few quirks. Extracting elements from an array using Boolean indexing creates a copy of the data by default, meaning that there is no need to use the copy() method. Another idiosyncrasy of Boolean arrays is that you must replace the and and or keywords with the & and | characters, respectively, when combining comparison statements, and wrap each comparison in parentheses.
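
Both quirks can be seen in a few lines. In this minimal sketch (the variable names are illustrative), two conditions are combined with & and |, and the extracted result is then modified to show that the source array is left alone:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Wrap each comparison in parentheses: & and | bind more tightly than >= or <.
in_range = arr[(arr >= 2) & (arr < 5)]
outside = arr[(arr < 2) | (arr > 5)]
print(in_range)  # [2 3 4]
print(outside)   # [1 6]

in_range[:] = 0  # the Boolean selection is a copy...
print(arr)       # [1 2 3 4 5 6]  ...so arr is unchanged
```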


Test Your Knowledge

Testing yourself on newly acquired knowledge is a great way to lock in what you’ve learned. Here’s a quick quiz to help you on your way. The answers are at the end of the article.

Question 1: Create a 2D ndarray of size 30 and shape (5, 6). Then, slice the array to sample the values highlighted in gray:

Test question 1 (by author)

Question 2: Now, resample the array from Question 1 to retrieve the elements highlighted in gray:

Test question 2 (by author)

Question 3: Slicing an ndarray produces:

a. A new array object

b. A copy of the source array

c. A view of the source array

d. A Python list object

Question 4: Slicing a 2D array with a combination of a scalar index and another slice produces:

a. A 2D array

b. A 1D array

c. A single element (0D array)

d. None of the above

Question 5: What is the rank of this array:

array([[[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7]],

        [[ 8,  9, 10, 11],
         [12, 13, 14, 15]]],


       [[[16, 17, 18, 19],
         [20, 21, 22, 23]],

        [[24, 25, 26, 27],
         [28, 29, 30, 31]]]])

Further Reading

If you’re a beginner curious about Python’s essential libraries, like NumPy, Matplotlib, pandas, and more, check out my latest book, Python Tools for Scientists (it’s not just for scientists):

Python Tools for Scientists: An Introduction to Using Anaconda, JupyterLab, and Python’s Scientific…


Answers to Quiz

1.

In [1]: import numpy as np

In [2]: arr2d = np.arange(30).reshape(5, 6)

In [3]: arr2d[::2]
Out[3]: 
array([[ 0,  1,  2,  3,  4,  5],
       [12, 13, 14, 15, 16, 17],
       [24, 25, 26, 27, 28, 29]])

2.

In [4]: arr2d[1::2, 1::2]
Out[4]: 
array([[ 7,  9, 11],
       [19, 21, 23]])

In [5]: # also:

In [6]: arr2d[1:5:2, 1:6:2]
Out[6]: 
array([[ 7,  9, 11],
       [19, 21, 23]])

3. c

4. b

5. 4 (the array has four dimensions)


