
The illustrated loc and iloc affair

A visual guide to index-based data selection in pandas

In this era of munching and crunching data efficiently, the pandas library has become the bread and butter of every data scientist. The library offers a plethora of data manipulation abilities, of which our focus here is index-based data selection in Series and DataFrame objects. Although the topic has received some hands-on attention in the past, special visual emphasis is placed here on the motives behind the existence of two well-known indexers, loc and iloc.

Image by author, made using diagrams.

To begin with, let us take a look at the illustration below, where we construct two Series objects named s_imp and s_exp. The suffixes imp and exp denote implicitly- and explicitly-indexed, respectively.

Illustrates implicitly- and explicitly-indexed series objects, s_imp and s_exp. Note that when displaying s_imp (left) Jupyter notebook outputs the implicit default index [0, 1, 2, 3] whereas in the case of s_exp (right), Jupyter notebook only outputs the explicit index [4, 5, 6, 7] and the data ['a', 'b', 'c', 'd'] as output. The implicit (default) integer-based index, surrounded by the dashed-line ellipse, is illustrated solely for the purpose of clarity. Image by author, made using diagrams.
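For readers who prefer code to pictures, the construction in the illustration can be sketched roughly as follows. This is a minimal sketch that assumes the data ['a', 'b', 'c', 'd'] and the explicit index [4, 5, 6, 7] named in the caption; the variable `data` is only a helper introduced here.

```python
import pandas as pd

data = ["a", "b", "c", "d"]  # helper name, not from the article

# No index is passed, so pandas generates the default integer index 0..3.
s_imp = pd.Series(data)

# An index is passed by the user, so the labels become 4..7.
s_exp = pd.Series(data, index=[4, 5, 6, 7])

print(list(s_imp.index))  # [0, 1, 2, 3]
print(list(s_exp.index))  # [4, 5, 6, 7]
```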

As shown above, both Series objects, by default, comprise an implicitly generated integer-based index starting at 0. In the case of s_imp, the default implicit index is its only index, whereas s_exp ends up with two: an implicit default integer-based index starting at 0 and an explicit user-defined index starting at 4. At first, this appears to be fine; however, it can lead to unnecessary confusion when performing index-based data selections. For example, let us take the earlier defined Series object s_exp and perform a couple of operations such as indexing and slicing, as illustrated below.

Illustrates the indexing and slicing operation (left) using an explicitly-indexed series object, s_exp, together with the cases of confusion (right). Note that the implicit default integer-based index, surrounded by a dashed-line ellipse, is illustrated solely for the purpose of clarity. Image by author, made using diagrams.
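The mixed behaviour of plain square-bracket access can be reproduced with a few lines. The sketch below assumes the same s_exp as in the caption and reflects the behaviour of the pandas versions this article describes; the names `first` and `middle` are introduced here purely for illustration.

```python
import pandas as pd

s_exp = pd.Series(["a", "b", "c", "d"], index=[4, 5, 6, 7])

# A single integer key is looked up in the explicit index (the labels).
first = s_exp[4]       # 'a'

# An integer slice, however, is applied to the positions 0..3.
middle = s_exp[1:3]    # values 'b' and 'c', carrying the labels 5 and 6

# The confusing case: 1 is a valid position but not a valid label.
try:
    s_exp[1]
except KeyError:
    print("s_exp[1] raises KeyError: there is no label 1")
```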

As shown in the above illustration, s_exp makes use of the explicit user-defined index [4, 5, 6, 7] for indexing and the implicit default index [0, 1, 2, 3] for slicing. Although these implicit-explicit index choices are made under the hood, remembering which index each operation uses can be confusing. It is for this particular reason that pandas provides two separate indexers 🎉 loc and iloc: one for using the explicit user-defined index (loc) and the other for using the implicit default index (iloc). Having separate indexers helps avoid needless confusion, as illustrated below.

Illustrates the indexing and slicing operations using the loc and iloc indexers. On a closer look at the term iloc, the 'i' could imply either implicit or integer-based. Similarly, the term 'loc' could be thought of as a short form of 'locator'. Here we choose to call 'iloc' an implicit indexer. Note that the implicit default integer-based index in the bottom-right cell is illustrated solely for the purpose of clarity. Image by author, made using diagrams.
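In code, the two indexers might be used as sketched below. The exact keys (4, 4:5, 0, 0:2) are an assumption about what the illustration shows, chosen to be consistent with the surrounding text; the result names are introduced here for illustration only.

```python
import pandas as pd

s_exp = pd.Series(["a", "b", "c", "d"], index=[4, 5, 6, 7])

# loc always works on the explicit, user-defined index (labels).
item_by_label = s_exp.loc[4]        # 'a'
slice_by_label = s_exp.loc[4:5]     # 'a', 'b' (the end label 5 is included)

# iloc always works on the implicit, zero-based positions.
item_by_pos = s_exp.iloc[0]         # 'a'
slice_by_pos = s_exp.iloc[0:2]      # 'a', 'b' (the end position 2 is excluded)

print(list(slice_by_label), list(slice_by_pos))
```

Both slices select the same two elements, but each states unambiguously which index it refers to.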

Additionally, on a closer look at the slicing operation above, you can observe that the loc indexer includes the last index 5, whereas the iloc indexer complies with the standard slicing rule, i.e., it does not include the last index 2. This is quite handy because explicit indices can also be strings, for which there is no obvious "next" label to serve as an exclusive end point, in contrast to the implicit indices, which are by default integer-based starting from 0. See the illustration below, where a slicing operation is performed on a series with a string-based index.

Illustrates the slicing operations using the loc and iloc indexers (locators) for a series with an explicitly defined string-based index (label). Note the slicing operation using loc. The loc includes the label 'three' whereas the slicing operation using an iloc indexer ignores the corresponding integer-based index 2. The implicit default integer-based index, surrounded by the dashed-line ellipse, is illustrated solely for the purpose of clarity. Image by author, made using diagrams.
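A small sketch of the string-label case follows. Only the label 'three' is named in the caption; the other labels, the values and the name `s_str` are assumptions made here for illustration.

```python
import pandas as pd

# Hypothetical series with a string-based explicit index.
s_str = pd.Series([10, 20, 30, 40], index=["one", "two", "three", "four"])

# Label-based slice: the end label 'three' is part of the result.
by_label = s_str.loc["one":"three"]   # values 10, 20, 30

# Position-based slice: the end position 2 is not part of the result.
by_pos = s_str.iloc[0:2]              # values 10, 20

print(list(by_label.index))  # ['one', 'two', 'three']
print(list(by_pos.index))    # ['one', 'two']
```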

So far, the above illustrations focused on Series objects alone. But, what about implicit-explicit indices in a DataFrame object? And, how are loc and iloc indexers applicable to dataframes?

Technically, a simple two-dimensional DataFrame object is a collection of Series objects, where each series is given a name tag, aka label, and, let's say for visual purposes, the series are placed next to each other. As a result, we get a two-dimensional data structure with rows and columns, where each row has an index and each column has a label. However, for consistency, let's stick with column index instead of column label. Now, as a DataFrame object builds on Series objects, the implicit and explicit indices showcased in the earlier illustrations are also present in dataframes. In fact, an explicitly-indexed DataFrame object comprises two pairs of indices: an implicit-explicit row index pair and an implicit-explicit column index pair. The two illustrations below showcase the differences between an implicitly- and an explicitly-indexed dataframe. The first presents an implicitly-indexed dataframe that has a default integer-based row and column index, whereas the second presents an explicitly-indexed dataframe with implicit-explicit index pairs, i.e., an implicit default integer-based index and an explicit user-defined index.

Illustrates an implicitly-indexed dataframe in detail. Image by author, made using diagrams.
Illustrates an explicitly-indexed (labelled) dataframe where the explicit row and column index is of string-type and the implicit default row and column index is integer-based. Note that the implicit row and column index, surrounded by the dashed-line ellipse, is illustrated solely for the purpose of clarity. Image by author, made using diagrams.
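The two dataframes could be built along the following lines. The names df_imp and df_exp and the labels 'r1' and 'c2' appear in the article; the 3x3 shape, the remaining labels and the values are assumptions made here for illustration.

```python
import numpy as np
import pandas as pd

values = np.arange(9).reshape(3, 3)  # hypothetical data: 0..8 in a 3x3 grid

# Implicitly indexed: pandas generates 0..2 for both rows and columns.
df_imp = pd.DataFrame(values)

# Explicitly indexed: string labels for rows and columns.
df_exp = pd.DataFrame(
    values,
    index=["r1", "r2", "r3"],
    columns=["c1", "c2", "c3"],
)

print(list(df_imp.index), list(df_imp.columns))  # [0, 1, 2] [0, 1, 2]
print(list(df_exp.index), list(df_exp.columns))
```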

When working with an explicitly defined string-based row and column index, as showcased above, the term index is often swapped with label, leading to terms like row label and column label in place of row index and column index. Furthermore, renaming the keyword arguments index and columns in the DataFrame object's constructor to row_index and column_index could contribute to naming consistency. However, let us not get into naming conventions here, as it is purely a matter of taste and probably a story in its own right 😉

So, continuing further, a dataframe is in many ways like a two-dimensional structured array. To perform conventional array-like indexing, the above-mentioned implicit index scheme therefore plays a crucial role. Using the iloc indexer, array-like indexing can be performed as shown below.

Examples illustrating array-like indexing and slicing operations for an explicitly indexed dataframe, df_exp, using the iloc indexer. The same array-like indexing is also applicable to an implicitly indexed dataframe, df_imp. Note that the implicit row and column index, surrounded by the dashed-line ellipse, is illustrated solely for the purpose of clarity. Image by author, made using diagrams.
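As a rough sketch of such array-like access, consider the snippet below. The dataframe contents and the particular positions are assumptions made here for illustration and need not match the ones in the illustration.

```python
import numpy as np
import pandas as pd

values = np.arange(9).reshape(3, 3)  # hypothetical data: 0..8 in a 3x3 grid
df_imp = pd.DataFrame(values)
df_exp = pd.DataFrame(values, index=["r1", "r2", "r3"], columns=["c1", "c2", "c3"])

# Single element: row position 0, column position 1.
cell = df_exp.iloc[0, 1]         # 1

# Block: row positions 0..1 and column positions 1..2 (ends excluded).
block = df_exp.iloc[0:2, 1:3]    # rows r1, r2 and columns c2, c3

# The same positional access works on the implicitly indexed dataframe.
same_cell = df_imp.iloc[0, 1]    # 1

print(block)
```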

And of course, the above indexing and slicing operations can also be performed using the loc indexer, which utilises the explicit row and column index (label), e.g. df_exp.loc['r1', 'c2'], instead of the default integer-based index. No illustration is presented here; it is left as an exercise for the reader 😜 For the curious ones, pandas did once provide a hybrid indexer called ix, which could utilise both the implicit and the explicit index at the same time. However, this indexer was deprecated in pandas 0.20 (2017) and removed in pandas 1.0, as it had all the potential to lead to needless confusion, which is exactly what loc and iloc wanted to avoid in the first place 😄
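For readers who want to check their answer to the exercise, one possible sketch is given here. Apart from df_exp.loc['r1', 'c2'], which is taken from the text, the data and the slice bounds are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

values = np.arange(9).reshape(3, 3)  # hypothetical data: 0..8 in a 3x3 grid
df_exp = pd.DataFrame(values, index=["r1", "r2", "r3"], columns=["c1", "c2", "c3"])

# Single element by row and column label.
cell = df_exp.loc["r1", "c2"]                      # 1

# Label-based block: both end labels ('r2' and 'c3') are included.
block = df_exp.loc["r1":"r2", "c2":"c3"]

# The equivalent positional selection excludes its end positions.
same_block = df_exp.iloc[0:2, 1:3]

print(block.equals(same_block))  # True
```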

So, that brings us to the end of the loc and iloc affair. I hope the above illustrations have clearly showcased the difference between an implicit and an explicit index in Series and DataFrame objects and, more importantly, helped you understand the true motive behind having two separate indexers, the explicit (loc) and the implicit (iloc). Thank you 🙏

