
loc vs iloc in Pandas

What’s the difference between loc[] and iloc[] in Python and Pandas

Photo by Nery Montenegro on Unsplash

Introduction

Indexing and slicing pandas DataFrames in Python can sometimes be tricky. The two properties most commonly used for slicing are iloc and loc.

In today’s article we are going to discuss the difference between these two properties. We’ll also go through a couple of examples to make sure you understand when to use one over the other.


First, let’s create a pandas DataFrame that we’ll use as an example to demonstrate a few concepts.

import pandas as pd
df = pd.DataFrame(
 index=[4, 6, 2, 1], 
 columns=['a', 'b', 'c'], 
 data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
)
print(df)
#     a   b   c
# 4   1   2   3
# 6   4   5   6
# 2   7   8   9
# 1  10  11  12

Slicing using loc[]

The loc[] property is used to slice a pandas DataFrame or Series and access row(s) and column(s) by label. This means that the input label(s) are matched against the index labels of the rows to be returned, not against their positions.

Therefore, if we pass an integer to loc[], it will be interpreted as an index label and not as a position. In the example below, loc returns the row whose index label is 1.

>>> df.loc[1]
a    10
b    11
c    12
Name: 1, dtype: int64
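Because loc works with labels on both axes, a second argument selects columns by name. A minimal sketch, rebuilding the same example DataFrame so it runs on its own:

```python
import pandas as pd

# Same example DataFrame as above: integer row labels in non-sorted order
df = pd.DataFrame(
    index=[4, 6, 2, 1],
    columns=['a', 'b', 'c'],
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
)

# Row label 1, column label 'b' -> a single scalar value
value = df.loc[1, 'b']
print(value)  # 11

# Row label 1, several column labels -> a Series indexed by column labels
row_part = df.loc[1, ['a', 'c']]
print(row_part.tolist())  # [10, 12]

# All rows, one column label -> a Series indexed by the row labels
col_b = df.loc[:, 'b']
print(col_b.tolist())  # [2, 5, 8, 11]
```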

loc also accepts a list of labels:

>>> df.loc[[6, 2]]
   a  b  c
6  4  5  6
2  7  8  9
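Since loc looks labels up in the index, asking for a label that does not exist fails instead of falling back to a position. The sketch below assumes pandas 1.0 or later, where a list containing any missing label raises a KeyError; `raises_key_error` is a hypothetical helper written only for this illustration:

```python
import pandas as pd

df = pd.DataFrame(
    index=[4, 6, 2, 1],
    columns=['a', 'b', 'c'],
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
)

def raises_key_error(key):
    """Hypothetical helper: return True if df.loc[key] raises a KeyError."""
    try:
        df.loc[key]
    except KeyError:
        return True
    return False

print(raises_key_error(0))       # True: no row is labelled 0, even though position 0 exists
print(raises_key_error([6, 3]))  # True: label 3 is missing
print(raises_key_error([6, 2]))  # False: both labels exist
```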

Similarly, we can use a slice object to retrieve a specific range of labels. In the example below, notice how the slice is computed: 4:2 refers to labels, not positions. In other words, it tells pandas to return all the rows from label 4 up to and including label 2, in the order in which those labels appear in the index.

>>> df.loc[4:2]
   a  b  c
4  1  2  3
6  4  5  6
2  7  8  9
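Two details of label slicing are easy to miss: both endpoints are included, and the range follows the order of the labels in the index rather than their numeric order. A short sketch, assuming the same DataFrame (whose labels are unique, which label slicing on an unsorted index relies on):

```python
import pandas as pd

df = pd.DataFrame(
    index=[4, 6, 2, 1],
    columns=['a', 'b', 'c'],
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
)

# The stop label is included: rows labelled 6, 2 and 1
print(df.loc[6:1].index.tolist())  # [6, 2, 1]

# Index order, not numeric order: label 2 comes after label 6 in the index,
# so "from 2 to 6" covers no rows
print(df.loc[2:6].empty)  # True

# Open-ended slice: everything up to and including label 6
print(df.loc[:6].index.tolist())  # [4, 6]
```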

Slicing using iloc[]

On the other hand, the iloc property offers integer-location based indexing, where the position is used to retrieve the requested rows.

Therefore, whenever we pass an integer to iloc, we should expect to retrieve the row at that position. In the example below, iloc[1] returns the row in position 1 (i.e. the second row):

>>> df.iloc[1]
a    4
b    5
c    6
Name: 6, dtype: int64
# Compare with loc[1], which selects by label
>>> df.loc[1]
a    10
b    11
c    12
Name: 1, dtype: int64
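To make the label/position distinction concrete, the sketch below converts between the two with `Index.get_loc` (label to position) and plain indexing of `df.index` (position to label), then checks that both accessors land on the same row:

```python
import pandas as pd

df = pd.DataFrame(
    index=[4, 6, 2, 1],
    columns=['a', 'b', 'c'],
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
)

# Label 1 sits at position 3 (the last row)
pos = df.index.get_loc(1)
print(pos)  # 3

# Position 1 holds the row labelled 6
label = df.index[1]
print(label)  # 6

# Both routes select the same rows
print(df.loc[1].equals(df.iloc[pos]))    # True
print(df.iloc[1].equals(df.loc[label]))  # True
```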

Similarly, you can pass a list of integer positions to retrieve a subset of the original DataFrame. For example:

>>> df.iloc[[0, 2]]
   a  b  c
4  1  2  3
2  7  8  9
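Because iloc works with positions, it also understands Python-style negative positions counted from the end, something loc has no equivalent for. A small sketch:

```python
import pandas as pd

df = pd.DataFrame(
    index=[4, 6, 2, 1],
    columns=['a', 'b', 'c'],
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
)

# -1 is the last row (label 1), 0 is the first row (label 4)
subset = df.iloc[[-1, 0]]
print(subset.index.tolist())  # [1, 4]

# A single negative position returns one row as a Series
print(df.iloc[-2].tolist())  # [7, 8, 9]

# For loc, -1 is just a label, and this index has no such label
try:
    df.loc[-1]
    loc_failed = False
except KeyError:
    loc_failed = True
print(loc_failed)  # True
```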

Or even a slice of integer positions. Unlike with loc, the stop position is excluded, just as in regular Python slicing:

>>> df.iloc[1:3]
   a  b  c
6  4  5  6
2  7  8  9
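Because iloc[1:3] stops before position 3, getting the same rows with loc means using the label at the last wanted position (position 2) as the stop label. A minimal sketch of that translation:

```python
import pandas as pd

df = pd.DataFrame(
    index=[4, 6, 2, 1],
    columns=['a', 'b', 'c'],
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
)

# Positions 1 and 2; position 3 is excluded
by_position = df.iloc[1:3]

# Labels 6 through 2, both endpoints included
by_label = df.loc[df.index[1]:df.index[2]]

print(by_position.index.tolist())    # [6, 2]
print(by_position.equals(by_label))  # True

# Same exclusive-stop rule as a plain Python list
print([4, 6, 2, 1][1:3])  # [6, 2]
```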

iloc can also accept a callable that takes a single argument (the Series or DataFrame being indexed) and returns an output that is valid for indexing.

For instance, to retrieve only the rows whose index label is odd, a simple lambda function does the trick:

>>> df.iloc[lambda x: x.index % 2 != 0]
    a   b   c
1  10  11  12
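The callable does not have to look at the index. The sketch below selects rows by a condition on column a. iloc only accepts a plain boolean array as a mask, not a boolean Series carrying its own index, so the mask is converted with `.to_numpy()`; loc accepts the boolean Series directly:

```python
import pandas as pd

df = pd.DataFrame(
    index=[4, 6, 2, 1],
    columns=['a', 'b', 'c'],
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
)

# Positional route: the callable returns a NumPy boolean array
big_a_iloc = df.iloc[lambda x: (x['a'] > 5).to_numpy()]

# Label route: loc accepts the boolean Series (aligned on the index) as-is
big_a_loc = df.loc[lambda x: x['a'] > 5]

print(big_a_iloc.index.tolist())     # [2, 1]
print(big_a_iloc.equals(big_a_loc))  # True
```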

Finally, you can also use iloc to index both axes. For example, to fetch the first two records and discard the last column, you can call:

>>> df.iloc[:2, :2]
   a  b
4  1  2
6  4  5
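Each accessor expects one kind of key on both axes, so combining a column name with row positions needs a conversion step. One way, sketched below, is to turn the column name into a position with `Index.get_loc`, or the row positions into labels via `df.index`:

```python
import pandas as pd

df = pd.DataFrame(
    index=[4, 6, 2, 1],
    columns=['a', 'b', 'c'],
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
)

# The same selection as df.iloc[:2, :2], expressed with labels
by_label = df.loc[[4, 6], ['a', 'b']]
print(by_label.equals(df.iloc[:2, :2]))  # True

# First two rows (by position) of column 'c' (by name)
col_pos = df.columns.get_loc('c')  # 2
via_iloc = df.iloc[:2, col_pos]
via_loc = df.loc[df.index[:2], 'c']
print(via_iloc.tolist())         # [3, 6]
print(via_iloc.equals(via_loc))  # True

# A single cell by position: row 0, column 1
print(df.iloc[0, 1])  # 2
```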

Final Thoughts

In this article, we discussed how to properly index and slice pandas DataFrames (or Series) using two of the most commonly used properties, namely loc and iloc.

It’s very important to understand the differences between these two properties and to use them effectively to get the desired output for your specific use case. loc indexes a pandas DataFrame or Series by label, whereas iloc retrieves records by their integer position.

