5 Use Cases of Pandas loc and iloc Methods

Make them more useful.

Photo by Alejandro Piñero Amerio on Unsplash

Pandas is a highly flexible and powerful library for data analysis and manipulation. It provides many functions and methods for performing efficient operations at each step of the data analysis process.

The loc and iloc indexers (often loosely called methods) are essential Pandas tools for filtering, selecting, and manipulating data. They allow us to access a particular cell or multiple cells within a dataframe.

In this article, we will go over five use cases of loc and iloc that I find very helpful in a typical data analysis process.

We will use the Melbourne housing dataset available on Kaggle for the examples. We first read the CSV file using the read_csv function.

import numpy as np
import pandas as pd
df = pd.read_csv("/content/melb_data.csv")
print(df.shape)
(13580, 21)
df.columns
Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG','Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car','Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude','Longtitude', 'Regionname', 'Propertycount'],
dtype='object')

The dataset contains 21 features for 13,580 houses in Melbourne.


Example 1

The main difference between them is the way they access rows and columns:

  • loc uses row and column labels
  • iloc uses integer positions (indices) of rows and columns

Let’s use both to select the first few rows of the Address column.

df.loc[:5, 'Address'] # df.loc[0:5, 'Address'] works as well
0        85 Turner St 
1     25 Bloomburg St 
2        5 Charles St 
3    40 Federation La 
4         55a Park St 
5      129 Charles St
df.iloc[:5, 1]
0        85 Turner St 
1     25 Bloomburg St 
2        5 Charles St 
3    40 Federation La 
4         55a Park St

You may have noticed that we use nearly the same expression to select the rows. The reason is that Pandas assigns integer row labels by default. Thus, unless we specify row labels, the indices and labels of the rows are the same. The only difference is that the upper limit of a slice is inclusive with loc, which is why df.loc[:5] returns six rows and df.iloc[:5] returns five.

The column indices also start from 0, so the index of the Address column is 1.
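The overlap between labels and positions disappears once the index is not the default 0, 1, 2, ... range. The following is a minimal sketch on a small made-up dataframe (the `toy` name and its values are illustrative assumptions, not part of the Melbourne dataset) that shows how the two slices behave in that case.

```python
import pandas as pd

# Hypothetical toy data; the row labels are deliberately not 0..n-1.
toy = pd.DataFrame(
    {"Address": ["a St", "b St", "c St", "d St"], "Rooms": [2, 3, 4, 1]},
    index=[10, 20, 30, 40],
)

# loc slices by label and includes the end label (30).
by_label = toy.loc[10:30, "Address"]

# iloc slices by position and excludes the end position (3).
by_position = toy.iloc[0:3, 0]

print(by_label.tolist())
print(by_position.tolist())
```

Both expressions return the first three addresses here, but the slice bounds are written in different terms: labels for loc and positions for iloc.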


Example 2

We do not have to specify a range to select multiple rows or columns. We can pass them in a list as well.

df.loc[[5,7,9], ['Address', 'Type']]
df.iloc[[5,7,9], [1,3]] # Address is column 1, Type is column 3
(image by author)

We have selected the Address and Type columns for the rows with labels (or indices) 5, 7, and 9.
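Counting column positions by hand is error-prone on a wide dataframe. One possible way to avoid it is to look the positions up from the column names with `Index.get_indexer`. The sketch below uses a made-up four-column frame (the `toy` name and values are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical toy data with the same first four column names as the dataset.
toy = pd.DataFrame(
    {
        "Suburb": ["A", "B", "C"],
        "Address": ["a St", "b St", "c St"],
        "Rooms": [2, 3, 4],
        "Type": ["h", "u", "h"],
    }
)

# Translate column labels into integer positions.
positions = toy.columns.get_indexer(["Address", "Type"])

# The same selection expressed with labels and with positions.
selected_by_label = toy.loc[[0, 2], ["Address", "Type"]]
selected_by_position = toy.iloc[[0, 2], positions]

print(list(positions))
print(selected_by_label.equals(selected_by_position))
```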


Example 3

We can use the loc method to create a new column. Let’s create a column that takes the value 1 for houses that are more expensive than 1 million. Since each data point (i.e. row) represents a house, we apply the condition on the price column.

df.loc[df.Price > 1000000, 'IsExpensive'] = 1

The "IsExpensive" column is 1 for rows that meet the condition and NaN for the other rows.

df.loc[:4, ['Price','IsExpensive']]
(image by author)

Example 4

The loc method accepts multiple conditions. Let’s create a new column called Category which takes the value "Expensive House" for the rows with a price higher than 1.4 million and a type of "h".

df.loc[(df.Price > 1400000) & (df.Type == 'h'), 'Category'] = 'Expensive House'
df.loc[:4, ['Price','Category']]
(image by author)

We can handle the NaN values later on. For instance, the fillna function of Pandas provides flexible ways of handling the missing values. We can also fill the missing values based on other conditions with the loc method.
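As a rough illustration of both options, the sketch below fills the missing flags with fillna and the missing categories with a second loc assignment. The `toy` frame, its values, and the "Other" label are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical toy data, not taken from the Melbourne dataset.
toy = pd.DataFrame(
    {"Price": [900000.0, 1500000.0, 1200000.0], "Type": ["h", "h", "u"]}
)

# Same pattern as in the examples above: unmatched rows become NaN.
toy.loc[toy["Price"] > 1000000, "IsExpensive"] = 1
toy.loc[(toy["Price"] > 1400000) & (toy["Type"] == "h"), "Category"] = "Expensive House"

# Option 1: fill the remaining NaN values with fillna.
toy["IsExpensive"] = toy["IsExpensive"].fillna(0).astype(int)

# Option 2: fill them with loc, based on a condition ("Other" is an assumed label).
toy.loc[toy["Category"].isna(), "Category"] = "Other"

print(toy)
```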


Example 5

We can use the loc method to update the values in an existing column based on a condition. For instance, the following code will apply a 5% discount to the prices higher than 1.4 million.

df.loc[df.Price > 1400000, 'Price'] = df.Price * 0.95
df.loc[:4, ['Price','IsExpensive']]
(image by author)

We can also use the iloc method for this task, but we need to provide the integer position of the Price column and pass the condition as a plain Boolean array rather than a Pandas Series. Since it is more convenient to use column labels than indices, the loc method is preferred over the iloc method for such tasks.
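For comparison, here is one way the iloc version of the conditional update might look. It is a sketch on made-up data (the `toy` frame and its values are assumptions); the condition is converted to a NumPy array and the column position is looked up with `get_loc`.

```python
import pandas as pd

# Hypothetical toy data, not taken from the Melbourne dataset.
toy = pd.DataFrame(
    {"Suburb": ["A", "B", "C"], "Price": [1000000.0, 1500000.0, 2000000.0]}
)

# iloc expects a plain Boolean array, so convert the Series first.
mask = (toy["Price"] > 1400000).to_numpy()

# Look up the integer position of the Price column instead of hard-coding it.
price_pos = toy.columns.get_loc("Price")

# Apply the 5% discount to the matching rows only.
toy.iloc[mask, price_pos] = toy.iloc[mask, price_pos].to_numpy() * 0.95

print(toy)
```

The extra steps for the mask and the column position are what make loc the more convenient choice here.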


Conclusion

Since real-life data is usually messy or not in the most appropriate format, one of the most common tasks for data scientists or analysts is to clean and manipulate the data.

To accomplish such tasks, it is crucial to have a flexible way to access rows and columns. The loc and iloc methods provide us with just what we need in these situations.

Thank you for reading. Please let me know if you have any feedback.

