Introduction
Indexing a dataframe in pandas is an important skill to master. Indexing simply means selecting specific rows and/or columns in a dataframe or series. In this tutorial, we will cover the loc and iloc methods, which are two of the most common ways of indexing a dataframe in pandas. I will be working with the ufo sightings dataset found here in a Jupyter notebook.
Before we start, let’s read our data into a dataframe and take a look at the top 5 rows of our ufo dataframe:

And let’s take a look at some other information about our dataframe:

We used the shape and columns attributes to get the shape of our dataframe (number of rows, number of columns) and the column names, respectively.
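Since the dataset file itself is not reproduced here, the following is a minimal sketch that builds a small stand-in frame rather than loading the real data. The five column names come from this tutorial; the six rows of values are placeholders I made up for illustration. With the real file you would load it with pd.read_csv instead.

```python
import pandas as pd

# Stand-in for the ufo data: real column names, placeholder values.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Albany", "Abilene"],
    "Colors Reported": [None, None, None, "RED", None, "GREEN"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT", "DISK"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "KS"],
    "Time": ["1/1/2000 20:00", "1/2/2000 21:00", "1/3/2000 22:00",
             "1/4/2000 23:00", "1/5/2000 20:30", "1/6/2000 21:30"],
})

top_five = ufo.head()              # first 5 rows
column_names = list(ufo.columns)   # column labels, in order

print(top_five)
print(ufo.shape)                   # (6, 5) for this stand-in frame
print(column_names)
```

The real dataframe has far more rows, but the attributes behave the same way.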
loc Method
Probably the most versatile method to index a dataframe is the loc method. loc is both a dataframe and series method, meaning you can call the loc method on either of those pandas objects. When using the loc method on a dataframe, we specify which rows and which columns we want using the following format: dataframe.loc[specified rows, specified columns]. Note that the rows and columns are separated by a comma, not a colon. There are different ways to specify which rows and columns we want to select. For example, we can pass in a single label, a list or array of labels, a slice object with labels, or a boolean array. Let’s go over each of these ways!
Using Single Label
One way we can specify which rows and/or columns we want is by using labels. For rows, the label is the index value of that row, and for columns, the column name is the label. For example, in our ufo dataframe, if we want the fifth row only along with all the columns, we would use the following:
ufo.loc[4, :]

So we specified which rows we want by using the label of that specific row, which is 4, and since we wanted all of the columns, we would just use a colon.
Note: We could have left out the colon and gotten the same output. However, it is better for code readability to leave the colon in to explicitly show that we want all columns.
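As a quick check of the note above, here is a minimal sketch using a small stand-in frame (the column names match the ufo data, but the values are placeholders I made up). It selects the row labeled 4 with and without the colon and compares the results.

```python
import pandas as pd

# Stand-in for the ufo data: real column names, placeholder values.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Albany", "Abilene"],
    "Colors Reported": [None, None, None, "RED", None, "GREEN"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT", "DISK"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "KS"],
    "Time": ["1/1/2000 20:00", "1/2/2000 21:00", "1/3/2000 22:00",
             "1/4/2000 23:00", "1/5/2000 20:30", "1/6/2000 21:30"],
})

fifth_row = ufo.loc[4, :]   # row labeled 4, all columns
same_row = ufo.loc[4]       # same selection without the colon

# A single row label returns a series whose index holds the column names.
print(type(fifth_row).__name__)
print(fifth_row.equals(same_row))
```

Because we asked for a single row label, the result is a series rather than a dataframe.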
List or Array of Labels
Let’s say we want multiple rows and/or columns. How would we specify that? With labels, we can either pass in a list of labels or use something similar to the slice notation that you may be familiar with.
Let’s start with the list of labels:

Note how we can just specify the row and column labels with a list of labels.
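A minimal sketch of list-based selection follows. The frame is a small stand-in with placeholder values, and the particular labels picked here (rows 0, 2, 4 and the City and State columns) are my own choice for illustration.

```python
import pandas as pd

# Stand-in for the ufo data: real column names, placeholder values.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Albany", "Abilene"],
    "Colors Reported": [None, None, None, "RED", None, "GREEN"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT", "DISK"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "KS"],
    "Time": ["1/1/2000 20:00", "1/2/2000 21:00", "1/3/2000 22:00",
             "1/4/2000 23:00", "1/5/2000 20:30", "1/6/2000 21:30"],
})

# One list of row labels, one list of column labels.
subset = ufo.loc[[0, 2, 4], ["City", "State"]]
print(subset)
```

The rows and columns come back in the order given in the lists.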
Slice Object
We can also use slice notation with this format: start label: stop label. However, in contrast to using slice notation with lists or strings, both the start AND stop labels are included in our output as shown below:

Note how row labels 3, 4, AND 5 were included in our output dataframe. Also note how the City, Colors Reported, and Shape Reported columns were included, even though we stopped at Shape Reported with our slice object. Remember, ufo.columns listed the columns in the order City, Colors Reported, Shape Reported, State, and Time. We are including everything from the City label to the Shape Reported label, which includes the Colors Reported label as well.
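The inclusive behavior described above can be reproduced with a minimal sketch on a small stand-in frame (real column names, placeholder values):

```python
import pandas as pd

# Stand-in for the ufo data: real column names, placeholder values.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Albany", "Abilene"],
    "Colors Reported": [None, None, None, "RED", None, "GREEN"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT", "DISK"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "KS"],
    "Time": ["1/1/2000 20:00", "1/2/2000 21:00", "1/3/2000 22:00",
             "1/4/2000 23:00", "1/5/2000 20:30", "1/6/2000 21:30"],
})

# Label slices with loc include both endpoints.
sliced = ufo.loc[3:5, "City":"Shape Reported"]

print(list(sliced.index))     # [3, 4, 5]
print(list(sliced.columns))   # ['City', 'Colors Reported', 'Shape Reported']
```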
Boolean Array
Lastly, we can use an array of boolean values. However, this array of boolean values must have the same length as the axis we are using it on. For example, our ufo dataframe has a shape of (18241, 5) according to the shape attribute we used above, meaning it has 18241 rows and 5 columns. So if we want to use a boolean array to specify our rows, then it would need to have a length of 18241 elements. If we want to use a boolean array to specify our columns, it would need to have a length of 5 elements. The most common way of creating this boolean array is by using a conditional.
For example, let’s say we wanted to select only the rows that included Abilene as the city in which the ufo sightings took place. We can start with the following condition:
ufo.City == 'Abilene'

Note how this returns a pandas series (an array-like object) that has a length of 18241 and is made up of boolean values (True or False). This is the exact number of values we need to be able to use this boolean array to specify our rows using the loc method. Imagine you are overlaying this series of True and False values over the index of our ufo dataframe. Wherever there is a True value in this series, that specific row will be selected and will show up in our dataframe. Here we can see that the label 3 (the 4th row) is True, which means that the first row we will see once we use this array of boolean values with our loc method is the row with the label 3.
ufo.loc[ufo.City == 'Abilene', :]

And that is exactly what we see! We have specified the rows we want using an array of boolean values with a length equal to the number of rows in our original dataframe.
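Here is a minimal sketch of the same idea on a small stand-in frame (real column names, placeholder values, with Abilene placed at labels 3 and 5 by my own choice). The variable name mask is just for illustration.

```python
import pandas as pd

# Stand-in for the ufo data: real column names, placeholder values.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Albany", "Abilene"],
    "Colors Reported": [None, None, None, "RED", None, "GREEN"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT", "DISK"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "KS"],
    "Time": ["1/1/2000 20:00", "1/2/2000 21:00", "1/3/2000 22:00",
             "1/4/2000 23:00", "1/5/2000 20:30", "1/6/2000 21:30"],
})

# One True/False value per row, so the length matches the number of rows.
mask = ufo.City == "Abilene"
print(len(mask) == len(ufo))   # True

# Only the rows where the mask is True are kept.
abilene = ufo.loc[mask, :]
print(list(abilene.index))     # [3, 5]
```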
Remember, we can combine these different ways of specifying rows and columns, meaning we can use one way of indexing on the rows and a different way on the columns. For example:
ufo.loc[ufo.City == 'Abilene', 'City':'State']

Note how we used a condition that returns an array of boolean values to specify the rows and the slice object using labels to specify the columns.
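To illustrate mixing selection styles, here is a minimal sketch on a small stand-in frame (real column names, placeholder values). The second selection, which uses a boolean list of length 5 on the columns, is my own addition to show that a boolean array works on the column axis too.

```python
import pandas as pd

# Stand-in for the ufo data: real column names, placeholder values.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Albany", "Abilene"],
    "Colors Reported": [None, None, None, "RED", None, "GREEN"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT", "DISK"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "KS"],
    "Time": ["1/1/2000 20:00", "1/2/2000 21:00", "1/3/2000 22:00",
             "1/4/2000 23:00", "1/5/2000 20:30", "1/6/2000 21:30"],
})

# Boolean condition on the rows, label slice on the columns.
combined = ufo.loc[ufo.City == "Abilene", "City":"State"]
print(combined)

# List of labels on the rows, boolean list (one value per column) on the columns.
picked = ufo.loc[[0, 1], [True, False, False, True, False]]
print(list(picked.columns))    # ['City', 'State']
```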
iloc Method
The iloc method is also both a dataframe and series method that can be used to index a dataframe (or series). The i in iloc stands for integer, since instead of labels we use the integer positions of the rows and columns. Just like with the loc method, we can input an integer or list of integers, a slice object with integer locations, or a boolean array (with iloc this needs to be a plain list or NumPy array of booleans, since a boolean series is not accepted). Let’s just look at one key difference between the loc and iloc methods:
In our ufo dataframe, we did not change the index, so the default index of our dataframe is just the integer location of our rows. However, let’s try using the slice object to specify our rows using the iloc method:
ufo.iloc[3:5, :]

Note how when using the slice object with the iloc method, the stop integer location is NOT included in our dataframe. So we are only seeing rows 3 and 4, but not row 5. This is in contrast with the loc method where both the start and stop labels are included in our dataframe.
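The difference is easy to see side by side. This is a minimal sketch on a small stand-in frame (real column names, placeholder values) with the default integer index, so labels and positions happen to coincide.

```python
import pandas as pd

# Stand-in for the ufo data: real column names, placeholder values.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Albany", "Abilene"],
    "Colors Reported": [None, None, None, "RED", None, "GREEN"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT", "DISK"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "KS"],
    "Time": ["1/1/2000 20:00", "1/2/2000 21:00", "1/3/2000 22:00",
             "1/4/2000 23:00", "1/5/2000 20:30", "1/6/2000 21:30"],
})

by_position = ufo.iloc[3:5, :]   # positions 3 and 4; the stop position is excluded
by_label = ufo.loc[3:5, :]       # labels 3, 4, and 5; the stop label is included

print(list(by_position.index))   # [3, 4]
print(list(by_label.index))      # [3, 4, 5]
```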
Let’s use the iloc method to specify the columns we want as well:
ufo.iloc[3:5, 1:3]

If we look at the columns of our dataframe for reference, we can see that we are slicing our columns from index 1, or the Colors Reported, to index 2, which is Shape Reported. We do not include the index of our stop, which is 3 with the value of State in this case.
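The column positions can be checked with a minimal sketch on a small stand-in frame (real column names in the same order, placeholder values):

```python
import pandas as pd

# Stand-in for the ufo data: real column names, placeholder values.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Albany", "Abilene"],
    "Colors Reported": [None, None, None, "RED", None, "GREEN"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT", "DISK"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "KS"],
    "Time": ["1/1/2000 20:00", "1/2/2000 21:00", "1/3/2000 22:00",
             "1/4/2000 23:00", "1/5/2000 20:30", "1/6/2000 21:30"],
})

# Row positions 3 and 4, column positions 1 and 2 (position 3, State, is excluded).
block = ufo.iloc[3:5, 1:3]

print(list(block.columns))   # ['Colors Reported', 'Shape Reported']
print(block.shape)           # (2, 2)
```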
Note: We could have used a callable function for either the loc or iloc methods that returns a valid output for indexing (any of the inputs we discussed above). However, we will save that for another tutorial.
Conclusion
In this tutorial, we learned how to index a dataframe with both the loc and iloc methods. We learned how the loc method works primarily with labels of the rows and columns, and the iloc method works with integer locations. We also saw how we can use boolean arrays to index or specify our rows and columns.