
Introduction
As Data Scientists, we will sometimes encounter poor-quality data. To be successful, we need to manage data quality issues effectively before any analysis. Thankfully, there are several powerful open-source libraries, such as Pandas, that we can use to process data efficiently. Today we are going to look at the different ways we can loop over a DataFrame and access its values. Iterating over a DataFrame can be incorporated into the steps that follow initial exploratory data analysis, to begin cleansing raw data.
Getting Started
What is Pandas?
For those of you who are new to Data Science or unfamiliar with Pandas, Pandas is an open-source library for Python. It provides easy-to-use, out-of-the-box functionality to ingest and analyse tabular data. Pandas can read from a wide array of sources and formats, such as CSV, Excel, JSON, XML, HTML and SQL, to name a few.
Installing Pandas
The easiest way to install Pandas is from PyPI using pip, by running the command pip install pandas in a terminal. Detailed installation instructions for specific operating systems can be found on the Pandas Getting Started page, or, if you are using PyCharm, instructions can be found in Learning Pandas Profiling.
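Once the install has finished, a quick way to confirm that Pandas is available is to import it and print its version. This is only a minimal check, and the version printed will depend on your environment.

```python
import pandas as pd

# If this import succeeds, Pandas is installed in the active environment.
installed_version = pd.__version__
print(installed_version)
```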
The Pandas DataFrame
For the most part, Pandas stores data in one of two objects:
- Series: A Series is a one-dimensional ndarray with axis labels.
- DataFrame: A DataFrame is a two-dimensional data structure that contains labelled rows and columns.
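To make the difference between the two objects concrete, here is a minimal sketch that builds one of each. The names and values (series, frame, "first", "second") are made up purely for illustration.

```python
import pandas as pd

# A Series: one-dimensional, with an index labelling each value.
series = pd.Series([10, 20, 30], index=["x", "y", "z"], name="example")

# A DataFrame: two-dimensional, with labelled rows and columns.
frame = pd.DataFrame({"first": [1, 2], "second": [3.0, 4.0]})

print(series.ndim, series.shape)  # 1 (3,)
print(frame.ndim, frame.shape)    # 2 (2, 2)

# Selecting a single column of a DataFrame gives back a Series.
print(type(frame["first"]).__name__)  # Series
```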
Iterating a DataFrame
For this example, we have created a DataFrame with explicitly named rows and columns to get you started and demonstrate the data structure. Generally, we would only name the columns and allow the row index to be autogenerated as a numerical range.
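A DataFrame of this kind might be built as follows. The row labels, the column names other than column_a, and all of the values are assumptions chosen for illustration, since any small dataset will do.

```python
import pandas as pd

# Illustrative data: explicitly named columns (dict keys) and rows (index).
df = pd.DataFrame(
    {
        "column_a": [1, 2, 3],
        "column_b": [4, 5, 6],
        "column_c": [7, 8, 9],
    },
    index=["row_1", "row_2", "row_3"],
)

print(df)
#        column_a  column_b  column_c
# row_1         1         4         7
# row_2         2         5         8
# row_3         3         6         9
```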

After you have executed the Python snippet you should receive an output similar to the above. Here you can clearly see how the Pandas DataFrame object is structured using a series of rows and columns.
DataFrame.iterrows()
The first method to loop over a DataFrame is Pandas' .iterrows(), which iterates over the DataFrame as (index, row) pairs.
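A minimal sketch of such a loop is shown here. The DataFrame is rebuilt with illustrative labels and values so that the snippet runs on its own.

```python
import pandas as pd

# Illustrative DataFrame with named rows and columns.
df = pd.DataFrame(
    {"column_a": [1, 2, 3], "column_b": [4, 5, 6], "column_c": [7, 8, 9]},
    index=["row_1", "row_2", "row_3"],
)

# Each iteration yields the row label and the row itself as a Series.
for index, row in df.iterrows():
    # row.values is an ndarray holding the values in this row.
    print(index, row.values)
# row_1 [1 4 7]
# row_2 [2 5 8]
# row_3 [3 6 9]
```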

After calling .iterrows() on the DataFrame, we gain access to index, which is the label for the row, and row, which is a Series representing the values within the row itself. The above snippet utilises Series.values, which returns an ndarray of the values in each column for the row being referenced.
Within .iterrows(), the row variable is of type Series, which means we can access the values through the named columns of the DataFrame. For instance, if we were only interested in column_a, we could use the snippet below to return only those values.
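One way to write this is sketched here, again on an illustrative DataFrame. The column name column_z and the default value "missing" are hypothetical, included only to show what happens when a column does not exist.

```python
import pandas as pd

# Illustrative DataFrame with named rows and columns.
df = pd.DataFrame(
    {"column_a": [1, 2, 3], "column_b": [4, 5, 6], "column_c": [7, 8, 9]},
    index=["row_1", "row_2", "row_3"],
)

for index, row in df.iterrows():
    # .get() returns the value for an existing column ...
    value_a = row.get("column_a", None)
    # ... and the supplied default when the column is absent.
    value_z = row.get("column_z", "missing")
    print(index, value_a, value_z)
# row_1 1 missing
# row_2 2 missing
# row_3 3 missing
```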
The above snippet also demonstrates a safer way to look up a column within a row, through row.get('column_name', default_value) as opposed to row['column_name']. If we were looking for a column that didn't exist, row['column_name'] would raise a KeyError exception.
DataFrame.itertuples()
The next method for iterating over a DataFrame is .itertuples(), which returns an iterator of named tuples, one per row, whose fields are the index and the column names with their values.
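A minimal sketch of iterating with .itertuples() follows, using an illustrative DataFrame. By default the row label is exposed as the Index field and each column becomes an attribute of the tuple.

```python
import pandas as pd

# Illustrative DataFrame with named rows and columns.
df = pd.DataFrame(
    {"column_a": [1, 2, 3], "column_b": [4, 5, 6], "column_c": [7, 8, 9]},
    index=["row_1", "row_2", "row_3"],
)

# Each row is a named tuple: the first field is Index, then one per column.
for row in df.itertuples():
    print(row.Index, row.column_a, row.column_b)
# row_1 1 4
# row_2 2 5
# row_3 3 6
```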

This method still provides the ability to isolate a single column through the syntax row.column_name. If we only need a plain tuple returned, we can pass name=None and index=False to .itertuples(); this drops the field names and the index from each row.
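The plain-tuple variant can be sketched as follows, again on illustrative data. Each row comes back as an ordinary tuple of column values with no index and no field names.

```python
import pandas as pd

# Illustrative DataFrame with named rows and columns.
df = pd.DataFrame(
    {"column_a": [1, 2, 3], "column_b": [4, 5, 6], "column_c": [7, 8, 9]},
    index=["row_1", "row_2", "row_3"],
)

# index=False drops the row label; name=None returns regular tuples.
plain_rows = list(df.itertuples(index=False, name=None))

for row in plain_rows:
    print(row)
# (1, 4, 7)
# (2, 5, 8)
# (3, 6, 9)
```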
Best Practices
Whilst being able to iterate over a DataFrame using .iterrows() and .itertuples() is convenient, it is generally advised against, as performance is quite slow on larger DataFrames. Usually, when people want to iterate over a DataFrame, it is to add a calculated column or reformat an existing one. Pandas provides this type of functionality through its built-in .apply() method, which offers a more concise and often more efficient way of updating a DataFrame than an explicit loop. Pandas Apply for Power Users provides an in-depth look at Pandas .apply().
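As a rough sketch of both use cases, the example here adds a calculated column and a reformatted column with .apply() instead of a loop. The new column names (a_plus_b, column_a_label) and the "item-" prefix are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Illustrative DataFrame with named rows and columns.
df = pd.DataFrame(
    {"column_a": [1, 2, 3], "column_b": [4, 5, 6], "column_c": [7, 8, 9]},
    index=["row_1", "row_2", "row_3"],
)

# Calculated column: axis=1 passes each row to the function as a Series.
df["a_plus_b"] = df.apply(lambda row: row["column_a"] + row["column_b"], axis=1)

# Reformatted column: Series.apply() calls the function on each value.
df["column_a_label"] = df["column_a"].apply(lambda value: f"item-{value}")

print(df[["a_plus_b", "column_a_label"]])
```

For simple arithmetic like the sum above, operating on whole columns directly (df["column_a"] + df["column_b"]) gives the same result and is typically faster still.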
Summary
Pandas provides several ways that we as Data Scientists can iterate over a DataFrame, such as .iterrows() and .itertuples(). Both provide powerful, safe methods to access DataFrame row values. Whilst many new Data Scientists with a programming background may lean towards the familiarity of looping over a DataFrame, Pandas provides a more efficient approach through the built-in .apply() method.
Thank you for taking the time to read our story – we hope you have found it valuable!