Introduction
Multiple column selection is one of the most common and simple tasks one can perform. In today’s short guide we will discuss a few possible ways of selecting multiple columns from a pandas DataFrame. Specifically, we will explore how to do so
- using basic indexing
- with loc
- using iloc
- through the creation of a new DataFrame
Additionally, we will discuss when to use one method over the other, based on your specific use-case and whether you need to generate a view or a copy of the original DataFrame object.
First, let’s create an example DataFrame that we’ll reference throughout this article in order to demonstrate a few concepts.
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3],
    'colB': ['a', 'b', 'c'],
    'colC': [True, False, True],
    'colD': [1.0, 2.0, 3.0],
})
print(df)
colA colB colC colD
0 1 a True 1.0
1 2 b False 2.0
2 3 c True 3.0
Using basic indexing
The first option you have when it comes to selecting multiple columns from an existing pandas DataFrame is basic indexing. This approach is most useful when you know precisely which columns you want to keep.
In that case, you can take a copy of the original DataFrame containing only those columns by passing a list of their names using the [] notation, which corresponds to the [__getitem__](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#basics) implementation in Python classes.
df_result = df[['colA', 'colD']]
print(df_result)
colA colD
0 1 1.0
1 2 2.0
2 3 3.0
Using iloc
Alternatively, if you want to reference column positions instead of column names and slice the original DataFrame (for instance, if you want to keep the first two columns but don’t know their names), you can use iloc.
df_result = df.iloc[:, 0:2]
print(df_result)
colA colB
0 1 a
1 2 b
2 3 c
Note that the above operation may return a view of the original DataFrame rather than a copy, depending on the pandas version and on whether Copy-on-Write is enabled. A view contains only a chunk of the original DataFrame but still points to the same locations in memory. Therefore, if you modify the sliced object (df_result), this may also affect the original object (df).
If you wish to slice the DataFrame but get a copy of the original object instead, then simply call copy():
df_result = df.iloc[:, 0:2].copy()
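To make the effect of copy() concrete, here is a minimal sketch using the same example DataFrame. It modifies the copied slice and checks that the original is unchanged. It also shows that iloc accepts a list of positions when the columns you want are not adjacent. The names `df_copy` and `df_picked` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3],
    'colB': ['a', 'b', 'c'],
    'colC': [True, False, True],
    'colD': [1.0, 2.0, 3.0],
})

# An explicit copy is independent of the original DataFrame.
df_copy = df.iloc[:, 0:2].copy()
df_copy.loc[0, 'colA'] = 100
print(df.loc[0, 'colA'])  # 1, the original is untouched

# A list of integer positions selects non-adjacent columns.
df_picked = df.iloc[:, [0, 2]]
print(list(df_picked.columns))  # ['colA', 'colC']
```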
Using loc
Now if you want to slice the original DataFrame using the actual column names, you can use the loc indexer. For instance, if you want to get the first three columns, you can do so with loc by referencing the first and last names of the range of columns you want to keep. Unlike iloc, a loc slice includes both endpoints:
df_result = df.loc[:, 'colA':'colC']
print(df_result)
colA colB colC
0 1 a True
1 2 b False
2 3 c True
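The difference in how the two indexers treat the end of a slice is a common source of off-by-one mistakes, so a short side-by-side sketch may help. It also shows that loc accepts a list of names when the columns are not adjacent. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3],
    'colB': ['a', 'b', 'c'],
    'colC': [True, False, True],
    'colD': [1.0, 2.0, 3.0],
})

# Label-based slice: the end label 'colC' is included.
by_label = df.loc[:, 'colA':'colC']
print(list(by_label.columns))  # ['colA', 'colB', 'colC']

# Position-based slice: the end position 2 is excluded.
by_position = df.iloc[:, 0:2]
print(list(by_position.columns))  # ['colA', 'colB']

# A list of labels selects non-adjacent columns by name.
picked = df.loc[:, ['colA', 'colD']]
print(list(picked.columns))  # ['colA', 'colD']
```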
At this point you may want to read about the differences between loc and iloc in pandas and clarify which one to use based on your specific requirements and use-cases.
Creating a new pandas DataFrame
Finally, you can even create a new DataFrame using only a subset of the columns included in the original DataFrame as shown below.
df_result = pd.DataFrame(df, columns=['colA', 'colC'])
print(df_result)
colA colC
0 1 True
1 2 False
2 3 True
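One behavioural difference worth knowing, as far as I can tell from recent pandas versions: when a requested name does not exist, the constructor approach fills that column with NaN instead of raising, whereas basic indexing raises a KeyError. The sketch below uses a hypothetical missing column name, `colZ`, to show both outcomes.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3],
    'colB': ['a', 'b', 'c'],
    'colC': [True, False, True],
    'colD': [1.0, 2.0, 3.0],
})

# 'colZ' is a hypothetical name that does not exist in df.
# The constructor reindexes the columns, so 'colZ' comes back filled with NaN.
df_new = pd.DataFrame(df, columns=['colA', 'colZ'])
print(df_new['colZ'].isna().all())  # True

# Basic indexing with the same list raises a KeyError instead.
try:
    df[['colA', 'colZ']]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

If a silent NaN column would hide a typo in your code, basic indexing or loc is the safer choice.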
Final Thoughts
In today’s short guide we showcased a few possible ways of selecting multiple columns from a pandas DataFrame. We discussed how to do so using basic indexing, iloc, loc and through the creation of a new DataFrame. Note that some of the methods discussed in this article may return a view of the original DataFrame, so you should be extra careful when modifying the result.