The world’s leading publication for data science, AI, and ML professionals.

Selecting Multiple Columns From a Pandas DataFrame

Discussing how to select multiple columns from a DataFrame in pandas

Photo by Aviv Ben Or on Unsplash

Introduction

Selecting multiple columns is one of the most common and simple tasks you can perform on a DataFrame. In today’s short guide we will discuss a few possible ways of selecting multiple columns from a pandas DataFrame. Specifically, we will explore how to do so

  • using basic indexing
  • with loc
  • using iloc
  • through the creation of a new DataFrame

Additionally, we will discuss when to use one method over the other, based on your specific use-case and whether you need to generate a view or a copy of the original DataFrame object.


First, let’s create an example DataFrame that we’ll reference throughout this article in order to demonstrate a few concepts.

import pandas as pd
df = pd.DataFrame({
    'colA': [1, 2, 3],
    'colB': ['a', 'b', 'c'],
    'colC': [True, False, True],
    'colD': [1.0, 2.0, 3.0],
})
print(df)
   colA colB   colC  colD
0     1    a   True   1.0
1     2    b  False   2.0
2     3    c   True   3.0

Using basic indexing

The first option you have when it comes to selecting multiple columns from an existing pandas DataFrame is basic indexing. This approach is most useful when you know precisely which columns you want to keep.

In that case, you can get a new DataFrame containing only those columns by passing a list of their names using the [] notation, which corresponds to the [__getitem__](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#basics) implementation in Python classes.

df_result = df[['colA', 'colD']]
print(df_result)
   colA  colD
0     1   1.0
1     2   2.0
2     3   3.0
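
As a small sketch of how the [] notation behaves (the variable names below are illustrative and not part of the original example), note that a list of names returns a DataFrame with the columns in the order listed, while a single name without the inner list returns a Series:

```python
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3],
    'colB': ['a', 'b', 'c'],
    'colC': [True, False, True],
    'colD': [1.0, 2.0, 3.0],
})

# A list of names inside [] returns a DataFrame,
# with the columns in the order given in the list.
cols_to_keep = ['colD', 'colA']  # hypothetical variable name
df_subset = df[cols_to_keep]

# A single name (no inner list) returns a Series instead.
series_a = df['colA']

# A one-element list still returns a DataFrame.
df_single = df[['colA']]

print(type(df_subset).__name__, list(df_subset.columns))
print(type(series_a).__name__)
print(type(df_single).__name__, df_single.shape)
```

Keeping the names in a separate list can be handy when the same selection is reused in several places.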

If you want to learn more about how indexing and slicing work in Python, make sure to read the article below.

Mastering Indexing and Slicing in Python


Using iloc

Alternatively, if you want to reference column positions instead of column names and slice the original DataFrame (for instance, if you want to keep the first two columns but don’t know their names), you can use iloc. As with regular Python slices, the end position is excluded.

df_result = df.iloc[:, 0:2]
print(df_result)
   colA colB
0     1    a
1     2    b
2     3    c

Note that the operation above may return a view of the original DataFrame rather than a copy. Whether it does depends on your pandas version and settings (with Copy-on-Write enabled, for instance, the result behaves like a copy). A view contains only a chunk of the original DataFrame but still points to the same locations in memory. Therefore, if you modify the sliced object (df_result), this may also affect the original object (df).

If you wish to slice the DataFrame and be certain that you are working with an independent copy, simply call copy():

df_result = df.iloc[:, 0:2].copy()
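
The following is a minimal sketch, not a definitive recipe, that illustrates two points from this section: iloc also accepts a list of positions for columns that are not adjacent, and a result produced with copy() can be modified without touching the original. The variable names are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3],
    'colB': ['a', 'b', 'c'],
    'colC': [True, False, True],
    'colD': [1.0, 2.0, 3.0],
})

# Slice: positions 0 and 1 (the end position 2 is excluded).
first_two = df.iloc[:, 0:2]

# List of positions: first and last column, which are not adjacent.
first_and_last = df.iloc[:, [0, -1]]
print(list(first_and_last.columns))

# An explicit copy is independent of the original DataFrame.
df_copy = df.iloc[:, 0:2].copy()
df_copy.loc[0, 'colA'] = 100

print(df_copy.loc[0, 'colA'])  # the copy now holds 100
print(df.loc[0, 'colA'])       # the original still holds 1
```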

Using loc

Now, if you want to slice the original DataFrame using the actual column names, you can use the loc indexer. For instance, if you want to get the first three columns, you can do so with loc by referencing the first and last names of the range of columns you want to keep. Unlike iloc, a label slice with loc includes both endpoints:

df_result = df.loc[:, 'colA':'colC']
print(df_result)
   colA colB   colC
0     1    a   True
1     2    b  False
2     3    c   True
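
To make the comparison with iloc concrete, here is a short sketch (variable names are illustrative) showing that a label slice with loc includes its end label, and that loc also accepts a list of labels when the columns you need are not adjacent:

```python
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3],
    'colB': ['a', 'b', 'c'],
    'colC': [True, False, True],
    'colD': [1.0, 2.0, 3.0],
})

# Label slice: both 'colA' and 'colC' are included.
by_range = df.loc[:, 'colA':'colC']

# The equivalent positional slice needs an end of 3, since iloc excludes it.
by_position = df.iloc[:, 0:3]
print(by_range.equals(by_position))

# A list of labels selects non-adjacent columns.
by_list = df.loc[:, ['colA', 'colD']]
print(list(by_list.columns))
```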

At this point you may want to read about the differences between loc and iloc in Pandas and clarify which one to use based on your specific requirements and use-cases.

loc vs iloc in Pandas


Creating a new pandas DataFrame

Finally, you can even create a new DataFrame using only a subset of the columns included in the original DataFrame as shown below.

df_result = pd.DataFrame(df, columns=['colA', 'colC'])
print(df_result)
   colA   colC
0     1   True
1     2  False
2     3   True
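
One difference worth knowing about, sketched below under the assumption that a requested name ('colZ', a hypothetical column) does not exist in the original DataFrame: the constructor reindexes the columns and fills the missing one with NaN, whereas basic indexing raises a KeyError.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3],
    'colB': ['a', 'b', 'c'],
    'colC': [True, False, True],
    'colD': [1.0, 2.0, 3.0],
})

# 'colZ' is a hypothetical name that is not present in df.
df_new = pd.DataFrame(df, columns=['colA', 'colZ'])
print(list(df_new.columns))
print(df_new['colZ'].isna().all())  # the missing column is filled with NaN

# Basic indexing with the same names fails loudly instead.
raised_key_error = False
try:
    df[['colA', 'colZ']]
except KeyError as exc:
    raised_key_error = True
    print('KeyError:', exc)
```

If a silent NaN column would hide a typo in a column name, basic indexing or loc may be the safer choice.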

Final Thoughts

In today’s short guide we showcased a few possible ways of selecting multiple columns from a pandas DataFrame. We discussed how to do so using basic indexing, iloc, loc and through the creation of a new DataFrame. Note that some of the methods discussed in this article may return a view of the original DataFrame, so modifying the result can also change the original. Call copy() whenever you need an independent object.

