
5 Amazing Pandas Features You Probably Don’t Know About

Powerful pandas functions explained to boost your data analytics workflow.

Photo by Sid Balachandran on Unsplash

When using pandas in your data science or data analytics projects, you sometimes discover powerful functions you wish you had known about earlier. Here is my personal top 5.


1. Web Scraping

Pandas has a powerful method, read_html(), for scraping data tables from webpages.

Let’s assume we need data on gross national income. It is available in a data table on Wikipedia.

Source: Wikipedia

Reading all the HTML tables from Wikipedia using pandas is pretty straightforward.

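import pandas as pd

url = 'https://en.wikipedia.org/wiki/Gross_national_income'
# read_html needs an HTML parser: lxml, or BeautifulSoup4 together with html5lib
tables = pd.read_html(url)  # returns a list with one DataFrame per <table> on the page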

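The result is a list of tables (i.e. DataFrames), one for each HTML table on the page. In this example, the table we are interested in is the fourth one, so we select it with index 3 (Python uses zero-based indexing). Keep in mind that this position can change when the Wikipedia page is edited.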

tables[3]
Image by Author
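Selecting a table by position is fragile, so it can help to select it by content instead. read_html() accepts a match argument (a string or regular expression) and returns only the tables whose text matches it. The sketch below uses a small hypothetical HTML snippet instead of the live Wikipedia page so it runs offline; recent pandas versions expect literal HTML to be wrapped in io.StringIO, and, as with any read_html() call, lxml (or BeautifulSoup4 plus html5lib) must be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables, standing in for a real webpage.
html = """
<table>
  <tr><th>Fruit</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.2</td></tr>
</table>
<table>
  <tr><th>Country</th><th>GNI</th></tr>
  <tr><td>A</td><td>100</td></tr>
  <tr><td>B</td><td>200</td></tr>
</table>
"""

# Without match, read_html returns every table it finds.
all_tables = pd.read_html(StringIO(html))
print(len(all_tables))  # 2

# With match, only tables whose text matches the pattern are returned.
gni = pd.read_html(StringIO(html), match="GNI")[0]
print(gni)
```

On the real page, a distinctive word from the table's header is usually a more stable selector than its position in the list.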

When needed, you can do some tweaking.

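df = (
    tables[3]
    .droplevel(0, axis=1)  # drop the top level of the two-level column header
    .rename(columns={'No.': 'No', 'GDP[10]': 'GDP'})  # simplify column names
    .set_index('No')  # use the ranking number as the index
)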
Image by Author
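The droplevel(0, axis=1) call assumes the scraped table comes back with a two-level column header. Here is a minimal sketch of the same cleanup chain on a small hypothetical table with such a header, so you can see what each step does without downloading anything. The header labels and values are made up for illustration.

```python
import pandas as pd

# Hypothetical table with a two-level column header, similar in shape to
# what read_html can return for tables with merged header cells.
columns = pd.MultiIndex.from_tuples(
    [("Ranking", "No."), ("Ranking", "Country"), ("Income", "GDP[10]")]
)
raw = pd.DataFrame([[1, "X", 10.5], [2, "Y", 8.0]], columns=columns)

df = (
    raw
    .droplevel(0, axis=1)  # keep only the lower header row
    .rename(columns={"No.": "No", "GDP[10]": "GDP"})  # tidy the labels
    .set_index("No")  # use the rank number as the row index
)
print(df)
```

Wrapping the chain in parentheses lets each method sit on its own line; without them, Python would treat the continuation lines as a syntax error.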

2. Explode

With the explode() method, you can transform each element of a list-like into a separate row, replicating the index values.

cars = pd.DataFrame({
    'country': ['Germany', 'Japan', 'USA'],
    'brand': [['Mercedes', 'BMW', 'Audi', 'Volkswagen'],
              ['Toyota', 'Nissan', 'Honda'],
              ['Ford', 'Chrysler', 'Jeep', 'Dodge', 'GMC']
             ]
})
Image by Author
cars.explode('brand')
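A minimal sketch of what explode() does to the row count and the index, reusing the cars DataFrame (redefined so the snippet runs on its own). The ignore_index argument is available in pandas 1.1 and later; the groupby step is just one way to undo the explode and is not part of the original example.

```python
import pandas as pd

cars = pd.DataFrame({
    'country': ['Germany', 'Japan', 'USA'],
    'brand': [['Mercedes', 'BMW', 'Audi', 'Volkswagen'],
              ['Toyota', 'Nissan', 'Honda'],
              ['Ford', 'Chrysler', 'Jeep', 'Dodge', 'GMC']],
})

exploded = cars.explode('brand')
# One row per list element: 4 + 3 + 5 = 12 rows.
print(len(exploded))  # 12
# The original index labels are repeated for every element of their list.
print(exploded.index.tolist())  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]

# ignore_index=True relabels the result 0..n-1 instead.
flat = cars.explode('brand', ignore_index=True)

# Round trip: collect the exploded rows back into one list per country.
regrouped = exploded.groupby('country')['brand'].agg(list)
print(regrouped['Japan'])  # ['Toyota', 'Nissan', 'Honda']
```

Rows whose list is empty are kept and get a single NaN in the exploded column.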

3. Shift, Difference and Percent Change

These functions are best explained by example. We first generate a DataFrame with dates and values.

import pandas as pd
import random
random.seed(1)
n = 14 # two weeks
df = pd.DataFrame(
    {'value': random.sample(range(10, 30), n)},
    index = pd.date_range("2021-01-01", periods=n, freq='D')
)
Image by Author

Now let’s add some columns to show the values of the shift, diff and pct_change methods.

df['shift()'] = df.value.shift()  # value of the previous day
df['shift(7)'] = df.value.shift(7)  # value 7 days earlier
df['shift(-1)'] = df.value.shift(-1)  # value of the next day
df['diff()'] = df.value.diff()  # difference from the previous day
df['diff(7)'] = df.value.diff(7)  # difference from 7 days earlier
df['diff(-1)'] = df.value.diff(-1)  # value minus the next day's value
df['pct_change()'] = df.value.pct_change()  # fractional change from the previous day
df['pct_change(7)'] = df.value.pct_change(7)  # fractional change from 7 days earlier
df['pct_change(-1)'] = df.value.pct_change(-1)  # change relative to the next day's value
Image by Author
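The three methods are closely related: diff(k) is the value minus the value k rows earlier, and pct_change(k) is the ratio to the value k rows earlier minus 1, expressed as a fraction (0.2 means +20%). Note that periods count rows, not calendar days; they coincide here only because the index has exactly one row per day. The sketch below checks these identities on a small hypothetical Series.

```python
import pandas as pd

# Small hypothetical series so the arithmetic is easy to follow by hand.
s = pd.Series([10, 12, 9, 15])

# diff(k) equals s - s.shift(k).
assert s.diff().equals(s - s.shift())
# pct_change(k) equals s / s.shift(k) - 1.
assert s.pct_change().equals(s / s.shift() - 1)
# A negative period looks forward: diff(-1) equals s - s.shift(-1).
assert s.diff(-1).equals(s - s.shift(-1))

print(s.diff().tolist())                 # [nan, 2.0, -3.0, 6.0]
print(s.pct_change().round(3).tolist())  # [nan, 0.2, -0.25, 0.667]
```

The first row is NaN because there is no earlier value to compare with; with diff(-1) and pct_change(-1), the NaN moves to the last row instead.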

4. Wrappers to comparison operators

Pandas has some super handy short wrappers for the comparison operators: eq (equal), ne (not equal), le (less than or equal), lt (less than), ge (greater than or equal) and gt (greater than). They are equivalent to ==, !=, <=, <, >= and >, but as methods they also accept an axis argument that controls how a Series is aligned with the DataFrame. Here are some examples.

import pandas as pd
import random
random.seed(102)
df = pd.DataFrame(
    {'A': random.choices(range(25),  k=10),
     'B': random.choices(range(25),  k=10),
     'C': random.choices(range(25),  k=10),
     'D': random.choices(range(25),  k=10),
     'E': random.choices(range(25),  k=10)}
)
Image by Author
df.eq(15)
Image by Author
s = pd.Series([0, 5, 10, 15, 20], index=['A', 'B', 'C', 'D', 'E'])
df.ge(s)
Image by Author
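In df.ge(s), the Series is matched against the column labels, so each column is compared with its own threshold. The axis argument is where the methods go beyond the bare operators: with axis='index', the Series is matched against the row labels instead, so each row gets its own threshold. A minimal sketch on a small hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 9], 'B': [4, 4, 4]})

# For a scalar, the method and the operator give the same result.
assert df.ge(4).equals(df >= 4)

# Default (axis='columns'): one threshold per column label.
col_limits = pd.Series({'A': 5, 'B': 4})
print(df.ge(col_limits))

# axis='index': one threshold per row label.
row_limits = pd.Series([1, 5, 10])
print(df.ge(row_limits, axis='index'))
```

The bare >= operator always matches a Series against the columns, so the row-wise comparison is the case where the method form is genuinely needed.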

5. Clip and Eval

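With clip(), you can trim values at the given threshold(s): values below lower are set to lower, and values above upper are set to upper. The eval() method evaluates a string expression describing operations on DataFrame columns and, by default, returns a new DataFrame instead of modifying the original.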

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 60, 10)})
Image by Author
df.clip(lower=2, upper=40)
Image by Author
df.clip(lower=2, upper=40).eval('C = A + B')
Image by Author
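A short sketch, reusing the DataFrame from this section, that spells out what clip() and eval() return. The comparison with assign() only illustrates the equivalent column arithmetic; it is not a claim about how pandas implements eval() internally.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 60, 10)})

clipped = df.clip(lower=2, upper=40)
# Values below 2 become 2, values above 40 become 40, the rest are unchanged.
print(clipped['A'].tolist())  # [2, 2, 3, 4, 5]
print(clipped['B'].tolist())  # [10, 20, 30, 40, 40]

# eval('C = A + B') returns a new DataFrame with the extra column,
# matching what assign() produces for the same expression.
via_eval = clipped.eval('C = A + B')
via_assign = clipped.assign(C=clipped['A'] + clipped['B'])
assert via_eval.equals(via_assign)
print(via_eval['C'].tolist())  # [12, 22, 33, 44, 45]

# The frame eval() was called on is unchanged unless inplace=True is passed.
print('C' in clipped.columns)  # False
```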

Final Thoughts

Pandas is an amazing library for data analysis and data science, with a ton of features. I highly recommend spending some time exploring the documentation so you don’t miss any of its powerful functions.

What are your favorite pandas functions? Let me know your thoughts.

