When you use pandas in your data science or data analytics projects, you sometimes discover powerful functions you wish you had known about earlier. Here is my personal top 5.
1. Web Scraping
Pandas has a powerful function, read_html(), for scraping data tables from web pages.
Let’s assume we need data on gross national income. It is available in a data table on Wikipedia.

Reading all the HTML tables from Wikipedia using pandas is pretty straightforward.
import pandas as pd
url = 'https://en.wikipedia.org/wiki/Gross_national_income'
tables = pd.read_html(url)
The result is a list of tables (i.e. DataFrames). In this example, the table we are interested in is the fourth one, at index 3, because Python uses zero-based indexing.
tables[3]

When needed, you can do some tweaking.
df = (
    tables[3]
    .droplevel(0, axis=1)  # drop the top level of the column header
    .rename(columns={'No.': 'No', 'GDP[10]': 'GDP'})
    .set_index('No')
)
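
To see what each step of that chain does without fetching anything, here is a minimal sketch on a hand-built table. The two-level header, the labels and the numbers are all made-up placeholders standing in for a scraped table; they are not real figures from Wikipedia.

```python
import pandas as pd

# Hypothetical stand-in for a scraped table with a two-level column header.
# Labels and values are placeholders, not real data.
columns = pd.MultiIndex.from_tuples([
    ('Rank', 'No.'),
    ('Rank', 'Country'),
    ('Value', 'GDP[10]'),
])
raw = pd.DataFrame(
    [[1, 'Country A', 100.0], [2, 'Country B', 50.0]],
    columns=columns,
)

tidy = (
    raw.droplevel(0, axis=1)  # keep only the lower header level
    .rename(columns={'No.': 'No', 'GDP[10]': 'GDP'})  # strip dot and footnote marker
    .set_index('No')  # use the rank number as the index
)
print(tidy)
```

Wrapping the chain in parentheses lets each method sit on its own line, which keeps longer clean-up pipelines readable.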

2. Explode
With the explode method you can transform each element of a list into its own row, replicating the index values.
cars = pd.DataFrame({
    'country': ['Germany', 'Japan', 'USA'],
    'brand': [
        ['Mercedes', 'BMW', 'Audi', 'Volkswagen'],
        ['Toyota', 'Nissan', 'Honda'],
        ['Ford', 'Chrysler', 'Jeep', 'Dodge', 'GMC'],
    ],
})

cars.explode('brand')
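
The phrase "replicating index values" is easiest to see on a tiny example. The sketch below uses a shortened version of the cars table and also shows the ignore_index option (available in recent pandas versions) for when you would rather have a fresh 0..n-1 index.

```python
import pandas as pd

cars = pd.DataFrame({
    'country': ['Germany', 'Japan'],
    'brand': [['BMW', 'Audi'], ['Toyota']],
})

# Each list element becomes its own row; the original row label is repeated.
exploded = cars.explode('brand')
print(exploded.index.tolist())  # [0, 0, 1]

# ignore_index=True renumbers the rows instead of repeating labels.
renumbered = cars.explode('brand', ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```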

3. Shift, Difference and Percent Change
These functions are best explained by example. We first generate a DataFrame with dates and values.
import pandas as pd
import random
random.seed(1)
n = 14 # two weeks
df = pd.DataFrame(
    {'value': random.sample(range(10, 30), n)},
    index=pd.date_range("2021-01-01", periods=n, freq='D')
)

Now let’s add some columns to show the values of the shift, diff and pct_change methods.
df['shift()'] = df.value.shift() # value previous day
df['shift(7)'] = df.value.shift(7) # value 7 days ago
df['shift(-1)'] = df.value.shift(-1) # value next day
df['diff()'] = df.value.diff() # value minus previous day's value
df['diff(7)'] = df.value.diff(7) # value minus the value 7 days earlier
df['diff(-1)'] = df.value.diff(-1) # value minus next day's value
df['pct_change()'] = df.value.pct_change() # fractional change vs. previous day
df['pct_change(7)'] = df.value.pct_change(7) # fractional change vs. 7 days earlier
df['pct_change(-1)'] = df.value.pct_change(-1) # fractional change vs. next day
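
The three methods are closely related: diff and pct_change can both be written in terms of shift. The sketch below checks this on a small made-up Series (the numbers are arbitrary). Note that pct_change returns a fraction, so 0.2 means +20%.

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 9.0, 18.0])

# diff(n) is the value minus the value n rows earlier.
manual_diff = s - s.shift()

# pct_change(n) is the ratio to the value n rows earlier, minus 1.
manual_pct = s / s.shift() - 1

# Raises an AssertionError if the manual versions disagree with pandas.
pd.testing.assert_series_equal(manual_diff, s.diff())
pd.testing.assert_series_equal(manual_pct, s.pct_change())

# A negative period looks forward instead of backward.
print(s.diff(-1).tolist())  # [-2.0, 3.0, -9.0, nan]
```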

4. Wrappers to comparison operators
Pandas has some super handy short wrappers for the comparison operators: eq (equal), ne (not equal), le (less than or equal), lt (less than), ge (greater than or equal) and gt (greater than). They are equivalent to ==, !=, <=, <, >= and >. Here are some examples.
import pandas as pd
import random
random.seed(102)
df = pd.DataFrame(
    {'A': random.choices(range(25), k=10),
     'B': random.choices(range(25), k=10),
     'C': random.choices(range(25), k=10),
     'D': random.choices(range(25), k=10),
     'E': random.choices(range(25), k=10)}
)

df.eq(15)

s = pd.Series([0, 5, 10, 15, 20], index=['A', 'B', 'C', 'D', 'E'])
df.ge(s)
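
One thing the wrappers offer that the plain operators do not is an axis argument. By default a Series is matched against the column labels, as in df.ge(s) above; with axis='index' it is matched against the row labels instead. A minimal sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 9], 'B': [4, 5, 6]})

# The wrapper and the operator give the same result.
same = df.eq(5).equals(df == 5)
print(same)  # True

# Boolean results chain nicely, e.g. counting matches per column.
matches_per_column = df.eq(5).sum()

# One threshold per row: align the Series with the row index.
row_thresholds = pd.Series([2, 5, 8])
by_row = df.ge(row_thresholds, axis='index')
print(by_row)
```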

5. Clip and Eval
With clip() you can trim values at the given threshold(s): values outside a boundary are set to that boundary. The eval method evaluates a string describing operations on DataFrame columns.
df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 60, 10)})

df.clip(lower=2, upper=40)

df.clip(lower=2, upper=40).eval('C = A + B')
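
Both clip and eval return a new DataFrame by default and leave the original untouched. The sketch below repeats the example above with the intermediate results stored in variables (the names clipped and with_sum are just illustrative) so the effect of each step is visible.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 60, 10)})

# Values below 2 become 2, values above 40 become 40.
clipped = df.clip(lower=2, upper=40)
print(clipped['A'].tolist())  # [2, 2, 3, 4, 5]
print(clipped['B'].tolist())  # [10, 20, 30, 40, 40]

# eval with an assignment returns a copy that has the new column.
with_sum = clipped.eval('C = A + B')
print(with_sum['C'].tolist())  # [12, 22, 33, 44, 45]

# The original frame is unchanged.
print(list(df.columns))  # ['A', 'B']
```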

Final Thoughts
Pandas is an amazing library for data analysis and data science. It has a ton of features. I highly recommend spending some time exploring the documentation so you don’t miss any of its powerful functions.
What are your favorite pandas functions? Let me know your thoughts.