Explained with examples.

Pandas is one of the predominant data analysis tools which is highly appreciated among data scientists. It provides numerous flexible and versatile functions to perform efficient data analysis.
In this article, we will go over 3 pandas tricks that I think will make you a more happy pandas user. It is better to explain these tricks with some examples. Thus, we start by creating a data frame to wok on.
The data frame contains daily sales quantities of 3 different stores. We first create a period of 10 days using the date_range
function of pandas.
import numpy as np
import pandas as pd
days = pd.date_range("2020-01-01", periods=10, freq="D")
The days variable will be used as a column. We also need a sales quantity column which can be generated by the randint
function of numpy. Then, we create a data frame with 3 columns for each store.
A = pd.DataFrame({"date": days,
"store": "A",
"sales": np.random.randint(100, 200, size=10)})
B = pd.DataFrame({"date": days,
"store": "B",
"sales": np.random.randint(100, 200, size=10)})
C = pd.DataFrame({"date": days,
"store": "C",
"sales": np.random.randint(100, 200, size=10)})
We now combine these 3 data frames with the concat
function. It is better to also sort the rows based on date since we are interested in the daily sales quantities.
df = pd.concat([A, B, C]).sort_values(by="date")
df.head()

- Rank function
The first trick is about the rank
function. Let’s suppose we need to find the store that has the highest sales quantity for each day. This task can be done using the groupby
and max
functions.
What if we need to sort the stores based on their sales quantities for each day? Please take a look at the screenshot above. For the date 2020–01–01, A is the first, C is the second, and B is the third score.
We can use the rank function to apply such an order for each date.
df["rank"] = df.groupby("date)["sales"]
.rank(ascending=False).astype("int")
df.head(6)

We first group the data points (i.e. rows) by date and then apply the rank function. The generated rank is assigned to a new column. The rank function sorts the values in ascending order by default but it can be changed with the ascending parameter.
The rank information can be used to make a comparison between the stores. Let’s find out the rank distribution of each store.
df.groupby(["store","rank"]).count()[["sales"]]

Store A has the highest number of first ranks. Store B is usually the second store and C has kind of a uniform distribution of ranks.
- Naming the aggregation
When we found the rank distribution in the previous example, we selected a column after the aggregation was done. The reason is that the count aggregation is applied to all the columns that are not used in the grouping. If We do an aggregation with mean, it is applied to all the numerical columns not used in the grouping.
Here is how the result looks without selecting a column.
df.groupby(["store","rank"]).count()

If we do an aggregation with mean, it is applied to all the numerical columns that are not used in the grouping.
Another downside of this approach is that the aggregated column name is not informative of what it represents. Without seeing the code, we cannot really tell what sales column tells us. We can always rename the columns but there is a more practical way.
A solution to both these issues is using a named aggregation with the agg
function. We need to specify the name of the column and the aggregation function as below:
df.groupby(["store","rank"]).agg(rank_count = ("rank", "count"))

We only see the aggregated column names so we do not have to select a column after the aggregation. It also allows us for assigning any name to the aggregated column.
- Ignore the index
Recall the first screenshot.

We combine 3 data frames but the indices are not updated. They represent the index of each individual data frame.
The index is an important part of a data frame so it needs to be accurate. One option is to reset the index after the concatenation.
df = pd.concat([A, B, C])
.sort_values(by="date").reset_index(drop=True)
df.head()

We now have a more accurate index. If we do not set the drop
parameter of the reset_index
function as true, the prior indices are kept as a column in the data frame.
The trick here is the ignore_index
parameter. It saves us from having to use the reset_index
function. Both the concat
and sort_values
functions have this paremeter. When set as true, it ignores the indices of the individual data frames and assigns a new integer index.
df = pd.concat([A, B, C]).sort_values(by="date", ignore_index=True)
df.head()

It is not much of a gain but we achieve the same result with one less function.
Conclusion
We have covered 3 pandas tricks which seem to be simple but come in handy for some cases. They also simplify the syntax and operations.
Pandas is a great library with an easy-to-understand and intuitive syntax. Thus, you can easily learn the basic operations with some practice. After the basics, I suggest to focus on some tricks and detailed techniques for efficiency.
Thank you for reading. Please let me know if you have any feedback.