
7 Uncommon But Useful Pandas Functions

Functions to boost your Pandas skills

Photo by Erik Mclean on Unsplash

Pandas is one of the most popular data analysis libraries. There are numerous Pandas functions and methods that ease and expedite the data cleaning and analysis process.

Pandas also provides some functions that are not so common but come in handy for certain tasks. In this post, we will cover 7 uncommon Pandas functions.

The functions that will be discussed are:

  • Clip
  • Eval
  • Combine_first
  • Transform
  • Melt
  • Diff
  • Shift

As always, we start by importing the dependencies.

import numpy as np
import pandas as pd

1. Clip

The clip function trims the values in a dataframe based on the given lower and upper bounds. It does not drop the rows that contain values outside the specified range. Instead, any value outside the boundaries is set equal to the nearest boundary value.

Consider the following dataframe.

(image by author)

Let’s say we do not want to have any negative values and want to make them equal to zero. It can be done by setting the lower parameter of the clip function as 0.

df.clip(lower=0)
(image by author)

All negative values are equal to zero now. We can also assign an upper limit. For instance, we can trim the values to be between 0 and 1.

df.clip(lower=0, upper=1)
(image by author)
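
Since the screenshots are not reproduced here, the following is a minimal sketch of the same idea on a small made-up dataframe. The column names and values are assumptions chosen for illustration, not the article's original data.

```python
import pandas as pd

# Hypothetical data; not the dataframe shown in the article's screenshots.
df = pd.DataFrame({
    "cola": [-0.5, 0.3, 1.7],
    "colb": [0.9, -1.2, 0.4],
})

# Values below 0 become 0 and values above 1 become 1.
# No rows are dropped, and clip returns a new dataframe.
clipped = df.clip(lower=0, upper=1)
print(clipped)
```

The original df keeps its negative values unless the result is assigned back to it.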

2. Eval

The eval function allows for manipulating or modifying a dataframe by passing an operation as a string.

For instance, the following code multiplies the values in the "cola" column by 10.

df.eval("cola = cola * 10")
(image by author)

We can also create a new column:

df.eval("new = cola * colb")
(image by author)

It is important to note that we need to set the inplace parameter to True to save the changes. Otherwise, the eval function returns a modified copy of the dataframe and leaves the original one unchanged.
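
The difference between the two behaviours can be sketched as follows. The dataframe is a made-up example, and the names result and has_new_before are only there for illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"cola": [1, 2, 3], "colb": [4, 5, 6]})

# Without inplace, eval returns a modified copy.
result = df.eval("new = cola * colb")
has_new_before = "new" in df.columns  # the original has no "new" column yet

# With inplace=True, the original dataframe itself is modified.
df.eval("new = cola * colb", inplace=True)
print(df)
```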


3. Combine_first

The combine_first function fills the missing values in a dataframe by using the values in another dataframe. Values are matched on the row index and column labels, not on their physical position.

Consider the following two dataframes.

(image by author)

We can update the missing values in the second dataframe based on the first dataframe.

df2.combine_first(df1)
(image by author)
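
As a rough sketch with made-up data (the values and the reversed index of df1 are assumptions for illustration), the example below shows that the missing values in df2 are filled from the rows of df1 that carry the same index label.

```python
import numpy as np
import pandas as pd

# Hypothetical data. df1 is stored in reverse row order on purpose.
df1 = pd.DataFrame(
    {"cola": [3.0, 2.0, 1.0], "colb": [6.0, 5.0, 4.0]},
    index=[2, 1, 0],
)
df2 = pd.DataFrame(
    {"cola": [10.0, np.nan, 30.0], "colb": [np.nan, 50.0, 60.0]},
    index=[0, 1, 2],
)

# Nulls in df2 are replaced with the df1 values that share the same labels.
filled = df2.combine_first(df1)
print(filled)
```

Values that already exist in df2 are kept; only the nulls are taken from df1.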

4. Transform

The transform function modifies the values in a dataframe according to the given function. The function can be a NumPy function, a function name passed as a string, or a lambda expression, as long as the result has the same length as the input.

Consider the following dataframe.

(image by author)

We can take the log of each value by using the transform function.

df.transform(lambda x: np.log(x))
(image by author)

One useful feature of the transform function is that it accepts multiple functions. We can specify them in a list as below.

df.transform([lambda x: np.log(x), np.sqrt])
(image by author)
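
Here is a small sketch of both calls on made-up data. It passes np.log directly instead of a lambda so that the resulting column labels are readable; that swap and the sample values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; values are chosen so the square roots are whole numbers.
df = pd.DataFrame({"cola": [1.0, 4.0, 9.0], "colb": [16.0, 25.0, 36.0]})

# A single function keeps the shape of the dataframe.
logged = df.transform(np.log)

# A list of functions produces one output column per (column, function) pair.
both = df.transform([np.log, np.sqrt])
print(both)
```

With a list of functions, the result has a two-level column index, so a single output is selected with a tuple such as both[("cola", "sqrt")].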

5. Melt

The melt function converts a dataframe from wide to long format. In the wide format, similar variables are represented as separate columns. The long format, on the other hand, contains one column to store the values of these variables and another column to store the names of the variables.

It is better to have the dataframe in the long format for certain tasks. The melt function provides a simple way to do this conversion, which is easier to see with an example.

Consider the following dataframe in wide format.

(image by author)

The dataframe contains daily measurements for some people. The melt function can be used to convert it to a long format as below.

df_long = pd.melt(df, id_vars='name')
df_long.head(10)
(image by author)
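
A minimal sketch of the conversion on made-up data follows. The names, the day columns, and the use of the var_name and value_name parameters to label the two new columns are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical wide-format data: one column per day.
df = pd.DataFrame({
    "name": ["Jane", "John"],
    "day_1": [5, 7],
    "day_2": [6, 8],
})

# Each (name, day) pair becomes its own row in the long format.
df_long = pd.melt(df, id_vars="name", var_name="day", value_name="measurement")
print(df_long)
```

If var_name and value_name are omitted, the new columns are called "variable" and "value".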

6. Diff

The diff function is used to calculate the difference between two consecutive rows or columns depending on the axis parameter.

Consider the following dataframe.

(image by author)

We want to create a new column that contains the difference between the consecutive values in "colc".

df['diff_c'] = df['colc'].diff()
df
(image by author)

Since the first row does not have any previous row, the first value of the diff_c column is null.
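
The same behaviour can be sketched on a small made-up column. The values and the extra diff_2 column, which uses the periods parameter to compare each row with the one two rows earlier, are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"colc": [10, 15, 13, 20]})

# Difference from the previous row; the first row has no predecessor.
df["diff_c"] = df["colc"].diff()

# Difference from the row two steps back; the first two rows are null.
df["diff_2"] = df["colc"].diff(periods=2)
print(df)
```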


7. Shift

The shift function moves the values of a dataframe up or down by a given number of periods while the index stays in place. It is especially useful for time series data.

Consider the following dataframe with a datetime index.

(image by author)

We can use the shift function by specifying a positive or negative number of periods.

df.shift(3)
(image by author)

If we pass a negative period, the values will be shifted in the opposite direction. We can also specify a value to be used instead of null values created due to shifting.

df.shift(-3, fill_value=0)
(image by author)
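
Below is a minimal sketch on a made-up daily series. The dates, values, and the shift of 2 periods are assumptions for illustration.

```python
import pandas as pd

# Hypothetical time series with a daily datetime index.
df = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5]},
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
)

# Positive periods push values down; the first two rows become null.
shifted = df.shift(2)

# Negative periods pull values up; fill_value replaces the resulting nulls.
shifted_back = df.shift(-2, fill_value=0)
print(shifted)
print(shifted_back)
```

In both results the datetime index is identical to the original; only the values move.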

Conclusion

What we have covered in this article is only a small part of what Pandas offers for data analysis, but these functions will come in handy for the tasks described above.

It is not reasonable to try to learn everything at once. Instead, learning in small chunks and absorbing the information with practice will help you build comprehensive data analysis skills.

Thank you for reading. Please let me know if you have any feedback.

