
5 Pandas Methods You’ve Never Used… And You Didn’t Lose Anything!

Any idea when on earth they can be helpful?

Image by Author

Although the Python pandas library is generally an incredibly efficient tool for manipulating tabular data, it can sometimes surprise its users. In this article, we’ll discuss 5 weird pandas methods that seem totally redundant for various reasons: they are cumbersome, they have a more concise and better-known synonym, or they are simply useless.

1. [ndim](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ndim.html)

Strictly speaking, ndim is an attribute rather than a method, and it mirrors the NumPy attribute of the same name. It returns the number of axes of an object, i.e. 1 for a Series and 2 for a dataframe:

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [9, 2], 'C': [1, 5]},
                  index=['a', 'b'])
print(df, '\n')
print('ndim for a dataframe:', df.ndim)
print('ndim for a Series:', df['A'].ndim)
```

Output:

```
   A  B  C
a  6  9  1
b  8  2  5 

ndim for a dataframe: 2
ndim for a Series: 1
```

In practice, the only thing this attribute lets us do is distinguish between Series and dataframe objects. For the same purpose, however, we can simply use the better-known and more universal built-in type(), which also reports the result in a more readable form:

```python
print(type(df))
print(type(df['A']))
```

Output:

```
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
```
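If the goal is to branch on the object type inside code rather than to print it, the built-in isinstance() is the usual idiom. The sketch below is illustrative only: describe() is a hypothetical helper, not part of pandas. It also shows that ndim is simply the length of the shape tuple.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [9, 2], 'C': [1, 5]}, index=['a', 'b'])


def describe(obj):
    """Hypothetical helper: return a readable label for a pandas object."""
    if isinstance(obj, pd.DataFrame):
        return 'DataFrame'
    if isinstance(obj, pd.Series):
        return 'Series'
    return type(obj).__name__


print(describe(df))       # DataFrame
print(describe(df['A']))  # Series

# ndim carries the same information, but only as the number of axes,
# which is always the length of the shape tuple
print(df.ndim, len(df.shape))            # 2 2
print(df['A'].ndim, len(df['A'].shape))  # 1 1
```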

2. [keys](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.keys.html)

As with Python dictionaries, we can call keys() on a pandas structure to get its "info axis", i.e. the index for a Series and the columns for a dataframe. The method takes no parameters:

```python
print(df.keys())
print(df['A'].keys())
```

Output:

```
Index(['A', 'B', 'C'], dtype='object')
Index(['a', 'b'], dtype='object')
```

Unlike dictionaries, though, pandas objects are essentially tables with rows and columns, so it’s more natural (and more common) to use the columns and index attributes instead. Besides, this way we can also get the index of a dataframe, not only of a Series:

```python
print(df.columns)
print(df.index)
print(df['A'].index)
```

Output:

```
Index(['A', 'B', 'C'], dtype='object')
Index(['a', 'b'], dtype='object')
Index(['a', 'b'], dtype='object')
```
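The dictionary analogy goes a bit further than keys(): iterating over a dataframe yields its column labels, and the in operator checks the info axis rather than the values. A quick sketch confirming that keys() is just another way to reach columns or index:

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [9, 2], 'C': [1, 5]}, index=['a', 'b'])
s = df['A']

# keys() returns the same labels as the "info axis" attributes
print(df.keys().equals(df.columns))  # True
print(s.keys().equals(s.index))      # True

# Dict-like behaviour: iteration and membership use the info axis too
print(list(df))    # ['A', 'B', 'C']
print('A' in df)   # True, checks column labels
print('a' in s)    # True, checks index labels
print(6 in s)      # False, values are not searched
```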

3. [bool](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.bool.html)

Another method of unclear practical use is bool(), which also takes no parameters. All it does is return the boolean value of a pandas structure that contains a single element of a boolean type. If either of these two conditions is not met, it raises a ValueError. In other words, the method simply returns the only value (of bool type) of a Series or dataframe:

```python
print(pd.Series([True]).bool())
print(pd.Series([False]).bool())
print(pd.DataFrame({'col': [True]}).bool())
print(pd.DataFrame({'col': [False]}).bool())
```

Output:

```
True
False
True
False
```

It’s difficult to imagine a situation where this operation would be necessary, and in fact the method has been deprecated since pandas 2.1. In any case, there are more familiar (and much more universal) ways to do the same:

```python
print(pd.Series([True]).values[0])
df2 = pd.DataFrame({'col': [False]})
print(df2.loc[0, 'col'])
print(df2.iloc[0, 0])
print(df2.at[0, 'col'])
print(df2.squeeze())
```

Output:

```
True
False
False
False
False
```
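One more option is Series.item(), which returns the single element as a plain Python scalar regardless of its dtype. Like bool(), it refuses to guess when there is more than one element, while Python’s built-in bool() raises for any Series, even a one-element one. A minimal sketch (DataFrame has no item() method, so we select the column first):

```python
import pandas as pd

s = pd.Series([True])
df2 = pd.DataFrame({'col': [False]})

# item() returns the only element as a Python scalar
print(s.item())           # True
print(df2['col'].item())  # False

# Unlike bool(), item() does not require a boolean dtype
print(pd.Series([42]).item())  # 42

# With more than one element, item() raises a ValueError
try:
    pd.Series([True, False]).item()
except ValueError as err:
    print('item() failed:', err)

# The built-in bool() raises for any Series, even a one-element one
try:
    bool(s)
except ValueError as err:
    print('bool() failed:', err)
```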

4. [assign](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html)

This method adds new columns to a dataframe:

```python
df = df.assign(D=df['A'] + df['B'])
print(df)
```

Output:

```
   A  B  C   D
a  6  9  1  15
b  8  2  5  10
```

or overwrites the existing ones:

```python
df = df.assign(D=df['B'] + df['C'])
print(df)
```

Output:

```
   A  B  C   D
a  6  9  1  10
b  8  2  5   7
```

or creates multiple columns, where one is calculated from another one defined within the same assign() call:

```python
df = df.assign(E=lambda x: x['C'] + x['D'],
               F=lambda x: x['D'] + x['E'])
print(df)
```

Output:

```
   A  B  C   D   E   F
a  6  9  1  10  11  21
b  8  2  5   7  12  19
```

The same results can be obtained in a less cumbersome and more readable way, though:

```python
df['D'] = df['B'] + df['C']
df['E'] = df['C'] + df['D']
df['F'] = df['D'] + df['E']
print(df)
```

Output:

```
   A  B  C   D   E   F
a  6  9  1  10  11  21
b  8  2  5   7  12  19
```
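There is one practical difference between the two approaches, though: assign() returns a new dataframe and leaves the original untouched, while bracket assignment modifies df in place. Because it returns a dataframe, assign() can also sit inside a method chain, where a lambda receives the intermediate result. A short sketch, starting again from the original three-column df:

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [9, 2], 'C': [1, 5]}, index=['a', 'b'])

# assign() returns a new object; df keeps its original columns
df_new = df.assign(D=df['B'] + df['C'])
print(list(df.columns))      # ['A', 'B', 'C']
print(list(df_new.columns))  # ['A', 'B', 'C', 'D']

# Inside a chain, the lambda sees the filtered rows, not the original df
result = (
    df[df['A'] > 6]
    .assign(D=lambda x: x['B'] + x['C'])
    .sort_values('D')
)
print(result)
```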

5. [swapaxes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.swapaxes.html)

As its name suggests, this method swaps the axes of the dataframe, rearranging the values accordingly:

```python
print(df, '\n')
print(df.swapaxes(axis1='index', axis2='columns'))
```

Output:

```
   A  B  C   D   E   F
a  6  9  1  10  11  21
b  8  2  5   7  12  19 

    a   b
A   6   8
B   9   2
C   1   5
D  10   7
E  11  12
F  21  19
```

An odd thing here is that we always have to specify both arguments, axis1 and axis2 (here, 'index' and 'columns'), otherwise a TypeError is raised, even though it should be pretty clear that for a dataframe the only application of this method is to swap exactly these 2 axes. Moreover, swapaxes() has been deprecated since pandas 2.1. The transpose() method seems to be a much more elegant choice in this case:

```python
print(df.transpose())
```

Output:

```
    a   b
A   6   8
B   9   2
C   1   5
D  10   7
E  11  12
F  21  19
```

especially its shortcut:

```python
print(df.T)
```

Output:

```
    a   b
A   6   8
B   9   2
C   1   5
D  10   7
E  11  12
F  21  19
```
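As a quick sanity check on what transposing does (using the original three-column df for brevity), the sketch below confirms that rows and columns trade places, each value moves to the mirrored position, and transposing twice restores the original. The round trip compares equal here because all columns share one integer dtype; with mixed dtypes, the transposed frame would hold object columns.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [9, 2], 'C': [1, 5]}, index=['a', 'b'])

t = df.T
print(df.shape, t.shape)                  # (2, 3) (3, 2)
print(df.loc['a', 'B'], t.loc['B', 'a'])  # 9 9
print(t.T.equals(df))                     # True
```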

Conclusion

The list of redundant pandas methods could be extended with synonymous methods that do exactly the same thing and even share the same syntax, where one of them is simply more common or has a shorter name, such as isnull() and isna(), or mul() and multiply(). If you know other examples of this kind, or any other weird pandas methods, you’re welcome to share them in the comments.
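The synonym pairs mentioned above are easy to check: their results compare equal element by element (DataFrame.equals() treats NaN values in the same positions as equal). A small sketch with a made-up dataframe:

```python
import numpy as np
import pandas as pd

# Made-up data with missing values
df = pd.DataFrame({'x': [1.0, np.nan], 'y': [np.nan, 4.0]})

print(df.isna().equals(df.isnull()))     # True
print(df.mul(2).equals(df.multiply(2)))  # True
print(df.mul(2).equals(df * 2))          # True, the operator form
```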

Thanks for reading!

If you liked this article, you may also find the following ones interesting:

11 cool names in data science

The Easiest Ways to Perform Logical Operations on Two Dictionaries in Python

Testing Birthday Paradox in Faker Library (Python)

