How many times have you gone to feed a pandas DataFrame into a utility function from another library, only to have it fail because of object columns? Maybe it was a plot from seaborn? (We’re looking at you, sns.clustermap().)
>>> sns.clustermap(
...     df,
...     method='ward',
...     cmap="YlGnBu",
...     standard_scale=1
... )
TypeError: unsupported operand type(s) for -: 'str' and 'str'
So I build a list of all my columns’ data types, filter out the non-numeric columns, and pass the resulting DataFrame into the function that expects only numeric columns.
>>> numeric_cols = [
...     col
...     for col, dtype
...     in df.dtypes.items()
...     if dtype == 'float64'
... ]
>>> print(numeric_cols)
['col1', 'col2']
>>> sns.clustermap(
...     df[numeric_cols],
...     method='ward',
...     cmap="YlGnBu",
...     standard_scale=1
... )

As I was investigating some data relating to gene expression in yeast yesterday, it got me thinking: there has to be a better way. Filtering a DataFrame’s columns by data type should be common enough to be a one-liner! Well, it turns out it is, and even though I thought I had crawled the documentation enough times to be across the pandas API, there’s always a new hidden gem!
Pandas DataFrames have a built-in method called select_dtypes()
which does exactly what I wanted. Rewriting the above code to use this new (to me) method looks like the following (notice that select_dtypes() takes a list, so you can filter for multiple data types at once; there’s a quick sketch of that after the example below):
>>> sns.clustermap(
...     df.select_dtypes(['number']),
...     method='ward',
...     cmap="YlGnBu",
...     standard_scale=1
... )
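And because select_dtypes() accepts include and exclude lists, you can mix and match dtypes, or flip the logic and just drop the columns you don’t want. A quick sketch, using the same toy df from earlier (which is why the output matches the col1/col2 columns above):
>>> # select several numeric dtypes at once
>>> df.select_dtypes(include=['float64', 'int64']).columns.tolist()
['col1', 'col2']
>>> # or exclude the offending object columns instead
>>> df.select_dtypes(exclude=['object']).columns.tolist()
['col1', 'col2']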
I’m sure most of you are already well across this (I did a git blame on the pandas git repo to see when it was added: 2014 🤦). Anyway, I hope someone will find this as helpful as I have. Happy coding!