
The Most Underrated Python Packages

A curated list of awesome libraries

source: delphinmedia, via pixabay (CC0)

In my experience as a Python user, I’ve come across a lot of different packages and curated lists. Some are in my bookmarks, like the great awesome-python-data-science curated list or the [awesome-python](https://github.com/vinta/awesome-python) curated list. If you don’t know them, go check them out asap.

In this post, I’d like to show you something else: the results of late-night GitHub and Reddit browsing, plus cool stuff shared by colleagues.

Some of these packages are really unique; others are just fun to use and are real underdogs among the data scientists and statisticians I’ve worked with.

Let’s start!

Misc (the weird ones)

  • Knock Knock: Send notifications from Python to mobile devices, the desktop, or email.
  • tqdm: Extensible Progress Bar for Python and CLI, with built-in support for pandas.
  • Colorama: Simple cross-platform colored terminal text.
  • Pandas-log: It provides feedback about basic pandas operations. Great for debugging long pipe chains.
  • Pandas-flavor: The easy way to extend Pandas DataFrame/Series.
  • More-Itertools: As the name suggests, it adds more functions in the spirit of itertools.
  • streamlit: The easy way to create apps for your Machine Learning projects.
  • SQLModel: SQL databases in Python, designed for simplicity, compatibility, and robustness.
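More-Itertools fills gaps in the standard library; its `chunked(iterable, n)` function, for example, splits an iterable into lists of length n. As a rough illustration of the kind of helper it saves you from writing, here is a stdlib-only sketch. `chunked_sketch` is a hypothetical name and is not how the package itself is implemented.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked_sketch(iterable: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield successive lists of length n; the last one may be shorter.

    Hypothetical stdlib-only helper that mimics the behaviour of
    more_itertools.chunked; it is not the library's implementation.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while True:
        # islice pulls at most n items from the shared iterator
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


print(list(chunked_sketch(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

With the package installed, `from more_itertools import chunked` gives you a tested version of this (and many other helpers) without maintaining your own.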

Data Cleaning and Manipulation

  • ftfy: Fixes mojibake and other glitches in Unicode text, after the fact.
  • janitor: A collection of handy functions for cleaning data.
  • Optimus: Another package for data cleaning.
  • Great-expectations: A great package for checking whether your data meets your expectations.
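To see the kind of damage ftfy repairs, this stdlib-only sketch creates mojibake by decoding UTF-8 bytes as Latin-1, then reverses that single, known mistake. `undo_latin1_mojibake` is a hypothetical helper for illustration; ftfy's `fix_text` is meant to detect and fix such problems heuristically, without you having to know which mistake happened.

```python
# Mojibake: UTF-8 bytes that were wrongly decoded as Latin-1.
original = "café"
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©


def undo_latin1_mojibake(text: str) -> str:
    """Hypothetical helper: reverse one round of UTF-8 -> Latin-1 mis-decoding.

    It only handles this single, known mistake; ftfy covers many more
    encoding mix-ups.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not produced by this particular mistake; leave it alone
        return text


print(undo_latin1_mojibake(garbled))  # café
```

Note that correctly encoded text such as "café" passes through unchanged here, because its Latin-1 bytes are not valid UTF-8.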

Data Exploration and Modelling

  • Pandas-profiling: Creates an HTML report full of statistics from a pandas DataFrame.
  • dabl: Supports data exploration through visualisation and preprocessing.
  • pydqc: Lets you compare statistics between two datasets.
  • Pandas-summary: An extension of the pandas DataFrame describe() function.
  • pivottable-js: Drag-and-drop pivot tables for pandas inside Jupyter notebooks.
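Pandas-summary builds on describe()-style summary statistics, and pydqc goes further by putting two datasets side by side. The core of that comparison can be sketched with the standard library alone. `summarize` and `compare` are hypothetical helpers, the ages are made-up toy data, and neither package works exactly like this.

```python
import statistics
from typing import Dict, Sequence, Tuple


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """A tiny describe()-style summary of one numeric column (sample std)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def compare(a: Sequence[float], b: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """Hypothetical helper: side-by-side summaries of the same column in two datasets."""
    sa, sb = summarize(a), summarize(b)
    return {key: (sa[key], sb[key]) for key in sa}


# Made-up toy data standing in for the same column in two datasets
train_ages = [23, 35, 31, 44, 29]
test_ages = [52, 61, 48, 57, 66]
for stat, (left, right) in compare(train_ages, test_ages).items():
    print(f"{stat:>5}: {left:8.2f} {right:8.2f}")
```

A large gap between the two columns, like the shift in mean age here, is the kind of difference these tools help you spot before training on one dataset and evaluating on another.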

Data Structures

  • Bounter: Efficient Counter that uses a limited (bounded) amount of memory regardless of data size.
  • python-bloomfilter: Scalable Bloom Filter implemented in Python.
  • datasketch: Gives you probabilistic data structures like LSH, Weighted MinHash, HyperLogLog and more.
  • ranges: Continuous Range, RangeSet, and RangeDict data structures for Python.
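Probabilistic structures like Bloom filters trade exactness for memory: a membership check can return a false positive but never a false negative. The sketch below is a minimal fixed-size Bloom filter built on `hashlib`, meant only to show the mechanism. It is not python-bloomfilter's API, it lacks the scalable variant's ability to grow as items are added, and the `num_bits` and `num_hashes` defaults are arbitrary assumptions.

```python
import hashlib
from typing import List


class BloomFilter:
    """Minimal fixed-size Bloom filter (illustrative; not python-bloomfilter's API)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> List[int]:
        # Derive k bit positions by salting SHA-256 with the hash index
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present"
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["apple", "banana", "cherry"]:
    bf.add(word)

print("apple" in bf)   # True: added items are always reported as present
print("durian" in bf)  # very likely False, but a false positive is possible
```

For k hash functions, m bits, and n stored items, the false-positive rate is approximately (1 - e^(-kn/m))^k, so giving the filter more bits per stored item lowers it.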

Performance Checking and Optimization

  • Py-spy: Sampling profiler for Python programs.
  • pyperf: Toolkit to run Python benchmarks.
  • snakeviz: An in-browser Python profile viewer with great support for Jupyter notebooks.
  • Cachier: Persistent, stale-free, local and cross-machine caching for Python functions.
  • Faiss: A library for efficient similarity search and clustering of dense vectors.
  • mypyc: Compiles Python modules to C extensions using type hints.
  • Scalene: A high-performance CPU, GPU, and memory profiler for Python.
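snakeviz visualises profile data produced by Python's built-in cProfile module. A minimal way to produce such a file with only the standard library is shown below; `slow_sum`, `main`, and the file name `example.prof` are hypothetical placeholders for your own code.

```python
import cProfile
import pstats


def slow_sum(n: int) -> int:
    # Deliberately naive loop so it shows up in the profile
    total = 0
    for i in range(n):
        total += i * i
    return total


def main() -> int:
    return slow_sum(200_000) + sum(i * i for i in range(200_000))


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Save the raw stats to disk; snakeviz can open this file in the browser
profiler.dump_stats("example.prof")

# Or print the top entries in the terminal with the stdlib pstats module
stats = pstats.Stats("example.prof")
stats.sort_stats("cumulative").print_stats(5)
```

Running `snakeviz example.prof` from a shell should then open an interactive view of the same data; inside Jupyter, `%load_ext snakeviz` enables the `%snakeviz` and `%%snakeviz` magics.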

I hope you found something useful or fun for your work. I’m going to expand the post in the future, so stay tuned for new updates!

