
In my experience as a Python user, I’ve come across a lot of different packages and curated lists. Some are in my bookmarks, like the great awesome-python-data-science curated list or the [awesome-python](https://github.com/vinta/awesome-python) curated list. If you don’t know them, go check them out ASAP.
In this post, I’d like to show you something else. These are the results of late-night GitHub/Reddit browsing, and cool stuff shared by colleagues.
Some of these packages are really unique; others are just fun to use and are real underdogs among the data scientists and statisticians I’ve worked with.
Let’s start!
Misc (the weird ones)
- Knock Knock: Send notifications from Python to mobile devices, the desktop, or email.
- tqdm: Extensible Progress Bar for Python and CLI, with built-in support for pandas.
- Colorama: Simple cross-platform colored terminal text.
- Pandas-log: Provides feedback about basic pandas operations. Great for debugging long pipe chains.
- Pandas-flavor: The easy way to extend Pandas DataFrame/Series.
- More-Itertools: As the name suggests, it adds more functions in the style of itertools.
- streamlit: The easy way to create apps for your Machine Learning projects.
- SQLModel: SQL databases in Python, designed for simplicity, compatibility, and robustness.
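
To give a feel for the kind of helper More-Itertools adds, here is a minimal standard-library sketch of a `chunked` function that splits any iterable into fixed-size batches. The library ships its own `chunked`; the version below is only an illustration written for this post, not the library's implementation.

```python
from itertools import islice


def chunked(iterable, n):
    """Yield successive lists of up to n items taken from iterable."""
    if n < 1:
        raise ValueError("n must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls at most n items without loading the whole input
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk


batches = list(chunked(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Batching like this is handy when you push rows to a database or an API in groups, and having it ready-made is exactly why the package is worth a bookmark.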
Data Cleaning and Manipulation
- ftfy: Fixes mojibake and other glitches in Unicode text, after the fact.
- janitor: A lot of cool functions to clean data.
- Optimus: Another package for data cleaning.
- Great-expectations: A great package to check if your data obeys your expectations.
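
If you have never met mojibake, the small sketch below reproduces it with the standard library only: UTF-8 bytes get decoded with the wrong codec, and the text comes out garbled. The manual repair works here only because we already know which two encodings were mixed up; as far as I know, ftfy's `fix_text` function is meant to work that out for you.

```python
original = "café"

# Simulate the glitch: UTF-8 bytes wrongly decoded as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©

# Undo it by reversing the wrong decoding step.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```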
Data Exploration and Modelling
- Pandas-profiling: Creates an HTML report full of statistics from a pandas DataFrame.
- dabl: Allows data exploration using visualisation and preprocessing.
- pydqc: Lets you compare statistics between two datasets.
- Pandas-summary: An extension to the pandas DataFrame describe function.
- pivottable-js: Drag-and-drop pivot table functionality for pandas inside a Jupyter notebook.
Data Structures
- Bounter: Efficient Counter that uses a limited (bounded) amount of memory regardless of data size.
- python-bloomfilter: Scalable Bloom Filter implemented in Python.
- datasketch: Gives you probabilistic data structures like LSH, Weighted MinHash, HyperLogLog and more.
- ranges: Continuous Range, RangeSet, and RangeDict data structures for Python.
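
To show what a probabilistic data structure trades away, here is a toy Bloom filter in pure Python. It is a sketch for illustration only: the class name `TinyBloomFilter`, its default sizes, and the SHA-256 based hashing are my own assumptions, not the API of python-bloomfilter or datasketch. The idea is that a lookup can return a false positive, but never a false negative.

```python
import hashlib


class TinyBloomFilter:
    """Toy Bloom filter: fixed memory, no false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True means "probably seen"; False means "definitely not seen".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


seen = TinyBloomFilter()
for word in ["pandas", "numpy", "tqdm"]:
    seen.add(word)

print("pandas" in seen)  # True
```

The memory used depends on `size_bits`, not on how many items you add, which is the same trade-off that makes the packages above attractive for very large datasets.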
Performance Checking and Optimization
- Py-spy: Sampling profiler for Python programs.
- pyperf: Toolkit to run Python benchmarks.
- snakeviz: An in-browser Python profile viewer with great support for Jupyter Notebook.
- Cachier: Persistent, stale-free, local and cross-machine caching for Python functions.
- Faiss: A library for efficient similarity search and clustering of dense vectors.
- mypyc: A compiler that turns Python code into C extensions using type hints.
- Scalene: A high-performance CPU, GPU and memory profiler for Python.
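
A viewer like snakeviz needs a profile file to display. The sketch below produces one with the standard library's `cProfile` and prints a short text summary with `pstats`. The function `slow_sum` and the file name `slow_sum.prof` are made up for the example.

```python
import cProfile
import os
import pstats
import tempfile


def slow_sum(n):
    """Deliberately naive loop so the profiler has something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Hypothetical output location in the system temp directory.
profile_path = os.path.join(tempfile.gettempdir(), "slow_sum.prof")

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200_000)
profiler.disable()
profiler.dump_stats(profile_path)  # save the raw profile to disk

# Print the five most expensive entries by cumulative time.
stats = pstats.Stats(profile_path)
stats.sort_stats("cumulative").print_stats(5)
```

Once the file exists, running `snakeviz` with the path to `slow_sum.prof` should open the same data as an interactive chart in the browser.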
I hope you found something useful or fun for your work. I’m going to expand the post in the future, so stay tuned for new updates!