
Some of my most popular blogs are about Python libraries. I believe they are so popular because Python libraries have the power to save us a lot of time and headaches. The problem is that most people focus on the most popular libraries and forget that many lesser-known Python libraries are just as good as their famous cousins.
Finding new Python libraries can also be problematic. Sometimes we read about these great libraries, and when we try them, they don’t work as we expected. If this has ever happened to you, fear no more. I got your back!
In this blog, I will show you four Python libraries and why you should try them. Let’s get started.
QuickDA
I wrote about QuickDA a while ago, and I’m still surprised by how well it works. QuickDA, as the name suggests, is an easy-to-use, low-code library that performs data cleaning, data exploration, and data visualization with very few lines of code. QuickDA can save you hours of work, and it has so many cool features that I had to write two blogs to cover the most remarkable ones. You can find them [here](https://towardsdatascience.com/how-to-create-data-visualizations-in-python-with-one-line-of-code-8cda1044fe69) and here.
The best part about QuickDA is that it builds on libraries such as Pandas, Matplotlib, Seaborn, and Plotly, so it will feel familiar once you start using it. For example, do you remember Pandas’ .describe()
function? QuickDA can do the same, but in an improved way.
As we can see below, it returns statistical information about the features, but it also includes each column’s object type, the number of null and unique values, and the skewness of the data.
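To get a feel for what that richer summary contains, you can approximate it in plain Pandas. This is my own sketch of the idea with a made-up toy DataFrame, not QuickDA's code:

```python
import pandas as pd

# Toy data: one numeric column and one categorical column,
# each with a missing value
df = pd.DataFrame({
    "age": [29, 41, 35, None, 52],
    "city": ["NY", "SF", "NY", "LA", None],
})

# Roughly the extra columns QuickDA adds on top of describe():
# dtype, null count, unique count, and skewness per column
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "nulls": df.isna().sum(),
    "unique": df.nunique(),
    "skew": df.skew(numeric_only=True),
})
print(summary)
```

QuickDA produces all of this with a single call; the sketch just shows why that one call is worth several lines of Pandas.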

It’s easy and fast to get insights with QuickDA. You can get an overview of the dataset, including warnings (more about warnings in a bit), and visualize the data with one line of code. You don’t need to type endless lines of code to get a single graph.

And what do I mean when I say that you can get warnings about the dataset? QuickDA can flag high cardinality, high correlation between features, a high percentage of missing values, and a high percentage of zeros.
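Those checks are easy to picture. Here is a plain-Pandas sketch of three of them; the thresholds and column names are my own illustration, not QuickDA's actual rules:

```python
import pandas as pd

# Toy data designed to trip each check
df = pd.DataFrame({
    "id": range(100),                    # nearly all unique -> high cardinality
    "score": [0] * 90 + [1] * 10,        # mostly zeros
    "note": [None] * 80 + ["ok"] * 20,   # mostly missing
})

flags = []
for col in df.columns:
    if df[col].nunique() / len(df) > 0.9:
        flags.append(f"{col}: high cardinality")
    if df[col].isna().mean() > 0.5:
        flags.append(f"{col}: high percentage of missing values")
    if (df[col] == 0).mean() > 0.5:
        flags.append(f"{col}: high percentage of zeros")

print(flags)
```

QuickDA surfaces this kind of warning automatically as part of its dataset overview.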

QuickDA has many more cool features, and I highly recommend checking it out. I have written Save Hours of Work Doing a Complete EDA With a Few Lines of Code With This Library and How to Create Data Visualizations in Python With One Line of Code, where you can find more information about it.
ELI5
Machine learning models are not only about how accurately a model can predict but also about how it predicts. Sometimes we need to understand which features are driving the predictions to optimize the model or explain it. For example, in a natural language processing classification problem, how can you easily see which words influenced the prediction? That’s precisely where Eli5 comes in.
Eli5 helps you debug machine learning classifiers and explain their predictions. It supports the most popular machine learning frameworks and packages, such as scikit-learn, Keras, XGBoost, LightGBM, and CatBoost. A while ago, I worked on an NLP project that classified hotel reviews, and I had to know which words were influencing good and bad reviews the most. Eli5 came in very handy. Let me show you how.
# Install Eli5
!pip install eli5
# Importing Eli5
import eli5
# top controls how many features to display
eli5.explain_weights(model, feature_names=X_train.columns.values.tolist(), top=20)

There you go! Eli5 returns a color-coded table showing the features with the highest weights in the model. The model identifies words such as excellent and great for positive reviews and dirty and rude for negative reviews, which makes sense.
If you prefer a Pandas DataFrame, that’s possible as well with the following code:
from eli5.sklearn import explain_weights_sklearn
from eli5.formatters import format_as_dataframe
explanation = explain_weights_sklearn(model, feature_names=X_train.columns.values.tolist(), top=20)
format_as_dataframe(explanation)

Eli5 is a nice-to-have library that can save you some time. It has other features that you can find here.
OpenDataSets
Let’s say you are starting a project to practice your data analysis and machine learning skills; where do you start? Most people go to Kaggle, find an interesting dataset, download the file, find the file in the Downloads folder, and drag the file to the folder where the notebook you are working on is. Quite a few steps, right? What if there were a better way? Well, that’s what OpenDataSets solves.
OpenDataSets allows us to download a dataset from inside a notebook. It creates a folder with the dataset in the same directory as your notebook, saving you some time. Cool, right?
To use it, you just need to type pip install opendatasets
in your terminal. Then, import it in your notebook by typing import opendatasets as od
, and you are good to go. Kaggle will ask for your credentials, but you can easily get them from your Kaggle profile page. In the example below, I want to download the famous heart attack dataset. Here is the code you will need:
import opendatasets as od
od.download("https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset")

As you can see above, the folder on the left of the image didn’t contain the heart attack dataset. However, as soon as I ran the code, it downloaded the dataset for me. You can see that the dataset comes already unzipped. It could not be any easier.
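Conceptually, the download step is just "fetch the archive, unzip it next to the notebook". Here is a stdlib-only sketch of the unzip part, using a zip built in memory so it runs anywhere; the file names and folder name are invented for the demo, and the real library also handles the Kaggle authentication for you:

```python
import io
import os
import zipfile

# Build a tiny zip archive in memory to stand in for the downloaded file
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("heart.csv", "age,output\n63,1\n")
buf.seek(0)

# Extract it into a folder next to the notebook --
# roughly what opendatasets does after the download finishes
target = "demo-dataset"
with zipfile.ZipFile(buf) as zf:
    zf.extractall(target)

print(os.listdir(target))
```

After this, the CSV sits in a local folder ready for pd.read_csv, with no trip to the Downloads folder.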
Comma
Comma is one of those libraries that you don’t know you need until you need it. It makes it easier to deal with CSV files. For example, you can easily extract the information from a CSV file as a list or dictionary. Here is a demonstration of how it works.
First, you can install Comma by typing pip install comma
in your terminal, and you are good to go. Let’s now import Comma and the dataset that we will be using.
import comma
table = comma.load('https://raw.githubusercontent.com/DinoChiesa/Apigee-Csv-Shredder/master/csv/Sacramento-RealEstate-Transactions.csv')

The table at the top was created using Comma. I also created a table using Pandas to compare. They look almost identical. Now, let’s say that you want to get the values of a column as a list. You can easily do that with the following code:
table[0:4]['column_name']

If you want to get the information of a row as a dictionary, you can also easily do that by typing table[0]
.

I know that you can do this with Pandas as well, but it requires more code. If you need to do this very often, Comma might save you some time. It’s definitely a nice-to-know library.
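For comparison, the standard library's csv module gives you similar row-as-dict access, just with a bit more ceremony. This sketch uses a couple of inline sample rows shaped like the Sacramento file rather than downloading it:

```python
import csv
import io

# Two sample rows shaped like the Sacramento real-estate CSV
raw = ("street,city,price\n"
       "3526 HIGH ST,SACRAMENTO,59222\n"
       "51 OMAHA CT,SACRAMENTO,68212\n")

# DictReader yields one dict per row, like Comma's table[0]
rows = list(csv.DictReader(io.StringIO(raw)))

print(rows[0]["city"])               # a single cell from one row
print([r["price"] for r in rows])    # a column pulled out as a list
```

Comma collapses the reader setup and the list comprehension into simple indexing, which is where the time savings come from.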
Conclusion
Today we went over some libraries that you should know. Some of them are not for everyone and will only make sense when you need them. However, when that time comes, you will see that you can save hours of your precious time.
If you want to learn about more libraries, please check 5 Python Libraries That You Don’t Know About, But Should and [3 Awesome Python Libraries That You Should Know About](https://towardsdatascience.com/3-awesome-python-libraries-that-you-should-know-about-e2485e6e1cbe). Thank you for reading, and happy coding.