
3 Python Packages for Interactive Data Analysis

Explore data in a more interactive way

Photo by Towfiqu barbhuiya on Unsplash

Data analysis is a staple activity for any data person; it is how we come to understand the data we are working with. Python already makes this workflow easier, but sometimes we want a more interactive way to explore data. Several Python packages have been developed to meet that need.

This article covers 3 Python packages that we can use to explore a dataset interactively. Let’s get into it.


1. PandasGUI

PandasGUI is a simple Python package that provides a GUI for dataset exploration. It opens a separate window with an Excel-like experience in which we can explore the dataset, view summary statistics, visualize the data, and more. Let’s try the package hands-on.

First, we need to install the PandasGUI package.

pip install pandasgui

After installing the package, we can use it right away to explore our dataset. As an example, I will use the mpg dataset from seaborn.

# Load the example dataset
import seaborn as sns
mpg = sns.load_dataset('mpg')

# Launch the GUI
from pandasgui import show
show(mpg)

Running the code above opens the following GUI in a new window.

Image by Author

PandasGUI lets us explore the data with various features, including:

  • Data Filtering,
  • Statistical Information,
  • Plotting,
  • Data reshaping.

First, let’s take a look around the PandasGUI tabs. As the GIF below shows, we can rearrange the tabs as needed.

GIF created by Author

Next, let’s take a look at the data filtering tab. This tab lets you filter the data frame with a query expression. The syntax follows the pandas query method, so it will be familiar if you have used that before.

GIF created by Author

Take a look at the GIF above. In my example, I enter the query ‘model_year > 72’, and the filter then appears in the list with a tick box next to it. The filter stays in the query filter list, so you can untick it whenever you don’t need it.

If you make a mistake while writing the query, you only need to double-click it and rewrite it. As simple as that.
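Since the filter box follows the pandas query syntax, the same filter can be reproduced in plain pandas. Below is a minimal sketch of that idea. The `cars` data frame is a small made-up sample that only mimics a few mpg columns (it is not the real dataset), and the variable names are my own.

```python
import pandas as pd

# Small made-up sample that mimics a few columns of the mpg dataset
cars = pd.DataFrame({
    "mpg": [18.0, 26.0, 31.5, 15.0, 36.0],
    "horsepower": [130, 90, 71, 150, 74],
    "model_year": [70, 72, 77, 73, 80],
    "origin": ["usa", "europe", "japan", "usa", "japan"],
})

# Same expression as the one typed into the filter box
newer_cars = cars.query("model_year > 72")

# Equivalent boolean-mask form, for comparison
same_result = cars[cars["model_year"] > 72]

print(newer_cars)
```

Both forms keep only the rows whose model_year is greater than 72, so either one works if you later want to move the filter from the GUI into a script.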

Now, let’s take a look at the statistics tab.

GIF created by Author

The statistics tab provides simple per-variable statistics of your data, such as count, mean, and standard deviation. It’s similar to the describe method in pandas.

If you applied a filter in the previous tab, the statistics update to reflect that filter.
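For comparison, here is a rough pandas sketch of the same idea: summary statistics before and after a filter. The data frame is a made-up sample that mimics the mpg columns, and I am not claiming this is how PandasGUI computes its table internally.

```python
import pandas as pd

# Small made-up sample that mimics a few columns of the mpg dataset
cars = pd.DataFrame({
    "mpg": [18.0, 26.0, 31.5, 15.0, 36.0],
    "horsepower": [130, 90, 71, 150, 74],
    "model_year": [70, 72, 77, 73, 80],
})

# Statistics for the whole data frame
full_stats = cars.describe()

# Statistics after applying the same kind of filter used earlier
filtered_stats = cars.query("model_year > 72").describe()

print(full_stats.loc[["count", "mean", "std"]])
print(filtered_stats.loc[["count", "mean", "std"]])
```

The count row drops once the filter is applied, which is the same effect you see in the statistics tab.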

Next, we move on to the Grapher tab, the plotting GUI. This tab lets you create plots of a single variable or of multiple variables. Let me show you an example below.

GIF created by Author

Creating a plot is only a matter of drag-and-drop, as easy as that. The visualizations are built with the plotly package, so we can explore a graph by hovering the cursor over it.

Last is the Reshaper tab. Here we can reshape the dataset by creating a pivot table or melting the dataset.

Image created by Author
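If pivoting and melting are new to you, the small pandas sketch below shows what the two operations do. The data is made up for illustration, and the column choices are my own assumptions rather than anything the Reshaper tab prescribes.

```python
import pandas as pd

# Small made-up sample in "long" form: one row per origin and model year
cars = pd.DataFrame({
    "origin": ["usa", "usa", "japan", "japan"],
    "model_year": [70, 71, 70, 71],
    "mpg": [18.0, 20.0, 27.0, 31.0],
})

# Pivot: one row per origin, one column per model year, mean mpg in the cells
wide = cars.pivot_table(index="origin", columns="model_year",
                        values="mpg", aggfunc="mean")

# Melt: turn the wide table back into one row per (origin, model_year) pair
long = wide.reset_index().melt(id_vars="origin",
                               var_name="model_year", value_name="mpg")

print(wide)
print(long)
```

Pivoting spreads values across columns, while melting stacks them back into rows, so the two are roughly inverse operations.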

If you want to export the dataset to a new CSV file or import a CSV file into PandasGUI, you can use the menu option shown in the image below.

Image by Author
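The same CSV round trip can also be done in code with pandas, which is handy before or after a GUI session. In this sketch the data is a made-up sample and the file name cars_export.csv is hypothetical; a temporary folder is used so nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

# Small made-up sample that mimics a few columns of the mpg dataset
cars = pd.DataFrame({
    "mpg": [18.0, 26.0, 31.5],
    "model_year": [70, 72, 77],
    "origin": ["usa", "europe", "japan"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, written into a throwaway folder
    csv_path = os.path.join(tmp_dir, "cars_export.csv")

    # Export without the index column, then read the file back
    cars.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```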

2. D-Tale

D-Tale is a Python package for interactive data exploration that uses a Flask back-end and a React front-end to make analyzing data easy. The analysis can be done directly in your Jupyter Notebook or outside the notebook. Let’s try the package.

First, we need to install the package.

pip install dtale

Then we can start D-Tale with the following code. I will use the same mpg dataset as in the previous example.

import dtale
d = dtale.show(mpg)
d
Image by Author

D-Tale offers far more than I can cover here, so I will only explain the features I think are most important for you to know.

First, let’s take a look at the Actions tab. In this tab we can manipulate the dataset, for example by filtering, merging, or deleting data. Let’s see what the Actions tab offers.

Image by Author

The Actions tab has all the features needed to manipulate your dataset, such as type conversion, data frame functions, and filtering. Additionally, you can get a data summary with the Summarize Data function.

If you are unsure what a function does, highlight it and an explanation will appear.

Image by Author

Personally, I feel the best part of D-Tale is its visualization features.

Image by Author

As we can see in the image above, there are various visualizations we can try, such as:

  • Describe

Describe shows the basic statistics of the data together with a visualization.

Image by Author

  • Predictive Power Score

A visualization of the Predictive Power Score (PPS) for the dataset.

Image by Author

  • Various charts

Image by Author

After visualizing, we can use the Highlight tab to highlight particular values in our dataset, such as missing data or outliers.

Image by Author
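To make concrete what missing data and outliers mean here, the pandas sketch below flags both by hand. Note the assumptions: the data is made up, and I use the common 1.5 × IQR rule for outliers, which may not be the exact rule D-Tale applies.

```python
import pandas as pd

# Small made-up sample with one missing value and one extreme value
cars = pd.DataFrame({
    "mpg": [18.0, 26.0, 31.5, 15.0, 36.0, 24.0],
    "horsepower": [130.0, 90.0, None, 150.0, 74.0, 400.0],
})

# Missing data: count the empty cells in each column
missing_per_column = cars.isna().sum()

# Outliers (assumed rule): values more than 1.5 * IQR outside the quartiles
hp = cars["horsepower"]
q1, q3 = hp.quantile(0.25), hp.quantile(0.75)
iqr = q3 - q1
is_outlier = (hp < q1 - 1.5 * iqr) | (hp > q3 + 1.5 * iqr)
outliers = cars[is_outlier]

print(missing_per_column)
print(outliers)
```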

Lastly, you can change D-Tale settings such as the theme, language, and screen size.

Image by Author

3. Mito

Mito is a Python package that turns your data frame into an Excel-like spreadsheet. Imagine having an Excel file, but inside your Jupyter Notebook. To install the package, we can run the following commands.

python -m pip install mitoinstaller
python -m mitoinstaller install

After installing, we can launch Mito and create an Excel-like sheet with the following code.

import mitosheet
mitosheet.sheet(mpg)
Image by Author

As we can see in the image above, the data frame we had previously is transformed into an Excel-like data sheet.

The package is easy to explore, and if you are already familiar with Excel, you will feel at home. Let’s try some of the features I think are useful for data exploration.

First, we can view a column’s summary statistics with the View column summary statistics option.

Image by Author

Then we could create various graphs easily with the Graph button.

Image by Author

If required, we could also filter the data directly in the column.

Image by Author

There are still many features you could try out with Mito. If you love doing analysis in Excel, Mito would be a good choice for you.


Conclusion

Data analysis is a required step for any data person, and sometimes we want a more interactive way to do it. Here are 3 Python packages for interactive data analysis:

  1. PandasGUI
  2. D-Tale
  3. Mito

I hope it helps!


