"Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those that we believe to be there." – John W Tukey
The importance and necessity of data visualization in data science cannot be emphasized enough. The fact that a picture is worth a thousand words can be aptly applied to any project’s life cycle associated with data. However, a lot of times, the tools that enable these visualizations aren’t intelligent enough. This essentially means that while we have hundreds of visualization libraries, most of them require users to write a substantial amount of code for plotting even a single graph. This shifts the focus on the mechanics of the visualization rather than the critical relationships within the data.
What if there were a tool that could simplify data exploration by recommending relevant visualizations to the users? There is a new library in the town called Lux 💡 , and it has been developed to address these very questions.
_This article is based upon @Doris lee’s session on Lux during the Rise Camp 2020. Special thanks to Doris for allowing me to use the resources and images from the slides._
Current challenges to efficient data exploration
Today data analysts have access to multiple tools for data exploration. While interactive Jupyter notebooks enable iterative experimentation, there are also powerful BI tools like Power BI and Tableau, which enable advanced exploration by a mere point and click. However, even with the advent of these incredibly powerful tools, there are still challenges that hinder data exploration flow. This is especially true when we go from a question in our minds to discovering actionable insights. Let’s look at the three major identifiable obstacles that data analysts face currently:
1. The disconnect between code and interactive tools.

While programming tools provide flexibility, they are mostly inaccessible to people with less programming experience. On the other hand, point and click tools are simple to use but have limited flexibility and are hard to customize.
2. Plotting requires lots of code and prior decisions

Secondly, to create a visualization, we first need to think about how the visualization should look like with all the specifications. We then need to translate all these specification details into code. The figure above shows how a substantial amount of code is required in two popular python libraries – Matplotlib and Plotly, just to output a mere bar graph. This again affects data exploration, especially when the users only have a vague idea of what they’re looking for
3. Trial-and-error is tedious and overwhelming

Every EDA requires continuous cycles of trial and error. A user has to experiment with multiple visualizations before settling for the final one. There are chances that analysts might miss out on important insights that are in their data sets. Another common issue is that analysts might not know what set of operations they should perform on their data to get to the desired insights and often lose track or get stuck in their analysis.
There is an apparent gap between how people reason and think about their data and what actually needs to be done to the data to get to these insights. Lux is a step to address these possible gaps.
Lux

Lux is a Python library that helps users explore and discover meaningful insights from their data by automating certain data exploration aspects. It is an effort towards bridging the gap between code and interactive interfaces. Lux features an intent language that allows users to specify their analysis intent in a sloppy manner, and it automatically infers the unspecified details and determines appropriate visualization mappings.
The goal of Lux is to make it easier for data scientists to explore their data even when the user doesn’t have a clear idea of what they’re looking for.

- Lux brings the power of interactive visualizations directly into Jupyter notebooks to bridge the gap between code and interactive interfaces.
- Lux features a powerful intent language that allows users to specify their analysis interests to lower the programming cost.
- Lux provides visualization recommendations of data frames automatically to users.
We now have a fair idea of how Lux tries to plugin the common problems users face when exploring data. Let’s now look at how we can use the Lux library with an example. Since the idea is just to provide a quick demo, I’ll use a very simple example. Once you have a fair idea, you’ll be able to use it with the dataset of your choice.
Case Study: Analysing the Palmer Penguins 🐧 dataset

The palmer penguins dataset has currently become a favorite of the data science community and is a drop-in replacement for the overused Iris dataset. The dataset consists of data for 344 penguins. The data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER. Let’s start by installing and importing the Lux library. You can follow along with this tutorial in a Jupyter notebook via the Binder.
Installation
pip install lux-api
#Activating extension for Jupyter notebook
jupyter nbextension install --py luxwidget
jupyter nbextension enable --py luxwidget
# Activating extension for Jupyter lab
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install luxwidget
For more details like using Lux with SQL engine, read the documentation, which is pretty robust and contains many hands-on examples.
Importing the necessary libraries and the dataset
Once the Lux library has been installed, we’ll import it along with our dataset.
import pandas as pd
import lux
df = pd.read_csv('penguins_size.csv')
df.head()

Lux’s nice thing is that it can be used as it is with the pandas dataframe and doesn’t require any modifications to the existing syntax. For instance, if you drop any column or row, the recommendations are regenerated based on the updated dataframe. All the nice functionalities that we get from pandas like dropping columns, importing CSVs, etc., are also preserved. Let’s get an overview of the data set.
df.info()

There are some missing values in the dataset. Let’s get rid of those.
df = df.dropna()
Our data is now in memory, and we are all set to see how Lux can ease the EDA process for us.
EDA with Lux: Supporting a Visual dataframe workflow

df
When we print out the data frame, we see the default pandas table display. We can toggle it to get a set of recommendations generated automatically by Lux.

The recommendations in lux are organized by three different tabs, which represent potential next steps that users can take in their exploration.
The Correlation Tab: shows a set of pairwise relationships between quantitative attributes ranked by the most correlated to the least correlated one.

We can see that the penguin flipper length and body mass show a positive correlation. Penguins’ culmen length and depth also show some interesting patterns, and it appears that there is a negative correlation. To be specific, the culmen is the upper ridge of a bird’s bill.

The Distribution Tab shows a set of univariate distributions ranked by the most skewed to the least skewed.

The Occurrence Tab shows a set of bar charts that can be generated from the data set.

This tab shows there are three different species of penguins – _Adelie, Chinstrap, and Gento_o. There are also three different islands – Torgersen, Biscoe, and Dream; and both male and female species have been included in the dataset.
Intent-based recommendations
Beyond the basic recommendations, we can also specify our analysis intent. Let’s say that we want to find out how the culmen length varies with the species. We can set the intent here as ['culmen_length_mm','species'].
When we print out the data frame again, we can see that the recommendations are steered to what is relevant to the intent that we’ve specified.
df.intent = ['culmen_length_mm','species']
df
On the left-hand side in the image below, what we see is Current Visualization
corresponding to the attributes that we have selected. On the right-hand side, we have Enhance
i.e. what happens when we add an attribute to the current selection. We also have the Filter
tab which adds filter while fixing the selected variable.

If you closely look at the correlations within species, culmen length and depth are positively correlated. This is a classic example of Simpson’s paradox.

Finally, you can get a pretty clear separation between all three species by looking at flipper length versus culmen length.

Exporting visualizations from Widget
Lux also makes it pretty easy to export and share the generated visualizations. The visualizations can be exported into a static HTML as follows:
df.save_as_html('file.html')
We can also access the set of recommendations generated for the data frames via the properties recommendation. The output is a dictionary, keyed by the name of the recommendation category.
df.recommendation

Exporting Visualizations as Code
Not only can we export visualization as HTML but also as code. The GIF below shows how you can view the first bar chart’s code in the Occurrence tab. The visualizations can then be exported to code in Altair for further edits or as Vega-Lite specification. More details can be found in the documentation.

Resources and Hands-on Exercises
The demo above was just a simple way to get started. Lux’s Github Repository contains a lot of resources and interactive binder notebooks on how to use Lux. This could be a great place to start. Additionally, there is also detailed documentation.
Conclusion & Next Steps
In the above article, we saw how a Data Analysis workflow in a Jupyter notebook could be completely transformed by using Lux. Lux offers a lot more visual richness to encourage meaningful data exploration. Lux is still under active development, and its maintainers are looking to hear from users who are using or might be interested in using Lux. This will help the team to understand how they could improve the tool for you.