Create an Interactive Bubble Plot with PyQt5

Use a GUI to make Matplotlib plots more engaging

Kruthi Krishnappa
Towards Data Science

Final Output created with PyQt5 and Matplotlib

Introduction to PyQt5

The Qt GUI framework is used to create user interfaces across platforms. The framework is written in C++, but the PyQt5 library allows it to be imported and used directly in Python. Its ease of use makes it one of the most popular libraries for creating GUIs in Python.

PyQt5 has many uses within data visualization in Python, one being interactive plots made in matplotlib. Interactive plots allow for the communication of more complex data in an effective way. In this article, I will be demonstrating how to create an interactive bubble plot so the user can dive into the data.

System Requirements

I used Jupyter Notebook, but any IDE can be used as well. In a notebook, Matplotlib needs an interactive backend for interactive plots, which is selected with %matplotlib notebook.

In Jupyter Notebook, the line %matplotlib notebook needs to be run to change the preset, because %matplotlib inline renders static images and does not allow for interactive plots. If the script is run from an IDE instead, Matplotlib normally picks an interactive backend by default, so no magic command is needed.

Step 1: Make a Bubble Chart

The dataset I will be using is The World Factbook 2020 published annually by the CIA. The dataset contains general information about the people, economy, and government of every country in the world.

It can be downloaded here:

https://www.cia.gov/the-world-factbook/about/archives/download/factbook-2020.zip

The variables used from this dataset are:

  • X: GDP per capita
  • Y: Life Expectancy
  • Hue: Birth Rate
  • Size: Population

Import Libraries

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

Data Cleaning

Some data cleaning and data augmentation need to be applied to each variable before it can be used in the graph.

X Variable: GDP per Capita is currently a string that includes commas and a dollar symbol which both need to be removed in order to convert the string to an integer value.

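df = pd.read_csv("factbook.csv")
# strip the thousands separators and the dollar sign, then convert to int
df["GDP per Capita"] = df["GDP per capita"].str.replace(",", "", regex=False).str.replace("$", "", regex=False).astype(float).astype(int)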
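
To see what this cleaning step does in isolation, here is a minimal sketch on a few made-up values (the sample strings are hypothetical, not taken from the Factbook). It passes regex=False so that "$" is treated as a literal character rather than as a regular-expression anchor.

```python
import pandas as pd

# Hypothetical sample values in the same "$1,234" style as the raw column
raw = pd.Series(["$59,500", "$1,900", "$12,300"])

cleaned = (
    raw.str.replace(",", "", regex=False)   # drop thousands separators
       .str.replace("$", "", regex=False)   # drop the literal dollar sign
       .astype(float)
       .astype(int)
)
print(cleaned.tolist())
```

Going through float before int mirrors the line above and also tolerates values written with a decimal part.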

Hue Variable: Birth Rate is currently a continuous variable, but since it is used for the hue it needs to be made discrete by creating bins.

# bin edges 0, 10, 20, 30, 40, 50
bi = []
for i in range(0, 60, 10):
    bi.append(i)
df['Birth Rate'] = pd.cut(df[' Birth rate'], bins=bi)
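
As a small illustration of the binning, the sketch below applies the same bin edges to a handful of invented birth rates (the numbers are assumptions for demonstration only). Each value is assigned to the half-open interval that contains it.

```python
import pandas as pd

# Hypothetical birth rates, one per bin
birth_rate = pd.Series([8.5, 17.2, 26.0, 35.1, 43.9])

# Same edges as the loop above: 0, 10, 20, 30, 40, 50
edges = list(range(0, 60, 10))
binned = pd.cut(birth_rate, bins=edges)

# Each row now holds an interval such as (10, 20] instead of a raw number
print(binned.tolist())
print(binned.cat.codes.tolist())
```

Because the result is an ordered categorical, seaborn can give each interval its own color in the legend.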

Size variable: Population is currently a string that consists of commas and in order to convert to an integer the commas need to be removed.

df['Population (M)'] = df['Population'].str.replace(',', '', regex=False).astype(int)

Seaborn, a data visualization library built on matplotlib, will be used to make this bubble chart. The traditional matplotlib library can also be used.

bubble = sns.scatterplot(data=df, x="GDP per Capita", y="Life expectancy at birth", size="Population (M)", hue="Birth Rate", legend=True, sizes=(10, 300))

Add the legend for size and color, and show the plot.

bubble.legend()
plt.show()
Seaborn Plot

This initial graph shows the relationships between the four variables clearly. Adding more variables would make the visualization confusing, yet there are still five other variables in this dataset. By making the visualization interactive, the user can dive further into the data by seeing how different variables interact with each other.

Step 2: Set Up PyQt5

Use the import statements below to import the PyQt5 classes and the other dependencies. PyQt5 itself must already be installed.

from PyQt5.QtWidgets import QDialog, QApplication, QPushButton, QVBoxLayout, QLabel, QComboBox, QSlider
from PyQt5.QtCore import Qt
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
import math
import sys
import re

Create a class and constructor to begin. Then set the geometry of the popup window, which is its position on the screen and its size. I chose (400, 400, 900, 900), which I found large enough for the user to make out the details of the plots. The parameters for setGeometry are x, y, width, and height.

class Window(QDialog):
    # constructor
    def __init__(self, parent=None):
        # initialize the QDialog base class before using the widget
        super(Window, self).__init__(parent)
        self.setGeometry(400, 400, 900, 900)

Step 3: Add Widgets in Constructor

Widget 1: FigureCanvas

The figure widget is used to display the graph in the visualization.

self.figure = plt.figure()
self.canvas = FigureCanvas(self.figure)

Widget 2: QComboBox

Add code in the constructor for each ComboBox. The code below is for the first one I named xComboBox to capture the user input for the variable used for the x-axis. First, initialize and name the ComboBox.

self.xComboBox = QComboBox(self)

To add items into the ComboBox .addItems() is used with a list that includes the options. In this case, all the column names are added as options in the ComboBox.

self.xComboBox.addItems(["Area", "Death rate", " Birth rate", "GDP per capita", "Population", "Electricity consumption", "Highways", "Total fertility rate", "Life expectancy at birth"])

Widget 3: QLabel

A label needs to be created in order to allow the user to know what the other widgets will be used for. In this case, it will allow the user to know what the values in the ComboBox will be used for.

self.xLabel = QLabel("&X:")

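The label is then linked to the ComboBox with setBuddy. The & in “&X:” marks X as the label's shortcut key, and pressing that shortcut (Alt+X on most platforms) moves keyboard focus to the label's buddy, here the ComboBox.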

self.xLabel.setBuddy(self.xComboBox)

Widget 4: QSlider

A slider is used to allow the user to adjust the values within the visualization. The widget within PyQt5 is QSlider, which has a required orientation parameter that can be either Qt.Horizontal or Qt.Vertical. In this case, horizontal orientation is the most visually appealing. In this visualization, the slider changes the size variable so the user can enlarge or shrink the bubbles to find the best size for visibility.

self.mySlider = QSlider(Qt.Horizontal, self)

The geometry of the slider will need to be adjusted to best fit the GUI window. The arguments for setGeometry are the same as above, x, y, width, and height.

self.mySlider.setGeometry(30, 40, 200, 30)

A function needs to be linked to the slider to utilize its value within the visualization. First, .valueChanged[int] is used to get the current value based on the position of the slider, then .connect() is called with the name of the function to run. The same function is also used by the button widget below, and it is discussed in its own section after that.

self.mySlider.valueChanged[int].connect(self.changeValue)

Widget 5: Button

Use QPushButton to create the button widget. The parameter in this function is the button name passed in as a string. The button's name is “Plot Current Attributes”, and any time the user changes the slider value or the ComboBox values, this button needs to be pressed to update the graph. A function also needs to be connected to the button to define the actions carried out when it is pressed. I created a function called changeValue that is used for both the button and the slider.

button = QPushButton("Plot Current Attributes", self)
button.pressed.connect(self.changeValue)

changeValue: Slider and Button Function

The button and slider widgets need to be connected to a function in order to utilize their values within the visualization. The changeValue function I created can be used for both the button and the slider. This is possible with the *args parameter which allows for any number of parameters to be passed through. For the button, no parameters will be passed when calling the function, but for the slider, the position value will be passed. This function is outside the constructor.

def changeValue(self, *args):
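
The *args idea can be tried without any GUI. The sketch below is a plain-Python stand-in, not the method from this article: change_value is a hypothetical function that returns the size multiplier, called once the way the button would call it (no arguments) and once the way the slider would (with its position).

```python
def change_value(*args):
    """Return the bubble-size multiplier for a button press or a slider move."""
    base = 4
    if len(args) == 0:
        # Button press: no position is passed, so use the base multiplier
        return base
    # Slider move: the first positional argument is the slider position
    return base * args[0]


print(change_value())     # called like the button would call it
print(change_value(10))   # called like the slider would call it
```

Note that args is a tuple, so the slider position has to be read as args[0] before it can be used in arithmetic.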

Retrieve all the current values of the comboBox widgets to be used for the scatterplot.

# finding the content of current item in combo box
x = self.xComboBox.currentText()
y = self.yComboBox.currentText()
s = self.sComboBox.currentText()
c = self.cComboBox.currentText()

Clear the current plot and create a new subplot.

#clear current figure
self.figure.clear()
#create a subplot
ax = self.figure.add_subplot(111)

Adjust the size and color variables. The size variable needs to be normalized so the bubbles will be the right size.

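#normalize the size data
if len(args) == 0:
    # button press: no slider value was passed
    df["s_new"] = df[s] / df[s].abs().max()
    df["s_new"] = df["s_new"] * 4
else:
    # slider move: args is a tuple, so take the slider position from args[0]
    df["s_new"] = df[s] / df[s].abs().max()
    df["s_new"] = df["s_new"] * args[0] * 4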
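
The normalization can be checked on its own with a few invented numbers. In this sketch, scaled_sizes is a hypothetical helper and the population figures are made up; it divides by the largest absolute value so every size falls between 0 and 1, then multiplies by 4 and, if given, by the slider position.

```python
import pandas as pd


def scaled_sizes(values, slider_value=None, base=4):
    """Normalize values to at most 1 in magnitude, then scale for plotting."""
    normalized = values / values.abs().max()
    factor = base if slider_value is None else base * slider_value
    return normalized * factor


# Hypothetical populations
population = pd.Series([1_000_000, 50_000_000, 200_000_000])

print(scaled_sizes(population).tolist())
print(scaled_sizes(population, slider_value=10).tolist())
```

The largest bubble always gets the full factor, so the slider effectively sets the size of the biggest point and everything else scales with it.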

The color variable needs to be made discrete.

df['new_c'] = pd.cut(df[c], bins=5)
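
When pd.cut is given a number of bins instead of explicit edges, it splits the range of the data into equal-width intervals. The sketch below uses invented values to show the two pieces that matter for plotting: the integer codes, which can be passed to the color argument, and the interval categories, which can serve as legend text.

```python
import pandas as pd

# Hypothetical values for the color variable
values = pd.Series([1.0, 2.5, 4.0, 7.5, 10.0])

binned = pd.cut(values, bins=5)   # five equal-width intervals over [1, 10]

codes = binned.cat.codes                              # one integer per row
labels = [str(iv) for iv in binned.cat.categories]    # one string per bin

print(codes.tolist())
print(labels)
```

A bin can end up empty (here nothing falls in the fourth-lowest range's neighbor below it), in which case its code never appears in the plot even though its label exists.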

Once the user has selected new values from the ComboBoxes and set the new size with the slider, and the data has been adjusted, the scatter plot can be created.

#create scatter plot with new data
b = ax.scatter(x=df[x], y=df[y], s=df["s_new"], c=df["new_c"].cat.codes)
#create labels and title
t = y + " vs " + x
ax.set(xlabel=x, ylabel=y, title=t)

Create custom labels for the color and size legends. Matplotlib adds labels automatically; however, for the color variable, ranges need to be displayed. The automatic labels would just number the colors by bin code, one per color. For size, the data was normalized, so the automatic labels would show the normalized values, whereas we want the real data in the legend. This step is optional: if the data has not been changed, the automatic labels will be correct.

#create labels and title
t = y + " vs " + x
ax.set(xlabel=x, ylabel=y, title=t)
#extract handles and labels for legend
handles, labels = b.legend_elements(prop="sizes", alpha=0.6)
#create custom labels for size legend
num_labels = len(handles)
labels_new = list(np.arange((min(df[s])), (max(df[s])), ((max(df[s]) - min(df[s]))/(num_labels-1))))
labels_new = list(np.around(np.array(labels_new), 1))
# create custom labels that show ranges for color legend
df['new_c'] = (pd.cut(df[c], bins=5))
num_labels_c = len(b.legend_elements()[0])
col_bins = pd.cut(df[c], bins=num_labels_c, precision=1)
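
As an alternative to building the size labels by hand, legend_elements accepts a func argument that converts the plotted sizes back into data values before the labels are generated. This is not the approach used in this article, only a sketch of that option; the population numbers and the 300-point maximum size are assumptions, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical populations and a linear scaling to marker sizes
population = np.array([1e6, 5e7, 2e8, 1.3e9])
scale = 300 / population.max()
sizes = population * scale

fig, ax = plt.subplots()
sc = ax.scatter([1, 2, 3, 4], [4, 3, 2, 1], s=sizes)

# func inverts the scaling, so the labels are in population units
handles, labels = sc.legend_elements(
    prop="sizes", alpha=0.6, func=lambda s: s / scale
)
ax.legend(handles, labels, title="Population")
fig.canvas.draw()
```

This only works when the size transformation can be inverted with a simple function, which is the case for the divide-and-multiply normalization used here.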

Add the legend with custom labels and format the graph. The size of the graph needs to be resized to allow the legend to fit outside the graph. This is done by reducing the height and width by 10% and moving the y0 position up a little bit so the color legend can be at the bottom of the graph and the size legend on the right side.

# get and adjust the position of the graph to fit the legends
box = ax.get_position()
ax.set_position([box.x0, box.y0 + box.height * 0.15, box.width * 0.9, box.height * 0.9])
#color legend with custom labels
legend1 = ax.legend(b.legend_elements()[0], col_bins, title=c, loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=5)
ax.add_artist(legend1)
#size legend with custom labels
legend2 = ax.legend(handles, labels_new, loc="center left", title=s, bbox_to_anchor=(1, 0.5))
ax.set(xlabel=x, ylabel=y, title=t)
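
The two-legend layout can be reproduced on toy data to see why ax.add_artist is needed: a second call to ax.legend replaces the axes' current legend, so the first one has to be re-added as a regular artist. Everything in this sketch (the points, the "low/medium/high" labels, the titles) is made up for illustration, and the Agg backend is chosen only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 3, 4, 2, 1]
codes = [0, 1, 2, 0, 1]            # hypothetical color bins
sizes = [20, 60, 120, 200, 300]    # hypothetical marker sizes

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=codes, s=sizes)

# Shrink the axes and lift them slightly to leave room for both legends
box = ax.get_position()
ax.set_position([box.x0, box.y0 + box.height * 0.15,
                 box.width * 0.9, box.height * 0.9])

# Color legend below the plot; keep it alive with add_artist
color_handles, _ = sc.legend_elements()
legend1 = ax.legend(color_handles, ["low", "medium", "high"], title="Group",
                    loc="upper center", bbox_to_anchor=(0.5, -0.15), ncol=3)
ax.add_artist(legend1)

# Size legend to the right of the plot
size_handles, size_labels = sc.legend_elements(prop="sizes", alpha=0.6)
legend2 = ax.legend(size_handles, size_labels, title="Size",
                    loc="center left", bbox_to_anchor=(1, 0.5))
fig.canvas.draw()
```

Without the add_artist call, only the size legend would remain on the figure.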

Draw the new graph with the figure widget.

#draw new graph
self.canvas.draw()

Step 4: Formatting Widgets

Once all the widgets are created, they need to be arranged. PyQt5 has a multitude of different layouts; I chose QVBoxLayout(), which arranges the widgets in a vertical box. There is also QHBoxLayout(), which arranges the widgets in a horizontal box, QGridLayout(), which arranges widgets in a grid, and QFormLayout(), which arranges the widgets in two columns.

Each widget can be added to the layout one after the other, and they will be stacked on top of each other in that order. Finally, once all the widgets are in the layout, it needs to be set with self.setLayout(LayoutName). My layout is named grid. Any name can be used, but the same layout object has to be used for every addWidget call so that the widgets end up in that specific layout.

grid = QVBoxLayout()
grid.addWidget(self.xLabel)
grid.addWidget(self.xComboBox)
grid.addWidget(self.yLabel)
grid.addWidget(self.yComboBox)
grid.addWidget(self.sLabel)
grid.addWidget(self.sComboBox)
grid.addWidget(self.cLabel)
grid.addWidget(self.cComboBox)
grid.addWidget(self.canvas)
grid.addWidget(self.mySlider)
grid.addWidget(button)
self.setLayout(grid)

Step 5: Main Method

The main method creates the application and an instance of the class, shows the window, and starts the Qt event loop, which keeps running to pick up any changes made to the visualization until the window is closed.

if __name__ == '__main__':
    # creating a PyQt5 application
    app = QApplication(sys.argv)
    # creating a window object
    main = Window()
    # showing the window
    main.show()
    # start the event loop and exit when it ends
    sys.exit(app.exec_())

Final Output

Final GUI

Summary

Combine all the steps above to get your interactive bubble plot! When the script is run, the GUI should pop up in a separate window. While this was a simple example, PyQt5 can be integrated into any matplotlib visualization. It allows visualizations to be created that add a layer of depth and information that is not achieved with a general report or a static visualization. After all, a picture is worth a thousand words.

The full code can be found here: https://github.com/kruthik109/Data-Visualization/blob/main/Interactive-Bubble-Plot/widgets.py

Sources

Central Intelligence Agency. (2020, April 6). World Factbook 2020. Central Intelligence Agency. Retrieved February 22, 2022, from https://www.cia.gov/the-world-factbook/about/archives/
