As data scientists, we need to understand and apply machine learning in our everyday work. The knowledge required is not limited to importing code from a library; it extends to model concepts, algorithm choice, metrics, and much more.
To help you learn machine learning concepts, this article outlines four interactive tools that you can use. Let’s get into it!
1. What-If Tool
What-If Tool is a web-based and notebook-based visualization tool for understanding how a machine learning model behaves. It was developed to help us probe the intricacies of a trained model and experiment with hypothetical situations.
What-If Tool is interactive, meaning we can play around with the GUI and see the changes in real time. Let me show you an example in the GIF below.

It seems like an exciting tool, right? The GIF above shows all the tabs we can play around with to understand our trained machine learning model. In this example, we use binary classification models trained on the UCI Income data. We can compare and experiment with two trained models on this data, so let’s break down each tab to better understand what it does for the model.

We are offered three tabs at the start: Datapoint editor, Performance & Fairness, and Features. Each tab visualizes the machine learning model from a different aspect:
- Datapoint editor: Visual exploration of each data point
- Performance & Fairness: Model performance exploration with various metrics
- Features: Summary of features used for model training
Let’s look closely at the usability of each tab.
Datapoint editor

The Datapoint editor is a tab where we can explore the model’s prediction for each data point. In this tab, we can select individual data points to study the model behaviour. Let’s try choosing one of the data points.

In the data point selection, the top part allows you to select the axes, label, and binning, which is useful if you are looking for specific information. By default, the X-axis and Y-axis are set by the model score outputs of models 1 and 2.

When you select a data point, you get information similar to the image above. This section shows all the feature values of the data point and compares the models’ scores for each label.
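The kind of what-if experiment this tab supports can be sketched in plain Python. This is only a rough illustration and not What-If Tool code: the two logistic models, their weights, and the feature names (`age`, `hours_per_week`) are all made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for two toy income classifiers (not real trained models).
MODEL_1 = {"bias": -6.0, "age": 0.05, "hours_per_week": 0.08}
MODEL_2 = {"bias": -4.0, "age": 0.08, "hours_per_week": 0.02}


def score(model, datapoint):
    """Return the predicted probability of the positive class."""
    z = model["bias"] + sum(model[name] * value for name, value in datapoint.items())
    return sigmoid(z)


def what_if(datapoint, feature, new_value):
    """Copy a datapoint, change one feature, and report (before, after) scores."""
    edited = dict(datapoint, **{feature: new_value})
    return {
        "model_1": (score(MODEL_1, datapoint), score(MODEL_1, edited)),
        "model_2": (score(MODEL_2, datapoint), score(MODEL_2, edited)),
    }


original = {"age": 35, "hours_per_week": 40}
result = what_if(original, "hours_per_week", 60)
for name, (before, after) in result.items():
    print(f"{name}: {before:.3f} -> {after:.3f}")
```

Editing one feature and watching how each model's score moves is, in spirit, what the Datapoint editor lets you do with a click.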
Performance and Fairness

The Performance & Fairness tab is where you can experiment with various thresholds, the ground-truth feature, the cost ratio, and more to understand how your machine learning model’s results change when you alter certain aspects.

In the left-hand section, you can change your hypotheses, such as the prediction feature and the optimization strategy.

In the right-hand section, you can experiment with the threshold of each model to see the effect on performance. The changes happen in real time, so you can quickly assess which model to use and whether your model meets your requirements.
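To see why moving the threshold changes the metrics, here is a small sketch in plain Python. The labels and scores are invented for illustration, and the function names are my own rather than anything from the What-If Tool.

```python
def confusion_counts(labels, scores, threshold):
    """Count (TP, FP, TN, FN) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(labels, scores, threshold):
    """Compute accuracy, precision, and recall at a given threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up ground truth and model scores, for illustration only.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.55, 0.45, 0.3, 0.25, 0.1]

for threshold in (0.3, 0.5, 0.7):
    m = metrics(labels, scores, threshold)
    print(threshold, {name: round(value, 3) for name, value in m.items()})
```

Raising the threshold here trades recall for precision, which is the same trade-off you can watch in real time when you drag the threshold slider.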
Features

The Features tab provides summary statistics for each feature used in model training. All the basic statistics you need are available: mean, median, standard deviation, and many more. The features are also divided into numerical and categorical groups to make them easier to study.
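If you want a feel for what such a summary contains, a rough equivalent can be sketched with Python's standard library. The rows and column names below are hypothetical, and the output layout is my own assumption, not the tool's.

```python
import statistics
from collections import Counter

# Hypothetical rows; the column names are made up for this example.
rows = [
    {"age": 25, "hours_per_week": 40, "education": "Bachelors"},
    {"age": 38, "hours_per_week": 50, "education": "HS-grad"},
    {"age": 52, "hours_per_week": 40, "education": "Bachelors"},
    {"age": 41, "hours_per_week": 60, "education": "Masters"},
    {"age": 29, "hours_per_week": 35, "education": "Bachelors"},
]


def summarize_numeric(values):
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def summarize_categorical(values):
    counts = Counter(values)
    top_value, top_count = counts.most_common(1)[0]
    return {"count": len(values), "unique": len(counts),
            "top": top_value, "top_freq": top_count}


def summarize(rows):
    """Split columns into numeric and categorical groups and summarize each."""
    summary = {"numeric": {}, "categorical": {}}
    for name in rows[0]:
        column = [row[name] for row in rows]
        if all(isinstance(value, (int, float)) for value in column):
            summary["numeric"][name] = summarize_numeric(column)
        else:
            summary["categorical"][name] = summarize_categorical(column)
    return summary


summary = summarize(rows)
for group, columns in summary.items():
    for name, stats in columns.items():
        print(group, name, stats)
```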
If you want to learn more about the What-If Tool in the Notebook environment, you can visit the tutorial here.
2. Deep Playground
The Deep Playground project is an interactive, web-based neural network that people can learn from. The web page is simple enough for any beginner to understand how a neural network works.

The GIF above summarises the whole interaction; you can tinker with the hyperparameters in the top part: learning rate, activation, regularization, regularization rate, and problem type.

The Deep Playground has only four kinds of datasets to use, but they represent common problems in machine learning projects. In this part, you can also experiment with the ratio of training to test data, the noise level, and the batch size.
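To make these three settings concrete, here is a small sketch that builds a circle-style dataset, splits it, and cuts it into batches. It is my own approximation in plain Python, not the Playground's actual generator; the radii, the noise model, and the function names are assumptions.

```python
import math
import random


def make_circle_dataset(n_points=200, noise=0.1, seed=0):
    """Inner disc is labelled 1, outer ring is labelled 0; Gaussian jitter adds noise."""
    rng = random.Random(seed)
    points = []
    for i in range(n_points):
        label = i % 2
        radius = rng.uniform(0.0, 0.5) if label == 1 else rng.uniform(0.7, 1.0)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x = radius * math.cos(angle) + rng.gauss(0.0, noise)
        y = radius * math.sin(angle) + rng.gauss(0.0, noise)
        points.append((x, y, label))
    return points


def train_test_split(points, train_ratio=0.5, seed=0):
    """Shuffle a copy of the points and split it by the given training ratio."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def batches(points, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(points), batch_size):
        yield points[start:start + batch_size]


data = make_circle_dataset(n_points=200, noise=0.1)
train, test = train_test_split(data, train_ratio=0.5)
batch_sizes = [len(batch) for batch in batches(train, 30)]
print(len(train), len(test), batch_sizes)
```

More noise blurs the boundary between the two classes, a smaller training ratio leaves the model less to learn from, and the batch size controls how many points feed each weight update.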

Next, you can select which features to feed in and the kind of transformation you want. You can also add or remove hidden layers and neurons in each hidden layer. When you have finished setting up the experiment, you only need to press play and look at the output to understand how the neural network works with your setup.
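If you are curious what happens after you press play, the training loop can be approximated in a few dozen lines. The sketch below is my own minimal version, not the Playground's source code: it assumes a single hidden layer with tanh activation, a sigmoid output, cross-entropy loss, and full-batch gradient descent on the XOR problem.

```python
import math
import random

# XOR: a classic problem that a linear model cannot separate.
XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def train_xor(hidden_neurons=4, learning_rate=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network and return the mean loss per epoch."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden_neurons)]
    b1 = [0.0] * hidden_neurons
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_neurons)]
    b2 = 0.0
    n = len(XOR_DATA)
    losses = []
    for _ in range(epochs):
        grad_w1 = [[0.0, 0.0] for _ in range(hidden_neurons)]
        grad_b1 = [0.0] * hidden_neurons
        grad_w2 = [0.0] * hidden_neurons
        grad_b2 = 0.0
        total_loss = 0.0
        for (x1, x2), y in XOR_DATA:
            # Forward pass: tanh hidden layer, sigmoid output.
            h = [math.tanh(w1[j][0] * x1 + w1[j][1] * x2 + b1[j])
                 for j in range(hidden_neurons)]
            z = sum(w2[j] * h[j] for j in range(hidden_neurons)) + b2
            p = 1.0 / (1.0 + math.exp(-z))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            total_loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
            # Backward pass: for sigmoid + cross-entropy, dLoss/dz = p - y.
            dz = p - y
            grad_b2 += dz
            for j in range(hidden_neurons):
                grad_w2[j] += dz * h[j]
                dh = dz * w2[j] * (1.0 - h[j] ** 2)  # derivative of tanh
                grad_w1[j][0] += dh * x1
                grad_w1[j][1] += dh * x2
                grad_b1[j] += dh
        losses.append(total_loss / n)
        # Gradient descent step using the averaged gradients.
        step = learning_rate / n
        b2 -= step * grad_b2
        for j in range(hidden_neurons):
            w2[j] -= step * grad_w2[j]
            w1[j][0] -= step * grad_w1[j][0]
            w1[j][1] -= step * grad_w1[j][1]
            b1[j] -= step * grad_b1[j]
    return losses


for rate in (0.03, 0.5):
    losses = train_xor(learning_rate=rate)
    print(f"learning rate {rate}: loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Changing `hidden_neurons` or `learning_rate` here plays the same role as the corresponding controls on the web page, just without the nice visualization.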

Deep Playground is an open-source project; if you want to contribute or are curious about the source code, you can visit the GitHub page.
3. Probability Distribution by Simon Ward-Jones
Machine learning models often output probabilities, so learning about probability distributions helps us understand how our models work. Sometimes it is hard to understand how probability distributions work without clear visualization, which is why I recommend the Probability Distribution post by Simon Ward-Jones to help you learn.

The post gives you interactive visualizations of the following probability distributions:
- Bernoulli Distribution
- Binomial Distribution
- Normal Distribution
- Beta Distribution
- LogNormal Distribution
The post details the probability function (mass or density) of each distribution and what happens when we change the parameter values. I suggest you experiment with the parameters because it helps you understand the concepts more quickly.
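If you prefer to poke at the formulas directly, the five distributions can be written with Python's standard library alone. This is a sketch using the textbook definitions, not code from the post, and the function names are my own.

```python
import math
from statistics import NormalDist


def bernoulli_pmf(k, p):
    """P(X = k) for k in {0, 1} with success probability p."""
    return p if k == 1 else 1.0 - p


def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def normal_pdf(x, mu, sigma):
    return NormalDist(mu, sigma).pdf(x)


def beta_pdf(x, a, b):
    """Density on 0 < x < 1 with shape parameters a and b."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1.0 - x) ** (b - 1)


def lognormal_pdf(x, mu, sigma):
    """Density for x > 0, where log(x) is normal with mean mu and std sigma."""
    exponent = -((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (x * sigma * math.sqrt(2.0 * math.pi))


# Change a parameter and watch the shape move: binomial with n = 10.
for p in (0.2, 0.5, 0.8):
    pmf = [binomial_pmf(k, 10, p) for k in range(11)]
    most_likely = max(range(11), key=lambda k: pmf[k])
    print(f"p={p}: most likely number of successes = {most_likely}")
```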
4. Embedding Projector
Unstructured data is harder to understand than structured data when it is used to train a machine learning model. One way to understand it is through embeddings, which represent the data as mathematical vectors that can then be projected into a few dimensions using an unsupervised algorithm like PCA or t-SNE. The Embedding Projector from TensorFlow gives us an interactive visualization to help us understand the embedding layers.

As you can see in the GIF above, we used the Word2Vec dataset to understand how each word is represented and how close the words are to each other in a low-dimensional space.
On the left side, there are five datasets covering NLP, image, and tabular data. You can use a few algorithms to project the dataset, such as UMAP, t-SNE, PCA, or Custom. Lastly, you can choose the components you want to visualize in the bottom part.
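To give an idea of what a PCA projection does to embedding vectors, here is a small sketch in plain Python. It is an illustration under my own assumptions, not how the Embedding Projector is implemented: the 4-dimensional word vectors are invented, and the components are found with simple power iteration and deflation.

```python
import math
import random


def center(rows):
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
    return [[r[d] - means[d] for d in range(dims)] for r in rows]


def covariance(centered):
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def mat_vec(matrix, v):
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def top_eigenvector(matrix, iterations=200, seed=0):
    """Power iteration: approximate the eigenvector with the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in matrix]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pca_project(rows, n_components=2):
    """Project rows onto their first principal components."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(n_components):
        v = top_eigenvector(cov)
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
        components.append(v)
        # Deflation: remove the direction just found before searching again.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return [[sum(c * x for c, x in zip(comp, row)) for comp in components]
            for row in centered]


# Invented 4-dimensional "embeddings", for illustration only.
words = ["king", "queen", "prince", "apple", "banana", "cherry"]
vectors = [
    [0.9, 0.8, 0.1, 0.2], [0.8, 0.9, 0.2, 0.1], [0.85, 0.7, 0.15, 0.3],
    [0.1, 0.2, 0.9, 0.8], [0.2, 0.1, 0.8, 0.9], [0.15, 0.3, 0.85, 0.7],
]
projected = pca_project(vectors, n_components=2)
for word, (pc1, pc2) in zip(words, projected):
    print(f"{word:>7}: {pc1:+.3f} {pc2:+.3f}")
```

In this toy data the first component separates the royalty words from the fruit words, which is the kind of structure you look for when you rotate the point cloud in the projector.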

On the right side, we can choose how to visualize our data by showing all the data points or isolating a subset of them. The isolation depends on the number of neighbours you choose, and the neighbours are selected based on the distance metric you pick. If you want to select a specific word, you can use the search bar.
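The neighbour search itself is easy to sketch. The example below is a plain-Python approximation with invented word vectors; the function names are hypothetical, and I assume cosine and Euclidean distance as the two metrics to choose from.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.dist(a, b)


def nearest_neighbours(word, embeddings, k=2, distance=cosine_distance):
    """Return the k words whose vectors are closest to the query word."""
    query = embeddings[word]
    ranked = sorted(
        (distance(query, vector), other)
        for other, vector in embeddings.items()
        if other != word
    )
    return [other for _, other in ranked[:k]]


# Invented 4-dimensional "embeddings", for illustration only.
embeddings = {
    "king": [0.9, 0.8, 0.1, 0.2],
    "queen": [0.8, 0.9, 0.2, 0.1],
    "prince": [0.85, 0.7, 0.15, 0.3],
    "apple": [0.1, 0.2, 0.9, 0.8],
    "banana": [0.2, 0.1, 0.8, 0.9],
    "cherry": [0.15, 0.3, 0.85, 0.7],
}

print(nearest_neighbours("king", embeddings, k=2, distance=cosine_distance))
print(nearest_neighbours("apple", embeddings, k=2, distance=euclidean_distance))
```

Switching the `distance` argument mirrors switching the distance metric in the right-hand panel; with real embeddings the two metrics can rank neighbours differently.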

If you want to learn more about the Embedding Projector, you could visit the documentation here.
Conclusion
Understanding machine learning concepts is hard, especially without clear visualization. To help you learn, I have outlined four interactive tools that help you understand machine learning. They are:
- What-If Tool
- Deep Playground
- Probability Distribution
- Embedding Projector
I hope it helps!