
Boomerang Plot

Visualization for Rapidly Finding Generalizable Models


Image by Author

The AIQC boomerang plot visualizes various performance metrics across each split (train, validation, test) for every model in an experiment. When the points of a model's trace are tightly clustered (precise), the model has discovered patterns that generalize across the data in every split.


🧮 How to evaluate many tuned models

Imagine that you’ve just trained a large batch of models that all seem to be performing relatively well. How do you know which one is best according to the metrics you care about? To answer this question, you would have to:

  • Run the raw predictions back through functions like [sklearn.metrics.f1_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html)
  • Do this for each split or fold (train, validation, test).
  • Do this for each model.
  • Do this for each metric.
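The manual routine above boils down to a triple loop over models, splits and metrics. Below is a minimal sketch, assuming each model's raw predictions are already stored per split in a plain dictionary; the labels, predictions and model names are made-up placeholders rather than AIQC objects. It uses scikit-learn's `accuracy_score` and `f1_score`.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up ground-truth labels for each split.
labels = {
    "train":      [0, 1, 1, 0, 1, 0],
    "validation": [1, 0, 1, 1],
    "test":       [0, 1, 0, 1],
}

# Made-up raw predictions: model name -> split -> predicted labels.
predictions = {
    "model_a": {
        "train":      [0, 1, 1, 0, 1, 0],
        "validation": [1, 0, 1, 0],
        "test":       [0, 0, 0, 1],
    },
    "model_b": {
        "train":      [0, 1, 1, 0, 0, 0],
        "validation": [1, 0, 1, 1],
        "test":       [0, 1, 0, 0],
    },
}

metrics = {"accuracy": accuracy_score, "f1": f1_score}

# One score per (model, split, metric): the raw table you end up reading.
scores = {}
for model, splits in predictions.items():      # for each model
    for split, y_pred in splits.items():       # for each split/fold
        for name, fn in metrics.items():       # for each metric
            scores[(model, split, name)] = fn(labels[split], y_pred)

for key, value in sorted(scores.items()):
    print(key, round(float(value), 3))
```

Even with two models, two metrics and three splits this already yields 12 numbers, which is why the table quickly becomes hard to read.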

At this point, you could calculate aggregate metrics for each model. However, with only 2 or 3 splits to learn from, aggregate metrics aren’t very useful.

For example, if a model overfits on the training and validation data at 100% and 95% accuracy respectively, but flunks the holdout (test) data at 82%, most aggregate metrics would be misleading: its mean accuracy of roughly 92% hides an 18-point drop from train to test. You’d have to introduce a range or standard deviation measure to make sense of it, but at that point, couldn’t you just look at the 3 splits yourself? 🤷 So you’re right back where you started, with a table of raw data.
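Here is the arithmetic for the overfit model in that example (100%, 95% and 82% accuracy), using only the Python standard library. The mean on its own looks healthy; only a spread measure such as the range or standard deviation exposes the problem.

```python
from statistics import mean, pstdev

# Per-split accuracy of the overfit model from the example.
accuracy = {"train": 1.00, "validation": 0.95, "test": 0.82}

values = list(accuracy.values())
average = mean(values)              # ~0.923: looks fine in isolation
spread = max(values) - min(values)  # 0.18: the drop the mean hides
std = pstdev(values)                # population std across the 3 splits

print(f"mean={average:.3f} range={spread:.2f} std={std:.3f}")
```

With only three values per model, the range and the standard deviation tell essentially the same story, which supports the point that looking at the splits directly is simpler.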

Why not just visualize it? If you’ve trained a [Queue](https://aiqc.readthedocs.io/en/latest/notebooks/api_low_level.html#8.-Queue-of-training-Jobs.) of models using AIQC, it’s as easy as:

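queue.plot_performance(
    score_type=None,  # str: name of the metric to plot
    min_score=None,   # float: minimum score every split must reach
    max_loss=None     # float: maximum loss any split may have
)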
  • score_type: choose from the following metrics for classification (accuracy, f1, roc_auc, precision, recall) or regression (R², MSE, Explained Variance).
  • min_score and max_loss act as thresholds: any model with a split that fails to meet them is removed, and the graph is resized to fit the models that remain.
Image by Author
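To illustrate what the min_score and max_loss thresholds do, here is a minimal sketch of that filtering behavior in plain Python. It is not AIQC's internal implementation: the `filter_models` and `score_axis_range` helpers and the per-split result layout are hypothetical, and the numbers are made up.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: model id -> split -> {"score": float, "loss": float}
Results = Dict[int, Dict[str, Dict[str, float]]]


def filter_models(results: Results,
                  min_score: Optional[float] = None,
                  max_loss: Optional[float] = None) -> Results:
    """Drop any model with at least one split that misses a threshold."""
    kept = {}
    for model_id, splits in results.items():
        passes = all(
            (min_score is None or s["score"] >= min_score)
            and (max_loss is None or s["loss"] <= max_loss)
            for s in splits.values()
        )
        if passes:
            kept[model_id] = splits
    return kept


def score_axis_range(results: Results) -> Tuple[float, float]:
    """Resize the score axis to span only the surviving models."""
    scores = [s["score"] for splits in results.values() for s in splits.values()]
    return min(scores), max(scores)


# Made-up numbers, for illustration only.
results = {
    1: {"train": {"score": 0.99, "loss": 0.05},
        "validation": {"score": 0.93, "loss": 0.20},
        "test": {"score": 0.80, "loss": 0.55}},
    2: {"train": {"score": 0.92, "loss": 0.22},
        "validation": {"score": 0.91, "loss": 0.25},
        "test": {"score": 0.90, "loss": 0.27}},
}

kept = filter_models(results, min_score=0.85)
print(sorted(kept), score_axis_range(kept))
```

Model 1 is dropped because a single split (test, 0.80) falls below the threshold, even though its training score is the highest. This is the "any split" rule described above.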

🪃 As you may have guessed, it’s called the boomerang plot because of the curve each model’s trace makes.

AIQC can do this because, while the Queue is training, it automatically generates metrics and plots for each split or fold of each model according to the Queue.analysis_type. When the time comes for evaluation, the practitioner simply calls up this information.
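The idea of computing metrics once, at training time, and choosing them by analysis type can be sketched as follows. This is a hypothetical illustration, not AIQC's actual code. The registry keys, the `score_splits` helper and the `cache` dictionary are all assumptions. The metrics are written in plain Python to keep the example dependency-free.

```python
from typing import Callable, Dict, List, Tuple

Metric = Callable[[List[float], List[float]], float]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registry: which metrics to compute for each analysis type.
METRICS_BY_ANALYSIS: Dict[str, Dict[str, Metric]] = {
    "classification": {"accuracy": accuracy},
    "regression": {"mse": mse},
}


def score_splits(analysis_type: str,
                 splits: Dict[str, Tuple[list, list]]) -> Dict[str, Dict[str, float]]:
    """Run once right after a model finishes training; the result is stored
    so that later plotting only reads it instead of recomputing."""
    metrics = METRICS_BY_ANALYSIS[analysis_type]
    return {
        split: {name: fn(y_true, y_pred) for name, fn in metrics.items()}
        for split, (y_true, y_pred) in splits.items()
    }


# Cache filled during training: model id -> split -> metric -> value.
cache = {}
cache[1] = score_splits("regression", {
    "train": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
    "test":  ([1.0, 2.0, 3.0], [2.0, 2.0, 4.0]),
})
print(cache[1])  # at evaluation time this is just a lookup
```

Paying the metric cost once per model during training keeps the plotting step cheap, because drawing many models then involves reading stored numbers rather than re-running predictions.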


🔬 Interpretation of model performance

The beauty of visualization is that it enables the practitioner to conduct their own unsupervised interpretation. We perform our own clustering analysis just by looking at the plot.

A quick glance and a hover over the chart above tell us that:

  • The architectures on the right are inferior.
  • The highly performant models on the left are overfit on the training data.
  • The most generalizable model is the orange one, Predictor.id==15.
  • However, we’re not done training yet. We need to tweak the orange model to see if we can improve its performance, so next I’d look at its hyperparameters and learning curve to see what could be improved.
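from aiqc import orm

# Look up the most generalizable model by its id in AIQC's ORM
predictor = orm.Predictor.get_by_id(15)
predictor.get_hyperparameters()  # hyperparameters this model was trained with
predictor.plot_learning_curve()  # training history (learning curve)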
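The visual judgment made in the list above, ignoring weak models and then picking the trace whose points cluster most tightly, can be approximated in code. Here is a minimal sketch, assuming per-split scores are available as plain lists. The scores, the `spread` helper and the 0.85 floor are made-up placeholders, not the values behind the chart.

```python
# Made-up per-split accuracies: model id -> [train, validation, test]
traces = {
    1: [1.00, 0.95, 0.82],  # high on train, drops on test: wide spread
    2: [0.91, 0.90, 0.90],  # slightly lower, but tightly clustered
    3: [0.75, 0.70, 0.72],  # tight, yet clearly inferior
    4: [0.97, 0.93, 0.88],  # good, with a moderate drop
}


def spread(scores):
    """Distance between the best and worst split: small = tight trace."""
    return max(scores) - min(scores)


floor = 0.85  # hypothetical minimum acceptable score on every split
candidates = {m: s for m, s in traces.items() if min(s) >= floor}

# Tightest trace first among models that clear the floor.
ranking = sorted(candidates, key=lambda m: spread(candidates[m]))
best = ranking[0]
print(ranking, best)
```

Filtering before ranking matters: model 3 is tight but inferior, so ranking by spread alone would wrongly favor it over stronger models.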

⏩ Can we get insight sooner?

Having gone through this cycle many times, I decided to package the entire experience into a realtime dashboard to solve the following problems:

  • Add models to the plot as they are trained.
  • Run the training queue and the plot in separate processes.
  • Change scores and metrics without re-calling the plot.
  • Fetch hyperparameters and other supplemental information without manually querying the ORM.
from aiqc.lab import Tracker

app = Tracker()  # create the dashboard application
app.start()      # launch the dashboard
# prints: 📊   AIQC Tracker http://127.0.0.1:9991 📊

Voilà: we now have the realtime Dash app shown in the GIF at the start of this post.


_AIQC is an open source library written by the author of this post._ Please consider giving AIQC a star on GitHub ⭐ https://github.com/aiqc/aiqc

