
Bringing More Humans into the AI Lifecycle

Introducing Rubrix, a human-in-the-loop tool for tracking, exploring, and iterating on data for AI projects

Photo by Ehimetalor Akhere Unuabona on Unsplash

Starting AI projects has become easier than ever before. Initiatives like the Hugging Face Hub or Papers with Code give developers instant access to the latest research. The rise of transfer learning and AutoML methods means you don’t need to spend months to train a model for your project. However, in my experience, this is only (a small) part of the story.

Almost everyone dealing with "applied" AI projects would agree that things like data collection and management, or model serving and monitoring, are far more crucial and difficult to get right. While these things are getting easier too, managing data and models still feels like patching tools and processes together. And when we talk about tools and processes, we are ultimately talking about people: we, the humans using the tools and trying to follow the processes.

"We are human after all. Flesh uncovered after all"

«Human after all» by Daft Punk, 2005.


A lot has been written about the so-called ML/Data Science project life cycles, but not so much about the interrelationship between tools, data, models, and people inside such "life cycles". Let’s take a look at a "simplified" life cycle:

A "simplified" AI project life cycle (Image by author)
A "simplified" AI project life cycle (Image by author)

The above picture shows a sequence of activities and roles involved in the process:

  • Data collection and labeling: typical roles involved in this activity are data engineers, data scientists, business and subject matter experts, and even crowd-workers for data-hungry projects.
  • Model development and experimentation: the central role here is the data scientist.
  • Model testing: typical roles are ML engineers and data scientists.
  • Model deployment: the roles here are ML and DevOps engineers.
  • Monitoring and observability: the main roles are again ML and DevOps engineers.

So far so good, we have processes and we have roles. But there are two key ingredients we are missing: the "life" and the "cycle". Data and models are not static, they are "living" things, and training/testing/serving models are not one-off activities, they form cycles.

"It’s a modern life but it reads better on TV Whoa. It’s a modern life"

«Modern Life» by Devo, 1981


As ML models move into production use, things start to get complicated. Several key questions emerge in practice, such as:

Can data scientists easily go back to business and subject matter experts to collect more data?

Can we easily share "live" predictions with experts for collecting more training examples?

Can we measure production accuracy without continuously moving production data to the data collection and labeling phase?

Can we seamlessly collect feedback directly from end-users?

All of the above questions share an underlying theme: the friction between roles, tools, and processes to manage data for AI-based solutions.


At Recognai, we’ve suffered from these frictions since we started working on enterprise NLP projects with clients four years ago. We’ve developed internal and open-source tools to help us better support our clients. But there’s something we’ve been building, rebuilding, and rethinking this whole time: a tool for connecting the models, the data, and the people involved in projects so they can all contribute to improving data at any step of the process.

After many iterations around this idea, we are finally happy to share Rubrix, a free and open-source, human-in-the-loop¹ tool for AI projects.

Rubrix is:

  • A Python library that enables data scientists, data engineers, and ML engineers to build bridges between data, models, and users for data collection, model development, testing, deployment, and monitoring.
  • A web app for exploring, curating, and labeling data throughout projects’ workflows.
  • An open API for storing, retrieving, and searching human annotations and model predictions.

Rubrix and related lifecycle activities (Image by author)

Rubrix can be installed with pip:

pip install rubrix

The only non-Python dependency is Elasticsearch. You can launch it using Docker and docker-compose, or use an existing Elasticsearch instance.

For a more detailed setup guide, check Rubrix's documentation. But if you are eager to try it out and are familiar with Docker:

Download the docker-compose.yml and launch the Rubrix server and Elasticsearch:

mkdir rubrix && cd rubrix
wget -O docker-compose.yml https://git.io/rb-docker && docker-compose up

Now, before getting into more details, let’s see Rubrix in action!

The following code loads a zero-shot text classifier from the Hugging Face Hub together with the AG News dataset, a well-known benchmark for text classification, and uses Rubrix to store the predictions as well as the human-annotated labels:
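Here's a minimal sketch of what that looks like (the zero-shot checkpoint is an assumption; any NLI-based zero-shot model from the Hub would work):

from datasets import load_dataset
from transformers import pipeline

import rubrix as rb

# a zero-shot classifier from the Hugging Face Hub
# (checkpoint name is an assumption for this sketch)
classifier = pipeline(
    "zero-shot-classification",
    model="typeform/distilbert-base-uncased-mnli",
)

# a small slice of the AG News test set and its label names
dataset = load_dataset("ag_news", split="test[0:100]")
labels = ["World", "Sports", "Business", "Sci/Tech"]

records = []
for example in dataset:
    prediction = classifier(example["text"], candidate_labels=labels)
    records.append(
        rb.TextClassificationRecord(
            inputs={"text": example["text"]},
            # model predictions as (label, score) pairs
            prediction=list(zip(prediction["labels"], prediction["scores"])),
            prediction_agent="typeform/distilbert-base-uncased-mnli",
            # the human-annotated label shipped with the benchmark
            annotation=labels[example["label"]],
            annotation_agent="ag_news_benchmark",
        )
    )

# store everything in a Rubrix dataset (created on first log)
rb.log(records=records, name="agnews_zeroshot")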

And that’s it! You and other members of your team can now explore how this classifier works on an unseen dataset:

Rubrix Exploration mode with the AGNews dataset and zero-shot predictions (Image by author)

Exploring model predictions against a labeled dataset is only one of the many things you can do with Rubrix, because in "real life" you sometimes don’t have labels to start with, or you don’t have a model, or you just want to look at your data while you train your first models. Model development was one of the very first use cases Rubrix was designed for.

This tweet from Andrej Karpathy summarizes the whole point of looking at your data as early as possible.


Another great inspiration on the importance of curating data for AI is Vincent D. Warmerdam, Research Advocate at Rasa and creator of the amazing calmcode.io and libraries like human-learn.

To close our first example with the zero-shot classifier, let’s see another interesting use case for Rubrix: model monitoring and observability. If you are interested in model monitoring, I highly recommend this article by Ben Lorica and Paco Nathan. As I mentioned above, Rubrix is tightly integrated with the Elastic Stack. Thanks to Kibana’s ease of use and Rubrix’s data model, it’s a breeze to build custom dashboards on top of Rubrix datasets. This is what a real-time dashboard looks like while we log our zero-shot predictions from inside a Jupyter notebook:

Custom monitoring dashboard with Rubrix and Kibana (Image by author)

Coming back to why we built Rubrix, let’s introduce Rubrix’s guiding principles:

1. No one should be forced to change their workflows to contribute to a project.

For data scientists, this principle means that Rubrix doesn’t ask them to leave the tools they love, such as Jupyter Lab or Colab, or to extend any classes or wrap their models in any new abstraction.

For data and ML engineers, this means Rubrix offers simple but efficient methods for reading and writing (a lot of) data. We’ve designed Rubrix to be easy to use in automation scripts.

For business and subject matter experts, this means a clean and easy-to-use UI where they feel in control and where they can transparently inspect data and model predictions.

Another use case we’re excited about for improving transparency and trust is model interpretability in Rubrix, paired with Kibana dashboards.

Here’s an example of a Rubrix Dataset that includes token-level attributions for a sentiment classifier, extracted with the Integrated Gradients method using Transformers Interpret and the captum.ai library.

Rubrix provides an agnostic data model, so you can use other libraries and techniques such as LIME.

Exploring sentiment classifier predictions with model interpretations in Rubrix (Image by author)
Kibana’s custom visualization of average token-attribution scores for the positive sentiment label (Image by author)

And here’s the code to log predictions and token attributions:
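A sketch of that workflow, assuming a standard sentiment checkpoint from the Hub and Transformers Interpret’s explainer API (the dataset name and placeholder prediction score are mine):

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

import rubrix as rb

# a sentiment classifier from the Hub (checkpoint name is an assumption)
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Transformers Interpret wraps captum's Integrated Gradients method
explainer = SequenceClassificationExplainer(model, tokenizer)

text = "I love the album Human After All by Daft Punk"
word_attributions = explainer(text)  # list of (token, attribution) pairs
predicted_label = explainer.predicted_class_name

record = rb.TextClassificationRecord(
    inputs={"text": text},
    prediction=[(predicted_label, 1.0)],  # placeholder score for the sketch
    prediction_agent=model_name,
    # token-level attributions, keyed by the label they explain
    explanation={
        "text": [
            rb.TokenAttributions(
                token=token,
                attributions={predicted_label: score},
            )
            for token, score in word_attributions
        ]
    },
)

rb.log(records=record, name="sentiment_interpretability")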

2. The API should be simple enough for everyone while enabling complex workflows.

The data model fits this short description:

A Dataset is a collection of Records whose shape is defined by a Task (for example Text Classification).

A Snapshot is a stored version of a Dataset to be used for reproducibility and automation pipelines.

A Record is defined by its Inputs and optionally contains Annotations, Predictions, and Metadata.

Annotations are typically the ground-truth defined by a human Agent for a given Task and Inputs.

Predictions typically come from a machine Agent (for example the output of a text classifier).

and into this little diagram:

Rubrix data model (Image by author)

Let’s see how to create and log a Record for NER (Named Entity Recognition), a widely known NLP Task:
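A minimal sketch (the example text, tags, and character offsets are my own; entity spans follow Python’s slicing convention):

import rubrix as rb

text = "Human After All by Daft Punk"

record = rb.TokenClassificationRecord(
    text=text,
    tokens=text.split(),
    # human annotations as (label, start, end) character spans:
    # text[0:15] == "Human After All", text[19:28] == "Daft Punk"
    annotation=[("SONG", 0, 15), ("BAND", 19, 28)],
    annotation_agent="my_username",
)

rb.log(records=record, name="ner_music_example")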

The above code logs a NER example with the SONG and BAND tags using the rb.log method, which also creates a Dataset if it does not exist yet. Datasets are incremental, and you can use rb.log to bulk-log large datasets (as we saw in our monitoring example). This is how this record looks in the web app:

NER exploration view in Rubrix (Image by author)

Rubrix has just two main methods:

  • rb.log(records, name) to store records into a Rubrix Dataset.
  • rb.load(name) to read records from a Rubrix Dataset into a pandas DataFrame.

To understand the kind of workflows these two methods enable, let’s see an example of loading a Rubrix Dataset for fine-tuning a pre-trained model. We’ll follow the guide "Fine-tuning a pre-trained model" from Hugging Face.

The workflow is as follows:

1– We’ve used rb.log to create a Rubrix dataset with zero-shot predictions, and we’ve spent some time in the Rubrix web app’s annotation mode manually labeling a few examples. It’s the same example as before, but we don’t have any prior labels in this case.

Rubrix Annotation mode in a Dataset with zero-shot predictions (Image by author)

2– After spending some time annotating, we are ready to use rb.load and prepare our dataset for fine-tuning the model. Normally we would split it into train, test, and validation sets, but let’s keep the example as straightforward as possible:
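A sketch of this step (the dataset name and label mapping are assumptions; rb.load returns the records as a pandas DataFrame whose columns follow the data model above):

from datasets import Dataset

import rubrix as rb

# read back the dataset we annotated in the web app
df = rb.load(name="agnews_zeroshot")

# keep only the records that received a human annotation
df = df[df.annotation.notna()]

# map label names to ids for the model
label2id = {"World": 0, "Sports": 1, "Business": 2, "Sci/Tech": 3}

train_dataset = Dataset.from_dict({
    "text": [inputs["text"] for inputs in df.inputs],
    "label": [label2id[annotation] for annotation in df.annotation],
})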

3– From this point on, it’s just regular fine-tuning with the Trainer API, as described in the guide. Please note we are using distilbert-base-uncased instead of bert-base-cased:
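Following the guide, a sketch of the fine-tuning step (training arguments kept to the defaults):

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=4,  # the four AG News classes
)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

# tokenize the dataset we prepared with rb.load above
tokenized_dataset = train_dataset.map(tokenize_function, batched=True)

training_args = TrainingArguments(output_dir="agnews_zeroshot_finetuned")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
)
trainer.train()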

If this got you interested, you can find this and other end-to-end examples in Rubrix’s docs.

3. Embrace integration with other tools, libraries, and frameworks.

We’re living in exciting times, with new libraries and tools being released every month. People feel comfortable using different tools for different things, such as data annotation or model experiment tracking, and we want to embrace this diversity.

Rubrix is not an all-or-nothing tool or platform; our goal is to enable users to build novel workflows and combine Rubrix with their tools of choice. That’s why, besides the Python library and the web app, we’ve designed an open REST API for developers to build on.

Also, to spur your imagination and creativity, check out Rubrix’s docs.

4. Provide a minimal feature set and let the community guide what comes next.

As you might have noticed, I’ve been talking about AI projects but we’ve only seen examples of natural language processing. The reason is simple: natural language processing and knowledge graph use cases have been the original drivers of Rubrix’s development at Recognai. However, Rubrix’s data model and architecture are designed to make it easy to add new use cases and applications. Rubrix already covers a wide range of NLP and knowledge graph applications with only two supported tasks: Text Classification and Token Classification.

For knowledge graph use cases, feel free to check the node classification tutorial using the amazing kglab and PyTorch Geometric libraries with Rubrix.

Immediate use cases we envision are text2text, which will cover many other NLP applications such as text summarization and machine translation; computer vision tasks such as image classification; and speech recognition tasks such as speech2text. But before that, we want to hear your voice.

This is why we are excited to announce Rubrix so you can be part of an open and friendly community and drive what comes next.

If you want to join the Rubrix community or talk about your immediate and envisioned applications, drop us a message on Rubrix’s GitHub Discussions forum.

"I call this number For a data date I don’t know what to do I need a rendez-vous"

«Computer love» by Kraftwerk, 1981


Acknowledgments

Thanks to Paco Nathan, Amélie Viallet, David Carreto, Ignacio Talavera, and Francisco Aranda for their great help and suggestions for improving this article.

Footnotes

  1. If you are not familiar with the concept of human-in-the-loop, I highly recommend this article from Stanford’s Institute for Human-Centered AI.
