Building NLP Web Apps With Gradio And Hugging Face Transformers

A web app is essential for demonstrating your NLP/ML solution to colleagues and clients. But building and deploying one can be a nightmare at times. Enter Gradio.

Chua Chin Hon
Towards Data Science


Gif of sentiment analysis web app by Chua Chin Hon.

Web app design and deployment arguably rank among the most neglected skills for data scientists and analysts. Yet, these skills are essential if you need to demonstrate your NLP or machine learning solution to colleagues or clients.

While a good presentation goes a long way, nothing quite beats having a prototype where non-technical users can test the proposed solution for themselves. And if you’ve tried deploying an ML web app in recent years, you’d know that most hosting services don’t make it easy for those with limited front-end development experience.

Enter Gradio, a relatively new library that makes creating and sharing ML and NLP web apps a breeze by comparison. With a few lines of code, you can wrap your model in one of a number of standard templates, sparing you the hassle of creating separate HTML templates and fiddling with the UI, the color and size of buttons, and so on.

To be sure, the Gradio templates are pretty minimalist. But frankly that’s all you need in the early stages of testing and development, when the goal is simply to test fast and iterate fast. Gradio’s close integration with Hugging Face’s transformers library and model hub makes it an even more powerful tool.

Gradio is not the only library you can use for rapid web app development, of course. Streamlit and Plotly Dash are two other prominent names in this space, each with its own strengths. But I'd say Gradio is the most user-friendly by far, and a game changer for early-phase NLP/ML solutions development.

Over several Jupyter notebooks, I’ll share examples of how I’ve used Gradio to build standalone and “chain-linked” NLP apps that combine different functionalities and transformer models.

REPO, FILES, AND PROJECT SCOPE

My repo for this project contains all the files needed to run the examples in this post: 5 Jupyter notebooks, 2 audio files and a pickled Logistic Regression model (to demonstrate Gradio’s usage beyond transformer models).

The demos here are geared more towards newcomers, as well as those in the early explorative phase of their NLP/ML projects. Gradio offers options for hosting and private deployment, but that’s well outside the scope of this post.

It takes just a few lines of code to get a Gradio app up and running, beyond the code that defines your NLP/ML model's inputs and outputs. The process is even simpler if you use Hugging Face's public Inference API. But I won't take that approach in my examples: the public/free Inference API is relatively slow, and your app could crash if you try to load too many transformer models in one go.

As such, I coded my demos to run on a local machine. To share the app publicly, you need only change one parameter in the Gradio interface.

1. STANDALONE SENTIMENT ANALYSIS APP

Let's start with one of the simplest examples possible: building a web app for sentiment analysis using Hugging Face's pipeline API. The default DistilBERT model in the sentiment analysis pipeline returns two values: a label (positive or negative) and a score (a float).

Gradio takes the pain out of having to design the web app from scratch and fiddling with issues like how to label the two outputs correctly. The screen-grab below shows how you can easily adjust the look of the app with a few parameter changes in Gradio:

Screen-capture: Chua Chin Hon

And thanks to the close integration between the Gradio and transformers libraries, it won’t take more than a few minutes to tweak the code in notebook1.0 to turn the sentiment analysis web app into one for translation, summarization, or zero-shot classification.

2. STANDALONE TROLL TWEET DETECTOR APP

Gradio works just as well with models pickled from standard machine learning libraries like scikit-learn. In notebook1.1, I loaded a pickled Logistic Regression classifier for troll tweets that I had built in a previous project, and the Gradio app was up and running in no time:

Screen-capture: Chua Chin Hon

The Gradio template is barebones, no doubt. But that’s all I need at the early stage to see if the model is working as intended, and if the solution is clear to a non-technical audience.

3. GRADIO IN “PARALLEL” — COMPARING 2 SUMMARY MODELS

Of late, it's become practically impossible to keep up with the number of new transformer models available for various NLP tasks. If you are working on a text summarization project, for instance, how do you demonstrate which of the 248 summarization models on Hugging Face's model hub works best for your use case? Or how would you demonstrate that your own fine-tuned model performs better than another model out there?

Gradio provides a neat solution by allowing transformer models to be loaded in “parallel” within the same app. This way you can directly compare the different results from one input:

Screen-capture: Chua Chin Hon

In notebook2.0, I spun up a quick web app to compare the summarization capabilities of two different models: Facebook's BART and Google's Pegasus. This is a great way to compare the results from multiple models directly, without having to copy results out of different apps or switch screens back and forth between two models.

This is also a great way to compare the performance of text-generation or translation models, where results can vary quite widely from model to model. Gradio doesn't specify a maximum number of models you can load in parallel, so apply some common-sense limits as required.

4. GRADIO IN “SERIES” — COMBINING 2 TRANSFORMER MODELS FOR TRANSLATION AND SUMMARIZATION

Another way to leverage the huge number of transformer models out there is to link them up in "series", i.e., connecting models with different functionalities under one Gradio app.

Notebook3.0 demonstrates how you can build a translator-summarizer that takes in Chinese text and produces a summary of the English translation:

Screen-capture: Chua Chin Hon

The end result, as the screen-grab above shows, is less than impressive. It is a good reminder of the importance of restraint in these NLP/ML projects: just because something is technically feasible doesn't mean the results will be any good.

While the ability to chain-link multiple transformer models with different functionalities is very welcome, it takes considerably more effort to find a combination that works and delivers good results.

5. SPEECH-TO-TEXT “MIXED MEDIA” APP WITH AUDIO INPUT AND TEXT OUTPUT

My final example, in notebook4.0, demos a quick way to build a simple speech-to-text web app using Hugging Face’s implementation of Facebook’s Wav2Vec2 model.

With the latest version of Gradio, you can easily configure mixed-media apps that take one particular format of inputs, say audio or video, and output them in another format, say, text or numbers. To keep the demo simple, I settled on a speech-to-text app that takes in audio clips and returns a text transcript:

Screen-capture: Chua Chin Hon

It would probably take a newcomer hours, if not days, to create a similar speech-to-text app using Flask and HTML templates, to say nothing of the additional hoops to jump through to deploy it correctly to a hosting provider.

With Gradio, the challenge in this particular example lies predominantly in defining a function that can handle longer audio files without running into out-of-memory errors that crash the local machine. The creation of the app interface itself takes practically one line of code.

I’ve included two audio files in the repo that you can try the speech-to-text app with — US President John F Kennedy’s famous inauguration speech in 1961, and youth poet Amanda Gorman’s poem at the inauguration of US President Joe Biden in 2021.

END NOTES

The growing list of skills that data professionals need in order to be effective is frankly ridiculous. Few, if any, will ever have the time to get equally good at data-related coding and analysis as well as front-end development.

Libraries like Gradio help to “compress” that list of desired skills into something more manageable, by freeing up data scientists and analysts from having to spend hours tinkering with Flask and HTML for a simple demo.

In the long run, I think this helps promote greater transparency and accountability in machine learning, as data professionals no longer have any excuse to say that they can’t quickly build apps to let others try out the proposed solutions.

And as more people get to test such apps in the early stages, hopefully the biases and ethical issues buried deep in a particular solution’s training data and design will be easier to spot.

As always, if you spot mistakes in this or any of my earlier posts, ping me at:
