PyCaret and Streamlit: How to Create and Deploy a Data Science Web App

Building and deploying a machine learning model has never been easier. We now have many frameworks and libraries that let us build machine learning models with just a few lines of code, and PyCaret is one of the best among them. For creating and deploying a web app for a data science project, Streamlit has become very popular lately.

In this article, we will use these two libraries to create a data science web app. We’re going to use PyCaret to build a wine quality classifier. Next, we’re going to use Streamlit to create and deploy a web app for this classifier. You’ll be surprised at how easy and quick it is to build the classifier and deploy the web app with these two libraries. So, let’s get started!

Load and Preprocess the Data

The data that we will use in this article is the Wine Quality dataset, which you can download for free here. This dataset consists of 1599 instances with 12 features. Let’s load the dataset with Pandas.
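Loading the data could look like the minimal sketch below. The helper `load_wine` is an assumption for illustration, and the two sample rows are made-up stand-ins so that the snippet runs without the download.

```python
import io

import pandas as pd


def load_wine(source, sep=","):
    """Read the Wine Quality data from a file path or a file-like object."""
    return pd.read_csv(source, sep=sep)


# Tiny in-memory sample standing in for the real file (values are illustrative).
sample_csv = io.StringIO(
    "fixed acidity,citric acid,pH,quality\n"
    "7.4,0.00,3.51,5\n"
    "7.8,0.04,3.20,6\n"
)
wine_df = load_wine(sample_csv)
print(wine_df.head())
```

With the real file you would pass its path instead, for example `load_wine('winequality-red.csv')` (a hypothetical file name). The copy hosted by the UCI repository is semicolon-separated, so pass `sep=';'` for that one.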

As you can see above, we have different features such as fixed acidity, citric acid, pH, and so on. The task of our classifier is to predict whether the wine quality is good or bad. However, the values of the quality feature are numeric scores rather than the labels we need, so we have to transform them into either ‘good’ or ‘bad’.

To do this, we need to set a rule. If the wine quality is greater than or equal to 6, we classify the wine as good; otherwise, we classify it as bad.
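The rule above can be sketched with pandas and NumPy as follows. The helper name `label_quality` and the sample scores are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def label_quality(df: pd.DataFrame, threshold: int = 6) -> pd.DataFrame:
    """Replace the numeric quality score with a 'good'/'bad' label."""
    labeled = df.copy()  # leave the caller's frame untouched
    # Scores at or above the threshold become 'good', everything else 'bad'.
    labeled["quality"] = np.where(labeled["quality"] >= threshold, "good", "bad")
    return labeled


# Illustrative scores only; the real column comes from the dataset.
scores = pd.DataFrame({"quality": [3, 5, 6, 8]})
labeled = label_quality(scores)
print(labeled)
```

On the real data, `wine_df = label_quality(wine_df)` followed by `wine_df['quality'].value_counts()` shows how many wines fall into each class.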

Now we have the data that we’re looking for! Note that you can also check that there are 855 wines classified as ‘good’ and 744 wines classified as ‘bad’. This proportion is fairly balanced, so it’s safe to say that we don’t have an imbalanced dataset issue.

The dataset is also clean, which means that there are no missing values, no duplicate values, and the data types are all correct.
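If you want to verify these properties yourself, a quick check could look like the sketch below. The helper `summarize_cleanliness` is hypothetical, and the small frame is deliberately dirty so that the check has something to report.

```python
import pandas as pd


def summarize_cleanliness(df: pd.DataFrame) -> dict:
    """Count missing cells and duplicated rows, and list each column's dtype."""
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Illustrative frame: the second row repeats the first, the third has a gap.
demo = pd.DataFrame({"pH": [3.5, 3.5, None], "quality": ["bad", "bad", "good"]})
report = summarize_cleanliness(demo)
print(report)
```

Running `summarize_cleanliness(wine_df)` on the real data should report zero missing values if the dataset is as clean as described.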

Next, let’s build our classifier model using PyCaret.

Build Classifier Model Using PyCaret

PyCaret is a low-code machine learning library that automates much of the machine learning workflow. It provides a wrapper around popular machine learning libraries such as scikit-learn, XGBoost, LightGBM, CatBoost, and many more.

With PyCaret, we can basically build our machine learning model for classification, regression, clustering, anomaly detection, or NLP problems in just a few lines of code.

If you haven’t installed PyCaret, you can easily do so by typing the following pip command.

pip install pycaret

Since we’re going to solve a classification problem, we need to use the pycaret.classification module. If you’re solving a different kind of problem, you need to use one of the other modules, which you can read more about in the official PyCaret documentation.

Experiment 1: Default Setup

First things first, we need to set up our PyCaret environment with the setup() function. This function must be called before any other function in PyCaret.

from pycaret.classification import *

exp_clf01 = setup(data = wine_df, target = 'quality', session_id = 123)

As you can see, we passed three parameters as the arguments for the setup() function:

data – Our input data.
target – The name of the feature that we want to predict (dependent variable).
session_id – The random seed for the experiment, which makes the results reproducible.

If you run the code snippet above, you’ll get the following outputs:

From the output above, you can see that the setup() function will automatically split our data into train set and test set.

Also, it will automatically infer the data types of your features: whether each feature is numerical or categorical. You need to look at the output carefully because there are times when the function infers the data types incorrectly. If you find that one of the features is inferred incorrectly, you can correct it by doing the following:

exp_clf01 = setup(data = wine_df, target = 'quality', session_id = 123, categorical_features = ['feature1', 'feature2'], numerical_features = ['feature3', 'feature4'])

You can use the categorical_features or numerical_features parameter to override the data types that are incorrectly inferred by the setup() function. Each parameter takes a list of strings containing the names of the features that you want to change.

Next, let’s build our classifier model.

When we want to build a machine learning model, most of the time we don’t know in advance which model will give us the best performance according to our metrics. With PyCaret, you’re able to compare the performance of different kinds of classification models with a single line of code.

best = compare_models()

As you can see, it turns out that the Random Forest classifier gives us the best performance in 5 out of 7 metrics. If we choose the F1 score as the metric for our wine classifier, the Random Forest classifier gives us the best performance.

Experiment 2: Tuned Setup

Before we go further, let’s see whether we can improve the performance of the models by tuning our setup() function.

exp_clf02 = setup(data = wine_df, target = 'quality', session_id = 123, normalize = True, transformation = True)

As you can see, we passed several additional parameters there to tune our setup:

normalize – To rescale our features to a common scale. By default, PyCaret uses z-score standardization, which gives each feature a mean of 0 and a standard deviation of 1.
transformation – To transform our features so that their distribution is closer to a normal distribution. This can be helpful for models like Logistic Regression, LDA, or Gaussian Naive Bayes.
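To make normalize more concrete: PyCaret's default normalization method is the z-score, which rescales each feature to mean 0 and standard deviation 1. The sketch below reproduces that calculation by hand with pandas. It is not PyCaret's internal code, and the helper `zscore` and the pH values are illustrative assumptions.

```python
import pandas as pd


def zscore(column: pd.Series) -> pd.Series:
    """Standardize a column to mean 0 and standard deviation 1."""
    # ddof=0 uses the population standard deviation, the same convention
    # as scikit-learn's StandardScaler.
    return (column - column.mean()) / column.std(ddof=0)


ph = pd.Series([3.0, 3.2, 3.4, 3.6], name="pH")  # illustrative values
scaled = zscore(ph)
print(scaled.round(3).tolist())
```

After scaling, features measured in very different units contribute on a comparable scale, which matters for distance-based and linear models.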

There are a lot of tuning options that you can do inside this setup() function. You can learn more about it here.

Also note that we use the same session_id as in our previous setup() call. This makes sure that any change in model performance is due solely to the changes that we’ve made in this setup() call.

Let’s compare the models once again with our new setup.

As you can see, most of the metrics are slightly improved after we tuned the setup. Before we tuned the setup, the F1 score of the Extra Tree classifier was 0.8306. After we tuned the setup, the F1 score became 0.8375.

Based on this result, let’s build our Extra Tree classifier. We can do this with a single line of code.

et_model = create_model('et')

Next, you can evaluate your model by looking at visualizations such as the ROC curve, feature importance, or confusion matrix, also with a single line of code.

evaluate_model(et_model)

As a final check, we can use our Extra Tree classifier to predict the test data that has been generated by PyCaret. As mentioned earlier, when we executed the setup() function at the very first step, PyCaret automatically split our data into training data and test data. All of the model performance and evaluation metrics that we’ve seen above are based solely on cross-validation over the training data.

To use the model to predict the test data, we can use the predict_model function.

predict_model(et_model)

Finally, let’s save our Extra Tree classifier model.

save_model(et_model, model_name = 'extra_tree_model')

And that’s it for model building with PyCaret. After this you should have a pickle file called ‘extra_tree_model.pkl’ in your working directory. We will use this saved model to build a wine classifier web app with Streamlit.

Build the Web App with Streamlit

Now it’s time for us to build our wine classifier web app. In this post, we’re going to use Streamlit to build the web app as it is more beginner-friendly than Flask. Plus, you don’t need any prior experience with HTML and CSS to use Streamlit.

If you haven’t installed Streamlit yet, you can do so by typing the following pip command:

pip install streamlit

The first thing that we need to do is import all of the relevant libraries. Note that we’re going to use the Extra Tree classifier model that we saved earlier using PyCaret. To load the model and to make a prediction with it, we can use the load_model and predict_model functions from PyCaret.

In the code above, we import the libraries, create a function to make a prediction, load the saved model, and create the title and text of our web app.
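The steps just described could be sketched as follows. This is a sketch under assumptions: the helper names `extract_label`, `predict`, and `load_app`, as well as the title and intro text, are hypothetical. The prediction column is named `Label` in PyCaret 2.x and `prediction_label` in PyCaret 3.x, so the helper checks for both.

```python
import pandas as pd


def extract_label(predictions: pd.DataFrame) -> str:
    """Pull the predicted class out of the frame returned by predict_model."""
    # PyCaret 3.x uses 'prediction_label'; PyCaret 2.x uses 'Label'.
    for column in ("prediction_label", "Label"):
        if column in predictions.columns:
            return str(predictions[column].iloc[0])
    raise KeyError("no prediction column found in predict_model output")


def predict(model, input_df: pd.DataFrame) -> str:
    """Classify one row of user input as 'good' or 'bad'."""
    # Imported inside the function so extract_label works without PyCaret installed.
    from pycaret.classification import predict_model

    return extract_label(predict_model(estimator=model, data=input_df))


def load_app(model_name: str = "extra_tree_model"):
    """Load the saved model and draw the page header; returns the model."""
    import streamlit as st
    from pycaret.classification import load_model

    model = load_model(model_name)  # PyCaret adds the '.pkl' extension itself
    st.title("Wine Quality Classifier")  # hypothetical title
    st.write("Describe a wine with the sliders to see whether it is predicted good or bad.")
    return model
```

In the app script, `model = load_app()` would run near the top, and `predict(model, input_df)` would be called once the user input has been collected.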

Next, we need to let the user specify the values of our features. Since our features are all numeric, a slider widget is a good way to represent them. To create a slider widget, we can use the slider() function from Streamlit.

Note that we passed several parameters to slider() function:

label – The label of the feature that will be displayed in the web app.
min_value – Minimum value of the slider.
max_value – Maximum value of the slider.
value – Default value of the slider when you open your web app.
step – The amount of increment and decrement when you move the slider.
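One way to wire up these parameters is sketched below. The feature ranges, defaults, and the helper `collect_inputs` are illustrative assumptions (in practice you would take the ranges from the dataset itself), and only three of the features are shown. The slider function is passed in as an argument so the helper does not depend on Streamlit directly.

```python
from typing import Callable, Dict, Tuple

# (min_value, max_value, value, step) for each feature.
# These numbers are illustrative assumptions, not taken from the dataset.
SLIDER_SPECS: Dict[str, Tuple[float, float, float, float]] = {
    "fixed acidity": (4.0, 16.0, 8.0, 0.1),
    "citric acid": (0.0, 1.0, 0.3, 0.01),
    "pH": (2.5, 4.5, 3.3, 0.01),
}


def collect_inputs(
    slider: Callable[..., float],
    specs: Dict[str, Tuple[float, float, float, float]] = SLIDER_SPECS,
) -> Dict[str, float]:
    """Draw one slider per feature and return the values the user picked."""
    values = {}
    for label, (low, high, default, step) in specs.items():
        values[label] = slider(
            label=label, min_value=low, max_value=high, value=default, step=step
        )
    return values
```

Inside the app, this would be called as `user_values = collect_inputs(st.slider)` after `import streamlit as st`. Keeping all bounds as floats matters, because Streamlit expects the slider's numeric arguments to share one type.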

Next we need to convert all of those user input values into a dataframe. Then, we can use the dataframe as the input of our model’s prediction.
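The conversion itself is short. Here is a minimal sketch, assuming the slider values have been gathered into a dictionary keyed by feature name; the helper `to_input_frame` and the sample values are hypothetical.

```python
import pandas as pd


def to_input_frame(values: dict) -> pd.DataFrame:
    """Wrap one set of slider values in a single-row DataFrame."""
    # The column names must match the feature names the model was trained on.
    return pd.DataFrame([values])


user_values = {"fixed acidity": 8.0, "citric acid": 0.3, "pH": 3.3}  # illustrative
input_df = to_input_frame(user_values)
print(input_df)
```

From there, something like `st.button('Predict')` can trigger a call to PyCaret's `predict_model(model, data=input_df)`, and `st.success(...)` can display the predicted label.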

And that’s it basically! Now your web app is done.

To check your web app, open a terminal (command prompt), then go to the working directory of your Python file.

In the working directory of your Python file, type the following:

streamlit run your_python_file.py

Next, a browser window will pop up to show you the UI of your web app like the following:

However, there is still one small problem to solve. Your web app can currently only be accessed on your local computer, which means that other people can’t see or use it. If you want other people to use your web app, you need to deploy it. In this post, I’m going to show you how to deploy your Streamlit web app with Streamlit sharing.

Deploy the Web App with Streamlit Sharing

There are two easy options if you want to deploy your web app: Heroku or Streamlit sharing. The problem with Heroku is that if you only have free tier access, your slug size is limited to around 500MB. Meanwhile, PyCaret itself has a lot of dependencies, which means that the size of your web app will likely exceed Heroku’s slug size limit.

Because of that, let’s use Streamlit sharing to deploy the wine classifier web app. It’s really easy to deploy your Streamlit web app with Streamlit sharing. Below are the steps to deploy your web app:

The first thing that you need to do is request an invite to Streamlit sharing. You can do so by accessing this page. All you need to do is enter your name and the email address that you use for your GitHub account. If you don’t remember your email address, you can find it by logging in to your GitHub account and going to Settings. In Settings, select Emails and you’ll see your email address.
Next, create a repo in your GitHub account that contains three files: the Python file for the web app, the pickle file of the classifier model that we built using PyCaret, and a text file called requirements.txt. This text file should list all of the dependencies that the web app needs. For our wine classifier, the content should look like this:

pycaret
streamlit
pandas
numpy

After you’ve requested an invite, wait a little while until you receive an invitation from Streamlit by email.
Next, go to this page and sign in with your GitHub account.
Now you should see the following page after you’ve signed in.

Next, click on ‘New app’ button and you’ll see the following page.

In the Repository field, enter the path to your GitHub repo which contains the Python file to create the web app. In the Main file path, enter the name of your Python file. Finally, click Deploy!
Now wait a little bit until your web app is deployed.

Note that after your web app has been successfully deployed, you’ll see the URL to access it in the following format:

https://share.streamlit.io/[user name]/[repo name]/[branch name]/[app path]

You can share this URL with other people so that they can use and play around with your web app.

You can check the deployed version of wine classifier covered in this article here.

Also, you can check the complete Notebook and Python file to build the wine classifier web app here.

And that’s it! Hopefully, this article is somewhat helpful for you.

I’ve always thought that Heroku would be the easiest available option to deploy your data science web app. However, if you build your web app with Streamlit, then Streamlit sharing would be the easiest and the most convenient way to deploy your web app.