Your ML Model has graduated. Now it needs a job.
After you’ve properly trained and validated your ML Model, it is time for it to do what it was made for: to serve. One common way to do that is to deploy it as a REST API. In this article, I want to share how I did that on my personal project of building a simple web application for fake news detection.
But also, I’d like to take this opportunity to discuss some aspects of a very important matter: technical debt.
Nowadays, it’s relatively fast to deploy ML systems, but it is very easy to overlook how difficult and expensive it is to maintain those systems over time¹. I’m not an expert on the matter, so instead of trying to discuss every aspect of it, I want to talk about two issues:
- Model Staleness: ML systems can encounter changing, non-stationary data. In these cases, if the system is not retrained often enough to produce up-to-date models, the model is said to be stale².
- Training/Serving Skew: Features might be calculated by different codepaths in different parts of the system. When these codepaths generate different values, this is known as training/serving skew².
Depending on the application’s nature and complexity, it is very important to understand the impact of both of these issues, and also how to detect and avoid (or mitigate) them.
So, as I show you how I deployed my application, I also want to discuss how those issues show up in this particular project, and how we can take them into consideration in our design.
The Application
In the end, what I want is a web application for fake news detection: a page where a user can enter a URL of a news article, and the system will tell the result of its prediction: whether it’s fake or real. To illustrate, this is the end result:

The complete code for the application can be found on the project’s GitHub.
Note: This article builds on my last one, in which I trained a text classification SVM model (and tracked its performance). To train this model, I used data from a Kaggle dataset you can find here.
What to look out for
When taking the model to production, there are some things (among others) we should consider.
For instance, the dataset used to train the model contains news articles from between 2015 and 2018. So, as an example, if an article about COVID-19 were to be presented to the model for inference, what would its prediction be? News, by its nature, is ever-changing. It is safe to assume that model staleness is an important consideration here.
Another issue is that the method by which the text is extracted in the original dataset is not shown. It is, most probably, not the same method I used to extract content in production. So, there is some training/serving skew here that cannot be avoided. But we must have ways to assess it and to mitigate it in the long run, when retraining the model with data gathered in production.
Finally, we should also consider the differences between text preprocessing during training and serving. If I remove stopwords during training, I have to make sure I’m also removing them in production. If additional preprocessing steps are introduced in production, then I should retrain the model. Another example of training/serving skew.
Overview
Setting up a web application can be done in a whole lot of different ways. For this project, this is the way I chose:

The main application is built using Flask. Cloud Run is a fully managed solution by Google for deploying containerized web applications. Thus, in order to deploy my application on Cloud Run, it first has to be packaged into a Docker container.
Well, the model files need to be stored somewhere. In my last article, I was using AWS, but since I’m going with Cloud Run, let’s keep everything in the same cloud environment by using Google Cloud Storage. When the instance is created, the required files are then downloaded from my Storage Bucket.
To address the problems discussed earlier, I decided to store all prediction results in a table. In this case, Google Cloud BigQuery. This way, I can use the data from online predictions to retrain the model in the future. Also, if I store prediction probability values from each inference, I can check if my model is growing uncertain with time. By storing the news contents, I can also check how text is being processed and verify word distribution over time. I can also log the name of the model that yielded the result, so I can do all sorts of tests between different versions and types of models.
Preliminary Steps
There were some steps I had to take in order to set up my environment.
Creating a project
The first one is to create a GCP account and then create a project by selecting "create a project" here.
Creating a service account
To use the Python APIs for accessing Storage and BigQuery, I need GCP credentials in JSON format. I followed the "creating a service account" instructions in the Google documentation to download my credentials.
Creating a bucket
I also had to create a bucket in Cloud Storage. My bucket is named "models", and in it, I stored the model files I needed, organized according to the models’ names. The names and files are created according to my last article, where I discussed tracking ML experiments ([article](https://towardsdatascience.com/how-i-learned-to-stop-worrying-and-track-my-machine-learning-experiments-d9f2dfe8e4b3) and [GitHub](https://github.com/FelipeAdachi/fake-news-experiments)).

Creating the table
The last step is to create the table at BigQuery to store my predictions. This can be done in different ways. I did it by creating a dataset at BigQuery through the console and then creating the table itself through [google-cloud-bigquery](https://pypi.org/project/google-cloud-bigquery/) in Python:
pip install --upgrade google-cloud-bigquery
And then through this script:
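A minimal sketch of that creation script could look like the following (the project, dataset, and table names are placeholders you would replace with your own):

```python
import os

from google.cloud import bigquery

# Use the service account credentials downloaded in the previous step
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/credentials.json"

client = bigquery.Client()

# table_id = "<project>.<dataset>.<table>" -- placeholder names below
table_id = "your-project-id.fakenews.predictions"

schema = [
    bigquery.SchemaField("title", "STRING"),
    bigquery.SchemaField("content", "STRING"),
    bigquery.SchemaField("model", "STRING"),
    bigquery.SchemaField("prediction", "STRING"),
    bigquery.SchemaField("confidence", "FLOAT"),
    bigquery.SchemaField("url", "STRING"),
    bigquery.SchemaField("prediction_date", "TIMESTAMP"),
]

table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)  # API request that creates the table
print(f"Created table {table.full_table_id}")
```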
Note we’re using the credentials downloaded in the previous step as an environment variable. The table_id
is built based on the names of the project, dataset, and table to be created.
Upon the table’s creation, you can set its schema. For this project, I decided the following fields are important to be stored:
- title: The title of the news
- content: The text content of the news
- model: The name of the model that yielded the prediction
- prediction: Prediction results: real or fake
- confidence: The prediction’s probability, denoting the model’s confidence in its prediction
- url: The URL of the original news article
- prediction_date: The date and time of when the prediction was made
The Front-End
Since it’s a very simple interface, there’s not much going on here. The home URL just renders a header and a text box as user input:
The Flask App
This is the main application code. When a POST request is sent by the form, the classify_news function is called. It basically executes the following steps:
- Reads the content from the URL
- Preprocesses, vectorizes, and transforms the text content into a tf-idf representation
- Predicts whether it’s fake or real news
- Inserts the prediction results into BigQuery
- Displays the information to the user
You can see that there are two routes to classify_news. One is for when it is called from the home URL, in which case text is fetched from a form, and the function should return an HTML-formatted string.
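To make the flow concrete, here is a simplified sketch of how classify_news could be wired up. The helper functions (extract_text, preprocess, vectorize, predict, insert_to_bq) are placeholders for the steps sketched in the next sections, and the model name is hypothetical:

```python
import datetime

from flask import request, jsonify

MODEL_NAME = "svm_tfidf_v1"  # hypothetical model identifier

@app.route("/classify_news", methods=["POST"])      # called from the home form
@app.route("/api/classify_news", methods=["POST"])  # called directly as an API
def classify_news():
    # From the home form, the URL arrives as form data; from the API, as JSON
    from_form = "url" in request.form
    url = request.form["url"] if from_form else request.get_json()["url"]

    title, content = extract_text(url)          # boilerpy3 extraction (see below)
    features = vectorize(preprocess(content))   # same preprocessing as training (see below)
    prediction, confidence = predict(features)  # SVM prediction + probability (see below)

    to_insert = {
        "title": title,
        "content": content,
        "model": MODEL_NAME,
        "prediction": prediction,
        "confidence": confidence,
        "url": url,
        "prediction_date": datetime.datetime.utcnow().isoformat(),
    }
    insert_to_bq(to_insert)  # store the result in BigQuery (see below)

    if from_form:
        return f"<h2>This article looks {prediction} ({confidence:.0%} confidence)</h2>"
    return jsonify(to_insert)
```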
You can also call it directly as an API endpoint, to make it easier to call from any kind of application. The return value would then be the same dictionary used to insert the results into the BigQuery table. Here is an example from when I was testing the application locally with Postman:

Getting what we need
In order to make the predictions, we first need to get our model files. That includes a pickle file, which is the vocabulary for the CountVectorizer mapping, and the joblib file, which is the sklearn model itself.
To download the files, I used the [google-cloud-storage](https://pypi.org/project/google-cloud-storage/) package. We can use the same credentials used for the table created earlier.
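A sketch of that download step, assuming the "models" bucket from earlier and hypothetical file names derived from the model’s name:

```python
import os

from google.cloud import storage

# Same service account credentials as before
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/credentials.json"

MODEL_NAME = "svm_tfidf_v1"  # hypothetical model name
storage_client = storage.Client()
bucket = storage_client.bucket("models")  # the bucket created earlier

# Download the sklearn model and the CountVectorizer vocabulary
for filename in [f"{MODEL_NAME}.joblib", f"{MODEL_NAME}_vocabulary.pkl"]:
    bucket.blob(filename).download_to_filename(filename)
```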
Reading the content
Then, we need a way to actually extract the text, given a URL. To do so, I used a package called [boilerpy3](https://pypi.org/project/boilerpy3/).
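A minimal extraction helper with boilerpy3’s ArticleExtractor could look like this:

```python
from boilerpy3 import extractors

extractor = extractors.ArticleExtractor()

def extract_text(url):
    # Fetch the page and strip the boilerplate, keeping the main article text
    doc = extractor.get_doc_from_url(url)
    return doc.title, doc.content
```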
Preprocessing
Once we have the text, we need to transform it into a format that our model can understand. During training, we had a simple text preprocessing stage, followed by count vectorization and then a transformation into a tf-idf representation. I won’t be discussing the NLP methods in detail, but you can learn more about them in this sklearn tutorial.
You see that it’s important to know the codepath that generated your model so that you can recreate it as best as possible while serving. And, also, to store every artifact that you might need later.
Prediction
Now that we have the right input, we can make the prediction by loading our trained model and calling its prediction method. In this case, predict_proba returns the probability for both classes (Real=1 and Fake=0). Whichever is higher is the predicted class.
We need the probability values of the predictions in order to monitor our model’s performance. But to have access to them, one must set the right parameter well in advance, during the training stage. Realizing that beforehand, and not only when the model is already in production, certainly makes things a lot easier.
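A sketch of the prediction step, with the model file name as a placeholder:

```python
import joblib

# The trained SVM; probability=True must have been set when it was trained
model = joblib.load("svm_tfidf_v1.joblib")
label_names = {0: "fake", 1: "real"}  # Fake=0, Real=1, as in the training data

def predict(features):
    # predict_proba returns one probability per class, ordered by model.classes_
    probabilities = model.predict_proba(features)[0]
    best = probabilities.argmax()
    return label_names[model.classes_[best]], float(probabilities[best])
```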
Storing the Results
We can finally gather all the info in a dictionary to_insert and then send the result to BigQuery:
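A sketch of that insert, reusing the placeholder table name from the creation script:

```python
from google.cloud import bigquery

bq_client = bigquery.Client()
TABLE_ID = "your-project-id.fakenews.predictions"  # placeholder names, as before

def insert_to_bq(row):
    # Streams the row into the predictions table; insert_rows_json returns a
    # list of errors, which is empty on success
    errors = bq_client.insert_rows_json(TABLE_ID, [row])
    if errors:
        print(f"BigQuery insert failed: {errors}")
```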
Package and Deploy
Once the Flask application is properly tested locally, we can now proceed to create a Docker image and deploy it on the cloud. To do so, I followed this great tutorial here: Deploy APIs With Python and Docker.
I won’t be explaining every detail in the process, as the above tutorial is already pretty well explained. But, to summarize, I needed to:
- Create the Dockerfile:
FROM python:3.7
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 app:app
- Build the Docker image, tagging it with the model’s name:
docker build -t fakenewsdeploy:<model_name> .
- Install the Google Cloud SDK and authenticate:
gcloud auth login
gcloud auth configure-docker
- Build and push the image to Google Container Registry with Cloud Build:
gcloud builds submit -t gcr.io/<your_project_id>/fakenewsdeploy:<model_name>
Up to this point, we created the image and uploaded it to Google Container Registry. Now, we have to deploy a Cloud Run service from our image. We can do so by clicking "Create Service" at Cloud Run in the console, and the end result should be something like this:

In the previous steps, we tagged our images with the model name; here, we can also assign Revision URLs to each revision. That is not only a tag but also a URL to specifically access that revision. That way, we can access and compare the performance of each model, or any other difference there might be between revisions.
Another cool feature is that you can set how much traffic is directed to each revision. Let’s say you just trained a new fancy Transformer model and want to compare it to your classic one. You can gradually increase the traffic to the newest model in order to reduce risk. Since you’re logging the results, you can easily revert the traffic if something goes wrong.
The Result
Unfortunately, I won’t be keeping the service online for much longer, as my trial period at GCP is almost over. But if you read this soon enough, you can check it out at https://fakenewsdeploy-2hqadru3za-uc.a.run.app
The end result was already shown at the beginning of this article. What’s left to show is our table at BigQuery:

We can now visually inspect the results or run some SQL queries to calculate useful metrics over time, such as model confidence, the contents’ word distribution, and class distribution.
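As an example, a query along these lines (run here through google-cloud-bigquery, with the placeholder table name from before) could track average confidence and the share of "fake" predictions per model per day:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical monitoring query: average confidence and fake ratio per model per day
query = """
    SELECT
      model,
      DATE(prediction_date) AS day,
      AVG(confidence) AS avg_confidence,
      COUNTIF(prediction = 'fake') / COUNT(*) AS fake_ratio
    FROM `your-project-id.fakenews.predictions`
    GROUP BY model, day
    ORDER BY day
"""

for row in client.query(query).result():
    print(row.model, row.day, round(row.avg_confidence, 3), round(row.fake_ratio, 3))
```

A query like this could also feed a dashboard directly, which ties into the monitoring idea discussed in the conclusion below.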
Conclusion
In this article, I showed you how I deployed my simple web application for fake news detection while trying to reduce debt for my future self. There are some painless actions that you can take now in order to avoid some big headaches in the future. I’m sure there is much more to be done in this matter, but, hey, let’s learn things one at a time.
Now that it’s up, what is left is to maintain it. You should be constantly monitoring your application, and retrain/redeploy if the need arises.
To retrain your model, you will eventually have to label your stored online predictions. I have read about people selecting predictions whose confidence is above a certain threshold and taking them as ground truth. I don’t know if that is actually recommended, but it might be worth a try.
If I continue this personal project, I think the next step would be to set up a monitoring dashboard. To keep everything in the same place, this could be done at Google Data Studio, for instance, by setting our table as a data source. That way, I could set up some neat time series plots, charts, and graphics.
That’s it for now! If you have any suggestions, feel free to reach out!
Thank you for reading!