Failure to Deploy: A Success Story

How I did not deploy my first SARIMA COVID-19 forecasting model using Dash, Plotly, and Heroku.

Josh Johnson
Towards Data Science


Photo by Ethan Hu on Unsplash

I did not deploy a SARIMA time series model, built with the statsmodels library, that predicts future COVID-19 infection and death rates. I did not make a publicly accessible, interactive predictive website that uses Plotly to graph current and predicted case and death rates and lets users decide which statistics to include, which countries or states to predict, and how far out to predict. I worked hard and learned a lot in not deploying this model to a Heroku server.

Here is my story.

In this walkthrough you will learn to deploy a website to Heroku that does NOT make machine learning predictions. You WILL learn to get started with:

  1. Plotly, an interactive graphing library for Python
  2. Dash Open Source, a web deployment library that works great to put your custom Plotly graphs on the web for your users to interact with.
  3. Heroku Free Version, a web hosting service with benefits…and limitations.
  4. Not a lot about time series modeling. I’m saving this for another story later.

Background

I am a budding data scientist and have learned a lot about data and modeling, but I want a way to show the world what I can do, and maybe make some tools to be of use to others. Jupyter Notebooks are great for communicating with other data scientists and showing the validity and reproducibility of your findings. But, they aren’t the right tool to share your work with everybody. We must deploy!

After graduating from Flatiron Data Science Intensive boot camp, I was invited to join a team competing in an X-Prize COVID-19 Pandemic Response competition. I will write the story later about that journey, but the short of it is that I knew some basics about time series forecasting and decided to join the team to learn more. My most successful model was a SARIMA model that could, with 99% accuracy and access to only data up to November, predict changes in infection rates for the month of December in the USA (my validation data).

I also tried out the Facebook Prophet time series prediction library. Though it didn’t give me the best results, I did learn about an integration between their library and Plotly. Using Plotly, I was able to make my first interactive plots to explore forecasts and trends.

I took the time to learn more about Plotly and continued to use that fun and easy library to learn how to make interactive plots shine. I am familiar with matplotlib, and Plotly has much of the same functionality, plus the interaction and animation options. New data science superpowers!

While I was using Plotly’s documentation to keep learning new skills, the site kept pushing their Dash Enterprise deployment library on me. They wanted me to buy the package and comprehensive development environment for my enterprise. Well, I don’t have an enterprise yet, just an enterprising spirit. I knew I wanted to learn how to deploy my models and using the free version of Dash to deploy a COVID-19 forecaster seemed like a good place to start. I did not end up deploying my model, but not because of Dash. I definitely recommend Dash for machine learning web deployment!

My coding experience had previously been mostly focused on Jupyter Notebooks and custom libraries to support them. Dash is designed for deployment to web servers and used some tools that were new to me, like function wrappers and callbacks.

I know many new data scientists would love to learn how to deploy their work, so let me show you what didn’t…and what did work for me!

Plotly

Plotly is a data visualization library that lets you easily make interactive plots to encourage your audience to explore the data. It doesn’t take much to get a basic plot going. Take a look below to see how to download COVID data and plot it by country and region in an interactive plot:

Created by Author

Feel free to change the country and region variables if you want to see plots for other regions. Note, though, that only the United States, Canada, Brazil, and the United Kingdom have data by region.

Also, if you run the above code, take a moment to interact with the graph. You can draw a box over areas of interest to zoom in, then double click to zoom back out. You’ll notice that deaths and cases are on very different scales and to see details of the cumulative deaths you’ll have to zoom in a lot!

SARIMA

Time series forecasting has nuance to it and making a good one is beyond the scope of this article. In short, you have to account for many aspects and patterns in your data to allow the model to use past values to predict future ones. The models are picky about how the data is distributed, and some transforms are usually necessary to get a good prediction. If you want to replicate my model, the code is below. I’m not going to really explain it or how I came to the particular hyperparameters here. Watch for my next blog on what I learned about time series forecasting to learn more of the nitty gritty.

Here’s the code for my forecasting model:

Created by Author

Dash

This is the key ingredient to deploying your models and analysis live online. Dash uses Flask servers on the back end to make deployment easy and intuitive. I created a custom module using the functions you see in the Gists above to integrate into a Dash app layout. Dash works by creating a Dash() app object with methods for generating an HTML layout, callback decorators for your custom functions to interact with that layout, and a .run_server() method to start the Flask server. Running the code in your terminal creates a local server that you can connect to with your browser to preview your new site. It’s really pretty easy once you wrap your head around how the decorators interact with the layout.

app = dash.Dash(__name__)

Style sheets

When the Dash() app object is instantiated, it can be given a list of style sheet URLs to start from.

app = dash.Dash(__name__, external_stylesheets=[URL])

Layout

app.layout is the attribute of the app that controls the website layout and its components. The layout controls what your visitors see, and it is filled with things called components. Components can be text, images, graphs, interactive objects, inputs, and more. If you are familiar with HTML, that will help you here, as the syntax will be familiar. Each component has an id, which is used by callbacks to help your functions interact with it.

app.layout = html.Div([html.H2('Sample'), html.Div(id='first-division'), dcc.Dropdown(id='dropdown-1')])

One thing that might not be obvious is that html.Div objects have a children attribute, which holds the components that appear under that division of the layout in the displayed web page. This is how the callback for display_value(), which returns a dcc.Graph() component, creates a new graph under the html.Div(id='graph') component.

Decorators and Callbacks

Function decorators were new to me, and I didn’t understand how they worked at first, but I’d like to share what I learned:

Function decorators change the function that is defined directly below them (by convention, with no blank line in between).
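
A plain Python sketch, with no Dash involved, may make this concrete. The names shout and greet are made up for the example.

```python
import functools


def shout(func):
    """Decorator: return a new function that upper-cases func's result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name):
    return f"hello, {name}"


# The @ syntax is shorthand for passing the function to the decorator by hand.
def greet_plain(name):
    return f"hello, {name}"


greet_manual = shout(greet_plain)

print(greet("dash"))
```

After decoration, the name greet refers to the wrapper that shout returned, which is why calling it gives the upper-cased text.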

Dash uses the @app.callback() decorator to move values between the layout and your functions as users interact with the deployed site. Callbacks are a way for your code to interact with what your site visitors see and do.

@app.callback(<dependencies>...)
def function():

Dependencies

As you read the code below, notice how the dash.dependencies.Input() and dash.dependencies.Output() objects in the callback decorator each reference a component in the layout through its id in the first argument and a property of that component in the second argument. As you might have guessed, the Input() objects take in values from components and the Output() objects change parts of the components.

@app.callback(Output('first-division', 'children'), [Input('dropdown-1', 'value')])
def function(dropdown_input):

The values from the Input() objects are fed to the function below the decorator in the order they appear in the callback, and become the arguments for the function. This is key. It doesn’t matter what you call the variables in the function definition; they receive the values from the Input() objects in the order those appear in the callback decorator.

Guidance and Clarity

This is a short guide on a slightly complex interaction between elements. For more information, or if I’ve confused you, please visit Dash Open Source Docs to get clarity. You can also take a look at the code below to see how it all fits together.

WARNING!

While the code I’ve given you so far in this article can be run on Colab, wonderfully sandboxed, the code below won’t actually get you a working server on Colab. You’ll have to run it locally. Always be very careful about running other people’s programs on your computer. The one below is safe, but others may not be. If you have any concerns, carefully inspect the code, research the imported libraries, and/or consult your expert coding friends.

You can run the code as a notebook, or copy it into a text file and rename it app.py to run it through your terminal. Make sure you get the source code for the custom src module below, too.

Created by Author

Here’s the SRC, too. You’ll need both to get your server working. Copy the code below into a text file and name the file src.py. Then put it in the same folder as the Dash app file above.

OKAY! I know what you are going to say. You’ve noticed it. There is no predictor here. If you run this code, you will set up a server and a site that you can connect to locally through your browser. But, that site only lets you explore current COVID data charts, not predicted cases like the SARIMA code. We’ve come to my ‘failure’.

Heroku

Heroku offers a free website hosting service that lets you connect a properly set up git repository to create your site. With a few additional files in your repo (provided below), the code in the above Gist, when deployed on Heroku, will work great to show the world your COVID-19 tracker. You will need to make an account here on Heroku, though.

To deploy the above file you’ll need to copy it into a text file and rename the file app.py. Then you’ll need three files that are available in the git repo linked at the end of this walkthrough: requirements.txt, runtime.txt, and Procfile. requirements.txt tells Heroku what packages it needs to install, runtime.txt tells it which Python version to run your code with, and Procfile tells it what command to run to start your app. I won’t go into more detail here; you can check the repo at the end of this article for what those files need to contain.

Fail!

I want to tell you why I couldn’t deploy my predictive model to Heroku. Heroku offers this free service, which is awesome, but the free servers are not the fastest and the SARIMA model takes some horsepower to train. I can’t afford, at the time of this writing, to pay for the full version.

You might just think this would make it take forever to return the graph, but Heroku has a hard limit of 30 seconds for a backend app to respond to a user request. The freely available servers can’t get the model trained and the prediction made in that time. So, we are limited to the lovely interactive Plotly graph and the ability for users to explore current and historical data in different countries and regions. That’s still pretty cool, though.

Future Success

My next learning goal is to master Amazon Web Services. AWS powers a huge amount of the internet these days and has powerful servers optimized for data analysis and Machine Learning. I suspect my future ML deployment success may be possible through their service. However…free service with them is limited and accidental fees are not hard to accrue. I’ll be working toward my AWS ML certification and I’ll check back in with you all to share what I learn in the future.

Others have found possible solutions, such as having the server return information incrementally. I could look for a way for my app to return something, like a ‘still training’ message, to extend the time before the timeout. This was suggested by Aatish Neupane, though they note that this can only extend the timeout to 55 seconds.

Another solution would be to add more dynos (the background processors in Heroku). The free account gives you 2 dynos to distribute between your deployed sites, so that’s a big ask, basically making this the only site I can deploy.

Finally, it’s been suggested to me that I could pre-train models on each day’s new data.
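
I haven’t tried this yet, but the idea can be sketched with the standard library alone: a slow job trains the models ahead of time and writes the forecasts to a file, and the web app only reads that file when a user asks. The function names, the JSON layout, and the numbers below are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_forecasts(forecasts, path):
    """Write {region: [predicted values]} to disk after the slow training job."""
    Path(path).write_text(json.dumps(forecasts))


def load_forecast(region, path):
    """Fast lookup for the web app: no model fitting at request time."""
    forecasts = json.loads(Path(path).read_text())
    return forecasts.get(region)  # None if the region was never forecast


# Stand-in for a daily job; real values would come from the trained models.
cache_file = Path(tempfile.mkdtemp()) / "forecasts.json"
save_forecasts({"Washington": [24, 26, 29], "Ontario": [40, 41, 43]}, cache_file)

washington = load_forecast("Washington", cache_file)
```

A callback that calls load_forecast only has to read a small file, so it should respond well inside the 30 second limit. The open question is where the training job itself would run.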

Here is the link to my deployed COVID-19 data site. It loads slowly at first because each time it loads the app downloads all the freshest COVID-19 data. Once it loads, though, it’s quite responsive. Plotly is optimized for Dash and Pandas, and Pandas is optimized for data. With these great libraries, the site displays the data nicely and quickly.

Here is the link for the full repo. Please clone it and have fun!

Let me know what works for you!

In the meantime, enjoy the code. Please, use it as a starting point to launch your own ideas. Be creative, break it, change it, add to it, and post your insights, struggles, and successes in the comments.

Sincerely and with love,

Josh Johnson

Thanks to:

Special Guest Star:

Layla Rowen: My partner in this project, especially for web deployment.

Great data:

The University of Oxford and Blavatnik School of Government’s Coronavirus Government Response Tracker.

Gentle SARIMA model guidance:

Jason Brownlee’s A Gentle Introduction to SARIMA for Time Series Forecasting in Python


I'm a data scientist with a background in education. I empower learners to become the folks they want to be.