
Data Science
I’ve always wanted to host some of my Data Science projects on the web so that my models could be interactive and have a greater reach. The Data Professor (Chanin Nantasenamat) on YouTube has a great tutorial series on Streamlit, an open-source framework for interactive data apps, that appeared to offer everything I was looking for!
TLDR: Here’s my final page!
So I returned to a previous project, predicting the need for the hospitalization of diabetic patients based on electronic health records (EHR). My GitHub repo contains all of the code for the original modeling and the necessary additions for hosting on Heroku.
But! I had a problem. Even after cobbling together a series of other guides and tutorials for simpler apps, I still had trouble deploying the final app to Heroku! So here’s another guide; hopefully this one helps you follow in my footsteps.
Step 0 – All the traditional CRISP-DM steps!
This guide is from final model to hosting. So I’m assuming you’ve done the due diligence for developing a model, from Data and Business Understanding through Cleaning and EDA to Modeling and Evaluation.
Step 1 – Pickle your model

I developed a multiple logistic regression model using scikit-learn’s LogisticRegressionCV. It’s called logreg_nopoly because no polynomial or interaction features were included.
import pickle
pickle.dump(logreg_nopoly, open('diabetes_model.pkl', 'wb'))
Your pickled model should be in the root directory.
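Pickling is a one-liner once the model is fit. Here’s a minimal, self-contained sketch of the round trip — note the training data below is a random stand-in, not the EHR data, and the variable names just mirror the ones above:

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Stand-in training data; substitute your own cleaned feature matrix.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 3))
y_train = (X_train[:, 0] > 0).astype(int)

logreg_nopoly = LogisticRegressionCV(cv=5, max_iter=1000).fit(X_train, y_train)

# Serialize the fitted model to the app's root directory.
with open('diabetes_model.pkl', 'wb') as f:
    pickle.dump(logreg_nopoly, f)

# Sanity check: the reloaded model predicts identically to the original.
with open('diabetes_model.pkl', 'rb') as f:
    reloaded = pickle.load(f)
assert (reloaded.predict(X_train) == logreg_nopoly.predict(X_train)).all()
```

One caveat worth knowing: pickles are tied to the scikit-learn version that wrote them, which is one reason to pin versions in requirements.txt later on.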
Step 2 – Create web_app.py file
Here’s the link to my file. My model had 19 features, a mix of categorical and continuous, so I needed to clean my data and perform the same transformations (feature engineering and get_dummies) on the user-input data.
Streamlit allows for easy markdown-style headings and text, as well as embedded images and graphs. In order to feed data into the model to get a prediction, you should create a function that allows the user to define a value for each of your features.
For continuous variables:
st.sidebar.slider('name', min_value, max_value, default_value)
num_medications = st.sidebar.slider('Num of Medications', 1, 79, 16)
For categorical variables:
st.sidebar.selectbox('name', ('option1', 'option2', …)) where all categories in that feature are included
gender = st.sidebar.selectbox('Gender', ('Female', 'Male'))

The first half of the function will create the various sliders and boxes that the user will interact with on the app. These can be in any order. I put some of the most influential features at the top of the stack to make the experience of altering the prediction easier.

The second half of the function creates a dictionary mapping the data set’s feature names (keys) to the variables we created above for each respective feature (values). This is then loaded into a pandas DataFrame. The dictionary should follow the column order of the original data so that our input_df and the original df line up when we concatenate.
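To make the dictionary-to-DataFrame step concrete, here’s a runnable sketch of that second half. The feature names are a small illustrative subset, not the full 19-feature set, and the fixed values stand in for the st.sidebar.slider / st.sidebar.selectbox calls so it runs outside Streamlit:

```python
import pandas as pd

# Stand-in values; in the real app these come from the sidebar widgets.
num_medications = 16        # st.sidebar.slider('Num of Medications', 1, 79, 16)
gender = 'Female'           # st.sidebar.selectbox('Gender', ('Female', 'Male'))
time_in_hospital = 3        # another hypothetical slider

def user_input_features():
    # Keys must match, and be ordered like, the original dataset's columns
    # so the user row and the original df line up when concatenated.
    data = {
        'gender': gender,
        'time_in_hospital': time_in_hospital,
        'num_medications': num_medications,
    }
    # index=[0] builds a single-row DataFrame from scalar values.
    return pd.DataFrame(data, index=[0])

input_df = user_input_features()
print(input_df.shape)  # (1, 3): one user row, one column per feature
```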
Note that to ensure all categorical variables are encoded the same way (so that get_dummies produces a column for every possible category in each feature), we need to perform the same data cleaning and feature engineering steps in this file, and we need to concat the user input to the dataset.
Here are some feature engineering steps as an example (here is the full file to examine). I create binary features indicating whether a patient received an HbA1c test and whether their medication was changed in response.
# Feature Engineering
df['A1C_test'] = np.where(df.A1Cresult == 'None', 0, 1)
df.change = np.where(df.change == 'No', 0, 1)
df['A1C_test_and_changed'] = np.where((df.change == 1) & (df.A1C_test == 1), 1, 0)
Then we drop the target feature (‘readmitted’) and concat the user input before encoding.
X = df.drop('readmitted', axis = 1) # drop target feature
df = pd.concat([input_df, X], axis=0) # add user input to df
encode = [categorical_variables] # list of categorical column names
for col in encode: # encode all categorical features
    dummy = pd.get_dummies(df[col], prefix=col)
    df = pd.concat([df, dummy], axis=1)
    del df[col]
df = df[:1] # first row is user input
Now we will read in our pickled model in order to make a prediction on the user input! The st.write method accepts strings to display, but it will display pandas DataFrames and Series natively, as well.
load_clf = pickle.load(open('diabetes_model.pkl', 'rb'))
prediction = load_clf.predict(df)
prediction_proba = load_clf.predict_proba(df)
readmitted = np.array(['NO','<30','>30'])
st.write(readmitted[prediction]) # writes value from array
st.write(prediction_proba) # writes probability of each value
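One optional refinement (not in the original app): predict_proba returns an unlabeled array whose columns follow clf.classes_, so wrapping it in a labeled DataFrame makes the st.write output much easier to read. A sketch with a stand-in probability row, assuming the classes are label-encoded 0, 1, 2 in the same order as the readmitted array:

```python
import numpy as np
import pandas as pd

readmitted = np.array(['NO', '<30', '>30'])
# Stand-in for load_clf.predict_proba(df): one row, one column per class.
prediction_proba = np.array([[0.62, 0.08, 0.30]])

# Label each probability column with its class name.
# For safety, the column order should really be taken from load_clf.classes_.
proba_df = pd.DataFrame(prediction_proba, columns=readmitted)
print(proba_df)
```

In the app, st.write(proba_df) then renders a small labeled table instead of a bare array.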
Step 3 – Refine Streamlit App Locally
Once you have the modeling taken care of, you can run the Streamlit app locally and add in the necessary markdown to build out the app.
In your Terminal/CLI
This will install Streamlit (if not already installed) and open the demo page.
$ pip install streamlit
$ streamlit hello
Now that you have a web_app.py file ready (or close enough), open it locally for real-time viewing and editing. You should be in the app’s root directory.
$ streamlit run web_app.py
Once it’s open, anytime you save changes to the .py file in your IDE, Streamlit will detect them and can be set to rerun the app automatically.
Step 4 – Create Heroku Dependencies
So your app is ready to go and is running locally! Excellent! Now we’ll need to create some files in this root directory to feed to Heroku. This section is heavily informed by this previous guide; however, I’ve changed the contents of the setup.sh file, as the original guide didn’t work for me.
Procfile
In Jupyter, create a file called Procfile. Paste this as its only content. It tells Heroku to run the setup.sh file we are about to create and then launch the app with the same streamlit run command we used earlier (the filename must match your app file).
web: sh setup.sh && streamlit run web_app.py
requirements.txt
In Jupyter, create a file called requirements.txt. Here is where you specify the necessary packages and pin their versions, to make sure your app doesn’t break with future updates. Here is my version (I need scikit-learn for the model I’m using).
streamlit==0.71.0
scikit-learn==0.23.2
setup.sh
In Jupyter, create a file called setup.sh. This file creates a .streamlit directory and writes the server settings Streamlit needs when hosted on Heroku into a config.toml file (TOML is an open configuration file format).
mkdir -p ~/.streamlit
echo "[server]
headless = true
port = $PORT
enableCORS = false
" > ~/.streamlit/config.toml
Step 5 – Heroku!
From here you can follow Hamilton Chang’s original guide! He can walk you through Loading Your Files, Running the App, and Deploying the App. I’ll add just one aspect…
How to Update the Git Remote Path
Heroku gives you a randomly-generated app name. After renaming your app deployment, you’ll have to update the Git remote path so you can continue to update the app and push changes.
First, remove the original heroku path. Then, add a new remote path. I named mine diabetes-hospitalization (i.e. what I named my deployed app).
$ git remote rm heroku
$ heroku git:remote -a diabetes-hospitalization
Then you can update the app and push changes to Heroku!
$ git push heroku master
Conclusion
I hope this guide is helpful and up-to-date (!). These steps allowed me to host an interactive version of a model on the web for users to play with. You can visit it here and try it out. Having a working demo is a great way to expand the reach of your projects to employers, and it shows that you can deploy a working, robust model. While Streamlit is not the most powerful tool out there, its ease of use and open-source nature make it uniquely suited for beginner projects to get your feet wet!
Connect
I’m always looking to connect and explore other projects! Or let me know if something in this guide could be expanded/isn’t working!