DATA SCIENCE IN THE INDUSTRY

Real-time Twitter Sentiment Analysis for Brand Improvement and Topic Tracking (Chapter 3/3)

Deploy a Real-time Twitter Analytical Web App on Heroku using Dash & Plotly in Python

Chulong Li
Towards Data Science
10 min read · Sep 20, 2019



This tutorial will teach you 1) how to deploy the data analytics and insights to the Heroku cloud application platform and 2) how to migrate the Plotly-based data visualizations into an analytical dashboard web app using Dash in Python.

https://twitter-analysis-web-app.herokuapp.com

Note: Real-time Twitter Data Collection and Data Analytics & Sentiment Analysis were completed in previous chapters.

  • Chapter 1: Collecting Twitter Data using Streaming Twitter API with Tweepy, MySQL, & Python
  • Chapter 2: Twitter Sentiment Analysis and Interactive Data Visualization using RE, TextBlob, NLTK, and Plotly
  • Chapter 3 (You’re here!): Deploy a Real-time Twitter Analytical Web App on Heroku using Dash & Plotly in Python
  • Chapter 4 (Optional): Parallelize Streaming Twitter Sentiment Analysis using Scala, Kafka and Spark Streaming

Why Dash?

  • Dash is a productive Python framework for building web applications. Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly custom user interfaces in pure Python.
  • Dash Core Components (dcc) provide supercharged components for interactive user interfaces.
  • Dash Html Components (html) provide pure Python abstraction around HTML, CSS, and JavaScript.
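
To make these two pieces concrete, here is a minimal, self-contained sketch (my own illustration, not part of the final project) showing how an html component and a dcc component combine into a tiny Dash app:

# Minimal illustrative Dash app (standalone sketch, not the project's app.py)
import dash
import dash_core_components as dcc
import dash_html_components as html

app = dash.Dash(__name__)

app.layout = html.Div([
    html.H3('Hello Dash'),          # an html component
    dcc.Graph(                      # a dcc component
        id='example-graph',
        figure={'data': [{'x': [1, 2, 3], 'y': [2, 4, 1], 'type': 'bar'}]}
    )
])

if __name__ == '__main__':
    app.run_server(debug=True)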

Why Heroku?

  • Heroku is a platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud.

In order to run the real-time Twitter monitoring system, we will use two scripts (or two dynos, or two apps): one for collecting the streaming data, and another for real-time data analysis and visualization. Splitting the work this way effectively reduces the latency of the data pipeline when handling high-throughput Twitter text data.

Create Heroku Account & Set Up the Environment

This part is written for beginners who need to set up a new Heroku app environment from scratch. If you’re already familiar with the Heroku platform, you may skip it.

Two important deployment guides for reference:

Sign up for an account on Heroku: Cloud Application Platform.

Check your email to confirm your account, and then log in to the Heroku platform.

Click on the New button to create a new app. The app name must be unique, since everyone will access the web app via that name after the app is published.

1. Install the Heroku CLI

First, download and install the Heroku CLI. The Heroku Command Line Interface (CLI) is an essential component that lets us create and manage Heroku apps from the terminal.

Then, log in to your Heroku account and follow the prompts to create a new SSH public key.

$ heroku login

2. Create a new project folder

Heroku uses Git to version-control the development of applications.

$ mkdir Real-time-Twitter-Monitoring-System
$ cd Real-time-Twitter-Monitoring-System

3. Initialize the folder with git and a virtualenv

$ git init        # initializes an empty git repo 
$ virtualenv venv # creates a virtualenv called "venv"
$ source venv/bin/activate # uses the virtualenv
$ heroku git:remote -a THIS-IS-YOUR-APP-NAME # link the repo to your Heroku app

virtualenv creates a fresh Python environment. You will need to reinstall your app's dependencies inside this virtualenv:

$ pip install dash 
$ pip install plotly
$ pip install gunicorn

Note: gunicorn is a new dependency for deploying the app.

4. Set up several required files

Initialize the folder with a sample app (app.py), a .gitignore file, requirements.txt, and a Procfile for deployment

I. Create a file called app.py and fill in the sample demo code.

# Simple demo app only
import dash
import dash_html_components as html

app = dash.Dash(__name__)
server = app.server  # the underlying Flask app, used by gunicorn
app.layout = html.Div('Deployment test')  # placeholder layout so the page renders

II. Create .gitignore

venv
*.pyc
.DS_Store
.env

III. Create Procfile

web: gunicorn app:server

(Note that app refers to the filename app.py, and server refers to the variable server inside that file.)

IV. Create requirements.txt

Then let’s add this long list of dependencies. Note: some of them may not be used directly, but they can be useful when you’re trying to tune your app.

Click==7.0
dash==1.1.1
dash-core-components==1.1.1
dash-html-components==1.0.0
dash-renderer==1.0.0
dash-table==4.1.0
Flask==1.1.1
Flask-Compress==1.4.0
gunicorn==19.9.0
itsdangerous==1.1.0
Jinja2==2.10.1
MarkupSafe==1.1.1
plotly==4.1.0
PyYAML==5.1.2
retrying==1.3.3
six==1.12.0
Werkzeug==0.15.5
pandas==0.25.0
nltk==3.4.4
textblob==0.15.2
tweepy==3.8.0
psycopg2-binary==2.8.3
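
If you built your virtualenv step by step, you can also generate this file directly from it; the pinned versions will then reflect whatever you actually installed:

$ pip freeze > requirements.txt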

5. Deploy the application

Initialize the Heroku app, add and commit the code to the repo, and push it to the Heroku cloud using Git.

$ heroku create THIS-IS-YOUR-UNIQUE-APP-NAME
$ git add .
$ git commit -m 'Initialize the app'
$ git push heroku master # deploy code to heroku

The Heroku cloud should already have set up a dyno, a lightweight container that runs your app, for you at the beginning. If not, create one manually.

$ heroku ps:scale web=1  # run the app with one Heroku "dyno"

6. Update code and re-deploy

To update our app on the Heroku cloud in the future, we just need to add, commit, and push our new code again.

$ git status # view the changes 
$ git add .
$ git commit -m 'a description of the changes'
$ git push heroku master

Now we need to write some real code.


Migrate Data Analytics & Visualizations from Plotly to Dash

To put the data visualizations from the previous chapter into the Heroku app, we need to wrap our Plotly-based dashboard with the Dash framework. All data analysis and visualization happens in the file app.py, and you may check my entire code for this file here.

Start with a new app server using the Dash default CSS sheet. Note: I recommend improving the appearance of the web app with Bootstrap.

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
app.title = 'Real-Time Twitter Monitor'
server = app.server

The core of a Dash web app is app.layout, which serves as the overall page layout. Note: click on the two links below to understand how they work and to check some typical examples, as they are VERY IMPORTANT for making good use of Dash.

  • Dash Layout allows us to display integrated data visualization along with other text descriptions.
  • Dash Callbacks enable the application to update everything consistently with real-time data.

Dash Layout

Dash wraps HTML in its own Python abstraction (Dash HTML Components), so it uses HTML in its own way, and some HTML features are not available in Dash.

html.Div(id='live-update-graph') holds the top part of the dashboard, including the descriptions of the tweet count and potential impressions. html.Div(id='live-update-graph-bottom') holds the bottom part of the dashboard in the web app.

dcc.Interval is the key component that lets the application update its information regularly. Although the data collector on the other dyno (explained later) works in real-time, the analytical dashboard only analyzes and visualizes data every 10 seconds, because the visualizations rely on aggregated data and we also want to keep the app cost-efficient.

app.layout = html.Div(children=[

    # Title section (details hidden)
    html.H2('This-is-your-title'),
    html.Div(id='live-update-graph'),
    html.Div(id='live-update-graph-bottom'),

    # Summary section (details hidden)
    html.Div(
        dcc.Markdown("Author's Words: ...... ")
    ),

    # Timer for updating data every 10 seconds
    dcc.Interval(
        id='interval-component-slow',
        interval=1*10000,  # in milliseconds
        n_intervals=0
    )

], style={'padding': '20px'})

Each html.Div in Dash can take className, children, and style. children is a list that may contain dcc.Graph (graphs from Dash Core Components), dcc.Markdown, html.Div, html.H1, and other interactive controls (e.g. dcc.Dropdown and dcc.Slider).

html.Div(
    className='row',
    children=[
        dcc.Markdown("..."),
        dcc.Graph(...),
        html.Div(...)
    ],
    style={'width': '35%', 'marginLeft': 70}
)

style is important for building a good layout and keeping proper spacing between the visualization graphs, but tuning its details can be time-consuming.

Below is an example of dcc.Graph (a graph from Dash Core Components). dcc.Graph has the attributes id and figure. Inside figure there is 'data', which contains the different kinds of Plotly graphs (e.g. go.Scatter and go.Pie), and 'layout', which is only the layout of this single graph, not app.layout.

# import dash_core_components as dcc
# import plotly.graph_objs as go
dcc.Graph(
    id='pie-chart',
    figure={
        'data': [
            go.Pie(
                labels=['Positives', 'Negatives', 'Neutrals'],
                values=[pos_num, neg_num, neu_num],
            )
        ],
        'layout': {
            'showlegend': False,
            'title': 'Tweets In Last 10 Mins',
            'annotations': [
                # ...
            ]
        }
    }
)

'annotations' is another important part for placing the right labels on the chart. Note: the reference documentation for annotations in Dash/Plotly is quite ambiguous, since the parameters may need to be expressed as go.layout.Annotation, as a dictionary dict(...), and sometimes inside a list.
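
As a hedged illustration (the labels, values, hole size, and annotation text below are my own, not taken from the dashboard code), one annotation can be written as a plain dict inside a figure's 'layout':

import plotly.graph_objs as go

# Illustrative only: a donut chart with one annotation expressed as a dict.
# The same annotation could also be written as go.layout.Annotation(...).
fig = go.Figure(
    data=[go.Pie(labels=['Positives', 'Negatives', 'Neutrals'],
                 values=[45, 25, 30], hole=0.6)],
    layout={
        'showlegend': False,
        'annotations': [
            dict(text='Last 10 Mins',   # label drawn on the chart
                 x=0.5, y=0.5,
                 font=dict(size=18),
                 showarrow=False)
        ]
    }
)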

Dash Callbacks

In the Dash app layout, reactive and functional Python callbacks provide connections between inputs and outputs, allowing customizable declarative UIs.

Note: we skip the data analysis parts that were explained in the previous chapter, although there are a few small differences (or improvements) here. Instead, we dive directly into the Dash-based data visualization parts, which are implemented with fairly involved Plotly graphs from plotly.graph_objs (a.k.a. go).

# Multiple components can update every time the interval fires.
# from dash.dependencies import Input, Output
@app.callback(
    Output('live-update-graph', 'children'),
    [Input('interval-component-slow', 'n_intervals')]
)
def update_graph_live(n):
    # Lots of nested Divs to ensure the proper layout
    # Graphs will be explained later
    # Code omitted here
    return children

All three line charts use go.Scatter with a stack group, so that the areas under the lines in the same graph are stacked on top of each other.

# import plotly.graph_objs as go
go.Scatter(
    x=time_series,
    y=result["Num of A-Brand-Name mentions"],
    name="Neutrals",
    opacity=0.8,
    mode='lines',
    line=dict(width=0.5, color='rgb(131, 90, 241)'),
    stackgroup='one'
)

The pie chart was already explained in the dcc.Graph example above.

For the text descriptions (e.g. the tweet-number change per 10 minutes), we use two HTML paragraphs to embed the data. By comparing the previous 10-minute interval with the current 10-minute interval, we can compute this figure (see the sketch after the code below).

html.P(
    'Tweets/10 Mins Changed By',
    style={'fontSize': 17}
),
html.P(
    '{0:.2f}%'.format(percent)
        if percent <= 0
        else '+{0:.2f}%'.format(percent),
    style={'fontSize': 40}
)
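
For reference, a hedged sketch of how percent could be computed; the dataframe and the 'created_at' column are assumptions made for illustration, not the author's exact schema:

import datetime
import pandas as pd

# Hypothetical sketch: count tweets in the current and previous 10-minute
# windows and derive the relative change shown above.
now = datetime.datetime.utcnow()
df = pd.DataFrame({'created_at': [now - datetime.timedelta(minutes=m)
                                  for m in (1, 3, 12, 15)]})

curr_num = df[df['created_at'] > now - datetime.timedelta(minutes=10)].shape[0]
prev_num = df[(df['created_at'] <= now - datetime.timedelta(minutes=10)) &
              (df['created_at'] > now - datetime.timedelta(minutes=20))].shape[0]
percent = (curr_num - prev_num) / prev_num * 100 if prev_num else 0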

For Potential Impressions Today, we want to estimate how many people at most could have seen these tweets. By summing the follower counts of the users who posted the tweets, we get the potential impressions. We also add a dynamic numeric unit to display a value that can span a very large range (see the sketch after the code below).

html.P(
    '{0:.1f}K'.format(daily_impressions/1000)
        if daily_impressions < 1000000
        else ('{0:.1f}M'.format(daily_impressions/1000000)
              if daily_impressions < 1000000000
              else '{0:.1f}B'.format(daily_impressions/1000000000)),
    style={'fontSize': 40}
)
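
A hedged sketch of how daily_impressions could be computed; the dataframe and the 'user_followers_count' column are assumptions for illustration, not the author's exact schema:

import pandas as pd

# Hypothetical sketch: potential impressions = sum of the follower counts of
# the users who posted today's tweets.
todays_tweets = pd.DataFrame({'user_followers_count': [1200, 53000, 890]})
daily_impressions = int(todays_tweets['user_followers_count'].sum())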

Counting the Daily Tweet Number is straightforward: we store the running total and add the new tweet count at each data update, and the value is reset to zero at midnight (see the sketch after the code below).

html.P(
    '{0:.1f}K'.format(daily_tweets_num/1000),
    style={'fontSize': 40}
)
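
A hedged sketch of the accumulate-and-reset logic; the variable and function names are assumptions, not the author's code:

import datetime

# Hypothetical sketch: keep a running total across updates and reset it once
# the date changes (i.e. at the first update after midnight).
daily_tweets_num = 0
last_reset_date = datetime.date.today()

def add_new_tweets(new_count):
    global daily_tweets_num, last_reset_date
    today = datetime.date.today()
    if today != last_reset_date:      # midnight has passed
        daily_tweets_num = 0
        last_reset_date = today
    daily_tweets_num += new_count
    return daily_tweets_num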

The callback function for the bottom dashboard is similar to the first one.

@app.callback(Output('live-update-graph-bottom', 'children'),
              [Input('interval-component-slow', 'n_intervals')])
def update_graph_bottom_live(n):
    # Lots of nested Divs to ensure the proper layout
    # Graphs will be explained later
    # Code omitted here
    return children

We use a bar chart for tracking the hottest topics. Set the orientation to horizontal to better display the bars and the associated words.

go.Bar(
    x=fd["Frequency"].loc[::-1],
    y=fd["Word"].loc[::-1],
    name="Neutrals",
    orientation='h',
    marker_color=fd['Marker_Color'].loc[::-1].to_list(),
    marker=dict(
        line=dict(
            color=fd['Line_Color'].loc[::-1].to_list(),
            width=1
        )
    )
)

Focus the map on the state level by setting 'layout': {'geo': {'scope': 'usa'}} in the figure.

go.Choropleth(
    locations=geo_dist['State'],           # spatial coordinates
    z=geo_dist['Log Num'].astype(float),   # color-coded data
    locationmode='USA-states',
    text=geo_dist['text'],                 # hover text
    geo='geo',
    colorbar_title="Num in Log2",
    marker_line_color='white',
    colorscale=["#fdf7ff", "#835af1"]
)

To place multiple graphs side by side in a single row, you may consider the style below. width should be roughly 1/N as a percentage (N = number of graphs), and we can subtract about 1% to leave enough gap between the graphs (see the sketch after the style line).

style={'display': 'inline-block', 'width': '33%'}
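
As a hedged illustration for two graphs in one row (the ids and the left_fig/right_fig figure objects are placeholders, not from the project code):

# Illustrative only: two graphs side by side, each just under half the row width.
html.Div(
    className='row',
    children=[
        dcc.Graph(id='left-graph', figure=left_fig,
                  style={'display': 'inline-block', 'width': '49%'}),
        dcc.Graph(id='right-graph', figure=right_fig,
                  style={'display': 'inline-block', 'width': '49%'})
    ]
)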

Implement Real-time Twitter Data Collector

In order to run another web-connected script (it needs internet access) without paying, we need to set up a second application with exactly the same configuration except for app.py and the Procfile. Alternatively, you can pay for an additional dyno in the original application.

Note: the free plan can only run for around 3 weeks per month because the two apps share the free dyno hours, and the web app may sleep after 30 minutes of inactivity, although the data collector won’t.

1. Create a new app server scraping_server.py in the new application, and add the code below.

from os import environ
from flask import Flask

# Minimal placeholder server so the web dyno binds the port Heroku assigns
app = Flask(__name__)
app.run(host='0.0.0.0', port=int(environ.get('PORT', 5000)))

2. Create a data collector file scraping.py in the new application; it is very similar to the file in Chapter 1. However, this time we’ll use Heroku PostgreSQL rather than MySQL, which also requires us to update the SQL queries a little (see the sketch below). You may check the code here.

# Due to the similar code, check Chapter 1 for detailed explanation.
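
As a hedged sketch of what the PostgreSQL side could look like (the table and column names are assumptions, not the author's exact schema), the collector can connect through Heroku's DATABASE_URL config var with psycopg2 and use %s placeholders in its queries:

import os
import psycopg2

# Hypothetical sketch: Heroku injects DATABASE_URL for the attached Postgres add-on.
conn = psycopg2.connect(os.environ['DATABASE_URL'], sslmode='require')
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS tweets (
        id_str VARCHAR(255),
        created_at TIMESTAMP,
        text VARCHAR(500),
        polarity FLOAT,
        user_followers_count INT
    );
""")
cur.execute(
    "INSERT INTO tweets (id_str, created_at, text, polarity, user_followers_count) "
    "VALUES (%s, %s, %s, %s, %s)",
    ('1175000000000000000', '2019-09-20 12:00:00', 'a sample tweet', 0.3, 1200)
)
conn.commit()
cur.close()
conn.close()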

3. Remove app.py, and update the Procfile as below.

worker: python scraping.py
web: python scraping_server.py

Note: Don’t forget to create credentials.py and settings.py just like in Chapter 1.

Connect Two Servers/Scripts/Dynos via Heroku PostgreSQL

To share the same Postgres database between the two applications, follow this guide:

$ heroku addons:attach my-originating-app::DATABASE --app sushi # replace 'sushi' with your second app's name

New Approach:

Heroku now offers a pipeline feature, so you may consider taking advantage of it rather than using my approach of two applications connected through a shared PostgreSQL database.

Author’s Words:

Thanks for reading! This series of articles on my real-time Twitter monitoring system is over, but the development of new techniques won’t stop. Chapter 4 will be published as an independent article in October. Hope to see you next time!
