
How to Develop a Streamlit Data Analytics Web App in 3 Steps

Step-by-Step Guide to Build Your First YouTube Analytics App

Photo by Tran Mau Tri Tam on Unsplash

Most data science and data analytics projects end up delivering a static report or dashboard, which tremendously undersells the effort and thought put into the process. A web app, by contrast, is a great way to demonstrate your data analytics work, and it can be further expanded into a self-served, interactive service. However, as data scientists or data analysts, we are not trained to develop software or websites. In this article, I would like to introduce tools like Streamlit and Plotly that allow us to easily develop and serve data analytics projects through a web app, in the following three steps:

Develop Data Analytic Web App in 3 Steps (image from author’s website)
  1. Extract Data and Build Database
  2. Define Data Analytics Process as Functions
  3. Construct Web App Interface

Afterwards, we will be able to create a simple web app like this:

Web App Demo (image by author)

Step 1. Extract Data and Build Database

Step 1 in Developing Data Analytics Web App (image by author)

We will use YouTube data as an example here, since it is relevant to our daily lives. The YouTube Data API allows us to get public YouTube data, such as video statistics (e.g. number of likes, views) or content details (e.g. tags, title, comments). To set up the YouTube API, you need to sign up for a Google Developer account and create an API key. Here are some resources I found helpful for getting started with the YouTube API.

These resources take us through how to create a YouTube API key and install the required libraries (e.g. googleapiclient.discovery). Once these dependencies have been resolved, we set up the connection to the API using Python and your own API key, with the following command:

from googleapiclient.discovery import build
youtube = build('youtube', 'v3', developerKey=<your_api_key>)

After establishing the connection, it’s time to explore what data is available for your Data Science projects. To do this, take a look at the YouTube Data API documentation, which provides an overview of the different kinds of data that can be accessed.

YouTube Data API reference list (screenshot by author)

We will use "Videos" as an example for this project. The list() method allows us to request the "Video Resource" by passing the part parameter and several filters. The part parameter specifies which components of the Video Resource you would like to fetch; here I am getting snippet, statistics, and contentDetails. Have a look at this documentation, which details all the fields you can get from the videos().list() method. We then specify the following filter parameters to limit the results returned by the request.

  • chart='mostPopular': get the most popular videos
  • regionCode='US': videos from US
  • videoCategoryId=1: get the videos from a specific video category (e.g. 1 is for Film & Animation), which can be found in this catalog of video category ID.
  • maxResults=20: return a maximum of 20 videos

video_request = youtube.videos().list(
                part='snippet,statistics,contentDetails',
                chart='mostPopular',
                regionCode='US',
                videoCategoryId=1,
                maxResults=20
              )
response = video_request.execute()

We then execute the request using video_request.execute(), and the response is returned in JSON format, which typically looks like the snapshot below.

response in JSON format (image by author)

All the information is stored under the "items" key in the response. We extract "items" and create the dataframe video_df by normalizing the JSON:

from pandas import json_normalize
video_df = json_normalize(response['items'])

As a result, we manage to tidy up the output into a structure that is easier to manipulate.

video_df (image by author)
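To see concretely what json_normalize does, here is a minimal sketch on a hand-made response fragment (the sample values are illustrative, not real API output):

```python
from pandas import json_normalize

# Illustrative fragment shaped like the YouTube API response (not real data)
response = {'items': [
    {'id': 'abc123',
     'snippet': {'title': 'Demo video', 'tags': ['demo', 'test']},
     'statistics': {'viewCount': '1000', 'likeCount': '50'}},
]}

video_df = json_normalize(response['items'])
# Nested keys are flattened into dotted column names such as 'snippet.title'
print(video_df.columns.tolist())
```

This is why the fields later in the article appear with dotted names like snippet.title and statistics.likeCount.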

To take a step further of working with JSON using Python, I recommend reading the article "How to Best Work with JSON in Python".

Step 2. Define Data Analytics Process as Functions

Step 2 in Developing Data Analytics Web App (image by author)

We can package multiple lines of code into one function, so that it can be executed repeatedly and easily embedded with other web app components at a later stage.

Define extractYouTubeData()

For instance, we can encapsulate the data extraction process above into a function, extractYouTubeData(youtube, categoryId), which allows us to pass a categoryId variable and output the top 20 popular videos in that category as video_df. In this way, we can take the user's input on which category they would like to select, feed it into this function, and get the corresponding top 20 videos.

def extractYouTubeData(youtube, categoryId):
    video_request = youtube.videos().list(
        part='snippet,statistics,contentDetails',
        chart='mostPopular',
        regionCode='US',
        videoCategoryId=categoryId,
        maxResults=20
    )
    response = video_request.execute()
    video_df = json_normalize(response['items'])
    return video_df

We can use video_df.info() to get all the fields in this dataframe.

fields in video_df (image by author)
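One thing worth noting in this output: the API returns the statistics fields (views, likes, etc.) as strings, so they need to be cast to numbers before any quantitative analysis. A minimal sketch with illustrative values:

```python
import pandas as pd

# The YouTube API returns statistics (views, likes) as strings, so they
# should be converted to numbers before analysis. Values are illustrative.
video_df = pd.DataFrame({
    'statistics.viewCount': ['1000', '2500'],
    'statistics.likeCount': ['50', '120'],
})

for col in ['statistics.viewCount', 'statistics.likeCount']:
    video_df[col] = pd.to_numeric(video_df[col])

print(video_df['statistics.likeCount'].sum())  # 170
```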

With this valuable dataset we can carry out a large variety of analysis, such as exploratory data analysis, sentiment analysis, topic modeling etc.

I would like to start by designing the app for some exploratory data analysis on these most popular YouTube videos:

  • video duration vs. the number of likes
  • the most frequently occurring tags

In future articles, I will explore more techniques such as topic modeling and natural language processing to analyze video titles and comments. If you would like to read more of my articles on Medium, I would really appreciate your support by signing up for a Medium membership ☕.

Define plotVideoDurationStats()

I would like to know whether video duration correlates with the number of likes for these popular videos. To achieve this, we first need to transform contentDetails.duration from ISO 8601 duration format into numeric values using isodate.parse_duration().total_seconds(). Then we can use a scatter plot to visualize video duration against like count. This is done with Plotly, which allows a more interactive experience for end users. The code snippet below returns the Plotly figure, which will be embedded into our web app.

import isodate
import pandas as pd
import plotly.express as px

def plotVideoDurationStats(video_df):
    # Parse ISO 8601 durations (e.g. 'PT1M30S') into seconds
    video_df['contentDetails.duration'] = video_df['contentDetails.duration'].astype(str)
    video_df['duration'] = video_df['contentDetails.duration'].apply(lambda x: isodate.parse_duration(x).total_seconds())
    # Like counts arrive as strings from the API, so cast them to numbers
    video_df['statistics.likeCount'] = pd.to_numeric(video_df['statistics.likeCount'])
    fig = px.scatter(video_df, x='duration', y='statistics.likeCount', color_discrete_sequence=px.colors.qualitative.Safe)
    return fig

figure output from plotVideoDurationStats (image by author)
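As a side note, if the isodate library is not installed, pandas can parse the same ISO 8601 duration strings via pandas.Timedelta; a small sketch with illustrative values:

```python
import pandas as pd

# YouTube reports durations as ISO 8601 strings, e.g. 'PT1H2M3S'
# (1 hour, 2 minutes, 3 seconds). pandas.Timedelta parses these directly,
# as an alternative to isodate.parse_duration.
durations = ['PT1H2M3S', 'PT45S', 'PT10M']
seconds = [pd.Timedelta(d).total_seconds() for d in durations]
print(seconds)  # [3723.0, 45.0, 600.0]
```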

To explore more tutorials based on Plotly, check out these blogs below:

An Interactive Guide to Hypothesis Testing in Python

How to Use Plotly for More Insightful and Interactive Data Explorations

Define plotTopNTags()

This function creates a figure of the top N tags for a certain video category. First, we iterate through all snippet.tags and collect the tags into a list. We then create tags_freq_df, which describes the counts of the top N most frequent tags. Lastly, we use px.bar() to display the chart.

import pandas as pd
import plotly.express as px

def plotTopNTags(video_df, topN):
    tags = []
    for i in video_df['snippet.tags']:
        # Videos without tags appear as NaN (a float), so skip them
        if type(i) != float:
            tags.extend(i)
    tags_df = pd.DataFrame(tags)
    tags_freq_df = tags_df.value_counts().iloc[:topN].rename_axis('tag').reset_index(name='frequency')
    fig = px.bar(tags_freq_df, x='tag', y='frequency')
    return fig

figure output from plotTopNTags() (image by author)
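The counting step inside this function can be checked on a toy tag list (a simplified sketch using pd.Series; the tag values are illustrative):

```python
import pandas as pd

# Simplified version of the tag-frequency step: count tags, keep the top N
tags = ['ai', 'ml', 'ai', 'python', 'ai', 'ml']
topN = 2
tags_freq_df = (pd.Series(tags)
                  .value_counts()
                  .iloc[:topN]
                  .rename_axis('tag')
                  .reset_index(name='frequency'))
print(tags_freq_df['tag'].tolist())        # ['ai', 'ml']
print(tags_freq_df['frequency'].tolist())  # [3, 2]
```

value_counts() already sorts by descending frequency, which is why slicing with iloc[:topN] yields the most frequent tags.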

Step 3. Construct Web App Interface

Step 3 in Developing Data Analytics Web App (image by author)

We will use Streamlit to develop the web app interface. It is the easiest tool I have found so far for building web apps on top of Python. It saves us the hassle of handling HTTP requests, defining routes, or writing HTML and CSS code.

Run pip install streamlit to install Streamlit on your machine, or follow this documentation to install Streamlit in your preferred development environment.

Creating a web app component is very easy in Streamlit. For example, displaying a title is as simple as:

import streamlit as st
st.title('Trending YouTube Videos')

Here we need several components to build the web app.

1) input: a dropdown menu for users to select video category

dropdown menu (image by author)

This code snippet allows us to create a dropdown menu with the prompt "Select YouTube Video Category" and options to choose from ‘Film & Animation’, ‘Music’, ‘Sports’, ‘Pets & Animals’.

videoCategory = st.selectbox(
    'Select YouTube Video Category',
    ('Film & Animation', 'Music', 'Sports', 'Pets & Animals')
)

2) input: a slider for users to choose the number of tags

slider (image by author)

This defines the slider and specifies the slider range from 0 to 20.

topN = st.slider('Select the number of tags to display', 0, 20)

3) output: a figure that displays the video duration vs. number of likes

video duration vs. number of likes (image by author)

We first create videoCategoryDict to convert the category name into a categoryId, then pass the categoryId to the extractYouTubeData() function we defined previously. Check out this page for a full list of video categories and their corresponding categoryId.

We then call the plotVideoDurationStats() function and display the Plotly chart using st.plotly_chart().

videoCategoryDict = {'Film & Animation': 1, 'Music': 10, 'Sports': 17, 'Pets & Animals': 15}
categoryId = videoCategoryDict[videoCategory]
video_df = extractYouTubeData(youtube, categoryId)
duration_fig = plotVideoDurationStats(video_df)
fig_title1 = 'Duration (seconds) vs Likes in Top ' + videoCategory + ' Videos'
st.subheader(fig_title1)
st.plotly_chart(duration_fig)

4) output: a figure that displays the top tags in that video category

top tags in the video category (image by author)

The last component requires us to feed the user's chosen number of tags to the function plotTopNTags(), and create the plot by calling the function.

tag_fig = plotTopNTags(video_df, topN)
fig_title2 = 'Top ' + str(topN) + ' Tags in ' + videoCategory + ' Videos'
st.subheader(fig_title2)
st.plotly_chart(tag_fig)

These code statements can all be contained in a single Python file (e.g. YouTubeDataApp.py). We then navigate to the command line and use streamlit run YouTubeDataApp.py to run the app in a web browser.


Take-Home Message

Building a web app may seem intimidating for data analysts and data scientists. This post covered the following three steps to get hands-on building your first web app and extending your data analytics projects into a self-served platform:

  • Extract Data and Build Database
  • Define Data Analytics Process as Functions
  • Construct Web App Interface

More Resources Like This

Get Started in Data Science

EDA and Feature Engineering Techniques

How to Use Plotly for More Insightful and Interactive Data Explorations


Originally published at https://www.visual-design.net on Feb 23rd, 2023.

