
Most of the time, data science and data analytics projects end up delivering a static report or dashboard, which undersells the effort and thought put into the process. A web app, by contrast, is a great way to demonstrate your data analytics work, and it can be further expanded into a service on self-served, interactive platforms. However, as data scientists or data analysts, we are not trained to develop software or websites. In this article, I would like to introduce tools like Streamlit and Plotly that allow us to easily develop and serve data analytics projects through a web app, in the following three steps:

- Extract Data and Build Database
- Define Data Analytics Process as Functions
- Construct Web App Interface
Afterwards, we will be able to create a simple web app like this:

Step 1. Extract and Build Database

We will use YouTube data as an example here, since it is relevant to our daily life. The YouTube Data API allows us to get public YouTube data, such as video statistics (e.g. number of likes, views) or content details (e.g. tags, title, comments). To set up the YouTube API, you need to sign up for a Google Developer account and create an API key. Here are some resources I found helpful for getting started with the YouTube API:
- Python YouTube API Tutorial: Getting Started – Creating an API Key and Querying the API
- YouTube Data API Documentation
- Google API Python Client
These resources take us through how to create a YouTube API key and install the required library (e.g. googleapiclient.discovery). After these dependencies have been resolved, we set up the connection to the API using Python and our own API key, with the following code:
from googleapiclient.discovery import build
# Replace the placeholder string with your own API key
youtube = build('youtube', 'v3', developerKey='YOUR_API_KEY')
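Hard-coding the key in the script makes it easy to leak when the file is shared. As a minimal sketch, the key could instead be read from an environment variable. The helper name load_api_key and the variable name YOUTUBE_API_KEY are assumptions made for illustration; they are not part of the YouTube API or its Python client.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your YouTube API key."
        )
    return api_key


# Usage, assuming the variable has been exported in your shell:
# youtube = build('youtube', 'v3', developerKey=load_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the first API request.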
After establishing the connection, it’s time to explore what data is available for your Data Science projects. To do this, take a look at the YouTube Data API documentation, which provides an overview of the different kinds of data that can be accessed.

We will use "Videos" as an example for this project. The list() method allows us to request the "Video Resource" by passing the part parameter and several filters. The part parameter specifies which components of the Video Resource you would like to fetch; here I am getting snippet, statistics, and contentDetails. Have a look at this documentation, which details all fields you can get from the videos().list() method. We specify the following filter parameters to limit the results returned from this request:
- chart='mostPopular': get the most popular videos
- regionCode='US': videos from the US
- videoCategoryId=1: get the videos from a specific video category (e.g. 1 is for Film & Animation), which can be found in this catalog of video category IDs
- maxResults=20: return a maximum of 20 videos
video_request = youtube.videos().list(
    part='snippet,statistics,contentDetails',
    chart='mostPopular',
    regionCode='US',
    videoCategoryId=1,
    maxResults=20
)
response = video_request.execute()
We then execute the request using video_request.execute(), and the response is returned in JSON format, which typically looks like the snapshot below.

All the information is stored under the "items" key of the response. We then extract the "items" key and create the dataframe video_df by normalizing the JSON format.
from pandas import json_normalize
video_df = json_normalize(response['items'])
As a result, we manage to tidy up the output into a structure that is easier to manipulate.
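To make the normalization step more concrete, here is a rough pure-Python sketch of the idea behind it: nested dictionaries are flattened into a single level whose keys are joined with a dot, which is where column names such as statistics.likeCount come from. The flatten helper is written for illustration only and is not how pandas implements it. The field names in sample_item follow the Video Resource, but every value is a made-up placeholder.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining nested keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value  # lists (e.g. tags) are kept as they are
    return flat


# Placeholder item: real field names, invented values
sample_item = {
    "id": "VIDEO_ID",
    "snippet": {"title": "Example title", "tags": ["tag1", "tag2"]},
    "statistics": {"viewCount": "1000", "likeCount": "100"},
    "contentDetails": {"duration": "PT4M13S"},
}

for column, value in flatten(sample_item).items():
    print(column, "=", value)
```

Each flattened key corresponds to one column of video_df, and each item in the response becomes one row.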

To go a step further in working with JSON in Python, I recommend reading the article "How to Best Work with JSON in Python".

Step 2. Define Data Analytics Process as Functions

We can package multiple lines of code into one function, so that it can be executed repeatedly and easily embedded with other web app components at a later stage.
Define extractYouTubeData()
For instance, we can encapsulate the data extraction process above into a function, extractYouTubeData(youtube, categoryId), which allows us to pass a categoryId variable and output the top 20 popular videos under that category as video_df. In this way, we can get the user's input on which category they would like to select, then feed the input into this function and get the corresponding top 20 videos.
from pandas import json_normalize

def extractYouTubeData(youtube, categoryId):
    # Request the 20 most popular US videos in the given category
    video_request = youtube.videos().list(
        part='snippet,statistics,contentDetails',
        chart='mostPopular',
        regionCode='US',
        videoCategoryId=categoryId,
        maxResults=20
    )
    response = video_request.execute()
    # Flatten the nested JSON items into a dataframe
    video_df = json_normalize(response['items'])
    return video_df
We can use video_df.info() to get all fields in this dataframe.

With this valuable dataset, we can carry out a large variety of analyses, such as exploratory data analysis, sentiment analysis, topic modeling, etc.
I would like to start by designing the app for some exploratory data analysis on these most popular YouTube videos:
- video duration vs. the number of likes
- the most frequently occurring tags
In future articles, I will explore more techniques such as topic modeling and natural language processing to analyze the video titles and comments.
Define plotVideoDurationStats()
I would like to know whether video duration has some correlation with the number of likes for these popular videos. To achieve this, we first need to transform contentDetails.duration from the ISO 8601 duration format (e.g. PT4M13S) into numeric values using isodate.parse_duration().total_seconds(). Then we can use a scatter plot to visualize the video duration against the like count. This is carried out using Plotly, which allows a more interactive experience for end users. The code snippet below returns the Plotly figure, which will be embedded into our web app.
import isodate
import pandas as pd
import plotly.express as px

def plotVideoDurationStats(video_df):
    # Convert ISO 8601 durations (e.g. 'PT4M13S') into seconds
    video_df['duration'] = video_df['contentDetails.duration'].astype(str).apply(
        lambda x: isodate.parse_duration(x).total_seconds())
    # The API returns counts as strings, so convert likes to numbers before plotting
    video_df['statistics.likeCount'] = pd.to_numeric(video_df['statistics.likeCount'], errors='coerce')
    fig = px.scatter(video_df, x='duration', y='statistics.likeCount',
                     color_discrete_sequence=px.colors.qualitative.Safe)
    return fig
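To show what the duration conversion involves, here is a simplified sketch that parses the duration strings with a regular expression from the standard library. It is an illustration rather than a replacement for isodate: the function name iso_duration_to_seconds is made up for this example, and it only handles days, hours, minutes, and whole seconds, which is an assumption about the values that appear in contentDetails.duration.

```python
import re

# Matches strings such as 'PT4M13S', 'PT1H2M3S', or 'P1DT1S'
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso_duration_to_seconds(value):
    """Convert a simple ISO 8601 duration string into a number of seconds."""
    match = _DURATION_RE.match(value)
    if match is None:
        raise ValueError(f"Unsupported duration: {value!r}")
    # Components that are absent from the string count as zero
    parts = {name: int(number or 0) for name, number in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


print(iso_duration_to_seconds("PT4M13S"))  # 4 * 60 + 13 = 253
```

For anything beyond this simple case (weeks, months, fractional seconds), a dedicated library such as isodate is the safer choice.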

To explore more tutorials based on Plotly, check out the blogs below:
- An Interactive Guide to Hypothesis Testing in Python
- How to Use Plotly for More Insightful and Interactive Data Explorations
Define plotTopNTags()
This function creates a figure of the top N tags of a certain video category. First, we iterate through all snippet.tags and collect all tags into a tag list. We then create tags_freq_df, which holds the counts of the top N most frequent tags. Lastly, we use px.bar() to display the chart.
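import pandas as pd

def plotTopNTags(video_df, topN):
    # Collect the tags of every video; videos without tags hold NaN instead of a list
    tags = []
    for video_tags in video_df['snippet.tags']:
        if isinstance(video_tags, list):
            tags.extend(video_tags)
    # Count each tag and keep the topN most frequent ones
    tag_counts = pd.Series(tags, dtype='object').value_counts().head(topN)
    tags_freq_df = tag_counts.rename_axis('tag').reset_index(name='frequency')
    fig = px.bar(tags_freq_df, x='tag', y='frequency')
    return fig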
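The counting logic can also be sketched with only the standard library, which makes it easy to check on a small input before plotting. The helper name top_n_tags and the sample tag lists are illustrative placeholders, not data returned by the API.

```python
from collections import Counter


def top_n_tags(tag_lists, top_n):
    """Count tags across videos and return the top_n most frequent ones."""
    counts = Counter()
    for tags in tag_lists:
        if isinstance(tags, list):  # skip None or NaN for videos without tags
            counts.update(tags)
    return counts.most_common(top_n)


# Made-up tag lists; the second video has no tags
sample_tags = [
    ["trailer", "movie"],
    None,
    ["trailer", "animation", "movie"],
    ["trailer"],
]

print(top_n_tags(sample_tags, 2))  # [('trailer', 3), ('movie', 2)]
```

The resulting list of (tag, frequency) pairs carries the same information as tags_freq_df.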

Step 3. Construct Web App Interface

We will use Streamlit to develop the web app interface. It is the easiest tool I have found so far for web app development on top of Python. It saves us the hassle of handling HTTP requests, defining routes, or writing HTML and CSS code.
Run pip install streamlit to install Streamlit on your machine, or follow this documentation to install Streamlit in your preferred development environment.
Creating a web app component is very easy using Streamlit. For example, displaying a title is as simple as this:
import streamlit as st
st.title('Trending YouTube Videos')
Here we need several components to build the web app.
1) input: a dropdown menu for users to select video category

This code snippet allows us to create a dropdown menu with the prompt "Select YouTube Video Category" and options to choose from ‘Film & Animation’, ‘Music’, ‘Sports’, ‘Pets & Animals’.
videoCategory = st.selectbox(
    'Select YouTube Video Category',
    ('Film & Animation', 'Music', 'Sports', 'Pets & Animals')
)
2) input: a slider for users to choose the number of tags

This defines the slider and specifies the slider range from 0 to 20.
topN = st.slider('Select the number of tags to display',0, 20)
3) output: a figure that displays the video duration vs. number of likes

We first create the videoCategoryDict to convert the category name into a categoryId, then pass the categoryId to the extractYouTubeData() function that we defined previously. Check out this page for a full list of the video categories and their corresponding categoryId.
We then call the plotVideoDurationStats() function and display the Plotly chart using st.plotly_chart().
videoCategoryDict = {'Film & Animation': 1, 'Music': 10, 'Sports': 17, 'Pets & Animals': 15}
categoryId = videoCategoryDict[videoCategory]
video_df = extractYouTubeData(youtube, categoryId)
duration_fig = plotVideoDurationStats(video_df)
fig_title1 = 'Durations(seconds) vs Likes in Top ' + videoCategory + ' Videos'
st.subheader(fig_title1)
st.plotly_chart(duration_fig)
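One design note: the category names appear both in the dropdown options and in videoCategoryDict, so the two can drift apart when a category is added. A possible sketch, assuming we keep a single dictionary as the source of truth, is shown below. The names VIDEO_CATEGORIES, category_options, and duration_title are hypothetical helpers introduced for this example.

```python
# Single source of truth: category name -> YouTube categoryId (values from the article)
VIDEO_CATEGORIES = {
    "Film & Animation": 1,
    "Music": 10,
    "Sports": 17,
    "Pets & Animals": 15,
}


def category_options():
    """Return the category names in insertion order, ready for a dropdown."""
    return tuple(VIDEO_CATEGORIES)


def duration_title(category_name):
    """Build the subheader text for the duration vs. likes chart."""
    return f"Durations(seconds) vs Likes in Top {category_name} Videos"


print(category_options())
print(VIDEO_CATEGORIES["Music"], duration_title("Music"))
```

In the app, the tuple returned by category_options() could be passed as the options of st.selectbox(), and the selected name looked up in VIDEO_CATEGORIES to get the categoryId.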
4) output: a figure that displays the top tags in that video category

The last component requires us to feed the user's chosen number of tags to the function plotTopNTags() and create the plot by calling the function.
tag_fig = plotTopNTags(video_df, topN)
fig_title2 = 'Top '+ str(topN) + ' Tags in ' + videoCategory + ' Videos'
st.subheader(fig_title2)
st.plotly_chart(tag_fig)
These code statements can all be contained in a single Python file (e.g. YouTubeDataApp.py). Then we navigate to the command line and use streamlit run YouTubeDataApp.py to run the app in a web browser.
Take-Home Message
Building a web app may seem intimidating for data analysts and data scientists. This post covers the following three steps to help you build your first web app and extend your data analytics projects to a self-served platform:
- Extract Data and Build Database
- Define Data Analytics Process as Functions
- Construct Web App Interface
Originally published at https://www.visual-design.net on Feb 23rd, 2023.