
Build and Deploy Simple ML Tools by Stacking the Coolest Libraries

An end-to-end tutorial on how to create value from data: all the steps of a project that make Data Science an incredible playground…

Photo by Max Duzij on Unsplash

Online courses are very well built for learning the concepts of Data Science and Machine Learning. But in the end, one always wonders whether the role of a Data Analyst or Data Scientist is just to answer questions by coding things on their own 🤷‍♂️.

Let’s imagine the following discussion:

Ben: Hey, with the marketing team we’d like to know when is the best time to invest in advertising on product A next month. Is it possible to predict future Google search trends?

You: Hmmm yeah I think I can look on Google Trends to see how trends evolve over time. And we should be able to apply some forecasting methods to get an idea of the searches next month.

Ben: That would be perfect!

You: Okay, I’m going to export the Google Trends data from some keywords. Then I’m going to code a small model and send you the graph of the predictions.

Ben (the next day): Thanks for the report, we changed our mind a bit, can you do the same for product B?

You: Yeah, sure.

Ben (2 hours later): And finally, we would like to see the forecasts for the whole quarter, to know whether it would be wiser to wait a little.

You: Mmmh yes…

Ben (2 hours later): And is it possible to compare C and D over the next 6 months?

You: 😓


You know what I mean…

Allowing other people to use, interact with, and modify a model without getting into the code makes things much more interesting for everyone.

In this tutorial, we will see the different steps of building this trend prediction application with very simple tools. In the end, you will have an interactive forecasting app running on the web.

All the code and the GitHub repo are included in the tutorial.

On more complex problems it will of course be more difficult, but the mindset is the same: bring information through data directly where it is needed.


Step #0: Environment setup

The first thing to do before diving in is to prepare your work environment.

Virtual Environment

Here (as in any project) we will be working with several packages. In order to have total control over the tools you use, it is always recommended to work in a virtual environment.

We will use Anaconda, a Python distribution that offers a package management tool called Conda. It will allow us to easily update and install the libraries we need for our developments.

Anaconda | The World’s Most Popular Data Science Platform

Download the latest version of Anaconda, then run the program "Anaconda Prompt". We could also use the GUI, but we’re not going to spend too much time on this part.

From the Anaconda command prompt, configure a new environment, activate it, and install the basic packages.

conda create -n trends-predictor python
conda activate trends-predictor
conda install pandas numpy matplotlib

We will install the other packages as we go along.

To edit your code easily, choose your preferred IDE and configure it with your virtual environment if needed. To keep things simple, I will use Spyder, which comes with Anaconda. To launch it, type spyder in your command prompt.

Git/GitHub

We’re starting to look good! One last thing: we initialize our git repository and link it to GitHub. I’ll skip the details here; there are plenty of online tutorials.

Why is it used here? Spoiler: In the end, we will deploy our application with Heroku, and it’s very simple to do it from a GitHub repo.

From git bash, type:

echo "# trends-predictor" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin https://github.com/elliottrabac/trend-predictor.git
git push -u origin master

Okay, now that everything is ready, it’s time to code. We will do this in several steps:

  1. Access data automatically
  2. Make predictions in the future
  3. Make your tool accessible to everyone

Step #1: Access the data 🧲

There are plenty of ways to access the data. We could download the files as CSV and then import them into our program. The problem is that this is very manual…

To automate all this, we can use scraping tools: either directly with requests that parse the HTML content of a page, or by reproducing the actions of a user with Selenium.

But we will make it better/simpler. There is a package called "pytrends", which is designed to pull Google Trends data using Python.

💡 Tip: Always check whether the work hasn’t already been done before coding something. There is probably already a library or a GitHub repo that does the job.

First of all, we need to install the package. You can find the comprehensive documentation of the pytrends API here.

pytrends

Install the package with pip (in your virtual environment):

pip install pytrends

Pull Google Trends

Let’s create a new python script and start by importing the necessary packages:

import pandas as pd
from pytrends.request import TrendReq

Using pytrends is very simple and well documented, so we will go straight to the point.

All you have to do is to establish a connection with Google (with TrendReq), build the query with the keywords you are looking for (in the form of a list), and apply the desired method. Here we want the evolution of searches as shown on Google Trends’ Interest Over Time section, so we use: interest_over_time()

We will anticipate the next step (prediction) by deleting the isPartial column and renaming the others.

This returns a pandas.DataFrame with the data we need. Within a function, it looks like this (a minimal sketch of the steps described above; the exact code in the repo may differ slightly):
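def get_data(keyword):
    # Establish a connection with Google and build the query
    pytrend = TrendReq()
    pytrend.build_payload(kw_list=[keyword])
    # Pull the Interest Over Time data as a DataFrame
    df = pytrend.interest_over_time()
    # Drop the isPartial column and rename the others for the next step
    df = df.drop(columns=["isPartial"])
    df = df.reset_index().rename(columns={"date": "ds", keyword: "y"})
    return df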

Step #2: Make forecasts 🔮

Now it’s time to predict the future! We are dealing here with a time series forecasting problem.

Time series forecasting is the use of a model to predict future values based on previously observed values. – Wikipedia

To predict the next values, we have a whole range of possible tools and concepts. We can use statistical methods such as ARMA, ARIMA, or even SARIMAX models 🤯 Today, we also find very powerful Deep Learning models for this kind of problem, such as the classic MLP, CNN, RNN, and their more advanced forms.

We will keep it simple and efficient and use Facebook’s Prophet. Like any model, it has its advantages and disadvantages, but we are not here to debate which algorithm to use (online courses are very good for that).

Quick Start

We start by installing fbprophet in our work environment:

conda install -c conda-forge fbprophet

Make predictions

We will create a new function make_pred() that takes as parameters the dataset and the length of the period to predict.

In a very simple way, we create a new Prophet() object and fit the model to the dataset. The dataset must always have two columns named ds and y (as we prepared just before).

We extend the column containing the dates with make_future_dataframe(). Then we predict future values with predict(), which returns a dataframe with the predictions. Of course, you can play here with some "hyperparameters". As always, all the information is in the documentation.
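Putting the pieces together, a minimal sketch of make_pred() could look like this (I also return the model so we can reuse it for the plots):

from fbprophet import Prophet

def make_pred(df, periods):
    # Fit the model on the dataset with its two columns ds and y
    model = Prophet()
    model.fit(df)
    # Extend the dates column by the requested number of days
    future = model.make_future_dataframe(periods=periods)
    # Predict future values: returns a dataframe with yhat and its bounds
    forecast = model.predict(future)
    return model, forecast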

Plot the forecasts

But don’t forget our friend Ben: the goal is to send him a clear and understandable graph 📈.

Here we use Prophet’s plot() method. And since a little detail doesn’t hurt, we also give him some additional insight with plot_components().
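For example (the keyword and period here are just placeholders):

df = get_data("sunglasses")                # example keyword
model, forecast = make_pred(df, periods=90)
fig1 = model.plot(forecast)                # the forecast itself
fig2 = model.plot_components(forecast)     # trend and seasonality details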

That’s it! Now we have our graphs with the forecasts for the desired keyword.

If you are using Spyder IDE, you can see your dataframes and graphs in the Variable Explorer and Plot tab.

Graphs in Spyder IDE

Step #3: Send it on the web 🚀

Now we get to the best part!

You could make predictions on your own and send them as a report. But here we’re going to allow anyone to choose keywords and prediction options, all in a user-friendly interface!

Create a web app

We use Streamlit, a library that’s growing fast (you’ll be able to check it with the trend prediction application at the end 😉).

Streamlit – The fastest way to create data apps

No need for web development skills, no need to build an interface with Flask or Django, everything is done in a single script, in a few lines of code.

Start by installing the library.

pip install streamlit

And import it into your script.

import streamlit as st

Without going into details, Streamlit runs the script from top to bottom and renders each Streamlit element in the interface.

We start by adding a title to our page with:

st.write("""
# Trend Predictor App :crystal_ball:
### This app predicts the **Google Trend** you want!
""")

Now we put some elements in the sidebar; this will allow the user to choose the parameters.

  • keyword will be displayed as a text field and will take the value of its content; the default value here is "Sunglasses".
  • periods will be displayed as a slider between 7 and 365 days, with the default value at 90.
  • details is a boolean variable that is displayed as a checkbox.

st.sidebar.write("""
## Pick a keyword and a forecasting period :dizzy:
""")
keyword = st.sidebar.text_input("Keyword", "Sunglasses")
periods = st.sidebar.slider('Prediction time in days:', 7, 365, 90)
details = st.sidebar.checkbox("Show details")

To display the graphs, we use:

st.pyplot(fig1)
if details:  # if detail checkbox set to True
    st.write("### Details :mag_right:")
    st.pyplot(fig2)

There are a lot of great tools in this library and, here again, the documentation is amazing! ✨ To avoid fetching the data again at every interaction, we just add a decorator above our get_data() function:

@st.cache(suppress_st_warning=True)

The @st.cache decorator tells Streamlit to perform some internal magic so that any cached operation runs only once and is cached for future use.
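Placed just above the function definition from Step #1, it looks like this:

@st.cache(suppress_st_warning=True)
def get_data(keyword):
    ...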

We will add some configuration elements to our page with:

st.beta_set_page_config(page_title="Trend Predictor",
                        page_icon=":crystal_ball:",
                        layout='centered',
                        initial_sidebar_state='auto')

Not bad at all! Let’s see what it looks like locally.

To do so, open a terminal in the folder where your script is located and type:

streamlit run trend-prediction.py

🤩🤩🤩

Deploy it on the web

One last effort before you can send the link to Ben 💪 .

You can finish your work by deploying the application on the web. And Heroku is the perfect tool to do this! Easy to use, free, fast.

Cloud Application Platform | Heroku

I’m not going to lengthen this tutorial with all the deployment details; just type "Streamlit Heroku" on Google.

🛑 One thing to be careful about! When you create your requirements.txt file, you need to add an extra library. We installed Prophet with conda, but Heroku will also need pystan. Remember to add it and everything will work.
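For illustration, a minimal requirements.txt for this app could look like this (my assumption; pin the versions you actually installed):

streamlit
pandas
pytrends
fbprophet
pystan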

elliottrabac/trend-predictor

Conclusion

In this tutorial, we have seen that it is possible to combine different libraries that each take care of one step of the building process. Of course, this project is very "theoretical": the predictions are basic and the parameters are not adapted to the dataset.

But the most important thing here is the overall mindset. At the very least, it lets you test things quickly, validate hypotheses, and ship things to your team!

Happy learning! 🥂 E.T.


About me 👨‍💻:

I am an engineering student whose greatest passion is learning new things. After 5 years in mechanical engineering, I learned data science through the incredible resources that can be found online. I try to give back and continue to learn by writing a few posts.

Feel free to give me feedback on this article or contact me to discuss any topic!

