
A Guide to Building Effective Training Pipelines for Maximum Results

Lesson 2: Training Pipelines. ML Platforms. Hyperparameter Tuning.

THE FULL STACK 7-STEPS MLOPS FRAMEWORK

Photo by Hassan Pasha on Unsplash

This tutorial represents lesson 2 out of a 7-lesson course that will walk you step-by-step through how to design, implement, and deploy an ML system using MLOps good practices. During the course, you will build a production-ready model to forecast energy consumption levels for the next 24 hours across multiple consumer types from Denmark.

By the end of this course, you will understand all the fundamentals of designing, coding and deploying an ML system using a batch-serving architecture.

This course targets mid/advanced Machine Learning engineers who want to level up their skills by building their own end-to-end projects.

Nowadays, certificates are everywhere. Building advanced end-to-end projects that you can later show off is the best way to get recognition as a professional engineer.


Table of Contents:

  • Course Introduction
  • Course Lessons
  • Data Source
  • Lesson 2: Training Pipelines. ML Platforms. Hyperparameter Tuning.
  • Lesson 2: Code
  • Conclusion
  • References

Course Introduction

At the end of this 7-lesson course, you will know how to:

  • design a batch-serving architecture
  • use Hopsworks as a feature store
  • design a feature engineering pipeline that reads data from an API
  • build a training pipeline with hyperparameter tuning
  • use W&B as an ML Platform to track your experiments, models, and metadata
  • implement a batch prediction pipeline
  • use Poetry to build your own Python packages
  • deploy your own private PyPi server
  • orchestrate everything with Airflow
  • use the predictions to code a web app using FastAPI and Streamlit
  • use Docker to containerize your code
  • use Great Expectations to ensure data validation and integrity
  • monitor the performance of the predictions over time
  • deploy everything to GCP
  • build a CI/CD pipeline using GitHub Actions

If that sounds like a lot, don’t worry. After you finish this course, you will understand everything I said before. Most importantly, you will know WHY I used all these tools and how they work together as a system.

If you want to get the most out of this course, I suggest you access the GitHub repository containing all the lessons’ code. The course is designed so you can quickly read the articles and replicate the code alongside them.

By the end of the course, you will know how to implement the diagram below. Don’t worry if something doesn’t make sense to you. I will explain everything in detail.

Diagram of the architecture you will build during the course [Image by the Author].

By the end of Lesson 2, you will know how to implement and integrate the training pipeline and ML platform.

Note: This is the longest lesson, as I couldn’t logically split the training pipeline from the ML platform. Enjoy!


Course Lessons:

  1. Batch Serving. Feature Stores. Feature Engineering Pipelines.
  2. Training Pipelines. ML Platforms. Hyperparameter Tuning.
  3. Batch Prediction Pipeline. Package Python Modules with Poetry.
  4. Private PyPi Server. Orchestrate Everything with Airflow.
  5. Data Validation for Quality and Integrity using GE. Model Performance Continuous Monitoring.
  6. Consume and Visualize your Model’s Predictions using FastAPI and Streamlit. Dockerize Everything.
  7. Deploy All the ML Components to GCP. Build a CI/CD Pipeline Using Github Actions.
  8. [Bonus] Behind the Scenes of an ‘Imperfect’ ML Project – Lessons and Insights

If you want to grasp this lesson fully, we recommend you check out the previous lesson, which talks about designing a batch-serving architecture, building a FE pipeline, and loading features into the feature store:

A Framework for Building a Production-Ready Feature Engineering Pipeline


Data Source

We used a free & open API that provides hourly energy consumption values for all the energy consumer types within Denmark [1].

They provide an intuitive interface where you can easily query and visualize the data. You can access the data here [1].

The data has 4 main attributes:

  • Hour UTC: the UTC datetime when the data point was observed.
  • Price Area: Denmark is divided into two price areas, DK1 and DK2, separated by the Great Belt. DK1 is west of the Great Belt, and DK2 is east of the Great Belt.
  • Consumer Type: The consumer type is the Industry Code DE35, owned and maintained by Danish Energy.
  • Total Consumption: Total electricity consumption in kWh.

Note: The observations have a lag of 15 days! But for our demo use case, that is not a problem, as we can simulate the same steps as we would in real time.

A screenshot from our web app showing how we forecasted the energy consumption for area = 1 and consumer_type = 212 [Image by the Author].

The data points have an hourly resolution. For example: "2023–04–15 21:00Z", "2023–04–15 20:00Z", "2023–04–15 19:00Z", etc.

We will model the data as multiple time series. Each unique (price area, consumer type) tuple represents its own time series.

Thus, we will build a model that independently forecasts the energy consumption for the next 24 hours for every time series.

Check out the video below to better understand what the data looks like 👇


Lesson 2: Training Pipelines. ML Platforms. Hyperparameter Tuning.

The Goal of Lesson 2

This lesson will teach you how to build the training pipeline and use an ML platform, as shown in the diagram below 👇

Diagram of the final architecture with the Lesson 2 components highlighted in blue [Image by the Author].

More concretely, we will show you how to use the data from the Hopsworks feature store to train your model.

Also, we will show you how to build a forecasting model using LightGBM and Sktime that will predict the energy consumption levels for the next 24 hours for multiple consumer types across Denmark.

Another critical step we will cover is how to use W&B as an ML platform that will track your experiments, register your models & configurations as artifacts, and perform hyperparameter tuning to find the best configuration for your model.

Finally, based on the best config found in the hyperparameter tuning step, we will train the final model on the whole dataset and load it into the Hopsworks model registry to be used further by the batch prediction pipeline.

NOTE: This course is not about time series forecasting or hyperparameter tuning. This is an ML engineering course where I want to show you how multiple pieces come together into a single system. Thus, I will keep things straight to the point for the DS part of the code without going into too much detail.


Theoretical Concepts & Tools

Sktime: Sktime is a Python package that provides tons of functionality for time series. It follows the same interface as Sklearn, hence its name. Using Sktime, we can quickly wrap LightGBM to forecast 24 hours into the future, run cross-validation, and more. Sktime official documentation [3]

LightGBM: LightGBM is a boosting tree-based model. It builds on the ideas behind gradient boosting and XGBoost, offering performance and speed improvements. Starting with XGBoost or LightGBM is a common practice. LightGBM official documentation [4]

If you want to learn more about LightGBM, check out my article, where I explain in 15 minutes everything you need to know, from decision trees to LightGBM.

ML Platform: An ML platform is a tool that allows you to easily track your experiments, log metadata about your training, upload and version artifacts, data lineage and more. An ML platform is a must in any training pipeline. You can intuitively see an ML platform as your central research & experimentation hub.

Weights & Biases: W&B is a popular serverless ML platform. We chose it as our ML platform for 3 main reasons:

  1. their tool is fantastic & very intuitive to use
  2. they provide a generous freemium version for personal research and projects
  3. it is serverless – no pain in deploying & maintaining your tools

Training Pipeline: The training pipeline is a logical construct (a single script, an application, or more) that takes curated and validated data as input (a result from the data and feature engineering pipelines) and outputs a working model as an artifact. Usually, the model is uploaded into a model registry that can later be accessed by various inference pipelines (the batch prediction pipeline from our series is an example of a concrete implementation of an inference pipeline).


Lesson 2: Code

You can access the GitHub repository here.

Note: All the installation instructions are in the READMEs of the repository. Here we will jump straight to the code.

All the code within Lesson 2 is located under the training-pipeline folder.

The files under the training-pipeline folder are structured as follows:

A screenshot that shows the structure of the training-pipeline folder [Image by the Author].

All the code is located under the training_pipeline directory (note the "_" instead of "-").

Directly storing credentials in your git repository is a huge security risk. That is why you will inject sensitive information using a .env file.

The .env.default is an example of all the variables you must configure. It is also helpful to store default values for attributes that are not sensitive (e.g., project name).

A screenshot of the .env.default file [Image by the Author].

Prepare Credentials

First of all, we have to create a .env file where we will add all our credentials. I already showed you in [Lesson 1](https://towardsdatascience.com/a-framework-for-building-a-production-ready-feature-engineering-pipeline-f0b29609b20f) how to set up your .env file. Also, I explained in Lesson 1 how the variables from the .env file are loaded from your ML_PIPELINE_ROOT_DIR directory into a SETTINGS Python dictionary to be used throughout your code.

Thus, if you want to replicate what I have done, I strongly recommend checking out Lesson 1.

If you only want a light read, you can completely skip the "Prepare Credentials" step.

In Lesson 2, we will use two services:

  1. Hopsworks
  2. Weights & Biases

Hopsworks (free)

We already showed you in Lesson 1 how to set up the credentials for Hopsworks. Please visit the "Prepare Credentials" section from Lesson 1, where we showed you in detail how to set up the API KEY for Hopsworks.

Weights & Biases (free)

To keep the lessons compact, we assume that you already read and applied the steps for preparing the credentials for Hopsworks from Lesson 1.

The good news is that 90% of the steps are similar to the ones for configuring Hopsworks, except for how you can get your API key from W&B.

First, create an account on W&B. After, create a team (aka entity) and a project (or use your default ones, if you have any).

Then, check the image below to see how to get your own W&B API KEY 👇

Go to your W&B account. After, in the top-right corner, click your profile account, then "User settings." Once in your user settings, scroll down until you reach the "Danger Zone" card. Then, under the "API keys," hit the "New key" button. Copy your API key, and that is it. You have your API key [Image by the Author].

Once you have all your W&B credentials, go to your .env file and replace them as follows:

  • WANDB_ENTITY: your entity/team name (ours: "teaching-Mlops")
  • WANDB_PROJECT: your project name (ours: "energy_consumption")
  • WANDB_API_KEY: your API key
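
For context, below is a minimal sketch of how variables like these usually end up in the SETTINGS dictionary mentioned earlier, using the python-dotenv package. The exact loading code lives in the repository and may differ; the Hopsworks-related variable names used in later sketches (e.g., FS_API_KEY) are only assumptions, so check .env.default for the real keys.

```python
from pathlib import Path

from dotenv import dotenv_values  # python-dotenv

# ML_PIPELINE_ROOT_DIR points to the directory that holds your .env files.
ML_PIPELINE_ROOT_DIR = Path(".")

# Merge the non-sensitive defaults with your private credentials.
SETTINGS = {
    **dotenv_values(ML_PIPELINE_ROOT_DIR / ".env.default"),  # safe defaults (committed)
    **dotenv_values(ML_PIPELINE_ROOT_DIR / ".env"),          # your secrets (never committed)
}

wandb_entity = SETTINGS["WANDB_ENTITY"]
wandb_project = SETTINGS["WANDB_PROJECT"]
wandb_api_key = SETTINGS["WANDB_API_KEY"]
```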

Loading the Data From the Feature Store

As always, the first step is to access the data used to train and test the model. We already have all the data in the Hopsworks feature store. Thus, downloading it becomes a piece of cake.

The code snippet below has the load_dataset_from_feature_store() IO function under the training_pipeline/data.py file. You will use this function to download the data for a given feature_view_version and training_dataset_version.

NOTE: By giving a specific data version, you will always know with what data you trained and evaluated the model. Thus, you can consistently reproduce your results.

Using the function below, we perform the following steps (a simplified sketch of such a function follows the list):

  1. We access the Hopsworks feature store.
  2. We get a reference to the given version of the feature view.
  3. We get a reference to the given version of the training data.
  4. We log to W&B all the metadata that relates to the used dataset.
  5. Now that we downloaded the dataset, we run it through the prepare_data() function. We will detail it a bit later. For now, notice that we split the data between train and test.
  6. We log to W&B all the metadata related to how we split the dataset, plus some basic statistics for every split, such as split size and features.
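
The actual snippet is embedded in the article as a gist. Below is a simplified sketch of what such a function might look like. The Hopsworks and W&B calls (hopsworks.login(), get_feature_view(), get_training_data(), wandb.init(), run.use_artifact()) are real APIs, but the feature view name, the SETTINGS keys, and the prepare_data() signature are assumptions based on the description above.

```python
import hopsworks
import wandb

# SETTINGS and prepare_data() are assumed to be defined elsewhere in the training pipeline.


def load_dataset_from_feature_store(feature_view_version: int, training_dataset_version: int, fh: int = 24):
    """Sketch: download a versioned dataset from Hopsworks and log its metadata to W&B."""

    # 1. Access the Hopsworks feature store.
    project = hopsworks.login(
        api_key_value=SETTINGS["FS_API_KEY"], project=SETTINGS["FS_PROJECT_NAME"]
    )
    fs = project.get_feature_store()

    with wandb.init(
        project=SETTINGS["WANDB_PROJECT"], entity=SETTINGS["WANDB_ENTITY"], job_type="load_dataset"
    ) as run:
        # 2. Get a reference to the given version of the feature view.
        feature_view = fs.get_feature_view(
            name="energy_consumption_denmark_view", version=feature_view_version
        )

        # 3. Get a reference to the given version of the training data.
        data, _ = feature_view.get_training_data(training_dataset_version=training_dataset_version)

        # 4. Link this run to the feature view artifact created by the feature engineering pipeline.
        run.use_artifact("energy_consumption_denmark_feature_view:latest")

        # 5. Split the data between train and test (prepare_data() is described below).
        y_train, y_test, X_train, X_test = prepare_data(data, fh=fh)

        # 6. Log metadata about how the dataset was split.
        run.log({"train_size": len(y_train), "test_size": len(y_test)})

    return y_train, y_test, X_train, X_test
```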

Important observation: Using W&B, you log all the metadata that describes how you extracted and prepared the data. By doing so, you can easily understand for every experiment the origin of its data.

By using run.use_artifact(), you can link different artifacts together. In our example, by calling run.use_artifact("energy_consumption_denmark_feature_view:latest"), we linked this W&B run with an artifact created in a different W&B run.

Check out the video below to see what the W&B runs & artifacts look like in the W&B interface 👇

Now, let’s dig into the prepare_data() function.

I want to highlight that in the prepare_data() function, we won’t perform any feature engineering steps.

As you can see below, in this function, you will restructure the data to be compatible with the sktime interface, pick the target, and split the data.

The data is modeled for hierarchical time series, translating to multiple independent observations of the same variable in different contexts. In our example, we observe the energy consumption for various areas and energy consumption types.

Sktime, for hierarchical time series, expects the data to be modeled using multi-indexes, where the datetime index is the last level. To learn more about hierarchical forecasting, check out Sktime’s official tutorial [7].
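
To make that format concrete, here is a small illustration with made-up values (the index level names are assumptions; the repository may use slightly different ones):

```python
import pandas as pd

# Hierarchical (panel) layout expected by sktime: the datetime index is the last level.
index = pd.MultiIndex.from_tuples(
    [
        (1, 111, pd.Timestamp("2023-04-15 19:00")),
        (1, 111, pd.Timestamp("2023-04-15 20:00")),
        (2, 119, pd.Timestamp("2023-04-15 19:00")),
        (2, 119, pd.Timestamp("2023-04-15 20:00")),
    ],
    names=["area", "consumer_type", "datetime_utc"],
)
y = pd.DataFrame({"energy_consumption": [245.1, 251.3, 89.7, 91.2]}, index=index)
print(y)
```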

Also, we can safely split the data using sktime’s temporal_train_test_split() function. The test split has the length of the given fh (=forecast horizon).

One key observation is that the test split isn’t sampled randomly but based on the latest observation. For example, if you have data from the 1st of May 2023 until the 7th of May 2023 with a frequency of 1 hour, then the test split with a length of 24 hours will contain all the values from the last day of the data, which is 7th of May 2023.
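
A minimal sketch of that split, assuming y (and the exogenous X) follow the multi-index format above and fh = 24:

```python
from sktime.forecasting.model_selection import temporal_train_test_split

fh = 24  # forecast horizon in hours

# The split is temporal, not random: the last `fh` observations (the latest timestamps)
# of every time series form the test split; everything before them is the train split.
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=fh)
```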


Building the Forecasting Model

Baseline model

Firstly, you will create a naive baseline model to use as a reference. This model predicts the last value based on a given seasonal periodicity.

For example, if seasonal_periodicity = 24 hours, it will return the value from "present – 24 hours".
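
A sketch of such a baseline using sktime’s NaiveForecaster (the exact wrapper used in the repository may differ):

```python
from sktime.forecasting.naive import NaiveForecaster

seasonal_periodicity = 24  # hours

# strategy="last" with sp=24 predicts the value observed exactly one season (24 hours) earlier.
baseline = NaiveForecaster(strategy="last", sp=seasonal_periodicity)
```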

Using a baseline is a healthy practice that helps you compare your fancy ML model to something simpler. If your fancy model can’t beat the baseline, it is useless.

Fancy ML model

We will build the model using Sktime and LightGBM.

Check out Sktime documentation [3] and LightGBM documentation [4] here.

If you are into time series, check out this Forecasting with Sktime tutorial [6]. If you only want to understand the system’s big picture, you can continue.

LightGBM will be your regressor that learns patterns within the data and forecasts future values.

Using the WindowSummarizer class from Sktime, you can quickly compute lags and mean & standard deviation for various windows.

For example, for the lag, we provide a default value of list(range(1, 72 + 1)), which translates to "compute the lag for the last 72 hours".

Also, as an example of the mean lag, we have the default value of [[1, 24], [1, 48], [1, 72]]. For example, [1, 24] translates to a lag of 1 and a window size of 24, meaning it will compute the mean over the last 24 hours. Thus, in the end, for [[1, 24], [1, 48], [1, 72]], you will have the mean over the last 24, 48, and 72 hours.

The same principle applies to the standard deviation values. Check out this doc to learn more [2].

You wrap the LightGBM model using the make_reduction() function from Sktime. By doing so, you can easily attach the WindowSummarizer you initialized earlier. Also, by specifying strategy = "recursive", you can easily forecast multiple values into the future using a recursive paradigm. For example, if you want to predict 3 hours into the future, the model will first forecast the value for T + 1. Afterward, it will use as input the value it forecasted at T + 1 to forecast the value at T + 2, and so on…
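
Here is a hedged sketch of that wrapping step. WindowSummarizer, make_reduction, and LGBMRegressor are real APIs, and the lag/mean/std values mirror the defaults described above, but the LightGBM hyperparameters shown are only placeholders:

```python
from lightgbm import LGBMRegressor
from sktime.forecasting.compose import make_reduction
from sktime.transformations.series.summarize import WindowSummarizer

# Lag and rolling-window features computed from the target, as described above.
window_summarizer = WindowSummarizer(
    lag_feature={
        "lag": list(range(1, 72 + 1)),        # lags for the last 72 hours
        "mean": [[1, 24], [1, 48], [1, 72]],  # rolling means over the last 24, 48, and 72 hours
        "std": [[1, 24], [1, 48], [1, 72]],   # rolling standard deviations over the same windows
    }
)

regressor = LGBMRegressor(n_estimators=1000, learning_rate=0.15)  # placeholder hyperparameters

# Wrap the regressor as a recursive multi-step forecaster.
forecaster = make_reduction(
    regressor,
    transformers=[window_summarizer],
    strategy="recursive",  # forecast T+1, feed it back in to forecast T+2, and so on
    window_length=None,    # the WindowSummarizer defines the features, so no extra window is needed
    pooling="global",      # fit a single model across all (area, consumer_type) series
)
```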

Finally, we will build the ForecastingPipeline, where we will attach two transformers (a sketch of the assembled pipeline follows the list):

  1. transformers.AttachAreaConsumerType(): a custom transformer that takes the area and consumer type from the index and adds it as an exogenous variable. We will show you how we defined it.
  2. DateTimeFeatures(): a transformer from Sktime that computes different datetime-related exogenous features. In our case, we used only the day of the week and the hour of the day as additional features.
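
A sketch of how such a pipeline could be assembled. ForecastingPipeline and DateTimeFeatures are real sktime classes; the step names and the selected datetime features are assumptions, and AttachAreaConsumerType is the custom transformer from the repository (its core logic is sketched in the next subsection):

```python
from sktime.forecasting.compose import ForecastingPipeline
from sktime.transformations.series.date import DateTimeFeatures

# `forecaster` is the make_reduction(...) model from the previous sketch;
# AttachAreaConsumerType is the custom transformer defined in the repository.
pipeline = ForecastingPipeline(
    steps=[
        ("attach_area_and_consumer_type", AttachAreaConsumerType()),
        ("datetime_features", DateTimeFeatures(manual_selection=["day_of_week", "hour_of_day"])),
        ("forecaster", forecaster),
    ]
)
```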

Note that these transformers are similar to the ones from Sklearn, as Sktime kept the same interface and design. Using transformers is a critical step in designing modular models. To learn more about Sklearn transformers and pipelines, check out my article about How to Quickly Design Advanced Sklearn Pipelines.

Finally, we initialized the hyperparameters of the pipeline and model with the given configuration.

The AttachAreaConsumerType transformer is quite easy to comprehend. We implemented it as an example to show what is possible.

Long story short, it just copies the values from the index into its own column.
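
The sketch below captures only that core logic (the index level and column names are assumptions); in the repository it is wrapped as a proper sktime transformer class, whose boilerplate is omitted here:

```python
import pandas as pd


def attach_area_consumer_type(X: pd.DataFrame) -> pd.DataFrame:
    """Copy the hierarchical index levels into regular columns so they become exogenous features."""

    X = X.copy()
    X["area_exog"] = X.index.get_level_values("area")
    X["consumer_type_exog"] = X.index.get_level_values("consumer_type")

    return X
```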

IMPORTANT OBSERVATION – DESIGN DECISION

As you can see, all the feature engineering steps are built-in into the forecasting pipeline object.

You might ask: "But why? By doing so, don’t we keep the feature engineering logic in the training pipeline?"

Well, yes… and no…

We indeed defined the forecasting pipeline in the training script, but the key idea is that we will save the whole forecasting pipeline to the model registry.

Thus, when we load the model, we will also load all the preprocessing and postprocessing steps included in the forecasting pipeline.

This means all the feature engineering is encapsulated in the forecasting pipeline, and we can safely treat it as a black box.

This is one way to store the transformation + the raw data in the feature store, as discussed in Lesson 1.

We could have also stored the transformation functions independently in the feature store, but composing a single pipeline object is cleaner.


Hyperparameter Tuning

How to use W&B sweeps

You will use W&B to perform hyperparameter tuning. It provides all the methods you need, from a regular grid search to Bayesian search.

W&B uses sweeps to do hyperparameter tuning. A sweep is W&B’s term for a hyperparameter search: a collection of experiments (runs) generated from your hyperparameter search space.

We will use the MAPE (mean absolute percentage error) metric to compare experiments and find the best hyperparameter configuration. We chose MAPE over MAE or RMSE because the error is expressed relative to the magnitude of the observed values, which makes it easier to compare across time series with different scales.

Check out the video below to see how the sweeps board looks in W&B 👇

Now that we understand our goal, let’s look at the code under the training_pipeline/hyperparameter_tuning.py file.

As you can see in the function below, we load the dataset from the feature store for a specific feature_view_version and a training_dataset_version.

Using solely the training data, we start the hyperparameter optimization.

Note: It is essential that you don’t use your test data for your hyperparameter optimization search. Otherwise, you risk overfitting your test split, and your model will not generalize. Your test split should be used only for the final decision.

Finally, we save the metadata of the run, which contains the sweep_id of the search.

Now, let’s look at the run_hyperparameter_optimization() function, which takes the training data, creates a new sweep and starts a W&B agent.

Within a single sweep run, we build the model and train the model using cross-validation.

As you can see, the config is provided by W&B based on the given hyperparameter search space (we will explain this in a bit). Also, we log the config as an artifact to access it later.

In our example, we used a simple grid search to perform hyperparameter tuning.

As you can see below, we created a Python dictionary called sweep_config with the method, the metric to minimize, and the parameters to search for.
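
For illustration, here is a hedged sketch of what such a grid-search sweep_config and its launch could look like. The wandb.sweep() and wandb.agent() calls and the config structure follow the W&B sweeps API, but the metric name, the parameter grids, and the build_model() / cross_validate_mape() helpers are assumptions, not the repository’s exact code:

```python
import wandb

# `build_model()` and `cross_validate_mape()` are hypothetical helpers standing in for the
# pipeline construction and the time series cross-validation described in this lesson.

sweep_config = {
    "method": "grid",  # grid search over all parameter combinations
    "metric": {"name": "validation.MAPE", "goal": "minimize"},
    "parameters": {
        "n_estimators": {"values": [1000, 2000, 2500]},
        "learning_rate": {"values": [0.1, 0.15]},
        "max_depth": {"values": [-1, 5]},
    },
}


def run_hyperparameter_optimization(y_train, X_train, fh: int = 24) -> str:
    sweep_id = wandb.sweep(
        sweep=sweep_config, project=SETTINGS["WANDB_PROJECT"], entity=SETTINGS["WANDB_ENTITY"]
    )

    def _train_with_config():
        with wandb.init() as run:
            config = run.config  # hyperparameters chosen by W&B for this sweep run
            model = build_model(config)
            score = cross_validate_mape(model, y_train, X_train, fh=fh)
            run.log({"validation.MAPE": score})

    wandb.agent(sweep_id, function=_train_with_config)

    return sweep_id
```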

Check out W&B official docs to learn more about sweeps [5].

Note: With a few tweaks, you can quickly run multiple W&B agents in parallel within a single sweep. Thus, speeding up the hyperparameter tuning drastically. Check out their docs if you want to learn more [5].

How to do cross-validation with time series data

So, I highlighted that it is critical to do hyperparameter tuning using only the training dataset.

But then, on what split should you compute your metrics?

Well, you will be using cross-validation adapted to time series.

As shown in the image below, we used a 3-fold cross-validation technique. The key idea is that because you are using time series data, you can’t pick the whole dataset for every fold. It makes sense, as you can’t learn from the future to predict the past.

Thus, using the same principles as when we split the data between train and test, we sample 1/3 from the beginning of the dataset, where the forecasting horizon (the orange segment) is used to compute the validation metric. The next fold takes 2/3, and the last one 3/3 of the dataset.

Once again, Sktime makes our lives easier. Using the ExpandingWindowSplitter class and cv_evaluate() function, you can quickly train and evaluate the model using the specified cross-validation strategy – official docs here [8]
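
A minimal sketch of that cross-validation step, assuming cv_evaluate is an alias for sktime’s evaluate function; the initial_window and step_length values are illustrative:

```python
from sktime.forecasting.model_evaluation import evaluate as cv_evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError

fh = 24  # forecast horizon in hours

# Expanding folds: each fold trains on a longer history and validates on the next `fh` hours.
cv = ExpandingWindowSplitter(
    initial_window=24 * 30,  # illustrative: first fold trains on ~30 days of hourly data
    step_length=24 * 30,     # each subsequent fold expands the training window by ~30 days
    fh=list(range(1, fh + 1)),
)

results = cv_evaluate(
    forecaster=model,        # the forecasting pipeline built with the current sweep config
    cv=cv,
    y=y_train,
    X=X_train,
    strategy="refit",
    scoring=MeanAbsolutePercentageError(symmetric=False),
    return_data=False,
)

# `results` has one row per fold; the metric column follows sktime's "test_<metric name>" convention.
mean_validation_mape = results["test_MeanAbsolutePercentageError"].mean()
```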

In the end, we restructured the results DataFrame returned by the cv_evaluate() function to fit our interface.

Excellent, now you finished running your hyperparameter tuning step using W&B sweeps.

At the end of this step, we have a sweep_id that has attached multiple experiments, where each experiment has a config artifact.

Now we have to parse this information and create a best_config artifact.


Upload the Best Configuration from the Hyperparameter Tuning Search

Using the training_pipeline/best_config.py script, we will parse all the experiments for the given sweep_id and find the best experiment with the lowest MAPE validation score.

Fortunately, this is done automatically by W&B when we call the best_run() function. After, you resume the best_run and rename the run to best_experiment.

Also, you upload the config attached to the best configuration into its artifact called best_config.
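
A hedged sketch of how that lookup and upload could work with the W&B public API (the run and artifact names are assumptions):

```python
import json

import wandb

api = wandb.Api()
sweep = api.sweep(f"{SETTINGS['WANDB_ENTITY']}/{SETTINGS['WANDB_PROJECT']}/{sweep_id}")

# W&B orders the sweep's runs by its optimization metric, so best_run() returns the
# run with the lowest validation MAPE.
best_run = sweep.best_run()

# Resume the best run, rename it, and upload its config as a `best_config` artifact.
with wandb.init(
    project=SETTINGS["WANDB_PROJECT"], entity=SETTINGS["WANDB_ENTITY"], id=best_run.id, resume="must"
) as run:
    run.name = "best_experiment"

    artifact = wandb.Artifact(name="best_config", type="model_config")
    with artifact.new_file("best_config.json") as f:
        json.dump(dict(best_run.config), f)
    run.log_artifact(artifact)
```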

Later, we will use this artifact to train models from scratch as often as we want.

Now you have the best_config artifact that tells you precisely what hyperparameters you should use to train your final model.


Train the Final Model Using the Best Configuration

Finally, training and loading the final model to the model registry is the last piece of the puzzle.

Within the from_best_config() function from the training_pipeline/train.py file, we perform the following steps:

  1. Load the data from Hopsworks.
  2. Initialize a W&B run.
  3. Load the best_config artifact.
  4. Build the baseline model.
  5. Train and evaluate the baseline model on the test split.
  6. Build the fancy model using the latest best configuration.
  7. Train and evaluate the fancy model on the test split.
  8. Render the results to see how they perform visually.
  9. Retrain the model on the whole dataset. This is critical for time series models as you must retrain them until the present moment to forecast the future.
  10. Forecast future values.
  11. Render the forecasted values.
  12. Save the best model as an Artifact in W&B.
  13. Save the best model in the Hopsworks model registry.

Note: You can either use W&B Artifacts as a model registry or directly use the Hopsworks model registry feature. We will show you how to do it both ways.

Notice how we used wandb.log() to upload to W&B all the variables of interest.
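
For example, logging the test metrics and the trained forecasting pipeline as a W&B Artifact could look roughly like this (the metric names, variables, and file paths are assumptions):

```python
import joblib
import wandb

run = wandb.init(
    project=SETTINGS["WANDB_PROJECT"], entity=SETTINGS["WANDB_ENTITY"], job_type="train_best_model"
)

# Log the evaluation results of the baseline and of the LightGBM forecasting pipeline.
run.log({"test.baseline.MAPE": baseline_mape, "test.model.MAPE": model_mape})

# Serialize the whole forecasting pipeline and upload it as a versioned W&B Artifact.
joblib.dump(best_model, "best_model.pkl")
artifact = wandb.Artifact(name="best_model", type="model")
artifact.add_file("best_model.pkl")
run.log_artifact(artifact)

run.finish()
```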

Check out this video to visually see how we use W&B as an experiment tracker 👇

Train & evaluate the model

To train any Sktime model, we implemented this general function that takes in any model, the data, and the forecast horizon.

Using the method below, we evaluated the model on the test split using both aggregated metrics and slices over all the unique combinations of areas and consumer types.

By evaluating the model on slices, you can quickly investigate for fairness and bias.

As you can see, most of the heavy lifting, such as the implementation of MAPE and RMSPE, is directly accessible from Sktime.
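
A sketch of how the aggregated and sliced MAPE could be computed with sktime’s metric functions, assuming the multi-index layout shown earlier (the variable and index level names are assumptions):

```python
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error

# Aggregated MAPE over all time series.
overall_mape = mean_absolute_percentage_error(y_test, y_pred, symmetric=False)

# MAPE per (area, consumer_type) slice, useful for spotting bias on specific segments.
slice_mape = {}
for (area, consumer_type), y_true_slice in y_test.groupby(level=["area", "consumer_type"]):
    y_pred_slice = y_pred.loc[(area, consumer_type)]
    slice_mape[(area, consumer_type)] = mean_absolute_percentage_error(
        y_true_slice.droplevel(["area", "consumer_type"]), y_pred_slice, symmetric=False
    )
```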

Render the results

Using Sktime, you can quickly render various time series into a single plot.

As shown in the video above, we rendered the results for every (area, consumer_type) combination in the W&B experiment tracker.

Visually comparing the prediction and real observations for area = 2 and consumer type = 119 [Image by the Author].
Visually observing forecasted values into the future for area = 2 and consumer type = 119 [Image by the Author].

Upload the model to the model registry

The last step is to upload the model to a model registry. After the model is uploaded, it will be downloaded and used by our batch prediction pipeline.

During the experiment, we already uploaded the model as a W&B Artifact. If you plan to have dependencies with W&B in your applications, using it directly from there is perfectly fine.

But we wanted to keep the batch prediction pipeline dependent only on Hopsworks.

Thus, we used Hopsworks’ model registry feature.

In the following code, based on the given best_model_artifact, we added a tag to the Hopsworks feature view to link the two. This is helpful for debugging.

Finally, we downloaded the best model weights and loaded them to the Hopsworks model registry using the mr.python.create_model() method.
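
A hedged sketch of that final upload, based on the Hopsworks model registry API (the model name, metrics dictionary, and local path are assumptions):

```python
import hopsworks

project = hopsworks.login(
    api_key_value=SETTINGS["FS_API_KEY"], project=SETTINGS["FS_PROJECT_NAME"]
)
mr = project.get_model_registry()

# Register the serialized forecasting pipeline as a new model version in Hopsworks.
py_model = mr.python.create_model(
    name="best_model",
    metrics={"MAPE": model_mape},
    description="LightGBM + Sktime forecasting pipeline for the next 24 hours of energy consumption.",
)
py_model.save("best_model_dir")  # local directory (or file) containing the downloaded model weights
```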

Now, with a few lines of code, you can download and run inference on your model without caring anymore about all the complicated steps we showed you in this lesson.

Check out Lesson 3 to see how we will build a batch prediction pipeline using the model from the Hopsworks model registry.


Conclusion

Congratulations! You finished the second lesson from the Full Stack 7-Steps MLOps Framework course.

If you have reached this far, you know how to:

  • use an ML platform for experiment & metadata tracking
  • use an ML platform for hyperparameter tuning
  • read data from the feature store based on a given version
  • build an encapsulated ML model and pipeline
  • upload your model to a model registry

Now that you understand the power of using an ML platform, you can finally take control over your experiments and quickly export your model as an artifact to be easily used in your inference pipelines.

Check out Lesson 3 to learn about implementing a batch prediction pipeline and packaging your Python modules using Poetry.

Also, you can access the GitHub repository here.


💡 My goal is to help machine learning engineers level up in designing and productionizing ML systems. Follow me on LinkedIn or subscribe to my weekly newsletter for more insights!

🔥 If you enjoy reading articles like this and wish to support my writing, consider becoming a Medium member. By using my referral link, you can support me without any extra cost while enjoying limitless access to Medium’s rich collection of stories.

Join Medium with my referral link – Paul Iusztin


References

[1] Energy Consumption per DE35 Industry Code from Denmark API, Denmark Energy Data Service

[2] WindowSummarizer Documentation, Sktime Documentation

[3] Sktime Documentation

[4] LightGBM Documentation

[5] W&B Sweeps Documentation, W&B Documentation

[6] Sktime Forecasting Tutorial, Sktime Documentation

[7] Sktime Hierarchical, Global, and Panel Forecasting Tutorial, Sktime Documentation

[8] Sktime Window Splitters Tutorial, Sktime Documentation

