How to Deploy Large-Size Deep Learning Models into Production

Deploy gigabyte-sized trained deep learning models and host them online for production use

Abdultawwab Safarji
Towards Data Science


Deploy large-size deep learning models — Photo by Alex Knight from Pexels

Learning how to move the hard-won deep learning models on your local machine from offline experiments into online production is important, but one of the main challenges is the large size of the trained model. This post presents different solutions and approaches to follow when deploying deep learning models of larger sizes.

Deep Learning Models After Training

After choosing the best algorithm and tuning it for accuracy and prediction time, you have finally finished training your model.
Time to tell your friends about your success and celebrate, right? Not quite, if you want to deploy the model online with fast prediction responses while managing a large trained model!

Some web services are cheap and even provide free space for deploying ML models, such as the Heroku platform and Streamlit Cloud, but they come with size limitations; otherwise, you have to pay for expensive services to deploy deep learning models. Either way, size becomes a major challenge when gigabyte-scale trained models need high computing power on GPUs or TPUs.

CPU performance is also an issue if the model was trained on a GPU or TPU cloud service. Such a model may process requests slowly once online, or when serving other applications through API calls, and its huge size makes deployment difficult, especially when the model was trained on TPUs but must be deployed on CPUs. So choosing how and where to deploy, with hardware that matches the performance of your local machine or Colab training environment, is essential.

Deploying Models in Deep Learning

There are many different ways to deploy a deep learning model as a web app using Python frameworks like Streamlit, Flask, and Django. You can then build a REST API for model serving using Flask-RESTful, so other applications can interact with your model online and it can respond in time when called. (Here is a good article on building an API.)
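For illustration, here is a minimal sketch of a Flask-RESTful prediction endpoint; the model path, endpoint name, and JSON payload format are assumptions for the example, not a fixed API:

# Minimal Flask-RESTful serving sketch (paths and payload format are examples)
from flask import Flask, request
from flask_restful import Api, Resource
import tensorflow as tf

app = Flask(__name__)
api = Api(app)

# Load the trained model once at startup (example path)
model = tf.keras.models.load_model('saved_model/')

class Predict(Resource):
    def post(self):
        # Expect a JSON body such as {"instances": [[...], ...]}
        payload = request.get_json(force=True)
        predictions = model.predict(payload['instances'])
        return {'predictions': predictions.tolist()}

api.add_resource(Predict, '/predict')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)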

After building the web app interface, it is time to deploy the large TensorFlow, Keras, or PyTorch model into a real environment. The example below uses Streamlit to load a TensorFlow model, but make sure you know how to save and load your own model by reading the documentation of your particular library.
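As a quick, hedged example of saving and loading with tf.keras (the model and path here are placeholders; other libraries differ):

import tensorflow as tf

# Stand-in for your trained model; any tf.keras model works here
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Save in the TensorFlow SavedModel format
model.save('my_model/')

# Later, e.g. inside the web app, load it back
loaded = tf.keras.models.load_model('my_model/')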

Where and how to store a large trained model?

There are many options, but let's focus on the most widely used ones that are easy to start with for free, with limits based on how much you consume and the region where you store your data (stay within the free tier at first; you can later expand at relatively low cost on top of the free subscription). Amazon S3 and Google Cloud Storage are both categorized as "cloud storage" tools. See the Amazon S3 and Google Cloud Storage documentation on creating a storage bucket (a container that holds your data) to store your GB-sized model.

# Colab notebook: run the code in each cell
from google.colab import auth
auth.authenticate_user()  # Authenticate your Google Cloud account

# Your project ID in GCP (from the Google Cloud Console)
CLOUD_PROJECT = 'axial-trail-334'

# Storage bucket name (must be globally unique)
BUCKET = 'gs://' + CLOUD_PROJECT + '-tf2-safarji-model'
print(BUCKET)

# Point gcloud at your project ($ expands the Python variable)
!gcloud config set project $CLOUD_PROJECT

# Create the bucket
!gsutil mb $BUCKET

# Check the bucket contents
!gsutil ls -r $BUCKET

# When you are done, revoke the credentials
# !gcloud auth revoke --all

With this, you are ready to deploy the model to AI Platform. On AI Platform, a model resource contains different versions of your model, and model names must be unique within a project. First, let's create a model.

# Create the model in Google Cloud (GCP) after following the steps above
MODEL = 'newmodel-T5'
!gcloud ai-platform models create $MODEL --regions=us-central1

  • In the AI Platform Cloud Console, you will see your model there, as below:
GCP Console — From the author
  • Create a version of your model deployment based on your model type, TensorFlow or others (see the command-line sketch below).
GCP Console — From the author
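If you prefer the command line over the console, here is a hedged sketch of creating a version; the version name, model directory, and runtime version are example values, so check the gcloud documentation for your setup:

# Create a version pointing at the SavedModel in your bucket
VERSION = 'v1'
!gcloud ai-platform versions create $VERSION --model=$MODEL --origin=$BUCKET/model_dir --runtime-version=2.8 --framework=tensorflow --python-version=3.7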

Important: if the saved model in the bucket is very large (multiple GB), you won't be able to use the AI Platform service as shown below, because the size exceeds the limit.

GCP Console — From the author

Unfortunately, this limit is currently not adjustable, though it may be in the future. In the meantime, you need to reduce the vocabulary and embedding sizes or shrink the model overall. You can also contact Google Cloud at cloudml-feedback@google.com with your project name to request a per-project quota adjustment.

* This size limit is an issue with most cloud providers' AI platform services when it comes to accepting large models.
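As one hedged option for reducing the model size overall, TensorFlow Lite post-training quantization can shrink a SavedModel considerably (paths here are examples; pruning or smaller vocab/embedding sizes are alternatives):

import tensorflow as tf

# Convert the SavedModel with default post-training quantization
converter = tf.lite.TFLiteConverter.from_saved_model('my_model/')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the smaller model to disk
with open('my_model_quantized.tflite', 'wb') as f:
    f.write(tflite_model)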

Is there another solution?

Yes, you can choose a custom deep learning cloud instance (VM), sized to match your model's requirements in terms of GPUs or TPUs. The bucket stays the same: store the model there and load it onto the new instance of your server, whether on GCP or AWS.
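For example, here is a hedged sketch of creating a GPU-backed Deep Learning VM on GCP from a Colab cell; the instance name, zone, machine type, GPU type, and image family are all example values to adapt to your needs:

# Create a VM with one NVIDIA T4 GPU using a Deep Learning VM image
!gcloud compute instances create my-dl-instance --zone=us-central1-a --machine-type=n1-standard-8 --accelerator=type=nvidia-tesla-t4,count=1 --image-family=common-cu110 --image-project=deeplearning-platform-release --maintenance-policy=TERMINATE --boot-disk-size=100GB --metadata=install-nvidia-driver=True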

On top of that, AWS has another service for deploying even transformers, which are large in size (Amazon SageMaker, for example).

Storage Bucket in Cloud Services

After creating the storage bucket in your cloud service, it is time to store and load the model. Here is a snippet for loading:
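A minimal sketch, assuming a Streamlit app that loads a TensorFlow SavedModel straight from the GCS bucket created earlier (bucket and path names are examples; TensorFlow can read gs:// paths directly):

import streamlit as st
import tensorflow as tf

BUCKET = 'gs://axial-trail-334-tf2-safarji-model'

@st.cache_resource  # cache so the model is loaded once, not on every rerun
def load_model():
    return tf.keras.models.load_model(BUCKET + '/my_model')

model = load_model()
st.write('Model loaded and ready for predictions.')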

Note: make sure to use caching and other tactics to optimize model loading in the web app, based on the framework you use; this avoids spending time and bandwidth reloading the model for every use or prediction.

Conclusion:

Training a large deep learning model is just one aspect of a data science project; a lot of effort goes into making it available in production (online). In this post, we have shown different solutions and approaches to follow when deploying deep learning models of larger sizes.

Depending on your application, you may need to choose among the available options for which cloud services and which custom instances support your model deployment and infrastructure. This post reflects just one example of my experience deploying a large GB-sized deep learning model, but I think it provides a starting point for a broader overview of cloud services and cloud storage tools. Deep learning models in production involve many other follow-up tasks, including training evaluation on GPUs and TPUs; those are intended for the next post.
