TorchServe & Flask for Image Style Transfer

An example of a web app backed by a TorchServe model server

Andrey Golovin
Towards Data Science


Image by author. Exposing an ML model as a decoupled model server is a far more scalable, extensible, and maintainable pattern.

In the previous post I showed an example of serving an image classification model with the TorchServe framework. Now let's extend the example and make it a bit closer to real-world scenarios.

Let's say I want to develop a web app that lets users apply filters to their images. As you know, there are a lot of such applications. One of the features could be neural style transfer: users upload an image with content and an image with style (or select a filter in the app) and get a new image of the content in the desired style. Let's build this example end to end.

The focus of the post will be, of course, not how to send a request from a Flask app to another URL 😁 I'll try to make it a bit more useful. First of all, I'll show an example of a complex handler with additional dependencies. Then, the model server will return an image instead of simply labels or probabilities. Finally, the code may be helpful as a working example of how to pass images in the appropriate format between the browser, the Flask app, and the model server. And just for fun, let's deploy the whole solution on Kubernetes.

The GitHub repo with the code is here.

(By the way, if you want to see a very simple example of a web app + TorchServe for image classification, check out the feature/classify branch in the repo.)

In this post I assume you're already familiar with the basics of TorchServe (handlers, model files, etc.). If not, refer to the previous post.

Neural style transfer

In case you don't remember how style transfer works, here is a short description. It's important to get the high-level overview in order to understand what will be going on in the TorchServe handler.

"Inference" in style transfer is not just one pass of the input tensor through the net. Instead, a tensor (which will become the output picture at the end) is passed through many times, and the tensor itself is modified to minimize the content and style loss functions. At each iteration the image changes, and the "inference" is the whole sequence of iterations.
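To make this concrete, here is a condensed sketch of such an optimization loop, loosely following the official PyTorch tutorial. The names (run_style_transfer, style_losses, content_losses) are illustrative, not necessarily the exact code from the repo:

import torch
import torch.optim as optim

def run_style_transfer(model, style_losses, content_losses, input_img,
                       num_steps=300, style_weight=1e6, content_weight=1):
    # Optimize the input image itself, not the model weights
    input_img.requires_grad_(True)
    optimizer = optim.LBFGS([input_img])
    step = [0]
    while step[0] < num_steps:
        def closure():
            with torch.no_grad():
                input_img.clamp_(0, 1)  # keep pixel values valid
            optimizer.zero_grad()
            model(input_img)  # the forward pass updates the loss modules
            style_score = style_weight * sum(sl.loss for sl in style_losses)
            content_score = content_weight * sum(cl.loss for cl in content_losses)
            loss = style_score + content_score
            loss.backward()
            step[0] += 1
            return loss
        optimizer.step(closure)
    with torch.no_grad():
        input_img.clamp_(0, 1)
    return input_img.detach()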

For the solution this means the following:

  • the inference function in the handler will be pretty complex. It'd be messy to put everything in handler.py, so I'll put it in an additional module and show how to include it in the TorchServe artifacts
  • a side effect: "inference" will take some time. To avoid making the user wait a minute while Flask reloads the entire page, the browser sends an AJAX request to the app, so the page in the browser won't freeze

Model file

The pretrained VGG19 model is used in the solution. Generally speaking, I just followed the official PyTorch style transfer example: https://pytorch.org/tutorials/advanced/neural_style_tutorial.html

I had to slightly modify the model's state dict to get rid of the classifier layers (80 MB vs. 500 MB for the full VGG19 model). You can check how the .pth file was produced in the notebook model_saving.ipynb in the repo; it generates the vgg19.pth artifact. model_nst.py contains the definition of the model architecture.
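For a rough idea of what the notebook does (a minimal sketch, assuming torchvision's vgg19; the actual steps live in model_saving.ipynb):

import torch
from torchvision.models import vgg19

# Older torchvision API; newer versions use the weights= argument instead
full_model = vgg19(pretrained=True)
# Style transfer only needs the convolutional feature extractor, so the
# fully connected classifier head is dropped, shrinking the weights
# from roughly 500 MB to 80 MB
torch.save(full_model.features.state_dict(), "vgg19.pth")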

Handler

Preprocess function

It is almost the same as you saw in the first post. The difference is that there are two images as input, while the function must return only one tensor. So, the two are simply stacked into one tensor to be split back later.
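A hedged sketch of that idea; the request field names ("content", "style") and the image size are assumptions for illustration, not necessarily the repo's exact code:

import io
import torch
from PIL import Image
from torchvision import transforms

loader = transforms.Compose([
    transforms.Resize((512, 512)),  # both images must match to be stacked
    transforms.ToTensor(),
])

def preprocess(self, data):
    row = data[0]  # TorchServe passes a list of request payloads
    content = Image.open(io.BytesIO(row["content"])).convert("RGB")
    style = Image.open(io.BytesIO(row["style"])).convert("RGB")
    # Stack the two 3xHxW tensors into a single 2x3xHxW tensor
    return torch.stack([loader(content), loader(style)])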

Postprocess function

The function is pretty straightforward. The only subtlety is how to pass the image to the Flask app so that it can be correctly read from JSON. Passing it as a bytes buffer worked for me.
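Roughly, it could look like this (a sketch, not the repo's exact code):

import io
from torchvision import transforms

def postprocess(self, output_tensor):
    image = transforms.ToPILImage()(output_tensor.squeeze(0).cpu())
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    # TorchServe expects a list with one entry per request in the batch
    return [buffer.getvalue()]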

Inference function

The code for "inference" is noticeably long, so I didn't place it directly in the handler module. Instead, it lives in utils.py. As I mentioned, the code comes from the official PyTorch style transfer example, so I won't go into details.
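In the handler itself, inference then boils down to splitting the stacked tensor back into the content and style images and delegating to the loop in utils.py. A hedged sketch with simplified signatures (get_style_model_and_losses is the tutorial's helper that wraps the net with loss modules):

import utils  # the extra module packaged into the .mar archive

def inference(self, stacked):
    content_img = stacked[0].unsqueeze(0)
    style_img = stacked[1].unsqueeze(0)
    # Wrap the net with content/style loss modules as in the tutorial,
    # then optimize a copy of the content image
    model, style_losses, content_losses = utils.get_style_model_and_losses(
        self.model, style_img, content_img)
    return utils.run_style_transfer(model, style_losses, content_losses,
                                    input_img=content_img.clone())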

Let's see how to include additional modules in the TorchServe artifacts. For the model archiver you need to specify the extra files you want to include:

torch-model-archiver --model-name nst --version 1.0 \
--model-file model_nst.py --serialized-file vgg19.pth \
--handler handler_nst.py --extra-files utils.py

Now we’re good to go from the TorchServe side. Let me briefly walk you through the web app.

Flask app

The app has only two endpoints: one to check the status of the model server and one to generate the image in the desired style. When the generation endpoint is called, the app simply forwards the content and style images to the TorchServe model server. It then decodes the received generated image and returns it as a JSON object.
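A hedged sketch of the generation endpoint; the route, field names, and the TorchServe address are assumptions for illustration:

import base64
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
TORCHSERVE_URL = "http://localhost:8080/predictions/nst"  # assumed address

@app.route("/style_transfer", methods=["POST"])
def style_transfer():
    files = {
        "content": request.files["content"].read(),
        "style": request.files["style"].read(),
    }
    response = requests.post(TORCHSERVE_URL, files=files)
    # The model server returns raw image bytes; base64-encode them so
    # they survive the trip through JSON to the browser
    encoded = base64.b64encode(response.content).decode("utf-8")
    return jsonify({"image": encoded})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)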

As I mentioned, to avoid a long wait while the whole page reloads, the generation request is sent via AJAX; there is also a simple jQuery script for that in the repo.

Run the application

If you don't want to run the whole solution in Kubernetes, you can stop here and just start the model server from its directory:

torchserve --start --model-store . --models nst=nst.mar \
--ts-config torch_config

And the application server:

python app.py

Now everything should be functional. Go to localhost:5000 in the browser, upload the content and style images, click "transfer style", and after a while you will get the generated image in the desired style in your browser.

Run with Kubernetes in a single-node cluster

To run on Kubernetes on your local machine, create two Docker images named flask_server and model_server from the Dockerfiles in the repo. There is also a YAML file for Kubernetes, so just apply it:

kubectl apply -f app_pod.yml 

Don't forget about port forwarding (e.g. bind port 8700 of your machine to the pod's port 5000):

kubectl port-forward pod/application-pod 8700:5000

And here we are. Go to your browser and open localhost:8700.

Image by author

Forgive me the design and UI; I'm just a DS/ML engineer. I know it's ugly. 😅

Live demo

For the screen recording below I'll use my own pictures that I have at my fingertips to avoid any copyright violation. Let me check if I can do a crazy thing: draw my cat with the symbols of the International Specification for Orienteering Maps (ISOM). These are the symbols used to draw maps for orienteering competitions.

Images by author
Image by author

Well, looks interesting 🤪

Image by author

Conclusion

This post, together with the previous one, shows how to serve your ML models with a dedicated serving framework and how to use the pattern of a model server detached from the application server.

It's shown that TorchServe allows flexible customisation of the preprocessing, postprocessing, and inference functions. Thus, you can incorporate any complex logic of your models.

Also, the GitHub repo with the end-to-end example can be used as a starting point for your own experiments.

To conclude, let me mention just a few benefits that the model server approach offers:

  • more efficient use of hardware (e.g. model server can be deployed on a machine with GPUs while the application server may not need it)
  • dedicated serving frameworks offer features to serve models at scale (e.g. threads and workers in TorchServe)
  • serving frameworks also provide the features to speed up the development (and to not reinvent the wheel): model versioning, logs, metrics etc.
  • message queuing service can be easily added to scale the solution
  • dev and ML/DS teams can work more independently

It's not a complete list, just a few reasons to think about serving ML models with dedicated frameworks.

I hope you found some helpful and practical stuff in this post.
