TensorFlow Serving client. Make it slimmer and faster!

Vitaly Bezgachev
Towards Data Science


TensorFlow Serving provides a neat way to deploy and serve models in production. I have described the deployment process previously here. Unfortunately, there are two problems I noticed much later, thanks to valuable comments. First, a single prediction takes too much time. Second, there is actually no need to use TensorFlow in the client at all.

The bottleneck is a call to TensorFlow that creates a tensor protobuf (the complete code can be found here):

tf.contrib.util.make_tensor_proto(data, shape=[1])

Everything else does not depend on TensorFlow and merely requires generating Python files from the protobufs.

The solution

I introduce a solution specifically for prediction on images, without changing the original protobufs. I found a great blog post where the author went further and copied and changed the original protobufs. That reduces the code to a minimum, and I would definitely go for it if I needed prediction only.

Generate Python code from protobufs

We must create a TensorProto object, as described in the TensorFlow protobuf files, with a certain type and shape. We can use our image data array as it is.

More concretely, this means that we have to generate Python files from the TensorFlow Core protobufs and use them directly instead of any wrappers. I have generated them and put them into my repository here. You can do it yourself if you prefer (assuming you cloned the sample project; the grpc.tools.protoc module ships with the grpcio-tools pip package):

# 1. Change to the TensorFlow Serving source folder
cd <tensorflow serving source folder>
# 2. Generate Python files from the framework protobufs (TensorProto, TensorShapeProto, data types)
python -m grpc.tools.protoc ./tensorflow/tensorflow/core/framework/*.proto --python_out=<path to the project> --grpc_python_out=<path to the project> --proto_path=.
# 3. Generate Python files from the example protobufs
python -m grpc.tools.protoc ./tensorflow/tensorflow/core/example/*.proto --python_out=<path to the project> --grpc_python_out=<path to the project> --proto_path=.
# 4. Generate Python files from the protobuf definitions
python -m grpc.tools.protoc ./tensorflow/tensorflow/core/protobuf/*.proto --python_out=<path to the project> --grpc_python_out=<path to the project> --proto_path=.

Unfortunately, you need all of this because of the dependencies between the protobufs; otherwise you have to adjust the protobufs to your needs and get rid of the unneeded dependencies.
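As a quick sanity check, here is a minimal sketch to verify that the generated modules import correctly. The package path is an assumption based on where protoc wrote the files; adjust it to your project layout.

# Minimal import check for the generated protobuf modules.
# The package path is an assumption; adjust it to wherever the
# generated *_pb2.py files ended up in your project.
from tensorflow.core.framework import tensor_pb2
from tensorflow.core.framework import tensor_shape_pb2
from tensorflow.core.framework import types_pb2

print(types_pb2.DT_STRING)  # prints 7, the enum value for string tensors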

Replace the TensorFlow code

Now we can replace a “standard” tensor protobuf creation

tf.contrib.util.make_tensor_proto(data, shape=[1])

with the following:

# Generated modules; adjust the import path to where the files were generated
from tensorflow.core.framework import tensor_pb2, tensor_shape_pb2, types_pb2

dims = [tensor_shape_pb2.TensorShapeProto.Dim(size=1)]
tensor_shape_proto = tensor_shape_pb2.TensorShapeProto(dim=dims)
tensor_proto = tensor_pb2.TensorProto(
    dtype=types_pb2.DT_STRING,
    tensor_shape=tensor_shape_proto,
    string_val=[data])

What is this all about? First, we create a dimension object that matches our data: we have only one image to predict. Next, we initialize an appropriate tensor shape using the created dimension object. And last, we create the desired tensor protobuf of type string with the tensor shape and our data. We have to put our data into string_val since we have a tensor of strings. You can find the other available types in the generated tensorflow/core/framework/types_pb2.py file.
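To illustrate the other types: if a model expected raw float inputs instead of an encoded image string, a hypothetical 1x3 float tensor could be built the same way (a sketch; the shape and values are purely illustrative):

# Hypothetical example: a 1x3 float tensor built from the same generated protobufs.
float_dims = [tensor_shape_pb2.TensorShapeProto.Dim(size=1),
              tensor_shape_pb2.TensorShapeProto.Dim(size=3)]
float_proto = tensor_pb2.TensorProto(
    dtype=types_pb2.DT_FLOAT,
    tensor_shape=tensor_shape_pb2.TensorShapeProto(dim=float_dims),
    float_val=[0.1, 0.2, 0.3])  # values go in flat, in row-major order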

Now we can set our tensor protobuf as a request input:

request.inputs['images'].CopyFrom(tensor_proto)

You can find the complete code here.
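For context, here is a condensed sketch of how the pieces fit into a full gRPC prediction call. The host, port, model name, signature name, and image file are placeholders, and the tensorflow_serving.apis modules are assumed to be available (installed via the tensorflow-serving-api package or generated from the protobufs like above).

import grpc
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow.core.framework import tensor_pb2, tensor_shape_pb2, types_pb2

# Placeholder endpoint; adjust to your deployment.
channel = grpc.insecure_channel('localhost:9000')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'                   # placeholder model name
request.model_spec.signature_name = 'predict_images'   # placeholder signature

with open('image.jpg', 'rb') as f:  # encoded image bytes become a string tensor
    data = f.read()

dims = [tensor_shape_pb2.TensorShapeProto.Dim(size=1)]
tensor_proto = tensor_pb2.TensorProto(
    dtype=types_pb2.DT_STRING,
    tensor_shape=tensor_shape_pb2.TensorShapeProto(dim=dims),
    string_val=[data])

request.inputs['images'].CopyFrom(tensor_proto)
result = stub.Predict(request, 10.0)  # 10-second timeout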

Performance

You can significantly improve performance even if you'd prefer to keep using the TensorFlow framework in the client. Make the following change: import make_tensor_proto at the beginning and call it later. This way the costly loading of the tf.contrib module happens once, at import time, rather than during the prediction call.

...
from tensorflow.contrib.util import make_tensor_proto
...
request.inputs['images'].CopyFrom(make_tensor_proto(data, shape=[1]))

If you get rid of TensorFlow in the client entirely, performance improves automatically. On my system, a single image prediction now takes 3 milliseconds instead of 300.
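To reproduce the comparison on your own system, a simple timing wrapper around the prediction call is enough (a sketch; stub and request are the objects built above, and absolute numbers will vary per machine):

import time

start = time.time()
result = stub.Predict(request, 10.0)  # the gRPC prediction call being measured
elapsed_ms = (time.time() - start) * 1000.0
print('Prediction took %.1f ms' % elapsed_ms)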

Summary

A TensorFlow Serving client does not need the TensorFlow framework to make requests. The only thing you need is a proper initialization of the gRPC requests to the server. You can do that by using the original TensorFlow Core protobufs, generating Python files from them, and creating an appropriate tensor protobuf object. If you want as little of TensorFlow in your client as possible, you need to adjust the original protobufs to your particular needs.
