Can we really run AI in the browser?

Let’s explore with tensorflow.js

Thomas Detry
Towards Data Science


The aim of this post is to see how to run deep learning models in the browser and what the pros and cons of that approach would be. The code to reproduce the experiment can be found on GitHub.

If we define Artificial Intelligence as a big trained model, this model being a function f() that takes inputs and generates outputs in the form of results,

where f() can be represented as one of the following:

Graphical representation of a neural network.
Keras abstraction of a neural network.

… how easy would it be to have this kind of model running in our browser? And what would be the pros and cons of this approach? Let’s explore by trying.

Experiment

In order to do so, I used tensorflow.js. I wanted to use a pre-trained model, which means that someone has already tweaked the function and fed it with data to get the parameters. The parameters can be viewed as a big list of numbers that determine how the inputs get converted to an output. This person kindly shares the parameters and the function. In the case of TensorFlow, this can happen on TensorFlow Hub. Afterwards, I could have decided to improve the model, which would be transfer learning, or to use it as-is for the task at hand.
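To give a concrete feel for what consuming such a shared model looks like on the JavaScript side, here is a minimal sketch using the @tensorflow-models/mobilenet package purely as an illustration (it is not the model used in the rest of this post):

// JavaScript - using a pre-trained model shared by someone else (illustration only)
import * as mobilenet from '@tensorflow-models/mobilenet';

async function classifyImage(imgElement) {
  const model = await mobilenet.load(); // downloads the shared parameters
  const predictions = await model.classify(imgElement); // applies the function f() to the input
  console.log(predictions); // e.g. [{ className, probability }, ...]
}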

Let’s take a three-step approach to get this model running in our browser.

  1. Understand the model in Python and export it to JavaScript

I wanted to give BERT Question Answering a try, but I encountered a couple of issues. The first one is the size of the model. I do not think my users will wait for 1 GB of parameters to download. In that regard, it looks like work is being done on reducing the size of models with TensorFlow Lite, which focuses on size/performance optimisation by using quantization techniques.

Secondly, there is the tokeniser, the piece of code that turns letters, words and sentences into inputs the function can understand. It relies on HuggingFace, a library written in Python, and I was not really in favor of rewriting it in JavaScript.

Screenshot from TensorFlow Hub.

So instead of relying on TensorFlow Hub, I decided to rely on this tutorial. I followed all the steps to quickly train a small model from scratch. The model’s task is sentiment analysis: turning short sentences into a sentiment score ranging from 0 (negative sentiment) to 1 (positive sentiment). Afterwards, I converted it to a tensorflow.js model.

# Python - save model
import tensorflowjs as tfjs
tfjs.converters.save_keras_model(model, tfjs_target_dir)

Then, following the documentation, “simply” do this in your JavaScript code to load the model.

// JavaScript - import model

import * as tf from '@tensorflow/tfjs';

const model = await tf.loadLayersModel('https://foo.bar/tfjs_artifacts/model.json');

However, I had a lot of issues along the way (versioning, layer naming, layers not supported in tensorflow.js, …). If you dig into the code, you’ll see that I decided to rely on the same model but converted to JavaScript by Google.

So tensorflow.js is there, but it does not have all the capabilities we find in TensorFlow.

2. Try the model in JavaScript

Once you get your model in the browser, there is still some work remaining. You need to make sure the inputs you receive fit what the model expects, and you need to handle its results. There I got help from the ml5 library. I did not use the library itself, but it greatly helped me understand that, in order to get the results, you need to go through the following steps.

// JavaScript - get prediction
const predictOut = model.predict(input);
const score = predictOut.dataSync()[0];
predictOut.dispose();
input.dispose();

This is linked to the TensorFlow story of turning numbers into tensors and back.
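As a minimal sketch of that round trip (the 100-token padding length and the token ids are illustrative assumptions about the model’s input shape, not values from the actual model):

// JavaScript - numbers to tensors and back (illustrative sketch)
import * as tf from '@tensorflow/tfjs';

const tokenIds = [12, 845, 3, 67]; // word ids produced by the tokeniser (made-up values)
const padded = [...tokenIds, ...new Array(100 - tokenIds.length).fill(0)]; // pad to a fixed length
const input = tf.tensor2d([padded], [1, 100]); // numbers -> tensor (batch of one sentence)
const backToNumbers = input.dataSync(); // tensor -> plain numbers
input.dispose(); // free the memory backing the tensor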

3. Build a small UI and expose it

Once the JavaScript part of the model was cracked, I wrapped it into a small React application and used surge.sh to have it quickly served. And here is the result: a TensorFlow model running directly in your browser, with zero backend calls for the predictions.
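For illustration, here is a stripped-down sketch of how the model could be wired into a React component (component and variable names, as well as the placeholder input, are assumptions, not the exact code of the repository):

// JavaScript - minimal React wiring (sketch, not the repository's exact code)
import React, { useEffect, useState } from 'react';
import * as tf from '@tensorflow/tfjs';

export default function SentimentDemo() {
  const [score, setScore] = useState(null);

  useEffect(() => {
    // load the converted model once, then score a single placeholder input
    tf.loadLayersModel('tfjs_artifacts/model.json').then((model) => {
      const input = tf.zeros([1, 100]); // placeholder of the model's expected input shape
      const out = model.predict(input);
      setScore(out.dataSync()[0]);
      out.dispose();
      input.dispose();
    });
  }, []);

  return <p>Sentiment score: {score === null ? 'loading…' : score.toFixed(2)}</p>;
}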

In order to reproduce:

# bash - clone the repository and cd to the directory
git clone https://github.com/tdetry/tensorflowJS-in-React
cd tensorflowJS-in-React/
# install the dependencies and build the application
npm i
npm run build
# expose the build to the internet thanks to surge
cd build
surge

Retrospective

Theoretically, as long as the model does not contain too many parameters, it should work in the browser. However, we see models of increasing size, for example the 9.4 billion parameters of the new Facebook chatbot. The time it would take to download all those parameters would be an issue (at four bytes per float32 parameter, that is roughly 37 GB). But with such a big model, getting a prediction in time would also quickly become an issue, given that the input needs to be “multiplied” by all those parameters in order to get an output. In that regard, technology such as WebAssembly could be used to speed up computation compared to JavaScript, as is already the case for Google Earth.
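tensorflow.js already ships a WebAssembly backend as a separate package; here is a minimal sketch of switching to it (assuming @tensorflow/tfjs-backend-wasm is installed alongside @tensorflow/tfjs):

// JavaScript - enabling the tensorflow.js WebAssembly backend (sketch)
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm'; // registers the 'wasm' backend

async function useWasmBackend() {
  await tf.setBackend('wasm'); // switch from the default WebGL/CPU backend
  await tf.ready(); // wait until the backend is initialised
  console.log(tf.getBackend()); // -> 'wasm'
}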

The question is, will it overcome the network latency issue (compared with having the computation performed in the backend and then sent to the frontend)?

Finally, there is the privacy and intellectual property part. From a privacy point of view, having the model run in the browser is awesome for users, because they do not have to send their personal data; it stays in their browser. I see two nice applications. For example, checking whether you are eligible for something (a loan?) without sending your full identity. Or using a model that lets you browse with the movement of your eyes or hands, without sending your face and background anywhere.

However, I see an intellectual property risk: since the full model would be sent out to the user, the user could violate the model’s license and use it for other purposes.

Conclusion

Let’s recap.

Pros:

  • Data privacy. You don’t need to send your data to get a prediction.
  • Network latency at runtime. You do not need to rely on a network to receive the prediction. Except for, well, downloading the model first.

Cons:

  • Network latency to get the model. The bigger the model, the more time to download it.
  • Model IP. Your full model is exposed.
  • Prediction performance. The bigger the model, the more computations to be run in the browser.
  • Server-side alternatives. Nice frameworks to serve models over endpoints, such as TensorFlow Serving, already exist.
  • Limited ecosystem. JavaScript support for ML is fairly limited.

I hope you enjoyed the read. If you have questions or remarks, I’ll happily answer them in the comments.

Thomas
