
Efficient reinforcement learning on the edge

With orthogonal persistence we can implement sequential learning on Edge devices

Maurits Kaptein
5 min read · Nov 2, 2020


In previous posts we have shown how one can exploit the differences between model training (which requires all the data and often many passes through the dataset) and prediction (a single function evaluation with pre-trained parameters) to efficiently deploy Machine Learning (ML) and Artificial Intelligence (AI) models. By transpiling trained models to WebAssembly it is possible to deploy supervised and unsupervised models efficiently to the cloud and to edge devices. We have been able to deploy models on devices with less than 64 KB of available memory. Pretty cool. However, we often get asked how models can be updated efficiently (and preferably locally): an easy way to update the state of the model on the edge device itself would allow for efficient sequential learning and reinforcement learning applications.


In this post we explain how we can use orthogonal persistence to update the state of a WebAssembly binary to enable such functionality. We will first explain the conceptual (ML/AI) view to model updating, and subsequently focus on the implementation in WebAssembly. Finally, we will discuss the possibilities that orthogonal persistence offers.

Updating a model’s state

As we described earlier, after training, most machine learning models consist of a simple (albeit often high-dimensional) mathematical function combined with a set of parameters. For example, in a neural network, the entries of the weight matrices involved constitute the parameters that are obtained by training the model. Somewhat abstractly, after training, we can define a model as a function f() that maps an input x to an (expected) output y. We can make explicit that the function’s behavior is dependent on the state S1 of the parameters:

y = f(x; S1)

where S1 would contain, e.g., all the necessary weights.
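
To make this concrete, here is a minimal sketch of y = f(x; S) for a simple linear model. The names and shapes are purely illustrative (this is not the actual deployment format), but they show how the state S fully determines the model’s behavior:

```typescript
// A linear model as y = f(x; S): the function is fixed, the state S
// (weights and a bias) determines its behavior. Illustrative only.
interface ModelState {
  weights: number[]; // learned coefficients
  bias: number;      // learned intercept
}

// f(x; S): maps a feature vector x to a prediction y, given state S.
function predict(x: number[], S: ModelState): number {
  return x.reduce((acc, xi, i) => acc + xi * S.weights[i], S.bias);
}

// Moving from S1 to a new state S2 changes the model's behavior
// without changing f itself.
const S1: ModelState = { weights: [0.5, -0.2], bias: 0.1 };
const y = predict([1.0, 2.0], S1); // 0.1 + 0.5*1.0 - 0.2*2.0 = 0.2
```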

In many situations it is useful to be able to update the state S1 over time. This might be the case when, e.g., there is concept drift and the performance of the model drops over time, or when a model deployed to an edge device needs to be updated to reflect the local situation. In such cases (and more, see our discussion below), it would be useful to update S1 to S2, i.e., to a new state of the parameters involved, without much trouble. Furthermore, when a model is deployed, its updated state should be maintained: when running on an edge device the updated model should persist.

Although conceptually easy — we just store the new state S2 somewhere — the efficient implementation of persistent state changes on edge devices when deploying models using WebAssembly is somewhat tricky. It is however very well possible.


Updating the state of a WebAssembly binary

We generally deploy models to edge devices using WebAssembly. When thinking about persistent updates of such models it is useful to have a reasonable understanding of the general process involved:

  1. We deploy WebAssembly binaries, which effectively contain f(.;S), to edge devices. These binaries are stored “on disk” on the device, including their state S.
  2. The edge device contains a WebAssembly runtime: an application that loads the binary and writes its so-called data section (see this article for more information regarding the anatomy of a WebAssembly binary) to its (linear) memory. This data section is generally used to store the state S of the model. The runtime executes predictions on the edge device, in our case whenever the exported predict() function of the .WASM binary is called (see this tutorial); a minimal sketch of this host-side flow follows below.
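
For intuition, the sketch below shows this host-side flow using the standard WebAssembly JavaScript API in Node.js. The file name and export names are assumptions for illustration, not our actual runtime:

```typescript
// A minimal sketch of step 2: a host "runtime" loads a deployed binary
// from disk and instantiates it; instantiation copies the data section
// (which holds the state S) into linear memory.
import { readFileSync } from "fs";

async function loadModel(path: string): Promise<WebAssembly.Instance> {
  const bytes = readFileSync(path); // the binary, including its state S
  const { instance } = await WebAssembly.instantiate(bytes);
  return instance; // exports include predict() (and, below, update())
}

// Usage: loadModel("model.wasm").then((instance) => { /* call predict() */ });
```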

Generally, it is relatively easy to transfer data to and from the WebAssembly binary: we do so routinely when passing a pointer to the location in memory where, e.g., the feature vector is stored (see this tutorial for details; the feature vector is the x in the y = f(x; S) notation introduced above). However, passing a new state S that persists is much trickier: one is not allowed to write directly to the existing data section, nor would doing so ensure its persistence on disk.
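
As an illustration of this pointer-passing pattern, the sketch below writes a feature vector into the module’s linear memory before calling predict(). The malloc_f64 helper and the predict() signature are hypothetical:

```typescript
// Pass the feature vector x to the binary by copying it into linear
// memory and handing predict() a pointer to it: y = f(x; S).
function callPredict(instance: WebAssembly.Instance, x: number[]): number {
  const memory = instance.exports.memory as WebAssembly.Memory;
  const malloc = instance.exports.malloc_f64 as (n: number) => number;
  const predict = instance.exports.predict as (ptr: number, n: number) => number;

  // Reserve space inside linear memory and copy the features into it.
  const ptr = malloc(x.length);
  new Float64Array(memory.buffer, ptr, x.length).set(x);

  // predict() reads x at ptr and combines it with the state S that was
  // loaded from the data section.
  return predict(ptr, x.length);
}
```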

We solve this by automatically adding a new custom data section to all our WebAssembly binaries that need persistent updates. This custom data section is functionally the same as the standard WebAssembly data section, except that it can be updated: we add an exported update() function to the deployed .WASM binary that contains the logic necessary for any updates to the model’s state. Thus, we can now move the state from S1 to S2 and thereby update the model.
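
Conceptually, the host side of such an update could look as follows; the update() signature here is an assumption that simply mirrors the pointer-passing pattern used for predict():

```typescript
// Move the model's state from S1 to S2 by handing the binary's exported
// update() function a pointer to the new parameter values. Illustrative
// signature; the actual update logic lives inside the binary.
function updateState(instance: WebAssembly.Instance, newParams: number[]): void {
  const memory = instance.exports.memory as WebAssembly.Memory;
  const malloc = instance.exports.malloc_f64 as (n: number) => number;
  const update = instance.exports.update as (ptr: number, n: number) => void;

  // Copy the new parameters into linear memory...
  const ptr = malloc(newParams.length);
  new Float64Array(memory.buffer, ptr, newParams.length).set(newParams);

  // ...and let the binary's own update logic rewrite its state.
  update(ptr, newParams.length);
}
```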

Simply updating the state in linear memory is however not sufficient for persistence; we need to make sure that any change in the state is also reflected on disk (i.e., when the device is turned off and on again, we would like S2 to be retained). This is solved by extending the standard WebAssembly runtime: our edge runtimes actively monitor the state S in linear memory. When it changes, the runtime overwrites the custom data section of the WebAssembly binary involved on disk. Hence, we now have a new, and persistent, state.
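
A much-simplified sketch of this persistence step is shown below. Our actual runtimes rewrite the custom data section inside the .WASM binary itself; to keep the illustration short, this version writes the state bytes to a sidecar file, and the state_ptr()/state_len() exports used to locate S are hypothetical:

```typescript
// After an update, copy the state S out of linear memory and flush it
// to disk, so that the new state S2 survives a power cycle.
import { writeFileSync } from "fs";

function persistState(instance: WebAssembly.Instance, path: string): void {
  const memory = instance.exports.memory as WebAssembly.Memory;
  const statePtr = (instance.exports.state_ptr as () => number)();
  const stateLen = (instance.exports.state_len as () => number)();

  // Take a copy of the state bytes (a view over wasm memory is not
  // stable across memory growth) and write them out.
  const state = new Uint8Array(memory.buffer, statePtr, stateLen).slice();
  writeFileSync(path, state);
}

// The runtime would call persistState(instance, "model.state") whenever
// it observes a change in S.
```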

Thus, by extending WebAssembly’s data structure and the default runtime, it is possible to allow for efficient persistent updates of ML/AI models on the edge.


The possibilities of persistent state updates

We already sketched some scenarios in which (local) updates of deployed WebAssembly binaries might be of interest: in cases of concept drift or local variation it might be useful to update a deployed model every now and then with a new state S’. However, quick and persistent updates allow for much richer applications:

  • Quick and persistent updates of the state of a model allow for sequential learning of models on edge devices. For example, when a model can be expressed in summation form, it can easily be updated with each new (labeled) datapoint that becomes available.
  • Quick and persistent updates of the state of a model allow for reinforcement learning on edge devices (thus extending stand-alone WebAssembly deployment beyond supervised and unsupervised models). For example, a multi-armed bandit policy can be implemented by calling the predict() and update() functions iteratively (and adding some active exploration); see the sketch after this list.
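
To see how little machinery this requires, here is a self-contained sketch of an epsilon-greedy multi-armed bandit whose state is exactly the kind of small, summation-form object described above. All names are illustrative:

```typescript
// Epsilon-greedy bandit: the state is per-arm counts and running means.
// Because the running mean is in summation form, each update() touches
// only two numbers per arm, a tiny, easily-persisted state change.
interface BanditState {
  counts: number[]; // pulls per arm
  means: number[];  // running mean reward per arm
}

// predict(): choose an arm, exploring with probability epsilon.
function predict(S: BanditState, epsilon = 0.1): number {
  if (Math.random() < epsilon) {
    return Math.floor(Math.random() * S.means.length); // explore
  }
  return S.means.indexOf(Math.max(...S.means)); // exploit
}

// update(): fold one observed reward into the state (S1 -> S2).
function update(S: BanditState, arm: number, reward: number): void {
  S.counts[arm] += 1;
  S.means[arm] += (reward - S.means[arm]) / S.counts[arm];
}

// Iterating predict() and update() implements the policy on-device:
const S: BanditState = { counts: [0, 0], means: [0, 0] };
for (let t = 0; t < 1000; t++) {
  const arm = predict(S);
  const reward = Math.random() < (arm === 0 ? 0.3 : 0.6) ? 1 : 0; // simulated
  update(S, arm, reward); // the runtime would persist S after each call
}
```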

We will soon add persistent updates to our WebAssembly JavaScript runtime, allowing you to see the actual implementation of the “runtime overwrite to disk” in this setting. In the meantime, we hope the above provides a valuable sketch of the approach.

Disclaimer

It’s good to note my own involvement here: I am a professor of Data Science at the Jheronimus Academy of Data Science and one of the co-founders of Scailable. Thus, no doubt, I have a vested interest in Scailable; I have an interest in making it grow such that we can finally bring AI to production and deliver on its promises. The opinions expressed here are my own.
