Introduction to Federated Learning and Privacy Preservation

Federated Learning and Additive Secret Sharing using the PySyft framework.

Kapil Chandorikar
Towards Data Science


Federated Learning involves training on a large corpus of high-quality decentralized data residing on multiple client devices. The model is trained on the client devices, so there is no need to upload the user’s data. Keeping personal data on the client’s device gives users direct and physical control of their own data.

Figure 1: Federated Learning

The server trains the initial model on proxy data available beforehand. This initial model is sent to a select number of eligible client devices. The eligibility criterion ensures that the user’s experience is not spoiled in an attempt to train the model. An optimal number of client devices is selected to take part in the training process. After processing the user data, the clients share their model updates with the server. The server aggregates these updates and improves the global model.

All the model updates are processed in memory and persist on the server for a very short period of time. The server then sends the improved model back to the client devices participating in the next round of training. After the model attains a desired level of accuracy, the on-device copies can be tweaked for each user’s personalization; they are then no longer eligible to participate in the training. Throughout the entire process, the data never leaves the client’s device.

HOW IS THIS DIFFERENT FROM DECENTRALIZED COMPUTATION?

Federated learning differs from decentralized computation as:

  • Client devices (such as smartphones) have limited network bandwidth. They cannot transfer large amounts of data and the upload speed is usually lower than the download speed.
  • The client devices are not always available to take part in a training session. Optimal conditions such as charging state, connection to an unmetered Wi-Fi network, idleness, etc. are not always achievable.
  • The data present on the device gets updated quickly and is not always the same. [Data is not always available.]
  • The client devices can choose not to participate in the training.
  • The number of client devices available is very large but inconsistent.
  • Federated learning incorporates privacy preservation with distributed training and aggregation across a large population.
  • The data is usually unbalanced, as it is user-specific and self-correlated.

Federated Learning is one instance of the more general approach of “bringing the code to the data, instead of the data to the code” and addresses the fundamental problems of privacy, ownership, and locality of data.

In Federated Learning:

  • Certain techniques are used to compress the model updates.
  • Quality updates are performed rather than simple gradient steps.
  • Noise is added by the server before performing aggregation to obscure the impact of an individual on the learned model. [Global Differential Privacy]
  • Gradient updates are clipped if they are too large.

INTRODUCING PYSYFT

We will use PySyft to implement a federated learning model. PySyft is a Python library for secure and private deep learning.

INSTALLATION

PySyft requires Python >= 3.6 and PyTorch 1.1.0. Make sure you meet these requirements.
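At the time of writing, PySyft can typically be installed from PyPI with pip install syft; the PySyft repository on GitHub carries the current installation instructions, as the packaging has changed across releases.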

BASICS

Let’s start by importing the libraries and initializing the hook.

The hook overrides PyTorch’s methods so that commands called on tensors controlled by the local worker are executed on the worker that actually holds the data. It also allows us to move tensors between workers. Workers are explained below.
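A minimal sketch of this setup (assuming the PySyft release contemporary with PyTorch 1.1.0, where a worker’s store is exposed through its _objects attribute):

    import torch
    import syft as sy

    # hook PyTorch to add extra functionality for remote tensor operations
    hook = sy.TorchHook(torch)

    # create a virtual worker named Jake on the local machine
    jake = sy.VirtualWorker(hook, id="jake")
    print("Jake has:", jake._objects)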

Jake has: {}

Virtual workers are entities present on our local machine. They are used to model the behavior of actual workers.

To work with workers distributed in a network, PySyft offers two types of workers:

  • Network socket workers
  • Web socket workers

Web socket workers can be instantiated from the browser, with each worker on a separate tab.

Here, Jake is our virtual worker, which can be thought of as a separate entity on its own device. Let’s send him some data.
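A sketch of this step, reusing the jake worker from above:

    x = torch.tensor([1, 2, 3, 4, 5])
    x = x.send(jake)  # the tensor moves to jake; x becomes a pointer to it
    print("x:", x)
    print("Jake has:", jake._objects)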

x: (Wrapper)>[PointerTensor | me:50034657126 -> jake:55209454569]
Jake has: {55209454569: tensor([1, 2, 3, 4, 5])}

When we send a tensor to Jake, we are returned a pointer to that tensor. All the operations will be executed with this pointer. This pointer holds information about the data present on another machine. Now, x is a PointerTensor.

Use the get() method to get back the value of x from Jake’s device. However, by doing so, the tensor on Jake’s device gets erased.
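For example:

    x = x.get()  # retrieve the tensor; it is removed from jake's store
    print("x:", x)
    print("Jake has:", jake._objects)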

x: tensor([1, 2, 3, 4, 5])
Jake has: {}

When we send the PointerTensor x (pointing to a tensor on Jake’s machine) to another worker, John, the whole chain is sent to John, and a PointerTensor pointing to the pointer on John’s device is returned. The tensor itself is still present on Jake’s device.
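A sketch, assuming a second VirtualWorker named john and sending the tensor to Jake again first:

    john = sy.VirtualWorker(hook, id="john")

    x = torch.tensor([1, 2, 3, 4, 5]).send(jake)  # tensor lives on jake
    x = x.send(john)  # send the pointer itself to john, forming a chain
    print("x:", x)
    print("John has:", john._objects)
    print("Jake has:", jake._objects)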

x: (Wrapper)>[PointerTensor | me:70034574375 -> john:19572729271]
John has: {19572729271: (Wrapper)>[PointerTensor | john:19572729271 -> jake:55209454569]}
Jake has: {55209454569: tensor([1, 2, 3, 4, 5])}
Figure 2: Using the send() method on a PointerTensor. [Step 2]

The clear_objects() method removes all the objects from a worker.
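For example:

    jake.clear_objects()
    john.clear_objects()
    print("Jake has:", jake._objects)
    print("John has:", john._objects)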

Jake has: {}
John has: {}

Suppose we wanted to move a tensor from Jake’s machine to John’s machine. We could do this by using the send() method to send the ‘pointer to tensor’ to John and let him call the get() method. PySyft provides a remote_get() method to do exactly this, and a convenience method, move(), to perform the whole operation.
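A sketch using move():

    y = torch.tensor([6, 7, 8, 9, 10]).send(jake)
    y = y.move(john)  # roughly y.send(john) followed by a remote_get()
    print(y)
    print("Jake has:", jake._objects)
    print("John has:", john._objects)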

(Wrapper)>[PointerTensor | me:86076501268 -> john:86076501268]
Jake has: {}
John has: {86076501268: tensor([ 6, 7, 8, 9, 10])}
Figure 3: Using the move() method on a PointerTensor. [Step 2]

STRATEGY

We can perform federated learning on client devices by following these steps:

  1. send the model to the device,
  2. do normal training using the data present on the device,
  3. get back the smarter model.

However, if someone intercepts the smarter model while it is being shared with the server, they could reverse engineer it and extract sensitive data about the training dataset. Differential privacy methods address this issue and protect the data.

When the updates are sent back to the server, the server should not be able to single out any individual client’s contribution while aggregating the gradients. Let’s use a form of cryptography called additive secret sharing.

We want to encrypt these gradients (or model updates) before performing the aggregation so that no one is able to see them. We can achieve this with additive secret sharing.

ADDITIVE SECRET SHARING

In secret sharing, we split a secret x into multiple shares and distribute them among a group of secret-holders. The secret x can be reconstructed only when all the shares it was split into are available.

For example, say we split x into 3 shares: x1, x2, and x3. We randomly initialize the first two shares and calculate the third share as x3 = x - (x1 + x2). We then distribute these shares among 3 secret-holders. The secret remains hidden as each individual holds onto only one share and has no idea of the total value.

We can make this more secure by fixing the range from which the share values are drawn. Let Q, a large prime number, be the upper limit. The third share then becomes x3 = Q - ((x1 + x2) mod Q) + x.
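A sketch in plain Python (the value of Q below is an arbitrary large number chosen for illustration; in practice it should be a large prime):

    import random

    Q = 23740629843760239486723  # illustrative large field size

    def encrypt(x, n_shares=3):
        # the first n-1 shares are drawn uniformly at random
        shares = [random.randrange(Q) for _ in range(n_shares - 1)]
        # the final share makes the total wrap around to x modulo Q
        shares.append(Q - (sum(shares) % Q) + x)
        return tuple(shares)

    print("Shares:", encrypt(3))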

Shares: (6191537984105042523084, 13171802122881167603111, 4377289736774029360531)
Figure 4: Encrypting x in three shares.

Decryption is simply the sum of the shares taken modulo Q: x = (x1 + x2 + x3) mod Q.
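Continuing the sketch:

    def decrypt(shares):
        return sum(shares) % Q

    print("Value after decrypting:", decrypt(encrypt(3)))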

Value after decrypting: 3
Figure 5: Decrypting x from the three shares.

HOMOMORPHIC ENCRYPTION

Homomorphic encryption is a form of encryption that allows us to perform computation on encrypted operands, resulting in encrypted output. This encrypted output when decrypted matches with the result obtained by performing the same computation on the actual operands.

The additive secret sharing technique already has this homomorphic property. If we split x into x1, x2, and x3, and y into y1, y2, and y3, then x + y equals the value obtained after decrypting the pairwise sums of the shares: (x1 + y1), (x2 + y2), and (x3 + y3).
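A sketch reusing encrypt() and decrypt() from above, with x = 5 and y = 9 assumed for illustration:

    x_shares = encrypt(5)
    y_shares = encrypt(9)

    # add the shares pairwise, reducing modulo Q
    z_shares = tuple((a + b) % Q for a, b in zip(x_shares, y_shares))

    print("Shares encrypting x:", x_shares)
    print("Shares encrypting y:", y_shares)
    print("Sum of shares:", z_shares)
    print("Sum of original values (x + y):", decrypt(z_shares))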

Shares encrypting x: (17500273560307623083756, 20303731712796325592785, 9677254414416530296911)
Shares encrypting y: (2638247288257028636640, 9894151868679961125033, 11208230686823249725058)
Sum of shares: (20138520848564651720396, 6457253737716047231095, 20885485101239780021969)
Sum of original values (x + y): 14

We are able to calculate the value of the aggregate function, addition, without knowing the values of x and y.

SECRET SHARING USING PYSYFT

PySyft provides a share() method to split data into additive secret shares and send them to the specified workers. For working with decimal numbers, the fix_precision() method represents decimals as integer values under the hood.
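A sketch of the setup, assuming a third VirtualWorker named secure_worker alongside Jake and John:

    secure_worker = sy.VirtualWorker(hook, id="secure_worker")

    jake.clear_objects()
    john.clear_objects()
    print("Jake has:", jake._objects)
    print("John has:", john._objects)
    print("Secure_worker has:", secure_worker._objects)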

Jake has: {}
John has: {}
Secure_worker has: {}

The share() method is used to distribute the shares among several workers. Each worker specified then receives a share and has no idea of the actual value.
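For example (the value 5 is assumed for illustration; since no crypto provider is specified, the local worker “me” fills that role, as the output shows):

    x = torch.tensor([5])
    x = x.share(jake, john, secure_worker)
    print("x:", x)
    print("Jake has:", jake._objects)
    print("John has:", john._objects)
    print("Secure_worker has:", secure_worker._objects)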

x: (Wrapper)>[AdditiveSharingTensor]
-> (Wrapper)>[PointerTensor | me:61668571578 -> jake:46010197955]
-> (Wrapper)>[PointerTensor | me:98554485951 -> john:16401048398]
-> (Wrapper)>[PointerTensor | me:86603681108 -> secure_worker:10365678011]
*crypto provider: me*
Jake has: {46010197955: tensor([3763264486363335961])}
John has: {16401048398: tensor([-3417241240056123075])}
Secure_worker has: {10365678011: tensor([-346023246307212880])}

As you can see, x now points to the three shares present on Jake’s, John’s, and Secure_worker’s machines respectively.

Figure 6: Encryption of x into three shares.
Figure 7: Distributing the shares of x among 3 VirtualWorkers.
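We encrypt a second value, y, in the same way (y = 9 is assumed here so that the sum later comes out to 14):

    y = torch.tensor([9]).share(jake, john, secure_worker)
    print(y)
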
(Wrapper)>[AdditiveSharingTensor]
-> (Wrapper)>[PointerTensor | me:86494036026 -> jake:42086952684]
-> (Wrapper)>[PointerTensor | me:25588703909 -> john:62500454711]
-> (Wrapper)>[PointerTensor | me:69281521084 -> secure_worker:18613849202]
*crypto provider: me*
Figure 8: Encryption of y into 3 shares.
Figure 9: Distributing the shares of y among 3 VirtualWorkers.
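Adding the two encrypted tensors produces a new AdditiveSharingTensor; the addition happens share by share on the workers:

    z = x + y
    print(z)
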
(Wrapper)>[AdditiveSharingTensor]
-> (Wrapper)>[PointerTensor | me:42086114389 -> jake:42886346279]
-> (Wrapper)>[PointerTensor | me:17211757051 -> john:23698397454]
-> (Wrapper)>[PointerTensor | me:83364958697 -> secure_worker:94704923907]
*crypto provider: me*

Notice that the value of z obtained after adding x and y is stored in the three workers’ machines. z is also encrypted.

Figure 10: Performing computation on encrypted inputs.
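Calling get() on the result collects the shares from the workers and reconstructs the plaintext value:

    z = z.get()
    print(z)
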
tensor([14])
Figure 11: Decryption of result obtained after computation on encrypted inputs.

The value obtained after performing addition on encrypted shares is equal to that obtained by adding the actual numbers.

FEDERATED LEARNING USING PYSYFT

Now, we’ll implement the federated learning approach to train a simple neural network on the MNIST dataset using the two workers: Jake and John. There are only a few modifications necessary to apply the federated learning approach.

1. Import the libraries and modules.
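A sketch of this step, with the same API assumptions as before:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torchvision import datasets, transforms
    import syft as sy

    hook = sy.TorchHook(torch)
    jake = sy.VirtualWorker(hook, id="jake")
    john = sy.VirtualWorker(hook, id="john")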

2. Load the dataset.

In real-life applications, the data is present on client devices. To replicate the scenario, we send data to the VirtualWorkers.
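A sketch, assuming torchvision’s MNIST dataset, a batch size of 64, and the usual MNIST normalization constants (the data directory is arbitrary):

    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ])

    train_set = datasets.MNIST(
        "~/data", train=True, download=True, transform=transform)
    test_set = datasets.MNIST(
        "~/data", train=False, download=True, transform=transform)

    # split the training set between the two workers
    federated_train_loader = sy.FederatedDataLoader(
        train_set.federate((jake, john)), batch_size=64, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        test_set, batch_size=64, shuffle=True)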

Notice that we have created the training dataset differently. The call train_set.federate((jake, john)) creates a FederatedDataset wherein train_set is split between Jake and John (our two VirtualWorkers). The FederatedDataset class is intended to be used like PyTorch’s Dataset class. We pass it to a FederatedDataLoader to iterate over it in a federated manner; the batches then come from different devices.

3. Build the model.
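The exact architecture is not critical; a simple two-layer fully connected network is assumed here:

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.fc1 = nn.Linear(784, 500)
            self.fc2 = nn.Linear(500, 10)

        def forward(self, x):
            x = x.view(-1, 784)
            x = F.relu(self.fc1(x))
            return F.log_softmax(self.fc2(x), dim=1)

    model = Net()
    optimizer = optim.SGD(model.parameters(), lr=0.01)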

4. Train the model.

Since the data is present on the client device, we obtain its location through the location attribute. The important additions to the code are the steps to get back the improved model and the value of the loss from the client devices.
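A sketch of the loop (five epochs and a logging interval of 100 batches are assumed to match the output below):

    for epoch in range(1, 6):
        model.train()
        for batch_idx, (data, target) in enumerate(federated_train_loader):
            model.send(data.location)  # send the model to the device holding this batch
            optimizer.zero_grad()
            output = model(data)
            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()
            model.get()  # get the improved model back
            if batch_idx % 100 == 0:
                loss = loss.get()  # get the value of the loss back
                print("Epoch: {} [{:5d}/{} ({:3.0f}%)] Loss: {:.6f}".format(
                    epoch, batch_idx * 64, len(federated_train_loader) * 64,
                    100.0 * batch_idx / len(federated_train_loader), loss.item()))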

Epoch: 1 [    0/60032 (  0%)] Loss: 2.306809
Epoch: 1 [ 6400/60032 ( 11%)] Loss: 1.439327
Epoch: 1 [12800/60032 ( 21%)] Loss: 0.857306
Epoch: 1 [19200/60032 ( 32%)] Loss: 0.648741
Epoch: 1 [25600/60032 ( 43%)] Loss: 0.467296
...
...
...
Epoch: 5 [32000/60032 ( 53%)] Loss: 0.151630
Epoch: 5 [38400/60032 ( 64%)] Loss: 0.135291
Epoch: 5 [44800/60032 ( 75%)] Loss: 0.202033
Epoch: 5 [51200/60032 ( 85%)] Loss: 0.303086
Epoch: 5 [57600/60032 ( 96%)] Loss: 0.130088

5. Test the model.
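A sketch of the evaluation, which runs locally on the server:

    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction="sum").item()
            pred = output.argmax(dim=1, keepdim=True)  # index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print("Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)".format(
        test_loss, correct, len(test_loader.dataset),
        100.0 * correct / len(test_loader.dataset)))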

Test set: Average loss: 0.2428, Accuracy: 9300/10000 (93%)

That’s it. We have trained a model using the federated learning approach. When compared to traditional training, it takes more time to train a model using the federated approach.

PROTECTING THE MODEL

Training the model on the client device protected the user’s privacy. But, what about the model’s privacy? Downloading the model can threaten the organization’s intellectual property!

Secure Multi-Party Computation, which builds on additive secret sharing, provides us with a way to perform model training without disclosing the model.

To protect the weights of the model, we secret share the model among the client devices.

For this to work, a few changes have to be made to the federated learning example above.

As illustrated in the SECRET SHARING USING PYSYFT section, the model, the inputs, the model outputs, the weights, etc. will now be encrypted as well. Working on encrypted inputs yields encrypted output.
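A minimal sketch of the idea, reusing the fix_precision()/share() API from earlier with a secure_worker as the crypto provider; treat it as illustrative rather than a complete encrypted-training recipe:

    secure_worker = sy.VirtualWorker(hook, id="secure_worker")

    # secret share the model; its weights are now split among the workers
    shared_model = model.fix_precision().share(
        jake, john, crypto_provider=secure_worker)

    # encrypt an input the same way (a dummy image is assumed here)
    data = torch.zeros(1, 1, 28, 28)
    shared_data = data.fix_precision().share(
        jake, john, crypto_provider=secure_worker)

    # the forward pass runs on encrypted operands and yields encrypted output
    shared_output = shared_model(shared_data)
    print(shared_output.get().float_precision())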

