Introduction to Federated Learning and Challenges

A brief intro to federated learning and its challenges

Kelvin
Towards Data Science


The next generation of artificial intelligence is being built around a core idea: data privacy. When data privacy is a major concern and we don’t trust anyone to hold our data, we can turn to federated learning to build privacy-preserving, intelligent systems.

Federated learning is about moving computation to the data: a globally shared model is brought to where the data lives, e.g. smartphones. By moving the model to the device, many devices can collectively train a single model.

With this concept in mind, anyone can take part in federated learning on their own devices. Edge devices such as smartphones and IoT hardware can learn from on-device data without that data ever leaving the device, which is especially valuable for computationally constrained devices where communication is a bottleneck.

Moving computation to the data is a powerful principle for building intelligent systems while protecting the privacy of the individuals who generate the data.

Figure 1. Your phone personalizes the model locally, based on your usage (A). Many users’ updates are aggregated (B) to form a consensus change (C) to the shared model, after which the procedure is repeated. (Figure by Google AI Blog)

Federated learning comes in three categories: horizontal federated learning, vertical federated learning, and federated transfer learning.

Horizontal federated learning uses datasets with the same feature space across all devices: Client A and Client B have the same set of features, as shown in a) below.

Vertical federated learning jointly trains a global model from datasets with different feature spaces, as shown in b) below. For example, Client A (Amazon) has information about a customer’s movie purchases, while Client B (IMDb) has information about the same customer’s movie reviews; combining these two datasets from different domains lets Amazon use the review information from IMDb to give customers better movie recommendations.

Lastly, federated transfer learning is vertical federated learning combined with a pre-trained model that was trained on a similar dataset for a different problem. One example is training a personalised model, e.g. movie recommendations based on a user’s past browsing behaviour.
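The difference between the horizontal and vertical settings is easiest to see in terms of how a single table of data is split between clients. A toy sketch (the table and column names here are made up for illustration):

```python
# One "global" table: rows are users, columns are features.
table = {
    "alice": {"age": 30, "clicks": 12, "rating": 4.5},
    "bob":   {"age": 25, "clicks": 7,  "rating": 3.0},
    "carol": {"age": 41, "clicks": 3,  "rating": 5.0},
}

# Horizontal FL: clients share the SAME feature space (columns) but hold
# DIFFERENT users (rows), e.g. two phones, each with its owner's data.
horizontal_client_a = {u: table[u] for u in ["alice", "bob"]}
horizontal_client_b = {u: table[u] for u in ["carol"]}

# Vertical FL: clients share the SAME users (rows) but hold DIFFERENT
# features (columns), e.g. purchase history vs. review scores.
vertical_client_a = {u: {"age": f["age"], "clicks": f["clicks"]}
                     for u, f in table.items()}
vertical_client_b = {u: {"rating": f["rating"]}
                     for u, f in table.items()}
```

In the horizontal split the clients' user sets are disjoint but their columns match; in the vertical split the user sets match but the columns are disjoint.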

Figure 2. Categorization of Federated Learning. a) Horizontal learning, b) Vertical learning, c) Transfer learning (Figure by link)

How Federated learning works

Federated learning revolves around the federated averaging algorithm, “FedAvg” [3]. FedAvg is the original, vanilla federated learning algorithm, formulated by Google [3]. Since then, many variants of FedAvg such as FedProx, FedMA, FedOpt, and SCAFFOLD have been developed to address the open problems surveyed in [2].

The following describes, at a high level, how the FedAvg algorithm works.

At each round of FedAvg, the aim is to minimize the objective of the global model w, which is the weighted average of the local devices’ losses.
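Written out, that objective is (in the standard FedAvg notation, with K clients, n_k samples held by client k, and n the total number of samples):

```latex
\min_{w} \; f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w),
\qquad
F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{P}_k} \ell(w; x_i, y_i)
```

where F_k is client k’s average loss over its local data partition P_k.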

  1. A subset of clients/devices is sampled at random.
  2. The server broadcasts its global model to each sampled client.
  3. In parallel, the clients run stochastic gradient descent (SGD) on their own loss functions and send the resulting models back to the server for aggregation.
  4. The server updates its global model as the (weighted) average of these local models.
  5. The process repeats for a fixed number of communication rounds.
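The steps above can be sketched in a few lines of Python, simulating the clients in-process. This is a minimal sketch, not a real FL framework: the least-squares loss, full-batch gradient steps (standing in for local SGD), and function names are all illustrative choices.

```python
import numpy as np

def fedavg_round(global_weights, client_datasets, lr=0.1, local_epochs=1):
    """One communication round of FedAvg (simplified sketch).

    Each client starts from the broadcast global model, takes a few
    gradient steps on its own least-squares loss, and the server then
    averages the resulting models weighted by local dataset size.
    """
    client_weights, client_sizes = [], []
    for X, y in client_datasets:
        w = global_weights.copy()            # step 2: server broadcasts model
        for _ in range(local_epochs):        # step 3: local training
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        client_weights.append(w)
        client_sizes.append(len(y))
    total = sum(client_sizes)                # step 4: weighted aggregation
    return sum(n / total * w for w, n in zip(client_weights, client_sizes))
```

Running this for several rounds (step 5) over clients whose data comes from the same underlying linear model drives the global weights toward that model, without any client's raw data leaving its `(X, y)` pair.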

Challenges in Federated learning

Such a useful technology comes with an abundance of challenges that need to be addressed [2, 4]. The following five points, curated from [4], are the challenges I think most need attention (the list is by no means exhaustive).

  1. Trade-off between noise and accuracy: With Differential Privacy (DP), we add noise to the data to strengthen privacy protection. However, that noise costs model performance, so the challenge is to add just enough noise to protect privacy without unduly compromising accuracy.
  2. System and statistical heterogeneity: Training across heterogeneous devices is a challenge; federated learning must scale effectively regardless of device type. Statistical heterogeneity refers to the fact that no single device can derive the global statistical pattern on its own: the populations, samples, and results on one device differ from those on the others (i.e. the data are non-IID).
  3. Communication bottlenecks: The communication cost of moving models to and from devices should be kept low, since a single device crippled by a communication bottleneck can stall the whole federated training process. Several lines of work address this, such as dropping stragglers (devices that fail to finish training within a specified time window) and model compression/quantization to reduce bandwidth costs.
  4. Poisoning: Poisoning comes in two forms. 1. Data poisoning: since many clients participate by contributing their on-device training data, it is difficult to detect or prevent malicious clients from injecting fake data that poisons the training process and, in turn, the model. 2. Model poisoning: instead of tampering with data, malicious clients tamper with the received model’s gradients/parameters before sending it back to the central server, so the global model can be severely poisoned with invalid gradients during aggregation.
  5. Trade-off between efficiency and privacy: Secure Multi-Party Computation (SMPC) and Differential Privacy (DP) strengthen the privacy protection of federated learning, but that protection trades cost against efficiency. With SMPC, clients encrypt their model parameters before sending them to the central server, so extra computational resources are needed for encryption, reducing training efficiency. With DP, the noise added to the model and data costs some accuracy. Finding a suitable balance between SMPC, DP, and efficiency remains an open challenge in federated learning.
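The noise-vs-accuracy trade-off in points 1 and 5 can be illustrated with a minimal sketch: before a client's model update leaves the device, it is clipped to a maximum L2 norm and Gaussian noise is added. The clip norm and noise scale below are illustrative only and are not calibrated to any formal (ε, δ) guarantee.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip a client's model update to an L2 norm bound, then add
    Gaussian noise. Larger noise_std means stronger protection for the
    client's data but a noisier (less accurate) aggregate at the server.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / norm)  # bound each client's influence
    return clipped + rng.normal(0.0, noise_std, size=update.shape)
```

The server then averages these privatized updates as usual; with many participating clients the independent noise terms partially cancel in the average, which is why the trade-off is tunable rather than all-or-nothing.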

Conclusion

Federated learning is still a relatively new field with many research opportunities for making privacy-preserving AI better, spanning challenges such as system heterogeneity, statistical heterogeneity, privacy, and communication efficiency. These open problems need to be addressed as a whole before federated learning can be widely adopted by industry.

References:

[1] Federated Learning: Collaborative Machine Learning without Centralized Training Data

[2] Advances and Open Problems in Federated Learning

[3] Communication-Efficient Learning of Deep Networks from Decentralized Data

[4] A survey on security and privacy of federated learning
