Google Federated Learning and AI

Confidentiality and artificial intelligence

Alex Moltzau
Towards Data Science
11 min read · Jul 16, 2019


I had heard federated learning mentioned before, but diving into Semi-Supervised Learning over the last few days made me properly aware of the concept and its usage. In this article I will give a brief explanation of the concept without going into extensive technical detail. Facebook was recently hit with a $5 billion fine from the Federal Trade Commission over its negligent management of user data. Google has likewise faced billion-dollar formal charges from the European Commission. How can these companies improve? Google proposes part of a solution.

Given rising concerns over privacy and security in communications technology, it is encouraging to see that Google wishes to address this issue through this concept. I would like to explore it further, but first I would like to share part of a cartoon from Google on Federated Learning:

Thus begins the cartoon on Federated Learning by Google. I really recommend checking out the full version in the link in the previous sentence to read the whole story.

According to the short stub on the Machine Learning page on Wikipedia:

Federated learning is a new approach to training machine learning models that decentralizes the training process, allowing for users’ privacy to be maintained by not needing to send their data to a centralized server. This also increases efficiency by decentralizing the training process to many devices. For example, Gboard uses federated machine learning to train search query prediction models on users’ mobile phones without having to send individual searches back to Google.

Federated learning, in short, is about training a centralised model on decentralised data.
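
To make that idea concrete, here is a minimal, hypothetical sketch in plain Python with NumPy of what a single device contributes in such a scheme. The names and the toy linear model are my own illustration, not Google's implementation; the point is simply that the raw data stays on the device and only a small model update is sent back.

```python
import numpy as np

def device_contribution(global_weights, local_x, local_y, lr=0.1):
    """Runs on the phone: the raw data (local_x, local_y) never leaves it."""
    preds = local_x @ global_weights                      # toy linear model
    grad = local_x.T @ (preds - local_y) / len(local_y)   # local gradient
    update = -lr * grad
    return update  # only this small update vector is sent to the server
```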

So why is this important?

Privacy and Artificial Intelligence

I decided to check out what Privacy International has to say about artificial intelligence.

Privacy International (PI) is a charity that challenges the governments and companies that want to know everything about individuals, groups, and whole societies. The future PI wants is one where people are in control of their data and the technology they use, and governments and companies are no longer able to use technology to monitor, track, analyse, profile, and ultimately, manipulate and control us. But we have to fight for that future.

They have a page on their website on the topic of artificial intelligence. I will draw upon a few bullet points, based on or copied (in italics) from that short page:

  • Re-identification and de-anonymisation: AI applications can be used to identify and thereby track individuals across different devices, in their homes, at work, and in public spaces.
  • Opacity and secrecy of profiling: Some applications of AI can be opaque to individuals, regulators, or even the designers of the system themselves, making it difficult to challenge or interrogate outcomes.
  • Data exploitation: People are often unable to fully understand what kinds and how much data their devices, networks, and platforms generate, process, or share. As we bring smart and connected devices into our homes, workplaces, public spaces, and even bodies, the need to educate the public about such data exploitation becomes increasingly pressing.

There is an asymmetry in this landscape: uses of AI for purposes like profiling, or to track and identify people across devices and even in public spaces, amplify the divide. So what is their proposed solution?

  1. Make the field of AI and the usage of machine learning techniques subject to the minimum requirement of respecting, promoting, and protecting international human rights standards.
  2. Ensure that AI protects individuals from the risks it poses by reviewing existing laws and, if necessary, amending them to address the effects of new and emerging threats to privacy.

Jumping back to the Google cartoon, how is this solved within Federated Learning?

Secure aggregation is an interactive cryptographic protocol for computing sums of masked vectors, like model weights. It works by coordinating the exchange of random masks among pairs of participating clients, such that the masks cancel out when a sufficient number of inputs are received. To read more about secure aggregation, see Practical Secure Aggregation for Privacy-Preserving Machine Learning.
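
As a rough illustration of the mask-cancellation idea, here is a toy sketch only, not the actual protocol from that paper (which also handles key agreement and clients that drop out): each pair of clients shares a random mask, one adds it and the other subtracts it, so the server can recover the sum of the updates without seeing any individual report.

```python
import numpy as np

def masked_reports(client_updates, seed=0):
    """Toy secure aggregation: pairwise random masks that cancel in the sum.

    Not the real protocol (no key agreement, no dropout tolerance); it only
    illustrates why the server learns the sum but not the individual updates.
    """
    rng = np.random.default_rng(seed)
    n = len(client_updates)
    masked = [u.astype(float).copy() for u in client_updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=client_updates[0].shape)
            masked[i] += mask   # client i adds the shared mask
            masked[j] -= mask   # client j subtracts it, so the pair cancels
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
reports = masked_reports(updates)
# The server only sums the masked reports; the masks cancel out.
assert np.allclose(sum(reports), sum(updates))
```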

The Malpractice of Track and Tell

Do data scientists, engineers, and the makers and sellers of software and hardware in the technology sector have any sort of responsibility? Yes, so let me bring in the idea of malpractice.

Malpractice: in the law of torts, malpractice, also known as professional negligence, is an “instance of negligence or incompetence on the part of a professional”

This is notoriously hard to claim, but the idea of responsibility is an interesting one. We can seriously question whether this malpractice, or professional negligence, is what gives the technology industry such a bad reputation for handling data. It would be great if there were professional standards for how you approach the data [read: people’s trusted information] that you are handling.

How do you feel about Walmart knowing you better? Walmart does extensive market research, but we can question whether its new machine-learning cameras will be used only to track theft. Walmart revealed it has been tracking checkout theft with AI-powered cameras in 1,000 stores.

Everseen CEO Alan O’Herlihy said the company’s technology is designed to reduce friction at checkouts and “digitize” checkout surveillance.

Ireland-based Everseen is one of several companies supplying Walmart with the technology for its Missed Scan Detection program. (Image: YouTube/Everseen, via Business Insider)

This picture is, of course, a wonderful case of anthropomorphism, as I have mentioned in earlier articles: the wish to display the machine as human. There is of course no human-like machine staring at your images, although it looks like an embodiment of a series of catchy words such as robotics, robotic process automation (RPA), autonomous, and so on.

A new company, Near, just raised $100M for an AI that merges online and offline behavior to build consumer profiles.

One of the Holy Grails in the world of advertising and marketing has been finding a way to accurately capture and understand what consumers are doing throughout the day, regardless of whether it’s a digital or offline activity. […] Near — which has built an interactive, cloud-based AI platform called Allspark that works across 44 countries to create anonymised, location-based profiles of users (1.6 billion each month at present) based on a trove of information that it sources and then merges from phones, data partners, carriers and its customers, but which it claims was built “with privacy by design”

So how about tracking how you feel, aggregating it, and selling it without your consent? At least not your clear consent (read: the terms and conditions).

There is no clear case of consent here, so this may strictly speaking not be legal in the EU, although I am unsure. There may be grey areas that some actors can exploit, and I am not a lawyer.

Consent occurs when one person voluntarily agrees to the proposal or desires of another. This vague definition is not desirable in the realm of tech, as most companies seem to treat your entering their platform as consent.

GDPR Consent: Processing personal data is generally prohibited, unless it is expressly allowed by law, or the data subject has consented to the processing. While being one of the more well-known legal bases for processing personal data, consent is only one of six bases mentioned in the General Data Protection Regulation (GDPR). The others are: contract, legal obligations, vital interests of the data subject, public interest and legitimate interest as stated in Article 6(1) GDPR.

There may now be more requirements to state what data you gather, and for those who sell data to declare it. Yet Walmart and Near may be able to circumvent this somehow. It is notoriously hard to understand the landscape of data sellers and what has been sold to whom. Still, as a rule of thumb, we can speculate whether this is too much to ask.

Ask permission before you:

  1. Use data
  2. Aggregate data
  3. Store data
  4. Sell data

This seems often ignored simply out of expediency, AKA "move fast and break things", which seems to have become synonymous with the handling of data. It is a practice that The Quantified VC described last year (2018) as very much alive.

Facebook was, in July 2019, reportedly hit with a $5 billion fine from the FTC. The company had, however, already said in April that it was setting aside $3 billion in anticipation of this happening.

Specifically, regulators sought to determine whether Facebook violated the terms of an agreement it made with the authority in 2011, where it pledged to not collect and share personal data without users’ consent.

We can seriously question whether this malpractice or professional negligence by people in companies, together with systemic issues in handling data, is what gives the technology industry such a bad reputation. Google, as an example, has to date been fined more than $8 billion by the European Commission (although this makes only a small dent in its earnings).

One of Amazon’s best-selling products is its new camera doorbell. This could, however, have unintended adverse consequences. Private landlords have started using cheap camera software to track license plates. Even some promoters of AI and ethics speak of it as the ‘AI algorithm’ and of marrying ‘it’ with ethical conduct. Companies at different levels of revenue, as well as individuals, are applying these technologies, and there is a need to determine how to handle the data of different individuals and when that line is crossed.

We have to start discussing the people more than the technology, although transparent algorithms are to be strived for. A wise statement in this regard was made by Cassie Kozyrkov, talking of bias in algorithms:

Textbooks reflect the biases of their authors. Like textbooks, datasets have authors. They’re collected according to instructions made by people. […] Bias doesn’t come from AI algorithms, it comes from people.

Bias: inclination or prejudice for or against one person or group, especially in a way considered to be unfair.

By saying this I do not mean we can blame the people and not the companies they work within or for. However, there has to be professional accountability for the practices pertaining to handling data. We cannot keep scapegoating ‘the algorithm(s)’ or ‘the AI’ in this regard.

Federated Computing and Learning

A presentation from Google regarding TensorFlow on the 9th of May 2019 gave an overview of both federated computing and federated learning. You can see the video version of the presentation here, or you can scroll on for a few of the slides presented below:

So, to sum up, the four principles of federated learning privacy technology are:

  1. On-device datasets: keep raw data on the device, expire old data, encrypted at rest.
  2. Federated aggregation: combine reports from multiple devices.
  3. Secure aggregation: compute a (vector) sum of masked device reports; a practical protocol with security guarantees, communication efficiency and dropout tolerance.
  4. Federated model averaging: many steps of gradient descent on each device, then average the resulting models on the server (a sketch follows below).
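
Below is a minimal sketch of that fourth principle, assuming a toy linear model in plain Python with NumPy rather than the TensorFlow Federated APIs shown in the talk: each device runs several local gradient steps on its own data, and the server then averages the resulting models, weighted by how much data each device holds.

```python
import numpy as np

def local_training(global_weights, x, y, steps=5, lr=0.1):
    """Many steps of gradient descent on one device's own data."""
    w = global_weights.copy()
    for _ in range(steps):
        grad = x.T @ (x @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_averaging(global_weights, device_data):
    """Server side: average the locally trained models, weighted by data size."""
    local_models = [local_training(global_weights, x, y) for x, y in device_data]
    sizes = np.array([len(y) for _, y in device_data], dtype=float)
    return np.average(local_models, axis=0, weights=sizes)

# Toy usage: three "devices", each holding its own private dataset.
rng = np.random.default_rng(0)
devices = [(rng.normal(size=(30, 3)), rng.normal(size=30)) for _ in range(3)]
w = np.zeros(3)
for _ in range(10):            # ten federated rounds
    w = federated_averaging(w, devices)
```

Weighting by local dataset size is what distinguishes federated averaging from a naive mean: devices holding more data pull the global model more strongly.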

A federation is a group of computing or network providers agreeing upon standards of operation in a collective fashion. The term may be used when describing the inter-operation of two distinct, formally disconnected, telecommunications networks that may have different internal structures. The term may also be used when groups attempt to delegate collective authority of development to prevent fragmentation. In a telecommunication interconnection, the internal modus operandi of the different systems is irrelevant to the existence of a federation.

In software engineering fragmentation or a project fork happens when developers take a copy of source code from one software package and start independent development on it, creating a distinct and separate piece of software. The term often implies not merely a development branch, but also a split in the developer community, a form of schism (division between people).

A modus operandi (often shortened to M.O.) is someone’s habits of working, particularly in the context of business or criminal investigations, but also more generally. It is a Latin phrase, approximately translated as “mode of operating”.

As mentioned in my previous article on Semi-Supervised Machine Learning (SSL), where I first heard of the term federated learning: “Me as a writer explaining articles by members of Google Brain and blog posts from Google AI may seem like a teenager commenting on a professional sports team.”

In this sense, if anything was unclear, it may have been unclear to me as well; if you want to explain or discuss, I am always happy to do so.

On a final note, I do realise this is a sales pitch from Google and that it has to be taken with a grain of salt, and perhaps some pepper.

This is day 44 of #500daysofAI

What is #500daysofAI?
I am challenging myself to write and think about the topic of artificial intelligence for the next 500 days with the #500daysofAI. Learning together is the greatest joy so please give me feedback if you feel an article resonates.
