The Royal Society report on Machine Learning (April 2017)

Julian Harris
Towards Data Science
10 min read · Apr 28, 2017


The Royal Society launched a (128 page!) report on machine learning. I’ve waded through it and taken notes on the points that are usefully different from other coverage, or conveyed in a usefully different way. Here and there I’ve linked to related work I’ve found and added a few observations of my own.

Purdah disclosure (up to June 9 2017): as I consult to the civil service, I need to be particularly sensitive to the UK snap election guidelines on politically-motivated statements that could be seen as influencing the election. Where there are quotes, they’re verbatim from the material and not my own opinion.

“Ensuring the best possible environment for the safe and rapid deployment of machine learning will be essential for enhancing the UK’s economic growth, wellbeing, and security, and for unlocking the value of ‘big data’. Action in key areas — shaping the data landscape, building skills, supporting business, and advancing research — can help create this environment.”

Extracting value from data

  • Creating a data environment to support machine learning. UK Biobank is a good example of this in practice.
  • Extending the lifecycle of open data requires open standards. “The Government has a key role to play in the creation of new open standards”

Creating value from machine learning

  • Schools and universities need to be updated to focus on data skills and research (and EQ). Note also FT.com’s view that EQ is critical in schools too, which puts EQ and data literacy together as the most valuable school skills.
  • “The UK’s approach to immigration should support the UK’s aim to be one of the best places in the world to research and innovate, and machine learning is an area of opportunity in support of this aim”

“Almost 90% of the world’s data is estimated to have been produced within the last five years.”

“The term ‘machine learning’ is not one with high salience for the public; research by the Royal Society and Ipsos MORI showed that only 9% of people recognise it. However, many people are familiar with specific applications of machine learning, and interact with machine learning systems every day.”

Artificial intelligence

The term ‘artificial intelligence’ lacks a broadly agreed definition, but has variously been described as:

  • “[…automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning…” (Bellman, 1978)
  • “The art of creating machines that perform functions that require intelligence when performed by people.” (Kurzweil, 1990)
  • “The study of the computations that make it possible to perceive, reason, and act.” (Winston, 1992)
  • “The branch of computer science that is concerned with the automation of intelligent behaviour.” (Luger and Stubblefield, 1993)
  • “…that activity devoted to making machines intelligent, and intelligence is that quality that enables an entity to function appropriately and with foresight in its environment.” (Nilsson, 2010)

Developments in machine learning and AI

The key thing here is that AI as a concept has been around almost as long as computers themselves: the term was coined in the mid-50s, at the 1956 Dartmouth workshop.

The only thing to add to the chart below is that in 2014 a chatbot (Eugene Goostman) was claimed, controversially, to have passed the Turing Test for the first time, and alternatives such as a ‘Turing Olympics’ of many different tests have since been proposed as better measures of machine intelligence.

Limitations of existing approaches

Most current techniques need lots of labelled data, hand-labelled by people. Those with the necessary scale have used some clever tricks to solve this (see also the semi-supervised sketch after this list). For instance,

  • Google’s 411 directory service in the US (GOOG-411) helped train their voice recognition systems, and
  • their reCAPTCHA system used across the web helped train image recognition.
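
One related family of tricks is semi-supervised learning: stretch a small hand-labelled set by letting a model pseudo-label the rest of the data itself. Below is a minimal sketch of that idea (not Google’s actual pipeline; the synthetic dataset, 5% label rate, and threshold are illustrative) using scikit-learn:

```python
# Sketch: stretching a small hand-labelled set with self-training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=2000, random_state=0)

rng = np.random.default_rng(0)
unlabelled = rng.random(len(y)) > 0.05  # keep labels for only ~5% of rows
y_partial = y.copy()
y_partial[unlabelled] = -1              # -1 marks "no label" for sklearn

model = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
model.fit(X, y_partial)  # iteratively pseudo-labels confident examples

print("hand-labelled examples:", (~unlabelled).sum(), "of", len(y))
print("accuracy on all data:", model.score(X, y))
```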

Context is very hard to establish: what is “common sense” for humans has no equivalent in computers. This is otherwise known as the “frame problem”.

Humans are good at transferring ideas from one problem domain to another; this remains challenging for computers even with the latest machine learning techniques. The active area of research here is called transfer learning, sketched below.
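
A minimal sketch of the transfer learning idea: reuse a network pre-trained on one domain for a new task. The base model (MobileNetV2 on ImageNet) and the 10-class head are my illustrative choices, not anything from the report:

```python
# Transfer learning sketch: reuse features learned on ImageNet for a new
# task, training only a small classification head on top.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze what was learned on the source domain

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # new target task
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(new_task_images, new_task_labels, epochs=5)  # a small dataset suffices
```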

Standard kinds (canonical) of problems that machine learning helps solve

Medium doesn’t offer tables (!!! HTML has had tables since the mid-90s… go figure…) so I’ve dumped an image of the report’s table below, and I made a proper table of it here.
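
To make the canonical categories concrete, here’s a toy pass over three of the standard problem types (classification, regression, clustering) that tables like the report’s typically cover; the datasets are synthetic:

```python
from sklearn.datasets import make_classification, make_regression, make_blobs
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# Classification: predict a discrete label from labelled examples.
Xc, yc = make_classification(n_samples=200, random_state=0)
print("classification acc:", LogisticRegression().fit(Xc, yc).score(Xc, yc))

# Regression: predict a continuous quantity.
Xr, yr = make_regression(n_samples=200, noise=10, random_state=0)
print("regression R^2:", LinearRegression().fit(Xr, yr).score(Xr, yr))

# Clustering: group unlabelled data with no "right answers" supplied.
Xb, _ = make_blobs(n_samples=200, centers=3, random_state=0)
print("cluster labels:", KMeans(n_clusters=3, n_init=10).fit_predict(Xb)[:10])
```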

Extracting value from data

“90% of the world’s data has been created in the last 5 years.” — though IBM states it’s the last 2 years. The report states 2.5bn GB created per day, but that data is from 2014. This site has a live estimate showing 2.8bn TB/day as of 27 April 2017 (roughly 1000x as much as the report indicates).
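
Since the two figures use different units, a quick sanity check of that “1000x” (decimal prefixes assumed):

```python
# Sanity-checking the headline figures above (decimal prefixes).
GB, TB, EB, ZB = 10**9, 10**12, 10**18, 10**21

report_figure = 2.5e9 * GB  # "2.5bn GB created a day" (2014 data)
live_estimate = 2.8e9 * TB  # "2.8bn TB/d" from the live counter

print(report_figure / EB)             # 2.5  -> 2.5 exabytes per day
print(live_estimate / ZB)             # 2.8  -> 2.8 zettabytes per day
print(live_estimate / report_figure)  # 1120 -> roughly the "1000x" gap
```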

“The UK has already committed to the G7 Open Data Charter, the principles of which state that government data should be openly published by default, usable by all, and released in high-quality formats that allow automated processing”

Building skills at every level

Data, and its impact on our lives, will only become more ubiquitous at home and at work, and therefore requires basic “data literacy” from an early age. Shut down or restart (2012) catalysed positive changes in UK school curricula, and “The rapidly evolving data science needs of other disciplines will need to be considered in future curricula and qualification reviews”.

“the ability of people to understand the world in which they live and work increasingly depends on their understanding of scientific ideas and associated technologies and social questions”

[Figure from the report: key concepts in machine learning and the recommended school age for introducing them, paving the way for data literacy.]

“At a minimum, newly-qualified professionals should be data literate, and able to ‘think algorithmically’”

“As an interdisciplinary area, machine learning is not currently well-served by existing funding models”

“If the field is to advance in a way that represents a broad range of interests, then it will need to draw from a broad pool of people, if it is to avoid developing a form of research and development ‘myopia’. Attracting a range of people to the field will also be essential in improving the overall strength of the UK’s skills base in this area.” This is a tough one to crack: computer science in general has struggled with this for a very long time and it doesn’t seem to be changing much. It’s also unclear exactly what form this attraction would take.

Machine learning in society

The Royal Society, with Ipsos MORI, surveyed 978 members of the public. While only 9% recognised the term “machine learning”, most (89%) recognised its applications. This seems reasonable: technology is successful when it’s transparent.

Attitudes: “One of the clearest messages from these public dialogues is that the public do not have a single view of machine learning; attitudes, positive or negative, vary depending on the circumstances in which machine learning is being used.”

Opportunities

Machine learning as a brand rides on “big data”, which has made it more accessible. In summary, the opportunities people identified were:

  • More objective
  • More accurate
  • More efficient
  • New growth
  • Contribute to addressing large-scale societal challenges such as the ageing population

Concerns and plans to address them

It wasn’t all rainbows and unicorns, though: there were some concerns, all of which I think are pretty reasonable. Helpfully, the report points to active research on all of them:

  • Potential for causing harm: solution research includes reassurance that systems would be “robust”, strong evidence of safety, and human oversight or final decision-making
  • De-skilling: over-reliance on machine learning, and the resultant societal de-skilling or relegation of skills to niches. (This is arguably inevitable, though, and has been happening for 500 years.) In the extreme, the survey revealed a public concern about machines replacing people in the workplace, and a “changing relationship with an activity of personal significance: feelings of freedom or autonomy”. For care providers it’s important to underscore that accuracy is not the sole measure of success: human empathy and personal engagement are particularly important in health and social care.
  • Restriction of choices and of human experience: the risk of missing nuanced interpretation (which is only possible with much larger volumes of data and greater interpretation capability).

The context of concerns

People’s level of concern in the survey varied based on the aspects below:

  • “the perceived intention of those using the technology;
  • who the beneficiaries would be;
  • how necessary it was to use machine learning, rather than other approaches;
  • whether there were activities that felt clearly inappropriate; and
  • whether a human is involved in decision-making.
  • Accuracy and the consequences of errors were also key considerations.”

“Fundamentally, the concerns raised in these public dialogues related less to whether machine learning technology should be implemented, but how best to exploit it for the public good. Such judgements were made more easily in terms of specific applications, than in terms of broad, abstract principles.”

Use of data, privacy and consent

“As it is put to use in this new environment, machine learning reframes existing questions about privacy, the use of data, and the applicability of governance systems designed in an environment of information scarcity”

“Machine learning further destabilises the current distinction between ‘sensitive’ or ‘personal’ and ‘non-sensitive’ data: it allows datasets which at first seem innocuous to be employed in ways that allow the sensitive to be inferred from the mundane.”

“For example, research has shown how accessible digital records, such as Facebook ‘Likes’ by which Facebook users express positive sentiment about content on the social media site, can be used to infer sensitive personal attributes. By analysing a user’s Facebook Likes, researchers found that they could predict characteristics such as sexual orientation, ethnicity, religious or political views, intelligence, or gender. Although Likes are publicly available by default, users do not necessarily expect these to reveal more sensitive information. Yet such information may now be predicted from their online activity, or digital records that are available to a range of organisations”
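
A toy sketch in the spirit of that research: the Kosinski et al. (2013) study behind this passage fitted linear models to a real user × Like matrix (via SVD); here everything is synthetic and the pipeline is reduced to a single logistic regression, purely to show the shape of the inference:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
likes = (rng.random((2000, 500)) < 0.05).astype(float)  # user x Like matrix (0/1)

# A synthetic "sensitive trait" correlated with a handful of Likes.
trait = (likes[:, :25].sum(axis=1) + rng.normal(0, 0.5, 2000) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(likes, trait, random_state=0)
clf = LogisticRegression(max_iter=2000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))  # far above chance
```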

“In the past, consent has been pitched as the hallmark of good data governance. However, it is by no means clear that, even in cases where consent is used as the ‘gold standard’ for data use, this consent is informed. Although up to 33% of people claim they usually read website terms and conditions, server-side surveys indicate that only 1% actually have…New approaches to navigating questions about consent are therefore needed”

Fairness and statistical stereotyping

“There are two different ways in which machine learning applications may give rise to biases or lack of fairness.

  1. When machine learning algorithms inherit subjective biases which are present in the data on which the algorithms are trained. [e.g. a model trained on a résumé pool with few women in it may predict few women for future pools too. -J]
  2. A different source of bias or unfairness can arise when a machine learning algorithm correctly finds that a particular attribute of individuals is valuable in predicting outcomes, in contexts where society may deem use of such an attribute inappropriate”, [such as gender for insurance premiums — something that is now illegal in the EU, though it can also be inferred from other parameters; see the sketch after this list. -J]
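
On that last point, a toy illustration of why simply dropping a protected attribute doesn’t settle the matter: correlated “proxy” features can still reveal it. All data and effect sizes here are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000
gender = rng.integers(0, 2, n)  # the protected attribute (0/1)

# Innocuous-looking features that happen to correlate with the attribute.
proxy_1 = gender + rng.normal(0, 0.8, n)  # e.g. an occupation signal
proxy_2 = gender + rng.normal(0, 0.8, n)  # e.g. a shopping-history signal
X = np.column_stack([proxy_1, proxy_2])

# 'gender' is excluded from X, yet it is recoverable from the proxies.
acc = cross_val_score(LogisticRegression(), X, gender, cv=5).mean()
print(f"gender recovered from proxies alone: {acc:.0%}")  # roughly 80%
```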

Interpretability and transparency

Neural networks with hidden layers also hide their reasoning. For high-impact decisions, interpretability and transparency are important both for users, who want to understand why a decision was made, and for developers, who want to improve the system.

“A ‘right to an explanation’ is implied in legal frameworks surrounding the use of data, namely the new European General Data Protection Regulation”

“In attempting to resolve issues of transparency, there can be trade-offs between accuracy and interpretability. At a basic level, hard-coded rules are more interpretable, but more opaque approaches such as neural networks are often more powerful and can produce more accurate results. This trade-off between transparency and performance has different consequences in different applications, raising questions about whether the decision to prioritise transparency or accuracy needs to be made explicitly and, if so, how and by whom.” Also, hard-coded models (as in traditional programming) are immensely more costly to build, as every nuance has to be put in place by hand.
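
The trade-off in miniature. The report contrasts hard-coded rules with neural networks; as a compact stand-in, this sketch compares a depth-2 decision tree (whose whole logic is printable) with a random forest that typically scores higher but explains less:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

interpretable = DecisionTreeClassifier(max_depth=2, random_state=0)
powerful = RandomForestClassifier(n_estimators=200, random_state=0)

print("shallow tree:", cross_val_score(interpretable, X, y, cv=5).mean())
print("forest:      ", cross_val_score(powerful, X, y, cv=5).mean())

# The shallow tree's entire "reasoning" fits in a few printed lines:
print(export_text(interpretable.fit(X, y)))
```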

Accountability

“Amongst the public, the most common response to the question “who should be held accountable when machine learning ‘goes wrong’” was “the organisation the operator and machine work for” (32%), and, though this did not constitute a majority response, it did clearly outweigh the number of respondents who believed the machine itself should be held accountable (3%).” This last point is interesting: I’d argue the machine itself should be held accountable only if it is aware of the consequences of its actions and has sufficient sentience to provably guide its decisions towards incentives and away from disincentives.

Three approaches to addressing liability:

  • “The so-called Bolam Test, or whether a reasonable human professional would have acted in the same way;
  • Strict liability — or liability without negligence or intent to harm — for autonomous vehicles; and
  • Third party liability, akin to provisions made for dangerous dogs.”

Potential social consequences associated with the increased use of machine learning

Bubbles / echo chambers: stratification of societal groups through content and messaging that reinforces existing beliefs (e.g. see the WSJ’s “Blue Feed, Red Feed” view of Facebook). A toy simulation of the feedback loop follows below.
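
A deliberately crude simulation of that reinforcement loop: a greedy recommender that optimises clicks, paired with a user whose views drift towards what they click. No real recommender is this simple; the numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
belief = 0.6                        # user starts mildly on one side
shown = {"agree": 1, "oppose": 1}   # click counts the recommender learns from

for _ in range(500):
    side = max(shown, key=shown.get)  # greedy: serve whatever got clicks
    p_click = belief if side == "agree" else 1 - belief
    if rng.random() < p_click:
        shown[side] += 1
        if side == "agree":           # agreeable clicks harden the belief
            belief = min(0.99, belief + 0.002)

print(shown)   # exposure ends up heavily one-sided
print(belief)  # and the belief has drifted toward the extreme
```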

New power asymmetries: privacy vs access. “new power asymmetries may be created; a ‘Faustian pact’ whereby individuals willingly give up privacy in exchange for efficiency, convenience, or a need to access a service, without giving informed consent”

Human-machine interaction: as this becomes more common, it is starting to affect social norms. For example, Alexa does not require children to say “please” or “thank you”.

The future of machine learning

Possibly to be summarised at a later date.

View the full report (PDF, 128 pages)
