
AI + Safety with DNV-GL

Executive Summary and Current Status Reviewed

Photo by @eadesstudio on Unsplash

After writing about OpenAI and their research papers on safety [1] [2] [3], I decided to look closer to home. Since I live in Norway, I wondered whether anyone was focusing on AI safety here. Of course there is, and there are likely far more than I can find. However, I was lucky to discover a position paper from a researcher at DNV-GL on the topic. The paper is called AI + Safety: Safety Implications for Artificial Intelligence: Why We Need To Combine Causal- and Data-Driven Models.

As I mentioned when covering Google and OpenAI previously, my interpretation of this work may be lacking due to my limited experience and the short time available to review the paper (one day). I will do my best. Still, I hope you stick with me for a short examination of this position paper by Simen Eldvik, who is a Principal Research Scientist at DNV GL – Risk & Machine Learning.

About DNV-GL

Since 1864, the purpose of DNV-GL has been to safeguard life, property and the environment. DNV GL is an internationally accredited registrar and classification society headquartered in Høvik, Norway. They have 350 offices spread over 100 countries. They are the leading provider of risk management and quality assurance services to the maritime, oil and gas, and power and renewables industries. They are also global leaders in certifying management systems of companies across all types of industries, including healthcare, food and beverage, automotive and aerospace. Their current revenue is 19.5 billion NOK [$2.23 billion], and they invest 5% of their revenue every year in research and development. Eldvik works in their research department with Risk & Machine Learning.

About Simen Eldvik

Simen has been doing research related to machine learning (ML), artificial intelligence (AI), and physics-based constraints for rare and high-consequence scenarios. He has been working to understand how ML methods and virtual testing can be used for predictions where little or no data exists. This work has shown that, to ensure the safety of high-risk engineering systems, ML needs to be combined with known causations. He wrote his PhD thesis in physical acoustics and material science, titled "Measurement of non-linear acoustoelastic effect in steel using acoustic resonance", at the University of Bergen. Prior to this he earned an M.Sc. in physics.

How is safety defined by Eldvik?

Eldvik refers to the ISO/IEC guide in defining safety as "freedom from risk which is not tolerable" (ISO). Further he says: "This definition implies that a safe system is one in which scenarios with non-tolerable consequences have a sufficiently low probability, or frequency, of occurring. AI and ML algorithms need relevant observations to be able to predict the outcome of future scenarios accurately, and thus, data-driven models alone may not be sufficient to ensure safety as usually we do not have exhaustive and fully relevant data."

As such, there are a few aspects worth mentioning to further his argument. There is a paper by the European Union Agency for Fundamental Rights (FRA) called Data quality and artificial intelligence – mitigating bias and error to protect fundamental rights. I have previously written about EU vs. Facebook, due to their failure to protect user data [4]. If we move beyond the data, there is also a considerable question of fairness in AI [5] and the crisis of diversity in AI [6].

Reducing Risk with Sensor Data and Data-Driven Models

As Eldvik mentions, accidents will occur, so there is the question of how to reduce risk. He has five clear suggestions, which I have copied and shortened slightly:

1) We need to utilize data for empirical robustness. High-consequence and low-probability scenarios are not well captured by data-driven models alone, as such data are normally scarce. However, the empirical knowledge that we might gain from all the data we collect is substantial. If we can establish which parts of the data-generating process (DGP) are stochastic in nature, and which are deterministic (e.g., governed by known first principles), then stochastic elements can be utilized for other relevant scenarios to increase robustness with respect to empirically observed variations.

Because of this first point, let me elaborate on the data-generating process (DGP): (a) the data collection process, being the routes and procedures by which data reach a database (often dynamic); (b) the statistical model used to represent supposed random variations in observations, often in terms of explanatory and/or latent variables; (c) a notional and non-specific probabilistic model (relating to chance/probability not directly described or explicitly set down) including the random influences that combine to lead to individual observations, where one instance would be the supposed justification of the "common occurrence" of the normal distribution in terms of a combination of multiple random additive effects. Stochastic means: having a random probability distribution or pattern that may be analysed statistically but may not be predicted precisely.
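To make the split between deterministic and stochastic parts of a DGP concrete, here is a minimal sketch in Python. The free-fall relationship, the noise level and the sensor framing are my own illustrative assumptions, not taken from Eldvik's paper; the point is simply that once the deterministic physics is known, the residuals isolate the stochastic element, which can then be reused for other scenarios.

```python
import numpy as np

rng = np.random.default_rng(0)

def deterministic_part(t, g=9.81):
    """Known first principles: free-fall distance as a function of time."""
    return 0.5 * g * t**2

# Observed data: the DGP is deterministic physics plus stochastic sensor noise.
t_obs = np.linspace(0.1, 2.0, 50)
noise_sigma_true = 0.15                      # assumed noise level for the example
y_obs = deterministic_part(t_obs) + rng.normal(0.0, noise_sigma_true, size=t_obs.shape)

# With the deterministic part known, the residuals capture only the stochastic element,
# and its empirical distribution can be carried over to other relevant scenarios.
residuals = y_obs - deterministic_part(t_obs)
noise_sigma_est = residuals.std(ddof=1)
print(f"Estimated noise level: {noise_sigma_est:.3f} (true: {noise_sigma_true})")
```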

2) We need to utilize causal and physics-based knowledge for extrapolation robustness. If the deterministic part of a DGP is well known, or some physical constraints can be applied, this can be utilized to extrapolate well beyond the limits of existing observational data with more confidence. For high-consequence scenarios, where no, or little, data exist, we may be able to create the necessary data based on our knowledge of causality and physics.

Again, as a side note: extrapolation from a source to a target is a promising approach to utilize external information when data are sparse. In computer science, robustness is the ability of a computer system to cope with errors during execution and to cope with erroneous input. Robustness can encompass many areas of computer science, such as robust programming, robust machine learning, and Robust Security Network.
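A small sketch can show why a physics-constrained model extrapolates more reliably than a purely data-driven one. The quadratic-through-the-origin "physics", the training range and the noise are hypothetical assumptions of mine, not from the paper; they only illustrate the idea of constraining the model form when extrapolating beyond the data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical system where physics tells us the response is quadratic through the origin.
true_coeff = 2.0
x_train = np.linspace(0.0, 1.0, 20)            # observations only cover a narrow range
y_train = true_coeff * x_train**2 + rng.normal(0.0, 0.05, size=x_train.shape)

# Purely data-driven model: an unconstrained cubic polynomial.
data_driven = np.polynomial.Polynomial.fit(x_train, y_train, deg=3)

# Physics-constrained model: least-squares estimate of the single coefficient in y = a * x^2.
a_hat = np.sum(x_train**2 * y_train) / np.sum(x_train**4)

x_new = 3.0                                     # well outside the training range
print("data-driven extrapolation:", data_driven(x_new))
print("physics-constrained:      ", a_hat * x_new**2)
print("true value:               ", true_coeff * x_new**2)
```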

3) We need to combine data-driven and causal models to enable real-time decisions. For a high-consequence system, a model used to inform risk-based decisions needs to predict potentially catastrophic scenarios prior to these scenarios actually unfolding. However, results from complex computer simulations or empirical experiments are not usually possible to obtain in real-time. Most of these complex models have a significant number of inputs, and, because of the curse-of-dimensionality, it is not feasible to calculate/simulate all potential situations that a real system might experience prior to its operation. Thus, to enable the use of these complex models in a real-time setting, it may be necessary to use surrogate models (fast approximations of the full model). ML is a useful tool for creating these fast-running surrogate models, based on a finite number of realizations of a complex simulator or empirical tests.
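As a rough illustration of the surrogate idea, here is a minimal sketch using a Gaussian process from scikit-learn. The "expensive simulator" is a stand-in function I made up, and the kernel choice is arbitrary; it is not Eldvik's method, only one common way to build a fast approximation from a finite number of simulator runs.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_simulator(x):
    """Stand-in for a slow, complex physics simulation (hypothetical)."""
    return np.sin(3 * x) + 0.5 * x**2

# A finite number of simulator realizations (the expensive part, done offline).
X_train = np.linspace(0.0, 3.0, 15).reshape(-1, 1)
y_train = expensive_simulator(X_train).ravel()

# Fast-running surrogate trained on those realizations.
surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
surrogate.fit(X_train, y_train)

# At run time, the surrogate gives near-instant predictions with uncertainty estimates.
X_query = np.array([[1.37]])
mean, std = surrogate.predict(X_query, return_std=True)
print(f"surrogate: {mean[0]:.3f} +/- {std[0]:.3f}, simulator: {expensive_simulator(1.37):.3f}")
```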

Previously I wrote an article about Advancements in Semi-Supervised Learning with Unsupervised Data Augmentation [7] where I described the curse of dimensionality. In other words, as the dimensionality of the input space grows (hundreds or thousands of dimensions), the volume of the space increases so much that the data becomes sparse, and computing every combination of values in, for example, an optimisation problem becomes infeasible.
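A tiny back-of-the-envelope calculation shows why exhaustive evaluation breaks down. Assuming, purely for illustration, a grid of 10 candidate values per input, the number of combinations grows exponentially with the number of inputs:

```python
# Exhaustive evaluation over a grid of 10 values per input dimension (hypothetical numbers).
values_per_dim = 10
for n_dims in (2, 5, 10, 20):
    combinations = values_per_dim ** n_dims
    print(f"{n_dims:>2} inputs -> {combinations:.1e} combinations to simulate")
```

With 20 inputs that is already 10^20 simulator runs, which is why a fast surrogate trained on a limited set of realizations becomes attractive.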

4) A risk measure should be included when developing data-driven models. For high-risk systems, it is essential that the objective function utilized in the optimization process incorporates a risk measure. This should penalize erroneous predictions, where the consequence of an erroneous prediction is serious, such that the analyst (either human or AI) understands that operation within this region is associated with considerable risk. This risk measure can also be utilized for adaptive exploration of the response of a safety-critical system (i.e., as part of the design-of-experiments).
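One way to read point 4 is that the loss function itself should weight errors by their consequence. The sketch below is my own assumed illustration (the weighting scheme is not from the paper): the same prediction error costs far more in a region marked as safety-critical.

```python
import numpy as np

def risk_weighted_loss(y_true, y_pred, consequence):
    """Squared error scaled by the consequence of being wrong in each scenario.

    `consequence` is an assumed per-sample weight: large values mark scenarios
    where an erroneous prediction would be serious (e.g., near a failure limit).
    """
    return np.mean(consequence * (y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 10.0])        # last point sits in a safety-critical region
y_pred = np.array([1.1, 2.1, 9.0])
consequence = np.array([1.0, 1.0, 50.0])   # same error costs 50x more in the critical region

print(risk_weighted_loss(y_true, y_pred, consequence))
```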

The objective function is also mentioned in the paper on AI Safety by OpenAI. In linear programming, it is the function that one desires to maximize or minimize. In plain English, perhaps too simplified, we may ask: did we do the right thing? If we maximised the value of mineral resource extraction yet forgot possible environmental externalities (read: damage), this could be an example. In the paper they define an accident as:

"Very broadly, an accident can be described as a situation where a human designer had in mind a certain (perhaps informally specified) objective or task, but the system that was designed and deployed for that task produced harmful and unexpected results."

You can read my article on Avoiding Side Effects and Reward Hacking in Artificial Intelligence [2] for further thoughts on this subject and more information on the report called Concrete Problems in AI Safety. Luckily, Simen Eldvik refers to this same paper shortly after, so it seems we have shared interests.

5) Uncertainty should be assessed with rigour. As uncertainty is essential for assessing risk, methods that include rigorous treatment of uncertainty are preferred (e.g., Bayesian methods and probabilistic inference).

Bayesian inference is a method of statistical inference in which Bayes’ Theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is closely related to subjective probability. In probability theory and statistics, Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
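To make the updating step concrete, here is a minimal Beta-Binomial example. The failure-rate framing and the numbers are assumptions of mine for illustration, not from the paper; it simply shows Bayes' theorem turning a prior belief plus new evidence into a posterior with quantified uncertainty.

```python
from scipy import stats

# Prior belief about a component's failure probability: Beta(1, 19), mean 5% (assumed).
alpha_prior, beta_prior = 1, 19

# New evidence: 2 failures observed in 100 trials (hypothetical numbers).
failures, trials = 2, 100

# With a Beta prior and a Binomial likelihood, the posterior is again a Beta distribution.
alpha_post = alpha_prior + failures
beta_post = beta_prior + (trials - failures)
posterior = stats.beta(alpha_post, beta_post)

print(f"posterior mean failure probability: {posterior.mean():.4f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```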

Codified Past Does not Invent the Future

This heading is a shortening of a quote from Cathy O’Neil, the writer of Weapons of Math Destruction, which Eldvik includes in his paper. Cathy says this of big data, and I found it so striking that I would like to share it in this article:

"Big Data processes codify the past. They do not invent the future." – Cathy O’Neil

As such, predictions with models can only do so much; however, it may be useful to play out some scenarios, particularly in areas with a large risk to human life.

This is not mentioned in the position paper by Eldvik, and it is not for me to say whether models could have made any difference, yet in Norway the Alexander Kielland drilling rig capsized in March 1980 and killed 123 people. The stakes in modelling for safety can be high. We cannot, of course, predict or model every condition, yet getting this right can be critical to life or death.

Eldvik presents a figure based on Allen and Tildesley’s Computer Simulation of Liquids:

"Data-driven decisions are based on three principles in data science: predictability, computability, and stability (B. Yu, 2017). In addition, and particularly important for safety-critical systems, the consequences of erroneous predictions need to be assessed in a decision context."

He talks about Kaggle competitions and how the challenges that are "won" may not have been accompanied by empirical rigour. He refers to a paper written by members of the Google Brain team called Winner’s Curse.

Kaggle is a platform for predictive modelling and analytics competitions in which companies and researchers post data and statisticians and data miners compete to produce the best models for predicting and describing the data.

They suggest standards for empirical evaluation: (1) Tuning Methodology; (2) Sliced Analysis; (3) Ablation Studies; (4) Sanity Checks and Counterfactuals; (5) At Least One Negative Result. You can go to either the Google Brain paper or to Simen Eldvik’s position paper for further information.

Conclusion

It is worth having a look at Simen Eldvik’s research and his position paper on the topic of AI + Safety. DNV-GL is a company that works with risk, and as such it is not unlikely that we will see more research from their team in the future. Perhaps there will be a risk measure and certification of algorithms in the future? US lawmakers are proposing a bill that would require large companies to audit machine-learning-powered systems; perhaps we should take similar steps in Norway. If so, DNV-GL would be the perfect actor to contribute to such a policy project.

List of my articles referred to in the text:

[1] Debating the AI Safety Debate

[2] Avoiding Side Effects and Reward Hacking in Artificial Intelligence

[3] Social Scientists and AI

[4] Facebook vs. EU Artificial Intelligence and Data Politics

[5] Artificial Intelligence and Fairness

[6] Artificial Intelligence and Norwegian Gender Quotas

[7] Advancements in Semi-Supervised Learning with Unsupervised Data Augmentation


Thank you for reading. This is day 60 of #500daysofAI. I write one new article every day on the topic of artificial intelligence.

