Thoughts and Theory

Corporate Imperatives and the Future of AI, Part I

Has economic ideology co-opted AI research and development?

Travis Greene
Towards Data Science
15 min read · Oct 8, 2021


Is the corporate pursuit of maximizing shareholder value slowly destroying human value? Photo by Reza Hasannia on Unsplash

The COVID-19 pandemic marks an inflection point in the human relationship with digital technology. Online co-working, remote learning, and remote work are the new normal. Our daily average screen time is likely higher now than at any time in the past. Vast networks of new technologies, including sensors, devices, and apps, combined with advances in 5G, the Internet of Things (IoT), edge computing, and AI/ML, offer new possibilities for automated data collection and for influencing user behavior to achieve public policy and corporate objectives. Here I consider reinforcement-learning (RL)-based technology as it is commercially applied to individual human persons and societies.

Why focus on RL? RL is likely our best shot at achieving Artificial General Intelligence, according to a recent paper by David Silver and Richard Sutton of DeepMind. And DeepMind, of course, is part of Alphabet Inc., Google’s parent company.

This post examines the following questions:

  • Is the corporate pursuit of maximizing shareholder value, coupled with a neoliberal economic ideology, pushing AI research, development, and applications in potentially pathological directions?
  • Should we trust that market-driven AI research and development in the “private interest” will align with human-centric values of transparency, justice, fairness, responsibility, accountability, trust, dignity, sustainability, and solidarity?

A Brief Introduction to Reinforcement (Learning)

Today, the initially qualitative descriptions of reinforcement given by Behaviorist psychologists can be, and increasingly are, implemented digitally and put towards the goal of maximizing corporate profit. These digital interventions are effectively invisible to their targets: the individual persons using and interacting with the digital system. Isolated on our personal devices and apps, we navigate the Infosphere atomistically, ignorant of how or why my Google results are “personalized” one way and yours another. Yet the data platform sees, knows, and analyzes all. The platform is digitally omniscient, nearly omnipotent, but not obviously benevolent.

What started as a linguistic description of reinforcement, known as Thorndike’s Law of Effect, has gradually evolved into a precise algorithm. The “Law” states that past behaviors resulting in positive rewards are more likely to be repeated in the future, while those resulting in negative rewards (punishment) are less likely. Crucially, this idea can now be implemented by a computer to automatically learn to manipulate the behavior of both human individuals and collectives at scale. A recent paper provides key empirical evidence for the “adversarial” use of RL on humans.

RL is a deceptively simple formalism. Its algorithms make the Law of Effect explicit: they provide a mechanism by which an idealized agent can optimally adapt its behavior to its environment as a result of past experience. The theoretical and practical properties of RL algorithms are complex, however. Diverse fields such as computer science, statistics, economics, and the cognitive sciences each have a considerable literature devoted to them. But only now have these scientific insights trickled down into consumer-facing technology.

Think of RL as a general method for solving problems. It works by learning instrumental associations between perceptual stimuli (states) and actions in order to maximize some positive reward determined by the algorithm’s designer. A basic interaction sequence looks like this: the agent observes the state of the environment, selects an action, and the environment transitions into a new state and emits a reward. The agent’s goal is to figure out which sequences of actions in which states lead to maximum cumulative reward. Unlike supervised learning paradigms, the agent only receives “evaluative” feedback about the quality of its action in a given state via the reward signal; there is no “ground truth” that tells it the “correct” action.
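To make the loop concrete, below is a minimal tabular Q-learning sketch in Python. The two-state environment and its rewards are purely hypothetical, chosen only to illustrate the observe-act-reward cycle:

```python
import random

# A toy two-state environment: a hypothetical stand-in for any system
# whose dynamics the agent must discover by trial and error.
STATES = ["A", "B"]
ACTIONS = ["left", "right"]

def step(state, action):
    """Environment transition: returns (next_state, reward).
    The rewards are arbitrary, chosen only for illustration."""
    if state == "A" and action == "right":
        return "B", 1.0
    return "A", 0.0

# Q-table: the agent's running estimate of how good each action is per state.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

state = "A"
for _ in range(1000):
    # Epsilon-greedy: mostly exploit current estimates, occasionally explore.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    # Update the estimate toward the reward plus the discounted value
    # of the best action available in the next state.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state
```

Note that the agent is never told which action was “correct”; it only ever sees the reward its own choices produced.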

RL is more than simply an interesting topic in animal learning. RL has successfully been applied to problems in robotics, clinical decision making and personalized medicine, energy grid optimization, and “Just in Time” behavioral interventions. RL is a means to optimally control any complex system, including a digital platform consisting of millions of human users. Anything you can imagine as a sequence of decisions can be modeled using the RL formalism.

RL is also arguably the best theory we have for how animals learn complex behaviors via trial and error. In computational neuroscience, for instance, parameters of RL models can be fit very precisely to experimentally collected human (and monkey) behavioral data, suggesting that regions of our brains implement something like an RL algorithm. In fact, the phasic activity of dopamine neurons in the ventral tegmental area (VTA) is believed to encode reward prediction error, corresponding to surprise about events and subsequently driving attention to changes in our environments. The concept of reward prediction error is also used to update state value estimates in RL based on temporal difference learning, thus connecting AI with neuroscience.
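The shared computation is easy to state. In temporal-difference (TD) learning, the reward prediction error and the resulting value update take the standard textbook form:

```latex
% Reward prediction error: how much better or worse the outcome was
% than the current value estimate predicted.
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
% The visited state's value is nudged in proportion to the error:
V(s_t) \leftarrow V(s_t) + \alpha \, \delta_t
```

The neuroscientific claim, then, is that phasic dopamine activity tracks something like δ_t.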

Economic Implications of RL-based Personalization

If our brains really do implement RL algorithms, then when we interact with RL-based systems on social media platforms, we are thus engaged in something like a multi-agent RL scenario. But do our interests align? Are we cooperating, or locked in conflict with one another?

If we are in conflict, what are the long-term cognitive effects on individuals, and the political and social ramifications for societies? Shoshana Zuboff, Karen Yeung, Mireille Hildebrandt, and other academics are concerned with how the knowledge obtained through decades of animal learning, feedback control, AI/ML, and computational neuroscience research is now being deployed on social media platforms. Their work demonstrates how economic imperatives bias the application of scientific knowledge towards technology producing short-term, immediate profits at the expense of long-term human wellbeing.

But a careful historical examination of failures in predicting the behavior of complex systems, particularly economic and natural ecosystems, should remind us of the importance of epistemic humility. Control may be a necessary, but not sufficient, condition for understanding complex natural phenomena. Climate change and financial crises, anyone?

Below is a very simple schematic illustrating how the basic process of feedback control could be implemented by a personalized recommender system on a social media platform.

Better and more behavioral data collection by platforms means more precise measurements of the “state of the system,” which is the collective of human users interacting on the platform. Better predictions of future states make it easier to direct, push, or “control” the system in the desired direction. Reinforcement learning-based recommender systems function as adaptive controllers of “environments” of users on platforms, aimed at achieving corporate objectives. Source: Author.
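In code, that control loop might look like the following sketch. Every name here is a hypothetical placeholder, not any platform’s actual API:

```python
import random

# Hypothetical feedback-control loop for an RL-based recommender.
CATALOG = ["video_a", "article_b", "ad_c"]

def observe_state(history):
    """Measurement: compress recent behavior into a coarse state."""
    return "engaged" if history[-3:].count("click") >= 2 else "idle"

def select_action(state):
    """Control: choose the intervention expected to move the system
    toward the platform's objective (here, a fixed toy policy)."""
    return "video_a" if state == "engaged" else random.choice(CATALOG)

def simulate_user(item):
    """Stand-in for the real environment: the user's response."""
    return "click" if random.random() < 0.5 else "ignore"

history = []
for _ in range(10):
    state = observe_state(history)   # measure
    item = select_action(state)      # act on the user
    event = simulate_user(item)      # observe the response
    history.append(event)            # close the loop
```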

Behaviorism, Science, and Deep Neural Nets

The Behaviorists were the first to see the connection between probability, statistics, and reinforcement, even though they relied on an eccentric methodology that eschewed large samples and statistical inference. Yet with the discovery of Herrnstein’s Matching Law, Behaviorists realized that with proper reinforcement of initially random trial-and-error behavior, animals can learn to “match” statistical regularities in their environments. That is, animals (and humans) seem to naturally adjust the frequency of their behavior to approximate the underlying reward statistics of the environment.
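In its simplest two-alternative form, the Matching Law can be written as:

```latex
% Herrnstein's Matching Law: the relative rate of responding matches
% the relative rate of reinforcement across two alternatives.
\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}
% B_i = rate of behavior on alternative i; R_i = rate of reinforcement.
```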

By intervening in the right way and at the right time, an animal’s behavior can be incrementally shaped or controlled in order to solve arbitrary tasks, such as when pigeons learn — via reward shaping — to stack wooden blocks to reach a food-dispensing lever. This suggests that whatever our brains do when we behave, they implement something like an RL algorithm.

Behaviorism is based on an engineering-centric philosophy of science descending from the Austrian physicist Ernst Mach, later adapted by the physicist Percy Bridgman, and finally made popular by BF Skinner. Mach believed science should aim at compressing knowledge and not needlessly expand its theoretical vocabulary, except when doing so could explain a broader swath of empirically observable phenomena. A rigorous scientific methodology ideally cuts out all noise and redundancy and keeps only signal, which is then communicated to future generations of scientists to further compress and understand, until perhaps a unifying theory of everything might be found.

Mach’s influential ideas combine aspects of evolutionary theory and information theory to explain the accumulation and compression of scientific knowledge. His ideas would later influence positivist philosophies of science, which removed talk of unobservable causes and other “metaphysical, transcendental mumbo-jumbo” stemming from the philosophies of Kant and Hegel. Behaviorism demanded that mentalistic terms of intentionality be replaced by behaviors operationally defined by the experimenter. For example, in a 1938 paper Edward Tolman equates the “lookings back and forth” of a rat in a maze with a “behavioristic definition of conscious awareness.” Unobservable mental states, such as “beliefs,” “desires,” or “intentions” about (or representing) features of an organism’s environment, were now theoretically off-limits.

But Behaviorism was not entirely wrong. Although frequently criticized today, Behaviorism’s insights have permeated a variety of fields. Its major sin consisted in its eagerness to reduce all of human experience to the level of non-conscious, zombie-like behavior, when in fact only some of our behavior is driven by these sub-personal, goal-directed processes. Humanists dislike how this unjustly downplays the role of self-consciousness in human deliberation and reduces human autonomy, free will, and moral responsibility to a sequence of environmental interactions which may be controlled and shaped by not-always-benevolent social engineers.

Reinforcement Schedules to Control Platform Users

Behaviorism, when combined with advances in deep learning, marks a new era of platform-based observation, prediction, and control of users. The old Behaviorist guard of JB Watson, BF Skinner, Egon Brunswik, Edward Tolman, and others relied on synchronic definitions of emitted behavior. They only cared whether a rat or pigeon pressed a lever at some point in time. It didn’t matter whether the rat used its left paw or its right; both actions were identical as far as the operational definitions were concerned. To see the behaviorist influence on personalization research today, just replace “lever press” with “click.” The same kind of thinking applies.

A Skinner Box. Source: Wikipedia.

Early behaviorists lacked the technology to analyze the complex temporal sequence of behaviors (states) which led up to a particular lever press. Pen and paper is simply not very efficient. BF Skinner developed the Skinner Box to deal with this problem. The device physically implements a particular reinforcement schedule: a set of rules describing how certain target behaviors are rewarded or punished during operant conditioning. Typically, fixed or variable intervals of time, or fixed or variable ratios of responses, are used.
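As a rough illustration, here is how two classic schedules might be expressed in Python. This is a sketch under simplified assumptions, not Skinner’s original electromechanical circuitry:

```python
import random

# Each schedule returns a function that reports, response by response,
# whether the current response earns reinforcement.

def fixed_ratio(n):
    """Reinforce every n-th response (e.g., FR-10)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return respond

def variable_ratio(mean_n):
    """Reinforce after a random number of responses averaging mean_n.
    Unpredictable payoffs like this produce high, persistent response
    rates -- the schedule behind slot machines and, arguably, infinite feeds."""
    count, threshold = 0, random.randint(1, 2 * mean_n - 1)
    def respond():
        nonlocal count, threshold
        count += 1
        if count >= threshold:
            count, threshold = 0, random.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

schedule = variable_ratio(5)
rewards = sum(schedule() for _ in range(1000))  # roughly 200 reinforcements
```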

In a 1958 article published in American Psychologist, Skinner described how a reinforcement schedule is implemented by a “programming system” specified in physical terms. I have bolded key words and phrases to illustrate the connection with RL as it might be applied by a social media platform aiming to reinforce user behaviors conducive to its business goals.

A schedule of reinforcement is arranged by a programming system which can be specified in physical terms. A clock is introduced into the circuit between key and magazine so that the first response made to the key after a given interval of time will be reinforced. A counter introduced into the circuit establishes a contingency in terms of number of responses emitted per reinforcement.

As the result of careful scheduling, pigeons, rats, and monkeys have done things during the past five years which members of their species have never done before. It is not that their forebears were incapable of such behavior; nature had simply never arranged effective sequences of schedules.

…The new principles and methods of analysis which are emerging from the study of reinforcement may prove to be among the most productive social instruments of the twentieth century.

Innovations in Reinforcement Learning

Skinner’s primitive “programming system” has today evolved into commercially-focused RL. RL’s business value stems from its ability to invisibly and automatically intervene in digital spaces, thus generalizing costly A/B testing. Indeed, advances in RL technology increasingly draw the interest of corporations and governments for achieving financial and public policy objectives (e.g., automating and personalizing “nudges”).

What is worrying, though, is that deep neural network (DNN) architectures based on recurrent neural networks (RNNs), including LSTMs and newer attention-based Transformers, can now be used in a self-supervised manner to discover and encode these complex behavioral interaction sequences as states within the RL formalism. DNNs allow us to apply black-box function approximators to automatically generate the state representations that condition the RL agent’s actions. DNNs can handle nonlinear user-item relationships and unlabeled, unstructured data, such as images, text, and interaction sequences, vastly expanding personal data collection possibilities.
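A minimal sketch of the idea in PyTorch. The architecture and dimensions are illustrative assumptions, not any platform’s actual model:

```python
import torch
import torch.nn as nn

class UserStateEncoder(nn.Module):
    """Compresses a user's recent interaction sequence into a fixed-size
    state vector that an RL policy can condition on."""
    def __init__(self, n_event_types=1000, embed_dim=32, state_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_event_types, embed_dim)  # one id per event/item
        self.gru = nn.GRU(embed_dim, state_dim, batch_first=True)

    def forward(self, event_ids):
        # event_ids: (batch, seq_len) integer-coded clicks, views, likes, ...
        x = self.embed(event_ids)
        _, h = self.gru(x)       # final hidden state summarizes the sequence
        return h.squeeze(0)      # (batch, state_dim): the RL "state"

encoder = UserStateEncoder()
history = torch.randint(0, 1000, (1, 20))  # 20 recent events for one user
state = encoder(history)                   # feed this to a policy network
```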

These new technologies further allow us to relax the strict “Markovian” assumptions of traditional Markov Decision Processes used in RL. Regularities in these long, complex behavioral sequences can be automatically and efficiently discovered — with little human intelligibility — and platforms can direct or shape them with the right selection of actions aimed at achieving reward. From the platform’s perspective, control just means the selection of an optimal action given that a user is in a particular state (as represented by an RNN) in order to achieve maximum cumulative reward.

Utility and Platform Economics

Platform data scientists assume behavioral metrics such as “dwell time” and “click-through rate” allow them to peek into the inner world of platform users’ minds. Millions of dollars are spent on data science research focused on business-centric “engagement” metrics, derived from a smorgasbord of marketing theory (which equates observable behavior with expressions of “customer satisfaction”) and normative economic theory (which treats observed behavior as revealing preferences for various states of affairs or objects, ranked by their utility to the user).

Modern data scientists have kept much of the methodology of Behaviorism, while updating their theoretical vocabulary to include unobservable latent mental states. But the reasons for doing so were pragmatic, not epistemological. Positing unobservable latent preferences allows better, more accurate predictions of user behavior and seemingly pays lip service to respecting the consumer as a unique thinking, feeling, and self-conscious person whose subjective experience differs from that of a sea squirt. In practice, however, the same theoretical formalism of utility applies to sea squirts as well as human platform users.

As philosophers such as Amartya Sen, Elizabeth Anderson, and Charles Taylor pointed out long ago, utility is the monistic common currency of economics. All questions of value are reduced to utility. It is now also an ascendant concept in data science, even though it suffers from grave theoretical and philosophical problems. Yet on the whole, engineers and data scientists appear happy to rely on the idea of utility. Think of how many Data Science for Social Good initiatives have been framed around utilitarian notions of maximizing social welfare.

Utility conflates at least three senses of the word value: economic value, moral value, and numerical value. It is easy to see the appeal of this move for algorithm-driven digital platforms. Moral values are obscure objects, not well understood, and often contentious. The reduction of moral value to a single number also expressible in monetary terms makes decision-making simple and, crucially, computationally tractable. It permits a naturalistic reduction of ethics to scientific computation. If moral value can be reduced to pleasure (i.e., “interest”), and pleasure to behavioral phenomena (i.e., dwell time), and behavioral phenomena to neuronal activity, for instance, then questions about “incommensurable values” or the “diversity of goods” can be politely sidestepped.

The normative aspect of utility theory as used in data science is particularly dangerous because it smuggles in a variety of unjustified assumptions about human nature and cognition. The theory of revealed preferences is built on top of a theory of rational choice: observed behaviors are assumed to be the outcome of an idealized rationalization process in which utilities and probabilities are combined in an optimal way, as described by Bayes’ rule.
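Spelled out, the assumed picture is Bayesian expected-utility maximization: beliefs about states of the world s are updated on observed data d via Bayes’ rule, and the chosen action maximizes expected utility:

```latex
% Belief update by Bayes' rule, then choice by expected utility:
P(s \mid d) = \frac{P(d \mid s)\, P(s)}{P(d)}
\qquad
a^{*} = \arg\max_{a} \sum_{s} P(s \mid d)\, U(a, s)
```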

If revealed preferences are viewed normatively, as the result of an idealized Bayesian agent with complete information and infinite time and compute resources, then we also get a justification for the ideology of consumer sovereignty. Who are we to criticize the choices of a consumer when they are the output of a provably optimal mathematical procedure? It is no coincidence that a godfather of modern economic theory, Paul Samuelson, developed the theory of revealed preference when Behaviorism dominated academic psychology.

Digital Marketing Funnels and Rat Mazes

Presaging the modern idea of a digital marketing funnel, in 1938 Edward Tolman wrote about the power of the rat maze as a tool for investigating the processes of animal learning.

Let me close, now, with a final confession of faith. I believe that everything important in psychology… can be investigated in essence through the continued experimental and theoretical analysis of the determiners of rat behavior at a choice-point in a maze.

Given economic imperatives, RL will increasingly be used as an automated means of human behavior modification aimed at corporate goals. For digital marketing, RL can be used to systematically influence user behavior at select choice points in order to achieve the goal of product purchase (conversion). The sections of the funnel can be viewed as user states, and marketing interventions (e.g., notify/display/recommend item x) as actions. An optimal marketing policy selects the best action given a user state. Better user surveillance and data capture means finer-grained state representations. Source: Seobility.

In case the comparison is not clear: the users of platforms are like rats in a maze, and the platform is the corporate experimenter who wants you to run to the end of the maze and purchase some product or click on some ad. Because data platforms can globally monitor and record user behavior in a Panopticon-like manner, RL algorithms can now be used to shape and direct that behavior towards the business goals of the platform. In fact, by specifying the reward function carefully, several optimization goals can be realized at once. For example, rewards might be a function of both click behavior (to control browsing) and purchases (to make sure the RL agent selects actions ultimately conducive to purchases, and not simply lots of clicks).
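As a toy illustration, such a composite reward might look like this in Python. The states, actions, and weights are all hypothetical:

```python
# A hypothetical marketing-funnel MDP: funnel stages as states,
# marketing interventions as actions.
FUNNEL_STATES = ["awareness", "interest", "desire", "purchase"]
ACTIONS = ["notify", "display_ad", "recommend_item", "do_nothing"]

def reward(event, click_weight=0.1, purchase_weight=1.0):
    """Composite reward: small credit for clicks, large credit for purchases,
    so the agent optimizes for conversions rather than clicks alone."""
    return click_weight * (event == "click") + purchase_weight * (event == "purchase")
```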

You can imagine how an RL agent might select an optimal policy (i.e., a sequence of intervention actions) to push users towards buying more products, voting for certain political candidates, or, to give examples of non-evil applications of behavior modification, losing weight or quitting smoking. Digital marketing and computational advertising are now where the science of behavior modification meets RL. Combined with Generative Adversarial Networks (GANs) and the ability to create adaptable, “personalized” audio, video, and image-based content, the possibilities are endless. Billions of dollars are to be made, and little is currently regulated in this space.

Imagine the scale of this in a society such as China. RL agents can interact with billions of users on a daily basis on platforms such as TaoBao, WeChat, MeiTuan WaiMai, JD, and DiDi ChuXing. A massive multi-agent RL system could be constructed using a centralized data storage “replay buffer” owned and operated by the Chinese Communist Party, in which myriad specialized RL agents share and improve behavior policies by pooling their state-action-reward histories. Although far-fetched, this isn’t impossible: Stanford’s 2021 AI Index report reveals that China has now surpassed the US in AI journal citations.
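Mechanically, the pooling idea is simple. Here is a minimal sketch of a shared replay buffer; it is entirely hypothetical and not a description of any existing system:

```python
import random
from collections import deque

class SharedReplayBuffer:
    """Many agents append (state, action, reward, next_state) transitions;
    each agent then trains on experience drawn from the whole pool."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

pool = SharedReplayBuffer()
# An agent on one platform logs a transition...
pool.add("browsing", "recommend_item", 1.0, "purchased")
# ...and an agent elsewhere trains on a batch drawn from the shared pool.
batch = pool.sample(32)
```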

Photo by Joshua Hoehne on Unsplash

Data Science in the Private Interest & Doubt Mongering

Mixing corporate imperatives with automated behavior modification techniques is a recipe for social disaster. Case in point: the Wall Street Journal recently published a piece revealing how Facebook made changes to its newsfeed algorithm around 2017–2018 in order to counteract declining user engagement. These changes were designed to encourage more personal connections and improve mental health, but they backfired. The algorithm instead effectively became an outrage- and misinformation-slinging machine, and corporate profit imperatives were a major reason why Zuckerberg and others did nothing to counteract the ensuing social externalities.

Facebook’s Head of Research offered an official rebuttal of the Wall Street Journal’s claims. But it’s a bit like expecting Philip Morris to give you unbiased information about the effects of smoking. Given corporate profit imperatives, it’s hard to know what to believe. Unfortunately, the prognosis doesn’t appear good if academic brain drain continues at its current pace. Stanford’s AI Index report states that 65% of graduating AI PhDs in North America now go into industry, compared to only 44% in 2010.

So can we do anything to combat the inevitable use of RL-based technologies as a means to greater corporate profit at the expense of our mental health and political and social stability? In his book Science in the Private Interest, Sheldon Krimsky details the history behind the “thinned ranks” of free-minded scientists studying “neglected human needs and injustices” in areas of tobacco, product safety, environmental pollution, workplace toxics, and efficacy and side effects of drugs on adults and children.

Academics such as Naomi Oreskes and Krimsky have revealed coordinated corporate strategies of doubt mongering designed to purposefully confuse and mislead the public on issues of collective importance such as smoking and climate change. The strategy rests on the premise that uncertainty about harmful consequences favors the status quo: hazards framed as highly unlikely, even if large, do not warrant precautionary action. Why wouldn’t platforms use a similar strategy to “muddy the research waters” around social media use and teen well-being and mental health? We are already seeing something similar being done in response to Facebook’s whistleblower Frances Haugen.

Corporate data platforms hold the keys to answering similar questions with important social implications. They collect and control the behavioral data of billions of users, are flush with cash from advertising, and can intervene at will into the online experiences of millions of users in order to test causal hypotheses about things such as emotional contagion. It’s hard for cash-strapped, publication-hungry data science academics to compete with that.

Towards Data Science in the Public Interest

Krimsky forcefully argues that we must defend the integrity of America’s research institutions, particularly the research university. Why? Because

universities are more than wellsprings of wisdom… they are the arenas through which men and women of commitment can speak truth to power on behalf of the betterment of society.

We’ll likely need the help of government regulators and agencies to make public interest data science research a reality. But legislators need to recognize that meaningful platform research requires not only access to troves of behavioral data, but to the recommendation serving systems themselves. Good science aims at uncovering the unobservable causes of phenomena; it must go beyond simply describing observable correlations in datasets previously curated by platforms. Lastly, academic data science research in the public, rather than private, interest adheres to international human subjects research ethics protocols, which we may also need to update to reflect new paradigms of socially responsible data science research.

So here’s to the training of a new generation of academic data scientists who are willing to speak truth to power.
