
The Social Graphs in Viral Growth

Means to Reason About Collective Outcomes from Interpersonal Interactions: Computer Simulations of Mathematical Models

At the time of writing, the coronavirus disease COVID-19, caused by the virus SARS-CoV-2, is spreading over the globe. The virus spreads from a contagious person to another person, and after a time of incubation, the spread might continue to yet another person.

In one view of the matter, the virus spreads in the social graph of the world. The social graph is a mathematical representation of relevant social interactions of people in a community – in this case the real-world interactions that have a non-zero probability of mediating the spread of the virus.

In what follows I illustrate how the spread depends on features of the social graph.

I use computer simulations of a mathematical model built on abstractions. Therefore the simulations deal with the general phenomena of how things or beliefs can replicate in networks, and how the nature of said networks in turn impacts the nature of the spread. Economists, marketers, counter-terrorists and epidemiologists, among others, apply the theory of networks and graphs in their practice.

The simulations I run connect macro statements of viral spread in a community to details of the mechanism of interaction and replication. In other words, intuitions about person-to-person interactions can be linked to macroscopic or collective properties of the viral spread. The bottom-up computer simulation is by design explainable in terms of beliefs about fundamental properties. That is one of the virtues of this approach.

I do not claim, however, to present quantitative predictions or advice for individual or collective action for the particular COVID-19 disease. Local health authorities are better places to look for that information. Rather, I am motivated by the belief that a more definite way to reason, as a community, about complex events can be attained if the model of the world and its logic is explicit. Any preference we declare for one action over another, or judgement of the past, builds on beliefs about how the world works. Our only choice is whether, and how, we make these beliefs – that is, the model – evident.

A far greater effort of calibration against specific data and validation is needed in order to go beyond the qualitative statements I formulate. But viral spread is reasoned about now more than ever. That is why I present these tools.

In the next sections I present and discuss results of the computer simulations. At the end I provide a few technical method details along with links to the Python code I created and used.

The Key Quantities Established: Viral Spread in the Completely Mixed Social Graph

Picture a community wherein everyone interacts with everyone with no preference for one over another. An idealized community like this is a suitable baseline against which to compare subsequent simulations with additional heterogeneity added to the social graph.

A well-mixed community maps onto a complete graph. The nodes represent the individuals in the community. The edges connect pairs of individuals that can meet, at which time a viral transmission can take place with a typically low, but non-zero, probability.

Small illustration of four nodes (individuals) in a complete social graph

In what follows I employ a general Time Unit (TU) rather than hour, day, or week in order to not create a false sense of specificity. The reader can if so desired make the substitution of terminology.

For the baseline scenario, further picture the following: All individuals, except one, start out healthy. Transmission of the virus in one TU takes place with probability, P, when a contagious individual meets a non-infected and non-immune individual in the social graph. Once infected, the individual in the baseline scenario progresses as follows:

  • ~1–3 TUs after the infection event, the individual turns contagious and can start to infect other individuals they meet.
  • After ~6–10 TUs, the disease is revealed at which point the individual self-quarantines and ceases to meet (and therefore infect) other individuals in the social graph.
  • After ~10–15 TUs the individual recovers and thereafter is immune to further infections, and returns to their previous place in the social graph.
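The progression above can be sketched as a small timing model. This is only an illustration under a stated assumption: each transition time is drawn uniformly from the window given in the list, whereas the model itself uses per-TU Bernoulli trials (see Model Details). The names `Progression`, `sample_progression`, and `state_at` are hypothetical and not taken from the author's code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Progression:
    """Transition times in TUs, counted from the infection event."""
    contagious_at: int
    quarantined_at: int
    recovered_at: int


def sample_progression(rng, recovery_window=(10, 15)):
    # Assumption: uniform draws over the windows stated in the text.
    return Progression(
        contagious_at=rng.randint(1, 3),
        quarantined_at=rng.randint(6, 10),
        recovered_at=rng.randint(*recovery_window),
    )


def state_at(progression, tus_since_infection):
    """Return the disease state a given number of TUs after infection."""
    if tus_since_infection >= progression.recovered_at:
        return "immune"
    if tus_since_infection >= progression.quarantined_at:
        return "quarantined"
    if tus_since_infection >= progression.contagious_at:
        return "contagious"
    return "incubating"


rng = random.Random(42)
example = sample_progression(rng)
timeline = [state_at(example, t) for t in range(16)]
```

Passing `recovery_window=(13, 18)` to `sample_progression` gives the delayed recovery variation introduced further down.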

I omit modelling death events. They are of course the most momentous in real-world cases. My current focus is on the short-term viral spread, so this omission is justified.

Further imagine two minor variations to the baseline scenario:

  • More Likely Transmitter: same as the baseline scenario, except that the probability of transmission when a meeting takes place is double that of the baseline.
  • Delayed Recovery: same as the baseline scenario, except that recovery takes place ~13–18 TUs after the infection event.

These three scenarios are simulated with one thousand individuals in the complete social graph.
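As a rough sketch, the three scenarios and the complete social graph could be written down as follows. The value of `BASELINE_P` is a placeholder, since the article does not state the transmission probability P, and the `Scenario` class is a hypothetical container rather than part of the author's code.

```python
from dataclasses import dataclass, replace
from itertools import combinations


@dataclass(frozen=True)
class Scenario:
    name: str
    transmission_probability: float  # P per meeting
    recovery_window: tuple  # (earliest, latest) TUs after infection


BASELINE_P = 0.01  # hypothetical value; the article does not give P

baseline = Scenario("baseline", BASELINE_P, (10, 15))
more_likely_transmitter = replace(
    baseline,
    name="more likely transmitter",
    transmission_probability=2 * BASELINE_P,
)
delayed_recovery = replace(
    baseline, name="delayed recovery", recovery_window=(13, 18)
)

# Complete social graph: every pair of the 1,000 individuals shares an edge.
n_individuals = 1000
edges = list(combinations(range(n_individuals), 2))
```

With one thousand individuals the complete graph has 1000 × 999 / 2 = 499,500 edges, which is why per-edge meeting probabilities need to be small.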

Common to the three scenarios is that early in the progression of the disease, the number of infected individuals grows exponentially. That is because infected individuals are relatively rare at that time, so the vast majority of meetings an infected individual takes part in have the potential to transmit the virus to a healthy individual.

The growth levels off some time later and reaches a peak value. When many meetings involve pairs of already infected individuals, or immune individuals, fewer meetings in the social graph can contribute to the spread of the virus in the community, and recovery rate soon exceeds infection rate.

The differences in the collective outcome between the scenarios stem from these relations:

  • If a virus transmits more easily between individuals, the number of infected individuals grows more rapidly in the beginning, and ramps down quicker too.
  • If a disease takes more time to recover from, the number of infected individuals at any TU is greater, because the flow of recovered individuals back to the social graph takes more time to begin.

These are quite intuitive relations and a good ground for the further analysis.

A different way to characterize the collective outcome is to determine the number of healthy individuals a contagious individual transmits the viral particle to before they self-quarantine and later recover.

From the first 100 infected individuals’ disease progression, the following histograms are obtained.

In the more likely transmitter scenario, an infected individual is more likely to infect a greater number of individuals before self-quarantine. That is a different way to quantify that the growth in infections is greater in that scenario. The delayed recovery is no different from baseline with respect to this quantity, which is because they differ not with respect to the nature of transmission, but with respect to recovery.

The average number of transmissions per infected individual, measured while nearly everyone else is still susceptible, is called the basic reproduction number, R0, in the epidemiology literature. In some models of viral spread, these histograms are inputs to the model, not outcomes, and they can under reasonable theoretical assumptions be modelled as negative binomial distributions.
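To make the quantity concrete, here is a small sketch of how the histogram and its average could be computed from a record of who infected whom. The log format, a list of (infector, infectee) pairs, and the function name are assumptions made for illustration; the data is made up.

```python
from collections import Counter
from statistics import mean


def transmissions_per_individual(transmission_log, infected_ids):
    """Count how many others each infected individual infected.

    transmission_log: iterable of (infector, infectee) pairs.
    infected_ids: individuals to report on, including those who infected nobody.
    """
    counts = Counter(infector for infector, _ in transmission_log)
    return {person: counts.get(person, 0) for person in infected_ids}


# Made-up log: individual 0 infects 1 and 2, individual 1 infects 3.
log = [(0, 1), (0, 2), (1, 3)]
per_person = transmissions_per_individual(log, infected_ids=[0, 1, 2, 3])

# Number of individuals for each transmission count (the histogram).
histogram = Counter(per_person.values())
average = mean(per_person.values())
```

Individuals who infect nobody must be included with a count of zero; leaving them out would inflate the average.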

The outcomes and relations described so far can be obtained from standard growth models using ordinary differential equations or simple types of random sampling. They can therefore be mathematically very efficient. Such models make various assumptions of homogeneity, including with respect to the social graph.
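One standard example of such a model is the SIR (susceptible, infected, recovered) system of ordinary differential equations. The sketch below integrates it with a simple Euler scheme. It is a textbook stand-in, not the model used in this article: it has no incubation or quarantine states, and the rates `beta` and `gamma` are hypothetical values chosen only to show the qualitative behaviour.

```python
def simulate_sir(population, beta, gamma, initial_infected=1.0,
                 steps=3000, dt=0.1):
    """Euler integration of dS/dt = -beta*S*I/N and dI/dt = beta*S*I/N - gamma*I."""
    s, i, r = population - initial_infected, initial_infected, 0.0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        trajectory.append((s, i, r))
    return trajectory


def peak(trajectory):
    """Return (step index, number infected) at the peak of the infected curve."""
    index = max(range(len(trajectory)), key=lambda k: trajectory[k][1])
    return index, trajectory[index][1]


# Hypothetical rates: doubling beta mimics a more easily transmitted virus.
slower = simulate_sir(1000, beta=0.5, gamma=0.1)
faster = simulate_sir(1000, beta=1.0, gamma=0.1)
slower_peak = peak(slower)
faster_peak = peak(faster)
```

Comparing `slower_peak` and `faster_peak` shows the higher transmission rate producing an earlier and higher peak. Every individual in this model is interchangeable, which is exactly the homogeneity assumption mentioned above.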

Since I model the individual-to-individual dynamics explicitly, I can introduce heterogeneity and see how that manifests itself in the collective outcomes. That is the connection between intuitions and macro observables I wish to explore.

Social Distancing Benefits All, Even When Not All Participate – A Possible Coordination Dilemma In The Making

I model social distancing along two dimensions: how much social distancing is done by a given individual, and how many individuals engage in such social distancing. A non-uniform degree of social distancing across the individuals in the community is therefore modelled.

I continue with the complete social graph and employ the same baseline scenario as described above. In addition, variations of the following kind are also simulated: X% are Y% more cautious. This signifies that X% of the individuals in the simulated community are Y% less likely to transmit or receive the virus when they interact with another individual in the social graph. The remaining (100-X)% of individuals are no different from the individuals in the baseline scenario.
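A variation such as "50% are 25% more cautious" could be encoded as in the sketch below. How the caution of the transmitter and the receiver combine in one meeting is not spelled out in the text, so the multiplicative rule used here is an assumption, as are the function names and the base probability.

```python
import random


def assign_caution(n_individuals, cautious_share, caution_level, rng):
    """Return one multiplier per individual.

    A randomly chosen share of individuals gets 1 - caution_level;
    everyone else keeps the baseline multiplier 1.0.
    """
    n_cautious = round(cautious_share * n_individuals)
    cautious = set(rng.sample(range(n_individuals), n_cautious))
    return [
        1.0 - caution_level if person in cautious else 1.0
        for person in range(n_individuals)
    ]


def pair_transmission_probability(base_p, factors, transmitter, receiver):
    # Assumption: the two individuals' reductions combine multiplicatively.
    return base_p * factors[transmitter] * factors[receiver]


# "50% are 25% more cautious" in a community of one thousand.
factors = assign_caution(
    1000, cautious_share=0.5, caution_level=0.25, rng=random.Random(0)
)
```

Under this rule a meeting between two cautious individuals is less likely to transmit than a meeting where only one of them is cautious, which is one way a non-distancing individual can benefit from the caution of others.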

Again, one thousand individuals are simulated.

The curves in the diagram exhibit the property that has become known as flattening the curve. The greater the proportion of individuals in the community who are cautious in their social interactions, the slower the growth and consequently the lower the peak of infections.

The diagram below shows how many other individuals the first 100 infected individuals transmit the virus to in turn. As expected, the more cautious the interaction, the fewer transmissions take place per infected individual.

In these simulations the population of the community consists of individuals of two types: those who practice social distancing, and those who do not. The latter are, in other words, identical to the individuals that make up the population of the baseline scenario.

That part of the simulated populations, that is the non-distancing individuals, is analyzed separately.

The diagram shows relative numbers of non-distancing individuals who are infected in the different simulated communities. Note that the curves have been scaled so they can be compared and not simply reflect the different number of non-distancing individuals in the communities.

The key conclusion is that individuals who do not alter their own actions still become infected at a lower rate as the number of other individuals in the community who practice social distancing increases.

The people who practice social distancing also benefit from their practice, though less so if fewer other individuals engage in the same practice. The collective outcome is, in other words, cooperative, not just a summation of its parts.

Under some conditions this can therefore function similarly to a classical coordination dilemma, akin to the famous Prisoner’s Dilemma. If for example there is a cost to practicing social distancing, the individual optimum is to be the sole person who does not practice social distancing, while all others do. Once some individuals "defect" from the distancing practice, the individual benefit of remaining socially distant decreases, and the incentive to defect increases even more. The collective optimum is therefore unstable.

It must be said, however, that just because a dilemma might be present, it may not become activated in real-world settings. Studies of real-world humans, institutions and cultures, by Ostrom and Sigmund among others, have shown that additional norms and taboos have often evolved to indirectly short-circuit coordination dilemmas. Once we can look back at the COVID-19 pandemic, it would be instructive to empirically investigate how different communities across the world have navigated these potential coordination dilemmas.

The effects of social distancing on the more likely transmitter scenario (see previous section) are contrasted against baseline in the diagram below.

Considerable social distancing in the more likely transmitter scenario still leads to more rapid infection growth than the baseline scenario.

This illustrates that even though social distancing qualitatively always flattens the curve, other properties of the viral spread matter greatly to the precise quantitative reduction that can be attained through such mitigating efforts. If the peak needs to be flattened below a given threshold, how much social distancing is required, and by how many individuals, must be inferred from quantitative considerations of relevant properties of the viral spread.

I have only modelled modest heterogeneity among the individuals of the simulated communities. It is likely that the degree and extent of social distancing will vary as a function of perceptions about the severity of the disease in the community, which change over the course of the viral spread. To account for this, more complex models of intentions and behaviours of individuals are needed. So-called agent-based models provide the means to model behaviour from beliefs and intentions under certain assumptions of self-interest. When faced with extreme events for which, by definition, little retrospective data can be used for training of a model, first-principles models of human and social mechanics are required instead.

The Virus Random Walk in Caveman Graphs and Small Worlds: Social Graph Topology Can Be An Amplifier of Uncertainties in Person-to-Person Dynamics

In graph theory a host of archetypical graphs are defined and studied. The complete graph used in the previous section is one. It is an extreme case of connectedness: each node is within one step of all other nodes.

A far less connected graph is the so-called caveman graph. I will use the related relaxed caveman graph as the social graph and explore how that alters the nature of the spread.

The blue dots in the image represent nodes in the graph which correspond to healthy individuals; the four red dots are infected individuals; and the white overlapping and intersecting lines are edges which connect pairs of nodes. The nodes are placed such that nodes with many connections between each other are close. Each of the ten white bundles contains 100 nodes, which form an almost completely connected group, or clique. Only 1% of edges in the graph are between groups.

A close-up of a bundle of almost completely connected nodes.

The relaxed caveman graph represents an archetypical community wherein everyone is tightly connected to a small sub-community. Some individuals are directly connected to two or more sub-communities, though the vast majority of social connections are within the group for any individual. However, there always exists a path through intermediaries between all individuals in the community. Everyone is at least a friend-of-a-friend-of-a-friend of everyone else.
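A graph of this kind can be generated with the `relaxed_caveman_graph` function in the networkx library. The sketch below uses ten groups of 100 nodes as described above. The rewiring probability of 0.01 and the seed are assumptions, picked so that roughly 1% of the edges end up between groups; they are not confirmed settings from the simulations.

```python
import networkx as nx

N_GROUPS, GROUP_SIZE = 10, 100
REWIRE_PROBABILITY = 0.01  # assumption: gives roughly 1% between-group edges

graph = nx.relaxed_caveman_graph(
    N_GROUPS, GROUP_SIZE, REWIRE_PROBABILITY, seed=7
)


def group_of(node):
    # Caveman graphs number their nodes consecutively, clique by clique.
    return node // GROUP_SIZE


between_groups = sum(
    1 for u, v in graph.edges() if group_of(u) != group_of(v)
)
share_between = between_groups / graph.number_of_edges()

# True when a path exists between every pair of individuals.
connected = nx.is_connected(graph)
```

The few between-group edges counted in `between_groups` are the only routes by which the virus can leave a group.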

The animated image below shows one simulated disease trajectory in steps of five TUs. In addition to blue and red nodes, purple ones represent the reactively self-quarantined individuals, and the green ones the recovered and immune individuals.

The spread appears to progress in two bursts. First, five groups start to transition from blue to red, purple then green. Second, when the first groups are mostly green, the remaining five groups begin to significantly shift from blue.

This appearance of sudden jumps in the spread is a consequence of the fact that transmissions that take the virus between groups are rare. In effect, there is an invisible transition barrier.

However, there is nothing in the viral spread or the social graph that requires the spread to proceed in two bursts like above. The particular outcome depends on how a few critical low probability events turn out. The diagram below shows the number of infected over time in nine independent simulations of the relaxed caveman social graph (different randomizations of the social and transmission events), with otherwise baseline virus properties.

There is a great deal of variance in the viral spread around the mean (or expected) progression of the infection in the relaxed caveman social graph. Repeat number 2, the clearly bimodal curve, is the one shown in the animated image.

The key point is that the uncertainty in the input information translates into a great deal of uncertainty in the outcome. The random variables in the model quantify the uncertainty of our knowledge of the individual components of the process. For example, on a given day, individual A might meet with individual B, a meeting which, if it happens, might lead to a transmission of the virus. If individuals A and B belong to different groups of nodes, this meeting and transmission might prove – after the fact – to be the critical event that made the virus spread to a mostly healthy group of individuals.

For the complete social graph, the same uncertainty of input knowledge exists. However, viral spread in that type of social graph has a far narrower band of possible outcomes in collective output. So, identical person-to-person transmission properties translate into distinct collective outcomes as the topology of the social graph is altered.

Another consequence is that two identical groups in the relaxed caveman social graph can experience quite different outcomes at a point in time for no other reason than that a single rare event took a different turn. Some differences observed retrospectively have no grand structural causes in need of being explained and turned into prospective statements.

Another class of graphs is the small world graph. Many real-world networks are highly clustered (one’s friends are also likely friends with each other), and have modest shortest path lengths (if one can find it, there is a relatively short path between almost any two individuals in the network). The Newman-Watts-Strogatz graph algorithm is used to construct an archetypical small world social graph, and the spread of the baseline scenario virus is simulated again.

The variability in outcomes is not as great as for the relaxed caveman, though greater than for the complete social graph.

Another archetypical graph is the ring lattice. The individuals are clustered in the identical manner as in the small-world graph. The shortest path length, though, between any two nodes can be quite high. However, there are no groups of nodes as distinct as in the relaxed caveman graph.
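The contrast between the ring lattice and the small world graph can be checked numerically with networkx. The sketch below assumes the small world graph corresponds to `newman_watts_strogatz_graph`, which adds shortcut edges to a ring lattice, and builds the plain lattice with `watts_strogatz_graph` at rewiring probability zero. The sizes, the shortcut probability, and the seed are hypothetical and kept small so the path lengths compute quickly.

```python
import networkx as nx

n, k = 200, 10  # hypothetical: 200 nodes, each tied to its 10 nearest neighbours

# p=0.0 means no edge is rewired, leaving a pure ring lattice.
ring_lattice = nx.watts_strogatz_graph(n, k, p=0.0, seed=1)
# Shortcuts are added on top of the lattice with probability 0.1 per edge.
small_world = nx.newman_watts_strogatz_graph(n, k, p=0.1, seed=1)

summary = {}
for name, graph in [("ring lattice", ring_lattice), ("small world", small_world)]:
    summary[name] = {
        "clustering": nx.average_clustering(graph),
        "path_length": nx.average_shortest_path_length(graph),
    }
```

In `summary`, both graphs show substantial clustering, while the handful of shortcuts in the small world graph cuts the average shortest path length considerably.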

For reference, the simulations with the different social graphs are set up such that the total number of meetings in a TU is the same regardless. Hence, differences between graph types do not reflect any modification in aggregate totals of meetings per TU.

The mean values of the growth are put together for easier comparison. The small world (10%) graph is like the small world social graph above but with additional paths between non-clustered nodes.

For more clustered social graphs, the viral spread is more protracted and the peak value of infected individuals is hence reduced, though any single outcome for these social graphs can deviate more from the mean values.

If we picture the virus as an object that is moving randomly in the social graph, its ability to discover a path to a certain individual will depend on the topology of the graph. Bottlenecks, the number of shortcuts between clusters, and so on, are features of the social graph that, in addition to how readily the virus transmits between persons, determine collective outcomes, including the range of possible outcomes. After all, we observe only one reality, so range matters to statements about possible futures to be prepared for (a topic I have discussed in more detail with respect to urban fires in Toronto).

The social graphs above are illustrative archetypes. They are only meant to typify how variations in social features can impact collective outcomes under otherwise identical conditions. What the actual social graph is for any specific case, like SARS-CoV-2 spreading in a city, cat memes on social media, or political influence in Renaissance Florence, is a non-obvious empirical question.

What If Recovery Does Not Lead to Immunity and Other Probing Questions?

All simulations so far have assumed that immunity to further infections follows upon recovery. For virus infections this tends to be the case, since the immune system can adapt to a specific pathogen. However, it is not guaranteed: persons can recover through the non-adaptive parts of the immune system or through medical treatment (not vaccine), and the virus can mutate during the course of the epidemic such that immunological memory of the previous infection ceases to apply.

The dynamics of this can be complicated and would among other things depend on the segment of the population that succumbs to the first round of infections and how adaptable the specific virus is. I have too little present understanding of the range of possibilities. So I will not explore this in any further detail, only note that this can matter a great deal for the long-term progression of the viral spread.

Another feature that matters to the spread in the social graph is proactive quarantine, that is when people are quarantined regardless of their known state of infection. In its simplest form it could be modelled as a group of nodes in the caveman graph that is entirely isolated from the other groups. However, large-scale quarantines typically involve friction, with individuals who resist, or where duration, not status of epidemic, determines if the quarantine applies. These dynamics require more thought to include, and I defer that to another time.

Model Details

Next I provide brief remarks about the model. Link to code at the bottom.

An individual is modelled as having a disease state. The state can undergo a state transition. The green boxes in the image below represent disease states in the mathematical model, and the black text above the arrows represent state transitions.

The arrows represent conditional relations (not a flow-chart). Activation, and downstream state transitions, are modelled as stochastic events. The uncertainty in the knowledge of the dynamics is quantified through cumulative probability distributions. For example, the activate transition requires infection to be present. Given that condition, the outcome of a Bernoulli trial determines if a state transition takes place on a given day. The probability parameter for the Bernoulli trial depends on how many days have passed since the infect transition, such that a state transition becomes more likely as time passes.
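One way to obtain such a time-dependent Bernoulli parameter from a cumulative distribution F is the discrete hazard: the probability of a transition on day t, given that none has happened yet, is (F(t) - F(t-1)) / (1 - F(t-1)). The sketch below illustrates the idea with a discrete uniform distribution over a window of days. The choice of distribution and the function names are assumptions, since the article does not specify which distributions are used.

```python
import random


def uniform_cdf(t, earliest, latest):
    """P(transition has happened by the end of day t), uniform over earliest..latest."""
    if t < earliest:
        return 0.0
    if t >= latest:
        return 1.0
    return (t - earliest + 1) / (latest - earliest + 1)


def daily_transition_probability(t, cdf):
    """Bernoulli parameter for day t, given no transition before day t."""
    not_yet = 1.0 - cdf(t - 1)
    if not_yet <= 0.0:
        return 1.0
    return (cdf(t) - cdf(t - 1)) / not_yet


def days_until_transition(cdf, rng, max_days=1000):
    """Run one Bernoulli trial per day until the transition fires."""
    for day in range(1, max_days + 1):
        if rng.random() < daily_transition_probability(day, cdf):
            return day
    return None


def activation_cdf(t):
    # Activation window of 1 to 3 days after the infect transition.
    return uniform_cdf(t, 1, 3)


hazards = [daily_transition_probability(day, activation_cdf) for day in (1, 2, 3)]
sample = days_until_transition(activation_cdf, random.Random(3))
```

For this window the daily probabilities rise from 1/3 to 1/2 to 1, so the transition becomes more likely with each day that passes and is certain by the end of the window.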

The infect state transition depends on an interaction between individual and a neighbour in the social graph. The steps that can lead to infection are:

  1. Loop over all edges in the social graph.
  2. For an edge, let the outcome of a Bernoulli trial determine if the meeting represented by the edge takes place. The probability parameter is the weight associated with the edge.
  3. If a meeting takes place, then only if one individual is contagious and the other is neither quarantined nor immune nor already infected, attempt a viral transmission with some probability. If successful, then enact the infect state transition.
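The three steps can be sketched as a single function that processes one time step. This is a simplified illustration, not the code from the repository: the state labels, the edge format of (a, b, weight) tuples, and the choice to apply all new infections only after the loop has finished are assumptions.

```python
import random


def run_meetings(edges, state, transmission_probability, rng):
    """Process one time step and return the set of newly infected individuals.

    edges: iterable of (a, b, weight), where weight is the meeting probability.
    state: dict mapping individual -> "susceptible", "infected", "contagious",
           "quarantined", or "immune". Updated in place.
    """
    newly_infected = set()
    for a, b, weight in edges:                      # step 1: loop over edges
        if rng.random() >= weight:                  # step 2: does the meeting happen?
            continue
        for source, target in ((a, b), (b, a)):     # step 3: either direction
            if (
                state[source] == "contagious"
                and state[target] == "susceptible"
                and rng.random() < transmission_probability
            ):
                newly_infected.add(target)
    # Assumption: infections take effect after all meetings are processed.
    for person in newly_infected:
        state[person] = "infected"
    return newly_infected


state = {0: "contagious", 1: "susceptible", 2: "susceptible", 3: "immune"}
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 3, 1.0)]
infected_now = run_meetings(edges, state, transmission_probability=1.0,
                            rng=random.Random(0))
```

In this toy example individual 1 is infected, individual 3 is protected by immunity, and individual 2 stays healthy because individual 1 is not yet contagious.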

In the simulations described above the weights on the edges are set such that the total expected number of meetings within one TU is proportional to the number of nodes. That is true regardless of social graph topology, which implies the differences between graph topologies are not from any difference in the number of attempted viral transmissions.
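A simple way to satisfy this constraint is to give every edge the same weight and scale it by the ratio of nodes to edges. The sketch below makes that assumption; the repository may distribute weights differently, and `meetings_per_node` is a hypothetical parameter.

```python
def uniform_edge_weight(n_nodes, n_edges, meetings_per_node=1.0):
    """Meeting probability per edge so that expected meetings = meetings_per_node * n_nodes."""
    weight = meetings_per_node * n_nodes / n_edges
    if weight > 1.0:
        raise ValueError("too few edges to reach the requested number of meetings")
    return weight


n_nodes = 1000
complete_edges = n_nodes * (n_nodes - 1) // 2   # complete graph
caveman_edges = 10 * (100 * 99 // 2)            # ten cliques of 100 nodes

w_complete = uniform_edge_weight(n_nodes, complete_edges)
w_caveman = uniform_edge_weight(n_nodes, caveman_edges)

# Expected meetings per TU are the sum of the edge weights.
expected_complete = w_complete * complete_edges
expected_caveman = w_caveman * caveman_edges
```

The sparser graph receives a higher weight per edge, so both graphs produce the same expected number of meetings per TU.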

The Python code is freely available in this GitHub repository without any quality guarantees or support. The code builds on the pandas and NetworkX libraries in particular, and the graphics are created with the Bokeh library, except the network images, which are created with Cytoscape.

