Your friends (probably) have more friends than you.

The math behind this paradox could help us predict the next pandemic.

Caine Ardayfio
Towards Data Science

Photo by Chang Duong on Unsplash

Most of us think we have a pretty good number sense that, in our daily lives, often goes unchallenged. But sometimes, we encounter statements that seemingly defy logic. Due to millennia of evolution, our brain takes shortcuts to process the stimuli we encounter.

Brain teasers like the Monty Hall problem exploit this very fact to show that the world is not always as it seems. We feel like we are always right—until we aren’t:

Your friends (probably) have more friends than you.

This hard-to-swallow pill is a mathematical truth that runs counter to intuition.

Layman’s Explanation

In layman’s terms, the reason behind this phenomenon is as follows:

Let’s say Alice is a social butterfly who makes friends with almost everyone whereas Bob only has a few friends.

If you were to meet Alice and Bob, you are much more likely to become friends with Alice than with Bob. Generally, you are more likely to be friends with somebody with lots of friends. It follows that your average friend will have more friends than you.

Like many things, this isn’t what the average person thinks. According to an NYU study, most people think they have more friends than their friends.

The Mathematical Intuition

Although this intuitively makes sense, the math behind this is remarkably rich and can even be used to detect the spread of disease earlier than with usual methods. Skip to the next section if you don’t like math :)

Our goal is to calculate, for the average person, how many friends their friends have. This is best visualized as a graph, where a vertex represents a person and an edge represents a friendship. All friendships are bidirectional: for example, if Alice is friends with Bob, then Bob must also be friends with Alice.

A sample graph where each vertex represents a person and each edge represents a friendship.

Our goal is to average the following:

for a given person’s friend, how many friends that friend has.

This can be modeled as the expected value of randomly selecting a person, selecting a random friend of that person, and obtaining the number of friends that friend has. This is a hard problem. We can simplify it by instead taking a random friendship (edge) and randomly selecting one of the two friends (vertices) at either end of the friendship. See equation 1 for the probability that a given vertex is selected.
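Written out, equation 1 would read as follows. This is a reconstruction from the definitions in the text, not necessarily the author’s original typesetting; V, the set of all vertices, is notation added here. It relies on the fact that every edge has two endpoints, so the degrees sum to 2|E| and the probabilities sum to 1.

```latex
p(v) = \frac{d(v)}{2|E|},
\qquad
\sum_{v \in V} d(v) = 2|E|
\tag{1}
```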

d(v) is the degree of vertex v. |E| is the number of edges in the graph. p(v) is the probability that when selecting a random friend, we select vertex v.

Now, we can average the number of friends for the selected vertex. We do this by going over each vertex and taking the product of its likelihood of being selected, p(v), and the number of friends that person has, d(v)—this yields equation 2.
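Spelled out with the same symbols, equation 2 would be the following (again a reconstruction from the description, with V denoting the set of all vertices as an added piece of notation):

```latex
\mathbb{E}[\text{friends of a friend}]
  = \sum_{v \in V} p(v)\, d(v)
  = \sum_{v \in V} \frac{d(v)^2}{2|E|}
\tag{2}
```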

To continue modeling this system, we’re going to have to make some assumptions from our statistical toolbox. Let’s assume that the number of friends someone has is normally distributed with a defined variance and average (the derivation itself only uses the average and the variance, not the bell shape). In statistics, variance represents the expected squared difference between a random variable and the population mean. Equation 3 shows a useful consequence of this definition.

Let mean = mu, variance = sigma², and N = the number of people (vertices) in the graph. The first half shows that variance is the expected squared difference between a random variable and the population mean. The latter half is merely a simplification of this statement.
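In symbols, the identity described here can be sketched as below. One assumption is made explicit: the averages run over people, so N is taken to be the number of vertices, which also means the mean degree is mu = 2|E|/N. V, the set of all vertices, is notation added here.

```latex
\sigma^2
  = \frac{1}{N} \sum_{v \in V} \bigl(d(v) - \mu\bigr)^2
  = \frac{1}{N} \sum_{v \in V} d(v)^2 - \mu^2
\tag{3}
```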

Using equation 3, we can simplify equation 2. We find that, on average, the number of friends for a given friend is as follows.
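Carrying out that substitution step by step gives the result below. This is a reconstruction under the assumption that N is the number of people, so that the degrees sum to 2|E| = N·mu; V, the set of all vertices, is notation added here.

```latex
\begin{aligned}
\sum_{v \in V} d(v)^2 &= N\,(\sigma^2 + \mu^2), \qquad 2|E| = N\mu \\[4pt]
\sum_{v \in V} \frac{d(v)^2}{2|E|}
  &= \frac{N\,(\sigma^2 + \mu^2)}{N\mu}
   = \mu + \frac{\sigma^2}{\mu}
\end{aligned}
```

Since sigma²/mu is never negative, the average friend has at least mu friends, and strictly more as soon as the variance is above zero.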

What does this mean? Well, it means that the average number of friends of friends exceeds the average number of friends whenever the number of friends varies from person to person, which in practice is almost always. To simplify: your friends (probably) have more friends than you. The math behind this generalizes beyond just friends to a number of phenomena, including why your partner has likely had more partners than you and why the average college student perceives the average class size to be greater than it truly is.
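The result can be checked numerically on a toy network. The sketch below is an illustration only: the five names and six friendships are made up, and the helper names `degrees` and `mean_friend_degree` are hypothetical. It averages the degree over both endpoints of every friendship, as in the edge-sampling simplification, and compares that with mu + sigma²/mu.

```python
from statistics import mean, pvariance

# Hypothetical friendship graph: each pair is one bidirectional friendship (edge).
friendships = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "Dave"),
    ("Alice", "Eve"), ("Bob", "Carol"), ("Dave", "Eve"),
]

def degrees(edges):
    """Count friends per person, i.e. the degree d(v) of each vertex."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def mean_friend_degree(edges):
    """Average d(v) over both endpoints of every edge (random edge, random end)."""
    deg = degrees(edges)
    endpoint_degrees = [deg[person] for edge in edges for person in edge]
    return mean(endpoint_degrees)

degree_list = list(degrees(friendships).values())
mu = mean(degree_list)            # average number of friends
sigma2 = pvariance(degree_list)   # population variance of the number of friends
friend_avg = mean_friend_degree(friendships)

print(f"average friends:            {mu:.3f}")
print(f"average friends of friends: {friend_avg:.3f}")
print(f"mu + sigma^2 / mu:          {mu + sigma2 / mu:.3f}")
```

In this toy network Alice has four friends and everyone else has two, so the mean degree is 2.4 while the friend-weighted average comes out at about 2.67, matching the formula.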

Beyond Averages

I should note, this isn’t the full story. We aren’t automatons that choose our friends from a normal distribution. Instead, numerous subtleties affect who we make friends with.

As a college student, it’s pretty clear that the popular students are more often friends with popular students and the less popular students are more often friends with the less popular students.

This changes the dynamics of our system. We can no longer assume that the size of our friend groups is normally distributed. If people with few friends are friends with people with few friends (and vice versa), then our initial assumption that our friends have the same number of friends as we do may in fact be accurate.
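An extreme, made-up case shows how sorting by popularity changes the picture. The sketch below assumes a hypothetical campus made of two closed groups (a pair and a group of four) with no friendships between them; the names and the helper `clique` are illustrative. It checks two things: whether any individual’s friends have more friends than they do, and whether the network-wide average over edge endpoints still exceeds the mean degree.

```python
from itertools import combinations
from statistics import mean

def clique(members):
    """All pairwise friendships within a group where everyone knows everyone."""
    return list(combinations(members, 2))

# Hypothetical campus: one pair of friends and one tight group of four,
# with no friendships between the two groups.
edges = clique(["a1", "a2"]) + clique(["b1", "b2", "b3", "b4"])

# Build each person's friend list.
friends = {}
for a, b in edges:
    friends.setdefault(a, []).append(b)
    friends.setdefault(b, []).append(a)

degree = {person: len(fs) for person, fs in friends.items()}

# Individual view: people whose friends have, on average, more friends than they do.
outnumbered = [
    person for person, fs in friends.items()
    if mean([degree[f] for f in fs]) > degree[person]
]

# Network-wide view: mean degree vs. mean degree of a random edge endpoint.
mean_degree = mean(list(degree.values()))
mean_endpoint_degree = mean([degree[p] for edge in edges for p in edge])

print("people outnumbered by their friends:", outnumbered)
print(f"mean degree: {mean_degree:.2f}, edge-endpoint mean: {mean_endpoint_degree:.2f}")
```

Here nobody is outnumbered by their own friends, because everyone’s friends have exactly as many friends as they do. The edge-endpoint average is still higher than the mean degree, which suggests the paradox is a statement about network-wide averages and can fail for individuals when friendships are strongly sorted.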

Beyond Probability: Detecting disease spread early

The friendship paradox may be more than a brain teaser to make you feel self-conscious. In communities, some individuals are more connected than others; these well-connected individuals have interactions with more people because they have a smaller “degree” of separation from the average person. It follows that these central individuals are often the first carriers of contagions.

Some have proposed monitoring these central individuals to detect the spread of disease early. However, this would require analyzing every social circle in the community in order to find a few central individuals. This is practically impossible.

Instead, we could randomly select a friend of an individual in the community. Over large populations, this friend would likely be more popular and well-connected than average. This person, in theory, could be monitored as a likely early carrier of disease.
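A small simulation can illustrate why a nominated friend tends to be better connected than a randomly chosen person. Everything in this sketch is an assumption made for illustration: the network is synthetic, grown with a preferential-attachment rule so that a few people become hubs, and the function names, network size and sample size are hypothetical rather than taken from any study.

```python
import random
from statistics import mean

def preferential_attachment_graph(n, m, rng):
    """Grow a graph where newcomers tend to befriend already-popular people.

    This growth rule is an assumption chosen to produce a few highly
    connected individuals; it is not a model of any real community.
    """
    friends = {i: set() for i in range(n)}
    endpoints = []  # each person appears once per friendship they have
    # Start from a small fully connected core of m + 1 people.
    for a in range(m + 1):
        for b in range(a + 1, m + 1):
            friends[a].add(b)
            friends[b].add(a)
            endpoints += [a, b]
    for newcomer in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            # Picking from `endpoints` selects people proportionally to degree.
            chosen.add(rng.choice(endpoints))
        for other in chosen:
            friends[newcomer].add(other)
            friends[other].add(newcomer)
            endpoints += [newcomer, other]
    return friends

def sample_sensors(friends, k, rng):
    """Pick k random people and, for each one, a random friend of theirs."""
    people = rng.sample(sorted(friends), k)
    nominated = [rng.choice(sorted(friends[p])) for p in people]
    return people, nominated

rng = random.Random(0)
network = preferential_attachment_graph(n=2000, m=2, rng=rng)
random_group, friend_group = sample_sensors(network, k=500, rng=rng)

avg_random = mean([len(network[p]) for p in random_group])
avg_friend = mean([len(network[p]) for p in friend_group])
print(f"random group: {avg_random:.2f} friends on average")
print(f"friend group: {avg_friend:.2f} friends on average")
```

Note that this samples a random friend of a random person, which is the procedure described here, rather than the random-edge shortcut used in the derivation. No global map of the network is needed to form the friend group; each sampled person only has to name one friend.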

Two professors, Christakis (Harvard) and Fowler (UCSD, Political Science), did just that. They monitored the friends of random Harvard students for the flu. They randomly selected a group of Harvard students and then asked each of them to nominate friends to also be monitored. Their results were remarkable.

They found that “the progression of the epidemic in the friend group occurred 14.7 days…in advance of the randomly chosen group.” The scientists were able to detect the onset of flu spread weeks before they would have with conventional techniques.

Although this research was done in 2009, the paper has been cited 466 times with dozens of citations in the past year. It’s unclear whether this seemingly paradoxical quirk of statistics has been applied more recently—but the applications to the spread of COVID variants are obvious.

I’m a computer science undergraduate at Harvard. I write about technology 👨‍💻 and am interested in entrepreneurship 🚀. Check me out at www.ardayf.io