How to Create a Synthetic Social Network Using Python

Understand the application of graph generation algorithms in creating synthetic graphs

Photo by Maxim Berg on Unsplash

Finding an appropriate graph dataset to evaluate algorithms can be a daunting task. Several options are available, and it often takes quite a while to go through them.

Even when you find the perfect graph dataset, you have to verify its usage, sharing, and privacy policies.

This brings us to the question at hand.

Is there a quicker way to get a graph dataset for evaluation purposes? Fortunately, yes! We can use synthetic graphs, which are graph datasets generated artificially.

Need for synthetic graphs

Right off the bat, the first reason is convenience.

When you generate your own dataset, you no longer have to worry about things such as:

  • Finding a graph of the right size
  • Privacy and data sharing restrictions
  • Graph data format

These reasons might not convince everyone, and some graph analytics tasks do require real-world graphs that synthetic graphs cannot stand in for.

But I argue that synthetic graphs provide a quick way to test your algorithms, and once you have developed a framework, you can apply them to real-world graphs.

What exactly are synthetic graphs?

Synthetic graphs are generated using graph generative models. They are constructed to mimic real-world graphs as closely as possible.

Several algorithms exist for generating synthetic graphs. Three common ones are:

  1. Erdős-Rényi model – In this model, we start with a predefined set of N nodes. We then add an edge between each pair of nodes with a fixed probability, which is the same for every pair of nodes in the graph. Hence, a higher probability produces a dense graph and a lower probability a sparse one. This is a simple model, and it does not capture features of real-world graphs such as high clustering or highly connected hubs.
  2. Watts-Strogatz model – This model generates graphs with the small-world property. In this context, a small-world network is one that has a short average path length and a high clustering coefficient.

Path length: The distance between two nodes in a graph, measured as the number of edges on the shortest path between them. The shorter the path length, the closer the nodes are to each other.

Clustering coefficient: It measures how tightly a node’s neighbors are connected to each other.

This model starts with a regular ring lattice with a fixed number of nodes, in which each node is connected to its nearest neighbors. It then applies a rewiring probability, which means that some edges are removed from their original place and reattached to a randomly chosen node.

It is used to model real-world networks that exhibit small-world behavior, such as social networks and transportation networks.

  3. Barabási-Albert model – This graph generative model follows the "rich get richer" principle, also known as preferential attachment. New nodes are more likely to connect to existing nodes that already have more connections. This produces a few highly connected nodes (hubs) and many poorly connected nodes in the graph. It is used to model scale-free networks like the internet and social networks.
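
To make the definitions above concrete, here is a minimal sketch that measures average path length, average clustering coefficient, and maximum degree for one graph from each model. The helper name `summarize` and the parameter values are illustrative assumptions, not part of NetworkX or a recommended setup; path length is computed on the largest connected component because a random graph may turn out disconnected.

```python
import networkx as nx


def summarize(G):
    """Return (average path length, average clustering, maximum degree) for G."""
    # Path length is only defined inside a connected graph, so fall back to
    # the largest connected component if G happens to be disconnected.
    largest = max(nx.connected_components(G), key=len)
    H = G.subgraph(largest)
    avg_path = nx.average_shortest_path_length(H)
    avg_clustering = nx.average_clustering(G)
    max_degree = max(degree for _, degree in G.degree())
    return avg_path, avg_clustering, max_degree


# Illustrative parameters (assumptions chosen only for this sketch)
graphs = {
    "Erdos-Renyi": nx.erdos_renyi_graph(n=200, p=0.05, seed=1),
    "Watts-Strogatz": nx.watts_strogatz_graph(n=200, k=10, p=0.1, seed=1),
    "Barabasi-Albert": nx.barabasi_albert_graph(n=200, m=5, seed=1),
}

for name, G in graphs.items():
    avg_path, avg_clustering, max_degree = summarize(G)
    print(f"{name:16s} path={avg_path:.2f} "
          f"clustering={avg_clustering:.3f} max_degree={max_degree}")
```

Comparing the printed rows is a quick way to check whether a generated graph shows the property you care about, for example hubs (a large maximum degree) or tight neighborhoods (a high clustering coefficient).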

Synthetic social network

We can generate a synthetic social network using the NetworkX Python library.

Let’s generate a synthetic graph with each of the three models and see how they look.

Below is the code to generate a graph using NetworkX:

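import networkx as nx
import matplotlib.pyplot as plt

# Erdos-Renyi model: 50 nodes, each pair connected with probability 0.2
G1 = nx.erdos_renyi_graph(n=50, p=0.2, seed=42)

# Watts-Strogatz model: ring of 50 nodes, rewiring probability 0.4
# (NetworkX joins each node to k - 1 neighbors when k is odd)
G2 = nx.watts_strogatz_graph(n=50, k=5, p=0.4, seed=42)

# Barabasi-Albert model: each new node attaches to 5 existing nodes
G3 = nx.barabasi_albert_graph(n=50, m=5, seed=42)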

# plot side by side
fig, ax = plt.subplots(1, 3, figsize=(15, 5))
# add title to each plot
ax[0].set_title('Erdos-Renyi')
ax[1].set_title('Watts-Strogatz')
ax[2].set_title('Barabasi-Albert')
nx.draw(G1, ax=ax[0])
nx.draw(G2, ax=ax[1])
nx.draw(G3, ax=ax[2])
plt.show()

The graph visualization looks as follows:

Synthetic graph comparison (Image by Author)

Adjusting the hyperparameters generates drastically different graphs.
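
As a rough illustration of that sensitivity, the sketch below sweeps the edge probability of the Erdős-Rényi model and the rewiring probability of the Watts-Strogatz model. The helper names `er_density` and `ws_clustering` and the swept values are assumptions made for this example only.

```python
import networkx as nx


def er_density(p, n=50, seed=42):
    """Fraction of all possible edges present in an Erdos-Renyi graph."""
    return nx.density(nx.erdos_renyi_graph(n=n, p=p, seed=seed))


def ws_clustering(p, n=50, k=4, seed=42):
    """Average clustering coefficient of a Watts-Strogatz graph."""
    G = nx.watts_strogatz_graph(n=n, k=k, p=p, seed=seed)
    return nx.average_clustering(G)


# A higher edge probability gives a denser Erdos-Renyi graph
for p in (0.05, 0.2, 0.5):
    print(f"Erdos-Renyi    p={p:<4} density={er_density(p):.3f}")

# p=0.0 keeps the regular ring lattice; p=1.0 rewires every edge
for p in (0.0, 0.1, 1.0):
    print(f"Watts-Strogatz p={p:<4} clustering={ws_clustering(p):.3f}")
```

Printing a summary statistic next to each setting makes it easier to pick parameters deliberately instead of judging the plots by eye.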

For a social network, we would like to add node features and node labels. Labels can be added using the Faker Python library, which generates fake names.

Let us see how it looks.

from faker import Faker
import networkx as nx
import matplotlib.pyplot as plt

faker = Faker()

# generate 10 unique names; faker.unique avoids duplicates, which would
# otherwise collapse two nodes into one when relabeling
names = [faker.unique.name() for _ in range(10)]

# Barabasi-Albert model
G = nx.barabasi_albert_graph(n=10, m=5, seed=42)

# map each integer node ID to a name and relabel the graph
mapping = {i: names[i] for i in range(len(names))}
G = nx.relabel_nodes(G, mapping)

fig, ax = plt.subplots(figsize=(3, 2), dpi=300)
nx.draw(G, ax=ax, with_labels=True, node_size=50, width=0.1, font_size=3.5)
plt.show()

Synthetic social network

The synthetic social network now has labels. Each node represents a person, and their name acts as the node label.

It is also possible to generate other node attributes, such as the age, sex, and occupation of each person in the social network, and attach them to the graph.
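
One way this could look is sketched below, using Python's standard `random` module and `nx.set_node_attributes`. The value pools `OCCUPATIONS` and `SEXES` and the age range are hypothetical placeholders chosen for illustration; a fake-data library could supply richer values instead.

```python
import random
import networkx as nx

rng = random.Random(42)  # seeded so the attributes are reproducible

# Hypothetical value pools (assumptions for this sketch)
OCCUPATIONS = ["engineer", "teacher", "nurse", "designer", "chef"]
SEXES = ["female", "male"]

G = nx.barabasi_albert_graph(n=10, m=5, seed=42)

# Build one attribute dictionary per node
attributes = {
    node: {
        "age": rng.randint(18, 80),
        "sex": rng.choice(SEXES),
        "occupation": rng.choice(OCCUPATIONS),
    }
    for node in G.nodes
}

# Attach the attributes to the graph in a single call
nx.set_node_attributes(G, attributes)

for node, data in G.nodes(data=True):
    print(node, data)
```

The same pattern works after relabeling the nodes with names, because the attribute dictionary is keyed by whatever node identifiers the graph currently uses.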

Now we have a fully loaded synthetic social network that can be used to perform graph analytic tasks.
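
As one small example of such a task, the sketch below ranks the members of a generated network by degree centrality, that is, by the share of other nodes each one is directly connected to. It uses integer node IDs to stay self-contained, and the choice of degree centrality is an assumption made for illustration; other analyses would follow the same pattern.

```python
import networkx as nx

G = nx.barabasi_albert_graph(n=10, m=5, seed=42)

# Degree centrality: a node's degree divided by (number of nodes - 1)
centrality = nx.degree_centrality(G)

# Sort nodes from most to least connected
ranked = sorted(centrality.items(), key=lambda item: item[1], reverse=True)

for node, score in ranked[:3]:
    print(f"node {node}: degree centrality {score:.2f}")
```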


Thanks for reading, and cheers!