The Ndrangheta on Trial

Using NetworkX to analyze Europe’s most powerful mafia

David Rhode
Towards Data Science

Suspected members of the Ndrangheta (Source: Italian Carabinieri)

Italy’s biggest mafia trial since the 1980s has just begun. Over 900 witnesses will give evidence against 350 defendants in a specially-constructed, high-security courthouse in Calabria. The men in the dock stand accused of being members of the Ndrangheta, one of the most dangerous criminal organizations in the world. For decades it was the poor cousin of Sicily’s Cosa Nostra and the Camorra of Naples. Overlooked by Hollywood, the Ndrangheta was little known outside of its home turf, Calabria’s remote and rugged hillsides at the ‘toe’ of Italy’s ‘boot’. Its core businesses were extortion and kidnapping. Its structure was a loose association of families or clans, whose blood ties made them almost impossible to penetrate.

As the Ndrangheta evolved in recent years, this structure seems to have changed. The organization’s income grew rapidly when it formed ties with the South American cartels and took a grip on the European cocaine market. In 2013, a report by the Demoskopika research institute claimed the Ndrangheta had a turnover of £44bn, more than Deutsche Bank and McDonald’s combined. CNN estimates that it controls up to 80% of the cocaine coming into Europe. Along with this growth, the Ndrangheta also expanded its reach, coming to dominate the underworld in the northern Italian region of Lombardy. This expansion was accompanied by more centralized control and a higher public profile.

The Ndrangheta’s spread became clear in 2007, when a feud between two clans saw six members gunned down outside a pizzeria in the city of Duisburg in western Germany. The following year Carmelo Novella, the head of the Ndrangheta in Lombardy, was murdered in a bar in San Vittore Olona, a small town outside Milan. His demands for more autonomy had angered the bosses back home in Calabria.

The dataset for this project comes from this period of turbulence. In response to unfavourable headlines, the authorities in Lombardy launched Operation Infinito, tracking the movements and interactions of more than a hundred Mafiosi over a period of two years. It culminated with the recording in October 2009 of the Summit di Paderno Dugnano, where senior gangsters met to elect a successor to Carmelo Novella. In 2010, the evidence gathered during Operation Infinito formed the basis for dozens of indictments, mostly on charges of participation in a mafia-type organization, which is defined by a code of silence (omertà) and the power to intimidate as a group.

Once these cases were concluded, the authorities released the information gathered during Operation Infinito. A decade later, as the Ndrangheta again finds itself in the dock, this surveillance may provide the best insight into the structure of its leadership. So what can data science tell us about Europe’s most powerful mafia?

The data can be downloaded here.

The Infinito dataset refers to 48 meetings that took place over a two-year period. More than 150 Mafiosi were involved, but only a handful normally attended any given meeting, and each Mafioso typically only attended one or two. The result is quite a sparse matrix. The NetworkX library allows us to visualize this by creating a ‘bipartite’ network — in the graph below, each of the blue nodes represents a Mafioso, linked to specific red nodes, or meetings. A small number of blue nodes have links to multiple red ones, but mostly the network isn’t densely populated.

(Image by Author)
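
As a minimal sketch of how such a bipartite network can be built, the snippet below uses a tiny invented attendance table. The meeting ids and single-letter names are placeholders, not rows from the Infinito data, and the article's actual loading code may differ.

```python
import networkx as nx

# Hypothetical attendance records: meeting id -> attendees.
# These values are invented for illustration only.
attendance = {
    "meeting_1": ["A", "B", "C"],
    "meeting_2": ["C", "D"],
    "meeting_3": ["D", "E", "F"],
}

B = nx.Graph()
for meeting, people in attendance.items():
    B.add_node(meeting, bipartite="meeting")      # the "red" nodes
    for person in people:
        B.add_node(person, bipartite="mafioso")   # the "blue" nodes
        B.add_edge(person, meeting)               # person attended meeting

mafiosi = {n for n, d in B.nodes(data=True) if d["bipartite"] == "mafioso"}
meetings = set(B) - mafiosi

# Share of the person-by-meeting matrix that is actually filled in
density = B.number_of_edges() / (len(mafiosi) * len(meetings))
print(nx.is_bipartite(B), len(mafiosi), len(meetings), round(density, 3))
```

A low density value is what "sparse" means here: most person and meeting pairs have no link between them.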

The next stage is to see how the Mafiosi relate to one another by creating an undirected graph. To do this, the combinations function from Python’s itertools module can be used on each meeting’s attendance list to produce a list of ‘edges’, linking the Mafiosi who make up the nodes of the new network. The network contains 151 nodes, or Mafiosi, 1619 edges (connections between Mafiosi), and each Mafioso on average is connected to 21 others. So what does the network tell us about the Ndrangheta’s structure, and who its most important members are?
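
The edge-building step can be sketched as follows, again on an invented attendance table (the names are placeholders). The last lines check the average degree quoted above using the identity: average degree = 2 × edges / nodes.

```python
from itertools import combinations

# Hypothetical attendance records, invented for illustration
attendance = {
    "meeting_1": ["A", "B", "C"],
    "meeting_2": ["C", "D"],
    "meeting_3": ["D", "E", "F"],
}

edges = set()
for people in attendance.values():
    # Sorting makes ("A", "B") and ("B", "A") the same undirected edge;
    # the set drops pairs that met at more than one meeting.
    edges.update(combinations(sorted(set(people)), 2))

nodes = {person for people in attendance.values() for person in people}
avg_degree = 2 * len(edges) / len(nodes)
print(len(nodes), len(edges), round(avg_degree, 2))

# The same identity applied to the figures reported in the article
reported_avg_degree = 2 * 1619 / 151
print(round(reported_avg_degree, 1))  # 21.4
```

The resulting edge set can be passed to a NetworkX graph with `Graph.add_edges_from`.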

The obvious starting point is to look at how well connected each Mafioso is, which in network terms is the ‘degree’ of each node: the number of connections it has. This reveals significant inequalities throughout the network. Six Mafiosi have over fifty connections, more than twice the network average.
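
A small sketch of the degree comparison, using a toy graph rather than the real network (the node names are invented):

```python
import networkx as nx

# Toy graph: "hub" is linked to six others, who are barely linked to each other
G = nx.Graph()
G.add_edges_from([("hub", n) for n in "abcdef"])
G.add_edge("a", "b")

degrees = dict(G.degree())                        # node -> number of connections
avg_degree = sum(degrees.values()) / G.number_of_nodes()

# Nodes with more than twice the average number of connections
well_connected = [n for n, d in degrees.items() if d > 2 * avg_degree]
ranked = sorted(degrees.items(), key=lambda kv: kv[1], reverse=True)
print(avg_degree, well_connected, ranked[:3])
```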

Another metric to consider is the clustering coefficient, which measures how far nodes tend to cluster together: if a Mafioso’s close associates are all linked to each other, then he will have a high clustering coefficient. A better-connected node might well have a lower clustering coefficient, as having more connections makes it less likely that all of them will be linked to each other. We can see that most nodes have a high coefficient, but a few don’t. The six with the lowest clustering coefficients are the same guys with the highest number of connections. It looks like these six are all connected to many other Mafiosi, who in turn are not all connected to each other. These six are the major hubs in the network.

(Image by Author)
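
To make the hub pattern concrete, here is a toy example (invented nodes, not the real data) in which one node joins two otherwise separate triangles. Its neighbours are mostly not linked to each other, so its clustering coefficient is the lowest in the graph.

```python
import networkx as nx

# Two triangles that share a single node, "hub"
G = nx.Graph()
G.add_edges_from([
    ("hub", "a"), ("hub", "b"), ("a", "b"),   # first triangle
    ("hub", "c"), ("hub", "d"), ("c", "d"),   # second triangle
])

cc = nx.clustering(G)   # node -> fraction of neighbour pairs that are linked

# "hub" has 4 neighbours, so 6 possible pairs, of which only a-b and c-d exist
print(round(cc["hub"], 3), cc["a"])
print(min(cc, key=cc.get))
```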

A third indicator to consider is ‘Betweenness Centrality’. This looks at how central each node is, in terms of how many of the ‘shortest paths’ between other nodes pass through it. Most Mafiosi score low on this metric, but the top seven scores include the same six names that we saw on the other two metrics, plus one other guy. So who are the seven?

  • Alessandro Manno was the boss of the Ndrangheta in the Milanese district of Pioltello. Following Operation Infinito he would be sentenced to sixteen years.
  • Cosimo Barranca was the boss of the Ndrangheta in central Milan and was sentenced to 14 years following Operation Infinito.
  • Antonino Lamarmore was boss in Limbiate, a district of Milan.
  • Pietro Francesco Panetta was boss in the district of Cormano and a member of the Provincia ruling council.
  • Cosimo Raffaele Magnoli was Panetta’s deputy in Cormano.
  • Francesco Muia was boss on the Ionian coast and was sentenced to 24 years in 2016.
  • Francesco Cristello was sentenced to life imprisonment in 2019 for his part in the 2010 murder of a criminal called Rocco Stagno.

When we visualize the network as a whole, it looks like Francesco Cristello is the odd one out. The other six all had high numbers of connections, low clustering coefficients, and high Betweenness Centrality. Cristello made the list purely on the basis of Betweenness Centrality, which taken by itself may have overstated his importance. Certainly, by eye he seems to be far less prominent within the network, and there is no published information to suggest that he was a major figure.

(Image by Author)
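
One way a node can score highly on Betweenness Centrality alone is by sitting between two tightly knit groups. The toy graph below (invented nodes, and not a claim about any individual's actual position in the data) shows a node with only two connections that still has the highest betweenness, because every shortest path between the two groups runs through it.

```python
from itertools import combinations
import networkx as nx

left = ["l1", "l2", "l3", "l4"]
right = ["r1", "r2", "r3", "r4"]

G = nx.Graph()
G.add_edges_from(combinations(left, 2))    # fully connected group
G.add_edges_from(combinations(right, 2))   # second fully connected group
G.add_edges_from([("l1", "bridge"), ("bridge", "r1")])  # the only link between them

bc = nx.betweenness_centrality(G)          # normalized to the range 0..1
top = max(bc, key=bc.get)
print(top, G.degree(top), round(bc[top], 3))
```

Read alongside degree and clustering, a score like this flags a go-between rather than a hub, which is why the three metrics are more informative together than any one alone.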

So how useful is NetworkX in understanding the Ndrangheta’s structure? A combination of degrees, clustering coefficients, and Betweenness Centrality seems to have identified six bosses, but how many more are there, and how good a job can it do in picking them out?

Following Operation Infinito, the prosecutor in Milan assessed the status of dozens of Mafiosi and classed some of them as bosses, but it seems that the report has never been published in English. To judge how effective the network analysis is, we need to assign class labels (i.e. ‘boss’ or ‘not_boss’) to each Mafioso. Luckily, even without a definitive list from the prosecutor, there may be a work-around.

It turns out that not all of the Ndrangheta meetings were equally important. The final one became known as The Summit of Paderno Dugnano, held at a meeting hall in a small town outside Milan. The hall was located in Piazza Falcone e Borsellino, ironically named after two anti-mafia magistrates who were murdered in 1992 for their part in the ‘Maxi-Trials’ of 1986–7.

Surveillance footage of the Ndrangheta meeting in Paderno Dugnano (Source: Milan Prosecutor’s Office)

The summit took place on October 31st, 2009, and marked the election of Pasquale Zappia as the new Mastro Generale, charged with maintaining relations between the Ndrangheta based in Lombardy and their homeland in Calabria. In the absence of a definitive list of bosses, we can probably assume that the attendees of the Summit were the most senior members of the Ndrangheta in Lombardy.

This isn’t a perfect solution, as several attendees remain unidentified. However, it seems a reasonable assumption that these unidentified men do not come from within the surveillance dataset (or there probably would have been no difficulty identifying them). It’s likely they were not local Ndrangheta from Lombardy, but visitors from Calabria.

When we look at the confirmed names who attended the Summit, they certainly seem to be fundamental within the network. They appear to be part of a centralized structure, not the loose association of families or clans that the Ndrangheta used to be. Network analysis seems to confirm the assessment of Italy’s chief anti-mafia prosecutor Pietro Grasso that the Ndrangheta has become “hierarchical, united and pyramidal”.

Summit attendees are shown in red (Image by Author)

The question then is whether we could have predicted who would be invited to the meeting before it took place, and to what extent NetworkX enhances our ability to do that. Before we can answer this, we first need to create a new network graph that doesn’t contain the data from the summit meeting (otherwise the target variable would be helping to predict itself). The number of edges in the network goes down from 1619 to 1443, and the average degree drops from 21 to 19.
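
A minimal sketch of this leakage fix, assuming the attendance data is held as a mapping from meeting to attendees. The key "summit", the helper build_edges, and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical attendance records; "summit" stands in for the final meeting
attendance = {
    "meeting_1": ["A", "B", "C"],
    "meeting_2": ["C", "D"],
    "summit": ["A", "C", "E"],
}
SUMMIT = "summit"

def build_edges(records):
    """Return the set of undirected co-attendance edges for the given meetings."""
    edges = set()
    for people in records.values():
        edges.update(combinations(sorted(set(people)), 2))
    return edges

# Target variable: did this person attend the summit?
everyone = sorted({p for people in attendance.values() for p in people})
is_boss = {p: p in attendance[SUMMIT] for p in everyone}

# Features may only come from the other meetings
without_summit = {m: people for m, people in attendance.items() if m != SUMMIT}
edges_all = build_edges(attendance)
edges_features = build_edges(without_summit)
nodes_features = {p for edge in edges_features for p in edge}
print(len(edges_all), len(edges_features), sorted(nodes_features))
```

Note that in this toy data "E" appears only at the summit, so it drops out of the adjusted network altogether. Anyone in that position has no network features to predict from.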

Examining the new, adjusted network, we have a baseline accuracy of 85.9%. This is the share of the data made up by the majority class (non-bosses), so a model needs to score higher than this to have any predictive value.

To understand what value the NetworkX analysis adds, we first need to see how well we can do without it. Training a Logistic Regression model just on the attendance data (without NetworkX features such as degrees, clustering coefficient, and Betweenness Centrality), we get a cross-validated Accuracy score of 0.866, slightly above baseline. The Accuracy score on the test set is the same, but the model has Recall and F1 scores on the test set of 0, suggesting that it struggles to pick out members of the minority class, a common issue in classification problems with class imbalance.
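
The gap between Accuracy and Recall is easy to reproduce. In the sketch below the labels are invented (13 non-bosses and 2 bosses), and the "model" simply predicts the majority class every time: its Accuracy equals the baseline, yet it never finds a single boss.

```python
# Hypothetical labels: 1 = boss, 0 = not boss
y_true = [0] * 13 + [1] * 2
y_pred = [0] * 15            # always predict the majority class

# Baseline: share of the data belonging to the largest class
baseline = max(y_true.count(0), y_true.count(1)) / len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / y_true.count(1)   # share of real bosses that were found

print(round(baseline, 3), round(accuracy, 3), recall)
```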

When we allow the model to work with the data from NetworkX, the situation improves significantly. The cross-validated training score goes up to 0.891, and the Recall and F1 scores for the test set both go up to 0.500. We can get a further improvement by adjusting the model’s threshold, with the best performance on the test set of 0.900 Accuracy, 0.750 Recall, and 0.667 F1. Using SMOTE to oversample the minority class doesn’t offer any additional improvement, though it does help relative to using the model’s default threshold.

(Image by Author)
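
Threshold adjustment means turning predicted probabilities into labels with a cut-off other than the default 0.5. The sketch below uses invented probabilities, and the helper name scores_at is hypothetical. With a scikit-learn classifier the probabilities would typically come from predict_proba.

```python
def scores_at(y_true, y_prob, threshold):
    """Return (recall, precision, f1) when predicting 1 for prob >= threshold."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return recall, precision, f1

# Hypothetical labels (1 = boss) and predicted probabilities of being a boss
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.80, 0.60, 0.40, 0.20, 0.45, 0.30, 0.10, 0.10, 0.05, 0.05]

default = scores_at(y_true, y_prob, 0.50)
lowered = scores_at(y_true, y_prob, 0.35)
print(default, lowered)
```

Lowering the cut-off trades some precision for recall: more bosses are caught, at the cost of a few extra false alarms.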

Out of various other models (RandomForest, KNN, GradientBoostingClassifier, SVC), the best performance comes from the Support Vector Classifier: a cross-validated training set Accuracy of 0.891, Accuracy on the test set of 0.933, test set Recall of 0.750, and test set F1 of 0.750.

Confusion matrix for the most successful SVC model (Image by Author)
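
As a sanity check on how Accuracy, Recall, and F1 relate to a confusion matrix, the sketch below computes all three from raw counts. The counts are hypothetical: the text does not state the test-set size, so they assume a 30-example test set containing 4 bosses and are simply one set of values that reproduces the reported scores, not the author's actual matrix.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, recall, f1) rounded to three decimals."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return round(accuracy, 3), round(recall, 3), round(f1, 3)

# Assumed counts consistent with the SVC scores (0.933, 0.750, 0.750)
svc = metrics(tp=3, fp=1, fn=1, tn=25)

# Assumed counts consistent with the threshold-adjusted
# Logistic Regression scores (0.900, 0.750, 0.667)
logreg = metrics(tp=3, fp=2, fn=1, tn=24)

print(svc, logreg)
```

Under these assumed counts, the difference between the two models comes down to a single false positive.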

Classification problems are always challenging when the classes are imbalanced, and the features generated by the NetworkX analysis were a big help in allowing the models to score significantly above baseline.

The code for this project can be viewed here.

Data Scientist, with experience in TV production and establishing a luxury brand