A Rescue Mission: 3 Ways Deep Learning Could Combat Human Trafficking

How Deep Learning could halt the global enterprise of modern human slavery.

Published in Towards Data Science · 10 min read · Jul 8, 2019

Figure 1: A 2-Dimensional t-SNE feature model illustrating useful and non-trivial relationships in the Trafficking-10k dataset.

It is 6:00 PM, and the sun glares through your backseat window with a nuanced yet familiar hue of sharp orange radiance. It is midsummer in July, and you and your family are driving back to your tense, centralized, low-income neighborhood in Chicago. Today, your father is driving the car, slowly entering the cracked driveway until he arrives at a relaxed stop. As you open the car door, you are inundated with the breeze of perfectly crafted, transcendent, and ethereal summer weather. Ultimately, this enchanting blend of the radiant sun, the 23°C north wind, and the seductive blue sky culminates in you asking your mom, “Can we go to the park?” In the midst of this stress-free day, your mother provides a jovial and resounding “Yes!” Excitedly, you and your mom hold hands, ritualistically strolling to the local neighborhood park with untainted bliss and innocence. The playground platform is layered with a decades-old concrete mixture, and its epicenter is an ‘improvised’ maze of old teetering swingsets, monkey bars, and rickety slides.

The local park is unusually quiet today, devoid of the usual children swinging, tagging, and racing after each other. However, 12-year-old ‘you’ embraces this immersive and exploratory moment of play alone while your mother sits on a wooden bench 12 yards away from the play area. Your mother receives a loud phone call from one of her friends, so she stands up and walks about 4 more yards behind the bench, looking down at the ground, engrossed in a private conversation. You and your mother have visited this park thousands of times, so navigating it is second nature.

Eventually, during your transition from the swingset to the monkey bars, you see a middle-aged man in an oversized brown sweater and ragged blue jeans casually approach you. Almost instantaneously, he sprints towards you and snatches your hand, carrying you over his shoulder while you scream towards your mom in panicked tears. Your mother runs but can’t make it in time. He throws you into the back seat of a car, locking the door immediately as the car drives off towards an unknown destination. Soon, “missing child” posters become rampant across bulletin boards in stores, streets, and inner-city neighborhoods.

Figure 2: A conventional array of missing children profiles at a Walmart entrance, encouraging customers entering and leaving to report significant information.

The family spends thousands of dollars and painstaking effort on law enforcement documentation, local and state crime networks, and antiquated youth kidnapping databases, and is left with loosely connected data points. Posters and ‘lost child’ bulletins in stores wait for the “right person” to walk by with either (1) a wealth of information or (2) a burning commitment to human trafficking investigation. Meanwhile, almost no one pays attention to the loosely connected bullet points of information on the posters, as onlookers rarely have any connection to the circumstances surrounding the incident. However, Deep Learning (DL) could compress a year’s worth of intricate human trafficking investigation into a few days by contextualizing data, ranking its importance, and connecting the most essential key points. Ultimately, DL has the capacity to transform how secretive and illicit criminal enterprises are investigated, with human trafficking being a hallmark example.

1. Using NLP to Classify High-Risk Sex Trafficking Advertisements

Natural Language Processing (NLP) models often use Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN), to capture the intent and meaning of a sentence. An LSTM reads a sentence as a sequence of word vectors, maintains an internal memory state that it updates at each word, and outputs probabilities for the inferred intent of the sentence. Typically, LSTM approaches to NLP have been implemented for sentiment classification of short text. For example, a previous Kaggle competition challenged users to detect and categorize multi-class toxic comments (e.g. YouTube comments) for the automatic severity ranking of short sentences.
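To make the memory-state idea concrete, here is a minimal sketch of a single LSTM step in plain Python. It assumes a scalar input and a scalar hidden state, which is a deliberate simplification of real vector-valued LSTMs, and the weights and input values are invented for illustration rather than taken from any trained model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One scalar LSTM step. `weights` maps gate name -> (w_x, w_h, bias)."""
    def pre_activation(gate):
        w_x, w_h, bias = weights[gate]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))        # how much old memory to keep
    write = sigmoid(pre_activation("input"))          # how much new content to write
    output = sigmoid(pre_activation("output"))        # how much memory to expose
    candidate = math.tanh(pre_activation("candidate"))  # proposed new content

    c = forget * c_prev + write * candidate           # updated memory (cell) state
    h = output * math.tanh(c)                         # updated hidden state
    return h, c


# Hypothetical toy weights, identical for every gate (illustration only).
toy_weights = {
    gate: (0.5, 0.5, 0.0)
    for gate in ("forget", "input", "output", "candidate")
}

# Stand-in for a sequence of (scalar) word vectors.
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, toy_weights)

# Squash the final hidden state into a probability-like score.
score = sigmoid(h)
```

In a real classifier, the weights would be learned from labeled text, and the final hidden state would typically pass through a trained output layer rather than a bare sigmoid.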

Figure 3: A Deep Neural Network (DNN)-based NLP model that uses embeddings for binary classification across a given input sentence.

Wang et al. took advantage of word embeddings combined with an LSTM to infer sentiment from social media posts in the paper “An LSTM Approach to Short Text Sentiment Classification with Word Embeddings”. The most dynamic platforms of user text exchange include social media, text messaging, and other methods of swift communication. These platforms involve unique slang and modern abbreviations that only specific cultures or groups of people understand. Moreover, for Deep Neural Networks (DNNs), short exchanges of 50–100 characters don’t provide sufficiently rich features for effective classification. To infer and generalize from such short lines, researchers can combine word embeddings with LSTM models. Word embeddings are learned representations of text in which words or phrases with similar meanings have similar representations. A word embedding model maps each word to a vector so that related words lie close together, which helps categorize and generalize the intention of sentences with minimal context.
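The "similar meaning, similar representation" property can be sketched with cosine similarity. The three-dimensional vectors below are invented for illustration (real embeddings are learned and usually have hundreds of dimensions), and the `nearest` helper is a hypothetical name, not part of any library.

```python
import math

# Hypothetical toy embeddings; the values are made up for this example.
EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "gr8": [0.8, 0.2, 0.1],    # slang spelling placed near "great"
    "awful": [-0.7, 0.3, 0.1],
}


def cosine(u, v):
    """Cosine similarity: 1.0 for same direction, negative for opposed."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word):
    """Return the other vocabulary word whose vector is most similar."""
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


closest_to_slang = nearest("gr8")
```

With these toy vectors, the slang token sits closest to its standard spelling, which is the behavior that lets a downstream model treat unfamiliar abbreviations like words it already knows.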

Figure 4: Word embeddings create sequence-length representations of the words in a sentence, which an NLP model eventually turns into output scores.

Because combined LSTM and word embedding models can generalize context from scarce features, they have potential applications to text that is jumbled or only a few characters long. Tong et al., for example, used word embeddings with an LSTM to vectorize escort advertisements and infer their human trafficking risk in the paper “Combating Human Trafficking with Deep Multimodal Models”. They created the novel Trafficking-10k dataset, which contains more than 10,000 annotated advertisements for this supervised training task. More importantly, the research team fed word embeddings of these trafficking texts into an LSTM to produce new context-aware embeddings. Ultimately, the paper uses an intuitive two-step process: first extracting low-level embedded features from raw text, then translating them into context-aware embeddings via a language LSTM network.

Figure 5: A language network processes tokenized input sentences through four LSTMs into a Convolutional Decision Network to eventually output classification results for the trafficking dataset.

Furthermore, the researchers created a 2D t-SNE representation of their trafficking dataset to show how its features are distributed across Trafficking-10k. t-SNE (t-distributed stochastic neighbor embedding) projects high-dimensional datapoints into a low-dimensional (here 2D) map where clusters of points represent regions of similar features. The researchers illustrate the representation of different input features for their baseline models in the figure below. The divisions of clustered data suggest that the researchers’ novel trafficking dataset is not trivial but rather has comprehensible and well-structured features:

Figure 6: A 2-Dimensional t-SNE feature model illustrating useful and non-trivial relationships in the Trafficking-10k dataset.
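As a rough illustration of why similar points end up clustered, the sketch below computes the Gaussian neighbor affinities that t-SNE starts from in the original high-dimensional space. It is only the first stage of the algorithm: it assumes one fixed `sigma` for every point, whereas real t-SNE tunes a bandwidth per point from a perplexity setting and then optimizes the 2D layout. The four 4-dimensional points are invented for the example.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def conditional_affinities(points, sigma=1.0):
    """Row i holds p(j | i): how strongly point i regards j as a neighbor."""
    n = len(points)
    rows = []
    for i in range(n):
        weights = [
            0.0 if j == i
            else math.exp(-squared_distance(points[i], points[j]) / (2 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])  # normalize to a distribution
    return rows


# Hypothetical feature vectors forming two well-separated groups.
points = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.1, 0.0],
    [5.0, 5.0, 5.0, 5.0],
    [5.1, 5.0, 4.9, 5.0],
]
affinities = conditional_affinities(points)
```

Points in the same group receive nearly all of each other's affinity mass, and the full algorithm then searches for 2D coordinates whose neighbor structure matches these affinities as closely as possible.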

2. Anti-Money Laundering in Human Trafficking Exports

Money laundering schemes involve criminals circulating illicit funds through a financial system, typically for large-scale financial transfers such as those of cross-border drug cartels and, in this case, human trafficking. An estimated 700,000 individuals are exported in the human trafficking industry, which overall enslaves upwards of 40 million people. In the theater of human trafficking, human “exports” are seen as walking profits due to their monetary value in sex trafficking and in exchanges with other traffickers.

Although the proposal is very preliminary, the most interesting integration of deep learning here is the use of graph convolutional networks (graph CNNs) to analyze dense, dynamic forensic-financial data in the paper “Scalable Graph Learning for Anti-Money Laundering: A First Look” by Weber et al. Anti-Money Laundering (AML) uses several widespread and emerging techniques to understand the circulation of suspicious money in a global economy. The paper builds on graph analytics to determine the cash flow relationships between entities (i.e. network structures). In graph analytics, a single account in the financial system is represented as a vertex and a single transaction as an edge. The size of each vertex and edge is proportional to the size of the account and of the transaction, respectively. The following figure illustrates the dynamic of graph analytics.

Figure 7: An illustration of graph analysis, where vertices (circles) represent account holders and edges (connecting lines) represent transactions.
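The vertex-and-edge representation described above can be sketched with plain dictionaries. The account names and amounts are hypothetical, and `build_graph` and `flow_summary` are illustrative helper names rather than functions from the paper or any library.

```python
# Hypothetical transactions: (sender, receiver, amount).
transactions = [
    ("acct_A", "acct_B", 9500.0),
    ("acct_B", "acct_C", 9400.0),
    ("acct_C", "acct_A", 9300.0),
    ("acct_D", "acct_B", 120.0),
]


def build_graph(transactions):
    """Accounts become vertices; each transaction adds weight to a directed edge."""
    graph = {}
    for sender, receiver, amount in transactions:
        edges = graph.setdefault(sender, {})
        edges[receiver] = edges.get(receiver, 0.0) + amount
        graph.setdefault(receiver, {})  # make sure receive-only accounts appear
    return graph


def flow_summary(graph):
    """Total money sent and received per account (a simple vertex 'size')."""
    summary = {node: {"out": 0.0, "in": 0.0} for node in graph}
    for sender, edges in graph.items():
        for receiver, amount in edges.items():
            summary[sender]["out"] += amount
            summary[receiver]["in"] += amount
    return summary


graph = build_graph(transactions)
summary = flow_summary(graph)
```

In this toy data, funds leave `acct_A` and return to it through two intermediaries, which is the kind of circular structure a graph view makes visible and a flat transaction list hides.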

The conventional approach to graph analytics relies on humans manually drawing relationships (edges) between vertices. In other words, the human eye is frequently responsible for catching suspicious financial relationships between individual accounts and assets. However, at large scale, money laundering hides among potentially millions of account transactions per minute, making it highly improbable that a human or team of humans will spot these patterns. By employing deep learning and embeddings to vectorize these accounts and financial representations, a Deep Learning model can instead build and process these graph analyses automatically in seconds.
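One way to picture how a model vectorizes accounts is a single graph convolution layer, in which each account's feature vector is mixed with those of its neighbors. The sketch below uses simple mean aggregation with a self-loop, which is a simplified variant and not necessarily the exact formulation used by Weber et al. The features, the neighbor lists, and the identity weight matrix are all invented for illustration.

```python
def relu(x):
    return max(0.0, x)


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def graph_conv_layer(features, neighbors, weight):
    """New embedding = ReLU(W @ mean of own and neighbors' features)."""
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbors.get(node, [])]
        mean = [sum(column) / len(group) for column in zip(*group)]
        updated[node] = [relu(v) for v in matvec(weight, mean)]
    return updated


# Hypothetical 2-d account features and an undirected transaction graph.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix

hidden = graph_conv_layer(features, neighbors, identity)
```

After one layer, every account's vector already reflects who it transacts with. Stacking layers spreads information across longer transaction chains, and a trained classifier on top of these embeddings could then flag suspicious accounts.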

Figure 8: An example model for AML employed in a business for real-time alerts based on data analytics.

In addition to automating graph analyses of illicit financial relationships, Weber et al. also highlight the use of NLP in AML. NLP could process unstructured, heterogeneous data in real time to produce suspiciousness scores and visualizations that reinforce human forensic analysis. NLP researchers could tap into massive, continuously updating data streams such as news articles, financial reports, social media, and public/private fraud databases. International and domestic banks could implement this system in conjunction with law enforcement agencies to make informed AML decisions in the long run. However, on a practical scale, graph-based learning presents the most promising results through real-time understanding of transaction relationships. The following table shows Weber et al.’s training time and results with 1M nodes and 9M edges.

Figure 9: Graph Learning on AMLSim Data (1M nodes, 9M edges)

3. Accelerating Human Trafficking Investigations through Hotel Recognition

Hotels and motels are the scattered epicenters of human trafficking. They are critical sites where both sex and labor trafficking operations can unfold quickly and leave without a trace. However, the key platform throughout this article is the escort advertisement website. Escort websites not only include descriptions of the victims being exported or exchanged but also attach digital photographs of the victims, frequently with hotel wallpaper or furniture in the background. In the paper “Hotels-50K: A Global Hotel Recognition Dataset”, Stylianou et al. recognize the value of hotel image backgrounds in human trafficking investigations and leverage it by using CNNs for image attribution and labeling.

Figure 10: Example of correspondence between query image background and the ground truth prediction in the hotel dataset.

The research team curated a dataset of over 1 million annotated hotel room images from over 50,000 hotels. The dataset covers diverse hotel locations spanning the western and eastern U.S., Western Europe, and popular coastlines worldwide. The images amassed by Stylianou et al. were extracted from travel websites and hotel room profile websites. The figure below shows the geographic distribution of the Hotels-50K dataset.

Figure 11: Global geographic distribution of the hotel dataset, including the U.S., Western Europe, and notable coastlines.

In the computer vision community, scene recognition has been a primary hurdle due to the nature of feature extraction. Getting a CNN to extract the background scene when the most prominent features in the image are human faces, bodies, and extremities is a difficult task for traditional computer vision models. The team used a ResNet-50 classification network pretrained on ImageNet. ResNet-50 uses residual (skip) connections, which make very deep networks easier to train on large amounts of data. The ResNet architecture is illustrated in the figure below:

Figure 12: A deep ResNet, which uses a series of residual layers and convolutions to make training easier.
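The residual idea itself fits in a few lines. The sketch below replaces ResNet's convolutions and batch normalization with small dense layers, so it is an illustration of the skip connection only and not the ResNet-50 architecture. All weights and inputs are invented.

```python
def relu_vec(vec):
    return [max(0.0, v) for v in vec]


def linear(weight, vec):
    """Dense layer without bias: matrix (list of rows) times vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def residual_block(x, w1, w2):
    """Compute ReLU(F(x) + x), where F is two small dense layers."""
    hidden = relu_vec(linear(w1, x))
    residual = linear(w2, hidden)                   # F(x): the learned correction
    return relu_vec([r + xi for r, xi in zip(residual, x)])  # skip connection


x = [1.0, 2.0]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]

# With all-zero weights the block passes a non-negative input straight through.
passthrough = residual_block(x, zero_weights, zero_weights)

# Hypothetical non-zero weights add a correction on top of the input.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]
adjusted = residual_block(x, w1, w2)
```

Because a block can fall back to passing its input through unchanged, adding more blocks does not have to make the network harder to optimize, which is the intuition behind training networks as deep as ResNet-50.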

Ultimately, the combination of a diverse hotel dataset and a deep ResNet-50 architecture pretrained on ImageNet could allow law enforcement to rapidly identify potential hotel locations and regions of interest across a state or nation. Additionally, since the dataset includes images from across North America, national movement by a human trafficking scheme could be tracked.

Conclusion

The global theater of human trafficking is a hyper-complex game with multiple layers of dynamic key players. More importantly, human trafficking is, frankly, a strategic economic exchange in which human beings are exported and treated as having monetary value. Due to the financial value of these transfers, it has become ingrained in the illicit money-laundering economy and secretive online escorting platforms. However, Deep Learning architectures could cut investigation time by more than tenfold, providing law enforcement and the victim’s family with reliable data more efficiently. By using the large body of open-source data in text mining, graph analysis, and image repositories, models can be trained and privately deployed for relevant officials to use in a timely fashion. As human trafficking becomes more advanced in the 21st century, A.I. has the potential to outrun it in the long term.

Bibliography

Stylianou, A., et al. (2019). Hotels-50K: A Global Hotel Recognition Dataset. arXiv, 1–8. Retrieved July 8, 2019, from https://arxiv.org/pdf/1901.11397.pdf

Tong, E., et al. (2017). Combating Human Trafficking with Deep Multimodal Models. arXiv, 1–10. Retrieved July 8, 2019, from https://arxiv.org/pdf/1705.02735.pdf

Wang, J., et al. (2018). An LSTM Approach to Short Text Sentiment Classification with Word Embeddings. 2018 Conference on Computational Linguistics and Speech Processing, 214–223. Retrieved July 8, 2019, from https://www.aclweb.org/anthology/O18-1021

Weber, M., et al. (2018). Scalable Graph Learning for Anti-Money Laundering: A First Look. arXiv, 1–7. Retrieved July 8, 2019, from https://arxiv.org/pdf/1812.00076.pdf