
Reasoning on Financial Intelligence

How you can help fight organized crime using AI.

KNOWLEDGE GRAPHS & REASONING

Photo by Jason Leung on Unsplash

Serious crime is motivated by profit, and no matter the size, most criminal acts leave a financial trail. Financial Intelligence is the process of gathering information about criminals who seek to exploit vulnerabilities within the financial sector to disguise illicit funds.

Here we are not talking about some random Ponzi scheme or how to spend the swag from the Ocean’s Eleven robbery. It is not so fun or glamorous. Here, we are talking about destroyed lives: a trillion-dollar affair with all kinds of despicable underlying crimes, called predicate offenses, such as corruption, child exploitation, slavery, drug trafficking, tax evasion, human trafficking, human smuggling, organized crime, and many more.

The nerve center of Financial Intelligence consists of Financial Intelligence Units (FIUs), independent agencies placed in every country to fight money laundering and the financing of terrorism. FIU analysts collaborate with the private sector, law enforcement, and other national security partners to identify, detect, and disrupt criminal networks and the proceeds of their crimes, to achieve a financial system free from criminal abuse. They receive suspicious transaction reports and release a detailed and coherent analysis of the actors, financial transactions, and suspected crimes involved.

If you want to know more about the functioning of the anti-money laundering system, you can keep informed here:

For A Few Dollars More

Whether you are a financial analyst, an AI geek, or a researcher, wouldn’t you like to help out by using reasoning on knowledge graphs?

Let’s see how!


THE CONTEXT

In its daily investigations, an FIU has many tasks to accomplish. It receives thousands of potentially suspicious financial records in the form of Suspicious Transaction Reports (STRs) from the private sector and develops complex money laundering cases to expose the seemingly legal origin of criminal proceeds.

In doing this, an FIU should assess the suspicion (work out a score or a heuristic to rate its level) and establish the offenses involved according to a given classification. Since it manages a huge amount of data, including data from other government agencies, an FIU also needs to perform risk-based scheduling (for example, pursuing the more suspicious cases or the major crimes first). Connecting the dots of all these tasks and subtasks is the reconstruction process, which generally produces a report with all the hypotheses considered, the conclusions reached, and the supporting evidence.

AML tasks of an FIU - Image by Author

In a world where crime is transnational, the collaboration among all the actors participating in the world-wide FIU system is paramount. Many international stakeholders claim that strong and consistent use of modern technologies can foster and sustain virtuous cooperation among FIUs as well as between FIUs and the private sector. [1,5,6,7,8]

Following an emerging rule-based view of AML, recently published in the Industrial Track of ‘Declarative AI 2020’, we can set AML tasks and subtasks of interest to an FIU as Reasoning tasks over an encompassing anti-money laundering knowledge graph (AML-KG), modeling all the relevant domain objects and interconnections.

A knowledge graph is a semi-structured data model characterized by three components:

  1. a (ground) extensional component (EDB, extensional database), with constructs, namely facts, to represent data in terms of a graph or a generalization thereof;
  2. an intensional component (IDB, intensional database), with reasoning rules over the facts of the ground extensional component;
  3. a derived extensional component, the reasoning component, produced in the reasoning process, which applies rules on the ground facts. [4]

Reasoning on a knowledge graph concerns the different ways to traverse it while answering queries, reaching all the interlinked involved entities along different paths, possibly requiring the creation of new parts of the graph: creating new knowledge.
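To make the three components concrete, here is a minimal Python sketch (not Vadalog, and not the author’s code). It assumes a tiny, made-up EDB of `transfer` facts and two hand-written rules defining a hypothetical `reachable` predicate; the derived component is whatever the rules add when applied until nothing new appears.

```python
# Ground extensional component (EDB): hypothetical wire-transfer facts.
edb = {
    ("transfer", "A", "B"),
    ("transfer", "B", "C"),
    ("transfer", "C", "A"),
    ("transfer", "C", "D"),
}

def apply_rules(facts):
    """Intensional component (IDB), applied once to the given facts.

    reachable(x, y) :- transfer(x, y).
    reachable(x, z) :- reachable(x, y), transfer(y, z).
    """
    new = set()
    for (p, x, y) in facts:
        if p == "transfer":
            new.add(("reachable", x, y))
    for (p, x, y) in facts:
        if p == "reachable":
            for (q, y2, z) in facts:
                if q == "transfer" and y2 == y:
                    new.add(("reachable", x, z))
    return new

def reason(edb):
    """Apply the rules until a fixpoint is reached; return only derived facts."""
    facts = set(edb)
    while True:
        new = apply_rules(facts) - facts
        if not new:
            return facts - edb
        facts |= new

# Derived extensional component: new knowledge produced by reasoning.
derived = reason(edb)

# Accounts that can reach themselves sit on a circular chain of transfers.
circular = sorted(x for (p, x, y) in derived if p == "reachable" and x == y)
print(circular)
```

The recursion in the second rule is what lets the traversal follow paths of any length, which is the part that is awkward to express with plain queries.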

AML Knowledge Graph - Image by the Author

In this picture, the AML-KG is represented at a high level of abstraction.

In particular, in the AML-KG, Suspicious Transaction Reports (STRs), facts from social networks and media, newspapers, or follow-up feedback from law enforcement authorities are represented as EDB of the KG, as well as data from enterprise knowledge systems. For intermediaries, they include transaction, enterprise, and KYC data. IDB can be used to represent and operationalize official regulations, in a ‘RegTech’ approach, and encode custom criteria, including money laundering patterns (e.g., circular wire transfers, pyramidal control structures), domain rules, and suspicious behaviors, widely known by financial intelligence analysts.

Most of the money laundering patterns, suspicious behaviors, and financial business rules can effectively be described with a Knowledge Representation and Reasoning (KRR) language supporting full recursion, ontological reasoning, probabilistic reasoning, and machine learning models. [1,4]

If you want to read more about the desiderata for a KRR to be able to reason on a knowledge graph, you can check out this recent article on Towards Data Science.

In an AML-KG, reasoning rules should be designed by financial analysts and domain engineers. In fact, in this domain, crafted rules embody valuable domain knowledge that cannot be induced from the data. For instance, a compliance regulation, the internals of a money-laundering pattern, or tactics for financial trail obfuscation can be so well known to the analysts that inducing them from data would result in lower accuracy and explainability. On the other hand, when rules embed parametric machine learning models, such parameters can in turn be inferred from data. Moreover, learning rules from data can significantly expedite the ordinary rule design process. The learning bus in the picture ‘AML Knowledge Graph’ denotes such a hybrid deductive/inductive approach.

The reasoning process that applies the rules (the KG IDB) to the EDB can be executed using a KGMS (Knowledge Graph Management System), a software reasoning engine that performs complex rule-based reasoning tasks over very large amounts of data and also provides methods and tools for data analytics and machine learning. [4]

In this case, I have used the Vadalog System by the University of Oxford as the KGMS, but there are many others available, such as RDFox, LogicBlox, and Ontotext.


THE TRIGGER

Let’s go through a case of collusion and corruption in which many actors and intertwined financial activities are involved. In this case, we exploit the power of full reasoning on an AML-KG with a good part of the information an FIU can employ in its daily tasks.

This is a real money-laundering case from a real FIU that we can now effectively analyze and handle with a reasoning approach. The case has been anonymized and partially simplified for graphical reasons, but all the essential and hard-to-solve elements in the case are included. We will, of course, use icons and nicknames for persons and entities.

As we saw in the context section, an FIU can receive thousands of STRs.

Some are full of details, some are false positives, and most of them are so blurred and impenetrable that neither scientists nor software can reach any conclusion alone.

Let’s consider this STR sent by a bank to an FIU.

The trigger of the case in the AML-KG.

A formerly convicted individual (with the icon of ‘The Bad Guy’) asks Acme Bank for a loan. That’s it. No other clue.

Probably Acme Bank is trying to accomplish its compliance burden, reporting any even remotely suspicious activity to its local FIU. But…

What is an FIU supposed to conclude about AML activities regarding this report from Acme Bank?

The FIU should take the crystal ball and give us a fully explainable and reasonable justification of whether there are some reasons to believe this transaction is anomalous or not. Or, more typically, it should rely on the financial intelligence analyst’s experience and expertise.

This process starts with a long and painful analysis, possibly involving other private entities, other FIUs, and, more simply, all the information the FIU already holds and should try to use to solve this riddle. In doing this, an analyst should follow the money, instinct, and experience and, more practically, start querying all available data. This is a perilous process in which generalizations of patterns and all possible variants of well-known criminal activities are envisioned and carefully considered where relevant.

The goal is to understand whether this STR leads to something related to money laundering. Here the main activity is deciding how suspicious this STR is and, as a consequence, assigning it a suspiciousness score.

Let us now see how it can be carried out with a rule-based approach!


THE CASE

We can start to explore a typical money laundering pattern based on the concealment of the ultimate beneficial owner of an asset.

In this case, a person who requests a loan from a bank of which he or she is the ultimate beneficial owner may intend to launder unclean money via the bank. The ultimate beneficial owner is the entity that truly controls an asset. At the same time, we need to specify all the possible patterns for concealing the ultimate beneficial owner of an asset, in this case, Acme Bank.

But how can we express the meaning of company control? And how can I generalize all possible paths of control by an individual or another company with ‘something’ a computer can run in a reasonable amount of time?

That’s how!

This is a set of 5 rules written in Vadalog, a language of the Datalog± family that extends Datalog with many useful features such as existential quantification, aggregations, stratified negation, Boolean conditions, mathematical expressions, probabilistic reasoning, embedded functions, and arbitrary machine learning models, while guaranteeing scalability thanks to PTIME data complexity for the reasoning task. [4]

With this set of Datalog rules, we can describe the concept of company control as follows:

Rule 1 is the reflexive property for the predicate ‘control’. In general, a company (or a person or a family) x controls a company y, if:

  • (Rule 2) x directly owns more than 50% of y
  • or, (Rule 3) x controls a set of companies that jointly (i.e., summing the share amounts), and possibly together with x, own more than 50% of y. [2,3]

We can also assume that the CEO of a company has full control over it (Rule 4). This is of course a simplification but applies to this case. In Rule 5 we see the aggregation function that accumulates, summing them, direct and indirect ownerships along all possible ownership paths.
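The control definition described above can be approximated in plain Python as a fixpoint computation. This is only a sketch under stated assumptions, not the Vadalog program itself: the ownership and CEO facts (`OWN`, `CEO_AT`) and the function name `controlled_by` are invented for illustration, and shares are expressed as fractions between 0 and 1.

```python
from collections import defaultdict

# Hypothetical direct-ownership facts: (owner, company) -> share in [0, 1].
OWN = {
    ("X", "A"): 0.60,
    ("X", "B"): 0.30,
    ("A", "B"): 0.25,
    ("B", "C"): 0.51,
    ("Y", "C"): 0.49,
}

# Hypothetical isCeoAt facts: (person, company).
CEO_AT = {("Y", "D")}

def controlled_by(x, own=OWN, ceo_at=CEO_AT):
    """Return the set of entities controlled by x (x included)."""
    controlled = {x}                                   # reflexive control
    controlled |= {c for (p, c) in ceo_at if p == x}   # a CEO controls the company
    changed = True
    while changed:
        changed = False
        # Sum the shares held in each company by everything x already controls.
        totals = defaultdict(float)
        for (owner, company), share in own.items():
            if owner in controlled:
                totals[company] += share
        # Joint ownership above 50% means control (covers the direct case too).
        for company, total in totals.items():
            if total > 0.5 and company not in controlled:
                controlled.add(company)
                changed = True
    return controlled

print(sorted(controlled_by("X")))
print(sorted(controlled_by("Y")))
```

In this toy data, X controls B only jointly (0.30 held directly plus 0.25 through A), and reaches C only because B is controlled first. That dependency on previously derived facts is why the definition needs recursion.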

With 5 lines of Datalog, we can test thousands of control paths among millions of companies in an AML-KG in minutes if we run the reasoning process on state-of-the-art cloud machines with the Vadalog System, instead of trying to find plausible paths via queries or with ad-hoc programs or algorithms. Also consider that expressing unknown navigation patterns in the graph is not trivial and involves resorting to sophisticated devices such as recursion, which are beyond the standard programming skills of analysts.

Let’s go deeper into the activation of these simple five rules on the FIU data!

This is a partial view of the AML-KG after the reasoning process has combined the IDB and the EDB of the KG. The solid black edges are already present in the EDB and represent the ownership levels between companies, as well as the isCeoAt link. The dotted green Control edge between My Bank and Acme Bank, instead, has been inferred by the reasoning! So this green link belongs to the derived extensional component, the reasoning part, inferred through the application of the rules.

For now, we discovered that our bandit does not control Acme Bank. We only know that My Bank controls Acme Bank.

Now, after having tested a very common money laundering pattern, namely hiding the ultimate beneficial owner of an asset, let’s go further.

Sometimes criminals, especially in organized crime, try to conceal the control of an asset through their affiliates, often even family members or relatives, as is usual within Mafia families.

So let’s add some more rules to spot this kind of relationship.

The goal of this other group of rules is to cluster individuals into families that can be real families or just criminal affiliates in a broader sense. In particular, Rule 1 contains a specialized machine learning model for link prediction, denoted by the #sim embedded function. It returns a score p measuring how likely the two individuals i_1 and i_2 are to be spouses. Observe that the "::" symbol deviates from the standard Datalog syntax and denotes a kind of ‘rule probability’. In particular, Rule 1 yields spouse facts with a probability depending on p.

Rule 3 states that every individual belongs to a family, his/her own, and Rule 4 merges families f_1 and f_2 whenever they contain two spouses, i_1 and i_2. Similar rules could merge families having individuals with different kinds of relationships. The overall effect is clustering the space of persons.

Then, we can link the first group of rules with the second group in Rule 5 where we can aggregate ownership amounts from different family members.
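The family rules can be mimicked in Python with a union-find structure. This is a rough sketch and makes two simplifying assumptions: the link prediction model is replaced by a dictionary of precomputed scores, and the probabilistic rule is replaced by a hard threshold on the score. All names and data (`cluster_families`, `family_stake`, `Q1`..`Q3`, company `Z`) are hypothetical.

```python
def find(parent, i):
    """Return the representative of i's family, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def cluster_families(people, spouse_scores, threshold=0.5):
    """Every person starts in their own family; likely spouses merge families."""
    parent = {p: p for p in people}
    for (a, b), score in spouse_scores.items():
        if score >= threshold:  # assumption: hard cut instead of rule probability
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[rb] = ra
    families = {}
    for p in people:
        families.setdefault(find(parent, p), set()).add(p)
    return list(families.values())

def family_stake(family, own, company):
    """Sum the direct shares that members of a family hold in a company."""
    return sum(
        share
        for (owner, c), share in own.items()
        if owner in family and c == company
    )

people = ["Q1", "Q2", "Q3"]
spouse_scores = {("Q1", "Q2"): 0.9, ("Q2", "Q3"): 0.2}  # stand-in for model output
own = {("Q1", "Z"): 0.30, ("Q2", "Z"): 0.25, ("Q3", "Z"): 0.10}

families = cluster_families(people, spouse_scores)
for family in families:
    stake = family_stake(family, own, "Z")
    print(sorted(family), round(stake, 2), stake > 0.5)
```

Neither Q1 nor Q2 holds a majority of Z alone, but the merged family does, which is the concealment pattern these rules are meant to surface.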

This is what we can finally reveal, using the reasoning on the available data:

Applying the second group of rules, we find out the family members of ‘The Bad Guy’, in particular his spouse P1. The family also contains P2, P3, and potentially more people. Knowing the family members, we can determine the overall relationship of the family f with Acme Bank. To this aim, Rule 5 aggregates ownership amounts originating from different family members, who together may control the asset through their combined contributions.

And yes, this is our case!

We can finally conclude that ‘The Bad Guy’ does not control Acme Bank himself, but he is concealing the control of Acme Bank through his Mafia family. P2 directly owns 0.34 of My Bank, and P1 indirectly owns 0.21 of My Bank, obtained as 1 × 0.93 × 0.23 along the ownership path. In total, family f controls My Bank by owning 0.55 of the shares. My Bank, in turn, controls Acme Bank, holding 0.52 of the shares via a pyramidal shareholding structure, probably set up to obfuscate the connection between the two companies.
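The arithmetic behind these figures can be checked in a few lines of Python. The sketch assumes that the shares are fractions of the capital and that the indirect stake is the product of the shares along P1’s ownership path, which is how the quoted 0.21 appears to be obtained; the variable names are illustrative.

```python
from math import prod

# Shares along P1's ownership path down to My Bank (fractions of capital).
p1_path = [1.0, 0.93, 0.23]
p1_indirect = prod(p1_path)            # about 0.2139, quoted as 0.21

p2_direct = 0.34                       # P2's direct share in My Bank

family_total = p2_direct + p1_indirect  # about 0.5539, quoted as 0.55
family_controls_my_bank = family_total > 0.5

my_bank_in_acme = 0.52
my_bank_controls_acme = my_bank_in_acme > 0.5

print(round(p1_indirect, 4), round(family_total, 4))
print(family_controls_my_bank and my_bank_controls_acme)
```

Both thresholds are cleared by a narrow margin, so the conclusion is sensitive to whether P1 really belongs to the family.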

Family f controls Acme Bank, and ‘The Bad Guy’ was trying to conceal the control of Acme Bank through his family. So, the trigger of the case, the initial STR containing only the transaction in which ‘The Bad Guy’ asks for a loan from Acme Bank, the bank that he indirectly controls, is probably an attempt to launder money by justifying unclean money with a fake loan. The overall confidence in this conclusion depends on the certainty of the personal relationship, which is the output of a link prediction model, as well as on the intrinsic reliability of the money laundering pattern.

But then, how probable is it?

Remember that the goal here is deciding how suspicious this STR is and, as a consequence, assigning it a suspiciousness score. To settle this score, we can use this rule:

This rule tells us that our individual is not literally the ultimate beneficial owner of Acme Bank, BUT his family as a whole is. Moreover, as we have seen, the ‘w’ on the left-hand side of the rule controls the bias towards activating the rule. It is in some sense a measure of the importance of the rule and, consequently, controls the likelihood of the suspicion.

Here is the full set of 11 rules used for the explanation of this case:


CONCLUSIONS

With the activation of only 11 rules in an AML-KG containing regular data available to an FIU, even with a trigger as minimal and muffled as this one, we can solve a complex financial intelligence riddle that includes many interlinked entities, many possible variants of a criminal pattern (the concealment of the ultimate beneficial owner of an asset), and serious crimes such as collusion, corruption, and organized crime.

Although these elements could be considered by financial analysts, the number of variants, the volume of the data, and the specific nature of data (high number of interconnections, highly cyclical, etc.) suggest that this is highly unlikely, at least in a reasonable amount of time.


Eleonora Laurenza – Medium

Follow me on Medium for more.

Let’s also keep in touch on LinkedIn!


REFERENCES

[1] L. Bellomarini et al.; Rule-based Anti-Money Laundering in Financial Intelligence Units: Experience and Vision (2020); RuleML+RR.

[2] S. Ceri et al.; Logic programming and databases (2012); Springer.

[3] A.A. Berle Jr.; The Modern Corporation and Private Property (1932); Gardiner.

[4] L. Bellomarini et al.; The Vadalog system: Datalog-based reasoning for knowledge graphs (2018); PVLDB.

[5] EU Commission; Report from the Commission to the European Parliament and the Council – Assessing the framework for cooperation between Financial Intelligence Units (2019).

[6] Refinitiv; Innovation and fight against financial crime (2019); https://www.refinitiv.com/en

[7] J. Cassara; Modernizing AML Laws to Combat Money Laundering and Terrorist Financing (2017); Congressional Testimony, Foundation for Defense of Democracies; https://www.judiciary.senate.gov/imo/media/doc/Cassara%20Testimony.pdf

[8] The Financial Action Task Force; Methodology for assessing technical compliance with the FATF Recommendations and the effectiveness of the AML/CFT systems (2019).

