THE LLM OSINT ANALYST EXPLORER SERIES

How Large Language Models Changed My Entire OSINT Workflow

Introduction: Leveraging LLMs to augment Intelligence analysis

Anthony Mensier

--

The OODA Loop (image courtesy of Wikipedia)

Welcome to the LLM OSINT Analyst Explorer series!

This series delves into the potential of Large Language Models (LLMs) for supporting the generation of expert-driven insights on specialized topics such as Defense and National Security.

Let’s start by making one thing absolutely clear: in their present state, I would not endorse using Large Language Models (LLMs) for direct fact-checking or information gathering, at least not without appropriate safeguards and verification procedures (the reasons will become apparent throughout this series). The primary motivation behind this series is to offer a first workable approach to employing them for exactly these objectives.

Throughout this journey, we will demonstrate the development of custom knowledge extraction pipelines and the creation of Subject Matter Expert (SME)-driven knowledge graphs, made possible by LLM copilot capabilities. These knowledge graphs can subsequently be employed to generate a personalized LLM interface, constrained by the curated knowledge, thus addressing one of the major challenges associated with these models: the lineage and traceability of the information they produce.

Our ultimate objective is to inspire a new generation of tech-savvy Open Source Intelligence Analysts, encouraging them to construct their own OSINT toolkits, now with the support of LLM coder copilots.

Check the bottom of this article to discover the entire series!

How did it all start?

As an intelligence analyst, my days were packed with five essential activities that required quick thinking, in-depth analysis, and strong communication skills to help close the OODA loop (Observe, Orient, Decide, and Act). These are the same activities I can now perform 200% faster thanks to the release of LLM APIs and the bespoke Natural Language Processing (NLP) pipelines they enable:

  1. Rapidly comprehending developing situations, such as terrorist attacks, civil unrest, or natural disasters, and formulating an accurate understanding of the situation.
  2. Collecting, sorting, and organizing vast amounts of information on entities that I monitored, such as individuals, locations, organizations, weapons, and information sources to build expert curated profiles.
  3. Understanding the intricate relationships between these entities, including supply chains, propaganda campaigns, and non-state armed group organizations, in order to disrupt, support or neutralize their networks.
  4. Compiling and summarizing all the above into custom-made, concise intelligence reports, which informed my superiors, operational teams, and customers.
  5. Enriching these reports with my personal assessment of the situation and, where appropriate, my recommendations for a defined course of action.

I did this for more than the first half of my career, slowly evolving towards data-driven intelligence workflows and eventually leading product prototyping for private intelligence companies. That is when I discovered Deep Learning and its applications in the realm of NLP. And this changed EVERYTHING!

Progress in NLP rocked my world, twice!

You see, apart from the fifth point, where I use my Subject Matter Expert (SME) knowledge to produce a specific analysis, the first four activities have direct equivalents in the realm of NLP. Put all the following NLP model classes together in dedicated engines built for intelligence analysts, and you have a solution that could improve their efficiency by several orders of magnitude: cutting reading time, automatically generating reports, and creating and updating bespoke, expert-driven knowledge graphs at machine speed!

  1. Search and recommendation engines: algorithms that recommend content to users based on their reading history or the communities they belong to. Bespoke recommendation engines can be fine-tuned to surface related documentation that sits outside the user’s immediate sphere of interest but is still relevant. Why is this useful? It enables the discovery of “unknown unknowns”, a nightmare for intelligence analysts. By expanding recommendations beyond the user’s known areas of interest, these engines can uncover unexpected connections, insights, and patterns that would otherwise remain undiscovered, which is especially valuable for analysts who need to stay ahead of the curve and anticipate emerging threats or opportunities (see the embedding sketch after this list).
  2. Topic extraction and document clustering: models that generate topics from multiple texts and detect similarities between documents published by dozens, sometimes hundreds, of information feeds. Their value is often underrated, yet they are probably among the most essential parts of an NLP pipeline built to augment the efficiency of intelligence analysts. Why? Because you cannot read every single document, and clustering gives you a higher-level view of the main issues evolving across your many information feeds (the same sketch below also covers clustering).
  3. Named (and unnamed) Entity Recognition and Disambiguation (NER/NED): identifying and categorizing named entities, such as people, organizations, locations, and equipment, in a given text. The recognition part involves locating and tagging entities, while the disambiguation part involves determining the correct identity or meaning of an entity, especially when it can have multiple interpretations or references in a text. Why is it useful? Think of disambiguated entities as unique anchors in your document sets. From these anchors, you can build entire NLP logics to keep track of meaningful facts about an entity, ordered by recency and relevance, and start building bespoke, expert-curated profiles (a minimal NER sketch follows this list).
  4. Relationship extraction: models that identify and extract semantic relationships between entities in a given text. The goal is to determine the nature and type of the relationships between different entities, such as individuals, organizations, and locations, and to represent them in a structured format that can be easily analyzed and interpreted. Why is it useful? Generating accurate connections across thousands of documents lets you build expert-driven, queryable knowledge graphs in a matter of days. From multiple conversations with US military leaders, especially General (retired) Stanley McChrystal, this was the missing capability during most of the Iraqi counter-terror effort (a prompt-based sketch follows this list).
  5. Multi-document abstractive summarisation: automatically generating a concise and coherent summary of multiple documents on a given topic by composing new sentences that capture their most important information. Unlike multi-document extractive summarization, which selects and arranges existing sentences from the source documents, abstractive summarization uses natural language generation to write new sentences that reflect the overall message of the originals while preserving coherence and fluency. Why is it useful? These models let users obtain a concise, coherent digest of the most important information buried in a large amount of text (a minimal sketch closes this list).
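To ground the first two items, here is a minimal sketch of how a single embedding model can power both similarity-based recommendation and document clustering. The model name, the sample documents, and the cluster count are illustrative assumptions, not a recommended configuration; it assumes the sentence-transformers and scikit-learn packages are installed.

```python
# Minimal sketch: one set of document embeddings powering both
# similarity-based recommendation and clustering.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Armed group claims responsibility for attack on convoy near Mosul.",
    "New sanctions target a procurement network supplying drone components.",
    "Flooding displaces thousands in coastal region; relief effort underway.",
    "Investigators trace drone parts to front companies in three countries.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder
embeddings = model.encode(docs)

# Recommendation: rank every document by similarity to the one being read.
sims = cosine_similarity(embeddings[1:2], embeddings)[0]
ranked = sorted(enumerate(sims), key=lambda x: -x[1])
print("Most related to doc 1:", [i for i, _ in ranked if i != 1][:2])

# Clustering: group the same embeddings into rough topical buckets.
labels = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(embeddings)
print("Cluster labels:", labels)
```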
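For entity extraction, the sketch below uses spaCy’s small English model as a generic stand-in for the domain-tuned NER a real pipeline would use; disambiguation (linking each mention to a unique identity) would be a separate step on top.

```python
# Minimal NER sketch. en_core_web_sm is a generic stand-in for a
# domain-tuned model (install with: python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("General Stanley McChrystal commanded JSOC during operations in Iraq.")
for ent in doc.ents:
    # Typical output: "Stanley McChrystal PERSON", "JSOC ORG", "Iraq GPE"
    print(ent.text, ent.label_)
```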
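For relationship extraction, today’s shortcut is to ask an LLM for structured output. The sketch below is deliberately provider-agnostic: call_llm is a hypothetical wrapper around whichever chat API you use, and the triple schema is an illustrative choice, not a standard.

```python
# Prompt-based relation extraction sketch. `call_llm(prompt) -> str` is a
# hypothetical wrapper around your LLM provider; the triple schema is
# illustrative, and real pipelines must validate the model's JSON output.
import json

def extract_relations(text: str, call_llm) -> list[dict]:
    prompt = (
        "From the text below, extract entity relationships as a JSON list of "
        'objects shaped like {"subject": "...", "relation": "...", "object": "..."}. '
        "Return only the JSON.\n\n" + text
    )
    return json.loads(call_llm(prompt))

# Example (output depends entirely on the model behind call_llm):
# extract_relations("Front company X shipped drone parts to Group Y.", call_llm)
# -> [{"subject": "Front company X", "relation": "shipped drone parts to",
#      "object": "Group Y"}]
```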
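Finally, for abstractive summarisation, a generic pretrained summariser gives the flavour. The BART checkpoint is an illustrative choice (it assumes the transformers package), and naive concatenation is only the simplest multi-document strategy; production pipelines would chunk, deduplicate, and re-rank first.

```python
# Multi-document abstractive summarisation sketch. facebook/bart-large-cnn
# is an illustrative checkpoint; naive concatenation is limited by the
# model's maximum input length.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
docs = [
    "First field report text ...",
    "Second field report text ...",
    "Third field report text ...",
]
result = summarizer(" ".join(docs), max_length=120, min_length=30)
print(result[0]["summary_text"])
```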

Note: most of these models use a combination of the others to improve their overall performance; a document clustering model, for instance, might use topic extraction and NER to improve the quality of its clusters.

A word of caution: numerous “AI-powered” intelligence companies have assumed that simply presenting the output of various models would suffice to create a novel type of intelligence solution. We believe this approach is misguided. Instead, these models ought to be integrated into custom engines designed specifically to enhance an analyst’s efficiency across the activities listed at the top of this article.

The problem and the revolution

The problem back in 2015/2016 was the overall performance and accuracy of these models. To accomplish such complex tasks (considering how challenging it is for a human to summarize large texts into one or two paragraphs, even with proper training), these models needed to understand the nuances of human language in extreme detail, as well as the context in which each word was used. Unfortunately, the necessary scientific advancements and computational power were simply not available at the time.

Then came two revolutions: the commercialization of cloud technologies and the creation of transformer models, along with their latest iterations — the Large Language Models (LLMs). This changed everything, and this time, by several orders of magnitude.

In the following articles, we will explore how we can utilize LLMs to create powerful, NLP-driven intelligence workflows. We will also discuss the current limitations of this technology and how to overcome them.

However, before we delve into building the various logics required for such revolutionary workflows, it is essential that we take some time to understand what LLMs are and what they are not, as well as the opportunities and risks they present. Only by truly understanding this technology can we use it responsibly and unleash its full potential.

Liked this article?

This is the introduction to the LLM OSINT Analyst Explorer Series; many more articles will follow, and links will be added here as they are published!

We will gradually get deeper into the implementation details to create an automated, LLM-powered and expert-verified knowledge base that could be used for targeted Intelligence work, so don’t miss out and let’s connect! You can find me on LinkedIn or follow me on Medium!

Thanks for your support, and I shall see you in the next one!

--

Anthony Mensier

Ex-Intel Analyst & AI Product Manager passionate about Foundation Models, Transformational Technologies & National Security - https://bit.ly/subscriptionAM