
The financial services industry is falling in love with text crunching – also known as Natural Language Processing (NLP).
This infatuation is brought about by necessity, since investment companies are drowning in text data: analysis, news, contracts, compliance reports…
And the finance sector does like in-house tech. Away from Silicon Valley, investment banks in particular have been embracing engineering talent with absolute zeal. Though this process was well underway even before the 2008 financial crisis, subsequent regulation sped it up by creating demand for better data solutions.
There are also the incentives to consider. It’s an industry driven by the hunger to win an edge over the market; ears tend to perk up at the prospect of knowing more by processing data differently, especially if it makes for a good story.
I’ve been at the intersection of financial services and natural language processing for a while, and late last year decided to gather as much information as I could to answer the question:
How are banks and asset managers using natural language processing?
I decided to collect the case studies I found into a whitepaper, "How Finance Uses Natural Language Processing – 8 Case Studies in Banking and Investment Management".
The main intent was to educate financial industry insiders on the world of possibilities that NLP now offers. (Disclosure: I’m a founder at FinText).
To that end, the report also defined key NLP and machine learning concepts (Topic Modelling, Named-Entity Recognition, Feature Selection etc.), laid out clearly so that anyone could understand them.

However, as I was compiling this work, I was struck by certain findings.
I felt it would be worth sharing them with people who are already familiar with NLP, but not necessarily with the financial-services domain. It’s those findings I want to share with you here today.
I believe their relevance extends beyond the specifics of the financial sector, into any other non-tech sector looking to make use of text analytics.
1. In Financial NLP, start with the familiar
Automated language processing has leapt forward since late 2017, thanks to the introduction of transfer learning and transformers.
On top of that, NLP seems to get a little better every week, and not just in models: tools, corpora and learning resources keep improving too. Pretty soon, these compounding gains start to add up.
But when you look at the tech used in actual text-processing applications within financial services, it’s not the bleeding edge of technology.
By and large, the NLP concepts being put to work in the sector are familiar and established: embeddings, term-weighting schemes, clustering, training classifiers from scratch.
Generally, the text problems investment companies face are ones of scale and efficiency: in many cases, you’re aiming to achieve much the same as you did five years ago, but with a lot more data or fewer people. Automation thus moves from a nice-to-have to a financial lever in its own right.
Admittedly, since I was collecting publicly available information, it is possible that many current projects, yet to be made public, are using large models at scale. On the whole, though, I think that unlikely.
One reason is that in-house financial NLP applications are being developed with the payoff in mind. Technology is being used neither for its own sake, nor for abstract research. While experimentation is definitely taking place, it’s being done to solve very specific business problems. In this context, it makes sense to try established ideas first.
The second reason for deploying more mundane technology is that a lot of fruit is hanging awfully low. As with any domain that deals with lots of messy datasets, a major chunk of the work is pulling data together and cleaning it. But once that’s done, you need very little wizardry before you start seeing tangible gains from the data.
2. Solve for optimising an internal process
While NLP is not new to data providers like Bloomberg and Refinitiv (formerly Thomson Reuters’ financial and risk business), I mostly focused my research on internal developments.
The first reason is that many providers at the intersection of financial services and NLP tend to gaze inward. They focus on the ‘How’ (Look! We’re using this very cool technology!) rather than the ‘What’ (this is the problem we magically solve).
This mindset is probably familiar to lots of people coming from a tech background. People hack at new tech because they love it, and thus assume everyone is bound to love it too, if they would just sit down and listen.
In financial services, though, top execs very rarely care about tech for its own sake. Even at, say, Goldman Sachs or Renaissance Capital (two firms that are very strategic about tech), top execs will still care more about the next win than about that cool new widget.
The other reason for focusing on internal developments was evidence from the Bank of England that, when companies look to deploy ML technology, they like to experiment with it in-house first.

When it comes to in-house financial text analysis, it quickly became apparent that problem solving was really about understanding a business process.
The tech itself plays a minor role relative to how well the solution supplements existing workflows.
The momentum factor within financial companies is HUGE: actually changing a process is slow going. The path of least resistance is to gently remove pain points from the way things are normally done.
Therefore, the most bang for your buck comes from understanding, step by step, how people go about their jobs when dealing with textual data.
To illustrate, here’s an example of an internal NLP application at a bank, targeting a specific pain point within a wider process:
Financial services firms receive company-specific data from many different source systems. However, no single company identifier is consistently shared across these sources, and the same company’s name can differ from one data set to the next. Matching external data sets to the names used internally can therefore be difficult.
The immediate solution that comes to mind is to maintain some form of rule-based database. But one bank took this specific pain point and addressed it using embeddings and TF-IDF, by considering each company name as a "document" and matching similar documents.
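To make the "company name as a document" idea more concrete, here is a minimal sketch of the TF-IDF half of that approach. It assumes character trigrams as the terms and cosine similarity as the matching score, and it leaves out the embedding component entirely; the function names, the sample company names and the 0.5 threshold are all hypothetical choices for illustration, not the bank’s actual implementation.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Optional


def char_ngrams(name: str, n: int = 3) -> list[str]:
    """Normalise a company name and split it into overlapping character n-grams."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())  # drop punctuation such as '.' and '&'
    cleaned = re.sub(r"\s+", " ", cleaned).strip()       # collapse leftover double spaces
    padded = f" {cleaned} "                               # pad so word boundaries form n-grams
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency: n-grams shared by many names weigh less."""
    n_docs = len(docs)
    doc_freq = Counter(gram for doc in docs for gram in set(doc))
    return {g: math.log((1 + n_docs) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector for one name, L2-normalised so a dot product gives the cosine."""
    weights = {g: tf * idf.get(g, 0.0) for g, tf in Counter(doc).items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {g: w / norm for g, w in weights.items()} if norm else weights


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def match_names(external: list[str], internal: list[str],
                threshold: float = 0.5) -> dict[str, Optional[str]]:
    """Map each external name to its most similar internal name, or None below threshold."""
    internal_grams = [char_ngrams(name) for name in internal]
    external_grams = [char_ngrams(name) for name in external]
    idf = build_idf(internal_grams + external_grams)
    internal_vecs = [tfidf_vector(g, idf) for g in internal_grams]

    matches: dict[str, Optional[str]] = {}
    for name, grams in zip(external, external_grams):
        vec = tfidf_vector(grams, idf)
        scores = [cosine(vec, iv) for iv in internal_vecs]
        best = max(range(len(internal)), key=scores.__getitem__)
        matches[name] = internal[best] if scores[best] >= threshold else None
    return matches


if __name__ == "__main__":
    # Hypothetical example names, not real source-system data.
    internal = ["Goldman Sachs Group Inc", "JPMorgan Chase & Co", "Barclays PLC"]
    external = ["GOLDMAN SACHS GROUP", "J.P. Morgan Chase and Co.", "Barclays plc", "Acme Widgets Ltd"]
    for ext_name, hit in match_names(external, internal).items():
        print(f"{ext_name!r} -> {hit!r}")
    # Each external name maps to its internal counterpart; "Acme Widgets Ltd" maps to None.
```

Character n-grams are a deliberate choice here: they tolerate punctuation and spacing differences such as "J.P. Morgan" versus "JPMorgan", which word-level tokens would miss. The IDF formula mirrors scikit-learn’s smoothed variant, so at production scale `TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 3))` would be a natural replacement for the hand-rolled weighting, with an embedding model possibly added for names that share few characters.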
3. Financial NLP’s breakout moment is yet to come
On the one hand, I was seeing all these internally-driven NLP applications, sprouting in business areas that spanned pretty much every department, from research to back-office, from client services to marketing.
In fact, I came across many more case studies than actually made it into the report. I didn’t cite, for example, BNY Mellon automatically sifting through emails to direct them to the relevant department, or the Royal Bank of Scotland’s efforts to incorporate some NLP into models that predict customer churn.
I also came across several examples of big banks taking a stake in small startups they were nurturing (like JP Morgan with Limeglass, or ING with Eigen Technologies). This again underlined the importance of processes: the solutions were being perfected on real-life situations.
Yet the information I was gathering was incredibly scattered: bits of news, obscure reports, a meetup presentation, the occasional blog post written by an internal technical team.
There didn’t seem to be any industry-insider champions for NLP, no specialist media, no conferences. Put together, I was seeing massive scope for internal improvement, but no cross-industry discussion.
And so, while AI or machine learning in financial services are already hot topics, NLP in financial services has yet to emerge as a theme. If I had to guess, I’d say this will probably change in the next few years:
Technology has been commoditising financial services for years, and competing firms now tend to follow similar processes. When one firm makes an improvement that yields large benefits, its competitors tend to follow suit. Not immediately (these are big ships to steer), but eventually.
The economic fallout from COVID-19 is just an added push, as firms will look to make the most of their resources. It’s a fairly safe bet to expect more use cases out of banks and asset managers in the coming years.
4. NLP influences trading strategies indirectly
Building stock predictions from news is now a common machine learning exercise. In practice, though, most investment firms are not using streams of textual data to devise trading strategies directly.
Partly, this is because of how investments are sold. Just as a supermarket doesn’t offer a scattered jumble of food products but partitions items into aisles and sections, investment products are sold under specific categories, particularly to large-scale professional investors (like pension funds).
Thematic products tend to appeal to retail audiences (lingo for regular people like you and me), but there they compete with other niche storytelling products: climate-change ETFs, tech trackers, marijuana-focused funds, and so on.
In fact, the BUZZ Sentiment ETF – designed to track the companies most favourably mentioned in news, blogs and tweets – ended up folding due to lack of investor interest.
Overall, financial firms are mostly using alternative data to detect hidden signals, good and bad. As an illustration, notice the language one of the industry’s top publications used when covering our report.

These signals then factor into a wider investment process. When taking a step back to look at the case studies in aggregate, you can see that banks and investment managers are actually asking themselves:
What does it mean to be well-informed, in the current over-saturated data environment?
Given the exploding volumes of textual data, this question is only going to become more pressing.
Summary
Let’s consider these four findings together:
- Established tech already offers plenty of potential.
- Tackling internal bottlenecks is a priority.
- Many have yet to catch on to the gains in NLP. Knowledge is scattered.
- On the investment side, use cases aren’t about beating some index.
Read together, the first and third findings point to an opportunity. It’s cool to play around with the latest huge model, but avoid sneering at what Silicon Valley might consider yesterday’s news.
A lot of the challenge still lies in data cleaning and aggregation, but to undertake such efforts, you first need to know there’s a payoff. That’s why there’s a lot of value in communicating what can be achieved, using the terms and concepts the industry cares about.
Building something practically useful, as per the second finding, is very difficult without domain expertise: how would you know what the day-to-day looks like inside a bank or asset manager if you’ve never worked in one?
Because it’s so difficult, very few go after this knowledge. If you’re looking to apply your NLP skills in finance, I definitely think it’s worth asking people in that industry some pretty mundane questions:
What does their day look like? Who do they have to chase? What do they do over and over again? What sort of spreadsheets do they look at?
That’s where the fourth finding comes in: if you want to add value to a bank or an asset manager, don’t try to replace the actual investor. Internally, these tend to be some of the most powerful people in the organisation.
In the case studies where NLP directly helped investors, it did so by offering additional processed knowledge they simply couldn’t obtain otherwise.
That’s where the real edge lies.
Sources: