ESG-BERT: NLP meets Sustainable Investing

At Parabole, I worked under Charan Pothireddi’s mentorship exploring how Natural Language Processing can be applied to Sustainability analysis. This article outlines our findings.

Mukut Mukherjee
Towards Data Science


Sustainable Investing is a growing investment strategy that seeks strong financial returns while also making the world a better place. However, it is often challenging for investors to put into practice. Natural Language Processing can help. Here’s how.

What is Sustainable Investing?

Sustainable, responsible, and impact investing (SRI) is an investment discipline that considers environmental, social, and corporate governance (ESG) criteria to generate long-term competitive financial returns and positive societal impact.


There are several motivations for sustainable investing, including personal values and goals, institutional mission, and the demands of clients, constituents, or plan participants.

Sustainable investors aim for strong financial performance, but also believe that these investments should be used to contribute to advancements in social, environmental, and governance practices. [6]

They may actively seek out investments — such as community development loan funds or clean tech portfolios — that are likely to provide important societal or environmental benefits.

Some investors embrace sustainable investing strategies to manage risk and fulfill fiduciary duties; they review ESG criteria to assess the quality of management and the likely resilience of their portfolio companies in dealing with future challenges. Some are seeking financial outperformance over the long term; a growing body of academic research shows a strong link between ESG and financial performance. [1]

Figure: Global growth in sustainable investments (USD trillions). Image by author.

Investments marketed as sustainable — meaning they focus on companies that incorporate environmental and social corporate-governance practices into long-term corporate strategies — are experiencing explosive growth.

Although sustainable investing emerged in the 1970s, the movement has gained impressive traction in the last few years.

Since 2012, total assets in sustainable investing have more than doubled. [2]

As sustainable investing goes mainstream, it won’t simply act as a niche in a broader strategy — instead, it’ll be naturally integrated throughout a portfolio.

“With the impact of sustainability on investment returns increasing, we believe that sustainable investing is the strongest foundation for client portfolios going forward.”

- Larry Fink, Chairman and CEO of BlackRock

Sustainability is a global force that will continue to factor into everyday decisions.

Sustainable Investing — Challenges

The current pool of data around sustainability relies too much on voluntary corporate disclosures, such as annual sustainability reports and company questionnaires put together by institutional investors — many of which ask different questions. [3]

“Individual investors are quite challenged to obtain this type of information in a way that is easily available and informs investment decisions”

- Jean Rogers, CEO and Founder of the nonprofit Sustainability Accounting Standards Board

It’s challenging for investors to make sustainable investment choices while relying solely on annual sustainability reports and similar disclosures. These reports are often hundreds of pages long and take a great deal of human effort to analyze, a problem that compounds as the number of sustainable assets grows. They are also not fully transparent: companies may choose to leave certain things out.

Annual sustainability reports are also static. They do not reflect changes in a company as they happen; they only reflect an accumulation of changes over a fixed period. Relying on them alone misses real-time developments that may be reflected in news articles.

A more dynamic approach to sustainable investing would take real-time changes into account while also reducing the complexity of analyzing annual sustainability reports. This would make Sustainable Investing more scalable, increasing efficiency while reducing human error.

How NLP can help

Natural Language Processing can be used to analyze sustainability reports and news articles and extract important ESG-centric insights. This reduces the complexity of analyzing reports manually and makes the approach more dynamic by also taking into account real-time changes reported in news articles.

Let’s take a look at an example:

We are proud to have reached 100 percent renewable electricity for Apple facilities, and carbon neutrality for Apple’s corporate emissions, including business travel and employee commute. We are embarking on a new goal to become carbon neutral for our entire carbon footprint by 2030.

- An excerpt from the 2020 Apple Sustainability Report [4]

Rather than reading and analyzing the report manually, an NLP model could perform downstream tasks such as text classification and sentiment analysis on it, making the whole process more time- and resource-efficient. In this case, the model would classify the excerpt as relating to “Climate Change” with a “positive” sentiment.

NLP empowers the investor to make a better and more efficient analysis of reports and articles, leading to a much more informed Sustainable Investment decision.

During my internship at Parabole.ai, I developed ESG-BERT by further pre-training Google’s BERT language model on large unstructured Sustainability text corpora.

I first approached this problem using scikit-learn models and count vectorizers. Given the nature of this domain and its unique vocabulary, these traditional ML models did not yield satisfactory results. Deep Learning models, on the other hand, required large amounts of structured text data, which we lacked: unstructured text was abundant, but structured data was scarce.
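To make the count-based baseline concrete, here is a minimal sketch of a bag-of-words classifier written in plain Python. It is not the code used in this project: the class name, the naive Bayes model, and the four training sentences are all assumptions chosen for illustration. It shows the general idea of turning text into word counts and classifying on those counts.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class BagOfWordsNaiveBayes:
    """Multinomial naive Bayes over raw word counts (a toy baseline)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # The vocabulary is fixed at training time; unseen words are ignored later.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written training examples; not the article's data.
train_texts = [
    "we reduced carbon emissions and switched to renewable electricity",
    "our carbon footprint fell as renewable energy use grew",
    "the board strengthened executive pay oversight and audit independence",
    "shareholders approved new board governance and audit rules",
]
train_labels = ["Climate Change", "Climate Change", "Governance", "Governance"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("carbon neutral by 2030 using renewable electricity")
print(prediction)  # prints: Climate Change
```

Note that any word outside the training vocabulary is simply dropped at prediction time. That is one plausible reason a count-based model struggles in a domain with specialised vocabulary and little structured data.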

Having tried these approaches, I turned to Google’s BERT, which is pre-trained on large unstructured text corpora and hence requires much less structured data for downstream NLP tasks such as text classification. This fit our case well. [5]

BERT (Bidirectional Encoder Representations from Transformers) is a technique developed by Google for pre-training Natural Language Processing models. The official BERT repo contains several pre-trained models that can be fine-tuned on downstream NLP tasks with an added output layer. These models, however, are pre-trained on general English text corpora, so they do not capture domain-specific vocabulary well. [5]
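One of the objectives BERT is pre-trained with is masked language modelling: some input tokens are hidden and the model learns to predict them from the surrounding context, which is why unlabelled text is enough. The sketch below illustrates the masking step only, under simplifying assumptions: it works on whitespace-split words rather than BERT’s WordPiece tokens, selects each position independently, and the function name `mask_tokens` is hypothetical. The 15% selection rate and the 80/10/10 replacement split follow the original BERT paper.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (masked_tokens, targets) for masked language modelling.

    targets[i] holds the original token at positions the model must predict,
    and None everywhere else.
    """
    rng = random.Random(seed)
    masked, targets = list(tokens), [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        targets[i] = token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, targets


sentence = "we aim to become carbon neutral for our entire footprint by 2030".split()
vocab = sorted(set(sentence))  # toy vocabulary built from the sentence itself
masked, targets = mask_tokens(sentence, vocab, seed=0)
print(masked)
print(targets)
```

Further pre-training on domain text repeats this procedure over sustainability documents, so the model is pushed to predict domain terms from their context.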

Sustainable Investing as a domain has a unique vocabulary that ESG-BERT is able to capture. During further pre-training on unstructured text data, ESG-BERT reached accuracies of 100% and 98% on the Next Sentence Prediction and Masked Language Modelling tasks, respectively. Fine-tuning ESG-BERT for text classification yielded an F1 score of 0.90. For comparison, the general BERT (BERT-base) model scored 0.79 after fine-tuning, and the scikit-learn approach scored 0.67.
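For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it per class and then takes the unweighted (macro) average. The averaging method behind the scores reported here is not stated, so macro averaging is an assumption, and the labels are made up for illustration.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical labels for illustration only; not the article's evaluation data.
y_true = ["Climate", "Climate", "Governance", "Governance", "Social", "Social"]
y_pred = ["Climate", "Climate", "Governance", "Social", "Social", "Social"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))  # prints: 0.822
```

In this toy case the per-class scores are 1.0, 0.667 and 0.8, so a single misclassified example out of six already pulls the macro F1 down to about 0.82.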


The applications of ESG-BERT extend well beyond text classification. It can be fine-tuned to perform various other downstream NLP tasks in the domain of Sustainable Investing.

How to use ESG-BERT?

The pre-trained domain-specific ESG-BERT model can be downloaded from the GitHub repository. It can be fine-tuned to perform downstream NLP tasks such as sentiment analysis.

ESG-BERT was also fine-tuned to perform text classification on Sustainable Investing text data. The fine-tuned model can be downloaded and served, as explained in the readme section of the GitHub repo.

Conclusion

This is a substantial step towards text mining in Sustainable Investing.

ESG-BERT can be used to make Sustainable Investing more accessible to investors. It makes Sustainability as a goal more attainable by bridging the gap between complex Sustainability data and investors. Its impact, however, goes beyond text mining. Sustainability reports are often hundreds of pages long and filled with ESG jargon that most people would not understand. This tool makes these reports more readable and accessible to everyone, thereby increasing the impact of Sustainable Investing. This moves us one step closer to a greener, safer, and more sustainable future.

In the near future, I will be publishing tutorials on how I further pre-trained BERT to create ESG-BERT and how I fine-tuned BERT using PyTorch, and I will discuss the other NLP approaches using count vectorizers and bag-of-words models.

References

[1] Sustainable Investing Basics, US SIF: The Forum for Sustainable and Responsible Investment

[2] Iman Ghosh, Visualizing the Global Rise of Sustainable Investing (2020), Visual Capitalist

[3] Alex Davidson, A Guide to Sustainable Investing (2015), The Wall Street Journal

[4] Environmental Progress Report (2020), Apple

[5] Jacob Devlin and Ming-Wei Chang, Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing (2018), Google AI Blog

[6] Karen Wallace, Interested in Sustainable Investing? Here’s What You Need to Know About Sustainable Funds (2020), Morningstar

Feel free to connect with me on LinkedIn and shoot me a message here.


Artificial Intelligence Intern at Parabole.ai — looking to use Data Science to create a more sustainable future.