This article introduces the "FedTools" Python package, providing a practical implementation of a basic Bag of Words algorithm.
TL;DR: GitHub repo and FedTools package.
So, what is Fedspeak?
"Fedspeak", otherwise known as "Greenspeak", was initially termed by Alan Blinder to describe the "turgid dialect of English" used by Federal Reserve Board chairpeople when making vague, noncommittal or ambiguous statements. Over recent years, Federal Reserve policy communications have evolved dramatically, owing to increases in natural language processing (NLP) capabilities of Financial Institutions world over.
Natural Language Processing
Natural Language Processing (NLP) is a field of artificial intelligence enabling machines to interact with, analyse, understand and interpret the meaning of human language. NLP has a number of sub-fields, such as automated summarization, automated translation, named entity recognition, relationship extraction, speech recognition, topic segmentation and sentiment analysis.
This article focuses on implementing a basic sentiment analysis through the use of a "Bag of Words" (BoW) algorithm. The BoW algorithm is useful for extracting features from text documents, which can be subsequently incorporated into modelling pipelines.
Bag of Words (BoW)
The BoW approach is very simple and can be applied to different document types to extract pre-defined features. At a high level, a Bag of Words is a representation of text which describes the occurrence of a pre-determined set of words within a document. It is characterized by two steps:
1) A vocabulary or ‘dictionary’ of pre-determined words must be chosen.
2) The presence of the known words is measured. This is known as a "bag", as all information about word order is discarded. The model only considers the number of occurrences of the pre-determined words within the text.
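As a minimal illustration of these two steps (using a toy vocabulary, not the dictionary introduced later):

```python
from collections import Counter

# Step 1: choose a pre-determined vocabulary.
vocabulary = {'inflation', 'growth', 'risk'}

# Step 2: count occurrences of the known words; word order is discarded.
text = 'inflation risk remains although growth offsets some inflation risk'
counts = Counter(word for word in text.split() if word in vocabulary)
print(counts)  # Counter({'inflation': 2, 'risk': 2, 'growth': 1})
```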
Practical Implementation
Now that we have outlined the BoW algorithm, we can implement it in seven easy steps.
The first step is to install packages and modules which we shall use:
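The snippet below is a minimal sketch of those imports; the exact set in the repo may differ, and the FedTools import path follows the package's README:

```python
# Install once from the command line:
# pip install fedtools pandas plotly

import pandas as pd
import plotly.graph_objects as go

# FedTools automates the collection of FOMC materials.
from FedTools import MonetaryPolicyCommittee
```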
Secondly, we need to obtain historical Federal Open Market Committee (FOMC) statements, which are published on the Federal Reserve's website. However, the new "FedTools" Python library enables us to extract this information automatically:
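A sketch of the extraction step, assuming the MonetaryPolicyCommittee class and its find_statements() method behave as documented in the FedTools README:

```python
# Scrape all available FOMC statements into a DataFrame,
# indexed by FOMC meeting date.
dataset = MonetaryPolicyCommittee().find_statements()
```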
We now have a Pandas DataFrame with the FOMC statements in one column, indexed by FOMC meeting date. The next step is to iterate through each statement and remove paragraph delimiters:
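A minimal sketch, assuming the statements sit in the DataFrame's first column:

```python
# Strip newline and carriage-return characters so that each
# statement becomes one continuous string.
for i in range(len(dataset)):
    dataset.iloc[i, 0] = (dataset.iloc[i, 0]
                          .replace('\n', ' ')
                          .replace('\r', ' '))
```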
Now, we have to consider which dictionary of pre-determined words we wish to use. For ease, we use Tim Loughran and Bill McDonald's Sentiment Word Lists. As the lists are extensive, they are not included within this article, but can be obtained from the consolidated code held within the GitHub repo.
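For orientation, the dictionary maps a sentiment label to a word list; the handful of words below is purely illustrative, and the full lists should be taken from the repo:

```python
# Abbreviated illustration only -- the real Loughran-McDonald
# lists contain thousands of entries; see the GitHub repo.
lmdict = {
    'Negative': ['decline', 'loss', 'weak', 'crisis', 'unemployment'],
    'Positive': ['gain', 'improve', 'strong', 'stability', 'progress'],
}
```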
Next, we define a function which determines whether a word is a ‘negator’. This function checks if the input word appears within the pre-determined ‘negate’ list.
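A minimal sketch of that check, with a deliberately abbreviated negate list (the full list is in the repo):

```python
# Abbreviated list of negation words.
negate = ['not', 'never', 'no', 'none', 'neither', 'cannot', "n't", 'without']

def negated(word):
    """Return True if 'word' is a negation term."""
    return word.lower() in negate
```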
Now, we can implement the BoW algorithm, checking for potential negators within the three words preceding each detected word. The function enables us to count the positive and negative words detected, whilst also saving these words within separate DataFrame columns.
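A sketch of such a function, assuming the lmdict and negated() pieces above; the repo's version handles punctuation and word inflections more carefully:

```python
def bag_of_words_using_negator(dict_, article):
    """Count positive and negative dictionary words in 'article',
    treating a positive word as negative if any of the three
    preceding words is a negator."""
    pos_count, neg_count = 0, 0
    pos_words, neg_words = [], []
    input_words = article.lower().split()

    for i, word in enumerate(input_words):
        if word in dict_['Negative']:
            neg_count += 1
            neg_words.append(word)
        if word in dict_['Positive']:
            # Look back at the three preceding words for a negator.
            if any(negated(w) for w in input_words[max(i - 3, 0):i]):
                neg_count += 1
                neg_words.append(word + ' (negated)')
            else:
                pos_count += 1
                pos_words.append(word)

    return [len(input_words), pos_count, neg_count, pos_words, neg_words]
```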
The build_dataset function iteratively invokes the bag_of_words_using_negator function for each FOMC statement, passing the Loughran & McDonald dictionary lmdict as an input argument.
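A sketch of that wrapper, with the output column names being assumptions rather than the repo's exact choices:

```python
def build_dataset(dataset):
    """Run the bag of words over every statement and attach
    the results as new DataFrame columns."""
    results = [bag_of_words_using_negator(lmdict, statement)
               for statement in dataset.iloc[:, 0]]

    dataset['word_count'] = [r[0] for r in results]
    dataset['positive_count'] = [r[1] for r in results]
    dataset['negative_count'] = [r[2] for r in results]
    dataset['positive_words'] = [r[3] for r in results]
    dataset['negative_words'] = [r[4] for r in results]
    return dataset
```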
Finally, the plot_figure function invokes the build_dataset function, subsequently building an interactive visualisation of the outputs.
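One way to sketch this with Plotly (the repo may use a different plotting library; the trace names and titles here are assumptions):

```python
def plot_figure(dataset):
    """Plot positive and negative word counts per FOMC statement
    as an interactive time series."""
    data = build_dataset(dataset)

    fig = go.Figure()
    fig.add_trace(go.Scatter(x=data.index, y=data['positive_count'],
                             name='Positive Words'))
    fig.add_trace(go.Scatter(x=data.index, y=data['negative_count'],
                             name='Negative Words'))
    fig.update_layout(title='Sentiment of FOMC Statements',
                      xaxis_title='FOMC Meeting Date',
                      yaxis_title='Word Count')
    fig.show()
```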
Call the plot_figure function, and the figure is displayed:
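Using the sketches above:

```python
plot_figure(dataset)
```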
Full code can be found within the GitHub repo, with the open-source FedTools package available via pip install.