Improving language variety in template-based NLG for automated journalism

In this blog post, I describe part of my master’s thesis research in the field of Data Science, conducted at the University of Helsinki in 2020. A paper based on this work was later published in the Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation in April 2021. Special thanks to my master’s thesis supervisor and co-author Leo Leppänen.
To serve a wider audience, I will briefly explain some of the key concepts first. However, understanding this post still requires some basic knowledge of computer science.
Natural Language Generation (NLG) – The aim of NLG is to create systems that can produce fluent text in English or other human languages from underlying non-linguistic data, such as time-series data or images.
Template-based NLG – The classical NLG method: the final text is constructed by inserting data into templates such as "In {where}, the highest temperature for {period} was {temperature} °C". The resulting sentence could then be "In Finland, the highest temperature for summer 2020 was 33.5 °C".
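As a minimal illustration, such a template can be filled with Python’s built-in string formatting; the field names below are simply the ones from the example above:

```python
template = "In {where}, the highest temperature for {period} was {temperature} °C"
data = {"where": "Finland", "period": "summer 2020", "temperature": 33.5}

# Fill the template slots with values from the underlying data.
print(template.format(**data))
# In Finland, the highest temperature for summer 2020 was 33.5 °C
```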
Part-of-speech (POS) tag – A tag used to represent the grammatical role of a word in a text, for example noun or adverb.
Word embeddings – Natural language words represented as numerical vectors so that a computer can process them. These vectors, learned from a collection of text, form a vector space in which the distance between vectors approximates the semantic distance between the corresponding words. A classic example: if you subtract the vector for man from the vector for king and then add the vector for woman, you get (approximately) the vector for queen.
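This vector arithmetic can be tried out with, for example, the gensim library and pre-trained word2vec vectors; the model file used here is the publicly available Google News release, chosen purely for illustration:

```python
from gensim.models import KeyedVectors

# Load pre-trained, non-contextual word2vec vectors (a large download).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# [('queen', 0.7118...)]
```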
Contextual word embeddings – Word embeddings were initially non-contextual, which means that two words with the same spelling get the same vector representation regardless of their context. Take the word bank as an example: it can mean a river bank or a financial institution. Contextual word embeddings learn different vector representations for different contexts.
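A quick way to see this in practice is with BERT via the HuggingFace transformers library; the helper function below is my own illustration, not part of the thesis code:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embedding_of(sentence, word):
    """Return the contextual embedding of `word` within `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    position = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[position]

river = embedding_of("we sat on the bank of the river", "bank")
money = embedding_of("she deposited the cash at the bank", "bank")

# The same word gets clearly different vectors in the two contexts,
# so the cosine similarity is well below 1.0.
print(torch.cosine_similarity(river, money, dim=0))
```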
Using contextual word embeddings to improve variety of language in template-based NLG for automated journalism
Template-based natural language generation (NLG) methods are still widely used for automated journalism in the news industry, even though new and exciting statistical methods for NLG have been introduced. Why? Template-based methods represent the underlying data accurately, and their generation process is transparent.
Developing a simple template-based NLG system is easy, but the effort needed quickly grows as more templates are introduced to increase the variety of language. And once a large number of templates have been written for a specific domain, they are not transferable to new domains without heavy modification.
For my master’s thesis research, we came up with an idea: what if we relied on a small number of simple templates and allowed a contextual word embedding model to insert new words into the output and replace existing ones?
As retaining the meaning and controlling the output are crucial in the context of news, we narrowed our methods down to inserting words of predefined parts-of-speech and replacing words with synonyms only.
The approach
In my thesis research, I developed two algorithms: one for the addition of words and one for the replacement of words. My approach requires contextual word embeddings with a masked language model head (BERT), a synonym dictionary (WordNet) and a part-of-speech tagging tool (NLTK) to be available for the language in question (English). The specific tools and language used in my research are given in parentheses.
The system used as an example produces news reports from time-series data provided by Eurostat (the statistical office of the European Union).
Adding new words based on context
For adding new words, empty slots with a part-of-speech tag definition are introduced into the intermediate domain templates. These empty slots are then handled at the end of the generation process, as the contextual word embeddings require context, i.e. the surrounding words in the sentence, to operate.
In Austria in 2018 75 year old or older females {empty, pos=RB} received median equivalised net income of 22234 €.
The contextual word embeddings are then used to find a set of fitting words for the empty slot. A word is considered fitting if it receives a score above a given threshold from the contextual word embeddings. The list of words is then narrowed down by our choice of part-of-speech, which in this example is adverbs (RB).
In Austria in 2018 75 year old or older females still received median equivalised net income of 22234 €.
The insertion algorithm in short (the full pseudocode is available in the master’s thesis):

Input: sentence, part-of-speech, k, minimum number of [MASK] tokens, maximum number of [MASK] tokens
Output: list of fitting words

1. Initialise an empty list WordsAndScores.
2. Loop for n from the minimum number of [MASK] tokens to the maximum:
   2.1. Insert n [MASK] tokens into the sentence in place of the empty slot.
   2.2. Take the top k predictions (words and scores) from the masked language model for the masked sentence from step 2.1.
   2.3. Add the words and their scores to the list initialised in step 1.
3. Return the subset of the list WordsAndScores where the word’s part-of-speech equals the input part-of-speech and the score is greater than or equal to a given threshold.
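Below is a minimal Python sketch of the insertion algorithm, assuming the HuggingFace transformers library with bert-base-uncased and NLTK’s POS tagger. The function name, the plain `{empty}` placeholder (the POS is passed as an argument rather than parsed from the slot) and the threshold value are my own illustrative choices, not the exact implementation from the thesis:

```python
import torch
import nltk  # needs nltk.download("averaged_perceptron_tagger")
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def fitting_words(sentence, pos, k=20, min_masks=1, max_masks=2, threshold=0.05):
    """Suggest words with part-of-speech `pos` for the '{empty}' slot."""
    words_and_scores = []                                        # step 1
    for n in range(min_masks, max_masks + 1):                    # step 2
        # 2.1: insert n [MASK] tokens in place of the empty slot.
        masked = sentence.replace("{empty}", " ".join([tokenizer.mask_token] * n))
        inputs = tokenizer(masked, return_tensors="pt")
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)
        mask_positions = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()
        # 2.2: take the top k predictions; for simplicity, this sketch
        # only reads the predictions for the first masked position.
        top = probs[0, mask_positions[0].item()].topk(k)
        for score, token_id in zip(top.values, top.indices):
            word = tokenizer.convert_ids_to_tokens(token_id.item())
            words_and_scores.append((word, score.item()))        # 2.3
    # Step 3: keep words of the requested POS whose score clears the threshold.
    return [(w, s) for w, s in words_and_scores
            if s >= threshold and nltk.pos_tag([w])[0][1] == pos]

print(fitting_words("In Austria in 2018 75 year old or older females {empty} "
                    "received median equivalised net income of 22234 €.", pos="RB"))
```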
Replacing existing words based on context
For replacement, words are marked in the intermediate templates as "to be replaced".
In Finland in 2016 households’ total {expenditure, replace=True} on healthcare was 20.35 %.
At the end of the generation process, a synonym dictionary is queried for synonyms of the words marked to be replaced. These synonyms are then scored with the contextual word embeddings with a masked language model head, and those with a score above a given threshold are considered fitting candidates. A word from this list is then chosen at random.
In Finland in 2016 households’ total spending on healthcare was 20.35 %.
The replacement algorithm in short (the full pseudocode is available in the master’s thesis):

Input: original word, sentence
Output: list of fitting words

1. Initialise an empty list WordsAndScores.
2. Get the list of synonyms for the original word from a synonym dictionary.
3. Loop over the synonym list:
   3.1. Replace the original word with the synonym in the sentence.
   3.2. Score the word in the sentence with the masked language model.
   3.3. Add the synonym and its score to the list initialised in step 1.
4. Return the subset of the list WordsAndScores where the score is greater than or equal to a given threshold.
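And a corresponding sketch of the replacement algorithm, reusing the tokenizer and masked language model from the previous snippet and WordNet via NLTK; again, the helper names and the threshold are illustrative assumptions rather than the thesis implementation:

```python
import random
import torch
from nltk.corpus import wordnet  # needs nltk.download("wordnet")

def score_at_mask(masked_sentence, word):
    """Probability the masked language model assigns to `word` at [MASK]."""
    inputs = tokenizer(masked_sentence, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    ids = tokenizer(word, add_special_tokens=False).input_ids
    # This sketch only scores synonyms that map to a single subword token.
    return probs[0, mask_pos, ids[0]].item() if len(ids) == 1 else 0.0

def replace_with_synonym(sentence, word, threshold=0.01):
    # Steps 1-2: collect WordNet synonyms of the original word.
    synonyms = {lemma.name().replace("_", " ")
                for synset in wordnet.synsets(word)
                for lemma in synset.lemmas()} - {word}
    # Step 3: put each synonym in the original word's place and score it.
    masked = sentence.replace(word, tokenizer.mask_token, 1)
    fitting = [s for s in synonyms if score_at_mask(masked, s) >= threshold]
    # Step 4 and final choice: pick one fitting synonym at random,
    # falling back to the original sentence if none clears the threshold.
    return sentence.replace(word, random.choice(fitting), 1) if fitting else sentence

print(replace_with_synonym(
    "In Finland in 2016 households' total expenditure on healthcare was 20.35 %.",
    "expenditure"))
```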
The results
Our approach was evaluated by human judges recruited via an online platform. The judges were asked questions 1–3 about an original sentence and its modified version, and questions 4 and 5 about the groups of words our method had classified as fitting and unfitting:
1. The original sentence is a good quality sentence in the target language.
2. The modified sentence is a good quality sentence in the target language.
3. The original and the modified sentence have essentially the same meaning.
4. How many of the words evaluated as fitting could be used in the marked place in sentence 1 so that the meaning remains essentially the same?
5. How many of the words evaluated as unfitting could be used in the marked place in sentence 1 so that the meaning remains essentially the same?
Questions 1 and 2 were asked to evaluate whether the modification changed the sentence quality, and question 3 to evaluate whether the sentence meaning remained essentially the same, as that was defined as a requirement for a news generation system.
Questions 4 and 5 were asked to check whether our division of words into fitting and unfitting ones was meaningful. It was not: some fitting words were left out of the fitting group, and some clearly unfitting ones were included in it.
However, based on our evaluation, the modification did not lower the sentence quality, and the sentence meaning remained essentially the same. Statements 1–3 were rated on a Likert scale ranging from 1 (‘Strongly Disagree’) through 4 (‘Neither Agree nor Disagree’) to 7 (‘Strongly Agree’); the scores for Q1 and Q2 were roughly the same, and the score for Q3 was on the agree side as well. We interpret these results as a success.


The key takeaways
- Using the algorithms presented in this blog post in the context of a case study did create variety in the language, while preserving the quality and meaning of the sentences.
- The addition method: Narrowing the additions down to a specific part-of-speech keeps the process under control. I anticipate that which parts-of-speech are "safe" to add depends greatly on the type of text. For example, in news reports, adding adjectives might introduce false information and thus change the meaning, while for fictional stories that might be an interesting option.
- The replacement method: The use of synonym dictionaries reduces the number of template variants required for natural language generation, while the use of language models allows contextual scoring of the proposed variants so that higher quality results are selected. One could, for example, identify a part-of-speech that is "safe" to replace in one’s domain and allow replacement for all occurrences of that part-of-speech in the templates.
Limitations
- With English, I managed to avoid handling morphological forms in the replacement method. I anticipate that developing separate handling for morphology would allow successfully replacing words in, for example, the plural form.
- The results were evaluated at the local (sentence) level rather than the global (full news report) level. We anticipate that, for example, when inserting a word like ‘still’ into a sentence (as in the example in the section on the addition method), the results might differ when evaluated at the global level.
Rest of the master’s thesis research
In my thesis work, I additionally developed variants of the algorithms presented in this post for low-resource languages that do not have a synonym dictionary or a part-of-speech tagging tool available. These variants utilise pairwise-aligned cross-lingual word embeddings to "translate" low-resource language words into a high-resource language, so that the synonym dictionary and part-of-speech tagging tool available for that language can be used. The results for those variants were significantly lower than the results presented in this blog post; however, some clear points of improvement have already been identified.
Thank you for reading! If you learned something new or enjoyed this article, follow me on Medium. I am currently working on future articles about NLP and data engineering. You can also find me on LinkedIn. You can find the original thesis and the workshop paper at the following links:
The master’s thesis: (Re)lexicalization of auto-written news with contextual and cross-lingual word embeddings
The workshop paper: Using contextual and cross-lingual word embeddings to improve variety in template-based NLG for automated journalism