
Digital Text Analysis: Street Poetry in the Dutch Language Area

A digital perspective on poems in the streetscape.

Street poetry in Eindhoven. Photograph by Siebe Hiemstra.

Street poetry. Once you are open to its existence, you cannot unsee it. Sometimes it is large and conspicuous, other times discreet and modest. In 2016, Dr. Kila van der Starre launched the website www.straatpoezie.nl as part of her dissertation research into different forms of poetry in public space. The website is a crowdsourced database with over 3,000 entries of poetry found on the streets of the Netherlands and Belgium.

On the market square of Meijel, a small town in a sparsely populated part of the Netherlands, lies a small literary work of art. Just outside the centre of this village begins the national park De Peel, "an infinite space with unequalled fauna and flora. Inspiration for many novels, poems, songs and more", like this one:

En d’n hemel waas veurroej (the sky was fire-red)
Mist hing op ’t land (mist hung on the land)
’s Oaves laat da stong de Peel in brand (late at night the Peel was on fire)

This small piece of text comes from the song "De Peel in Brand" (The Peel on Fire) by the Limburg band Rowwen Hèze. It describes how a small boy thought that the fog in the nature reserve De Peel, combined with the red evening sun, resembled a fire.

Even though choosing this fragment about De Peel was not without controversy (the band is not from Meijel…), placing this literary work of art in Meijel makes sense. The town presents itself as the front gate of the national park, and a look at the map confirms that it lies right next to De Peel.

Regardless of the motivation behind a particular poem, fragment or song, it is interesting to investigate whether the poems chosen for the streetscape somehow reflect on or represent their surroundings. One of the most straightforward ways to do so (for now) is to check whether and how often particular words appear under a given condition. In this article we’re going to zoom in on the presence of two specific words, ‘stad’ (city) and ‘zee’ (sea), given the condition that the poem is located in an urban area or not. One hypothesis is that ‘sea’ appears in the vicinity of the sea, in poems that celebrate its power and attraction, and that poems with ‘city’ mostly appear in urban areas. After all, why would you praise (or even name) the city when you’re somewhere in a non-city-like environment?

Roadmap

The compiler of the dataset, Dr. Kila van der Starre, let me use it for my own research purposes when I contacted her. First, we’re going to have a look at the data (a new dataset can surprise you in many ways). After that, we load a polygon layer of all urban areas in the world, which we delimit by latitude/longitude so that only the Netherlands and Belgium remain. Next, we flag each poem as urban or non-urban. Finally, we create two separate dataframes for the words ‘sea’ and ‘city’.

Data

The huge advantage of crowdsourcing is that a scientist can gather a much larger collection of data than would be feasible alone, where the same work would take years. The downside, however, is that the information is not always accurate, complete and neat. Crowdsourced datasets are fun to work with because they force a data scientist to be creative: you have to try multiple solutions before you find one that works.

When opening the dataset with Pandas, we see the following:

Image by author

When browsing through the entries, it is immediately noticeable how much they differ in the number of fields that are filled in. Some entries have only the two mandatory fields filled in, while others have complete stories under ‘Relation with location’, ‘Remarks’ and ‘More info’. An NLP scientist feasts on different text columns like this! 😉 For here and now, though, the most interesting columns are Latitude, Longitude and Text.
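To put a number on how unevenly the fields are filled in, one option is to compute the share of non-empty cells per column. The sketch below is only an illustration: the column names are the ones mentioned above, but the three rows, their coordinates and the in-memory CSV are invented stand-ins for the real export, whose exact layout is not shown here.

```python
import io

import pandas as pd

# Hypothetical miniature of the export. The column names follow the text;
# the rows and coordinates are invented for illustration only.
csv_text = """Text,Latitude,Longitude,Relation with location,Remarks,More info
"En d'n hemel waas veurroej",51.344,5.884,Song about De Peel,,
"De zee, de zee",52.1,4.3,,,
"Een stad is meer dan steen",51.44,5.47,Written for the city,Painted on a wall,See website
"""

poems = pd.read_csv(io.StringIO(csv_text))

# Share of non-empty cells per column, from best to worst filled.
fill_rate = poems.notna().mean().sort_values(ascending=False)
print(fill_rate.round(2))

# Keep only the columns used in the rest of the analysis.
poems = poems[["Text", "Latitude", "Longitude"]]
```

With the real dataset, `io.StringIO(csv_text)` would be replaced by the path to the exported file.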

System description

The next step is to import a polygon layer of the urban areas and restrict it to the Netherlands and Belgium.
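A minimal sketch of this step, assuming geopandas is used for the spatial work (the library is my assumption). Three toy rectangles stand in for a world-wide urban-areas layer, and the bounding-box values are rough, hand-picked limits for the Netherlands and Belgium, not the ones used for the maps in this article.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for a world-wide urban-areas layer (invented rectangles).
# With a real file, the layer would be loaded with gpd.read_file(...) instead.
urban_world = gpd.GeoDataFrame(
    {"name": ["toy_randstad", "toy_brussels", "toy_far_away"]},
    geometry=[
        box(4.2, 51.8, 5.2, 52.5),     # (min_lon, min_lat, max_lon, max_lat)
        box(4.2, 50.7, 4.6, 51.0),
        box(139.5, 35.5, 140.0, 36.0),
    ],
    crs="EPSG:4326",                   # plain longitude/latitude coordinates
)

# Approximate bounding box around the Netherlands and Belgium (assumption).
MIN_LON, MAX_LON = 2.5, 7.3
MIN_LAT, MAX_LAT = 49.4, 53.7

# .cx selects every geometry that intersects the box: x is longitude, y is latitude.
urban_nl_be = urban_world.cx[MIN_LON:MAX_LON, MIN_LAT:MAX_LAT]
print(urban_nl_be["name"].tolist())
```

Because `.cx` keeps every polygon that touches the box, urban areas that straddle the border with Germany or France would still be included in full.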

The left map shows the polygon layer we just imported; the map on the right is for reference:

Right: Google Maps (2021) Netherlands/Belgium. Terrain image, retrieved https://www.google.com/maps/@51.8509528,4.1818916,7.5z/data=!5m1!1e4

We’re almost there. To work with this data further, we need to combine the latitude and longitude of each entry into a single point and determine whether that point falls inside an urban polygon. Also, to make sure that the urban/non-urban distribution is not too skewed, we do a quick check.
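One way to do this, sketched here under the assumption that geopandas and shapely are available, is to turn each latitude/longitude pair into a point and test whether it lies within the merged urban polygons. The two poems and the single urban rectangle are invented; only the column names Latitude, Longitude, Text and Is_urban come from this article.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box
from shapely.ops import unary_union

# Invented example rows; the real dataframe has the same three columns.
poems = pd.DataFrame({
    "Text": ["poem in toy city", "poem in toy countryside"],
    "Latitude": [52.0, 53.3],
    "Longitude": [4.5, 6.9],
})

# Invented urban layer: one rectangle (min_lon, min_lat, max_lon, max_lat).
urban = gpd.GeoDataFrame(geometry=[box(4.2, 51.8, 5.2, 52.5)], crs="EPSG:4326")

# Merge the two coordinate columns into one point per poem.
# Note the order: x is the longitude, y is the latitude.
points = gpd.GeoDataFrame(
    poems,
    geometry=gpd.points_from_xy(poems["Longitude"], poems["Latitude"]),
    crs="EPSG:4326",
)

# Dissolve all urban polygons into one shape and flag the points inside it.
urban_shape = unary_union(list(urban.geometry))
points["Is_urban"] = points.geometry.within(urban_shape)
print(points[["Text", "Is_urban"]])
```

Swapping longitude and latitude is the classic mistake here; a quick plot of the points over the polygons usually reveals it immediately.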

When counting the values in the column ‘Is_urban’, we obtain 1886 urban versus 1014 non-urban entries, so roughly two thirds (65%) of the entries are located in an urban area.
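The check itself is a one-liner in pandas. As a small illustration, the snippet below rebuilds a boolean column from the two totals reported above (the real column comes from the spatial step, not from these hard-coded numbers) and derives the urban share from it.

```python
import pandas as pd

# Rebuild a boolean column from the totals reported in the text.
is_urban = pd.Series([True] * 1886 + [False] * 1014, name="Is_urban")

# Absolute counts per category.
counts = is_urban.value_counts()
print(counts)

# The mean of a boolean column is the share of True values.
urban_share = is_urban.mean()
print(f"urban share: {urban_share:.1%}")
```

This base rate matters for the interpretation later on: if about 65% of all poems are urban anyway, a word only stands out when its urban share differs clearly from that figure.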

Finally, for any word we like, we can create a dataframe of the poems that contain it and see where on the map of the Netherlands and Belgium they are located. Let’s take ‘zee’ (sea) and ‘stad’ (city), as discussed in the introduction.
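A minimal sketch of such a word filter, assuming a case-insensitive match on whole words in the Text column. The helper name `poems_with_word` and the four example texts are hypothetical. Whether to match whole words is a design choice: without the word boundaries, ‘zee’ would also match ‘zeeman’ (sailor).

```python
import re

import pandas as pd

# Invented example texts; the last entry mimics a row without text.
poems = pd.DataFrame({
    "Text": ["De zee, de zee", "Een zeeman in de stad", "Stad van licht", None],
})

def poems_with_word(df, word):
    """Return the rows whose Text contains `word` as a whole word."""
    pattern = rf"\b{re.escape(word)}\b"
    # case=False ignores capitalisation; na=False treats missing text as no match.
    mask = df["Text"].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask]

sea = poems_with_word(poems, "zee")
city = poems_with_word(poems, "stad")
print(sea)
print(city)
```

A whole-word match is strict: inflected or compound forms such as ‘steden’ (cities) or ‘zeewind’ (sea breeze) are not counted, so a broader pattern may be preferable depending on the question.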

That’s it!

Image by author

Interpretation

When we select poems with the word ‘zee’ (sea), we see that, surprisingly, only a small fraction of them are actually located in the vicinity of the sea. An interim conclusion is that poems do not just reflect on or represent their surroundings; they can also evoke what is absent. One lead worth following is whether these poems are located near other natural waters, such as lakes and rivers.

When we select poems with the word ‘stad’ (city), we see that most of them are indeed found in urban areas. Interestingly, on the Wadden Islands (in the left picture, the cluster of four dots to the left, under the label ‘sea’), ‘sea’ is present, while not a single poem brings up the city.

Future work

In the next article, we will dive into the poets behind the fragments of street poetry. What can we learn about the age, gender, nationality and active years of these writers? Follow my channel to stay tuned.

References

Karsdorp, F., Kestemont, M., & Riddell, A. (2021). Humanities Data Analysis. Amsterdam University Press.

Van der Starre, K. (2021). Poëzie buiten het boek: De circulatie en het gebruik van poëzie. Universiteit Utrecht.

