StayAtHome — A Story of COVID-19

An Analysis of Trends and Perspectives on the StayAtHome Campaign

Robert
Towards Data Science


Image source: https://unsplash.com/@anastasiiachepinska

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus [Source]. The first case was reported in Wuhan, China, in late December 2019 [Source], and the disease has since spread globally. As a result, the WHO declared it a global pandemic on 11 March 2020. As of 23 May 2020, over 5.2 million cases had been reported across 188 countries and territories, with over 338,000 deaths and around 2.05 million recoveries [Source]. Since the pandemic was declared, many countries have fully or partly locked down, and many people are studying, working, and praying from home, a situation that led to the StayAtHome campaign. For more information about the coronavirus, see cdc.gov.

StayAtHome

In 2020, StayAtHome became a worldwide campaign through which governments restricted population movement to mitigate the COVID-19 outbreak. The campaign asks people to stay at home except for essential tasks. A closely related term is lockdown, but some authorities consider that word easy to misunderstand, because people may think it involves door-to-door inspections.

The purpose of this article is to understand the trend of the StayAtHome campaign and people's perspectives on it.

Data Source

The data used in this article are all English-language tweets that contain #stayathome or the phrase “stay at home”. The first COVID-19 case was reported at the end of 2019, and the outbreak started getting worldwide attention in early 2020. Therefore, tweets were collected between 1 January 2020 and 20 May 2020. In total, 3,571,374 tweets were collected. Several data-cleansing steps were then applied to this dataset, such as removing duplicates, retweets, and mentions, leaving 3,570,271 tweets.
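The article does not publish its collection or cleansing code. As a minimal sketch, assuming each collected tweet is a dict with `text` and `lang` keys (the dict shape, the regular expressions, and the helper name `clean_tweets` are all hypothetical), the filtering and cleansing steps described above could look like this:

```python
import re

# Hypothetical patterns: the article only says tweets contain #stayathome
# or the phrase "stay at home".
KEYWORD_RE = re.compile(r"#stayathome\b|\bstay at home\b", re.IGNORECASE)
RETWEET_RE = re.compile(r"^RT\s+@\w+", re.IGNORECASE)
MENTION_RE = re.compile(r"@\w+")


def clean_tweets(tweets):
    """Return cleaned tweet texts from dicts with 'text' and 'lang' keys.

    Keeps English tweets matching the keywords, drops retweets,
    strips @mentions, and removes duplicates (case-insensitive).
    """
    seen = set()
    cleaned = []
    for tweet in tweets:
        text = tweet["text"]
        if tweet.get("lang") != "en" or not KEYWORD_RE.search(text):
            continue
        if RETWEET_RE.match(text):
            continue  # classic retweets start with "RT @user"
        # Strip mentions, then collapse the leftover whitespace.
        text = " ".join(MENTION_RE.sub("", text).split())
        key = text.lower()
        if key in seen:
            continue  # duplicate once mentions are removed
        seen.add(key)
        cleaned.append(text)
    return cleaned


sample = [
    {"text": "RT @who: Please #StayAtHome", "lang": "en"},
    {"text": "@friend I will stay at home today", "lang": "en"},
    {"text": "I will stay at home today", "lang": "en"},
    {"text": "Quédate en casa #stayathome", "lang": "es"},
    {"text": "Working from home #StayAtHome", "lang": "en"},
]
print(clean_tweets(sample))
# ['I will stay at home today', 'Working from home #StayAtHome']
```

With this ordering, two tweets that differ only by a mention count as duplicates; whether the original pipeline behaved the same way is not stated.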

The Trend

Figure 1 The trend of the total number of tweets related to the StayAtHome campaign between 1 January 2020 and 10 May 2020

The number of tweets related to the StayAtHome campaign began to increase on 9 March 2020 and peaked on 23 March 2020. The peak coincides with important decisions that many countries made around that date. Australia began its lockdown on Monday, 23 March 2020, as coronavirus cases reached 1,600 [Source]. On the same day, the Prime Minister of the United Kingdom, Boris Johnson, told the British public: “You must stay at home.” Several US states and cities also announced lockdowns between 22 and 24 March 2020, including Connecticut, Massachusetts, Michigan, New York, Wisconsin, Oregon, and Kansas City, KS. Other countries, such as Greece, Liberia, and Germany, made similar decisions around the same time [Source].

After 23 March 2020, the number of tweets declined, reaching its lowest post-peak level on 10 May 2020, the last day of the data collection time range. This number is expected to decrease further as several countries, such as Australia, have eased their lockdowns.
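The trend in Figure 1 comes down to counting tweets per day. A minimal sketch, assuming each tweet's creation time has already been converted to a `datetime.date` (the helper names and the toy dates are made up for illustration, not taken from the dataset), shows how the peak day and the quietest day after the peak can be found:

```python
from collections import Counter
from datetime import date


def daily_counts(tweet_dates):
    """Count tweets per calendar day; days with no tweets are simply absent."""
    return Counter(tweet_dates)


def peak_day(counts):
    """Return (day, count) for the day with the most tweets."""
    return max(counts.items(), key=lambda item: item[1])


def lowest_after(counts, start):
    """Return (day, count) for the quietest day strictly after `start`."""
    later = [(day, n) for day, n in counts.items() if day > start]
    return min(later, key=lambda item: item[1])


# Toy input: one entry per tweet.
dates = [date(2020, 3, 9)] * 2 + [date(2020, 3, 23)] * 5 + [date(2020, 5, 10)]
counts = daily_counts(dates)
peak, peak_count = peak_day(counts)
print(peak, peak_count)            # 2020-03-23 5
print(lowest_after(counts, peak))  # (datetime.date(2020, 5, 10), 1)
```

Because `Counter` omits days with zero tweets, a real plot would fill in the missing days first so that quiet days appear as zeros.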

Related Hashtags

Figure 2 shows the top 25 hashtags related to the StayAtHome campaign, excluding #StayAtHome itself. The hashtags confirm that #StayAtHome is strongly associated with #covid19 and #coronavirus. Many people used #stayhomesavelives, #socialdistancing, and #staysafe as reminders to the public. Some hashtags also point to a location: #NHS refers to the United Kingdom, while #indiafightscorona and #jantacurfew refer to India. People also tweeted about Easter, music and art performances, and working from home during the outbreak.

Figure 2 Hashtags related to the StayAtHome campaign
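The article does not say exactly how Figure 2 was produced. One plausible approach, sketched here under that assumption, is to extract hashtags with a regular expression, lower-case them so that #COVID19 and #covid19 are merged, count each hashtag at most once per tweet, and exclude #stayathome before taking the top 25. The function name `related_hashtags` is hypothetical.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def related_hashtags(texts, exclude=("stayathome",), top_n=25):
    """Return the top_n most frequent hashtags, excluding those in `exclude`.

    Hashtags are lower-cased and counted at most once per tweet.
    """
    excluded = {tag.lower() for tag in exclude}
    counts = Counter()
    for text in texts:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        counts.update(tags - excluded)
    return counts.most_common(top_n)


texts = [
    "Stay home, stay safe #StayAtHome #COVID19",
    "#covid19 #StaySafe I will stay at home",
    "#staysafe #StaySafe",
]
print(dict(related_hashtags(texts)))  # {'covid19': 2, 'staysafe': 2}
```

Counting once per tweet keeps a single tweet that repeats a hashtag from inflating it; counting every occurrence is an equally valid choice that would rank some hashtags differently.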

Topic Modelling

The previous section discussed the context of the tweets through their hashtags. It is also worth understanding the tweets through the text itself, and one way to do so is topic modeling.

Topic modeling is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Many algorithms can be used for topic modeling, such as LDA, DMM, NMF, Biterm, and others. This article uses LDA (Latent Dirichlet Allocation). If you are interested in the technical side of LDA, I suggest reading the paper, or I might write another article about it in the future.
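The article does not say which LDA implementation was used; gensim (`LdaModel`) and scikit-learn (`LatentDirichletAllocation`) are common choices, and both fit LDA with variational inference. To show what the model estimates, here is a small, self-contained sketch that fits LDA with collapsed Gibbs sampling instead, a different but standard inference method. It is meant for toy inputs, not millions of tweets, and the names `lda_gibbs` and `top_words` as well as the hyperparameter defaults are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA on token lists with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab): doc_topic[d][k] estimates the share
    of topic k in document d; topic_word[k][w] is P(vocab[w] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start with a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = j | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][wid] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=5):
    """Return the n highest-probability words of each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:n]]
        for row in topic_word
    ]


docs = [
    ["mask", "test", "hospital"],
    ["school", "kid", "parent"],
    ["mask", "hospital", "test"],
    ["kid", "school", "mom"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2, iterations=50)
print(top_words(topic_word, vocab, n=3))
```

Sampling is random, so the printed topics depend on the seed; on this toy corpus the mask/test/hospital words and the school/kid words tend to separate into the two topics.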

Because the trend started to rise on 9 March 2020, only the 3,451,390 tweets posted from that date onward were used for topic modeling. One of the steps in LDA is defining the number of topics. Models with 2 to 20 topics were trained, and the best model was LDA with 8 topics. Here are the top words from each topic.

Topic 1: time, work, make, kid, day, school, mom, child, home, parent, good, eat, man, night, love, year, thing, god, food, miss
Topic 2: order, close, test, worker, people, mask, essential, ppl, fine, due, police, increase, health, office, hospital, wear, group, open, hit, walk
Topic 3: order, state, governor, extend, open, protest, issue, business, lift, week, place, county, month, case, city, april, trump, reopen, start, wanna
Topic 4: day, time, today, watch, make, play, enjoy, good, love, free, great, read, learn, online, show, tomorrow, week, happy, start, live
Topic 5: people, virus, spread, government, stop, die, covid, corona, make, lockdown, country, tell, don, coronavirus, fight, pandemic, listen, rule, public, point
Topic 6: case, covid, video, share, death, post, safety, today, morning, team, update, support, total, march, spend, follow, service, daily, number, positive
Topic 7: stay, safe, home, healthy, family, save, weekend, save_live, love, message, covid, wash_hand, hope, avoid, everyone, protect, care, life, time, good
Topic 8: people, work, don, do, home, day, not, thing, life, back, can, week, pay, time, make, tell, sick, feel, be, there
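The article does not state how the best of the models with 2 to 20 topics was chosen. A common criterion is topic coherence, which rewards topics whose top words tend to appear in the same documents. As an assumption, the sketch below scores each candidate with the UMass coherence measure (gensim's `CoherenceModel` offers it as `coherence="u_mass"`) and keeps the number of topics with the highest average score; `fit_top_words` is a hypothetical callable standing in for "train LDA with k topics and return each topic's ranked top words".

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic's ranked top words (higher is better).

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs where w_j ranks above w_i,
    with D counting documents that contain all the given words. Assumes every
    top word occurs in at least one document, as it does when the top words
    come from a model trained on the same corpus.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log(
                (doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j])
            )
    return score


def choose_num_topics(docs, candidate_ks, fit_top_words):
    """Return (best_k, scores) using the mean UMass coherence across topics."""
    scores = {}
    for k in candidate_ks:
        topics = fit_top_words(docs, k)  # list of top-word lists, one per topic
        scores[k] = sum(umass_coherence(t, docs) for t in topics) / len(topics)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy usage with a stand-in fitter. The article's search corresponds to
# candidate_ks=range(2, 21).
docs = [["mask", "test"], ["mask", "test"], ["school", "kid"], ["school", "kid"]]


def fake_fit(_docs, k):
    if k == 2:
        return [["mask", "test"], ["school", "kid"]]
    return [["mask", "school"], ["test", "kid"], ["mask", "test"]]


best_k, scores = choose_num_topics(docs, [2, 3], fake_fit)
print(best_k)  # 2
```

Other coherence measures (for example C_v) or held-out perplexity could rank the candidates differently, which is one reason the selection criterion is worth reporting alongside the chosen number of topics.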

Interpretation

This interpretation is purely from the author’s perspective.

Topic 1 contains kid, day, school, mom, child, and parent, which might relate to school closures and to parents who need to take care of their children at home.

Topic 2 contains test, people, mask, essential, hospital, and wear, which could represent COVID-19 testing and the strong encouragement for people to wear masks to help prevent the spread of the disease.

Topic 3 contains state, governor, extend, open, protest, issue, and trump, which could represent the extension of lockdowns in the United States and the protests urging the President of the United States to lift the lockdown.

Topic 4 contains day, time, watch, play, enjoy, online, and show, which could represent common activities at home, such as playing games and watching shows.

Topic 5 contains people, virus, spread, government, stop, die, covid, corona, make, fight, pandemic, listen, rule, and public, which could represent users' calls to fight the pandemic.

Topic 6 contains case, covid, video, share, death, post, safety, team, update, support, total, march, spend, service, daily, number, and positive, which could represent updates on the number of positive cases and deaths, especially during March 2020.

Topic 7 contains stay, safe, home, healthy, family, save, weekend, save_live, love, message, covid, wash_hand, avoid, protect, care, life, time, and good, which could relate to staying safe and healthy at home and avoiding COVID-19 by washing hands.

Topic 8 contains people, work, home, pay, and week, which could relate to working from home.
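Tokens such as save_live and wash_hand in the topic lists suggest that, before training, words were reduced to base forms ("lives" to "live", "hands" to "hand") and frequent word pairs were joined into single tokens. The article does not describe this preprocessing, so the following is only a sketch of the pair-joining idea under that assumption. It uses a plain count threshold, which is simpler than the scoring used by tools such as gensim's `Phrases`; `merge_bigrams` and its `min_count` default are hypothetical.

```python
from collections import Counter


def merge_bigrams(docs, min_count=2, joiner="_"):
    """Join adjacent word pairs that occur at least `min_count` times.

    Each document is scanned left to right and non-overlapping frequent
    pairs are merged, e.g. ["wash", "hand", "often"] -> ["wash_hand", "often"].
    """
    pair_counts = Counter((a, b) for doc in docs for a, b in zip(doc, doc[1:]))
    frequent = {pair for pair, n in pair_counts.items() if n >= min_count}

    merged_docs = []
    for doc in docs:
        merged, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                merged.append(doc[i] + joiner + doc[i + 1])
                i += 2  # skip both words of the merged pair
            else:
                merged.append(doc[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


docs = [["wash", "hand", "often"], ["please", "wash", "hand"], ["stay", "home"]]
merged = merge_bigrams(docs)
print(merged)  # [['wash_hand', 'often'], ['please', 'wash_hand'], ['stay', 'home']]
```

Joining pairs before training lets LDA treat a phrase like "wash hands" as one unit instead of two loosely related words.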

Topic Proportion

Although LDA models each document as a mixture of several topics, it is useful to label each tweet with its dominant topic, that is, the topic with the highest proportion in that tweet. By doing that, the number of tweets for each topic can be calculated.

Figure 3 The proportion of tweets by dominant topic
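Labeling each tweet with its dominant topic amounts to taking the argmax of its topic proportions and then counting the labels. A minimal sketch, assuming each tweet's topic proportions are available as a list of floats (the helper name and the toy values are made up), with topics numbered from 1 as in the article:

```python
from collections import Counter


def dominant_topic_proportions(doc_topic):
    """Share of documents whose highest-weight topic is k, for k = 1..K.

    doc_topic is a list of per-document topic proportions; ties go to the
    lower-numbered topic.
    """
    if not doc_topic:
        return {}
    num_topics = len(doc_topic[0])
    labels = [max(range(num_topics), key=row.__getitem__) + 1 for row in doc_topic]
    counts = Counter(labels)
    return {k: counts[k] / len(labels) for k in range(1, num_topics + 1)}


doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
shares = dominant_topic_proportions(doc_topic)
print(shares)  # {1: 0.5, 2: 0.25, 3: 0.25}
```

A combined share for two topics is then a simple sum, for example `shares[5] + shares[8]` on an 8-topic model.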

This shows that the topics related to working from home (Topic 8) and users' calls to fight the pandemic (Topic 5) together account for more than 40% of all tweets. This is in line with the fact that many people were forced to work from home during the crisis, and that people tended to discuss the spread of COVID-19 on social media and to hope that the pandemic would soon be over.

Remarks

If you are interested in analyzing this dataset further, I have released it here.

Resource: GitHub

If you have any feedback or would like to discuss further, please reach out to me on LinkedIn.
