DATA SCIENCE — COVID-19 —VISUALISATION — PROGRAMMING

Prediction and Analysis of COVID-19 Data: Model — Proposal Algorithm- Vuong Simulator

The Anh Vuong, Dr.-Ing.
Towards Data Science
11 min read · Jun 7, 2020


Photo by Thuy Chung Vuong

How to find undiscovered COVID-19 infection cases?

Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here.

To join the fight against the Covid-19 pandemic, I have carried out private research on data prediction and analysis, based on my experience in data processing and electronics. With my limited resources, I could only use my Surface laptop, my Raspberry Pi, and software from the open-source community. Nevertheless, I would like to report a new result: an algorithm for estimating undiscovered infections, using my “Vuong model for Covid-19 data analysis”. The implemented software can be downloaded here.

An important and challenging objective in COVID-19 data analysis is estimating the number of undiscovered infection cases (in German, the “Dunkelziffer”), because the transmission of the SARS-CoV-2 virus is complicated and not yet fully understood. After the incubation period (the time from exposure to the development of symptoms), the virus can be transmitted from one person to another. There are reports that symptom-free (asymptomatic) persons can transmit SARS-CoV-2 to others (e.g. “Huge asymptomatic pool of coronavirus infected worldwide”). It was also feared that patients could be reinfected after their apparent recovery (the latest good news: “New data suggest people aren’t getting reinfected with the coronavirus”, May 19, 2020).

SIR MODEL

There are already many data models for the analysis and simulation of the COVID-19 pandemic that calculate the number of infection cases. A well-known non-linear model is the SIR model [Kermack-McKendrick theory, 1927, 1928, 1932]: S is the number of susceptible people, I the number of infectious people, and R the number of recovered or deceased (or immune) people. R is also called “resistant” or “removed”.

Starting from the real data, i.e. the daily new cases of infection, one fits the parameters “β” and “γ” of the SIR differential equations so that the model function I(t) matches the data. One can then read off the number of infectious people at every time point from I(t) (see Fig. 1).

  • Beta “β” is the rate at which susceptible people become infected.
  • Gamma “γ” is the rate at which infected people become resistant.
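For readers unfamiliar with the SIR model, the following is a minimal sketch of its standard equations, integrated with simple one-day Euler steps. The function name simulate_sir, the step size, and the sample values of β and γ are illustrative assumptions and are not taken from the packages mentioned in this article.

```python
def simulate_sir(n_population, i0, beta, gamma, days):
    """Integrate the classic SIR equations with one-day Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    """
    s, i, r = float(n_population - i0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / n_population
        new_removed = gamma * i
        s -= new_infections
        i += new_infections - new_removed
        r += new_removed
        history.append((s, i, r))
    return history


# Illustrative values only (assumed, not fitted to real data).
history = simulate_sir(n_population=1_000_000, i0=10, beta=0.3, gamma=0.1, days=120)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"Infectious peak on day {peak_day}: {history[peak_day][1]:.0f} people")
```

Because every person leaving S enters I and every person leaving I enters R, the sum S + I + R stays constant in this sketch, which is exactly the closed-system assumption of the model.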

Many software packages are available for visualizing the SIR model, e.g. [mattravenhall / BasicSIRModel], and for optimizing the beta and gamma rates to find I(t) and thus estimate the number of infectious people, e.g. Brian Collins, Dynamic Modeling of Covid-19.

Fig. 1 Optimizing the beta and gamma rates to find I(t), which is used to estimate the number of infectious people

The optimization of the SIR model has some points of criticism:

  • The model has only two parameters, the beta and gamma rates, which may not be enough to simulate the Covid-19 pandemic.
  • There are many other parameters that physicians and virologists have not yet discovered.
  • The SIR model assumes a closed system: N = S + I + R is constant over time. In the real world, we have an open system: N is not constant over time because people move around and the virus can be transmitted through the air.

For this reason there are also critical comments, e.g. [James Jansson, COVID-19 modeling is wrong].

Fig. 2 Vuong model for COVID-19 data analysis, used for the prediction and analysis of COVID-19 data.

VUONG Model for Covid-19 Data Analysis

I would therefore like to present my proposed model: the Vuong model for COVID-19 data analysis, used for the prediction and analysis of COVID-19 data.

The time values t in the Covid-19 data functions are dates (see the Python documentation: “datetime module”). They are therefore discrete values, and the function y(t) is a “time series”. Here I use y[x] instead of y(t), where x is the date-time index. This makes programming and plotting easier later on.
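As a small illustration of such a date-indexed time series, here is a sketch using the standard datetime module. The start date and the case numbers are made up for this example.

```python
from datetime import date, timedelta

# Made-up sample values; real data would come from the CSV files.
start = date(2020, 3, 1)
nc = [5, 8, 13, 21]  # daily new cases y[x]
dates = [start + timedelta(days=k) for k in range(len(nc))]

# x is the integer position; dates[x] gives the calendar day for plotting.
for x, (day, value) in enumerate(zip(dates, nc)):
    print(x, day.isoformat(), value)
```

Keeping the values in a plain list indexed by x, with the dates in a parallel list, makes expressions such as y[x-1] straightforward to write.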

We start now with the description of the Vuong model for Covid-19 Data Analysis (see Fig. 2).

The inputs:

  • nc[x] is the number of daily confirmed infection cases, the “new cases”.
  • nd[x] is the number of daily deaths.

The outputs :

  • I[x] is the estimated number of daily infection cases.
  • G[x] is the estimated number of daily recovered cases (germ. Gesund Fälle).

The full list of parameters would be very long, and many of them are still unknown. In the Vuong model, I implemented only two important known parameters: the incubation period and the recovery period.

The concept for estimating the infection cases has three steps:

Step 1:

From the daily new cases of infection nc[x], we calculate the prediction function Ip[x]. Ip[x] is the number of infected persons, including the undiscovered ones. In comparison to the SIR model optimization, I use an additional parameter: the incubation period Tau.

Ip [x] = nc [x, Tau]

Tau: incubation period. After the incubation period, the virus can be passed on to another person. The incubation period is estimated between 2–14 days.

Step 2:

The recovery function G[x] is calculated from Ip[x] (from step 1), nd[x] (the daily deaths), and the parameter RG:

G[x] = g(Ip[x], nd[x], RG)

RG is the recovery period, not the R-factor. RG is the time an infected person needs to recover. RG could be 14 days for mild cases and approximately 30 days for severe cases (Coronavirus: How long does it take to recover?, BBC News).

Step 3:

The final estimated daily number of infection cases is the prediction function Ip[x] from step 1 minus the estimated recovered cases G[x] from step 2.

I[x] = Ip[x] - G[x]

Let us now go into the details!

Vuong Algorithm for the Prediction of COVID-19 Data

The prediction of the number of infectious cases Ip[x]:

The prediction function Ip[x] depends on nc[x] (the daily confirmed new cases) and the incubation period Tau.

For every Ip[x], there is a reproduction factor r[x-1]. This means that the number of cases of infection Ip[x] at time point x is r[x-1] times the number of cases Ip[x-1] at the previous time point x-1.

Example: If there are 100 cases of infection at time x-1 (Ip[x-1] = 100) and the reproduction factor is 5 (r[x-1] = 5), then there will be 500 cases of infection the next day (Ip[x] = 500).

Ip [x] = r [x-1] * Ip [x-1] (1)

r[x-1]: reproduction factor for Ip [x].

From (1), one can calculate:

Ip[x-1] = r[x-2] * r[x-3] * … * r[x-1-n] * … * r[x-1-N] * Ip[x-1-N] (2)

with n = 1, …, N, where N is the number of Ip values used for the calculation.

Assume that all r[x-1-n] within the window of length N are equal to a common value R:

Ip[x-1] = (R ** N) * Ip[x-1-N] (3)

R ** N: R to the power of N
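The step from equation (1) to equation (3) can be checked numerically: applying the same factor R on N consecutive days gives the same result as multiplying once by R to the power of N. The values of R, N, and the starting count below are arbitrary and chosen only for illustration.

```python
R = 1.5            # assumed constant reproduction factor
N = 4              # window length
ip_start = 100.0   # Ip[x-1-N], arbitrary starting value

# Apply equation (1) N times: Ip[k] = R * Ip[k-1]
ip = ip_start
for _ in range(N):
    ip = R * ip

# Equation (3): Ip[x-1] = (R ** N) * Ip[x-1-N]
closed_form = (R ** N) * ip_start
print(ip, closed_form)
```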

From the available information on Covid-19, we know that a person can transmit the virus to other persons only after an incubation period Tau.

In other words, the Ip[x-1-N] persons can transmit their virus to others only after the incubation period. That means R is not active during the first Tau days after time point x-1-N. It follows:

R * Ip[x-1] = (R ** (N-Tau)) * Ip[x-1-N] (4)

(N-Tau-1) * log R = log(Ip[x]) - log(Ip[x-1-N]) (5)

To estimate Ip[x], we only need two values, nc[x-1] and nc[x-1-N], from the time interval {x-1-N, …, x-1}.

Then we calculate R from log R:

log R = (log(nc[x]) - log(nc[x-1-N])) / (N-Tau-1) (6)

R = 10 ** log R

R is the reproduction factor.

The estimated number of infection cases Ip[x] is then:

Ip [x] = R * nc[x-1] (7)
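Equations (6) and (7) can be written as a short sketch. The function name estimate_r and all numbers are assumptions for illustration; this is not the code from the project. The base-10 logarithm is used because R is recovered as 10 ** log R.

```python
import math


def estimate_r(nc_now, nc_past, n_window, tau):
    """Equation (6): log R = (log nc[x] - log nc[x-1-N]) / (N - Tau - 1)."""
    log_r = (math.log10(nc_now) - math.log10(nc_past)) / (n_window - tau - 1)
    return 10 ** log_r


# Made-up numbers: cases grew from 100 to 800 over the window.
R = estimate_r(nc_now=800, nc_past=100, n_window=10, tau=6)
ip_next = R * 750  # equation (7), with an assumed nc[x-1] = 750
print(round(R, 6), round(ip_next, 1))
```

The sketch requires both case numbers to be positive and N - Tau - 1 to be non-zero; otherwise the logarithm or the division is undefined.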

Programming concept:

The Vuong algorithm has been implemented in tavuong_simulator.py.

For practical programming, we slide a “window” of length N, with start time [x-1] and end time [x-1-N], along the x-axis to calculate R. In equation (6) we only need the two values nc[x-1] and nc[x-1-N]. However, these values can be 0 (zero), so we need additional rules to avoid calculating log(0):

The calculated values of Ip and R are stored in two arrays, Ip[x] and R[x].

The first N values of nc[x] are used as start values of Ip[x].

Ip [x] = nc [x], for x = 0… N-1

For x > N:

If nc[x] = 0 and nc[x-1-N] != 0, i.e. there was an infection in the past but none now, we take R[x] = R[x-1].

If nc[x] = 0 and nc[x-1-N] = 0, i.e. there was no infection at either time point, we take R[x] = 0.

If nc[x] != 0 and nc[x-1-N] != 0, i.e. there were infections at both time points, we calculate R[x] from equation (6).

If nc[x] != 0 and nc[x-1-N] = 0, i.e. there is an infection now but none at the start of the window, we take R[x] = R[x-1].
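The rules above can be sketched as a loop over the time series. This is a minimal illustration and not the code from tavuong_simulator.py: the function name vuong_prediction is hypothetical, and seeding position N with nc[N] (where no valid nc[x-1-N] exists) is my own assumption.

```python
import math


def vuong_prediction(nc, n_window, tau):
    """Return (Ip, R) lists following equations (6), (7) and the zero rules.

    Assumptions of this sketch: positions 0..N are seeded with nc[x]
    (position N has no valid nc[x-1-N]), and R is 0 at those positions.
    """
    ip = [float(v) for v in nc]
    r = [0.0] * len(nc)
    for x in range(n_window + 1, len(nc)):
        now, past = nc[x], nc[x - 1 - n_window]
        if now != 0 and past != 0:
            log_r = (math.log10(now) - math.log10(past)) / (n_window - tau - 1)
            r[x] = 10 ** log_r
        elif now == 0 and past == 0:
            r[x] = 0.0
        else:
            # Exactly one of the two values is zero: reuse the previous R.
            r[x] = r[x - 1]
        ip[x] = r[x] * nc[x - 1]  # equation (7)
    return ip, r


# Made-up series: doubling cases followed by two days without reports.
nc = [1, 2, 4, 8, 16, 32, 64, 0, 0]
ip, r = vuong_prediction(nc, n_window=4, tau=1)
print([round(v, 2) for v in r])
```

The sketch assumes N - Tau - 1 is non-zero. On the two zero-report days at the end of the sample series, R simply keeps its last calculated value, as the rules prescribe.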

Recovery function G[x]

G[x] is the recovery function, which depends on nc[x] (the daily new cases), nd[x] (the daily deaths), and the recovery period RG.

From what is known about the coronavirus, an infected person has either recovered or died after a recovery period RG. The persons infected at time x-RG will therefore either have died (nd[x]) or have recovered:

G[x] = nc[x-RG] - nd[x] (8)
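Equation (8) can be illustrated with a few lines of code. The function name recovery_function and the parameter name rg (the recovery period in days) are my own, and setting G[x] to 0 for the first days, when no case is old enough to have completed the recovery period, is an assumption of this sketch.

```python
def recovery_function(nc, nd, rg):
    """Equation (8): G[x] = nc[x - RG] - nd[x].

    Assumption of this sketch: G[x] = 0 while x < RG, because no
    case is old enough to have completed the recovery period.
    """
    g = []
    for x in range(len(nc)):
        g.append(nc[x - rg] - nd[x] if x >= rg else 0)
    return g


nc = [10, 20, 30, 40, 50]  # made-up daily new cases
nd = [0, 0, 1, 2, 3]       # made-up daily deaths
print(recovery_function(nc, nd, rg=2))  # [0, 0, 9, 18, 27]
```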

Estimated Number of Infected Cases

Finally, one can estimate the number of infection cases that are still ongoing:

I[x] = Ip[x] - G[x] (9)
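As a short sketch, equation (9) is an element-by-element difference of the two series. The running totals are added because the simulator plots accumulated curves in most modes. All numbers are made up.

```python
from itertools import accumulate

ip = [10.0, 22.0, 35.0, 41.0]  # made-up predicted infections Ip[x]
g = [0.0, 0.0, 9.0, 18.0]      # made-up recoveries G[x]

# Equation (9), element by element.
i_est = [a - b for a, b in zip(ip, g)]

# Running totals of the daily values.
i_total = list(accumulate(i_est))
print(i_est, i_total)
```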

Covid-19 — VuongSimulator.py

I used the Python development kit platform “tavuong/covid19-datakit” to develop the Vuong model for Covid-19 data analysis. A detailed description can be found in Readme.md or in my last paper, so I will describe it only briefly here.

Installing and Starting

$ git clone https://github.com/tavuong/covid19-datakit.git

$ pip install numpy

$ pip install matplotlib

$ cd ~\covid19-datakit\

$ python .\covid19-VuongSimulator.py [by PC]

$ python3 .\covid19-VuongSimulator.py [by Raspberry PI]

This example shows how to start the Vuong Simulator and enter the parameters through its dialog. You can use the test data in the project's data folder or your own data:

$ cd ~\covid19-datakit\

$ python .\covid19-VuongSimulator.py [by PC]

VMODEL > country? World

VMODEL > new_case-file ? ./data/new_cases.csv

VMODEL > deaths-file ? ./data/new_deaths.csv

VMODEL > VuongSimualtion mode ? 6

VMODEL > Incubation Period? 7

VMODEL > Recovery Period ?14

It will then plot and print the result:

Vuong_Simulator >confirmed Inf.= 5555708

Vuong_Simulator >Incub.P =7/ Est. inf.=5937024

Vuong_Simulator >Reco.P =14/ Est. recovery=4233977

Vuong_Simulator >Reco.P =14/ Est. Inf.=1703047

Vuong_Simulator >deaths = 350212

FIG. 3 Analysis of Covid-19- Data [World], Data Source: https://ourworldindata.org/coronavirus-source-data
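The totals printed above can be cross-checked against equation (9). The following lines only redo the arithmetic on the printed numbers. Reading the gap between estimated and confirmed infections as the undiscovered cases is my interpretation of the output.

```python
confirmed = 5555708      # confirmed Inf.
est_infected = 5937024   # Incub.P = 7 / Est. inf.
est_recovered = 4233977  # Reco.P = 14 / Est. recovery
est_ongoing = 1703047    # Reco.P = 14 / Est. Inf.

# Equation (9) applied to the accumulated totals.
assert est_infected - est_recovered == est_ongoing

# Gap between estimated and confirmed infections.
print(est_infected - confirmed)  # 381316
```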

You can also start the Vuong Simulator with this command line to get the same result:

$ python .\covid19-VuongSimulator.py -c World -o test.png -m ta -n .\data\new_cases.csv -d .\data\new_deaths.csv -g 0.98 -r 14 -t 7 -s 6

The following options have been implemented so far:

$ python .\covid19-VuongSimulator.py -h

-n <new_Cases_file> -d <new_Deaths_file> -o outputfile

-c country

-t Incubation Period Tau

-r Recovery Period

-s simulation mode
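For readers who want to build a similar command-line tool, the options listed above could be parsed as in the following sketch. It is hypothetical: the dest names are my own, the actual script may handle its options differently, and the -m and -g options that appear in the example command are left out because the help text does not describe them.

```python
import argparse

# Hypothetical parser; the real script may parse its options differently.
parser = argparse.ArgumentParser(description="Sketch of the simulator options")
parser.add_argument("-n", dest="new_cases_file")
parser.add_argument("-d", dest="new_deaths_file")
parser.add_argument("-o", dest="output_file")
parser.add_argument("-c", dest="country")
parser.add_argument("-t", dest="tau", type=int, help="incubation period")
parser.add_argument("-r", dest="recovery_period", type=int)
parser.add_argument("-s", dest="mode", type=int, help="simulation mode")

# Parse a sample argument list instead of sys.argv for demonstration.
args = parser.parse_args(
    ["-c", "World", "-n", "./data/new_cases.csv", "-d", "./data/new_deaths.csv",
     "-o", "test.png", "-t", "7", "-r", "14", "-s", "6"]
)
print(args.country, args.tau, args.recovery_period, args.mode)
```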

The program calculates and plots according to the simulation mode. Only mode 1 shows a time function; the other modes show accumulated time functions.

1: R-factor according to the Vuong model (in development)

2: confirmed infections nc[x] / deaths nd[x]

3: confirmed infections nc[x] / recovery function G[x] / final estimated infections I[x] / deaths nd[x]

4: confirmed infections nc[x] / recovery function G[x] / deaths nd[x]

5: confirmed infections nc[x] / estimated infections Ip[x] / final estimated infections I[x] / deaths nd[x]

6: confirmed infections nc[x] / estimated infections Ip[x] / recovery function G[x] / final estimated infections I[x] / deaths nd[x]

Example command

$ python .\covid19-VuongSimulator.py -c "United States" -o test.png -m ta -n .\data\new_cases.csv -d .\data\new_deaths.csv -g 0.98 -r 14 -t 7 -s 6

FIG. 4 Analysis of Covid-19- Data [USA], Data Source: https://ourworldindata.org/coronavirus-source-data
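The files passed with -n and -d are CSV files. The sketch below shows one way to read the column of a single country with the standard csv module. The file layout (a date column followed by one column per country) and the choice to count empty cells as 0 are assumptions of this example, and the sample rows are made up.

```python
import csv
import io

# Made-up sample in the assumed layout: a date column, one column per country.
sample = io.StringIO(
    "date,World,Italy\n"
    "2020-03-01,100,20\n"
    "2020-03-02,150,\n"
    "2020-03-03,210,35\n"
)


def read_country_column(handle, country):
    """Return (dates, values) for one country; empty cells count as 0."""
    dates, values = [], []
    for row in csv.DictReader(handle):
        dates.append(row["date"])
        values.append(int(float(row[country])) if row[country] else 0)
    return dates, values


dates, nc = read_country_column(sample, "Italy")
print(dates, nc)
```

To read a real file, the io.StringIO object would be replaced by an open file handle, e.g. open(path, newline="").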

Prediction of undiscovered infections cases with Covid-19- VuongSimulator

The Vuong algorithm is used to analyze COVID-19 data, whose CSV files can be downloaded from an open source here. I would like to carry out the analysis for all 213 affected countries and territories in the world, or for individual states and cities, e.g. California, New York, or Düsseldorf, but I have limited capacity. I have therefore used the Vuong Simulator to analyze the COVID-19 data of a few countries (Italy, Germany, Sweden, and the United States) and of “the world”. You can see the results in Figs. 3 to 5.

FIG. 5 Analysis of Covid-19- Data of Italy, Germany, Sweden, USA. Data Source: https://ourworldindata.org/coronavirus-source-data

My test analysis with the proposed algorithm produced some interesting findings on the course of the coronavirus pandemic:

  • In Italy and Germany, the pandemic has been strongly reduced (Fig. 5).
  • In the United States, the pandemic has begun to decline (Fig. 4).
  • In Sweden, the pandemic is still expanding (Fig. 5).
  • For the world, a second crisis (the second wave) appears to be coming (Fig. 3).

Summary

The algorithm in the Vuong model for Covid-19 data analysis has been developed to estimate the undiscovered infection cases from the data on confirmed cases and deaths. The Vuong algorithm is based on an open system and uses additional information about the COVID-19 pandemic as well as discrete mathematics. It could be an alternative to the well-known optimization method for the SIR model.

You can download the software “Covid19-Vuong Simulator”, which is integrated into the development kit, to analyze undiscovered infection cases with the default open-source COVID-19 test data or with your own data.

Visualization and modeling of Covid-19 data are under continual development and will be updated in the future. If you have developed an interesting new model module or presentation module, please don’t hesitate to contact me to discuss development and perhaps to contribute your modules to the open-source, MIT-licensed project tavuong/covid19-datakit on GitHub.

Have fun!

Acknowledgments for Covid-19 data: Hannah Ritchie.

Acknowledgments for review: Prof. Dr. Kien Pham

Acknowledgments for support and coffee cake motivation: my wife Thi Chung Vuong
