Predicting Covid-19 Infection Using Fuzzy Logic
How to forecast coronavirus infection cases using fuzzy time series
Introduction
Time-series forecasting is a common way to analyze the infection rate of a pandemic, and it can help us build better decision support systems. What I have written here is part of what I learned in the Fuzzy Systems course taught by Professor J-Asgari at the Isfahan University of Technology. So let’s start with the definition of a time series.
A time series is a set of regular time-ordered observations of a quantitative characteristic of an individual or collective phenomenon taken at successive, in most cases equidistant, periods/points of time [more]. There are two main prediction methods for time series data:
1- Statistical tools: ARMA, ARIMA, SARIMA [more]
2- Intelligent tools based on neural networks, such as RNN and LSTM [more]
Data Exploration
We have reviewed data from Europe, downloaded from here. All data files, source code and notebooks have been uploaded to this Colab Notebook. Let’s take a quick glance at the dataset:
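import warnings
import numpy as np  # used below as np.sum in the pivot tables
import pandas as pd
import matplotlib.pylab as plt
%pylab inline

# Load the case counts from the downloaded spreadsheet
df = pd.read_excel('COVID-19.xlsx')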
df.head()
As you can see above, we have to aggregate the data by ‘continentExp’ and ‘countriesAndTerritories’:
# Aggregate by continentExp
continentExp = pd.pivot_table(df, values='cases', index=['dateRep'], columns=['continentExp'], aggfunc=np.sum, fill_value=0)
# Aggregate by countriesAndTerritories
countriesAndTerritories = pd.pivot_table(df, values='cases', index=['dateRep'], columns=['countriesAndTerritories'], aggfunc=np.sum, fill_value=0)
continentExp["Europe"].plot(figsize=(15,5), color=["green"], title='Europe')
plt.show()
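To make the aggregation step concrete, here is a minimal sketch on a tiny made-up table that reuses the dataset's column names. The rows and numbers are invented for illustration and are not taken from the real file.

```python
import pandas as pd

# Toy records that mimic the dataset's columns; the values are invented.
toy = pd.DataFrame({
    "dateRep": ["2020-03-01", "2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "countriesAndTerritories": ["Italy", "Spain", "Iran", "Italy", "Iran"],
    "continentExp": ["Europe", "Europe", "Asia", "Europe", "Asia"],
    "cases": [100, 50, 30, 120, 40],
})

# One row per date, one column per continent, summing the daily cases.
toy_by_continent = pd.pivot_table(
    toy, values="cases", index="dateRep", columns="continentExp",
    aggfunc="sum", fill_value=0,
)
print(toy_by_continent)
```

Each cell holds the total cases reported on that date across all countries of the continent, which is the shape the Europe plot above relies on.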
Fuzzy Time Series Prediction
In 1965, Zadeh proposed the concept of fuzzy sets as a tool for describing uncertain degrees of membership. Many later studies adopted the concept as a theoretical framework, and it is now widely used in the natural and social sciences with good results. Fuzzy set theory has numerous applications, such as fuzzy decision analysis and fuzzy time series, the analysis method used in this article [1].
What is pyFTS Library?
This package is intended for students, researchers, data scientists, or anyone who wants to exploit Fuzzy Time Series methods. These methods provide simple, easy-to-use, computationally cheap and human-readable models, suitable for everyone from statistical laymen to experts. GitHub.
# install pyFTS
!pip install pyFTS
Collecting pyFTS
  Downloading https://files.pythonhosted.org/packages/41/3a/c5ef1879b33fdf07dc5678e8484d9ea637924afd6c66f14d65001cb1cddf/pyFTS-1.6-py3-none-any.whl (175kB)
     |████████████████████████████████| 184kB 3.3MB/s
Installing collected packages: pyFTS
Successfully installed pyFTS-1.6
Define linguistic variables
By a linguistic variable, we mean a variable whose values are words or sentences in a natural or artificial language. For example, Age is a linguistic variable if its values are linguistic rather than numerical, i.e., young, not young, very young, quite young, old, not very old and not very young [2].
We have defined 10 linguistic variables, A0 to A9, where A0 is the lowest infection level and A9 is the highest. Fuzzification then assigns each record to the fuzzy set in which it has the maximum membership.
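from pyFTS.partitioners import Grid

# Assumption: `data` is the Europe series plotted earlier; the original does not show where it is defined
data = continentExp["Europe"]
data = data.values
# npart is the number of fuzzy sets; npart=15 yields A0 to A14, use npart=10 for the A0 to A9 described above
fs = Grid.GridPartitioner(data=data, npart=15)
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[15,5])
fs.plot(ax)
# Assign each observation to the set with the maximum membership
fuzzyfied = fs.fuzzyfy(data, method='maximum', mode='sets')
pd.DataFrame(fuzzyfied).assign(values = list(data)).tail()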
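The idea behind maximum-membership fuzzification can be sketched without pyFTS. The version below is a simplified illustration, not the library's implementation: it assumes evenly spaced triangular sets that peak at their own center and reach zero at the neighboring centers, and the helper names and values are hypothetical.

```python
import numpy as np

def grid_centers(values, npart):
    """Evenly spaced centers of `npart` triangular fuzzy sets over the data range."""
    return np.linspace(np.min(values), np.max(values), npart)

def triangular_membership(x, centers):
    """Membership of x in every set; each set peaks at its center and
    falls linearly to zero at the neighboring centers."""
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, 1.0)

def fuzzify_max(values, centers):
    """Label each value with the set of maximum membership (A0, A1, ...)."""
    return ["A%d" % int(np.argmax(triangular_membership(v, centers))) for v in values]

toy_series = np.array([0.0, 10.0, 24.0, 51.0, 100.0])  # invented values
centers = grid_centers(toy_series, npart=5)             # centers at 0, 25, 50, 75, 100
labels = fuzzify_max(toy_series, centers)
print(labels)
```

A value such as 24 has a small membership in A0 and a large one in A1, so the maximum method labels it A1.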
Temporal patterns
This step extracts the temporal patterns of the series as Precedent → Consequent pairs: each pattern records the fuzzy set the series is in at time t and the set it moves to at the next time step (t+1). For example, from A0 the series either stays in A0 or moves to A1. So here we have:
from pyFTS.common import FLR
patterns = FLR.generate_non_recurrent_flrs(fuzzyfied)
print([str(k) for k in patterns])
# output: 'A0 -> A0', 'A0 -> A1', 'A1 -> A1', 'A1 -> A2', 'A2 -> A2', 'A2 -> A3', 'A3 -> A3', ...
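As a rough illustration of what pattern extraction does, the plain-Python sketch below collects the distinct transitions between consecutive fuzzy labels. It is a simplified stand-in rather than the pyFTS implementation, and the function name and label sequence are hypothetical.

```python
def non_recurrent_flrs(labels):
    """Return the distinct (precedent, consequent) pairs in order of first appearance."""
    seen = set()
    patterns = []
    # Pair every label with the label that follows it in time.
    for precedent, consequent in zip(labels[:-1], labels[1:]):
        if (precedent, consequent) not in seen:
            seen.add((precedent, consequent))
            patterns.append((precedent, consequent))
    return patterns

toy_labels = ["A0", "A0", "A1", "A1", "A2", "A1", "A1"]  # invented fuzzified series
toy_patterns = non_recurrent_flrs(toy_labels)
print(["%s -> %s" % p for p in toy_patterns])
```

The repeated transition from A1 to A1 appears only once in the result, which is what "non-recurrent" refers to.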
Rule generation
In generating fuzzy rules, overlaps between classes need to be resolved. There are two ways to resolve overlaps: one is to generate fuzzy rules without considering overlaps and then to resolve overlaps by tuning fuzzy rules, and the other is to resolve overlaps while generating fuzzy rules. We call the former static fuzzy rule generation and the latter dynamic fuzzy rule generation [more].
Based on the patterns discussed above, we can generate the rules that describe how the series moves between fuzzy sets from one time step to the next.
from pyFTS.models import chen
model = chen.ConventionalFTS(partitioner=fs)
model.fit(data)
print(model)
The printed rules show, for each linguistic variable acting as a precedent, the set of its possible consequents. For example, any observation that falls in A0 is followed by either A0 or A1, which means the data contain no case where A2 comes directly after A0.
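Grouping the patterns into rules amounts to collecting every consequent observed for the same precedent. The following sketch shows that grouping on hypothetical patterns. It illustrates the idea only and does not reproduce how pyFTS stores its rules.

```python
from collections import defaultdict

def group_rules(patterns):
    """Group (precedent, consequent) pairs into one rule per precedent."""
    groups = defaultdict(list)
    for precedent, consequent in patterns:
        if consequent not in groups[precedent]:
            groups[precedent].append(consequent)
    return dict(groups)

# Invented patterns, in the same form as the Precedent -> Consequent pairs above.
toy_patterns = [("A0", "A0"), ("A0", "A1"), ("A1", "A1"), ("A1", "A2"), ("A2", "A1")]
toy_rules = group_rules(toy_patterns)
for precedent, consequents in toy_rules.items():
    print("%s -> %s" % (precedent, ", ".join(consequents)))
```

In this toy example the rule for A0 lists A0 and A1 only, so a jump from A0 straight to A2 was never observed.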
Fuzzification and modeling
Fuzzification divides a continuous quantity in the fuzzy domain into several levels, according to the requirements. Each level can be regarded as a fuzzy variable and corresponds to a fuzzy subset or a membership function [more].
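# Fuzzify a single value to see which set it belongs to
fuzzyfied = fs.fuzzyfy(18876, method='maximum', mode='sets')
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[15,5])
# One-step-ahead forecasts; shift by one position so they align with the data
forecasts = model.predict(data)
forecasts.insert(0, None)
# plot and legend come from %pylab inline
orig, = plot(data, label="Original data")
pred, = plot(forecasts, label="fuzzified forecast")
legend(handles=[orig, pred])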
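To show how a rule base can be turned back into a number, here is a minimal sketch of a one-step forecast in the style usually described for Chen's conventional model: find the set of the current value, look up its rule, and average the centers of the consequent sets. The centers, rules and function name are hypothetical, and pyFTS may differ in its details.

```python
import numpy as np

def chen_forecast(value, centers, rules):
    """Average of the centers of the consequent sets of the rule whose
    precedent is the set closest to `value`."""
    # With evenly spaced triangular sets, the nearest center is the
    # set of maximum membership.
    precedent = int(np.argmin(np.abs(np.asarray(centers) - value)))
    # If no rule exists for this set, assume the series stays in it.
    consequents = rules.get(precedent, [precedent])
    return float(np.mean([centers[c] for c in consequents]))

toy_centers = [0.0, 25.0, 50.0]      # invented centers for A0, A1, A2
toy_rules = {0: [0, 1], 1: [1, 2]}   # A0 -> A0, A1 and A1 -> A1, A2
print(chen_forecast(10.0, toy_centers, toy_rules))  # uses the rule for A0
print(chen_forecast(48.0, toy_centers, toy_rules))  # A2 has no rule
```

Because every value in the same set gets the same forecast, the predicted curve is a step-like approximation of the original series.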
Conclusion
Both ARIMA and fuzzy time series forecasts can be evaluated by their average forecast error. For this data, the error values of the fuzzy time series model are lower than the ARIMA error values, so we conclude that the fuzzy time series gives better results than the other time series models considered.
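The conclusion refers to an average error without giving the formula. Two common choices are the root mean squared error (RMSE) and the mean absolute error (MAE). The sketch below computes both on invented numbers. Picking these two metrics is an assumption, since the article does not say which one was used.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

toy_actual = [100, 120, 130, 150]    # invented observations
toy_forecast = [110, 115, 135, 140]  # invented forecasts
print("RMSE:", rmse(toy_actual, toy_forecast))
print("MAE:", mae(toy_actual, toy_forecast))
```

Computing the same metric for both models on the same held-out period is what makes the comparison between them meaningful.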
References
Forecasting Covid-19 tweeting volume using Prophet and SARIMA model