# Logistic equation and COVID-19

## Transcript Of Logistic equation and COVID-19


Logistic equation and COVID-19

Efim Pelinovsky (a-c) ([email protected]), Andrey Kurkin (a) ([email protected], corresponding author), Oxana Kurkina (a) ([email protected]), Maria Kokoulina (a) ([email protected]) and Anastasia Epifanova (a) ([email protected])

(a) Nizhny Novgorod State Technical University n.a. R.E. Alekseev, Minin st., 24, Nizhny Novgorod, 603950, Russia

(b) National Research University – Higher School of Economics, Myasnitskaya st., 20, Moscow, 101000, Russia

(c) Institute of Applied Physics, Nizhny Novgorod, Ul'yanov st., 46, Nizhny Novgorod, 603950, Russia

Abstract

The generalized logistic equation is used to interpret COVID-19 epidemic data for several countries: Austria, Switzerland, the Netherlands, Italy, Turkey and South Korea. The model coefficients are calculated: the growth rate, the expected number of infected people, and the exponents in the generalized logistic equation. It is shown that the time dependence of the number of infected people is, on average, well described by the logistic curve (within the framework of a simple or generalized logistic equation), with a determination coefficient exceeding 0.8. At the same time, the time dependence of the number of people infected per day is very uneven and is described only roughly by the logistic curve. To describe it, the dependence of the model coefficients on time or on the total number of cases must be taken into account. Variations of the growth rate, for example, can reach 60%. The variability spectra of the coefficients have characteristic peaks at periods of several days, which corresponds to the observed serial intervals. The use of the stochastic logistic equation is proposed to estimate the number of probable peaks in the coronavirus incidence.

Keywords

logistic equation; generalized logistic model; mathematical modeling; COVID-19

1. Introduction

Several global epidemics have already broken out in this century (bovine spongiform encephalopathy, avian influenza, severe acute respiratory syndrome (SARS), etc.). The latest coronavirus epidemic (COVID-19) is striking in its scale and has affected virtually all countries, which were forced to take emergency measures to prevent the spread of infection (closure of state borders, quarantine, self-isolation, temporary suspension of work at many enterprises and institutions, transition to remote work and training). The number of people infected worldwide exceeds 4.89 million (data as of the end of May 2020), and the number of deaths is more than 320,000. General information about this viral infection can be found on the Internet. The dynamics of the spread of the disease is illustrated in Fig. 1, built from the data on the World Health Organization (WHO) website (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports) as of 05/20/2020. In this figure, the growth in the number of coronavirus cases in the world and in several countries is shown on a semi-logarithmic scale. The dashed lines show exponential asymptotes corresponding to a doubling of the number of cases in a certain number of days. Asterisks indicate the days when countries introduced restrictive measures. As one can see, the spread of the epidemic in each country follows almost the same scenario: first the number of infected people grows exponentially (or nearly so), and then this growth slows down (although the numerical values of the constants describing these curves differ between countries). In some countries, the number of cases is no longer increasing, so the coronavirus epidemic in these countries is almost over. In other countries, the curves in these coordinates are still almost straight lines, which means an exponential increase in the number of cases, and the epidemic has not yet reached its peak. In general, these curves are quite smooth, although some of them show bends associated with the action of certain quarantine measures.

Fig. 1. The confirmed number of people infected with the coronavirus as of 05/20/2020. (Source: WHO data, https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports)

Fig. 2 presents the daily dynamics of the infection, built from the same data. In contrast to Fig. 1, the curves in Fig. 2 are not smooth and show sporadic outbreaks in the number of cases, which have many, often unpredictable, causes. These data show that the dynamics of the epidemic spread involves different time scales, from several months (the total epidemic duration) to several weeks (the incubation period) and even down to several days (the serial interval and local causes). Some of the scales are associated with certain properties of the virus, others with the actions of the state and local authorities that introduced restrictive rules. The noted features of the dynamics of the COVID-19 spread can be reproduced in mathematical models.

Fig. 2. The number of infected people per day, normalized to the maximum value for each country, according to the same data.

To explain the spread of epidemics and predict their consequences, a number of mathematical models of different complexity levels are used. Historically, the first model is the Verhulst logistic equation [1], a nonlinear first-order ordinary differential equation (ODE) with constant coefficients. It is also used as the simplest model of population growth and advertising performance. Qualitatively, it explains the increase in the number of disease cases over time presented in Fig. 1: the exponential increase in the number of infected people at the initial stage of the epidemic and the tendency towards a constant value by its end. In the context of COVID-19, this model is used in [2], [3]. The COVID-19 data analysis given in [4] showed that an exponential increase in the number of cases at the initial stage is found mainly in America and Australia, while in many European countries the growth follows a power law. In this case, one can use the generalized logistic equation [5], [6], which was used in [3], [7], [8], [9]. From the mathematical point of view, the dynamics in the framework of the logistic equation is trivial. More complex dynamics, including chaotic dynamics, arises in the difference logistic equation or when the delay due to the incubation period is accounted for [10], [11], [12], [13], [14], and these models are also used to interpret and forecast COVID-19 [15], [16], [17]. In more complex models, people are divided into different groups: (S) the susceptible class, those individuals who are capable of contracting the disease and becoming infected; (I) the infected class, those individuals who are capable of transmitting the disease to others; and (R) the removed class, infected individuals who are deceased, or have recovered and are either permanently immune or isolated. The resulting mathematical model, called the SIR model, and its generalizations involve a higher-order ODE system. The dynamics of such systems has not yet been sufficiently studied, and stochastic oscillations are possible in it [18], [19], [20], [21], [22], [23], [24], [25], [26]. However, models of this level are comparatively easy to implement, have shown their effectiveness and are actively used to model the spread of COVID-19 [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38].

There are also models that take into account, for example, the super-spreading phenomenon of some individuals or quarantine measures, including social distancing and isolation policies, border control, and a high percentage of reported cases [39 this issue], [40 this issue], [41 this issue].

The statistical methods to forecast the epidemic development, based on Poisson statistics, are also worth mentioning [42], [43], [44], [45].

The main difficulty in applying mathematical models is associated with the uncertainty in the choice of coefficients in the equations. The more complex the model, the larger the number of its coefficients. The experience of using models to interpret “old” epidemics may not always help, since the intensity of the virus impact on living organisms changes, many epidemics were local, and, accordingly, the measures taken to prevent the epidemic spread were different. The curves shown in Figs. 1 and 2 differ strongly between countries, which is associated with differences in population density, customs, traditions and administrative preventive measures. Therefore, any forecasts made at the initial stage of the epidemic regarding its final stage are very rough and unreliable. As the epidemic develops, more and more constants in the equations can be determined from medical databases, but the previously determined constants are also corrected. Therefore, in essence, for prognostic purposes one solves equations with variable coefficients, whose mathematical properties (existence, convergence and stability) are not established. As a result, different models with permanently “corrected” coefficients can give similar forecasts over a short time. For long-term forecasts, however, it is necessary to understand the possible temporal variability of the model coefficients and its influence on the character of the obtained solutions.

In this study, we try to assess the character of the scatter of the coefficients of the logistic model and its generalizations on the basis of the currently available COVID-19 data. Data on the epidemic development were used for the following countries: Austria, Switzerland, the Netherlands, Italy, Turkey and South Korea. Section 2 presents the classical logistic equation and the calculated average values of its coefficients for the above-mentioned countries. It is shown that this model describes the number of coronavirus patients with a high determination coefficient in most countries, except for South Korea. To take into account the randomness of the data on the number of cases per day, it is proposed to switch to a stochastic logistic equation with external force. The spectral and statistical properties of the random parameters of this equation are investigated. Section 3 describes the same procedure within the framework of the generalized logistic equation. It is shown that, on average, this model is suitable for all the countries listed above, with a high determination coefficient. Section 4 summarizes the results.


2. Logistic equation

Here we briefly give the main information on the theory of the logistic equation, written in the standard ODE form

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{N_\infty}\right), \tag{1}$$

where $N(t)$ is the total number of people affected by the epidemic, $N_\infty$ is the maximum number of infected people during the whole epidemic, and $r$ is the growth rate of the epidemic. The solution of this equation with constant coefficients is easily found in the form

$$N(t) = \frac{N_0 N_\infty \exp(rt)}{N_\infty + N_0\left[\exp(rt) - 1\right]}, \tag{2}$$

where $N_0$ is the initial number of infected people and $t$ is the time from the beginning of the epidemic. At the initial stage of the epidemic, it can be represented by an exponential function

$$N(t) \approx N_0 \exp(rt), \tag{3}$$

and, if this curve approximates the initial growth in the number of cases well, the growth rate $r$ can be determined. At the same stage, the logistic model can be rejected if the data do not fit the exponential dependence. However, the most important characteristic for prediction, the maximum possible number of infected people $N_\infty$, can be estimated only once the data depart noticeably from the exponential curve, when the number of sick people is already not small.
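The early-stage estimate of the growth rate described in this paragraph can be sketched numerically. The snippet below is a minimal illustration and not the authors' procedure: it evaluates Eq. (2) for hypothetical values of $N_0$, $N_\infty$ and $r$, fits a straight line to $\ln N$ over the first days, where Eq. (3) should hold, and reads $r$ off the slope. The function names and the 15-day window are assumptions made for the example.

```python
import numpy as np

def logistic(t, n0, n_inf, r):
    """Total number of cases N(t) according to Eq. (2)."""
    e = np.exp(r * t)
    return n0 * n_inf * e / (n_inf + n0 * (e - 1.0))

def early_growth_rate(t, n):
    """Slope of ln N versus t, i.e. r in the exponential law of Eq. (3)."""
    slope, _intercept = np.polyfit(t, np.log(n), 1)
    return slope

# Hypothetical parameters chosen only for illustration.
N0, N_INF, R = 10.0, 15000.0, 0.2

t = np.arange(0, 80)                 # days since the start of the epidemic
n = logistic(t, N0, N_INF, R)

# Use only the first 15 days, while N << N_inf and growth is near-exponential.
r_early = early_growth_rate(t[:15], n[:15])
# A window that reaches the saturation stage biases the estimate downwards.
r_late = early_growth_rate(t[:60], n[:60])

print(f"true r = {R}, early fit = {r_early:.3f}, fit incl. saturation = {r_late:.3f}")
```

The second fit illustrates why the exponential law can only be trusted while the number of cases is still far below $N_\infty$.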

To prepare medical institutions to function optimally during an epidemic, it is important to know the number of people infected per day, which is easily obtained by differentiating Eq. (2):

$$\frac{dN}{dt} = \frac{N_0 N_\infty (N_\infty - N_0)\, r \exp(rt)}{\left\{N_\infty + N_0\left[\exp(rt) - 1\right]\right\}^2}, \tag{4}$$


and this curve is nonmonotonic with the maximum given by

$$\max\left(\frac{dN}{dt}\right) = \frac{r N_\infty}{4}, \tag{5}$$

which corresponds to the time (the epidemic peak)

$$T = \frac{1}{r}\ln\frac{N_\infty - N_0}{N_0}. \tag{6}$$

As can be seen, these characteristics (Eqs. (5) and (6)) can only be estimated when the data are no longer described by an exponential curve and both model parameters $r$ and $N_\infty$ are found or known.
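For completeness, Eqs. (5) and (6) can be recovered in a few lines from Eqs. (1) and (2). The following is a standard calculation added for the reader, not text from the manuscript. The right-hand side of Eq. (1) is a parabola in $N$, so

$$\frac{d}{dN}\left[rN\left(1 - \frac{N}{N_\infty}\right)\right] = r\left(1 - \frac{2N}{N_\infty}\right) = 0 \;\Rightarrow\; N = \frac{N_\infty}{2}, \qquad \max\left(\frac{dN}{dt}\right) = r\,\frac{N_\infty}{2}\cdot\frac{1}{2} = \frac{rN_\infty}{4}.$$

Setting $N(T) = N_\infty/2$ in Eq. (2) gives

$$2N_0\exp(rT) = N_\infty + N_0\left[\exp(rT) - 1\right] \;\Rightarrow\; N_0\exp(rT) = N_\infty - N_0 \;\Rightarrow\; T = \frac{1}{r}\ln\frac{N_\infty - N_0}{N_0}.$$

As a numerical illustration, the values quoted for Austria in the caption of Fig. 3 ($r = 0.195$, $N_\infty$ of about 14700) give a peak of $0.195 \times 14700 / 4 \approx 717$ cases per day.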

Let us note that the time dependences (2) and (4) are smooth functions, whereas Fig. 2 shows that the observed daily incidence, which dependence (4) is meant to describe, is non-smooth and irregular. This irregularity is studied below.

Since medical statistics operates with the cases per day, it is in fact necessary to solve the difference logistic equation

$$K_n = N_{n+1} - N_n = r N_n\left(1 - \frac{N_n}{N_\infty}\right). \tag{7}$$
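A short sketch of how Eq. (7) can be iterated day by day is given here. The parameter values are hypothetical and the code only illustrates the recurrence; it is not the authors' implementation. For $0 < r < 1$ per day and $N_0 < N_\infty$ the iteration approaches $N_\infty$ monotonically, which covers growth rates of the order of 0.1 to 0.2 per day.

```python
import numpy as np

def iterate_difference_logistic(n0, n_inf, r, days):
    """Iterate Eq. (7): N[n+1] = N[n] + r * N[n] * (1 - N[n] / n_inf)."""
    total = np.empty(days + 1)
    total[0] = n0
    for n in range(days):
        daily = r * total[n] * (1.0 - total[n] / n_inf)   # K_n
        total[n + 1] = total[n] + daily
    daily_cases = np.diff(total)                           # K_n = N[n+1] - N[n]
    return total, daily_cases

# Hypothetical parameters for illustration only.
total, daily_cases = iterate_difference_logistic(n0=10.0, n_inf=15000.0, r=0.2, days=100)

peak_day = int(np.argmax(daily_cases))
print(f"final N = {total[-1]:.0f}, peak K = {daily_cases.max():.0f} on day {peak_day}")
```

The daily increments never exceed $rN_\infty/4$, the maximum of the parabola on the right-hand side of Eq. (7).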

After removing the index $n$, we obtain a simple relationship between the number of cases per day ($K$) and the total number of cases ($N$)

$$K = rN\left(1 - \frac{N}{N_\infty}\right), \tag{8}$$

which in these variables is a parabola.
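Because Eq. (8) is linear in the coefficients of $N$ and $N^2$, both parameters can be estimated by ordinary least squares. The sketch below shows one possible way to do this; the manuscript does not specify the fitting procedure, so the zero-intercept least-squares fit and the function name are assumptions. It fits $K = aN + bN^2$, from which $r = a$ and $N_\infty = -a/b$, and reports the determination coefficient $R^2$. The data are synthetic points scattered around Eq. (8) with hypothetical parameters.

```python
import numpy as np

def fit_logistic_parabola(n_total, k_daily):
    """Least-squares fit of K = a*N + b*N**2 (Eq. (8) with a = r, b = -r/N_inf)."""
    design = np.column_stack([n_total, n_total**2])
    (a, b), *_ = np.linalg.lstsq(design, k_daily, rcond=None)
    r = a
    n_inf = -a / b
    residuals = k_daily - design @ np.array([a, b])
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((k_daily - k_daily.mean())**2)
    r_squared = 1.0 - ss_res / ss_tot
    return r, n_inf, r_squared

# Synthetic (N, K) pairs: Eq. (8) with hypothetical parameters and random scatter.
rng = np.random.default_rng(0)
true_r, true_n_inf = 0.15, 30000.0
n_total = np.linspace(200.0, 29000.0, 60)
k_daily = true_r * n_total * (1.0 - n_total / true_n_inf)
k_daily = k_daily * rng.uniform(0.7, 1.3, size=n_total.size)  # crude day-to-day irregularity

r_fit, n_inf_fit, r2 = fit_logistic_parabola(n_total, k_daily)
print(f"r = {r_fit:.3f}, N_inf = {n_inf_fit:.0f}, R^2 = {r2:.2f}")
```

With real data, $N$ would be the cumulative case count on each day and $K$ the corresponding daily increment.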

As an example, we take the data on the coronavirus incidence in several countries where the epidemic is close to its end (at least its active phase is over). These countries are Austria (58 data points), Switzerland (58 points), the Netherlands (64 points), Italy (72 points), Turkey (49 points) and South Korea (94 points). We use the data as of 04/23/2020, taken from the WHO (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports). Figure 3 shows the relationship between the number of cases per day ($K$) and the total number of cases ($N$) for each country. Parabolic approximations (the solid lines) arising from (8) are also presented. Non-simultaneous 95% prediction bounds for response values (the dashed lines) are shown as well.

Evidently, the parabolic approximation of the available data is good enough for almost all of the listed countries ($R^2 > 0.8$), but clearly has low accuracy for South Korea ($R^2 \sim 0.55$). Therefore, in the rest of this section we do not use the data on South Korea, for which the logistic model is not suitable (this case is analyzed in the next section). Although the logistic curve approximates the data well for most countries, the scatter of points around the parabolic curve is still not small. This indicates that the coefficients of the parabolic curve should be treated as functions of time, which is, in essence, what is done in forecasts when these coefficients are refined as new data appear. Let us, for example, vary only the coefficient $r$. Within the framework of the logistic model, the variability of this coefficient can be determined from the available data by the following formula, which follows from (8):

$$r = \frac{K}{N\left(1 - N/N_\infty\right)}. \tag{9}$$
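Eq. (9) can be applied point by point to daily data once $N_\infty$ has been fixed. The following is a minimal sketch, assuming $N_\infty$ is taken from a previous fit of Eq. (8) and using made-up numbers rather than WHO data; the function name and the handling of undefined points are choices made for this example.

```python
import numpy as np

def growth_rate_series(n_total, k_daily, n_inf):
    """Pointwise growth rate r_n = K_n / (N_n * (1 - N_n / n_inf)), Eq. (9)."""
    n_total = np.asarray(n_total, dtype=float)
    k_daily = np.asarray(k_daily, dtype=float)
    denom = n_total * (1.0 - n_total / n_inf)
    # Eq. (9) is undefined where N = 0 or N >= n_inf; mark those days as NaN.
    valid = (n_total > 0) & (n_total < n_inf)
    r = np.full(n_total.shape, np.nan)
    r[valid] = k_daily[valid] / denom[valid]
    return r

# Made-up example values (not real epidemic data).
n_inf = 15000.0
n_total = np.array([500.0, 1500.0, 4000.0, 7500.0, 11000.0, 13500.0])
k_daily = np.array([110.0, 240.0, 650.0, 700.0, 520.0, 300.0])

r_series = growth_rate_series(n_total, k_daily, n_inf)
r_mean = np.nanmean(r_series)
relative_variation = (np.nanmax(r_series) - np.nanmin(r_series)) / r_mean
print(np.round(r_series, 3), f"mean r = {r_mean:.3f}", f"spread = {relative_variation:.0%}")
```

Near $N = N_\infty$ the denominator of Eq. (9) tends to zero, so estimates from the last days of an epidemic are very sensitive to the assumed value of $N_\infty$.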


Fig. 3. The relationship between the number of cases per day ($K$) and the total number of cases ($N$). The markers show the data, the solid line is the regression according to Eq. (8), and the dashed lines give non-simultaneous 95% prediction bounds for response values: a – Austria: $N_\infty \approx 14700$, $r = 0.195$, $R^2 = 0.81$; b – Switzerland: $N_\infty \approx 28400$, $r = 0.163$, $R^2 = 0.81$; c – The Netherlands: $N_\infty = 42580$, $r = 0.114$, $R^2 = 0.89$; d – Italy: $N_\infty = 216600$, $r = 0.099$, $R^2 = 0.82$; e – Turkey: $N_\infty = 133700$, $r = 0.144$, $R^2 = 0.94$; f – South Korea: $N_\infty \approx 10300$, $r = 0.158$, $R^2 = 0.55$.
