Article

Estimating Unreported COVID-19 Cases with a Time-Varying SIR Regression Model

1
Department of Graphics and Digital Technology, School of Urban Design, Wuhan University, Wuhan 430072, China
2
Department of Urban Planning, School of Urban Design, Wuhan University, Wuhan 430072, China
3
China Data Institute, Ann Arbor, MI 48108, USA
4
Center for Geographic Analysis, Harvard University, Cambridge, MA 02138, USA
*
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2021, 18(3), 1090; https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph18031090
Submission received: 21 November 2020 / Revised: 18 January 2021 / Accepted: 22 January 2021 / Published: 26 January 2021

Abstract

Background: Unreported infections might impair and mislead policymaking for COVID-19, and the contemporary spread of COVID-19 varies across counties of the United States. To support better countermeasures against COVID-19, it is necessary to estimate, from county-level data, the cases that official counts might miss. We propose time-varying Susceptible-Infected-Recovered (SIR) models with unreported infection rates (UIR) to estimate actual COVID-19 cases in the United States. Methods: Both the SIR model integrated with unreported infection rates (SIRu) with a fixed-time effect and the SIRu with time-varying parameters (tvSIRu) were applied to estimate and compare the values of the transmission rate (TR), UIR, and infection fatality rate (IFR) based on US county-level COVID-19 data. Results: Based on the US county-level COVID-19 data from 22 January (T1) to 20 August (T212) in 2020, SIRu was first tested and verified by Ordinary Least Squares (OLS) regression. Further regression of SIRu at the county level showed that the average values of TR, UIR, and IFR were 0.034, 19.5, and 0.51%, respectively. Across states, TR, UIR, and IFR ranged over 0.007–0.157 (mean = 0.048), 7.31–185.6 (mean = 38.89), and 0.04–2.22% (mean = 0.22%), respectively. Among the time-varying TR equations, the power function showed the best fit, which indicated a decline in TR from 227.58 (T1) to 0.022 (T212). The general equation of tvSIRu showed that both the UIR and the recovery/death rate (RDR) were gradually increasing, which implies a declining IFR; the estimated UIR was 9.1 (95%CI 5.7–14.0) and the IFR was 0.70% (95%CI 0.52–0.95%) at T212. Interpretation: Despite the declining trend in TR and IFR, the UIR of COVID-19 in the United States is still on the rise, although it was assumed that it would decrease with sufficient testing or improved countermeasures. The US medical system might be heavily affected by severe cases amid a rapid spread of COVID-19.

1. Introduction

Although COVID-19 was reported several months ago [1], the coronavirus is still raging on a global scale, and is especially surging in the United States, which is one of the most important engines of the global economic network. The pandemic in the United States will have an important impact on the global economy and politics. It is fundamental to make relatively accurate estimates for preventing and controlling the COVID-19 pandemic in the United States [2,3], wherein the transmission rate (TR) and infection fatality rate (IFR) are key indicators [4].
The main obstacle to calculating such indicators is the unreported infection rate (UIR), which might be caused by insufficient testing, data suppression of mild or asymptomatic patients, and a time-lag bias [5,6]. Direct use of IFR values derived from official data might lead to large errors [7]. Similar research on SARS pointed out that preferential ascertainment of severe cases and delayed reporting of deaths are the two main reasons for error in the case fatality risk (CFR) [8]. Beyond insufficient early testing, mild and asymptomatic patients might account for most unreported cases. In Brazil, only some moderate and severe hospitalized infectives have been recorded thus far [9]. On the other hand, the time-lag deviation could be explained by the incubation period of COVID-19, which fluctuates over a wide range [10] and during which the virus still possesses high transmissibility [11]. The incubation period is also correlated with the age of the infectives, which can directly affect the IFR [12]. It was concluded that unreported cases might lead to four kinds of uncertainty in IFR calibration: an unclear denominator, unknown infection time, unknown incubation period, and undiagnosed asymptomatic infections [13].
Characterizing unreported cases has become a popular question in the epidemic modeling of COVID-19. The recent literature attempts to calculate the UIR or the reported rate (RR) based on country-level data [14,15,16], although relying on a single country-level data set might lead to greater bias [17]. Moreover, county-level data on recovered infectives in the United States are not released. Thus, the calculation of IFR depends merely on the national aggregate data, which might further amplify the error. More and more studies use multinational data [18], county-level data [19], or mixed country-county regional data [20] for analysis, which greatly improves the accuracy of modeling by increasing the dimensionality and quantity of data.
However, previous studies seldom investigated the time effect of UIR, which might affect the accuracy of all indicators. A recent study suggested using a time-varying SIR model to capture the changing transmission rate [21]. Moreover, the incubation period was shown to change in different stages of transmission [22]. Some studies showed that the possible value of the COVID-19 IFR in China should be 2.3% [23], while another study showed that the early COVID-19 IFR in Wuhan might be as high as 20% [24]. Such disputes might also imply a changing trend in IFR.
This study proposes an SIR regression model with an unreported infection rate (SIRu) and an SIRu with time-varying parameters (tvSIRu) to estimate the values of TR, UIR, and IFR, and to assess the impact of the time effect. The US county-level data used in this study come from the open-source data of Johns Hopkins University on GitHub [25]. This study provides the first insights into the time series values of TR, UIR, and IFR of COVID-19, contributing to a deeper understanding of the trend of COVID-19 in the United States.

2. Materials and Methods

2.1. Data Source

The COVID-19 data used in this article contained 3142 counties in the United States, which included the number of daily new infectives, cumulative infectives, and deaths, while the population of recovered infectives remained unreported.
The date of the data ranged from 22 January 2020 to 20 August 2020, which contained 666,104 (3142 × 212) records. As a time-lag order (tk, tk+1) was applied in the data analysis, the number of whole records used for regression was 662,962 (3142 × 211).
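The record counts above follow from pairing each day with the next one. A minimal sketch of that pairing is given below; the helper name `lagged_pairs` and the toy series are hypothetical and only illustrate the bookkeeping, not the authors' data pipeline.

```python
from typing import List, Tuple


def lagged_pairs(series: List[float]) -> List[Tuple[float, float]]:
    """Pair each day's value (t_k) with the following day's value (t_{k+1})."""
    return list(zip(series[:-1], series[1:]))


N_COUNTIES = 3142  # counties in the data set (from the text)
N_DAYS = 212       # 22 January to 20 August 2020 (T1..T212)

# A toy cumulative series for one hypothetical county (not real data).
toy_series = [0, 1, 3, 6, 10]
pairs = lagged_pairs(toy_series)

# Each county loses one record to the lag, so 212 days give 211 pairs.
records_total = N_COUNTIES * N_DAYS
records_used = N_COUNTIES * (N_DAYS - 1)
```

In the regression, the first element of each pair would supply the explanatory variables at time tk and the second the reported daily new cases at time tk+1.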

2.2. tvSIRu Model with Fixed UIR

In the classic SIR dynamic model, the number of daily new infectives ($I_{d}^{t_{k+1}}$) at time $t_{k+1}$ can be expressed as a function of the infection rate β, the number of susceptible persons ($S^{t_k}$), the number of infected persons ($I^{t_k}$), and the total population (N) at time $t_k$ (Equation (1)).
$$I_{d}^{t_{k+1}} = \frac{\beta S^{t_k} I^{t_k}}{N} \tag{1}$$
The SIR model with unreported infection rate (SIRu) could be illustrated in Figure 1.
As the population of the recovered infectives was not released, two kinds of parameters were added to the SIR model, λ for the recovery/death rate (RDR), φ and φ′ for the unreported infection rate (UIR) of cumulative cases and daily cases, respectively. Such variables could be described as the following equations:
$$I_{c}^{t_k} = \varphi I_{cr}^{t_k}, \quad R_{c}^{t_k} = \lambda R_{dr}^{t_k}, \quad I_{d}^{t_{k+1}} = \varphi' I_{dr}^{t_{k+1}} \tag{2}$$
where $I_{c}^{t_k}$ represents the total cumulative infectives at time $t_k$, and $I_{cr}^{t_k}$ denotes the cumulative cases reported. $R_{c}^{t_k}$ reflects the whole population of removals at time $t_k$, and $R_{dr}^{t_k}$ is the cumulative deaths reported. The daily new infectives at time $t_{k+1}$ ($I_{d}^{t_{k+1}}$) are calculated from φ′ and the corresponding reported data ($I_{dr}^{t_{k+1}}$).
RDR could also be transformed into the infection fatality rate (IFR):
IFR = 1/(λ + 1)
When a fixed UIR with no time effect is considered, the UIR of total cumulative infectives and that of daily new cases can be treated as equivalent, thus:
$$\varphi = \varphi' \tag{4}$$
The two explanatory variables in Equation (1), $S^{t_k}$ and $I^{t_k}$, could be calculated as
$$S^{t_k} = N - I_{c}^{t_k}, \quad I^{t_k} = I_{c}^{t_k} - R_{c}^{t_k} \tag{5}$$
The SIR model (Equation (1)) could be developed into Equation (6) by substituting Equations (2)–(4).
$$\varphi I_{dr}^{t_{k+1}} = \beta \frac{\left(N - \varphi I_{cr}^{t_k}\right)\left(\varphi I_{cr}^{t_k} - \lambda R_{dr}^{t_k}\right)}{N} \tag{6}$$
Through further simplification and operation, Equation (6) could be transformed into Equation (7), which could be taken as the general tvSIRu model:
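$$I_{dr}^{t_{k+1}} = \beta I_{cr}^{t_k} - \frac{\beta\lambda}{\varphi} R_{dr}^{t_k} - \frac{\beta\varphi \left(I_{cr}^{t_k}\right)^2}{N} + \frac{\beta\lambda I_{cr}^{t_k} R_{dr}^{t_k}}{N} \tag{7}$$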
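The simplification from Equation (6) to Equation (7) can be spelled out as follows. This is a sketch that assumes the fixed-UIR condition φ′ = φ of Equation (4) and uses notation reconstructed from the definitions above (time superscripts are dropped on the right-hand side for brevity).

```latex
\begin{aligned}
\varphi I_{dr}^{t_{k+1}}
  &= \frac{\beta}{N}\left(N - \varphi I_{cr}\right)\left(\varphi I_{cr} - \lambda R_{dr}\right) \\
  &= \frac{\beta}{N}\left(N\varphi I_{cr} - N\lambda R_{dr} - \varphi^{2} I_{cr}^{2} + \varphi\lambda I_{cr} R_{dr}\right) \\
I_{dr}^{t_{k+1}}
  &= \beta I_{cr} - \frac{\beta\lambda}{\varphi} R_{dr} - \frac{\beta\varphi I_{cr}^{2}}{N} + \frac{\beta\lambda I_{cr} R_{dr}}{N}
\end{aligned}
```

The last line is obtained by dividing both sides by φ, which is why φ appears in the denominator of the second term only.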
Since the four combined variables, $I_{cr}^{t_k}$, $R_{dr}^{t_k}$, $\left(I_{cr}^{t_k}\right)^2/N$, and $I_{cr}^{t_k} R_{dr}^{t_k}/N$, could be acquired or calculated from the released data, Equation (7) could be regarded as a linear function, Equation (8), with coefficients a, b, c, d:
$$I_{dr}^{t_{k+1}} = a I_{cr}^{t_k} + b R_{dr}^{t_k} + c \frac{\left(I_{cr}^{t_k}\right)^2}{N} + d \frac{I_{cr}^{t_k} R_{dr}^{t_k}}{N} \tag{8}$$
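As a numerical check of this linearization, the sketch below evaluates the compartment form (Equation (6) with φ′ = φ) and the four-regressor form (Equation (8)) for one hypothetical county and confirms that they agree. The function names and the county figures are illustrative assumptions; the parameter values are the fixed-effect estimates quoted in Section 3.1.

```python
def sir_u_direct(icr, rdr, n, beta, lam, phi):
    """Reported daily new cases from the compartment form (Equation (6), phi' = phi)."""
    susceptible = n - phi * icr       # S = N - I_c, with I_c = phi * I_cr
    infected = phi * icr - lam * rdr  # I = I_c - R_c, with R_c = lam * R_dr
    return beta * susceptible * infected / (n * phi)


def combined_variables(icr, rdr, n):
    """The four regressors of Equation (8)."""
    return (icr, rdr, icr ** 2 / n, icr * rdr / n)


def coefficients(beta, lam, phi):
    """Coefficients a, b, c, d implied by Equation (7)."""
    return (beta, -beta * lam / phi, -beta * phi, beta * lam)


def sir_u_linear(icr, rdr, n, beta, lam, phi):
    """Equation (8) evaluated as a weighted sum of the combined variables."""
    x = combined_variables(icr, rdr, n)
    w = coefficients(beta, lam, phi)
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical county: 1,000 reported cases, 2 reported deaths, 100,000 residents.
direct = sir_u_direct(1000.0, 2.0, 100_000.0, 0.0339, 192.5, 19.5)
linear = sir_u_linear(1000.0, 2.0, 100_000.0, 0.0339, 192.5, 19.5)
```

Note that b and c come out negative for positive β, λ, and φ, which matches the signs expected from Equation (7).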
When the fixed-time effect of all three parameters in Equation (7) is considered, the corresponding average values (β0, λ0, φ0) could be calculated with Equation (9).
$$I_{dr}^{t_{k+1}} = \beta_0 I_{cr}^{t_k} - \frac{\beta_0\lambda_0}{\varphi_0} R_{dr}^{t_k} - \frac{\beta_0\varphi_0 \left(I_{cr}^{t_k}\right)^2}{N} + \frac{\beta_0\lambda_0 I_{cr}^{t_k} R_{dr}^{t_k}}{N} \tag{9}$$
where the values of β0, λ0, φ0 are constants.
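The relation between the linear coefficients and the three model parameters can be inverted by hand. The sketch below does so under the term-by-term correspondence a = β, c = −βφ, d = βλ; the function name is hypothetical. Because the linear form has four coefficients and the model has only three parameters, b is over-identified, so this mapping is an illustration of the algebra rather than the estimation procedure used in the study.

```python
def params_from_coefficients(a, b, c, d):
    """Recover (beta, lam, phi) from the linear coefficients and report the implied b."""
    beta = a                        # a = beta
    phi = -c / a                    # c = -beta * phi
    lam = d / a                     # d = beta * lam
    b_implied = -beta * lam / phi   # b = -beta * lam / phi, i.e. a * d / c
    return beta, lam, phi, b_implied


# Round trip from the fixed-effect estimates quoted in Section 3.1.
beta0, lam0, phi0 = 0.0339, 192.5, 19.5
a, b, c, d = beta0, -beta0 * lam0 / phi0, -beta0 * phi0, beta0 * lam0
beta_hat, lam_hat, phi_hat, b_implied = params_from_coefficients(a, b, c, d)
```

A large gap between a fitted b and `b_implied` would suggest that an unconstrained linear fit is not consistent with a single (β, λ, φ) triple.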

2.3. tvSIRu Model with Time-Varying UIR

If the UIR varied over time, the UIRs of the cumulative cases and daily new cases were different, and were defined as φ and φ′, respectively. Equation (6) could be rewritten as
$$\varphi' I_{dr}^{t_{k+1}} = \beta \frac{\left(N - \varphi I_{cr}^{t_k}\right)\left(\varphi I_{cr}^{t_k} - \lambda R_{dr}^{t_k}\right)}{N} \tag{10}$$
To simplify the computation, a new parameter β′ was introduced:
$$\beta' = \beta / \varphi' \tag{11}$$
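Then Equation (10) could be transformed into a form similar to Equation (7):
$$I_{dr}^{t_{k+1}} = \beta'\varphi I_{cr}^{t_k} - \beta'\lambda R_{dr}^{t_k} - \frac{\beta'\varphi^2 \left(I_{cr}^{t_k}\right)^2}{N} + \frac{\beta'\lambda\varphi I_{cr}^{t_k} R_{dr}^{t_k}}{N} \tag{12}$$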
To verify the assumption of time-varying parameters, the coefficients in Equations (7) and (12) could be represented by the initial values and time effect functions. Such functions were substituted into the two models gradually, resulting in several sub-equations with time effects.
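$$\beta = \beta_0 g(t) \tag{13}$$
$$\beta = \beta_0 g(t), \quad \lambda = \lambda_0 h(t) \tag{14}$$
$$\beta' = \beta_0 g(t), \quad \lambda = \lambda_0 h(t), \quad \varphi = \varphi_0 f(t) \tag{15}$$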
Substituting Equations (13)–(15) into Equations (7) and (12), three complete equations could be generated:
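$$I_{dr}^{t_{k+1}} = \beta_0 g(t) I_{cr}^{t_k} - \frac{\beta_0 g(t)\lambda}{\varphi} R_{dr}^{t_k} - \frac{\beta_0 g(t)\varphi \left(I_{cr}^{t_k}\right)^2}{N} + \frac{\beta_0 g(t)\lambda I_{cr}^{t_k} R_{dr}^{t_k}}{N} \tag{16}$$
$$I_{dr}^{t_{k+1}} = \beta_0 g(t) I_{cr}^{t_k} - \frac{\beta_0 g(t)\lambda_0 h(t)}{\varphi} R_{dr}^{t_k} - \frac{\beta_0 g(t)\varphi \left(I_{cr}^{t_k}\right)^2}{N} + \frac{\beta_0 g(t)\lambda_0 h(t) I_{cr}^{t_k} R_{dr}^{t_k}}{N} \tag{17}$$
$$I_{dr}^{t_{k+1}} = \beta_0 g(t) \left( \varphi_0 f(t) I_{cr}^{t_k} - \lambda_0 h(t) R_{dr}^{t_k} - \frac{\varphi_0^2 f(t)^2 \left(I_{cr}^{t_k}\right)^2}{N} + \frac{\lambda_0 h(t)\varphi_0 f(t) I_{cr}^{t_k} R_{dr}^{t_k}}{N} \right) \tag{18}$$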
As specific functions reflecting the time effect, the exponential, power, and periodic functions were tested and compared in this article:
$$\tau_1(t) = x^{t}, \quad \tau_2(t) = t^{x}, \quad \tau_3(t) = \frac{1}{2}\left(1 + \cos\frac{t\pi}{x}\right) \tag{19}$$
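The three candidate time-effect functions can be written down directly. The sketch below follows one reading of the formula above (exponential x^t, power t^x, periodic ½(1 + cos(tπ/x))); the function names are hypothetical, and the sign convention of the exponents is an assumption, since a fitted x may be negative for a declining trend.

```python
import math


def tau_exponential(t, x):
    """Exponential time effect: x raised to the power t."""
    return x ** t


def tau_power(t, x):
    """Power time effect: t raised to the power x (equals 1 at t = 1)."""
    return t ** x


def tau_periodic(t, x):
    """Periodic time effect: oscillates between 0 and 1 with period 2x."""
    return 0.5 * (1.0 + math.cos(t / x * math.pi))


# With a negative x, the power function declines from 1 at t = 1.
decline = [tau_power(t, -1.0) for t in (1, 2, 4)]
```

Under this reading the power function leaves the initial value β0 unchanged at T1, so β0 can be interpreted as the parameter value on the first day.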
This study tested the five equations, (8), (9), (16), (17), and (18), where Equation (8) is the OLS linear regression derived from the SIRu model, Equation (9) is SIRu with fixed-time effect, Equations (16), (17), and (18) are tvSIRu with single time-varying β, time-varying β and λ, all time-varying parameters of β, λ, and φ, respectively.
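To make the fully time-varying model concrete, the sketch below evaluates the right-hand side of Equation (18) term by term and checks it against the factored compartment form β0 g(t) (N − φ I_cr)(φ I_cr − λ R_dr)/N, with φ = φ0 f(t) and λ = λ0 h(t). The function names, the county figures, and the particular time functions are hypothetical choices for illustration only.

```python
def tvsiru_expanded(t, icr, rdr, n, beta0, lam0, phi0, g, h, f):
    """Right-hand side of Equation (18), written term by term."""
    phi_t = phi0 * f(t)
    lam_t = lam0 * h(t)
    return beta0 * g(t) * (
        phi_t * icr
        - lam_t * rdr
        - phi_t ** 2 * icr ** 2 / n
        + lam_t * phi_t * icr * rdr / n
    )


def tvsiru_factored(t, icr, rdr, n, beta0, lam0, phi0, g, h, f):
    """Same quantity in factored form: beta' * S * I / N."""
    phi_t = phi0 * f(t)
    lam_t = lam0 * h(t)
    return beta0 * g(t) * (n - phi_t * icr) * (phi_t * icr - lam_t * rdr) / n


# Hypothetical power-type time effects (not fitted values).
g = lambda t: t ** -1.0
h = lambda t: t ** 0.1
f = lambda t: t ** 0.1

args = (212, 1000.0, 2.0, 100_000.0, 1.0, 50.0, 5.0, g, h, f)
expanded = tvsiru_expanded(*args)
factored = tvsiru_factored(*args)
```

The factored form is usually easier to reason about, while the expanded form shows which combined variables enter the regression.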

3. Results

3.1. OLS and SIRu Regressions

The linear regression derived from the SIRu model showed acceptable fitness and the adjusted R2 was 0.4813 (n = 662,962) (Table 1). The negative value of coefficients b and c were consistent with the corresponding operation signs in Equation (7). Such results verified the assumption of the SIRu model to a certain extent.
The SIRu model with a fixed-time effect in Equation (9) further provided the estimated value of TR, UIR, and RDR (Table 2). The results showed that the average β0 value from 22 January to 20 August was 0.0339 (95%CI 0.0338–0.0340), and the φ0 value was 19.5 (95%CI 19.38–19.54), which implied that there might be 19.5 undiagnosed cases while one infection was reported in US counties, on average. Meanwhile, the λ0 value of 192.5 (95%CI 191.790–193.243) could be interpreted as an IFR value of 0.516%.
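The conversion from λ0 to the IFR quoted above can be reproduced with the relation IFR = 1/(λ + 1) from Section 2.2. The short sketch below applies it to the point estimate and to the confidence bounds; the function name is hypothetical. Because the relation is decreasing in λ, the upper bound of λ gives the lower bound of the IFR.

```python
def ifr_from_rdr(lam):
    """Infection fatality rate implied by the recovery/death rate: 1 / (lam + 1)."""
    return 1.0 / (lam + 1.0)


ifr_point = ifr_from_rdr(192.5)    # about 0.517%, in line with the 0.516% in the text
ifr_low = ifr_from_rdr(193.243)    # from the upper bound of lambda
ifr_high = ifr_from_rdr(191.790)   # from the lower bound of lambda
```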

3.2. SIRu at the State Level

The study further utilized county-level data to compare state-level parameters based on fixed-time effects. Figure 2 shows the fitness of Equation (8) for each state; most values were above 0.5. Each state had different TR, UIR, and RDR values in Equation (9), which indicated an obvious spatial heterogeneity in the transmission of COVID-19 (Figure 3). All parameters and statistical descriptions are reported in Appendix A, Appendix B and Appendix C.
Most states had a TR between 0.018 and 0.053. Seven states with relatively high values were Illinois (0.146), Massachusetts (0.109), Connecticut (0.104), New Jersey (0.098), Arizona (0.087), Nevada (0.080), and Alaska (0.076) (Figure 3a).
In terms of UIR, most states were concentrated between 28 and 50 (Figure 3b). Some states had relatively low values, such as New York (7.31) and Oregon (8.64), while the highest values were found in Vermont (185.66), Maine (122.84), Alaska (85.69), and West Virginia (80.90).
The fitting results on RDR in some states were not significant, but most significant values were between 200 and 500, which was equivalent to an IFR ranging from 0.2% to 0.5% (Figure 3c). Nine states were reported below 99 (IFR > 1%): Ohio (46.35), Oklahoma (44.29), Florida (79.98), Alabama (66.40), Mississippi (98.32), Kentucky (74.40), Iowa (58.45), New Mexico (62.18), and California (55.91) (values as listed in Table A1).
The Pearson correlation between the three state-level indicators was also tested, showing a positive correlation between UIR and RDR. In other words, the lower the IFR, the higher the UIR (Figure 3d).
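For readers who want to repeat this kind of check, a Pearson coefficient can be computed in a few lines. The sketch below uses the (UIR, RDR) pairs of a handful of states copied from Table A1; the subset is chosen only for illustration, so the resulting coefficient is not the value reported in Figure 3d.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# (UIR, RDR) pairs from Table A1; an illustrative subset, not the full sample.
sample = {
    "Alabama": (28.1118, 66.4050),
    "Alaska": (85.6993, 2677.0740),
    "Arizona": (31.8510, 459.3959),
    "California": (31.9797, 55.9131),
    "Colorado": (55.2171, 395.2257),
    "Florida": (13.4101, 79.9765),
    "Vermont": (185.6625, 1314.2170),
}
uir = [v[0] for v in sample.values()]
rdr = [v[1] for v in sample.values()]
r_sample = pearson(uir, rdr)
```

A full replication would use all states with significant RDR estimates rather than this subset.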

3.3. tvSIRu Regression at the Country Level

The tvSIRu model with time-varying TR was first tested with three sub-equations of Equation (16), and the AIC of all equations was lower than that of the SIRu model with a fixed-time effect (Table 3). Meanwhile, all estimated TR values displayed a declining trend (Figure 4). The power function showed the best fit, with an extremely high initial value of 227.58 (95%CI 219.89–235.27) decreasing to 0.022 on 20 August. Such a high value might reflect the high contagiousness of COVID-19 in the early stage. The corresponding UIR and RDR were 18.61 (95%CI 18.52–18.69) and 183.34 (95%CI 182.63–184.05), which were slightly lower than the values in Equation (9) (19.5 and 192.5).
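As a back-of-envelope consistency check, the two TR values quoted above determine the exponent of a pure power law. The sketch below assumes TR(t) = β0 · t^x with t = 1 at T1 and t = 212 at T212; this functional form and the helper name are assumptions, and the result is an implied exponent, not the fitted coefficient from Table 3.

```python
import math


def implied_power_exponent(v_start, v_end, t_start, t_end):
    """Exponent x such that v_end / v_start = (t_end / t_start) ** x."""
    return math.log(v_end / v_start) / math.log(t_end / t_start)


# TR falls from 227.58 at T1 to 0.022 at T212 (values from the text).
x_implied = implied_power_exponent(227.58, 0.022, 1, 212)
```

Under these assumptions the implied exponent is roughly −1.73, that is, TR would fall somewhat faster than the inverse of elapsed time.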
When the time effect of RDR was further added to Equation (17), the AIC of the power function displayed a slight decrease in Equation (17) (Table 4). Wherein, the UIR was 19.02 (95%CI 18.93–19.12), which was similar to the value in Equation (9). However, both equations showed decreasing trends in the changing RDR, implying an increase of IFR (Figure 5).
The power function also showed better performance in tvSIRu with all three time-varying parameters estimated by Equation (18), which indicated a gradual increase in both UIR and RDR (Table 5). This trend indicated that the initial UIR and RDR were relatively low (Figure 6). The value of UIR and RDR achieved 9.1 (95%CI 5.7–14.0) and 141.706 (95%CI 103.3358–189.9486) at T212 on 20 August, respectively. IFR could be calculated as 0.70% (95%CI 0.52–0.95%). Based on the officially released data on 20 August 2020, it might be concluded that about 30% of the whole population was infected.

4. Discussion

Few studies have analyzed the time-varying UIR of COVID-19 and its impact on the estimation of TR and IFR. This study estimated the values of UIR, TR, and IFR under both fixed-time and time-varying effects with tvSIRu models, based on county-level data.
In terms of the fixed-time effect, the results showed that from 22 January to 20 August, the average TR and UIR at the country level in the United States were 0.03 and 19.5, respectively, and the RDR was 192.5, which also meant that the IFR was 0.516%. The IFR was slightly lower than the overall IFR of 0.66% estimated in China [17], while the CDC in the United States recommends 0.65% [26].
In a further analysis at the state level, the UIR of all states ranged from 7.32 to 185.66 (mean = 38), and the IFR ranged from 0.037% to 2.20% (mean = 0.21%). A related study on 20 US counties estimated that the UIR ranged from 4.32 to 776.68 (mean = 27.7) and the IFR from 0.02% to 1.81% (mean = 0.027%); the range of UIR estimated by the SIRu model was more concentrated, and the IFRs had a similar upper boundary [27]. Another previous study estimated the upper boundary of UIR for four states: Illinois (40.86), Massachusetts (38.28), New Jersey (29.22), and New York (35.17) [19]. Among these, the first three were similar to the values estimated by the SIRu model, which were 41.51, 39.22, and 31.83; only New York had a different value of 7.32. Interestingly, however, that study also pointed out that the UIR estimated by an antibody test in New York State in early May was around 7.6, which might indicate the stability of the SIRu method.
Based on the tvSIRu model, UIR and RDR increased following the power function rather than the exponential function, which was the default setting in previous research [21]. Unlike the average value of 0.03 in SIRu, the TR estimated by the tvSIRu model decreased from a large value of 227 to a value of 0.022 on 20 August, which was much lower than the fixed value of 0.05–0.06 reported in related research [21]. This might further explain the high contagiousness in the initial stage of COVID-19 transmission. The increasing UIR estimated by the tvSIRu model reached 9.1 (95%CI 5.7–14.0) at T212 (20 August), which was very close to the value of 9 estimated in a former study in April [20] and in the latest study in September [28]. The UIR value was also close to the value reported in Brazil (reported rate = 9.2%, UIR = 10.8) [18]. Such similarity in the estimated UIR in different periods might be caused by the fixed-time effect in the former models, which only represented the average values of UIR, as calculated by the SIRu model. The increasing UIR meant that the IFR was on a downward trend. The value of IFR on 20 August was 0.70% (95%CI 0.52–0.95%), which was still close to the value recommended by the CDC [26].
Many studies supposed that the UIR would decrease with improved COVID-19 testing and increased hygiene awareness, but our research showed that the UIR in the United States is increasing, which might have a great impact on policy-making for COVID-19 prevention. On the other hand, an empirical TR is often used in contemporary COVID-19 modeling, but the tvSIRu model indicates that the COVID-19 infection rate changed dramatically. The initial value of TR was 246, reflecting that this pandemic was extremely contagious in the early transmission stage in the United States. Previous SIR modeling seldom characterized such a feature, which might lead to large estimation errors. The declining TR and IFR and the increasing UIR indicated by the model showed that the epidemic was spreading rapidly in the United States with a large self-healing population. However, it is noteworthy that a potential increase in severe cases might greatly affect the medical system, and the relevant departments still need to provide more protection to high-risk groups.
As shown in Figure 3, given the potential pattern of spatial correlation, the tvSIRu model could be developed further by integrating models that consider spatial weights, such as the Geographically Weighted Regression (GWR) model [29] or the Spatial Panel Model, to detect the spatiotemporal features of COVID-19 transmission. Meanwhile, the regression used in the tvSIRu models could also be extended by a non-linear method, such as the Artificial Neural Network (ANN) [30].

5. Conclusions

This article indicates that there might be an increasing number of unrecorded COVID-19 cases in the official U.S. data. The tvSIRu model provides a simple, convenient, and relatively accurate calculation of the unreported parameters of COVID-19 with time effect, based on officially released data. Moreover, this method can be easily transferred to the epidemic modeling of other countries.
It must be admitted that if single level geography units of data are used, the independent variables might display strong collinearity, leading to overfitting. It is therefore necessary to use proper sub-geographical level data to fit the national-level or state-level data. Furthermore, the non-linear model regression was based on the Gauss-Newton iteration, which could be further optimized with machine learning models.
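To illustrate the kind of iteration mentioned above, the sketch below fits a two-parameter power law y = b · t^x by Gauss-Newton on noise-free synthetic data. It is a minimal illustration, not the authors' implementation: the function name, the synthetic data, the starting values, and the step-halving safeguard are all assumptions added here.

```python
import math


def fit_power_law(ts, ys, b, x, iterations=50):
    """Fit y = b * t**x by Gauss-Newton with step halving (a damped variant)."""

    def sse(b_, x_):
        return sum((y - b_ * t ** x_) ** 2 for t, y in zip(ts, ys))

    for _ in range(iterations):
        # Residuals and Jacobian columns: d/db = t**x, d/dx = b * t**x * ln(t).
        r = [y - b * t ** x for t, y in zip(ts, ys)]
        jb = [t ** x for t in ts]
        jx = [b * t ** x * math.log(t) for t in ts]
        # Normal equations (J^T J) delta = J^T r for the 2x2 system.
        sbb = sum(v * v for v in jb)
        sbx = sum(u * v for u, v in zip(jb, jx))
        sxx = sum(v * v for v in jx)
        gb = sum(u * v for u, v in zip(jb, r))
        gx = sum(u * v for u, v in zip(jx, r))
        det = sbb * sxx - sbx * sbx
        if det == 0.0:
            break
        db = (sxx * gb - sbx * gx) / det
        dx = (sbb * gx - sbx * gb) / det
        # Halve the step until the sum of squared errors does not increase.
        step, current = 1.0, sse(b, x)
        while step > 1e-8 and sse(b + step * db, x + step * dx) > current:
            step *= 0.5
        b, x = b + step * db, x + step * dx
    return b, x


ts = list(range(1, 31))
ys = [5.0 * t ** -1.5 for t in ts]  # noise-free synthetic data
b_hat, x_hat = fit_power_law(ts, ys, b=3.0, x=-1.0)
```

With real, noisy county data the starting values and the conditioning of the normal equations matter much more than in this toy setting.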

Author Contributions

L.L. and S.B. conceived and designed the experiments; L.L. and S.A. performed the experiments; T.H. and Z.P. acquired and analyzed the data; R.W. and H.W. contributed reagents/materials/analysis tools; S.A. and L.L. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

The study is funded by the National Key Research and Development Project (2019YFB2101803); National Natural Science Foundation of China (52078390); Wuhan University Experiment Technology Project Funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study come from the open-source data of Johns Hopkins University on GitHub (https://github.com/CSSEGISandData/COVID-19).

Acknowledgments

The authors would like to acknowledge Xun Shi and other experts for their suggestions on the presentation of Data Statistical Analysis and Spatio-temporal Prediction Models of COVID-19 based on Workflow in the COVID-19 Data Analysis Webinar.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Parameters Estimated by Equation (7) on State Level.
| State | R2 | β (TR) | φ (UIR) | λ (RDR) | IFR |
|---|---|---|---|---|---|
| Alabama | 0.6800 | 0.0384 | 28.1118 | 66.4050 | 0.0148 |
| Alaska | 0.7043 | 0.0758 | 85.6993 | 2677.0740 | 0.0004 |
| Arizona | 0.8344 | 0.0873 | 31.8510 | 459.3959 | 0.0022 |
| Arkansas | 0.4323 | 0.0202 | 14.4224 | −103.5099 * | |
| California | 0.7199 | 0.0337 | 31.9797 | 55.9131 | 0.0176 |
| Colorado | 0.6327 | 0.0324 | 55.2171 | 395.2257 | 0.0025 |
| Connecticut | 0.4209 | 0.1039 | 55.4896 | 389.6164 | 0.0026 |
| Delaware | 0.3171 | 0.0875 | 38.4571 | 708.5320 | 0.0014 |
| District of Columbia | 0.7049 | 0.1575 | 50.9941 | 780.3851 | 0.0013 |
| Florida | 0.7076 | 0.0391 | 13.4101 | 79.9765 | 0.0123 |
| Georgia | 0.7657 | 0.0402 | 28.1156 | 195.6581 | 0.0051 |
| Hawaii | 0.8935 | 0.0928 | 70.3956 | 2208.2030 | 0.0005 |
| Idaho | 0.7048 | 0.0588 | 33.2108 | 466.1497 | 0.0021 |
| Illinois | 0.7909 | 0.1457 | 41.5079 | 693.7160 | 0.0014 |
| Indiana | 0.6182 | 0.0301 | 29.8719 | 255.2491 | 0.0039 |
| Iowa | 0.4786 | 0.0190 | 15.7639 | 58.4500 | 0.0168 |
| Kansas | 0.5360 | 0.0230 | 16.0095 | −81.6163 * | |
| Kentucky | 0.5781 | 0.0259 | 25.9078 | 74.4067 | 0.0133 |
| Louisiana | 0.4316 | 0.0658 | 25.3544 | 284.9328 | 0.0035 |
| Maine | 0.5311 | 0.0223 | 122.8401 | −543.0694 * | |
| Maryland | 0.6333 | 0.0406 | 35.1383 | 325.5810 | 0.0031 |
| Massachusetts | 0.6904 | 0.1089 | 39.2230 | 448.5265 | 0.0022 |
| Michigan | 0.4950 | 0.0533 | 56.0760 | 339.5087 | 0.0029 |
| Minnesota | 0.7441 | 0.0068 | 30.1427 | −1098.4030 * | |
| Mississippi | 0.5179 | 0.0349 | 21.9778 | 98.3266 | 0.0101 |
| Missouri | 0.7096 | 0.0412 | 31.3774 | 188.5405 | 0.0053 |
| Montana | 0.5230 | 0.0229 | 11.8879 | −139.4672 * | |
| Nebraska | 0.5644 | 0.0250 | 13.2247 | 400.8106 | 0.0025 |
| Nevada | 0.9250 | 0.0806 | 35.9869 | 514.9621 | 0.0019 |
| New Hampshire | 0.4695 | 0.0446 | 40.4871 | 486.1162 | 0.0021 |
| New Jersey | 0.5094 | 0.0986 | 31.8312 | 323.4239 | 0.0031 |
| New Mexico | 0.5075 | 0.0197 | 13.4675 | 62.1788 | 0.0158 |
| New York | 0.4734 | −0.0221 * | 7.3187 | 244.7202 | 0.0041 |
| North Carolina | 0.7307 | 0.0355 | 39.0629 | 182.6843 | 0.0054 |
| North Dakota | 0.6131 | 0.0511 | 49.7167 | 715.7587 | 0.0014 |
| Ohio | 0.6007 | 0.0213 | 28.9136 | 46.3505 | 0.0211 |
| Oklahoma | 0.7452 | 0.0362 | 33.4416 | 44.2904 | 0.0221 |
| Oregon | 0.6663 | 0.0166 | 8.6465 | −24.6175 * | |
| Pennsylvania | 0.5268 | 0.0547 | 47.0019 | 407.8571 | 0.0024 |
| Rhode Island | 0.2214 | 0.0373 | 40.5705 | −56,191.3200 * | |
| South Carolina | 0.6950 | 0.0563 | 31.2758 | 433.8097 | 0.0023 |
| South Dakota | 0.4069 | 0.0500 | 39.8930 | 1084.9280 | 0.0009 |
| Tennessee | 0.6854 | 0.0200 | 17.9252 | −388.7768 * | |
| Texas | 0.5605 | 0.0424 | 28.6138 | 250.5731 | 0.0040 |
| Utah | 0.9043 | 0.0361 | 42.7415 | 119.1223 | 0.0083 |
| Vermont | 0.1490 | 0.0322 | 185.6625 | 1314.2170 | 0.0008 |
| Virginia | 0.5568 | 0.0299 | 48.3013 | 259.4586 | 0.0038 |
| Washington | 0.5531 | 0.0213 | 15.1840 | 115.7273 | 0.0086 |
| West Virginia | 0.3163 | 0.0346 | 80.9098 | 497.3075 | 0.0020 |
| Wisconsin | 0.7368 | 0.0206 | 28.0754 | −231.9747 * | |
| Wyoming | 0.2622 | 0.0185 | 34.7665 | 404.7622 | 0.0025 |

* p value > 0.05.

Appendix B

Table A2. Statistical Description of State Level Parameters on COVID-19.
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| TR | 0.006824 | 0.0235 | 0.036772 | 0.047826 | 0.055902 | 0.157463 |
| UIR | 7.318659 | 25.63112 | 31.9797 | 38.89121 | 42.12469 | 185.6625 |
| RDR | 44.29042 | 135.0128 | 332.5448 | 456.1626 | 481.1246 | 2677.074 |
Note: TR: the transmission rate; UIR: the unreported infection rate; RDR: the recovery/mortality rate.

Appendix C

Table A3. Parameters Table List.
| Parameters | References |
|---|---|
| β | the transmission rate of COVID-19 in SIR model |
| φ′ | the unreported infection rate of new reported infections |
| φ | the unreported infection rate of cumulative reported infections |
| λ | the recovery/mortality rate of reported deaths |
| $I_{c}^{t_k}$ | the total cumulative infectives at time tk |
| $I_{cr}^{t_k}$ | the cumulative cases reported at time tk |
| $R_{c}^{t_k}$ | the whole population of removals at time tk |
| $R_{dr}^{t_k}$ | the cumulative death reported at time tk |
| $I_{d}^{t_{k+1}}$ | the factual daily new infectives at time tk+1 |
| $I_{dr}^{t_{k+1}}$ | the reported daily new infectives at time tk+1 |
| $S^{t_k}$ | the number of susceptible persons at time tk |
| $I^{t_k}$ | the number of infectives at time tk |

References

  1. Peng, Z.; Wang, R.; Liu, L.; Wu, H. Exploring Urban Spatial Features of COVID-19 Transmission in Wuhan Based on Social Media Data. ISPRS Int. J. Geo Inf. 2020, 9, 402.
  2. Hu, T.; Guan, W.W.; Zhu, X.; Shao, Y.; Liu, L.; Du, J.; Liu, H.; Zhou, H.; Wang, J.; She, B.; et al. Building an Open Resources Repository for COVID-19 Research. Data Inf. Manag. 2020, 4, 130.
  3. Yang, C.; Sha, D.; Liu, Q.; Li, Y.; Lan, H.; Guan, W.W.; Hu, T.; Li, Z.; Zhang, Z.; Thompson, J.H.; et al. Taking the pulse of COVID-19: A spatiotemporal perspective. Int. J. Digit. Earth 2020, 13, 1186–1211.
  4. Leon, D.A.; Shkolnikov, V.M.; Smeeth, L.; Magnus, P.; Pechholdová, M.; Jarvis, C.I. COVID-19: A need for real-time monitoring of weekly excess deaths. Lancet 2020, 395, e81.
  5. Jung, S.-M.; Akhmetzhanov, A.R.; Hayashi, K.; Linton, N.M.; Yang, Y.; Yuan, B.; Kobayashi, T.; Kinoshita, R.; Nishiura, H. Real-Time Estimation of the Risk of Death from Novel Coronavirus (COVID-19) Infection: Inference Using Exported Cases. J. Clin. Med. 2020, 9, 523.
  6. Spychalski, P.; Błażyńska-Spychalska, A.; Kobiela, J. Estimating case fatality rates of COVID-19. Lancet Infect. Dis. 2020, 20, 774–775.
  7. Abdollahi, E.; Champredon, D.; Langley, J.M.; Galvani, A.P.; Moghadas, S.M. Temporal estimates of case-fatality rate for COVID-19 outbreaks in Canada and the United States. Can. Med. Assoc. J. 2020, 192, E666–E670.
  8. Lipsitch, M.; Donnelly, C.A.; Fraser, C.; Blake, I.M.; Cori, A.; Dorigatti, I.; Ferguson, N.M.; Garske, T.; Mills, H.L.; Riley, S.; et al. Potential Biases in Estimating Absolute and Relative Case-Fatality Risks during Outbreaks. PLoS Negl. Trop. Dis. 2015, 9, e0003846.
  9. Sousa, G.J.B.; Garces, T.S.; Cestari, V.R.F.; Florêncio, R.S.; Moreira, T.M.M.; Pereira, M.L.D. Mortality and survival of COVID-19. Epidemiol. Infect. 2020, 148, e123.
  10. Guan, W.-J.; Ni, Z.-Y.; Hu, Y.; Liang, W.-H.; Ou, C.-Q.; He, J.-X.; Liu, L.; Shan, H.; Lei, C.-L.; Hui, D.S.C. Clinical Characteristics of Coronavirus Disease 2019 in China. N. Engl. J. Med. 2020, 382, 1708–1720.
  11. Wang, X.; Zhou, Q.; He, Y.; Liu, L.; Ma, X.; Wei, X.; Jiang, N.; Liang, L.; Zheng, Y.; Ma, L.; et al. Nosocomial outbreak of COVID-19 pneumonia in Wuhan, China. Eur. Respir. J. 2020, 55, 2000544.
  12. Donnelly, C.A.; Ghani, A.C.; Leung, G.M.; Hedley, A.J.; Fraser, C.; Riley, S.; Abu-Raddad, L.J.; Ho, L.; Thach, T.; Chau, P.; et al. Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong. Lancet 2003, 361, 1761–1766.
  13. Anderson, R.M.; Heesterbeek, H.; Klinkenberg, D.; Hollingsworth, T.D. How will country-based mitigation measures influence the course of the COVID-19 epidemic? Lancet 2020, 395, 931–934.
  14. Lau, H.; Khosrawipour, V.; Kocbach, P.; Mikolajczyk, A.; Ichii, H.; Schubert, J.; Bania, J.; Khosrawipour, T. Internationally lost COVID-19 cases. J. Microbiol. Immunol. Infect. 2020, 53, 454–458.
  15. Liu, Z.; Magal, P.; Seydi, O.; Webb, G. A COVID-19 epidemic model with latency period. Infect. Dis. Model. 2020, 5, 323–337.
  16. Cakmakli, C.; Simsek, Y. Bridging the COVID-19 Data and the Epidemiological Model using Time Varying Parameter SIRD Model. arXiv 2020, arXiv:2007.02726.
  17. Verity, R.; Okell, L.C.; Dorigatti, I.; Winskill, P.; Whittaker, C.; Imai, N.; Cuomo-Dannenburg, G.; Thompson, H.; Walker, P.G.T.; Fu, H.; et al. Estimates of the severity of coronavirus disease 2019: A model-based analysis. Lancet Infect. Dis. 2020, 20, 669–677.
  18. Prado, M.F.; Antunes, B.B.; Bastos, L.D.S.L.; Peres, I.T.; da Silva, A.d.B.; Dantas, L.F.; Baião, F.A.; Maçaira, P.; Hamacher, S.; Bozza, F.A. Analysis of COVID-19 under-reporting in Brazil. Rev. Bras. Ter. Intensiva 2020, 32, 224–228.
  19. Srivastava, A.; Prasanna, V. Data-driven Identification of Number of Unreported Cases for COVID-19: Bounds and Limitations. arXiv 2020, arXiv:2006.02127.
  20. Chow, C.; California, J.; Gerkin, R.; Vattikuti, S. Global prediction of unreported SARS-CoV2 infection from observed COVID-19 cases. medRxiv 2020.
  21. Zhou, Y.; Wang, L.; Zhang, L.; Shi, L.; Yang, K.; He, J.; Zhao, B.; Overton, W.; Purkayastha, S.; Song, P. A Spatiotemporal Epidemiological Prediction Model to Inform County-Level COVID-19 Risk in the United States. Harv. Data Sci. Rev. 2020.
  22. Li, Q.; Guan, X.; Wu, P.; Wang, X.; Zhou, L.; Tong, Y.; Ren, R.; Leung, K.S.M.; Lau, E.H.Y.; Wong, J.Y. Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. N. Engl. J. Med. 2020.
  23. Wu, Z.; McGoogan, J.M. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72314 Cases From the Chinese Center for Disease Control and Prevention. JAMA 2020, 323, 1239–1242.
  24. Baud, D.; Qi, X.; Nielsen-Saines, K.; Musso, D.; Pomar, L.; Favre, G. Real estimates of mortality following COVID-19 infection. Lancet Infect. Dis. 2020, 20, 773.
  25. Dong, E.; Du, H.; Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 2020, 20, 533–534.
  26. Meyerowitz-Katz, G.; Merone, L. A systematic review and meta-analysis of published research data on COVID-19 infection-fatality rates. medRxiv 2020.
  27. Jiarui, L.H.; Timothy, S. Estimating the Fraction of Unreported Infections in Epidemics with a Known Epicenter: An Application to COVID-19. SSRN; Becker Friedman Institute for Economics Working Paper No. 2020-37; University of Chicago: Chicago, IL, USA, 2020.
  28. Wu, S.L.; Mertens, A.N.; Crider, Y.S.; Nguyen, A.; Pokpongkiat, N.N.; Djajadi, S.; Seth, A.; Hsiang, M.S.; Colford, J.M.; Reingold, A.; et al. Substantial underestimation of SARS-CoV-2 infection in the United States. Nat. Commun. 2020, 11, 4507.
  29. Sannigrahi, S.; Pilla, F.; Basu, B.; Basu, A.S.; Molter, A. Examining the association between socio-demographic composition and COVID-19 fatalities in the European region using spatial regression approach. Sustain. Cities Soc. 2020, 62, 102418.
  30. Tadano, Y.D.; Potgieter-Vermaak, S.; Kachba, Y.R.; Chiroli, D.M.D.; Godoi, R.H.M. Dynamic model to predict the association between air quality, COVID-19 cases, and level of lockdown. Environ. Pollut. 2020, 268, 115920.
Figure 1. Susceptible–Infected–Recovered (SIR) model with unreported infection cases. The three large dashed boxes represent the standard compartments of the SIR model, in which the infection data are divided into two parts: reported and unreported. The solid green boxes represent the officially released daily data on new infections, cumulative infections, and deaths, which might not reflect the actual extent of COVID-19 infection. Three new parameters were introduced to address this type of data suppression problem: φ’ is the unreported infection rate (UIR) of newly reported infections, φ is the UIR of cumulative reported infections, and λ represents the recovery/mortality rate of reported deaths (RDR).
Figure 2. State-level fitness of Equation (8) with county-level data. The scaled density curve of adjusted R² shows that Equation (8) was generally applicable, and its mapping indicates that potential spatial heterogeneity among the states would affect the results of the SIRu modeling. States in the Southeast, on the West Coast, and in the Great Lakes region showed a better fit.
Figure 3. State-level parameters of Equation (9) with county-level data. (a) Transmission rate. Three obvious clusters could be identified: Nevada–Arizona, Illinois, and Massachusetts–New Jersey; the coefficient for New York could not be used because its p-value was not significant. (b) Unreported infection rate. The UIR in the Northeast was relatively high, but two central states also had high values. (c) Recovery/death rate. Blank blocks indicate that the RDR in the area was not applicable because its p-value was not significant; the RDR of the northeastern cities was relatively high, while the West Coast states had both a high TR and a high RDR. (d) Correlation test. The Pearson correlation test of all states’ parameters with significant p-values showed an obvious connection between RDR and both TR and UIR. Note: *: significant at 0.05 level; ***: significant at 0.001 level.
Figure 4. Time-varying TR estimated by Equation (16). Although the initial values of the power function were much higher than those of the exponential function, in the medium term the two tended to converge, while the periodic function indicated that the epidemic was in its third wave.
Figure 5. Time-varying RDR with 95% CI estimated by Equation (17). When the time effect of UIR was not considered, the fitting results showed that RDR decreased over time, which meant that IFR might be slowly increasing.
Figure 6. Time-varying UIR and RDR with 95% CI estimated by Equation (18). Equation (18) only provided the estimated values of UIR and RDR. Both the power function and the exponential function implied an increasing trend; the power function gave much smaller UIR estimates than the exponential function.
Table 1. Linear SIR Regression estimated by Equation (8).

| | Estimate | Std. Error | t Value | p-Value | Significance |
|---|---|---|---|---|---|
| Intercept | 0.9445 | 0.0617 | 15.29 | <0.001 | *** |
| a | 0.0283 | 0.0001 | 421.44 | <0.001 | *** |
| b | −0.1853 | 0.0011 | −161.50 | <0.001 | *** |
| c | −0.5392 | 0.0023 | −227.86 | <0.001 | *** |
| d | 4.6718 | 0.0241 | 193.40 | <0.001 | *** |
| Adjusted R² | 0.4813 | | | <0.001 | *** |
| AIC | 7,059,288 | | | | |

Note: AIC: model fitness based on Akaike information criterion; ***: significant at 0.001 level.
Table 2. SIR Regression estimated by Equation (9).

| | Estimate | Std. Error | t Value | p-Value |
|---|---|---|---|---|
| β0 | 0.0339 | 0.0001 | 604.4 | <0.001 |
| φ0 | 19.4603 | 0.0415 | 468.7 | <0.001 |
| λ0 | 192.5163 | 0.3707 | 519.3 | <0.001 |
| AIC | 7,080,522 | | | |

Note: β0: the average transmission rate; φ0: the average unreported infection rate; λ0: the average recovery/mortality rate of reported deaths; AIC: Akaike information criterion.
Table 3. Time-varying TR estimated by Equation (16).

| | Estimate ($g_t = m^t$) | p-Value | Estimate ($g_t = t^m$) | p-Value | Estimate ($g_t = \frac{1}{2}\left(1 + \cos\frac{t}{m}\pi\right)$) | p-Value |
|---|---|---|---|---|---|---|
| β0 | 0.2498 | <0.001 | 227.5862 | <0.001 | 0.0525 | <0.001 |
| φ | 18.5069 | <0.001 | 18.6100 | <0.001 | 19.7915 | <0.001 |
| λ | 181.9526 | <0.001 | 183.3437 | <0.001 | 196.0005 | <0.001 |
| m | 0.9883 | <0.001 | −1.7229 | <0.001 | 43.5300 | <0.001 |
| AIC | 6,982,233 | | 6,962,783 | | 7,076,624 | |

Note: β0: the initial constant in the function of time-varying transmission rate; φ: the unreported infection rate; λ: the recovery/mortality rate of reported deaths; m: the estimated constant in power/exponential function of the time variable; AIC: model fitness based on Akaike information criterion.
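
As a rough check on Table 3, the three candidate forms can be evaluated at the first and last days of the study window. The sketch below assumes that Equation (16) scales the initial constant by the time function, β(t) = β0·g(t); that form is inferred from the table note rather than quoted from the equation, and the function and variable names are illustrative only.

```python
import math

# Point estimates copied from Table 3: (beta0, m) for each candidate g(t).
TABLE3 = {
    "exponential": (0.2498, 0.9883),   # g(t) = m**t
    "power": (227.5862, -1.7229),      # g(t) = t**m
    "periodic": (0.0525, 43.5300),     # g(t) = 0.5 * (1 + cos(t * pi / m))
}


def g(form: str, t: float, m: float) -> float:
    """Time function g(t) for the named functional form."""
    if form == "exponential":
        return m ** t
    if form == "power":
        return t ** m
    if form == "periodic":
        return 0.5 * (1.0 + math.cos(t / m * math.pi))
    raise ValueError(f"unknown form: {form}")


def transmission_rate(form: str, t: float) -> float:
    """Assumed form beta(t) = beta0 * g(t); not quoted from Equation (16)."""
    beta0, m = TABLE3[form]
    return beta0 * g(form, t, m)


if __name__ == "__main__":
    for form in TABLE3:
        first = transmission_rate(form, 1)
        last = transmission_rate(form, 212)
        print(f"{form:12s} T1 = {first:.4f}  T212 = {last:.4f}")
```

Under this assumption the power function starts at β0 (since 1 raised to any power is 1) and falls to roughly 0.022 by T212, in line with the decline reported in the abstract.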
Table 4. Time-varying TR and RDR estimated by Equation (17).

| | Estimate ($g_t = m^t$, $h_t = k^t$) | p-Value | Estimate ($g_t = t^m$, $h_t = t^k$) | p-Value |
|---|---|---|---|---|
| β0 | 0.25239807 | <0.001 | 241.912633 | <0.001 |
| φ | 18.66057736 | <0.001 | 19.024919 | <0.001 |
| λ0 | 196.63534702 | <0.001 | 284.386081 | <0.001 |
| m | 0.98828594 | <0.001 | −1.734960 | <0.001 |
| k | 0.99949887 | <0.001 | −0.085439 | <0.001 |
| AIC | 6,980,073 | | 6,959,144 | |

Note: β0: the initial constant in the function of time-varying transmission rate; φ: the average unreported infection rate; λ0: the initial constant in the function of time-varying recovery/mortality rate; m,k: the estimated constant in power/exponential functions of the time variable; AIC: model fitness based on Akaike information criterion.
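
To see what the k estimates in Table 4 imply for the direction of RDR, the following sketch computes the relative change in RDR between T1 and T212 for both functional forms. It assumes λ(t) = λ0·h(t), which is inferred from the table note and not quoted from Equation (17); the helper names are hypothetical.

```python
# Point estimates copied from Table 4: (lambda0, k) for each form of h(t).
TABLE4_RDR = {
    "exponential": (196.63534702, 0.99949887),  # h(t) = k**t
    "power": (284.386081, -0.085439),           # h(t) = t**k
}


def rdr(form: str, t: float) -> float:
    """Assumed form lambda(t) = lambda0 * h(t); not quoted from Equation (17)."""
    lambda0, k = TABLE4_RDR[form]
    h = k ** t if form == "exponential" else t ** k
    return lambda0 * h


def relative_change(form: str, t_start: float = 1, t_end: float = 212) -> float:
    """Fractional change in RDR between two days (negative means a decline)."""
    return rdr(form, t_end) / rdr(form, t_start) - 1.0


if __name__ == "__main__":
    for form in TABLE4_RDR:
        print(f"{form:12s} change T1 -> T212: {relative_change(form):+.1%}")
```

Both forms give a negative change over the study window (roughly 10% for the exponential form and 37% for the power form under this assumption), which agrees with the decreasing RDR described for Figure 5.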
Table 5. Time-varying UIR and RDR estimated by Equation (18).

| | Estimate (E.1: $g_t = m^t$, $f_t = n^t$, $h_t = k^t$) | p-Value | Estimate (E.2: $g_t = t^m$, $f_t = t^n$, $h_t = t^k$) | p-Value |
|---|---|---|---|---|
| β0 | 0.2507 | <0.001 | 40.1660 | <0.001 |
| φ0 | 15.2287 | <0.001 | 0.0109 | <0.001 |
| λ0 | 143.4179 | <0.001 | 0.0001 | <0.001 |
| m | 0.9838 | <0.001 | −2.6890 | <0.001 |
| n | 1.0018 | <0.001 | 1.2555 | <0.001 |
| k | 1.0013 | <0.001 | 2.6687 | <0.001 |
| AIC | 6,969,888 | | 6,813,832 | |

Note: β0: the initial constant in the time-varying function of the transmission rate and the unreported rate of new reported infections; φ0: the initial constant in the time-varying function of the unreported rate of cumulative reported infections; λ0: the initial constant in the function of time-varying recovery/mortality rate; m,n,k: the estimated constants in power/exponential functions of the time variable; AIC: model fitness based on Akaike information criterion.
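
The Table 5 estimates can be read in the same way. The sketch below evaluates the UIR at T212 for models E.1 and E.2 and picks the model with the lower AIC. It assumes φ(t) = φ0·f(t), which is inferred from the table note and not quoted from Equation (18); the dictionary layout and function names are illustrative.

```python
# Point estimates and AIC values copied from Table 5.
TABLE5 = {
    "E.1 exponential": {"phi0": 15.2287, "n": 1.0018, "aic": 6_969_888},
    "E.2 power": {"phi0": 0.0109, "n": 1.2555, "aic": 6_813_832},
}


def uir(model: str, t: float) -> float:
    """Assumed form phi(t) = phi0 * f(t); not quoted from Equation (18)."""
    p = TABLE5[model]
    # E.1 uses f(t) = n**t, E.2 uses f(t) = t**n.
    f = p["n"] ** t if model.startswith("E.1") else t ** p["n"]
    return p["phi0"] * f


def best_by_aic() -> str:
    """Name of the model with the smallest AIC (lower means better fit)."""
    return min(TABLE5, key=lambda name: TABLE5[name]["aic"])


if __name__ == "__main__":
    for model in TABLE5:
        print(f"{model:16s} UIR at T212 = {uir(model, 212):.2f}")
    print("lowest AIC:", best_by_aic())
```

With the rounded estimates in Table 5, the power form gives a UIR of about 9.1 at T212, matching the value quoted in the abstract, and it also has the lower AIC. The exponential form yields a larger UIR at T212, consistent with the comparison described for Figure 6.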
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Peng, Z.; Ao, S.; Liu, L.; Bao, S.; Hu, T.; Wu, H.; Wang, R. Estimating Unreported COVID-19 Cases with a Time-Varying SIR Regression Model. Int. J. Environ. Res. Public Health 2021, 18, 1090. https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph18031090
