Article

Genomic Signatures of SARS-CoV-2 Associated with Patient Mortality

1
Department of Tropical Medicine, Vector-Borne and Infectious Disease Research Center, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA 70112, USA
2
Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
3
Department of Pathology, School of Medicine, Tulane University, New Orleans, LA 70112, USA
*
Author to whom correspondence should be addressed.
Academic Editors: Luis Martinez-Sobrido and Fernando Almazan Toral
Received: 1 December 2020 / Revised: 12 January 2021 / Accepted: 27 January 2021 / Published: 2 February 2021
(This article belongs to the Collection Coronaviruses)

Abstract

Infections with SARS-CoV-2 can progress toward multiple clinical outcomes, and the identification of factors associated with disease severity would represent a major advance to guide care and improve prognosis. We tested for associations between SARS-CoV-2 genomic variants from an international cohort of 2508 patients and mortality rates. Findings were validated in a second cohort. Phylogenetic analysis of SARS-CoV-2 genome sequences revealed four well-resolved clades which had significantly different mortality rates, even after adjusting for patient demographic and geographic characteristics. We further identified ten single-nucleotide polymorphisms (SNPs) in the SARS-CoV-2 genome that were associated with patient mortality. Three SNPs remained associated with mortality in a generalized linear model (GLM) that also included patient age, sex, geographic region, and month of sample collection. Multiple SNPs were confirmed in the validation cohort. These SNPs represent targets to assess the mechanisms underlying COVID-19 disease severity and warrant straightforward validation in functional studies.
Keywords: COVID-19; coronavirus; pathogenesis; SNP; genome

1. Background

In December 2019, an outbreak caused by a novel coronavirus, referred to as SARS-CoV-2, quickly progressed into one of the worst pandemics, causing an unprecedented international health and economic crisis. As of October 2020, there were over 45 million confirmed cases worldwide, with over 1.2 million deaths (corresponding to a global 2.7% mortality rate) according to the Johns Hopkins Coronavirus Resource Center [1]. However, infections with SARS-CoV-2 can progress toward highly variable clinical outcomes, ranging from asymptomatic infections to very severe pulmonary disease and death. The identification of host and viral factors associated with disease severity would represent a major advance to guide medical care and improve patient prognosis [2]. It could also allow the identification of precise therapeutic targets such as host pathways or specific viral proteins involved in pathogenesis.
Mortality rates can vary greatly according to geography, population density, demographics, the timing and extent of community mitigation measures, testing availability, healthcare infrastructure, and public health reporting practices, among others [3]. Nonetheless, epidemiologic studies have identified some host factors associated with higher mortality, including increased age, male sex, and several chronic comorbidities such as obesity, diabetes, or coronary artery disease [4,5]. An exacerbated immune response in patients, together with an impaired type I interferon response, has also been proposed as a critical contributor to disease severity [6,7].
Viral factors underlying COVID-19 disease severity and mortality are, on the other hand, unclear [6]. Comparison of genome sequences among SARS-CoV-2 and other members of the Coronaviridae family infecting humans, including SARS and MERS, indicated that differences in the nucleocapsid (N) and spike (S) viral proteins may be associated with the increased mortality rate caused by these coronaviruses as well as with the host switch from animals to humans [8]. Among SARS-CoV-2, a mutation hot spot was first identified in the virus RNA-dependent RNA polymerase (RdRp), but mutations have also been detected elsewhere throughout the viral genome [9,10], although their relevance for pathogenesis is unknown. The G614D variant in the S glycoprotein has been associated with increased transmissibility, infectivity and viral loads, but not with disease severity [11,12]. Similarly, a G/T variant in the Open reading frame 1ab (Orf1ab) gene has been associated with symptomatic and asymptomatic infections, respectively, in a small cohort of 152 patients [13], and some viral clades from Chicago have been associated with differences in viral loads [14]. Another study found correlations between mortality rates at the country level and the frequencies of G614D and P4715L variants in the S and Orf1ab proteins, respectively [15]. In this study, we aimed to identify genomic signatures of SARS-CoV-2 virus that may be associated with mortality of infected patients. Therefore, we tested for potential associations among viral genomic variants from a large international cohort of 2508 patients and their clinical and demographic characteristics.

2. Methods

2.1. Viral Sequence Data and Associated Patient Metadata

A dataset of 3205 whole-genome sequences from SARS-CoV-2 virus was selected from the Global Initiative on Sharing All Influenza Data (GISAID) database (https://www.gisaid.org), based on the availability of links with patient metadata, including disease severity and demographics (Supplementary Table S1). These sequences corresponded to virus isolates from multiple regions, including Asia, Africa, Europe, Oceania, and America, collected between December 2019 and 26 June 2020, from over 402 laboratories.
The cohort included both inpatients and outpatients, and most sequences were derived from oro- or naso-pharyngeal swabs (1691 sequences), secretions/sputum (105 sequences), broncho-alveolar fluid (42 sequences), and other or unspecified samples. No information on patient ethnicity was available, nor on potential co-morbidities. Because many of the clinical terms used to describe disease status were ambiguous/vague and to avoid any bias in categorizing disease severity, we only focused on mortality versus survival (case fatality rate). We considered patients described as “Deceased” as dead, while those described as “Alive, Asymptomatic, Cured, Discharged, Discharged after recovery, Facility quarantine, Fever, Home, Hospitalized, Intensive Care Unit, Severe, In hospital, Inpatient, Isolation, Live, Mild, Moderate, Outpatient, Not Hospitalized, Quarantine(d), Recovered, Released, Symptomatic” were considered as survivors. After deleting incomplete/low-quality sequences, and samples with inconsistent metadata, a final dataset of 2508 sequences with associated patient mortality, collection date, geographic region, patient age, and sex was used for analysis.
We also obtained a second independent dataset from GISAID to validate our findings. It consisted of 1488 new viral sequences deposited between 27 June and 23 July 2020 (Supplementary Table S2). These included virus sequences from multiple regions, as above, collected between 1 January and 23 July 2020. After deleting incomplete/low-quality sequences and samples with inconsistent/missing metadata, a validation dataset of 992 sequences with associated patient mortality, collection date, geographic region, patient age, and sex was used for analysis.

2.2. Data Analysis

Viral genome sequences from our first dataset were aligned using MAFFT [16] as implemented in Geneious 11, and a phylogenetic tree was built using FastTree, which infers approximately maximum-likelihood phylogenetic trees [17]. Major clades were compared to GISAID clades for easier reference. We tested for potential association of viral clades with mortality rates using χ² tests. Mortality rates were also compared according to demographic, geographic and temporal data to assess potential confounders. We used χ² tests to assess differences in mortality rates according to sex, month of sample collection, and geographic region. We also compared the age of deceased and surviving patients using a t-test, and compared patient age among viral clades using Tukey's post hoc test. No specific analyses could be performed associating mortality with ethnicity or preexisting comorbidities, as these were not reported in these datasets. Next, we used Generalized Linear Models (GLM) to test for association of mortality rates with viral clades while adjusting for demographic, geographic and temporal parameters, assuming a binomial distribution and a logit link function with Firth bias-adjusted estimates. Sequences from Oceania were excluded from this analysis because the small number of sequences from this region (15 sequences) caused model instability.
To identify genomic variants, Single-Nucleotide Polymorphisms (SNPs) were called from the SARS-CoV-2 genome alignment with the Geneious 11 SNP/variant tool, and tested individually for association with mortality, using χ² tests and odds ratios (OR). Statistical significance was adjusted using Bonferroni correction to account for multiple testing. SNP positions in the genome were determined using a sequence from Wuhan, China, from December 30, 2019 (Genbank # MT291827) as reference. We also assessed the phenotypic effect of each SNP in the corresponding viral protein. Finally, we analyzed combinations of SNPs found significantly associated with mortality in the bivariate analysis and further adjusted for demographic and geographic covariates in a multivariate model based on a GLM as described above. Again, sequences from Oceania were excluded from this analysis due to insufficient sample size from this region. We built several models with different combinations of SNPs and covariates, and models were compared based on the corrected Akaike information criterion (AICc) to select the best model. All statistical analyses were performed in JMP 9.0.
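As an illustration of the χ² comparisons described above, the following is a minimal Python sketch, not the JMP procedure used in the study. The function names and the example counts are hypothetical and chosen only for illustration. The closed-form p-value shown applies to tests with 1 degree of freedom only; with the statistic reported for the comparison of mortality by sex in the Results (2.37, d.f. = 1), it returns a value that rounds to the reported p = 0.12.

```python
import math

def pearson_chi2(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical (deceased, survived) counts for four groups; NOT the study data.
example = [[10, 490], [30, 470], [55, 445], [35, 565]]
stat, dof = pearson_chi2(example)
print(f"chi2 = {stat:.2f}, d.f. = {dof}")
print(f"p (1 d.f., chi2 = 2.37) = {p_value_1df(2.37):.2f}")
```

For tests with more than 1 degree of freedom, the p-value requires the general χ² survival function, which a statistics library would normally supply.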
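The odds ratio for a single SNP can be sketched from a 2 × 2 table of variant status against outcome. The text does not state how the 95% confidence intervals were computed, so the Woolf (log-based) interval below is an assumption, and the counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2 x 2 table.

    a: deceased with variant,    b: survived with variant
    c: deceased with reference,  d: survived with reference
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Woolf interval is undefined when a cell count is zero")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only; NOT the study data.
or_, lo, hi = odds_ratio_ci(a=12, b=300, c=40, d=400)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An OR below 1 with an interval excluding 1 would indicate lower odds of death among carriers of the variant, and an OR above 1 the opposite.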

2.3. Analysis of Validation Dataset

SARS-CoV-2 genome sequences were aligned as described above, and nine of the SNPs associated with mortality in the first cohort of patients were identified in this second cohort and their nucleotide variants scored. SNP association with mortality was tested by χ² tests, and statistical significance was adjusted using Bonferroni correction to account for multiple testing, as above. Finally, we also analyzed combinations of SNPs found significantly associated with mortality in the bivariate analysis and adjusted for demographic and geographic covariates in a multivariate model based on a GLM.

3. Results

Phylogenetic analysis of SARS-CoV-2 genome sequences from our cohort revealed four well-resolved clades (Figure 1). The overall mortality rate in this cohort was 5.74% (144/2508). However, patient mortality varied significantly among the identified clades (χ² = 47.93, d.f. = 3, p < 0.0001), ranging from 2.06% [95% CI 1.28–3.32] for Clade 1 up to 11.61% [95% CI 8.97–14.91] for Clade 3, with Clades 2 and 4 having intermediate mortality rates (6.03% [95% CI 4.35–8.31] and 5.84% [95% CI 4.34–7.83], respectively). These results suggested an association of viral clades with mortality.
However, deceased patients were also older (66.8 ± 1.31 vs. 48.2 ± 0.4 years for survivors, t-test, p < 0.0001), and mortality rates varied significantly according to geographic region (χ² = 61.26, d.f. = 4, p < 0.0001) and time of year (χ² = 74.35, d.f. = 6, p < 0.0001), and tended to be higher in males (χ² = 2.37, d.f. = 1, p = 0.12) (Supplementary Table S3). Therefore, we adjusted for these variables in a Generalized Linear Model (GLM), which confirmed that patient mortality was significantly associated with SARS-CoV-2 clades (effect test p < 0.0001), together with patient age (p < 0.0001), geographic region (p < 0.0001), time of year (p < 0.0001), and sex (p = 0.004) (Supplementary Table S4). The sex ratio of infections was similarly biased toward males for all clades (χ² = 4.02, d.f. = 3, p = 0.25, Figure 2A). Patient age distributions were similar among the clades, although the mean age was significantly lower for Clade 1 (p < 0.01, Figure 2B). There was a significant difference in the proportion of the respective clades among geographic regions (χ² = 522.65, d.f. = 12, p < 0.0001): Clade 1, associated with a lower mortality rate, was predominant in Asia, and Clade 4 was predominant in North America (Figure 2C). The proportion of each clade also varied over time (χ² = 821.54, d.f. = 18, p < 0.0001) (Figure 2D). Initial infections were caused by viruses from Clade 1, which started to be replaced by the other clades in February 2020 and became nearly absent from this cohort by June 2020 (1.45% of all sequences). Clade 3, associated with a high mortality rate, initially increased in proportion in March and April 2020 but has decreased since then, and Clades 2 and 4 were the most frequent clades in the cohort as of June 2020 (Figure 2D). However, there were regional variations in the changes in clade proportions over time, and the only constant observation was the progressive replacement of Clade 1 (Figure 3).
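The overall mortality rate above can be recomputed from the reported counts, together with a confidence interval. The method used for the 95% CIs is not stated in the text, so the Wilson score interval in this Python sketch is an assumption; only the counts 144 and 2508 come from the article.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (events out of n)."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

deaths, patients = 144, 2508  # overall counts reported in the text
rate = deaths / patients
low, high = wilson_interval(deaths, patients)
print(f"mortality = {100 * rate:.2f}% (95% CI {100 * low:.2f}-{100 * high:.2f})")
```

The same function could be applied to each clade once the per-clade numbers of deceased and surviving patients are known. Those counts are not given in the text.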
We then focused on identifying specific sequence variants underlying these differences in mortality rates among SARS-CoV-2 clades. Viral genome sequences were analyzed for Single-Nucleotide Polymorphisms (SNPs), and we identified a total of 27 positions with SNPs ranging in frequency from 2.4 to 68.1% (Table 1). Twenty-one SNPs were transitions (mostly C/T), five were transversions, and one was a combination of both. Ten of these SNPs (37%) were significantly associated with mortality rates (after Bonferroni correction), with three SNPs associated with a decreased mortality and six with an increased mortality. Four SNPs were located in non-structural proteins (nsps) from the Orf1ab gene, one in the S gene, one in Orf8, three in the N gene, and one in the 3′ untranslated region (UTR) (Table 1). The three SNPs from the N gene covered two consecutive codons with a change from AGGGG to AAACGA. All SNPs significantly associated with mortality rate caused changes in the amino acid sequence of the respective proteins, except C/T 2983 and C/T 8728 in Orf1ab, which were silent (Table 1).
We then built new GLMs to test the association of SNP combinations with patient mortality, again adjusting for patient age, sex, geographic region and month of the year, and models were compared based on the Akaike information criterion. The best model included three SNPs that were significantly associated with patient mortality: C/T 2983 and T/C 14,353 in Orf1ab, and the 28,827–28,829 codons of the N gene (Table 2). These results indicate that Orf1ab and N are key viral proteins whose variants are associated with patient mortality, with Orf8 and the S glycoprotein associated to a lesser extent.
To validate these results, we examined SARS-CoV-2 genomes from a second independent cohort of 992 patients (Supplementary Table S2). We tested nine SNPs identified in Table 1, and seven of these were found to be significantly associated with patient mortality (Table 3), mostly validating our initial results. The best GLM testing SNP combinations included SNPs A/G 23349, C/T 14353 and CA/GG 28827–28828, as well as geographic region, month of the year and patient age, but not sex (Supplementary Table S5), providing further support that SNPs in the Orf1ab, S and N genes are associated with patient mortality.
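The classification of SNPs into transitions and transversions can be checked directly from the SNP column of Table 1. The Python sketch below is illustrative: the helper name `classify_snp` is hypothetical, and the list simply transcribes the 27 SNPs in genome order. A transition exchanges two purines (A, G) or two pyrimidines (C, T); any other substitution is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_snp(snp):
    """Classify a SNP written as 'ref/alt' (or 'ref/alt1/alt2') by substitution type."""
    ref, *alts = snp.split("/")
    kinds = set()
    for alt in alts:
        same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
        kinds.add("transition" if same_class else "transversion")
    # A site with both a transition and a transversion allele is a combination.
    return kinds.pop() if len(kinds) == 1 else "combination"

# SNP column of Table 1, in genome order (27 positions).
table1_snps = [
    "C/T", "C/T", "C/T", "C/T", "C/A", "C/T", "G/T", "C/T", "T/C",
    "C/T", "C/T", "C/T", "A/G", "C/T", "A/G", "C/T", "G/T", "G/T",
    "C/T", "T/C", "C/T", "C/T", "G/A", "G/A", "G/A", "G/C", "G/A/T",
]
counts = {}
for snp in table1_snps:
    kind = classify_snp(snp)
    counts[kind] = counts.get(kind, 0) + 1
print(counts)
```

The resulting tally agrees with the 21 transitions, five transversions and one combination reported above.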

4. Discussion

We identified multiple SARS-CoV-2 genomic signatures in several viral genes that were associated with patient mortality in two independent cohorts. The functional significance of the variants identified here remains to be further investigated. Orf1ab encodes several nsps, including the RNA-dependent RNA polymerase (RdRp), and it was previously identified as a mutation hot spot, suggestive of potential selection pressure associated with adaptation to human hosts [9]. The frequency of the T/C 14,353 variant (P4714L) has been found to correlate with country mortality rates [15], but we found here that it was associated with a lower mortality. This variant falls within the RdRp and may affect viral replication. However, Orf1ab was also implicated in the pathogenesis of SARS-CoV-1 infections through processes distinct from viral replication that included cell signaling and the modulation of the immune response [18]. The proteins nsp2 and nsp3 have been proposed to play a role in COVID-19 pathogenesis [19], although SNP C/T 2983, associated with mortality and located in the nsp3 sequence, did not cause a change in amino acid. Thus, this SNP may have an unknown function in addition to coding for nsp3. Orf8 from SARS-CoV-2 can interfere with the type I interferon response in vitro [20], which has been found to be critical for mitigating disease severity [7]. The consequences of the R203K G204R substitutions in the N protein also warrant functional studies to assess their role in pathogenesis, as these are the most frequent variants in this protein [10]. Finally, it is interesting to note that the G614D substitution in the S protein was associated with an increased mortality rate in the bivariate analysis in both cohorts, and in the multivariate model of the validation cohort. Previous work showed that this substitution causes greater infectivity and higher viral loads, but its effect on disease severity and mortality in patients has been debated [11,12,15]. Our data provide evidence that this substitution can lead to increased mortality.
The changes in the proportion of the different Clades we identified over time indicate that further monitoring is necessary. While changes in the proportion of variants over time can be expected due to founder effect of a virus rapidly spreading into naïve host populations, the associations of several of these variants with patient mortality may help better anticipate the risk for severe disease.
A limitation of our study is that the viral genomes which are sequenced may not be a random sample of the global virus population. Thus, these cohorts could be biased as sequencing effort may vary among health institutions, countries, and over time. Sequencing may also be biased according to patient status, and contact tracing may result in samples being epidemiologically linked. These potential biases may affect the proportion of genome variants and SNPs. The lack of standardized reporting of patient clinical status may also be a limitation and some patients may have died at a later time after sequences were reported. Finally, some comorbidities are known to increase the risk of severe disease, but could not be taken into account as these are not reported in these datasets. Nonetheless, variations in co-morbidities were in part taken into account in the analysis of mortality rates as we adjusted for geographic, age and temporal variations.
In conclusion, we identified here several previously undetected possible determinants of mortality in the SARS-CoV-2 genome. The identified SNPs are potential critical targets to assess the mechanisms underlying COVID-19 disease severity and warrant straightforward experimental validation in functional studies, and further confirmation in additional cohorts.

Supplementary Materials

The following are available online at https://0-www-mdpi-com.brum.beds.ac.uk/1999-4915/13/2/227/s1, Table S1: List of SARS-CoV-2 genomes from the testing cohort, Table S2: List of SARS-CoV-2 genomes from the validation cohort, Table S3: Mortality rates among demographic and geographic parameters, Table S4: Parameter estimates of Generalized Linear Model (GLM) for patient mortality, Table S5: Parameter estimates of Generalized Linear Model (GLM) for patient mortality in the validation cohort.

Author Contributions

Conceptualization, E.D., D.F., A.D. and C.H.; data analysis, E.D. and C.H.; interpretation of data, E.D., D.F., A.D. and C.H.; writing—original draft preparation, E.D.; writing—review and editing, E.D., D.F., A.D. and C.H. All authors have read and agreed to the published version of the manuscript.

Funding

We acknowledge funding from Tulane University Startup Package and Tulane University Physician Scientist Pipeline Program to DF; Tulane University Pilot Program to AD, and from the Department of Tropical Medicine Startup package to ED.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in Supplementary Tables S1 and S2.

Acknowledgments

We acknowledge funding from the Tulane University Startup Package and Tulane University Physician Scientist Pipeline Program to D.F.; Tulane University Pilot Program to A.D., and from the Department of Tropical Medicine Startup package to E.D.

Conflicts of Interest

The authors declare no competing interests.

References

  1. Johns Hopkins University, Coronavirus Resource Center. Available online: https://0-coronavirus-jhu-edu.brum.beds.ac.uk/ (accessed on 28 October 2020).
  2. Koutsakos, M.; Kedzierska, K. A race to determine what drives COVID-19 severity. Nature 2020, 582, 366–368.
  3. CDC COVID-19 Response Team. Geographic Differences in COVID-19 Cases, Deaths, and Incidence-United States, February 12–April 7, 2020. MMWR Morb. Mortal. Wkly. Rep. 2020, 69, 465–471.
  4. Docherty, A.B.; Harrison, E.M.; Green, C.A.; Hardwick, H.E.; Pius, R.; Norman, L.; Holden, K.A.; Read, J.M.; Dondelinger, F.; Carson, G.; et al. Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: Prospective observational cohort study. BMJ 2020, 369, m1985.
  5. Gupta, S.; Hayek, S.S.; Wang, W.; Chan, L.; Mathews, K.S.; Melamed, M.L.; Brenner, S.K.; Leonberg-Yoo, A.; Schenck, E.J.; Radbel, J.; et al. Factors Associated With Death in Critically Ill Patients With Coronavirus Disease 2019 in the US. JAMA Intern. Med. 2020.
  6. Zhang, X.; Tan, Y.; Ling, Y.; Lu, G.; Liu, F.; Yi, Z.; Jia, X.; Wu, M.; Shi, B.; Xu, S.; et al. Viral and host factors related to the clinical outcome of COVID-19. Nature 2020.
  7. Hadjadj, J.; Yatim, N.; Barnabei, L.; Corneau, A.; Boussier, J.; Smith, N.; Pere, H.; Charbit, B.; Bondet, V.; Chenevier-Gobeaux, C.; et al. Impaired type I interferon activity and inflammatory responses in severe COVID-19 patients. Science 2020.
  8. Gussow, A.B.; Auslander, N.; Faure, G.; Wolf, Y.I.; Zhang, F.; Koonin, E.V. Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses. Proc. Natl. Acad. Sci. USA 2020, 117, 15193–15199.
  9. Pachetti, M.; Marini, B.; Benedetti, F.; Giudici, F.; Mauro, E.; Storici, P.; Masciovecchio, C.; Angeletti, S.; Ciccozzi, M.; Gallo, R.C.; et al. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. J. Transl. Med. 2020, 18, 179.
  10. Dumonteil, E.; Herrera, C. Polymorphism and selection pressure of SARS-CoV-2 vaccine and diagnostic antigens: Implications for immune evasion and serologic diagnostic performance. Pathogens 2020, 9, 584.
  11. Korber, B.; Fischer, W.M.; Gnanakaran, S.; Yoon, H.; Theiler, J.; Abfalterer, W.; Foley, B.; Giorgi, E.E.; Bhattacharya, T.; Parker, M.D.; et al. Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2. bioRxiv 2020.
  12. Ozono, S.; Zhang, Y.; Ode, H.; Seng Tan, T.; Imai, K.; Miyoshi, K.; Kishigami, S.; Ueno, T.; Iwatani, Y.; Suzuki, T.; et al. Naturally mutated spike proteins of SARS-CoV-2 variants show differential levels of cell entry. bioRxiv 2020.
  13. Aiewsakun, P.; Wongtrakoongate, P.; Thawornwattana, Y.; Hongeng, S.; Thitithanyanont, A. SARS-CoV-2 genetic variations associated with COVID-19 severity. medRxiv 2020.
  14. Lorenzo-Redondo, R.; Nam, H.H.; Roberts, S.C.; Simons, L.M.; Jennings, L.J.; Qi, C.; Achenbach, C.J.; Hauser, A.R.; Ison, M.G.; Hultquist, J.F.; et al. A Unique Clade of SARS-CoV-2 Viruses is Associated with Lower Viral Loads in Patient Upper Airways. medRxiv 2020.
  15. Toyoshima, Y.; Nemoto, K.; Matsumoto, S.; Nakamura, Y.; Kiyotani, K. SARS-CoV-2 genomic variations associated with mortality rate of COVID-19. J. Hum. Genet. 2020, 65, 1075–1082.
  16. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
  17. Price, M.N.; Dehal, P.S.; Arkin, A.P. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 2010, 5, e9490.
  18. Graham, R.L.; Sparks, J.S.; Eckerle, L.D.; Sims, A.C.; Denison, M.R. SARS coronavirus replicase proteins in pathogenesis. Virus Res. 2008, 133, 88–100.
  19. Angeletti, S.; Benvenuto, D.; Bianchi, M.; Giovanetti, M.; Pascarella, S.; Ciccozzi, M. COVID-2019: The role of the nsp2 and nsp3 in its pathogenesis. J. Med. Virol. 2020, 92, 584–588.
  20. Li, J.Y.; Liao, C.H.; Wang, Q.; Tan, Y.J.; Luo, R.; Qiu, Y.; Ge, X.Y. The ORF6, ORF8 and nucleocapsid proteins of SARS-CoV-2 inhibit type I interferon signaling pathway. Virus Res. 2020, 286, 198074.
Figure 1. Genomic diversity of SARS-CoV-2 and patient mortality. Phylogenetic analysis of SARS-CoV-2 genomes from 2508 patients. The unrooted tree showed four main clades (Clades 1–4) with strong phylogenetic support, as specified for each major branch, and mortality rates varied significantly among clades as indicated (χ² = 47.93, d.f. = 3, p < 0.0001). GISAID clade names are indicated in parentheses for comparison.
Figure 2. Patient sex ratio, geographic and temporal distribution of SARS-CoV-2 clades. (a) There were no significant variations in patient sex ratio among clades (χ² = 4.02, d.f. = 3, p = 0.26). (b) Patient age distribution among clades was similar, although patients from Clade 1 were significantly younger (Tukey post hoc test, p < 0.01). (c) Geographic regions presented significant differences in clade proportion (χ² = 522.65, d.f. = 12, p < 0.0001). (d) The proportion of clades varied significantly over time (χ² = 821.54, d.f. = 18, p < 0.0001).
Figure 3. Temporal variations in Clade proportion per geographic region. Changes in the proportion of the indicated clades over time are shown for Asia, Europe, Africa and America.
Table 1. SNPs of SARS-CoV-2 and their association with patient mortality.
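| SNP | Position | Reference | Variant | χ² | p Value * | OR | OR 95% CI | Protein | AA Change $ |
|---|---|---|---|---|---|---|---|---|---|
| C/T | 1005 | 88.10% | 11.90% | 0.257 | 0.6125 | 1.15 | 0.66–1.99 | nsp2 | T265I |
| C/T | 2362 | 97.60% | 2.40% | 2.602 | 0.1068 | 3.69 | 0.51–26.81 | nsp2 | No change |
| C/T | 2782 | 96.10% | 3.90% | 1.852 | 0.173 | 0.59 | 0.29–1.2 | nsp3 | No change |
| C/T | 2983 | 67.70% | 32.30% | 20.454 | <0.0001 * | 0.38 | 0.24–0.61 | nsp3 | No change |
| C/A | 6258 | 95.80% | 4.20% | 0.177 | 0.674 | 0.82 | 0.33–2.06 | nsp3 | T2016K |
| C/T | 8728 | 90.90% | 9.10% | 9.768 | 0.0018 * | 3.67 | 1.35–10.1 | nsp4 | No change |
| G/T | 11,029 | 87.60% | 12.40% | 11.947 | 0.0005 * | 3.36 | 1.47–7.69 | nsp6 | L3606F |
| C/T | 13,676 | 95.90% | 4.10% | 0.001 | 0.981 | 0.99 | 0.42–2.3 | nsp12 (RdRp) | A4489V |
| T/C | 14,353 | 67.40% | 32.60% | 40.812 | <0.0001 * | 0.22 | 0.13–0.39 | nsp12 (RdRp) | P4714L |
| C/T | 14,751 | 96.80% | 3.20% | 2.075 | 0.149 | 2.45 | 0.6–10.1 | nsp12 (RdRp) | No change |
| C/T | 15,270 | 94.10% | 5.90% | 3.381 | 0.066 | 0.56 | 0.31–0.99 | nsp12 (RdRp) | No change |
| C/T | 18,823 | 87.30% | 12.70% | 1.377 | 0.241 | 0.75 | 0.47–1.19 | nsp14 | No change |
| A/G | 20,214 | 93.10% | 6.90% | 0.112 | 0.738 | 1.12 | 0.56–2.24 | nsp15 | No change |
| C/T | 22,390 | 94.80% | 5.20% | 4.201 | 0.04 | 0.51 | 0.28–0.93 | S | No change |
| A/G | 23,349 | 31.90% | 68.10% | 38.338 | <0.0001 * | 4.23 | 2.46–7.27 | S | G614D |
| C/T | 23,875 | 96.00% | 4.00% | 0.109 | 0.741 | 1.16 | 0.47–2.91 | S | No change |
| G/T | 25,508 | 72.30% | 27.70% | 0.156 | 0.693 | 0.93 | 0.64–1.35 | Orf3a | Q57H |
| G/T | 26,090 | 95.90% | 4.10% | 6.572 | 0.0104 | 6.37 | 0.88–45.96 | Orf3a | G251V |
| C/T | 26,681 | 91.00% | 9.00% | 1.487 | 0.222 | 0.71 | 0.42–1.2 | M | No change |
| T/C | 28,090 | 90.20% | 9.80% | 11.194 | 0.0008 * | 0.25 | 0.09–0.69 | Orf8 | L84S |
| C/T | 28,257 | 95.70% | 4.30% | 0.199 | 0.656 | 1.22 | 0.49–3.05 | N | P13L |
| C/T | 28,800 | 94.70% | 5.30% | 2.587 | 0.108 | 0.58 | 0.31–1.08 | N | S194L |
| G/A | 28,823 | 97.70% | 2.30% | 6.812 | 0.0091 | 0 | - | N | S202N |
| G/A # | 28,827 | 82.00% | 18.00% | 31.579 | <0.0001 * | 2.93 | 2.05–4.19 | N | R203K G204R |
| G/A # | 28,828 | 82.00% | 18.00% | 32.172 | <0.0001 * | 2.96 | 2.08–4.24 | N | R203K G204R |
| G/C # | 28,829 | 82.00% | 18.00% | 31.444 | <0.0001 * | 2.92 | 2.05–4.18 | N | R203K G204R |
| G/A/T | 29,688 | 95.60% | 2.50% | 12.896 | 0.0016 * | - | - | 3′ UTR | - |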
* indicates a statistically significant association of the Single-Nucleotide Polymorphism (SNP) with mortality after Bonferroni correction (the adjusted threshold for significance was 0.00185). # These SNPs occur in the same sequences and affect two consecutive codons, resulting in two amino acid changes. $ Amino acid (AA) position within the respective proteins is indicated.
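The Bonferroni thresholds quoted in the table notes follow from dividing the significance level by the number of tests. The Python sketch below assumes a nominal level of 0.05, which the text does not state explicitly but which reproduces the reported thresholds. Entries reported as "<0.0001" are represented by their upper bound.

```python
ALPHA = 0.05  # assumed nominal significance level

def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# p values from Table 1 in genome order; "<0.0001" is entered as 0.0001.
table1_p = [
    0.6125, 0.1068, 0.173, 0.0001, 0.674, 0.0018, 0.0005, 0.981, 0.0001,
    0.149, 0.066, 0.241, 0.738, 0.04, 0.0001, 0.741, 0.693, 0.0104,
    0.222, 0.0008, 0.656, 0.108, 0.0091, 0.0001, 0.0001, 0.0001, 0.0016,
]
threshold = bonferroni_threshold(len(table1_p))
significant = [p for p in table1_p if p < threshold]
print(f"threshold = {threshold:.5f}, significant SNPs = {len(significant)}")
print(f"validation threshold (9 tests) = {bonferroni_threshold(9):.5f}")
```

With 27 tests the threshold is 0.05/27 ≈ 0.00185, and ten SNPs fall below it, as reported. With the nine SNPs tested in the validation cohort, 0.05/9 ≈ 0.00556, given as 0.0055 in the note to Table 3.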
Table 2. Parameter estimates of the Generalized Linear Model (GLM) for patient mortality.
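| Term | Estimate | Std Error | χ² | p Value | Lower CL | Upper CL |
|---|---|---|---|---|---|---|
| Intercept | −5.103 | 1.143 | 37.51 | <0.0001 * | −7.533 | −3.310 |
| Africa | −1.069 | 0.675 | 5.37 | 0.021 * | −2.836 | 0.030 |
| Asia | 0.198 | 0.229 | 3.90 | 0.048 * | −0.227 | 0.717 |
| Europe | −0.730 | 0.273 | 6.87 | 0.009 * | −1.264 | −0.148 |
| North America | 0.239 | 0.282 | 3.31 | 0.069 | −0.305 | 0.834 |
| Jan | −1.929 | 1.265 | 2.08 | 0.149 | −6.105 | −0.096 |
| Feb | −1.360 | 0.773 | 2.08 | 0.149 | −3.292 | 0.010 |
| Mar | −0.100 | 0.385 | 1.80 | 0.18 | −0.766 | 0.818 |
| Apr | 0.810 | 0.379 | 8.76 | 0.003 * | 0.161 | 1.721 |
| May | 0.317 | 0.454 | 2.16 | 0.142 | −0.541 | 1.320 |
| Jun | 1.161 | 0.439 | 10.10 | 0.002 * | 0.362 | 2.147 |
| Female sex | −0.216 | 0.100 | 9.26 | 0.002 * | −0.419 | −0.019 |
| Age | 0.052 | 0.006 | 108.63 | <0.0001 * | 0.041 | 0.064 |
| C2983 | 1.833 | 0.753 | 11.03 | 0.001 * | 0.690 | 3.585 |
| T2983 | −1.160 | 0.695 | 0.15 | 0.697 | −2.147 | 0.530 |
| C14353 | −2.349 | 0.690 | 0.36 | 0.549 | −3.514 | −0.583 |
| T14353 | 0.767 | 0.587 | 4.77 | 0.029 * | −0.107 | 2.434 |
| AAC28827 | 0.001 | 0.578 | 1.07 | 0.301 | −0.820 | 1.648 |
| GGG28827 | −0.824 | 0.573 | 0.41 | 0.523 | −1.625 | 0.819 |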
* Statistically significant p values. Std Error: standard error. CL: 95% confidence interval of estimates. The overall model had a χ² = 201.02, p < 0.0001, AICc = 779.7, with significant effect tests for geographic region (χ² = 54.27, d.f. = 4, p < 0.0001), time of year (χ² = 41.48, d.f. = 6, p < 0.0001), sex (χ² = 9.26, d.f. = 1, p = 0.002), age (χ² = 108.63, d.f. = 1, p < 0.0001), C/T2983 (χ² = 30.04, d.f. = 2, p < 0.0001), C/T14353 (χ² = 29.30, d.f. = 2, p < 0.0001), and CCA/GGG28827 (χ² = 14.96, d.f. = 2, p = 0.0006).
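Because the model uses a logit link, the estimate for a continuous covariate can be read as a change in log-odds per unit, and exponentiating it gives an odds ratio. The Python sketch below applies this to the Age row of Table 2 only; the function name is hypothetical. The conversion is deliberately not applied to the categorical terms (region, month, sex, SNP alleles), because the text does not state how their levels were coded.

```python
import math

# Age row of Table 2: estimate and 95% confidence limits on the logit scale,
# i.e., change in log-odds of death per additional year of age.
age_estimate, age_lower, age_upper = 0.052, 0.041, 0.064

def odds_ratio(log_odds, units=1.0):
    """Convert a logit-scale coefficient into an odds ratio for `units` of change."""
    return math.exp(log_odds * units)

print(f"OR per year:     {odds_ratio(age_estimate):.3f} "
      f"({odds_ratio(age_lower):.3f}-{odds_ratio(age_upper):.3f})")
print(f"OR per 10 years: {odds_ratio(age_estimate, 10):.2f}")
```

Under this reading, each additional year of age multiplies the odds of death by about 1.05, and a 10-year difference by about 1.7, with the other covariates held fixed.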
Table 3. SNPs of SARS-CoV-2 and their association with patient mortality in the validation cohort.
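| SNP | Position | Reference | Variant | χ² | p Value * | Protein |
|---|---|---|---|---|---|---|
| C/T | 2983 | 75.73% | 24.27% | 20.318 | <0.0001 * | nsp3 |
| C/T | 8728 | 89.59% | 10.41% | 8.250 | 0.0041 * | nsp4 |
| G/T/A | 11,029 | 90.07% | 9.93% | 7.633 | 0.022 | nsp6 |
| T/C | 14,353 | 73.71% | 26.29% | 10.845 | 0.0010 * | nsp12 (RdRp) |
| A/G | 23,349 | 23.91% | 76.09% | 20.529 | <0.0001 * | S |
| T/C | 28,090 | 89.53% | 10.47% | 8.303 | 0.0040 * | Orf8 |
| G/A # | 28,827 | 60.96% | 39.04% | 9.395 | 0.0022 * | N |
| G/A # | 28,828 | 60.99% | 39.01% | 9.374 | 0.0022 * | N |
| G/C/A # | 28,829 | 60.44% | 38.94% | 31.444 | 0.0094 | N |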
* indicates a statistically significant association of the SNP with mortality after Bonferroni correction (the adjusted threshold for significance was 0.0055). # These SNPs mostly occur in the same sequences, but not exclusively as in the first cohort.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.