A new pandemic disease, coronavirus disease 2019 (COVID-19), emerged in late 2019; it is caused by a novel coronavirus designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. SARS-CoV-2 is a positive-sense, single-stranded RNA virus with a genome of about 30 kb that encodes four structural proteins (the spike (S), envelope (E), matrix (M), and nucleocapsid (N) proteins), 8 accessory proteins, and 16 non-structural proteins [4]. The latter include the RNA-dependent RNA polymerase (RdRp) and nsP14, which has exonuclease activity and a proofreading function [5]. Because SARS-CoV-2 is highly transmissible and the human population lacked pre-existing immunity to this novel virus, its rapid spread poses a major threat to public health and the global economy.
The first case of COVID-19 in Sweden was reported on 31 January 2020 in a woman who had returned to Jönköping from Wuhan. Soon afterwards, several further introductions were reported, all of them travel-related cases from Italy and Iran. Community transmission of COVID-19 is thought to have started in late February, particularly in the Stockholm, Sörmland, Uppsala, Västra Götaland, Örebro, and Östergötland regions. During the early pandemic (up to 14 May), a total of 29,739 COVID-19 cases and 3834 deaths had been confirmed in Sweden [6].
To curb the spread of SARS-CoV-2, many countries adopted strict non-pharmacological interventions (NPIs) such as lockdowns, travel restrictions, and widespread business and school closures [7]. Sweden, however, adopted a distinctive strategy with less strict NPIs, relying instead on social recommendations intended both to slow the spread of the virus and to protect risk groups [8]. Although this strategy was intended to slow viral transmission, compared with stricter strategies it resulted in a higher probability of slow but continuous evolution of the virus. The high level of transmission gave us an opportunity to investigate the evolutionary profile of Swedish SARS-CoV-2 over time.
This descriptive study is based on Swedish SARS-CoV-2 sequences that are freely available from GISAID [9], which we compared with the Wuhan prototype of SARS-CoV-2 to determine how these strains had diverged. We traced the dynamic mutational profiles of SARS-CoV-2 in Sweden and estimated the time points at which community transmission of these mutational profiles likely started. These viral characteristics help us understand how SARS-CoV-2 spread under the current Swedish mandates against COVID-19 and which evolutionary traits it acquired within this time frame.
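The dating in this study relies on Bayesian phylogenetic inference, which is not reproduced here. A much simpler and plainly different technique, often used to check for temporal signal before such an analysis, is root-to-tip regression: each sequence's divergence from the root is regressed on its sampling date, so that the slope approximates the substitution rate and the x-intercept gives a crude date for the root. The Python sketch below assumes that divergences (substitutions per site) have already been read off a rooted tree; the function name and the toy values are hypothetical and are not data from this study.

```python
def root_to_tip_regression(dates, divergences):
    """Ordinary least-squares fit of root-to-tip divergence on sampling date.

    dates: sampling dates in decimal years.
    divergences: root-to-tip distances in substitutions per site.
    Returns (rate, root_date): the slope (substitutions/site/year) and the
    x-intercept, i.e. the date at which the fitted divergence is zero.
    """
    n = len(dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate
    return rate, root_date


# Hypothetical toy data lying exactly on a clock of 1e-3 substitutions/site/year.
dates = [2020.1, 2020.2, 2020.3, 2020.4]
divergences = [0.0002, 0.0003, 0.0004, 0.0005]
rate, root_date = root_to_tip_regression(dates, divergences)
print(f"rate = {rate:.2e} subs/site/year, root = {root_date:.2f}")
# prints: rate = 1.00e-03 subs/site/year, root = 2019.90
```

With real data the points scatter around the line, and a weak or negative slope suggests that the temporal signal is too weak for clock-based dating. The regression also ignores phylogenetic uncertainty, which a Bayesian approach accounts for, and its root date refers to the most recent common ancestor of the sampled sequences rather than directly to the start of community transmission.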
Continuous molecular tracing of SARS-CoV-2 is needed for effective surveillance and interventions. The Public Health Agency of Sweden (Folkhälsomyndigheten, FHM) has been monitoring the molecular traits of Swedish SARS-CoV-2 since the first cases in Sweden. Two FHM reports, not yet published in peer-reviewed journals but available on the agency's web page (www.folkhalsomyndigheten.se), indicate that the initial introductions of SARS-CoV-2 into Sweden originated from Italy and Austria [18]. Our findings are similar to those of the FHM reports: independent genotypes were circulating, which most likely originated from independent geographic locations. However, our study focuses more on the genetic variation among the Swedish SARS-CoV-2 sequences and on the evolutionary events that have occurred.
Most RNA virus populations exist as complex mixtures of genetic and phenotypic variants because of the high error rate of RNA polymerases [20]. The theoretical advantage of maintaining such a diverse viral population is that some variant may be better suited to a new environment when the virus spreads. In certain circumstances, some mutations can drive the emergence of new strains with altered pathogenicity. For instance, a mutation in the Zika virus membrane region (prM-S139N) emerged in a viral lineage preceding the devastating epidemic in the Americas [21], while a single mutation (GP-A82V) in the Ebola virus glycoprotein increased the infection rate of human cells [22]. Coronaviruses, however, encode the RdRp together with nsP14, whose exonuclease activity provides proofreading, and therefore mutate at a lower rate than most other RNA viruses [23]. Still, genetic drift is the main evolutionary mode for Swedish SARS-CoV-2, and the wide spread of SARS-CoV-2 has already resulted in different clades/lineages that differ from the original strain from Wuhan, where the first cases were found (Figure S1). Whether most of these variants affect the transmissibility or infectivity of SARS-CoV-2 remains largely unknown.
The ongoing pandemic may allow immunologically relevant mutations to accumulate in the SARS-CoV-2 genome [24]. Point mutations have been shown to confer resistance to neutralizing antibodies in MERS-CoV [25] and SARS-CoV [26], and antigenic drift has been demonstrated in other CoVs, including the common-cold coronaviruses OC43 and 229E, and SARS-CoV [27]. Our finding that D936Y in the S protein is under positive selection is consistent with antigenic drift also playing a role in SARS-CoV-2. The S protein of SARS-CoV-2 mediates viral entry into host cells through its receptor-binding domain (RBD), so mutations in the S protein may affect the development of pharmacological interventions and sensitive diagnostic methods. However, the functional consequence of D936Y is still unclear; one study based on mutant modelling and analysis suggested that it could weaken the post-fusion assembly of the S protein [31]. Although D936Y is rare worldwide, it has been observed at higher frequencies in the Nordic countries: 69% in Finland (178/258, i.e., sequences carrying the mutation/total number of SARS-CoV-2 sequences), 22% in Sweden (116/531), and 11% in Norway (9/83) (data from 3 August, http://covid19.datamonkey.org).
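The link between a genome coordinate and an amino-acid change, such as A23403G giving D614G, can be made explicit with a little codon arithmetic. The Python sketch below assumes the Wuhan-Hu-1 reference genome (GenBank NC_045512.2), in which the S gene starts at nucleotide 21563, and uses the standard genetic code. The helper names are hypothetical, the reference codon GAT (Asp) is supplied as an illustrative assumption, and the nucleotide notation G24368T for D936Y is derived here by the same arithmetic rather than taken from this study.

```python
S_GENE_START = 21563  # assumed: 1-based start of the S gene in NC_045512.2

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def locate_in_s(genome_pos):
    """Map a 1-based genome position to (S residue number, 0-based position in codon)."""
    offset = genome_pos - S_GENE_START
    if offset < 0:
        raise ValueError("position lies upstream of the S gene")
    return offset // 3 + 1, offset % 3


def annotate(nt_change, ref_codon):
    """Translate a nucleotide change such as 'A23403G' into an S-protein change.

    ref_codon is the reference codon spanning the position; the caller must
    supply it (e.g. after looking it up in the reference genome).
    """
    ref_base, pos, alt_base = nt_change[0], int(nt_change[1:-1]), nt_change[-1]
    residue, codon_pos = locate_in_s(pos)
    if ref_codon[codon_pos] != ref_base:
        raise ValueError("reference codon does not match the reference base")
    alt_codon = ref_codon[:codon_pos] + alt_base + ref_codon[codon_pos + 1:]
    return f"{CODON_TABLE[ref_codon]}{residue}{CODON_TABLE[alt_codon]}"


# GAT (Asp) is assumed as the reference codon at both S residues for illustration.
print(annotate("A23403G", "GAT"))  # D614G
print(annotate("G24368T", "GAT"))  # D936Y
```

Position 23403 falls on the second base of codon 614 (GAT to GGT, Asp to Gly), whereas position 24368 falls on the first base of codon 936 (GAT to TAT, Asp to Tyr).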
Our study also indicates that SARS-CoV-2 evolves through distinct mutational profiles (MPs), i.e., that multiple genes are likely involved in its evolution. A mutated virus must carry multiple mutations in different genes in order to cope with stringent evolutionary constraints [32]. Mutations that are favoured by natural selection can spread in the population and act as a mutational backbone from which further genetic variants evolve. In this study, we used a population frequency of ≥5% as the cut-off for defining variant sites. We found that the basic mutations C241T, C3037T, C14408T, and A23403G, combined with other mutations, define 10 mutational profiles in Sweden. A23403G is one of the most prominent of these mutations: it lies in the S gene and substitutes aspartic acid with glycine at amino acid residue 614 of the S protein (D614G). Strains carrying D614G are designated the "G clade" by GISAID; this variant originated in Europe and then spread to North America and Oceania, and later to Asia [33]. In vitro experiments indicate that this mutation can increase the infectivity of SARS-CoV-2 [24]. In Sweden, the population frequency of D614G, located in the S1 subunit, had reached 94.8% by 14 May. All MPs except MP1 carried the basic mutation A23403G. Of the 10 MPs, MP6 appeared last within our study period, and it could potentially outcompete the other MPs in the population after our time frame. Cavallo L. et al. found that D614G and D936Y, located in the S1 and S2 subunits respectively, co-occur; the emergence of this combination was traced back to 15 March in Washington, USA, and it later spread to Wales, Iceland, and the Netherlands [31]. This provides further evidence that multiple mutations can modulate viral transmission, replication efficiency, and virulence in different regions of the world [34]. Exploring the mutational profiles of sequences is therefore an important complement to analysing individual single nucleotide polymorphisms, and may be more efficient. We observed the D614G/D936Y co-occurrence in our data set at a frequency of 17.2%, the same as the frequency of MP6. MP6 carries the D614G/D936Y pair reported by Cavallo L. et al., together with T265I in ORF1ab, Q57H in ORF3, and the other basic mutations (C241T, C3037T, and C14408T). We cannot ascertain the function of these additional mutations in MP6 relative to the S1/S2 double mutant; this will require further characterization.
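The ≥5% variant-site cut-off and the grouping of sequences into mutational profiles can be illustrated with a minimal Python sketch. It assumes that each sequence has already been reduced to the set of nucleotide mutations it carries relative to the Wuhan reference, treats a profile as the exact set of above-threshold mutations a sequence carries, and also computes a co-occurrence frequency such as that of D614G/D936Y. The function names, the singleton mutation T9999C, and all sample data are hypothetical, and G24368T is used as the nucleotide change encoding D936Y; the study's own profiling procedure may differ in detail.

```python
from collections import Counter


def variant_sites(samples, threshold=0.05):
    """Return mutations carried by at least `threshold` of all sequences."""
    counts = Counter(m for muts in samples.values() for m in muts)
    n = len(samples)
    return {m for m, c in counts.items() if c / n >= threshold}


def mutational_profiles(samples, threshold=0.05):
    """Count sequences per profile, a profile being the set of retained mutations."""
    keep = variant_sites(samples, threshold)
    return Counter(frozenset(muts & keep) for muts in samples.values())


def co_occurrence(samples, mutations):
    """Fraction of sequences carrying every mutation in `mutations`."""
    return sum(1 for muts in samples.values() if mutations <= muts) / len(samples)


# Hypothetical toy data: 21 sequences, each a set of mutations vs. the Wuhan reference.
BASIC = {"C241T", "C3037T", "C14408T", "A23403G"}
samples = {f"seq{i}": set(BASIC) for i in range(20)}
for i in range(4):
    samples[f"seq{i}"].add("G24368T")  # D936Y carriers (toy)
samples["seq19"].add("T9999C")         # hypothetical singleton: 1/21 < 5%, dropped
samples["ref_like"] = set()            # lacks all basic mutations

for profile, count in mutational_profiles(samples).most_common():
    print(count, sorted(profile))
print(round(co_occurrence(samples, {"A23403G", "G24368T"}), 3))
```

Grouping by exact mutation set is the simplest possible rule and, on real data, may yield more groups than a curated classification; the threshold removes rare private mutations so that they do not split otherwise identical profiles.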
Owing to high viral transmissibility and the lack of pre-existing immunity, COVID-19 cases surged in late February and March, mainly in Stockholm. Using a Bayesian phylogenetic approach, we estimated the time of emergence of COVID-19 in Sweden and the start of community transmission, which occurred in Stockholm. With formal Bayesian inference, the evolutionary rate of Swedish SARS-CoV-2 was estimated at 1.5425 × 10⁻³ substitutions per site per year, which is similar to earlier reports of 1.12 × 10⁻³ substitutions per site per year for SARS-CoV-2 [35]. However, short-term substitution rates may be overestimated, because many recent mutations are slightly deleterious and have not yet been removed by purifying selection [37]. In addition, these estimates should be interpreted with caution given the uncertainties arising from the small sample size and from model selection. Epidemiological evidence therefore needs to be incorporated into the analysis to go beyond the descriptive conclusions of this study [38].
During the pandemic, newly sequenced isolates have been added frequently, and nomenclature systems for SARS-CoV-2, such as those of Nextstrain, GISAID, and PANGOLIN, have continued to evolve. According to the PANGOLIN system [39], lineage B.1 is the predominant global lineage; it comprises the large Italian outbreak and is also associated with many other outbreaks in Europe [40]. Lineage B.1.1 is the main lineage in Europe and has been exported to several areas of the world [39]. B.1 and B.1.1 are also the major lineages in Sweden. To examine how these lineages were introduced into Sweden, FHM compared the single nucleotide polymorphism (SNP) profiles of Swedish sequences with those of sequences from Italy and Austria within B.1 and B.1.1. They found a clear link between Swedish and Italian sequences within B.1.1, and similarities between Swedish and Austrian sequences within B.1. However, unlike the Swedish B.1 isolates, the Austrian sequences had no mutation at position 936 of the S protein. One explanation for this observation could be that further mutations arose in Sweden or at another geographical location, or that sequencing in Austria was too limited to detect these mutations. Our mutational profile system can distinguish this genetic variation more precisely than the FHM reports: B.1 can be further divided into MP4, MP5, MP6, MP8, and MP9, and B.1.1 into MP2, MP3, and MP10. This additional resolution can help in assessing the evolutionary paths by which SARS-CoV-2 variants become the predominant genotypes in the population. Remapping the mutational profiles in our analysis (Figure 3 and Table 2) reveals a clear clustering pattern that is consistent with the PANGOLIN and GISAID classification systems, which standardize SARS-CoV-2 nomenclature. Mutational profiles can therefore be used in conjunction with other SARS-CoV-2 nomenclature systems to reveal the local sub-populations that occur in a given location during the SARS-CoV-2 pandemic, such as those presented in our paper.
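The observation that the mutational profiles nest within the PANGOLIN lineages can be checked mechanically: if every MP maps to exactly one lineage, the MP classification is a refinement of the lineage classification. Below is a minimal Python sketch, assuming per-sequence (lineage, MP) labels are available; the helper names are hypothetical, and the toy labels encode only the nesting stated in the text (B.1 into MP4, MP5, MP6, MP8, and MP9; B.1.1 into MP2, MP3, and MP10), not the study's data.

```python
from collections import defaultdict


def lineages_per_profile(labels):
    """Map each mutational profile to the set of lineages assigned to its sequences."""
    mapping = defaultdict(set)
    for lineage, profile in labels:
        mapping[profile].add(lineage)
    return dict(mapping)


def is_refinement(labels):
    """True if every mutational profile falls within exactly one lineage label."""
    return all(len(lineages) == 1 for lineages in lineages_per_profile(labels).values())


# Toy (lineage, profile) pairs mirroring the nesting stated in the text;
# B.1 and B.1.1 are treated here as separate labels.
labels = [("B.1", mp) for mp in ("MP4", "MP5", "MP6", "MP8", "MP9")]
labels += [("B.1.1", mp) for mp in ("MP2", "MP3", "MP10")]

print(is_refinement(labels))                       # True
print(is_refinement(labels + [("B.1.1", "MP6")]))  # False: MP6 would span two lineages
```

A profile that spans several lineages would indicate either a lineage assignment problem or a profile defined too coarsely to track local sub-populations.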