1. Introduction
Papaya ringspot virus (PRSV), a single-stranded, positive-sense RNA virus, belongs to the family
Potyviridae and genus
Potyvirus. The virus is primarily grouped into two serologically indistinguishable strains: papaya-infecting type (PRSV-P) and cucurbit-infecting type (PRSV-W) [
1]. The host range of PRSV-W is limited to Chenopodiaceae and Cucurbitaceae, while PRSV-P can infect plants in the papaya family (Caricaceae) as well [
1,
2,
3]. A site-directed mutagenesis study in recombinant viruses showed that the lysine residue at amino acid (aa) position 27 of NIa-Pro determines host specificity in PRSV-P, as a single aa change at that position from lysine to aspartic acid renders PRSV-P unable to infect papaya [
4].
PRSV particles are non-enveloped, flexuous, and filamentous rods about 680–900 nm in length and 12–15 nm in diameter. They are transmitted by several aphid species in a non-persistent, non-circulative manner [
5,
6]. The virus can be mechanically inoculated, while no seed transmission has been reported. Similar to other potyviruses, PRSV has a linear single-stranded and positive-sense RNA (+ssRNA) genome that is approximately 10.3 kb. The PRSV genome comprises a 5ʹ untranslated region, a single open reading frame (ORF) that codes for a major polyprotein, and a 3ʹ untranslated region. The polyprotein is proteolytically processed by virus-encoded proteases into 10 mature proteins: P1 (63k), helper component protease (Hc-Pro, 52k), P3 (46k), 6K1 (6k), cylindrical inclusion (CI, 72k), 6K2 (6k), nuclear inclusion protein a-virus protein genome linked (NIa-Vpg, 27k), nuclear inclusion protein a-protease (NIa-Pro, 21k), nuclear inclusion protein b (NIb, 59k), and coat protein (CP, 36k) [
7]. A small protein called pretty interesting potyvirus ORF (PIPO) is encoded by an additional ORF in the P3 region [
8].
The CP gene of potyviruses, including PRSV, is located at the 3ʹ-terminal end of the viral genome, and its product encapsidates the RNA genome of the virus. The CP of potyviruses is a multifunctional protein and has a significant role in the viral life cycle. For instance, it is involved in aphid transmission in association with Hc-Pro [
9], cell-to-cell and systemic movement [
10], virus assembly [
7], and host adaptation [
11]. Classification of potyviruses based on the CP gene was quite common until the mid-2000s because CP was considered the most conserved protein among potyviruses. CP is the only structural protein in potyviruses, and its multiple subunits form a protective coat for the RNA genome. The cleavage motif at the NIb/CP junction in PRSV-W is VFHQ/S [
12].
At least 94 viruses from 17 different families, including
Potyviridae, have been reported to infect cucurbit crops [
13], while 15 of those belong to the genus
Potyvirus [
6]. PRSV-W is one of the major viruses infecting cucurbits and causes substantial yield losses worldwide [
13,
14,
15,
16,
17,
18,
19,
20]. PRSV-W induces a wide range of symptoms in different cucurbit crops, including mosaics, mottling, stunting, vein clearing, shoestrings on leaves, ringspots, and streaks on fruit stems and petioles, thereby reducing both the quality and quantity of fruit production [
13].
The highly error-prone RNA-dependent RNA polymerase (RdRp) of RNA viruses, including PRSV-W, lacks proofreading ability, enabling these viruses to generate a large pool of genetically distinct sequences (often referred to as a ‘mutant swarm’) in a short generation time, compatible with the concept of viral quasispecies [
21]. These attributes contribute to high levels of genetic diversity, the ability to adapt to changing environments, including new hosts, and to evade host resistance [
22]. Numerous molecular studies have been conducted in recent decades on the ecology, etiology, pathogenesis, molecular biology, diversity, evolution, and control strategies of PRSV-P, but very few on PRSV-W. Thus, further investigation of the evolution of the viral population is important for developing reliable diagnostic tools and effective management strategies for combating PRSV-W.
Previously, an evolutionary study was conducted on 64 PRSV-W isolates from watermelon in Oklahoma during a single growing season with a limited area of sampling [
12]. In this study, we performed a comprehensive evaluation of population differentiation and genetic diversity among >100 PRSV-W isolates from six counties of Oklahoma (
Figure S1) during three growing seasons. We hypothesized that since the virus sequences were sampled within a short span of time, there would be strong purifying selection, and the population would consist of a mutant cloud or mutant swarm. Our other hypothesis was that the phylogenetic grouping of the isolates would be influenced by geographical location, hosts, and year of collection. We also evaluated mutation frequency and its pattern within individual clones of PRSV-W isolates collected from different regions of Oklahoma.
3. Discussion
This study confirms our first hypothesis that strong purifying selection acts within the PRSV-W populations. The selection is likely removing deleterious mutations caused by error-prone replication. The high mutation frequency within all PRSV-W populations from Oklahoma confirms that the population exists as mutant clouds compatible with the quasispecies concept. The PRSV-W populations in this study were collected over a relatively short time frame, which makes large mutant clouds the likely outcome [
23]. Additionally, these mutation frequencies were remarkably consistent in all the populations (based on geography, hosts, collection years, and single/mixed infections), within the range of 10⁻³. Mutations are the major driving forces for monopartite RNA virus variation in addition to recombination. The error rate of RNA replication ranges from 10⁻³ to 10⁻⁵ per base per copying cycle, giving rise to high diversity within populations due to large mutant clouds. The high mutation rates reflect an evolutionary strategy as these mutant clouds usually work in favor of viruses for adaptation during environmental stress [
24,
25]. The majority of the mutation events in this study were substitutions, and insertions/deletions (indels) were rarely found. Indel mutations are usually rare and are lethal in most cases [
26]. Due to the abundance of deleterious mutations, the level of dN is generally higher in closely related sequences compared to distantly related sequences [
27], which explains slightly higher purifying selection in Oklahoma isolates compared to global isolates.
The replication rates of most RNA viruses are swift, so they are able to reach exceptionally large population sizes within a brief period of time [
26]. However, this large population size is not, in fact, an effective population size, as a substantial part of this population consists of mutants that will not pass to the next generation [
25]. The genetic bottleneck reduces the population size below a threshold level to facilitate the transmission of fittest variants, thereby limiting the size of the effective population [
28]. The genetic bottleneck usually is the product of the biology of the vector and its feeding habit and can also occur at different moments of the viral life cycle, such as virus movement between plant cells during systemic infection and horizontal transmission [
29,
30,
31,
32]. In addition, the purifying selection helps viruses maintain genetic stability by eliminating less-fit mutants with deleterious effects [
33]. The genetic variations due to mutation are also structured by gene flow [
34]. The gene flow among different hosts, geographical regions, and different parts of the same plant helps in shaping the global genetic diversity [
35]. The low level of long-distance movement or gene flow might be the reason behind the non-uniform and variable viral populations in this study. However, this low level of gene flow was enough to accommodate variants from different phylogroups occurring in the same geographical area.
Utmost care was taken to reduce the mutations during RT-PCR steps by employing a number of strategies: high-fidelity reverse transcriptase and Taq polymerase were used, the number of PCR cycles was limited to 25, and any mutations found in only one direction of Sanger sequencing were not considered. Despite these precautions, there is a chance that some of these mutations might be due to experimental error. Similar to the study conducted by Simmons et al. [
36], we calculated the highest possible number of erroneous mutations due to RT-PCR. The total number of possible mutations due to RT was ∼9 (2.9 × 10⁻⁵ mutations × 864 sites × 370 clones), and that due to PCR was ∼7 (2.28 × 10⁻⁵ mutations × 864 sites × 370 clones), which adds up to a total of ∼16 mutations. Even if we deduct these (possible) artifact mutations (16) from the total number of mutations (376) observed in the study, the mutation frequency remains in a similar range. However, the actual number of erroneous mutations due to experimental error might be significantly lower than these calculated values due to the aforementioned experimental considerations.
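The upper-bound estimate above is simple arithmetic and can be reproduced with a short sketch. The error rates, site count, clone count, and observed mutation total are the values quoted in the text; the function and variable names are illustrative only, and the mutation frequency is assumed here to be mutations divided by the total number of sequenced nucleotides (sites × clones).

```python
# Values quoted in the text; names are illustrative, not taken from the study's pipeline.
RT_ERROR_RATE = 2.9e-5     # upper-bound reverse transcription error, per site
PCR_ERROR_RATE = 2.28e-5   # upper-bound PCR error, per site
SITES = 864                # nucleotide sites per clone
CLONES = 370               # number of clones sequenced
OBSERVED_MUTATIONS = 376   # total mutations observed


def expected_artifacts(rate, sites, clones):
    """Expected number of erroneous mutations for a given per-site error rate."""
    return rate * sites * clones


def mutation_frequency(mutations, sites, clones):
    """Mutations per sequenced nucleotide (assumed definition)."""
    return mutations / (sites * clones)


rt_artifacts = expected_artifacts(RT_ERROR_RATE, SITES, CLONES)
pcr_artifacts = expected_artifacts(PCR_ERROR_RATE, SITES, CLONES)
# Round each term separately, as in the text (about 9 + about 7).
total_artifacts = round(rt_artifacts) + round(pcr_artifacts)

raw_frequency = mutation_frequency(OBSERVED_MUTATIONS, SITES, CLONES)
corrected_frequency = mutation_frequency(
    OBSERVED_MUTATIONS - total_artifacts, SITES, CLONES
)

print(f"RT artifacts (upper bound):  {rt_artifacts:.2f}")
print(f"PCR artifacts (upper bound): {pcr_artifacts:.2f}")
print(f"Total artifacts (rounded):   {total_artifacts}")
print(f"Raw mutation frequency:       {raw_frequency:.2e}")
print(f"Corrected mutation frequency: {corrected_frequency:.2e}")
```

Under these assumptions, both the raw and the corrected frequencies stay on the order of 10⁻³, which is the point the paragraph makes.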
The PRSV-W isolates collected from the same county, host, and growing season were grouped in different phylogroups, while isolates collected from different counties were grouped in the same phylogroup. This rejects our second hypothesis that these factors play a role in phylogenetic clustering. For instance, PRSV-W isolates collected from Blaine County in a single growing season (2017) from the same host (pumpkin) grouped in three different phylogroups (
Figure 1). Similarly, PRSV-W isolates from Blaine, Cimarron, and McCurtain counties collected in 2018 fell in two different phylogroups. This diversity is also well supported by the higher within-group mean evolutionary distance of PRSV-W isolates from these counties compared to others (
Table 2). Some of the isolates from geographically distant locations (Muskogee and Caddo counties) even had identical sequences. In addition, isolates from two far corners of Oklahoma, more than 900 kilometers apart, were grouped closely together in the same phylogroup (phylogroup 1), irrespective of their collection year and host. The short evolutionary distances between isolates from various locations are consistent with this close phylogenetic relationship. The lack of association between geography and phylogeny among these isolates can be attributed to two possibilities. First, all these isolates from Oklahoma might have been derived from the same most recent common ancestor (MRCA). Second, the virus or virus-harboring aphids likely travel with harvested plants and fruits to various parts of the state, thereby facilitating spread to new locations. In both cases, the virus population can use wild hosts as a reservoir outside the growing seasons of its primary hosts [
13]. In addition, none of the phylogroups within Oklahoma had distinct fixed mutations in terms of nt and aa, indicating the recent common ancestry of all these populations. The exception was the set of isolates from Tulsa, which had three distinct aa changes compared to other populations at positions 44 (Alanine–Threonine), 76 (Valine–Isoleucine), and 120 (Serine–Asparagine).
Similarly, the other parameters considered in the study, viz. host and collection year, did not have a significant effect on phylogeny. For instance, the genetic differentiation between different hosts and their diversity was not significant in all populations of PRSV, and none of the frequent mutations were observed in specific hosts. The PRSV-W isolates collected at different points of time (2008–2018) from the same location did not cluster according to collection year. However, if the isolates collected in the same year fell in the same phylogroup, they tended to group together, indicating a loose association between collection time and their evolutionary fate. The lack of a strong host or year effect is further supported by the fact that mutation frequencies of virus isolates collected from different hosts and in different collection years remained highly similar (
Table 8 and
Table 9).
The clustering pattern of global isolates showed distinct geographical clustering, with Asian, American, and Oceanian isolates falling in different phylogroups. The two European isolates from France and Poland were anomalous as they fell into two distinct groups. More isolates from Europe are needed to evaluate whether they cluster with either the American or the Oceanian isolates or form separate clusters among themselves. The grouping pattern in the phylogeny of PRSV-W in this study is similar to that of recent studies [
37,
38], which showed that isolates from different parts of the world grouped together in one phylogroup and a few Asian isolates in a different phylogroup. The PRSV isolates from other parts of the US were close to Mexican and other American isolates, as observed previously [
39,
40]. The distinct geographical clustering of the PRSV population by continent suggests that PRSV-W populations do not have a recent history of movement across continents (North and South America are referred to here as the ‘Americas’). While the factors discussed above for Oklahoma (movement of the virus or virus-harboring aphids with harvested plants and fruits) explain geographical connectivity among virus populations in nearby locations, longer-distance movement through these modes of transmission is unlikely. Similar to the clustering pattern in Oklahoma, the global isolates did not show distinct phylogenetic differentiation among hosts and collection years.
More than 99% of the mutations observed were substitution mutations. These substitution mutations were biased towards transitions with a high proportion (>75%) of purine to purine or pyrimidine to pyrimidine nt change (
Figure 4). The transition mutation biases are common in viral systems and were noted in several previous studies [
41,
42,
43,
44]. All the combinations of nt substitutions involving G and C had the mutations favoring change to these nt, thereby favoring gain of net GC content. This net gain of GC content was also observed in a study conducted by Nigam et al. [
44]. In addition, the resulting aa changes from these GC-rich mutations mostly involved aa Arginine, Lysine, Asparagine, Aspartic acid, Glycine, and Alanine with either loss or gain of the net charge. Interestingly, all these aa are also disorder-promoting [
45]. Conversely, the order-promoting aa, such as Tryptophan and Cysteine, were rarely involved in substitution mutations.
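The transition bias and net GC gain described above can be tallied from a list of observed substitutions. The following is a minimal sketch rather than the analysis used in the study: the function names are hypothetical, and the example substitutions are made up purely to show the bookkeeping.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def normalize(nt):
    """Upper-case a nucleotide and treat RNA 'U' as 'T'."""
    return nt.upper().replace("U", "T")


def classify_substitution(ref, alt):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' for a single-nucleotide change."""
    ref, alt = normalize(ref), normalize(alt)
    if ref == alt or ref not in NUCLEOTIDES or alt not in NUCLEOTIDES:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def gc_change(ref, alt):
    """+1 if the change gains a G or C, -1 if it loses one, 0 otherwise."""
    gc = {"G", "C"}
    return int(normalize(alt) in gc) - int(normalize(ref) in gc)


def summarize(substitutions):
    """Transition fraction and net GC change for (ref, alt) pairs."""
    counts = Counter(classify_substitution(r, a) for r, a in substitutions)
    total = sum(counts.values())
    return {
        "transition_fraction": counts["transition"] / total if total else 0.0,
        "net_gc_change": sum(gc_change(r, a) for r, a in substitutions),
    }


# Made-up example data, not taken from the study.
example = [("A", "G"), ("T", "C"), ("A", "G"), ("C", "T"), ("A", "C")]
summary = summarize(example)
print(summary)
```

In this toy input, four of five changes are transitions and the net GC change is positive, which mirrors the direction of the bias reported in the paragraph without reproducing its actual counts.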
Mutation frequency was highest within the N-terminal region of CP, followed by the C-terminal region and the core region. Although mutations were frequent in the core region, they were disproportionately silent and included fewer positively selected sites than the N- and C-terminal regions. This further consolidates the evidence of a highly conserved core region. All but seven isolates from McCurtain County, which were collected in 2018, had a DAG motif at amino acid positions 7 to 9 in this study. These seven isolates have Threonine instead of Alanine (DTG). The Alanine to Threonine mutation was also observed in three other global isolates of PRSV: one from the USA and two from Taiwan. In addition, a few isolates had NAG, DSG, or DSA instead of the DAG motif. The DAG motif, highly conserved among Potyvirus CP genes, has a significant role in virus transmission by aphids [
46] and is exposed on the viral surface [
47]. However, mutations in this motif have been reported in many studies in which aphid transmission remained efficient [
48,
49,
50,
51,
52,
53]. Two other motifs, PTK and KITC, present in another Potyvirus gene, Hc-Pro, and their interaction with CP also have a vital role in viral transmission by aphids [
54,
55], and these motifs could facilitate aphid transmission in the absence of the DAG motif [
49]. A number of conserved motifs described in Potyvirus CP previously were present in PRSV-W populations in this study with minor or no mutations. In addition, there were more highly conserved motifs in the core region of CP, which are specific to PRSV (
Table 12). These conserved motifs were present in PRSV populations regardless of the biotype and might carry some evolutionary role. Further study is needed to decode the evolutionary messages conveyed by these motifs. The presence/absence of these motifs could nevertheless be useful for diagnostic applications such as primer design.
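Screening CP sequences for the DAG motif and its variants at aa positions 7 to 9 can be done with a small helper. This is a sketch under stated assumptions: positions are counted from the first residue of the CP with no alignment gaps, the function names are hypothetical, and the three toy sequences are invented for illustration and are not real PRSV CP sequences.

```python
def nterm_motif(cp_protein, start=7, end=9):
    """Return residues start..end (1-based, inclusive) of a CP aa sequence."""
    if len(cp_protein) < end:
        raise ValueError("sequence is shorter than the requested motif window")
    return cp_protein[start - 1:end].upper()


def group_by_motif(sequences):
    """Map each motif found at positions 7-9 to the isolates that carry it."""
    groups = {}
    for name, seq in sequences.items():
        groups.setdefault(nterm_motif(seq), []).append(name)
    return groups


# Invented toy sequences (NOT real PRSV CP sequences); only positions 7-9 matter.
toy_sequences = {
    "toy_1": "SKNEAVDAGLNEKL",  # canonical DAG
    "toy_2": "SKNEAVDTGLNEKL",  # Alanine -> Threonine variant (DTG)
    "toy_3": "SKNEAVNAGLNEKL",  # NAG variant
}

groups = group_by_motif(toy_sequences)
for motif, isolates in sorted(groups.items()):
    print(motif, isolates)
```

With real data, the same grouping would list which isolates carry DAG and which carry variants such as DTG, NAG, DSG, or DSA, provided the sequences are trimmed to start at the first CP residue.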
Recombination analysis of all 101 PRSV-W isolates sequenced in this study, as well as all 165 Oklahoman PRSV-W isolates (101 isolates from this study and 64 isolates from the previous study), was conducted using RDP software. Only two recombination events were detected by two algorithms in 101 PRSV-W isolates and were not significant (data not shown). These results indicate that recombination events are not frequent in the CP-gene-coding region.
To our knowledge, this is the first study on mutational analysis within the quasispecies population of PRSV-W isolates. The present study provides a broad analysis of PRSV-W populations collected across diverse geographical locations of the state, hosts, and collection times, based on various aspects of evolution such as mutation, genetic differentiation, and phylogeny. The insights provided by this study will enhance existing knowledge of PRSV-W evolution and epidemiology and will be helpful in developing viable management strategies. Specifically, the high diversity of PRSV populations in different geographical locations and the possibility of multiple viral introductions in the same geographical location demand careful consideration of the genetic composition of the virus in multiple locations while developing sustainable control strategies.