Next Article in Journal
Phylogeny and Diversity of the Genus Pseudohydnum (Auriculariales, Basidiomycota)
Previous Article in Journal
Role of the Gene ndufs8 Located in Respiratory Complex I from Monascus purpureus in the Cell Growth and Secondary Metabolites Biosynthesis
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Intra-Species Genomic Variation in the Pine Pathogen Fusarium circinatum

Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria 0002, South Africa
*
Author to whom correspondence should be addressed.
Submission received: 28 April 2022 / Revised: 2 June 2022 / Accepted: 8 June 2022 / Published: 23 June 2022
(This article belongs to the Section Fungal Genomics, Genetics and Molecular Biology)

Abstract

:
Fusarium circinatum is an important global pathogen of pine trees. Genome plasticity has been observed in different isolates of the fungus, but no genome comparisons are available. To address this gap, we sequenced and assembled to chromosome level five isolates of F. circinatum. These genomes were analysed together with previously published genomes of F. circinatum isolates, FSP34 and KS17. Multi-sample variant calling identified a total of 461,683 micro variants (SNPs and small indels) and a total of 1828 macro structural variants of which 1717 were copy number variants and 111 were inversions. The variant density was higher on the sub-telomeric regions of chromosomes. Variant annotation revealed that genes involved in transcription, transport, metabolism and transmembrane proteins were overrepresented in gene sets that were affected by high impact variants. A core genome representing genomic elements that were conserved in all the isolates and a non-redundant pangenome representing all genomic elements is presented. Whole genome alignments showed that an average of 93% of the genomic elements were present in all isolates. The results of this study reveal that some genomic elements are not conserved within the isolates and some variants are high impact. The described genome-scale variations will help to inform novel disease management strategies against the pathogen.

1. Introduction

Unravelling the genetic basis of population-level phenotypic variation, such as pathogenicity has been the underlying driving force in comparative genomics research [1,2]. For many fungal pathogens, we now know that genomes can be divided into sub-genomic compartments based on their evolutionary rates [3]. This is often referred to as the “two speed genome concept” [4], where some genomic regions seem to accumulate polymorphisms relatively quickly while others remain stable over extended periods of time. These more variable regions are often rich in repetitive elements and usually harbour genes that encode niche-defining phenotypes [5,6]. The latter includes effectors, which are secreted disease determinants of host infection and colonization, and are thus key to all plant–fungus interactions [7,8]. Variable regions are thus valuable resources for studying how fungal pathogens are adapted to their specific environments.
Although selection for disease phenotypes in plant breeding is dependent on knowledge about fungal pathogens’ inter- and intra-species variability [9,10], access to whole genome sequences can speed up the process. For example, virulence can be associated with specific micro and/or macro genomic polymorphisms that are located in specific sub-genomic compartments. The accuracy of such a process is therefore dependent on the quality of genome sequencing technologies, as well as genome assembly and variant calling platforms. Currently, long-read single-molecule sequencing technologies are useful for providing a broad scaffolding framework with which to assemble whole fungal genomes. This is because it can allow for the generation of long (up to 80 kb) sequence reads [11,12] that can span complex structural variants, including stretches of repetitive [12] and highly flexible regions which have shown to be associative with pathogenic determinants [1]. To account for the higher error rate of long-read sequencing technologies, they are often combined with short reads from platforms such as the Illumina technology [13]. Whole genomes in which chromosomes are sequenced from telomere to telomere can thus be assembled, thereby allowing a comprehensive analysis of genomic structural variation, as well as more fine-scale analyses, such as those involving single nucleotide polymorphisms (SNPs) [13].
In this study, we were interested in genomic variation in Fusarium circinatum. This economically important pathogen of pines is responsible for major losses in pine-based forestry around the world. Although high levels of genetic diversity and phenotypic variation have been reported within populations of this fungus [14,15], this was mostly attributed to sexual reproduction and mutation [16,17,18]. Other common sources of variation that could play a role in F. circinatum includes transposable elements [19,20] and horizontal gene transfer (HGT) [21]. Transposable elements can bring about genomic variations in several ways, e.g., transposition into exons, introns, regulatory regions and the mediation of non-homologous recombination [20]. To counter their impact on genome integrity, many fungi employ mechanisms such as repeat-induced point mutation (RIP) [22,23] that may introduce more variations, as have been shown in F. circinatum [24]. HGT is the exchange of genetic material between organisms that are not in a parent–offspring relationship [25]. A notable example in fungal pathogens are dispensable chromosomes, often harbouring pathogenicity genes, that can be exchanged among isolates [2]. In F. circinatum, the loss of a dispensable chromosome, in this case the 12th chromosome, reduces the pathogen’s virulence [26,27].
When trying to link a phenotype with genome-based polymorphism, a range of genomic features may be considered. These can range from micro variants, such as SNPs, to macro variants. The latter is mostly represented by structural variants (SVs) such as copy number variations (CNVs) or indels, translocations, inversions and whole chromosome loss/gain [28]. Although SNPs are most commonly employed, SVs also have great value as their occurrence and distribution can vary within populations [29] where they can have profound effects on phenotype diversity [30]. This is particularly true when SVs occur within genes or in regulatory regions [31,32]. CNVs or indels represent replicated/deleted genomic regions that arise from nonallelic homologous recombination and retrotransposition [33], and have shown to be capable of extensively impacting gene expression and phenotypic diversity [33,34]. Translocations result from recombination between nonhomologous chromosomes leading to reciprocal exchange or nonreciprocal transfer of the genetic material between the chromosomes [35]. Inversions are inverted chromosomal segments in which the DNA content remains the same and only the linear order of the DNA bases is changed [36]. They arise when a chromosome breaks at two points and the segment is reinserted in an inverted orientation. Inversions suppress recombination in heterokaryotypes and can even cause genetic isolation between populations [36,37]. However, relatively few studies have investigated SVs at the whole-genome level in fungi and/or attempted to associate them to phenotypic diversity [29,38,39].
The overall goal of the current study was to determine the extent to which SVs might influence the genome of F. circinatum. Our aims were four-fold: (i) sequencing of whole genomes for a set of geographically diverse F. circinatum isolates by making use of the short-read sequencing technology from Illumina® (Illumina, San Diego, CA, USA), together with long-read sequencing technology from either PACBIO® (Pacific Biosciences, Menlo Park, CA, USA) or MinION (Oxford Nanopore Technologies, Oxford Science Park, Oxford, UK); (ii) compilation of a comprehensive catalogue of fully characterized SVs occurring in the F. circinatum genome; (iii) annotation of genes associated with SVs; (iv) characterization of the F. circinatum pangenome for demarcating conserved and non-conserved genomic regions. This study will thus provide a valuable resource for future investigations on the functional effects of SVs on F. circinatum and contribute immensely towards the understanding of the biology of this pathogen, leading to the development of effective control mechanisms and management strategies.

2. Materials and Methods

2.1. Isolates

In this study we used seven isolates (FSP34, CMWF1803, CMWF560, CMWF567, UG10, UG27 and KS17) of F. circinatum. FSP34 was isolated from pitch canker-affected Pinus sp. in California, USA [38]. Isolates CMWF1803, CMWF560 and CMWF567 all originate from a pitch canker-affected Pinus species in Mexico [14]. Isolates UG10 and UG27 were isolated from pitch canker-affected P. greggii trees in a plantation near Ugie in the Eastern Cape Province of South Africa [17]. Isolate KS17 was isolated from the diseased roots of a P. radiata seedling that were obtained during a study of the pathogen in a commercial seedling production nursery in the Western Cape Province of South Africa [39].

2.2. Genome Sequencing, Data Sets and Assembly

High-quality DNAs were extracted from the respective fungi using lyophilised mycelia with the protocol described by Murray and Thompson [40]. For isolates CMWF560, CMWF567, CMWF1803, UG10 and UG27, Pacbio long reads (10 kb and 20 kb libraries) were generated by Macrogen (Seoul, South Korea), and for isolates KS17 and FSP34, MinION long reads were available in-house at the Forestry and Agriculture Biotechnology Institute (FABI, Pretoria, South Africa). For all the isolates, Illumina HiSeq2000 250bp paired-end sequencing was performed by Macrogen.
The Pacbio reads were filtered using the SMRT® (Single Molecule, Real-Time) portal (Pacific Biosciences, Menlo Park, CA, USA), while MinION read correction and trimming was carried out using CANU [41]. The Illumina reads were trimmed and filtered using Trimmomatic v 0.38 [42], and FASTQC v 0.11.5 was used to check the quality of the trimmed reads. De novo genome assemblies were then generated with long reads using CANU [41]. The assemblies were initially polished with Quiver v 2.2.2 [43] and Nanopolish [44] using Pacbio and MinION raw reads, respectively. The final polishing was carried out with Pilon [45] using the Illumina HiSeq reads.
The polished scaffolds were ordered and oriented into contiguous pseudomolecules based on the macrosynteny that was found within the Fusarium fujikuroi species complex (FFSC) [46] using the LASTZ [47] plugin of Geneious v 7.0.4 [48]. The latter employed as references the chromosome-level genome assemblies of F. fujikuroi [49] and F. temperatum [50]. Following this, redundant contigs that showed high similarity with chromosome scaffolds or other longer contigs were discarded.
Quality checking on the final genome assemblies was carried out by mapping individual reads to the genomes of their respective genome assemblies and then performing a variant calling analysis. For this purpose, BWA MEM version 0.7.17-r1188 [51] with the -M option was used for the Illumina reads, and for the Pacbio and MinION reads we used CoNvex Gap-cost alignMents for Long Reads (NGMLR) version 0.2.7 [12] (accessed on 11 December 2021). After mapping, the SAM files were converted to BAM files using samtools view [52]. Read groups were then added using bamaddrg (https://github.com/ekg/bamaddrg, accessed on 11 December 2021), after which BAM files were sorted using bamtools sort [53]. Duplicated reads were marked with samtools rmdup. Samtools depth was used to determine the depth of coverage for each BAM file. The breadth of coverage was determined using samtools mpileup. For each genome, SVs were detected using Sniffles [12] with default settings.
Completeness of the various genomes was estimated using BUSCO (Benchmarking Universal Single-Copy Orthologs) v 2.0.1 [54] and the “Sordariomyceta” database containing 3725 genes. WebAUGUSTUS [55] was used to annotate the genomes with F. graminearum as the reference.

2.3. Identification and Annotation of SVs

To identify SVs, the genome assembly of isolate FSP34 was used as the reference. FSP34 was chosen for this purpose as it was the first F. circinatum isolate to have its genome sequenced [56], and has since undergone several re-sequencing and assembly improvements [57]. The version used here was further improved to an assembly of 12 chromosome-level scaffolds with 15 unmapped contigs. Quality-filtered reads from all sequencing platforms were used for mapping against the FSP34 genome assembly using BWA MEM for Illumina reads and NGMLR for Pacbio/MinION. As described above, samtools view was used to convert SAM files to BAM files, after which the read groups were added with bamaddrg. The BAM files were sorted with bamtools and duplicated reads were marked with samtools rmdup. SVs were then identified using Sniffles with default settings. To filter variants from low mapping quality regions we used samtools view to extract the low mapping quality reads (MQ < 5) from the sorted BAM files. Samtools depth was then used to compute the base coverage from the sorted BAM files with low mapping quality reads. We then used SURVIVOR bincov [28] to cluster the coverage track into a BED file for filtering. SURVIVOR filter was then used to filter the VCF files, which were sorted and merged using SURVIVOR. The merged VCF file was then used to force-call SVs across all samples. The resultant VCF files were merged to obtain a final multiple sample VCF file. The identified SVs were viewed with the Integrative Genome Viewer (IGV) v 2.4.14 [58].
To annotate the identified SVs with SnpEff [59], we first compiled a SnpEff database for F. circinatum. This was completed using the annotation files that were obtained from WebAUGUSTUS. After the SVs in the VCF files were annotated, we used SnpSift [60] to extract genes with high or moderate impact variants and genes from chromosomal regions with high variant density for further analysis. These genes were subjected to a gene ontology (GO) enrichment analysis using the Fisher test (p value < 0.05) in the Blast2GO [61] plugin in CLC Genomic Workbench (Aarhus, Denmark). The list of all predicted genes from the FSP34 reference genome were used as the reference set for this analysis.
All reads that did not map to the FSP34 assembly were filtered from the respective BWA-MEM and NGMLR alignments. These were then assembled using SPAdes version 3.7.1 [62]. The assembled contigs were annotated using WebAUGUSTUS as described above. A BLASTn and BLASTp analysis of these predicted annotations was conducted using NCBI.

2.4. Identification and Characterization of Micro Variants

For the micro variant analysis, we specifically targeted SNPs, multiple nucleotide polymorphisms (MNPs) and indels in the 1–29 base pair (bp) range. For this purpose, the BAM files generated above were subjected to analysis with Freebayes version v1.2.0-2-g29c4002 (accessed on 11 December 2021) [63] using the options “—ploidy 1”; “—min-mapping-quality 30”; “—min-base-quality 20” and default settings for the rest of the parameters. Filtering of the generated variants was carried out using vcffilter (https://biopet.github.io/vcffilter/0.2, accessed on 11 December 2021). Variants with quality scores (QUAL) that were greater than 30 and with a minimum depth of coverage (DP) greater than 10 were filtered and removed. The identified micro-variants were then annotated using SnpEff, SnpSift and Blast2GO as described above.

2.5. Synteny and Pangenome Analyses

Whole genome alignments were conducted using LASTZ as described above. Pairwise alignments between the genomes were visualized in Geneious. A synteny analysis was carried out using Synchro [64] with the protein FASTA files that were derived from the WebAUGUSTUS annotations. Visualization of SVs was also carried out as graphic outputs from Synchro.
Spine [65] was used to reconstruct the pangenome of F. circinatum from the seven genomes that were examined. This program uses NUCmer [66,67] to align whole genomic sequences to build a nonredundant pangenome and extracts the core and accessory genomic elements as defined in the input parameters [65]. The program can take annotated genomes and output protein information online, but this function currently only works for small data sets, such as prokaryotic genomes. For our purposes, we downloaded the program and ran the analysis on our local servers. For this study, we used whole genome FASTA files and then searched for genes in the resultant data sets using WebAUGUSTUS, as described above. Similarly, BUSCO analysis was carried out on all predicted proteins.
Genomic sequences that were present in all seven genomes, with >85% similarity and a minimum output fragment of 100 bp and a gap of 10 bp between fragments, were regarded as belonging to the core sub-genomic compartment. To determine the distribution of accessory genomic elements among isolates we used the ClustAGE [68] package in Spine. From the ClustAGE bins we selected bins that had sequences >4.5 kb for further analysis of their distribution patterns within the respective genomes. We focused on two groups of accessory genomic elements, the uniquely absent (sequences that were present in 6 isolates and missing only in 1 isolate) and the uniquely present (sequences that were present only in a single isolate). We used BLASTp analysis in NCBI to determine the origins or possible sources of these sequences. Geneious was used to map the accessory genomic elements to the genomes to check their distribution across the twelve chromosomes.

3. Results

3.1. Genome Sequencing, Data Sets and Assembly

We used long reads and short reads to construct five new and nearly complete chromosome-scale genome assemblies for F. circinatum (Table 1). Long-read sequences that were generated with either PacBio or MinION yielded a total of 1,557,037 filtered, corrected and trimmed reads for the seven isolates. Illumina sequencing generated a total of 64,129,476 reads after filtering and trimming. Most of the Illumina reads had a base quality sequence score that was greater than 30. Together with the two previously sequenced isolates (FSP34 and KS17), the total genome sizes ranged from 45,008,552 bp to 43,828,286 bp (Table 2). All genomes were assembled to the expected 12 linear chromosome scaffolds and the single circular mitochondrion scaffold. BUSCO analyses indicated that genome completeness ranged from 98.2% to 99.1% (Table 1). The number of complete genes that was predicted for these genomes by WebAUGUSTUS ranged from 13,854 to 14,382.
The results of our read mapping experiments suggested that the generated assemblies were robust. Mapping reads to their corresponding genome assemblies revealed that sequencing depth for long-read PACBIO and MinION sequencing ranged from 14X to 74X coverage, while for short-read Illumina sequencing ranged from 40X to 60X coverage. Further, no variants were detected on all of the seven genome assemblies when SV calling was carried out with reads that were mapped to their corresponding assemblies. It is thus unlikely that the genome assemblies have errors that could result in false calls of variants based on the reference genome and whole genome comparisons.
For all isolates, sequencing data that could not be assembled to the 12 chromosome scaffolds and mitochondrion ranged from 114,529 bp for isolate FSP34 to 1,142,366 bp for isolate UG10. The latter corresponded to 42 contigs ranging in size from 1992 bp to 65,074 bp. Overall, however, the unmapped contigs from all assemblies showed high similarity with fragments from the 12 core chromosomes (Figure 1), but due to their smaller size we could not accurately position them in their respective chromosomes.
Additionally, a scaffold that could not be assembled to any of the 12 chromosomes and that was larger in size (1,045,806 bp) than chromosome 12 was observed for isolate CMWF1803. Comparisons with the other isolates revealed a similar scaffold with high similarity in isolate CMWF560 (Figure 1). These two isolates also had a higher overall genome size compared to the rest of the isolates. This uncharacterized scaffold might be an additional chromosome for F. circinatum.

3.2. Identification and Annotation of SVs

Of the short reads that were generated for the different genomes, 90.41% to 99.0% mapped to the FSP34 assembly, while 93.7% to 99.2% of the long reads mapped to it (Table 3). Following the construction of BAM files and multi-sample variant calling with Sniffles, we identified a total of 1 828 SVs, ranging in size from 30 bp to >10,000 bp (Table 4: Supplementary File S1). Of these, 1717 were copy number variants (990 deletions, 719 insertions, 8 duplications) and 111 were inversions.
Annotation with SnpEff predicted a total of 476,229 functional effects for the collection of SVs that were identified (Table 5: Supplementary File S2). SnpEff also provided an indication of the magnitude of the predicted effects. Variants with a high impact are disruptive with regards to probable protein function; moderate impact variants are less disruptive and would likely reduce functionality of the protein; low impact variants would be mostly harmless; and modifier variants often occur in non-coding genes, introns, intergenic regions, intragenic regions and downstream to genes where predictions of the effect are more difficult [59]. For the SVs identified here, a greater proportion (88.2%) were predicted to have a high impact, 10.5% a moderate impact, 0.0% (a single effect count) a low impact and 1.3% a modifier impact. The high impact-effect variant types included bidirectional gene fusions (46.6%); chromosome number variations (0.002%); exon loss variants (0.02%); feature ablations (1.6%); frame shift variants (0.03%); gene fusions (37.9%); inversions (10.5%); splice acceptor variants (0.003%); splice donor variants (0.004%); start losses (0.01%); stop losses (0.008%); and transcript ablations (2.0%). Moderate impact-SV variant types included conservative in-frame deletions (0.02%); disruptive in-frame deletions (0.01%); and upstream gene variants (0.5%). Modifier impact variant types that were predicted included downstream gene variants (0.5%); exon regions (0.0%); intergenic regions (0.13%); intergenic variants (0.08%); intron variants (0.01%); non-coding transcripts variants (0.07%); and splice region variants (0.01%).
A total of 4190 proteins were annotated in the regions that were predicted to have deletions, 345 in regions displaying insertions, 8885 were in inversion areas and one protein was in an area displaying duplications (Supplementary File S3). The following biological processes were overrepresented in the gene set of proteins that were annotated in regions associated with deletions (Figure 2): DNA metabolism, cellular protein modification process, chromosome organization, response to stress, signal transduction, macromolecular complex assembly, tRNA metabolism, chromosome segregation, cell division, mitotic cell cycle, small molecule metabolism, biosynthesis, cofactor metabolism, reproduction, cell adhesion and intracellular transport. The overrepresented cellular components with the deletions were macromolecular complex, nucleoplasm, chromosome, ribosome, endosome, cytosol, cytoskeleton, mitochondrion and nuclear envelope. Molecular functions that were overrepresented within proteins displaying deletions were enzyme regulator activity, kinase activity, nucleotidyltransferase activity, cytoskeleton protein binding, enzyme binding, protein binding, ion binding, helicase activity, rRNA binding, ligase activity, isomerase activity, rRNA binding and translation factor binding.
Three biological processes were overrepresented in the predicted gene set that was predicted to be affected by insertions (Figure 3). These were cell adhesion, nickel cation transmembrane transport and ubiquitin-dependent protein catabolism. Cellular components that were overrepresented in insertions were integral components of the membrane, the proteasome core complex and the alpha subunit. Dominant molecular functions in this gene set were threonine-type endopeptidase activity, amidase activity, nickel cation transmembrane and transporter activity.
Six biological processes and four cellular components were overrepresented in the genes that were affected by inversions (Figure 4). The biological processes were cofactor metabolism, protein targeting, homeostatic process, carbohydrate metabolism, cell division and response to the stimulus. Overrepresented molecular functions were transferase activity, transferring phosphorus-containing groups, RNA binding, protein binding and enzyme regulator activity.
The single gene that was predicted to be affected by duplications encoded an RNA binding protein with helicase activity, which is a ubiquitous family of proteins that potentially play a role in cellular processes that involve RNA metabolism [69]. The unmapped reads represented loci that were absent in the reference strain and/or contaminants in our samples. The BLASTp analysis retained top hits from the Fusarium species. Genes that were annotated from these sequences were mainly for proteins that were involved in transport and metabolism (Figure 5).

3.3. Identification and Annotation of Micro Variants

Multi-sample variant calling using Freebayes generated a total of 397,704 polymorphic sites among the genomes that were examined (Supplementary File S4). Of these, 353,117 (88.8%) were biallelic and 44,587 (11.2%) were multiallelic. From the collection of polymorphisms, a total of 461,683 micro variants were identified, of which 68.7% were SNPs, 23.1% were 1–29 MNPs, 3.2% were insertions, 2.4% were deletions and 2.7% were mixed. The observed genome-wide transitions/transversions (Ts/Tv) ratio was 2.9 and the overall variant rate was one variant for every 97 bases. The observed Ts/Tv ratio is consistent with the phenomenon that transitions generally occur at a higher frequency than transversions [70]. The whole genome SNP density showed an uneven distribution of SNPs across chromosome lengths. There were high SNP densities on chromosome arms while the middle regions showed relatively lower SNP densities (Figure 6). For chromosome 11 and 12, the SNP densities were relatively high across full chromosome lengths.
A total of 2,719,213 putative effects were assigned to a collection of micro variants. Of these, SnpEff predicted 0.7% of the effects to have a high impact, 2.3% to have a moderate impact, 2.8% to have a low impact and 94.3% to have a modifier effect. In terms of functional classes, 42.0% of the effects were classified as missense, 0.9% nonsense and 57.1% to be silent variants (Table 6 and Supplementary File S5). High impact variants included frameshifts, gene fusions, splice acceptors, splice donors, start losses, stop gains and stop losses. The predicted moderate impact variants were conservative in-frame deletions, conservative in-frame insertions, disruptive in-frame deletions, disruptive in-frame insertions, missense variants and upstream gene variants. Low impact variants included initiator codon variants, splice region variants, stop retained variants and synonymous variants. Modifier variants included were downstream gene variants, intergenic region, intragenic variants, intron variants and non-coding transcript variants.
The bulk of the predicted effects were outside the coding regions, in downstream, intergenic, intron and upstream regions. A total of 136,959 effects were predicted to impact on exons, 13,881 on genes and 30,148 on coding and non-coding transcripts. Within the splice sites, 266 effects were predicted on the splice site acceptor region, 301 on the splice site donor and 4518 within the splice site region.
A GO enrichment analysis showed that genes that were involved in the following biological processes were overrepresented (p > 0.05) in chromosome regions with high micro variant densities (Figure 7): transmembrane transport, carbohydrate metabolism, phosphorylation, proteolysis, eisosome assembly, secondary metabolism, oxidation-reduction process, nucleoside metabolism, transcription, DNA-templated and regulation of nitrogen compound metabolism. These overrepresented genes are localised within the integral component of membrane, the extracellular space and the eisosome. Overrepresented molecular functions within this gene set were oxidoreductase activity, transcription factor activity, sequence-specific DNA binding, hydrolase activity, acting on glycosyl bonds, hydrolase activity, activity on carbon-nitrogen (but not peptide) bonds, peptidase activity, acting on L-amino acid peptides, zinc ion binding, ADP binding and transmembrane transporter.
Genes that were overrepresented in gene sets affected by high impact micro variants included the following biological processes: transmembrane transport, regulation of transcription from RNA polymerase II promotor, RNA phosphodiester bond hydrolysis, eisosome assembly, cell adhesion and phosphorylation (Figure 8). The overrepresented cellular components were an integral part of the membrane and eisosome. Overrepresented molecular functions included oxidoreductase activity, zinc ion binding, calcium ion binding, ATP binding, RNA polymerase II transcription factor activity, sequence-specific DNA binding, transmembrane transporter activity and ribonuclease activity.
Protein classes that were overrepresented in regions affected by moderate impact micro variants (Figure 9) included these biological processes: transmembrane transport, carbohydrate metabolism, regulation of transcription from RNA polymerase II promoter and phosphorylation. Molecular functions that were overrepresented in genes affected by moderate variants were: oxidoreductase activity, hydrolase activity, acting on glycosyl bonds, DNA binding, hydrolase activity, acting on carbo-nitrogen (but not peptide) bonds, zinc ion binding, ATP binding, RNA polymerase II transcription factor activity and sequence-specific DNA binding.

3.4. Synteny and Pangenome Analyses

Using Spine, a nonredundant pangenome of 50,076,541 bp was built from the seven genomes that were used in this study. WebAUGUSTUS predicted a total of 15,099 complete genes from the pangenome (the pangenome was 96.7% complete according to BUSCO with 76 fragmented BUSCOs and 47 missing BUSCO genes). The percentage completeness of the pangenome is lower than the individual genomes of the seven isolates, which is indicative of the fragmentation of the pangenome. Analyses of the combined proteins that were annotated from the seven isolates indicated that there were 18 missing BUSCO genes based on the “Sordariomyceta” database. BLASTp analyses of the 18 missing BUSCOs revealed that only four of the proteins had top hits to Fusarium species, suggesting that these missing genes may not be common among Fusarium species. The total combined number of annotated proteins from the seven genomes was 98,484, of which 99.2% were clustered into orthogroups [72] with an average of 7.1 genes per orthogroup. Sequences from all isolates were present in 12,424 orthogroups, of which 11,712 represented single copy gene families. A combined total of 765 genes were not assigned to any orthogroup. These were mainly the isolate-specific genes based on the sample that was used in this study.
Whole genome alignments using Spine resulted in a backbone (core) of 42,260,189 bp that was present with a nucleotide similarity greater than 85% in all seven of the genomes (Table 7). This core comprised an average of 93% of all the seven genomes used in this study and had a GC content of 47.2%. A total of 13 282 complete genes were predicted by WebAUGUSTUS in the core genome. A comparison against BUSCO’s “Sordariomyceta” databased indicated that the core genome was 95.6% complete. The BLASTp analysis of the core proteins, excluding hits on F. circinatum, showed 13,182 proteins had top hits to Fusarium species, 48 proteins had top hits to non-Fusarium species and 52 predicted proteins had no blast hits to the NCBI database (Supplementary File S6). Non-Fusarium top hits included species from the genera Neonectria, Trichoderma, Exophlala and Aspergillus. Biological processes that were overrepresented within the set of 48 non-Fusarium proteins were regulation of transcription by RNA polymerase II, methylation and phosphorylation. Overrepresented molecular functions within the 48 proteins were DNA-binding transcription factor activity, RNA polymerase II specific and zinc ion binding.
An average of 7% of the genome sequence across the seven isolates was predicted to constitute the accessory genomic elements (Supplementary File S7). These were distributed across chromosome lengths, with chromosomes 1 to 11 showing higher densities in the sub-telomeric regions, while chromosome 12 and the potential additional chromosome (UC > 500 Kb) had relatively higher densities across full chromosome lengths (Supplementary Figure S1). The number of complete genes in the accessory genomic elements of the isolates was 58,881, of which 64 were complete BUSCOs. The combined accessory genomic elements from all the isolates amounted to a total of 22,175,551 bp with an average GC content of 44.8%. ClustAGE grouped sequences that were greater than 200 bp into 2023 unique bins of similar sequences.
A total of 632 genes were annotated on sequences that were only present in one of the examined genomes. Of these, 596 had best BLAST hits in other Fusarium species, 15 had similarity in non-Fusarium species and 20 genes returned no significant result using BLAST. Isolate CMWF1803 had the highest number of such uniquely present genes (240) while none were found in isolate UG10. Isolates CMWF560, CMWF567, FSP34, KS17 and UG27 had 148, 143, 57, 40 and 3 uniquely present genes, respectively. From the sequences that were absent in only one of the genomes examined, a total of 466 genes were annotated. Of these, 445 had best BLAST hits to other Fusarium species and 21 had best BLAST hits to non-Fusarium species. Isolate FSP34 had the highest number of uniquely absent genes (150) followed by KS17 (140) while none were found in isolate UG27. Isolates CMWF560, CMWF567, CMWF1803 and UG10 had 72, 47, 21 and 36 uniquely absent genes, respectively.
A total of 18 biological processes were overrepresented in the uniquely present proteins. These included tRNA metabolism, chromosome segregation, cellular protein modifications, ribonucleoprotein complex biogenesis, anatomical structure formation involved in morphogenesis, nucleocytoplasmic transport, chromosome organization, cell differentiation, nucleocytoplasmic transport, translation, DNA metabolism, protein transport, sulphur compound metabolism, response to stress, cofactor metabolism and cellular catabolism. Dominant molecular functions in this set were cytoskeletal protein binding, structural constituent of ribosome, RNA binding, phosphatase activity, nuclease activity, GTPase activity, protein binding, bridging and ubiquitin-like protein binding. Overrepresented cellular components in this set were macromolecular complex, cytoskeleton, nuclear chromosome, mitochondrion, Golgi apparatus, ribosome, nuclear envelope and extracellular matrix.
Among the uniquely absent proteins, the following biological processes were overrepresented: mitotic cell cycle, tRNA metabolism, reproduction, anatomical structure morphogenesis, chromosome segregation, cellular protein modification process and organelle organization. Molecular functions that were overrepresented in this set included peptidase activity, nuclease activity and phosphatase activity. The dominant cellular components were macromolecular complex, nuclear chromosome, extracellular matrix and endomembrane system.

4. Discussion

In this study, we used multi-sample variant calling to develop a catalogue of intra-species SVs for the pine pitch canker fungus, F. circinatum. This type of resource is relatively uncommon for fungi, as most previous studies explored genomic variation at the inter-species level [2,5]. These precious comparative genomics studies have provided insights into the biology of closely related species and improved our understanding of how they evolve [5,73]. By contrast, studies on the intra-species genetic variation provide insights into how individuals of the same species accumulate genetic differences and how these affect their adaptability [14,74,75]. As genomes of many isolates of the same species become available it will become routine for whole genome comparative studies to be conducted and enable phenotypic traits to be associated with genotypic variations.
To facilitate the compilation of our catalogue of SVs, we first had to ensure that we had access to high-quality reference assemblies. Most variant calling platforms use a reference genome to identify polymorphisms in other genomes of interest [12,63]. However, considering the broad structural variation that exists within fungal genomes [76], reference genomes can be limiting in variant calling if the assembly is incomplete or if it has assembly errors, which can lead to mapping errors that distort variant calling [77]. Indeed, it is widely recognized that the availability of good reference genomes against which comparisons can be made represent one of the main limitations of efficient and accurate in variant calling [78]. The current genome assembly for F. circinatum isolate FSP34 is near-complete, containing 12 chromosome scaffolds and the mitochondrion scaffold [57]. This isolate has also been used in various previous studies [46,74,75,79] and has emerged as the “model” strain for F. circinatum genomics studies.
In addition to the FSP34 reference genome, we also report here the high-quality genome assemblies for five additional F. circinatum strains. They have different origins, with three (CMWF560, CMWF567 and CMWF1803) originating from pitch canker-affected Pinus spp. tissue that was collected in Mexico [14] and two (UG10 and UG27) that were isolated from pitch canker-affected P. greggii tissue in South Africa [17]. The strains also have different mating types. Fusarium circinatum is heterothallic [80], with three of the isolates (CMWF560, CMWF567 and CMWF1803) being MAT 1–1 and two (UG10 and UG27) MAT 1–2. The five additional genomes were also assembled to near completion, with all 12 of the expected chromosome scaffolds. The 12th chromosome is the smallest and apparently dispensable [2,26,81].
In two of our new genome assemblies we detected a scaffold that likely represents an additional chromosome. These two chromosome-sized scaffolds were found among the uncharacterized scaffolds in the CMWF560 and CMWF1803 sequence data. Subsequent analyses confirmed that they were found only in isolates CMWF560 and CMWF1803, but not in the other five F. circinatum isolates, including FSP34. Chromosome number variation has been reported within species of the FFSC, where at least two dispensable or accessory chromosomes have been visualised [82]. We thus propose that the uncharacterised scaffolds identified here are an additional accessory chromosome, which we call chromosome 13. The 13th chromosome in CMWF1803 was the larger of the two at 1 045 806 bp, which is bigger than chromosome 12 assembled across all the isolates. The overall genome sizes of isolates CMWF560 and CMWF1803 were generally bigger than the rest of the isolates that did not have sequences resembling the additional chromosome.
Our results showed that micro variants were unevenly distributed across the lengths of chromosomes 1–10. In these chromosomes, we observed higher densities over the first 1Mb of chromosome ends, while they were more sparsely distributed in the middle regions of the chromosomes. This distribution pattern is consistent with previous reports of high variant density on sub-telomeric regions [31,32], which are widely recognized as hot spots for genome plasticity in eukaryotes. This is because telomeres accumulate variants during replication as they replicate differently from the inner regions of chromosomes [83]. Further, sub-telomeric regions are prone to recombination and are thus likely to accumulate mutations [84]. In contrast to chromosomes 1–10, chromosomes 11 and 12 generally had high micro variant densities across full chromosome lengths. The sequence and size variability of chromosome 12 [57,85], and the presence of a large reciprocal translocation and an inversion in chromosome 11 [46,50] make chromosomes 11 and 12 the most variable of the chromosomes within the FFSC [50]. This is evidenced in our results as the higher variant density across the full chromosome lengths.
Copy number variants (insertions, deletions and duplications) ranging from SVs as small as 30bps to large ones of >100 kbs were detected in this study. These types of variants are known to play a major role in fungal adaptation [29,86]. Duplications can influence the expression levels of target regions, e.g., duplication of regions containing biosynthetic genes can increase production of the relevant compounds [81,87,88], or duplication of certain genes can increase resistance to toxic compounds [89]. In this study, we identified a gene encoding an RNA binding protein with helicase activity arising by duplication. Furthermore, insertions can drive the accumulation of novel genetic traits, while deletions highlight genomic regions that might be dispensable as pathogenic fungi often shed genomic regions to avoid detection by hosts [6]. Therefore, the copy number variants that were identified in this study provide a robust starting point for further research into the biology of the pitch canker fungus.
Structural variants that impact genes can lead to functional variation among individuals and influence phenotypic traits. We used SnpEff [59] to access the significance of the identified variants based on gene models from the WebAUGUSTUS annotations [55]. Although the accuracy of our gene models still needs to be verified, the identified putative effects provide a basis for future association studies using these isolates. The most prevalent high impact variant effect was gene fusion. Clustering of genes involved in the same metabolic pathway, biological process or structural complex is a well-known phenomenon in fungi [90]. This has a beneficial effect because it allows for the physical coupling of proteins that are functionally related. Inversion was also a common high impact macro structural variant effect. Inversions are indicative of genes whose transcriptional orientation along chromosomes is different [91]. Frameshift mutations are indels within the coding regions of genes that alter the reading frame, resulting in alteration of the downstream codons. In many cases, the deletion or insertion results in truncated gene products [92]. Splice site mutations would result in abnormal splicing leading to base changes in the processed mRNA [93,94]. The phenotypic effect of all these mutations on the different isolates of F. circinatum needs to be investigated. GO enrichment analyses showed that genes that were involved in transcription, DNA binding, transmembrane transport, oxidoreductase activity, hydrolase activity, acting on glycosyl bonds, ion binding, metal binding and cation binding were overrepresented in regions that were affected by high and moderate impact variants. These classes of genes are mainly involved with proteins that are needed by the fungus when interacting with its environment. The presence of variants in these gene classes highlights the pathogen’s propensity to adapt to its environment.
We took a sequence-centric approach [95] in building the pangenome of F. circinatum. We characterized the core, accessory and pangenomes based on nucleotide sequence rather than only the protein-coding portion of the genomes. This approach ensured the inclusion of noncoding genomic elements in the core, accessory and pangenomes. This methodology allowed us to avoid the bias and errors of gene prediction models in the identification of genomic elements that are conserved in F. circinatum. Our BLAST analysis showed that within the core genome there were some proteins which did not return significant hits to closely related Fusarium species, but rather to other, distantly related species. These genomic elements may have been acquired by F. circinatum from other species through horizontal gene transfer [96] and have been subsequently maintained, possibly as they have provided some advantage to the species. Gene gains can also be attributable to de novo gene emergence from non-coding DNA. There were also some proteins that did not return any BLAST hits. These were mainly proteins of unknown function.
These orphan genes which lack homologues in other lineages can also arise through duplications followed by diversification [96]. The proteins that are unique to F. circinatum and are conserved in all the isolates could be involved in lineage-specific adaptations and are potential targets for species-specific pathogen management and control strategies.
Within the accessory genome, isolate-specific open reading frames that were only present in one isolate and not the others (uniquely present) also showed different patterns of possible origins. Predicted proteins that had BLAST hits with closely related Fusarium species could indicate genomic elements that were maintained in some isolates but dispensed in others. These genomic regions could have arisen through high impact variants which include deletions, loss of function mutations (such as frame shifts) and early stop codons, resulting in truncated proteins. Even if truncated proteins are expressed in different isolates, they are less likely to have exactly the same function as full-length proteins. Therefore, the functional variation of truncated proteins could have phenotypic effects on different isolates. Isolate-specific sequences with best BLAST hits to other non-Fusarium species could be indicative of recently gained genomic elements that have not been fixed in F. circinatum populations. The accessory genome elements also had a reduced GC content, in comparison to the core genome. There were more accessory elements mapping to chromosomal regions that also had a high SNP density. Accessory genomic elements were associated with AT rich and highly variable genomic regions.

5. Conclusions

The genomic structural variants that were identified in F. circinatum ranged from SNPs to chromosome scale SVs, including insertions, deletions and inversions. It is possible that some of these variations are neutral with little or no impact on the biology of this pine pathogen. However, it is also conceivable some of the SVs that are described in this study have profound effects on the biology, evolution and niche adaptation of this pine pathogen. The genetic variations could then impact the disease management and control strategies that are used in nurseries and plantations, as these strategies need to be applicable to strains of the pathogen notwithstanding variations between the strains. Therefore, this study provides a resource for future association studies within F. circinatum. Progress in understanding the exact impact of these genetic variations within F. circinatum will also benefit our understanding of the biology of the pitch canker fungus.

Supplementary Materials

The following supporting information can be downloaded at: https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/jof8070657/s1, Figure S1: Distribution of accessory genome elements across chromosomes of isolate CMW1803. Accessory genome elements are generally sparsely distributed within the middle parts of the chromosomes while the chromosome arms show a high density; File S1: VCF file showing structural variants (SVs) identified in F. circinatum. Multi-sample variant calling was carried out using Sniffles. A total of 1828 SVs ranging in size from 30 bp to >10,000 bp were identified. These include 1717 copy number variants (990 deletions, 719 insertions, 8 duplications) and 111 inversions; File S2: Structural variant annotation output file from SnpEff; File S3: A total of 476,229 functional effects for the collection of SVs were identified. List of proteins annotated in the regions that were predicted to have deletions, in regions displaying insertions, in inversion areas and in areas displaying duplications; File S4: VCF file showing micro variants that were identified in F. circinatum following a multi-sample variant calling using Freebayes. A total of 397,704 polymorphic sites were identified among the genomes examined; File S5: Micro variant annotation output file from SnpEff. A total of 2,719,213 putative effects were assigned to the identified micro variants; File S6: BLASTp analysis output of the core proteins, excluding hits on F. circinatum. A total of 13,182 proteins had top hits to Fusarium species, 48 proteins had top hits to non-Fusarium species and 52 predicted proteins had no blast hits to the NCBI database; File S7: Accessory genomic elements identified in F. circinatum.

Author Contributions

Conceptualization, A.M.K., E.T.S., M.N.M. and B.D.W.; methodology, M.N.M., L.D.V. and A.M.K.; software, M.N.M., S.v.W., Q.C.S. and A.M.K.; validation, M.N.M. and A.M.K.; formal analysis, M.N.M.; investigation, M.N.M.; resources, B.D.W.; data curation, L.D.V. and M.N.M.; writing—original draft preparation, M.N.M.; writing—review and editing, A.M.K., L.D.V., T.A.D., Q.C.S., S.v.W., M.N.M. and B.D.W.; visualization, M.N.M. and A.M.K.; supervision, E.T.S., A.M.K., L.D.V. and B.D.W.; project administration, M.N.M., E.T.S. and B.D.W.; funding acquisition, B.D.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the South African Department of Science and Innovation’s South African Research Chair Initiative (SARChI grant number 98353) through the research chair of Prof. BD Wingfield. We would like to acknowledge the DSI-NRF Centre of Excellence in Plant Health Biotechnology (CPHB grant number 40945) at the Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Whole Genome Shotgun project for Fusarium circinatum CMWF1803 has been deposited at DDBJ/ENA/GenBank under the accession JAEHFH000000000. The version described in this paper is version JAEHFH010000000. The Whole Genome Shotgun project for Fusarium circinatum CMWF560 has been deposited at DDBJ/ENA/GenBank under the accession JAEHFI000000000. The version described in this paper is version JAEHFI010000000. The Whole Genome Shotgun project for Fusarium circinatum CMWF567 has been deposited at DDBJ/ENA/GenBank under the accession JADZLS000000000. The version described in this paper is version JADZLS010000000. The Whole Genome Shotgun project for Fusarium circinatum UG27 has been deposited at DDBJ/ENA/GenBank under the accession JAELVK000000000. The version described in this paper is version JAELVK010000000. The Whole Genome Shotgun project for Fusarium circinatum UG10 has been deposited at DDBJ/ENA/GenBank under the accession JAGJRQ000000000. The version described in this paper is version JAGJRQ010000000.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Schirawski, J.; Mannhaupt, G.; Münch, K.; Brefort, T.; Schipper, K.; Doehlemann, G.; Di Stasio, M.; Rössel, N.; Mendoza-Mendoza, A.; Pester, D. Pathogenicity determinants in smut fungi revealed by genome comparison. Science 2010, 330, 1546–1548. [Google Scholar] [CrossRef] [PubMed]
  2. Ma, L.-J.; Van Der Does, H.C.; Borkovich, K.A.; Coleman, J.J.; Daboussi, M.-J.; Di Pietro, A.; Dufresne, M.; Freitag, M.; Grabherr, M.; Henrissat, B. Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature 2010, 464, 367. [Google Scholar] [CrossRef] [PubMed]
  3. Schmidt, S.M.; Panstruga, R. Pathogenomics of fungal plant parasites: What have we learnt about pathogenesis? Curr. Opin. Plant Biol. 2011, 14, 392–399. [Google Scholar] [CrossRef] [PubMed]
  4. Croll, D.; McDonald, B.A. The accessory genome as a cradle for adaptive evolution in pathogens. PLoS Pathog. 2012, 8, e1002608. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Gardiner, D.M.; McDonald, M.C.; Covarelli, L.; Solomon, P.S.; Rusu, A.G.; Marshall, M.; Kazan, K.; Chakraborty, S.; McDonald, B.A.; Manners, J.M. Comparative pathogenomics reveals horizontally acquired novel virulence genes in fungi infecting cereal hosts. PLoS Pathog. 2012, 8, e1002952. [Google Scholar] [CrossRef] [Green Version]
  6. Möller, M.; Stukenbrock, E.H. Evolution and genome architecture in fungal plant pathogens. Nat. Rev. Microbiol. 2017, 15, 756. [Google Scholar] [CrossRef]
  7. De Wit, P.J.; Mehrabi, R.; Van den Burg, H.A.; Stergiopoulos, I. Fungal effector proteins: Past, present and future. Mol. Plant Pathol. 2009, 10, 735–747. [Google Scholar] [CrossRef]
  8. Perez-Nadales, E.; Nogueira, M.F.A.; Baldin, C.; Castanheira, S.; El Ghalid, M.; Grund, E.; Lengeler, K.; Marchegiani, E.; Mehrotra, P.V.; Moretti, M. Fungal model systems and the elucidation of pathogenicity determinants. Fungal Genet. Biol. 2014, 70, 42–67. [Google Scholar] [CrossRef] [Green Version]
  9. Parfrey, L.W.; Lahr, D.J.; Katz, L.A. The dynamic nature of eukaryotic genomes. Mol. Biol. Evol. 2008, 25, 787–794. [Google Scholar] [CrossRef]
  10. McCarthy, C.G.; Fitzpatrick, D.A. Pan-genome analyses of model fungal species. Microb. Genom. 2019, 5, e000243. [Google Scholar] [CrossRef]
  11. Eid, J.; Fehr, A.; Gray, J.; Luong, K.; Lyle, J.; Otto, G.; Peluso, P.; Rank, D.; Baybayan, P.; Bettman, B. Real-time DNA sequencing from single polymerase molecules. Science 2009, 323, 133–138. [Google Scholar] [CrossRef] [PubMed]
  12. Sedlazeck, F.J.; Rescheneder, P.; Smolka, M.; Fang, H.; Nattestad, M.; von Haeseler, A.; Schatz, M.C. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 2018, 15, 461–468. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  13. Quail, M.A.; Smith, M.; Coupland, P.; Otto, T.D.; Harris, S.R.; Connor, T.R.; Bertoni, A.; Swerdlow, H.P.; Gu, Y. A tale of three next generation sequencing platforms: Comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genom. 2012, 13, 341. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Wikler, K.; Gordon, T.R. An initial assessment of genetic relationships among populations of Fusarium circinatum in different parts of the world. Can. J. Bot. 2000, 78, 709–717. [Google Scholar] [CrossRef]
  15. Fru, F.F.; Steenkamp, E.T.; Wingfield, M.J.; Roux, J. High genetic diversity of Fusarium circinatum associated with the first outbreak of pitch canker on Pinus patula in South Africa. South. For. J. For. Sci. 2019, 81, 69–78. [Google Scholar] [CrossRef]
  16. Wikler, K.; Gordon, T.R.; Clark, S.L.; Wingfield, M.J.; Britz, H. Potential for outcrossing in an apparently asexual population of Fusarium circinatum, the causal agent of pitch canker disease. Mycologia 2000, 92, 1085–1090. [Google Scholar] [CrossRef]
  17. Santana, Q.C.; Coetzee, M.P.A.; Wingfield, B.D.; Wingfield, M.J.; Steenkamp, E.T. Nursery-linked plantation outbreaks and evidence for multiple introductions of the pitch canker pathogen Fusarium circinatum into South Africa. Plant Pathol. 2016, 65, 357–368. [Google Scholar] [CrossRef] [Green Version]
  18. Pérez-Sierra, A.; Landeras, E.; León, M.; Berbegal, M.; García-Jiménez, J.; Armengol, J. Characterization of Fusarium circinatum from Pinus spp. in northern Spain. Mycol. Res. 2007, 111, 832–839. [Google Scholar] [CrossRef]
  19. Wöstemeyer, J.; Kreibich, A. Repetitive DNA elements in fungi (Mycota): Impact on genomic architecture and evolution. Curr. Genet. 2002, 41, 189–198. [Google Scholar] [CrossRef]
  20. Kempken, F.; Kück, U. Transposons in filamentous fungi—Facts and perspectives. BioEssays 1998, 20, 652–659. [Google Scholar] [CrossRef]
  21. Soucy, S.M.; Huang, J.; Gogarten, J.P. Horizontal gene transfer: Building the web of life. Nat. Rev. Genet. 2015, 16, 472–482. [Google Scholar] [CrossRef] [PubMed]
  22. Gladyshev, E. Repeat-induced point mutation and other genome defense mechanisms in fungi. Microbiol. Spectr. 2017, 5, 4. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  23. Amselem, J.; Lebrun, M.-H.; Quesneville, H. Whole genome comparative analysis of transposable elements provides new insight into mechanisms of their inactivation in fungal genomes. BMC Genom. 2015, 16, 141. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Van Wyk, S.; Wingfield, B.D.; De Vos, L.; van der Merwe, N.A.; Santana, Q.C.; Steenkamp, E.T. Repeat-induced point mutations drive divergence between Fusarium circinatum and its close relatives. Pathogens 2019, 8, 298. [Google Scholar] [CrossRef] [Green Version]
  25. Steenkamp, E.T.; Wingfield, M.J.; McTaggart, A.R.; Wingfield, B.D. Fungal species and their boundaries matter–Definitions, mechanisms and practical implications. Fungal Biol. Rev. 2018, 32, 104–116. [Google Scholar] [CrossRef]
  26. Van der Nest, M.A.; Beirn, L.A.; Crouch, J.A.; Demers, J.E.; De Beer, Z.W.; De Vos, L.; Gordon, T.R.; Moncalvo, J.-M.; Naidoo, K.; Sanchez-Ramirez, S. Draft genomes of Amanita jacksonii, Ceratocystis albifundus, Fusarium circinatum, Huntiella omanensis, Leptographium procerum, Rutstroemia sydowiana, and Sclerotinia echinophila. IMA Fungus 2014, 5, 472–485. [Google Scholar] [CrossRef] [Green Version]
  27. Slinski, S.; Kirkpatrick, S.; Gordon, T. Inheritance of virulence in Fusarium circinatum, the cause of pitch canker in pines. Plant Pathol. 2016, 65, 1292–1296. [Google Scholar] [CrossRef] [Green Version]
  28. Jeffares, D.C.; Jolly, C.; Hoti, M.; Speed, D.; Shaw, L.; Rallis, C.; Balloux, F.; Dessimoz, C.; Bähler, J.; Sedlazeck, F.J. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 2017, 8, 14061. [Google Scholar] [CrossRef] [Green Version]
  29. Steenwyk, J.; Rokas, A. Extensive copy number variation in fermentation-related genes among Saccharomyces cerevisiae wine strains. G3 Genes Genomes Genet. 2017, 7, 1475–1485. [Google Scholar] [CrossRef] [Green Version]
  30. Syvänen, A.-C. Accessing genetic variation: Genotyping single nucleotide polymorphisms. Nat. Rev. Genet. 2001, 2, 930–942. [Google Scholar] [CrossRef]
  31. Suárez-Vega, A.; Gutiérrez-Gil, B.; Klopp, C.; Tosser-Klopp, G.; Arranz, J.J. Variant discovery in the sheep milk transcriptome using RNA sequencing. BMC Genom. 2017, 18, 170. [Google Scholar] [CrossRef] [PubMed]
  32. Das, A.; Panitz, F.; Gregersen, V.R.; Bendixen, C.; Holm, L.-E. Deep sequencing of Danish Holstein dairy cattle for variant detection and insight into potential loss-of-function variants in protein coding genes. BMC Genom. 2015, 16, 1043. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Daboussi, M.-J.; Capy, P. Transposable elements in filamentous fungi. Annu. Rev. Microbiol. 2003, 57, 275–299. [Google Scholar] [CrossRef] [PubMed]
  34. Katju, V.; Bergthorsson, U. Copy-number changes in evolution: Rates, fitness effects and adaptive significance. Front. Genet. 2013, 4, 273. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Chuma, I.; Isobe, C.; Hotta, Y.; Ibaragi, K.; Futamata, N.; Kusaba, M.; Yoshida, K.; Terauchi, R.; Fujita, Y.; Nakayashiki, H. Multiple translocation of the AVR-Pita effector gene among chromosomes of the rice blast fungus Magnaporthe oryzae and related species. PLoS Pathog. 2011, 7, e1002147. [Google Scholar] [CrossRef] [Green Version]
  36. Kirkpatrick, M. How and why chromosome inversions evolve. PLoS Biol. 2010, 8, e1000501. [Google Scholar] [CrossRef] [Green Version]
  37. Kirkpatrick, M.; Barton, N. Chromosome inversions, local adaptation and speciation. Genetics 2006, 173, 419–434. [Google Scholar] [CrossRef] [Green Version]
  38. De Vos, L.; Myburg, A.A.; Wingfield, M.J.; Desjardins, A.E.; Gordon, T.; Wingfield, B.D. Complete genetic linkage maps from an interspecific cross between Fusarium circinatum and Fusarium subglutinans. Fungal Genet. Biol. 2007, 44, 701–714. [Google Scholar] [CrossRef] [Green Version]
  39. Steenkamp, E.; Makhari, O.; Coutinho, T.; Wingfield, B.; Wingfield, M. Evidence for a new introduction of the pitch canker fungus F usarium circinatum in S outh Africa. Plant Pathol. 2014, 63, 530–538. [Google Scholar] [CrossRef]
  40. Murray, M.; Thompson, W.F. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 1980, 8, 4321–4326. [Google Scholar] [CrossRef] [Green Version]
  41. Koren, S.; Walenz, B.P.; Berlin, K.; Miller, J.R.; Bergman, N.H.; Phillippy, A.M. Canu: Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017, 27, 722–736. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  42. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  43. Chin, C.-S.; Alexander, D.H.; Marks, P.; Klammer, A.A.; Drake, J.; Heiner, C.; Clum, A.; Copeland, A.; Huddleston, J.; Eichler, E.E. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 2013, 10, 563. [Google Scholar] [CrossRef] [PubMed]
  44. Loman, N.J.; Quick, J.; Simpson, J.T. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods 2015, 12, 733. [Google Scholar] [CrossRef]
  45. Walker, B.J.; Abeel, T.; Shea, T.; Priest, M.; Abouelliel, A.; Sakthikumar, S.; Cuomo, C.A.; Zeng, Q.; Wortman, J.; Young, S.K. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 2014, 9, e112963. [Google Scholar] [CrossRef]
  46. De Vos, L.; Steenkamp, E.T.; Martin, S.H.; Santana, Q.C.; Fourie, G.; van der Merwe, N.A.; Wingfield, M.J.; Wingfield, B.D. Genome-wide macrosynteny among Fusarium species in the Gibberella fujikuroi complex revealed by amplified fragment length polymorphisms. PLoS ONE 2014, 9, e114682. [Google Scholar] [CrossRef] [Green Version]
  47. Harris, R. Improved Pairwise Alignment of Genomic DNA. Ph.D. Thesis, The Pennsylvania State University, State College, PA, USA, 2007. [Google Scholar]
  48. Kearse, M.; Moir, R.; Wilson, A.; Stones-Havas, S.; Cheung, M.; Sturrock, S.; Buxton, S.; Cooper, A.; Markowitz, S.; Duran, C. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 2012, 28, 1647–1649. [Google Scholar] [CrossRef]
  49. Wiemann, P.; Sieber, C.M.; Von Bargen, K.W.; Studt, L.; Niehaus, E.-M.; Espino, J.J.; Huß, K.; Michielse, C.B.; Albermann, S.; Wagner, D. Deciphering the cryptic genome: Genome-wide analyses of the rice pathogen Fusarium fujikuroi reveal complex regulation of secondary metabolism and novel metabolites. PLoS Pathog. 2013, 9, e1003475. [Google Scholar] [CrossRef] [Green Version]
  50. Wingfield, B.D.; Barnes, I.; de Beer, Z.W.; De Vos, L.; Duong, T.A.; Kanzi, A.M.; Naidoo, K.; Nguyen, H.D.; Santana, Q.C.; Sayari, M. Draft genome sequences of Ceratocystis eucalypticola, Chrysoporthe cubensis, C. deuterocubensis, Davidsoniella virescens, Fusarium temperatum, Graphilbum fragrans, Penicillium nordicum, and Thielaviopsis musarum. IMA Fungus 2015, 6, 493–506. [Google Scholar] [CrossRef]
  51. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 2013, arXiv:1303.3997. [Google Scholar]
  52. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R.; 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics 2009, 25, 2078–2079. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  53. Barnett, D.W.; Garrison, E.K.; Quinlan, A.R.; Strömberg, M.P.; Marth, G.T. BamTools: A C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 2011, 27, 1691–1692. [Google Scholar] [CrossRef] [PubMed]
  54. Simão, F.A.; Waterhouse, R.M.; Ioannidis, P.; Kriventseva, E.V.; Zdobnov, E.M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 2015, 31, 3210–3212. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  55. Hoff, K.J.; Stanke, M. WebAUGUSTUS—A web service for training AUGUSTUS and predicting genes in eukaryotes. Nucleic Acids Res. 2013, 41, W123–W128. [Google Scholar] [CrossRef]
  56. Wingfield, B.D.; Steenkamp, E.T.; Santana, Q.C.; Coetzee, M.; Bam, S.; Barnes, I.; Beukes, C.W.; Yin Chan, W.; De Vos, L.; Fourie, G. First fungal genome sequence from Africa: A preliminary analysis. S. Afr. J. Sci. 2012, 108, 01–09. [Google Scholar] [CrossRef] [Green Version]
  57. Wingfield, B.D.; Liu, M.; Nguyen, H.D.; Lane, F.A.; Morgan, S.W.; De Vos, L.; Wilken, P.M.; Duong, T.A.; Aylward, J.; Coetzee, M.P. Nine draft genome sequences of Claviceps purpurea s. lat., including C. arundinis, C. humidiphila, and C. cf. spartinae, pseudomolecules for the pitch canker pathogen Fusarium circinatum, draft genome of Davidsoniella eucalypti, Grosmannia galeiformis, Quambalaria eucalypti, and Teratosphaeria destructans. IMA Fungus 2018, 9, 401. [Google Scholar]
  58. Robinson, J.T.; Thorvaldsdóttir, H.; Winckler, W.; Guttman, M.; Lander, E.S.; Getz, G.; Mesirov, J.P. Integrative genomics viewer. Nat. Biotechnol. 2011, 29, 24. [Google Scholar] [CrossRef] [Green Version]
  59. Cingolani, P.; Platts, A.; Wang, L.L.; Coon, M.; Nguyen, T.; Wang, L.; Land, S.J.; Lu, X.; Ruden, D.M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 2012, 6, 80–92. [Google Scholar] [CrossRef] [Green Version]
  60. Ruden, D.M.; Cingolani, P.; Patel, V.M.; Coon, M.; Nguyen, T.; Land, S.J.; Lu, X. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front. Genet. 2012, 3, 35. [Google Scholar]
  61. Conesa, A.; Götz, S.; García-Gómez, J.M.; Terol, J.; Talón, M.; Robles, M. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 2005, 21, 3674–3676. [Google Scholar] [CrossRef] [Green Version]
  62. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; Pham, S.; Prjibelski, A.D. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012, 19, 455–477. [Google Scholar] [CrossRef] [Green Version]
  63. Garrison, E.; Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv 2012, arXiv:1207.3907. [Google Scholar]
  64. Drillon, G.; Carbone, A.; Fischer, G. SynChro: A fast and easy tool to reconstruct and visualize synteny blocks along eukaryotic chromosomes. PLoS ONE 2014, 9, e92621. [Google Scholar] [CrossRef]
  65. Ozer, E.A.; Allen, J.P.; Hauser, A.R. Characterization of the core and accessory genomes of Pseudomonas aeruginosa using bioinformatic tools Spine and AGEnt. BMC Genom. 2014, 15, 737. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  66. Delcher, A.L.; Phillippy, A.; Carlton, J.; Salzberg, S.L. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002, 30, 2478–2483. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  67. Kurtz, S.; Phillippy, A.; Delcher, A.L.; Smoot, M.; Shumway, M.; Antonescu, C.; Salzberg, S.L. Versatile and open software for comparing large genomes. Genome Biol. 2004, 5, R12. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  68. Ozer, E.A. ClustAGE: A tool for clustering and distribution analysis of bacterial accessory genomic elements. BMC Bioinform. 2018, 19, 150. [Google Scholar] [CrossRef]
  69. Jankowsky, E. RNA helicases at work: Binding and rearranging. Trends Biochem. Sci. 2011, 36, 19–29. [Google Scholar] [CrossRef] [Green Version]
  70. Strandberg, A.K.; Salter, L.A. A comparison of methods for estimating the transition: Transversion ratio from DNA sequences. Mol. Phylogenet. Evol. 2004, 32, 495–503. [Google Scholar] [CrossRef]
  71. Yin, L.; Zhang, H.; Tang, Z.; Xu, J.; Yin, D.; Zhang, Z.; Yuan, X.; Zhu, M.; Zhao, S.; Li, X. rMVP: A Memory-efficient, Visualization-enhanced, and Parallel-accelerated tool for Genome-Wide Association Study. Genom. Proteom. Bioinform. 2020, 19, 619–628. [Google Scholar] [CrossRef]
  72. Emms, D.M.; Kelly, S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol. 2019, 20, 238. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  73. Walkowiak, S.; Rowland, O.; Rodrigue, N.; Subramaniam, R. Whole genome sequencing and comparative genomics of closely related Fusarium Head Blight fungi: Fusarium graminearum, F. meridionale and F. asiaticum. BMC Genom. 2016, 17, 1014. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  74. Maphosa, M.N.; Steenkamp, E.T.; Wingfield, B.D. Genome-based selection and characterization of Fusarium circinatum-specific sequences. G3 Gene Genomes Genet. 2016, 6, 631–639. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  75. Phasha, M.M.; Wingfield, B.D.; Coetzee, M.P.; Santana, Q.C.; Fourie, G.; Steenkamp, E.T. Architecture and distribution of introns in core genes of four Fusarium species. G3 Gene Genomes Genet. 2017, 7, 3809–3820. [Google Scholar] [CrossRef] [Green Version]
  76. Ellison, C.E.; Hall, C.; Kowbel, D.; Welch, J.; Brem, R.B.; Glass, N.; Taylor, J.W. Population genomics and local adaptation in wild isolates of a model microbial eukaryote. Proc. Natl. Acad. Sci. USA 2011, 108, 2831–2836. [Google Scholar] [CrossRef] [Green Version]
  77. Nielsen, R.; Paul, J.S.; Albrechtsen, A.; Song, Y.S. Genotype and SNP calling from next-generation sequencing data. Nat. Rev. Genet. 2011, 12, 443–451. [Google Scholar] [CrossRef]
  78. Olson, N.D.; Lund, S.P.; Colman, R.E.; Foster, J.T.; Sahl, J.W.; Schupp, J.M.; Keim, P.; Morrow, J.B.; Salit, M.L.; Zook, J.M. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front. Genet. 2015, 6, 235. [Google Scholar] [CrossRef] [Green Version]
  79. Santana, Q.C.; Coetzee, M.P.; Steenkamp, E.T.; Mlonyeni, O.X.; Hammond, G.N.; Wingfield, M.J.; Wingfield, B.D. Microsatellite discovery by deep sequencing of enriched genomic libraries. Biotechniques 2009, 46, 217–223. [Google Scholar] [CrossRef]
  80. Wallace, M.M.; Covert, S.F. Molecular Mating Type Assay for Fusarium circinatum. Appl. Environ. Microbiol. 2000, 66, 5506–5508. [Google Scholar] [CrossRef] [Green Version]
  81. Fierro, F.; García-Estrada, C.; Castillo, N.I.; Rodríguez, R.; Velasco-Conde, T.; Martín, J.-F. Transcriptional and bioinformatic analysis of the 56.8 kb DNA region amplified in tandem repeats containing the penicillin gene cluster in Penicillium chrysogenum. Fungal Genet. Biol. 2006, 43, 618–629. [Google Scholar] [CrossRef]
  82. Waalwijk, C.; Taga, M.; Zheng, S.-L.; Proctor, R.H.; Vaughan, M.M.; O’Donnell, K. Karyotype evolution in Fusarium. IMA Fungus 2018, 9, 13–33. [Google Scholar] [CrossRef] [PubMed]
  83. Zakian, V.A. Telomeres: Beginning to understand the end. Science 1995, 270, 1601–1607. [Google Scholar] [CrossRef] [PubMed]
  84. Cohn, M.; Liti, G.; Barton, D.B. Telomeres in fungi. In Comparative Genomics; Springer: Berlin/Heidelberg, Germany, 2006; pp. 101–130. [Google Scholar]
  85. Xu, J.-R.; Yan, K.; Dickman, M.B.; Leslie, J.F. Electrophoretic karyotypes distinguish the biological species of Gibberella fujikuroi (Fusarium section Liseola). Mol. Plant Microbe Interact. 1995, 8, 74–84. [Google Scholar] [CrossRef]
  86. Steenwyk, J.L.; Soghigian, J.S.; Perfect, J.R.; Gibbons, J.G. Copy number variation contributes to cryptic genetic variation in outbreak lineages of Cryptococcus gattii from the North American Pacific Northwest. BMC Genom. 2016, 17, 700. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  87. Barredo, J.L.; Díez, B.; Alvarez, E.; Martín, J.F. Large amplification of a 35-kb DNA fragment carrying two penicillin biosynthetic genes in high penicillin producing strains of Penicillium chrysogenum. Curr. Genet. 1989, 16, 453–459. [Google Scholar] [CrossRef] [PubMed]
  88. Smith, D.J.; Bull, J.H.; Edwards, J.; Turner, G. Amplification of the isopenicillin N synthetase gene in a strain of Penicillium chrysogenum producing high levels of penicillin. Mol. Gen. Genet. 1989, 216, 492–497. [Google Scholar] [CrossRef]
  89. Fogel, S.; Welch, J.W. Tandem gene amplification mediates copper resistance in yeast. Proc. Natl. Acad. Sci. USA 1982, 79, 5342–5346. [Google Scholar] [CrossRef] [Green Version]
  90. Marcotte, E.M.; Pellegrini, M.; Ng, H.-L.; Rice, D.W.; Yeates, T.O.; Eisenberg, D. Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285, 751–753. [Google Scholar] [CrossRef] [Green Version]
  91. Seoighe, C.; Federspiel, N.; Jones, T.; Hansen, N.; Bivolarovic, V.; Surzycki, R.; Tamse, R.; Komp, C.; Huizar, L.; Davis, R.W. Prevalence of small inversions in yeast gene order evolution. Proc. Natl. Acad. Sci. USA 2000, 97, 14433–14437. [Google Scholar] [CrossRef] [Green Version]
  92. Strauss, B.S. Frameshift mutation, microsatellites and mismatch repair. Mutat. Res. Rev. Mutat. Res. 1999, 437, 195–203. [Google Scholar] [CrossRef]
  93. Simmonds, J.; Scott, P.; Brinton, J.; Mestre, T.C.; Bush, M.; Del Blanco, A.; Dubcovsky, J.; Uauy, C. A splice acceptor site mutation in TaGW2-A1 increases thousand grain weight in tetraploid and hexaploid wheat through wider and longer grains. Theor. Appl. Genet. 2016, 129, 1099–1112. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  94. Fenn, J.; Boursnell, M.; Hitti, R.J.; Jenkins, C.A.; Terry, R.L.; Priestnall, S.L.; Kenny, P.J.; Mellersh, C.S.; Forman, O.P. Genome sequencing reveals a splice donor site mutation in the SNX14 gene associated with a novel cerebellar cortical degeneration in the Hungarian Vizsla dog breed. BMC Genet. 2016, 17, 123. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  95. Golicz, A.A.; Bayer, P.E.; Bhalla, P.L.; Batley, J.; Edwards, D. Pangenomics comes of age: From bacteria to plant and animal applications. Trends Genet. 2020, 36, 132–145. [Google Scholar] [CrossRef] [PubMed]
  96. Tautz, D.; Domazet-Lošo, T. The evolutionary origin of orphan genes. Nat. Rev. Genet. 2011, 12, 692–702. [Google Scholar] [CrossRef] [PubMed]
Figure 1. SynChro [64] generated chromosome painting, comparison of the isolates showing the uncharacterized scaffold > 500 Kbs in CMWF1803 and CMWF560. No similar sequences were found in the other isolates.
Figure 1. SynChro [64] generated chromosome painting, comparison of the isolates showing the uncharacterized scaffold > 500 Kbs in CMWF1803 and CMWF560. No similar sequences were found in the other isolates.
Jof 08 00657 g001
Figure 2. GO enrichment analysis for genes associated with deletions showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set and genes associated with deletions were used as the test set.
Figure 2. GO enrichment analysis for genes associated with deletions showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set and genes associated with deletions were used as the test set.
Jof 08 00657 g002
Figure 3. GO enrichment analysis for genes affected by insertions showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set and genes affected by insertions were used as the test set.
Figure 3. GO enrichment analysis for genes affected by insertions showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set and genes affected by insertions were used as the test set.
Jof 08 00657 g003
Figure 4. GO enrichment analysis for genes associated with inversions showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set and genes within inverted regions were used as the test set.
Figure 4. GO enrichment analysis for genes associated with inversions showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set and genes within inverted regions were used as the test set.
Jof 08 00657 g004
Figure 5. Protein classes coded by genes found in assembled sequences that did not map to the FSP34 genome assembly.
Figure 5. Protein classes coded by genes found in assembled sequences that did not map to the FSP34 genome assembly.
Jof 08 00657 g005
Figure 6. Micro variant density distribution across chromosome lengths of F. circinatum FSP34 relative to the other 6 genomes showing regions of higher variant density in chromosome arms and sparse distribution within the inner central parts of the chromosomes. The figure was generated using CM plot [71].
Figure 6. Micro variant density distribution across chromosome lengths of F. circinatum FSP34 relative to the other 6 genomes showing regions of higher variant density in chromosome arms and sparse distribution within the inner central parts of the chromosomes. The figure was generated using CM plot [71].
Jof 08 00657 g006
Figure 7. GO enrichment analysis for genes within high variant density chromosomal regions showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set, genes within high variant density regions were used as the test set.
Figure 7. GO enrichment analysis for genes within high variant density chromosomal regions showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set, genes within high variant density regions were used as the test set.
Jof 08 00657 g007
Figure 8. GO enrichment analysis of genes predicted to have high impact variants showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set and genes predicted to have high impact variants were used as the test set.
Figure 8. GO enrichment analysis of genes predicted to have high impact variants showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the reference set and genes predicted to have high impact variants were used as the test set.
Jof 08 00657 g008
Figure 9. GO enrichment analysis for genes predicted to have moderate impact variants showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the test set and the genes with moderate impact variants were the test set.
Figure 9. GO enrichment analysis for genes predicted to have moderate impact variants showing overrepresented GO terms, biological processes (a) and molecular functions (b). The whole genome annotations of the reference FSP34 were used as the test set and the genes with moderate impact variants were the test set.
Jof 08 00657 g009
Table 1. Summary of whole nuclear genome assembly statistics.
Table 1. Summary of whole nuclear genome assembly statistics.
IsolateCMWF560CMWF567CMW1803UG10UG27
Accession numberJAEHFI
000000000
JADZLS
000000000
JAEHFH
000000000
JAGJRQ
000000000
JAELVK
000000000
Genome size (bp)46,691,34345,984,42046,810,76344,774,96845,546,500
Genome coverage 135 (58)65 (56)54 (56)31 (60)43 (43)
N504,436,154 bp4,431,017 bp4,492,802 bp4,380,615 bp4,358,900 bp
N753,211,240 bp3,566,220 bp3,263,251 bp3,014,022 bp3,202,209 bp
L5055555
L7588888
% G+C46.7846.8747.0547.5046.83
BUSCO (%)99.099.099.198.999.1
Number of
chromosomes
1212121212
Uncharacterised contigs371271635
ORF14,17014,11614,38214,09413,987
1 Genome coverage for Pacbio sequencing, with Illumina sequencing indicated in brackets ().
Table 2. Size of the different scaffolds in each of the assemblies for the seven Fusarium circinatum isolates used in this study.
Table 2. Size of the different scaffolds in each of the assemblies for the seven Fusarium circinatum isolates used in this study.
Scaffold 1F. circinatum Isolate 2
FSP34CMWF560CMWF567CMWF1803UG10UG27KS17
Chr016,407,6896,519,5906,423,8206,503,3896,400,8226,430,7586,397,914
Chr025,066,1975,005,6065,040,0055,011,5584,859,6494,976,0764,709,326
Chr035,081,8885,037,8265,141,2895,085,5564,913,4445,257,4615,148,568
Chr044,313,1684,556,9454,479,6724,551,6274,383,1474,236,0594,401,926
Chr054,432,5534,436,3234,431,0174,492,9344,408,8604,425,5594,304,443
Chr064,301,8954,284,5084,281,3184,282,1504,261,2364,358,6794,219,930
Chr073,541,0543,578,5083,565,9773,555,3733,412,6933,591,9873,312,103
Chr083,172,9153,210,5693,718,3483,263,9243,015,0143,202,3613,066,990
Chr092,981,5442,920,6412,844,9922,857,3382,289,9252,952,1692,282,005
Chr102,698,8202,714,7372,642,7822,649,8702,413,6752,564,1762,483,521
Chr112,228,4202,291,7572,2497992,266,9312,087,5082,247,1102,291,537
Chr12525,065969,164857,395771,183680,337978,035870,680
UC > 500 Kb-536,197-1,045,802---
UC < 500 Kb257,344631,872308,006473,128560,738328,470339,343
1 Chr01-Chr12 corresponds to chromosomes 1–12. Data for all uncharacterized contigs (UC) for each genome were categorised into two, UC > 500 Kbs and UC < 500 Kbs. The total sizes of uncharacterized contigs < 500 Kbs were combined per genome. 2 Scaffold sizes are indicated in bp, and the absence of uncharacterized contigs > 500 Kb is indicated with.
Table 3. Summary statistics for number of reads obtained and mapped to each genome for the different sequencing platforms.
Table 3. Summary statistics for number of reads obtained and mapped to each genome for the different sequencing platforms.
Sequencing
Platform
IsolateTotal Number of Quality-Filtered ReadsNumber of Reads Mapped to the FSP34 Reference
Assembly (%)
IlluminaFSP347,840,0067,762,015 (99.0)
CMWF5609,471,0998,931,113 (94.0)
CMWF5679,482,3089,012,583 (95.1)
CMWF18039,860,1439,028,628 (91.6)
UG109,737,1819,394,412 (96.5)
UG279,743,6938,934,966 (91.7)
KS177,995,0467,228,461 (90.4)
PacBioCMWF560300,889281,873 (93.7)
CMWF567187,327181,372 (96.8)
CMWF1803194,164183,210 (94.4)
UG10256,808247,772 (96.5))
UG27357,305339,326 (95.0)
MinIONKS1795,51092,336 (96.7)
FSP34165,477164,130 (99.2)
Table 4. Micro variant and structural variant (SV) rate details per chromosome.
Table 4. Micro variant and structural variant (SV) rate details per chromosome.
ChromosomeMicro Variants 1SVs 2
Number of
Variants
Variant Rate 3Number of
Deletions
Number of
Duplications
Number of
Inversions
Number of
Insertions
Chr0141,8091539911083
Chr0244,42911410421472
Chr0335,9471419701869
Chr0442,594101801471
Chr0536,398121810867
Chr0644,2239711201168
Chr0736,25897753863
Chr0843,13073970461
Chr0931,97193600644
Chr1043,87061800839
Chr1136,755606811049
Chr1217,31430330231
Total46,1683979868103717
1 Micro variants were determined using Freebayes. 2 SVs were determined using Sniffles. 3 Variant rate determined by number of variants per base (total chromosome size/total number of variants).
Table 5. SnpEff annotation summary for macro structural variants (SVs) indicating number of effects. The impact of the different types of effects are indicated in differing colours: High impact (red), moderate impact (orange), low impact (green) and modifier effect (yellow).
Table 5. SnpEff annotation summary for macro structural variants (SVs) indicating number of effects. The impact of the different types of effects are indicated in differing colours: High impact (red), moderate impact (orange), low impact (green) and modifier effect (yellow).
Effect ClassificationFieldsSnpEff CountPercentage (%)
Number of effects by impactHigh420,01788.2
Low10.0
Moderate49,89510.5
Modifier63161.3
Number of effects by typeBidirectional gene fusion222,12146.6
Chromosome number variation100.002
Conservative in-frame deletion1030.02
Disruptive in-frame deletion570.01
Downstream gene variant24990.5
Duplication10.0
Exon loss variant1000.02
Exon region20.0
Feature ablation77241.6
Frame shift variant1490.03
Gene fusion180,46837.9
Intergenic region6080.13
Intragenic variant3670.08
Intron variant490.01
Inversion49,86310.5
Non-coding transcript variant3100.07
Splice acceptor variant160.003
Splice donor variant190.004
Splice region variant670.01
Splice site region10.0
Start lost450.01
Stop gained10.0
Stop lost390.008
Transcript ablation93162.0
Upstream gene variant25150.5
Number of effects by regionChromosome490.01
Downstream2 4990.53
Exon4490.1
Gene438,34392.1
Intergenic6080.1
Intron150.003
Splice site acceptor50.001
Splice site donor60.001
Splice site region10.0
Transcript31,7396.7
Upstream25150.5
Table 6. SnpEff annotation summary of micro variants indicating number of effects (Freebayes). The impact of the different types of effects are indicated in differing colours: High impact (red), moderate impact (orange), low impact (green) and modifier effect (yellow).
Table 6. SnpEff annotation summary of micro variants indicating number of effects (Freebayes). The impact of the different types of effects are indicated in differing colours: High impact (red), moderate impact (orange), low impact (green) and modifier effect (yellow).
Effect ClassificationFieldsSnpEff CountPercentage (%)
Number of effects by impactHigh18,0470.7
Moderate63,0332.3
Low74,8452.8
Modifier2,563,28894.3
Number of effects by functional classMissense46,37242.0
Nonsense10270.9
Silent62,99157.1
Number of effects by typeConservative in-frame deletion4040.02
Conservative in-frame insertion3960.02
Disruptive in-frame deletion3540.01
Disruptive in-frame insertion2160.01
Downstream gene variant977,63135.9
Frameshift variant18810.07
Gene fusion13.8810.5
Initiator codon variant170.001
Intergenic region277.04710.2
Intragenic variant168.2826.2
Intron variant23.5720.9
Missense variant62.2332.3
Non-coding transcript variant133.1464.9
Splice acceptor variant2870.01
Splice donor variant3250.01
Splice region variant56890.2
Start lost1100.004
Stop gained14340.05
Stop lost2690.01
Stop retained variant1700.006
Synonymous variant71,0542.6
Upstream gene variant988,02736.2
Number of effects by regionDownstream977,63136.0
Exon136,9595.0
Gene13,8810.5
Intergenic277,04710.2
Intron19,1550.7
Splice site acceptor2660.01
Splice site donor3010.01
Splice site region45180.2
Transcript301,4280.0
Upstream988,02736.3
Table 7. Spine statistics showing the size of the accessory, core and pangenome elements of the F. circinatum isolates.
Table 7. Spine statistics showing the size of the accessory, core and pangenome elements of the F. circinatum isolates.
IsolateSourceTotal bpGC%
FSP34Accessory2,669,71344.19
Core42,260,18947.20
CMWF560Accessory3,831,93945.05
Core42,633,79547.02
CMWF567Accessory3,290,97545.14
Core42,465,50547.06
CMWF1803Accessory4,359,90745.27
Core42,501,74447.22
KS17Accessory2,369,66844.57
Core42,008,43347.42
UG10Accessory2,733,55245.07
Core41,905,48947.72
UG27Accessory2,919,79744.25
Core42,449,44047.06
Backbone42,260,18947.20
Pangenome50,076,54146.85
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Maphosa, M.N.; Steenkamp, E.T.; Kanzi, A.M.; van Wyk, S.; De Vos, L.; Santana, Q.C.; Duong, T.A.; Wingfield, B.D. Intra-Species Genomic Variation in the Pine Pathogen Fusarium circinatum. J. Fungi 2022, 8, 657. https://0-doi-org.brum.beds.ac.uk/10.3390/jof8070657

AMA Style

Maphosa MN, Steenkamp ET, Kanzi AM, van Wyk S, De Vos L, Santana QC, Duong TA, Wingfield BD. Intra-Species Genomic Variation in the Pine Pathogen Fusarium circinatum. Journal of Fungi. 2022; 8(7):657. https://0-doi-org.brum.beds.ac.uk/10.3390/jof8070657

Chicago/Turabian Style

Maphosa, Mkhululi N., Emma T. Steenkamp, Aquillah M. Kanzi, Stephanie van Wyk, Lieschen De Vos, Quentin C. Santana, Tuan A. Duong, and Brenda D. Wingfield. 2022. "Intra-Species Genomic Variation in the Pine Pathogen Fusarium circinatum" Journal of Fungi 8, no. 7: 657. https://0-doi-org.brum.beds.ac.uk/10.3390/jof8070657

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop