4.1. Repetitive Sequences and Evolution of Chromosome Size and DNA Composition
The size of a chromosome is a conspicuous feature of a given plant species. Generally, the chromosome size of monocotyledon species is larger than that of dicotyledon plants, and the chromosome size of temperate plants is larger than that of tropical plants [
49]. The total size of metaphase chromosomes is generally determined by the nuclear DNA content. For example, the total chromosomal length, as well as the total chromosomal area, in the genus
Oryza is positively correlated with the nuclear DNA content [
50]. The average chromosome size, however, correlates poorly with the nuclear DNA content, because it depends not only on the amount of nuclear DNA but also on the number of chromosomes in the complement [
51].
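The dependence on chromosome number can be made explicit with a simple relation, given here only as an illustrative sketch. The symbols are assumptions introduced for this note and are not notation from the cited studies: C denotes the nuclear DNA content of a chromosome complement, n the number of chromosomes in that complement, and \bar{c} the average DNA content per chromosome.

```latex
\bar{c} = \frac{C}{n}
```

Under this relation, two hypothetical species with the same C but with n and 2n chromosomes would differ twofold in \bar{c}, which is one way the average chromosome size can diverge from the total nuclear DNA content.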
The main mechanisms that contribute to genome expansion in plants are polyploidization and the proliferation of repetitive DNA sequences, particularly TEs [
52]. The accumulation of repetitive sequences, particularly retrotransposons, emerges as a major contributor to genome size variation [
53], largely due to their intrinsic amplification ability. For example, the genome size in plants, such as
Oryza australiensis and
Gossypium spp., has doubled as a result of retrotransposon activity [
54,
55]. In the genus
Asparagus, retrotransposon proliferation is probably responsible for the larger genome size in dioecious species [
56]. The genome size of the dioecious plant
Silene latifolia is more than twice that of the gynodioecious
Silene vulgaris, largely due to the expansion of
Ogre retrotransposons [
57]. It is now recognized that the quantity of repetitive DNA sequences is more closely correlated with the genome size than the number of coding genes [
58]. We performed correlation analysis between genome size and quantities of different types of repetitive DNA or gene numbers in the plant species with sequenced genomes (
Table S1). The results showed that genome size is poorly correlated with gene number (R² = 0.3346,
Figure 2A). In contrast, genome size correlated well with the quantity of repetitive DNA (R² = 0.8849,
Figure 2B). Comparisons between genome size and different classes of repetitive sequences revealed a stronger positive correlation for retrotransposons than for transposons (R² = 0.821 versus R² = 0.5297,
Figure 2C,D). Among the retrotransposons, LTR retrotransposons correlated well with genome size (R² = 0.774, data not shown), whereas no significant difference was observed between the contributions of Ty1-
copia and Ty3-
gypsy LTR retrotransposons (R² = 0.7003 and 0.6388, respectively,
Figure 2E,F). These data suggest that the differential proliferation of repetitive sequences, especially retrotransposons, has largely contributed to the differences in genome size observed between species.
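The R² values reported above summarize pairwise comparisons between genome size and another genomic quantity. As a minimal sketch of how such a value can be obtained, the Python snippet below computes the squared Pearson correlation, which equals the R² of a simple linear fit. The function name `r_squared` and the input values are hypothetical placeholders and are not taken from Table S1; the source does not state which software or regression model was used.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation (R^2 of a simple linear fit) for paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical placeholder data (NOT values from Table S1):
# genome size and total repeat content, both in Mb, for four imaginary species.
genome_size_mb = [120.0, 450.0, 900.0, 2300.0]
repeat_content_mb = [20.0, 160.0, 480.0, 1900.0]

repeat_r2 = r_squared(genome_size_mb, repeat_content_mb)
```

Running the same function on gene counts in place of repeat content would give the corresponding value for the gene-number comparison.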
In addition, the correlation between genome size and the different types of repetitive sequences is stronger in higher plants. When the genome data of the seven algal species are removed, the R² values between genome size and total repetitive sequences, retrotransposons, LTR retrotransposons, Ty1-copia, and Ty3-gypsy increase to 0.9062, 0.8935, 0.8683, 0.7824, and 0.7555, respectively. In contrast, the R² value between genome size and transposons decreases from 0.5297 to 0.4552. This analysis indicates that retrotransposons contribute more to genome size in higher plants than in algae, whereas transposons influence genome size more strongly in algae than in higher plants. It is unclear whether this pattern reflects genome divergence between algae and higher plants or simply the smaller size of the analyzed algal genomes. More data are needed to assess the impact of different classes of TEs on distinct plant lineages.
The total chromosome size is also highly positively correlated with the repetitive sequence contents because total chromosome size is mainly determined by the nuclear DNA content. Numerous studies within different lineages confirmed this general trend. Among the 11 species of the genus
Oryza,
O. australiensis, with the largest genome and the longest chromosomes, showed overall amplification of genome-specific DNA sequences throughout the chromosomes, whereas
Oryza brachyantha, with the smallest genome and the shortest chromosomes, had limited repetitive sequences [
50].
The most direct evidence comes from the size of the sex chromosomes of dioecious plants. A number of dioecious plants have a Y chromosome that is larger than the X chromosome, such as
S. latifolia [
59],
Coccinia grandis [
60], and
Cannabis sativa [
61]. Evidence has suggested that the larger Y chromosome was formed by the accumulation of a large number of repetitive sequences. For example, the Y chromosome in
S. latifolia is the largest chromosome in male metaphase, and it has accumulated a large number of repetitive DNAs. Transposon display analysis indicated that TE insertions occur at higher frequencies at sites on the Y chromosome than on the other chromosomes [
59]. Additionally, in
C. grandis, the large Y chromosome is mainly due to the accumulation of various repetitive sequences, such as Ty1-
copia and Ty3-
gypsy elements, unclassified elements, and tandem repeats [
60].
Although repetitive sequences are major contributors to plant genome size, the prevalence of particular repeat families differs dramatically among different plant groups. In many cases, a limited number of repetitive types are highly amplified in one lineage. For example, a single Ty3-
gypsy-like retrotransposon accounts for approximately 38% of the genome of
Vicia pannonica [
62], and the accumulation of a single-type LTR retrotransposon that belongs to the Del subgroup plays vital roles in the
Capsicum annuum genome expansion [
63]. In several cases, the amplification of a specific family is observed in several related species [
64], although the copy number usually differs widely among close relatives [
65]. In other cases, genomes comprise several TE families in similar quantities. In some species, many individual TE families were amplified and together caused genome expansion, such as in
Picea abies, for which more than 86% of the repetitive elements recovered were singletons [
24]. These observations reveal that the accumulation pattern of repetitive sequences depends not only on the element itself but also on the genome. Certain repetitive elements can escape control in a particular genome, and certain genomes are more tolerant of repeat amplification [
66].
In addition, the accumulation of repetitive sequences can markedly influence the DNA composition of chromosomes. For example, 79.2% of the male-specific region of the Y chromosome (MSY) and 67.2% of its X chromosome counterpart are occupied by repetitive sequences, whereas the proportion of repetitive sequences in the entire genome is 51% [
67]. Furthermore, the repetitive sequences are more tolerant of mutations, mainly because they are less influenced by the selective pressure [
68,
69]. Thus, the amplification, deletion, and mutation of the repetitive sequences contributed largely to the DNA composition of chromosomes during the evolution process.
4.2. Repetitive Sequences and Evolution of Chromosome Structure and Shape
The structure and shape of chromosomes can be altered by chromosome rearrangement, including insertion, duplication, deletion, centric split and fusion, inversion, and translocation. Comparative cytogenetic studies revealed extensive chromosome rearrangements in many plant species, such as in the Brassicaceae family [
70,
71,
72], Solanaceae family [
19,
73], and grass family [
74,
75]. For example, the difference between
Arabidopsis lyrata and
A. thaliana was mainly explained by 10 major rearrangement events, including five inversions, two translocations, and three fusion/fissions [
71]. The differences in the structure, shape, and numbers of chromosomes in related species, both in animals and plants, are due to the syntenic blocks being assembled in different combinations. Blocks that are fused together in one species can be separated on different chromosomes in another. Segments within blocks can be duplicated, lost, or inverted [
76,
77,
78]. The current karyotype of a given species is formed by complex chromosomal rearrangement usually combining two or more rearranging events, and the process is still going on. For example, BAC-FISH analysis showed that the genomes of
Brachypodium distachyon,
Brachypodium sylvaticum, and
Brachypodium pinnatum were differentiated by chromosomal rearrangements, such as duplications, translocations, and inversions. For instance, the presence of a chromosome pair carrying an additional site for Bd2/1 in
B. pinnatum might have resulted either from a duplication or a translocation event, and the Bd2/11 in
B. sylvaticum was possibly formed by a reciprocal translocation between the chromosome carrying sites for Bd2/10 and Bd2/11 [
79].
Increasing evidence suggests that major structural chromosomal repatterning is frequently associated with cytogenetically detectable heterochromatic regions composed of repetitive DNA sequences [
70,
80,
81,
82]. Repetitive sequences, especially TEs, are involved in various chromosomal rearrangements. As early as 1946, Barbara McClintock suggested that transposons can cause chromosome breakage and dissociate the acentric fragment from the rest of the chromatid [
83]. Cytogenetic evidence showed that the
En/Spm transposons were involved in the ongoing chromosomal rearrangement leading to the rise of a new fertile plant population of
Aegilops speltoides [
84].
Various factors can cause a double-strand break (DSB) on a chromosome, and the chromosomal rearrangements may be the result of illegitimate recombination during the process of DSB repair, either via the direct joining of ends between different DSBs or through recombination with ectopic homologous sequences (
Figure 3A). Ectopic recombination uses a homologous sequence at a non-allelic position as the template for recombination repair, so repetitive sequences provide ideal targets. It has been observed that the primary rearrangements are located almost exclusively in heterochromatic regions enriched in similar, highly repetitive DNA sequences [
70,
85].
Ectopic recombination between homologous repetitive sequences within one chromosome can produce a shortened chromosome and a chromosome fragment, which is subsequently lost (
Figure 3B). In fact, the small Y chromosomes in many animals and several dioecious plants are probably formed by this mechanism [
86]. Ectopic recombination between homologous repetitive sequences on homologous chromosomes leads to a duplication on one chromosome and a deletion on the other (
Figure 3C). In cucurbit species, centromere repositioning associated with the gain or loss of pericentromeric heterochromatin sequences caused distinct alterations in the structure and shape of the derived chromosomes of cucumber and melon. For example, cucumber chromosome 6 is a metacentric chromosome with little heterochromatin, whereas the related melon chromosome 1 is a subtelocentric chromosome with a large amount of heterochromatin in the pericentromeric regions [
87].
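The two ectopic recombination outcomes described above can be illustrated with a toy model. The following Python sketch treats a chromosome as an ordered list of segment labels, with "R" marking copies of one repeat family. The function names, the list representation, and the example chromosome are all hypothetical simplifications introduced here; they are not taken from the cited studies and ignore sequence orientation, centromeres, and repair details.

```python
def intrachromosomal_deletion(chrom, i, j):
    """Recombine two copies of the same repeat at indices i < j on one chromosome.

    Returns (shortened_chromosome, excised_fragment). One repeat copy stays on
    the chromosome; the other leaves with the excised fragment.
    """
    if not 0 <= i < j < len(chrom):
        raise ValueError("expected 0 <= i < j < len(chrom)")
    if chrom[i] != chrom[j]:
        raise ValueError("recombining positions must carry the same repeat")
    return chrom[:i] + chrom[j:], chrom[i:j]


def unequal_crossover(chrom_a, i, chrom_b, j):
    """Recombine repeat i on chrom_a with a non-allelic repeat j on its homolog.

    Returns the two recombinant products; when i != j, one product carries a
    duplication and the other the matching deletion.
    """
    if not (0 <= i < len(chrom_a) and 0 <= j < len(chrom_b)):
        raise ValueError("breakpoint index out of range")
    if chrom_a[i] != chrom_b[j]:
        raise ValueError("recombining positions must carry the same repeat")
    return chrom_a[:i] + chrom_b[j:], chrom_b[:j] + chrom_a[i:]


# Hypothetical chromosome: unique segments a-d with two copies of repeat R.
homolog = ["a", "R", "b", "c", "R", "d"]

shortened, fragment = intrachromosomal_deletion(homolog, 1, 4)
duplicated, deleted = unequal_crossover(homolog, 4, homolog, 1)
```

In this toy example, `shortened` lacks segments b and c, which leave with `fragment`, whereas `duplicated` carries b and c twice and `deleted` carries neither.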
Other chromosomal rearrangements, such as inversion, translocation, and centric fusion and fission are also related to repetitive sequences. For example, various types of repetitive sequences may play a role by facilitating the formation of secondary structure intermediates between the single-stranded DNA ends that recombine during chromosome rearrangements, such as translocations, and gross deletions in humans [
80]. In
Drosophila buzzatii, the commonly occurring polymorphic inversions were probably formed by ectopic recombination, and their breakpoints contain large insertions corresponding to transposable elements [
88]. Apparently, these TEs contributed to natural inversions in
D. buzzatii. Sequence analysis suggested that one 1.17 Mb inversion between Col-0 and Ler
Arabidopsis was caused by the activity of a transposon Vandal5, which is a Mutator-like (Mule) transposon. According to the arrangement of the sequences at the distal and proximal breakpoints of the inversion, it is inferred that the 5′-end of the Vandal5 transposon inserted into the third exon of an F-box protein-coding gene, whereas the other end of the transposon remained attached to the original donor site. Recombining the two free ends resulted in the inversion (
Figure 3D) [
72].
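The structural outcome of an inversion, independent of the molecular mechanism that produces it, can be sketched in a few lines of Python. The chromosome is modelled as a list of marker labels and the inversion as a reversal of the segment between two breakpoints. The function names, the "CEN" centromere label, and the example markers are hypothetical; the model does not track the orientation of individual markers or the Vandal5 insertion itself.

```python
def invert_segment(chrom, start, end):
    """Return a copy of chrom with the markers in chrom[start:end] in reverse order."""
    if not 0 <= start < end <= len(chrom):
        raise ValueError("expected 0 <= start < end <= len(chrom)")
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]


def is_pericentric(chrom, start, end, centromere="CEN"):
    """True if the inverted segment chrom[start:end] spans the centromere marker."""
    return centromere in chrom[start:end]


# Hypothetical chromosome with four markers and a centromere.
chromosome = ["a", "b", "CEN", "c", "d"]

inverted = invert_segment(chromosome, 1, 4)
spans_centromere = is_pericentric(chromosome, 1, 4)
```

Here the breakpoints flank the centromere, so the marker order between them is reversed and the inversion is classified as pericentric.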
Furthermore, translocations are also related to repetitive sequences: ectopic recombination between homologous repetitive sequences on different chromosomes leads to reciprocal translocation. Translocation events driven by Ty retrotransposon elements, together with other duplication and deletion events, account for the chromosome length polymorphism of enological strains of
Saccharomyces cerevisiae [
89]. Molecular and cytogenetic analyses detected 19 major chromosomal rearrangements, including 17 reciprocal translocations and two large inversions, in the analyzed maize lines. The junctions of all 19 rearrangements contained Ac termini and 8 bp target site duplications. These results strongly indicate that excision of the Ac and fAc (a fractured Ac element) termini followed by insertion at a chromosomal target site leads to a rearrangement of the sequences flanking the transposon termini. After cleavage of the Ac and fAc ends by Ac transposase, insertion of the Ac/fAc termini into a site on the opposite arm of the same sister chromatid could generate a pericentric inversion, whereas insertion of the transposon ends into a site on another chromosome could produce a reciprocal translocation (
Figure 3E) [
90]. It should be noted that the mechanisms by which repetitive sequences cause duplication and deletion are well established, as shown in
Figure 3B,C. However, the roles played by repetitive sequences in inversion and translocation are not yet clearly understood.
Figure 3D,E show only some of the possible mechanisms, and a comprehensive view of the relationship between repetitive sequences and inversion/translocation requires more evidence.
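As a structural illustration only, a reciprocal translocation can be modelled as an exchange of the distal segments of two non-homologous chromosomes at one breakpoint each. The Python sketch below uses lists of marker labels; the function name, the breakpoint convention, and the example chromosomes are hypothetical, and the model says nothing about whether the breaks arise from ectopic recombination or from transposon excision and reinsertion.

```python
def reciprocal_translocation(chrom_1, break_1, chrom_2, break_2):
    """Exchange the segments distal to one breakpoint on each of two chromosomes.

    break_1 and break_2 are list indices; everything from the index onward
    moves to the other chromosome. Returns the two derivative chromosomes.
    """
    if not 0 < break_1 < len(chrom_1) or not 0 < break_2 < len(chrom_2):
        raise ValueError("breakpoints must fall inside each chromosome")
    derivative_1 = chrom_1[:break_1] + chrom_2[break_2:]
    derivative_2 = chrom_2[:break_2] + chrom_1[break_1:]
    return derivative_1, derivative_2


# Hypothetical non-homologous chromosomes, each with its own centromere.
chromosome_1 = ["a", "b", "CEN1", "c", "d"]
chromosome_2 = ["w", "CEN2", "x", "y"]

der_1, der_2 = reciprocal_translocation(chromosome_1, 4, chromosome_2, 2)
```

Both breakpoints in this example lie distal to the centromeres, so each derivative chromosome keeps exactly one centromere and no material is gained or lost overall.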
4.3. Repetitive Sequences and Evolution of Chromosome Number
Chromosome numbers can be altered by ploidy mutations involving the entire complement (polyploidy) or individual chromosomes (aneuploidy) [
20]. In addition, chromosome rearrangements, such as chromosome fission or fusion, can also increase or decrease the number of chromosomes [
70,
91]. In general, base chromosome number reduction in monocots is caused by nested chromosome fusions, whereas in eudicots, end-to-end fusions are mostly involved [
20,
70,
79]. Nested chromosome fusion is a process during which a whole chromosome is inserted by its telomeres into a break in the centromeric region of another chromosome [
91]. For example, comparative cytogenomics analysis among
Brachypodium, sorghum, rice, and wheat revealed that the current five
Brachypodium chromosomes were formed from a five-chromosome ancestral genome via a 12-chromosome intermediate involving seven major chromosome fusions caused by nested chromosome insertions [
77]. In eudicots, by contrast, end-to-end fusions played an important role in base chromosome number reduction. For example, the karyotypes of
A. thaliana (
n = 5) and of related species with six or seven chromosome pairs were derived from an ancestral karyotype with eight chromosome pairs. Chromosome fusions in
A. thaliana resulted from the generation of acrocentric chromosomes by pericentric inversions, reciprocal translocation between two chromosomes (one or both acrocentric), and elimination of a minichromosome that arose in addition to the fusion chromosome [
70]. In addition, centric fission can cause an increase in the base chromosome number and karyotype symmetry. For example, comparative linkage mapping analysis showed that the genomes of closely related species,
Mimulus lewisii and
Mimulus guttatus, show strong segmental synteny, and relative to the ancestral base number of 8 in
M. lewisii, the reconstruction of 14
M. guttatus chromosomes requires at least eight fission events plus two fusion events [
92].
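The chromosome counts quoted in this paragraph follow simple bookkeeping: each fission adds one chromosome and each fusion removes one. The short Python sketch below checks the two examples given in the text. The function name is hypothetical, and the model deliberately ignores polyploidy, aneuploidy, and the fact that the Mimulus figures are minimum estimates.

```python
def chromosome_number_after(base_number, fissions=0, fusions=0):
    """Base chromosome number after the given counts of fission and fusion events."""
    if base_number < 1 or fissions < 0 or fusions < 0:
        raise ValueError("counts must be non-negative and base_number at least 1")
    result = base_number + fissions - fusions
    if result < 1:
        raise ValueError("more fusions than available chromosomes")
    return result


# Mimulus: ancestral base number 8, at least eight fissions and two fusions.
mimulus_guttatus = chromosome_number_after(8, fissions=8, fusions=2)

# Brachypodium: 12-chromosome intermediate, seven nested chromosome fusions.
brachypodium = chromosome_number_after(12, fusions=7)
```

With the event counts stated in the text, the calculation gives 14 chromosomes for M. guttatus and five for Brachypodium, in agreement with the karyotypes described.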
As described above, chromosome rearrangements frequently occur in chromosome regions replete with repetitive sequences. Thus, changes in chromosome number that result from chromosome rearrangement are often related to repetitive sequences. In fact, in the nested chromosome fusions that mostly occur in grasses, the centromeric and telomeric regions involved are rich in repetitive sequences [
77,
93]. It has also been reported that repeat-rich regions, such as constitutive heterochromatin, GC-rich DNA, and rDNA, are implicated in the chromosomal rearrangements that accompany reductions in the basic chromosome number in the
Reichardia genus [
82].
Overall, rapid chromosomal evolution is driven by the activity of repetitive sequences. Although the exact mechanism of repetitive sequences involved in chromosomal evolution remains largely an enigma, it is speculated that repeated sequences within heterochromatin may affect karyotypic evolution by facilitating rearrangements that have a minimal deleterious impact on the genome [
5]. Repetitive sequences, especially TEs, can amplify themselves; can stimulate chromosome rearrangement, including the inversion, duplication, or deletion of adjacent DNA, translocation, chromosome breaking and repairing, and aborted transposition; or can cause ectopic recombination between homologous repeated elements at different chromosomal locations. Therefore, the genome structure of a species is largely the outcome of TE actions and of the cellular processes that act on TEs [
94].