RNA-Seq Analysis Reveals Localization-Associated Alternative Splicing across 13 Cell Lines

Zeng, Chao; Hamada, Michiaki

doi:10.3390/genes11070820

Open AccessArticle

RNA-Seq Analysis Reveals Localization-Associated Alternative Splicing across 13 Cell Lines

by

Chao Zeng

^1,2,*

and

Michiaki Hamada

^1,2,3,4,*

¹

AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo 169-8555, Japan

²

Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan

³

Institute for Medical-oriented Structural Biology, Waseda University, Tokyo 162-8480, Japan

⁴

Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan

^*

Authors to whom correspondence should be addressed.

Genes 2020, 11(7), 820; https://0-doi-org.brum.beds.ac.uk/10.3390/genes11070820

Submission received: 19 June 2020 / Revised: 15 July 2020 / Accepted: 17 July 2020 / Published: 18 July 2020

(This article belongs to the Special Issue Splicing: The New Frontier in Therapeutics)

Download

Browse Figures

Versions Notes

Abstract

:

Alternative splicing, a ubiquitous phenomenon in eukaryotes, is a regulatory mechanism for the biological diversity of individual genes. Most studies have focused on the effects of alternative splicing for protein synthesis. However, the transcriptome-wide influence of alternative splicing on RNA subcellular localization has rarely been studied. By analyzing RNA-seq data obtained from subcellular fractions across 13 human cell lines, we identified 8720 switching genes between the cytoplasm and the nucleus. Consistent with previous reports, intron retention was observed to be enriched in the nuclear transcript variants. Interestingly, we found that short and structurally stable introns were positively correlated with nuclear localization. Motif analysis reveals that fourteen RNA-binding protein (RBPs) are prone to be preferentially bound with such introns. To our knowledge, this is the first transcriptome-wide study to analyze and evaluate the effect of alternative splicing on RNA subcellular localization. Our findings reveal that alternative splicing plays a promising role in regulating RNA subcellular localization.

Keywords:

alternative splicing; subcellular localization; retained intron

1. Introduction

Most eukaryotic genes consist of exons that are processed to produce mature RNAs and introns that are removed by RNA splicing. Alternative splicing is a regulated process by which exons can be either included or excluded in the mature RNA. Various RNAs (also called transcript variants) can be produced from a single gene through alternative splicing. Thus, alternative splicing enables a cell to express more RNA species than the number of genes it has, which increases the genome complexity [1]. In humans, for instance, approximately 95% of the multi-exonic genes undergoing alternative splicing have been revealed by high-throughput sequencing technology [2]. Exploring the functionality of alternative splicing is critical to our understanding of life mechanisms. Alternative splicing is associated with protein functions, such as diversification, molecular binding properties, catalytic and signaling abilities, and stability. Such related studies have been reviewed elsewhere [3,4]. Additionally, relationships between alternative splicing and disease [5] or cancer [6,7] have received increasing attention. Understanding the pathogenesis associated with alternative splicing can shed light on diagnosis and therapy. With the rapid development of high-throughput technology, it has become possible to study the transcriptome-wide function and mechanism of alternative splicing [8].

The localization of RNA in a cell can determine whether the RNA is translated, preserved, modified, or degraded [9,10]. In other words, the subcellular location of an RNA is strongly related to its biological function [9]. For example, the asymmetric distribution of RNA in cells can influence the expression of genes [9], formation and interaction of protein complexes [11], biosynthesis of ribosomes [12], and development of cells [13,14], among other functions. Many techniques have been developed to investigate the subcellular localization of RNAs. RNA fluorescent in situ hybridization (RNA FISH) is a conventional method to track and visualize a specific RNA by hybridizing labeled probes to the target RNA molecule [15,16]. Improved FISH methods using multiplexing probes to label multiple RNA molecules have been presented, and have expanded the range of target RNA species [17,18]. With the development of microarray and high-throughput sequencing technologies, approaches for the transcriptome-wide investigation of RNA subcellular localizations have emerged [19]. Recently, a technology applying the ascorbate peroxidase APEX2 (APEX-seq) to detect proximity RNAs has been introduced [20,21]. APEX-seq is expected to obtain unbiased, highly sensitive, and high-resolved RNA subcellular localization in vivo. Simultaneously, many related databases have been developed [22,23], which integrate RNA localization information generated by the above methods and provide valuable resources for further studies of RNA functions.

Alternative splicing can regulate RNA/protein subcellular localization [24,25,26,27]. However, to date, alternative splicing in only a limited number of genes has been examined. One approach to solve this problem involves the use of high-throughput sequencing. The goal of this research was to perform a comprehensive and transcriptome-wide study of the impact of alternative splicing on RNA subcellular localization. Therefore, we analyzed RNA-sequencing (RNA-seq) data obtained from subcellular (cytoplasmic and nuclear) fractions and investigated whether alternative splicing causes an imbalance in RNA expression between the cytoplasm and the nucleus. Briefly, we found that RNA splicing appeared to promote cytoplasmic localization. We also observed that transcripts with intron retentions preferred to localize in the nucleus. A further meta-analysis of retained introns indicated that short and structured intronic sequences were more likely to appear in nuclear transcripts. Additionally, motif analysis revealed that fourteen RBPs were enriched in the retained introns that are associated with nuclear localization. The above results were consistently observed across 13 cell lines, suggesting the reliability and robustness of the investigations. To our knowledge, this is the first transcriptome-wide study to analyze and evaluate the effect of alternative splicing on RNA subcellular localization across multiple cell lines. We believe that this research may provide valuable clues for further understanding of the biological function and mechanism of alternative splicing.

2. Materials and Methods

2.1. RNA-Seq Data and Bioinformatics

We used the RNA-seq data from the ENCODE [28] subcellular (nuclear and cytoplasmic) fractions of 13 human cell lines (A549, GM12878, HeLa-S3, HepG2, HT1080, HUVEC, IMR-90, MCF-7, NHEK, SK-MEL-5, SK-N-DZ, SK-N-SH, and K562) to quantify the localization of the transcriptome. In brief, cell lysates were divided into the nuclear and cytoplasmic fractions by centrifugal separation, filtered for RNA with poly-A tails to include mature and processed transcript sequences (>200 bp in size), and then subjected to high-throughput RNA sequencing. Ribosomal RNA molecules (rRNA) were removed with RiboMinus^TM Kit. RNA-seq libraries of strand-specific paired-end reads were sequenced by Illumina HiSeq 2000. See Supplementary Data 1 for accessions of RNA-seq data. The human genome (GRCh38) and comprehensive gene annotation were obtained from GENCODE (v29) [29]. RNA-seq reads were mapped with STAR (2.7.1a) [30] and quantified with RSEM (v1.3.1) [31] using the default parameters. Finally, we utilized SUPPA2 [32] to calculate the differential usage (change of transcript usage,

Δ T U

[33]) and the differential splicing (change of splicing inclusion,

Δ Ψ

[34]) of transcripts.

2.2. $Δ T U$ and $Δ Ψ$

To investigate the inconsistency of subcellular localization of transcript variants, we calculated the

Δ T U

from the RNA-seq data to quantify this bias. For each transcript variant,

Δ T U

indicates the change in the proportion of expression level in a gene, which is represented as

\begin{matrix} Δ T U = T_{i} / G_{i} - T_{j} / G_{j}, \end{matrix}

(1)

where i and j represent different subcellular fractions (i.e., cytoplasmic and nuclear, respectively). T and G are the expression abundance (transcripts per million, TPM) of the transcript and the corresponding gene, respectively. Expression abundances were estimated from RNA-seq data using the aforementioned RSEM. A high

Δ T U

value of a transcript indicates that it is enriched in the cytoplasm, and a low value indicates that it is enriched in the nucleus. We utilized SUPPA2 to compute

Δ T U

and assess the significance of differential transcript usage between the cytoplasm and the nucleus. For a gene, we selected a pair of transcript variants

T_{c}

and

T_{n}

that

\begin{matrix} T_{c} : = m a x {Δ T U_{i} | Δ T U_{i} > 0.2, P_{i} < 0.05}, \end{matrix}

(2)

\begin{matrix} T_{n} : = m i n {Δ T U_{i} | Δ T U_{i} < - 0.2, P_{i} < 0.05}, \end{matrix}

(3)

where

P_{i}

is the significance level of the i-th transcript variant. SUPPA2 calculates the significance level by comparing the empirical distribution of

Δ T U

between replicates and conditions (see [32] for details). To study whether the different splicing patterns affect the subcellular localization of transcript, we used

Δ Ψ

to quantify this effect. Seven types of splicing patterns—alternative 3

^{'}

splice-site (A3), alternative 5

^{'}

splice-site (A5), alternative first exon (AF), alternative last exon (AL), mutually exclusive exon (MX), retained intron (RI), and skipping exon (SE)—were considered. These splicing patterns were well-defined in previous studies [32,35]. For each alternative splicing site (e.g., RI),

\begin{matrix} Δ Ψ = \sum T P M_{i n c l u d e} / (\sum T P M_{i n c l u d e} + \sum T P M_{e x c l u d e}), \end{matrix}

(4)

where

T P M_{i n c l u d e}

and

T P M_{e x c l u d e}

represent the expression values of transcripts included and excluded, respectively, from the splicing site.

2.3. Gene Ontology (GO) Enrichment Analysis

GO enrichment analysis was performed with g:Profiler (Benjamini–Hochberg false discovery rate (FDR) < 0.001) [36]. g:Profiler applies the hypergeometric test to detect statistically significantly enriched molecular function (MF), biological processes (BP), and cellular component (CC). In addition to GO enrichment analysis of switching genes in each cell line, we also defined the shared switching genes (detected as switching genes ≥ 5 cell lines in this study, n = 1219) and conducted GO enrichment analysis.

2.4. Nonsense-Mediated RNA Decay (NMD) Sensitivity

NMD sensitivity was defined by the fold-change of transcript expression before and after knockdown of UPF1, SMG6, and SMG7 (main factors in NMD) in HeLa-S3 cells. The larger the value, the higher the degradation rate by NMD. We used the RNA-seq data (GSE86148) [37] to calculate the NMD sensitivity. RNA-seq reads were mapped with STAR (2.7.1a) [30] and quantified with RSEM (v1.3.1) [31] using the default parameters and GENCODE (v29) comprehensive gene annotation. EBseq (v1.20.0) [38] was used to calculate the fold-change of transcript expression.

2.5. RNA Structure Context Analysis

We used CapR [39] with default parameters to predict secondary structures for RNA sequences. We extracted the stem probabilities of RNA sequence to represent the stability of the RNA secondary structure. The higher the stem probability, the more stable the secondary structure of RNA.

2.6. RBP-Binding Prediction and Repeat Sequence Analysis

We applied the stand-alone version of RBPmap [40] to predict RBP binding sites. All RBPmap human/mouse stored motifs were used, and other parameters used default values. We used the repeat library (built on 31 January 2014) that mapped to human (hg38) from Repeatmasker [41]. Bedtools [42] was used to obtain overlaps between these repeat features (TEs) and the target sequences (i.e., retained introns).

3. Results

3.1. Thousands of Transcript Switches between Cytoplasm and Nucleus Were Identified across Thirteen Human Cell Lines

To assess the influence of alternative splicing on RNA subcellular localization, we analyzed RNA-seq datasets that cover the nuclear and cytoplasmic fractions in 13 human cell lines (Supplementary Data 1). Using these datasets, we calculated transcript usage changes (

Δ T U

) to identify transcript switches for genes between the cytoplasm and the nucleus. For a transcript, the transcript usage (

T U

) means the percentage of its expression in all transcript variants, and the

Δ T U

assesses the extent of differential transcript usage between two conditions (i.e., nuclear and cytoplasmic fractions). Thus, a transcript switch involves two transcript variants in a gene, one of which is predominantly expressed in the cytoplasm (

Δ T U > 0

, termed “cytoplasmic transcripts”), while the other one is mainly expressed in the nucleus (

Δ T U < 0

, termed “nuclear transcripts”). For instance, the C16orf91 gene contains two transcript variants (Figure 1A, upper). In HeLa-S3 cells, we observed that the C16orf91-201 and C16orf91-202 (also termed a transcript switch in this study) were prone to substantial expression in the cytoplasm and the nucleus, respectively (see Figure 1A, lower and Supplementary Data 2). In total, 2073 transcript switches were detected in HeLa-S3 cells (Figure 1B). Specially, using a p-value of <0.05, we further screened for the switches that were not significantly altered. For example, in HT1080, SK-MEL-5, and SK-N-DZ, we appropriately discarded transcripts that were more likely to exhibit changes in transcript usage because of low expression levels (Supplementary Data 3).

On average, 1650 pairs of transcript switches were identified across the 13 cell lines (Figure 1C). The smallest (768 pairs) and the largest (2766 pairs) were both observed in brain tissue, SK-N-DZ and SK-N-SH cells, respectively. Next, we assessed if there is a group of genes shared among cell lines with similar biological functions through the inclusion of transcript switches. We observed a total of 8720 genes containing transcript switches from these 13 cell lines (hereafter referred to as “switching genes”). More than 93% (8177 of 8720) of these switching genes were found in fewer than six cell lines, indicating that most switching genes were cell-specific (Figure 1D and Supplementary Data 4, some may also caused by the cell specificity of the gene expression). GO enrichment analysis of the remaining genes revealed that these shared switching genes (detected as switching genes

\geq 5

cell lines, n = 1219) are associated with receptor/transducer activities (adjusted p < 1 × 10

^{- 10}

, GO:0004888, GO:0060089, GO:0038023, and GO:0004930), multicellular organismal process and G protein-coupled receptor signaling pathway (adjusted p < 1 × 10

^{- 10}

, GO:0032501 and GO:0007186), plasma membrane and cell periphery (adjusted p < 1 × 10

^{- 10}

, GO:0005886 and GO:0071944). Although the GO enrichment analysis results of switching genes in each cell line are somewhat different, they are also roughly similar. (see Supplementary Data 5).

To validate the transcripts defined by

Δ T U

, we first compared the

Δ T U

between protein-coding and non-coding transcripts over cell lines. Consistently, many cytoplasmic transcripts were observed to be protein-coding transcripts with positive

Δ T U

values, while non-coding transcripts preferred to locate in the nucleus and displayed more negative

Δ T U

values (all p < 5.4 × 10

^{- 24}

, Supplementary Data 6). This result agreed with our understanding that the protein-coding transcript needs to be transported into the cytoplasm to produce proteins, while a large number of non-coding transcripts have been reported to localize in the nucleus to participate in transcriptional and post-transcriptional gene expression and chromatin organization, among other functions [43,44].

3.2. A Transcript Switch Is Associated with RNA Splicing Rather Than RNA Degradation

We first investigated the post-transcriptional influence of the NMD pathway on the transcript switch. Initially, the two transcripts in a transcript switch are equally distributed in the cytoplasm and nucleus. We consider one possible reason for the transcript switch is that the rate of degradation of the two transcripts in the cytoplasm and nucleus is different. To test this hypothesis, we calculated the sensitivity of transcripts to NMD in HeLa-S3 cells based on changes in RNA-seq data [37] before and after the knockout of UPF1, SMG6, and SMG7 (core factors of NMD). However, after comparing the sensitivity of NMD to cytoplasmic and nuclear transcripts, we did not find significant differences (Figure 2A). This observation led us to conclude that NMD may not significantly affect transcript switches. The cause of the transcript switch should be mainly due to intrinsic (sequence features) and transcriptional (e.g., splicing) effects.

Considering that the longer the transcript is, the higher the splicing frequency (or exon number) should be [45], we next determined the association between the length of the transcript and its subcellular localization. We evaluated the relationship between length and subcellular localization by calculating the length ratio (logarithmic scale) between the cytoplasmic transcript and the nuclear transcript in each transcript switch. We divided the transcript switches into three categories based on the length ratio: positive (

r a t i o > 1

), negative (

r a t i o < - 1

), and neutral (other). A positive category indicates that the longer the transcript, the more enriched in the cytoplasm, and vice versa. A neutral category means that there is weak or no correlation (

| r a t i o | \leq 1

) between transcript length and its subcellular localization. We observed that the number of transcript switches in the positive category was higher than that in the negative category (p < 1 × 10

^{- 6}

, Binomial test), which implied that the longer the transcript is, the more likely it is to be transported into the cytoplasm (Figure 2B). To verify whether the splicing frequency of the transcript is positively correlated with its cytoplasmic location, we further compared the distribution of exon numbers (i.e., splicing frequency) in the cytoplasmic transcripts and nuclear transcripts for the positive and negative categories, respectively (Figure 2B, inset). We found that the exon number of the cytoplasmic transcripts in the positive category was higher than the nuclear transcripts (see Figure 2B, inset upper, p < 2.2 × 10

^{- 308}

Mood’s median test). Based on the above observations, we speculated that there are significant differences in subcellular localization between transcripts with or without splicing events. To confirm this hypothesis, we divided the transcript switches into the mono-exonic (unspliced) and multi-exonic (spliced) groups and then compared the distribution of

Δ T U

values between them. As expected, the

Δ T U

value of multi-exonic transcripts was positive (indicating cytoplasmic localization) and was significantly and consistently higher than the negative

Δ T U

value of mono-exonic transcripts (representing nuclear localization) in all cell lines (Figure 2C and Supplementary Data 7).

3.3. Enrichment and Characterization of Retained Introns in the Nuclear Transcripts

Next, we asked whether there was a specific kind of splicing pattern associated with subcellular localization. Seven different splicing patterns (A3, A5, AF, AL, MX, RI, and SE) were considered [46]. Comparing to an original transcript, each type of splicing pattern inserts, deletes, or replaces a partial sequence that may include sequence elements, which have important or decisive effects on subcellular localization (e.g., protein-binding sites, RNA structures, etc.). We used the

Δ Ψ

, which ranges from −1 to 1, to measure the preference for a type of splicing pattern between the nucleus and the cytoplasm (see “Materials and Methods” for details). The smaller the

Δ Ψ

, the more the corresponding splicing pattern prefers the nucleus. Interestingly, in HeLa-S3 cells, we observed that retained introns were biased toward the nucleus, while other splicing patterns had no preference (Figure 3A). Consistent results were also observed in the other twelve cell lines (Supplementary Datas 8 and 9).

To further characterize the retained introns associated with nuclear localization, we first selected a set of retained introns enriched in the nucleus (

Δ Ψ < 0

and

p < 0.05

, termed nuclear RIs). Compared with all retained introns, we should be able to observe some different features of the nuclear RIs, which are likely to be the determining factors for nuclear localization. We first investigated whether the nuclear RIs have a specific splicing signal. By applying this splicing signal, it is possible to regulate the intron retention specifically, thereby monitoring the subcellular localization of the transcript. Unfortunately, we found no differences in the splicing signal between the nuclear RIs and all RIs (Figure 3B). We then considered whether the structure of the nuclear RIs (including the length and RNA secondary structure) have a specific signature. Surprisingly, the length of the nuclear RIs was significantly shorter than the overall length level (Figure 3C, upper (all p < 6.64 × 10

^{- 7}

) and Supplementary Data 10). We also found that the average stem probability of the nuclear RIs was higher than that of all RIs (Figure 3C, bottom and Supplementary Data 11). Thus, we concluded that nuclear RIs maintain a more compact and stronger structure when compared with the overall RIs.

3.4. The Preferences of RNA-Binding Proteins on Retained Introns Suggest Their Role in Nuclear Localization

We sought to further investigate whether the nuclear RIs mentioned above are associated with RBPs. We calculated the frequency of each dinucleotide in the nuclear RIs across the 13 cell lines and normalized it with the background frequency of all intronic sequences in the genome. The reason we examined the dinucleotide composition of nuclear nucleotides was due to the reported specific contact between amino acids and dinucleotides [47]. The normalized dinucleotide frequency represents the extent to which the corresponding dinucleotide is preferred in the nuclear RIs. Overexpression of the GC-rich sequence occurred in the nuclear RIs (Figure 4A). This observation also provided an intuitive hypothesis of whether RBPs that preferentially bound to RNA containing GC-rich sequences directly or indirectly affect subcellular localization.

To further explore which RBPs preferentially interact with nuclear RIs, we predicted the binding preference between an RBP and an intronic sequence using RBPmap [40], a motif-based approach. We found that across the 13 cell lines, 14 of 30 RBPs tested consistently (more than 80%) preferred attaching to the nuclear RIs (Figure 4B). These include serine/arginine-rich proteins (SRSF1, SRSF2, SRSF3, SRSF5, SRSF7), heterogeneous nuclear ribonucleoproteins (HNRNPH2, HNRNPF, PTBP1), CUG-binding proteins (CUG-BP, MBNL1, BRUNOL4, BRUNOL5), and others (TARDBP and NOVA1). Conversely, binding motifs of five RBPs (SART3, PABPC1, PABPC4, RBMS3, KHDRBS1) were consistently under-represented in the nuclear RIs. Considering a previous report showing that the repeat sequence drives the nuclear localization of lncRNA [48], we subsequently analyzed the enrichment of the repeat sequence in the nuclear RIs. In NHEK cells, we observed SINE (short interspersed nuclear element; consistent with the previous study [48]), LINE (long interspersed nuclear element), and DNA transposon enrichment in nuclear RIs (Figure 4C). However, such phenomena have not been seen in other cell lines, suggesting that retained introns may not be ubiquitous in guiding nuclear localization through repeat sequences. Additionally, we observed that an LTR (long terminal repeat) was relatively rare in the nuclear RIs in all cell lines, implying that an LTR is involved in the regulation of RNA subcellular localization.

4. Discussion

We emphasize that

Δ T U

measures the change in the proportion of gene expression for a transcript between the nucleus and the cytoplasm. An increased

Δ T U

does not imply that the transcript abundance in the cytoplasm is higher than that in the nucleus. Instead, a significantly increased

Δ T U

indicates that the transcript is stored dominantly for the corresponding gene in the cytoplasm, but is expressed as a minor isoform in the nucleus. Therefore,

Δ T U

measures the dynamic changes of a representative (dominantly expressed) transcript of a gene between the nucleus and the cytoplasm. By using

Δ T U

, we defined a transcript switch, which indicates that a single gene has two different representative transcripts in the nucleus and the cytoplasm, respectively. A gene containing a transcript switch is termed as a switching gene in this study. Due to the limitation of available experimental data, we calculated

Δ T U

and performed alternative splicing analysis based on short-read RNA-seq. However, long-read RNA-seq and direct RNA-seq can provide more clear and convincing splicing events [49,50,51,52].

We hypothesize that a switching gene can function separately in the nucleus and cytoplasm (hereafter “bifunctional gene”), respectively. Previous studies have discovered several bifunctional mRNAs, which generate coding and non-coding isoforms through alternative splicing. The coding isoform translates proteins in the cytoplasm while the non-coding isoform resides in the nucleus to function as a scaffolding agent [53,54,55,56,57], post-transcriptional component [58], or coactivator [59,60,61]. Furthermore, a controlling feedback loop between two such isoforms can be considered. This hypothesis is consistent with a phenomenon called genetic compensation response (GCR), which has been proposed and validated in recent years [62,63,64,65]. In GCR, RNA sequence fragments obtained after mRNA degradation are imported to the nucleus to target genes based on sequence similarity and regulate their expression level. We argue that GCR-like feedback may be widely present in the switching genes. The reason is that parts of exonic sequences are usually shared (implying high sequence similarities) among transcript variants in the switching genes. Indeed, we have observed 768∼2777 switching genes (Figure 1C) in the 13 cell lines, suggesting that bifunctional genes or GCR are more widespread in the human genome than expected. A total of 8720 switching genes (Figure 1D) were identified in this study. We believe that this analysis promises to provide a valuable resource for studying bifunctional or GCR-associated genes. Additionally, we observed that most of the switching genes were cell-specific (Figure 1D), suggesting that such genes may be related to genetic adaption to the environmental changes (stress, development, disease, etc.).

Based on the observation that multi-exonic transcripts are preferentially exported to the cytoplasm, we argue that splicing supports cytoplasmic localization. One possible reason is that the exon-exon junction complex (EJC) may promote RNA export. The EJC containing or interacting with the export factor binds to the transcript during splicing, while the export factor promotes cytoplasmic localization. Previous studies have reported that the EJC component in Xenopus provides binding sites [66] for export factors (REF and/or TAP/p15). The coupling between a conserved RNA export machinery and RNA splicing has also been reviewed [67].

Additionally, we observed that transcript variants containing retained introns were more likely to reside in the nucleus, which is consistent with the previous studies [68,69]. By further analyzing these retained introns, we found that transcripts, including short but structurally stable retained introns, were preferentially localized in the nucleus. It is worth noting that the above observations are still robust after we distinguish between protein coding and noncoding genes (Supplementary Datas 7, 10 and 11). We speculate that such introns function as platforms for attaching to localization shuttles (e.g., RBPs). Short introns appear to be favored by natural selection because of the low metabolic cost of transcription and other molecular processing activities [67]. The RNA structure context was discussed as being able to affect protein binding [70]. Motif analysis of the retained introns of nucleus-localized transcripts detetced 14 enriched RBPs, in which MBNL1 has been reported to bind RNA in the CUG repeats [71] and contain two nuclear signals [72]. Previous studies have reported that the CUG repeat expansion caused nuclear aggregates and RNA toxicity, which in turn induces neurological diseases [73,74]. Taken together, the data reveal a post-transcriptional regulation mechanism in which alternative splicing regulates RNA subcellular localization by including or excluding those introns containing nuclear transporter binding sites.

5. Conclusions

This study explored whether alternative splicing can regulate RNA subcellular localization. The RNA-seq data derived from the nuclear and cytoplasmic fractions of 13 cell lines were utilized to quantify transcript abundance. We applied

Δ T U

to define a pair of nuclear or cytoplasmic transcripts expressed from a single gene locus. By comparing nuclear and cytoplasmic transcripts, we consistently observed that splicing appears to promote RNA export from the nucleus [75]. Furthermore, we used

Δ Ψ

to analyze the effect of splicing patterns on RNA localization and found that intron retention was positively correlated with nuclear localization. Sequence analysis of the retained introns revealed that short and structurally stable introns were favored for nuclear transcripts. We argue that such intronic sequences provide a hotspot for interacting with nuclear proteins. Subsequently, fourteen RBPs are consistently identified to prone to associate with retained introns across 13 cell lines. We argue that cells can regulate the subcellular localization and biological functions of RNAs through alternative splicing of introns, which contain localization elements (RNA structure context or sequence motifs). These findings can help us better understand the regulatory mechanisms of RNA localization, and thus, for example, predict RNA localization more precisely from sequence characteristics [76].

Although we used an EM-based method [31] to predict the transcript abundance between overlapping transcripts, this transcript quantification is still very challenging. One possible solution is to obtain more accurate transcript structures and their expression level through long reads (such as Nanopore sequencing [77,78]). Studying the effects of alternative splicing on RNA localization in healthy tissues or other species would be another challenging work. In summary, this research provides important clues for studying the mechanism of alternative splicing on gene expression regulation. We also believe that the switching genes defined in this study (Supplementary Data 4) will provide a valuable resource for studying gene functions.

Supplementary Materials

The following are available online at https://0-www-mdpi-com.brum.beds.ac.uk/2073-4425/11/7/820/s1, Supplementary Data 1: RNA-seq data used in this study. Supplementary Data 2: Read coverage of gene C16orf91 in HeLa-S3. Supplementary Data 3: Transcriptome-wide identification of transcript variant switches in 12 human cell lines. Supplementary Data 4: List of switching genes identified across 13 cell lines in this study. Gene Ontology enrichment analysis for switch genes. Supplementary Data 5: Gene Ontology enrichment analysis for switch genes. Supplementary Data 6: Comparison of

Δ T U

between protein-coding transcripts and non-coding transcripts. Supplementary Data 7: Comparison of

Δ T U

between mono-exonic and multi-exonic transcripts across 13 cell lines. Supplementary Data 8: Comparison of alternative splicing patterns between cytoplasmic and nuclear transcripts (all genes). Supplementary Data 9: Comparison of alternative splicing patterns between cytoplasmic and nuclear transcripts (protein-coding and noncoding genes). Supplementary Data 10: Comparison of length between all introns and nuclear RIs. Supplementary Data 11: Comparison of RNA secondary structure (stem probability) between all introns and nuclear RIs.

Author Contributions

M.H. and C.Z. conceived this study and contributed to the analysis and interpretation of the data. C.Z. performed all the experiments and wrote the draft, M.H. revised it critically. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Education, Culture, Sports, Science and Technology (KAKENHI) [grant numbers JP17K20032, JP16H05879, JP16H01318, and JP16H02484 to M.H.].

Acknowledgments

C.Z. and M.H. are grateful to Martin Frith, Yutaka Saito, and Masahiro Onoguchi for valuable discussions. Computations were partially performed on the NIG supercomputer at ROIS National Institute of Genetics.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

RNA FISH	RNA fluorescent in situ hybridization
RNA-seq	RNA sequencing
RBP	RNA-binding protein
NLS	nuclear signal sequence
$Δ T U$	change of transcript usage
GO	Gene Ontology
MF	molecular function
BP	biological processes
CC	and cellular component
NMD	nonsense-mediated decay
$Δ Ψ$	change of splicing inclusion
TE	transposon element
RI	retained intron
SINE	Short interspersed nuclear element
LINE	Long interspersed nuclear element
LTR	Long terminal repeat
GCR	genetic compensation response
EJC	exon-exon junction complex
TPM	transcripts per million
A3	alternative 3 $^{'}$ splice-site
A5	alternative 5 $^{'}$ splice-site
AF	alternative first exon
AL	alternative last exon
MX	mutually exclusive exon
SE	skipping exon

References

Brett, D.; Pospisil, H.; Valcárcel, J.; Reich, J.; Bork, P. Alternative splicing and genome complexity. Nat. Genet. 2002, 30, 29–30. [Google Scholar] [CrossRef]
Pan, Q.; Shai, O.; Lee, L.J.; Frey, B.J.; Blencowe, B.J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 2008, 40, 1413–1415. [Google Scholar] [CrossRef]
Graveley, B.R. Alternative splicing: Increasing diversity in the proteomic world. Trends Genet. 2001, 17, 100–107. [Google Scholar] [CrossRef]
Stamm, S.; Ben-Ari, S.; Rafalska, I.; Tang, Y.; Zhang, Z.; Toiber, D.; Thanaraj, T.; Soreq, H. Function of alternative splicing. Gene 2005, 344, 1–20. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Cieply, B.; Carstens, R.P. Functional roles of alternative splicing factors in human disease. Wiley Interdiscip. Rev. RNA 2015, 6, 311–326. [Google Scholar] [CrossRef] [PubMed] [Green Version]
David, C.J.; Manley, J.L. Alternative pre-mRNA splicing regulation in cancer: Pathways and programs unhinged. Genes Dev. 2010, 24, 2343–2364. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Di, C.; Zhang, Q.; Chen, Y.; Wang, Y.; Zhang, X.; Liu, Y.; Sun, C.; Zhang, H.; Hoheisel, J.D. Function, clinical application, and strategies of Pre-mRNA splicing in cancer. Cell Death Differ. 2019, 26, 1181–1194. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Ule, J.; Blencowe, B.J. Alternative Splicing Regulatory Networks: Functions, Mechanisms, and Evolution. Mol. Cell 2019, 76, 329–345. [Google Scholar] [CrossRef] [PubMed]
Buxbaum, A.R.; Haimovich, G.; Singer, R.H. In the right place at the right time: Visualizing and understanding mRNA localization. Nat. Rev. Mol. Cell Biol. 2015, 16, 95–109. [Google Scholar] [CrossRef]
Fasken, M.B.; Corbett, A.H. Mechanisms of nuclear mRNA quality control. RNA Biol. 2009, 6, 237–241. [Google Scholar] [CrossRef] [Green Version]
Nevo-Dinur, K.; Govindarajan, S.; Amster-Choder, O. Subcellular localization of RNA and proteins in prokaryotes. Trends Genet. 2012, 28, 314–322. [Google Scholar] [CrossRef] [PubMed]
Dermit, M.; Dodel, M.; Lee, F.C.; Azman, M.S.; Schwenzer, H.; Jones, J.L.; Blagden, S.P.; Ule, J.; Mardakheh, F.K. Subcellular mRNA localization regulates ribosome biogenesis in migrating cells. bioRxiv 2019. [Google Scholar] [CrossRef]
Bashirullah, A.; Cooperstock, R.L.; Lipshitz, H.D. RNA localization in development. Annu. Rev. Biochem. 1998, 67, 335–394. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Zhou, Y.; King, M.L. Sending RNAs into the future: RNA localization and germ cell fate. IUBMB Life 2004, 56, 19–27. [Google Scholar]
Lawrence, J.B.; Singer, R.H. Intracellular localization of messenger RNAs for cytoskeletal proteins. Cell 1986, 45, 407–415. [Google Scholar] [CrossRef]
Femino, A.M.; Fay, F.S.; Fogarty, K.; Singer, R.H. Visualization of single RNA transcripts in situ. Science 1998, 280, 585–590. [Google Scholar] [CrossRef] [Green Version]
Raj, A.; Van Den Bogaard, P.; Rifkin, S.A.; Van Oudenaarden, A.; Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 2008, 5, 877–879. [Google Scholar] [CrossRef] [Green Version]
Lee, J.H.; Daugharthy, E.R.; Scheiman, J.; Kalhor, R.; Yang, J.L.; Ferrante, T.C.; Terry, R.; Jeanty, S.S.F.; Li, C.; Amamoto, R.; et al. Highly multiplexed subcellular RNA sequencing in situ. Science 2014, 343, 1360–1363. [Google Scholar] [CrossRef] [Green Version]
Taliaferro, J.M.; Wang, E.T.; Burge, C.B. Genomic analysis of RNA localization. RNA Biol. 2014, 11, 1040–1050. [Google Scholar] [CrossRef] [Green Version]
Fazal, F.M.; Han, S.; Parker, K.R.; Kaewsapsak, P.; Xu, J.; Boettiger, A.N.; Chang, H.Y.; Ting, A.Y. Atlas of subcellular RNA localization revealed by APEX-seq. Cell 2019, 178, 473–490. [Google Scholar] [CrossRef]
Padron, A.; Iwasaki, S.; Ingolia, N.T. Proximity RNA labeling by APEX-seq reveals the organization of translation initiation complexes and repressive RNA granules. Mol. Cell 2019, 75, 875–887. [Google Scholar] [CrossRef] [PubMed]
Zhang, T.; Tan, P.; Wang, L.; Jin, N.; Li, Y.; Zhang, L.; Yang, H.; Hu, Z.; Zhang, L.; Hu, C.; et al. RNALocate: A resource for RNA subcellular localizations. Nucleic Acids Res. 2016, 45, D135–D138. [Google Scholar] [PubMed]
Mas-Ponte, D.; Carlevaro-Fita, J.; Palumbo, E.; Pulido, T.H.; Guigo, R.; Johnson, R. LncATLAS database for subcellular localization of long noncoding RNAs. RNA 2017, 23, 1080–1087. [Google Scholar] [CrossRef] [Green Version]
Zhou, W.; Liu, Z.; Wu, J.; Liu, J.h.; Hyder, S.M.; Antoniou, E.; Lubahn, D.B. Identification and characterization of two novel splicing isoforms of human estrogen-related receptor β. J. Clin. Endocrinol. Metab. 2006, 91, 569–579. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Hakre, S.; Tussie-Luna, M.I.; Ashworth, T.; Novina, C.D.; Settleman, J.; Sharp, P.A.; Roy, A.L. Opposing functions of TFII-I spliced isoforms in growth factor-induced gene expression. Mol. Cell 2006, 24, 301–308. [Google Scholar] [CrossRef] [PubMed]
Wiesenthal, A.; Hoffmeister, M.; Siddique, M.; Kovacevic, I.; Oess, S.; Müller-Esterl, W.; Siehoff-Icking, A. NOSTRINβ—A shortened NOSTRIN variant with a role in transcriptional regulation. Traffic 2009, 10, 26–34. [Google Scholar] [CrossRef] [PubMed]
Bombail, V.; Collins, F.; Brown, P.; Saunders, P.T. Modulation of ERα transcriptional activity by the orphan nuclear receptor ERRβ and evidence for differential effects of long-and short-form splice variants. Mol. Cell. Endocrinol. 2010, 314, 53–61. [Google Scholar] [CrossRef] [Green Version]
Djebali, S.; Davis, C.A.; Merkel, A.; Dobin, A.; Lassmann, T.; Mortazavi, A.; Tanzer, A.; Lagarde, J.; Lin, W.; Schlesinger, F.; et al. Landscape of transcription in human cells. Nature 2012, 489, 101–108. [Google Scholar] [CrossRef]
Frankish, A.; Diekhans, M.; Ferreira, A.M.; Johnson, R.; Jungreis, I.; Loveland, J.; Mudge, J.M.; Sisu, C.; Wright, J.; Armstrong, J.; et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019, 47, D766–D773. [Google Scholar] [CrossRef] [Green Version]
Dobin, A.; Davis, C.A.; Schlesinger, F.; Drenkow, J.; Zaleski, C.; Jha, S.; Batut, P.; Chaisson, M.; Gingeras, T.R. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 2013, 29, 15–21. [Google Scholar] [CrossRef]
Li, B.; Dewey, C.N. RSEM: Accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinform. 2011, 12, 323. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Trincado, J.L.; Entizne, J.C.; Hysenaj, G.; Singh, B.; Skalic, M.; Elliott, D.J.; Eyras, E. SUPPA2: Fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 2018, 19, 1–11. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Trapnell, C.; Hendrickson, D.G.; Sauvageau, M.; Goff, L.; Rinn, J.L.; Pachter, L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 2013, 31, 46. [Google Scholar] [CrossRef] [PubMed]
Wang, E.T.; Sandberg, R.; Luo, S.; Khrebtukova, I.; Zhang, L.; Mayr, C.; Kingsmore, S.F.; Schroth, G.P.; Burge, C.B. Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456, 470. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Shen, S.; Park, J.W.; Lu, Z.x.; Lin, L.; Henry, M.D.; Wu, Y.N.; Zhou, Q.; Xing, Y. rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc. Natl. Acad. Sci. USA 2014, 111, E5593–E5601. [Google Scholar] [CrossRef] [Green Version]
Raudvere, U.; Kolberg, L.; Kuzmin, I.; Arak, T.; Adler, P.; Peterson, H.; Vilo, J. g: Profiler: A web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019, 47, W191–W198. [Google Scholar] [CrossRef] [Green Version]
Colombo, M.; Karousis, E.D.; Bourquin, J.; Bruggmann, R.; Mühlemann, O. Transcriptome-wide identification of NMD-targeted human mRNAs reveals extensive redundancy between SMG6- and SMG7-mediated degradation pathways. RNA 2017, 23, 189–201. [Google Scholar] [CrossRef] [Green Version]
Leng, N.; Dawson, J.A.; Thomson, J.A.; Ruotti, V.; Rissman, A.I.; Smits, B.M.; Haag, J.D.; Gould, M.N.; Stewart, R.M.; Kendziorski, C. EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 2013, 29, 1035–1043. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Fukunaga, T.; Ozaki, H.; Terai, G.; Asai, K.; Iwasaki, W.; Kiryu, H. CapR: Revealing structural specificities of RNA-binding protein target recognition using CLIP-seq data. Genome Biol. 2014, 15, R16. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Paz, I.; Kosti, I.; Ares Jr, M.; Cline, M.; Mandel-Gutfreund, Y. RBPmap: A web server for mapping binding sites of RNA-binding proteins. Nucleic Acids Res. 2014, 42, W361–W367. [Google Scholar] [CrossRef] [PubMed]
Smit, A.; Hubley, R.; Green, P. RepeatMasker Open-4.0, 2013–2015. 2013. Available online: http://www.repeatmasker.org (accessed on 1 October 2019).
Quinlan, A.R.; Hall, I.M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 2010, 26, 841–842. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Quinn, J.J.; Chang, H.Y. Unique features of long non-coding RNA biogenesis and function. Nat. Rev. Genet. 2016, 17, 47. [Google Scholar] [CrossRef] [PubMed]
Marchese, F.P.; Raimondi, I.; Huarte, M. The multidimensional mechanisms of long noncoding RNA function. Genome Biol. 2017, 18, 206. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Regulapati, R.; Bhasi, A.; Singh, C.K.; Senapathy, P. Origination of the split structure of spliceosomal genes from random genetic sequences. PLoS ONE 2008, 3, e3456. [Google Scholar] [CrossRef]
Alamancos, G.P.; Pagès, A.; Trincado, J.L.; Bellora, N.; Eyras, E. Leveraging transcript quantification for fast computation of alternative splicing profiles. RNA 2015, 21, 1521–1531. [Google Scholar] [CrossRef] [Green Version]
Fernandez, M.; Kumagai, Y.; Standley, D.M.; Sarai, A.; Mizuguchi, K.; Ahmad, S. Prediction of dinucleotide-specific RNA-binding sites in proteins. BMC Bioinform. 2011, 12, S5. [Google Scholar] [CrossRef] [Green Version]
Lubelsky, Y.; Ulitsky, I. Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 2018, 555, 107–111. [Google Scholar] [CrossRef]
Park, E.; Pan, Z.; Zhang, Z.; Lin, L.; Xing, Y. The expanding landscape of alternative splicing variation in human populations. Am. J. Hum. Genet. 2018, 102, 11–26. [Google Scholar] [CrossRef] [Green Version]
Wen, W.X.; Mead, A.J.; Thongjuea, S. Technological advances and computational approaches for alternative splicing analysis in single cells. Comput. Struct. Biotechnol. J. 2020, 18, 332–343. [Google Scholar] [CrossRef]
Parker, M.T.; Knop, K.; Sherwood, A.V.; Schurch, N.J.; Mackinnon, K.; Gould, P.D.; Hall, A.J.; Barton, G.J.; Simpson, G.G. Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification. eLife 2020, 9, e49658. [Google Scholar] [CrossRef]
Tang, A.D.; Soulette, C.M.; van Baren, M.J.; Hart, K.; Hrabeta-Robinson, E.; Wu, C.J.; Brooks, A.N. Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat. Commun. 2020, 11, 1–12. [Google Scholar] [CrossRef] [Green Version]
Kloc, M.; Wilk, K.; Vargas, D.; Shirato, Y.; Bilinski, S.; Etkin, L.D. Potential structural role of non-coding and coding RNAs in the organization of the cytoskeleton at the vegetal cortex of Xenopus oocytes. Development 2005, 132, 3445–3457. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Jenny, A.; Hachet, O.; Závorszky, P.; Cyrklaff, A.; Weston, M.D.; St Johnston, D.; Erdélyi, M.; Ephrussi, A. A translation-independent role of oskar RNA in early Drosophila oogenesis. Development 2006, 133, 2827–2833. [Google Scholar] [CrossRef] [Green Version]
Lim, S.; Kumari, P.; Gilligan, P.; Quach, H.N.B.; Mathavan, S.; Sampath, K. Dorsal activity of maternal squint is mediated by a non-coding function of the RNA. Development 2012, 139, 2903–2915. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Shevtsov, S.P.; Dundr, M. Nucleation of nuclear bodies by RNA. Nat. Cell Biol. 2011, 13, 167–173. [Google Scholar] [CrossRef] [PubMed]
Jansen, G.; Groenen, P.J.; Bächner, D.; Jap, P.H.; Coerwinkel, M.; Oerlemans, F.; van den Broek, W.; Gohlsch, B.; Pette, D.; Plomp, J.J.; et al. Abnormal myotonic dystrophy protein kinase levels produce only mild myopathy in mice. Nat. Genet. 1996, 13, 316. [Google Scholar] [CrossRef]
Gong, C.; Maquat, L.E. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 2011, 470, 284–288. [Google Scholar] [CrossRef] [Green Version]
Chooniedass-Kothari, S.; Emberley, E.; Hamedani, M.; Troup, S.; Wang, X.; Czosnek, A.; Hube, F.; Mutawe, M.; Watson, P.; Leygue, E. The steroid receptor RNA activator is the first functional RNA encoding a protein. FEBS Lett. 2004, 566, 43–47. [Google Scholar] [CrossRef]
Hubé, F.; Velasco, G.; Rollin, J.; Furling, D.; Francastel, C. Steroid receptor RNA activator protein binds to and counteracts SRA RNA-mediated activation of MyoD and muscle differentiation. Nucleic Acids Res. 2010, 39, 513–525. [Google Scholar] [CrossRef] [Green Version]
Lanz, R.B.; McKenna, N.J.; Onate, S.A.; Albrecht, U.; Wong, J.; Tsai, S.Y.; Tsai, M.J.; O’Malley, B.W. A steroid receptor coactivator, SRA, functions as an RNA and is present in an SRC-1 complex. Cell 1999, 97, 17–27. [Google Scholar] [CrossRef] [Green Version]
Rossi, A.; Kontarakis, Z.; Gerri, C.; Nolte, H.; Hölper, S.; Krüger, M.; Stainier, D.Y. Genetic compensation induced by deleterious mutations but not gene knockdowns. Nature 2015, 524, 230. [Google Scholar] [CrossRef] [PubMed]
El-Brolosy, M.A.; Stainier, D.Y. Genetic compensation: A phenomenon in search of mechanisms. PLoS Genet. 2017, 13, e1006780. [Google Scholar] [CrossRef] [PubMed] [Green Version]
El-Brolosy, M.A.; Kontarakis, Z.; Rossi, A.; Kuenne, C.; Günther, S.; Fukuda, N.; Kikhi, K.; Boezio, G.L.M.; Takacs, C.M.; Lai, S.L.; et al. Genetic compensation triggered by mutant mRNA degradation. Nature 2019, 568, 193–197. [Google Scholar] [CrossRef] [PubMed]
Ma, Z.; Zhu, P.; Shi, H.; Guo, L.; Zhang, Q.; Chen, Y.; Chen, S.; Zhang, Z.; Peng, J.; Chen, J. PTC-bearing mRNA elicits a genetic compensation response via Upf3a and COMPASS components. Nature 2019, 568, 259–263. [Google Scholar] [CrossRef] [PubMed]
Le Hir, H.; Gatfield, D.; Izaurralde, E.; Moore, M.J. The exon–exon junction complex provides a binding platform for factors involved in mRNA export and nonsense-mediated mRNA decay. EMBO J. 2001, 20, 4987–4997. [Google Scholar] [CrossRef] [Green Version]
Reed, R.; Hurt, E. A conserved mRNA export machinery coupled to pre-mRNA splicing. Cell 2002, 108, 523–531. [Google Scholar] [CrossRef] [Green Version]
Gonzàlez-Porta, M.; Frankish, A.; Rung, J.; Harrow, J.; Brazma, A. Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene. Genome Biol. 2013, 14, R70. [Google Scholar] [CrossRef] [Green Version]
Boutz, P.L.; Bhutkar, A.; Sharp, P.A. Detained introns are a novel, widespread class of post-transcriptionally spliced introns. Genes Dev. 2015, 29, 63–80. [Google Scholar] [CrossRef] [Green Version]
Wan, Y.; Kertesz, M.; Spitale, R.C.; Segal, E.; Chang, H.Y. Understanding the transcriptome through RNA structure. Nat. Rev. Genet. 2011, 12, 641–655. [Google Scholar] [CrossRef] [Green Version]
Warf, M.B.; Berglund, J.A. MBNL binds similar RNA structures in the CUG repeats of myotonic dystrophy and its pre-mRNA substrate cardiac troponin T. RNA 2007, 13, 2238–2251. [Google Scholar] [CrossRef] [Green Version]
Taylor, K.; Sznajder, Ł.J.; Cywoniuk, P.; Thomas, J.D.; Swanson, M.S.; Sobczak, K. MBNL splicing activity depends on RNA binding site structural context. Nucleic Acids Res. 2018, 46, 9119–9133. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Goodwin, M.; Mohan, A.; Batra, R.; Lee, K.Y.; Charizanis, K.; Gómez, F.J.F.; Eddarkaoui, S.; Sergeant, N.; Buée, L.; Kimura, T.; et al. MBNL sequestration by toxic RNAs and RNA misprocessing in the myotonic dystrophy brain. Cell Rep. 2015, 12, 1159–1168. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Wang, P.Y.; Chang, K.T.; Lin, Y.M.; Kuo, T.Y.; Wang, G.S. Ubiquitination of MBNL1 is required for its cytoplasmic localization and function in promoting neurite outgrowth. Cell Rep. 2018, 22, 2294–2306. [Google Scholar] [CrossRef] [PubMed]
Valencia, P.; Dias, A.P.; Reed, R. Splicing promotes rapid and efficient mRNA export in mammalian cells. Proc. Natl. Acad. Sci. USA 2008, 105, 3386–3391. [Google Scholar] [CrossRef] [Green Version]
Zuckerman, B.; Ulitsky, I. Predictive models of subcellular localization of long RNAs. RNA 2019, 25, 557–572. [Google Scholar] [CrossRef]
Soneson, C.; Yao, Y.; Bratus-Neuenschwander, A.; Patrignani, A.; Robinson, M.D.; Hussain, S. A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes. Nat. Commun. 2019, 10, 3359. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Workman, R.E.; Tang, A.D.; Tang, P.S.; Jain, M.; Tyson, J.R.; Razaghi, R.; Zuzarte, P.C.; Gilpatrick, T.; Payne, A.; Quick, J.; et al. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat. Methods 2019, 16, 1297–1305. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Transcript switches between cytoplasm and nucleus. (A) The upper panel shows two transcript variants of the gene C16orf91. The lower panel is an example for the identification of a transcript variant switch in the gene C16orf91 using

Δ T U

. C16orf91-201 and C16orf91-202 were mainly localized in the cytoplasm and the nucleus, respectively, in HeLa-S3 cells (see Supplementary Data 2 for read coverage on C16orf91 gene). P values are calculated with SUPPA2 [32] (see Materials and Methods). (B) Transcriptome-wide identification of transcript variant switches in HeLa-S3 cells. We applied

| Δ T U | > 0.25

and

p < 0.05

to filter out cytoplasmic (

| Δ T U | > 0

, blue) and nuclear (

| Δ T U | < 0

, red) transcript variants. Unknown (black) are transcripts filtered as no significant. (C) Number of transcript variant switches across the 13 cell lines. The percentage on the histogram indicates the content of protein-coding transcript. (D) Number of genes containing transcript variant switches shared across the 13 cell lines. For example, 4015 genes were identified as being cell specific.

Figure 1. Transcript switches between cytoplasm and nucleus. (A) The upper panel shows two transcript variants of the gene C16orf91. The lower panel is an example for the identification of a transcript variant switch in the gene C16orf91 using

Δ T U

. C16orf91-201 and C16orf91-202 were mainly localized in the cytoplasm and the nucleus, respectively, in HeLa-S3 cells (see Supplementary Data 2 for read coverage on C16orf91 gene). P values are calculated with SUPPA2 [32] (see Materials and Methods). (B) Transcriptome-wide identification of transcript variant switches in HeLa-S3 cells. We applied

| Δ T U | > 0.25

and

p < 0.05

to filter out cytoplasmic (

| Δ T U | > 0

, blue) and nuclear (

| Δ T U | < 0

, red) transcript variants. Unknown (black) are transcripts filtered as no significant. (C) Number of transcript variant switches across the 13 cell lines. The percentage on the histogram indicates the content of protein-coding transcript. (D) Number of genes containing transcript variant switches shared across the 13 cell lines. For example, 4015 genes were identified as being cell specific.

Figure 2. Characterization of cytoplasmic and nuclear transcripts. (A) NMD does not explain transcript variant switches between the cytoplasm and the nucleus. NMD sensitivity was defined by the fold-change of transcript expression before and after knockdown of NMD factor UPF1 (upper), SMG6 (middle), and SMG7 (lower) in HeLa-S3 cells. P values: Mood’s median test. ns: not significant. (B) Comparison of transcript length between cytoplasmic (

Δ T U^{+}

) and nuclear (

Δ T U^{-}

) transcripts. For each pair of transcript switches, the ratio of transcript length between the cytoplasm and the nucleus is defined as a metric.

R a t i o > 1

indicates a group of transcripts where the length of the cytoplasmic transcripts is greater than that of the nuclear transcripts. In the group, splicing frequency (or exon number) is over-expressed in the cytoplasmic transcripts (inset, upper, p < 2.2 × 10

^{- 308}

Mood’s median test), suggesting splicing can promote cytoplasmic localization. No significant difference in splicing frequency was observed in the group if the transcripts

R a t i o < - 1

(inset, bottom, p < 4.8 × 10

^{- 12}

Mood’s median test). (C) Comparison of

Δ T U

between mono-exonic (black) and multi-exonic (red) transcripts across 13 cell lines.

Δ T U

shows a significant positive correlation with splicing, indicating that splicing appears to be a dominant factor for RNA export from the nucleus. Based on two-tailed t-test, we calculated the significant level of difference between

Δ T U

in mono-exonic and multi-exonic transcripts in each cell line. The largest p value is 3.898 × 10

^{- 3}

.

Figure 2. Characterization of cytoplasmic and nuclear transcripts. (A) NMD does not explain transcript variant switches between the cytoplasm and the nucleus. NMD sensitivity was defined by the fold-change of transcript expression before and after knockdown of NMD factor UPF1 (upper), SMG6 (middle), and SMG7 (lower) in HeLa-S3 cells. P values: Mood’s median test. ns: not significant. (B) Comparison of transcript length between cytoplasmic (

Δ T U^{+}

) and nuclear (

Δ T U^{-}

) transcripts. For each pair of transcript switches, the ratio of transcript length between the cytoplasm and the nucleus is defined as a metric.

R a t i o > 1

indicates a group of transcripts where the length of the cytoplasmic transcripts is greater than that of the nuclear transcripts. In the group, splicing frequency (or exon number) is over-expressed in the cytoplasmic transcripts (inset, upper, p < 2.2 × 10

^{- 308}

Mood’s median test), suggesting splicing can promote cytoplasmic localization. No significant difference in splicing frequency was observed in the group if the transcripts

R a t i o < - 1

(inset, bottom, p < 4.8 × 10

^{- 12}

Mood’s median test). (C) Comparison of

Δ T U

between mono-exonic (black) and multi-exonic (red) transcripts across 13 cell lines.

Δ T U

shows a significant positive correlation with splicing, indicating that splicing appears to be a dominant factor for RNA export from the nucleus. Based on two-tailed t-test, we calculated the significant level of difference between

Δ T U

in mono-exonic and multi-exonic transcripts in each cell line. The largest p value is 3.898 × 10

^{- 3}

.

Figure 3. Comparison of alternative splicing patterns between cytoplasmic and nuclear transcripts. (A) Retained introns are enriched in the nuclear transcripts (HeLa-S3).

Δ Ψ

indicates the inclusion level for a specific alternative splicing pattern. A3: alternative 3

^{'}

splice-site; A5: alternative 5

^{'}

splice-site; AF: alternative first exon; AL: alternative last exon; MX: mutually exclusive exon; RI: retained intron; SE: skipping exon. Comparison of (B) splice sites, (C) Length (top) and RNA secondary structure (bottom) between all introns and nuclear RIs (

Δ Ψ < 0

and

p < 0.05

). Sequence logos show intron-exon (the left) and exon-intron (the right) splice boundaries. p values: two-tailed t-test between all introns and nuclear RIs. The largest p value is 6.54 × 10

^{- 7}

and 4.97 × 10

^{- 3}

for length and RNA secondary structure, respectively.

Figure 3. Comparison of alternative splicing patterns between cytoplasmic and nuclear transcripts. (A) Retained introns are enriched in the nuclear transcripts (HeLa-S3).

Δ Ψ

indicates the inclusion level for a specific alternative splicing pattern. A3: alternative 3

^{'}

splice-site; A5: alternative 5

^{'}

splice-site; AF: alternative first exon; AL: alternative last exon; MX: mutually exclusive exon; RI: retained intron; SE: skipping exon. Comparison of (B) splice sites, (C) Length (top) and RNA secondary structure (bottom) between all introns and nuclear RIs (

Δ Ψ < 0

and

p < 0.05

). Sequence logos show intron-exon (the left) and exon-intron (the right) splice boundaries. p values: two-tailed t-test between all introns and nuclear RIs. The largest p value is 6.54 × 10

^{- 7}

and 4.97 × 10

^{- 3}

for length and RNA secondary structure, respectively.

Figure 4. Preference of dinucleotide, RBPs, and TEs on nuclear RIs. The heatmap represents the relative density of a specific (A) dinucleotide, (B) RBP-binding site, or (C) TE (row) on nuclear RIs across multiple cell lines (columns) compared with that of all introns (Ctr.).

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zeng, C.; Hamada, M. RNA-Seq Analysis Reveals Localization-Associated Alternative Splicing across 13 Cell Lines. Genes 2020, 11, 820. https://0-doi-org.brum.beds.ac.uk/10.3390/genes11070820

AMA Style

Zeng C, Hamada M. RNA-Seq Analysis Reveals Localization-Associated Alternative Splicing across 13 Cell Lines. Genes. 2020; 11(7):820. https://0-doi-org.brum.beds.ac.uk/10.3390/genes11070820

Chicago/Turabian Style

Zeng, Chao, and Michiaki Hamada. 2020. "RNA-Seq Analysis Reveals Localization-Associated Alternative Splicing across 13 Cell Lines" Genes 11, no. 7: 820. https://0-doi-org.brum.beds.ac.uk/10.3390/genes11070820

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

RNA-Seq Analysis Reveals Localization-Associated Alternative Splicing across 13 Cell Lines

Abstract

1. Introduction

2. Materials and Methods

2.1. RNA-Seq Data and Bioinformatics

2.2. $Δ T U$ and $Δ Ψ$

2.3. Gene Ontology (GO) Enrichment Analysis

2.4. Nonsense-Mediated RNA Decay (NMD) Sensitivity

2.5. RNA Structure Context Analysis

2.6. RBP-Binding Prediction and Repeat Sequence Analysis

3. Results

3.1. Thousands of Transcript Switches between Cytoplasm and Nucleus Were Identified across Thirteen Human Cell Lines

3.2. A Transcript Switch Is Associated with RNA Splicing Rather Than RNA Degradation

3.3. Enrichment and Characterization of Retained Introns in the Nuclear Transcripts

3.4. The Preferences of RNA-Binding Proteins on Retained Introns Suggest Their Role in Nuclear Localization

4. Discussion

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

RNA-Seq Analysis Reveals Localization-Associated Alternative Splicing across 13 Cell Lines

Abstract

1. Introduction

2. Materials and Methods

2.1. RNA-Seq Data and Bioinformatics

2.2. Δ T U and Δ Ψ

2.3. Gene Ontology (GO) Enrichment Analysis

2.4. Nonsense-Mediated RNA Decay (NMD) Sensitivity

2.5. RNA Structure Context Analysis

2.6. RBP-Binding Prediction and Repeat Sequence Analysis

3. Results

3.1. Thousands of Transcript Switches between Cytoplasm and Nucleus Were Identified across Thirteen Human Cell Lines

3.2. A Transcript Switch Is Associated with RNA Splicing Rather Than RNA Degradation

3.3. Enrichment and Characterization of Retained Introns in the Nuclear Transcripts

3.4. The Preferences of RNA-Binding Proteins on Retained Introns Suggest Their Role in Nuclear Localization

4. Discussion

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

2.2. $Δ T U$ and $Δ Ψ$