Article

Asymmetric Conservation within Pairs of Co-Occurred Motifs Mediates Weak Direct Binding of Transcription Factors in ChIP-Seq Data

by Victor Levitsky 1,2,*, Dmitry Oshchepkov 1, Elena Zemlyanskaya 1,2 and Tatyana Merkulova 1,2,*

1 Department of System Biology, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
2 Department of Natural Science, Novosibirsk State University, 630090 Novosibirsk, Russia
* Authors to whom correspondence should be addressed.
Int. J. Mol. Sci. 2020, 21(17), 6023; https://0-doi-org.brum.beds.ac.uk/10.3390/ijms21176023
Submission received: 25 July 2020 / Revised: 18 August 2020 / Accepted: 18 August 2020 / Published: 21 August 2020

Abstract

(1) Background: Transcription factors (TFs) are the main regulators of eukaryotic gene expression. Cooperative binding of at least two TFs to genomic DNA is a widespread mechanism of transcription regulation. Cooperating TFs can be revealed through the analysis of the co-occurrence of their motifs. (2) Methods: We applied the motifs co-occurrence tool (MCOT), which predicts pairs of spaced or overlapping motifs (composite elements, CEs) in a single ChIP-seq dataset. We improved the capability of MCOT to predict asymmetric CEs, in which one of the participating motifs is more conserved than the other. (3) Results: Analysis of 119 ChIP-seq datasets for 45 human TFs revealed that, for almost all families of TFs, co-occurrence with an overlap between motifs of target TFs and more conserved partner motifs was significantly higher than that for less conserved partner motifs. The asymmetry toward partner TFs was clearest for partner motifs of TFs from the ETS (E26 Transformation Specific) family. (4) Conclusion: Co-occurrence with an overlap of a less conserved motif of a target TF and more conserved motifs of partner TFs explained a substantial portion of ChIP-seq data lacking conserved motifs of target TFs. Among all TF families, conserved motifs of TFs from the ETS family were the most prone to mediate the interaction of target TFs with their weak motifs in ChIP-seq data.


1. Introduction

Tissue-, cell- and stage-specific regulation of gene expression is achieved through interactions of transcription factors (TFs) with their respective regulatory elements, called binding sites (BSs) or motifs. Typically, each TF functions in tight cooperation with other TFs, and there is a variety of mechanisms for cooperative TF–DNA binding [1,2]. Roughly, these mechanisms may be classified as simultaneous or sequential [1]. The first option implies a protein–protein interaction followed by binding of the homo- or heterodimer to DNA; in this mechanism, the affinities of the two respective motifs may contribute comparably. Alternatively, one TF of a pair may bind DNA first, and at the second stage, a ternary complex is formed through the contributions of protein–protein and protein–DNA contacts of the second TF. This scenario is facilitated when the first TF has a higher DNA affinity than the second one. DNA-mediated interaction may also be facilitated by DNA conformation or nucleosomal organization [1]; e.g., the propensity to interact with nucleosomal DNA is a special mark of pioneer TFs [3,4,5]. Thus, different mechanisms may explain a variety of possible TF–DNA ternary complexes, but in many cases we may expect the behavior of the two TFs to be asymmetric. A recent review [6] proposed that in pairs of co-occurring motifs, besides orientation and spacing, the strength (affinity) of the individual motifs contributes to the specificity of a DNA regulatory region. Hence, a systematic analysis of all possible partner motifs for various target motifs may suggest possible mechanisms of cooperative TF action.
Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) has become the gold standard for protein–DNA binding annotation at the whole-genome level [7]. In particular, the ChIP-seq approach has been widely applied for the annotation of TF binding sites (TFBSs), and at its final stage the standard analysis pipeline applies de novo motif discovery tools to confirm the presence of BSs specific for the target (anchor) TF [7]. Since the application of these tools to single ChIP-seq datasets has become a routine procedure [8], several studies have underlined the importance of large-scale analysis of motif co-occurrence, which reflects the cooperative mechanisms of TF action [9,10]. We recently proposed the motifs co-occurrence tool (MCOT) package for the prediction of motif co-occurrence in ChIP-seq data [11]. MCOT possesses two specific features that are still absent in other analogous bioinformatics tools. First, MCOT uses a single ChIP-seq dataset to discover motif co-occurrence both with a spacer and with an overlap. Second, MCOT simultaneously applies several thresholds for each motif; consequently, MCOT is able to retrieve composite elements (CEs) of anchor and partner motifs with various conservation ratios. Here, the conservation of a motif denotes its similarity to a recognition model.
In the current study, we aimed to map anchor motifs in benchmark ChIP-seq data for various TFs and to predict which potential partner TFs might mediate their binding. We relied on estimates of (a) the co-occurrence of motifs of anchor and partner TFs and (b) the asymmetry of motif conservation in the respective CEs. In particular, we asked whether asymmetric pairs of anchor and partner motifs with more conserved partner motifs could explain the previously reported substantial portion of ChIP-seq data lacking conserved anchor motifs (about half of a ChIP-seq dataset [12]). To investigate this issue, we improved the MCOT computation procedure so that it directly reflected whether an observed imbalance between the conservation of anchor and partner motifs was significantly higher than random expectation. Consequently, for each pair of anchor and partner motifs, besides the conventional significance of CE enrichment, MCOT provided two additional significance values that reflected the enrichment of asymmetric CEs with more conserved anchor motifs and with more conserved partner motifs.
We carefully annotated anchor motifs in the benchmark ChIP-seq data. Next, we calculated the abundance of CEs with a spacer and with an overlap for potential partner motifs from a library of known partner motifs. In particular, for each partner motif we separately analyzed asymmetric CEs with higher and with lower conservation of the partner motif compared to the respective anchor motif. We classified all partner motifs according to the families of partner TFs [13].
We found that, for all families of partner TFs but only among overlapping pairs of anchor and partner motifs, pairs with higher conservation of partner motifs were significantly more abundant than those with higher conservation of anchor motifs. TF families differed in the imbalance between asymmetric CEs with more conserved anchor motifs and those with more conserved partner motifs. Thus, overrepresented asymmetric CEs with more conserved partner motifs and less conserved anchor motifs systematically promoted weak direct interactions of anchor TFs in ChIP-seq data. Among all families, partner motifs of TFs from the ETS family showed the greatest imbalance in conservation toward partner motifs. Hence, we have shown that motifs of TFs from the ETS family systematically mediate cooperative binding of other TFs through the higher conservation of ETS-like motifs in widespread CEs with an overlap of motifs.

2. Results

2.1. Integration of CE Significance and CE Asymmetry in the MCOT Analysis

We previously developed the MCOT package for the prediction of spaced and overlapping pairs of co-occurring motifs in a single ChIP-seq dataset [11]. To search for CEs, MCOT requires the ChIP-seq dataset (peaks), the anchor motif of the target TF, and either a partner motif or a public library of known partner motifs; in the current study, we classified partner motifs from the Hocomoco library [14] according to the respective families of partner TFs (Figure 1A).
We applied the position weight matrix (PWM) model for motif recognition. Besides classifying CEs by orientation, we classified them as fully overlapped, partially overlapped or spaced. We pooled all orientations and extended the CE classification according to motif conservation (Figure 1B). A scatterplot of anchor versus partner motif conservation can reveal the extent of imbalance between the similarities of the two motifs to their recognition models. Specifically, the conservation of a motif is measured as −Log10(FPR), where FPR denotes the false positive rate (Section 4.2).
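The conservation measure can be illustrated with a minimal Python sketch. It assumes, as a simplification that Section 4.2 may not follow exactly, that the FPR of a PWM score is the fraction of background positions scoring at least as high, and that a CE is asymmetric toward whichever motif has the larger −Log10(FPR). The helper names (`pwm_score`, `fpr_of_score`, `asymmetry`) and the background scores are hypothetical and are not part of MCOT.

```python
import math
from bisect import bisect_left

def pwm_score(pwm, site):
    """Score a DNA site with a PWM given as a list of {base: weight} columns."""
    return sum(column[base] for column, base in zip(pwm, site))

def fpr_of_score(score, background_sorted):
    """Empirical FPR: fraction of background scores >= score (list sorted ascending).

    The count is floored at 1 so that the FPR never becomes 0 and -log10 stays finite.
    """
    n = len(background_sorted)
    n_above = n - bisect_left(background_sorted, score)
    return max(n_above, 1) / n

def conservation(fpr):
    """Motif conservation as -log10(FPR); higher means more similar to the PWM."""
    return -math.log10(fpr)

def asymmetry(anchor_fpr, partner_fpr):
    """Hypothetical rule: label a CE by its more conserved motif (lower FPR)."""
    anchor_c, partner_c = conservation(anchor_fpr), conservation(partner_fpr)
    if partner_c > anchor_c:
        return "partner"
    if anchor_c > partner_c:
        return "anchor"
    return "equal"

# Toy example (hypothetical numbers, not data from the study).
background = sorted([-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0])
pwm = [{"A": 2.0, "C": -1.0, "G": -1.0, "T": 0.0},
       {"A": -1.0, "C": 0.0, "G": 2.0, "T": -1.0}]
print(pwm_score(pwm, "AG"))                          # 4.0
print(fpr_of_score(4.0, background))                 # 0.1
print(asymmetry(anchor_fpr=5e-4, partner_fpr=1e-5))  # partner
```

Working on the −Log10 scale keeps very small FPRs comparable; in practice the background would be a large set of genomic or shuffled sequences rather than ten toy scores.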
The basic MCOT output comprises the significance of CEs regardless of the conservation of the two motifs in a pair, as well as the significances of CEs with more conserved anchor or partner motifs (Figure 1C, Table 1). Thus, the separate analysis of CEs with an overlap of motifs and with a spacer, the detailed classification of CE types, the integration of homologous partner motifs of the same family, and the large-scale analysis of benchmark ChIP-seq data for various anchor motifs allowed us to assess the abundance of structurally specific CEs with partner motifs of various families (Figure 1D).
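The split into fully overlapped, partially overlapped and spaced CEs can be sketched from the positions of the two hits. This is a hypothetical reconstruction, not MCOT's code: it assumes 0-based, half-open coordinates, ignores orientation (as the pooled analysis does), and treats a CE as fully overlapped when one hit lies entirely within the other.

```python
def classify_ce(anchor_start, anchor_len, partner_start, partner_len):
    """Return (type, length) for a pair of motif hits.

    type is "full" (one hit lies inside the other), "partial" (the hits share
    some positions) or "spacer" (no shared positions); length is the overlap
    or spacer length in bp.
    """
    anchor_end = anchor_start + anchor_len
    partner_end = partner_start + partner_len
    overlap = min(anchor_end, partner_end) - max(anchor_start, partner_start)
    if overlap <= 0:
        return "spacer", -overlap  # 0 means the hits are adjacent
    anchor_inside = partner_start <= anchor_start and anchor_end <= partner_end
    partner_inside = anchor_start <= partner_start and partner_end <= anchor_end
    return ("full" if anchor_inside or partner_inside else "partial"), overlap

print(classify_ce(0, 10, 2, 6))   # ('full', 6)
print(classify_ce(0, 10, 7, 8))   # ('partial', 3)
print(classify_ce(0, 10, 12, 6))  # ('spacer', 2)
```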
Additionally, in this study, for each pair of motifs we introduced the significance of CE asymmetry (Table 2), which compares the abundance of asymmetric CEs in which one motif is more conserved with that of asymmetric CEs in which the other motif is more conserved. Table 1 and Table 2 show the 2 × 2 contingency tables used in the respective Fisher’s exact tests.
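A one-sided Fisher’s exact test on a 2 × 2 table can be computed directly from the hypergeometric distribution, as in the sketch below. The row and column layout used in the toy example (rows: CEs with the partner or the anchor motif more conserved; columns: observed in peaks or expected from a background) is an assumption for illustration, since the exact cell definitions of Table 1 and Table 2 are given in Section 4.3. `scipy.stats.fisher_exact(table, alternative="greater")` should return the same one-sided p-value.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell under the hypergeometric
    null with all margins fixed (alternative: odds ratio > 1).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    # math.comb(k, r) returns 0 when r > k, so impossible tables add nothing.
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / total

# Toy counts (hypothetical): rows = partner / anchor more conserved,
# columns = observed / expected.
print(fisher_greater(8, 2, 1, 5))  # 280/11440, about 0.0245
```

Exact integer arithmetic via `math.comb` avoids overflow; for very large CE counts a log-space or library implementation would be faster.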

2.2. Single ChIP-Seq Dataset: Example of Significant Asymmetry within CE

In this section, we illustrate the calculation of the asymmetry (Table 2 and Section 4.3) for CEs of the anchor FoxA2 motif (ChIP-seq dataset from mouse liver tissue [15]) and potential partner motifs from the Hocomoco mouse core collection [14]. In the original study [15], besides the enrichment of anchor FoxA2 motifs, the authors revealed their co-occurrence with potential BSs of the partner TFs GATA4, PAX6 and HNF1. The MCOT analysis confirmed significant co-occurrence for all respective CEs. However, only for HNF1β (HNF1B_MOUSE.H11MO.0.A, [14]) did we find extremely significant asymmetry within predicted CEs toward the partner motif (p < 2 × 10−28 and p < 4 × 10−17 for CEs with an overlap of motifs and with a spacer, respectively). Figure 2 shows the difference between the relative frequencies of observed and expected CEs with specific conservation of FoxA2 and HNF1β motifs for their overlapping and spaced positioning.
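Figure 2 contrasts observed and expected relative frequencies of CEs over conservation levels of the two motifs. One simple way to obtain such a map, assumed here for illustration and not necessarily identical to the procedure in Section 4.3, is to bin the −Log10(FPR) values of each motif and take the expected joint frequency as the product of the two marginal frequencies (independence of anchor and partner conservation). Positive differences in cells where the partner bin exceeds the anchor bin would then indicate asymmetry toward the partner motif.

```python
from collections import Counter

def observed_minus_expected(pairs):
    """Observed minus expected joint frequencies of conservation bins.

    pairs: one (anchor_bin, partner_bin) tuple per CE, e.g. integer bins of
    -log10(FPR). Expected frequencies assume that anchor and partner
    conservation are independent (product of the marginals).
    """
    n = len(pairs)
    joint = Counter(pairs)
    anchor = Counter(a for a, _ in pairs)
    partner = Counter(p for _, p in pairs)
    return {
        (i, j): joint[(i, j)] / n - (anchor[i] / n) * (partner[j] / n)
        for i in anchor
        for j in partner
    }

# Toy CEs (hypothetical): bin 1 is more conserved than bin 0.
diff = observed_minus_expected([(0, 1), (0, 1), (1, 0), (1, 1)])
print(diff[(0, 1)])  # 0.125: weak anchor + strong partner is over-represented
```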
MCOT analysis of other ChIP-seq datasets for FoxA2 and its close homologue FoxA1 revealed that FoxA1/2-HNF1β CEs with an overlap of motifs were significant; in some cases, moderate significance was also found for the respective CEs with a spacer. However, significant asymmetry in these CEs was not observed for the other FoxA1/2 ChIP-seq datasets (FoxA2, liver cancer cell line HepG2 [16]; GSM686926, FoxA1, prostate cell line LNCaP [17]; and GSM1505633, FoxA1, embryonic cell lines [18]). Thus, CE asymmetry toward the HNF1β motif appeared to be a specific feature of FoxA2-HNF1β CEs in liver tissue.

2.3. Single ChIP-Seq Dataset: Multiple Partner TFs Support Binding of Anchor TF

MCOT can provide a list of potential partner motifs, indicating the relationship between the conservation of the two motifs in each pair. The previous section presented a single example of a CE with higher conservation of the partner motif than of the anchor motif. In practice, multiple partner TFs may cooperate with an anchor TF, and this may involve several asymmetric CEs with more conserved partner motifs (and less conserved anchor motifs). This may explain the absence of known motifs of anchor TFs in about half of the peaks [12,19].
We previously showed that, at least for FoxA2 in two ChIP-seq datasets [15,16], almost 100% of peaks contained potential motifs of the anchor TF, although this conclusion was reached with a recognition model alternative to the PWM [20]. Consequently, the majority of FoxA2 peaks should contain at least moderately conserved FoxA2 motifs. Hence, we considered the same FoxA2 dataset [15] and excluded from the analysis the 37.7% of all 4455 peaks whose best FoxA2 hits were the most conserved (FPR below 5.24 × 10−5) and the 19.1% of peaks whose best FoxA2 hits were too weakly conserved (FPR above 5 × 10−4) (Section 4.2). The remaining 43.2% of peaks had FoxA2 hits with moderate or weak conservation, 5.24 × 10−5 < FPR < 5 × 10−4. We hypothesized that a portion of these peaks contained CEs with various more conserved partner motifs. To check this possibility, we performed an MCOT analysis requiring that each partner motif, besides lacking similarity to the FoxA2 motif, show significant asymmetry of CEs toward the partner motif. We then selected the top 30 motifs according to the respective CE significance and sorted them by the fraction of peaks containing asymmetric CEs with more conserved partner motifs (Table 1, Section 4.2 and Section 4.3). Figure 3A shows the ranking of these partner motifs. As expected, almost all peaks with moderately and weakly conserved FoxA2 hits (89.6%) contained significant CEs with more conserved partner motifs. The first-ranked motif, FoxQ1, belongs to the same Forkhead box (FOX) factors {3.3.1} family as FoxA2; these motifs were only moderately similar (p < 0.1), i.e., the MCOT filter did not consider their homology significant. Among the other top-ranked partner motifs, we found BSs of the previously known co-factors HNF1α/β and HNF6 (Figure 3A). The similarity filter excluded the HNF4γ motif (HNF4G_MOUSE.H11MO.0.C) from our analysis.
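The peak selection and ranking described above can be sketched as follows. The function and argument names are hypothetical; the thresholds 5.24 × 10−5 and 5 × 10−4 are those stated in the text, while the handling of FPR values exactly equal to a threshold is an assumption (here they are kept in the intermediate group). The ranking helper assumes that the preselection of the top 30 motifs by CE significance has already been done.

```python
def partition_peaks(best_fpr, strong=5.24e-5, weak=5e-4):
    """Split peaks by the FPR of their best anchor-motif hit.

    best_fpr: dict peak_id -> FPR of the best anchor hit in that peak.
    Returns (most_conserved, intermediate, too_weak) lists of peak ids.
    """
    most_conserved = [p for p, f in best_fpr.items() if f < strong]
    too_weak = [p for p, f in best_fpr.items() if f > weak]
    intermediate = [p for p, f in best_fpr.items() if strong <= f <= weak]
    return most_conserved, intermediate, too_weak

def rank_partners(asym_peaks, intermediate):
    """Sort partner motifs by the fraction of intermediate peaks containing an
    asymmetric CE in which the partner motif is more conserved.

    asym_peaks: dict motif_id -> set of peak ids with such a CE.
    """
    pool = set(intermediate)
    fractions = {m: len(peaks & pool) / len(pool) for m, peaks in asym_peaks.items()}
    return sorted(fractions.items(), key=lambda item: item[1], reverse=True)

# Toy input (hypothetical peak ids and FPRs).
best = {"peak1": 1e-6, "peak2": 1e-4, "peak3": 2e-3, "peak4": 3e-4}
strong_peaks, mid_peaks, weak_peaks = partition_peaks(best)
print(strong_peaks, mid_peaks, weak_peaks)  # ['peak1'] ['peak2', 'peak4'] ['peak3']
print(rank_partners({"motifA": {"peak2"}, "motifB": {"peak1", "peak2", "peak4"}}, mid_peaks))
# [('motifB', 1.0), ('motifA', 0.5)]
```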
Further analysis of motif similarity among the top 30 partner motifs (Figure 3B) demonstrated that they belonged to a relatively small number of TF families [13]. Thus, besides the first-ranked FoxQ1 from the Forkhead box (FOX) factors {3.3.1} family, the next eight top-ranked motifs belonged to five families:
  • Thyroid hormone receptor-related factors (NR1){2.1.2} (NR1H3_MOUSE.H11MO.0.A),
  • POU domain factors {3.1.10} (HNF1A_MOUSE.H11MO.0.A and HNF1B_MOUSE.H11MO.0.A),
  • HD-CUT factors {3.1.9} (HNF6_MOUSE.H11MO.0.A and CUX2_MOUSE.H11MO.0.C),
  • C/EBP-related {1.1.8} (NFIL3_MOUSE.H11MO.0.C),
  • SOX-related factors {4.1.1} (SOX9_MOUSE.H11MO.0.A and SOX10_MOUSE.H11MO.0.B).
Notably, the asymmetric CEs FoxA2/HNF1β, FoxA2/HNF6 and FoxA2/Sox9 involved different structural types of the FoxA2 motif, which could be represented by TATTTATTTA, TATTGACT and TGTTT(A/G)(C/T) (Figure S1), i.e., in each case the FoxA2 motif is ‘adopted’ by the partner motif.
In total, the ten top-ranked asymmetric CEs with more conserved partner motifs were found in 29.5% of all ChIP-seq peaks and in 68.2% of peaks with moderately or weakly conserved FoxA2 motifs (FPRs from 5.24 × 10−5 to 5 × 10−4) (Figure 3). Accounting for the 30 top-ranked asymmetric CEs increased these fractions to 37.3% and 86.4%, respectively. Thus, a substantial portion of the FoxA2 ChIP-seq dataset [15] contained asymmetric CEs with more conserved partner motifs.
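Coverage figures of this kind correspond to the union of peaks containing any of the top-k asymmetric CEs; a peak with several such CEs is counted once. A minimal sketch with hypothetical names and toy inputs is shown below. The reported numbers are also mutually consistent with the CEs being counted in the intermediate subset (43.2% of peaks): 0.682 × 0.432 ≈ 0.295 and 0.864 × 0.432 ≈ 0.373, matching 29.5% and 37.3%.

```python
def cumulative_coverage(ranked_motifs, asym_peaks, n_peaks):
    """Fraction of peaks covered by asymmetric CEs of the top-k partner motifs.

    ranked_motifs: partner motif ids in rank order.
    asym_peaks: dict motif_id -> set of peak ids with an asymmetric CE
        toward that partner motif.
    Returns a list whose element k (0-based) is the coverage of the top k+1 motifs.
    """
    covered = set()
    coverage = []
    for motif in ranked_motifs:
        covered |= asym_peaks.get(motif, set())
        coverage.append(len(covered) / n_peaks)
    return coverage

# Toy example: motif B adds one new peak, motif C adds none.
print(cumulative_coverage(["A", "B", "C"], {"A": {1, 2}, "B": {2, 3}, "C": {3}}, 10))
# [0.2, 0.3, 0.3]
```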

2.4. Massive Analysis of Asymmetric CEs

2.4.1. Analysis of Partner Motifs Classified According to the TFs Families

In the previous section, we analyzed a single ChIP-seq dataset and showed that multiple motifs of various known and presumed partner TFs might be located near weak motifs of the anchor TF (Figure 3). We then asked whether certain partner motifs tended to mediate binding of various anchor TFs through asymmetric CEs with more conserved motifs of partner TFs. We analyzed the benchmark data of 119 ChIP-seq datasets for 45 human TFs with annotated occurrences of anchor motifs and applied the library of 396 partner motifs from the Hocomoco database (Section 4.1). We applied the MCOT package to predict CEs regardless of motif conservation, as well as CEs with more conserved anchor or partner motifs (Section 4.2 and Section 4.3, Table 1). The abundances of these types of CEs for the “Full”, “Partial”, “Overlap”, “Spacer” and “Any” computation flows for all partner motifs are given in Table S2.
Since MCOT operates on motifs rather than TFs, and the Hocomoco collections contain hundreds of more or less homologous motifs of various TFs, we organized all 396 partner motifs accepted in the analysis into 50 clades according to the recent classification of human TFs by the structure of their DNA-binding domains [13]. These clades comprised 49 families of TFs and one additional subfamily of CTCF-like motifs, following previous results [12] (Section 4.5).
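Aggregating motif-level results into clades can be sketched as counting, for each clade, the distinct ChIP-seq datasets that contain at least one asymmetric CE with any motif of that clade. This counting rule, the input format and the motif identifiers are assumptions for illustration; the clade codes follow the {x.y.z} notation of the TF classification [13]. Each resulting pair of counts would give one point in a plot such as Figure 4.

```python
from collections import defaultdict

def datasets_per_clade(asym_ces, motif_clade):
    """Count distinct datasets with asymmetric CEs per TF clade.

    asym_ces: iterable of (dataset_id, partner_motif_id, direction), where
        direction is "partner" or "anchor" (the more conserved motif).
    motif_clade: dict partner_motif_id -> clade code such as "3.5.2".
    Returns {clade: (n_datasets_toward_partner, n_datasets_toward_anchor)}.
    """
    datasets = defaultdict(lambda: {"partner": set(), "anchor": set()})
    for dataset_id, motif_id, direction in asym_ces:
        datasets[motif_clade[motif_id]][direction].add(dataset_id)
    return {clade: (len(d["partner"]), len(d["anchor"])) for clade, d in datasets.items()}

# Hypothetical motif ids and CEs.
clades = {"ets_motif_1": "3.5.2", "ets_motif_2": "3.5.2", "nfy_motif_1": "4.2.1"}
ces = [("ds1", "ets_motif_1", "partner"), ("ds1", "ets_motif_2", "partner"),
       ("ds2", "ets_motif_2", "partner"), ("ds3", "ets_motif_1", "anchor"),
       ("ds2", "nfy_motif_1", "anchor")]
print(datasets_per_clade(ces, clades))  # {'3.5.2': (2, 1), '4.2.1': (0, 1)}
```

Counting datasets rather than CEs keeps large clades with many homologous motifs from dominating, since dataset ds1 is counted once even though two ETS-like motifs occur in it.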
The integrated MCOT application for all 119 anchor and 396 partner motifs produced 3623/4228 and 4484/14718 asymmetric CEs with more conserved anchor/partner motifs for the “Full” and “Overlap” computation flows, respectively; the remaining flows yielded substantially lower numbers (45/519, 287/56 and 499/15 for the “Partial”, “Spacer” and “Any” flows, respectively; Table S2). Welch’s t-test comparing the number of ChIP-seq datasets possessing asymmetric CEs toward partner motifs with that possessing asymmetric CEs toward anchor motifs showed significance for the “Full”, “Partial” and “Overlap” computation flows (p < 0.02, p < 1 × 10−34 and p < 1 × 10−170, respectively; Table S2). The “Spacer” and “Any” computation flows showed significance in the reverse direction, i.e., asymmetric CEs toward anchor motifs were more abundant than those toward partner motifs (p < 1 × 10−9 and p < 1 × 10−41, respectively; Table S2). Thus, the higher conservation of partner motifs in CEs with an overlap of motifs is systematic, and the abundance of such asymmetric CEs is substantially higher than that of CEs with a spacer. Hence, the subsequent analysis focuses on CEs with an overlap of motifs.
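Welch’s t-test compares two means without assuming equal variances. The sketch below computes the statistic and the Welch–Satterthwaite degrees of freedom with the Python standard library; treating each partner motif’s dataset count as one observation is an assumption, since the exact sampling unit is specified in the Methods. The two-sided p-value then follows from Student’s t distribution, e.g., via `scipy.stats.ttest_ind(x, y, equal_var=False)`.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    x, y: samples with at least two values each and non-zero variance.
    """
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1))
    return t, df

# Toy per-motif dataset counts (hypothetical).
toward_partner = [2, 4, 6, 8]
toward_anchor = [1, 2, 3, 4]
t, df = welch_t(toward_partner, toward_anchor)
print(round(t, 4), round(df, 4))  # 1.7321 4.4118
```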
Figure 4 compares, for the 50 clades of TFs defined above, the number of ChIP-seq datasets of the benchmark data containing asymmetric CEs toward partner motifs with that containing asymmetric CEs toward anchor motifs.
Since the points for all clades in Figure 4 lie above the diagonal from the lower left to the upper right corner (dashed line), we concluded that for all clades of partner motifs, the abundance of CEs with asymmetry toward partner motifs exceeded that of CEs with asymmetry toward anchor motifs. The clades of partner TFs most specific for asymmetry toward partner motifs were the families of ETS-related factors {3.5.2} and heteromeric CCAAT-binding factors {4.2.1} (points close to the top left corner, Figure 4). The families of p53-related factors {6.3.1}, RFX-related factors {3.3.3} and thyroid hormone receptor-related factors (NR1) {2.1.2} showed a high abundance of both asymmetric CEs toward anchor motifs and those toward partner motifs, since their points were close to the diagonal in Figure 4. THAP11 and CTCF-like motifs tended to form asymmetric CEs toward the anchor motifs: the top six clades for asymmetric CEs toward anchor motifs were p53-related factors {6.3.1}, THAP-related factors {2.9.1}, CTCF-like factors {2.3.3.50}, thyroid hormone receptor-related factors (NR1) {2.1.2}, nuclear factor 1 {7.1.2} and RFX-related factors {3.3.3} (Figure 4).
To estimate the enrichment of asymmetric CEs toward partner motifs vs. those toward anchor motifs in the benchmark data, we applied Welch’s t-test to the counts of the respective ChIP-seq datasets. Figure 5 shows the significance of this test as a function of the number of datasets containing CEs with an overlap of motifs (regardless of motif conservation). With Bonferroni’s correction, i.e., a significance threshold of p < 0.05/50 = 0.001 (Figure 5, dashed line), 45 out of 50 clades (90%) showed significant enrichment of the number of ChIP-seq datasets with asymmetric CEs toward partner motifs. We found that the ETS-related factors {3.5.2} family combined
  • High abundance of CEs (axis X in Figure 5),
  • Significant enrichment of asymmetric CEs toward partner motifs vs. those asymmetric toward anchor motifs (axis Y in Figure 5),
  • High abundance of asymmetric CEs toward partner motifs in comparison with that of asymmetric CEs toward anchor motifs (Figure 4).
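The Bonferroni threshold used above is the nominal level divided by the number of tests, 0.05/50 = 0.001 for 50 clades. A small sketch that applies it to a dictionary of per-clade p-values (a hypothetical input format) is:

```python
def bonferroni_significant(pvalues, alpha=0.05):
    """Return the clades whose p-value is below alpha / number_of_tests."""
    threshold = alpha / len(pvalues)
    return sorted(clade for clade, p in pvalues.items() if p < threshold)

# Toy p-values for 50 clades (hypothetical): only two pass 0.05 / 50 = 0.001.
pvalues = {f"clade_{i:02d}": 0.01 for i in range(50)}
pvalues["clade_00"] = 1e-12
pvalues["clade_01"] = 9e-4
print(bonferroni_significant(pvalues))  # ['clade_00', 'clade_01']
```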
The majority of clades (26 out of 50) showed high significance, p < 1 × 10−10 (Figure 5, axis Y). The top four clades were FTZ-F1-related receptors (NR5) {2.1.5}, heteromeric CCAAT-binding factors {4.2.1}, ETS-related factors {3.5.2} and NGFI-B-related receptors (NR4) {2.1.4} (p < 1 × 10−273, p < 1 × 10−198, p < 1 × 10−88 and p < 1 × 10−66, respectively). The top twelve clades also included C/EBP-related factors {1.1.8}, CTCF-like factors {2.3.3.50}, Maf-related factors {1.1.3}, Jun-related factors {1.1.1} and Forkhead box (FOX) factors {3.3.1} (Figure 5). The differences were not significant for only five TF families: B-ATF-related factors {1.1.4}, factors with multiple dispersed zinc fingers {2.3.4}, GATA-type zinc fingers {2.2.1}, HD-CUT factors {3.1.9} and THAP-related factors {2.9.1}.

2.4.2. Analysis of Top-Ranked Partner Motifs Classified According to TFs Families

In this section, we performed a detailed analysis of specific top-ranked partner motifs participating in asymmetric CEs with more conserved partner motifs. This analysis was motivated by the occasionally observed imperfect homology of motifs within individual TF families (Section 4.5); i.e., the family-level results of the previous subsection needed to be verified by top-ranked predictions for individual motifs from the top-ranked TF families (Figure 4 and Figure 5). In addition, we compared the MCOT results with the previous analysis [12], which revealed overrepresented Jun-like, ETS-like, CTCF-like and THAP11 motifs in the fraction of ChIP-seq data lacking canonical motifs of anchor TFs.
First, we checked the abundance of partner motifs participating in CEs regardless of the conservation of the two motifs. We applied MCOT, selected the 30 top-ranked partner motifs from the Hocomoco human core collection [14], excluding homologous anchor–partner pairs (Section 4.5), and clustered these motifs (Figure 6). Besides the Jun-like and ETS-like motifs, the list of top-ranked partner motifs included an RFX-like motif, two motifs from the thyroid hormone receptor-related factors (NR1) {2.1.2} family, three GATA-like and three p53-like motifs (see Section 4.5 for a description of these motifs). In Figure 6, we marked several families of TFs that were mentioned earlier by Worsley Hunt and Wasserman [12] or revealed above in our analysis (Figure 4 and Figure 5).
Among the 30 top-ranked motifs (Figure 6), we found many motifs of TFs from the two largest families, more than 3 adjacent zinc finger factors {2.3.3} and factors with multiple dispersed zinc fingers {2.3.4}, with 76 and 20 motifs, respectively (Table S2). These families belong to the C2H2 zinc finger TF class [13], which has the highest known diversity of DNA-binding specificities [21] and showed the lowest specificity in benchmarking comparisons with motifs of other families [22]. Notably, the third-largest family, ETS-related factors {3.5.2}, comprised 19 motifs; high homology was detected among these motifs (Section 4.5, [23]), and they performed well in benchmarking comparisons with motifs of other families [22].
In general, the results of our analysis (Figure 6) agree well with the previous analysis of Worsley Hunt and Wasserman [12]. ETS-like and Jun-like motifs were found among the top 30; however, we did not detect CTCF-like and THAP11 motifs among them, although we had found them among the top-ranked TF clades above (Figure 4 and Figure 5).
Next, we selected the 30 top-ranked partner motifs that formed asymmetric CEs toward either anchor or partner motifs; again, we excluded homologous anchor–partner pairs and clustered the motifs (Figure 7). Notably, the separate analysis of asymmetric CEs toward partner motifs showed a greater variety of motifs than that of asymmetric CEs toward anchor motifs (compare the colored frames in panels A and B of Figure 7). NR1H3-like, RFX-like, GATA-like and p53-like motifs were found in both lists. Jun-like motifs, which we expected from the previous study [12], were absent from both lists. The best Jun-like motif, MAF_HUMAN.H11MO.0.A, ranked only 62nd (Table S2, the column “Conservative partner, Overlap”).
For Jun-like motifs, our analysis that took motif conservation into account (Figure 7) seemed to contradict the analysis that disregarded it (Figure 6). However, the previous study [12], which revealed overrepresented Jun-like motifs in the fraction of ChIP-seq data lacking canonical anchor motifs, did not check the homology between anchor and partner motifs. We therefore removed the restriction on significant homology between CE participants and confirmed that the rank of Jun-like motifs increased substantially (Figure S2). Hence, we presume that this enrichment of partner Jun-like motifs was at least partially based on their significant similarity to anchor motifs. Thus, we could not confirm the critical importance of Jun-like TFs in cooperative binding with other TFs to DNA.
Motifs of the ETS-related factors {3.5.2} family were found only in the list of asymmetric CEs toward partner motifs (Figure 7). Consequently, among motifs of all families, ETS-like motifs showed the clearest tendency to form asymmetric CEs with less conserved anchor motifs, with no significant similarity between anchor and partner motifs within these CEs.
All results presented above correspond to the MCOT computation flow “Overlap”. The respective analysis of asymmetric CEs toward partner motifs for the other computation flows revealed lower abundances of asymmetric CEs (Table S2). In particular, the “Full” computation flow revealed only two NR1H3-like motifs from the thyroid hormone receptor-related factors (NR1) {2.1.2} family (45 and 30 datasets; Table S2) and three p53-like motifs (33, 30 and 29 datasets; Table S2). In the “Partial” computation flow, the first-ranked motif was a CTCF-like motif from the CTCF-like factors {2.3.3.50} subfamily, detected in only 8 ChIP-seq datasets, while in the “Spacer” computation flow the first three motifs were NFYA-like (heteromeric CCAAT-binding factors {4.2.1} family) and were detected in only 7, 6 and 6 datasets (Table S2).
p53-like, GATA-like and NR1H3-like motifs showed enrichment in both cases of asymmetry, toward anchor and toward partner motifs (Figure 7). Thus, among all families, motifs of the ETS family most clearly demonstrate a specific enrichment in asymmetric CEs toward partner motifs. Hence, we may suppose that ETS-like motifs facilitate weak direct interaction of anchor TFs with their cognate binding sites in ChIP-seq peaks. In this case, the ternary complex {anchor TF, TF from ETS family, DNA} is formed such that the ETS-like motif is systematically more conserved than the anchor motif, i.e., TFs from the ETS family play the leading role in the cooperative interaction with other TFs upon binding to DNA.

3. Discussion

Many studies have confirmed that ChIP-seq data contain a substantial portion of binding regions lacking conserved motifs of target TFs [12,19]. In the current study, we aimed to clarify whether this portion might correspond to weak binding motifs of anchor TFs located near relatively more conserved motifs of multiple partner TFs. We applied the recently developed MCOT package for the prediction of motif co-occurrence with overlaps and with spacers in a single ChIP-seq dataset [11]. The novelty of our study lies in the analysis of specific CEs with higher conservation of either anchor or partner motifs. We improved the previous algorithm [11] to estimate the significance of such asymmetric CEs (Figure 1C, Table 1) and developed a novel methodology to measure the asymmetry within CEs toward one of the participating motifs (Table 2). Next, we presented an example of significant asymmetry within the previously known FoxA2-HNF1β CEs in a ChIP-seq dataset from liver tissue (Figure 2, [15]). The higher conservation of the HNF1β motif in these CEs suggested its leading importance in the cooperative binding of both TFs; presumably, HNF1β binding sites were occupied by HNF1β first. This hypothesis is supported by the earlier observation that the TFs HNF1β and FoxA3 are sufficient to reprogram mouse embryonic fibroblasts into induced hepatic stem cells [24]. The next example (Figure 3) illustrated the action of multiple partner motifs co-occurring near anchor FoxA2 motifs in the same ChIP-seq dataset [15]. We specifically excluded peaks with the most conserved and with too weak FoxA2 motifs from the analysis. Peaks with the most conserved anchor motifs probably corresponded to direct FoxA2 targets, whereas peaks with too weak FoxA2 motifs potentially required a recognition model alternative to the PWM [20]; therefore, we expected pronounced support of FoxA2 binding from partner TFs in the intermediate cases of moderately or weakly conserved FoxA2 motifs.
Our analysis demonstrated that about 90% of the analyzed peaks contained asymmetric pairs of co-occurring anchor and partner motifs in which the partner motifs were more conserved (Figure 3). Conventionally, almost all analyses before MCOT applied a single threshold to the recognition model of the anchor motif, so weak interactions might be missed by a standard recognition model. Hence, these weakly conserved anchor motifs were probably annotated as indirect or non-specific binding (e.g., in [12,18]). We propose that, in this case, multiple overrepresented asymmetric CEs with more conserved partner motifs and less conserved anchor motifs explain the absence of the most conserved anchor motifs in a substantial portion of peaks.
Next, we performed a large-scale analysis of the benchmark ChIP-seq data to study whether partner TFs from specific families, which share common characteristics of DNA-binding domains [13], tended to form specific asymmetric CEs toward partner motifs. As follows from the previous example (Figure 3), such partner TFs might systematically mediate the interaction of anchor TFs with their cognate binding sites in ChIP-seq data. Previously, using the benchmark ChIP-seq data, Worsley Hunt and Wasserman [12] demonstrated that CTCF-like, Jun-like, ETS-like and THAP11 motifs were overrepresented near summits of peaks lacking the canonical motifs of anchor TFs. These enriched motifs were termed “zingers” to highlight their outstanding enrichment in ChIP-seq datasets for various anchor TFs. With this knowledge, we used our benchmark data of 119 ChIP-seq datasets for 45 distinct TFs (Table S1, [10]) with manually annotated anchor motifs derived from de novo motif search [8] and predicted CEs with several additional criteria. In particular, we searched for CEs that (a) had higher conservation of partner motifs than of anchor motifs and (b) lacked significant similarity between anchor and partner motifs. We proposed that the enrichment of such asymmetric CEs, together with a less significant enrichment of the respective CEs with higher conservation of anchor motifs, reflected a leading role of partner motifs in the cooperative interaction of anchor/partner TF pairs with genomic DNA. Thus, we used a research strategy similar to that of Worsley Hunt and Wasserman [12], but the MCOT algorithm, which varies the thresholds of both motifs down to a very permissive level (FPR = 5 × 10−4), allowed us to infer potential CEs that were almost imperceptible when anchor motifs were detected with a canonical threshold.
Moreover, our tool can analyze the co-occurrence of overlapping motifs, which was not addressed in previous studies of a single ChIP-seq dataset [9,10,25,26]. Additionally, the conventional masking of anchor motifs (e.g., in [12]) inevitably destroys overlapping partner motifs, although overlapping motifs were observed notably more often than motifs co-occurring with a spacer [10,11,27].
Our results substantially extended and supplemented the previous study [12] (Figure 4, Figure 5, Figure 6 and Figure 7); we confirmed the earlier conclusions concerning CTCF-like, Jun-like, ETS-like and THAP11 motifs. Moreover, our analysis provided many details on specific TF families. Thus, we explained the enrichment of Jun-like motifs by their similarity to anchor motifs (Figure S2 and Figure 7). Additionally, partner motifs of the THAP-related factors {2.9.1} family were among the top-ranked both in the list of CEs with arbitrary motif conservation (Figure 6) and in the list of asymmetric CEs toward anchor motifs (Figure 7A). Moreover, the THAP-related factors {2.9.1} family was one of only five of the 50 clades that showed no significant excess of asymmetric CEs toward partner motifs over asymmetric CEs toward anchor motifs (Figure 5).
The detailed analysis (Figure 4, Figure 5 and Figure 7) demonstrated that besides the CTCF-like, Jun-like, ETS-like and THAP11 motifs proposed earlier [12], other motifs, in particular NR1H3-like, RFX-like, p53-like, NFYA-like and GATA-like, also systematically promoted binding of anchor TFs in ChIP-seq data. We may conclude that ETS-like motifs (a) formed CEs in which they showed the highest conservation relative to anchor motifs, (b) were not enriched among the top-ranked predictions of asymmetry toward anchor motifs, and (c) were not significantly similar to the anchor motifs participating in significantly enriched CEs.
The human ETS-related family consists of 28 TFs [21], which are further classified into several subfamilies [13,21,23]. According to the comparative analysis of human TFs [21], besides the ETS family only a few other TF families or superfamilies, e.g., nuclear receptors, STAT and T-box, have complete coverage by known motifs and lack secondary motifs.
Recent all-against-all benchmarking of PWM models [22] suggested that, according to in vitro HT-SELEX assays, most ETS members have indistinguishable DNA-binding specificity. Thus, a single PWM for ELK1 (MA0028.2 from JASPAR) was the best predictor for multiple ETS family TFs in both in vivo and in vitro experiments; moreover, this matrix was also the best performer in ChIP-seq (in vivo) experiments for ten TFs, only five of which were ETS family members [22]. For the remaining five unrelated TFs, the authors proposed recruitment to their target binding sites through protein–protein interactions with a DNA-bound ETS factor. This hypothesis is in excellent accordance with our results (Figure 4 and Figure 5).
A previous analysis of genomic binding of ETS family members [23] suggested that differences in DNA-binding specificity alone could not explain the diversity of genomic binding within the ETS family. The authors proposed two possible mechanisms by which a particular family member achieves specificity: divergent expression patterns of the family members and cooperative binding of ETS factors with other TFs. The first mechanism was at least partially supported by (a) the only partial overlap of the expression patterns of family members revealed in transcriptome data [28] and (b) experiments replacing one family member with another [29,30]. The results of our study and previous reviews on protein–protein interactions of ETS TFs with other TFs [31,32,33] strongly support the second mechanism, i.e., combinatorial control of transcription as a characteristic property of ETS family members.
The distinctive properties of ETS family TFs are also supported by protein structure analyses [34,35,36,37,38,39,40,41,42,43]. In contrast to prokaryotic TFs, most eukaryotic TFs contain long intrinsically disordered regions (IDRs), i.e., sequences that do not adopt a stable structured conformation but are essential for activity [44]. In TFs, IDRs are highly enriched around DNA-binding domains (DBDs), which display electrostatically biased surfaces to their surroundings [45]. In the ETS family, IDRs and highly stable α-helices flanking the DBD (ETS domain) autoinhibit DNA binding of ETS1, ETS2, ETV6, ERG and ETV1/4/5; ETS1, SPI1 and some other ETS family members are also regulated by an additional serine-rich IDR [32,34,35,36,37]. The DBD is autoinhibited by different mechanisms in different family members. Thus, a serine-rich IDR allosterically inhibits DNA binding of ETS1 through phosphorylation-enhanced interactions with the structured DBD and the flanking N- and C-terminal inhibitory α-helices [38,39], whereas a single flanking C-terminal α-helix sterically inhibits DNA binding of ETV6 [34,40,41]. For ETV4, acetylation of selected lysines within the N-terminal IDR activated DNA binding, and a C-terminal α-helix perturbed the conformation of its DNA-recognition helix [37]. Recently, an experimental study of the relatively distant paralogs ETS1 and SPI1 showed that binding of the DBD to DNA and to synthetic peptides containing IDRs is mutually exclusive [42].
Thus, subfamily-specific α-helices flanking the DBD, together with TF partners acting through IDRs, could shift the equilibrium between the active and inactive states of an ETS family TF during TF–TF interaction; in addition, post-translational modifications within IDRs specifically regulate individual ETS factors [37]. Hence, the regulatory strategy of ETS family TFs involves activation through recruitment by other coactivators [43]. This conclusion agrees well with the results of our study.
Altogether, the results of our study allow one to improve the interpretation of ChIP-seq data and, accordingly, to clarify the functional interactions between TFs. We presume that the function of partner TFs is not limited to mediating indirect binding of anchor TFs (“tethering”); rather, the more conserved motifs of partner TFs may overlap the less conserved motifs of anchor TFs. We propose the “permanent” model of cooperative binding of anchor and partner TFs (Figure 8), which allows various transitional situations. If an anchor TF binds genomic DNA directly, then the respective anchor motif is strongly conserved (Figure 8A). The presence of another (partner) TF may induce an anchor–partner protein–protein interaction that transforms the direct binding site of the anchor TF into an anchor–partner CE, in which the anchor motif is moderately or weakly conserved (Figure 8B,C, respectively). Finally, the anchor TF may lose even a weak contact with DNA, so that only the motif of the partner TF can be found in DNA (Figure 8D, “tethering”).
Moreover, our findings can be helpful for the functional interpretation of noncoding GWAS SNPs and for revealing new regulatory variants. Recently, the prediction of potential TFBSs in ChIP-seq data has become a popular approach for detecting genetic variants that cause various pathologies by affecting TF binding and gene regulation [46,47,48]. However, this approach usually misses numerous relatively weak (but causal) TF binding variants, and accounting for cooperative TF binding via motif co-occurrence has been considered one of the most promising ways to resolve this issue [49].

4. Materials and Methods

4.1. MCOT: Classification of Co-Occurred Motifs

In the current study, we applied the MCOT package as described earlier [11] with some improvements (see Section 4.3). This tool annotates pairs of overrepresented motifs, i.e., CEs. The input data of the tool comprised the peaks of a ChIP-seq dataset in FASTA format, an anchor motif (nucleotide frequency matrix) representing potential BSs of the target TF, and either a single partner motif or a list of partner motifs extracted from the HOCOMOCO human or mouse core collections [14] (Figure 1A).

4.2. Composite Elements Search and Annotation

MCOT classified CEs according to the mutual orientation of motifs; e.g., heterotypic CEs have four distinct orientations (Figure 1B). There were three distinct cases of mutual location: full overlap, partial overlap and spacer; consequently, MCOT used five computation flows: full, partial, overlap (full or partial), spacer and any (Figure 1B). MCOT used the PWM recognition model to map motifs in peaks. For each matrix, five thresholds {T1, ..., T5} were used, corresponding to a unified set of expected FPRs for the whole-genome dataset of promoters: {5.24 × 10⁻⁵, 1.02 × 10⁻⁴, 1.9 × 10⁻⁴, 3.33 × 10⁻⁴, 5 × 10⁻⁴}. The profile of the most stringent hits contained PWM scores T ≥ T1, the next profile comprised scores in the range T1 > T ≥ T2, etc. We estimated the conservation of a motif hit from its expected FPR as −Log10(FPR). For each of the 5 × 5 = 25 combinations of motif conservation levels and for each computation flow, MCOT compiled a 2 × 2 contingency table (Table 1) and computed the significance of Fisher’s exact test comparing the fractions of sequences with and without CEs among peaks and among background sequences containing hits of both participating motifs. The background dataset was generated as described earlier [11].
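The threshold profiles and the per-combination enrichment test described above can be sketched in Python as follows. This is a minimal illustration, not the MCOT implementation. It assumes that each hit is represented by its expected FPR rather than its raw PWM score. It also assumes that Table 1 has peaks and background sequences as rows and sequences with/without a CE as columns, and that the test is one-sided (enrichment). All function names are hypothetical.

```python
import math
from typing import Optional

# Expected FPRs of thresholds T1..T5, from most to least stringent (values from the text).
FPR_LEVELS = [5.24e-5, 1.02e-4, 1.9e-4, 3.33e-4, 5e-4]


def conservation(fpr: float) -> float:
    """Conservation of a motif hit expressed as -log10(FPR)."""
    return -math.log10(fpr)


def profile_index(hit_fpr: float) -> Optional[int]:
    """Return the 1-based profile of a hit (1 = most stringent, T >= T1),
    or None if the hit is weaker than the loosest threshold T5.

    Assumption: a lower FPR means a higher PWM score, so profile k collects
    hits with FPR(T_{k-1}) < FPR <= FPR(T_k).
    """
    for k, level in enumerate(FPR_LEVELS, start=1):
        if hit_fpr <= level:
            return k
    return None


def fisher_right_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher's exact test for the 2 x 2 table

        [[a, b],   # peaks:      sequences with a CE, sequences without a CE
         [c, d]]   # background: sequences with a CE, sequences without a CE

    i.e. P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    numer = sum(math.comb(row1, x) * math.comb(n - row1, col1 - x)
                for x in range(a, min(row1, col1) + 1))
    return numer / math.comb(n, col1)


# 5 x 5 = 25 (anchor profile, partner profile) combinations tested per computation flow.
PROFILE_PAIRS = [(i, j) for i in range(1, 6) for j in range(1, 6)]
```

Working with FPRs instead of raw scores makes the profiles comparable across matrices, which is the point of the unified FPR set. The exact integer arithmetic of `math.comb` avoids underflow for small tables. For thousands of sequences, a log-space or library implementation would be preferable.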
MCOT subdivided all CEs into two classes of asymmetric CEs, with a more conserved anchor motif or a more conserved partner motif (Figure 1B). Accordingly, two additional Fisher’s tests estimated the enrichment of each class of asymmetric CEs (Figure 1B): MCOT compared the fractions of peaks and permuted sequences that contained only CEs with more conserved anchor motifs, or only CEs with more conserved partner motifs, again using the layout of Table 1.
The test of CE asymmetry (Table 2) compared the counts of asymmetric CEs toward one motif and toward the other between real and permuted sequences.

4.3. Significances for Asymmetric CEs and for Asymmetry within CEs

We improved the MCOT algorithm [11] to calculate the asymmetry within CEs as follows. We estimated the conservation of each motif by the expected frequency of its occurrence in the whole-genome promoter dataset, using the logarithmic measure −Log10(FPR). Then we applied the criteria
  • {−Log10[FPR(Anchor)] > −Log10[FPR(Partner)]} (more conserved anchor motif) and
  • {−Log10[FPR(Anchor)] ≤ −Log10[FPR(Partner)]} (more conserved partner motif).
Accordingly, we classified all predicted CEs into two classes, with a more conserved anchor motif or a more conserved partner motif. Next, for each class we computed the significance of the difference in the counts of peaks containing/not containing CEs of this class between the foreground and background datasets (Table 1). To estimate the asymmetry within CEs, we applied Fisher’s exact test comparing the counts of CEs with more conserved anchor motifs and with more conserved partner motifs between the foreground and background datasets (Table 2).
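The two conservation criteria and the asymmetry test can be sketched in Python as follows. The sketch makes several assumptions that may not match MCOT's details:

- Each CE is given as a pair of hit FPRs (anchor, partner).
- Table 2 has foreground/background rows and anchor/partner-asymmetric columns.
- The test is a two-sided Fisher's exact test.
- The direction of asymmetry is read from the odds ratio.

All names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Tuple

CE = Tuple[float, float]  # assumed representation: (FPR of anchor hit, FPR of partner hit)


def asymmetry_class(anchor_fpr: float, partner_fpr: float) -> str:
    """'anchor' if -log10 FPR(anchor) > -log10 FPR(partner), otherwise 'partner'
    (ties go to the partner class, following the <= criterion)."""
    if -math.log10(anchor_fpr) > -math.log10(partner_fpr):
        return "anchor"
    return "partner"


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test: total probability of all tables with the same
    margins that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Integer numerators keep the comparison with the observed table exact.
    weights = {x: math.comb(row1, x) * math.comb(n - row1, col1 - x)
               for x in range(lo, hi + 1)}
    observed = weights[a]
    return sum(w for w in weights.values() if w <= observed) / math.comb(n, col1)


def asymmetry_test(fg: Iterable[CE], bg: Iterable[CE]) -> Tuple[float, Optional[str]]:
    """Compare anchor- vs partner-asymmetric CE counts between foreground (peaks)
    and background, assuming the table [[fg_anchor, fg_partner],
    [bg_anchor, bg_partner]]. Returns the p-value and the class that is relatively
    enriched in the foreground (None if the odds ratio equals 1)."""
    fg_counts = Counter(asymmetry_class(a, p) for a, p in fg)
    bg_counts = Counter(asymmetry_class(a, p) for a, p in bg)
    a, b = fg_counts["anchor"], fg_counts["partner"]
    c, d = bg_counts["anchor"], bg_counts["partner"]
    p_value = fisher_two_sided(a, b, c, d)
    # Cross-products compare the odds ratio with 1 without dividing by zero.
    if a * d > b * c:
        toward: Optional[str] = "anchor"
    elif a * d < b * c:
        toward = "partner"
    else:
        toward = None
    return p_value, toward
```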
We assigned to the asymmetry significance −Log10[p-value] the sign “+” in the case of enrichment toward the anchor motif; otherwise, the sign “−” denoted enrichment toward the partner motif. Next, we compiled the full lists of predicted CEs for the foreground and background datasets. We assigned the conservation of each motif to one of twelve conservation levels: [<3.5], [3.5; 3.7], [3.7; 3.9], etc., up to [5.3; 5.5] and [>5.5]. For each combination of conservation levels, we computed the counts Obs(i,j) and Exp(i,j) of CEs from the foreground and background datasets, respectively, where indices i and j denote the conservation levels of the anchor and partner motifs. Finally, the per mille measure transforms these absolute CE counts into relative ones: 1000 × Obs(i,j)/Obs and 1000 × Exp(i,j)/Exp, where Obs and Exp are the total numbers of CEs in the foreground and background datasets.
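The sign convention and the per mille conservation matrices can be illustrated with the following Python sketch. In the text, adjacent intervals share end points. Here they are assumed to be closed on the left, so a value of exactly 3.5 falls into [3.5; 3.7] and exactly 5.5 into [>5.5]. Function names are hypothetical.

```python
import bisect
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Inner boundaries of the twelve conservation levels:
# [<3.5], [3.5; 3.7], ..., [5.3; 5.5], [>5.5]  (11 boundaries -> 12 levels).
BOUNDS = [3.5, 3.7, 3.9, 4.1, 4.3, 4.5, 4.7, 4.9, 5.1, 5.3, 5.5]
N_LEVELS = len(BOUNDS) + 1


def signed_significance(p_value: float, toward_anchor: bool) -> float:
    """-log10(p) with sign '+' for asymmetry toward the anchor motif, '-' otherwise."""
    s = -math.log10(p_value)
    return s if toward_anchor else -s


def conservation_level(cons: float) -> int:
    """0-based level (0..11) of a conservation value -log10(FPR); intervals are
    assumed to be closed on the left."""
    return bisect.bisect_right(BOUNDS, cons)


def per_mille_matrix(ces: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """12 x 12 matrix of 1000 * Obs(i, j) / Obs for a list of
    (anchor conservation, partner conservation) pairs. Applied to background
    CEs, the same function yields 1000 * Exp(i, j) / Exp."""
    total = len(ces)
    if total == 0:
        return [[0.0] * N_LEVELS for _ in range(N_LEVELS)]
    counts = Counter((conservation_level(a), conservation_level(p)) for a, p in ces)
    return [[1000 * counts[(i, j)] / total for j in range(N_LEVELS)]
            for i in range(N_LEVELS)]
```

Normalizing each dataset to per mille makes the foreground and background matrices comparable even though the two lists contain different numbers of CEs.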

4.4. Bonferroni Correction for Significance

To account for multiple comparisons, we applied the Bonferroni correction and used the following critical values:
  • Significance of CEs regardless of motif conservation, 0.05/(NFOR × NBACK × NFLOW × NTHR × NTHR);
  • Significance of asymmetric CEs toward one of the motifs, 0.05/(NFOR × NBACK × NFLOW × 2);
  • CE asymmetry, 0.05/(NFOR × NBACK × NFLOW).
Here NFOR and NBACK denote the sizes of the foreground and background datasets (i.e., the numbers of peaks and of random sequences generated by MCOT [11]), NFLOW = 5 is the number of MCOT computation flows, and NTHR = 5 is the number of thresholds for each motif.
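For concreteness, the three critical values can be computed directly from the formulas above. The function name and the dataset sizes used in the example are hypothetical.

```python
ALPHA = 0.05
N_FLOW = 5  # MCOT computation flows: full, partial, overlap, spacer, any
N_THR = 5   # thresholds per motif


def bonferroni_critical_values(n_for: int, n_back: int, alpha: float = ALPHA) -> dict:
    """Bonferroni critical p-values for the three kinds of tests.

    n_for, n_back: numbers of peaks and of background sequences.
    """
    base = n_for * n_back * N_FLOW
    return {
        "ce_any_conservation": alpha / (base * N_THR * N_THR),
        "asymmetric_ce": alpha / (base * 2),
        "ce_asymmetry": alpha / base,
    }
```

For example, with 1000 peaks and 1000 background sequences, the critical value for CE asymmetry is 0.05/(1000 × 1000 × 5) = 10⁻⁸. A test is called significant when its p-value falls below the corresponding critical value.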

4.5. Massive Analysis of the ChIP-Seq Data

In the current study, we extended the previously published benchmark ChIP-seq data for human TFs [11], so that the whole collection consisted of 119 ChIP-seq datasets for 45 TFs (Table S1). As in an earlier study [11], for each dataset we annotated the results of the de novo motif search [8], manually selected enriched motifs corresponding to the anchor TF and confirmed the homology between the de novo detected and known motifs [50]. We applied MCOT as described earlier [11,51] and above. In particular, 396 partner motifs of human TFs were extracted from the HOCOMOCO human core collection [14,52] (Figure 1A). We used the classification of human and mouse TFs according to the characteristics of their DNA-binding domains [13,53] (Figure 1A). We labeled each partner motif with the name of its family; in total, the motifs were classified into 67 distinct TF families. Since the subsequent analysis was based on motif recognition, we compared all partner motifs pairwise for homology with the motif comparison tool from the MCOT package [11] (motifs were considered homologous if p < 0.05 for at least one of the two motif similarity measures). In our analysis, we preserved the classification of motifs according to their families [13], but in specific cases we grouped together homologous motifs from different families. In particular, following previous data [12], we distinguished the following groups of motifs:
  • Jun-like: 15 of the 18 motifs of the Jun-related {1.1.1}, Fos-related {1.1.2} and Maf-related {1.1.3} families were homologous;
  • ETS-like: 14 of the 19 motifs of the ETS-related factors {3.5.2} family were homologous;
  • CTCF-like: two homologous motifs constituted the CTCF-like factors {2.3.3.50} subfamily of the largest family, “More than three adjacent zinc finger factors” {2.3.3}, which consists of 76 motifs;
  • THAP-related: the two non-homologous motifs THA11_HUMAN.H11MO.0.B and THAP1_HUMAN.H11MO.0.C constituted the THAP-related factors {2.9.1} family.
We also considered the following groups of motifs, classified according to their TF families:
  • p53-like: all three motifs of the p53-related factors {6.3.1} family were homologous;
  • RFX-like: all four motifs of the RFX-related factors {3.3.3} family were homologous;
  • GATA-like: all five motifs of the GATA-type zinc fingers {2.2.1} family were homologous; we added to them their homolog TAL1_HUMAN.H11MO.0.A from the Tal-related factors {1.2.3} family (the other members of this family were not homologous to GATA-like motifs);
  • NR1H3-like: four of the 14 motifs of the thyroid hormone receptor-related factors (NR1) {2.1.2} family (NR1H3_HUMAN.H11MO.0.B, THA_HUMAN.H11MO.0.C, NR1I3_HUMAN.H11MO.0.C and NR1I2_HUMAN.H11MO.0.C) were homologous; NR1H3-like motifs also had close homologs in the families of steroid hormone receptors (NR3) {2.1.1} (e.g., ERR1_HUMAN.H11MO.0.A) and RXR-related receptors (NR2) {2.1.3} (e.g., COT2_HUMAN.H11MO.0.A);
  • NFYA-like: all three motifs of the heteromeric CCAAT-binding factors {4.2.1} family were homologous.
Of the 67 families covering all 396 partner motifs, we selected the 49 families with at least two motifs for subsequent analysis. We also included the CTCF-like factors {2.3.3.50} subfamily, since CTCF-like motifs had been annotated previously [12]. Thus, the analysis comprised 50 clades of partner TFs: 49 families and one subfamily.
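The clade selection can be sketched as a simple filter over a motif-to-family mapping. The mapping format, the function name and the toy identifiers in the test are hypothetical. Only the "at least two motifs" rule and the explicit addition of a subfamily come from the text.

```python
from collections import Counter
from typing import Dict, Iterable, List


def select_clades(motif_to_family: Dict[str, str],
                  extra_subfamilies: Iterable[str] = (),
                  min_motifs: int = 2) -> List[str]:
    """Keep TFClass families with at least `min_motifs` partner motifs and append
    explicitly requested subfamilies (e.g. CTCF-like factors {2.3.3.50})."""
    sizes = Counter(motif_to_family.values())
    families = sorted(f for f, n in sizes.items() if n >= min_motifs)
    return families + sorted(set(extra_subfamilies) - set(families))
```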
We predicted potential CEs with MCOT for the benchmark data of 119 ChIP-seq datasets (Table S1). We proposed that homology between the anchor and partner motifs might influence CE enrichment. Hence, we excluded CEs consisting of significantly similar partner and anchor motifs. We considered two motifs significantly similar if at least one of the two motif similarity measures showed significant similarity (p < 0.05, [11]). We applied the Bonferroni correction to the significance of CEs (Section 4.4) and counted ChIP-seq datasets with significant CEs separately for each of the five MCOT computation flows.
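The filtering and counting steps can be sketched as follows. The record type, its fields and the per-test critical value are hypothetical. Only two rules come from the text: motifs count as similar if at least one of the two similarity measures gives p < 0.05, and datasets are counted separately per computation flow.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

FLOWS = ("full", "partial", "overlap", "spacer", "any")
SIMILARITY_ALPHA = 0.05


@dataclass(frozen=True)
class CEResult:
    """Hypothetical record of one MCOT prediction for an anchor/partner pair."""
    dataset: str
    partner: str
    flow: str
    p_value: float
    similarity_p: Tuple[float, float]  # p-values of the two motif similarity measures
    critical: float                    # Bonferroni critical value for this test


def is_homologous(similarity_p: Tuple[float, float]) -> bool:
    """Motifs count as significantly similar if either measure gives p < 0.05."""
    return any(p < SIMILARITY_ALPHA for p in similarity_p)


def count_datasets(results: Iterable[CEResult]) -> Dict[Tuple[str, str], int]:
    """Number of distinct ChIP-seq datasets with a significant CE per (partner, flow),
    after discarding CEs whose anchor and partner motifs are similar."""
    hits: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for r in results:
        if r.flow not in FLOWS:
            raise ValueError(f"unknown computation flow: {r.flow}")
        if is_homologous(r.similarity_p):
            continue
        if r.p_value < r.critical:
            hits[(r.partner, r.flow)].add(r.dataset)
    return {key: len(datasets) for key, datasets in hits.items()}
```

Storing dataset identifiers in a set ensures that a dataset with several significant CEs for the same partner and flow is counted once.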
We used the MEGA package to draw trees that showed the similarity of motifs [54,55].

5. Conclusions

  • We proposed an approach to compute the significance of co-occurrence of asymmetric anchor–partner CEs, in which one participating motif is more conserved than the other, and the significance of asymmetry within pairs of co-occurring motifs;
  • We applied our approach to motifs of partner TFs from various families that are overrepresented near motifs of anchor TFs in ChIP-seq data;
  • We demonstrated that, for partner motifs of almost all TF families, pairs with a more conserved partner motif were significantly more abundant than pairs with a more conserved anchor motif, but only for overlapping anchor–partner pairs and not for pairs with a spacer. This observation explains a substantial portion of ChIP-seq data lacking conserved anchor motifs;
  • We found that asymmetric CEs toward partner motifs were the most reliable for partner motifs of ETS family TFs. Hence, motifs of ETS family TFs tend to mediate the interaction of anchor TFs with genomic DNA.

Supplementary Materials

Author Contributions

Conceptualization V.L.; Methodology V.L., D.O.; Software V.L.; Investigation V.L.; Writing—original draft preparation V.L.; Writing—review and editing, V.L., E.Z. and T.M. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the Russian Foundation for Basic Research Project #18-29-13040 and State Budget Project #0324–2019-0040-C-01.

Acknowledgments

The bioinformatics data analysis was performed in part on the equipment of the Bioinformatics Shared Access Center within the framework of State Assignment Kurchatov Genomic Center of ICG SB RAS (075-15-2019-1662). We are grateful to Sergey Lashin and Alexey Mukhin for technical support.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

BS: Binding Site
CE: Composite Element
DBD: DNA Binding Domain
FPR: False Positive Rate
ETS: E26 Transformation Specific
IDR: Intrinsically Disordered Region
MCOT: Motifs Co-Occurrence Tool
PWM: Position Weight Matrix
TF: Transcription Factor
TFBS: Transcription Factor Binding Site

References

  1. Morgunova, E.; Taipale, J. Structural perspective of cooperative transcription factor binding. Curr. Opin. Struct. Biol. 2017, 47, 1–8. [Google Scholar] [CrossRef] [PubMed]
  2. Reiter, F.; Wienerroither, S.; Stark, A. Combinatorial function of transcription factors and cofactors. Curr. Opin. Genet. Dev. 2017, 43, 73–81. [Google Scholar] [CrossRef] [PubMed]
  3. Mayran, A.; Drouin, J. Pioneer transcription factors shape the epigenetic landscape. J. Biol. Chem. 2018, 293, 13795–13804. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Lai, X.; Verhage, L.; Hugouvieux, V.; Zubieta, C. Pioneer factors in animals and plants-colonizing chromatin for gene regulation. Molecules 2018, 23, 1914. [Google Scholar] [CrossRef] [Green Version]
  5. Zaret, K.S.; Carroll, J.S. Pioneer transcription factors: Establishing competence for gene expression. Genes Dev. 2011, 25, 2227–2241. [Google Scholar] [CrossRef] [Green Version]
  6. Nagy, G.; Nagy, L. Motif grammar: The basis of the language of gene expression. Comput. Struct. Biotechnol. [CrossRef]
  7. Lloyd, S.M.; Bao, X. Pinpointing the genomic localizations of chromatin-associated proteins: The yesterday, today, and tomorrow of ChIP-seq. Curr. Protoc. Cell Biol. 2019, 84, e89. [Google Scholar] [CrossRef]
  8. Heinz, S.; Benner, C.; Spann, N.; Bertolino, E.; Lin, Y.C.; Laslo, P.; Cheng, J.X.; Murre, C.; Singh, H.; Glass, C.K. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010, 38, 576–589. [Google Scholar] [CrossRef] [Green Version]
  9. Whitington, T.; Frith, M.C.; Johnson, J.; Bailey, T.L. Inferring transcription factor complexes from ChIP-seq data. Nucleic Acids Res. 2011, 39, 98. [Google Scholar] [CrossRef] [Green Version]
  10. Jankowski, A.; Prabhakar, S.; Tiuryn, J. TACO: A general-purpose tool for predicting cell-type-specific transcription factor dimers. BMC Genom. 2014, 15, 208. [Google Scholar] [CrossRef] [Green Version]
  11. Levitsky, V.; Zemlyanskaya, E.; Oshchepkov, D.; Podkolodnaya, O.; Ignatieva, E.; Grosse, I.; Mironova, V.; Merkulova, T. A single ChIP-seq dataset is sufficient for comprehensive analysis of motifs co-occurrence with MCOT package. Nucleic Acids Res. 2019, 47, e139. [Google Scholar] [CrossRef] [PubMed]
  12. Worsley Hunt, R.; Wasserman, W.W. Non-targeted transcription factors motifs are a systemic component of ChIP-seq datasets. Genome Biol. 2014, 15, 412. [Google Scholar] [CrossRef] [PubMed]
  13. Wingender, E.; Schoeps, T.; Haubrock, M.; Krull, M.; Dönitz, J. TFClass: Expanding the classification of human transcription factors to their mammalian orthologs. Nucleic Acids Res. 2018, 46, D343–D347. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Kulakovskiy, I.V.; Vorontsov, I.E.; Yevshin, I.S.; Sharipov, R.N.; Fedorova, A.D.; Rumynskiy, E.I.; Medvedeva, Y.A.; Magana-Mora, A.; Bajic, V.B.; Papatsenko, D.A.; et al. HOCOMOCO: Expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res. 2018, 46, D252–D259. [Google Scholar] [CrossRef]
  15. Wederell, E.D.; Bilenky, M.; Cullum, R.; Thiessen, N.; Dagpinar, M.; Delaney, A.; Varhol, R.; Zhao, Y.; Zeng, T.; Bernier, B.; et al. Global analysis of in vivo Foxa2-binding sites in mouse adult liver using massively parallel sequencing. Nucleic Acids Res. 2008, 36, 4549–4564. [Google Scholar] [CrossRef] [Green Version]
  16. Wallerman, O.; Motallebipour, M.; Enroth, S.; Patra, K.; Bysani, M.S.; Komorowski, J.; Wadelius, C. Molecular interactions between HNF4a, FOXA2 and GABP identified at regulatory DNA elements through ChIP-sequencing. Nucleic Acids Res. 2009, 37, 7498–7508. [Google Scholar] [CrossRef] [Green Version]
  17. Wang, D.; Garcia-Bassets, I.; Benner, C.; Li, W.; Su, X.; Zhou, Y.; Qiu, J.; Liu, W.; Kaikkonen, M.U.; Ohgi, K.A.; et al. Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature 2011, 474, 390–394. [Google Scholar] [CrossRef] [Green Version]
  18. Tsankov, A.M.; Gu, H.; Akopian, V.; Ziller, M.J.; Donaghey, J.; Amit, I.; Gnirke, A.; Meissner, A. Transcription factor binding dynamics during human ES cell differentiation. Nature 2015, 518, 344–349. [Google Scholar] [CrossRef] [Green Version]
  19. Gheorghe, M.; Sandve, G.K.; Khan, A.; Chèneby, J.; Ballester, B.; Mathelier, A. A map of direct TF-DNA interactions in the human genome. Nucleic Acids Res. 2019, 47, e21. [Google Scholar] [CrossRef] [Green Version]
  20. Levitsky, V.G.; Kulakovskiy, I.V.; Ershov, N.I.; Oshchepkov, D.Y.; Makeev, V.J.; Hodgman, T.C.; Merkulova, T.I. Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data. BMC Genom. 2014, 15, 80. [Google Scholar] [CrossRef] [Green Version]
  21. Lambert, S.A.; Jolma, A.; Campitelli, L.F.; Das, P.K.; Yin, Y.; Albu, M.; Chen, X.; Taipale, J.; Hughes, T.R.; Weirauch, M.T. The Human transcription factors. Cell 2018, 172, 650–665. [Google Scholar] [CrossRef] [PubMed]
  22. Ambrosini, G.; Vorontsov, I.; Penzar, D.; Groux, R.; Fornes, O.; Nikolaeva, D.D.; Ballester, B.; Grau, J.; Grosse, I.; Makeev, V.; et al. Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study. Genome Biol. 2020, 21, 114. [Google Scholar] [CrossRef] [PubMed]
  23. Wei, G.H.; Badis, G.; Berger, M.F.; Kivioja, T.; Palin, K.; Enge, M.; Bonke, M.; Jolma, A.; Varjosalo, M.; Gehrke, A.R.; et al. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J. 2010, 29, 2147–2160. [Google Scholar] [CrossRef] [PubMed]
  24. Yu, B.; He, Z.Y.; You, P.; Han, Q.W.; Xiang, D.; Chen, F.; Wang, M.J.; Liu, C.C.; Lin, X.W.; Borjigin, U.; et al. Reprogramming fibroblasts into bipotential hepatic stem cells by defined factors. Cell Stem Cell 2013, 13, 328–340. [Google Scholar] [CrossRef] [Green Version]
  25. Guo, Y.; Mahony, S.; Gifford, D.K. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput. Biol. 2012, 8, e1002638. [Google Scholar] [CrossRef] [Green Version]
  26. Kazemian, M.; Pham, H.; Wolfe, S.A.; Brodsky, M.H.; Sinha, S. Widespread evidence of cooperative DNA binding by transcription factors in Drosophila development. Nucleic Acids Res. 2013, 41, 8237–8352. [Google Scholar] [CrossRef]
  27. Jankowski, A.; Szczurek, E.; Jauch, R.; Tiuryn, J.; Prabhakar, S. Comprehensive prediction in 78 human cell lines reveals rigidity and compactness of transcription factor dimers. Genome Res. 2013, 23, 1307–1318. [Google Scholar] [CrossRef]
  28. Richardson, L.; Venkataraman, S.; Stevenson, P.; Yang, Y.; Burton, N.; Rao, J.; Fisher, M.; Baldock, R.A.; Davidson, D.R.; Christiansen, J.H. EMAGE mouse embryo spatial gene expression database: 2010 update. Nucleic Acids Res. 2010, 38, D703–D709. [Google Scholar] [CrossRef] [Green Version]
  29. Dahl, R.; Ramirez-Bergeron, D.L.; Rao, S.; Simon, M.C. Spi-B can functionally replace PU.1 in myeloid but not lymphoid development. EMBO J. 2002, 21, 2220–2230. [Google Scholar] [CrossRef] [Green Version]
  30. DeKoter, R.P.; Lee, H.J.; Singh, H. PU.1 regulates expression of the interleukin-7 receptor in lymphoid progenitors. Immunity 2002, 16, 297–309. [Google Scholar] [CrossRef] [Green Version]
  31. Verger, A.; Duterque-Coquillaud, M. When Ets transcription factors meet their partners. BioEssays 2002, 24, 362–370. [Google Scholar] [CrossRef] [PubMed]
  32. Hollenhorst, P.C.; McIntosh, L.P.; Graves, B.J. Genomic and biochemical insights into the specificity of ETS transcription factors. Annu. Rev. Biochem. 2011, 80, 437–471. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Cooper, C.D.; Newman, J.A.; Gileadi, O. Recent advances in the structural molecular biology of Ets transcription factors: Interactions, interfaces and inhibition. Biochem. Soc. Trans. 2014, 42, 130–138. [Google Scholar] [CrossRef] [PubMed]
  34. Coyne, H.J., III; De, S.; Okon, M.; Green, S.M.; Bhachech, N.; Graves, B.J.; McIntosh, L.P. Autoinhibition of ETV6 (TEL) DNA binding: Appended helices sterically block the ETS domain. J. Mol. Biol. 2012, 421, 67–84. [Google Scholar] [CrossRef] [Green Version]
  35. Regan, M.C.; Horanyi, P.S.; Pryor, E.E., Jr.; Sarver, J.L.; Cafiso, D.S.; Bushweller, J.H. Structural and dynamic studies of the transcription factor ERG reveal DNA binding is allosterically autoinhibited. Proc. Natl. Acad. Sci. USA 2013, 110, 13374–13379. [Google Scholar] [CrossRef] [Green Version]
  36. Newman, J.A.; Cooper, C.D.; Aitkenhead, H.; Gileadi, O. Structural insights into the autoregulation and cooperativity of the human transcription factor ETS-2. J. Biol. Chem. 2015, 290, 8539–8549. [Google Scholar] [CrossRef] [Green Version]
  37. Currie, S.L.; Lau, D.; Doane, J.J.; Whitby, F.G.; Okon, M.; McIntosh, L.P.; Graves, B.J. Structured and disordered regions cooperatively mediate DNA-binding autoinhibition of ETS factors ETV1, ETV4 and ETV5. Nucleic Acids Res. 2017, 45, 2223–2241. [Google Scholar] [CrossRef] [Green Version]
  38. Lee, G.M.; Donaldson, L.W.; Pufall, M.A.; Kang, H.S.; Pot, I.; Graves, B.J.; McIntosh, L.P. The structural and dynamic basis of Ets-1 DNA binding autoinhibition. J. Biol. Chem. 2005, 280, 7088–7099. [Google Scholar] [CrossRef] [Green Version]
  39. Pufall, M.A.; Lee, G.M.; Nelson, M.L.; Kang, H.S.; Velyvis, A.; Kay, L.E.; McIntosh, L.P.; Graves, B.J. Variable control of Ets-1 DNA binding by multiple phosphates in an unstructured region. Science 2005, 309, 142–145. [Google Scholar] [CrossRef] [Green Version]
  40. Green, S.M.; Coyne, H.J., III; McIntosh, L.P.; Graves, B.J. DNA binding by the ETS protein TEL (ETV6) is regulated by autoinhibition and self-association. J. Biol. Chem. 2010, 285, 18496–18504. [Google Scholar] [CrossRef] [Green Version]
  41. De, S.; Chan, A.C.; Coyne, H.J., III; Bhachech, N.; Hermsdorf, U.; Okon, M.; Murphy, M.E.; Graves, B.J.; McIntosh, L.P. Steric mechanism of auto-inhibitory regulation of specific and non-specific DNA binding by the ETS transcriptional repressor ETV6. J. Mol. Biol. 2014, 426, 1390–1406. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  42. Perez-Borrajero, C.; Lin, C.S.; Okon, M.; Scheu, K.; Graves, B.J.; Murphy, M.; McIntosh, L.P. The biophysical basis for phosphorylation-enhanced DNA-binding autoinhibition of the ETS1 transcription factor. J. Mol. Biol. 2019, 431, 593–614. [Google Scholar] [CrossRef] [PubMed]
  43. Xhani, S.; Lee, S.; Kim, H.M.; Wang, S.; Esaki, S.; Ha, V.; Khanezarrin, M.; Fernandez, G.L.; Albrecht, A.V.; Aramini, J.M.; et al. Intrinsic disorder controls two functionally distinct dimers of the master transcription factor PU.1. Sci. Adv. 2020, 6, eaay3178. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  44. Liu, J.; Perumal, N.B.; Oldfield, C.J.; Su, E.W.; Uversky, V.N.; Dunker, A.K. Intrinsic disorder in transcription factors. Biochemistry 2006, 45, 6873–6888. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Guo, X.; Bulyk, M.L.; Hartemink, A.J. Intrinsic disorder within and flanking the DNA-binding domains of human transcription factors. In Proceedings of the Pacific Symposium on Biocomputing 2012, Kohala Coast, HI, USA, 3–7 January 2012; Altman, R.B., Dunker, A.K., Hunter, L., Murray, T., Klein, T.E., Eds.; World Scientific: Singapore, 2011; pp. 104–115. [Google Scholar] [CrossRef] [Green Version]
  46. Cavalli, M.; Pan, G.; Nord, H.; Wallerman, O.; Wallén Arzt, E.; Berggren, O.; Elvers, I.; Eloranta, M.L.; Rönnblom, L.; Lindblad Toh, K.; et al. Allele-specific transcription factor binding to common and rare variants associated with disease and gene expression. Hum. Genet. 2016, 135, 485–497. [Google Scholar] [CrossRef]
  47. Cavalli, M.; Baltzer, N.; Pan, G.; Bárcenas Walls, J.R.; Smolinska Garbulowska, K.; Kumar, C.; Skrtic, S.; Komorowski, J.; Wadelius, C. Studies of liver tissue identify functional gene regulatory elements associated to gene expression, type 2 diabetes, and other metabolic diseases. Hum. Genom. 2019, 13, 20. [Google Scholar] [CrossRef]
  48. Li, S.; Li, Y.; Li, X.; Liu, J.; Huo, Y.; Wang, J.; Liu, Z.; Li, M.; Luo, X.-J. Regulatory mechanisms of major depressive disorder risk variants. Mol. Psychiatry. [CrossRef]
  49. Deplancke, B.; Alpern, D.; Gardeux, V. The Genetics of Transcription Factor DNA Binding Variation. Cell 2016, 166, 538–554. [Google Scholar] [CrossRef] [Green Version]
  50. Gupta, S.; Stamatoyannopolous, J.A.; Bailey, T.L.; Noble, W.S. Quantifying similarity between motifs. Genome Biol. 2007, 8, R24. [Google Scholar] [CrossRef] [Green Version]
  51. MCOT. Available online: https://gitlab.sysbio.cytogen.ru/academiq/mcot-kernel (accessed on 20 August 2020).
  52. HOCOMOCO. Available online: https://hocomoco11.autosome.ru/ (accessed on 20 August 2020).
  53. Classification of Transcription Factors in Mammalia. Available online: http://tfclass.bioinf.med.uni-goettingen.de/ (accessed on 20 August 2020).
  54. Kumar, S.; Stecher, G.; Li, M.; Knyaz, C.; Tamura, K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 2018, 35, 1547–1549.
  55. MEGA. Available online: https://www.megasoftware.net/ (accessed on 20 August 2020).
Figure 1. The workflow of the motifs co-occurrence tool (MCOT) application in the current study. Basic input data preparation comprises the application of the de novo motif search tool [8] to a collection of ChIP-seq datasets and the classification of partner motifs from the public library [14] into families according to the structure of the DNA-binding domain [13] (A). Next, MCOT classifies composite elements (CEs) according to the orientations of the motifs, their overlaps or spacers, and the relationship between the conservation of the two motifs (B). MCOT then computes the significance of enrichment for various CE types, applying Bonferroni's correction (C) (Section 4.4). Finally, the average counts of ChIP-seq datasets possessing a certain CE type across all family members reflect the common tendency of the family to participate in specific CEs in the benchmark ChIP-seq data (D).
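Steps (C) and (D) of this workflow can be illustrated with a short Python sketch. It is not MCOT's code: the input layout, the function names and the choice to apply Bonferroni's correction within each dataset over the partner motifs tested are assumptions made for illustration, and the motif names and p-values are toy values.

```python
from collections import defaultdict
from statistics import mean


def count_significant_datasets(pvalues, alpha=0.05):
    """Step (C): pvalues maps dataset -> {partner motif: p-value of CE enrichment}.

    Returns partner motif -> number of datasets in which its CE remains significant
    after Bonferroni's correction. Assumption of this sketch: alpha is divided by the
    number of partner motifs tested within each dataset."""
    counts = defaultdict(int)
    for motif_p in pvalues.values():
        threshold = alpha / len(motif_p)
        for motif, p in motif_p.items():
            counts[motif] += int(p < threshold)
    return dict(counts)


def family_average(counts, motif_family):
    """Step (D): average the per-motif dataset counts over all motifs of each TF family."""
    by_family = defaultdict(list)
    for motif, n in counts.items():
        by_family[motif_family[motif]].append(n)
    return {family: mean(ns) for family, ns in by_family.items()}


# Toy input: hypothetical p-values for three partner motifs in two datasets.
pvalues = {
    "dataset_1": {"ETS1": 1e-5, "ELK1": 1e-3, "GATA1": 0.04},
    "dataset_2": {"ETS1": 0.03, "ELK1": 2e-3, "GATA1": 1e-4},
}
counts = count_significant_datasets(pvalues)
print(counts)
print(family_average(counts, {"ETS1": "ETS", "ELK1": "ETS", "GATA1": "GATA"}))
```

Averaging per family, rather than summing, keeps families with many motifs from dominating simply because more of their members were tested.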
Figure 2. The difference between the observed and expected abundances of CEs with specific conservation of the anchor FoxA2 (Y axis) and partner HNF1β (X axis) motifs for ChIP-seq data [15], in per mille. Motif conservation was measured as −Log10(FPR), where FPR is the false positive rate (Section 4.2). The color on both heatmaps shows the difference between the observed (peaks) and expected (permuted sequences) relative abundance of CEs with specific conservation levels (Section 4.3). The FoxA2 and HNF1β motifs were derived from the Homer de novo motif search [8] and the Hocomoco database (HNF1B_Mouse.H11MO.0.A) [14], respectively. Panels (A) and (B) show the asymmetry of CEs with an overlap of motifs and with a spacer, respectively.
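A minimal Python sketch of how such a heatmap could be filled. It assumes that each CE is described by the FPRs of its anchor and partner motif hits, that conservation is binned with a hypothetical bin width of 0.5 −Log10(FPR) units, and that relative abundance means the share of all CEs in a set (peaks or permuted sequences) falling into a cell. The exact normalization used by MCOT is given in Section 4.3 and may differ.

```python
import math
from collections import Counter


def conservation(fpr):
    """Motif conservation as -log10(FPR): the lower the FPR, the more conserved the site."""
    return -math.log10(fpr)


def abundance_per_mille(ces, bin_width=0.5):
    """ces: list of (anchor_fpr, partner_fpr) pairs, one pair per CE.

    Returns {(anchor_bin, partner_bin): share of all CEs in that cell, in per mille}."""
    cells = Counter(
        (int(conservation(a) // bin_width), int(conservation(p) // bin_width))
        for a, p in ces
    )
    return {cell: 1000.0 * n / len(ces) for cell, n in cells.items()}


def heatmap_difference(observed, expected, bin_width=0.5):
    """Observed (peaks) minus expected (permuted sequences) abundance per cell, in per mille."""
    obs = abundance_per_mille(observed, bin_width)
    exp = abundance_per_mille(expected, bin_width)
    return {cell: obs.get(cell, 0.0) - exp.get(cell, 0.0) for cell in obs.keys() | exp.keys()}
```

Cells with a partner bin above the anchor bin correspond to CEs asymmetric toward the partner motif, i.e., one triangle of each heatmap.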
Figure 3. The analysis of FoxA2 peaks [15] that contained FoxA2 motifs of moderate and weak conservation. Panel (A) displays the fractions of analyzed peaks that contained asymmetric CEs with specific partner motifs. Panel (B) shows the similarity tree for the 30 top-ranked partner motifs from panel (A) and the respective families of partner transcription factors (TFs) [13]. We analyzed only the 43.21% of all peaks whose best FoxA2 motif score corresponded to an FPR between 5.24 × 10^−5 and 5 × 10^−4 (see the dashed line in panel (A)). Using the MCOT package, we defined the 30 top-ranked partner motifs from the Hocomoco mouse core collection [14] that showed no similarity to the anchor motif (p < 0.05 by any similarity measure from [11]) and formed significant asymmetric CEs toward the partner motif. We sorted the partner motifs by the fraction of peaks that contained such asymmetric CEs and computed the cumulative fraction of peaks that contained CEs with at least one of the top-ranked one, two, three, etc., up to 30 partner motifs (see the solid line in panel (A)).
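The peak selection and the cumulative curve of panel (A) can be sketched in Python as follows. The FPR bounds come from this caption; everything else (inclusive bounds, the input dictionaries, alphabetical tie-breaking and the toy motif labels in the test) is an assumption for illustration.

```python
def select_weak_peaks(best_anchor_fpr, low=5.24e-5, high=5e-4):
    """Keep peaks whose best anchor-motif hit has an FPR within [low, high] (moderate/weak sites)."""
    return {peak for peak, fpr in best_anchor_fpr.items() if low <= fpr <= high}


def rank_and_cumulate(peak_partners, top_n=30):
    """peak_partners maps each selected peak to the set of partner motifs that form
    an asymmetric CE toward the partner motif in that peak.

    Returns (motif, fraction of peaks with its CE, cumulative fraction of peaks covered
    by this motif and all higher-ranked ones) for the top_n motifs."""
    n_peaks = len(peak_partners)
    counts = {}
    for partners in peak_partners.values():
        for motif in partners:
            counts[motif] = counts.get(motif, 0) + 1
    # Most frequent partner motifs first; ties are broken alphabetically.
    ranked = sorted(counts, key=lambda m: (-counts[m], m))[:top_n]
    covered, rows = set(), []
    for motif in ranked:
        covered |= {peak for peak, partners in peak_partners.items() if motif in partners}
        rows.append((motif, counts[motif] / n_peaks, len(covered) / n_peaks))
    return rows
```

The cumulative fraction grows more slowly than the sum of individual fractions because one peak can contain CEs with several partner motifs.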
Figure 4. Scatterplot of the abundances of asymmetric CEs toward the anchor motifs (X axis) and asymmetric CEs toward the partner motifs (Y axis) for 50 clades of TFs. These clades comprise 49 TF families with at least two motifs and the CTCF-like factors subfamily {2.3.3.50} with two motifs from the Hocomoco human core collection [14]. The total number of ChIP-seq datasets was 119 (Table S1). Only CEs with an overlap of motifs were considered. The diagonal dashed line marks equal numbers of datasets; it partitions the clades into those with a higher abundance of asymmetric CEs toward partner motifs (upper left triangle, all 50 clades) and those with a higher abundance of asymmetric CEs toward anchor motifs (lower right triangle, no clades).
Figure 5. The significance of enrichment of asymmetric CEs toward the partner motifs as a function of CE abundance. The scatterplot shows 50 clades of partner TFs, including 49 TF families with at least two motifs and the CTCF-like factors subfamily {2.3.3.50} with two motifs from the Hocomoco human core collection [14]. The total number of ChIP-seq datasets is 119 (Table S1). The X axis shows the number of ChIP-seq datasets with predicted CEs in which anchor motifs overlap partner motifs from a specific clade, regardless of motif conservation. The Y axis shows the significance of Welch's t-test that, for each clade, compares the number of datasets containing asymmetric CEs with overlapping motifs toward partner motifs with the respective number of datasets containing asymmetric CEs toward anchor motifs. The horizontal dashed line marks the Bonferroni-corrected significance threshold for the t-test, −Log10(p-value) = 3.
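The Y-axis values and the dashed line can be reproduced with the Python standard library alone. This sketch assumes that, for each clade, the two samples are the per-motif dataset counts for asymmetric CEs toward partner and toward anchor motifs, and that the threshold follows from α = 0.05 divided by the 50 clades (0.05/50 = 0.001, hence −Log10(p-value) = 3). The p-value uses the standard identity between the two-sided Student's t tail and the regularized incomplete beta function, evaluated by the classic continued fraction (modified Lentz method); the sample counts in the usage lines are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for samples x and y."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def _beta_cf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_inc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t distribution: I_{df/(df+t^2)}(df/2, 1/2)."""
    return beta_inc(df / 2.0, 0.5, df / (df + t * t))


def bonferroni_line(alpha=0.05, n_tests=50):
    """-log10 of the Bonferroni-corrected threshold; 0.05 / 50 = 0.001 gives 3."""
    return -math.log10(alpha / n_tests)


partner_counts = [5, 7, 6, 9]  # hypothetical per-motif counts, CEs toward partner motifs
anchor_counts = [1, 2, 1, 3]   # hypothetical per-motif counts, CEs toward anchor motifs
t, df = welch_t(partner_counts, anchor_counts)
print(f"t = {t:.2f}, df = {df:.1f}, -log10(p) = {-math.log10(t_two_sided_p(t, df)):.2f}, "
      f"threshold = {bonferroni_line():.1f}")
```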
Figure 6. Clustering of the 30 top-ranked partner motifs from the Hocomoco human core collection [14] according to their abundance in CEs predicted with an overlap of anchor motifs. CEs with significant homology between the motifs were excluded from the analysis. The left, middle and right columns show the tree constructed according to motif homology, the names of TF families [13] and the distribution of the number of ChIP-seq datasets that contained the respective CEs. Brown, green, red, orange, blue and aqua boxes mark NR1H3-like motifs from the thyroid hormone receptor-related factors (NR1) {2.1.2} family, Jun-like (Maf-related factors {1.1.3}), ETS-like (ETS-related factors {3.5.2}), RFX-like (RFX-related factors {3.3.3}), p53-like (p53-related factors {6.3.1}) and GATA-like (Tal-related factors {1.2.3}) motifs, respectively. In total, 119 ChIP-seq datasets for human TFs were included in the analysis (Table S1).
Figure 7. Clustering of the 30 top-ranked partner motifs from the Hocomoco human core collection [14] according to their abundance in CEs predicted with an overlap of anchor motifs. CEs with significant homology between the motifs were excluded from the analysis. Panels (A,B) show the results for CEs with more conserved anchor and partner motifs, respectively. In each panel, the left, middle and right columns show the tree constructed according to motif homology, the names of TF families [13] and the distribution of the number of ChIP-seq datasets that contained the respective CEs. Brown, green, red, orange, blue, cyan and aqua boxes mark NR1H3-like motifs from the thyroid hormone receptor-related factors (NR1) {2.1.2} family, Jun-like (Maf-related factors {1.1.3}), ETS-like (ETS-related factors {3.5.2}), RFX-like (RFX-related factors {3.3.3}), p53-like (p53-related factors {6.3.1}), THAP-related factors {2.9.1} and GATA-like (Tal-related factors {1.2.3}) motifs, respectively. In total, 119 ChIP-seq datasets for human TFs were included in the analysis (Table S1).
Figure 8. The “permanent” model of cooperative binding of anchor and partner TFs that explains a substantial portion of ChIP-seq data lacking conserved motifs of anchor TFs. Panel (A) refers to the most conserved motifs of an anchor TF in a ChIP-seq dataset; such motifs are in most cases overrepresented and are subsequently recognized as the canonical motif of the anchor TF. However, an anchor TF often participates in TF–TF interactions with multiple partner TFs. Thus, the total conservation of an anchor–partner CE is divided between the anchor and partner motifs. We propose two options: the anchor motif retains higher conservation than the partner motif (B), or the anchor motif is less conserved than the partner motif (C). Finally, an anchor TF may bind to DNA indirectly (D), e.g., if a heterodimer of the anchor and partner TFs binds DNA only through the partner TF. The long arrow at the bottom reflects the permanent decrease/increase of the conservation of the anchor/partner motif. The numbers of red/blue arrows between each TF and DNA reflect the conservation of the respective motif.
Table 1. 2 × 2 contingency table for calculating the significance of a CE. We applied this table to compute the CE significance regardless of motif conservation (all CEs of the scatterplot in Figure 1B) and the significances of asymmetric CEs with one or the other motif more conserved (the two triangles of the same scatterplot).
| Categories of Sequences | Count of Sequences With CE | Count of Sequences Without CE |
|---|---|---|
| Foreground | ObsCE+ | ObsCE- |
| Background | ExpCE+ | ExpCE- |
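The table itself does not name the test applied to it. A one-sided Fisher's exact test is a common choice for such 2 × 2 tables and is assumed in the Python sketch below, which computes the probability of observing at least ObsCE+ foreground sequences with a CE given the table margins; Section 4.4 describes the procedure actually used.

```python
from math import comb


def fisher_enrichment_p(obs_pos, obs_neg, exp_pos, exp_neg):
    """One-sided Fisher's exact test for the table
        Foreground: ObsCE+ (with CE), ObsCE- (without CE)
        Background: ExpCE+ (with CE), ExpCE- (without CE)
    Returns P(at least obs_pos foreground sequences carry a CE) with all margins fixed."""
    n_fg = obs_pos + obs_neg  # foreground (peak) sequences
    n_bg = exp_pos + exp_neg  # background (permuted) sequences
    k = obs_pos + exp_pos     # all sequences carrying the CE
    total = comb(n_fg + n_bg, k)
    # Hypergeometric upper tail: sum over tables at least as extreme as the observed one.
    tail = sum(comb(n_fg, a) * comb(n_bg, k - a) for a in range(obs_pos, min(n_fg, k) + 1))
    return tail / total


print(fisher_enrichment_p(30, 70, 10, 90))  # hypothetical counts
```

Exact integer arithmetic via `math.comb` avoids overflow for realistic peak counts, at the cost of speed for very large tables.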
Table 2. 2 × 2 contingency table for calculating CE asymmetry.
| Categories of CEs | Count of CEs with More Conserved Anchor Motif | Count of CEs with More Conserved Partner Motif |
|---|---|---|
| Foreground | ObsCE,Anchor | ObsCE,Partner |
| Background | ExpCE,Anchor | ExpCE,Partner |
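A Python sketch of how the cells of this table could be filled from per-CE motif hits, assuming each CE is given as the FPRs of its anchor and partner hits and that CEs with equally conserved motifs are skipped. The odds ratio with a 0.5 pseudocount is added only as a simple effect-size summary (values above 1 indicate a foreground bias toward CEs with the more conserved partner motif); it is not claimed to be the statistic used in the study.

```python
import math


def conservation(fpr):
    """Motif conservation as -log10(FPR)."""
    return -math.log10(fpr)


def asymmetry_table(foreground_ces, background_ces):
    """Each CE is an (anchor_fpr, partner_fpr) pair.

    Returns {"Foreground": (ObsCE,Anchor, ObsCE,Partner),
             "Background": (ExpCE,Anchor, ExpCE,Partner)}: counts of CEs whose anchor
    or partner motif is the more conserved one. CEs with equal conservation of both
    motifs are skipped (an assumption of this sketch)."""
    def split(ces):
        anchor_more = sum(1 for a, p in ces if conservation(a) > conservation(p))
        partner_more = sum(1 for a, p in ces if conservation(p) > conservation(a))
        return anchor_more, partner_more
    return {"Foreground": split(foreground_ces), "Background": split(background_ces)}


def partner_bias_odds_ratio(table, pseudo=0.5):
    """Odds of 'partner more conserved' in the foreground divided by those in the background."""
    (obs_a, obs_p), (exp_a, exp_p) = table["Foreground"], table["Background"]
    return ((obs_p + pseudo) / (obs_a + pseudo)) / ((exp_p + pseudo) / (exp_a + pseudo))
```

The pseudocount keeps the ratio finite when a cell is empty, which happens easily for rare partner motifs.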

Share and Cite

MDPI and ACS Style

Levitsky, V.; Oshchepkov, D.; Zemlyanskaya, E.; Merkulova, T. Asymmetric Conservation within Pairs of Co-Occurred Motifs Mediates Weak Direct Binding of Transcription Factors in ChIP-Seq Data. Int. J. Mol. Sci. 2020, 21, 6023. https://0-doi-org.brum.beds.ac.uk/10.3390/ijms21176023


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
