Article

Mutation Clusters from Cancer Exome

1 Quantigic® Solutions LLC, 1127 High Ridge Road #135, Stamford, CT 06905, USA
2 Business School & School of Physics 240, Free University of Tbilisi, David Agmashenebeli Alley, 0159 Tbilisi, Georgia
3 Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857
* Author to whom correspondence should be addressed.
Disclaimer: This address is used by the corresponding author for no purpose other than to indicate his professional affiliation as is customary in publications. In particular, the contents of this paper are not intended as an investment, legal, tax or any other such advice and in no way represent the views of Quantigic® Solutions LLC, the website www.quantigic.com or any of their other affiliates.
Submission received: 19 June 2017 / Revised: 26 July 2017 / Accepted: 7 August 2017 / Published: 15 August 2017
(This article belongs to the Special Issue Integrative Genomics and Systems Medicine in Cancer)

Abstract

We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally-costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.

1. Introduction and Summary

Unless humanity finds a cure, about a billion people alive today will die of cancer. Unlike other diseases, cancer occurs at the DNA level via somatic alterations in the genome. A common type of such mutations found in cancer is due to alterations to single bases in the genome (single nucleotide variations (SNVs)). These alterations are accumulated throughout the lifespan of an individual via various mutational processes, such as imperfect DNA replication during cell division or spontaneous cytosine deamination [1,2], or due to exposures to chemical insults or ultraviolet radiation [3,4], etc. The footprint left by these mutations in the cancer genome is characterized by distinctive alteration patterns known as cancer signatures.
Identifying all cancer signatures would greatly facilitate progress in understanding the origins of cancer and its development. Therapeutically, if there are common underlying structures across different cancer types, then treatment for one cancer type might be applicable to other cancer types, which would be great news. From a diagnostic viewpoint, the identification of all underlying cancer signatures would aid cancer detection and identification methodologies, including vital early detection [5]. According to the American Cancer Society, late-stage metastatic cancers of unknown origin represent about 2% of all cancers [6] and can make treatment almost impossible. Another practical application is prevention by pairing the signatures extracted from cancer samples with those caused by known carcinogens (e.g., tobacco, aflatoxin, UV radiation, etc.). At the end of the day, it all boils down to the question of usefulness: is there a small enough number of cancer signatures underlying all (100+) known cancer types, or is this number too large to be meaningful/useful? Thus, if we focus on 96 mutation categories of SNVs [7], we cannot have more than 96 signatures [8]. Even if the number of true underlying signatures is, say, of order 50, it is unclear whether they would be useful, especially within practical applications. On the other hand, if there are only about a dozen underlying cancer signatures, then the hope for an order of magnitude simplification may well be warranted.
The commonly-used method for extracting cancer signatures [9] is based on nonnegative matrix factorization (NMF) [10,11]. Thus, one analyzes SNV patterns in a cohort of DNA sequenced whole cancer genomes and organizes the data into a matrix $G_{i\mu}$, where the rows correspond to the $N = 96$ mutation categories, the columns correspond to $d$ samples and each element is a nonnegative occurrence count of a given mutation category in a given sample. Under NMF, the matrix $G$ is then approximated via $G \approx W H$, where $W_{iA}$ is an $N \times K$ matrix, $H_{A\mu}$ is a $K \times d$ matrix and both $W$ and $H$ are nonnegative. The appeal of NMF is its biological interpretation, whereby the $K$ columns of the matrix $W$ are interpreted as the weights with which the $K$ cancer signatures contribute to the $N = 96$ mutation categories, and the columns of the matrix $H$ are interpreted as the exposures to these $K$ signatures in each sample. The price to pay for this is that NMF, which is an iterative procedure, is computationally costly, and depending on the number of samples $d$, it can take days or even weeks to run it. Furthermore, NMF does not fix the number of signatures $K$, which must be either guessed or obtained via trial and error, thereby further adding to the computational cost. Perhaps most importantly, NMF is a nondeterministic algorithm and produces a different matrix $W$ in each run. (Each $W$ corresponds to one of myriad local minima of the NMF objective function.) This is dealt with by averaging over many such $W$ matrices obtained via multiple NMF runs (or samplings). However, each run generally produces a weights matrix $W_{iA}$ with columns (i.e., signatures) not aligned with those in other runs. Aligning or matching the signatures across different runs (before averaging over them) is typically achieved via nondeterministic clustering such as k-means.
Therefore, the result, even after averaging, generally is both noisy [12] and nondeterministic, i.e., if this computationally-costly procedure (which includes averaging) is run again and again on the same data, generally it will yield different looking cancer signatures every time. Simply put, the NMF-based method for extracting cancer signatures is not designed to be even in-sample stable. Under these circumstances, out-of-sample stability cannot even be feasible (i.e., cancer signatures obtained from non-overlapping sets of samples can be dramatically different, and out-of-sample stability is crucial for practical usefulness, e.g., diagnostically).
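The run-to-run variability of NMF described above can be illustrated with a minimal Python sketch. It is an illustration only, not the pipeline of [9]: the Lee-Seung multiplicative update rule, the function name nmf, the toy Poisson counts and the choice K = 3 are all assumptions made here for the example.

```python
import numpy as np

def nmf(G, K, n_iter=500, seed=None, eps=1e-12):
    """Approximate a nonnegative N x d matrix G by W @ H (W: N x K, H: K x d).

    Uses multiplicative updates for the Frobenius-norm objective
    (an assumed, textbook choice; other NMF variants exist).
    """
    rng = np.random.default_rng(seed)
    N, d = G.shape
    W = rng.random((N, K)) + eps   # random nonnegative start: source of nondeterminism
    H = rng.random((K, d)) + eps
    for _ in range(n_iter):
        H *= (W.T @ G) / (W.T @ W @ H + eps)
        W *= (G @ H.T) / (W @ H @ H.T + eps)
    # Normalize each column of W (a "signature") to sum to 1 and
    # rescale the rows of H (the "exposures") so that W @ H is unchanged.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]

# Hypothetical toy data: 96 mutation categories x 40 samples of counts.
rng = np.random.default_rng(0)
G = rng.poisson(5.0, size=(96, 40)).astype(float)

W1, H1 = nmf(G, K=3, seed=1)
W2, H2 = nmf(G, K=3, seed=2)

# Two runs on the same data, differing only in the random start.
print("max |W1 - W2| =", np.abs(W1 - W2).max())
```

Different random starts generally land in different local minima, and the columns of W need not even come out in the same order, which is why signatures from separate runs must be matched before they can be averaged.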
Without in- and out-of-sample stability, practical therapeutic and diagnostic applications of cancer signatures would be challenging. For instance, suppose one sequences genome (or exome; see below) data from a patient sample (be it via a liquid biopsy, a blood test or some other (potentially novel) method). Let us focus on SNVs. We have a vector of occurrence counts for 96 mutation categories. We need a quick computational test to determine with a high enough confidence level whether (i) there is a cancer signature present in this data and (ii) which cancer type this cancer signature corresponds to (i.e., in which organ the cancer originated). If cancer signatures are not even in-sample stable, then we cannot trust them. They could simply be noise. Indeed, there is always somatic mutational noise present in such data, and this must be factored out of the data before extracting cancer signatures. A simple way to understand somatic mutational noise is to note that mutations (i) are already present in humans unaffected by cancer and (ii) such mutations, which are unrelated to cancer, are further exacerbated when cancer occurs, as it disrupts the normal operation of various processes (including repair) in the DNA. At the level of the data matrix G, in [13], we discussed a key component of the somatic mutational noise and gave a prescription for removing it [14]. However, there likely exist other, deeper sources of somatic mutational noise, which must be further identified and carefully factored out. Simply put, somatic mutational noise unequivocally is a substantial source of systematic error in cancer signatures.
However, then there is also the statistical error, which is large and due to the nondeterministic nature of NMF discussed above. This statistical error is exacerbated by the somatic mutational noise, but would be present even if this noise were somehow completely factored out. Therefore, the in-sample instability must somehow be addressed. We emphasize that, a priori, this does not automatically address out-of-sample stability, without which any therapeutic or diagnostic applications would still be farfetched. However, without in-sample stability, nothing is clear.
The problem at hand is nontrivial and requires a step-by-step approach, including identification of various sources of in-sample instability. One simple observation of [13] is that, if we work directly with occurrence counts $G_{i\mu}$ for individual samples, (i) the data are very noisy and (ii) the number of signatures is bound to be too large to be meaningful/useful if the number of samples is large. A simple way to deal with this is to aggregate samples by cancer types. In doing so, we have a matrix $G_{is}$, where $s$ now labels cancer types, which is (i) less noisy and (ii) much smaller ($96 \times n$, where $n$ is the number of cancer types), so the number of resultant signatures is much more reasonable [15]. Thus, such aggregation is helpful.
Still, even with aggregation, we must address nondeterminism (of NMF). To circumvent this, in [16], we proposed an alternative approach that bypasses NMF altogether. As we argue in [16], NMF is, at least to a certain degree, clustering in disguise, e.g., many COSMIC cancer signatures [17] obtained via NMF (augmented with additional heuristics based on biologic intuition and empirical observations) exhibit clustering substructure, i.e., in many of these signatures, there are mutation categories with high weights (“peaks” or “tall mountain landscapes”) with other mutation categories having small weights likely well within statistical and systematic errors. For all practical purposes, such low weights could be set to zero. Then, many cancer signatures would start looking like clusters, albeit some clusters could be overlapping between different signatures. Considering that various signatures may be somatic mutational noise artifacts in the first instance and statistical error bars are large, it is natural to wonder whether there are some robust underlying clustering structures present in the data, with the understanding that such structures may not be present for all cancer types. However, even if they are present for a substantial number of cancer types, unveiling them would amount to a major step forward in understanding cancer signature structure.
To address this question, in [16], we proposed a new clustering algorithm termed *K-means. Its basic building block is the vanilla k-means algorithm, which computationally is very inexpensive. However, it is also nondeterministic. *K-means uses two machine learning levels on top of k-means to achieve statistical determinism (see Section 2 for details) [18], without any initialization of the centers [19]. Once *K-means fixes the clustering, it turns out that the weights and exposures can be computed using (normalized) regressions [16], thereby altogether bypassing computationally-costly NMF. In [16], we applied this method to cancer genome data corresponding to 1389 published samples for 14 cancer types. We found that clustering works well for 10 out of the 14 cancer types; the metrics include within-cluster correlations and overall fit quality. This suggests that there is indeed a clustering substructure present in the underlying cancer genome data, at least for most cancer types [20]. This is encouraging.
In this paper, we apply the method of [16] to exome data consisting of 10,656 published samples (sample IDs with sources are in Appendix A) aggregated by 32 cancer types. *K-means produces a robustly-stable clustering (11 clusters) from these data. One motivation for using exome data is that the exome is a small subset (∼1%) of the full genome containing only protein-coding regions of the genome [21]. The exome is much less expensive and less time-consuming to sequence than the whole genome, which can be especially important for early-stage diagnostics, yet it encodes important information about cancer signatures. As we discuss in the subsequent sections, our method appears to work well on exome data for most cancer types. In fact, overall, it appears to work better than COSMIC signatures, including out-of-sample, when applying clusters derived from our exome data to genome data.

2. *K-means

In [16], by extending a prior work [22] in quantitative finance on building statistical industry classifications using clustering algorithms, we developed a clustering method termed *K-means (“star K-means”) and applied it to the extraction of cancer signatures from genome data. *K-means is anchored on the standard k-means algorithm (see [23,24,25,26,27,28,29]) as its basic building block. However, k-means is not deterministic. *K-means is statistically deterministic, without specifying initial centers. This is achieved via two machine learning levels sitting on top of k-means. At the first level, we aggregate a large number M of k-means clusterings with randomly initialized centers (and the number of target clusters fixed using eRank) via a nontrivial aggregation procedure; see [16] for details. This aggregation is based on clustering (again, using k-means) the centers produced in the M clusterings, so the resultant aggregated clustering is nondeterministic. However, it is a lot less nondeterministic than vanilla k-means clusterings as aggregation dramatically reduces the degree of nondeterminism. At the second level, we take a large number P of such aggregated clusterings and determine the “ultimate” clustering with the maximum occurrence count (among the P aggregations). For sufficiently large M and P, the “ultimate” clustering is stable, i.e., if we run the algorithm over and over again, we will get the same “ultimate” clustering every time, even though the occurrence counts of the individual aggregated clusterings among the P aggregations will differ from run to run. What is important here is that the most frequently-occurring (“ultimate”) aggregation remains the same run after run. We emphasize that *K-means is a universal algorithm, and its application is not limited to the cancer genome or exome.
We discuss how the input data (i.e., matrices of somatic mutation counts for cancer exome) are used in the context of *K-means in Section 3.2 (see [16] for technical details of *K-means).
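The second-level selection (picking the clustering with the maximum occurrence count among P candidates) can be sketched in Python as follows. This is a deliberately simplified illustration and not the algorithm of [16]: each candidate here is a single plain k-means run rather than an aggregation of M runs, the number of clusters is supplied by hand rather than fixed via eRank, and all function names and the toy data are hypothetical.

```python
from collections import Counter

import numpy as np

def kmeans_labels(X, K, rng, n_iter=100):
    """Plain Lloyd k-means with randomly chosen initial centers; returns labels."""
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Squared distances from every point to every center: shape (n_points, K).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def canonical(labels):
    """Relabel clusters by order of first appearance, so that two labelings
    describing the same partition compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(int(l), len(mapping)) for l in labels)

def most_frequent_clustering(X, K, P, seed=0):
    """Run P candidate clusterings and return (partition, occurrence count)
    for the most frequently occurring one."""
    rng = np.random.default_rng(seed)
    counts = Counter(canonical(kmeans_labels(X, K, rng)) for _ in range(P))
    return counts.most_common(1)[0]

# Hypothetical toy data: three noisy groups of 2-D points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.5, size=(10, 2)) for c in ((0, 0), (4, 0), (2, 4))])

best, best_count = most_frequent_clustering(X, K=3, P=200)
print(best_count, "of 200 runs produced the most frequent partition")
```

Comparing partitions in a canonical form matters because k-means assigns arbitrary cluster numbers; without it, identical clusterings from different runs would be counted as distinct.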

3. Empirical Results

3.1. Data Summary

In this paper, we apply *K-means to exome data. (In [16], we applied it to published genome data. In this work, apart from applying *K-means to exome data, we also perform out-of-sample stability analysis of our results here (see Section 4).) We use data consisting of 10,656 published exome samples aggregated by 32 cancer types listed in Table 1, which summarizes total occurrence counts, numbers of samples and data sources. Appendix A provides sample IDs together with references for the data sources. Occurrence counts for the 96 mutation categories for each cancer type are given in Tables A1–A4. For Tables and Figures labeled A⋆, see Appendix A.

3.1.1. Structure of the Data

The underlying data consist of matrices $[G(s)]_{i\mu(s)}$ whose elements are occurrence counts of mutation categories labeled by $i = 1, \dots, N = 96$ in samples labeled by $\mu(s) = 1, \dots, d(s)$. Here, $s = 1, \dots, n$ labels $n$ different cancer types (in our case $n = 32$). We can choose to work with individual matrices $[G(s)]_{i\mu(s)}$ or with the $N \times d_{tot}$ “big matrix” $\Gamma$ obtained by appending (i.e., bootstrapping) the matrices $[G(s)]_{i\mu(s)}$ together column-wise (so $d_{tot} = \sum_{s=1}^{n} d(s)$). Alternatively, we can aggregate samples by cancer types and work with the so-aggregated matrix:
$$G_{is} = \sum_{\mu(s)=1}^{d(s)} [G(s)]_{i\mu(s)} \tag{1}$$
Generally, individual matrices $[G(s)]_{i\mu(s)}$ and, thereby, the “big matrix” $\Gamma$ contain much noise. For some cancer types, we can have a relatively small number of samples. We can also have “sparsely-populated” data, i.e., with many zeros for some mutation categories. In fact, different samples are not even necessarily uniformly normalized. To mitigate the aforementioned issues, following [13], here, we work with the $N \times n$ matrix $G_{is}$ with samples aggregated by cancer types. Below, we apply *K-means to $G_{is}$.
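As a concrete illustration of the aggregation by cancer type, the following Python sketch sums the columns of a “big matrix” within each cancer type. The function name, the integer encoding of cancer types and the tiny 3 × 4 example are hypothetical and chosen only to keep the arithmetic easy to check.

```python
import numpy as np

def aggregate_by_type(Gamma, type_of_sample, n_types):
    """Sum the columns of the N x d_tot count matrix Gamma within each cancer type.

    type_of_sample[mu] is the cancer-type index (0 .. n_types-1) of column mu.
    Returns the N x n_types aggregated matrix.
    """
    N = Gamma.shape[0]
    G = np.zeros((N, n_types), dtype=Gamma.dtype)
    for s in range(n_types):
        G[:, s] = Gamma[:, type_of_sample == s].sum(axis=1)
    return G

# Toy example: 3 mutation categories, 4 samples, 2 cancer types.
Gamma = np.array([[1, 0, 2, 3],
                  [0, 0, 1, 1],
                  [4, 5, 0, 0]])
type_of_sample = np.array([0, 0, 1, 1])

G = aggregate_by_type(Gamma, type_of_sample, n_types=2)
print(G)
```

Note how the zeros in individual samples (e.g., the second row of the first two columns) are diluted once counts are pooled, which is the sparsity mitigation referred to above.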

3.2. Exome Data Results

The $96 \times 32$ matrix $G_{is}$ given in Tables A1–A4 is what we pass into the function bio.cl.sigs() in Appendix A of [16] as the input matrix x. We use: iter.max = 100 (this is the maximum number of iterations used in the built-in R function kmeans(); we note that there was not a single instance in our 30 million runs of kmeans() where more iterations were required – the R function kmeans() produces a warning if it does not converge within iter.max); num.try = 1000 (this is the number of individual k-means samplings we aggregate every time); and num.runs = 30,000 (which is the number of aggregated clusterings we use to determine the “ultimate”, that is the most frequently occurring, clustering). More precisely, we ran three batches with num.runs = 10,000 as a sanity check, to make sure that the final result based on 30,000 aggregated clusterings was consistent with the results based on smaller batches, i.e., that it was stable from batch to batch [30]. Based on Table A5, we identify Clustering-E1 as the “ultimate” clustering (see Section 2). Also, it is evident that the top-10 clusterings in Table A5 essentially are variations of each other.
For Clustering-E1, as in [16], we compute the within-cluster weights based on unnormalized regressions (via Equations (13)–(15) in [16]) and normalized regressions (via Equations (14), (16) and (17) in [16]) with exposures calculated based on arithmetic averages (see Section 2.6 of [16] for details). We give the within-cluster weights for Clustering-E1 in Table A6 and Table A7 and plot them in Figures A1–A11 for unnormalized regressions, and in Table 2 and Table 3 and Figures 1–11 for normalized regressions. The actual mutation categories in each cluster can be read off the aforesaid Table A6 and Table A7 with the weights (thus, the mutation categories with nonzero weights belong to a given cluster), or from the horizontal axis labels in the aforesaid Figures A1–A11.

3.3. Reconstruction and Correlations

3.3.1. Within-Cluster Correlations

We have our data matrix $G_{is}$. We are approximating this matrix via the following factorized matrix:
$$G^*_{is} = \sum_{A=1}^{K} W_{iA} H_{As} = w_i H_{Q(i),s} \tag{2}$$
where $W_{iA}$ are the within-cluster weights ($i = 1, \dots, N$; $A = 1, \dots, K$), $H_{As}$ are the exposures ($s = 1, \dots, n = 32$ labels the cancer types), $Q: \{1, \dots, N\} \to \{1, \dots, K\}$ is the map between the $N = 96$ mutations and $K = 11$ clusters in Clustering-E1, and we have $W_{iA} = w_i \delta_{Q(i),A}$ [31]. It is the matrix $W_{iA}$ that is given in Table A6 and Table A7 for the unnormalized regressions and Table 2 and Table 3 for the normalized regressions.
We can now compute an $n \times K$ matrix $\Theta_{sA}$ of within-cluster cross-sectional correlations between $G_{is}$ and $G^*_{is}$ defined via ($\mathrm{xCor}(\cdot, \cdot)$ stands for “cross-sectional correlation”, i.e., “correlation across the index $i$”; due to the factorized structure (2), these correlations do not directly depend on $H_{As}$):
$$\Theta_{sA} = \left.\mathrm{xCor}(G_{is}, G^*_{is})\right|_{i \in J(A)} = \left.\mathrm{xCor}(G_{is}, w_i)\right|_{i \in J(A)} \tag{3}$$
Here, $J(A) = \{i \,|\, Q(i) = A\}$ is the set of mutations labeled by $i$ that belong to a given cluster labeled by $A$. We give the matrix $\Theta_{sA}$ for Clustering-E1 for weights based on unnormalized regressions in Table 4 and weights based on normalized regressions in Table 5. As for genome data [16], the fit for normalized regressions is somewhat better than that for unnormalized regressions.
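The within-cluster correlations defined above can be computed along the following lines. This Python sketch assumes that xCor is the ordinary Pearson correlation taken over the mutation categories in a cluster; the function name, the zero-based cluster labels and the toy numbers are hypothetical.

```python
import numpy as np

def within_cluster_corr(G, w, Q, K):
    """Theta[s, A] = Pearson correlation, over i in J(A), between G[i, s] and w[i].

    G : N x n data matrix (mutation categories x cancer types)
    w : length-N vector of within-cluster weights
    Q : length-N integer array, Q[i] = cluster index (0 .. K-1) of category i
    Entries are left as NaN where the correlation is undefined
    (fewer than two categories, or a constant vector).
    """
    N, n = G.shape
    Theta = np.full((n, K), np.nan)
    for A in range(K):
        J = np.flatnonzero(Q == A)          # the set J(A)
        if len(J) < 2 or np.std(w[J]) == 0:
            continue
        for s in range(n):
            if np.std(G[J, s]) > 0:
                Theta[s, A] = np.corrcoef(G[J, s], w[J])[0, 1]
    return Theta

# Toy example: 6 mutation categories in 2 clusters, 2 cancer types.
Q = np.array([0, 0, 0, 1, 1, 1])
w = np.array([0.1, 0.2, 0.7, 0.5, 0.3, 0.2])
G = np.column_stack([
    10.0 * w,                                   # type 0 follows the weights exactly
    np.array([7.0, 2.0, 1.0, 5.0, 3.0, 2.0]),   # type 1 is reversed within cluster 0
])

Theta = within_cluster_corr(G, w, Q, K=2)
print(np.round(Theta, 3))
```

Because only the categories in J(A) enter each correlation, the exposures drop out, consistent with the remark that these correlations do not directly depend on the exposures.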

3.3.2. Overall Correlations

Another useful metric, which we use as a sanity check, is the overall fit. For each value of $s$ (i.e., for each cancer type), we can run a linear cross-sectional regression (without the intercept) of $G_{is}$ over the matrix $W_{iA}$. Therefore, we have $n = 32$ of these regressions. Each regression produces multiple $R^2$ and adjusted $R^2$, which we give in Table 4 and Table 5. Furthermore, we can compute the fitted values $\widehat{G}^*_{is}$ based on these regressions, which are given by:
$$\widehat{G}^*_{is} = \sum_{A=1}^{K} W_{iA} F_{As} = w_i F_{Q(i),s} \tag{4}$$
where (for each value of $s$) $F_{As}$ are the regression coefficients. We can now compute the overall cross-sectional correlations (i.e., the index $i$ runs over all $N = 96$ mutation categories):
$$\Xi_s = \mathrm{xCor}(G_{is}, \widehat{G}^*_{is}) \tag{5}$$
These correlations are also given in Table 4 and Table 5 and measure the overall fit quality.
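The regressions and overall correlations described in this subsection can be sketched in Python as follows. Two assumptions are made here and are not taken from the text: xCor is treated as the Pearson correlation, and $R^2$ is computed with the uncentered total sum of squares, which is the convention R's lm() summary uses for models without an intercept. The function name and toy numbers are hypothetical.

```python
import numpy as np

def overall_fit(G, w, Q, K):
    """Regress each column of G on W (no intercept) and report fit metrics.

    Returns (F, G_hat, Xi, r2, adj_r2), where F is the K x n matrix of
    regression coefficients and Xi[s] is the correlation between G[:, s]
    and the fitted values G_hat[:, s].
    """
    N, n = G.shape
    W = np.zeros((N, K))
    W[np.arange(N), Q] = w                      # W[i, A] = w[i] * delta(Q[i], A)
    F, *_ = np.linalg.lstsq(W, G, rcond=None)   # least squares, no intercept column
    G_hat = W @ F
    Xi = np.array([np.corrcoef(G[:, s], G_hat[:, s])[0, 1] for s in range(n)])
    rss = ((G - G_hat) ** 2).sum(axis=0)
    tss = (G ** 2).sum(axis=0)                  # uncentered: assumption for no-intercept fits
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * N / (N - K)
    return F, G_hat, Xi, r2, adj_r2

# Toy example: data generated exactly from the factorized form, so the fit is perfect.
Q = np.array([0, 0, 0, 1, 1, 1])
w = np.array([0.2, 0.3, 0.5, 0.6, 0.3, 0.1])
F_true = np.array([[10.0, 20.0, 5.0],
                   [30.0, 5.0, 15.0]])
W_true = np.zeros((6, 2))
W_true[np.arange(6), Q] = w
G = W_true @ F_true

F, G_hat, Xi, r2, adj_r2 = overall_fit(G, w, Q, K=2)
print(np.round(F, 6))
print(np.round(Xi, 6))
```

Since each mutation category belongs to exactly one cluster, the columns of W have disjoint supports, so each regression decouples into K independent one-variable fits, one per cluster.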

3.3.3. Interpretation

Looking at Table 5, a few things jump out. First, most—24 out of 32—cancer types have high (80%+) within-cluster correlations with at least one cluster. Out of the other eight cancer types, six have reasonably high (70%+) within-cluster correlations with at least one cluster. The remaining two cancer types are X9 (cervical cancer) and X17 (liver cancer). In [16], based on genome data, we already observed that liver cancer does not have a clustering structure, so this is not surprising. On the other hand, with cervical cancer, the story appears to be trickier. According to [17], we should expect COSMIC signatures CSig2+13 and CSig26 (see Section 4 for more details) to appear in cervical cancer. According to Table A8 (see Section 4), CSig2+13 indeed have high correlations with X9 (but not CSig26). On the other hand, the dominant part of CSig2 (C > T mutations in TCA, TCC, TCG, TCT) is subsumed in Cluster Cl-10 (see Figure 10), and the dominant part of CSig13 (C > G mutations in TCA, TCC, TCT) is subsumed in Cluster Cl-9 (see Figure 9). Basically, it appears that the large (each with 16 mutation categories) Clusters Cl-9, Cl-10 and Cl-11 probably could be split into smaller clusters. In fact, Cl-9 and Cl-11 do not have 80%+ correlations with any cancer types (they do have 70%+ correlations with one cancer type each). This is another indication that these clusters might be “oversized”. The same was observed with the largest cluster (with 21 mutation categories) in [16] in the context of genome data. Simply put, these “oversized” clusters may have to be dealt with via appropriately tweaking the underlying clustering algorithm (this is outside of the scope hereof and will be dealt with elsewhere).
The last three columns in Table 5 provide metrics for the overall fit for each cancer type. The overall correlations (between the original data $G_{is}$ and the model-fitted values $\widehat{G}^*_{is}$; see Section 3.3.2) in the last column of Table 5 are above 80% for 16 (out of the 32) cancer types and above 70% for 26 cancer types. These high correlations indicate a good in-sample agreement between the original and reconstructed (model-fitted) data for each of these 26 cancer types. The remaining six cancer types, which all have overall correlations above 60%, are: X4 (B-cell lymphoma), X6 (bladder cancer), X8 (breast cancer), X9 (cervical cancer), X26 (rectum adenocarcinoma) and X29 (testicular germ cell tumor). We already discussed cervical cancer above. We address breast cancer in Section 4 hereof. Now, the X4 data are sparsely populated: there are 24 samples, and the total number of counts is 706, so there are many zeros in the underlying sample data, albeit only two zeros in the aggregated data. According to [17], we should expect CSig9 and CSig17 in B-cell lymphoma. However, according to Table A8 (see Section 4), these signatures do not have high correlations with X4. Note that clustering worked well for B-cell lymphoma for the genome data in [16], but there, the genome data were well-populated. Therefore, it is reasonable to assume that here, the “underperformance” is likely due to the sparsity of the underlying data. For X6 (bladder cancer), the situation is similar to X9 (cervical cancer) above: according to [17], we should expect CSig2+13 in bladder cancer, and Table A8 is consistent with this. However, as mentioned above, CSig2 and CSig13 are subsumed in Clusters Cl-10 and Cl-9, respectively (“oversizing”). According to Table A9, we should expect CSig10 in X26. CSig10 is dominated by the C > A mutation in TCT (which is subsumed in Cluster Cl-9) and the C > T mutation in TCG (which is subsumed in Cluster Cl-10).
Again, here we are dealing with “oversizing” of these clusters. X29 has high within-cluster correlations with Clusters Cl-4 and Cl-5. The overall fit correlation apparently is lowered by the high negative correlation with Cluster Cl-3. To summarize, “oversizing” is one potential “shortcoming” here.

4. Concluding Remarks

In order to understand the significance of our results, let us compare them to the fit that COSMIC signatures (for details, see [17]; for references, see [9,32,33,34,35]) provide for our exome data. We can do this by computing the following p × n cross-sectional correlation matrix:
$$\Delta_{\alpha s} = \mathrm{xCor}(U_{i\alpha}, G_{is}) \tag{6}$$
where $U_{i\alpha}$ ($\alpha = 1, \dots, p$) is the $N \times p$ matrix of weights for $p = 30$ COSMIC signatures, which for brevity, we will refer to as CSig1, ..., CSig30 [36]. The matrix $\Delta_{\alpha s}$ is given in Table A8 and Table A9. Let us look at the 80%+ correlations (which are in bold font in Table A8 and Table A9). (Relaxing this cut-off to 70% (see Table A8 and Table A9) does not alter our conclusions below.) Only six out of 30 COSMIC signatures, to wit CSig1,2,6,7,10,15, have 80%+ correlations with the exome data for the 32 cancer types. The aetiology of these signatures is known [17]. CSig1 is the result of an endogenous mutational process initiated by spontaneous 5-methylcytosine deamination, hence the ubiquity of its high correlations with many cancer types. CSig2 (which usually appears in tandem with CSig13) is due to APOBEC-mediated cytosine deamination, hence its high correlations with some cancer types. CSig6 is associated with defective DNA mismatch repair, hence its high correlations with several cancer types. CSig7 is due to ultraviolet light exposure, so its high correlation with X19 (melanoma) is spot on [37]. CSig10 is associated with recurrent error-prone polymerase POLE somatic mutations (its high correlations with X26 (rectum adenocarcinoma) and X32 (uterine cancer) are consistent with [17] and, once again, apparently are due to a large overlap between the exome data we use here and those used by [17]). CSig15 is associated with defective DNA mismatch repair; the significance of its high correlation with X23 (pancreatic cancer) is unclear. Therefore, only a handful of COSMIC signatures, all associated with known mutational processes, do well on our exome data [38]. Others do not fit well.
This is the out-of-sample stability issue emphasized in [13]. It traces to the fact that NMF is an intrinsically unstable method, both in- and out-of-sample. In-sample instability relates to the fact that NMF is nondeterministic and produces different looking signatures from one run to another. In fact, we attempted running NMF on our exome data. We ran three batches with 800 samplings in each batch (a computationally time-consuming procedure [39]). The three batches produced different looking results, which with much manual curation could only be partially matched to some COSMIC signatures, but this matching was different and highly unstable across the three batches. Simply put, NMF failed to produce any meaningful results on our exome data. Furthermore, the above discussion illustrates that most COSMIC signatures (extracted using NMF from exome and genome data) apparently are unstable out-of-sample, e.g., when applied to our exome data aggregated by cancer types. Here, one may argue that exome data contain only partial information, and NMF should not be used on it. However, the COSMIC signatures are in fact based on 10,952 exomes and 1048 whole-genomes across 40 cancer types [17] (also, see, e.g., [40]). The difference here is that we are aggregating samples by cancer types, and most COSMIC signatures apparently do not apply, which means that COSMIC signatures are highly sample-set-specific (that is, unstable out-of-sample). Furthermore, as mentioned above, CSig7 (UV exposure) is spot on in that it has 99.66% correlation with X19 (melanoma) (albeit one should keep in mind the comments in [37]). Therefore, one can argue that the culprit is not the exome data, but the method (NMF) itself. To quantify this, let us look at correlations of COSMIC signatures with genome data for 14 cancer types used in [13] and [16]. The results are given in Table A10.
As in the case of exome data, here too, we have high correlations only for a handful of COSMIC signatures corresponding to known mutational processes, to wit CSig1,4,6,13. Therefore, most COSMIC signatures do not appear to have explanatory power on genome data aggregated by cancer types, a further indication that most COSMIC signatures lack out-of-sample stability.
What about out-of-sample stability for our clusters we obtained from exome data? One way to test this is to look at within-cluster correlations and the overall fit metrics as in Table 5, but for the aforesaid genome data for 14 cancer types used in [13,16]. The results are given in Table 6. Unsurprisingly, the quality of the fit for genome data (out-of-sample) is not as good as for exome data (in-sample). However, it is (i) reasonably good and (ii) unequivocally much better than the fit provided by the COSMIC signatures (Table A10). Furthermore, the 11 exome-based clusters have a poor overall fit for G.X4 (breast cancer), G.X8 (liver cancer), G.X9 (lung cancer) and G.X14 (renal cell carcinoma), the same four cancer types for which seven genome-based clusters in [16] produced a poor overall fit, and for a good reason as well (see [16] for details). It is less clear why the 11 exome-based clusters do not have a better fit for G.X7 (gastric cancer) considering the in-sample fits for this cancer type based on exome data (X15; Table 5 hereof) and genome data (Row 7, Table 15 of [16]) are pretty good.
Therefore, unlike NMF, *K-means clustering, being a statistically deterministic method, is in-sample stable. Here, we can ask, what if we apply to NMF the same two machine learning levels as those that sit on top of k-means in *K-means to make it statistically deterministic? The answer is that when applying NMF, one already uses one machine learning method, which is a form of aggregation of a large number of samplings (i.e., individual NMF runs) [41]. This is conceptually similar to the first machine learning level in *K-means. Therefore, then we can ask, what if we add to NMF the second machine learning level as in *K-means, to wit by comparing a large number of such “averagings”? A simple, prosaic answer is that it would make NMF, which is already computationally costly as is and much more so with the first machine learning level, computationally prohibitive. The reason why *K-means is computationally much less expensive is that the basic building block of *K-means, on top of which we add the two machine learning methods, is vanilla k-means, which is much, much less expensive than NMF. That is what makes all the difference [42].
Finally, let us mention that exome data for chronic myeloid disorders (121 samples, 175 total counts) were published in [43,44], and for neuroblastoma (13 samples, 298 total counts) in [45]. However, these data are so sparsely populated (too many zeros even after aggregation) that we specifically excluded them from our analysis. Much more unpublished data are available for the cancer types we analyze here, as well as other cancer types, and it would be very interesting to apply our methods to these data, including to (still embargoed) extensive genome data of the International Cancer Genome Consortium.

Acknowledgments

The results published here are in whole or part based on data generated by the TCGA (The Cancer Genome Atlas) Research Network: http://cancergenome.nih.gov/.

Author Contributions

The authors contributed equally.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A: Exome Sample IDs

In this Appendix, we give the sample IDs with the corresponding publication references for the exome data we use. We label these references as H1, Z1, etc., and use these labels in Table 1 in the Sources column. This appendix also includes Tables and Figures labeled A⋆ (Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10, Table A11 and Figure A1, Figure A2, Figure A3, Figure A4, Figure A5, Figure A6, Figure A7, Figure A8, Figure A9, Figure A10, Figure A11); see Section 3.1.
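Many entries in this appendix give sample IDs in a compact form: a pattern containing a single * placeholder, followed by a comma-separated list of values for it. A minimal sketch in Python of how such an entry expands into full sample IDs follows; the helper name expand_ids is hypothetical, and entries that use additional shorthand need extra handling that is not shown here.

```python
def expand_ids(pattern, suffix_list):
    """Expand a pattern with one '*' placeholder into full sample IDs.

    pattern:     e.g., "SJHYPO*"
    suffix_list: the comma-separated values as printed, e.g., "001-D, 002-D."
    """
    if pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*' placeholder")
    # Drop the period that closes each printed list, then split on commas.
    cleaned = suffix_list.strip().rstrip(".")
    suffixes = [s.strip() for s in cleaned.split(",") if s.strip()]
    return [pattern.replace("*", s) for s in suffixes]


ids = expand_ids("SJHYPO*", "001-D, 002-D, 004-D.")
```

For the three values shown, this yields SJHYPO001-D, SJHYPO002-D and SJHYPO004-D.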
Acute Lymphoblastic Leukemia (86 samples):
Source H1 = [46]. Sample IDs are of the form SJHYPO*, where * is:
001-D, 002-D, 004-D, 005-D, 006-D, 009-D, 009-R, 012-D, 013-D, 014-D, 016-D, 019-D, 020-D, 022-D, 024-D, 026-D, 029-D, 032-D, 036-D, 037-D, 037-R, 039-D, 040-D, 041-D, 042-D, 044-D, 045-D, 046-D, 047-D, 051-D, 052-D, 052-R, 055-D, 056-D, 116-D, 117-D, 119-D, 120-D, 123-D, 124-D, 125-D, 126-D.
Source Z1 = [47]. Sample IDs are of the form SJTALL*, where * is:
001, 002, 003, 004, 005, 006, 007, 008, 009, 011, 012, 013, 169, 192, 208.
Source D1 = [48]:
TBR01, TBR03, TBR05, TBR06, TBR08, TLE02, TLE10, TLE109, TLE31, TLE33, TLE34, TLE38, TLE39, TLE41, TLE42, TLE43, TLE50, TLE51, TLE54, TLE55, TLE57, TLE60, TLE61, TLE63, TLE64, TLE65, TLE66, TLE67, TLE68.
Acute Myeloid Leukemia (190 samples):
Source T1 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-AB-*, where * is:
2802, 2803, 2804, 2805, 2806, 2807, 2808, 2809, 2810, 2811, 2812, 2813, 2814, 2816, 2817, 2818, 2819, 2820, 2821, 2822, 2824, 2825, 2826, 2827, 2828, 2829, 2830, 2831, 2832, 2833, 2835, 2836, 2837, 2838, 2839, 2841, 2842, 2843, 2844, 2845, 2846, 2847, 2849, 2850, 2851, 2853, 2854, 2855, 2857, 2858, 2859, 2860, 2861, 2862, 2863, 2864, 2865, 2866, 2867, 2868, 2869, 2870, 2871, 2872, 2873, 2874, 2875, 2876, 2877, 2878, 2879, 2880, 2881, 2882, 2883, 2884, 2885, 2886, 2887, 2888, 2889, 2890, 2891, 2892, 2893, 2894, 2895, 2896, 2897, 2898, 2899, 2900, 2901, 2904, 2905, 2906, 2907, 2908, 2910, 2911, 2912, 2913, 2914, 2915, 2916, 2917, 2918, 2919, 2920, 2921, 2922, 2923, 2924, 2925, 2926, 2927, 2928, 2929, 2930, 2931, 2932, 2933, 2934, 2935, 2936, 2937, 2938, 2939, 2940, 2941, 2943, 2945, 2946, 2947, 2948, 2949, 2950, 2952, 2954, 2955, 2956, 2957, 2959, 2963, 2964, 2965, 2966, 2967, 2968, 2969, 2970, 2971, 2972, 2973, 2974, 2975, 2976, 2977, 2978, 2979, 2980, 2981, 2982, 2983, 2984, 2985, 2986, 2987, 2988, 2989, 2990, 2991, 2992, 2993, 2994, 2995, 2996, 2997, 2998, 2999, 3000, 3001, 3002, 3005, 3006, 3007, 3008, 3009, 3011, 3012.
Adrenocortical Carcinoma (91 samples):
Source T2 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
OR-A5J1, OR-A5J2, OR-A5J3, OR-A5J4, OR-A5J5, OR-A5J6, OR-A5J7, OR-A5J8, OR-A5J9, OR-A5JA, OR-A5JB, OR-A5JC, OR-A5JD, OR-A5JE, OR-A5JF, OR-A5JG, OR-A5JH, OR-A5JI, OR-A5JJ, OR-A5JK, OR-A5JL, OR-A5JM, OR-A5JO, OR-A5JP, OR-A5JQ, OR-A5JR, OR-A5JS, OR-A5JT, OR-A5JU, OR-A5JV, OR-A5JW, OR-A5JX, OR-A5JY, OR-A5JZ, OR-A5K0, OR-A5K1, OR-A5K2, OR-A5K3, OR-A5K4, OR-A5K5, OR-A5K6, OR-A5K8, OR-A5K9, OR-A5KB, OR-A5KO, OR-A5KP, OR-A5KQ, OR-A5KS, OR-A5KT, OR-A5KU, OR-A5KV, OR-A5KW, OR-A5KX, OR-A5KY, OR-A5KZ, OR-A5L1, OR-A5L2, OR-A5L3, OR-A5L4, OR-A5L5, OR-A5L6, OR-A5L8, OR-A5L9, OR-A5LA, OR-A5LB, OR-A5LC, OR-A5LD, OR-A5LE, OR-A5LF, OR-A5LG, OR-A5LH, OR-A5LI, OR-A5LJ, OR-A5LK, OR-A5LL, OR-A5LN, OR-A5LO, OR-A5LP, OR-A5LR, OR-A5LS, OR-A5LT, OU-A5PI, P6-A5OF, P6-A5OG, P6-A5OH, PA-A5YG, PK-A5H8, PK-A5H9, PK-A5HA, PK-A5HB, PK-A5HC.
B-Cell Lymphoma (24 samples):
Source M1 = [49]. In the DLBCL sample IDs, * runs from A through M (e.g., DLBCL-PatientC):
07-35482, DLBCL-Patient*, FL-PatientA, FL009.
Source L1 = [50]:
1060, 1061, 1065, 1093, 1096, 1102, 515, EB2.
Benign Liver Tumor (40 samples):
Source P1 = [51]. Sample IDs are of the form CHC*, where * is:
1023T, 1124T, 1315T, 1328T, 1329T, 1337T, 1382T, 1383T, 1424T, 1425T, 1428T, 1432T, 1434T, 1439T, 1488T, 1489T, 1665T, 1666T, 1854T, 1916T, 340T, 361TB, 462T, 463T, 464T, 470T, 471T, 517T, 575T, 578T, 603T, 605T, 623T, 624T, 674T, 687T, 689T, 846T, 918T, 976T.
Bladder Cancer (341 samples):
Source G1 = [52]. Sample IDs are of the form TCC_B**-Tumor, where ** is (below * stands for -, e.g., 104*0 = 104-0, and the full sample ID is TCC_B104-0-Tumor):
10, 100, 101, 102, 103, 104*0, 104, 105*0, 105*1, 105, 106, 107, 109, 11, 110, 111, 112, 114, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 23, 24, 25, 34, 35, 37, 41, 43, 45, 47, 5, 50, 52, 54, 55, 56, 57, 58, 59*0, 59*1, 59*3, 59, 60, 61, 62*0, 63, 64, 65, 66*0, 66, 68, 70, 71, 73, 74, 77, 78, 79, 8, 80*0, 80*1, 80*11, 80*13, 80*3, 80*4, 80*5, 80*7, 80*8, 80, 81*1, 81*2, 81, 82, 83, 84, 85*0, 85*2, 86, 87, 88, 89*1, 89*10, 89*11, 89*12, 89*16, 89*3, 89*4, 89*5, 9, 90, 92, 96, 98, 99.
Source T3 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-**, where ** is (below * stands for -A, e.g., BL*0C8 = BL-A0C8, and the full sample ID is TCGA-BL-A0C8; also, below ⋆ = 3OO = 3-double-O):
BL*0C8, BL*13I, BL*13J, BL*3JM, BL*5ZZ, BT*0S7, BT*0YX, BT*20J, BT*20N, BT*20O, BT*20P, BT*20Q, BT*20R, BT*20T, BT*20U, BT*20V, BT*20W, BT*20X, BT*2LA, BT*2LB, BT*2LD, BT*3PH, BT*3PJ, BT*3PK, BT*42B, BT*42C, BT*42E, BT*42F, C4*0EZ, C4*0F0, C4*0F1, C4*0F6, C4*0F7, CF*1HR, CF*1HS, CF*27C, CF*3MF, CF*3MG, CF*3MH, CF*3MI, CF*47S, CF*47T, CF*47V, CF*47W, CF*47X, CF*47Y, CF*5U8, CF*5UA, CU*0YN, CU*0YO, CU*0YR, CU*3KJ, CU*3QU, CU*3YL, CU*5W6, CU*72E, DK*1A3, DK*1A5, DK*1A6, DK*1A7, DK*1AA, DK*1AB, DK*1AC, DK*1AD, DK*1AE, DK*1AF, DK*1AG, DK*2HX, DK*2I1, DK*2I2, DK*2I4, DK*2I6, DK*3IK, DK*3IL, DK*3IM, DK*3IN, DK*3IQ, DK*3IS, DK*3IT, DK*3IU, DK*3IV, DK*3WW, DK*3WX, DK*3WY, DK*3X1, DK*3X2, DK*6AV, DK*6AW, DK*6B0, DK*6B1, DK*6B2, DK*6B5, DK*6B6, E5*2PC, E5*4TZ, E5*4U1, E7*3X6, E7*3Y1, E7*4IJ, E7*4XJ, E7*541, E7*5KE, E7*5KF, E7*677, E7*678, E7*6ME, E7*6MF, E7*7DU, E7*7DV, FD*3B3, FD*3B4, FD*3B5, FD*3B6, FD*3B7, FD*3B8, FD*3N5, FD*3N6, FD*3NA, FD*3SJ, FD*3SL, FD*3SM, FD*3SN, FD*3SO, FD*3SP, FD*3SQ, FD*3SR, FD*3SS, FD*43N, FD*43P, FD*43S, FD*43U, FD*43X, FD*5BR, FD*5BS, FD*5BU, FD*5BV, FD*5BX, FD*5BY, FD*5BZ, FD*5C0, FD*5C1, FD*62N, FD*62O, FD*62P, FD*62S, FD*6TA, FD*6TB, FD*6TC, FD*6TD, FD*6TE, FD*6TF, FD*6TG, FD*6TH, FD*6TI, FD*6TK, FJ*3Z7, FJ*3Z9, FJ*3ZE, FJ*3ZF, FT*3EE, FT*61P, G2*2EC, G2*2EF, G2*2EJ, G2*2EK, G2*2EL, G2*2EO, G2*2ES, G2*3IB, G2*3IE, G2*3VY, GC*3BM, GC*3I6, GC*⋆, GC*3RB, GC*3RC, GC*3RD, GC*3WC, GC*3YS, GC*6I1, GC*6I3, GD*2C5, GD*3OP, GD*3OQ, GD*3OS, GD*6C6, GD*76B, GU*42P, GU*42Q, GU*42R, GU*762, GU*763, GU*766, GU*767, GV*3JV, GV*3JW, GV*3JX, GV*3JZ, GV*3QF, GV*3QG, GV*3QH, GV*3QI, GV*3QK, GV*40E, GV*40G, GV*6ZA, H4*2HO, H4*2HQ, HQ*2OE, HQ*2OF, HQ*5ND, HQ*5NE, K4*3WS, K4*3WU, K4*3WV, K4*4AB, K4*4AC, K4*54R, K4*5RH, K4*5RI, K4*5RJ, K4*6FZ, K4*6MB, KQ*41N, KQ*41P, KQ*41Q, KQ*41S, LC*66R, LT*5Z6, MV*51V, PQ*6FI, PQ*6FN, R3*69X, S5*6DX, UY*78K, UY*78L, UY*78N, UY*78O.
Brain Lower Grade Glioma (465 samples):
Source T4 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
CS-4938, CS-4941, CS-4942, CS-4943, CS-4944, CS-5390, CS-5393, CS-5394, CS-5395, CS-5396, CS-5397, CS-6186, CS-6188, CS-6290, CS-6665, CS-6666, CS-6667, CS-6668, CS-6669, CS-6670, DB-5270, DB-5273, DB-5274, DB-5275, DB-5276, DB-5277, DB-5278, DB-5279, DB-5280, DB-5281, DB-A4X9, DB-A4XA, DB-A4XB, DB-A4XC, DB-A4XD, DB-A4XE, DB-A4XF, DB-A4XG, DB-A4XH, DB-A64L, DB-A64O, DB-A64P, DB-A64Q, DB-A64R, DB-A64S, DB-A64U, DB-A64V, DB-A64W, DB-A64X, DB-A75K, DB-A75L, DB-A75M, DB-A75O, DB-A75P, DH-5140, DH-5141, DH-5142, DH-5143, DH-5144, DH-A669, DH-A66B, DH-A66D, DH-A66F, DH-A66G, DH-A7UR, DH-A7US, DH-A7UT, DH-A7UU, DH-A7UV, DU-5847, DU-5849, DU-5851, DU-5852, DU-5853, DU-5854, DU-5855, DU-5870, DU-5871, DU-5872, DU-5874, DU-6392, DU-6393, DU-6394, DU-6395, DU-6396, DU-6397, DU-6399, DU-6400, DU-6401, DU-6402, DU-6403, DU-6404, DU-6405, DU-6406, DU-6407, DU-6408, DU-6410, DU-6542, DU-7006, DU-7007, DU-7008, DU-7009, DU-7010, DU-7011, DU-7012, DU-7013, DU-7014, DU-7015, DU-7018, DU-7019, DU-7290, DU-7292, DU-7294, DU-7298, DU-7299, DU-7300, DU-7301, DU-7302, DU-7304, DU-7306, DU-7309, DU-8158, DU-8161, DU-8162, DU-8163, DU-8164, DU-8165, DU-8166, DU-8167, DU-8168, DU-A5TP, DU-A5TR, DU-A5TS, DU-A5TT, DU-A5TU, DU-A5TW, DU-A5TY, DU-A6S2, DU-A6S3, DU-A6S6, DU-A6S7, DU-A6S8, DU-A76K, DU-A76L, DU-A76O, DU-A76R, DU-A7T6, DU-A7T8, DU-A7TA, DU-A7TB, DU-A7TC, DU-A7TD, DU-A7TG, DU-A7TJ, E1-5302, E1-5303, E1-5304, E1-5305, E1-5307, E1-5311, E1-5318, E1-5319, E1-5322, E1-A7YD, E1-A7YE, E1-A7YH, E1-A7YI, E1-A7YJ, E1-A7YK, E1-A7YL, E1-A7YM, E1-A7YN, E1-A7YO, E1-A7YQ, E1-A7YS, E1-A7YU, E1-A7YV, E1-A7YW, E1-A7YY, E1-A7Z2, E1-A7Z3, E1-A7Z4, E1-A7Z6, EZ-7264, FG-5962, FG-5963, FG-5964, FG-5965, FG-6688, FG-6689, FG-6690, FG-6691, FG-6692, FG-7634, FG-7636, FG-7637, FG-7638, FG-7641, FG-7643, FG-8181, FG-8182, FG-8185, FG-8186, FG-8187, FG-8188, FG-8189, FG-8191, FG-A4MT, FG-A4MU, FG-A4MW, FG-A4MX, FG-A4MY, FG-A60J, FG-A60K, FG-A60L, FG-A6IZ, FG-A6J1, FG-A6J3, FG-A70Y, FG-A70Z, FG-A710, FG-A711, 
FG-A713, FN-7833, HT-7467, HT-7468, HT-7469, HT-7470, HT-7471, HT-7472, HT-7473, HT-7474, HT-7475, HT-7476, HT-7477, HT-7478, HT-7479, HT-7480, HT-7481, HT-7482, HT-7483, HT-7485, HT-7601, HT-7602, HT-7603, HT-7604, HT-7605, HT-7606, HT-7607, HT-7608, HT-7609, HT-7610, HT-7611, HT-7616, HT-7620, HT-7676, HT-7677, HT-7680, HT-7681, HT-7684, HT-7686, HT-7687, HT-7688, HT-7689, HT-7690, HT-7691, HT-7692, HT-7693, HT-7694, HT-7695, HT-7854, HT-7855, HT-7856, HT-7857, HT-7858, HT-7860, HT-7873, HT-7874, HT-7875, HT-7877, HT-7879, HT-7880, HT-7881, HT-7882, HT-7884, HT-7902, HT-8010, HT-8011, HT-8012, HT-8013, HT-8015, HT-8018, HT-8019, HT-8104, HT-8105, HT-8106, HT-8107, HT-8108, HT-8109, HT-8110, HT-8111, HT-8113, HT-8114, HT-8558, HT-8563, HT-8564, HT-A4DS, HT-A4DV, HT-A5R5, HT-A5R7, HT-A5R9, HT-A5RA, HT-A5RB, HT-A5RC, HT-A614, HT-A615, HT-A616, HT-A617, HT-A618, HT-A619, HT-A61A, HT-A61B, HT-A61C, HT-A74H, HT-A74J, HT-A74K, HT-A74L, HT-A74O, HW-7486, HW-7487, HW-7489, HW-7490, HW-7491, HW-7493, HW-7495, HW-8319, HW-8320, HW-8321, HW-8322, HW-A5KJ, HW-A5KK, HW-A5KL, HW-A5KM, IK-7675, IK-8125, KT-A74X, KT-A7W1, P5-A5ET, P5-A5EU, P5-A5EV, P5-A5EW, P5-A5EX, P5-A5EY, P5-A5EZ, P5-A5F0, P5-A5F1, P5-A5F2, P5-A5F4, P5-A5F6, P5-A72U, P5-A72W, P5-A72X, P5-A72Z, P5-A730, P5-A731, P5-A733, P5-A735, P5-A736, P5-A737, P5-A77W, P5-A77X, P5-A780, P5-A781, QH-A65R, QH-A65S, QH-A65V, QH-A65X, QH-A65Z, QH-A6CS, QH-A6CU, QH-A6CV, QH-A6CW, QH-A6CX, QH-A6CY, QH-A6CZ, QH-A6X3, QH-A6X4, QH-A6X5, QH-A6X8, QH-A6X9, QH-A6XA, QH-A6XC, R8-A6MK, R8-A6ML, R8-A6MO, R8-A6YH, R8-A73M, S9-A6TS, S9-A6TU, S9-A6TV, S9-A6TW, S9-A6TX, S9-A6TY, S9-A6TZ, S9-A6U0, S9-A6U1, S9-A6U2, S9-A6U5, S9-A6U6, S9-A6U8, S9-A6U9, S9-A6UA, S9-A6UB, S9-A6WD, S9-A6WE, S9-A6WG, S9-A6WH, S9-A6WI, S9-A6WL, S9-A6WM, S9-A6WN, S9-A6WO, S9-A6WP, S9-A6WQ, S9-A7IQ, S9-A7IS, S9-A7IX, S9-A7IY, S9-A7IZ, S9-A7J0, S9-A7J1, S9-A7J2, S9-A7J3, S9-A7QW, S9-A7QX, S9-A7QY, S9-A7QZ, S9-A7R1, S9-A7R2, S9-A7R3, S9-A7R4, S9-A7R7, S9-A7R8, TM-A7C3, 
TM-A7C4, TM-A7C5, TM-A7CA, TM-A7CF, TQ-A7RF, TQ-A7RG, TQ-A7RH, TQ-A7RI, TQ-A7RJ, TQ-A7RK, TQ-A7RM, TQ-A7RN, TQ-A7RO, TQ-A7RP, TQ-A7RQ, TQ-A7RR, TQ-A7RS, TQ-A7RU, TQ-A7RV, TQ-A7RW, VW-A7QS.
Breast Cancer (1182 samples):
Source N1 = [53]. Sample IDs are of the form CGP_specimen_*, where * is:
1096043, 1142475, 1142532, 1142534, 1192095, 1192097, 1192099, 1192101, 1192103, 1192105, 1192107, 1192111, 1192113, 1192115, 1192117, 1192119, 1192121, 1192123, 1192125, 1192127, 1192129, 1192131, 1192133, 1192135, 1192137, 1195364, 1195366, 1195368, 1212804, 1212810, 1212816, 1212822, 1212825, 1212828, 1215490, 1215532, 1215535, 1215553, 1215559, 1215561, 1215563, 1215565, 1215567, 1215573, 1223855, 1223858, 1223861, 1227889, 1227916, 1227918, 1227920, 1227922, 1227924, 1227926, 1227928, 1227951, 1227953, 1227955, 1227957, 1227959, 1227961, 1227963, 1227965, 1227969, 1227971, 1241537, 1241539, 1241541, 1241543, 1241545, 1241547, 1241549, 1241551, 1241553, 1241555, 1241557, 1241559, 1241562, 1241565, 1241568, 1241571, 1241574, 1241579, 1241581, 1261287, 1261291, 1261293, 1261295, 1261297, 1261299, 1261301, 1261303, 1261305, 1261307, 1261309, 1261311, 1261313, 1261337, 1261382, 1261391, 1266549, 1266551, 1266553, 1266561, 1266563, 1266565, 1266567, 1343241, 1343244, 1343247, 1380057, 1380059, 1380061, 1380063, 1380065, 1380067.
Source S1 = [54]. Sample IDs are of the form PD*a, where * is:
4842, 4843, 4844, 4934, 4935, 4936, 4937, 4938, 4939, 5961, 7206, 7211, 7316, 9193.
Source S2 = [55]. Sample IDs are of the form SA*, where * is:
018, 029, 030, 031, 051, 052, 053, 054, 055, 063, 065, 067, 068, 069, 071, 072, 073, 074, 075, 076, 077, 080, 083, 084, 085, 089, 090, 092, 093, 094, 096, 097, 098, 101, 102, 103, 106, 208, 210, 212, 213, 214, 215, 216, 217, 218, 219, 220, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 233, 234, 235, 236, 237.
Source T5 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
A1-A0SB, A1-A0SD, A1-A0SE, A1-A0SF, A1-A0SG, A1-A0SH, A1-A0SI, A1-A0SJ, A1-A0SK, A1-A0SM, A1-A0SN, A1-A0SO, A1-A0SP, A1-A0SQ, A2-A04N, A2-A04P, A2-A04Q, A2-A04R, A2-A04T, A2-A04U, A2-A04V, A2-A04W, A2-A04X, A2-A04Y, A2-A0CK, A2-A0CL, A2-A0CM, A2-A0CO, A2-A0CP, A2-A0CQ, A2-A0CR, A2-A0CS, A2-A0CT, A2-A0CU, A2-A0CV, A2-A0CW, A2-A0CX, A2-A0CZ, A2-A0D0, A2-A0D1, A2-A0D2, A2-A0D3, A2-A0D4, A2-A0EM, A2-A0EN, A2-A0EO, A2-A0EP, A2-A0EQ, A2-A0ER, A2-A0ES, A2-A0ET, A2-A0EU, A2-A0EV, A2-A0EW, A2-A0EX, A2-A0EY, A2-A0ST, A2-A0SU, A2-A0SV, A2-A0SW, A2-A0SX, A2-A0SY, A2-A0T0, A2-A0T1, A2-A0T2, A2-A0T3, A2-A0T4, A2-A0T5, A2-A0T6, A2-A0T7, A2-A0YC, A2-A0YD, A2-A0YE, A2-A0YF, A2-A0YG, A2-A0YH, A2-A0YI, A2-A0YJ, A2-A0YK, A2-A0YL, A2-A0YM, A2-A0YT, A2-A1FV, A2-A1FW, A2-A1FX, A2-A1FZ, A2-A1G0, A2-A1G1, A2-A1G4, A2-A1G6, A2-A259, A2-A25A, A2-A25B, A2-A25C, A2-A25D, A2-A25E, A2-A25F, A2-A3KC, A2-A3KD, A2-A3XS, A2-A3XT, A2-A3XU, A2-A3XV, A2-A3XW, A2-A3XX, A2-A3XY, A2-A3XZ, A2-A3Y0, A2-A4RW, A2-A4RX, A2-A4RY, A2-A4S0, A2-A4S1, A2-A4S2, A2-A4S3, A7-A0CD, A7-A0CE, A7-A0CG, A7-A0CH, A7-A0CJ, A7-A0D9, A7-A0DA, A7-A0DB, A7-A0DC, A7-A13D, A7-A13E, A7-A13F, A7-A13G, A7-A13H, A7-A26E, A7-A26F, A7-A26G, A7-A26H, A7-A26I, A7-A26J, A7-A2KD, A7-A3IY, A7-A3IZ, A7-A3J0, A7-A3J1, A7-A3RF, A7-A425, A7-A426, A7-A4SA, A7-A4SB, A7-A4SC, A7-A4SD, A7-A4SE, A7-A4SF, A7-A56D, A7-A5ZV, A7-A5ZW, A7-A5ZX, A8-A06N, A8-A06O, A8-A06P, A8-A06Q, A8-A06R, A8-A06T, A8-A06U, A8-A06X, A8-A06Y, A8-A06Z, A8-A075, A8-A076, A8-A079, A8-A07B, A8-A07C, A8-A07E, A8-A07F, A8-A07G, A8-A07I, A8-A07J, A8-A07L, A8-A07O, A8-A07P, A8-A07R, A8-A07U, A8-A07W, A8-A07Z, A8-A081, A8-A082, A8-A083, A8-A084, A8-A085, A8-A086, A8-A08A, A8-A08B, A8-A08F, A8-A08G, A8-A08H, A8-A08I, A8-A08J, A8-A08L, A8-A08O, A8-A08P, A8-A08R, A8-A08S, A8-A08T, A8-A08X, A8-A08Z, A8-A090, A8-A091, A8-A092, A8-A093, A8-A094, A8-A095, A8-A096, A8-A097, A8-A099, A8-A09A, A8-A09B, A8-A09C, A8-A09D, A8-A09E, A8-A09G, A8-A09I, A8-A09K, A8-A09M, A8-A09N, A8-A09Q, A8-A09R, 
A8-A09T, A8-A09V, A8-A09W, A8-A09X, A8-A09Z, A8-A0A1, A8-A0A2, A8-A0A4, A8-A0A6, A8-A0A7, A8-A0A9, A8-A0AB, A8-A0AD, AC-A23C, AC-A23E, AC-A23G, AC-A23H, AC-A2B8, AC-A2BK, AC-A2BM, AC-A2FB, AC-A2FE, AC-A2FF, AC-A2FG, AC-A2FK, AC-A2FM, AC-A2FO, AC-A2QH, AC-A2QI, AC-A2QJ, AC-A3BB, AC-A3EH, AC-A3HN, AC-A3OD, AC-A3QP, AC-A3TM, AC-A3TN, AC-A3W5, AC-A3W6, AC-A3W7, AC-A3YI, AC-A3YJ, AC-A5EH, AC-A5EI, AC-A5XS, AC-A5XU, AC-A62X, AC-A62Y, AN-A03X, AN-A03Y, AN-A041, AN-A046, AN-A049, AN-A04A, AN-A04C, AN-A04D, AN-A0AJ, AN-A0AK, AN-A0AL, AN-A0AM, AN-A0AR, AN-A0AS, AN-A0AT, AN-A0FD, AN-A0FF, AN-A0FJ, AN-A0FK, AN-A0FL, AN-A0FN, AN-A0FS, AN-A0FT, AN-A0FV, AN-A0FW, AN-A0FX, AN-A0FY, AN-A0FZ, AN-A0G0, AN-A0XL, AN-A0XN, AN-A0XO, AN-A0XP, AN-A0XR, AN-A0XS, AN-A0XT, AN-A0XU, AN-A0XV, AN-A0XW, AO-A03L, AO-A03M, AO-A03N, AO-A03O, AO-A03P, AO-A03R, AO-A03T, AO-A03U, AO-A03V, AO-A0J2, AO-A0J3, AO-A0J4, AO-A0J5, AO-A0J6, AO-A0J7, AO-A0J8, AO-A0J9, AO-A0JA, AO-A0JB, AO-A0JC, AO-A0JD, AO-A0JE, AO-A0JF, AO-A0JG, AO-A0JI, AO-A0JJ, AO-A0JL, AO-A0JM, AO-A124, AO-A125, AO-A126, AO-A128, AO-A129, AO-A12A, AO-A12B, AO-A12D, AO-A12E, AO-A12F, AO-A12G, AO-A12H, AO-A1KO, AO-A1KP, AO-A1KR, AO-A1KS, AO-A1KT, AQ-A04H, AQ-A04J, AQ-A04L, AQ-A0Y5, AQ-A1H2, AQ-A1H3, AQ-A54N, AQ-A54O, AR-A0TP, AR-A0TQ, AR-A0TR, AR-A0TS, AR-A0TT, AR-A0TU, AR-A0TV, AR-A0TW, AR-A0TX, AR-A0TY, AR-A0TZ, AR-A0U0, AR-A0U1, AR-A0U2, AR-A0U3, AR-A0U4, AR-A1AH, AR-A1AI, AR-A1AJ, AR-A1AK, AR-A1AL, AR-A1AM, AR-A1AN, AR-A1AO, AR-A1AP, AR-A1AQ, AR-A1AR, AR-A1AS, AR-A1AT, AR-A1AU, AR-A1AV, AR-A1AW, AR-A1AX, AR-A1AY, AR-A24H, AR-A24K, AR-A24L, AR-A24M, AR-A24N, AR-A24O, AR-A24P, AR-A24Q, AR-A24R, AR-A24S, AR-A24T, AR-A24U, AR-A24V, AR-A24W, AR-A24X, AR-A24Z, AR-A250, AR-A251, AR-A252, AR-A254, AR-A255, AR-A256, AR-A2LE, AR-A2LH, AR-A2LJ, AR-A2LK, AR-A2LL, AR-A2LM, AR-A2LN, AR-A2LO, AR-A2LQ, AR-A2LR, AR-A5QM, AR-A5QN, AR-A5QP, AR-A5QQ, B6-A0I1, B6-A0I2, B6-A0I5, B6-A0I6, B6-A0I8, B6-A0I9, B6-A0IA, B6-A0IB, B6-A0IC, B6-A0IE, B6-A0IG, B6-A0IH, 
B6-A0IJ, B6-A0IK, B6-A0IM, B6-A0IN, B6-A0IO, B6-A0IP, B6-A0IQ, B6-A0RE, B6-A0RG, B6-A0RH, B6-A0RI, B6-A0RL, B6-A0RM, B6-A0RN, B6-A0RO, B6-A0RP, B6-A0RQ, B6-A0RS, B6-A0RT, B6-A0RU, B6-A0RV, B6-A0WS, B6-A0WT, B6-A0WV, B6-A0WW, B6-A0WX, B6-A0WY, B6-A0WZ, B6-A0X0, B6-A0X1, B6-A0X4, B6-A0X5, B6-A0X7, B6-A1KC, B6-A1KF, B6-A1KI, B6-A1KN, B6-A2IU, B6-A3ZX, B6-A400, B6-A401, B6-A402, B6-A408, B6-A409, B6-A40B, B6-A40C, BH-A0AU, BH-A0AV, BH-A0AW, BH-A0AY, BH-A0AZ, BH-A0B0, BH-A0B1, BH-A0B3, BH-A0B4, BH-A0B5, BH-A0B6, BH-A0B7, BH-A0B8, BH-A0B9, BH-A0BA, BH-A0BC, BH-A0BD, BH-A0BF, BH-A0BG, BH-A0BJ, BH-A0BL, BH-A0BM, BH-A0BO, BH-A0BP, BH-A0BQ, BH-A0BR, BH-A0BS, BH-A0BT, BH-A0BV, BH-A0BW, BH-A0BZ, BH-A0C0, BH-A0C1, BH-A0C3, BH-A0C7, BH-A0DD, BH-A0DE, BH-A0DG, BH-A0DH, BH-A0DI, BH-A0DK, BH-A0DL, BH-A0DO, BH-A0DP, BH-A0DQ, BH-A0DS, BH-A0DT, BH-A0DV, BH-A0DX, BH-A0DZ, BH-A0E0, BH-A0E1, BH-A0E2, BH-A0E6, BH-A0E7, BH-A0E9, BH-A0EA, BH-A0EB, BH-A0EE, BH-A0EI, BH-A0GY, BH-A0GZ, BH-A0H0, BH-A0H3, BH-A0H5, BH-A0H6, BH-A0H7, BH-A0H9, BH-A0HA, BH-A0HB, BH-A0HF, BH-A0HI, BH-A0HK, BH-A0HL, BH-A0HN, BH-A0HO, BH-A0HP, BH-A0HQ, BH-A0HU, BH-A0HW, BH-A0HX, BH-A0HY, BH-A0RX, BH-A0W3, BH-A0W4, BH-A0W5, BH-A0W7, BH-A0WA, BH-A18F, BH-A18G, BH-A18H, BH-A18I, BH-A18J, BH-A18K, BH-A18L, BH-A18M, BH-A18N, BH-A18P, BH-A18Q, BH-A18R, BH-A18S, BH-A18T, BH-A18U, BH-A18V, BH-A1EN, BH-A1EO, BH-A1ES, BH-A1ET, BH-A1EU, BH-A1EV, BH-A1EW, BH-A1EX, BH-A1EY, BH-A1F0, BH-A1F2, BH-A1F5, BH-A1F6, BH-A1F8, BH-A1FC, BH-A1FD, BH-A1FE, BH-A1FG, BH-A1FH, BH-A1FJ, BH-A1FL, BH-A1FM, BH-A1FN, BH-A1FR, BH-A1FU, BH-A201, BH-A202, BH-A203, BH-A204, BH-A208, BH-A209, BH-A28O, BH-A28Q, BH-A2L8, BH-A42T, BH-A42U, BH-A42V, BH-A5IZ, BH-A5J0, C8-A12K, C8-A12L, C8-A12M, C8-A12N, C8-A12O, C8-A12P, C8-A12Q, C8-A12T, C8-A12U, C8-A12V, C8-A12W, C8-A12X, C8-A12Y, C8-A12Z, C8-A130, C8-A131, C8-A132, C8-A133, C8-A134, C8-A135, C8-A137, C8-A138, C8-A1HE, C8-A1HF, C8-A1HG, C8-A1HI, C8-A1HJ, C8-A1HK, C8-A1HL, C8-A1HM, C8-A1HN, C8-A1HO, C8-A26V, 
C8-A26W, C8-A26X, C8-A26Y, C8-A26Z, C8-A273, C8-A274, C8-A275, C8-A278, C8-A27A, C8-A27B, C8-A3M7, C8-A3M8, D8-A13Y, D8-A13Z, D8-A140, D8-A141, D8-A142, D8-A143, D8-A145, D8-A146, D8-A147, D8-A1J8, D8-A1J9, D8-A1JA, D8-A1JB, D8-A1JC, D8-A1JD, D8-A1JE, D8-A1JF, D8-A1JG, D8-A1JH, D8-A1JI, D8-A1JJ, D8-A1JK, D8-A1JL, D8-A1JM, D8-A1JN, D8-A1JP, D8-A1JS, D8-A1JT, D8-A1JU, D8-A1X5, D8-A1X6, D8-A1X7, D8-A1X8, D8-A1X9, D8-A1XA, D8-A1XB, D8-A1XC, D8-A1XF, D8-A1XG, D8-A1XJ, D8-A1XK, D8-A1XL, D8-A1XM, D8-A1XO, D8-A1XQ, D8-A1XR, D8-A1XS, D8-A1XT, D8-A1XU, D8-A1XV, D8-A1XW, D8-A1XY, D8-A1XZ, D8-A1Y0, D8-A1Y1, D8-A1Y2, D8-A1Y3, D8-A27E, D8-A27F, D8-A27G, D8-A27H, D8-A27I, D8-A27K, D8-A27L, D8-A27M, D8-A27N, D8-A27P, D8-A27R, D8-A27T, D8-A27V, D8-A27W, D8-A3Z5, D8-A3Z6, D8-A4Z1, E2-A105, E2-A107, E2-A108, E2-A109, E2-A10A, E2-A10B, E2-A10C, E2-A10E, E2-A10F, E2-A14N, E2-A14O, E2-A14P, E2-A14Q, E2-A14R, E2-A14S, E2-A14T, E2-A14U, E2-A14V, E2-A14W, E2-A14X, E2-A14Y, E2-A14Z, E2-A150, E2-A152, E2-A153, E2-A154, E2-A155, E2-A156, E2-A158, E2-A159, E2-A15A, E2-A15C, E2-A15D, E2-A15E, E2-A15F, E2-A15G, E2-A15H, E2-A15I, E2-A15J, E2-A15K, E2-A15L, E2-A15M, E2-A15O, E2-A15P, E2-A15R, E2-A15S, E2-A15T, E2-A1AZ, E2-A1B0, E2-A1B1, E2-A1B4, E2-A1B5, E2-A1B6, E2-A1BC, E2-A1BD, E2-A1IE, E2-A1IF, E2-A1IG, E2-A1IH, E2-A1II, E2-A1IJ, E2-A1IK, E2-A1IL, E2-A1IN, E2-A1IO, E2-A1IU, E2-A1L6, E2-A1L7, E2-A1L8, E2-A1L9, E2-A1LA, E2-A1LB, E2-A1LE, E2-A1LG, E2-A1LH, E2-A1LI, E2-A1LK, E2-A1LL, E2-A1LS, E2-A2P5, E2-A2P6, E2-A3DX, E2-A56Z, E2-A570, E2-A573, E2-A574, E9-A1N3, E9-A1N4, E9-A1N5, E9-A1N8, E9-A1N9, E9-A1NA, E9-A1NC, E9-A1ND, E9-A1NE, E9-A1NF, E9-A1NG, E9-A1NH, E9-A1NI, E9-A1QZ, E9-A1R0, E9-A1R2, E9-A1R3, E9-A1R4, E9-A1R5, E9-A1R6, E9-A1R7, E9-A1RA, E9-A1RB, E9-A1RC, E9-A1RD, E9-A1RE, E9-A1RF, E9-A1RG, E9-A1RH, E9-A1RI, E9-A226, E9-A227, E9-A228, E9-A229, E9-A22A, E9-A22B, E9-A22D, E9-A22E, E9-A22G, E9-A22H, E9-A243, E9-A244, E9-A245, E9-A247, E9-A248, E9-A249, E9-A24A, E9-A295, E9-A2JS, E9-A2JT, 
E9-A3HO, E9-A3Q9, E9-A3QA, E9-A3X8, E9-A54X, E9-A54Y, E9-A5FK, E9-A5FL, E9-A5UO, E9-A5UP, EW-A1IW, EW-A1IX, EW-A1IY, EW-A1IZ, EW-A1J1, EW-A1J2, EW-A1J3, EW-A1J5, EW-A1J6, EW-A1OV, EW-A1OX, EW-A1OY, EW-A1OZ, EW-A1P0, EW-A1P1, EW-A1P3, EW-A1P4, EW-A1P5, EW-A1P6, EW-A1P7, EW-A1P8, EW-A1PA, EW-A1PB, EW-A1PC, EW-A1PD, EW-A1PE, EW-A1PG, EW-A1PH, EW-A2FR, EW-A2FS, EW-A2FV, EW-A2FW, EW-A3E8, EW-A3U0, EW-A423, GI-A2C8, GI-A2C9, GM-A2D9, GM-A2DA, GM-A2DB, GM-A2DC, GM-A2DD, GM-A2DF, GM-A2DH, GM-A2DI, GM-A2DK, GM-A2DL, GM-A2DM, GM-A2DN, GM-A2DO, GM-A3NW, GM-A3NY, GM-A3XG, GM-A3XL, GM-A3XN, GM-A4E0, GM-A5PV, GM-A5PX, HN-A2NL, HN-A2OB, JL-A3YW, JL-A3YX, LL-A440, LL-A441, LL-A50Y, LL-A5YL, LL-A5YM, LL-A5YN, LL-A5YO, LL-A5YP, LQ-A4E4, MS-A51U, OK-A5Q2, OL-A5D6, OL-A5D7, OL-A5D8, OL-A5DA, OL-A5RU, OL-A5RV, OL-A5RW, OL-A5RX, OL-A5RY, OL-A5RZ, OL-A5S0, OL-A66H, OL-A66I, OL-A66J, OL-A66K, PE-A5DC, PE-A5DD, PE-A5DE.
Cervical Cancer (197 samples):
Source T6 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
BI-A0VR, BI-A0VS, BI-A20A, C5-A0TN, C5-A1BE, C5-A1BF, C5-A1BI, C5-A1BJ, C5-A1BK, C5-A1BL, C5-A1BM, C5-A1BN, C5-A1BQ, C5-A1M5, C5-A1M6, C5-A1M7, C5-A1M8, C5-A1M9, C5-A1ME, C5-A1MF, C5-A1MH, C5-A1MI, C5-A1MJ, C5-A1MK, C5-A1ML, C5-A1MN, C5-A1MP, C5-A1MQ, C5-A2LS, C5-A2LT, C5-A2LV, C5-A2LX, C5-A2LY, C5-A2LZ, C5-A2M1, C5-A2M2, C5-A3HD, C5-A3HE, C5-A3HF, C5-A3HL, C5-A7CG, C5-A7CH, C5-A7CJ, C5-A7CK, C5-A7CL, C5-A7CM, C5-A7CO, C5-A7UC, C5-A7UE, C5-A7UH, C5-A7X3, DG-A2KH, DG-A2KJ, DG-A2KK, DG-A2KL, DG-A2KM, DR-A0ZL, DR-A0ZM, DS-A0VK, DS-A0VL, DS-A0VM, DS-A0VN, DS-A1OA, DS-A3LQ, DS-A5RQ, DS-A7WF, DS-A7WH, DS-A7WI, EA-A1QS, EA-A1QT, EA-A3HQ, EA-A3HR, EA-A3HT, EA-A3HU, EA-A3QD, EA-A3QE, EA-A3Y4, EA-A410, EA-A411, EA-A439, EA-A43B, EA-A44S, EA-A4BA, EA-A50E, EA-A556, EA-A5FO, EA-A5O9, EA-A5ZD, EA-A5ZE, EA-A5ZF, EA-A6QX, EA-A78R, EK-A2GZ, EK-A2H0, EK-A2H1, EK-A2IP, EK-A2PG, EK-A2PI, EK-A2PK, EK-A2PL, EK-A2PM, EK-A2R7, EK-A2R8, EK-A2R9, EK-A2RA, EK-A2RB, EK-A2RC, EK-A2RD, EK-A2RE, EK-A2RJ, EK-A2RK, EK-A2RL, EK-A2RM, EK-A2RN, EK-A2RO, EK-A3GJ, EK-A3GK, EK-A3GM, EK-A3GN, EX-A1H5, EX-A1H6, EX-A3L1, EX-A449, EX-A69L, EX-A69M, FU-A23K, FU-A23L, FU-A2QG, FU-A3EO, FU-A3HY, FU-A3HZ, FU-A3NI, FU-A3TQ, FU-A3TX, FU-A3WB, FU-A3YQ, FU-A40J, FU-A57G, FU-A5XV, FU-A770, HG-A2PA, HM-A3JJ, HM-A3JK, HM-A4S6, HM-A6W2, IR-A3L7, IR-A3LA, IR-A3LB, IR-A3LC, IR-A3LF, IR-A3LH, IR-A3LI, IR-A3LK, IR-A3LL, JW-A5VG, JW-A5VH, JW-A5VI, JW-A5VJ, JW-A5VK, JW-A5VL, JW-A69B, JW-A852, JX-A3PZ, JX-A3Q0, JX-A3Q8, JX-A5QV, LP-A4AU, LP-A4AV, LP-A4AW, LP-A4AX, LP-A5U2, LP-A5U3, LP-A7HU, MU-A51Y, MU-A5YI, MY-A5BD, MY-A5BE, MY-A5BF, Q1-A5R1, Q1-A5R2, Q1-A5R3, Q1-A6DT, Q1-A6DV, Q1-A6DW, Q1-A73O, Q1-A73P, Q1-A73Q, Q1-A73R, Q1-A73S, R2-A69V, RA-A741, UC-A7PD, UC-A7PF, WL-A834, DS-A1OB, DS-A1OC, DS-A1OD.
Cholangiocarcinoma (139 samples):
Source Z2 = [56]:
1, 10, 100, 101, 107, 108, 109, 110, 111, 112, 113, 115, 116, 118, 119, 120, 121, 122, 123, 125, 127, 128, 129, 13, 130, 131, 132, 133, 134, 135, 137, 139, 140, 141, 142, 143, 144, 145, 146, 147, 16, 17, 18, 19, 2, 20, 24, 25, 26, 28, 29, 3, 33, 34, 35, 39, 41, 42, 44, 46, 48, 5, 50, 51, 52, 53, 56, 58, 59, 6, 60, 61, 63, 64, 66, 67, 69, 7, 70, 71, 74, 79, 8, 8_1, 8_2, 8_4, 8_6, 80, 81, 82, 85, 86, 87, 88, 89, 9, 90, 91, 94, 95, 97, 98, 99.
Source T7 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
3X-AAV9, 3X-AAVA, 3X-AAVB, 3X-AAVC, 3X-AAVE, 4G-AAZO, 4G-AAZT, W5-AA2G, W5-AA2H, W5-AA2I, W5-AA2O, W5-AA2Q, W5-AA2R, W5-AA2T, W5-AA2U, W5-AA2W, W5-AA2X, W5-AA2Z, W5-AA30, W5-AA31, W5-AA33, W5-AA34, W5-AA36, W5-AA38, W5-AA39, W6-AA0S, WD-A7RX, YR-A95A, ZD-A8I3, ZH-A8Y1, ZH-A8Y2, ZH-A8Y4, ZH-A8Y5, ZH-A8Y6, ZH-A8Y8, ZU-A8S4.
Chronic Lymphocytic Leukemia (80 samples):
Source Q1 = [57]:
170, 171, 172, 173, 174, 175, 18, 181, 182, 184, 185, 186, 188, 189, 19, 191, 193, 194, 195, 197, 20, 22, 23, 264, 266, 267, 27, 270, 272, 273, 274, 275, 276, 278, 279, 280, 29, 290, 30, 319, 32, 321, 322, 323, 324, 325, 326, 328, 33, 375, 39, 40, 41, 42, 43, 44, 45, 48, 49, 5, 51, 52, 53, 54, 6, 618, 63, 64, 642, 680, 7, 758, 761, 785, 8, 82, 83, 9, 90, 91.
Colorectal Cancer (581 samples):
Source S3 = [58]:
587220, 587222, 587224, 587226, 587228, 587230, 587232, 587234, 587238, 587242, 587246, 587254, 587256, 587260, 587262, 587264, 587268, 587270, 587276, 587278, 587282, 587284, 587286, 587288, 587290, 587292, 587294, 587298, 587300, 587302, 587304, 587306, 587316, 587318, 587322, 587328, 587330, 587332, 587334, 587336, 587338, 587340, 587342, 587344, 587346, 587348, 587350, 587352, 587354, 587356, 587358, 587360, 587362, 587364, 587368, 587370, 587372, 587374, 587376, 587378, 587380, 587382, 587384, 587386, 587388, 587390, 587392, 587394, 587398, 587400.
Source T8 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
A6-2670, A6-2671, A6-2672, A6-2674, A6-2675, A6-2676, A6-2677, A6-2678, A6-2683, A6-3807, A6-3808, A6-3809, A6-3810, A6-4105, A6-5656, A6-5657, A6-5659, A6-5660, A6-5661, A6-5662, A6-5664, A6-5665, A6-5666, A6-5667, A6-6137, A6-6138, A6-6140, A6-6141, A6-6142, A6-6648, A6-6649, A6-6650, A6-6651, A6-6652, A6-6653, A6-6654, A6-6780, A6-6781, A6-6782, AA-3489, AA-3492, AA-3496, AA-3502, AA-3510, AA-3511, AA-3514, AA-3516, AA-3517, AA-3518, AA-3519, AA-3520, AA-3521, AA-3522, AA-3524, AA-3525, AA-3526, AA-3527, AA-3529, AA-3531, AA-3532, AA-3534, AA-3538, AA-3542, AA-3543, AA-3544, AA-3548, AA-3549, AA-3552, AA-3553, AA-3554, AA-3555, AA-3556, AA-3558, AA-3560, AA-3561, AA-3562, AA-3655, AA-3660, AA-3662, AA-3663, AA-3664, AA-3666, AA-3667, AA-3672, AA-3673, AA-3678, AA-3679, AA-3680, AA-3681, AA-3684, AA-3685, AA-3688, AA-3692, AA-3693, AA-3695, AA-3696, AA-3697, AA-3710, AA-3712, AA-3713, AA-3715, AA-3811, AA-3812, AA-3814, AA-3815, AA-3818, AA-3819, AA-3821, AA-3831, AA-3833, AA-3837, AA-3842, AA-3844, AA-3845, AA-3846, AA-3848, AA-3850, AA-3851, AA-3852, AA-3854, AA-3855, AA-3856, AA-3858, AA-3860, AA-3861, AA-3864, AA-3866, AA-3867, AA-3869, AA-3870, AA-3872, AA-3875, AA-3877, AA-3930, AA-3939, AA-3941, AA-3947, AA-3949, AA-3950, AA-3952, AA-3955, AA-3956, AA-3966, AA-3968, AA-3971, AA-3972, AA-3973, AA-3975, AA-3977, AA-3979, AA-3980, AA-3982, AA-3984, AA-3986, AA-3989, AA-3994, AA-A004, AA-A00A, AA-A00D, AA-A00E, AA-A00F, AA-A00J, AA-A00K, AA-A00L, AA-A00N, AA-A00O, AA-A00Q, AA-A00R, AA-A00U, AA-A00W, AA-A00Z, AA-A010, AA-A017, AA-A01D, AA-A01F, AA-A01G, AA-A01I, AA-A01K, AA-A01P, AA-A01Q, AA-A01R, AA-A01S, AA-A01T, AA-A01V, AA-A01X, AA-A01Z, AA-A022, AA-A024, AA-A029, AA-A02F, AA-A02H, AA-A02J, AA-A02K, AA-A02O, AA-A02W, AA-A02Y, AA-A03F, AA-A03J, AD-5900, AD-6548, AD-6888, AD-6889, AD-6890, AD-6895, AD-6899, AD-6901, AD-6963, AD-6964, AD-6965, AF-2687, AF-2689, AF-2691, AF-2692, AF-2693, AF-3400, AF-3913, AF-4110, AF-5654, AF-6136, AF-6655, AF-6672, AG-3574, 
AG-3575, AG-3578, AG-3580, AG-3581, AG-3582, AG-3583, AG-3584, AG-3586, AG-3587, AG-3593, AG-3594, AG-3598, AG-3599, AG-3600, AG-3601, AG-3602, AG-3605, AG-3608, AG-3609, AG-3611, AG-3612, AG-3726, AG-3727, AG-3731, AG-3732, AG-3742, AG-3878, AG-3881, AG-3882, AG-3883, AG-3885, AG-3887, AG-3890, AG-3892, AG-3893, AG-3894, AG-3896, AG-3898, AG-3901, AG-3902, AG-3909, AG-3999, AG-4001, AG-4005, AG-4007, AG-4008, AG-4015, AG-A002, AG-A008, AG-A00C, AG-A00H, AG-A00Y, AG-A011, AG-A014, AG-A015, AG-A016, AG-A01L, AG-A01W, AG-A01Y, AG-A020, AG-A025, AG-A026, AG-A02G, AG-A02N, AG-A02X, AG-A032, AG-A036, AH-6544, AH-6547, AH-6549, AH-6643, AH-6644, AM-5820, AM-5821, AU-3779, AU-6004, AY-4070, AY-4071, AY-5543, AY-6196, AY-6197, AY-6386, AZ-4315, AZ-4323, AZ-4615, AZ-4616, AZ-4681, AZ-4682, AZ-5403, AZ-5407, AZ-6598, AZ-6599, AZ-6600, AZ-6601, AZ-6603, AZ-6605, AZ-6606, AZ-6607, AZ-6608, CA-5254, CA-5255, CA-5796, CA-5797, CA-6715, CA-6716, CA-6717, CA-6718, CA-6719, CI-6619, CI-6620, CI-6621, CI-6622, CI-6624, CK-4947, CK-4948, CK-4950, CK-4952, CK-5912, CK-5913, CK-5914, CK-5915, CK-5916, CK-6746, CK-6747, CK-6748, CK-6751, CL-5917, CL-5918, CM-4743, CM-4744, CM-4746, CM-4747, CM-4748, CM-4750, CM-4752, CM-5341, CM-5344, CM-5348, CM-5349, CM-5860, CM-5861, CM-5862, CM-5863, CM-5864, CM-5868, CM-6161, CM-6162, CM-6163, CM-6164, CM-6165, CM-6166, CM-6167, CM-6168, CM-6169, CM-6170, CM-6171, CM-6172, CM-6674, CM-6675, CM-6676, CM-6677, CM-6678, CM-6679, CM-6680, D5-5537, D5-5538, D5-5539, D5-5540, D5-5541, D5-6529, D5-6531, D5-6532, D5-6533, D5-6534, D5-6535, D5-6536, D5-6537, D5-6538, D5-6539, D5-6540, D5-6541, D5-6898, D5-6920, D5-6922, D5-6923, D5-6924, D5-6926, D5-6927, D5-6928, D5-6929, D5-6930, D5-6931, D5-6932, D5-7000, DC-5337, DC-5869, DC-6155, DC-6157, DC-6158, DC-6160, DC-6681, DC-6682, DC-6683, DM-A0X9, DM-A0XD, DM-A0XF, DM-A1D0, DM-A1D4, DM-A1D6, DM-A1D7, DM-A1D8, DM-A1D9, DM-A1DA, DM-A1DB, DM-A1HA, DM-A1HB, DM-A282, DM-A285, DM-A28C, DM-A28E, DM-A28F, DM-A28G, 
DM-A28H, DM-A28K, DM-A28M, DT-5265, DY-A0XA, DY-A1DC, DY-A1DD, DY-A1DF, DY-A1DG, DY-A1H8, EF-5830, EI-6506, EI-6507, EI-6508, EI-6510, F4-6459, F4-6460, F4-6461, F4-6463, F4-6569, F4-6570, F4-6703, F4-6704, F4-6805, F4-6806, F4-6807, F4-6808, F4-6809, F4-6854, F4-6855, F4-6856, F4-6857, F5-6464, F5-6465, F5-6571, F5-6702, F5-6811, F5-6812, F5-6813, G4-6293, G4-6294, G4-6295, G4-6297, G4-6298, G4-6299, G4-6302, G4-6303, G4-6304, G4-6306, G4-6307, G4-6309, G4-6310, G4-6311, G4-6314, G4-6315, G4-6317, G4-6320, G4-6321, G4-6322, G4-6323, G4-6586, G4-6588, G4-6625, G4-6626, G4-6628, G5-6235, G5-6641.
Esophageal Cancer (329 samples):
Source D2 = [59]. Sample IDs are of the form ESO-*-Tumor, where * is:
0001, 0009, 0013, 0015, 0019, 0023, 0025, 0029, 003, 005, 0053, 0059, 0061, 0067, 007, 0071, 0079, 0103, 0115, 0123, 0125, 0129, 0133, 0149, 0167, 017, 0176, 021, 0255, 027, 0280, 037, 043, 045, 0459, 049, 051, 0590, 075, 077, 083, 085, 0950, 105, 1059, 1060, 107, 1096, 111, 1130, 1133, 114, 1145, 1154, 116, 1163, 117, 118, 119, 120, 122, 130, 131, 135, 137, 139, 141, 1427, 143, 147, 1481, 1488, 151, 152, 153, 155, 157, 159, 1594, 160, 1608, 161, 164, 165, 167, 1670, 169, 171, 173, 1733, 1748, 175, 177, 179, 184, 185, 187, 1872, 189, 191, 2143, 224, 2472, 250, 251, 2536, 327, 408, 409, 512, 536, 539, 555, 580, 582, 601, 610, 632, 640, 669, 682, 683, 708, 718, 720, 721, 732, 752, 805, 837, 838, 859, 864, 866, 874, 887, 913, 916, 931, 963, D76, H01, H63, K08, R61, S41.
Source T9 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
2H-A9GF, 2H-A9GH, 2H-A9GI, 2H-A9GJ, 2H-A9GK, 2H-A9GL, 2H-A9GM, 2H-A9GN, 2H-A9GO, 2H-A9GQ, 2H-A9GR, IC-A6RE, IC-A6RF, IG-A3I8, IG-A3QL, IG-A3Y9, IG-A3YA, IG-A3YB, IG-A3YC, IG-A4P3, IG-A4QS, IG-A4QT, IG-A50L, IG-A51D, IG-A5B8, IG-A5S3, IG-A625, IG-A6QS, IG-A7DP, IG-A8O2, IG-A97H, IG-A97I, JY-A6F8, JY-A6FA, JY-A6FB, JY-A6FD, JY-A6FE, JY-A6FG, JY-A6FH, JY-A938, JY-A939, JY-A93C, JY-A93D, JY-A93E, JY-A93F, KH-A6WC, L5-A43C, L5-A43E, L5-A43H, L5-A43I, L5-A43J, L5-A43M, L5-A4OE, L5-A4OF, L5-A4OG, L5-A4OH, L5-A4OI, L5-A4OJ, L5-A4OM, L5-A4ON, L5-A4OO, L5-A4OP, L5-A4OQ, L5-A4OR, L5-A4OS, L5-A4OT, L5-A4OU, L5-A4OW, L5-A4OX, L5-A88S, L5-A88T, L5-A88V, L5-A88W, L5-A88Y, L5-A88Z, L5-A891, L5-A893, L5-A8NE, L5-A8NF, L5-A8NG, L5-A8NH, L5-A8NI, L5-A8NJ, L5-A8NK, L5-A8NL, L5-A8NM, L5-A8NN, L5-A8NQ, L5-A8NR, L5-A8NS, L5-A8NT, L5-A8NU, L5-A8NV, L5-A8NW, L7-A56G, L7-A6VZ, LN-A49K, LN-A49L, LN-A49M, LN-A49N, LN-A49O, LN-A49P, LN-A49R, LN-A49S, LN-A49U, LN-A49V, LN-A49W, LN-A49X, LN-A49Y, LN-A4A1, LN-A4A2, LN-A4A3, LN-A4A4, LN-A4A5, LN-A4A6, LN-A4A8, LN-A4A9, LN-A4MQ, LN-A4MR, LN-A5U5, LN-A5U6, LN-A5U7, LN-A7HV, LN-A7HW, LN-A7HX, LN-A7HY, LN-A7HZ, LN-A8HZ, LN-A8I0, LN-A8I1, LN-A9FO, LN-A9FP, LN-A9FQ, LN-A9FR, M9-A5M8, Q9-A6FU, Q9-A6FW, R6-A6DN, R6-A6DQ, R6-A6KZ, R6-A6L4, R6-A6L6, R6-A6XG, R6-A6XQ, R6-A6Y0, R6-A6Y2, R6-A8W5, R6-A8W8, R6-A8WC, R6-A8WG, RE-A7BO, S8-A6BV, S8-A6BW, V5-A7RB, V5-A7RC, V5-A7RE, V5-AASV, V5-AASW, V5-AASX, VR-A8EO, VR-A8EP, VR-A8EQ, VR-A8ER, VR-A8ET, VR-A8EU, VR-A8EW, VR-A8EX, VR-A8EY, VR-A8EZ, VR-A8Q7, VR-AA4D, VR-AA4G, VR-AA7B, VR-AA7D, X8-AAAR, XP-A8T6, XP-A8T7, XP-A8T8, Z6-A8JD, Z6-A8JE, Z6-A9VB, Z6-AAPN, ZR-A9CJ.
Gastric Cancer (401 samples):
Source Z3 = [60]:
2000362, 31231321, 76629543, 970010, 98748381, 990089, 990097, 990172, 990300, 990396, 990475, 990515, TGH, TWH.
Source W1 = [61]. Sample IDs are of the form pfg*T, where * is:
001, 002, 003, 005, 006, 007, 008, 009, 010, 011, 014, 015, 016, 017, 018, 019, 020, 021, 022, 024, 025, 029.
Source T10 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
B7-5816, B7-5818, B7-A5TI, B7-A5TJ, B7-A5TK, B7-A5TN, BR-4183, BR-4184, BR-4186, BR-4187, BR-4188, BR-4190, BR-4191, BR-4194, BR-4195, BR-4197, BR-4199, BR-4200, BR-4201, BR-4205, BR-4253, BR-4255, BR-4256, BR-4257, BR-4259, BR-4261, BR-4263, BR-4264, BR-4265, BR-4267, BR-4271, BR-4273, BR-4276, BR-4277, BR-4278, BR-4279, BR-4280, BR-4281, BR-4283, BR-4284, BR-4286, BR-4288, BR-4291, BR-4292, BR-4294, BR-4298, BR-4357, BR-4361, BR-4362, BR-4363, BR-4366, BR-4368, BR-4369, BR-4370, BR-4371, BR-4375, BR-4376, BR-6452, BR-6453, BR-6454, BR-6455, BR-6456, BR-6457, BR-6458, BR-6563, BR-6564, BR-6565, BR-6566, BR-6705, BR-6706, BR-6707, BR-6709, BR-6801, BR-6802, BR-6803, BR-6852, BR-7196, BR-7197, BR-7703, BR-7704, BR-7707, BR-7715, BR-7716, BR-7717, BR-7722, BR-7723, BR-7851, BR-7901, BR-7957, BR-7958, BR-7959, BR-8058, BR-8059, BR-8060, BR-8077, BR-8078, BR-8080, BR-8081, BR-8284, BR-8285, BR-8286, BR-8289, BR-8291, BR-8295, BR-8296, BR-8297, BR-8360, BR-8361, BR-8363, BR-8364, BR-8365, BR-8366, BR-8367, BR-8368, BR-8369, BR-8370, BR-8371, BR-8372, BR-8373, BR-8380, BR-8381, BR-8382, BR-8384, BR-8483, BR-8484, BR-8485, BR-8486, BR-8487, BR-8588, BR-8589, BR-8590, BR-8591, BR-8592, BR-8676, BR-8677, BR-8678, BR-8679, BR-8680, BR-8682, BR-8683, BR-8686, BR-8687, BR-8690, BR-A44T, BR-A44U, BR-A452, BR-A453, BR-A4CQ, BR-A4CR, BR-A4CS, BR-A4IU, BR-A4IV, BR-A4IY, BR-A4IZ, BR-A4J1, BR-A4J2, BR-A4J4, BR-A4J5, BR-A4J6, BR-A4J7, BR-A4J8, BR-A4PD, BR-A4PE, BR-A4PF, BR-A4QI, BR-A4QL, BR-A4QM, CD-5798, CD-5799, CD-5800, CD-5801, CD-5802, CD-5803, CD-5804, CD-5813, CD-8524, CD-8525, CD-8526, CD-8527, CD-8528, CD-8529, CD-8530, CD-8531, CD-8532, CD-8533, CD-8534, CD-8535, CD-8536, CD-A486, CD-A487, CD-A489, CD-A48A, CD-A48C, CD-A4MG, CD-A4MH, CD-A4MI, CD-A4MJ, CG-4300, CG-4301, CG-4304, CG-4305, CG-4306, CG-4436, CG-4437, CG-4438, CG-4440, CG-4441, CG-4442, CG-4443, CG-4444, CG-4449, CG-4455, CG-4460, CG-4462, CG-4465, CG-4466, CG-4469, CG-4474, CG-4475, CG-4476, CG-4477, CG-5716, 
CG-5717, CG-5718, CG-5719, CG-5720, CG-5721, CG-5722, CG-5723, CG-5724, CG-5725, CG-5726, CG-5727, CG-5728, CG-5730, CG-5732, CG-5733, CG-5734, D7-5577, D7-5578, D7-5579, D7-6518, D7-6519, D7-6520, D7-6521, D7-6522, D7-6524, D7-6525, D7-6526, D7-6527, D7-6528, D7-6815, D7-6817, D7-6818, D7-6820, D7-6822, D7-8570, D7-8572, D7-8573, D7-8574, D7-8575, D7-8576, D7-8578, D7-8579, D7-A4YT, D7-A4YU, D7-A4YV, D7-A4YX, D7-A4YY, D7-A4Z0, D7-A6ET, D7-A6EV, D7-A6EX, D7-A6EY, D7-A6EZ, D7-A6F0, D7-A6F2, D7-A747, D7-A748, D7-A74A, D7-A74B, EQ-5647, EQ-8122, EQ-A4SO, F1-6177, F1-6874, F1-6875, F1-A448, F1-A72C, FP-7735, FP-7829, FP-7916, FP-7998, FP-8099, FP-8209, FP-8210, FP-8211, FP-8631, FP-A4BE, FP-A4BF, HF-7131, HF-7132, HF-7133, HF-7134, HF-7136, HF-A5NB, HJ-7597, HU-8238, HU-8243, HU-8244, HU-8245, HU-8249, HU-8602, HU-8604, HU-8608, HU-8610, HU-A4G2, HU-A4G3, HU-A4G6, HU-A4G8, HU-A4G9, HU-A4GC, HU-A4GD, HU-A4GF, HU-A4GH, HU-A4GJ, HU-A4GN, HU-A4GP, HU-A4GQ, HU-A4GT, HU-A4GU, HU-A4GX, HU-A4GY, HU-A4H0, HU-A4H2, HU-A4H3, HU-A4H4, HU-A4H5, HU-A4H6, HU-A4H8, HU-A4HB, HU-A4HD, IN-7806, IN-7808, IN-8462, IN-8663, IN-A6RI, IN-A6RJ, IN-A6RL, IN-A6RN, IN-A6RO, IN-A6RP, IN-A6RR, IP-7968, KB-A6F5, KB-A6F7, MX-A5UG, MX-A5UJ, MX-A663, MX-A666, R5-A7O7, RD-A7BS, RD-A7BT, RD-A7BW, RD-A7C1.
Glioblastoma Multiforme (359 samples):
Source P2 = [62]. Sample IDs are of the form Br*, where * is:
001X, 018X, 019X, 02X, 03X, 04X, 05X, 06X, 07X, 08X, 102X, 103X, 104X, 10P, 112X, 116X, 117X, 118X, 11P, 128X, 12P, 132X, 133X, 136X, 13X, 143X, 148X, 14X, 15X, 16X, 17X, 20P, 21PT, 229T, 230T, 237T, 238T, 23X, 247T, 248T, 25X, 26X, 27P, 29P, 301T, 302T, 303T, 306T, 401X, 9PT.
Source T11 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
02-0003, 02-0007, 02-0010, 02-0014, 02-0015, 02-0021, 02-0028, 02-0033, 02-0043, 02-0047, 02-0055, 02-0083, 02-0089, 02-0099, 02-0107, 02-0114, 02-0115, 02-2470, 02-2483, 02-2485, 02-2486, 06-0119, 06-0122, 06-0124, 06-0125, 06-0126, 06-0128, 06-0129, 06-0130, 06-0132, 06-0137, 06-0138, 06-0139, 06-0140, 06-0141, 06-0142, 06-0143, 06-0145, 06-0147, 06-0148, 06-0151, 06-0152, 06-0154, 06-0155, 06-0157, 06-0158, 06-0165, 06-0166, 06-0167, 06-0168, 06-0169, 06-0171, 06-0173, 06-0174, 06-0176, 06-0178, 06-0184, 06-0185, 06-0188, 06-0189, 06-0190, 06-0192, 06-0195, 06-0201, 06-0209, 06-0210, 06-0211, 06-0213, 06-0214, 06-0216, 06-0219, 06-0221, 06-0237, 06-0238, 06-0240, 06-0241, 06-0644, 06-0645, 06-0646, 06-0648, 06-0649, 06-0650, 06-0686, 06-0743, 06-0744, 06-0745, 06-0747, 06-0749, 06-0750, 06-0875, 06-0876, 06-0877, 06-0878, 06-0879, 06-0881, 06-0882, 06-0939, 06-1804, 06-1806, 06-2557, 06-2558, 06-2559, 06-2561, 06-2562, 06-2563, 06-2564, 06-2565, 06-2567, 06-2569, 06-2570, 06-5408, 06-5410, 06-5411, 06-5412, 06-5413, 06-5414, 06-5415, 06-5417, 06-5418, 06-5856, 06-5858, 06-5859, 06-6388, 06-6389, 06-6390, 06-6391, 06-6693, 06-6694, 06-6695, 06-6697, 06-6698, 06-6699, 06-6700, 06-6701, 08-0244, 08-0345, 08-0352, 08-0353, 08-0360, 08-0373, 08-0375, 08-0385, 08-0386, 12-0615, 12-0616, 12-0618, 12-0619, 12-0688, 12-0692, 12-0821, 12-1597, 12-3649, 12-3650, 12-3652, 12-3653, 12-5295, 12-5299, 12-5301, 14-0740, 14-0781, 14-0786, 14-0787, 14-0789, 14-0790, 14-0813, 14-0817, 14-0862, 14-0871, 14-1034, 14-1043, 14-1395, 14-1450, 14-1456, 14-1823, 14-1825, 14-1829, 14-2554, 14-3476, 14-4157, 15-0742, 15-1444, 16-0846, 16-0861, 16-1045, 16-1048, 19-1390, 19-1790, 19-2619, 19-2620, 19-2623, 19-2624, 19-2625, 19-2629, 19-2631, 19-4068, 19-5953, 26-1439, 26-1442, 26-5132, 26-5133, 26-5134, 26-5135, 26-5136, 26-5139, 26-6173, 26-6174, 27-1830, 27-1831, 27-1832, 27-1833, 27-1834, 27-1835, 27-1836, 27-1837, 27-1838, 27-2518, 27-2519, 27-2521, 27-2523, 27-2524, 27-2526, 27-2527, 
27-2528, 28-1747, 28-1753, 28-2499, 28-2501, 28-2502, 28-2509, 28-2510, 28-2513, 28-2514, 28-5204, 28-5207, 28-5208, 28-5209, 28-5211, 28-5213, 28-5214, 28-5215, 28-5216, 28-5218, 28-5219, 28-5220, 28-6450, 32-1970, 32-1977, 32-1979, 32-1980, 32-1982, 32-1986, 32-1991, 32-2491, 32-2494, 32-2495, 32-2498, 32-2615, 32-2632, 32-2634, 32-2638, 32-4208, 32-4209, 32-4210, 32-4211, 32-4213, 32-4719, 32-5222, 41-2571, 41-2572, 41-2573, 41-2575, 41-3392, 41-3393, 41-3915, 41-4097, 41-5651, 41-6646, 74-6573, 74-6575, 74-6577, 74-6578, 74-6584, 76-4925, 76-4926, 76-4927, 76-4928, 76-4929, 76-4931, 76-4932, 76-4934, 76-4935, 76-6191, 76-6192, 76-6193, 76-6280, 76-6282, 76-6283, 76-6285, 76-6286, 76-6656, 76-6657, 76-6660, 76-6661, 76-6662, 76-6663, 76-6664, 81-5910, 81-5911, 87-5896.
Head and Neck Cancer (591 samples):
Source A1 = [63]:
139, 266, 325, 347, 388, 478, 91.
Source S4 = [64]:
HN12PT, HN22PT, HN27PT, HN32PT, HN33PT.
Remaining sample IDs are of the form HN_*-Tumor, where * is:
0-046, 0-064, 00076, 00122, 00190, 00313, 00338, 00361, 00378, 00443, 00466, 00761, 01000, 62237, 62298, 62318, 62338, 62374, 62415, 62417, 62421, 62426, 62469, 62481, 62493, 62505, 62506, 62515, 62532, 62539, 62601, 62602, 62624, 62646, 62652, 62671, 62672, 62686, 62699, 62739, 62740, 62741, 62755, 62756, 62807, 62814, 62825, 62832, 62854, 62857_2, 62860, 62861, 62863, 62897, 62921, 62926, 62984, 62996, 63007, 63021, 63027, 63039, 63048, 63058, 63080, 63081, 63095, 63114.
Source T12 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
BA-4074, BA-4075, BA-4076, BA-4077, BA-4078, BA-5149, BA-5151, BA-5152, BA-5153, BA-5555, BA-5556, BA-5557, BA-5558, BA-5559, BA-6868, BA-6869, BA-6870, BA-6871, BA-6872, BA-6873, BA-7269, BA-A4IF, BA-A4IG, BA-A4IH, BA-A4II, BA-A6D8, BA-A6DA, BA-A6DB, BA-A6DD, BA-A6DE, BA-A6DF, BA-A6DG, BA-A6DI, BA-A6DJ, BA-A6DL, BB-4217, BB-4223, BB-4224, BB-4225, BB-4227, BB-4228, BB-7861, BB-7862, BB-7863, BB-7864, BB-7866, BB-7870, BB-7871, BB-7872, BB-8596, BB-8601, BB-A5HU, BB-A5HY, BB-A5HZ, BB-A6UM, BB-A6UO, C9-A47Z, C9-A480, CN-4723, CN-4725, CN-4726, CN-4727, CN-4728, CN-4729, CN-4730, CN-4731, CN-4733, CN-4734, CN-4735, CN-4736, CN-4737, CN-4738, CN-4739, CN-4740, CN-4741, CN-4742, CN-5355, CN-5356, CN-5358, CN-5359, CN-5360, CN-5361, CN-5363, CN-5364, CN-5365, CN-5366, CN-5367, CN-5369, CN-5370, CN-5373, CN-5374, CN-6010, CN-6011, CN-6012, CN-6013, CN-6016, CN-6017, CN-6018, CN-6019, CN-6020, CN-6021, CN-6022, CN-6023, CN-6024, CN-6988, CN-6989, CN-6992, CN-6994, CN-6995, CN-6996, CN-6997, CN-6998, CN-A497, CN-A498, CN-A499, CN-A49A, CN-A49B, CN-A49C, CN-A63T, CN-A63U, CN-A63V, CN-A63W, CN-A63Y, CN-A640, CN-A641, CN-A642, CN-A6UY, CN-A6V1, CN-A6V3, CN-A6V6, CN-A6V7, CQ-5323, CQ-5324, CQ-5325, CQ-5326, CQ-5327, CQ-5329, CQ-5330, CQ-5331, CQ-5332, CQ-5333, CQ-5334, CQ-6218, CQ-6219, CQ-6220, CQ-6221, CQ-6222, CQ-6223, CQ-6224, CQ-6225, CQ-6227, CQ-6228, CQ-6229, CQ-7063, CQ-7064, CQ-7065, CQ-7067, CQ-7068, CQ-7069, CQ-7071, CQ-7072, CQ-A4C6, CQ-A4C7, CQ-A4C9, CQ-A4CB, CQ-A4CD, CQ-A4CE, CQ-A4CG, CQ-A4CH, CQ-A4CI, CR-5243, CR-5247, CR-5248, CR-5249, CR-5250, CR-6467, CR-6470, CR-6471, CR-6472, CR-6473, CR-6474, CR-6477, CR-6478, CR-6480, CR-6481, CR-6482, CR-6484, CR-6487, CR-6488, CR-6491, CR-6492, CR-6493, CR-7364, CR-7365, CR-7367, CR-7368, CR-7369, CR-7370, CR-7371, CR-7372, CR-7373, CR-7374, CR-7376, CR-7377, CR-7379, CR-7380, CR-7382, CR-7383, CR-7385, CR-7386, CR-7388, CR-7389, CR-7390, CR-7391, CR-7392, CR-7393, CR-7394, CR-7395, CR-7397, CR-7398, CR-7399, CR-7401, 
CR-7402, CR-7404, CV-5430, CV-5431, CV-5432, CV-5434, CV-5435, CV-5436, CV-5439, CV-5440, CV-5441, CV-5442, CV-5443, CV-5444, CV-5966, CV-5970, CV-5971, CV-5973, CV-5976, CV-5977, CV-5978, CV-5979, CV-6003, CV-6433, CV-6436, CV-6441, CV-6933, CV-6934, CV-6935, CV-6936, CV-6937, CV-6938, CV-6939, CV-6940, CV-6941, CV-6942, CV-6943, CV-6945, CV-6948, CV-6950, CV-6951, CV-6952, CV-6953, CV-6954, CV-6955, CV-6956, CV-6959, CV-6960, CV-6961, CV-6962, CV-7089, CV-7090, CV-7091, CV-7095, CV-7097, CV-7099, CV-7100, CV-7101, CV-7102, CV-7103, CV-7104, CV-7177, CV-7178, CV-7180, CV-7183, CV-7235, CV-7236, CV-7238, CV-7242, CV-7243, CV-7245, CV-7247, CV-7248, CV-7250, CV-7252, CV-7253, CV-7254, CV-7255, CV-7261, CV-7263, CV-7406, CV-7407, CV-7409, CV-7410, CV-7411, CV-7413, CV-7414, CV-7415, CV-7416, CV-7418, CV-7421, CV-7422, CV-7423, CV-7424, CV-7425, CV-7427, CV-7429, CV-7430, CV-7432, CV-7433, CV-7434, CV-7435, CV-7437, CV-7438, CV-7440, CV-7446, CV-7568, CV-A45O, CV-A45P, CV-A45Q, CV-A45R, CV-A45T, CV-A45U, CV-A45V, CV-A45W, CV-A45X, CV-A45Y, CV-A45Z, CV-A460, CV-A461, CV-A463, CV-A464, CV-A465, CV-A468, CV-A6JD, CV-A6JE, CV-A6JM, CV-A6JN, CV-A6JO, CV-A6JT, CV-A6JU, CV-A6JY, CV-A6JZ, CV-A6K0, CV-A6K1, CV-A6K2, CX-7082, CX-7085, CX-7086, CX-7219, CX-A4AQ, D6-6515, D6-6516, D6-6517, D6-6823, D6-6824, D6-6825, D6-6826, D6-6827, D6-8568, D6-8569, D6-A4Z9, D6-A4ZB, D6-A6EK, D6-A6EM, D6-A6EN, D6-A6EO, D6-A6EP, D6-A6EQ, D6-A6ES, D6-A74Q, DQ-5624, DQ-5625, DQ-5629, DQ-5630, DQ-5631, DQ-7588, DQ-7589, DQ-7590, DQ-7591, DQ-7592, DQ-7593, DQ-7594, DQ-7595, DQ-7596, F7-7848, F7-8489, F7-A50G, F7-A50I, F7-A50J, F7-A61S, F7-A61V, F7-A61W, F7-A620, F7-A622, F7-A623, F7-A624, H7-7774, H7-8501, H7-A6C4, H7-A6C5, H7-A76A, HD-7229, HD-7753, HD-7754, HD-7831, HD-7832, HD-7917, HD-8224, HD-8314, HD-8634, HD-8635, HD-A4C1, HD-A633, HD-A634, HD-A6HZ, HD-A6I0, HL-7533, IQ-7630, IQ-7631, IQ-7632, IQ-A61E, IQ-A61G, IQ-A61H, IQ-A61I, IQ-A61J, IQ-A61K, IQ-A61L, IQ-A61O, IQ-A6SG, IQ-A6SH, KU-A66S, 
KU-A66T, KU-A6H7, KU-A6H8, MT-A51W, MT-A51X, MT-A67A, MT-A67D, MT-A67F, MT-A67G, MT-A7BN, MZ-A5BI, MZ-A6I9, MZ-A7D7, P3-A5Q6, P3-A5QA, P3-A5QE, P3-A5QF, P3-A6SW, P3-A6SX, P3-A6T0, P3-A6T2, P3-A6T3, P3-A6T4, P3-A6T5, P3-A6T6, P3-A6T7, P3-A6T8, QK-A64Z, QK-A652, QK-A6IF, QK-A6IG, QK-A6IH, QK-A6II, QK-A6IJ, QK-A6V9, QK-A6VB, QK-A6VC, RS-A6TO, RS-A6TP, T2-A6WX, T2-A6WZ, T2-A6X0, T2-A6X2, TN-A7HI, TN-A7HJ, TN-A7HL, UF-A718, UF-A719, UF-A71A, UF-A71B, UF-A71D, UF-A71E, UF-A7J9, UF-A7JA, UF-A7JC, UF-A7JD, UF-A7JF, UF-A7JH, UF-A7JJ, UF-A7JK, UF-A7JO, UF-A7JS, UF-A7JT, UF-A7JV, UP-A6WW, WA-A7GZ, WA-A7H4.
Liver Cancer (452 samples):
Source S5 = [40]. Sample IDs are of the form BCB*, where * is:
109T, 111T, 151T, 157T, 167T, 231T, 301T, 307T, 325T.
Additional sample IDs are of the form BCM*, where * is:
229T, 257T, 265T, 269T, 275T, 321T, 325T, 329T, 337T, 339T, 371T, 375T, 397T, 399T, 423T, 439T, 455T, 483T, 489T, 501T, 529T, 531T, 543T, 545T, 565T, 567T, 617T, 643T, 671T, 683T, 689T, 695T, 703T, 711T, 723T, 735T, 739T, 759T, 769T, 783T, 791T.
Remaining sample IDs are of the form CHC*, where * is:
051T, 059T, 060T, 097T, 1010T, 1028T, 1035T, 1040T, 1041T, 1044T, 1052T, 1053T, 1055T, 1060T, 1061T, 1062T, 1065T, 1079T, 1081T, 1082T, 1083T, 1085T, 1089T, 1091T, 1097T, 1098T, 1137T, 1148T, 1152T, 1154T, 1162T, 1177T, 1180T, 1182T, 1183T, 1185T, 1186T, 1190T, 1191T, 1192T, 1201T, 1205T, 1207T, 1209T, 1210T, 1211T, 121T, 1530T, 1531T, 1534T, 1539T, 1545T, 1556T, 155T, 1566T, 1568T, 1569T, 1591T, 1592T, 1594T, 1595T, 1596T, 1597T, 1598T, 1600T, 1601T, 1602T, 1603T, 1604T, 1611T, 1616T, 1624T, 1626T, 1629T, 1700T, 1704T, 1708T, 1712T, 1714T, 1715T, 1717T, 1719T, 1720T, 1725T, 1731T, 1732T, 1734T, 1736T, 1737T, 1738T, 1739T, 1741T, 1742T, 1743T, 1744T, 1745T, 1746T, 1747T, 1749T, 1750T, 1751T, 1753T, 1754T, 1756T, 1757T, 1763T, 1774T, 1775T, 1915T, 197T, 2029T, 2034T, 2039Tbis, 2043T, 2048T, 2052T, 205T, 2098T, 2099T, 2103T, 2110Tbis, 2111T, 2112T, 2113T, 2115T, 2127T, 2128T, 2134T, 2141T, 218T, 2200T, 2202T, 2206T, 2208T, 2211T, 2213T, 2215T, 2216T, 2321T, 2351T, 2352T, 2358T, 2362T, 253T, 258T, 301T, 302T, 303T, 304T, 306T, 307T, 313T, 314T, 320T, 322T, 326T, 327T, 361TA, 429T, 432T, 433T, 434T, 437T, 451T, 465T, 469T, 510T, 609T, 614T, 703T, 734T, 736T, 789T, 793T, 794T, 796T, 798T, 799T, 801T, 805T, 879T, 884T, 889T, 891T, 892T, 896T, 898T, 902T, 909T, 912T, 917T, 923T, 961T.
Source H2 = [65]:
P47, P48, P51, P52, P55, P56, P929.
Source T13 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
BC-4073, BC-A10Q, BC-A10R, BC-A10S, BC-A10T, BC-A10U, BC-A10W, BC-A10X, BC-A10Y, BC-A10Z, BC-A110, BC-A112, BC-A216, BC-A217, BC-A3KF, BC-A3KG, BC-A5W4, BC-A69H, BC-A69I, BD-A2L6, BD-A3EP, BD-A3ER, BW-A5NO, BW-A5NP, BW-A5NQ, CC-5258, CC-5259, CC-5260, CC-5261, CC-5262, CC-5263, CC-5264, CC-A123, CC-A1HT, CC-A3M9, CC-A3MA, CC-A3MB, CC-A3MC, CC-A5UC, CC-A5UD, CC-A5UE, CC-A7IF, CC-A7IG, CC-A7IH, CC-A7II, CC-A7IJ, CC-A7IK, CC-A7IL, DD-A113, DD-A114, DD-A115, DD-A116, DD-A118, DD-A119, DD-A11A, DD-A11B, DD-A11C, DD-A11D, DD-A1E9, DD-A1EA, DD-A1EB, DD-A1EC, DD-A1ED, DD-A1EF, DD-A1EG, DD-A1EH, DD-A1EI, DD-A1EJ, DD-A1EK, DD-A1EL, DD-A39V, DD-A39W, DD-A39X, DD-A39Y, DD-A39Z, DD-A3A0, DD-A3A1, DD-A3A2, DD-A3A3, DD-A3A4, DD-A3A5, DD-A3A6, DD-A3A7, DD-A3A8, DD-A3A9, DD-A4NA, DD-A4NB, DD-A4ND, DD-A4NE, DD-A4NF, DD-A4NG, DD-A4NH, DD-A4NI, DD-A4NJ, DD-A4NK, DD-A4NL, DD-A4NN, DD-A4NO, DD-A4NP, DD-A4NQ, DD-A4NR, DD-A4NS, DD-A4NV, DD-A73A, DD-A73B, DD-A73C, DD-A73D, DD-A73E, DD-A73F, DD-A73G, ED-A459, ED-A4XI, ED-A5KG, ED-A627, ED-A66X, ED-A66Y, ED-A7PX, ED-A7PY, ED-A7PZ, ED-A7XO, ED-A7XP, ED-A82E, EP-A12J, EP-A26S, EP-A2KA, EP-A2KB, EP-A2KC, EP-A3JL, EP-A3RK, ES-A2HS, ES-A2HT, FV-A23B, FV-A2QQ, FV-A2QR, FV-A3I0, FV-A3I1, FV-A3R2, FV-A3R3, FV-A495, FV-A496, FV-A4ZP, FV-A4ZQ, G3-A25S, G3-A25T, G3-A25U, G3-A25V, G3-A25W, G3-A25Y, G3-A25Z, G3-A3CG, G3-A3CH, G3-A3CI, G3-A3CJ, G3-A3CK, G3-A5SI, G3-A5SJ, G3-A5SK, G3-A5SL, G3-A5SM, G3-A6UC, G3-A7M5, G3-A7M6, G3-A7M7, G3-A7M8, G3-A7M9, GJ-A6C0, HP-A5MZ, HP-A5N0, K7-A5RF, K7-A5RG, K7-A6G5, KR-A7K0, KR-A7K2, KR-A7K7, KR-A7K8, LG-A6GG, MI-A75C, MI-A75E, MI-A75G, MI-A75H, MI-A75I, MR-A520, NI-A4U2, O8-A75V, PD-A5DF, QA-A7B7, RC-A6M3, RC-A6M4, RC-A6M5, RC-A6M6, RC-A7S9, RC-A7SB, RC-A7SF, RC-A7SK, RG-A7D4, T1-A6J8, UB-A7MA, UB-A7MB, UB-A7MC, UB-A7MD, UB-A7ME, UB-A7MF.
Lung Cancer (1018 samples):
Source D3 = [66]:
16600, 16608, 16628, 16632, 16648, 16660, 16668, 16678, 16686, 16724, 16802, 16814, 16835, 16857, 16949, 17042, 17055, 17156, 17174, 17210, 17218, 17226, 17242, 17268, 17290, 17308, 17733, 17746, 17759, 17763.
Source R1 = [67]:
113368, 134398, 134413, 134417, 134421, 134426, 134427, 134430, 2334187, 2334188, 2334189, 2334191, 2334193, 2334195, 2334196, 2334199, 2334201, 2334202, 585203, 585205, 585208, 585210, 585223, 585258, 585260, 585265, 585267, 585270, 585272, 585276, 631052, 631056, 631060, 631064, 631076, 631084, 631092, 98687, 98711, 98735.
Source P3 = [68]:
H1672, H2171, S00022, S00050, S00356, S00472, S00501, S00539, S00827, S00830, S00833, S00836, S00837, S00841, S00932, S00933, S00935, S00936, S00943, S00944, S00945, S00946, S00947, S01366, S01453, S01494, S01512, S01563, S01728.
Source S6 = [69]. Sample IDs are of the form LC_*, where * is:
C1, C10, C11, C13, C14, C15, C17, C18, C19, C2, C20, C21, C22, C23, C24, C25, C26, C27, C28, C29, C30, C32, C33, C34, C35, C36, C4, C5, C6, C7, C8, C9, S10, S11, S12, S13, S14, S15, S16, S17, S18, S19, S2, S20, S21, S23, S24, S25, S27, S28, S29, S3, S31, S32, S34, S35, S37, S38, S39, S4, S40, S41, S42, S43, S44, S45, S46, S47, S48, S49, S5, S51, S6, S8, S9.
Source I1 = [70]. Sample IDs are of the form LUAD.**.Tumor, where ** is one of the entries below (in these entries * stands for NYU; e.g., *1021 = NYU1021, so the full sample ID is LUAD.NYU1021.Tumor):
5O6B5, 74TBW, B00416, B00523, B00859, B00915, B01102, B01145, B01811, B01970, B02077, B02216, B02477, B02515, B02594, D00147, D01278, D01603, D01751, D02085, D02185, E00163, E00443, E00897, E00918, F00018, F00057, F00089, F00121, F00134, F00162, F00170, F00257, F00282, F00365, F00368, GU4I3, LC15C, LIP77, *1021, *1026, *1027, *1051S, *1093, *1096, *1101, *1142, *1177, *1195, *1210, *1219, *160, *184, *195, *201, *213, *252, *259, *263, *282, *284, *287, *315, *330, *408, *508, *574S, *575, *584S, *608, *627, *669, *689, *696, *704, *739, *796, *802, *803, *846, *847, *848, *947, *994, QCHM7, QJN9L, S00484, S00486, S00499, S01304, S01306, S01315, S01320, S01354, S01357, S01362, S01373, S01409, S01413, S01482, TLLGS, UF7HM, VUMN6, YINHD, YKER9.
Additional sample IDs are of the form LUAD.CHTN.*.Tumor, where * is:
3090346, 3090415, 3090416, 4090680, MAD04.00674, MAD06.00490, MAD06.00668, MAD06.00678, MAD08.00104, Z4716A.
Further sample IDs are of the form LUAD.RT.*.Tumor, where * is:
S01477, S01487, S01699, S01700, S01702, S01703, S01709, S01711, S01721, S01769, S01770, S01771, S01774, S01777, S01808, S01810, S01813, S01818, S01831, S01832, S01840, S01852, S01856, S01866.
Remaining sample IDs are of the form LUAD_*.Tumor, where * is:
E00522, E00565, E00623, E00703, E00945, E01047, E01086, E01147, E01166, E01319, E01419.
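The shorthand used for the LUAD.**.Tumor entries of Source I1 (a leading * abbreviating NYU) can be expanded mechanically. The following is a minimal sketch in Python, not part of the original analysis; the function name `expand_luad_id` and the short `entries` excerpt are illustrative assumptions.

```python
def expand_luad_id(entry):
    """Expand one list entry into a full sample ID of the form LUAD.<name>.Tumor.

    Hypothetical helper: it applies the convention stated for Source I1,
    where a leading "*" in an entry abbreviates the prefix "NYU".
    """
    entry = entry.strip()
    if entry.startswith("*"):
        entry = "NYU" + entry[1:]
    return "LUAD." + entry + ".Tumor"


# A short excerpt of the Source I1 list, used only for illustration.
entries = "5O6B5, LIP77, *1021, *1051S, QCHM7".split(",")
full_ids = [expand_luad_id(e) for e in entries]

for sample_id in full_ids:
    print(sample_id)
```

The same approach carries over to the other patterns in this listing (e.g., TCGA-* or LUAD.CHTN.*.Tumor) by changing the prefix and suffix that are wrapped around each entry.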
Source T14 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA.*, where * is:
05.4244, 05.4249, 05.4250, 05.4382, 05.4384, 05.4389, 05.4390, 05.4395, 05.4396, 05.4397, 05.4398, 05.4402, 05.4403, 05.4405, 05.4410, 05.4415, 05.4417, 05.4418, 05.4420, 05.4422, 05.4424, 05.4425, 05.4426, 05.4427, 05.4430, 05.4432, 05.4433, 05.4434, 05.5420, 05.5423, 05.5425, 05.5428, 05.5429, 05.5715, 17.Z000, 17.Z001, 17.Z002, 17.Z003, 17.Z004, 17.Z005, 17.Z007, 17.Z008, 17.Z009, 17.Z010, 17.Z011, 17.Z012, 17.Z013, 17.Z014, 17.Z015, 17.Z016, 17.Z017, 17.Z018, 17.Z019, 17.Z020, 17.Z021, 17.Z022, 17.Z023, 17.Z025, 17.Z026, 17.Z027, 17.Z028, 17.Z030, 17.Z031, 17.Z032, 17.Z033, 17.Z035, 17.Z036, 17.Z037, 17.Z040, 17.Z041, 17.Z042, 17.Z043, 17.Z044, 17.Z045, 17.Z046, 17.Z047, 17.Z048, 17.Z049, 17.Z050, 17.Z051, 17.Z052, 17.Z053, 17.Z054, 17.Z055, 17.Z056, 17.Z057, 17.Z058, 17.Z059, 17.Z060, 17.Z061, 17.Z062, 18.3406, 18.3407, 18.3408, 18.3409, 18.3410, 18.3411, 18.3412, 18.3414, 18.3415, 18.3416, 18.3417, 18.3419, 18.3421, 18.4083, 18.4086, 18.4721, 18.5592, 18.5595, 21.1070, 21.1071, 21.1076, 21.1077, 21.1078, 21.1081, 21.5782, 21.5784, 21.5786, 21.5787, 22.0944, 22.1002, 22.1011, 22.1012, 22.1016, 22.4591, 22.4593, 22.4595, 22.4599, 22.4601, 22.4604, 22.4607, 22.4613, 22.5471, 22.5472, 22.5473, 22.5474, 22.5477, 22.5478, 22.5480, 22.5482, 22.5485, 22.5489, 22.5491, 22.5492, 33.4532, 33.4533, 33.4538, 33.4547, 33.4566, 33.4582, 33.4583, 33.4586, 33.6737, 34.2596, 34.2600, 34.2608, 34.5231, 34.5232, 34.5234, 34.5236, 34.5239, 34.5240, 34.5927, 34.5928, 34.5929, 35.3615, 35.3621, 35.4122, 35.4123, 35.5375, 37.3783, 37.3789, 37.4133, 37.4135, 37.4141, 37.5819, 38.4625, 38.4626, 38.4627, 38.4628, 38.4629, 38.4630, 38.4631, 38.4632, 38.6178, 38.7271, 38.A44F, 39.5016, 39.5019, 39.5021, 39.5022, 39.5024, 39.5027, 39.5028, 39.5029, 39.5030, 39.5031, 39.5035, 39.5036, 39.5037, 39.5039, 43.2578, 43.3394, 43.3920, 43.5668, 43.6143, 43.6647, 43.6770, 43.6771, 44.2655, 44.2656, 44.2657, 44.2659, 44.2661, 44.2662, 44.2665, 44.2666, 44.2668, 44.3396, 44.3398, 44.3918, 44.3919, 
44.4112, 44.5643, 44.5644, 44.5645, 44.6144, 44.6145, 44.6146, 44.6147, 44.6148, 44.6774, 44.6775, 44.6776, 44.6777, 44.6778, 44.6779, 44.7659, 44.7660, 44.7661, 44.7662, 44.7667, 44.7669, 44.7670, 44.7671, 44.7672, 44.8117, 44.8119, 44.8120, 44.A479, 44.A47A, 44.A47B, 44.A47F, 44.A47G, 44.A4SS, 44.A4SU, 46.3765, 46.3767, 46.3768, 46.3769, 46.6025, 46.6026, 49.4486, 49.4487, 49.4488, 49.4490, 49.4494, 49.4501, 49.4505, 49.4506, 49.4507, 49.4510, 49.4512, 49.4514, 49.6742, 49.6743, 49.6744, 49.6745, 49.6761, 49.6767, 50.5044, 50.5045, 50.5049, 50.5051, 50.5055, 50.5066, 50.5068, 50.5072, 50.5930, 50.5931, 50.5932, 50.5933, 50.5935, 50.5936, 50.5939, 50.5941, 50.5942, 50.5944, 50.5946, 50.6590, 50.6591, 50.6592, 50.6593, 50.6594, 50.6595, 50.6597, 50.6673, 50.7109, 50.8457, 50.8459, 50.8460, 51.4079, 51.4080, 51.4081, 53.7624, 53.7626, 53.7813, 53.A4EZ, 55.1592, 55.1594, 55.1595, 55.1596, 55.5899, 55.6543, 55.6642, 55.6712, 55.6968, 55.6969, 55.6970, 55.6971, 55.6972, 55.6975, 55.6978, 55.6979, 55.6980, 55.6981, 55.6982, 55.6983, 55.6984, 55.6985, 55.6986, 55.6987, 55.7227, 55.7281, 55.7283, 55.7284, 55.7570, 55.7573, 55.7574, 55.7576, 55.7724, 55.7725, 55.7726, 55.7727, 55.7728, 55.7815, 55.7816, 55.7903, 55.7907, 55.7910, 55.7911, 55.7913, 55.7914, 55.7994, 55.7995, 55.8085, 55.8087, 55.8089, 55.8090, 55.8091, 55.8092, 55.8094, 55.8096, 55.8097, 55.8203, 55.8204, 55.8205, 55.8206, 55.8207, 55.8208, 55.8299, 55.8301, 55.8302, 55.8505, 55.8506, 55.8507, 55.8508, 55.8510, 55.8511, 55.8512, 55.8513, 55.8514, 55.8614, 55.8615, 55.8616, 55.8619, 55.8620, 55.8621, 55.A48X, 55.A48Y, 55.A48Z, 55.A490, 55.A491, 55.A492, 55.A493, 55.A494, 55.A4DF, 55.A4DG, 56.1622, 56.5897, 56.5898, 56.6545, 56.6546, 60.2698, 60.2707, 60.2708, 60.2709, 60.2710, 60.2711, 60.2712, 60.2713, 60.2715, 60.2719, 60.2720, 60.2721, 60.2722, 60.2723, 60.2724, 60.2725, 60.2726, 62.8394, 62.8395, 62.8397, 62.8398, 62.8399, 62.8402, 62.A46O, 62.A46P, 62.A46R, 62.A46S, 62.A46U, 62.A46V, 62.A46Y, 62.A470, 
62.A471, 62.A472, 63.5128, 63.5131, 63.6202, 64.1676, 64.1677, 64.1678, 64.1679, 64.1680, 64.1681, 64.5774, 64.5775, 64.5778, 64.5779, 64.5781, 64.5815, 66.2727, 66.2734, 66.2742, 66.2744, 66.2754, 66.2755, 67.3770, 67.3771, 67.3772, 67.3773, 67.3774, 67.4679, 67.6215, 67.6216, 67.6217, 69.7760, 69.7761, 69.7763, 69.7764, 69.7765, 69.7973, 69.7974, 69.7978, 69.7979, 69.7980, 69.8253, 69.8254, 69.8255, 69.A59K, 71.6725, 71.8520, 73.4658, 73.4659, 73.4662, 73.4666, 73.4668, 73.4670, 73.4675, 73.4676, 73.4677, 73.7498, 73.7499, 75.5122, 75.5125, 75.5126, 75.5146, 75.5147, 75.6203, 75.6205, 75.6206, 75.6207, 75.6211, 75.6212, 75.6214, 75.7025, 75.7027, 75.7030, 75.7031, 78.7143, 78.7145, 78.7146, 78.7147, 78.7148, 78.7149, 78.7150, 78.7152, 78.7153, 78.7154, 78.7155, 78.7156, 78.7158, 78.7159, 78.7160, 78.7161, 78.7162, 78.7163, 78.7166, 78.7167, 78.7220, 78.7535, 78.7536, 78.7537, 78.7539, 78.7540, 78.7542, 78.7633, 78.8640, 78.8648, 78.8655, 78.8660, 78.8662, 80.5607, 80.5608, 80.5611, 83.5908, 86.6562, 86.6851, 86.7701, 86.7711, 86.7713, 86.7714, 86.7953, 86.7954, 86.7955, 86.8054, 86.8055, 86.8056, 86.8073, 86.8074, 86.8075, 86.8076, 86.8278, 86.8279, 86.8280, 86.8281, 86.8358, 86.8359, 86.8585, 86.8668, 86.8669, 86.8671, 86.8672, 86.8673, 86.8674, 86.A456, 86.A4D0, 86.A4JF, 86.A4P7, 86.A4P8, 91.6828, 91.6829, 91.6830, 91.6831, 91.6835, 91.6836, 91.6840, 91.6847, 91.6848, 91.6849, 91.7771, 91.8496, 91.8497, 91.8499, 91.A4BC, 91.A4BD, 93.7347, 93.7348, 93.8067, 93.A4JN, 93.A4JO, 93.A4JP, 93.A4JQ, 95.7039, 95.7043, 95.7562, 95.7567, 95.7944, 95.7947, 95.7948, 95.8039, 95.8494, 95.A4VK, 95.A4VN, 95.A4VP, 97.7546, 97.7547, 97.7552, 97.7553, 97.7554, 97.7937, 97.7938, 97.7941, 97.8171, 97.8172, 97.8174, 97.8175, 97.8176, 97.8177, 97.8179, 97.8547, 97.8552, 97.A4LX, 97.A4M0, 97.A4M1, 97.A4M2, 97.A4M3, 97.A4M5, 97.A4M6, 97.A4M7, 99.7458, 99.8025, 99.8028, 99.8032, 99.8033, J2.8192, J2.8194, J2.A4AD, J2.A4AE, J2.A4AG, L4.A4E5, L4.A4E6, L9.A443, L9.A444, MN.A4N1, MN.A4N4, 
MN.A4N5, MP.A4SV, MP.A4SW, MP.A4SY, MP.A4T2, MP.A4T4, MP.A4T6, MP.A4T7, MP.A4T8, MP.A4T9, MP.A4TA, MP.A4TC, MP.A4TD, MP.A4TE, MP.A4TF, MP.A4TH, MP.A4TI, MP.A4TK, MP.A5C7, NJ.A4YF, NJ.A4YG, NJ.A4YI, NJ.A4YP, NJ.A4YQ, NJ.A55A, NJ.A55O, NJ.A55R, O1.A52J.
Melanoma (594 samples):
Source S7 = [71]:
A02, A06, D05, D14, D35, D36, D41, D49.
Source D4 = [72]:
COLO-829.
Source B1 = [73]. Sample IDs are of the form ME*-Tumor, where * is:
001, 002, 009, 010, 011, 012, 014, 015, 016, 017, 018, 020, 021, 024, 029, 030, 032, 033, 034, 035, 037, 041, 043, 044, 045, 048, 049, 050.
Remaining sample IDs are:
Mel-BRAFi-03-Tumor, Mel_BRAFi_02_PRE-Tumor.
Source A2 = [32]. Sample IDs are of the form PD*, where * is:
10020a, 10021a, 10022a, 9024a2, 9024b, 9025a, 9025b, 9026a, 9027a, 9027b, 9028a, 9028b, 9029a, 9030a, 9031a, 9032a, 9033a.
Source H3 = [74]. Sample IDs are of the form SKCM-*-Tumor, where * is:
13447, 13456, 13463, 13468, 13473, 13531, 13537, 13543, 13549, 13560, 13561, 13567, 13575, 13591, 13600.
Additional sample IDs are of the form SKCM-JWCI-*-Tumor, where * is:
14, 27, WGS-1, WGS-11, WGS-12, WGS-13, WGS-15, WGS-18, WGS-19, WGS-2, WGS-20, WGS-21, WGS-22, WGS-23, WGS-24, WGS-25, WGS-26, WGS-29, WGS-3, WGS-32, WGS-33, WGS-34, WGS-35, WGS-36, WGS-37, WGS-38, WGS-39, WGS-4, WGS-42, WGS-43, WGS-5, WGS-6, WGS-7, WGS-8.
Further sample IDs are of the form SKCM-Ma-Mel-*-Tumor, where * is:
04, 05, 08a, 102, 103b, 105, 107, 108, 114, 119, 120, 122, 123, 15, 16, 19, 27, 28, 35, 36, 37, 48, 53, 54a, 55, 59, 62, 63, 65, 67, 71, 76, 79, 85, 86, 91, 92, 94.
Remaining sample IDs are:
SKCM-UKRV-Mel-20-Tumor, SKCM-UKRV-Mel-24-Tumor, SKCM-UKRV-Mel-6-Tumor.
Source T15 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
BF-A1PU, BF-A1PV, BF-A1PX, BF-A1PZ, BF-A1Q0, BF-A3DJ, BF-A3DL, BF-A3DM, BF-A3DN, D3-A1Q1, D3-A1Q3, D3-A1Q4, D3-A1Q5, D3-A1Q6, D3-A1Q7, D3-A1Q8, D3-A1Q9, D3-A1QA, D3-A1QB, D3-A2J6, D3-A2J7, D3-A2J8, D3-A2J9, D3-A2JA, D3-A2JB, D3-A2JC, D3-A2JD, D3-A2JF, D3-A2JG, D3-A2JH, D3-A2JK, D3-A2JL, D3-A2JN, D3-A2JO, D3-A2JP, D3-A3BZ, D3-A3C1, D3-A3C3, D3-A3C6, D3-A3C7, D3-A3C8, D3-A3CB, D3-A3CC, D3-A3CE, D3-A3CF, D3-A3ML, D3-A3MO, D3-A3MR, D3-A3MU, D3-A3MV, D9-A148, D9-A149, D9-A1JW, D9-A1JX, D9-A1X3, DA-A1HV, DA-A1HW, DA-A1HY, DA-A1I0, DA-A1I1, DA-A1I2, DA-A1I4, DA-A1I5, DA-A1I7, DA-A1I8, DA-A1IA, DA-A1IB, DA-A1IC, DA-A3F3, DA-A3F5, DA-A3F8, EB-A1NK, EB-A24C, EB-A24D, EB-A299, EB-A3HV, EE-A17X, EE-A17Y, EE-A17Z, EE-A180, EE-A181, EE-A182, EE-A183, EE-A184, EE-A185, EE-A20B, EE-A20C, EE-A20F, EE-A20H, EE-A20I, EE-A29A, EE-A29B, EE-A29C, EE-A29D, EE-A29E, EE-A29G, EE-A29H, EE-A29L, EE-A29M, EE-A29N, EE-A29P, EE-A29Q, EE-A29R, EE-A29S, EE-A29T, EE-A29V, EE-A29W, EE-A29X, EE-A2A0, EE-A2A1, EE-A2A2, EE-A2A5, EE-A2A6, EE-A2GB, EE-A2GC, EE-A2GD, EE-A2GE, EE-A2GH, EE-A2GI, EE-A2GJ, EE-A2GK, EE-A2GL, EE-A2GM, EE-A2GN, EE-A2GO, EE-A2GP, EE-A2GR, EE-A2GS, EE-A2GT, EE-A2GU, EE-A2M5, EE-A2M6, EE-A2M7, EE-A2M8, EE-A2MC, EE-A2MD, EE-A2ME, EE-A2MF, EE-A2MG, EE-A2MH, EE-A2MI, EE-A2MJ, EE-A2MK, EE-A2ML, EE-A2MM, EE-A2MN, EE-A2MP, EE-A2MQ, EE-A2MR, EE-A2MS, EE-A2MT, EE-A2MU, EE-A3AA, EE-A3AB, EE-A3AC, EE-A3AD, EE-A3AE, EE-A3AF, EE-A3AG, EE-A3AH, EE-A3J3, EE-A3J4, EE-A3J5, EE-A3J7, EE-A3J8, EE-A3JA, EE-A3JB, EE-A3JD, EE-A3JE, EE-A3JH, EE-A3JI, ER-A193, ER-A194, ER-A195, ER-A196, ER-A197, ER-A198, ER-A199, ER-A19A, ER-A19B, ER-A19C, ER-A19D, ER-A19E, ER-A19F, ER-A19G, ER-A19H, ER-A19J, ER-A19K, ER-A19L, ER-A19N, ER-A19O, ER-A19P, ER-A19Q, ER-A19S, ER-A19T, ER-A1A1, ER-A2NB, ER-A2NC, ER-A2ND, ER-A2NE, ER-A2NF, ER-A2NG, ER-A2NH, ER-A3ES, ER-A3ET, ER-A3EV, FR-A2OS, FS-A1YX, FS-A1YY, FS-A1Z0, FS-A1Z3, FS-A1Z4, FS-A1Z7, FS-A1ZB, FS-A1ZC, FS-A1ZD, FS-A1ZE, FS-A1ZF, FS-A1ZG, FS-A1ZH, FS-A1ZJ, FS-A1ZK, 
FS-A1ZM, FS-A1ZN, FS-A1ZP, FS-A1ZQ, FS-A1ZR, FS-A1ZS, FS-A1ZT, FS-A1ZU, FS-A1ZW, FS-A1ZY, FS-A1ZZ, FW-A3I3, GF-A2C7, GN-A262, GN-A263, GN-A264, GN-A265, GN-A266, GN-A267, GN-A268, GN-A269, GN-A26A, GN-A26C, GN-A26D, HR-A2OG, HR-A2OH, IH-A3EA, D3-A5GT, D9-A3Z4, D9-A4Z2, D9-A4Z3, D9-A4Z5, EB-A3XB, EB-A3XC, EB-A3XD, EB-A3XE, EB-A3Y6, EB-A3Y7, EB-A41A, EB-A41B, EB-A42Y, EB-A42Z, EB-A430, EB-A431, EB-A44N, EB-A44O, EB-A44P, EB-A4IQ, EB-A4IS, EB-A4OY, EB-A4OZ, EB-A4P0, EB-A551, EB-A553, EB-A57M, EB-A5SE, EB-A5SF, EB-A5UM, FR-A3R1, FW-A5DX, BF-A5EO, BF-A5EP, BF-A5EQ, BF-A5ER, BF-A5ES, D3-A51E, D3-A51F, D3-A51G, D3-A51H, D3-A51J, D3-A51K, D3-A51N, D3-A51R, D3-A51T, D3-A5GL, D3-A5GN, D3-A5GO, D3-A5GR, D3-A5GS, D9-A3Z1, D9-A3Z3, D9-A6E9, D9-A6EA, D9-A6EC, D9-A6EG, DA-A3F2, EB-A3XF, EB-A44Q, EB-A44R, EB-A4XL, EB-A5FP, EB-A5KH, EB-A5SG, EB-A5SH, EB-A5UL, EB-A5UN, EB-A5VU, EB-A5VV, EB-A6L9, EB-A6QY, EB-A6QZ, EB-A6R0, ER-A19M, ER-A19W, ER-A3PL, ER-A42H, ER-A42K, ER-A42L, FR-A3YN, FR-A3YO, FR-A44A, FR-A69P, FR-A726, FR-A728, FS-A1YW, FS-A1ZA, FS-A4F4, FS-A4F5, FS-A4F8, FS-A4F9, FS-A4FB, FS-A4FC, FS-A4FD, FW-A3R5, FW-A3TU, FW-A3TV, FW-A5DY, GF-A3OT, GF-A6C8, GF-A6C9, GF-A769, GN-A4U3, GN-A4U4, GN-A4U5, GN-A4U7, GN-A4U8, GN-A4U9, OD-A75X, QB-A6FS, RP-A690, RP-A693, RP-A694, RP-A695, D3-A5GU, FS-A4F0, GF-A4EO, RZ-AB0B, V3-A9ZX, V3-A9ZY, V4-A9E5, V4-A9E7, V4-A9E8, V4-A9E9, V4-A9EA, V4-A9EC, V4-A9ED, V4-A9EE, V4-A9EF, V4-A9EH, V4-A9EI, V4-A9EJ, V4-A9EK, V4-A9EL, V4-A9EM, V4-A9EO, V4-A9EQ, V4-A9ES, V4-A9ET, V4-A9EU, V4-A9EV, V4-A9EW, V4-A9EX, V4-A9EY, V4-A9EZ, V4-A9F0, V4-A9F1, V4-A9F2, V4-A9F3, V4-A9F4, V4-A9F5, V4-A9F7, V4-A9F8, VD-A8K7, VD-A8K8, VD-A8K9, VD-A8KA, VD-A8KB, VD-A8KD, VD-A8KE, VD-A8KF, VD-A8KG, VD-A8KH, VD-A8KI, VD-A8KJ, VD-A8KK, VD-A8KL, VD-A8KM, VD-A8KN, VD-A8KO, VD-AA8M, VD-AA8N, VD-AA8O, VD-AA8P, VD-AA8Q, VD-AA8R, VD-AA8S, VD-AA8T, WC-A87T, WC-A87U, WC-A87W, WC-A87Y, WC-A880, WC-A881, WC-A882, WC-A883, WC-A884, WC-A885, WC-A888, WC-A88A, WC-AA9A, WC-AA9E, YZ-A980, 
YZ-A982, YZ-A983, YZ-A984, YZ-A985.
Nasopharyngeal Cancer (11 samples):
Source L2 = [75]:
NPC088D, NPC105D, NPC29F, NPC31F, NPC34F, NPC3F, NPC42F, NPC4D, NPC4F, NPC5D, NPC5F.
Oral Cancer (106 samples):
Source I2 = [76]. Sample IDs are of the form OSCC-GB_0*, where * is:
001011, 002011, 003011, 004011, 005011, 006011, 007011, 008011, 011011, 012011, 013011, 014011, 015011, 016011, 017011, 018011, 019011, 020011, 021011, 022011, 023011, 024011, 025011, 026011, 027011, 028011, 029011, 030011, 031011, 032011, 033011, 034011, 035011, 036011, 037011, 038011, 039011, 040011, 041011, 042011, 043011, 044011, 045011, 046011, 047011, 048011, 049011, 050011, 051011, 052011, 053011, 054011, 055011, 056011, 057011, 058011, 059011, 060011, 061011, 062011, 063011, 064011, 065011, 066011, 067011, 068011, 069011, 070011, 073011, 074011, 075011, 076011, 077011, 080011, 081011, 082011, 083011, 084011, 085011, 086011, 087011, 088011, 089011, 090011, 091011, 092011, 093011, 094011, 095011, 096011, 097011, 098011, 099011, 100011, 101011, 102011, 103011, 104011, 105011, 106011, 107011, 108011, 109011, 110011, 111011, 112011.
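Many listings in this appendix give only the variable part of each sample ID, together with a form such as OSCC-GB_0* or TCGA-*. The following is a minimal sketch of expanding such a listing into full IDs. It assumes that `*` is replaced verbatim by each comma-separated entry. The function name `expand_ids` is hypothetical and is not part of the original analysis.

```python
from typing import List


def expand_ids(template: str, listing: str) -> List[str]:
    """Expand a comma-separated listing of ID fragments into full sample IDs.

    `template` must contain exactly one '*', which is replaced by each fragment
    (assumption: verbatim substitution, as the "of the form" wording suggests).
    """
    if template.count("*") != 1:
        raise ValueError("template must contain exactly one '*'")
    # Drop the trailing period that closes each listing, then split on commas.
    fragments = [f.strip() for f in listing.rstrip(". \n").split(",") if f.strip()]
    return [template.replace("*", fragment) for fragment in fragments]


# First two and last entries of the Oral Cancer listing above.
oral = expand_ids("OSCC-GB_0*", "001011, 002011, 112011.")
print(oral)
```

The same helper applies to the TCGA listings, e.g., `expand_ids("TCGA-*", "04-1331, 04-1332")`. Comparing the length of the returned list with the sample count in each heading gives a quick consistency check.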
Ovarian Cancer (471 samples):
Source J1 = [77]:
OCC01PT, OCC02PT, OCC03PT, OCC04PT, OCC05PT, OCC06PT, OCC07PT, OCC08PT.
Source T16 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
04-1331, 04-1332, 04-1336, 04-1337, 04-1338, 04-1342, 04-1343, 04-1346, 04-1347, 04-1348, 04-1349, 04-1350, 04-1351, 04-1353, 04-1356, 04-1357, 04-1361, 04-1362, 04-1364, 04-1365, 04-1367, 04-1369, 04-1514, 04-1516, 04-1517, 04-1519, 04-1525, 04-1530, 04-1542, 04-1638, 04-1644, 04-1646, 04-1648, 04-1649, 04-1651, 04-1652, 04-1655, 09-0364, 09-0365, 09-0366, 09-0367, 09-0369, 09-1659, 09-1661, 09-1662, 09-1664, 09-1665, 09-1666, 09-1669, 09-1670, 09-1672, 09-1673, 09-1674, 09-1675, 09-2044, 09-2045, 09-2049, 09-2050, 09-2051, 09-2053, 09-2056, 10-0926, 10-0927, 10-0928, 10-0930, 10-0931, 10-0933, 10-0934, 10-0935, 10-0937, 10-0938, 13-0714, 13-0717, 13-0720, 13-0723, 13-0724, 13-0726, 13-0727, 13-0730, 13-0751, 13-0755, 13-0758, 13-0760, 13-0761, 13-0762, 13-0765, 13-0791, 13-0792, 13-0793, 13-0795, 13-0800, 13-0801, 13-0804, 13-0807, 13-0883, 13-0884, 13-0885, 13-0886, 13-0887, 13-0889, 13-0890, 13-0891, 13-0893, 13-0894, 13-0897, 13-0899, 13-0900, 13-0901, 13-0903, 13-0904, 13-0905, 13-0906, 13-0910, 13-0911, 13-0912, 13-0913, 13-0916, 13-0919, 13-0920, 13-0923, 13-0924, 13-1403, 13-1404, 13-1405, 13-1407, 13-1408, 13-1409, 13-1410, 13-1411, 13-1412, 13-1477, 13-1481, 13-1482, 13-1483, 13-1484, 13-1487, 13-1488, 13-1489, 13-1491, 13-1492, 13-1494, 13-1495, 13-1496, 13-1497, 13-1498, 13-1499, 13-1501, 13-1504, 13-1505, 13-1506, 13-1507, 13-1509, 13-1510, 13-1512, 13-2057, 13-2059, 13-2060, 13-2061, 13-2065, 13-2066, 13-2071, 20-0987, 20-0990, 20-0991, 20-1682, 20-1683, 20-1684, 20-1685, 20-1686, 20-1687, 23-1021, 23-1022, 23-1023, 23-1024, 23-1026, 23-1027, 23-1028, 23-1029, 23-1030, 23-1031, 23-1032, 23-1109, 23-1110, 23-1111, 23-1114, 23-1116, 23-1117, 23-1118, 23-1119, 23-1120, 23-1122, 23-1123, 23-1124, 23-1809, 23-2072, 23-2077, 23-2078, 23-2079, 23-2081, 23-2641, 23-2643, 23-2645, 23-2647, 23-2649, 24-0966, 24-0968, 24-0970, 24-0975, 24-0979, 24-0980, 24-0982, 24-1103, 24-1104, 24-1105, 24-1413, 24-1416, 24-1417, 24-1418, 24-1419, 24-1422, 24-1423, 24-1424, 
24-1425, 24-1426, 24-1427, 24-1428, 24-1431, 24-1434, 24-1435, 24-1436, 24-1463, 24-1464, 24-1466, 24-1469, 24-1470, 24-1471, 24-1474, 24-1544, 24-1545, 24-1546, 24-1548, 24-1549, 24-1551, 24-1552, 24-1553, 24-1555, 24-1556, 24-1557, 24-1558, 24-1560, 24-1562, 24-1563, 24-1564, 24-1565, 24-1567, 24-1603, 24-1604, 24-1614, 24-1616, 24-1842, 24-1843, 24-1844, 24-1845, 24-1846, 24-1847, 24-1849, 24-1850, 24-2019, 24-2024, 24-2030, 24-2035, 24-2038, 24-2254, 24-2260, 24-2261, 24-2262, 24-2267, 24-2271, 24-2280, 24-2281, 24-2288, 24-2289, 24-2290, 24-2293, 24-2298, 25-1313, 25-1315, 25-1316, 25-1317, 25-1318, 25-1319, 25-1320, 25-1321, 25-1322, 25-1324, 25-1325, 25-1326, 25-1328, 25-1329, 25-1623, 25-1625, 25-1626, 25-1627, 25-1628, 25-1630, 25-1631, 25-1632, 25-1633, 25-1634, 25-1635, 25-2042, 25-2391, 25-2392, 25-2393, 25-2396, 25-2398, 25-2399, 25-2400, 25-2401, 25-2404, 25-2408, 25-2409, 29-1688, 29-1690, 29-1691, 29-1693, 29-1694, 29-1695, 29-1696, 29-1697, 29-1698, 29-1699, 29-1701, 29-1702, 29-1703, 29-1705, 29-1707, 29-1710, 29-1711, 29-1761, 29-1762, 29-1763, 29-1764, 29-1766, 29-1768, 29-1769, 29-1770, 29-1771, 29-1774, 29-1775, 29-1776, 29-1777, 29-1778, 29-1781, 29-1783, 29-1784, 29-1785, 29-2427, 29-2429, 29-2431, 29-2432, 29-2434, 29-2436, 30-1714, 30-1718, 30-1853, 30-1855, 30-1856, 30-1857, 31-1950, 36-1568, 36-1569, 36-1570, 36-1571, 36-1574, 36-1575, 36-1576, 36-1577, 36-1578, 36-1580, 36-2530, 36-2532, 36-2533, 36-2534, 36-2537, 36-2538, 36-2539, 36-2540, 36-2542, 36-2543, 36-2544, 36-2545, 36-2547, 36-2548, 36-2551, 36-2552, 42-2582, 42-2587, 42-2588, 42-2589, 42-2590, 42-2591, 57-1582, 57-1584, 57-1586, 57-1993, 59-2348, 59-2350, 59-2351, 59-2352, 59-2354, 59-2355, 59-2363, 59-2372, 61-1722, 61-1725, 61-1727, 61-1728, 61-1730, 61-1733, 61-1736, 61-1737, 61-1738, 61-1740, 61-1741, 61-1895, 61-1899, 61-1900, 61-1901, 61-1903, 61-1904, 61-1906, 61-1907, 61-1910, 61-1911, 61-1913, 61-1914, 61-1915, 61-1995, 61-1998, 61-2000, 61-2002, 61-2003, 61-2008, 
61-2009, 61-2012, 61-2016, 61-2092, 61-2094, 61-2095, 61-2097, 61-2101, 61-2102, 61-2104, 61-2109, 61-2110, 61-2111, 61-2113, 61-2610, 61-2611, 61-2612, 61-2613, 61-2614.
Pancreatic Cancer (184 samples):
Source W2 = [78]:
IPMN 11, IPMN 12, IPMN 20, IPMN 21, IPMN 36, IPMN 4, IPMN 41, MCN 162, MCN 163, MCN 164, MCN 166, MCN 168, MCN 169, MCN 170, SCA 14, SCA 23, SCA 27, SCA 35, SCA 37, SCA 38, SCA 40, SPN 8.
Source J2 = [79]. Sample IDs are of the form PanNET*, where * is:
10PT, 21PT, 23PT, 24PT, 25PT, 31PT, 36PT, 3PT, 7PT, 93PT.
Source T17 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
2L-AAQA, 2L-AAQE, 2L-AAQI, 2L-AAQJ, 2L-AAQL, 2L-AAQM, 3A-A9I5, 3A-A9I7, 3A-A9I9, 3A-A9IB, 3A-A9IC, 3A-A9IH, 3A-A9IJ, 3A-A9IL, 3A-A9IN, 3A-A9IO, 3A-A9IR, 3A-A9IS, 3A-A9IU, 3E-AAAY, 3E-AAAZ, F2-6879, F2-6880, F2-7273, F2-7276, F2-A44G, F2-A44H, F2-A7TX, F2-A8YN, FB-A4P5, FB-A4P6, FB-A545, FB-A5VM, FB-A78T, FB-A7DR, FB-AAPS, FQ-6551, FQ-6552, FQ-6553, FQ-6554, FQ-6555, FQ-6558, FQ-6559, FZ-5919, FZ-5920, FZ-5921, FZ-5922, FZ-5923, FZ-5924, FZ-5926, H6-8124, H6-A45N, H8-A6C1, HV-A5A3, HV-A5A4, HV-A5A5, HV-A5A6, HV-A7OL, HV-A7OP, HV-AA8X, HZ-7289, HZ-7918, HZ-7919, HZ-7920, HZ-7922, HZ-7923, HZ-7924, HZ-7925, HZ-7926, HZ-8001, HZ-8002, HZ-8003, HZ-8005, HZ-8315, HZ-8317, HZ-8636, HZ-8637, HZ-8638, HZ-A49G, HZ-A49H, HZ-A49I, HZ-A4BH, HZ-A4BK, HZ-A77O, HZ-A77P, HZ-A77Q, HZ-A8P0, HZ-A8P1, IB-7644, IB-7645, IB-7646, IB-7647, IB-7649, IB-7651, IB-7652, IB-7654, IB-7885, IB-7886, IB-7887, IB-7888, IB-7889, IB-7890, IB-7891, IB-7893, IB-7897, IB-8126, IB-8127, IB-A5SO, IB-A5SP, IB-A5SQ, IB-A5SS, IB-A5ST, IB-A6UF, IB-A6UG, IB-A7LX, IB-A7M4, IB-AAUM, IB-AAUN, IB-AAUO, IB-AAUP, IB-AAUR, IB-AAUS, IB-AAUT, IB-AAUU, IB-AAUV, IB-AAUW, LB-A7SX, LB-A8F3, LB-A9Q5, M8-A5N4, OE-A75W, PZ-A5RE, Q3-A5QY, Q3-AA2A, RB-A7B8, RB-AA9M, RL-AAAS, S4-A8RM, S4-A8RO, S4-A8RP, US-A774, US-A776, US-A779, US-A77E, US-A77G, US-A77J, XD-AAUL, XN-A8T3, XN-A8T5, YB-A89D, YH-A8SY, YY-A8LH.
Pheochromocytoma and Paraganglioma (178 samples):
Source T18 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
P7-A5NX, P7-A5NY, P8-A5KC, P8-A5KD, P8-A6RX, P8-A6RY, PR-A5PF, PR-A5PG, PR-A5PH, QR-A6GO, QR-A6GR, QR-A6GS, QR-A6GT, QR-A6GU, QR-A6GW, QR-A6GX, QR-A6GY, QR-A6GZ, QR-A6H0, QR-A6H1, QR-A6H2, QR-A6H3, QR-A6H4, QR-A6H5, QR-A6H6, QR-A6ZZ, QR-A702, QR-A703, QR-A705, QR-A706, QR-A707, QR-A708, QR-A70A, QR-A70C, QR-A70D, QR-A70E, QR-A70G, QR-A70H, QR-A70I, QR-A70J, QR-A70K, QR-A70M, QR-A70N, QR-A70O, QR-A70P, QR-A70Q, QR-A70R, QR-A70T, QR-A70U, QR-A70V, QR-A70W, QR-A70X, QR-A7IN, QR-A7IP, QT-A5XJ, QT-A5XK, QT-A5XL, QT-A5XM, QT-A5XN, QT-A5XO, QT-A5XP, QT-A69Q, QT-A7U0, RM-A68T, RM-A68W, RT-A6Y9, RT-A6YA, RT-A6YC, RW-A67V, RW-A67W, RW-A67X, RW-A67Y, RW-A680, RW-A681, RW-A684, RW-A685, RW-A686, RW-A688, RW-A689, RW-A68A, RW-A68B, RW-A68C, RW-A68D, RW-A68F, RW-A68G, RW-A7CZ, RW-A7D0, RW-A8AZ, RX-A8JQ, S7-A7WL, S7-A7WM, S7-A7WN, S7-A7WO, S7-A7WP, S7-A7WQ, S7-A7WR, S7-A7WT, S7-A7WU, S7-A7WV, S7-A7WW, S7-A7WX, S7-A7X0, S7-A7X1, S7-A7X2, SA-A6C2, SP-A6QC, SP-A6QD, SP-A6QF, SP-A6QG, SP-A6QH, SP-A6QI, SP-A6QJ, SP-A6QK, SQ-A6I4, SQ-A6I6, SR-A6MP, SR-A6MQ, SR-A6MR, SR-A6MS, SR-A6MT, SR-A6MU, SR-A6MV, SR-A6MX, SR-A6MY, SR-A6MZ, SR-A6N0, TT-A6YJ, TT-A6YK, TT-A6YN, TT-A6YO, TT-A6YP, W2-A7H5, W2-A7H7, W2-A7HA, W2-A7HB, W2-A7HC, W2-A7HD, W2-A7HE, W2-A7HF, W2-A7HH, W2-A7UY, WB-A80K, WB-A80L, WB-A80M, WB-A80N, WB-A80O, WB-A80P, WB-A80Q, WB-A80V, WB-A80Y, WB-A814, WB-A815, WB-A816, WB-A817, WB-A818, WB-A819, WB-A81A, WB-A81D, WB-A81E, WB-A81F, WB-A81G, WB-A81H, WB-A81I, WB-A81J, WB-A81K, WB-A81M, WB-A81N, WB-A81P, WB-A81Q, WB-A81R, WB-A81S, WB-A81T, WB-A81V, WB-A81W, WB-A820, WB-A821, WB-A822, XG-A823.
Prostate Cancer (480 samples):
Source B2 = [80]. Sample IDs are of the form P0*-Tumor, where * is:
0-000450, 1-28, 2-1562, 2-2035, 3-1334, 3-1426, 3-1906, 3-2345, 3-2620, 3-3391, 3-595, 3-871, 4-1084, 4-1243, 4-1421, 4-1790, 4-2599, 4-2641, 4-2666, 4-2740, 4-47, 4-594, 5-2212, 5-2594, 5-3436, 5-3829, 5-3852, 5-3859, 5-620, 6-1125, 6-1696, 6-2325, 6-3676, 6-3939, 6-4428, 7-144, 7-360, 7-5036, 7-684, 7-718, 7-837, 8-2516, 8-590, 9-120, 9-1372, 9-1580, 9-2497, 9-649.
Source B3 = [81]. Sample IDs are of the form PR-*, where * is:
0508, 0581, 1701, 1783, 2832, 3027, 3043.
Remaining sample IDs are of the form PR-*-Tumor, where * is:
00-1165, 00-160, 00-1823, 0099, 01-1934, 01-2382, 01-2492, 01-2554, 02-1082, 02-169, 02-1736, 02-1899, 02-2072, 02-2480, 02-254, 03-022, 03-1026, 03-870, 04-1367, 04-194, 04-3113, 04-3222, 04-3347, 04-639, 04-903, 0415, 0427, 05-3440, 05-3595, 05-839, 06-1651, 06-1749, 06-1999, 09-2517, 09-2744, 09-2767, 09-3421, 09-3566, 09-3687, 09-5094, 09-5245, 09-5446, 09-5630, 09-5700, 09-5702, 1024, 1043, 2661, 2682, 2740, 2761, 2762, 2858, 2872, 2915, 2916, 3023, 3026, 3034, 3035, 3036, 3048, 3051, 3127.
Source G2 = [82] (below * stands for WA, e.g., *10 = WA10):
T12, T32, T8, T90, T91, T92, T93, T94, T95, T96, T97, *10, *11, *12, *13, *14, *15, *16, *17, *18, *19, *20, *22, *23, *24, *25, *26, *27, *28, *29, *3, *30, *31, *32, *33, *35, *37, *38, *39, *40, *41, *42, *43-27, *43-44, *43-71, *46, *47, *48, *49, *50, *51, *52, *53, *54, *55, *56, *57, *58, *59, *60, *7.
Source T19 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
CH-5737, CH-5738, CH-5739, CH-5740, CH-5741, CH-5743, CH-5744, CH-5745, CH-5746, CH-5748, CH-5750, CH-5751, CH-5752, CH-5753, CH-5754, CH-5761, CH-5762, CH-5763, CH-5764, CH-5765, CH-5766, CH-5767, CH-5768, CH-5769, CH-5771, CH-5772, CH-5788, CH-5789, CH-5790, CH-5791, CH-5792, CH-5794, EJ-5494, EJ-5495, EJ-5496, EJ-5497, EJ-5498, EJ-5499, EJ-5501, EJ-5502, EJ-5503, EJ-5504, EJ-5505, EJ-5506, EJ-5507, EJ-5508, EJ-5509, EJ-5510, EJ-5511, EJ-5512, EJ-5514, EJ-5515, EJ-5516, EJ-5517, EJ-5518, EJ-5519, EJ-5521, EJ-5522, EJ-5524, EJ-5525, EJ-5526, EJ-5527, EJ-5530, EJ-5531, EJ-5532, EJ-5542, EJ-7115, EJ-7123, EJ-7125, EJ-7218, EJ-7312, EJ-7314, EJ-7315, EJ-7317, EJ-7318, EJ-7321, EJ-7325, EJ-7327, EJ-7328, EJ-7330, EJ-7331, EJ-7781, EJ-7782, EJ-7783, EJ-7784, EJ-7785, EJ-7786, EJ-7788, EJ-7789, EJ-7791, EJ-7792, EJ-7793, EJ-7794, EJ-7797, EJ-8468, EJ-8469, EJ-8470, EJ-8472, EJ-8474, EJ-A46B, EJ-A46D, EJ-A46E, EJ-A46F, EJ-A46G, EJ-A46H, EJ-A46I, EJ-A65B, EJ-A65D, EJ-A65E, EJ-A65F, EJ-A65G, EJ-A65J, EJ-A65M, EJ-A6RA, EJ-A6RC, EJ-A7NF, EJ-A7NG, EJ-A7NH, EJ-A7NM, EJ-A7NN, FC-7708, FC-7961, FC-A4JI, FC-A5OB, FC-A66V, FC-A6HD, G9-6329, G9-6332, G9-6333, G9-6336, G9-6338, G9-6339, G9-6342, G9-6343, G9-6347, G9-6348, G9-6351, G9-6353, G9-6354, G9-6356, G9-6361, G9-6362, G9-6363, G9-6364, G9-6365, G9-6366, G9-6367, G9-6369, G9-6370, G9-6371, G9-6373, G9-6377, G9-6378, G9-6379, G9-6384, G9-6385, G9-6494, G9-6496, G9-6498, G9-6499, G9-7510, G9-7519, G9-7521, G9-7522, G9-7523, G9-7525, H9-7775, H9-A6BX, H9-A6BY, HC-7075, HC-7077, HC-7078, HC-7079, HC-7080, HC-7081, HC-7209, HC-7210, HC-7211, HC-7212, HC-7213, HC-7230, HC-7231, HC-7232, HC-7233, HC-7736, HC-7737, HC-7738, HC-7740, HC-7742, HC-7744, HC-7745, HC-7747, HC-7748, HC-7749, HC-7750, HC-7752, HC-7817, HC-7818, HC-7819, HC-7820, HC-7821, HC-8213, HC-8216, HC-8256, HC-8257, HC-8258, HC-8259, HC-8260, HC-8261, HC-8262, HC-8264, HC-8265, HC-8266, HC-A48F, HC-A4ZV, HC-A631, HC-A632, HC-A6AL, HC-A6AN, HC-A6AO, HC-A6AP, HC-A6AQ, 
HC-A6AS, HC-A6HX, HC-A6HY, HC-A76W, HC-A76X, HI-7168, HI-7169, HI-7170, HI-7171, J4-8198, J4-8200, J4-A67K, J4-A67L, J4-A67M, J4-A67N, J4-A67O, J4-A67Q, J4-A67R, J4-A67S, J4-A67T, J4-A6G1, J4-A6G3, J4-A6M7, J9-A52B, J9-A52C, J9-A52D, J9-A52E, KC-A4BL, KC-A4BN, KC-A4BO, KC-A4BR, KC-A4BV, KC-A7F3, KC-A7F5, KC-A7F6, KC-A7FA, KC-A7FD, KC-A7FE, KK-A59V, KK-A59X, KK-A59Y, KK-A59Z, KK-A5A1, KK-A6DY, KK-A6E0, KK-A6E1, KK-A6E2, KK-A6E3, KK-A6E4, KK-A6E5, KK-A6E6, KK-A6E7, KK-A6E8, KK-A7AP, KK-A7AQ, KK-A7AU, KK-A7AV, KK-A7AW, KK-A7AY, KK-A7AZ, KK-A7B0, KK-A7B1, KK-A7B2, KK-A7B3, KK-A7B4, M7-A71Y, M7-A71Z, M7-A720, M7-A721, M7-A723, M7-A724, M7-A725, QU-A6IL, QU-A6IM, QU-A6IN, QU-A6IO, QU-A6IP, SU-A7E7.
Rectum Adenocarcinoma (115 samples):
Source T20 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
AF-2687, AF-2689, AF-2690, AF-2691, AF-2692, AF-2693, AF-3400, AF-3911, AF-4110, AF-5654, AF-6136, AF-6655, AF-6672, AG-3574, AG-3575, AG-3578, AG-3580, AG-3581, AG-3582, AG-3583, AG-3584, AG-3586, AG-3587, AG-3591, AG-3592, AG-3593, AG-3594, AG-3598, AG-3599, AG-3600, AG-3601, AG-3602, AG-3605, AG-3608, AG-3609, AG-3611, AG-3612, AG-3725, AG-3731, AG-3732, AG-3742, AG-4021, AG-4022, AG-A002, AG-A008, AG-A00C, AG-A00Y, AG-A011, AG-A014, AG-A015, AG-A016, AG-A01L, AH-6544, AH-6643, AH-6644, AH-6897, AH-6903, BM-6198, CI-6619, CI-6620, CI-6621, CI-6622, CI-6624, CL-4957, CL-5917, CL-5918, DC-4749, DC-5337, DC-5869, DC-6154, DC-6155, DC-6157, DC-6158, DC-6681, DC-6682, DC-6683, DT-5265, DY-A0XA, DY-A1DC, DY-A1DD, DY-A1DF, DY-A1DG, DY-A1H8, EF-5830, EI-6506, EI-6507, EI-6508, EI-6509, EI-6510, EI-6511, EI-6512, EI-6513, EI-6514, EI-6881, EI-6882, EI-6883, EI-6884, EI-6885, EI-6917, EI-7002, EI-7004, F5-6464, F5-6465, F5-6571, F5-6702, F5-6812, F5-6813, F5-6814, F5-6861, F5-6863, F5-6864, G5-6233, G5-6235, G5-6572, G5-6641.
Renal Cell Carcinoma (709 samples):
Source G3 = [83]:
K1, K20, K27, K29, K3, K31, K32, K38, K44, K48, T127, T142, T144, T163, T164, T166, T183M.
Source T21 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
A3-3308, A3-3311, A3-3313, A3-3316, A3-3317, A3-3319, A3-3320, A3-3322, A3-3323, A3-3324, A3-3326, A3-3331, A3-3346, A3-3347, A3-3349, A3-3351, A3-3357, A3-3358, A3-3362, A3-3363, A3-3365, A3-3367, A3-3370, A3-3372, A3-3373, A3-3374, A3-3376, A3-3378, A3-3380, A3-3382, A3-3383, A3-3385, A3-3387, A4-7286, A4-7287, A4-7288, A4-7583, A4-7584, A4-7585, A4-7732, A4-7734, A4-7828, A4-7915, A4-7996, A4-7997, A4-8098, A4-8310, A4-8311, A4-8312, A4-8515, A4-8516, A4-8517, A4-8518, A4-8630, A4-A48D, A4-A4ZT, A4-A57E, A4-A5DU, A4-A5XZ, A4-A5Y0, A4-A5Y1, A4-A6HP, AK-3425, AK-3427, AK-3428, AK-3429, AK-3430, AK-3431, AK-3434, AK-3436, AK-3440, AK-3443, AK-3444, AK-3445, AK-3447, AK-3450, AK-3451, AK-3453, AK-3454, AK-3455, AK-3456, AK-3458, AK-3460, AK-3461, AK-3465, AL-3466, AL-3467, AL-3468, AL-3472, AL-3473, AL-7173, AL-A5DJ, AS-3777, AS-3778, AT-A5NU, B0-4690, B0-4691, B0-4693, B0-4694, B0-4697, B0-4700, B0-4703, B0-4706, B0-4707, B0-4710, B0-4712, B0-4713, B0-4714, B0-4718, B0-4810, B0-4811, B0-4813, B0-4814, B0-4815, B0-4816, B0-4817, B0-4818, B0-4819, B0-4822, B0-4823, B0-4824, B0-4827, B0-4828, B0-4833, B0-4836, B0-4837, B0-4838, B0-4839, B0-4841, B0-4842, B0-4843, B0-4844, B0-4845, B0-4846, B0-4847, B0-4848, B0-4849, B0-4852, B0-4945, B0-5075, B0-5077, B0-5080, B0-5081, B0-5083, B0-5084, B0-5085, B0-5088, B0-5092, B0-5094, B0-5095, B0-5096, B0-5097, B0-5098, B0-5099, B0-5100, B0-5102, B0-5104, B0-5106, B0-5107, B0-5108, B0-5109, B0-5110, B0-5113, B0-5115, B0-5116, B0-5117, B0-5119, B0-5120, B0-5121, B0-5399, B0-5400, B0-5402, B0-5691, B0-5692, B0-5693, B0-5694, B0-5695, B0-5696, B0-5697, B0-5698, B0-5699, B0-5701, B0-5702, B0-5703, B0-5705, B0-5706, B0-5707, B0-5709, B0-5710, B0-5711, B0-5712, B0-5713, B0-5812, B1-5398, B1-A47M, B1-A47N, B1-A47O, B1-A654, B1-A655, B1-A656, B1-A657, B2-3923, B2-3924, B2-4098, B2-4099, B2-4101, B2-4102, B2-5633, B2-5635, B2-5641, B3-3925, B3-3926, B3-4103, B3-4104, B3-8121, B4-5377, B4-5832, B4-5834, B4-5835, B4-5836, B4-5838, B4-5843, 
B4-5844, B8-4143, B8-4146, B8-4148, B8-4151, B8-4153, B8-4154, B8-4619, B8-4620, B8-4621, B8-4622, B8-5158, B8-5159, B8-5162, B8-5163, B8-5164, B8-5165, B8-5545, B8-5546, B8-5549, B8-5550, B8-5551, B8-5552, B8-5553, B9-4113, B9-4114, B9-4115, B9-4116, B9-4117, B9-4617, B9-5155, B9-5156, B9-7268, B9-A44B, B9-A5W7, B9-A5W8, B9-A5W9, B9-A69E, BP-4158, BP-4159, BP-4160, BP-4161, BP-4162, BP-4163, BP-4164, BP-4165, BP-4166, BP-4167, BP-4169, BP-4170, BP-4173, BP-4174, BP-4176, BP-4177, BP-4326, BP-4329, BP-4330, BP-4331, BP-4337, BP-4338, BP-4340, BP-4341, BP-4342, BP-4343, BP-4345, BP-4346, BP-4347, BP-4349, BP-4351, BP-4352, BP-4354, BP-4355, BP-4756, BP-4758, BP-4759, BP-4760, BP-4761, BP-4762, BP-4763, BP-4765, BP-4766, BP-4768, BP-4770, BP-4771, BP-4774, BP-4775, BP-4777, BP-4781, BP-4782, BP-4787, BP-4789, BP-4790, BP-4795, BP-4797, BP-4798, BP-4799, BP-4801, BP-4803, BP-4804, BP-4807, BP-4960, BP-4961, BP-4962, BP-4963, BP-4964, BP-4965, BP-4967, BP-4968, BP-4969, BP-4970, BP-4971, BP-4972, BP-4973, BP-4974, BP-4975, BP-4976, BP-4977, BP-4981, BP-4982, BP-4983, BP-4985, BP-4986, BP-4987, BP-4988, BP-4989, BP-4991, BP-4992, BP-4993, BP-4994, BP-4995, BP-4998, BP-4999, BP-5000, BP-5001, BP-5004, BP-5006, BP-5007, BP-5008, BP-5009, BP-5010, BP-5168, BP-5169, BP-5170, BP-5173, BP-5174, BP-5175, BP-5176, BP-5177, BP-5178, BP-5180, BP-5181, BP-5182, BP-5183, BP-5184, BP-5185, BP-5186, BP-5187, BP-5189, BP-5190, BP-5191, BP-5192, BP-5194, BP-5195, BP-5196, BP-5198, BP-5199, BP-5200, BP-5201, BP-5202, BQ-5875, BQ-5876, BQ-5877, BQ-5878, BQ-5879, BQ-5880, BQ-5881, BQ-5882, BQ-5883, BQ-5884, BQ-5885, BQ-5886, BQ-5887, BQ-5888, BQ-5889, BQ-5890, BQ-5891, BQ-5892, BQ-5893, BQ-5894, BQ-7044, BQ-7045, BQ-7046, BQ-7048, BQ-7049, BQ-7050, BQ-7051, BQ-7053, BQ-7055, BQ-7056, BQ-7058, BQ-7059, BQ-7060, BQ-7061, BQ-7062, CJ-4634, CJ-4635, CJ-4636, CJ-4637, CJ-4638, CJ-4639, CJ-4640, CJ-4641, CJ-4643, CJ-4644, CJ-4868, CJ-4869, CJ-4870, CJ-4871, CJ-4872, CJ-4873, CJ-4874, CJ-4875, 
CJ-4876, CJ-4878, CJ-4881, CJ-4882, CJ-4884, CJ-4885, CJ-4886, CJ-4887, CJ-4888, CJ-4889, CJ-4890, CJ-4891, CJ-4892, CJ-4893, CJ-4894, CJ-4895, CJ-4897, CJ-4899, CJ-4900, CJ-4901, CJ-4902, CJ-4903, CJ-4904, CJ-4905, CJ-4907, CJ-4908, CJ-4912, CJ-4913, CJ-4916, CJ-4918, CJ-4920, CJ-4923, CJ-5671, CJ-5672, CJ-5675, CJ-5676, CJ-5677, CJ-5678, CJ-5679, CJ-5680, CJ-5681, CJ-5682, CJ-5683, CJ-5684, CJ-5686, CJ-6027, CJ-6028, CJ-6030, CJ-6031, CJ-6032, CJ-6033, CW-5580, CW-5581, CW-5583, CW-5584, CW-5585, CW-5588, CW-5589, CW-5591, CW-6087, CW-6090, CW-6093, CW-6097, CZ-4853, CZ-4854, CZ-4856, CZ-4857, CZ-4858, CZ-4859, CZ-4861, CZ-4862, CZ-4863, CZ-4865, CZ-4866, CZ-5451, CZ-5452, CZ-5453, CZ-5454, CZ-5455, CZ-5456, CZ-5457, CZ-5458, CZ-5459, CZ-5460, CZ-5461, CZ-5462, CZ-5463, CZ-5464, CZ-5465, CZ-5466, CZ-5467, CZ-5468, CZ-5469, CZ-5470, CZ-5982, CZ-5984, CZ-5985, CZ-5986, CZ-5987, CZ-5988, CZ-5989, DV-5565, DV-5566, DV-5568, DV-5569, DV-5574, DV-5575, DV-5576, DW-5560, DW-5561, DW-7834, DW-7837, DW-7838, DW-7839, DW-7840, DW-7841, DW-7842, DW-7963, DZ-6131, DZ-6132, DZ-6133, DZ-6134, DZ-6135, EU-5904, EU-5905, EU-5906, EU-5907, EV-5901, EV-5902, EV-5903, F9-A4JJ, G7-6789, G7-6790, G7-6792, G7-6793, G7-6795, G7-6796, G7-6797, G7-7501, G7-7502, G7-A4TM, GL-6846, GL-7773, GL-7966, GL-8500, GL-A4EM, GL-A59R, GL-A59T, HE-7128, HE-7129, HE-7130, HE-A5NF, HE-A5NH, HE-A5NI, HE-A5NJ, HE-A5NK, HE-A5NL, IA-A40U, IA-A40X, IA-A40Y, IZ-8195, IZ-8196, IZ-A6M8, IZ-A6M9, J7-6720, J7-8537, KL-8323, KL-8324, KL-8325, KL-8326, KL-8327, KL-8328, KL-8329, KL-8330, KL-8331, KL-8332, KL-8333, KL-8334, KL-8335, KL-8336, KL-8337, KL-8338, KL-8339, KL-8340, KL-8341, KL-8342, KL-8343, KL-8344, KL-8345, KL-8346, KM-8438, KM-8439, KM-8440, KM-8441, KM-8442, KM-8443, KM-8476, KM-8477, KM-8639, KN-8418, KN-8419, KN-8421, KN-8422, KN-8423, KN-8424, KN-8425, KN-8426, KN-8427, KN-8428, KN-8429, KN-8430, KN-8431, KN-8432, KN-8433, KN-8434, KN-8435, KN-8436, KN-8437, KO-8403, KO-8404, KO-8405, KO-8406, 
KO-8407, KO-8408, KO-8409, KO-8410, KO-8411, KO-8413, KO-8414, KO-8415, KO-8416, KO-8417, KV-A6GD, KV-A6GE, MH-A55W, MH-A55Z, MH-A560, MH-A561, MH-A562, P4-A5E6, P4-A5E7, P4-A5E8, P4-A5EA, P4-A5EB, P4-A5ED, PJ-A5Z8, PJ-A5Z9, Q2-A5QZ.
Sarcoma (255 samples):
Source T22 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
3B-A9HI, 3B-A9HJ, 3B-A9HL, 3B-A9HO, 3B-A9HP, 3B-A9HQ, 3B-A9HR, 3B-A9HS, 3B-A9HT, 3B-A9HU, 3B-A9HV, 3B-A9HX, 3B-A9HY, 3B-A9HZ, 3B-A9I0, 3B-A9I1, 3B-A9I3, 3R-A8YX, DX-A1KU, DX-A1KW, DX-A1KX, DX-A1KY, DX-A1KZ, DX-A1L0, DX-A1L1, DX-A1L2, DX-A1L3, DX-A1L4, DX-A23R, DX-A23T, DX-A23U, DX-A23V, DX-A23Y, DX-A240, DX-A2IZ, DX-A2J0, DX-A2J1, DX-A2J4, DX-A3LS, DX-A3LT, DX-A3LU, DX-A3LW, DX-A3LY, DX-A3M1, DX-A3M2, DX-A3U5, DX-A3U6, DX-A3U7, DX-A3U8, DX-A3U9, DX-A3UA, DX-A3UB, DX-A3UC, DX-A3UD, DX-A3UE, DX-A3UF, DX-A48J, DX-A48K, DX-A48L, DX-A48N, DX-A48O, DX-A48P, DX-A48R, DX-A48U, DX-A48V, DX-A6B7, DX-A6B8, DX-A6B9, DX-A6BA, DX-A6BB, DX-A6BE, DX-A6BF, DX-A6BG, DX-A6BH, DX-A6BK, DX-A6YQ, DX-A6YR, DX-A6YS, DX-A6YT, DX-A6YU, DX-A6YV, DX-A6YX, DX-A6YZ, DX-A6Z0, DX-A6Z2, DX-A7EF, DX-A7EI, DX-A7EL, DX-A7EM, DX-A7EN, DX-A7EO, DX-A7EQ, DX-A7ER, DX-A7ES, DX-A7ET, DX-A7EU, DX-A8BG, DX-A8BH, DX-A8BJ, DX-A8BK, DX-A8BL, DX-A8BM, DX-A8BN, DX-A8BO, DX-A8BP, DX-A8BR, DX-A8BT, DX-A8BU, DX-A8BV, DX-A8BX, DX-A8BZ, DX-AATS, DX-AB2E, DX-AB2F, DX-AB2G, DX-AB2H, DX-AB2J, DX-AB2L, DX-AB2O, DX-AB2P, DX-AB2Q, DX-AB2S, DX-AB2T, DX-AB2V, DX-AB2W, DX-AB2X, DX-AB2Z, DX-AB30, DX-AB32, DX-AB35, DX-AB36, DX-AB37, DX-AB3A, DX-AB3B, DX-AB3C, FX-A2QS, FX-A3NJ, FX-A3NK, FX-A3RE, FX-A3TO, FX-A48G, FX-A76Y, FX-A8OO, HB-A2OT, HB-A3L4, HB-A3YV, HB-A43Z, HB-A5W3, HS-A5N7, HS-A5N8, HS-A5N9, IE-A3OV, IE-A4EH, IE-A4EI, IE-A4EJ, IE-A4EK, IE-A6BZ, IF-A4AJ, IF-A4AK, IS-A3K6, IS-A3K7, IS-A3K8, IS-A3KA, IW-A3M4, IW-A3M5, IW-A3M6, JV-A5VE, JV-A5VF, JV-A75J, K1-A3PN, K1-A3PO, K1-A42W, K1-A42X, K1-A6RT, K1-A6RU, K1-A6RV, KD-A5QS, KD-A5QT, KD-A5QU, KF-A41W, LI-A67I, LI-A9QH, MB-A5Y8, MB-A5Y9, MB-A5YA, MB-A8JK, MB-A8JL, MJ-A68H, MJ-A68J, MJ-A850, MO-A47P, MO-A47R, N1-A6IA, PC-A5DK, PC-A5DL, PC-A5DM, PC-A5DN, PC-A5DO, PC-A5DP, QC-A6FX, QC-A7B5, QC-AA9N, QQ-A5V2, QQ-A5V9, QQ-A5VA, QQ-A5VB, QQ-A5VC, QQ-A5VD, QQ-A8VB, QQ-A8VD, QQ-A8VF, QQ-A8VG, QQ-A8VH, RN-A68Q, RN-AAAQ, SG-A6Z4, SG-A6Z7, SG-A849, SI-A71O, SI-A71P, SI-A71Q, SI-AA8B, 
SI-AA8C, UE-A6QT, UE-A6QU, VT-A80G, VT-A80J, VT-AB3D, WK-A8XO, WK-A8XQ, WK-A8XS, WK-A8XT, WK-A8XX, WK-A8XY, WK-A8XZ, WK-A8Y0, WP-A9GB, X2-A95T, X6-A7W8, X6-A7WA, X6-A7WB, X6-A7WC, X6-A7WD, X6-A8C2, X6-A8C3, X6-A8C4, X6-A8C5, X6-A8C6, X6-A8C7, X9-A971, X9-A973, Z4-A8JB, Z4-A9VC, Z4-AAPF, Z4-AAPG.
Testicular Germ Cell Tumors (150 samples):
Source T23 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
2G-AAEW, 2G-AAEX, 2G-AAF1, 2G-AAF4, 2G-AAF6, 2G-AAF8, 2G-AAFE, 2G-AAFG, 2G-AAFH, 2G-AAFI, 2G-AAFJ, 2G-AAFL, 2G-AAFM, 2G-AAFN, 2G-AAFO, 2G-AAFV, 2G-AAFY, 2G-AAFZ, 2G-AAG0, 2G-AAG3, 2G-AAG5, 2G-AAG6, 2G-AAG7, 2G-AAG8, 2G-AAG9, 2G-AAGA, 2G-AAGC, 2G-AAGE, 2G-AAGF, 2G-AAGG, 2G-AAGI, 2G-AAGJ, 2G-AAGK, 2G-AAGM, 2G-AAGN, 2G-AAGO, 2G-AAGP, 2G-AAGS, 2G-AAGT, 2G-AAGV, 2G-AAGW, 2G-AAGX, 2G-AAGY, 2G-AAGZ, 2G-AAH0, 2G-AAH2, 2G-AAH3, 2G-AAH4, 2G-AAH8, 2G-AAHA, 2G-AAHC, 2G-AAHG, 2G-AAHL, 2G-AAHN, 2G-AAHP, 2G-AAHT, 2G-AAKD, 2G-AAKG, 2G-AAKH, 2G-AAKL, 2G-AAKM, 2G-AAKO, 2G-AAL5, 2G-AAL7, 2G-AALF, 2G-AALG, 2G-AALN, 2G-AALO, 2G-AALP, 2G-AALQ, 2G-AALR, 2G-AALS, 2G-AALT, 2G-AALW, 2G-AALX, 2G-AALY, 2G-AALZ, 2G-AAM2, 2G-AAM3, 2G-AAM4, 2X-A9D5, 2X-A9D6, 4K-AA1G, 4K-AA1H, 4K-AA1I, 4K-AAAL, S6-A8JW, S6-A8JX, S6-A8JY, SB-A6J6, SB-A76C, SN-A6IS, SN-A84W, SN-A84X, SN-A84Y, SO-A8JP, VF-A8A8, VF-A8A9, VF-A8AA, VF-A8AB, VF-A8AC, VF-A8AD, VF-A8AE, W4-A7U2, W4-A7U3, W4-A7U4, WZ-A7V3, WZ-A7V4, WZ-A7V5, WZ-A8D5, X3-A8G4, XE-A8H1, XE-A8H4, XE-A8H5, XE-A9SE, XE-AANI, XE-AANJ, XE-AANR, XE-AANV, XE-AAO3, XE-AAO4, XE-AAO6, XE-AAOB, XE-AAOC, XE-AAOD, XE-AAOF, XE-AAOJ, XE-AAOL, XY-A89B, XY-A8S2, XY-A8S3, XY-A9T9, YU-A90P, YU-A90Q, YU-A90S, YU-A90W, YU-A90Y, YU-A912, YU-A94D, YU-A94I, YU-AA4L, YU-AA61, ZM-AA05, ZM-AA06, ZM-AA0B, ZM-AA0D, ZM-AA0E, ZM-AA0F, ZM-AA0H, ZM-AA0N.
Thymoma (123 samples):
Source T24 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
3G-AB0O, 3G-AB0Q, 3G-AB0T, 3G-AB14, 3G-AB19, 3Q-A9WF, 3S-A8YW, 3S-AAYX, 3T-AA9L, 4V-A9QI, 4V-A9QJ, 4V-A9QL, 4V-A9QM, 4V-A9QN, 4V-A9QQ, 4V-A9QR, 4V-A9QS, 4V-A9QT, 4V-A9QU, 4V-A9QW, 4V-A9QX, 4X-A9F9, 4X-A9FA, 4X-A9FB, 4X-A9FC, 4X-A9FD, 5G-A9ZZ, 5K-AAAP, 5U-AB0D, 5U-AB0E, 5U-AB0F, 5V-A9RR, X7-A8D6, X7-A8D7, X7-A8D8, X7-A8D9, X7-A8DB, X7-A8DC, X7-A8DD, X7-A8DE, X7-A8DF, X7-A8DG, X7-A8DI, X7-A8DJ, X7-A8M0, X7-A8M1, X7-A8M3, X7-A8M4, X7-A8M5, X7-A8M6, X7-A8M7, X7-A8M8, XH-A853, XM-A8R8, XM-A8R9, XM-A8RB, XM-A8RC, XM-A8RD, XM-A8RE, XM-A8RF, XM-A8RG, XM-A8RH, XM-A8RI, XM-A8RL, XM-AAZ1, XM-AAZ2, XM-AAZ3, XU-A92O, XU-A92Q, XU-A92R, XU-A92T, XU-A92U, XU-A92V, XU-A92W, XU-A92X, XU-A92Y, XU-A92Z, XU-A930, XU-A931, XU-A932, XU-A933, XU-A936, XU-AAXW, XU-AAXX, XU-AAXY, XU-AAXZ, XU-AAY0, XU-AAY1, YT-A95D, YT-A95E, YT-A95F, YT-A95G, YT-A95H, ZB-A961, ZB-A962, ZB-A963, ZB-A964, ZB-A965, ZB-A966, ZB-A969, ZB-A96A, ZB-A96B, ZB-A96C, ZB-A96D, ZB-A96E, ZB-A96F, ZB-A96G, ZB-A96H, ZB-A96I, ZB-A96K, ZB-A96L, ZB-A96M, ZB-A96O, ZB-A96P, ZB-A96Q, ZB-A96R, ZB-A96V, ZC-AAA7, ZC-AAAA, ZC-AAAF, ZC-AAAH, ZL-A9V6, ZT-A8OM.
Thyroid Cancer (409 samples):
Source T25 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
BJ-A0YZ, BJ-A0Z0, BJ-A0Z2, BJ-A0Z3, BJ-A0Z5, BJ-A0Z9, BJ-A0ZA, BJ-A0ZB, BJ-A0ZC, BJ-A0ZE, BJ-A0ZF, BJ-A0ZG, BJ-A0ZH, BJ-A0ZJ, BJ-A18Y, BJ-A18Z, BJ-A190, BJ-A191, BJ-A192, BJ-A28R, BJ-A28S, BJ-A28T, BJ-A28V, BJ-A28X, BJ-A28Z, BJ-A290, BJ-A2N7, BJ-A2N8, BJ-A2N9, BJ-A2NA, BJ-A2P4, BJ-A3EZ, BJ-A3F0, BJ-A3PR, BJ-A3PT, BJ-A3PU, BJ-A45D, BJ-A45E, BJ-A45F, BJ-A45G, BJ-A45I, BJ-A45J, BJ-A45K, BJ-A4O8, BJ-A4O9, CE-A13K, CE-A27D, CE-A3MD, CE-A3ME, CE-A482, CE-A484, CE-A485, DE-A0XZ, DE-A0Y2, DE-A0Y3, DE-A2OL, DE-A3KN, DE-A4M8, DE-A4M9, DJ-A13L, DJ-A13M, DJ-A13O, DJ-A13P, DJ-A13R, DJ-A13S, DJ-A13T, DJ-A13U, DJ-A13V, DJ-A13W, DJ-A13X, DJ-A1QD, DJ-A1QE, DJ-A1QF, DJ-A1QG, DJ-A1QH, DJ-A1QI, DJ-A1QL, DJ-A1QM, DJ-A1QN, DJ-A1QO, DJ-A1QQ, DJ-A2PN, DJ-A2PO, DJ-A2PP, DJ-A2PQ, DJ-A2PR, DJ-A2PS, DJ-A2PT, DJ-A2PU, DJ-A2PV, DJ-A2PW, DJ-A2PX, DJ-A2PY, DJ-A2PZ, DJ-A2Q0, DJ-A2Q1, DJ-A2Q2, DJ-A2Q3, DJ-A2Q4, DJ-A2Q5, DJ-A2Q6, DJ-A2Q7, DJ-A2Q8, DJ-A2Q9, DJ-A2QA, DJ-A2QB, DJ-A2QC, DJ-A3UK, DJ-A3UM, DJ-A3UN, DJ-A3UO, DJ-A3UP, DJ-A3UQ, DJ-A3UR, DJ-A3US, DJ-A3UT, DJ-A3UU, DJ-A3UV, DJ-A3UW, DJ-A3UX, DJ-A3UY, DJ-A3V7, DJ-A3VA, DJ-A3VB, DJ-A3VE, DJ-A3VF, DJ-A3VJ, DJ-A3VK, DJ-A3VL, DJ-A3VM, DJ-A4UL, DJ-A4UP, DJ-A4UT, DJ-A4UW, DJ-A4V0, DJ-A4V2, DJ-A4V4, DJ-A4V5, DO-A1JZ, DO-A1K0, DO-A2HM, E3-A3DY, E3-A3DZ, E3-A3E0, E3-A3E1, E3-A3E2, E3-A3E3, E3-A3E5, E8-A242, E8-A2EA, E8-A413, E8-A415, E8-A418, E8-A419, E8-A433, E8-A436, E8-A437, E8-A44K, E8-A44M, EL-A3CL, EL-A3CM, EL-A3CN, EL-A3CO, EL-A3CP, EL-A3CR, EL-A3CS, EL-A3CT, EL-A3CU, EL-A3CV, EL-A3CW, EL-A3CX, EL-A3CY, EL-A3CZ, EL-A3D0, EL-A3D1, EL-A3D4, EL-A3D5, EL-A3D6, EL-A3GO, EL-A3GP, EL-A3GQ, EL-A3GR, EL-A3GS, EL-A3GU, EL-A3GV, EL-A3GW, EL-A3GX, EL-A3GY, EL-A3GZ, EL-A3H1, EL-A3H2, EL-A3H3, EL-A3H4, EL-A3H5, EL-A3H7, EL-A3H8, EL-A3MW, EL-A3MX, EL-A3MY, EL-A3MZ, EL-A3N2, EL-A3N3, EL-A3T0, EL-A3T1, EL-A3T2, EL-A3T3, EL-A3T6, EL-A3T7, EL-A3T8, EL-A3T9, EL-A3TA, EL-A3TB, EL-A3ZH, EL-A3ZK, EL-A3ZN, EL-A3ZQ, EL-A3ZR, EL-A3ZT, EL-A4JV, EL-A4JW, EL-A4JX, EL-A4JZ, 
EL-A4K0, EL-A4K2, EL-A4K4, EL-A4K6, EL-A4KD, EL-A4KG, EL-A4KH, EL-A4KI, EM-A1CS, EM-A1CT, EM-A1CU, EM-A1CV, EM-A1CW, EM-A1YA, EM-A1YB, EM-A1YC, EM-A1YD, EM-A1YE, EM-A22I, EM-A22J, EM-A22K, EM-A22L, EM-A22M, EM-A22N, EM-A22O, EM-A22P, EM-A22Q, EM-A2CJ, EM-A2CK, EM-A2CL, EM-A2CM, EM-A2CN, EM-A2CO, EM-A2CP, EM-A2CQ, EM-A2CR, EM-A2CT, EM-A2CU, EM-A2OV, EM-A2OW, EM-A2OX, EM-A2OY, EM-A2OZ, EM-A2P0, EM-A2P1, EM-A2P2, EM-A2P3, EM-A3AI, EM-A3AJ, EM-A3AK, EM-A3AL, EM-A3AN, EM-A3AO, EM-A3AP, EM-A3AQ, EM-A3AR, EM-A3FJ, EM-A3FK, EM-A3FL, EM-A3FM, EM-A3FN, EM-A3FO, EM-A3FP, EM-A3FQ, EM-A3FR, EM-A3O3, EM-A3O6, EM-A3O7, EM-A3O8, EM-A3O9, EM-A3OA, EM-A3OB, EM-A4FK, EM-A4FM, EM-A4FO, EM-A4FQ, EM-A4FR, EM-A4FV, EM-A4G1, ET-A25G, ET-A25I, ET-A25J, ET-A25K, ET-A25O, ET-A25R, ET-A2MY, ET-A2MZ, ET-A2N0, ET-A2N1, ET-A2N4, ET-A2N5, ET-A39I, ET-A39J, ET-A39K, ET-A39L, ET-A39M, ET-A39N, ET-A39O, ET-A39P, ET-A39R, ET-A39S, ET-A39T, ET-A3BN, ET-A3BO, ET-A3BP, ET-A3BQ, ET-A3BS, ET-A3BT, ET-A3BU, ET-A3BV, ET-A3BW, ET-A3BX, ET-A3DO, ET-A3DP, ET-A3DQ, ET-A3DR, ET-A3DS, ET-A3DT, ET-A3DU, ET-A3DV, ET-A3DW, ET-A40S, ET-A4KN, FE-A22Z, FE-A230, FE-A231, FE-A232, FE-A233, FE-A234, FE-A235, FE-A236, FE-A237, FE-A238, FE-A23A, FE-A3PA, FE-A3PB, FE-A3PC, FE-A3PD, FK-A3S3, FK-A3SB, FK-A3SD, FK-A3SE, FK-A3SG, FK-A3SH, FY-A2QD, FY-A3BL, FY-A3I4, FY-A3I5, FY-A3NM, FY-A3NN, FY-A3NP, FY-A3ON, FY-A3R6, FY-A3R7, FY-A3R8, FY-A3R9, FY-A3RA, FY-A3W9, FY-A3WA, FY-A40K, FY-A4B3, GE-A2C6, H2-A26U, H2-A2K9, H2-A3RH, H2-A3RI, H2-A421, IM-A3EB, IM-A3ED, IM-A3U2, IM-A3U3, J8-A3NZ, J8-A3O0, J8-A3O1, J8-A3YE, J8-A3YH, J8-A4HW, KS-A41J, KS-A4I5, KS-A4I9, KS-A4IB, L6-A4EP, L6-A4ET, L6-A4EU, MK-A4N6, MK-A4N7, MK-A4N9.
Uterine Cancer (305 samples):
Source T26 = TCGA (see Acknowledgments (main text)). Sample IDs are of the form TCGA-*, where * is:
A5-A0G3, A5-A0G5, A5-A0G9, A5-A0GA, A5-A0GB, A5-A0GD, A5-A0GE, A5-A0GH, A5-A0GI, A5-A0GJ, A5-A0GM, A5-A0GN, A5-A0GP, A5-A0GQ, A5-A0GU, A5-A0GV, A5-A0GW, A5-A0GX, A5-A0R6, A5-A0R7, A5-A0R8, A5-A0R9, A5-A0RA, A5-A0VO, A5-A0VP, A5-A0VQ, AJ-A23M, AP-A051, AP-A052, AP-A053, AP-A054, AP-A056, AP-A059, AP-A05A, AP-A05D, AP-A05H, AP-A05J, AP-A05N, AP-A05P, AP-A0L8, AP-A0L9, AP-A0LD, AP-A0LE, AP-A0LF, AP-A0LG, AP-A0LH, AP-A0LI, AP-A0LJ, AP-A0LL, AP-A0LM, AP-A0LN, AP-A0LO, AP-A0LP, AP-A0LQ, AP-A0LT, AP-A0LV, AP-A1DQ, AX-A05S, AX-A05T, AX-A05U, AX-A05W, AX-A05Y, AX-A05Z, AX-A060, AX-A062, AX-A063, AX-A064, AX-A06B, AX-A06H, AX-A06L, AX-A0IS, AX-A0IU, AX-A0IW, AX-A0J0, AX-A0J1, AX-A1C7, AX-A1C8, AX-A1CP, AX-A2H5, AX-A2HF, B5-A0JN, B5-A0JR, B5-A0JS, B5-A0JT, B5-A0JV, B5-A0JY, B5-A0JZ, B5-A0K0, B5-A0K1, B5-A0K2, B5-A0K3, B5-A0K4, B5-A0K6, B5-A0K7, B5-A0K8, B5-A0K9, B5-A11E, B5-A11F, B5-A11G, B5-A11H, B5-A11I, B5-A11J, B5-A11M, B5-A11N, B5-A11O, B5-A11Q, B5-A11R, B5-A11S, B5-A11U, B5-A11V, B5-A11W, B5-A11X, B5-A11Y, B5-A11Z, B5-A121, B5-A1MU, B5-A1MY, BG-A0LW, BG-A0LX, BG-A0M0, BG-A0M2, BG-A0M3, BG-A0M4, BG-A0M6, BG-A0M7, BG-A0M8, BG-A0M9, BG-A0MC, BG-A0MG, BG-A0MI, BG-A0MO, BG-A0MQ, BG-A0MS, BG-A0MT, BG-A0MU, BG-A0RY, BG-A0VT, BG-A0VV, BG-A0VW, BG-A0VX, BG-A0VZ, BG-A0W1, BG-A0W2, BG-A0YU, BG-A0YV, BG-A186, BG-A187, BG-A18A, BG-A18B, BG-A18C, BG-A2AE, BK-A0C9, BK-A0CA, BK-A0CB, BK-A0CC, BK-A139, BK-A13C, BS-A0T9, BS-A0TA, BS-A0TC, BS-A0TD, BS-A0TE, BS-A0TG, BS-A0TI, BS-A0TJ, BS-A0U5, BS-A0U7, BS-A0U8, BS-A0U9, BS-A0UA, BS-A0UF, BS-A0UJ, BS-A0UL, BS-A0UM, BS-A0UT, BS-A0UV, BS-A0V6, BS-A0V7, BS-A0V8, BS-A0WQ, D1-A0ZN, D1-A0ZO, D1-A0ZP, D1-A0ZQ, D1-A0ZR, D1-A0ZS, D1-A0ZU, D1-A0ZV, D1-A0ZZ, D1-A101, D1-A102, D1-A103, D1-A15V, D1-A15W, D1-A15X, D1-A15Z, D1-A160, D1-A161, D1-A163, D1-A165, D1-A167, D1-A168, D1-A169, D1-A16B, D1-A16D, D1-A16E, D1-A16F, D1-A16G, D1-A16I, D1-A16J, D1-A16N, D1-A16O, D1-A16Q, D1-A16R, D1-A16S, D1-A16X, D1-A16Y, D1-A174, D1-A176, D1-A177, D1-A17A, D1-A17B, 
D1-A17C, D1-A17D, D1-A17F, D1-A17H, D1-A17K, D1-A17L, D1-A17M, D1-A17N, D1-A17Q, D1-A17R, D1-A17S, D1-A17T, D1-A17U, D1-A1NU, D1-A1NX, DI-A0WH, DI-A1NN, E6-A1LZ, EO-A1Y5, EO-A1Y8, EY-A1GS, EY-A212, FI-A2D2, FI-A2EW, FI-A2EX, FI-A2F8, N5-A4R8, N5-A4RA, N5-A4RD, N5-A4RF, N5-A4RJ, N5-A4RM, N5-A4RN, N5-A4RO, N5-A4RS, N5-A4RT, N5-A4RU, N5-A4RV, N5-A59E, N5-A59F, N6-A4V9, N6-A4VC, N6-A4VD, N6-A4VE, N6-A4VF, N6-A4VG, N7-A4Y0, N7-A4Y5, N7-A4Y8, N7-A59B, N8-A4PI, N8-A4PL, N8-A4PM, N8-A4PN, N8-A4PO, N8-A4PP, N8-A4PQ, N8-A56S, N9-A4PZ, N9-A4Q1, N9-A4Q3, N9-A4Q4, N9-A4Q7, N9-A4Q8, NA-A4QV, NA-A4QW, NA-A4QX, NA-A4QY, NA-A4R0, NA-A4R1, NA-A5I1, ND-A4W6, ND-A4WA, ND-A4WC, ND-A4WF, NF-A4WU, NF-A4WX, NF-A4X2, NF-A5CP, NG-A4VU, NG-A4VW, QM-A5NM, QN-A5NN.
Table A1. Occurrence counts for cancer types X1–X16 for the first 48 mutation categories for the exome data summarized in Table 1 aggregated by cancer types. Here and in tables below, the mutations (abbreviated as “Mut.”) are encoded as follows: XYZW = Y > W: XYZ.
Mut.X1X2X3X4X5X6X7X8X9X10X11X12X13X14X15X16
ACAA9111568834519518231901369681513493178892
ACCA717148613371248177517711211726465561214712
ACGA0510014252584318855232120314754446
ACTA661056625943614111645110981277438132514
CCAA131333351852025817503221329201982311451711426
CCCA12927921038827315062221346224657016811721099
CCGA310180393731275411751027142742488287836
CCTA1014292294031771180429812111556284442061911153
GCAA1212278111535612714372451341012588248161181146
GCCA78270513382218128124519091265486772138842
GCGA671634172629937295142169833135760656
GCTA1812247203325711216233111622226111447112700
TCAA16527371218001474102867143730077186881601450
TCCA1817282920124730528136011591016347097961801490
TCGA1593235498464027373382139726869632
TCTA25132811382214129248891150126131589683022021571359
ACAG51350572721471287138409374239250121423
ACCG3350341901159688451328922135095381
ACGG383324169435167137213911014366229
ACTG365533275169136388526446268374129440
CCAG4127113406891186162636272216270109517
CCCG1266582778987610263424921220286426
CCGG0114818290405699352223014014855397
CCTG23972251911015941907311335329341129636
GCAG375283256947208532325117022787310
GCCG3583442191367919746349322047092387
GCGG774220183393353440314410910029260
GCTG316498273133110611852438324831493371
TCAG5515186694416413309455496747811555391794640
TCCG45135582539134434814007444155464531591864
TCGG4843019957199764946215922513846706
TCTG862166684683321753648831261476614469002505340
ACAT13431141010663739229329213221257866519643391017
ACCT2028991015626582151123515415214158917754381081
ACGT7715565027262163379048801264442611287736111508332083132
ACTT42069495195941625190971516725051204294765
CCAT282815815191363451261359719812261291014424282033
CCCT2421154618930651173534018817242294115087292167
CCGT7115858632362541266942151538502451313239761482419334047
CCTT26541749281196582249644422019221095616044972161
GCAT101921322239021019226644517320431778438573161392
GCCT19342311623119714472181470232157695109468349851864
GCGT8013166229402440447745801665452441982341832171225963885
GCTT83723416129071128209543219418532988841924061269
TCAT293025213712157353218897214226192277194217563407315
TCCT43332271614412674167152397204253327130219885944506
TCGT58733561921656818267512360229327171352630931211784328
TCTT25191938176716611124703850166183770147416664134591
Table A2. Table A1, continued: occurrence counts (aggregated by cancer types) for the next 48 mutation categories for cancer types X1–X16.
Mut.X1X2X3X4X5X6X7X8X9X10X11X12X13X14X15X16
ATAA104494125487293485427912326150293
ATCA3856105162908616470756422479174407
ATGA03488623478893712021428020629683424
ATTA8221431515510874944860317152958295
CTAA222950132345252919222428213141209
CTCA27676622113986670200553032852987481
CTGA8101285114801328889369114606310430154845
CTTA325056221110981801796560535714103507
GTAA024150129454123411963157917340232
GTCA334372121755444062444313457545263
GTGA23424416644537441941030211723960379
GTTA152003117656663744632618433755219
TTAA01203210536905357572818318732199
TTCA5433341616566053107836315528165278
TTGA433025163485023118852319914053218
TTTA2326331419310544567945919426158258
ATAC517104414632350167713817417158554113001601137
ATCC11126364319349964110931527283301306143450
ATGC102110472173149317811882341434105212601208876
ATTC319907115354591762153118181670484977236834
CTAC56484726717580888100511102731092117403
CTCC3146891345940613161541251141947241625252526
CTGC151513091969245014642492451449368423653324863
CTTC3107589427337114816214212357716732424222780
GTAC10411281042635411641631131025984343655126808
GTCC8873149373443780227901545444832647169575
GTGC87837134133311004204143935834563093140776
GTTC486631139137911341941241632774592338179773
TTAC30545628815084677829941196101278353
TTCC119631111290339104816593629643931362155428
TTGC34774835234272114111372088283144499456
TTTC51552711267318103715282101910282899142423
ATAG2112646343423332542086210424110
ATCG2214609385320432644468625841115
ATGG022664128775885044227613020555156
ATTG111952132136564112432119332258838123
CTAG111421742931819366238701622177
CTCG252854129995194974662427173750149
CTGG3106069271938268786886333993495317
CTTG24193262661769941231291421943027390359313
GTAG03254113831544312431617512523102
GTCG22282415967464501943621383344993
GTGG54133210656116537958542368155314118243
GTTG21271371528468385314714727116148109
TTAG139311023550359216466841373596
TTCG242942159864749865177817936730172
TTGG312036151826355963335114124761226
TTTG35228441519611893178954569555136163214
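The encoding XYZW = Y > W: XYZ used in Tables A1 and A2 can be enumerated programmatically. The following Python sketch is illustrative only and is not taken from the paper; the function names `mutation_codes` and `decode` are hypothetical, and the row order (substitution first, then left flank, then right flank) is inferred from the row labels of Tables A1 and A2.

```python
from itertools import product

BASES = "ACGT"

# The six base substitutions that remain after identifying complementary
# mutations, listed in the row order of Tables A1 and A2.
SUBSTITUTIONS = [("C", "A"), ("C", "G"), ("C", "T"),
                 ("T", "A"), ("T", "C"), ("T", "G")]


def mutation_codes():
    """Return the 96 four-letter codes XYZW (X = left flank, Y = mutated base,
    Z = right flank, W = resulting base)."""
    codes = []
    # product() varies the last factor fastest: right flank, then left flank,
    # then substitution.
    for (ref, alt), left, right in product(SUBSTITUTIONS, BASES, BASES):
        codes.append(left + ref + right + alt)
    return codes


def decode(code):
    """Expand a code XYZW into the 'Y > W: XYZ' notation."""
    left, ref, right, alt = code
    return f"{ref} > {alt}: {left}{ref}{right}"


codes = mutation_codes()
table_a1_rows = codes[:48]   # C > A, C > G, C > T
table_a2_rows = codes[48:]   # T > A, T > C, T > G
```

Under these assumptions, the first 48 codes (ACAA through TCTT) label the rows of Table A1 and the remaining 48 (ATAA through TTTG) label the rows of Table A2.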
Table A3. Occurrence counts for cancer types X17–X32 for the first 48 mutation categories for the exome data summarized in Table 1 aggregated by cancer types.
Mut.X17X18X19X20X21X22X23X24X25X26X27X28X29X30X31X32
ACAA281361495626332330814736264104555273586456457
ACCA221861083948340352214282161315112561115551554
ACGA1014397915721936768216441202127673634185
ACTA1623408935011187217351101292123601895236431594
CCAA2593138742172403793262913825916567330410975681094
CCCA24541258285118312317257331891256452619562421238
CCGA1644800418971622582296221451323381981286653548
CCTA20541071712431435028713681723346970432311483526273
GCAA2244638035830662260147282731625179591044455586
GCCA2318753733322756292322222341495319431953732899
GCGA15345020153143758012415135852615711464930239
GCTA1597481528623610245135218159297482218213644363015
TCAA2141688294126309259399212147826962426748572984
TCCA238010492121316319373404302972627272488345521575
TCGA926379142942048824016120141255137693640409
TCTA2464746486414256306125432261522470327882404122736
ACAG18521947301844276832313055284111391520168
ACCG107513772843331663930954230998302128181
ACGG82490121416221105110482114870511621140
ACTG1621146731953124765141185032492391333213
CCAG11392373735125125783261094927910424113893
CCCG141720256841169238542291353559219134397
CCGG85717628232641120448835322373281829128
CCTG17152383496688315822815838506154512345154
GCAG958159517231281766118765022284361321122
GCCG109119933378829165662713162241104322043215
GCGG8731350181901249301237351355125121562
GCTG1143149822631412931011913655279124662343201
TCAG148574794801839639617537189874291914337129548
TCCG15114330676122173711083617372496197372382253
TCGG74515592641364826213453317352301223118
TCTG2700887679017437598196393041417143415824173653
ACAT6483301983912913334465232117366729485661092148
ACCT406324379149141082477046124322261639865421391921
ACGT89314557138758454659267222219552009175813181993103607951
ACTT4600193133581454253609552402075452567644931766
CCAT459361292335616236400367703412019107349849136937
CCCT55144925351563124534657562341217860106194521321405
CCGT90376230935861616690247916719001747167113821942753687385
CCTT60555475268552217836675282390259913960138831381599
GCAT391332638951812645496279430244917304147891254051
GCCT50143961861438183380213017374269311615531971252426013
GCGT779657441491645586105591312223530181719141532941036512763
GCTT4278306661321611635017701124536528903881511001755056
TCAT385511556347554859553449953405268114280068562991758
TCCT439773238445328466449866954475711017172867882123178
TCGT504553293317140595456207311110226497102510869613823518507
TCTT4336748727330693263769326435578975284258621554029
Table A4. Table A3, continued: occurrence counts (aggregated by cancer types) for the next 48 mutation categories for cancer types X17–X32.
Mut.X17X18X19X20X21X22X23X24X25X26X27X28X29X30X31X32
ATAA20991060431319735297634390549417119
ATCA120111124184221517529975423893241641256
ATGA1558217645162614855197740300108231440119
ATTA150187310592171169812916226599121821424
CTAA1106136621991660331143229144958962
CTCA1580215756732618899259056443126162125252
CTGA3163626651823019112229132100540140443142212
CTTA12532176762217194801010480377110237019271
GTAA737141921124227331116520377608510103
GTCA8081125248591117212653620491221612218
GTGA1232239139563112638127235243821517255131
GTTA888102935631110264125238143679317185
TTAA173582742461749236713926457182890
TTCA897106257712614550766383279622816217
TTGA114816233791168221449362516112617113
TTTA1985940153841213147667803299411720299
ATAC90962438512939189243662511355301684354101922
ATCC283510744672440183266431711254981376240681163
ATGC861829827461045248363932521227612134956921716
ATTC592817167761039261298843121655662133557133788
CTAC295113876134168794261088543293342354557
CTCC336013508809232220290302021277272801806696957
CTGC67543236100828642094149028411791025789781262141
CTTC40471913191393322439150231127730347255117159812
GTAC30481831513244123390382292044221573847561717
GTCC270811275912839131445292373114041515452351844
GTGC498718246261348140451392041665151984549531871
GTTC312414103319640163400502552674942365177761833
TTAC32209219346199110622138108272101252124782
TTCC28729629302236176207321661633961804643401411
TTGC38951271903718114254541591104261435136591429
TTTC3660994986322148245581681754171403737451159
ATAG1333325200164239752451222113104138
ATCG7052952721145897655622303513185422
ATGG112252254941387761249402297821148215
ATTG1101348334075518279523927230188191241
CTAG5032381190114313336377922348184
CTCG1030504443698410011924221070111624372
CTGG1519138460630221641282111264291101151433577
CTTG12401100109042914128019154271336881517311741
GTAG64331912861944396482410224219678
GTCG658311202695521001065551273458198259
GTGG13148359043843312119015128333501031242415181
GTTG1097478467351267178992671716236197408
TTAG960272237110543174812011726925390
TTCG10404624540129311698614831582115219754
TTGG1331733419414119621882642676529618264
TTTG223267788222511837912951083472661410264615
Table A5. Top-10 clusterings (Clustering-E1–Clustering-E10) by occurrence counts (second column) in 30,000 runs, performed as 3 consecutive batches of 10,000 runs each. Each run is based on 1000 samplings (i.e., num.try = 1000 in the R function qrm.stat.ind.class() in Appendix A of [16]). The target number of clusters is k = 13, which is based on the effective rank, also known as eRank (computed using the R function bio.erank.pc() in Appendix B of [13]). The columns “Cl-1”–“Cl-13” give the numbers of mutations in each cluster (the total number of mutations in each clustering is 96). The entries “–” correspond to clusterings with fewer than the target number of 13 clusters (note that the top-10 clusterings have either 11 or 12 clusters; however, there are other, less frequently occurring clusterings with 13 clusters). While the placement (by occurrence counts) of the top-10 clusterings varied across the 3 batches of 10,000 runs, in each batch Clustering-E1 had the highest count by a large margin: Batch1, Clustering-E1 count = 95, second place count = 47; Batch2, Clustering-E1 count = 124, second place count = 46; Batch3, Clustering-E1 count = 115, second place count = 49.
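Name | Count | Cl-1 | Cl-2 | Cl-3 | Cl-4 | Cl-5 | Cl-6 | Cl-7 | Cl-8 | Cl-9 | Cl-10 | Cl-11 | Cl-12 | Cl-13
Clustering-E1 | 334 | 3 | 4 | 6 | 6 | 6 | 7 | 7 | 9 | 16 | 16 | 16 | – | –
Clustering-E2 | 134 | 3 | 4 | 6 | 6 | 6 | 7 | 7 | 7 | 8 | 9 | 16 | 17 | –
Clustering-E3 | 126 | 2 | 4 | 6 | 6 | 6 | 7 | 7 | 9 | 16 | 16 | 17 | – | –
Clustering-E4 | 120 | 2 | 4 | 6 | 6 | 6 | 7 | 7 | 8 | 9 | 9 | 16 | 16 | –
Clustering-E5 | 112 | 3 | 4 | 6 | 6 | 6 | 7 | 7 | 9 | 15 | 16 | 17 | – | –
Clustering-E6 | 109 | 1 | 3 | 3 | 6 | 6 | 6 | 7 | 7 | 9 | 16 | 16 | 16 | –
Clustering-E7 | 109 | 3 | 3 | 6 | 6 | 6 | 7 | 7 | 8 | 9 | 10 | 15 | 16 | –
Clustering-E8 | 105 | 3 | 4 | 6 | 6 | 6 | 7 | 7 | 7 | 9 | 10 | 15 | 16 | –
Clustering-E9 | 78 | 2 | 4 | 6 | 6 | 6 | 6 | 7 | 7 | 9 | 11 | 16 | 16 | –
Clustering-E10 | 76 | 3 | 3 | 6 | 6 | 6 | 7 | 7 | 8 | 8 | 9 | 16 | 17 | –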
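Ranking clusterings by occurrence count, as in Table A5, requires treating two runs as identical whenever they partition the 96 mutations in the same way, regardless of how the cluster labels happen to be numbered. A minimal Python sketch of such a tally follows. It is an assumption about how the counting could be organized, not the authors' R code, and the names `canonical_partition`, `top_clusterings` and `cluster_sizes` are hypothetical.

```python
from collections import Counter


def canonical_partition(labels):
    """Convert a label vector (one cluster label per item) into a
    label-independent partition: a frozenset of frozensets of item indices."""
    groups = {}
    for item, label in enumerate(labels):
        groups.setdefault(label, []).append(item)
    return frozenset(frozenset(members) for members in groups.values())


def top_clusterings(runs, n=10):
    """Count how often each distinct partition occurs across runs and
    return the n most frequent as (partition, count) pairs."""
    counts = Counter(canonical_partition(labels) for labels in runs)
    return counts.most_common(n)


def cluster_sizes(partition):
    """Cluster sizes in ascending order, as in the Cl-1, Cl-2, ... columns."""
    return sorted(len(cluster) for cluster in partition)


# Toy example with 4 items: the first, second and fourth runs describe the
# same partition {0, 1} | {2, 3} under different label numberings.
toy_runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1], [2, 2, 5, 5]]
ranking = top_clusterings(toy_runs)
```

The same bookkeeping gives a quick consistency check on Table A5: the cluster sizes in each row should add up to 96, and the three batch counts quoted for Clustering-E1 (95, 124 and 115) add up to its total of 334.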
Table A6. Weights (in the units of 1%, rounded to 2 digits) for the first 48 mutation categories for the 11 clusters in Clustering-E1 (see Table A5) based on unnormalized regressions (see Section 3.2 for details). Each cluster is defined as containing the mutations with nonzero weights. For instance, cluster Cl-1 contains 3 mutations GCGA, TCGA, CTGA (also see Table A7). In each cluster, the weights are normalized to add up to 100% (up to 2 digits due to the aforesaid rounding).
MutationCl-1Cl-2Cl-3Cl-4Cl-5Cl-6Cl-7Cl-8Cl-9Cl-10Cl-11
ACAA0.000.000.000.000.000.000.000.003.510.000.00
ACCA0.000.000.000.000.000.000.000.003.450.000.00
ACGA0.000.000.000.000.000.0014.210.000.000.000.00
ACTA0.000.000.000.000.000.000.000.002.780.000.00
CCAA0.000.000.000.000.000.000.000.005.010.000.00
CCCA0.000.000.000.000.000.000.000.004.190.000.00
CCGA0.000.000.000.000.000.000.000.003.050.000.00
CCTA0.000.000.000.000.000.000.000.006.910.000.00
GCAA0.000.000.000.000.000.000.000.005.050.000.00
GCCA0.000.000.000.000.000.000.000.005.390.000.00
GCGA39.730.000.000.000.000.000.000.000.000.000.00
GCTA0.000.000.000.000.000.000.000.006.390.000.00
TCAA0.000.000.000.000.000.000.000.006.050.000.00
TCCA0.000.000.000.000.000.000.000.005.200.000.00
TCGA25.160.000.000.000.000.000.000.000.000.000.00
TCTA0.000.000.000.000.000.000.000.0015.880.000.00
ACAG0.000.000.000.000.000.0014.550.000.000.000.00
ACCG0.000.0013.070.000.000.000.000.000.000.000.00
ACGG0.000.000.000.000.009.410.000.000.000.000.00
ACTG0.000.0015.050.000.000.000.000.000.000.000.00
CCAG0.000.000.000.000.000.0014.650.000.000.000.00
CCCG0.000.000.000.000.000.0012.890.000.000.000.00
CCGG0.000.000.000.000.000.0011.720.000.000.000.00
CCTG0.000.000.000.000.000.0017.130.000.000.000.00
GCAG0.000.000.000.000.0014.480.000.000.000.000.00
GCCG0.000.000.0014.070.000.000.000.000.000.000.00
GCGG0.000.000.000.000.0025.130.000.000.000.000.00
GCTG0.000.000.007.350.000.000.000.000.000.000.00
TCAG0.000.000.000.000.000.000.000.0010.410.000.00
TCCG0.000.000.000.000.000.000.000.004.510.000.00
TCGG0.000.000.000.000.000.0014.860.000.000.000.00
TCTG0.000.000.000.000.000.000.000.0012.230.000.00
ACAT0.000.000.000.000.000.000.000.000.001.630.00
ACCT0.000.000.000.000.000.000.000.000.002.620.00
ACGT0.000.000.000.000.000.000.000.000.007.610.00
ACTT0.000.000.000.000.000.000.000.000.000.006.83
CCAT0.000.000.000.000.000.000.000.000.004.650.00
CCCT0.000.000.000.000.000.000.000.000.006.180.00
CCGT0.000.000.000.000.000.000.000.000.008.070.00
CCTT0.000.000.000.000.000.000.000.000.005.360.00
GCAT0.000.000.000.000.000.000.000.000.002.170.00
GCCT0.000.000.000.000.000.000.000.000.004.480.00
GCGT0.000.000.000.000.000.000.000.000.009.570.00
GCTT0.000.000.000.000.000.000.000.000.003.320.00
TCAT0.000.000.000.000.000.000.000.000.009.260.00
TCCT0.000.000.000.000.000.000.000.000.0013.860.00
TCGT0.000.000.000.000.000.000.000.000.0012.420.00
TCTT0.000.000.000.000.000.000.000.000.007.130.00
Table A7. Table A6, continued: weights for the next 48 mutation categories.
MutationCl-1Cl-2Cl-3Cl-4Cl-5Cl-6Cl-7Cl-8Cl-9Cl-10Cl-11
ATAA0.000.000.000.000.0013.500.000.000.000.000.00
ATCA0.000.0015.560.000.000.000.000.000.000.000.00
ATGA0.000.0016.900.000.000.000.000.000.000.000.00
ATTA0.000.000.000.0014.100.000.000.000.000.000.00
CTAA0.000.000.000.000.009.120.000.000.000.000.00
CTCA0.000.0017.420.000.000.000.000.000.000.000.00
CTGA35.110.000.000.000.000.000.000.000.000.000.00
CTTA0.000.000.000.0019.220.000.000.000.000.000.00
GTAA0.000.000.000.000.0011.140.000.000.000.000.00
GTCA0.000.000.000.000.000.000.0013.540.000.000.00
GTGA0.000.0022.000.000.000.000.000.000.000.000.00
GTTA0.000.000.000.000.000.000.0010.690.000.000.00
TTAA0.0019.890.000.000.000.000.000.000.000.000.00
TTCA0.0026.530.000.000.000.000.000.000.000.000.00
TTGA0.0024.440.000.000.000.000.000.000.000.000.00
TTTA0.0029.140.000.000.000.000.000.000.000.000.00
ATAC0.000.000.000.000.000.000.000.000.000.006.02
ATCC0.000.000.000.000.000.000.000.000.000.005.08
ATGC0.000.000.000.000.000.000.000.000.000.007.83
ATTC0.000.000.000.000.000.000.000.000.000.006.08
CTAC0.000.000.000.0019.380.000.000.000.000.000.00
CTCC0.000.000.000.000.000.000.000.000.000.007.15
CTGC0.000.000.000.000.000.000.000.000.001.670.00
CTTC0.000.000.000.000.000.000.000.000.000.007.81
GTAC0.000.000.000.000.000.000.000.000.000.006.00
GTCC0.000.000.000.000.000.000.000.000.000.006.68
GTGC0.000.000.000.000.000.000.000.000.000.006.45
GTTC0.000.000.000.000.000.000.000.000.000.006.71
TTAC0.000.000.000.0018.540.000.000.000.000.000.00
TTCC0.000.000.000.000.000.000.000.000.000.005.16
TTGC0.000.000.000.000.000.000.000.000.000.004.64
TTTC0.000.000.000.000.000.000.000.000.000.004.63
ATAG0.000.000.000.000.000.000.009.170.000.000.00
ATCG0.000.000.000.000.000.000.0010.650.000.000.00
ATGG0.000.000.000.000.000.000.0011.820.000.000.00
ATTG0.000.000.000.0014.650.000.000.000.000.000.00
CTAG0.000.000.000.000.000.000.006.720.000.000.00
CTCG0.000.000.000.000.000.000.0014.760.000.000.00
CTGG0.000.000.007.090.000.000.000.000.000.000.00
CTTG0.000.000.000.000.000.000.000.000.000.006.31
GTAG0.000.000.000.000.0017.220.000.000.000.000.00
GTCG0.000.000.0010.700.000.000.000.000.000.000.00
GTGG0.000.000.0053.390.000.000.000.000.000.000.00
GTTG0.000.000.007.380.000.000.000.000.000.000.00
TTAG0.000.000.000.000.000.000.009.410.000.000.00
TTCG0.000.000.000.0014.110.000.000.000.000.000.00
TTGG0.000.000.000.000.000.000.0013.240.000.000.00
TTTG0.000.000.000.000.000.000.000.000.000.006.63
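Tables A6 and A7 define each cluster as the set of mutations with nonzero weights, with the weights in each cluster adding up to 100% up to rounding. The Python sketch below shows one way to recover cluster membership and check the normalization from such a weight table. It uses only the Cl-1 and Cl-2 columns as a two-column excerpt; the function names are hypothetical, and the 0.05 tolerance is an assumption chosen to absorb the 2-digit rounding.

```python
# Excerpt of Tables A6 and A7: weights (in %) for clusters Cl-1 and Cl-2 only.
WEIGHTS = {
    "GCGA": [39.73, 0.00],
    "TCGA": [25.16, 0.00],
    "CTGA": [35.11, 0.00],
    "TTAA": [0.00, 19.89],
    "TTCA": [0.00, 26.53],
    "TTGA": [0.00, 24.44],
    "TTTA": [0.00, 29.14],
}


def clusters_from_weights(weights):
    """Return, for each cluster column, the mutations with nonzero weight."""
    n_clusters = len(next(iter(weights.values())))
    members = [[] for _ in range(n_clusters)]
    for mutation, row in weights.items():
        for k, w in enumerate(row):
            if w != 0.0:
                members[k].append(mutation)
    return members


def cluster_totals(weights):
    """Return the sum of weights in each cluster column."""
    n_clusters = len(next(iter(weights.values())))
    totals = [0.0] * n_clusters
    for row in weights.values():
        for k, w in enumerate(row):
            totals[k] += w
    return totals


def is_normalized(weights, tol=0.05):
    """Check that every cluster's weights add up to 100% within tol."""
    return all(abs(t - 100.0) <= tol for t in cluster_totals(weights))


members = clusters_from_weights(WEIGHTS)
totals = cluster_totals(WEIGHTS)
```

Applied to the excerpt, the first cluster contains GCGA, TCGA and CTGA, which agrees with the Table A6 caption, and the second contains the four T > A mutations flanked by T on the left.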
Table A8. Cross-sectional correlations between 30 COSMIC signatures and cancer types X1–X16 for the exome data summarized in Table 1 aggregated by cancer types. The weights for COSMIC signatures are available from [36]. The values above 80% are given in bold font. The values above 70% are underlined.
SignatureX1X2X3X4X5X6X7X8X9X10X11X12X13X14X15X16
COSMIC189.4594.2980.4470.2467.3627.6190.8323.7327.5863.390.1278.5181.1889.3397.0349.55
COSMIC224.579.9818.6114.156.2382.294.881.0781.7414.1219.067.6927.652.917.9671.12
COSMIC3−12.79−15.523.13−8.382.417−12.5628.3315.5−6.34−2.18−15.44−6.58−22.57−15.9715.47
COSMIC48.78−2.4840.12−7.3326.25−5.152.42−1.54−5.6522.44−1.494.36−0.21−7.05−5.546.11
COSMIC531.2930.6627.636.3652.77.333.069.533.6338.6151.3234.328.4327.8727.6423.26
COSMIC675.6782.7677.3469.8372.7416.292.4713.8416.859.9577.3478.6674.292.4684.3540.69
COSMIC744.1822.9522.2125.9826.2951.1315.543.9349.0622.8931.6718.7928.8810.8321.5859.96
COSMIC812.828.0829.721.5416.56−4.5414.912.71−4.6518.1418.415.457.783.286.332.89
COSMIC9−3.53−3.98−10.1922.79−2.13−10.041.52−7.89−9.83−3.79815.4814.234.86−3.85−12.77
COSMIC1034.3417.6428.2326.0314.0727.230.5320.5325.2713.3619.7764.8625.8823.0616.4322.68
COSMIC1135.223.6919.2523.936.7227.1921.5324.4124.522.0732.8315.3121.7613.0726.4442.54
COSMIC12−1.61−2.34−5.854.6723.61−5.740.37−5.46−7.789.7714.579.17−0.396.51−4.08−1.68
COSMIC132.87−3.4413.741.98−4.9171.24−2.5576.3572.850.633.61−4.2218.47−4.440.260.08
COSMIC1452.7948.4561.4445.3854.4712.3674.0513.5313.1935.4151.7561.6747.6560.6451.7631.07
COSMIC1551.2850.1953.954.1257.7510.5270.489.5311.2239.9549.6562.2449.8571.4558.8528.77
COSMIC16−2.28−2.35−0.57.2822.178.9−1.9711.415.7116.1718.764.36.41−3.4−6.1711.89
COSMIC17−1.47−0.75−9.3544.180.18−0.04−0.7−0.650.143.858.057.8139.4211.2−0.52−2.05
COSMIC1819.893.8343.3410.4824.386.5512.589.736.778.445.5325.9212.854.011.9312.79
COSMIC1942.9645.1438.4737.8456.0419.2241.3818.916.236.7248.6531.634.4933.1142.3739.03
COSMIC2031.735.841.0230.5448.65.9447.075.424.334.2140.4642.7530.6741.0432.1120.18
COSMIC219.799.964.7716.5620.54−3.6814.23−5.29−4.3411.9522.0319.98.4223.8811.672.69
COSMIC22−10.52−9.55−11.9−10.11−11.87−11.33−12.09−12.28−12.2552.49−2.78−14.12−14.12−11.19−8.46−11.76
COSMIC2327.1625.8119.4425.4442.99.5724.37.916.3923.5427.917.5917.9417.6427.6526.29
COSMIC2410.03−0.8443.352.4227.867.293.839.458.1710.73−0.155.556.34−1.74−1.9714.26
COSMIC2512.2611.0211.710.0813.836.6611.168.124.9357.5324.3113.548.959.189.9410.21
COSMIC2613.3713.388.2617.3729.370.7720.940.15−0.6918.1928.8225.2814.2328.1814.857.4
COSMIC27−4.19−4.22−5.911.02−11.83−2.66−4.31−2.91−2.638.24−7.21−4.96−6.82−4.21−3.09−4.06
COSMIC28−10.61−8.93−19.2317.31−11.5−8.21−7.57−8.29−6.85−5.6−7.64.119.380.1−8.41−14.27
COSMIC2926.9417.4453.814.8737.873.0423.716.254.0619.419.7224.2821.1716.5116.6814.59
COSMIC3044.2132.5825.3439.3437.0944.0626.2741.0141.9530.0141.7219.6331.6519.9731.0655.25
Table A9. Cross-sectional correlations between 30 COSMIC signatures and cancer types X17–X32 for the exome data summarized in Table 1 aggregated by cancer types. The weights for COSMIC signatures are available from [36]. The values above 80% are given in bold font. The values above 70% are underlined.
SignatureX17X18X19X20X21X22X23X24X25X26X27X28X29X30X31X32
COSMIC166.6622.8319.8614.4748.0866.5880.2483.3194.1658.8878.3759.458.1788.2776.2659.17
COSMIC213.3740.0450.8110.4636.6735.998.1310.2610.389.9928.0527.392.045.9138.810.5
COSMIC35.8331.250.864.429.6541.57−16.05−8.32−13.9−12.489.73−2.391.36−14.16−0.95−7.39
COSMIC41.8175.05−2.14−4.649.9727.12.51−5.36−1.115.9617.9721.3927.45.24−4.2812.78
COSMIC579.615.526.99−7.3110.7342.4529.3549.2630.4115.3153.9531.4431.6634.0942.4727.19
COSMIC663.8925.948.8212.0141.3262.4591.0687.7488.2446.2277.9650.7670.1688.8672.5358.1
COSMIC727.6531.8799.66736.3738.5218.5425.8119.522.1738.1756.5811.0216.3942.3120.47
COSMIC814.7243.44−9.3−5.2129.3425.7810.472.318.371821.1914.714.4713.530.9423.61
COSMIC915.26−17.36−6.58−14.5−14.65−13.151.24−0.21−1.9214.470.55−8.02−8.43−2.77−6.7318.46
COSMIC1011.4819.2118.580.8624.1219.6531.7514.8720.3389.1622.1417.6610.0615.2914.5687.49
COSMIC1137.0221.9477.365.7221.5636.8126.7235.6120.256.4540.2648.9323.0719.1141.8514.48
COSMIC1250.27−7.261.58−9.28−16.64−2.1−2.5216.72−0.01−9.4316.39−2.978.688.569.4−2.31
COSMIC13−2.0435.656.66−0.3831.2938.21−2.831.342.46−0.1111.234.34−2.61−3.2122.91−0.89
COSMIC1438.5536.514.945.8643.0849.4176.6556.5854.438.2257.554.3459.6459.9946.3558.74
COSMIC1542.0514.957.149.8927.6141.9582.987663.7332.7356.7540.1462.7168.4253.8746.38
COSMIC1656.2610.389.23−13.64−4.4613.69−4.7115.75−1.59−3.720.562.254.063.4115.091.88
COSMIC171.26−11.274.05−2.77−8.56−3.30.91−0.680.943.563.22−0.834.773.023.123.93
COSMIC181.0855.29−1.046.4353.9723.1215.210.288.8232.6118.5433.3624.0210.971.1638.89
COSMIC1955.0626.5748.765.0123.5450.7944.6154.3839.5714.4257.8947.6445.440.352.1625.02
COSMIC2055.931.19.542.8519.9237.8240.7943.9534.1517.3152.4324.842.4340.3239.6635.22
COSMIC2133.38−12.860.05−6.89−9.660.2312.3521.4215.342.915.332.966.5518.5514.918.78
COSMIC22−7.62−1.77−7.78−12.32−18.87−15.29−12.86−12.47−11.18−8.83−2.13−13.97−20.14−11.76−10.2−12.77
COSMIC2337.7414.6747.262.8812.2133.8330.5139.5122.274.5340.8136.5233.8321.7538.1614.34
COSMIC24−3.4864.69−1.21.2367.2326.969.63−0.15.176.4715.7936.1440.48.820.6411.09
COSMIC2526.4814.87−0.68−6.78−0.3812.126.8811.39.6713.2323.944.69−3.359.5118.1814.51
COSMIC2654.08−8.650.08−6.4−9.686.3317.8733.0318.873.9725.39412.1324.8622.7511.68
COSMIC27−5.21−5.53−0.83−1.13−5.61−10.71−3.71−4.84−4.08−3.215.48−4.27−10.11−6.96−6.57−4.13
COSMIC28−5−21.81−5.79−8.13−18.66−17.36−5.6−10.39−8.433.09−5.96−13.41−15.94−10.54−11.473.99
COSMIC2914.7560.17−4.795.7671.3834.2823.6812.7922.5521.9427.8445.5943.1526.3210.2827.23
COSMIC3041.4626.3776.085.9730.0545.8227.9938.0827.758.5548.6551.8823.8323.8451.5815.89
Table A10. Cross-sectional correlations between 30 COSMIC signatures and cancer types G.X1–G.X14 for the genome data summarized in Table 1 of [16] aggregated by cancer types. G.X1 = B-cell lymphoma, G.X2 = bone cancer, G.X3 = brain lower grade glioma, G.X4 = breast cancer, G.X5 = chronic lymphocytic leukemia, G.X6 = esophageal cancer, G.X7 = gastric cancer, G.X8 = liver cancer, G.X9 = lung cancer, G.X10 = medulloblastoma, G.X11 = ovarian cancer, G.X12 = pancreatic cancer, G.X13 = prostate cancer, G.X14 = renal cell carcinoma. The weights for COSMIC signatures are available from [36]. The values above 80% are given in bold font. The values above 70% are underlined.
SignatureG.X1G.X2G.X3G.X4G.X5G.X6G.X7G.X8G.X9G.X10G.X11G.X12G.X13G.X14
COSMIC139.3886.2791.0515.4469.7374.4348.71−2.936.5994.8646.9495.3183.2719.48
COSMIC247.642218.3979.9111.5454.237.52−10.4432.4814.3932.2812.8727.2522.72
COSMIC311.5110.771.153610.736.034.291.3749.071.0460.73−16.413.836.19
COSMIC4−10.3116.210.211.071.061.39−2.628.4482.537.7926.95−3.5717.328.1
COSMIC555.457.7655.7213.0675.0222.0746.351.3229.0548.6263.5124.551.1554.23
COSMIC633.2466.8371.526.3758.4159.4650.19−4.938.2376.4237.6586.7266.8711.58
COSMIC742.6226.530.7439.2522.5136.557.210.2218.8423.9522.8219.3825.421.84
COSMIC815.6446.1622.687.6837.639.8520.6−4.5468.6331.6757.317.3744.8654.24
COSMIC962.3214.3312.05−3.7951.08−6.8360.78−13.48−0.579.8216.12−1.5718.4831.51
COSMIC1027.6828.4822.4311.923.9730.6825.051.7420.1230.2218.5421.4135.3813.93
COSMIC1136.7828.0331.9522.4929.3722.688.26.5213.7925.9522.7717.4821.5820.6
COSMIC1223.8814.2616.17−1.8534.02−6.6323.78−3.793.577.723.78−4.489.7818.13
COSMIC1313.664.282.3380.9−4.6547.36−1.17−14.6828.780.5943.563.9916.7110.38
COSMIC1430.0552.9144.839.2244.0439.2539.49−2.0330.5153.3536.453.7252.4216.22
COSMIC1525.6742.1841.374.5339.7835.8934.97−0.462.3148.6724.4958.9946.225.51
COSMIC1645.0227.5525.3918.4850.434.9632.15−1.0527.9914.6249.19−6.5224.244.42
COSMIC1754.33−2.692.27−0.59190.4373.87−3.68−11.64−2.52−32.2−2.89−2.4
COSMIC188.5627.178.019.5514.2415.8111.64.266.5518.7324.896.4831.0918.59
COSMIC1937.4648.0856.9217.8349.2930.7120.983.4920.945.8540.0234.2339.3328.58
COSMIC2022.0935.4136.982.4536.1323.8232.67−7.125.7834.9928.9632.3528.4212.46
COSMIC2112.2315.2218.41−6.0123.793.9616.111.44−12.0814.868.3311.1612.87−1.38
COSMIC22−15.62−7.33−10.46−10.43−7−8.75−13.39−16.172.97−11.57−5.9−9.23−8.0847.87
COSMIC2323.1224.0834.447.0629.3112.748.415.57.7824.4517.6216.6616.0815.21
COSMIC24−6.812.77−1.6110.16−3.4710.01−3.9810.2762.166.0716.11.8614.96.58
COSMIC2515.828.1120.8110.0729.9216.6316.12−21.225.4219.5133.8610.2327.0961.26
COSMIC262524.2325.820.8737.317.7127.110.51−4.2521.1324.314.9520.768.6
COSMIC27−4.17−0.03−4.34−2.16−2.63−4.3−7.7−8.592.35−3.32−4.12−3.360.4735.18
COSMIC2842.33−9.94−3.87−6.6216.24−11.5952.443.29−18.66−8.75−7.25−5.64−5.7112.33
COSMIC297.4138.8521.135.9323.5220.4114.984.2468.0931.1833.4719.6838.5516.98
COSMIC3049.5937.4641.4938.8435.9139.0216.170.9615.1534.5227.3926.8632.0224.89
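Each entry in Tables A8–A10 is a correlation taken across the 96 mutation categories between one COSMIC signature's weights and one cancer type's aggregated occurrence counts. The Python sketch below illustrates such a cross-sectional correlation using the plain Pearson coefficient expressed in percent. Whether the paper applies any transformation to the counts before correlating is not stated in this appendix, so the use of untransformed vectors is an assumption, and the names `pearson` and `correlation_table` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_table(signatures, counts):
    """Correlate every signature with every cancer type.

    signatures: dict mapping signature name -> weights over mutation categories
    counts:     dict mapping cancer type -> counts over the same categories
    Returns a dict keyed by (signature, cancer type) with values in percent,
    rounded to 2 digits.
    """
    return {
        (sig, ctype): round(100.0 * pearson(w, c), 2)
        for sig, w in signatures.items()
        for ctype, c in counts.items()
    }


# Toy data over 4 categories (the real vectors have 96 entries each).
toy_signatures = {"S1": [1.0, 2.0, 3.0, 4.0]}
toy_counts = {"X1": [10, 20, 30, 40], "X2": [4, 3, 2, 1]}
toy_table = correlation_table(toy_signatures, toy_counts)
```

Because the Pearson coefficient is unchanged by positive rescaling of either vector, normalizing counts to frequencies would not alter these values; a nonlinear transformation such as a logarithm would.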
Figure A1. Cluster Cl-1 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A2. Cluster Cl-2 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A3. Cluster Cl-3 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A4. Cluster Cl-4 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A5. Cluster Cl-5 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A6. Cluster Cl-6 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A7. Cluster Cl-7 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A8. Cluster Cl-8 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A9. Cluster Cl-9 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A10. Cluster Cl-10 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.
Figure A11. Cluster Cl-11 in Clustering-E1 with weights based on unnormalized regressions with arithmetic means.

References

  1. Goodman, M.F.; Fygenson, K.D. DNA polymerase fidelity: From genetics toward a biochemical understanding. Genetics 1998, 148, 1475–1482.
  2. Lindahl, T. Instability and decay of the primary structure of DNA. Nature 1993, 362, 709–715.
  3. Ananthaswamy, H.N.; Pierceall, W.E. Molecular mechanisms of ultraviolet radiation carcinogenesis. Photochem. Photobiol. 1990, 52, 1119–1136.
  4. Loeb, L.A.; Harris, C.C. Advances in chemical carcinogenesis: A historical review and perspective. Cancer Res. 2008, 68, 6863–6872.
  5. See, e.g., [84]. A goal of early detection (via blood tests) is behind Grail, Inc.’s (Menlo Park, California) recent ∼$1B series B funding round; see, e.g., [85].
  6. American Cancer Society. What Are the Key Statistics About Cancers of Unknown Primary? 2017. Available online: https://www.cancer.org/cancer/cancer-unknown-primary/about/key-statistics.html (accessed on 31 March 2017).
  7. In brief, DNA is a double helix of two strands, and each strand is a string of letters A, C, G, T corresponding to adenine, cytosine, guanine and thymine, respectively. In the double helix, A in one strand always binds with T in the other, and G always binds with C. This is known as base complementarity. Thus, there are six possible base mutations C > A, C > G, C > T, T > A, T > C, T > G, whereas the other six base mutations are equivalent to these by base complementarity. Each of these six possible base mutations is flanked by four possible bases on each side, thereby producing 4 × 6 × 4 = 96 distinct mutation categories.
  8. A priori, nonlinearities could alter this conclusion. However, such nonlinearities may also render cancer signatures essentially useless.
  9. Alexandrov, L.B.; Nik-Zainal, S.; Wedge, D.C.; Campbell, P.J.; Stratton, M.R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 2013, 3, 246–259.
  10. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
  11. Paatero, P.; Tapper, U. Positive matrix factorization: A non-negative factor model with optimal utilization of error. Environmetrics 1994, 5, 111–126.
  12. By “noise”, we mean the statistical errors in the weights obtained by averaging. Usually, such error bars are not reported in the literature on cancer signatures. Typically, they are large.
  13. Kakushadze, Z.; Yu, W. Factor Models for Cancer Signatures. Phys. A 2016, 462, 527–559. Available online: http://ssrn.com/abstract=2772458 (accessed on 31 March 2017).
  14. This is achieved by cross-sectionally (i.e., across the 96 mutation categories) demeaning “log-counts”. This “de-noising” dramatically improved NMF-based signatures we extracted from genome data in [13] and cut the computational cost (these savings would scale nonlinearly for larger datasets) by a factor of about 10 on a genome dataset for 1389 samples in 14 cancer types. In [13], by adapting the methods used in statistical risk models in quantitative finance [86], we also proposed a simple method for fixing the number of cancer signatures based on eRank (effective rank) [87].
  15. Aggregating samples by cancer type can obscure pertinent information for some cancer types, as there may be biologic factors one wishes to understand; e.g., the mutational spectra of liver cancers can have substantial regional dependence, as they are mutagenized by exposures to different chemicals (alcohol, aflatoxin, tobacco, etc.). In such cases, aggregation by regions (or other applicable characteristics, as the case may be) within a cancer type may still be warranted to reduce noise (otherwise, without any aggregation, there are simply too many cancer signatures; see, e.g., Table 7 in [13]). However, taking one step at a time, in this paper we work with (exome) data aggregated by cancer types (see below).
  16. Kakushadze, Z.; Yu, W. *K-means and Cluster Models for Cancer Signatures. Biomol. Detect. Quantif. 2017. (forthcoming). Available online: https://ssrn.com/abstract=2908286 (accessed on 31 March 2017).
  17. Catalog of Somatic Mutations in Cancer. Wellcome Trust Sanger Institute. 2017. Available online: http://cancer.sanger.ac.uk/cosmic/signatures (accessed on 31 March 2017).
  18. There is virtually no way to make this paper self-contained without essentially copying all of the technical details over from [16]. We will not do so here. Instead, readers interested in technical details should read this paper together with [16].
  19. It also fixes the number of clusters $K$: it fixes the target number of clusters $K_1$ via an eRank-based method (see [14]); then, the final number of clusters $K \leq K_1$ follows via machine learning.
  20. One of the cancer types for which clustering does not appear to work well is liver cancer, as expected from, and fully consistent with, the results of [13]. In particular, the dominant (with a 96% contribution) NMF-based cancer signature we found in [13] for liver cancer does not have “peaks” (it is a “rolling hills landscape”) and bears no resemblance to a clustering substructure. In this regard, note our comments in [15].
  21. Ng, S.B.; Turner, E.H.; Robertson, P.D.; Flygare, S.D.; Bigham, A.W.; Lee, C.; Shaffer, T.; Wong, M.; Bhattacharjee, A.; Eichler, E.E.; et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 2009, 461, 272–276.
  22. Kakushadze, Z.; Yu, W. Statistical Industry Classification. J. Risk Control 2016, 3, 17–65. Available online: http://ssrn.com/abstract=2802753 (accessed on 31 March 2017).
  23. Forgy, E.W. Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics 1965, 21, 768–769.
  24. Hartigan, J.A. Clustering Algorithms; John Wiley & Sons, Inc.: New York, NY, USA, 1975.
  25. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A K-Means Clustering Algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1979, 28, 100–108.
  26. Lloyd, S.P. Least Square Quantization in PCM. Working Paper, Bell Telephone Laboratories, Murray Hill, NJ, USA, 1957.
  27. Lloyd, S.P. Least square quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
  28. MacQueen, J.B. Some Methods for classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; LeCam, L., Neyman, J., Eds.; University of California Press: Berkeley, CA, USA, 1967; pp. 281–297.
  29. Steinhaus, H. Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci. 1957, 4, 801–804.
  30. We ran these three batches consecutively, and each batch produced slightly different top-10 (by occurrence counts) clusterings with varying occurrence counts across the batches, etc. However, Clustering-E1 invariably had the highest occurrence count by a large margin. See Table A5.
  31. Due to a binary clustering structure, the within-cluster weights $W_{iA}$ are encoded in an $N$-vector $w_i$. This is because all but $N$ elements of the matrix $W_{iA}$ are zero.
  32. Alexandrov, L.B.; Nik-Zainal, S.; Wedge, D.C.; Aparicio, S.A.; Behjati, S.; Biankin, A.V.; Bignell, G.R.; Bolli, N.; Borg, A.; Børresen-Dale, A.L.; et al. Signatures of mutational processes in human cancer. Nature 2013, 500, 415–421.
  33. Alexandrov, L.B.; Stratton, M.R. Mutational signatures: The patterns of somatic mutations hidden in cancer genomes. Curr. Opin. Genet. Dev. 2014, 24, 52–60.
  34. Helleday, T.; Eshtad, S.; Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nat. Rev. Genet. 2014, 15, 585–598.
  35. Nik-Zainal, S.; Alexandrov, L.B.; Wedge, D.C.; Van Loo, P.; Greenman, C.D.; Raine, K.; Jones, D.; Hinton, J.; Marshall, J.; Stebbings, L.A.; et al. Mutational processes molding the genomes of 21 breast cancers. Cell 2012, 149, 979–993.
  36. See http://cancer.sanger.ac.uk/cancergenome/assets/signatures_probabilities.txt (accessed on 26 February 2017). Note that the ordering of mutation categories in this file is not the same as ours.
  37. However, there is no magic here. Apparently, there is a large overlap between the exome data we use here and those used by [17]. Furthermore, caution is in order when it comes to any NMF-based signature that dominates a given cancer type. What this means is that the signature is close to the properly normalized underlying occurrence counts data (either aggregated or appropriately averaged over all samples), and NMF samplings fail to find a local minimum substantially different along this particular direction from the local minima that include this cancer signature. Such a signature indicates that the corresponding cancer type is of a “stand-alone” type and has little in common with other cancer types. An example of such a signature is the liver cancer-dominant NMF-based cancer signature found in [13].
  38. Note that considering the overall fit quality for COSMIC signatures by running overall regressions (of $G_{is}$ over $U$ without the intercept) as we did above for clusters would not be meaningful. The regression coefficients $F_{As}$ in (4) in the case of clusters are guaranteed to be nonnegative. This is because the $N$-vectors corresponding to the columns in the cluster weights matrix $W_{iA}$ are orthogonal to each other. The $N$-vectors corresponding to the columns in the COSMIC weights matrix $U$ are not orthogonal, unacceptably resulting in many negative regression coefficients $F_{\alpha s}$.
  39. Thus, to run one batch of NMF with 800 samplings on a four-CPU (eight cores each, 2.60 GHz) machine with 529 GB of RAM and hyper-threading (Operating System: Debian 3.2.84-2 x86_64 GNU/Linux), it took 6–7 days (and 3–4 days when the input data were “de-noised” following [13]). In contrast, to run each of our three batches of *K-means with 10 million instances of k-means in each batch (see Section 3.2), it took under 24 h on a single CPU (quad-core, 3.1 GHz) machine with 16 GB of RAM (Operating System: 64-bit Windows Server 2008 R2 Standard). From these data, it is evident that *K-means is computationally much less expensive than NMF, even if NMF is improved via “de-noising” [13].
  40. Schulze, K.; Imbeaud, S.; Letouzé, E.; Alexandrov, L.B.; Calderaro, J.; Rebouissou, S.; Couchy, G.; Meiller, C.; Shinde, J.; Soysouvanh, F.; et al. Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets. Nat. Genet. 2015, 47, 505–511.
  41. Thus, as mentioned above, we ran three batches of 800 NMF samplings. In each batch, 800 samplings are aggregated via nondeterministic clustering (e.g., via k-means; see, e.g., [16] for a detailed discussion). The net result, by design, is nondeterministic.
  42. Furthermore, as was argued in [16], NMF, at least to some degree, is clustering in disguise. In fact, visual inspection of COSMIC signatures makes it evident that many of them, albeit possibly not all, have clustering substructure. This will be discussed in more detail in a forthcoming paper. Furthermore, it would be interesting to understand the relation between “R-mutations” [88] (also see the references therein) and somatic mutational noise.
  43. Malcovati, L.; Papaemmanuil, E.; Bowen, D.T.; Boultwood, J.; Della Porta, M.G.; Pascutto, C.; Travaglino, E.; Groves, M.J.; Godfrey, A.L.; Ambaglio, I.; et al. Clinical significance of SF3B1 mutations in myelodysplastic syndromes and myelodysplastic/myeloproliferative neoplasms. Blood 2011, 118, 6239–6246.
  44. Papaemmanuil, E.; Cazzola, M.; Boultwood, J.; Malcovati, L.; Vyas, P.; Bowen, D.; Pellagatti, A.; Wainscoat, J.S.; Hellstrom-Lindberg, E.; Gambacorti-Passerini, C.; et al. Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts. N. Engl. J. Med. 2011, 365, 1384–1395.
  45. Sausen, M.; Leary, R.J.; Jones, S.; Wu, J.; Reynolds, C.P.; Liu, X.; Blackford, A.; Parmigiani, G.; Diaz, L.A., Jr.; Papadopoulos, N.; et al. Integrated genomic analyses identify ARID1A and ARID1B alterations in the childhood cancer neuroblastoma. Nat. Genet. 2013, 45, 12–17.
  46. Holmfeldt, L.; Wei, L.; Diaz-Flores, E.; Walsh, M.; Zhang, J.; Ding, L.; Payne-Turner, D.; Churchman, M.; Andersson, A.; Chen, S.C.; et al. The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat. Genet. 2013, 45, 242–252.
  47. Zhang, J.; Ding, L.; Holmfeldt, L.; Wu, G.; Heatley, S.L.; Payne-Turner, D.; Easton, J.; Chen, X.; Wang, J.; Rusch, M.; et al. The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature 2012, 481, 157–163.
  48. De Keersmaecker, K.; Atak, Z.K.; Li, N.; Vicente, C.; Patchett, S.; Girardi, T.; Gianfelici, V.; Geerdens, E.; Clappier, E.; Porcu, M.; et al. Exome sequencing identifies mutation in CNOT3 and ribosomal genes RPL5 and RPL10 in T-cell acute lymphoblastic leukemia. Nat. Genet. 2013, 45, 186–190.
  49. Morin, R.D.; Mendez-Lago, M.; Mungall, A.J.; Goya, R.; Mungall, K.L.; Corbett, R.D.; Johnson, N.A.; Severson, T.M.; Chiu, R.; Field, M.; et al. Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 2011, 476, 298–303.
  50. Love, C.; Sun, Z.; Jima, D.; Li, G.; Zhang, J.; Miles, R.; Richards, K.L.; Dunphy, C.H.; Choi, W.W.; Srivastava, G.; et al. The genetic landscape of mutations in Burkitt lymphoma. Nat. Genet. 2012, 44, 1321–1325.
  51. Pilati, C.; Amessou, M.; Bihl, M.P.; Balabaud, C.; Nhieu, J.T.; Paradis, V.; Nault, J.C.; Izard, T.; Bioulac-Sage, P.; Couchy, G.; et al. Genomic profiling of hepatocellular adenomas reveals recurrent FRK-activating mutations and the mechanisms of malignant transformation. Cancer Cell 2014, 25, 428–441.
  52. Guo, G.; Sun, X.; Chen, C.; Wu, S.; Huang, P.; Li, Z.; Dean, M.; Huang, Y.; Jia, W.; Zhou, Q.; et al. Whole-genome and whole-exome sequencing of bladder cancer identifies frequent alterations in genes involved in sister chromatid cohesion and segregation. Nat. Genet. 2013, 45, 1459–1463.
  53. Nik-Zainal, S.; Van Loo, P.; Wedge, D.C.; Alexandrov, L.B.; Greenman, C.D.; Lau, K.W.; Raine, K.; Jones, D.; Marshall, J.; Ramakrishna, M.; et al. The life history of 21 breast cancers. Cell 2012, 149, 994–1007.
  54. Stephens, P.J.; Tarpey, P.S.; Davies, H.; Van Loo, P.; Greenman, C.; Wedge, D.C.; Nik-Zainal, S.; Martin, S.; Varela, I.; Bignell, G.R.; et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 2012, 486, 400–404.
  55. Shah, S.P.; Roth, A.; Goya, R.; Oloumi, A.; Ha, G.; Zhao, Y.; Turashvili, G.; Ding, J.; Tse, K.; Haffari, G.; et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 2012, 486, 395–399.
  56. Zou, S.; Li, J.; Zhou, H.; Frech, C.; Jiang, X.; Chu, J.S.; Zhao, X.; Li, Y.; Li, Q.; Wang, H.; et al. Mutational landscape of intrahepatic cholangiocarcinoma. Nat. Commun. 2014, 5, 5696.
  57. Quesada, V.; Conde, L.; Villamor, N.; Ordóñez, G.R.; Jares, P.; Bassaganyas, L.; Ramsay, A.J.; Beà, S.; Pinyol, M.; Martínez-Trillos, A.; et al. Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat. Genet. 2011, 44, 47–52.
  58. Seshagiri, S.; Stawiski, E.W.; Durinck, S.; Modrusan, Z.; Storm, E.E.; Conboy, C.B.; Chaudhuri, S.; Guan, Y.; Janakiraman, V.; Jaiswal, B.S.; et al. Recurrent R-spondin fusions in colon cancer. Nature 2012, 488, 660–664.
  59. Dulak, A.M.; Stojanov, P.; Peng, S.; Lawrence, M.S.; Fox, C.; Stewart, C.; Bandla, S.; Imamura, Y.; Schumacher, S.E.; Shefler, E.; et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat. Genet. 2013, 45, 478–486.
  60. Zang, Z.J.; Cutcutache, I.; Poon, S.L.; Zhang, S.L.; McPherson, J.R.; Tao, J.; Rajasegaran, V.; Heng, H.L.; Deng, N.; Gan, A.; et al. Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes. Nat. Genet. 2012, 44, 570–574.
  61. Wang, K.; Kan, J.; Yuen, S.T.; Shi, S.T.; Chu, K.M.; Law, S.; Chan, T.L.; Kan, Z.; Chan, A.S.; Tsui, W.Y.; et al. Exome sequencing identifies frequent mutation of ARID1A in molecular subtypes of gastric cancer. Nat. Genet. 2011, 43, 1219–1223.
  62. Parsons, D.W.; Jones, S.; Zhang, X.; Lin, J.C.; Leary, R.J.; Angenendt, P.; Mankoo, P.; Carter, H.; Siu, I.M.; Gallia, G.L.; et al. An integrated genomic analysis of human glioblastoma multiforme. Science 2008, 321, 1807–1812.
  63. Agrawal, N.; Frederick, M.J.; Pickering, C.R.; Bettegowda, C.; Chang, K.; Li, R.J.; Fakhry, C.; Xie, T.X.; Zhang, J.; Wang, J.; et al. Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science 2011, 333, 1154–1157.
  64. Stransky, N.; Egloff, A.M.; Tward, A.D.; Kostic, A.D.; Cibulskis, K.; Sivachenko, A.; Kryukov, G.V.; Lawrence, M.S.; Sougnez, C.; McKenna, A.; et al. The mutational landscape of head and neck squamous cell carcinoma. Science 2011, 333, 1157–1160.
  65. Huang, J.; Deng, Q.; Wang, Q.; Li, K.Y.; Dai, J.H.; Li, N.; Zhu, Z.D.; Zhou, B.; Liu, X.Y.; Liu, R.F.; et al. Exome sequencing of hepatitis B virus-associated hepatocellular carcinoma. Nat. Genet. 2012, 44, 1117–1121.
  66. Ding, L.; Getz, G.; Wheeler, D.A.; Mardis, E.R.; McLellan, M.D.; Cibulskis, K.; Sougnez, C.; Greulich, H.; Muzny, D.M.; Morgan, M.B.; et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008, 455, 1069–1075.
  67. Rudin, C.M.; Durinck, S.; Stawiski, E.W.; Poirier, J.T.; Modrusan, Z.; Shames, D.S.; Bergbower, E.A.; Guan, Y.; Shin, J.; Guillory, J.; et al. Comprehensive genomic analysis identifies SOX2 as a frequently amplified gene in small-cell lung cancer. Nat. Genet. 2012, 44, 1111–1116.
  68. Peifer, M.; Fernández-Cuesta, L.; Sos, M.L.; George, J.; Seidel, D.; Kasper, L.H.; Plenker, D.; Leenders, F.; Sun, R.; Zander, T.; et al. Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nat. Genet. 2012, 44, 1104–1110.
  69. Seo, J.S.; Ju, Y.S.; Lee, W.C.; Shin, J.Y.; Lee, J.K.; Bleazard, T.; Lee, J.; Jung, Y.J.; Kim, J.O.; Shin, J.Y.; et al. The transcriptional landscape and mutational profile of lung adenocarcinoma. Genome Res. 2012, 22, 2109–2119.
  70. Imielinski, M.; Berger, A.H.; Hammerman, P.S.; Hernandez, B.; Pugh, T.J.; Hodis, E.; Cho, J.; Suh, J.; Capelletti, M.; Sivachenko, A.; et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 2012, 150, 1107–1120.
  71. Stark, M.S.; Woods, S.L.; Gartside, M.G.; Bonazzi, V.F.; Dutton-Regester, K.; Aoude, L.G.; Chow, D.; Sereduk, C.; Niemi, N.M.; Tang, N.; et al. Frequent somatic mutations in MAP3K5 and MAP3K9 in metastatic melanoma identified by exome sequencing. Nat. Genet. 2011, 44, 165–169.
  72. Davies, H.; Bignell, G.R.; Cox, C.; Stephens, P.; Edkins, S.; Clegg, S.; Teague, J.; Woffendin, H.; Garnett, M.J.; Bottomley, W.; et al. Mutations of the BRAF gene in human cancer. Nature 2002, 417, 949–954.
  73. Berger, M.F.; Hodis, E.; Heffernan, T.P.; Deribe, Y.L.; Lawrence, M.S.; Protopopov, A.; Ivanova, E.; Watson, I.R.; Nickerson, E.; Ghosh, P.; et al. Melanoma genome sequencing reveals frequent PREX2 mutations. Nature 2012, 485, 502–506.
  74. Hodis, E.; Watson, I.R.; Kryukov, G.V.; Arold, S.T.; Imielinski, M.; Theurillat, J.P.; Nickerson, E.; Auclair, D.; Li, L.; Place, C.; et al. A landscape of driver mutations in melanoma. Cell 2012, 150, 251–263.
  75. Lin, D.C.; Meng, X.; Hazawa, M.; Nagata, Y.; Varela, A.M.; Xu, L.; Sato, Y.; Liu, L.Z.; Ding, L.W.; Sharma, A.; et al. The genomic landscape of nasopharyngeal carcinoma. Nat. Genet. 2014, 46, 866–871.
  76. India Project Team of the International Cancer Genome Consortium. Mutational landscape of gingivo-buccal oral squamous cell carcinoma reveals new recurrently mutated genes and molecular subgroups. Nat. Commun. 2013, 4, 2873.
  77. Jones, S.; Wang, T.L.; Shih, I.M.; Mao, T.L.; Nakayama, K.; Roden, R.; Glas, R.; Slamon, D.; Diaz, L.A., Jr.; Vogelstein, B.; et al. Frequent mutations of chromatin remodeling gene ARID1A in ovarian clear cell carcinoma. Science 2010, 330, 228–231.
  78. Wu, J.; Jiao, Y.; Dal Molin, M.; Maitra, A.; de Wilde, R.F.; Wood, L.D.; Eshleman, J.R.; Goggins, M.G.; Wolfgang, C.L.; Canto, M.I.; et al. Whole-exome sequencing of neoplastic cysts of the pancreas reveals recurrent mutations in the components of ubiquitin-dependent pathways. Proc. Natl. Acad. Sci. USA 2011, 108, 21188–21193.
  79. Jiao, Y.; Shi, C.; Edil, B.H.; de Wilde, R.F.; Klimstra, D.S.; Maitra, A.; Schulick, R.D.; Tang, L.H.; Wolfgang, C.L.; Choti, M.A.; et al. DAXX/ATRX, MEN1 and mTOR pathway genes are frequently altered in pancreatic neuroendocrine tumors. Science 2011, 331, 1199–1203.
  80. Barbieri, C.E.; Baca, S.C.; Lawrence, M.S.; Demichelis, F.; Blattner, M.; Theurillat, J.P.; White, T.A.; Stojanov, P.; Van Allen, E.; Stransky, N.; et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat. Genet. 2012, 44, 685–689.
  81. Berger, M.F.; Lawrence, M.S.; Demichelis, F.; Drier, Y.; Cibulskis, K.; Sivachenko, A.Y.; Sboner, A.; Esgueva, R.; Pflueger, D.; Sougnez, C.; et al. The genomic complexity of primary human prostate cancer. Nature 2011, 470, 214–220.
  82. Grasso, C.S.; Wu, Y.M.; Robinson, D.R.; Cao, X.; Dhanasekaran, S.M.; Khan, A.P.; Quist, M.J.; Jing, X.; Lonigro, R.J.; Brenner, J.C.; et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature 2012, 487, 239–243.
  83. Guo, G.; Gui, Y.; Gao, S.; Tang, A.; Hu, X.; Huang, Y.; Jia, W.; Li, Z.; He, M.; Sun, L.; et al. Frequent mutations of genes encoding ubiquitin-mediated proteolysis pathway components in clear cell renal cell carcinoma. Nat. Genet. 2011, 44, 17–19.
  84. Cho, H.; Mariotto, A.B.; Schwartz, L.M.; Luo, J.; Woloshin, S. When do changes in cancer survival mean progress? The insight from population incidence and mortality. J. Natl. Cancer Inst. Monogr. 2014, 2014, 187–197.
  85. Nasdaq GlobeNewswire. GRAIL Closes Over $900 Million Initial Investment in Series B Financing to Develop Blood Tests to Detect Cancer Early. 2017. Available online: https://globenewswire.com/news-release/2017/03/01/929515/0/en/GRAIL-Closes-Over-900-Million-Initial-Investment-in-Series-B-Financing-to-Develop-Blood-Tests-to-Detect-Cancer-Early.html (accessed on 31 March 2017).
  86. Kakushadze, Z.; Yu, W. Statistical Risk Models. J. Invest. Strateg. 2017, 6, 1–40. Available online: http://ssrn.com/abstract=2732453 (accessed on 31 March 2017).
  87. Roy, O.; Vetterli, M. The effective rank: A measure of effective dimensionality. In Proceedings of the European Signal Processing Conference (EUSIPCO), Poznań, Poland, 3–7 September 2007; pp. 606–610.
  88. Tomasetti, C.; Li, L.; Vogelstein, B. Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science 2017, 355, 1330–1334.
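
Note 38 argues that regressions over cluster weights cannot produce negative coefficients because the weight columns are mutually orthogonal, whereas overlapping signature columns can. The following is a minimal toy sketch of that argument in Python, not the paper's code or data: the three-category vectors and the helper names `dot` and `regress_two_columns` are illustrative assumptions. With nonnegative columns of disjoint support, each coefficient reduces to a ratio of nonnegative dot products; once the columns overlap, the off-diagonal term in the normal equations can push a coefficient below zero.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def regress_two_columns(x1, x2, g):
    """Least-squares coefficients of g on columns x1 and x2 (no intercept).

    Solves the 2x2 normal equations with Cramer's rule; assumes x1 and x2
    are linearly independent so that the determinant is nonzero.
    """
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    b1, b2 = dot(x1, g), dot(x2, g)
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)


# Toy nonnegative "occurrence counts" over three hypothetical categories.
g = [1.0, 0.0, 0.0]

# Cluster-like columns: disjoint supports, hence orthogonal.
f_clusters = regress_two_columns([1.0, 1.0, 0.0], [0.0, 0.0, 1.0], g)

# Signature-like columns: overlapping supports, hence not orthogonal.
f_overlap = regress_two_columns([1.0, 1.0, 0.0], [0.0, 1.0, 1.0], g)

print(f_clusters)  # both coefficients are nonnegative
print(f_overlap)   # the second coefficient is negative
```

In the orthogonal case the coefficient for each column is simply the dot product of that column with the data divided by the column's squared norm, so nonnegative inputs cannot yield a negative result.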
Figure 1. Cluster Cl-1 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 2. Cluster Cl-2 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 3. Cluster Cl-3 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 4. Cluster Cl-4 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 5. Cluster Cl-5 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 6. Cluster Cl-6 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 7. Cluster Cl-7 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 8. Cluster Cl-8 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 9. Cluster Cl-9 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 10. Cluster Cl-10 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Figure 11. Cluster Cl-11 in Clustering-E1 with weights based on normalized regressions with arithmetic means.
Table 1. Exome data summary. See Appendix A for the data source definitions. Here, we label cancer types via X1–X32 for use in the tables below.

| Label | Cancer Type | Total Counts | # of Samples | Source |
|---|---|---|---|---|
| X1 | Acute Lymphoblastic Leukemia | 938 | 86 | H1, Z1, D1 |
| X2 | Acute Myeloid Leukemia | 1414 | 190 | T1 |
| X3 | Adrenocortical Carcinoma | 11,530 | 91 | T2 |
| X4 | B-Cell Lymphoma | 706 | 24 | M1, L1 |
| X5 | Benign Liver Tumor | 884 | 40 | P1 |
| X6 | Bladder Cancer | 90,121 | 341 | G1, T3 |
| X7 | Brain Lower Grade Glioma | 38,041 | 465 | T4 |
| X8 | Breast Cancer | 201,555 | 1182 | N1, S1, S2, T5 |
| X9 | Cervical Cancer | 47,715 | 197 | T6 |
| X10 | Cholangiocarcinoma | 12,156 | 139 | Z2, T7 |
| X11 | Chronic Lymphocytic Leukemia | 975 | 80 | Q1 |
| X12 | Colorectal Cancer | 214,814 | 581 | S3, T8 |
| X13 | Esophageal Cancer | 59,088 | 329 | D2, T9 |
| X14 | Gastric Cancer | 161,078 | 401 | Z3, W1, T10 |
| X15 | Glioblastoma Multiforme | 23,230 | 359 | P2, T11 |
| X16 | Head and Neck Cancer | 96,816 | 591 | A1, S4, T12 |
| X17 | Liver Cancer | 252,755 | 452 | S5, H2, T13 |
| X18 | Lung Cancer | 306,071 | 1018 | D3, R1, P3, S6, I1, T14 |
| X19 | Melanoma | 357,060 | 594 | S7, D4, B1, A2, H3, T15 |
| X20 | Nasopharyngeal Cancer | 2241 | 11 | L2 |
| X21 | Oral Cancer | 13,462 | 106 | I2 |
| X22 | Ovarian Cancer | 20,610 | 471 | J1, T16 |
| X23 | Pancreatic Cancer | 39,788 | 184 | W2, J2, T17 |
| X24 | Pheochromocytoma and Paraganglioma | 3709 | 178 | T18 |
| X25 | Prostate Cancer | 22,808 | 480 | B2, B3, G2, T19 |
| X26 | Rectum Adenocarcinoma | 32,797 | 115 | T20 |
| X27 | Renal Cell Carcinoma | 47,635 | 709 | G3, T21 |
| X28 | Sarcoma | 28,256 | 255 | T22 |
| X29 | Testicular Germ Cell Tumor | 6064 | 150 | T23 |
| X30 | Thymoma | 4444 | 123 | T24 |
| X31 | Thyroid Carcinoma | 6833 | 409 | T25 |
| X32 | Uterine Cancer | 164,211 | 305 | T26 |
| All | Cancer Types | 2,269,805 | 10,656 | Above |
Table 2. Weights for the first 48 mutation categories for the 11 clusters in Clustering-E1 (see Table A5) based on normalized regressions (see Section 3.2 for details). The conventions are the same as in Table A6.
Table 2. Weights for the first 48 mutation categories for the 11 clusters in Clustering-E1 (see Table A5) based on normalized regressions (see Section 3.2 for details). The conventions are the same as in Table A6.
Mutation   Cl-1   Cl-2   Cl-3   Cl-4   Cl-5   Cl-6   Cl-7   Cl-8   Cl-9  Cl-10  Cl-11
ACAA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   4.05   0.00   0.00
ACCA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   3.94   0.00   0.00
ACGA       0.00   0.00   0.00   0.00   0.00   0.00  13.92   0.00   0.00   0.00   0.00
ACTA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   2.98   0.00   0.00
CCAA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   5.71   0.00   0.00
CCCA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   4.76   0.00   0.00
CCGA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   3.49   0.00   0.00
CCTA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   7.19   0.00   0.00
GCAA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   5.78   0.00   0.00
GCCA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.17   0.00   0.00
GCGA      39.97   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
GCTA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.96   0.00   0.00
TCAA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   5.91   0.00   0.00
TCCA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   5.56   0.00   0.00
TCGA      26.06   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
TCTA       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  13.30   0.00   0.00
ACAG       0.00   0.00   0.00   0.00   0.00   0.00  14.83   0.00   0.00   0.00   0.00
ACCG       0.00   0.00  13.73   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
ACGG       0.00   0.00   0.00   0.00   0.00  10.02   0.00   0.00   0.00   0.00   0.00
ACTG       0.00   0.00  15.79   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
CCAG       0.00   0.00   0.00   0.00   0.00   0.00  14.81   0.00   0.00   0.00   0.00
CCCG       0.00   0.00   0.00   0.00   0.00   0.00  13.10   0.00   0.00   0.00   0.00
CCGG       0.00   0.00   0.00   0.00   0.00   0.00  11.85   0.00   0.00   0.00   0.00
CCTG       0.00   0.00   0.00   0.00   0.00   0.00  17.23   0.00   0.00   0.00   0.00
GCAG       0.00   0.00   0.00   0.00   0.00  14.97   0.00   0.00   0.00   0.00   0.00
GCCG       0.00   0.00   0.00  14.36   0.00   0.00   0.00   0.00   0.00   0.00   0.00
GCGG       0.00   0.00   0.00   0.00   0.00  23.52   0.00   0.00   0.00   0.00   0.00
GCTG       0.00   0.00   0.00   9.16   0.00   0.00   0.00   0.00   0.00   0.00   0.00
TCAG       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   9.11   0.00   0.00
TCCG       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   4.30   0.00   0.00
TCGG       0.00   0.00   0.00   0.00   0.00   0.00  14.26   0.00   0.00   0.00   0.00
TCTG       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  10.79   0.00   0.00
ACAT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1.97   0.00
ACCT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   2.65   0.00
ACGT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   9.10   0.00
ACTT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.90
CCAT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   4.19   0.00
CCCT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   5.37   0.00
CCGT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   9.13   0.00
CCTT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   4.90   0.00
GCAT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   2.64   0.00
GCCT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   4.98   0.00
GCGT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  11.43   0.00
GCTT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   3.69   0.00
TCAT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   8.08   0.00
TCCT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  11.38   0.00
TCGT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  12.03   0.00
TCTT       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.42   0.00
Table 3. Table 2, continued: weights for the next 48 mutation categories.
Mutation   Cl-1   Cl-2   Cl-3   Cl-4   Cl-5   Cl-6   Cl-7   Cl-8   Cl-9  Cl-10  Cl-11
ATAA       0.00   0.00   0.00   0.00   0.00  13.78   0.00   0.00   0.00   0.00   0.00
ATCA       0.00   0.00  16.08   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
ATGA       0.00   0.00  16.98   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
ATTA       0.00   0.00   0.00   0.00  14.23   0.00   0.00   0.00   0.00   0.00   0.00
CTAA       0.00   0.00   0.00   0.00   0.00  10.07   0.00   0.00   0.00   0.00   0.00
CTCA       0.00   0.00  18.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
CTGA      33.97   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
CTTA       0.00   0.00   0.00   0.00  19.11   0.00   0.00   0.00   0.00   0.00   0.00
GTAA       0.00   0.00   0.00   0.00   0.00  11.46   0.00   0.00   0.00   0.00   0.00
GTCA       0.00   0.00   0.00   0.00   0.00   0.00   0.00  13.53   0.00   0.00   0.00
GTGA       0.00   0.00  19.41   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
GTTA       0.00   0.00   0.00   0.00   0.00   0.00   0.00  10.75   0.00   0.00   0.00
TTAA       0.00  20.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
TTCA       0.00  26.57   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
TTGA       0.00  24.38   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
TTTA       0.00  29.05   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
ATAC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.14
ATCC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   5.13
ATGC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   7.96
ATTC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.21
CTAC       0.00   0.00   0.00   0.00  19.54   0.00   0.00   0.00   0.00   0.00   0.00
CTCC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   7.14
CTGC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   2.04   0.00
CTTC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   7.76
GTAC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   5.99
GTCC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.64
GTGC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.46
GTTC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.72
TTAC       0.00   0.00   0.00   0.00  18.66   0.00   0.00   0.00   0.00   0.00   0.00
TTCC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   5.18
TTGC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   4.68
TTTC       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   4.69
ATAG       0.00   0.00   0.00   0.00   0.00   0.00   0.00   9.14   0.00   0.00   0.00
ATCG       0.00   0.00   0.00   0.00   0.00   0.00   0.00  10.60   0.00   0.00   0.00
ATGG       0.00   0.00   0.00   0.00   0.00   0.00   0.00  11.81   0.00   0.00   0.00
ATTG       0.00   0.00   0.00   0.00  14.48   0.00   0.00   0.00   0.00   0.00   0.00
CTAG       0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.74   0.00   0.00   0.00
CTCG       0.00   0.00   0.00   0.00   0.00   0.00   0.00  14.76   0.00   0.00   0.00
CTGG       0.00   0.00   0.00   9.04   0.00   0.00   0.00   0.00   0.00   0.00   0.00
CTTG       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.03
GTAG       0.00   0.00   0.00   0.00   0.00  16.18   0.00   0.00   0.00   0.00   0.00
GTCG       0.00   0.00   0.00  10.55   0.00   0.00   0.00   0.00   0.00   0.00   0.00
GTGG       0.00   0.00   0.00  47.90   0.00   0.00   0.00   0.00   0.00   0.00   0.00
GTTG       0.00   0.00   0.00   8.98   0.00   0.00   0.00   0.00   0.00   0.00   0.00
TTAG       0.00   0.00   0.00   0.00   0.00   0.00   0.00   9.40   0.00   0.00   0.00
TTCG       0.00   0.00   0.00   0.00  13.98   0.00   0.00   0.00   0.00   0.00   0.00
TTGG       0.00   0.00   0.00   0.00   0.00   0.00   0.00  13.26   0.00   0.00   0.00
TTTG       0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6.37
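
As a quick consistency check on Tables 2 and 3, the nonzero weights in each cluster column add up to 100% to within rounding, as one would expect for normalized weights. The minimal Python sketch below illustrates the check for three of the eleven clusters, using values transcribed from the two tables. The dictionary layout, the function names `weight_total` and `is_normalized`, and the tolerance are illustrative assumptions, not part of the original analysis.

```python
# Nonzero weights (in %) transcribed from Tables 2 and 3 for three clusters.
# The data layout is a hypothetical choice made for this illustration.
cluster_weights = {
    "Cl-1": {"GCGA": 39.97, "TCGA": 26.06, "CTGA": 33.97},
    "Cl-2": {"TTAA": 20.00, "TTCA": 26.57, "TTGA": 24.38, "TTTA": 29.05},
    "Cl-3": {"ACCG": 13.73, "ACTG": 15.79, "ATCA": 16.08,
             "ATGA": 16.98, "CTCA": 18.00, "GTGA": 19.41},
}


def weight_total(weights):
    """Sum the tabulated weights (in %) of one cluster."""
    return sum(weights.values())


def is_normalized(weights, tol=0.05):
    """True if the weights sum to 100% within a rounding tolerance (assumed)."""
    return abs(weight_total(weights) - 100.0) < tol


for name, weights in cluster_weights.items():
    print(name, round(weight_total(weights), 2), is_normalized(weights))
```

Each mutation category carries a nonzero weight in exactly one cluster, so the same check can be repeated column by column for the remaining clusters.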
Table 4. The within-cluster cross-sectional correlations $\Theta_{sA}$ (Columns 2–12), the overall correlations $\Xi_s$ (Column 15) based on the overall cross-sectional regressions and multiple $R^2$ and adjusted $R^2$ of these regressions (Columns 13 and 14). The cluster weights are based on unnormalized regressions (see Section 3.2 and Section 3.3.1 for details). All quantities are in the units of 1% rounded to 2 digits. The values above 80% are given in bold font. The values above 70% are underlined.
TypeCl-1Cl-2Cl-3Cl-4Cl-5Cl-6Cl-7Cl-8Cl-9Cl-10Cl-11 R 2 adj- R 2 Cor
X182.7352.61−38.3588.99−2.4862.3144.7447.9946.9668.29−17.381.5379.1486.03
X257.8479.57−21.352.4614.0729−23.2745.7−9.0637.8623.6961.5556.5770.97
X397.8459.33−34.8885.8493.7120.4949.3672.2824.4348.9213.5575.8472.7175.03
X479.679.542.33−53.4333.46−25.78−10.9837.4749.4235.116.6970.3566.5160.11
X599.2136.4313.5446.6596.99−30.2−76.8751.58−27.2318.4937.0370.9667.2161.21
X6−87.7964.06−30.3793.9489.4327.2541.1181.6766.0661.7757.6864.0959.4474.35
X749.5694.33−63.2723.669.48−4.5553.9788.7345.5934.9728.9559.2854.0168
X8−33.1416.16−72.597.7236.79−35.3876.3558.4467.0651.844.5765.4961.0272.05
X9−94.7661.06−88.86−49.91−5.3−20.1830.4759.5564.4962.6637.6761.2556.2473.38
X1030.52−7.3175.577.4477.24−53.3436.3482.2710.4142.5254.5674.7271.4565.04
X116.4854.5453.91−58.7744.42−0.3970.02−30.5549.2942.5828.1477.4674.5572.98
X12−72.7676.02−15.94−43.697.61−44.7137.6464.4567.7547.0850.2467.3263.0973.76
X13−85.3193.35−52.58−40.1550.9411.6493.6676.3673.5758.1431.2373.9970.6376.38
X1470.9462.01−32.58−42.7985.81−31.9869.1977.9431.3735.2538.4355.4449.6765.84
X1512.187.76−64.1662.0192.19−40.3656.6477.9434.3739.4346.3460.0154.8470.44
X1630.6283.56−8.44−1.1684.79171.169.760.2480.737.1885.7983.9587.99
X1745.651.388.6623.5366.13−13.0745.084017.918.6323.8975.7572.6265.94
X1866.529.279.62−13.5178.2−5.9516.9556.688.3465.8351.1776.8773.8874.51
X19−56.7276.0841.9577.8921.98−24.67−44.4569.91−6.0669.8431.0370.1966.3381.77
X2063.1−45.6859.2399.9571.7798.3−66.794.37−19.7554.9120.791.0389.8794.01
X2130.55−9.19−3.4332.9658.57−42.2618.6610.75.6687.7543.0178.275.3781.7
X2214.389.91−48.97−15.3241.05−28.3545.0677.449.0157.6146.7182.7180.4873.91
X23−94.678.61−10.8854.36−54.29−25.8680.3579.5341.2536.5756.6959.8754.6869.15
X2414.3617.95−64.97−6.4467.953.2568.477.533.9730.868.4869.6765.7471.2
X2599.22−10.4−68.6731.1670.04−30.4651.8588.2539.842.0938.8265.1760.6670.86
X26−99.8668.28−42.04−91.74−44.5741.96−32.18−17.271.0759.3312.8851.1344.867.02
X2722.4677.37−17.6660.2567.81−54.0254.788143.5745.3669.6881.7579.3971.86
X2874.886.01−20.0620.2552.18−29.6460.8782.46−0.9287.4256.0774.6471.3678.39
X2956.6−32.03−73.4186.7689.795.8545.165.9−19.748.9248.863.0158.2252.27
X3053.6889.56−8.7359.4227.2914.2914.5755.29−34.7335.9750.8463.9459.2766.09
X31−63.194.686.58−25.3754.44−4.6823.5770.7745.7558.8245.6980.6378.1281.63
X32−90.3892.13−40.68−46.9−41.39−47.5629.2528.3570.5853.2514.1260.5855.4771.25
Table 5. The within-cluster cross-sectional correlations $\Theta_{sA}$ (Columns 2–12), the overall correlations $\Xi_s$ (Column 15) based on the overall cross-sectional regressions and multiple $R^2$ and adjusted $R^2$ of these regressions (Columns 13 and 14). The cluster weights are based on normalized regressions (see Section 3.2 and Section 3.3.1 for details). All quantities are in the units of 1% rounded to 2 digits. The values above 80% are given in bold font. The values above 70% are underlined.
TypeCl-1Cl-2Cl-3Cl-4Cl-5Cl-6Cl-7Cl-8Cl-9Cl-10Cl-11 R 2 adj- R 2 Cor
X174.8252.72−43.9490.191.8565.3645.6947.7751.2184.05−13.3189.2187.8192.03
X246.8679.8−0.943.6814.8230.35−20.1446.18−6.9759.5932.3672.7369.280.3
X399.6959.47−9.3286.8993.7626.1350.4672.3934.4770.5523.1585.2483.3485.05
X471.2310.7615.04−51.4430.56−21.881.9836.4147.9755.7−1.5875.8572.7267.89
X510035.8330.349.2698.08−28.3−65.4551.98−17.4338.9844.8277.9975.1470.79
X6−93.2263.8−16.0694.8490.8431.6727.5682.1261.6153.0564.3158.8353.569.75
X737.9794.43−47.2925.469.110.8363.6888.648.0959.438.1570.7666.9778.2
X8−45.0116.5−62.6398.2839.35−32.2783.659.1963.1143.450.5261.1956.1767.77
X9−89.8662.25−80.98−47.45−8.97−16.7416.1259.3960.35533.3456.2350.5769.13
X1018.02−8.282.6310.1874.99−52.0941.7782.7320.0263.7858.8479.3776.771.53
X11−6.4255.4452.93−57.5447.63−1.2881.7−30.0750.6662.333.1284.3482.3181.56
X12−63.3376.584.49−41.777.54−42.0348.8964.2566.8968.7345.5874.9471.6980.69
X13−77.8993.71−27.8−38.6847.2917.4495.8776.8775.577.5123.6881.5279.1383.69
X1461.2963.15−18.3−41.2787.58−28.281.2578.0936.8660.0533.5467.8563.6976.76
X15−0.7887.99−55.9463.993.15−35.3570.9678.3833.1861.5354.2171.7368.0780.1
X1618.1184.3119.730.9185.337.4860.1770.257.3576.0845.6484.1782.1386.47
X1733.821.0530.6726.7469.21−9.9458.6739.9122.729.3932.7280.4277.8972.46
X1856.368.1880.84−11.7277.470.1214.0857.3214.355.7858.6681.3278.9179.27
X19−66.8476.0363.5879.3525.45−22.45−33.470.1−5.2347.4931.4757.0551.4972.24
X2072.56−45.5350.7699.7674.6796.7−64.7194.63−15.576721.8590.2789.0193.47
X2142.55−8.08−21.7235.2461.4−38.8910.5310.8317.1592.8748.2984.4982.4987.07
X221.4590.5−35.62−13.1339.71−23.6660.3477.9848.672.153.6287.2585.680.31
X23−89.6479.5216.9755.42−56.06−22.5384.9779.2245.9659.2356.6970.9367.1778.71
X241.519.38−60−5.0371.179.5176.8177.9632.3154.1118.5579.1376.4380.84
X2596.78−9.27−60.4933.7371.27−27.2566.7288.2945.1665.4447.1976.973.9181.59
X26−98.3368.09−18.47−91.28−47.4847.37−21.32−17.4968.1170.344.8352.946.868.69
X279.7478.263.262.767.15−51.8265.9880.7448.2165.3674.8987.3885.7480.4
X2882.7286.773.1823.0654.14−24.2169.0482.838.588.9961.8877.7174.8381.01
X2966.74−30.7−75.4186.2690.396.1940.0565.75−9.1533.085070.786763.15
X3064.0989.73−9.7859.1924.4215.1210.0954.47−29.336055.9174.9171.6677.47
X31−72.5694.1568.86−23.6156.73−0.0729.7671.542.0775.5552.1385.883.9686.69
X32−84.1292.48−23.26−45.55−42.69−46.1527.1627.8368.8968.656.5962.6757.8473.22
Table 6. The within-cluster cross-sectional correlations $\Theta_{sA}$ (Columns 2–12), the overall correlations $\Xi_s$ (Column 15) based on the overall cross-sectional regressions and multiple $R^2$ and adjusted $R^2$ of these regressions (Columns 13 and 14). The cluster weights are based on normalized regressions (see Section 3.2 and Section 3.3.1 for details). The definitions of cancer types G.X1–G.X14 for genome data are given in Table A10. All quantities are in the units of 1% rounded to 2 digits. The values above 80% are given in bold font. The values above 70% are underlined.
TypeCl-1Cl-2Cl-3Cl-4Cl-5Cl-6Cl-7Cl-8Cl-9Cl-10Cl-11 R 2 adj- R 2 Cor
G.X18.47−42.14−5.83−28.478.6−27.8168.1−58.7678.7143.091.9876.7473.7364.82
G.X212.58−8.395.78−17.4836.44−39.4665.49−12.2532.0752.7618.7680.0477.4674.6
G.X37.917.51−12.8537.4663.86−48.7943.8640.6320.7753.5710.2178.9676.2479.3
G.X4−7.33−4.67−35.6790.1627.48−35.776.3421.8759.3929.9518.7457.3851.8760.47
G.X58.64−2.634.7613.5718.72−19.8648.29−54.1138.7538.465.2280.9678.4966.43
G.X619.2986.7963.27−26.72−1.53−52.1883.9634.5569.977.0856.9483.6681.5486.8
G.X70.115.2140.26−28.563.03−38.460.4263.0956.6242.039.2468.4564.3762.44
G.X858.3925.87−1.42−17.3−83.3175.44−68.5665.93−23.25−27.4917.4758.6553.39.81
G.X928.73−62.3477.864.5787.7−47.0949.7117.115.133.0229.0576.273.1269.99
G.X10−20.84−15.96−61.6824.617.53−33.4439.85−5.434.9358.494.9978.4875.778.18
G.X117.2539.51−7.8644.2346−54.8867.0825.4550.241.1115.0283.9981.9265.21
G.X127.9−88.83−70.0521.8490.47−23.2866.9253.7359.4967.270.473.5570.1281.42
G.X13−5.33−30.41−61.53−56.72−11.91−37.9464.53−20.6160.2266.73−4.6184.3182.2879.24
G.X146.33−39.9442.62−2156.7−51.1965.01726.195.52−12.2771.5867.939.79
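
The adj-R² columns in Tables 4 to 6 appear to follow from the R² columns through the usual adjustment for a regression without an intercept, with n = 96 mutation categories and p = 11 clusters. This is an inference from the tabulated numbers, not a formula quoted from the text. The Python sketch below checks it on a few rows of Table 4; the function name `adjusted_r2` and the constants are illustrative.

```python
N_CATEGORIES = 96  # mutation categories (48 in Table 2 plus 48 in Table 3)
N_CLUSTERS = 11    # clusters Cl-1 through Cl-11


def adjusted_r2(r2, n=N_CATEGORIES, p=N_CLUSTERS):
    """Adjusted R^2, assuming a regression with p regressors and no intercept."""
    return 1.0 - (1.0 - r2) * n / (n - p)


# (R^2, reported adj-R^2) in %, transcribed from Table 4 for three cancer types.
reported = {
    "X1": (81.53, 79.14),
    "X2": (61.55, 56.57),
    "X20": (91.03, 89.87),
}

for name, (r2_pct, adj_pct) in reported.items():
    computed = 100.0 * adjusted_r2(r2_pct / 100.0)
    # Small differences are expected because the inputs are rounded to 2 digits.
    print(name, round(computed, 2), adj_pct)
```

Under these assumptions the computed values agree with the tabulated adj-R² to within rounding for the rows shown. The variant with an intercept, which uses the factor (n − 1)/(n − p − 1), gives roughly 79.11 rather than 79.14 for X1, so it fits the table less well.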

Share and Cite

MDPI and ACS Style

Kakushadze, Z.; Yu, W. Mutation Clusters from Cancer Exome. Genes 2017, 8, 201. https://0-doi-org.brum.beds.ac.uk/10.3390/genes8080201

AMA Style

Kakushadze Z, Yu W. Mutation Clusters from Cancer Exome. Genes. 2017; 8(8):201. https://0-doi-org.brum.beds.ac.uk/10.3390/genes8080201

Chicago/Turabian Style

Kakushadze, Zura, and Willie Yu. 2017. "Mutation Clusters from Cancer Exome" Genes 8, no. 8: 201. https://0-doi-org.brum.beds.ac.uk/10.3390/genes8080201
