Abnormal epigenetic change in the form of the loss or gain of normal 5-methylcytosine patterns is one of the most ubiquitous epigenetic changes in cancer epigenetics, controlling gene expression via changes in transcription factor binding [1
Methylation in cancer is of particular interest due to its association with specific phenotypes such as the CpG Island Methylator Phenotype (CIMP) that possess unique characteristics [2
]. In colorectal cancer, CIMP is associated with a hypermethylator phenotype [3
], usually found in the right colon in female patients with advanced age, can be associated with microsatellite instability (due to DNA mismatch repair suppression), and has been suggested to be a biomarker for poor prognosis [4
]. In fact, the epigenome has been shown to vary widely across different cancer types, and multiple methylation subtypes have been identified. The TCGA consortium identified four methylator subtypes associated with colorectal cancer, each associated with differing clinico-pathological characteristics [5
]. In head and neck squamous cell carcinoma (HNSCC), analysis of TCGA tumour methylation data has revealed both hyper- and hypomethylated gene sets for diagnostic purposes [6
]. In addition, six methylation clusters have been identified that correlate with clinicopathological characteristics such as HPV status and also three-year survival [7
A particular focus of interest in study of the epigenome is in circulating tumour DNA [8
] (ctDNA). This has attracted particular attention as a biomarker for disease, with the GRAIL consortium utilising a mutation based assay for circulating tumour DNA with over 90% accuracy for the detection of cancer [9
]. However, the use of methylation assays on this sample type has been restricted due to low input concentrations available from ctDNA. The UroMark consortium [10
] has studied the use of bisulphite amplicon resequencing in the analysis of free urinary DNA. The DNA input required for this is relatively plentiful (>30 ng of urinary DNA) compared to that available for circulating tumour DNA (typically 10–20 ng per 10 mls of whole blood).
In order to develop new biomarker panels for the methylation based detection of disease, a whole genome approach is required. Several technologies exist for this purpose—whole genome bisulphite sequencing [11
] (WGBS), reduced representation bisulphite sequencing [12
] (RBBS), and methylation arrays, typically the Illumina Infinium assay [13
]. Both WGBS and RBBS require large input amounts of DNA (~3 µg) and are particularly expensive, the former especially so due to the increased depth of coverage required necessitating increased sequencing cost. Methylation arrays, the most popular of which is the Illumina Infinium assay, require at a minimum 200–300 ng of bisulphite converted DNA, which can be from formalin fixed, paraffin embedded (FFPE) archival samples, repaired by use of a proprietary “DNA restore” kit. The input requirements, as specified by the manufacturer, make this assay seemingly unusable for the requirements of low input and ctDNA experiements.
We hypothesised that as the Illumina Infinium assay utilises a whole genome [14
] amplification (WGA) step, the theoretical input to this kit could encompass concentrations seen in ctDNA samples. We therefore aimed to study the effect of input concentration of DNA on assay performance.
2.1. Sample QC Metrics
Sample workflows are shown in Supplementary Figure S1
. All samples across both blood, FFPE, and fresh tissue passed QC metrics for their respective samples types. For samples C, 1 and 3 all had >99% CpG detection across the range of DNA input (10–500 ng). QC results are shown in Table 1
and Table 2
. For FFPE samples from the SCORT consortium (n
= 16, across 50, 100, and 150 ng input mass), all samples passed QC with >95% CpG detection, reflecting the lower detection threshold provided by Illumina for lower quality samples. A Bland–Altman plot from 10 randomly selected CpG probes comparing fresh tissue and ctDNA in order to understand reproducibility was performed (Supplementary Figure S2
) showing good correlation (Bias = −0.0074, 95% CI = −0.07–0.06). The normalisation graph demonstrated an unusual bump in the beta values around 0.5 but were all from the same blood sample donor at varying input amounts, suggesting a sample specific issue. When investigated, it was found that this donor has recently completed chemotherapy, which may explain the unusual probe distribution.
2.2. Sample Correlations
In order to understand whether there was a relationship between reducing input and loss of correlation between identical samples, a correlation analysis was run using the cor command in the psych module of R.
For samples C, 1, and 3, all correlation coefficients were >0.99 across all comparators (Pearsons p
< 0.001, Table 3
). For the FFPE derived samples, the correlation coefficient was at least 96% in all samples. Correlation matrices and distributions for all sample types are shown in Figure 1
. Correlation between sample inputs dispersed more widely as the input amount, with a particular widening around 10 ng input across the blood derived samples. In the FFPE sample group, these were more widely dispersed generally; however, less of an effect in terms of dispersion at low input concentrations was seen. For the ctDNA samples, correlation was performed between all the samples, and lower correlations (typically <0.95) and very wide dispersions were seen, as the point of this plot was to demonstrate that they were not similar.
In order to understand the differences between ctDNA, tissue samples, and (unmatched) blood samples with leucocyte derived DNA, we carried out a three way analysis (Supplementary Figure S3
). Both hierarchical clustering and a multi-dimensional scaling plot demonstrated clear separation of the plots, meaning that the observed differences between the ctDNA and tumour were not functions of the differences in methylation profiles seen between leucocytes and tumour cells.
We also carried out a genomewide plot (Supplementary Figures S4 and S5
) of all samples in order to understand the differences and reproducibility in methylation across sample types. Unsurprisingly, in the blood derived DNA, there was a bimodal distribution of methylation in keeping with no methylation or methylation of both alleles. Both tumour and ctDNA samples showed a bimodal distribution reflecting the heterogeneity in this sample type.
2.3. Effect of Input Amount on Results
In order to understand whether reducing the sample amount impacted on results obtained through standard analysis, a differentially methylated position analysis was performed with the DMP analysis of R/ChAMP/Bioconductor with each sample randomly assigned to one of two groups, and each sample group analysed separately.
In the fresh blood derived sample group, for sample “1”, the DMP analysis demonstrated no significant differences between any of the groups comparing between each concentration, with the highest ranking probe cg17735539 having a d-Beta-meth of −0.12 an adjusted p-value of 0.99 and a Bayes Factor (B) of 1.96. For sample “3”, the DMP analysis demonstrated no significant differences between any of groups comparing between each concentration, with the highest ranking probe cg14640149 having a d-Beta-meth of −0.11, an adjusted p-value of 0.99, and a B of 0.41. For sample “C”, the DMP analysis demonstrated no significant differences between groups, with the highest ranking probe cg18253591 having a d-Beta-meth of 0.04, an adjusted p-value of 0.43, and a B of 5.61.
In the FFPE sample group, for comparisons between groups, there was no significant difference with the highest ranking probe cg06871764 having a d-Beta-meth of 0.18, an adjusted p-value of 0.99, and a B of 2.77.
2.4. ctDNA Analysis
In order to understand whether the observed low inputs would be suitable for detection of very fragmented, low input (typically 1–2 ng) DNA, an experiment was carried out on six samples of circulating tumour DNA matched to their primary head and neck tumours of origin, sourced from fresh tissue.
All samples hybridised successfully to the arrays with very low input amounts when restored using the FFPE restore kit. One sample (288-ctDNA) had only 93% of probes that passed QC, possibly due to the increased fragmentation inherent to ctDNA. Median correlation between the tumour of origin and what was observed in the ctDNA samples was 0.92 (max 0.98, min 0.84). In the sample was the correlation was observed to be 0.84 (sample 288); this may be due to the poor quality of input sample.
In order to understand whether the observed differences were because of enrichment for “tumour” in the ctDNA samples over the contaminating normal stroma in the “tumour” samples, which would exaggerate methylation differences; rather than contaminating genomic DNA, a standard methylation pipeline analysis using ChAMP was carried out. This was because if the ctDNA was mainly genomic DNA, it was expected that methylated regions typically seen in leucocytes (which are the predominant contributor to contaminating ctDNA) would be observed rather than those typical for HNSCC.
A differential methylation analysis revealed 19,014 differentially methylated positions between the control tumour samples and the circulating tumour DNA samples. The top ranked probe cg09568464 (chr19: 56904901) tagged a region within 200 bp of the transcription start site of ZNF582 (deltaBeta = 0.40, methcontrol = 0.42, methctDNA = 0.02, adjp = 1.72 × 10−4, B = 11.6). G:Profiler pathway analysis of the differentially methylated positions revealed the top enriched pathway was KEGG:04072 (Phosopholipase-D pathway, adj p value = 6.21 × 10−5) Several other tumourigenic pathways of interest were also enriched including RAP1 signalling (padj = 1.14 × 10−4), wnt signalling (padj = 6.12 × 10−4), HPV infection (padj = 9.56 × 10−3), and PIK3-Akt signalling (padj = 0.02).
Analysis of differentially methylated regions revealed 410 DMRs with adjusted p values < 0.05. The top DMR was at chr6:29520965–29521803 (p < 0.05) and tagged a region just upstream of UBD (ubiquitin-D), a key regulator of NF-kB. Pathway analysis utilising G:Profiler demonstrated two Kegg pathways that were enriched: KEGG:04080 (neuroactive ligand receptor interaction, padj = 6.73 × 10−4) and KEGG:04550 (signalling pathways regulating pleuripotency of stem cells, padj = 7.46 × 10−3).
2.5. DNA Copy Number Calls
In order to ascertain whether copy number variation could be detected using this methodology in circulating tumour DNA, the copy prediction algorithm of conumee was used, as per the vignette.
Generally, there was good agreement between copy number predictions for the fresh sample and the ctDNA sample, with multiple copy number changes detected in both. There was dispersion of the copy number plot in the fresh samples, most likely due to contaminating normal and the presence of multiple clones within the tumour sample. An example plot is shown in Figure 2
, where the copy numbers between a fresh sample (top) and ctDNA derived sample from the sample patient (bottom) are compared. Copy number gains in CDKN2A and NOTCH are both observed in both samples, with high copy number seen in the ctDNA as compared to the fresh sample. A similar pattern was seen in the FFPE derived samples (Figure 3
), where increasing dispersion of copy number was seen as input DNA mass decreased. In matched ctDNA and fresh tumour samples, plotting of residual copy number segmentation values (DQ plot, Supplementary Figure S5
) demonstrated reasonably good concordance between copy number values but with dispersion towards the extremes of value, but no more than +/− 0.7 maximally. The TCGA HNSCC dataset [15
] demonstrated that loss of 3p and 8p, as well as gain of 3q, 5p, and 8q, were seen frequently in HNSCC, and copy number changes were seen in all these regions in the ctDNA samples.
Our results demonstrate that lower input to the Illumina Infinium assay as utilised in the MethylationEPIC and now unavailable Methylation 450 K assay are feasible and provide reliable and reproducible results.
Because the core of the Infinium assay protocol is a whole genome amplification step, theoretical inputs into this kit could be as low as 1 ng [16
], and we have shown by our experiments on fresh blood derived samples, FFPE tissue, and circulating tumour DNA that this target is achievable in future experiments. Methylation calls can be reliably and reproducibly carried out on ctDNA samples, including detection of copy number variation with its origins in the base tumour. A weakness of our study is the small numbers of ctDNA samples studied in this project; however, we believe our data illustrates the feasibility of the approach that could be examined in a larger study in the future. Ideally, we would have further validated reproducibility; however, this was limited because of the pilot nature of the experiment. Despite our findings, low inputs into this assay should be used with caution, as they go beyond the manufacturers recommendations, and may not always work as intended.
This advance will allow access to previous inaccessible samples. Typical yields of ctDNA are between 1–10 ng/mL of plasma [17
], and so a typical blood draw will yield 4–40 ng of ctDNA, which could now ideally be divided between mutational and epigenetic assays. We utilised 20 ng of input of ctDNA into the assay, as this was the maximum available to us, and we wanted to ensure that the assay would be successful. While previous strategies to overcome this low ctDNA input was to pool patient samples in risk defined groups [18
] for exploratory purposes, this has no benefit for detecting prognostic signatures in individual patients for accurate risk stratification and/or treatment guidance. Additionally, biopsy samples of human disease states (e.g., cancer), especially when obtained from formalin-fixed, paraffin embedded specimens, are typically very small and yield small amounts of DNA [19
]. These samples are now accessible to whole methylome analysis. For FFPE samples, we would recommend an input of at least 50 ng of DNA, although smaller amounts are possible, with increasing inaccuracy, but if input DNA is not a limiting factor, we would still suggest use at the recommended input into the assay. Therefore, in samples where copy numbers variants need to be detected, as high an input mass of DNA should be achieved as possible, as lower input leads to greater dispersion of probes and therefore greater uncertainty about precise copy number. A weakness of this study is that we were unable to demonstrate reproducibility of analysis of the same ctDNA sample, as the limited input material was expended fully in the initial experiment. However, there was good correlation between the original fresh frozen tumour and the associated ctDNA sample. Additionally, we cannot give a precise recommendation for recommended input to maximise CNV detection, as our sample size precludes this.
We have also observed differences in CNV between ctDNA and fresh tumour samples. It is difficult to know whether these are genuine CNV or artefacts caused by whole genome amplification; however, the use of methylation arrays to derive copy number has been extensively validated [20
]. The ability to access CNV samples in the same assay as whole genome methylation is particularly attractive, especially in ctDNA samples, as it prevents using up precious ctDNA.
Little, if any, work has examined the whole epigenome of circulating tumour DNA. Several groups have demonstrated [21
] single gene techniques across a variety of cancer types. We have demonstrated that oncogenic methylation changes can be detected as well as copy number variants that are present in the original tumour. Recent work in DNA mutation burden from ctDNA samples suggests that the ctDNA seen represent the dominant metastatic clone in the cancer [23
] and can be used for disease tracking [24
]. Several tumours demonstrate aberrant methylation as a key driver of their progression, including head and neck, brain, and colorectal cancer. Epigenetic changes can be demonstrated from plasma samples, but these have been restricted to single gene promoters and thus make discovery work difficult [25
]. Similarly, discovery work in human disease samples, e.g., in pre-treatment biopsies taken at endoscopy, has also been limited due to the amount of input DNA present after extraction. This technique will allow access to these previously inaccessible disease states and allow temporal profiling of methylation i.e., pre-, during, and post-treatment.
In conclusion, we have demonstrated that low input is possible into the Illumina HumanMethylation series of arrays, allowing access to new sample types that would have previously been inaccessible as well as detection of copy number variants and oncogenic methylation changes.
4. Materials and Methods
4.1. Patient Samples
All patient samples were obtained under either ethics for the S–CORT project (ref 15/EE/0241), the ethics for the Birmingham Human Biomaterials Resource Centre (ref 15/NW/0079), or the Accelerated (head and neck squamous cell carcinoma, HNSCC) tissue collection platform (REC ref: 16/NW/0265). In order to understand the effects of low input on FFPE derived samples (the SCORT cohort), DNA from the SCORT consortium was used. Sixteen biological FFPE DNA samples were processed on Infinium® MethylationEPIC BeadChips in a single batch (Illumina Inc., Cambridge, UK). Each biological sample had three varying DNA input amounts of 50, 100, and 150 ng to give a 48 sample comparison. In order to understand the effects of DNA input on fresh DNA, blood derived samples three biological genomic DNA samples (from three separate healthy donors) were processed on an Infinium® HumanMethylation 450 k BeadChip (the Blood cohort). For this one biological sample, input amount varied from 10, 50, 100, 150, 200, 250, 300, and 500 ng (labelled sample “C”). The other two biological DNA samples were processed using 10, 50, 100, 200, and 250 ng input (labelled samples “1” and “3”). For the head and neck ctDNA samples (HNSCC samples), six HNSCC patient biological samples were obtained pre-treatment from the Accelerated tissue collection platform—fresh tissue and blood (collected in Streck blood tubes)—and processed on an Infinium® HumanMethylation 450 k BeadChip.
For the Infinium®
MethylationEPIC BeadChip and Infinium®
HumanMethylation 450 k BeadChip, the Illumina recommended input amount is ≥ 250 ng. Previous data [27
] has demonstrated the equivalence of the 450 k vs. 850 k array for high level analysis.
4.2. DNA Extraction
Human FFPE DNA was extracted from between six and nine 5-micron FFPE colorectal tumour blocks using the QiaAmp Micro Kit (Qiagen, Qiagen Ltd, Manchester, UK). A marked up H&E stained section was used to guide macrodissection of tumour tissue. An assessment of DNA quality and quantity was made initially using a Nanodrop 1000 3.3.0 (Thermo Scientific, Paisley, UK), then the quantity of double stranded DNA was determined using the Quant-iT Picogreen ds DNA Assay Kit (Life technologies, Paisley, UK). Sample concentrations were checked and normalised to 20 ng (six samples) and 250 ng (six samples).
Genomic DNA was extracted from HNSCC fresh tissue biopsies using the DNeasy blood and tissue kit (Qiagen). Cell-free DNA was extracted from 2 mL of plasma from HNSCC blood samples using the QIAamp circulating nucleic acid kit (Qiagen). Sample concentrations were checked and normalised to 20 ng (six ctDNA samples) and 250 ng (six tissue samples).
For the blood samples, a Promega Maxwell RSC instrument (AS4500) was used with the Maxwell RSC Whole Blood DNA Kit (AS1520) (Promega, Southampton, UK). This is a semi-automated DNA extraction method that utilises paramagnetic particles in pre-loaded cartridges to bind DNA and elute into 20 µL volumes. The purified DNA was quantified using the Qubit 2.0 Fluorometric Quantitation instrument using the dsDNA BR (broad range) assay.
4.3. FFPE QC Assay
To determine FFPE DNA suitability for the Infinium HD FFPE methylation assay, the quality was tested in duplicate by real-time PCR following the Illumina FFPE QC protocol (Part # 15020981 Rev. C. Illumina Ltd, Cambridge, UK). Amplification of FFPE sample DNA was compared with the amplification of a Quality Control Template (QCT). The real-time PCR threshold cycle (Ct) was averaged, and a Delta–Ct (DCt) for each sample was calculated (DCt = CtFFPE − CtQCT). An FFPE DNA sample was deemed suitable if the DCt was <5.
4.4. Bisulphite Conversion and Restoration of DNA
The FFPE DNA was bisulphite converted using the EZ DNA Methylation Kit (cat# D5002, Zymo Research, Irvine, CA, USA) following the manufacturer’s instructions appropriate for the Illumina HD FFPE assay. A total of 8 µL of DNA was eluted and taken through to the next stage FFPE DNA Restoration. For non-FFPE DNA, 4 µL was taken through directly to the Infinium HD FFPE Methylation Assay. FFPE DNA restoration was achieved by using the Infinium HD FFPE restore protocol. This restores degraded FFPE DNA to a state that is amplifiable. All eluted restored DNA (approx. 8 µL) was taken through to the Infinium HD FFPE methylation assay.
4.5. Infinium HD FFPE Methylation Assay
All DNA samples were processed following the Infinium HD FFPE Methylation Assay, Manual Protocol. For the Infinium® MethylationEPIC BeadChips, the appropriate RA1 resuspension and hybridising volume were used as provided in the Infinium HD Methylation Assay Manual Protocol. The Illumina iScan was used to scan the arrays recording high resolution images of light emitted by the excited fluorophores at each CpG site on the arrays. The raw intensity data from these images were stored as .idat files and used for analysis. Illumina GenomeStudio v2011.1 was used to process the .idat files. Sample dependent and independent controls were analysed to ensure the assay procedure had been successful. The % of CpG detection was determined at a confidence level of p = 0.05.
All samples passed QC. The analysis had no sample groupings or normalisation, and the background was subtracted.
4.6. Downstream Analysis
All Idat files were imported into R/Bioconductor (R Version 3.6.3) and analysed using ChAMP (v2.13.2) [28
] pipeline up to DMP/DMR processing. ChAMP is a R/Bioconductor pipeline that reads raw data from Illumina iDat files, carries out filtering and quality control, and can carry out all downstream processing. Initially, ChAMP loads in probe intensities direct from iDAT files using a sample sheet provided by the end user. Probe filtering was then carried using ChAMP default settings with probes being dropped if they had a detection p
-value of >0.01, had <3 beads in at least 5% of samples per probe, were a non CpG probe, or were in the list of SNPs as provided in the Zhou et al. paper [29
] or in the Nordlund et al. [30
] paper describing multi-hit probes, and finally, all X/Y chromosome probes were filtered out. Normalisation was carried out using Beta-Mixture Quantile (BMIQ) Normalization. Although samples were done in the same batch, and sample size was relatively small, SVN analysis did not demonstrate any confounding batch effect, so batch correction was not performed. For intra-group comparisons, the probes for each concentration level were compared to every other concentration using multilevel regression. For copy number calling, the conumee algorithm [20
] was used. For between sample correlations, the cor
command of the psych
module from CRAN (https://cran.r-project.org/web/packages/psych/index.html
, accessed on 6 February 2021) was used. All correlations were adjusted for multiple testing using the Bonferroni correction. For pathway analysis, top ranked differentially expressed DMP/DMR were exported to GProfiler [31