Statistical Genetics in Human Diseases

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Molecular Genetics and Genomics".

Deadline for manuscript submissions: closed (15 November 2022) | Viewed by 24364

Special Issue Editor


Dr. Zhongxue Chen
Guest Editor
Department of Epidemiology and Biostatistics, School of Public Health-Bloomington, Indiana University, Bloomington, IN, USA
Interests: statistical genetics; bioinformatics; statistical hypothesis testing; machine learning; survival data analysis; meta-analysis

Special Issue Information

Dear Colleagues,

We are pleased to announce that we will publish a Special Issue on statistical genetics in human diseases. Articles related to statistical genetics in human diseases, either novel statistical approaches or the application of existing statistical methods to human diseases, are considered suitable for this Special Issue. Original research and review studies are welcome.

Dr. Zhongxue Chen
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • statistical genetics
  • association test
  • genetic risk factor
  • human disease
  • GWAS
  • rare genetic variants

Published Papers (11 papers)


Research

17 pages, 351 KiB  
Article
Gene–Folic Acid Interactions and Risk of Conotruncal Heart Defects: Results from the National Birth Defects Prevention Study
by Daniel M. Webber, Ming Li, Stewart L. MacLeod, Xinyu Tang, Joseph W. Levy, Mohammad A. Karim, Stephen W. Erickson, Charlotte A. Hobbs and The National Birth Defects Prevention Study
Genes 2023, 14(1), 180; https://0-doi-org.brum.beds.ac.uk/10.3390/genes14010180 - 09 Jan 2023
Cited by 3 | Viewed by 1578
Abstract
Conotruncal heart defects (CTDs) are heart malformations that affect the cardiac outflow tract and typically cause significant morbidity and mortality. Evidence from epidemiological studies suggests that maternal folate intake is associated with a reduced risk of heart defects, including CTD. However, it is unclear if folate-related gene variants and maternal folate intake have an interactive effect on the risk of CTDs. In this study, we performed targeted sequencing of folate-related genes on DNA from 436 case families with CTDs who are enrolled in the National Birth Defects Prevention Study and then tested for common and rare variants associated with CTD. We identified risk alleles in maternal MTHFS ($\mathrm{OR}_{\text{meta}} = 1.34$; 95% CI 1.07 to 1.67), maternal NOS2 ($\mathrm{OR}_{\text{meta}} = 1.34$; 95% CI 1.05 to 1.72), fetal MTHFS ($\mathrm{OR}_{\text{meta}} = 1.35$; 95% CI 1.09 to 1.66), and fetal TCN2 ($\mathrm{OR}_{\text{meta}} = 1.38$; 95% CI 1.12 to 1.70) that are associated with an increased risk of CTD among cases without folic acid supplementation. We detected putative de novo mutations in genes from the folate, homocysteine, and transsulfuration pathways and identified a significant association between rare variants in MGST1 and CTD risk. Results suggest that periconceptional folic acid supplementation is associated with decreased risk of CTD among individuals with susceptible genotypes. Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)
25 pages, 1425 KiB  
Article
Genetic Overlap Analysis Identifies a Shared Etiology between Migraine and Headache with Type 2 Diabetes
by Md Rafiqul Islam, The International Headache Genetics Consortium (IHGC) and Dale R. Nyholt
Genes 2022, 13(10), 1845; https://0-doi-org.brum.beds.ac.uk/10.3390/genes13101845 - 12 Oct 2022
Cited by 3 | Viewed by 2785
Abstract
Migraine and headache frequently co-occur with type 2 diabetes (T2D), suggesting a shared aetiology between the two conditions. We used genome-wide association study (GWAS) data to investigate the genetic overlap and causal relationship between migraine and headache with T2D. Using linkage disequilibrium score regression (LDSC), we found a significant genetic correlation between migraine and T2D ($r_g = 0.06$, $p = 1.37 \times 10^{-5}$) and between headache and T2D ($r_g = 0.07$, $p = 3.0 \times 10^{-4}$). Using pairwise GWAS (GWAS-PW) analysis, we identified 11 pleiotropic regions between migraine and T2D and 5 pleiotropic regions between headache and T2D. Cross-trait SNP meta-analysis identified 23 novel SNP loci ($P_{\text{meta}} < 5 \times 10^{-8}$) associated with migraine and T2D, and three novel SNP loci associated with headache and T2D. Cross-trait gene-based overlap analysis identified 33 genes significantly associated ($P_{\text{gene-based}} < 3.85 \times 10^{-6}$) with migraine and T2D, and 11 genes associated with headache and T2D, with 7 genes (EHMT2, SLC44A4, PLEKHA1, CFDP1, TMEM170A, CHST6, and BCAR1) common between them. There was also a significant overlap of genes nominally associated ($P_{\text{gene-based}} < 0.05$) with both migraine and T2D ($P_{\text{binomial-test}} = 2.83 \times 10^{-46}$) and headache and T2D ($P_{\text{binomial-test}} = 4.08 \times 10^{-29}$). Mendelian randomisation (MR) analyses did not provide consistent evidence for a causal relationship between migraine and T2D. However, we found headache was causally associated (inverse-variance weighted, $\mathrm{OR}_{\text{IVW}} = 0.90$, $P_{\text{IVW}} = 7 \times 10^{-3}$) with T2D. Our findings robustly confirm the comorbidity of migraine and headache with T2D, with shared genetically controlled biological mechanisms contributing to their co-occurrence, and evidence for a causal relationship between headache and T2D. Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)

10 pages, 1595 KiB  
Article
Gene-Based Association Tests Using New Polygenic Risk Scores and Incorporating Gene Expression Data
by Shijia Yan, Qiuying Sha and Shuanglin Zhang
Genes 2022, 13(7), 1120; https://0-doi-org.brum.beds.ac.uk/10.3390/genes13071120 - 22 Jun 2022
Cited by 2 | Viewed by 1695
Abstract
Recently, gene-based association studies have shown that integrating genome-wide association studies (GWAS) with expression quantitative trait locus (eQTL) data can boost statistical power and that the genetic liability of traits can be captured by polygenic risk scores (PRSs). In this paper, we propose a new gene-based statistical method that leverages gene-expression measurements and new PRSs to identify genes that are associated with phenotypes of interest. We used a generalized linear model to associate phenotypes with gene expression and PRSs and used a score-test statistic to test the association between phenotypes and genes. Our simulation studies show that the newly developed method has correct type I error rates and can boost statistical power compared with other methods that use either gene expression or PRS in association tests. A real data analysis figure based on UK Biobank data for asthma shows that the proposed method is applicable to GWAS. Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)

11 pages, 1861 KiB  
Article
Opioid Use Disorder and Alternative mRNA Splicing in Reward Circuitry
by Spencer B. Huggett, Ami S. Ikeda, John E. McGeary, Karla R. Kaun and Rohan H. C. Palmer
Genes 2022, 13(6), 1045; https://0-doi-org.brum.beds.ac.uk/10.3390/genes13061045 - 10 Jun 2022
Cited by 3 | Viewed by 2222
Abstract
Opiate/opioid use disorder (OUD) is a chronic relapsing brain disorder that has increased in prevalence in the last two decades in the United States. Understanding the molecular correlates of OUD may provide key insights into the pathophysiology of this syndrome. Using publicly available RNA-sequencing data, our study investigated the possible role of alternative mRNA splicing in human brain tissue (dorsal–lateral prefrontal cortex (dlPFC), nucleus accumbens (NAc), and midbrain) of 90 individuals with OUD or matched controls. We found a total of 788 differentially spliced genes across brain regions. Alternative mRNA splicing demonstrated mostly tissue-specific effects, but a functionally characterized splicing change in the clathrin and AP-2-binding (CLAP) domain of the Bridging Integrator 1 (BIN1) gene was significantly linked to OUD across all brain regions. We investigated two hypotheses that may underlie differential splicing in OUD. First, we tested whether spliceosome genes were disrupted in the brains of individuals with OUD. Pathway enrichment analyses indicated spliceosome perturbations in OUD across brain regions. Second, we tested whether alternative mRNA splicing regions were linked to genetic predisposition. Using a genome-wide association study (GWAS) of OUD, we found no evidence that DNA variants within or surrounding differentially spliced genes were implicated in the heritability of OUD. Altogether, our study contributes to the understanding of OUD pathophysiology by providing evidence of a possible role of alternative mRNA splicing in OUD. Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)

13 pages, 1343 KiB  
Article
XCMAX4: A Robust X Chromosomal Genetic Association Test Accounting for Covariates
by Youpeng Su, Jing Hu, Ping Yin, Hongwei Jiang, Siyi Chen, Mengyi Dai, Ziwei Chen and Peng Wang
Genes 2022, 13(5), 847; https://0-doi-org.brum.beds.ac.uk/10.3390/genes13050847 - 09 May 2022
Cited by 1 | Viewed by 1500
Abstract
Although the X chromosome accounts for about 5% of human genes, it is routinely excluded from genome-wide association studies, probably due to its unique structure and complex biological patterns. While some statistical methods have been proposed for testing the association between X chromosomal markers and diseases, very few of them can adjust for covariates. Unfortunately, those methods that can incorporate covariates either need to specify an X chromosome inactivation model or require a permutation procedure to compute the p value. In this article, we proposed a novel analytic approach based on logistic regression that allows for covariates and does not need to specify the underlying X chromosome inactivation pattern. Simulation studies showed that our proposed method controls the size well and has robust performance in power across various practical scenarios. We applied the proposed method to analyze Graves’ disease data to show its usefulness in practice. Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)

26 pages, 2954 KiB  
Article
Gene-Based Methods for Estimating the Degree of the Skewness of X Chromosome Inactivation
by Meng-Kai Li, Yu-Xin Yuan, Bin Zhu, Kai-Wen Wang, Wing Kam Fung and Ji-Yuan Zhou
Genes 2022, 13(5), 827; https://0-doi-org.brum.beds.ac.uk/10.3390/genes13050827 - 06 May 2022
Viewed by 1739
Abstract
Skewed X chromosome inactivation (XCI-S) has been reported to be associated with some X-linked diseases, and currently several methods have been proposed to estimate the degree of the XCI-S (denoted as γ) for a single locus. However, no method has been available to estimate γ for genes. Therefore, in this paper, we first propose the point estimate and the penalized point estimate of γ for genes, and then derive its confidence intervals based on Fieller’s and penalized Fieller’s methods, respectively. Further, we consider the constraint condition γ ∈ [0, 2] and propose Bayesian methods to obtain the point estimates and the credible intervals of γ, where a truncated normal prior and a uniform prior are respectively used (denoted as GBN and GBU). The simulation results show that the Bayesian methods can prevent extreme point estimates (0 or 2), empty sets, noninformative intervals ([0, 2]), and discontinuous intervals from occurring. GBN performs best in both the point estimation and the interval estimation. Finally, we apply the proposed methods to the Minnesota Center for Twin and Family Research data for their practical use. In summary, in practical applications, we recommend using GBN to estimate γ of genes. Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)

12 pages, 691 KiB  
Article
Estimation of Causal Effect of Age at Menarche on Pubertal Height Growth Using Mendelian Randomization
by Eun Jae Jo, Shizhong Han and Kai Wang
Genes 2022, 13(4), 710; https://0-doi-org.brum.beds.ac.uk/10.3390/genes13040710 - 17 Apr 2022
Cited by 5 | Viewed by 2153
Abstract
We use Mendelian randomization to estimate the causal effect of age at menarche on late pubertal height growth and total pubertal height growth. The instrument SNPs selected from the exposure genome-wide association study (GWAS) are validated in additional population-matched exposure GWASs. Based on the inverse variance weighting method, there is a positive causal effect of age at menarche on late pubertal growth ($\hat{\beta} = 0.56$, 95% CI: (0.34, 0.78), $p = 3.16 \times 10^{-7}$) and on total pubertal growth ($\hat{\beta} = 0.36$, 95% CI: (0.14, 0.58), $p = 1.30 \times 10^{-3}$). If the instrument SNPs are not validated in additional exposure GWASs, the estimated effect on late pubertal height growth increases by 3.6% to $\hat{\beta} = 0.58$ (95% CI: (0.42, 0.73), $p = 4.38 \times 10^{-13}$), while the estimated effect on total pubertal height growth increases by 41.7% to $\hat{\beta} = 0.51$ (95% CI: (0.35, 0.67), $p = 2.96 \times 10^{-11}$). Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)
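For readers unfamiliar with the inverse variance weighting method named in this abstract, the following is a minimal sketch of the standard fixed-effect IVW estimator computed from per-SNP summary statistics. It is not the authors' code: the function names and the toy numbers are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Each SNP j gives a ratio estimate beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = math.sqrt(1.0 / total)
    return estimate, std_error


def wald_ci(estimate, std_error, z=1.96):
    """Approximate 95% confidence interval (normal approximation)."""
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical summary statistics for three instrument SNPs (illustration only).
bx = [0.10, 0.20, 0.15]      # SNP effects on the exposure
by = [0.05, 0.10, 0.075]     # SNP effects on the outcome
se = [0.01, 0.02, 0.015]     # standard errors of the outcome effects

est, se_est = ivw_estimate(bx, by, se)
low, high = wald_ci(est, se_est)
print(f"IVW estimate {est:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

In this toy input every SNP has the same ratio, so the pooled estimate equals that ratio; with real instruments the ratios differ and the weights decide how much each SNP contributes.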

15 pages, 356 KiB  
Article
Interep: An R Package for High-Dimensional Interaction Analysis of the Repeated Measurement Data
by Fei Zhou, Jie Ren, Yuwen Liu, Xiaoxi Li, Weiqun Wang and Cen Wu
Genes 2022, 13(3), 544; https://0-doi-org.brum.beds.ac.uk/10.3390/genes13030544 - 19 Mar 2022
Cited by 2 | Viewed by 2738
Abstract
We introduce interep, an R package for interaction analysis of repeated measurement data with high-dimensional main and interaction effects. In G × E interaction studies, the forms of environmental factors play a critical role in determining how structured sparsity should be imposed in the high-dimensional scenario to identify important effects. Zhou et al. (2019) (PMID: 31816972) proposed a longitudinal penalization method to select main and interaction effects corresponding to the individual and group structure, respectively, which requires a mixture of individual and group level penalties. The R package interep implements generalized estimating equation (GEE)-based penalization methods with this sparsity assumption. Moreover, alternative methods have also been implemented in the package. These alternative methods merely select effects on an individual level and ignore the group-level interaction structure. In this software article, we first introduce the statistical methodology corresponding to the penalized GEE methods implemented in the package. Next, we present the usage of the core and supporting functions, which is followed by a simulation example with R codes and annotations. The R package interep is available at The Comprehensive R Archive Network (CRAN). Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)
15 pages, 679 KiB  
Article
Leveraging Gene-Level Prediction as Informative Covariate in Hypothesis Weighting Improves Power for Rare Variant Association Studies
by Ying Ji, Rui Chen, Quan Wang, Qiang Wei, Ran Tao and Bingshan Li
Genes 2022, 13(2), 381; https://0-doi-org.brum.beds.ac.uk/10.3390/genes13020381 - 19 Feb 2022
Viewed by 2017
Abstract
Gene-based rare variant association studies (RVASs) have low power due to the infrequency of rare variants and the large multiple testing burden. To correct for multiple testing, traditional false discovery rate (FDR) procedures which depend solely on P-values are often used. Recently, Independent Hypothesis Weighting (IHW) was developed to improve the detection power while maintaining FDR control by leveraging prior information for each hypothesis. Here, we present a framework to increase power of gene-based RVASs by incorporating prior information using IHW. We first build supervised machine learning models to assign each gene a prediction score that measures its disease risk, using the input of multiple biological features, fed with high-confidence risk genes and local background genes selected near GWAS significant loci as the training set. Then we use the prediction scores as covariates to prioritize RVAS results via IHW. We demonstrate the effectiveness of this framework through applications to RVASs in schizophrenia and autism spectrum disorder. We found sizeable improvements in the number of significant associations compared to traditional FDR approaches, and independent evidence supporting the relevance of the genes identified by our framework but not traditional FDR, demonstrating the potential of our framework to improve power of gene-based RVASs. Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)
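The idea of letting prior information reweight hypotheses can be illustrated with the weighted Benjamini-Hochberg step that hypothesis-weighting procedures build on. This is only a sketch under stated assumptions: IHW itself learns the weights from the covariate in a data-driven way, whereas here the weights are hand-picked, hypothetical values, and the function name `weighted_bh` is ours, not part of any package.

```python
def weighted_bh(p_values, weights, alpha=0.05):
    """Indices rejected by the weighted Benjamini-Hochberg procedure.

    Weights are rescaled to average 1, each p-value is divided by its
    weight, and the ordinary BH step-up rule is applied to the result.
    """
    m = len(p_values)
    mean_w = sum(weights) / m
    norm_w = [w / mean_w for w in weights]
    # A zero weight means the hypothesis can never be rejected.
    q = [min(1.0, p / w) if w > 0 else 1.0 for p, w in zip(p_values, norm_w)]
    order = sorted(range(m), key=lambda i: q[i])
    k = 0  # largest rank whose weighted p-value passes its threshold
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical gene-level p-values (illustration only).
p = [0.001, 0.02, 0.04, 0.5]

plain = weighted_bh(p, [1, 1, 1, 1])            # ordinary BH
informed = weighted_bh(p, [0.5, 0.5, 2.5, 0.5])  # third gene up-weighted
print(plain, informed)
```

With uniform weights the third gene misses its threshold; giving it a larger weight (as a high prediction score might) lets it be rejected, at the cost of the down-weighted second gene.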

18 pages, 1574 KiB  
Article
A Smoothed Version of the Lassosum Penalty for Fitting Integrated Risk Models Using Summary Statistics or Individual-Level Data
by Georg Hahn, Dmitry Prokopenko, Sharon M. Lutz, Kristina Mullin, Rudolph E. Tanzi, Michael H. Cho, Edwin K. Silverman, Christoph Lange and on the behalf of the NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium
Genes 2022, 13(1), 112; https://0-doi-org.brum.beds.ac.uk/10.3390/genes13010112 - 06 Jan 2022
Cited by 2 | Viewed by 2228
Abstract
Polygenic risk scores are a popular means to predict the disease risk or disease susceptibility of an individual based on its genotype information. When adding other important epidemiological covariates such as age or sex, we speak of an integrated risk model. Methodological advances for fitting more accurate integrated risk models are of immediate importance to improve the precision of risk prediction, thereby potentially identifying patients at high risk early on when they are still able to benefit from preventive steps/interventions targeted at increasing their odds of survival, or at reducing their chance of getting a disease in the first place. This article proposes a smoothed version of the “Lassosum” penalty used to fit polygenic risk scores and integrated risk models using either summary statistics or raw data. The smoothing allows one to obtain explicit gradients everywhere for efficient minimization of the Lassosum objective function while guaranteeing bounds on the accuracy of the fit. An experimental section on both Alzheimer’s disease and COPD (chronic obstructive pulmonary disease) demonstrates the increased accuracy of the proposed smoothed Lassosum penalty compared to the original Lassosum algorithm (for the datasets under consideration), allowing it to draw equal with state-of-the-art methodology such as LDpred2 when evaluated via the AUC (area under the ROC curve) metric. Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)

11 pages, 775 KiB  
Article
CMAX3: A Robust Statistical Test for Genetic Association Accounting for Covariates
by Zhongxue Chen and Yong Zang
Genes 2021, 12(11), 1723; https://0-doi-org.brum.beds.ac.uk/10.3390/genes12111723 - 28 Oct 2021
Cited by 2 | Viewed by 1922
Abstract
The additive genetic model as implemented in logistic regression has been widely used in genome-wide association studies (GWASs) for binary outcomes. Unfortunately, for many complex diseases, the underlying genetic models are generally unknown and a mis-specification of the genetic model can result in a substantial loss of power. To address this issue, the MAX3 test (the maximum of three separate test statistics) has been proposed as a robust test that performs plausibly regardless of the underlying genetic model. However, the original implementation of MAX3 utilizes the trend test so it cannot adjust for any covariates such as age and gender. This drawback has significantly limited the application of the MAX3 in GWASs, as covariates account for a considerable amount of variability in these disorders. In this paper, we extended the MAX3 and proposed the CMAX3 (covariate-adjusted MAX3) based on logistic regression. The proposed test yielded a similar robust efficiency as the original MAX3 while easily adjusting for any covariate based on the likelihood framework. The asymptotic formula to calculate the p-value of the proposed test was also developed in this paper. The simulation results showed that the proposed test performed desirably under both the null and alternative hypotheses. For the purpose of illustration, we applied the proposed test to re-analyze a case-control GWAS dataset from the Collaborative Studies on Genetics of Alcoholism (COGA). The R code to implement the proposed test is also introduced in this paper and is available for free download. Full article
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)
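As background to this abstract, the original MAX3 statistic (the maximum of three trend-test statistics) can be sketched as below. This assumes the usual Cochran-Armitage trend test with genotype scores for the recessive, additive, and dominant models; it computes only the statistic, not a p-value, because the maximum of correlated statistics does not follow a standard normal distribution. It is not the covariate-adjusted CMAX3 of the paper, and the counts are hypothetical.

```python
import math

# Genotype scores for (aa, Aa, AA) under each genetic model.
SCORES = {
    "recessive": (0, 0, 1),
    "additive": (0, 0.5, 1),
    "dominant": (0, 1, 1),
}


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic for genotype counts (aa, Aa, AA)."""
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    num = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    var = R * S * (
        N * sum(x * x * n for x, n in zip(scores, totals))
        - sum(x * n for x, n in zip(scores, totals)) ** 2
    )
    return math.sqrt(N) * num / math.sqrt(var)


def max3(cases, controls):
    """Return the MAX3 statistic and the three model-specific statistics."""
    z = {model: trend_z(cases, controls, x) for model, x in SCORES.items()}
    return max(abs(v) for v in z.values()), z


# Hypothetical genotype counts (illustration only).
stat, z = max3(cases=[30, 50, 20], controls=[50, 40, 10])
print(stat, z)
```

Under the dominant scoring the squared statistic reduces to the Pearson chi-square of the collapsed 2 × 2 table, which is a convenient sanity check on the formula.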