Stats, Volume 5, Issue 3 (September 2022) – 18 articles

Cover Story: Small area models have attracted increased attention among federal statistical agencies. The United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) conducts the Farm Labor Survey, which provides the basis for employment and wage estimates for all workers directly hired by farms and ranches in all states except Alaska. Implementing small area models for integrating survey estimates with auxiliary information provides more reliable official estimates and valid measures of uncertainty. The paper discusses several hierarchical Bayesian subarea-level models in support of estimates of interest in the Farm Labor Survey. The framework provides a complete set of coherent estimates for all required geographic levels. These methods were incorporated into the Farm Labor publication for the first time in 2020.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open it.
18 pages, 428 KiB  
Article
Robust Permutation Tests for Penalized Splines
by Nathaniel E. Helwig
Stats 2022, 5(3), 916-933; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030053 - 16 Sep 2022
Cited by 1 | Viewed by 1350
Abstract
Penalized splines are frequently used in applied research for understanding functional relationships between variables. In most applications, statistical inference for penalized splines is conducted using the random effects or Bayesian interpretation of a smoothing spline. These interpretations can be used to assess the uncertainty of the fitted values and the estimated component functions. However, statistical tests about the nature of the function are more difficult, because such tests often involve testing a null hypothesis that a variance component is equal to zero. Furthermore, valid statistical inference using the random effects or Bayesian interpretation depends on the validity of the utilized parametric assumptions. To overcome these limitations, I propose a flexible and robust permutation testing framework for inference with penalized splines. The proposed approach can be used to test omnibus hypotheses about functional relationships, as well as more flexible hypotheses about conditional relationships. I establish the conditions under which the methods will produce exact results, as well as the asymptotic behavior of the various permutation tests. Additionally, I present extensive simulation results to demonstrate the robustness and superiority of the proposed approach compared to commonly used methods. Full article
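A minimal sketch of the kind of omnibus permutation test described in the abstract, assuming a ridge-penalized truncated-power basis as a stand-in for a penalized spline; the statistic, basis, and all settings are illustrative choices, not the paper's implementation.

```python
# Sketch (not the paper's exact procedure): omnibus permutation test of
# "no functional relationship" between x and y, using a ridge-penalized
# truncated-power basis as a stand-in for a penalized spline.
import numpy as np

rng = np.random.default_rng(1)

def spline_basis(x, knots):
    # intercept, linear term, and truncated-power (hinge) terms
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0, None) for k in knots]
    return np.column_stack(cols)

def fit_stat(x, y, knots, lam=1.0):
    # residual sum of squares of a ridge-penalized fit; smaller = better fit
    B = spline_basis(x, knots)
    P = lam * np.eye(B.shape[1]); P[0, 0] = 0.0   # do not penalize intercept
    beta = np.linalg.solve(B.T @ B + P, B.T @ y)
    return np.sum((y - B @ beta) ** 2)

# toy data with a nonlinear signal
n = 100
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.5, size=n)
knots = np.quantile(x, np.linspace(0.1, 0.9, 8))

obs = fit_stat(x, y, knots)
null = np.array([fit_stat(x, rng.permutation(y), knots) for _ in range(999)])
# under H0 (no relationship), permuting y should fit about as well as the data
p_value = (1 + np.sum(null <= obs)) / (1 + len(null))
print(f"permutation p-value: {p_value:.3f}")
```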

18 pages, 511 KiB  
Article
Smoothing County-Level Sampling Variances to Improve Small Area Models’ Outputs
by Lu Chen, Luca Sartore, Habtamu Benecha, Valbona Bejleri and Balgobin Nandram
Stats 2022, 5(3), 898-915; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030052 - 11 Sep 2022
Cited by 1 | Viewed by 1277
Abstract
The use of hierarchical Bayesian small area models, which take survey estimates along with auxiliary data as input to produce official statistics, has increased in recent years. Survey estimates for small domains are usually unreliable due to small sample sizes, and the corresponding sampling variances can also be imprecise and unreliable. This affects the performance of the model (i.e., the model will not produce an estimate or will produce a low-quality modeled estimate), which results in a reduced number of official statistics published by a government agency. To mitigate the unreliable sampling variances, these survey-estimated variances are typically modeled against the direct estimates wherever a relationship between the two is present. However, this is not always the case. This paper explores different alternatives to mitigate the unreliable (beyond some threshold) sampling variances. A Bayesian approach under the area-level model set-up and a distribution-free technique based on bootstrap sampling are proposed to update the survey data. An application to the county-level corn yield data from the County Agricultural Production Survey of the United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) is used to illustrate the proposed approaches. The final county-level model-based estimates for small area domains, produced based on updated survey data from each method, are compared with county-level model-based estimates produced based on the original survey data and the official statistics published in 2016. Full article
(This article belongs to the Special Issue Small Area Estimation: Theories, Methods and Applications)
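To make the idea of smoothing unreliable sampling variances concrete, here is a minimal generalized-variance-function style sketch: model log sampling variances against log direct estimates and replace variances flagged as unreliable with fitted values. The synthetic data, the log-log form, and the flagging thresholds are illustrative assumptions, not the Bayesian or bootstrap methods proposed in the paper.

```python
# Sketch: GVF-style smoothing of county-level sampling variances.
# Fit log(variance) = a + b*log(direct estimate) and substitute fitted
# values where the reported variance deviates wildly from the fit.
# Synthetic data; thresholds are illustrative only.
import numpy as np

rng = np.random.default_rng(7)
n_counties = 50
direct_est = rng.gamma(shape=5.0, scale=30.0, size=n_counties)          # e.g., yields
sampling_var = 0.05 * direct_est ** 1.5 * rng.lognormal(0, 0.8, n_counties)

# fit log(var) = a + b*log(estimate) by ordinary least squares
X = np.column_stack([np.ones(n_counties), np.log(direct_est)])
coef, *_ = np.linalg.lstsq(X, np.log(sampling_var), rcond=None)
fitted_var = np.exp(X @ coef)

# flag variances that deviate wildly from the fitted relationship
ratio = sampling_var / fitted_var
unreliable = (ratio > 4.0) | (ratio < 0.25)
smoothed_var = np.where(unreliable, fitted_var, sampling_var)
print(f"{unreliable.sum()} of {n_counties} variances replaced by smoothed values")
```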

17 pages, 1120 KiB  
Project Report
Using Small Area Estimation to Produce Official Statistics
by Linda J. Young and Lu Chen
Stats 2022, 5(3), 881-897; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030051 - 08 Sep 2022
Cited by 4 | Viewed by 1649
Abstract
The USDA National Agricultural Statistics Service (NASS) and other federal statistical agencies have used probability-based surveys as the foundation for official statistics for over half a century. Non-survey data that can be used to improve the accuracy and precision of estimates, such as administrative, remotely sensed, and retail data, have become increasingly available. Both frequentist and Bayesian models are used to combine survey and non-survey data in a principled manner. NASS has recently adopted Bayesian subarea models for three of its national programs: farm labor, crop county estimates, and cash rent county estimates. Each program provides valuable estimates at multiple scales of geography. For each program, technical challenges had to be met and a rigorous review completed before models could be adopted as the foundation for official statistics. Moving models out of the research phase into production required major changes in the production process and a cultural shift. With the implemented models, NASS now has measures of uncertainty, transparency, and reproducibility of its official statistics. Full article
(This article belongs to the Special Issue Small Area Estimation: Theories, Methods and Applications)
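To illustrate how survey and non-survey data can be combined in a principled manner, here is a minimal empirical-Bayes sketch in the spirit of an area-level (Fay-Herriot-type) model: each area's estimate is a precision-weighted blend of its direct survey estimate and a regression prediction from auxiliary data. The data, the method-of-moments variance step, and all settings are illustrative assumptions, not NASS's production models.

```python
# Minimal empirical-Bayes area-level shrinkage sketch (Fay-Herriot-style):
# noisy areas borrow strength from an auxiliary-data regression, precise
# areas keep their direct estimate. Entirely synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(42)
m = 40                                     # number of small areas
x = rng.uniform(0, 10, m)                  # auxiliary covariate per area
true_theta = 2.0 + 1.5 * x + rng.normal(0, 2.0, m)
D = rng.uniform(1.0, 9.0, m)               # known sampling variances
y = true_theta + rng.normal(0, np.sqrt(D)) # direct survey estimates

# regression "synthetic" estimate from the auxiliary data
X = np.column_stack([np.ones(m), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
synthetic = X @ beta

# crude method-of-moments estimate of the model variance sigma2_v
resid = y - synthetic
sigma2_v = max(np.mean(resid ** 2 - D), 0.01)

# shrinkage weight per area
gamma = sigma2_v / (sigma2_v + D)
eb_estimate = gamma * y + (1 - gamma) * synthetic

print("mean abs error, direct:", np.mean(np.abs(y - true_theta)).round(3))
print("mean abs error, EB    :", np.mean(np.abs(eb_estimate - true_theta)).round(3))
```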

25 pages, 4695 KiB  
Article
Modeling Realized Variance with Realized Quarticity
by Hiroyuki Kawakatsu
Stats 2022, 5(3), 856-880; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030050 - 07 Sep 2022
Viewed by 1556
Abstract
This paper proposes a model for realized variance that exploits information in realized quarticity. The realized variance and quarticity measures are both highly persistent and highly correlated with each other. The proposed model incorporates information from the observed realized quarticity process via autoregressive conditional variance dynamics. It exploits conditional dependence in higher order (fourth) moments, in analogy to the way the class of GARCH models exploits conditional dependence in second moments. Full article
(This article belongs to the Special Issue Modern Time Series Analysis)
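As a rough illustration of letting realized quarticity enter the dynamics of realized variance, here is a sketch that augments a simple autoregression for RV with a lagged sqrt(RQ) term; this HARQ-flavoured regression on synthetic data is only an illustrative stand-in, not the autoregressive conditional variance model proposed in the paper.

```python
# Sketch: regress realized variance on its own lag and on a lagged
# sqrt(realized quarticity) term, showing one simple way fourth-moment
# information can enter the dynamics. Synthetic data; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
rv = np.empty(T); rv[0] = 1.0
for t in range(1, T):
    # persistent, positive realized-variance series
    rv[t] = 0.1 + 0.85 * rv[t - 1] + 0.1 * rng.gamma(2.0)
rq = rv ** 2 * rng.lognormal(0.0, 0.3, T)      # quarticity scales with RV^2

y = rv[1:]
X = np.column_stack([np.ones(T - 1), rv[:-1], np.sqrt(rq[:-1])])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, lag-RV, lag-sqrt(RQ) coefficients:", np.round(coef, 3))
```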

15 pages, 966 KiB  
Article
A New Benford Test for Clustered Data with Applications to American Elections
by Katherine M. Anderson, Kevin Dayaratna, Drew Gonshorowski and Steven J. Miller
Stats 2022, 5(3), 841-855; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030049 - 31 Aug 2022
Cited by 1 | Viewed by 2725
Abstract
A frequent problem with classic first digit applications of Benford’s law is the law’s inapplicability to clustered data, which becomes especially problematic for analyzing election data. This study offers a novel adaptation of Benford’s law by performing a first digit analysis after converting vote counts from election data to base 3 (referred to throughout the paper as 1-BL 3), spreading out the data and thus rendering the law significantly more useful. We test the efficacy of our approach on synthetic election data using discrete Weibull modeling, finding that in many cases election data conform to 1-BL 3. Lastly, we apply 1-BL 3 analysis to selected states from the 2004 US Presidential election to detect potential statistical anomalies. Full article
(This article belongs to the Special Issue Benford's Law(s) and Applications)
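The base-3 first-digit idea lends itself to a short sketch: convert each count to base 3, take its leading digit (which can only be 1 or 2), and compare the observed frequencies with the Benford base-3 expectations log_3(1 + 1/d). The synthetic counts below are purely illustrative, not election data.

```python
# Sketch of a first-digit Benford test in base 3 (1-BL 3 in the paper's
# notation): leading base-3 digits are 1 or 2, with Benford probabilities
# log3(2) and log3(3/2). Synthetic counts; illustrative only.
import numpy as np

def leading_digit_base3(n):
    # repeatedly divide by 3 until a single base-3 digit remains
    n = int(n)
    while n >= 3:
        n //= 3
    return n

rng = np.random.default_rng(2024)
counts = rng.lognormal(mean=6.0, sigma=1.2, size=5000).astype(int) + 1

digits = np.array([leading_digit_base3(c) for c in counts])
observed = np.array([(digits == d).mean() for d in (1, 2)])
expected = np.array([np.log(1 + 1 / d) / np.log(3) for d in (1, 2)])

# simple chi-square goodness-of-fit statistic (1 degree of freedom)
n = len(digits)
chi2 = n * np.sum((observed - expected) ** 2 / expected)
print("observed:", observed.round(3), "expected:", expected.round(3), "chi2:", round(chi2, 2))
```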

22 pages, 481 KiB  
Article
A New Bivariate INAR(1) Model with Time-Dependent Innovation Vectors
by Huaping Chen, Fukang Zhu and Xiufang Liu
Stats 2022, 5(3), 819-840; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030048 - 19 Aug 2022
Cited by 4 | Viewed by 1404
Abstract
Recently, there has been a growing interest in integer-valued time series models, especially in multivariate models. Motivated by the diversity of the infinite-patch metapopulation models, we propose an extension to the popular bivariate INAR(1) model, whose innovation vector is assumed to be time-dependent in the sense that the mean of the innovation vector is linearly increased by the previous population size. We discuss the stationarity and ergodicity of the observed process and its subprocesses. We consider the conditional maximum likelihood estimate of the parameters of interest, and establish their large-sample properties. The finite sample performance of the estimator is assessed via simulations. Applications on crime data illustrate the model. Full article
(This article belongs to the Section Time Series Analysis)
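A minimal simulation sketch of a bivariate INAR(1) recursion with binomial thinning, in which the innovation means increase linearly with the previous total population size, as the abstract describes; the parameter values and Poisson innovations are illustrative assumptions, not the paper's estimated model.

```python
# Sketch: simulate a bivariate INAR(1) process X_t = A ∘ X_{t-1} + eps_t,
# where "∘" is component-wise binomial thinning and the Poisson innovation
# means grow linearly with the previous total count (time-dependent
# innovation vector). Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
alpha = np.array([0.4, 0.3])        # thinning probabilities per component
lam0 = np.array([2.0, 1.5])         # baseline innovation means
delta = np.array([0.05, 0.02])      # dependence on previous total size

T = 500
X = np.zeros((T, 2), dtype=int)
X[0] = [5, 4]
for t in range(1, T):
    survivors = rng.binomial(X[t - 1], alpha)       # binomial thinning
    lam_t = lam0 + delta * X[t - 1].sum()           # time-dependent means
    X[t] = survivors + rng.poisson(lam_t)

print("sample means:", X.mean(axis=0).round(2))
print("sample lag-1 autocorrelations:",
      [round(np.corrcoef(X[1:, j], X[:-1, j])[0, 1], 2) for j in range(2)])
```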

14 pages, 2977 KiB  
Article
Deriving the Optimal Strategy for the Two Dice Pig Game via Reinforcement Learning
by Tian Zhu and Merry H. Ma
Stats 2022, 5(3), 805-818; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030047 - 17 Aug 2022
Cited by 2 | Viewed by 2265
Abstract
Games of chance have historically played a critical role in the development and teaching of probability theory and game theory, and, in the modern age, computer programming and reinforcement learning. In this paper, we derive the optimal strategy for playing the two-dice game Pig, both the standard version and its variant with doubles, coined “Double-Trouble”, using certain fundamental concepts of reinforcement learning, especially the Markov decision process and dynamic programming. We further compare the newly derived optimal strategy to other popular play strategies in terms of the winning chances and the order of play. In particular, we compare it to the popular “hold at n” strategy, which is considered to be close to the optimal strategy, especially for the best n, for each type of Pig Game. For the standard two-player, two-dice, sequential Pig Game examined here, we found that “hold at 23” is the best choice, with the average winning chance against the optimal strategy being 0.4747. For the “Double-Trouble” version, we found that “hold at 18” is the best choice, with the average winning chance against the optimal strategy being 0.4733. Furthermore, time in terms of turns to play each type of game is also examined for practical purposes. For optimal vs. optimal or optimal vs. the best “hold at n” strategy, we found that the average number of turns is 19, 23, and 24 for one-die Pig, standard two-dice Pig, and the “Double-Trouble” two-dice Pig games, respectively. We hope our work will inspire students of all ages to invest in the field of reinforcement learning, which is crucial for the development of artificial intelligence and robotics and, subsequently, for the future of humanity. Full article
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
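To connect the "hold at n" idea with something executable, here is a small simulation for the simpler one-die Pig game (the two-dice rules are not reproduced here): it estimates the average number of turns a solitaire player needs to reach 100 under different hold thresholds. All settings are illustrative; this is not the paper's dynamic-programming solution for the two-dice games.

```python
# Sketch: estimate, by simulation, how many turns a solitaire one-die Pig
# player needs to reach 100 points under a "hold at n" policy (bank the turn
# once the turn total reaches n; rolling a 1 forfeits the turn total).
# One-die rules only, for simplicity; illustrative only.
import numpy as np

def turns_to_100(hold_at, rng):
    score, turns = 0, 0
    while score < 100:
        turns += 1
        turn_total = 0
        while turn_total < hold_at and score + turn_total < 100:
            roll = rng.integers(1, 7)
            if roll == 1:          # bust: lose everything gained this turn
                turn_total = 0
                break
            turn_total += roll
        score += turn_total
    return turns

rng = np.random.default_rng(11)
for n in (15, 20, 25, 30):
    avg = np.mean([turns_to_100(n, rng) for _ in range(5000)])
    print(f"hold at {n}: average turns to reach 100 ≈ {avg:.1f}")
```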

21 pages, 3953 KiB  
Article
Autoregressive Models with Time-Dependent Coefficients—A Comparison between Several Approaches
by Rajae Azrak and Guy Mélard
Stats 2022, 5(3), 784-804; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030046 - 12 Aug 2022
Cited by 1 | Viewed by 1445
Abstract
Autoregressive-moving average (ARMA) models with time-dependent (td) coefficients and marginally heteroscedastic innovations provide a natural alternative to stationary ARMA models. Several theories have been developed in the last 25 years for parametric estimation in that context. In this paper, we focus on time-dependent autoregressive (tdAR) models and consider one of the estimation theories in that case. We also provide an alternative theory for tdAR processes that relies on a ρ-mixing property. We compare the Dahlhaus theory for locally stationary processes and the Bibi and Francq theory, developed essentially for cyclically time-dependent models, with our own theory. Regarding existing theories, there are differences in the basic assumptions (e.g., on differentiability with respect to time or with respect to parameters) that are better seen in specific cases such as the tdAR(1) process. There are also differences in terms of asymptotics, as shown by an example. Our opinion is that the field of application can play a role in choosing one of the theories. This paper is completed by simulation results that show that the asymptotic theory can be used even for short series (less than 50 observations). Full article
(This article belongs to the Section Time Series Analysis)
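A minimal sketch of a time-dependent AR(1) process whose coefficient varies linearly in rescaled time, recovered by least squares on the interaction between the lag and t/n; the linear-in-time form and parameter values are illustrative assumptions, not any of the specific estimation theories compared in the paper.

```python
# Sketch: simulate an AR(1) process whose coefficient phi_t varies linearly
# with rescaled time t/n, then recover the two coefficients of that linear
# function by regressing x_t on x_{t-1} and (t/n)*x_{t-1}.
# Illustrative only; not a full tdAR estimation theory.
import numpy as np

rng = np.random.default_rng(5)
n = 400
a, b = 0.2, 0.5                      # phi_t = a + b * (t / n), stays below 1
x = np.zeros(n)
for t in range(1, n):
    phi_t = a + b * (t / n)
    x[t] = phi_t * x[t - 1] + rng.normal()

t_idx = np.arange(1, n)
X = np.column_stack([x[:-1], (t_idx / n) * x[:-1]])
coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
print("true (a, b):", (a, b), " estimated:", np.round(coef, 3))
```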

11 pages, 1320 KiB  
Article
Neutrosophic F-Test for Two Counts of Data from the Poisson Distribution with Application in Climatology
by Muhammad Aslam
Stats 2022, 5(3), 773-783; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030045 - 12 Aug 2022
Cited by 3 | Viewed by 1317
Abstract
This paper addresses the modification of the F-test for count data following the Poisson distribution. The F-test when the count data are expressed in intervals is considered in this paper. The proposed F-test is evaluated using real data from climatology. The comparative study showed that the F-test for count data under neutrosophic statistics is more efficient than the F-test for count data under classical statistics. Full article

18 pages, 477 KiB  
Article
Poisson Extended Exponential Distribution with Associated INAR(1) Process and Applications
by Radhakumari Maya, Christophe Chesneau, Anuresha Krishna and Muhammed Rasheed Irshad
Stats 2022, 5(3), 755-772; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030044 - 05 Aug 2022
Cited by 6 | Viewed by 1818
Abstract
The significance of count data modeling and its applications to real-world phenomena have been highlighted in several research studies. The present study focuses on a two-parameter discrete distribution that can be obtained by compounding the Poisson and extended exponential distributions. It has tractable and explicit forms for its statistical properties. The maximum likelihood estimation method is used to estimate the unknown parameters. An extensive simulation study was also performed. In this paper, the significance of the proposed distribution is demonstrated in a count regression model and in a first-order integer-valued autoregressive process, referred to as the INAR(1) process. In addition to this, the empirical importance of the proposed model is proved through three real-data applications, and the empirical findings indicate that the proposed INAR(1) model provides better results than other competitive models for time series of counts that display overdispersion. Full article

17 pages, 872 KiB  
Article
Model-Based Estimates for Farm Labor Quantities
by Lu Chen, Nathan B. Cruze and Linda J. Young
Stats 2022, 5(3), 738-754; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030043 - 03 Aug 2022
Cited by 1 | Viewed by 1571
Abstract
The United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) conducts the Farm Labor Survey to produce estimates of the number of workers, duration of the workweek, and wage rates for all agricultural workers. Traditionally, expert opinion is used to integrate auxiliary information, such as the previous year’s estimates, with the survey’s direct estimates. Alternatively, implementing small area models for integrating survey estimates with additional sources of information provides more reliable official estimates and valid measures of uncertainty for each type of estimate. In this paper, several hierarchical Bayesian subarea-level models are developed in support of different estimates of interest in the Farm Labor Survey. A 2020 case study illustrates the improvement of the direct survey estimates for areas with small sample sizes by using auxiliary information and borrowing information across areas and subareas. The resulting framework provides a complete set of coherent estimates for all required geographic levels. These methods were incorporated into the official Farm Labor publication for the first time in 2020. Full article
(This article belongs to the Special Issue Small Area Estimation: Theories, Methods and Applications)

24 pages, 2154 KiB  
Article
Reciprocal Data Transformations and Their Back-Transforms
by Daniel A. Griffith
Stats 2022, 5(3), 714-737; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030042 - 30 Jul 2022
Cited by 1 | Viewed by 2360
Abstract
Variable transformations have a long and celebrated history in statistics, one that was rather academically glamorous at least until generalized linear models theory eclipsed their nurturing normal curve theory role. Still, today it continues to be a covered topic in introductory mathematical statistics courses, offering worthwhile pedagogic insights to students about certain aspects of traditional and contemporary statistical theory and methodology. Since its inception in the 1930s, it has been plagued by a paucity of adequate back-transformation formulae for inverse/reciprocal functions. A literature search exposes that, to date, the inequality E(1/X) ≥ 1/E(X), which often has a sizeable gap captured by the inequality part of its relationship, is the solitary contender for solving this problem. After documenting that inverse data transformations are anything but a rare occurrence, this paper proposes an innovative, elegant back-transformation solution based upon the Kummer confluent hypergeometric function of the first kind. This paper also derives formal back-transformation formulae for the Manly transformation, something apparently never done before. Much related future research remains to be undertaken; this paper furnishes numerous clues about what some of these endeavors need to be. Full article
(This article belongs to the Section Statistical Methods)
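The inequality mentioned in the abstract is easy to see numerically: a short simulation shows that the naive back-transform 1/E(X) understates E(1/X), often by a sizeable margin. The lognormal example is purely illustrative and does not implement the Kummer-function correction derived in the paper.

```python
# Sketch: illustrate the Jensen gap E(1/X) >= 1/E(X) for a positive variable,
# i.e., why naively back-transforming the mean of a reciprocal-transformed
# variable is biased. Lognormal example chosen for convenience; this does NOT
# implement the paper's Kummer confluent hypergeometric correction.
import numpy as np

rng = np.random.default_rng(8)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)

mean_reciprocal = np.mean(1.0 / x)        # E(1/X)
naive_backtransform = 1.0 / np.mean(x)    # 1/E(X)

print(f"E(1/X) ≈ {mean_reciprocal:.4f}")      # ≈ exp(sigma^2/2) ≈ 1.65 here
print(f"1/E(X) ≈ {naive_backtransform:.4f}")  # ≈ exp(-sigma^2/2) ≈ 0.61 here
print(f"gap    ≈ {mean_reciprocal - naive_backtransform:.4f}")
```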

25 pages, 1526 KiB  
Article
A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency
by Weijia Ren, Jianzhu Li, Andreea Erciulescu, Tom Krenzke and Leyla Mohadjer
Stats 2022, 5(3), 689-713; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030041 - 27 Jul 2022
Cited by 1 | Viewed by 1616
Abstract
In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models. Full article
(This article belongs to the Special Issue Small Area Estimation: Theories, Methods and Applications)
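A minimal two-phase sketch in the spirit of the screening-then-cross-validation workflow described above, using scikit-learn's LassoCV; the synthetic data, the correlation threshold, and the 5-fold setting are illustrative assumptions, not the PIAAC SAE pipeline.

```python
# Sketch of a two-phase variable selection workflow: (1) screen candidate
# covariates by marginal correlation with the outcome, (2) run a k-fold
# cross-validated LASSO on the screened set and keep the covariates with
# nonzero coefficients. Synthetic data; thresholds are illustrative.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(13)
n, p = 300, 60
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = [2.0, -1.5, 1.0, 0.8, -0.6]   # only 5 signals
y = X @ beta + rng.normal(scale=1.0, size=n)

# phase 1: keep covariates whose |correlation| with y exceeds a threshold
cors = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
screened = np.where(np.abs(cors) > 0.1)[0]

# phase 2: 5-fold cross-validated LASSO on the screened covariates
lasso = LassoCV(cv=5, random_state=0).fit(X[:, screened], y)
selected = screened[np.abs(lasso.coef_) > 1e-8]
print("screened:", len(screened), "covariates; finally selected:", selected.tolist())
```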

16 pages, 395 KiB  
Article
Multivariate Global-Local Priors for Small Area Estimation
by Tamal Ghosh, Malay Ghosh, Jerry J. Maples and Xueying Tang
Stats 2022, 5(3), 673-688; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030040 - 25 Jul 2022
Viewed by 1431
Abstract
It is now widely recognized that small area estimation (SAE) needs to be model-based. Global-local (GL) shrinkage priors for random effects are important in sparse situations where many areas’ level effects do not have a significant impact on the response beyond what is offered by covariates. We propose in this paper a hierarchical multivariate model with GL priors. We prove the propriety of the posterior density when the regression coefficient matrix has an improper uniform prior. Some concentration inequalities are derived for the tail probabilities of the shrinkage estimators. The proposed method is illustrated via both data analysis and simulations. Full article
(This article belongs to the Special Issue Small Area Estimation: Theories, Methods and Applications)
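For readers unfamiliar with global-local priors, a short sketch of the horseshoe construction (one common GL prior) shows the "global times local" scale structure; this illustrates the prior family only, not the paper's multivariate model or its posterior computation.

```python
# Sketch: draw random-effect values from a horseshoe prior, a standard
# global-local (GL) shrinkage prior: beta_i ~ N(0, (tau * lambda_i)^2) with a
# global scale tau and heavy-tailed local scales lambda_i (half-Cauchy).
# Illustrates the GL structure only; not the paper's multivariate model.
import numpy as np

rng = np.random.default_rng(21)
m = 2000

tau = np.abs(rng.standard_cauchy())              # one global scale
lam = np.abs(rng.standard_cauchy(size=m))        # local scale per effect
beta = rng.normal(0.0, tau * lam)                # global-local draws

# hallmark of GL priors: most draws are shrunk near zero, a few are very large
print("share with |beta| < 0.1 * tau:", np.mean(np.abs(beta) < 0.1 * tau).round(3))
print("largest |beta| / tau         :", (np.abs(beta).max() / tau).round(1))
```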

42 pages, 726 KiB  
Article
Comparing the Robustness of the Structural after Measurement (SAM) Approach to Structural Equation Modeling (SEM) against Local Model Misspecifications with Alternative Estimation Approaches
by Alexander Robitzsch
Stats 2022, 5(3), 631-672; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030039 - 22 Jul 2022
Cited by 8 | Viewed by 2250
Abstract
Structural equation models (SEM), or confirmatory factor analysis as a special case, contain model parameters in the measurement part and the structural part. In most social-science SEM applications, all parameters are simultaneously estimated in a one-step approach (e.g., with maximum likelihood estimation). In a recent article, Rosseel and Loh (2022, Psychol. Methods) proposed a two-step structural after measurement (SAM) approach to SEM that estimates the parameters of the measurement model in the first step and the parameters of the structural model in the second step. Rosseel and Loh claimed that SAM is more robust to local model misspecifications (i.e., cross loadings and residual correlations) than one-step maximum likelihood estimation. In this article, it is demonstrated with analytical derivations and simulation studies that SAM is generally not more robust to misspecifications than one-step estimation approaches. Alternative estimation methods are proposed that provide more robustness to misspecifications. SAM suffers from finite-sample bias that depends on the size of factor reliability and factor correlations. A bootstrap-bias-corrected LSAM estimate provides less biased estimates in finite samples. Nevertheless, we argue in the discussion section that applied researchers should adopt SAM because robustness to local misspecifications is an irrelevant property when applying SAM. Parameter estimates in a structural model are of interest because intentionally misspecified SEMs frequently offer clearly interpretable factors. In contrast, SEMs with some empirically driven model modifications will result in biased estimates of the structural parameters because the meaning of factors is unintentionally changed. Full article
(This article belongs to the Special Issue Robust Statistics in Action)

14 pages, 11029 KiB  
Article
Semiparametric Survival Analysis of 30-Day Hospital Readmissions with Bayesian Additive Regression Kernel Model
by Sounak Chakraborty, Peng Zhao, Yilun Huang and Tanujit Dey
Stats 2022, 5(3), 617-630; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030038 - 14 Jul 2022
Cited by 2 | Viewed by 1883
Abstract
In this paper, we introduce a kernel-based nonlinear Bayesian model for a right-censored survival outcome data set. Our kernel-based approach provides a flexible nonparametric modeling framework to explore nonlinear relationships between predictors and a right-censored survival outcome. Our proposed kernel-based model is shown to provide excellent predictive performance via several simulation studies and real-life examples. Unplanned hospital readmissions greatly impair patients’ quality of life and have imposed a significant economic burden on American society. In this paper, we focus our application on predicting 30-day readmissions of patients. Our survival Bayesian additive regression kernel model (survival BARK or sBARK) improves the timeliness of readmission preventive intervention through a data-driven approach. Full article
(This article belongs to the Special Issue Survival Analysis: Models and Applications)

11 pages, 288 KiB  
Article
A Log-Det Heuristics for Covariance Matrix Estimation: The Analytic Setup
by Enrico Bernardi and Matteo Farnè
Stats 2022, 5(3), 606-616; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030037 - 05 Jul 2022
Viewed by 1486
Abstract
This paper studies a new nonconvex optimization problem aimed at recovering high-dimensional covariance matrices with a low rank plus sparse structure. The objective is composed of a smooth nonconvex loss and a nonsmooth composite penalty. A number of structural analytic properties of the new heuristics are presented and proven, thus providing the necessary framework for further investigating the statistical applications. In particular, the first and the second derivative of the smooth loss are obtained, its local convexity range is derived, and the Lipschitzianity of its gradient is shown. This opens the path to solve the described problem via a proximal gradient algorithm. Full article
(This article belongs to the Special Issue Multivariate Statistics and Applications)
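Since the abstract points to solving the problem with a proximal gradient algorithm, here is a minimal generic proximal-gradient (ISTA-style) sketch for a smooth loss plus an l1 penalty; it uses a simple least-squares loss as a stand-in and is not the paper's nonconvex low-rank-plus-sparse objective.

```python
# Sketch of a proximal gradient (ISTA-style) iteration for a composite
# objective: smooth loss f(x) = 0.5*||A x - b||^2 plus nonsmooth penalty
# g(x) = lam*||x||_1, whose proximal operator is soft-thresholding.
# Generic illustration of the algorithm class; not the paper's
# log-det heuristics for covariance matrices.
import numpy as np

rng = np.random.default_rng(9)
n, p = 80, 40
A = rng.normal(size=(n, p))
x_true = np.zeros(p); x_true[:4] = [3.0, -2.0, 1.5, 1.0]
b = A @ x_true + 0.1 * rng.normal(size=n)

lam = 0.5
step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of grad f

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(p)
for _ in range(500):
    grad = A.T @ (A @ x - b)                          # gradient of smooth loss
    x = soft_threshold(x - step * grad, step * lam)   # proximal step

print("nonzeros recovered:", np.flatnonzero(np.abs(x) > 1e-3).tolist())
```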
23 pages, 594 KiB  
Article
Quantile Regression Approach for Analyzing Similarity of Gene Expressions under Multiple Biological Conditions
by Dianliang Deng and Mashfiqul Huq Chowdhury
Stats 2022, 5(3), 583-605; https://0-doi-org.brum.beds.ac.uk/10.3390/stats5030036 - 02 Jul 2022
Cited by 2 | Viewed by 1840
Abstract
Temporal gene expression data contain ample information to characterize gene function and are now widely used in bio-medical research. A dense temporal gene expression usually shows various patterns in expression levels under different biological conditions. The existing literature investigates the gene trajectory using the mean function. However, temporal gene expression curves usually show a strong degree of heterogeneity under multiple conditions. As a result, rates of change for gene expressions may be different in non-central locations and a mean function model may not capture the non-central location of the gene expression distribution. Further, the mean regression model depends on the normality assumptions of the error terms of the model, which may be impractical when analyzing gene expression data. In this research, a linear quantile mixed model is used to find the trajectory of gene expression data. This method enables the changes in gene expression over time to be studied by estimating a family of quantile functions. A statistical test is proposed to test the similarity between two different gene expressions based on estimated parameters using a quantile model. Then, the performance of the proposed test statistic is examined using extensive simulation studies. Simulation studies demonstrate the good statistical performance of this proposed test statistic and show that this method is robust against normal error assumptions. As an illustration, the proposed method is applied to analyze a dataset of 18 genes in P. aeruginosa, expressed in 24 biological conditions. Furthermore, a minimum Mahalanobis distance is used to find the clustering tree for gene expressions. Full article
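A minimal sketch of fitting several conditional quantiles of an expression trajectory over time with statsmodels' QuantReg, showing how rates of change can differ away from the center of the distribution; the synthetic single-gene data and linear-in-time design are illustrative assumptions, not the paper's linear quantile mixed model or its similarity test.

```python
# Sketch: fit a few conditional quantiles of a (synthetic) gene-expression
# trajectory over time using quantile regression. Not the linear quantile
# mixed model or the similarity test proposed in the paper.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(17)
n = 300
time = rng.uniform(0, 24, n)                       # hours
# heteroscedastic expression: spread grows with time
expression = 5 + 0.4 * time + rng.normal(scale=0.2 + 0.1 * time, size=n)

X = np.column_stack([np.ones(n), time])
for q in (0.25, 0.5, 0.75):
    res = QuantReg(expression, X).fit(q=q)
    print(f"quantile {q}: intercept {res.params[0]:.2f}, slope {res.params[1]:.3f}")
```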
