Advances in Computational Statistics and Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: closed (31 December 2022) | Viewed by 20402

Special Issue Editor


Guest Editor
Department of Statistics (Biostatistics), School of Medicine, University of Granada, 18016 Granada, Spain
Interests: biostatistics; categorical data analysis; computational statistics; missing data in medical research; statistics in diagnostic medicine

Special Issue Information

Dear Colleagues,

Computational statistics is a fundamental area of statistics with important applications in many fields. It uses algorithms and numerical methods to solve a multitude of problems, such as parameter estimation, hypothesis testing and statistical modelling. The purpose of this Special Issue is to provide a collection of high-quality manuscripts on all aspects of theoretical research and novel applications of computational statistics, especially in the health sciences, social sciences and engineering. Manuscripts at the interface of statistics and computing with real applications are also welcome.

Topics of interest include, but are not limited to, the following: algorithms and computational methods, estimation methods, hypothesis testing, reliability inference, statistical modelling, and applications in the health sciences and social sciences.

Dr. José Antonio Roldán-Nofuentes
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Bayesian computing
  • biostatistics
  • categorical data analysis
  • functional data analysis
  • hypothesis testing
  • longitudinal data analysis
  • matrix computations
  • missing data
  • reliability
  • sampling methods
  • statistical algorithms
  • statistical simulation
  • statistical software
  • survival analysis

Published Papers (11 papers)


Research

15 pages, 2708 KiB  
Article
An Adaptive Multipath Linear Interpolation Method for Sample Optimization
by Yukun Du, Xiao Jin, Hongxia Wang and Min Lu
Mathematics 2023, 11(3), 768; https://0-doi-org.brum.beds.ac.uk/10.3390/math11030768 - 03 Feb 2023
Viewed by 1207
Abstract
When machine learning methods are used to make predictions, small sample sizes or highly noisy observation samples are a common problem. Current mainstream sample expansion methods cannot handle data noise well. We propose a multipath sample expansion method (AMLI) based on the idea of linear interpolation, which mainly addresses an insufficient prediction sample size or a large gap between the observed sample and the actual distribution. The rationale of the AMLI method is to divide the original feature space into several subspaces containing equal numbers of samples, randomly extract one sample from each subspace to form a class, and then perform linear interpolation on the samples in the same class (i.e., K-path linear interpolation). After AMLI processing, the set of valid samples is greatly expanded, the sample structure is adjusted, and the average noise of the samples is reduced, so that the prediction performance of the machine learning model improves. The hyperparameters of this method have an intuitive interpretation and usually require little calibration. We compared the proposed method with a variety of machine learning prediction methods and demonstrated that AMLI can significantly improve prediction results. We also propose an AMLI plus method based on linear interpolation between classes, combining the idea of AMLI with clustering, and present theoretical proofs of the effectiveness of the AMLI and AMLI plus methods.
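The K-path interpolation step described in the abstract can be sketched as follows. This is a minimal illustration of the idea only, not the authors' implementation; the function name `amli_expand` and all parameter choices are hypothetical, and the split along the first feature stands in for the paper's subspace partition.

```python
import random

def amli_expand(X, k, n_new, seed=0):
    """Sketch of K-path linear interpolation: split the samples into k
    equal-size groups along the first feature, draw one point from each
    group to form a class, then interpolate between consecutive members."""
    rng = random.Random(seed)
    data = sorted(X, key=lambda p: p[0])          # order by first feature
    size = len(data) // k                         # any remainder is dropped
    groups = [data[i*size:(i+1)*size] for i in range(k)]
    synthetic = []
    while len(synthetic) < n_new:
        path = [rng.choice(g) for g in groups]    # one point per subspace
        for a, b in zip(path, path[1:]):          # interpolate along the path
            t = rng.random()
            synthetic.append(tuple(ai + t*(bi - ai) for ai, bi in zip(a, b)))
    return synthetic[:n_new]
```

Because every synthetic point is a convex combination of two observed points, it stays inside the convex hull of the sample, which is what keeps the expansion consistent with the observed distribution.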
(This article belongs to the Special Issue Advances in Computational Statistics and Applications)

15 pages, 307 KiB  
Article
Reduced Clustering Method Based on the Inversion Formula Density Estimation
by Mantas Lukauskas and Tomas Ruzgas
Mathematics 2023, 11(3), 661; https://0-doi-org.brum.beds.ac.uk/10.3390/math11030661 - 28 Jan 2023
Cited by 3 | Viewed by 1300
Abstract
Unsupervised learning is a type of machine learning with an exceptionally large number of applications in various fields. The most popular and best-known group of unsupervised machine learning methods is clustering. The main goal of clustering is to find hidden relationships between individual observations. There is great interest in different density estimation methods, especially when there are outliers in the data, and density estimation can also be applied to data clustering. This paper presents an extension of the clustering method based on modified inversion-formula density estimation that removes the main limitation of the earlier method: the extension works in higher-dimensional (d > 15) settings. More than 20 data sets are used in a comparative analysis to demonstrate the effectiveness of the improvement. The results show that the extension positively affects clustering results: the new reduced clustering method, based on modified inversion-formula density estimation, outperforms popular clustering methods on the test data sets, and where its accuracy is not the best, it remains close to that of the best models. Lower-dimensional data were used to compare the standard inversion-formula clustering method with the extended one; the new modification performed better than the standard method in all cases, confirming the hypothesis about its positive impact on clustering results.
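The inversion-formula idea behind this family of estimators can be illustrated in one dimension: the density is recovered by inverting the empirical characteristic function with a damping kernel. This is a generic textbook sketch, not the authors' modified estimator; the function name and the Gaussian damping choice are assumptions.

```python
import cmath
import math

def ecf_density(sample, x, t_max=10.0, n_t=200):
    """Sketch: f(x) ≈ (1/2π) ∫ e^{-itx} φ_n(t) w(t) dt, where φ_n is the
    empirical characteristic function and w a Gaussian damping kernel
    that suppresses high-frequency noise in φ_n."""
    n = len(sample)
    dt = 2*t_max/n_t
    total = 0.0
    for j in range(n_t):                              # midpoint quadrature
        t = -t_max + (j + 0.5)*dt
        phi = sum(cmath.exp(1j*t*s) for s in sample)/n   # ECF at t
        w = math.exp(-0.5*(t/(0.5*t_max))**2)            # damping weight
        total += (cmath.exp(-1j*t*x)*phi*w).real*dt
    return total/(2*math.pi)
```

The damping width plays the role of a bandwidth: wider damping in t gives a narrower smoothing kernel in x.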
15 pages, 1702 KiB  
Article
SimSST: An R Statistical Software Package to Simulate Stop Signal Task Data
by Mohsen Soltanifar and Chel Hee Lee
Mathematics 2023, 11(3), 500; https://0-doi-org.brum.beds.ac.uk/10.3390/math11030500 - 17 Jan 2023
Viewed by 1701
Abstract
The stop signal task (SST) paradigm, with its original roots in 1948, has been proposed to study humans’ response inhibition. Several statistical software codes have been designed by researchers to simulate SST data in order to study various theories of modelling response inhibition and their assumptions. Yet a standalone statistical software package enabling researchers to simulate SST data under generalized scenarios has been missing. This paper presents the R statistical software package “SimSST”, available on the Comprehensive R Archive Network (CRAN), to simulate stop signal task (SST) data. The package is based on the general non-independent horse race model, copulas from probability theory, and an underlying ex-Gaussian (ExG) or shifted Wald (SW) distributional assumption for the go and stop processes involved, enabling researchers to simulate sixteen scenarios of SST data. A working example for one of the scenarios is presented to evaluate the simulations’ precision in parameter estimation. Package limitations and directions for future extensions are discussed.
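The underlying horse-race logic is easy to sketch. The following is a minimal Python illustration of the independent race with ex-Gaussian finishing times, not the SimSST API (which is an R package) and not its non-independent copula-based model; all function names and parameter values here are hypothetical.

```python
import random

def exgauss(rng, mu, sigma, tau):
    """Ex-Gaussian draw: a normal component plus an exponential tail."""
    return rng.gauss(mu, sigma) + rng.expovariate(1.0/tau)

def simulate_sst(n_trials, p_stop, ssd, rng=None):
    """Sketch of the horse-race model: on stop trials the response is
    inhibited iff the stop runner (started at delay SSD) finishes first;
    times are in seconds. None marks a successfully inhibited response."""
    rng = rng or random.Random(1)
    trials = []
    for _ in range(n_trials):
        go_rt = exgauss(rng, 0.50, 0.05, 0.10)          # go finishing time
        if rng.random() < p_stop:                       # stop-signal trial
            stop_rt = ssd + exgauss(rng, 0.20, 0.02, 0.05)
            trials.append(("stop", go_rt if go_rt < stop_rt else None))
        else:
            trials.append(("go", go_rt))
    return trials
```

Varying the stop-signal delay `ssd` trades off the two race outcomes, which is how inhibition functions are traced out in SST experiments.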

24 pages, 1281 KiB  
Article
Computational Analysis of XLindley Parameters Using Adaptive Type-II Progressive Hybrid Censoring with Applications in Chemical Engineering
by Refah Alotaibi, Mazen Nassar and Ahmed Elshahhat
Mathematics 2022, 10(18), 3355; https://0-doi-org.brum.beds.ac.uk/10.3390/math10183355 - 15 Sep 2022
Cited by 12 | Viewed by 1337
Abstract
This work addresses the estimation issues of the XLindley distribution using an adaptive Type-II progressive hybrid censoring scheme. Maximum likelihood and Bayesian approaches are used to estimate the unknown parameter, reliability, and hazard rate functions. Bayesian estimators are explored under the assumption of independent gamma priors and a symmetric loss function. The approximate confidence intervals and the highest posterior density credible intervals are also computed. An extensive simulation study covering various sample sizes and censoring schemes is implemented to evaluate the different estimation methods. Finally, two real data sets from the chemical engineering field are analyzed to show that the XLindley distribution fits them better than several competitive models. The Bayesian paradigm, using the Metropolis–Hastings algorithm to generate samples from the posterior distribution, is recommended for estimating any lifetime parameter of the XLindley distribution when data are obtained from an adaptive Type-II progressively hybrid censored sample.
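The recommended Metropolis–Hastings workflow can be sketched for complete (uncensored) data. This is an illustration only, not the paper's implementation: it assumes the one-parameter XLindley density f(x; θ) = θ²(2+θ+x)e^(−θx)/(1+θ)², omits the censoring contribution to the likelihood, and uses hypothetical prior and tuning values.

```python
import math
import random

def log_post(theta, data, a=2.0, b=1.0):
    """Log-posterior under a Gamma(a, b) prior, assuming the XLindley
    density f(x; θ) = θ²(2+θ+x)e^{−θx}/(1+θ)² for complete data."""
    if theta <= 0:
        return -math.inf
    n = len(data)
    ll = n*(2*math.log(theta) - 2*math.log(1 + theta))
    ll += sum(math.log(2 + theta + x) for x in data) - theta*sum(data)
    return ll + (a - 1)*math.log(theta) - b*theta     # gamma prior term

def mh_sample(data, n_iter=3000, step=0.2, seed=7):
    """Random-walk Metropolis–Hastings chain for θ."""
    rng = random.Random(seed)
    theta, chain = 1.0, []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0, step)             # symmetric proposal
        if math.log(rng.random()) < log_post(prop, data) - log_post(theta, data):
            theta = prop                               # accept
        chain.append(theta)
    return chain[n_iter//2:]                           # drop burn-in half
```

With censored data the log-likelihood would gain survival-function terms for the censored units; the accept/reject mechanics stay the same.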

19 pages, 3098 KiB  
Article
Optimal Weighted Multiple-Testing Procedure for Clinical Trials
by Hanan Hammouri, Marwan Alquran, Ruwa Abdel Muhsen and Jaser Altahat
Mathematics 2022, 10(12), 1996; https://0-doi-org.brum.beds.ac.uk/10.3390/math10121996 - 09 Jun 2022
Cited by 1 | Viewed by 1373
Abstract
This paper describes a new method for testing randomized clinical trials with binary outcomes, which combines the O’Brien and Fleming (1979) multiple-testing procedure with optimal allocation and unequally weighted samples simultaneously. The O’Brien and Fleming method of group sequential testing is a simple and effective method with the same Type I error and power as a fixed one-stage chi-square test, with the option to terminate early if one treatment is clearly superior to the other. This study modifies the O’Brien and Fleming procedure into a more flexible one, where optimal allocation assigns more subjects to the winning treatment without compromising the integrity of the study, while unequal weighting allows different sample sizes to be chosen for different stages of a trial. Based on simulation studies, the new optimal weighted multiple-testing procedure (OWMP) is relatively robust to the added features, keeping the Type I error low and maintaining power. The procedure is illustrated using simulated and real-life examples. The outcomes of the current study suggest that the new procedure is as effective as the original while being more flexible.
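The group-sequential idea underlying the procedure can be checked by Monte Carlo. The sketch below illustrates plain two-look O'Brien–Fleming boundaries for a z-statistic under the null, not the OWMP itself; the constant c ≈ 1.98 is an approximation for two looks at overall two-sided α ≈ 0.05.

```python
import math
import random

def obf_type1(c=1.98, n_sim=20000, seed=3):
    """Monte Carlo estimate of the overall Type I error of two-look
    O'Brien–Fleming boundaries for a z-statistic: reject at look k of
    K = 2 when |Z_k| ≥ c·sqrt(K/k)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        z1 = rng.gauss(0, 1)                       # interim z under H0
        if abs(z1) >= c*math.sqrt(2):              # strict early boundary
            rejections += 1
            continue
        z2 = (z1 + rng.gauss(0, 1))/math.sqrt(2)   # final z on pooled data
        if abs(z2) >= c:
            rejections += 1
    return rejections/n_sim
```

The sqrt(K/k) inflation makes early stopping hard and leaves the final look close to the fixed-sample critical value, which is why the overall error stays near the nominal level.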

21 pages, 589 KiB  
Article
Impact of Stratum Composition Changes on the Accuracy of the Estimates in a Sample Survey
by Danutė Krapavickaitė
Mathematics 2022, 10(7), 1093; https://0-doi-org.brum.beds.ac.uk/10.3390/math10071093 - 28 Mar 2022
Cited by 2 | Viewed by 1295
Abstract
The study is devoted to measuring the impact of element changes on the bias and variance of the estimator of a total in a sample business survey. Stratified simple random sampling is usually used in business surveys. Enterprises may join, split or change stratum between sample selection and data collection. Assuming a model for enterprises joining and a model for enterprises changing stratum with some probability, expressions for the adjusted estimators of the total and the adjusted estimators of their variances are proposed. The influence of enterprise changes on the variances of the estimators of the total is measured by relative differences, i.e., by comparison with the estimators that would be obtained if there were no changes. The analytic results are illustrated with a simulation study using modified enterprise data. The simulation results demonstrate a large impact of enterprise changes on the accuracy of the estimates, even when the probability of change is low. They justify the need to adjust for enterprise changes between sample selection and data collection in order to improve the accuracy of the results, and an adjustment method is made available.
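The baseline estimator being adjusted here is the standard stratified-SRS estimator of a total. A minimal sketch (the no-change case, not the paper's adjusted estimators):

```python
def stratified_total(strata):
    """Estimator of a population total under stratified SRS:
    T̂ = Σ_h N_h·ȳ_h, with estimated variance
    Σ_h N_h²·(1 − n_h/N_h)·s_h²/n_h (finite-population correction)."""
    total, var = 0.0, 0.0
    for N_h, y in strata:                 # (stratum size, sampled values)
        n_h = len(y)
        ybar = sum(y)/n_h                 # stratum sample mean
        s2 = sum((v - ybar)**2 for v in y)/(n_h - 1)   # stratum variance
        total += N_h*ybar
        var += N_h**2*(1 - n_h/N_h)*s2/n_h
    return total, var
```

An enterprise that has moved stratum between selection and collection breaks the assumption that each sampled unit represents N_h/n_h units of its design stratum, which is exactly the bias the paper's adjusted estimators correct.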

11 pages, 381 KiB  
Article
Subgroup Identification and Regression Analysis of Clustered and Heterogeneous Interval-Censored Data
by Xifen Huang and Jinfeng Xu
Mathematics 2022, 10(6), 862; https://0-doi-org.brum.beds.ac.uk/10.3390/math10060862 - 08 Mar 2022
Cited by 1 | Viewed by 1759
Abstract
Clustered and heterogeneous interval-censored data occur in many fields, such as medical studies. For example, in a migraine study with the Netherlands Twin Registry, information including time to diagnosis of migraine and gender was collected for 3975 monozygotic and dizygotic twins. Since each study subject is observed only at discrete and periodic follow-up time points, the failure times of interest (i.e., the time when the individual first had a migraine) are known only to belong to certain intervals and hence are interval-censored. Furthermore, these twins come from different genetic backgrounds and may be associated with differential risks of developing migraines. For simultaneous subgroup identification and regression analysis of such data, we propose a latent Cox model in which the number of subgroups is not assumed a priori but estimated from the data. The nonparametric maximum likelihood method and an EM algorithm with the monotone ascent property are developed for estimating the model parameters. Simulation studies are conducted to assess the finite-sample performance of the proposed estimation procedure. We further illustrate the proposed methodology with an empirical analysis of the migraine data.
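The monotone-ascent property of EM mentioned in the abstract is easy to observe on a toy model. The sketch below runs EM on a simple two-component normal mixture (unit variances), purely to illustrate the E-step/M-step machinery and the non-decreasing log-likelihood; it is not the latent Cox model of the paper.

```python
import math

def em_mixture(x, n_iter=50):
    """EM for a two-component normal mixture with unit variances;
    returns the fitted (weight, mean1, mean2) and the log-likelihood
    trace, which EM guarantees to be non-decreasing."""
    pi, mu1, mu2 = 0.5, min(x), max(x)
    loglik = []
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each observation
        r = []
        for v in x:
            p1 = pi*math.exp(-0.5*(v - mu1)**2)
            p2 = (1 - pi)*math.exp(-0.5*(v - mu2)**2)
            r.append(p1/(p1 + p2))
        # M-step: closed-form updates for weight and means
        pi = sum(r)/len(x)
        mu1 = sum(ri*v for ri, v in zip(r, x))/sum(r)
        mu2 = sum((1 - ri)*v for ri, v in zip(r, x))/sum(1 - ri for ri in r)
        # log-likelihood after the update (normalizing constants dropped)
        ll = sum(math.log(pi*math.exp(-0.5*(v - mu1)**2)
                          + (1 - pi)*math.exp(-0.5*(v - mu2)**2)) for v in x)
        loglik.append(ll)
    return (pi, mu1, mu2), loglik
```

In the paper's setting the mixture components are subgroup-specific Cox models and the likelihood involves interval-censoring terms, but the same ascent guarantee applies.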

9 pages, 457 KiB  
Article
Generalized Confidence Intervals for Zero-Inflated Pareto Distribution
by Xiao Wang and Xinmin Li
Mathematics 2021, 9(24), 3272; https://0-doi-org.brum.beds.ac.uk/10.3390/math9243272 - 16 Dec 2021
Cited by 5 | Viewed by 2481
Abstract
This paper considers interval estimation for the mean of a Pareto distribution with excess zeros. Three approaches for interval estimation based on fiducial generalized pivotal quantities (FGPQs) are proposed. Simulation studies are performed to assess the performance of the proposed methods, using three measures to compare them with competing approaches. The advantages and disadvantages of each method are discussed. The methods are illustrated using a real phone-call dataset.
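The FGPQ construction combines fiducial draws for the zero-inflation probability and for the positive-part mean. The sketch below illustrates the mechanics on a simpler zero-inflated exponential analogue, not the Pareto case treated in the paper; the Beta(n0+½, n1+½) fiducial for p and the exponential pivot are standard choices assumed here for illustration.

```python
import random

def fgpq_ci(n_zeros, positives, level=0.95, n_draw=4000, seed=11):
    """FGPQ-style interval for the mean of a zero-inflated sample,
    sketched with an exponential positive part: mean = (1 − p)·μ, with
    fiducial draws p ~ Beta(n0+½, n1+½) and μ = 2T/χ²_{2·n1}
    (the χ² variate simulated as 2·Gamma(n1, 1))."""
    rng = random.Random(seed)
    n1, T = len(positives), sum(positives)
    draws = []
    for _ in range(n_draw):
        p = rng.betavariate(n_zeros + 0.5, n1 + 0.5)   # zero probability
        chi2 = 2.0*rng.gammavariate(n1, 1.0)           # χ² with 2·n1 d.f.
        draws.append((1 - p)*2*T/chi2)                 # pivotal mean draw
    draws.sort()
    lo = draws[int((1 - level)/2*n_draw)]
    hi = draws[int((1 + level)/2*n_draw)]
    return lo, hi
```

For the Pareto case the exponential pivot would be replaced by pivotal quantities for the Pareto shape and scale, but the percentile step on the combined draws is the same.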

23 pages, 431 KiB  
Article
On the Use of Gradient Boosting Methods to Improve the Estimation with Data Obtained with Self-Selection Procedures
by Luis Castro-Martín, María del Mar Rueda, Ramón Ferri-García and César Hernando-Tamayo
Mathematics 2021, 9(23), 2991; https://0-doi-org.brum.beds.ac.uk/10.3390/math9232991 - 23 Nov 2021
Cited by 8 | Viewed by 2254
Abstract
In recent years, web surveys have established themselves as one of the main methods in empirical research. However, the effects of coverage and selection bias in such surveys have undercut their utility for statistical inference in finite populations. To compensate for these biases, researchers have employed a variety of statistical techniques to adjust nonprobability samples so that they more closely match the population. In this study, we test the potential of the XGBoost algorithm in the most important estimation methods that integrate data from a probability survey and a nonprobability survey. At the same time, we compare the effectiveness of these methods at eliminating bias. The results show that the four proposed estimators based on gradient boosting frameworks can improve survey representativity with respect to other classic prediction methods. The proposed methodology is also used to analyze a real nonprobability survey sample on the social effects of COVID-19.
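One of the adjustment families discussed here, propensity-score weighting with a gradient-boosting classifier, can be sketched as follows. This uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost; the function name and the (1 − p̂)/p̂ weight form are the common textbook choices, not necessarily the exact estimators of the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def propensity_weights(X_nonprob, X_prob, seed=0):
    """Sketch of propensity-score adjustment for a nonprobability sample:
    a gradient-boosting classifier separates the nonprobability sample
    (label 1) from a reference probability sample (label 0), and each
    nonprobability unit receives weight (1 − p̂)/p̂."""
    X = np.vstack([X_nonprob, X_prob])
    z = np.r_[np.ones(len(X_nonprob)), np.zeros(len(X_prob))]
    clf = GradientBoostingClassifier(random_state=seed).fit(X, z)
    p = clf.predict_proba(X_nonprob)[:, 1].clip(0.01, 0.99)  # avoid 0/1
    return (1 - p)/p
```

A weighted mean `(w*y).sum()/w.sum()` over the nonprobability sample then downweights the over-represented regions of covariate space, pulling the estimate toward the reference population.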

18 pages, 2213 KiB  
Article
A Bayesian-Deep Learning Model for Estimating COVID-19 Evolution in Spain
by Stefano Cabras
Mathematics 2021, 9(22), 2921; https://0-doi-org.brum.beds.ac.uk/10.3390/math9222921 - 16 Nov 2021
Cited by 13 | Viewed by 2382
Abstract
This work proposes a semi-parametric approach to estimate the evolution of COVID-19 (SARS-CoV-2) in Spain. Considering the sequences of 14-day cumulative incidence of all Spanish regions, it combines modern Deep Learning (DL) techniques for analyzing sequences with the usual Bayesian Poisson-Gamma model for counts. The DL model provides a suitable description of the observed time series of counts, but it cannot give a reliable uncertainty quantification. In the proposed modelling approach, the DL predictions play the role of an expert elicitation of the expected number of counts and of its reliability. Finally, the posterior predictive distribution of counts is obtained in a standard Bayesian analysis using the well-known Poisson-Gamma model. The model makes it possible to predict the future evolution of the sequences in all regions or to estimate the consequences of hypothetical scenarios.
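The Poisson-Gamma step referred to here is the standard conjugate update; the DL elicitation part is omitted in this sketch, and the helper name is hypothetical.

```python
def poisson_gamma_posterior(counts, a, b):
    """Conjugate update: a Gamma(a, b) prior on the Poisson rate λ and
    observed counts y give λ | y ~ Gamma(a + Σy, b + n). The posterior
    predictive for a new count is then Negative Binomial, with mean
    a'/b' and variance a'(b' + 1)/b'² for a' = a + Σy, b' = b + n."""
    n, s = len(counts), sum(counts)
    a_post, b_post = a + s, b + n
    mean = a_post/b_post                          # posterior mean of λ
    pred_var = a_post*(b_post + 1)/b_post**2      # predictive variance
    return a_post, b_post, mean, pred_var
```

In the paper's scheme the prior (a, b) would be set from the DL point prediction and its estimated reliability, so the posterior predictive inherits both the DL fit and honest count-level uncertainty.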

16 pages, 311 KiB  
Article
Simultaneous Comparison of Sensitivities and Specificities of Two Diagnostic Tests Adjusting for Discrete Covariates
by José Antonio Roldán-Nofuentes
Mathematics 2021, 9(17), 2029; https://0-doi-org.brum.beds.ac.uk/10.3390/math9172029 - 24 Aug 2021
Viewed by 1398
Abstract
Adjusting for covariates is important in the study of the performance of diagnostic tests. In this manuscript, the simultaneous comparison of the sensitivities and specificities of two binary diagnostic tests is studied when discrete covariates are observed in all of the individuals in the sample. Four methods are presented to simultaneously compare the two sensitivities and the two specificities: a global hypothesis test and three other methods based on individual comparisons. The maximum likelihood method was applied to adjust the overall estimators of the sensitivities and specificities. Simulation experiments were carried out to study the asymptotic behaviour of the four proposed methods when the covariate is binary, giving general rules of application. The results were applied to a real example.
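The individual-comparison idea can be sketched for paired designs: sensitivities are compared among the diseased, specificities among the healthy, each with a McNemar test at a corrected level. This is a generic illustration without the covariate adjustment that is the paper's contribution; the function names and the Bonferroni-style correction are assumptions.

```python
def mcnemar(b, c):
    """McNemar chi-square statistic for paired binary outcomes;
    b and c are the two discordant counts."""
    return (b - c)**2/(b + c) if b + c else 0.0

def compare_tests(diseased, healthy, crit=5.02):
    """Sketch of individual comparisons of two diagnostic tests applied
    to the same subjects: sensitivities compared among the diseased,
    specificities among the healthy, each at level α/2
    (crit ≈ χ²₁ critical value for α = 0.025)."""
    se_stat = mcnemar(*diseased)   # discordant counts among diseased
    sp_stat = mcnemar(*healthy)    # discordant counts among healthy
    return {"sensitivity": se_stat > crit, "specificity": sp_stat > crit}
```

In the covariate-adjusted setting these comparisons are carried out on maximum likelihood estimates pooled across covariate levels rather than on raw discordant counts.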