Advances in Computational Statistics and Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: closed (31 December 2022) | Viewed by 20402

Special Issue Editor


Guest Editor
Department of Statistics (Biostatistics), School of Medicine, University of Granada, 18016 Granada, Spain
Interests: biostatistics; categorical data analysis; computational statistics; missing data in medical research; statistics in diagnostic medicine

Special Issue Information

Dear Colleagues,

Computational statistics is a fundamental area of statistics with important applications in many fields. It uses algorithms and numerical methods to solve a multitude of problems, such as parameter estimation, hypothesis testing and statistical modelling. The purpose of this Special Issue is to provide a collection of high-quality manuscripts on all aspects of theoretical research and novel applications of computational statistics, especially in the health sciences, social sciences and engineering. Manuscripts at the interface of statistics and computing with real applications are also welcome.

Topics of interest include, but are not limited to, the following: algorithms and computational methods, estimation methods, hypothesis testing, reliability inference, statistical modelling, and applications in the health sciences and social sciences.

Dr. José Antonio Roldán-Nofuentes
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Bayesian computing
  • biostatistics
  • categorical data analysis
  • functional data analysis
  • hypothesis testing
  • longitudinal data analysis
  • matrix computations
  • missing data
  • reliability
  • sampling methods
  • statistical algorithms
  • statistical simulation
  • statistical software
  • survival analysis

Published Papers (11 papers)


Research

15 pages, 2708 KiB  
Article
An Adaptive Multipath Linear Interpolation Method for Sample Optimization
by Yukun Du, Xiao Jin, Hongxia Wang and Min Lu
Mathematics 2023, 11(3), 768; https://0-doi-org.brum.beds.ac.uk/10.3390/math11030768 - 03 Feb 2023
Viewed by 1207
Abstract
When machine learning methods are used to make predictions, small sample sizes or highly noisy observation samples are a common problem. Current mainstream sample expansion methods cannot handle data noise well. We propose a multipath sample expansion method (AMLI) based on the idea of linear interpolation, which mainly addresses an insufficient prediction sample size or a large gap between the observed sample and the actual distribution. The rationale of the AMLI method is to divide the original feature space into several subspaces containing equal numbers of samples, randomly extract one sample from each subspace to form a class, and then perform linear interpolation on the samples in the same class (i.e., K-path linear interpolation). After AMLI processing, the set of valid samples is greatly expanded, the sample structure is adjusted, and the average noise of the samples is reduced, so that the prediction performance of the machine learning model improves. The hyperparameters of this method have an intuitive interpretation and usually require little calibration. We compared the proposed method with a variety of machine learning prediction methods and demonstrated that AMLI can significantly improve prediction results. We also propose an AMLI plus method based on linear interpolation between classes, combining the idea of AMLI with clustering, and present theoretical proofs of the effectiveness of the AMLI and AMLI plus methods.
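The K-path interpolation step described in the abstract can be sketched as follows. This is a minimal illustration of the idea only, not the authors' implementation; the function name `amli_expand` and all parameter choices are hypothetical, and the split along the first feature stands in for the paper's subspace partition.

```python
import random

def amli_expand(X, k, n_new, seed=0):
    """Sketch of K-path linear interpolation: split the samples into k
    equal-size groups along the first feature, draw one point from each
    group to form a class, then interpolate between consecutive members."""
    rng = random.Random(seed)
    data = sorted(X, key=lambda p: p[0])          # order by first feature
    size = len(data) // k                         # any remainder is dropped
    groups = [data[i*size:(i+1)*size] for i in range(k)]
    synthetic = []
    while len(synthetic) < n_new:
        path = [rng.choice(g) for g in groups]    # one point per subspace
        for a, b in zip(path, path[1:]):          # interpolate along the path
            t = rng.random()
            synthetic.append(tuple(ai + t*(bi - ai) for ai, bi in zip(a, b)))
    return synthetic[:n_new]
```

Because every synthetic point is a convex combination of two observed points, it stays inside the convex hull of the sample, which is what keeps the expansion consistent with the observed distribution.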
(This article belongs to the Special Issue Advances in Computational Statistics and Applications)

15 pages, 307 KiB  
Article
Reduced Clustering Method Based on the Inversion Formula Density Estimation
by Mantas Lukauskas and Tomas Ruzgas
Mathematics 2023, 11(3), 661; https://0-doi-org.brum.beds.ac.uk/10.3390/math11030661 - 28 Jan 2023
Cited by 3 | Viewed by 1300
Abstract
Unsupervised learning is a type of machine learning with an exceptionally large number of applications in various fields. The most popular and best-known group of unsupervised machine learning methods is clustering. The main goal of clustering is to find hidden relationships between individual observations. There is great interest in different density estimation methods, especially when there are outliers in the data, and density estimation can also be applied to data clustering. This paper presents an extension of the clustering method based on modified inversion-formula density estimation that removes the main limitation of the earlier method: the extension works in higher-dimensional (d > 15) settings. More than 20 data sets are used in a comparative analysis to demonstrate the effectiveness of the improvement. The results show that the extension positively affects clustering results: the new reduced clustering method, based on modified inversion-formula density estimation, outperforms popular clustering methods on the test data sets, and where its accuracy is not the best, it remains close to that of the best models. Lower-dimensional data were used to compare the standard inversion-formula clustering method with the extended one; the new modification performed better than the standard method in all cases, confirming the hypothesis about its positive impact on clustering results.
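The inversion-formula idea behind this family of estimators can be illustrated in one dimension: the density is recovered by inverting the empirical characteristic function with a damping kernel. This is a generic textbook sketch, not the authors' modified estimator; the function name and the Gaussian damping choice are assumptions.

```python
import cmath
import math

def ecf_density(sample, x, t_max=10.0, n_t=200):
    """Sketch: f(x) ≈ (1/2π) ∫ e^{-itx} φ_n(t) w(t) dt, where φ_n is the
    empirical characteristic function and w a Gaussian damping kernel
    that suppresses high-frequency noise in φ_n."""
    n = len(sample)
    dt = 2*t_max/n_t
    total = 0.0
    for j in range(n_t):                              # midpoint quadrature
        t = -t_max + (j + 0.5)*dt
        phi = sum(cmath.exp(1j*t*s) for s in sample)/n   # ECF at t
        w = math.exp(-0.5*(t/(0.5*t_max))**2)            # damping weight
        total += (cmath.exp(-1j*t*x)*phi*w).real*dt
    return total/(2*math.pi)
```

The damping width plays the role of a bandwidth: wider damping in t gives a narrower smoothing kernel in x.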
15 pages, 1702 KiB  
Article
SimSST: An R Statistical Software Package to Simulate Stop Signal Task Data
by Mohsen Soltanifar and Chel Hee Lee
Mathematics 2023, 11(3), 500; https://0-doi-org.brum.beds.ac.uk/10.3390/math11030500 - 17 Jan 2023
Viewed by 1701
Abstract
The stop signal task (SST) paradigm, with its original roots in 1948, has been proposed to study humans’ response inhibition. Several statistical software codes have been designed by researchers to simulate SST data in order to study various theories of modelling response inhibition and their assumptions. Yet a standalone statistical software package enabling researchers to simulate SST data under generalized scenarios has been missing. This paper presents the R statistical software package “SimSST”, available on the Comprehensive R Archive Network (CRAN), to simulate stop signal task (SST) data. The package is based on the general non-independent horse race model, copulas from probability theory, and an underlying ex-Gaussian (ExG) or shifted Wald (SW) distributional assumption for the go and stop processes involved, enabling researchers to simulate sixteen scenarios of SST data. A working example for one of the scenarios is presented to evaluate the simulations’ precision in parameter estimation. Package limitations and directions for future extensions are discussed.
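The underlying horse-race logic is easy to sketch. The following is a minimal Python illustration of the independent race with ex-Gaussian finishing times, not the SimSST API (which is an R package) and not its non-independent copula-based model; all function names and parameter values here are hypothetical.

```python
import random

def exgauss(rng, mu, sigma, tau):
    """Ex-Gaussian draw: a normal component plus an exponential tail."""
    return rng.gauss(mu, sigma) + rng.expovariate(1.0/tau)

def simulate_sst(n_trials, p_stop, ssd, rng=None):
    """Sketch of the horse-race model: on stop trials the response is
    inhibited iff the stop runner (started at delay SSD) finishes first;
    times are in seconds. None marks a successfully inhibited response."""
    rng = rng or random.Random(1)
    trials = []
    for _ in range(n_trials):
        go_rt = exgauss(rng, 0.50, 0.05, 0.10)          # go finishing time
        if rng.random() < p_stop:                       # stop-signal trial
            stop_rt = ssd + exgauss(rng, 0.20, 0.02, 0.05)
            trials.append(("stop", go_rt if go_rt < stop_rt else None))
        else:
            trials.append(("go", go_rt))
    return trials
```

Varying the stop-signal delay `ssd` trades off the two race outcomes, which is how inhibition functions are traced out in SST experiments.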

24 pages, 1281 KiB  
Article
Computational Analysis of XLindley Parameters Using Adaptive Type-II Progressive Hybrid Censoring with Applications in Chemical Engineering
by Refah Alotaibi, Mazen Nassar and Ahmed Elshahhat
Mathematics 2022, 10(18), 3355; https://0-doi-org.brum.beds.ac.uk/10.3390/math10183355 - 15 Sep 2022
Cited by 12 | Viewed by 1337
Abstract
This work addresses the estimation issues of the XLindley distribution using an adaptive Type-II progressive hybrid censoring scheme. Maximum likelihood and Bayesian approaches are used to estimate the unknown parameter, reliability, and hazard rate functions. Bayesian estimators are explored under the assumption of independent gamma priors and a symmetric loss function. The approximate confidence intervals and the highest posterior density credible intervals are also computed. An extensive simulation study covering various sample sizes and censoring schemes is implemented to evaluate the different estimation methods. Finally, two real data sets from the chemical engineering field are analyzed to show that the XLindley distribution fits them better than several competitive models. The Bayesian paradigm, using the Metropolis–Hastings algorithm to generate samples from the posterior distribution, is recommended for estimating any lifetime parameter of the XLindley distribution when data are obtained from an adaptive Type-II progressively hybrid censored sample.
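The recommended Metropolis–Hastings workflow can be sketched for complete (uncensored) data. This is an illustration only, not the paper's implementation: it assumes the one-parameter XLindley density f(x; θ) = θ²(2+θ+x)e^(−θx)/(1+θ)², omits the censoring contribution to the likelihood, and uses hypothetical prior and tuning values.

```python
import math
import random

def log_post(theta, data, a=2.0, b=1.0):
    """Log-posterior under a Gamma(a, b) prior, assuming the XLindley
    density f(x; θ) = θ²(2+θ+x)e^{−θx}/(1+θ)² for complete data."""
    if theta <= 0:
        return -math.inf
    n = len(data)
    ll = n*(2*math.log(theta) - 2*math.log(1 + theta))
    ll += sum(math.log(2 + theta + x) for x in data) - theta*sum(data)
    return ll + (a - 1)*math.log(theta) - b*theta     # gamma prior term

def mh_sample(data, n_iter=3000, step=0.2, seed=7):
    """Random-walk Metropolis–Hastings chain for θ."""
    rng = random.Random(seed)
    theta, chain = 1.0, []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0, step)             # symmetric proposal
        if math.log(rng.random()) < log_post(prop, data) - log_post(theta, data):
            theta = prop                               # accept
        chain.append(theta)
    return chain[n_iter//2:]                           # drop burn-in half
```

With censored data the log-likelihood would gain survival-function terms for the censored units; the accept/reject mechanics stay the same.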

19 pages, 3098 KiB  
Article
Optimal Weighted Multiple-Testing Procedure for Clinical Trials
by Hanan Hammouri, Marwan Alquran, Ruwa Abdel Muhsen and Jaser Altahat
Mathematics 2022, 10(12), 1996; https://0-doi-org.brum.beds.ac.uk/10.3390/math10121996 - 09 Jun 2022
Cited by 1 | Viewed by 1373
Abstract
This paper describes a new method for testing randomized clinical trials with binary outcomes, which combines the O’Brien and Fleming (1979) multiple-testing procedure with optimal allocation and unequally weighted samples simultaneously. The O’Brien and Fleming method of group sequential testing is a simple and effective method with the same Type I error and power as a fixed one-stage chi-square test, with the option to terminate early if one treatment is clearly superior to the other. This study modifies the O’Brien and Fleming procedure into a more flexible one, where optimal allocation assigns more subjects to the winning treatment without compromising the integrity of the study, while unequal weighting allows different sample sizes to be chosen for different stages of a trial. Based on simulation studies, the new optimal weighted multiple-testing procedure (OWMP) is relatively robust to the added features, keeping the Type I error low and maintaining power. The procedure is illustrated using simulated and real-life examples. The outcomes of the current study suggest that the new procedure is as effective as the original while being more flexible.
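The group-sequential idea underlying the procedure can be checked by Monte Carlo. The sketch below illustrates plain two-look O'Brien–Fleming boundaries for a z-statistic under the null, not the OWMP itself; the constant c ≈ 1.98 is an approximation for two looks at overall two-sided α ≈ 0.05.

```python
import math
import random

def obf_type1(c=1.98, n_sim=20000, seed=3):
    """Monte Carlo estimate of the overall Type I error of two-look
    O'Brien–Fleming boundaries for a z-statistic: reject at look k of
    K = 2 when |Z_k| ≥ c·sqrt(K/k)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        z1 = rng.gauss(0, 1)                       # interim z under H0
        if abs(z1) >= c*math.sqrt(2):              # strict early boundary
            rejections += 1
            continue
        z2 = (z1 + rng.gauss(0, 1))/math.sqrt(2)   # final z on pooled data
        if abs(z2) >= c:
            rejections += 1
    return rejections/n_sim
```

The sqrt(K/k) inflation makes early stopping hard and leaves the final look close to the fixed-sample critical value, which is why the overall error stays near the nominal level.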

21 pages, 589 KiB  
Article
Impact of Stratum Composition Changes on the Accuracy of the Estimates in a Sample Survey
by Danutė Krapavickaitė
Mathematics 2022, 10(7), 1093; https://0-doi-org.brum.beds.ac.uk/10.3390/math10071093 - 28 Mar 2022
Cited by 2 | Viewed by 1295
Abstract
The study is devoted to measuring the impact of element changes on the bias and variance of the estimator of a total in a sample business survey. Stratified simple random sampling is usually used in business surveys. Enterprises may join, split or change stratum between sample selection and data collection. Assuming a model for enterprises joining and a model for enterprises changing stratum with some probability, expressions for the adjusted estimators of the total and the adjusted estimators of their variances are proposed. The influence of enterprise changes on the variances of the estimators of the total is measured by relative differences, i.e., by comparison with the estimators that would be obtained if there were no changes. The analytic results are illustrated with a simulation study using modified enterprise data. The simulation results demonstrate a large impact of enterprise changes on the accuracy of the estimates, even when the probability of change is low. They justify the need to adjust for enterprise changes between sample selection and data collection in order to improve the accuracy of the results, and an adjustment method is made available.
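The baseline estimator being adjusted here is the standard stratified-SRS estimator of a total. A minimal sketch (the no-change case, not the paper's adjusted estimators):

```python
def stratified_total(strata):
    """Estimator of a population total under stratified SRS:
    T̂ = Σ_h N_h·ȳ_h, with estimated variance
    Σ_h N_h²·(1 − n_h/N_h)·s_h²/n_h (finite-population correction)."""
    total, var = 0.0, 0.0
    for N_h, y in strata:                 # (stratum size, sampled values)
        n_h = len(y)
        ybar = sum(y)/n_h                 # stratum sample mean
        s2 = sum((v - ybar)**2 for v in y)/(n_h - 1)   # stratum variance
        total += N_h*ybar
        var += N_h**2*(1 - n_h/N_h)*s2/n_h
    return total, var
```

An enterprise that has moved stratum between selection and collection breaks the assumption that each sampled unit represents N_h/n_h units of its design stratum, which is exactly the bias the paper's adjusted estimators correct.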

11 pages, 381 KiB  
Article
Subgroup Identification and Regression Analysis of Clustered and Heterogeneous Interval-Censored Data
by Xifen Huang and Jinfeng Xu
Mathematics 2022, 10(6), 862; https://0-doi-org.brum.beds.ac.uk/10.3390/math10060862 - 08 Mar 2022
Cited by 1 | Viewed by 1759
Abstract
Clustered and heterogeneous interval-censored data occur in many fields, such as medical studies. For example, in a migraine study with the Netherlands Twin Registry, information including time to diagnosis of migraine and gender was collected for 3975 monozygotic and dizygotic twins. Since each study subject is observed only at discrete and periodic follow-up time points, the failure times of interest (i.e., the time when the individual first had a migraine) are known only to belong to certain intervals and hence are interval-censored. Furthermore, these twins come from different genetic backgrounds and may be associated with differential risks of developing migraines. For simultaneous subgroup identification and regression analysis of such data, we propose a latent Cox model in which the number of subgroups is not assumed a priori but estimated from the data. The nonparametric maximum likelihood method and an EM algorithm with the monotone ascent property are developed for estimating the model parameters. Simulation studies are conducted to assess the finite-sample performance of the proposed estimation procedure. We further illustrate the proposed methodology with an empirical analysis of the migraine data.
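The monotone-ascent property of EM mentioned in the abstract is easy to observe on a toy model. The sketch below runs EM on a simple two-component normal mixture (unit variances), purely to illustrate the E-step/M-step machinery and the non-decreasing log-likelihood; it is not the latent Cox model of the paper.

```python
import math

def em_mixture(x, n_iter=50):
    """EM for a two-component normal mixture with unit variances;
    returns the fitted (weight, mean1, mean2) and the log-likelihood
    trace, which EM guarantees to be non-decreasing."""
    pi, mu1, mu2 = 0.5, min(x), max(x)
    loglik = []
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each observation
        r = []
        for v in x:
            p1 = pi*math.exp(-0.5*(v - mu1)**2)
            p2 = (1 - pi)*math.exp(-0.5*(v - mu2)**2)
            r.append(p1/(p1 + p2))
        # M-step: closed-form updates for weight and means
        pi = sum(r)/len(x)
        mu1 = sum(ri*v for ri, v in zip(r, x))/sum(r)
        mu2 = sum((1 - ri)*v for ri, v in zip(r, x))/sum(1 - ri for ri in r)
        # log-likelihood after the update (normalizing constants dropped)
        ll = sum(math.log(pi*math.exp(-0.5*(v - mu1)**2)
                          + (1 - pi)*math.exp(-0.5*(v - mu2)**2)) for v in x)
        loglik.append(ll)
    return (pi, mu1, mu2), loglik
```

In the paper's setting the mixture components are subgroup-specific Cox models and the likelihood involves interval-censoring terms, but the same ascent guarantee applies.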

9 pages, 457 KiB  
Article
Generalized Confidence Intervals for Zero-Inflated Pareto Distribution
by Xiao Wang and Xinmin Li
Mathematics 2021, 9(24), 3272; https://0-doi-org.brum.beds.ac.uk/10.3390/math9243272 - 16 Dec 2021
Cited by 5 | Viewed by 2481
Abstract
This paper considers interval estimation for the mean of a Pareto distribution with excess zeros. Three approaches for interval estimation based on fiducial generalized pivotal quantities (FGPQs) are proposed. Simulation studies are performed to assess the performance of the proposed methods, using three measures to compare them with competing approaches. The advantages and disadvantages of each method are discussed. The methods are illustrated using a real phone-call dataset.
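The FGPQ construction combines fiducial draws for the zero-inflation probability and for the positive-part mean. The sketch below illustrates the mechanics on a simpler zero-inflated exponential analogue, not the Pareto case treated in the paper; the Beta(n0+½, n1+½) fiducial for p and the exponential pivot are standard choices assumed here for illustration.

```python
import random

def fgpq_ci(n_zeros, positives, level=0.95, n_draw=4000, seed=11):
    """FGPQ-style interval for the mean of a zero-inflated sample,
    sketched with an exponential positive part: mean = (1 − p)·μ, with
    fiducial draws p ~ Beta(n0+½, n1+½) and μ = 2T/χ²_{2·n1}
    (the χ² variate simulated as 2·Gamma(n1, 1))."""
    rng = random.Random(seed)
    n1, T = len(positives), sum(positives)
    draws = []
    for _ in range(n_draw):
        p = rng.betavariate(n_zeros + 0.5, n1 + 0.5)   # zero probability
        chi2 = 2.0*rng.gammavariate(n1, 1.0)           # χ² with 2·n1 d.f.
        draws.append((1 - p)*2*T/chi2)                 # pivotal mean draw
    draws.sort()
    lo = draws[int((1 - level)/2*n_draw)]
    hi = draws[int((1 + level)/2*n_draw)]
    return lo, hi
```

For the Pareto case the exponential pivot would be replaced by pivotal quantities for the Pareto shape and scale, but the percentile step on the combined draws is the same.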

23 pages, 431 KiB  
Article
On the Use of Gradient Boosting Methods to Improve the Estimation with Data Obtained with Self-Selection Procedures
by Luis Castro-Martín, María del Mar Rueda, Ramón Ferri-García and César Hernando-Tamayo
Mathematics 2021, 9(23), 2991; https://0-doi-org.brum.beds.ac.uk/10.3390/math9232991 - 23 Nov 2021
Cited by 8 | Viewed by 2254
Abstract
In recent years, web surveys have established themselves as one of the main methods in empirical research. However, the effects of coverage and selection bias in such surveys have undercut their utility for statistical inference in finite populations. To compensate for these biases, researchers have employed a variety of statistical techniques to adjust nonprobability samples so that they more closely match the population. In this study, we test the potential of the XGBoost algorithm in the most important estimation methods that integrate data from a probability survey and a nonprobability survey. At the same time, we compare the effectiveness of these methods at eliminating bias. The results show that the four proposed estimators based on gradient boosting frameworks can improve survey representativity with respect to other classic prediction methods. The proposed methodology is also used to analyze a real nonprobability survey sample on the social effects of COVID-19.
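One of the adjustment families discussed here, propensity-score weighting with a gradient-boosting classifier, can be sketched as follows. This uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost; the function name and the (1 − p̂)/p̂ weight form are the common textbook choices, not necessarily the exact estimators of the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def propensity_weights(X_nonprob, X_prob, seed=0):
    """Sketch of propensity-score adjustment for a nonprobability sample:
    a gradient-boosting classifier separates the nonprobability sample
    (label 1) from a reference probability sample (label 0), and each
    nonprobability unit receives weight (1 − p̂)/p̂."""
    X = np.vstack([X_nonprob, X_prob])
    z = np.r_[np.ones(len(X_nonprob)), np.zeros(len(X_prob))]
    clf = GradientBoostingClassifier(random_state=seed).fit(X, z)
    p = clf.predict_proba(X_nonprob)[:, 1].clip(0.01, 0.99)  # avoid 0/1
    return (1 - p)/p
```

A weighted mean `(w*y).sum()/w.sum()` over the nonprobability sample then downweights the over-represented regions of covariate space, pulling the estimate toward the reference population.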

18 pages, 2213 KiB  
Article
A Bayesian-Deep Learning Model for Estimating COVID-19 Evolution in Spain
by Stefano Cabras
Mathematics 2021, 9(22), 2921; https://0-doi-org.brum.beds.ac.uk/10.3390/math9222921 - 16 Nov 2021
Cited by 13 | Viewed by 2382
Abstract
This work proposes a semi-parametric approach to estimate the evolution of COVID-19 (SARS-CoV-2) in Spain. Considering the sequences of 14-day cumulative incidence of all Spanish regions, it combines modern Deep Learning (DL) techniques for analyzing sequences with the usual Bayesian Poisson-Gamma model for counts. The DL model provides a suitable description of the observed time series of counts, but it cannot give a reliable uncertainty quantification. In the proposed modelling approach, the DL predictions play the role of an expert elicitation of the expected number of counts and of its reliability. Finally, the posterior predictive distribution of counts is obtained in a standard Bayesian analysis using the well-known Poisson-Gamma model. The model makes it possible to predict the future evolution of the sequences in all regions or to estimate the consequences of hypothetical scenarios.
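The Poisson-Gamma step referred to here is the standard conjugate update; the DL elicitation part is omitted in this sketch, and the helper name is hypothetical.

```python
def poisson_gamma_posterior(counts, a, b):
    """Conjugate update: a Gamma(a, b) prior on the Poisson rate λ and
    observed counts y give λ | y ~ Gamma(a + Σy, b + n). The posterior
    predictive for a new count is then Negative Binomial, with mean
    a'/b' and variance a'(b' + 1)/b'² for a' = a + Σy, b' = b + n."""
    n, s = len(counts), sum(counts)
    a_post, b_post = a + s, b + n
    mean = a_post/b_post                          # posterior mean of λ
    pred_var = a_post*(b_post + 1)/b_post**2      # predictive variance
    return a_post, b_post, mean, pred_var
```

In the paper's scheme the prior (a, b) would be set from the DL point prediction and its estimated reliability, so the posterior predictive inherits both the DL fit and honest count-level uncertainty.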

16 pages, 311 KiB  
Article
Simultaneous Comparison of Sensitivities and Specificities of Two Diagnostic Tests Adjusting for Discrete Covariates
by José Antonio Roldán-Nofuentes
Mathematics 2021, 9(17), 2029; https://0-doi-org.brum.beds.ac.uk/10.3390/math9172029 - 24 Aug 2021
Viewed by 1398
Abstract
Adjusting for covariates is important in the study of the performance of diagnostic tests. In this manuscript, the simultaneous comparison of the sensitivities and specificities of two binary diagnostic tests is studied when discrete covariates are observed in all of the individuals in the sample. Four methods are presented to simultaneously compare the two sensitivities and the two specificities: a global hypothesis test and three other methods based on individual comparisons. The maximum likelihood method was applied to adjust the overall estimators of the sensitivities and specificities. Simulation experiments were carried out to study the asymptotic behaviour of the four proposed methods when the covariate is binary, giving general rules of application. The results were applied to a real example.
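The individual-comparison idea can be sketched for paired designs: sensitivities are compared among the diseased, specificities among the healthy, each with a McNemar test at a corrected level. This is a generic illustration without the covariate adjustment that is the paper's contribution; the function names and the Bonferroni-style correction are assumptions.

```python
def mcnemar(b, c):
    """McNemar chi-square statistic for paired binary outcomes;
    b and c are the two discordant counts."""
    return (b - c)**2/(b + c) if b + c else 0.0

def compare_tests(diseased, healthy, crit=5.02):
    """Sketch of individual comparisons of two diagnostic tests applied
    to the same subjects: sensitivities compared among the diseased,
    specificities among the healthy, each at level α/2
    (crit ≈ χ²₁ critical value for α = 0.025)."""
    se_stat = mcnemar(*diseased)   # discordant counts among diseased
    sp_stat = mcnemar(*healthy)    # discordant counts among healthy
    return {"sensitivity": se_stat > crit, "specificity": sp_stat > crit}
```

In the covariate-adjusted setting these comparisons are carried out on maximum likelihood estimates pooled across covariate levels rather than on raw discordant counts.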