Statistical Methods for High-Dimensional and Massive Datasets

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: 31 August 2024

Special Issue Editor


Guest Editor: Dr. Andreas Artemiou
School of Mathematics, Cardiff University, Cardiff CF10 3AT, UK
Interests: high-dimensional statistics; supervised and unsupervised dimension reduction; computational statistics; machine learning and text data analysis

Special Issue Information

Dear Colleagues,

In recent years, there has been an explosion in the amount of data that researchers in different fields collect. This creates a need for better statistical methods to analyse massive data (a very high number of observations) and high-dimensional data (a high number of variables). There is therefore growing interest in the development of theoretically and computationally efficient methodology for these types of data.

This Special Issue will collect papers that provide methodology for analysing both massive and high-dimensional data. We seek methodology across a wide spectrum of areas: computationally efficient algorithms for massive data, real-time algorithms for analysing streams of data, and feature selection and feature extraction methods for high-dimensional data. We are also looking for efficient ways to apply statistical learning methods such as clustering, classification, and discrimination in high-dimensional settings. Finally, we are interested in methodology beyond the classical vectorial setting, e.g., for functional and tensorial data. Applications to real data in different sciences will also be considered.

Dr. Andreas Artemiou
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • high-dimensional data
  • feature extraction
  • feature selection
  • real-time (online) algorithms
  • classification
  • clustering
  • discrimination
  • dimension reduction
  • supervised and unsupervised methods
  • statistical/machine learning

Published Papers (4 papers)


Research

12 pages, 285 KiB  
Article
Adaptive L0 Regularization for Sparse Support Vector Regression
by Antonis Christou and Andreas Artemiou
Mathematics 2023, 11(13), 2808; https://0-doi-org.brum.beds.ac.uk/10.3390/math11132808 - 22 Jun 2023
Abstract
In this work, we propose a sparse version of the Support Vector Regression (SVR) algorithm that uses regularization to achieve sparsity in function estimation. To achieve this, we use an adaptive L0 penalty that has a ridge structure and therefore does not introduce additional computational complexity to the algorithm. In addition, we use an alternative approach based on a similar proposal in the Support Vector Machine (SVM) literature. Through numerical studies, we demonstrate the effectiveness of our proposals. To the best of our knowledge, this is the first work to discuss a sparse version of Support Vector Regression in terms of variable selection rather than support vector selection.
(This article belongs to the Special Issue Statistical Methods for High-Dimensional and Massive Datasets)
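The adaptive L0 idea can be illustrated with a short sketch. The Python snippet below is a minimal reconstruction, not the authors' code: it approximates the L0 penalty by an iteratively reweighted ridge penalty, which keeps every iteration an ordinary L2-penalized linear SVR fit (matching the "ridge structure" mentioned in the abstract). The function name, the weight update w_j = 1/(beta_j^2 + delta), and all defaults are our assumptions.

```python
# A minimal sketch (not the authors' implementation) of adaptive-L0 sparse SVR:
# the L0 penalty is approximated by the reweighted ridge penalty
# sum_j w_j * beta_j^2 with w_j = 1 / (beta_j^2 + delta), so each iteration
# reduces to a standard L2-penalized linear SVR fit on rescaled columns.
import numpy as np
from sklearn.svm import LinearSVR

def adaptive_l0_svr(X, y, C=1.0, epsilon=0.1, delta=1e-4, n_iter=20, tol=1e-6):
    _, p = X.shape
    beta = np.ones(p)                     # initial coefficients -> uniform weights
    for _ in range(n_iter):
        w = 1.0 / (beta**2 + delta)       # ridge weights approximating the L0 penalty
        scale = 1.0 / np.sqrt(w)
        # A weighted ridge penalty equals a plain L2 penalty after rescaling columns.
        model = LinearSVR(C=C, epsilon=epsilon, max_iter=10000)
        model.fit(X * scale, y)
        beta_new = model.coef_ * scale    # map back to the original parameterization
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    beta[np.abs(beta) < np.sqrt(delta)] = 0.0  # hard-threshold near-zero coefficients
    return beta
```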
14 pages, 349 KiB  
Article
An Ensemble Method for Feature Screening
by Xi Wu, Shifeng Xiong and Weiyan Mu
Mathematics 2023, 11(2), 362; https://0-doi-org.brum.beds.ac.uk/10.3390/math11020362 - 10 Jan 2023
Cited by 1
Abstract
Feature selection/screening for high-dimensional nonparametric models is an important but very difficult problem. In this paper, we first point out the limitations of existing screening methods. In particular, model-free sure independence screening methods, which are defined on random predictors, may completely miss some important features in the underlying nonparametric function when the predictors follow certain distributions. To overcome these limitations, we propose an ensemble screening procedure for nonparametric models. It carefully combines several existing screening methods and outputs a result close to the best one of these methods. Numerical examples indicate that the proposed method is very competitive and has satisfactory performance even when existing methods fail.
(This article belongs to the Special Issue Statistical Methods for High-Dimensional and Massive Datasets)
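As a concrete toy version of the ensemble idea, the sketch below (our own construction; the paper's actual combination rule may differ) ranks features under several marginal screening utilities and keeps features that any single utility ranks highly, so the output stays close to the best individual method:

```python
# A toy sketch (not the paper's algorithm) of ensemble feature screening:
# rank features under several marginal utilities, combine via the minimum
# rank across utilities, and keep the d best-ranked features.
import numpy as np
from scipy import stats

def ensemble_screen(X, y, d):
    _, p = X.shape
    utilities = [
        lambda x: abs(stats.pearsonr(x, y)[0]),    # linear association
        lambda x: abs(stats.spearmanr(x, y)[0]),   # monotone association
        lambda x: abs(stats.kendalltau(x, y)[0]),  # rank-based association
    ]
    ranks = np.empty((len(utilities), p))
    for i, u in enumerate(utilities):
        scores = np.array([u(X[:, j]) for j in range(p)])
        ranks[i] = stats.rankdata(-scores)         # rank 1 = most relevant
    combined = ranks.min(axis=0)                   # best rank across utilities
    return np.argsort(combined)[:d]                # indices of the d kept features
```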

32 pages, 486 KiB  
Article
A Flexibly Conditional Screening Approach via a Nonparametric Quantile Partial Correlation
by Xiaochao Xia and Hao Ming
Mathematics 2022, 10(24), 4638; https://0-doi-org.brum.beds.ac.uk/10.3390/math10244638 - 7 Dec 2022
Abstract
Considering the influence of conditional variables is crucial in statistical modeling; ignoring them may lead to misleading results. Recently, Ma, Li and Tsai proposed a quantile partial correlation (QPC)-based screening approach that takes conditional variables into account for ultrahigh-dimensional data. In this paper, we propose a nonparametric version of quantile partial correlation (NQPC), which can describe the influence of conditional variables on other relevant variables more flexibly and precisely. Specifically, NQPC first removes the effect of the conditional variables by fitting two nonparametric additive models, in contrast to the conventional partial correlation, which fits two parametric models, and then computes the QPC of the resulting residuals. This measure is very useful when the conditional variables are highly nonlinearly correlated with both the predictors and the response. We then employ NQPC as a screening utility and propose a variable screening procedure based on it (NQPC-SIS). Theoretically, we prove that NQPC-SIS enjoys the sure screening property: with probability going to one, the selected subset recruits all the truly important predictors under mild conditions. Finally, extensive simulations and an empirical application demonstrate the usefulness of our proposal.
(This article belongs to the Special Issue Statistical Methods for High-Dimensional and Massive Datasets)
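The two-step structure of NQPC lends itself to a compact sketch. The snippet below is a rough reconstruction under our own assumptions: the additive fits use spline features with least squares (the paper's nonparametric fitter may differ), and qcor implements the quantile correlation underlying QPC, applied here to the residuals.

```python
# A rough sketch (assumptions ours) of the NQPC idea: remove the conditioning
# variables' effect on y and on each candidate x_j via nonparametric additive
# fits (per-column spline bases + least squares), then score each predictor by
# the quantile correlation of the two residual vectors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import SplineTransformer

def additive_residuals(v, Z):
    # Spline-expand each column of Z separately -> an additive model in Z.
    basis = SplineTransformer(degree=3, n_knots=6).fit_transform(Z)
    fit = LinearRegression().fit(basis, v)
    return v - fit.predict(basis)

def qcor(u, x, tau=0.5):
    # Quantile correlation: cov(tau - 1{u < Q_tau(u)}, x), standardized.
    psi = tau - (u - np.quantile(u, tau) < 0)
    return np.cov(psi, x)[0, 1] / np.sqrt(psi.var() * x.var())

def nqpc_scores(X, y, Z, tau=0.5):
    ey = additive_residuals(y, Z)
    return np.array([abs(qcor(ey, additive_residuals(X[:, j], Z), tau))
                     for j in range(X.shape[1])])
```

Here Z holds the conditioning covariates (a 2D array); screening would keep the predictors with the largest NQPC scores.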
19 pages, 623 KiB  
Article
Estimation of Error Variance in Regularized Regression Models via Adaptive Lasso
by Xin Wang, Lingchen Kong and Liqun Wang
Mathematics 2022, 10(11), 1937; https://0-doi-org.brum.beds.ac.uk/10.3390/math10111937 - 6 Jun 2022
Cited by 4
Abstract
Estimation of the error variance in a regression model is a fundamental problem in statistical modeling and inference. In high-dimensional linear models, variance estimation is difficult due to the issue of model selection. In this paper, we propose a novel approach to variance estimation that combines a reparameterization technique with the adaptive lasso, called the natural adaptive lasso. This method simultaneously selects and estimates the regression and variance parameters. Moreover, we show that the natural adaptive lasso for the regression parameters is equivalent to the adaptive lasso. We establish the asymptotic properties of the natural adaptive lasso for the regression parameters and derive a mean squared error bound for the variance estimator. Our theoretical results show that, under appropriate regularity conditions, the natural adaptive lasso estimator of the error variance is closer to the so-called oracle estimator than some other existing methods. Finally, Monte Carlo simulations are presented to demonstrate the superiority of the proposed method.
(This article belongs to the Special Issue Statistical Methods for High-Dimensional and Massive Datasets)
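To make the variance estimator concrete, here is a hedged sketch of one natural-adaptive-lasso-style computation: an adaptive (weighted) lasso fitted via column rescaling, with the variance estimated as RSS/n plus twice the incurred penalty, in the spirit of natural-lasso variance estimation. The function name, the pilot ridge fit, and all tuning defaults are our assumptions, not the paper's specification.

```python
# A hedged sketch (our reconstruction, not the authors' code) of a natural
# adaptive lasso style variance estimate: fit an adaptive lasso, then set
# sigma^2_hat = RSS/n + 2 * alpha * (weighted L1 penalty at the solution).
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def natural_adaptive_lasso_var(X, y, alpha=0.1, gamma=1.0, eps=1e-6):
    n, _ = X.shape
    init = Ridge(alpha=1.0).fit(X, y).coef_        # pilot fit for adaptive weights
    w = 1.0 / (np.abs(init) + eps) ** gamma
    # A weighted L1 penalty equals a plain L1 penalty after rescaling columns.
    model = Lasso(alpha=alpha, max_iter=50000).fit(X / w, y)
    beta = model.coef_ / w                         # back to the original scale
    rss = np.sum((y - X @ beta - model.intercept_) ** 2)
    penalty = np.sum(w * np.abs(beta))             # weighted L1 norm at the solution
    sigma2 = rss / n + 2.0 * alpha * penalty       # natural-lasso-style estimator
    return beta, sigma2
```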