Article
Peer-Review Record

Unsupervised and Supervised Feature Selection for Incomplete Data via L2,1-Norm and Reconstruction Error Minimization

by Jun Cai *, Linge Fan, Xin Xu and Xinrong Wu
Reviewer 1: Anonymous
Submission received: 20 July 2022 / Revised: 25 August 2022 / Accepted: 26 August 2022 / Published: 31 August 2022

Round 1

Reviewer 1 Report

The authors present unsupervised and supervised feature selection methods for datasets with missing values, introducing reconstruction error minimization based on the L2,1-norm. The topic is relevant for the journal audience and the proposed methodology is sufficiently novel. My main comments concern shortcomings in the validation of the methodology: the missing statistical assessment of the significance of the average performance differences between the compared methods, the lack of a discussion of the different types of data missingness (missing at random vs. not missing at random) and their influence on the choice of methodology, and the absence of state-of-the-art imputation approaches, such as multiple imputation, from the comparative evaluation (see Main comments below).

Main comments:
1) When the authors evaluate their proposed methodology and compare it against other approaches, they report only average performance statistics, such as ACC and NMI (see Table 2); no measure of the variance of these statistics (e.g. standard deviations) is shown and no statistical test is applied to assess the significance of the differences. While the proposed UFS-ID approach may perform well on average compared to the other methods, if the variation in the performance estimates is high, the improvements may not be statistically significant. The authors should therefore apply at least one statistical test, e.g. the Friedman test together with a post-hoc analysis, to show whether the average performance differences between methods are also statistically significant (a sketch of such a test is given after these comments).

2) In the discussion of methodologies for handling incomplete data, the authors do not mention that there are different types of data missingness, which affect the suitability of methods for imputation or for feature selection/classification without imputation. Data may be missing (completely) at random, or not missing at random, i.e. there may be systematic differences between the samples with and without missing values that can or cannot be entirely explained by other observed variables. If data are not missing at random, many common imputation methods and imputation-independent feature selection methods may fail. The authors do not discuss this important aspect, although it is highly relevant for the selection of suitable data analysis approaches (an illustrative sketch is given after these comments).

3) While the authors include several previous approaches in the comparative analyses, widely used state-of-the-art methodologies are not considered. In particular, the commonly applied multiple imputation approach is not mentioned in the manuscript, and other current methodologies, such as Random Forest Imputation, Discriminative Deep Learning Imputation, Generative Deep Learning Imputation, Variational Autoencoder Imputation and Generative Adversarial Network Imputation, are neither discussed nor included in the benchmark analysis. At least a few representative methods from these newer imputation families should be included in the comparison (a sketch of such baselines is given below).
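To illustrate main comment 1: below is a minimal sketch of the suggested significance testing, assuming the per-dataset ACC scores of each method are collected in a methods-by-datasets array. The scores, method labels, and the choice of SciPy and scikit-posthocs are illustrative assumptions, not part of the manuscript.

```python
# Friedman test with Nemenyi post-hoc analysis over per-dataset scores.
# Requires: pip install scipy scikit-posthocs
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# Hypothetical ACC scores: rows = methods, columns = datasets (placeholder values).
scores = np.array([
    [0.71, 0.65, 0.80, 0.58, 0.74],  # UFS-ID (proposed)
    [0.68, 0.63, 0.77, 0.55, 0.70],  # baseline A
    [0.66, 0.61, 0.78, 0.54, 0.69],  # baseline B
])

# Omnibus Friedman test: do the methods' score distributions differ significantly?
stat, p_value = friedmanchisquare(*scores)
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")

# Pairwise post-hoc comparison (Nemenyi) if the omnibus test is significant.
if p_value < 0.05:
    # posthoc_nemenyi_friedman expects blocks (datasets) as rows and groups (methods) as columns.
    pairwise_p = sp.posthoc_nemenyi_friedman(scores.T)
    print(pairwise_p)
```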
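To illustrate main comment 2: a small sketch of how the two missingness mechanisms could be simulated when constructing synthetic incomplete data sets. The data matrix, missing rates, and the logistic masking rule are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # hypothetical complete data matrix

# MCAR: every entry is masked with the same probability, independent of any value.
mcar_mask = rng.random(X.shape) < 0.2
X_mcar = np.where(mcar_mask, np.nan, X)

# MNAR: the masking probability depends on the (unobserved) value itself,
# e.g. larger values are more likely to be missing.
mnar_prob = 1.0 / (1.0 + np.exp(-(X - 1.0)))  # higher value -> higher missing probability
mnar_mask = rng.random(X.shape) < mnar_prob
X_mnar = np.where(mnar_mask, np.nan, X)
```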
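To illustrate main comment 3: one possible way to add such imputation baselines with scikit-learn. This is only a sketch under the assumption that the incomplete data are held in a NumPy array with np.nan entries; the synthetic data, estimators and parameters below are illustrative and are not the authors' pipeline.

```python
# MICE-style and random-forest imputation baselines with scikit-learn.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, activates IterativeImputer
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.ensemble import RandomForestRegressor

# Hypothetical incomplete data: 200 samples, 8 features, roughly 20% of entries missing.
rng = np.random.default_rng(0)
X_incomplete = rng.normal(size=(200, 8))
X_incomplete[rng.random(X_incomplete.shape) < 0.2] = np.nan

# Chained-equations imputation (the building block of multiple imputation):
# each feature with missing values is regressed on the others in round-robin fashion.
mice = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
X_mice = mice.fit_transform(X_incomplete)

# Random-forest based imputation (missForest-like): plug a forest into the chained scheme.
rf_imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    max_iter=5, random_state=0,
)
X_rf = rf_imputer.fit_transform(X_incomplete)

# Simple distance-based baseline for reference.
X_knn = KNNImputer(n_neighbors=5).fit_transform(X_incomplete)
```

Repeating the imputation with sample_posterior=True under different random seeds yields the several completed data sets that a multiple-imputation analysis would pool.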

Minor comments:
1) The manuscript contains several language errors (e.g. "the complete information of dataset...", "most of data technologies are based...") that should be corrected.
2) Wrapper-based feature selection is called "packing" in the manuscript, a term not used in the machine learning field. The authors should either use the standard terminology or explain why they deviate from it.
3) Figures 3 and 4: Error bars or standard deviations should be shown in these plots, so that the reader can judge whether the differences between methods are small or large relative to the variance in the performance measures (a plotting sketch is given below).
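To illustrate minor comment 3: a minimal matplotlib sketch of the requested presentation, assuming the mean and standard deviation of ACC over repeated runs are available. The x-axis variable, numbers and method label are placeholders, not results from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

missing_ratios = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical x-axis values
acc_mean = np.array([0.74, 0.71, 0.67, 0.62])    # placeholder means over repeated runs
acc_std = np.array([0.02, 0.03, 0.03, 0.04])     # placeholder standard deviations

# Mean curve with +/- one standard deviation as error bars.
plt.errorbar(missing_ratios, acc_mean, yerr=acc_std, marker="o", capsize=3, label="UFS-ID")
plt.xlabel("Missing ratio")
plt.ylabel("ACC")
plt.legend()
plt.show()
```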


Author Response

Dear reviewer,

        Thank you very much for the time and effort you put into reviewing our manuscript. We also thank you for your valuable comments, which helped to improve the quality of this paper. In the attached document, we address your concerns in the order in which they are raised.

        Please see the attachment. Thank you again!

Best regards.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper proposes unsupervised and supervised feature selection methods for incomplete data by introducing the L2,1-norm and reconstruction error minimization. An alternating iterative algorithm is presented to optimize the proposed objective functions effectively, and the convergence of the proposed algorithms is proved theoretically. Extensive experimental studies are performed on both real and synthetic incomplete data sets to demonstrate the performance of the proposed methods. However, I have the following suggestions to improve the manuscript:

1. The tuning parameters in Eq. 6 are not defined properly; please explain their initialisation and their impact in Eq. 6.

2. How are Eqs. 8, 9, and 13 derived? Please explain this properly.

3. The explanation of Theorem 1 is not adequate; please explain the derivation of Eq. 17.

4. Too many variables are used in this paper, which makes the study unclear and hard to follow. Please summarise all variables in a table.

5. The results section is weak; please include more recent studies in the comparison with the proposed methods.

Author Response

Dear reviewer,

        Thank you very much for the time and effort you put into reviewing our manuscript. We also thank you for your valuable comments, which helped to improve the quality of this paper. In the attached document, we address your concerns in the order in which they are raised.

        Please see the attachment. Thank you again!

Best regards.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have addressed my main comments and improved the manuscript.

Reviewer 2 Report

The authors have revised the manuscript appropriately according to my comments.
