Article
Peer-Review Record

Automatic Feature Selection for Improved Interpretability on Whole Slide Imaging

Mach. Learn. Knowl. Extr. 2021, 3(1), 243-262; https://doi.org/10.3390/make3010012
by Antoine Pirovano 1,2,*, Hippolyte Heuberger 1, Sylvain Berlemont 1, Saïd Ladjal 2 and Isabelle Bloch 2,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 16 January 2021 / Revised: 10 February 2021 / Accepted: 13 February 2021 / Published: 22 February 2021

Round 1

Reviewer 1 Report

This paper explores the interpretability of deep learning models used in histopathological image analysis, which is an interesting and important problem. The authors propose a piece-wise interpretability approach based on gradient-based methods. In my view, the technical novelty is limited, but the work contributes to applications in the computational pathology domain. The evaluation is performed on the Camelyon-16 dataset, using 345 WSIs, for the tumor vs. normal classification problem. The following comments are offered in the hope of further improving the paper's quality.

The major comments:

  • The authors should release the code so that other researchers can test it, as this is important for reproducibility.
  • In the proposed methods section, the approach is mainly described by answering three questions. Is it possible to provide some intermediate result images to illustrate the method, so that it is easier for readers to understand?
  • Figure 4 is not very clear. More explanation should be provided.
  • In lines 249-250, it is mentioned that ‘using this average normalized activation as a tumor prediction score’. Could the authors explain more clearly what the average normalized activation is here?
  • In Table 1, how are the improvements in location AUC of 29.2% and 75.5%, respectively, obtained? Please double-check that these numbers are correct.
  • In lines 271-274, the following sentence is unclear: ‘We can observe that our feature-based heat-maps are the ones impacting the most the performances, that CHOWDER tile scores heat-maps are complete and relevant and that attention-based tile scores heat-maps are as complete as random heatmaps which confirms the results in Table 1.’ The authors should rephrase it.
  • For interpretation purposes in deep learning, many relevant techniques have been designed by previous researchers, for example: J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, H. Lipson, “Understanding Neural Networks Through Deep Visualization”, http://yosinski.com/media/papers/Yosinski__2015__ICML_DL__Understanding_Neural_Networks_Through_Deep_Visualization__.pdf.

If the authors could provide some form of comparison between such techniques and the presented method, or at least a discussion of the differences, the paper would be more convincing.
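
For concreteness, here is a minimal sketch of one such gradient-based attribution method (vanilla gradient saliency), assuming a PyTorch tile classifier; the function name and tensor shapes are illustrative, not taken from the paper under review:

    import torch

    def gradient_saliency(model, tile):
        # Vanilla gradient saliency: |d top-class logit / d input pixel|.
        model.eval()
        x = tile.clone().requires_grad_(True)  # (1, 3, H, W) input tile
        score = model(x).max()                 # logit of the top class
        score.backward()
        return x.grad.abs().amax(dim=1)        # (1, H, W) saliency map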

The minor comments are:

  • In line 57, the abbreviations ‘CHOWDER’ and ‘WELDON’ should be spelled out in full at their first appearance in the paper.
  • In Figure 3, the text in the figure is hard to read unless the image is zoomed in.
  • Figure 4 has several subfigures. The authors should explain the meaning of each subfigure in the caption.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors provide the audience with further validation of the heat-map improvements and with feature-related investigations. The background helpfully explains the paper’s motivation; however, the authors may want to cover more articles on pre-trained CNN models such as VGG16 and ResNet (even though these are used in the authors’ experiments). In particular, the authors state that tile descriptors extracted by ResNet-50 were used for their investigation, but do not mention comparative studies in the background.
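
For reference, a minimal sketch of the kind of ResNet-50 tile descriptor extraction the paper describes, assuming torchvision >= 0.13 and 224x224 RGB tiles; the helper name is hypothetical:

    import torch
    import torchvision.transforms as T
    from torchvision.models import resnet50, ResNet50_Weights

    # ImageNet-pre-trained ResNet-50 with the classification head
    # removed, so each tile maps to a 2048-d descriptor.
    backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
    extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

    preprocess = T.Compose([
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    @torch.no_grad()
    def tile_descriptor(tile_pil):
        x = preprocess(tile_pil).unsqueeze(0)      # (1, 3, 224, 224)
        return extractor(x).flatten(1).squeeze(0)  # (2048,)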


Before presenting the overview of the proposed method, the authors offer little explanation of prior research efforts on tissue detection and tiling. The WSI classification pipeline is not that simple, particularly given tools such as QuPath and HistomicsML2.
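
To make the tissue detection step concrete, a minimal sketch using Otsu thresholding on a slide thumbnail, assuming OpenSlide and scikit-image are installed; the function and argument names are illustrative:

    import numpy as np
    from openslide import OpenSlide
    from skimage.color import rgb2gray
    from skimage.filters import threshold_otsu

    def tissue_mask(slide_path, thumb_size=(1024, 1024)):
        # Coarse tissue detection: tissue is darker than the bright
        # glass background, so Otsu on a grayscale thumbnail suffices.
        slide = OpenSlide(slide_path)
        thumb = np.asarray(slide.get_thumbnail(thumb_size).convert("RGB"))
        gray = rgb2gray(thumb)
        return gray < threshold_otsu(gray)  # True where tissue is likely

Tiles would then be read (e.g., with slide.read_region) only at positions whose thumbnail footprint falls inside this mask.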


The tiling and the feature extraction from the tiles are clearly defined in Figure 2. However, the authors may want to provide more information about the color normalization step; if such a step is absent, they should at least explain why color normalization was omitted.
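
For illustration, a minimal sketch of one common option, Reinhard-style color normalization (matching per-channel LAB statistics to those of a reference slide); this is a generic technique, not necessarily the step the authors would use:

    import numpy as np
    from skimage.color import rgb2lab, lab2rgb

    def reinhard_normalize(tile_rgb, ref_mean, ref_std):
        # Shift/scale each LAB channel of the tile to match the mean/std
        # computed once from a reference template slide (length-3 arrays).
        lab = rgb2lab(tile_rgb)
        mean = lab.mean(axis=(0, 1))
        std = lab.std(axis=(0, 1)) + 1e-8
        lab = (lab - mean) / std * ref_std + ref_mean
        return np.clip(lab2rgb(lab), 0, 1)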


A brief attribution of CHOWDER in Figure 3 could give the audience a better idea of how the min/max scores are distributed in the predictions. However, the audience may also want to see the distribution of the individual points inside each boxplot.
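
One way to show this, sketched here with hypothetical stand-in data rather than the paper’s actual scores, is to overlay the individual points on each boxplot:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Stand-in min/max tile scores; the real figure would use the
    # per-slide CHOWDER scores instead.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "score_type": ["min"] * 100 + ["max"] * 100,
        "score": np.concatenate([rng.normal(-1, 0.3, 100),
                                 rng.normal(1, 0.3, 100)]),
    })

    ax = sns.boxplot(data=df, x="score_type", y="score", showfliers=False)
    sns.stripplot(data=df, x="score_type", y="score",
                  color="black", size=3, alpha=0.5, ax=ax)
    plt.show()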


The conclusion states that “it is not conceivable to ask a medical expert to analyze deeply more than 50 features individually”. However, fixing the number of features in this way is open to criticism. In my opinion, the paper would be more reliable if the authors added a discussion section instead of merely insisting on specific results.

Some minor issues:

The range of the AUC (y-axis) is arbitrarily assigned in Figure 8, Figure 9, and Figure 10.
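
A simple remedy, sketched below with matplotlib, is to share one fixed AUC range across the three figures; the limits shown are only an example:

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, sharey=True, figsize=(12, 4))
    for ax in axes:
        ax.set_ylim(0.5, 1.0)  # one common AUC range for Figures 8-10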

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
