Article
Peer-Review Record

Improved Surprise Adequacy Tools for Corner Case Data Description and Detection

by Tinghui Ouyang 1,2,*, Vicent Sanz Marco 2, Yoshinao Isobe 3, Hideki Asoh 1, Yutaka Oiwa 2 and Yoshiki Seo 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 23 June 2021 / Revised: 16 July 2021 / Accepted: 19 July 2021 / Published: 25 July 2021

Round 1

Reviewer 1 Report

The motivation of the paper is sound, and the research is relevant to current focus and concerns in deep learning: it proposes a new approach to corner case research via modification of surprise adequacy (SA). The paper is well written, with only minor additions needed. It would also be nice if a GitHub repository were included to test against the original Kim 2018 work.

 

Abstract:

  1. Not very clear what this new approach is really doing based on the abstract.

 

Introduction:

  1. Can you provide a more quantitative discussion of the error tolerance level? 
  2. What other examples are there, besides autonomous driving, where corner case failure can be critical?
  3. Please don’t use “this” or “this paper” as it can be ambiguous what you really mean.

 

Corner cases detection based on SA:

  1. Are the modifications additive or exclusive of each other? 
  2. What is the next step after capturing the corner cases? It's not very clear what to do after the corner cases are identified.

 

Experiments and Evaluation

  1. How much extra computation workload do the 3 modifications add? Given the setup in 4.1, what is the end-to-end runtime w/o DSA?
  2. Please include a subsection with caveats and a discussion of some limitations of this approach.

 

 

Author Response

Abstract:

  1. Not very clear what this new approach is really doing based on the abstract.

Response: Thanks for the reviewer’s comment. We have rewritten the abstract to make the paper’s purpose clearer, as presented in the revised manuscript.

 Introduction:

  1. Can you provide a more quantitative discussion of the error tolerance level? 

Response: Thanks for the reviewer’s comment. If the reviewer means AI models’ tolerance of corner cases, we do not think a unified quantitative standard exists for all applications. For emergency disease diagnosis or self-driving, the tolerance level must be low, since a wrong decision caused by a corner case could lead to serious loss of life or property. For applications with lower safety requirements, the error tolerance level can be correspondingly higher. We plan to study this issue in future work on AI quality assurance by considering different safety levels. Thanks again for this comment!

 

2. What other examples are there, besides autonomous driving, where corner case failure can be critical?

Response: Thanks for this comment. We have added two more corner case examples (in medical diagnosis and wind engineering) to the revised manuscript. Details are presented below

“For instance, in cerebral small vessel disease, the heterogeneous parenchymal damage in morphology and size [10] gives rise to corner cases that cause DL models to make wrong decisions. Another corner case study, in wind engineering, concerns wind ramps, which threaten the safety of the electricity industry [11].”

 

3. Please don’t use “this” or “this paper” as it can be ambiguous what you really mean.

Response: Thanks for the comment. We have corrected this issue in the revised manuscript. Thank you again for your valuable suggestion.

 

 Corner cases detection based on SA:

  1. Are the modifications additive or exclusive of each other? 

Response: Thanks for the reviewer’s comment. In the manuscript, these modifications are additive to each other. The following evidence from the manuscript makes this clear:

  • Motivated by the specific case shown in Figure 3, we developed Modification-1: “to evaluate if a data sample belongs to corner case, its own novelty respect to all classes seems more important. Hence, we modify the original DSA”.
  • Motivated by the shortcomings of DSA0 and DSA1 shown in Figure 4, we developed Modification-2: “DSA definitions may have a common shortage on processing pair-wise rare data points, especially on describing behaviors of corner case”.
  • Considering that “corner-case data usually have some obvious characteristics of closing to boundary or outlier location”, as shown in Figure 5, we developed DSA2 by replacing the global descriptor with a local one.

Through the above explanation, we can see that the development of these modifications is additive.
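To make the additive relation concrete, the following Python sketch contrasts the original distance-based surprise adequacy with a Modification-1-style variant that measures the test point’s own novelty with respect to the other classes. This is a hypothetical reconstruction, not the authors’ code: the function names, the array layout, and the use of Euclidean distance over activation traces are all our assumptions.

```python
import numpy as np

def dsa0(at_x, pred_class, train_ats, train_labels):
    """Original distance-based surprise adequacy: dist_a is the distance
    from the test activation trace to its nearest same-class training
    trace; dist_b is the distance from that neighbour to the nearest
    trace of any other class."""
    same = train_ats[train_labels == pred_class]
    other = train_ats[train_labels != pred_class]
    d_same = np.linalg.norm(same - at_x, axis=1)
    x_a = same[np.argmin(d_same)]            # nearest same-class neighbour
    dist_a = d_same.min()
    dist_b = np.linalg.norm(other - x_a, axis=1).min()
    return dist_a / dist_b

def dsa1(at_x, pred_class, train_ats, train_labels):
    """A Modification-1-style variant (our reading of the quoted text):
    the test point's *own* novelty is measured, so dist_b is anchored at
    at_x itself rather than at its nearest same-class neighbour."""
    same = train_ats[train_labels == pred_class]
    other = train_ats[train_labels != pred_class]
    dist_a = np.linalg.norm(same - at_x, axis=1).min()
    dist_b = np.linalg.norm(other - at_x, axis=1).min()
    return dist_a / dist_b
```

Since the variant changes only where dist_b is anchored while keeping the same ratio structure, later modifications can be layered on top of it, which is the sense in which the modifications are additive.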

 

2. What is the next step after capturing the corner cases? It's not very clear what to do after the corner cases are identified.

Response: Thanks for the comments. Based on the comment, we have added a description of what can be done once the corner cases are identified, as shown below

“Afterwards, many potential studies related to corner cases can be considered, e.g. corner case's influence on modeling accuracy, data quality assurance, AI testing quality study, robustness and safety analysis with consideration of corner cases, and so on”

Moreover, the description of future work in the ‘Conclusion’ section also mentions the next step for the detected corner cases, as described below

“Regarding the usage of the detected corner case data, possible follow-up studies, such as AI robustness analysis and stability and dependability in AI quality assurance, may be able to make use of these data.”

Thanks again for the reviewer’s comments!

 Experiments and Evaluation

  1. How much extra computation workload do the 3 modifications add? Given the setup in 4.1, what is the end-to-end runtime w/o DSA?

Response: Thanks very much for the comment. The extra computation of the DSAs involves a quadratic number of distance calculations. Compared with DSA0, DSA1 and DSA3 have the same time complexity, while DSA2 has a relatively low runtime, since its computation time is proportional to the number of categories rather than to the training set size as in DSA0/1/3. A detailed explanation has been added to the revised manuscript, as presented below

“Extra time consumption is unavoidable in corner case study. In the DSA computation process, the distances from testing points to the whole training set on the activation trace must be computed, so the computation time is proportional to the training set size [24]. Among the four DSAs in this paper, DSA0/1/3 have the same time complexity as described above, except DSA2, whose computation time is proportional to the number of categories. Corner case detection over the whole testing set is therefore quadratic in the testing- and training-set sizes.”
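The complexity difference can be illustrated with a small sketch. This is hypothetical: the manuscript defines DSA2’s actual local descriptor, and the per-class centroid used here is only a stand-in to show why replacing the scan over all training activation traces with one descriptor per class reduces the per-test-point cost from O(training set size) to O(number of classes).

```python
import numpy as np

def dsa2_like(at_x, pred_class, class_descriptors):
    """Sketch of a per-class local-descriptor DSA: with one descriptor
    per class (a centroid here, purely for illustration), each test
    point needs only len(class_descriptors) distance computations."""
    d = np.linalg.norm(class_descriptors - at_x, axis=1)
    dist_a = d[pred_class]                     # distance to own class's descriptor
    dist_b = np.min(np.delete(d, pred_class))  # nearest other-class descriptor
    return dist_a / dist_b
```

A DSA0-style computation would instead compare at_x against every training trace, which is why detection over a whole testing set scales as (testing set size) × (training set size).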

2. Please include a subsection with caveats and a discussion of some limitations of this approach.

Response: Thanks for the reviewer’s comment. Following the comment, we have added an extra subsection discussing the limitations of the proposed approach, as presented below

“(5) Discussion

Besides the positive results of the above studies on using DSA and its modifications for corner case identification, some limitations should be noted. The first is that DSA applies only to classification problems; when dealing with other machine learning applications, e.g. regression, how to cope with corner cases remains a challenge for the research in this paper. The second is the limitation of DSA’s modifications. Even though the three modifications of DSA proposed in this paper achieve relatively good performance on corner case detection, their ability to discover more novel behaviors is limited, since they are additive to the original DSA definition. Therefore, more SA definitions can be explored; e.g., a novel variant using the silhouette coefficient was proposed in [24]. The third point is the time overhead of DSA computation. Extra time consumption is unavoidable in corner case study: in the DSA computation process, the distances from testing points to the whole training set on the activation trace must be computed, so the computation time is proportional to the training set size [37], and corner case detection over the whole testing set is quadratic in the testing- and training-set sizes. However, compared with other methods adopting mutation and multiple testing [38,39] for corner case identification, the DSAs’ computation time is acceptable.”

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper deals with the topic of corner case detection in deep learning. The authors propose a novel method based on distance-based surprise adequacy (DSA), together with three DSA modifications intended to improve the capability of the original DSA to describe the behavior of corner case data. Experiments have been conducted using the MNIST and CIFAR-10 datasets as well as an industrial classification dataset, and the results are reported.

The paper has a good degree of novelty and is scientifically sound. The results of the experiments are interesting.

The main issue with this paper is, in my opinion, the presentation of the experimental results. Many figures are too small and difficult to read, and the commentary on the results is too concise. The authors need to add more detail to the results.

Moreover, I suggest adding the following paper to the description of related work in the Introduction section:
Ma, W., Papadakis, M., Tsakmalis, A., Cordy, M., & Traon, Y. L. (2021). Test selection for deep learning systems. ACM Transactions on Software Engineering and Methodology (TOSEM), 30(2), 1-22.

It deals with a similar topic, and a comparison of the authors’ work against the methods and results in that paper might be interesting.

Author Response

Response: Thank you very much for these valuable comments. Following the reviewer’s suggestions, we have revised our manuscript in the following two respects.

First, we redrew the figures that were difficult to read, e.g. Figure 6, Figure 8, and Figures 11–13. Moreover, we added more detail to some experimental results to make the findings clearer; see the “Experiments” part of the revised manuscript.

Second, the reference provided by the reviewer is valuable and has been added to the Introduction section. We have also added a brief discussion of this reference, as below

“In [24], DSA was considered as an uncertainty metric to identify misclassified inputs”

Thanks again for the reviewer’s valuable comments!

Author Response File: Author Response.pdf
