Article
Peer-Review Record

A Spatial Fuzzy Co-Location Pattern Mining Method Based on Interval Type-2 Fuzzy Sets

by Jinyu Guo and Lizhen Wang *
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4:
Appl. Sci. 2022, 12(12), 6259; https://doi.org/10.3390/app12126259
Submission received: 29 May 2022 / Revised: 15 June 2022 / Accepted: 16 June 2022 / Published: 20 June 2022
(This article belongs to the Collection Geoinformatics and Data Mining in Earth Sciences)

Round 1

Reviewer 1 Report

No comments

Author Response

First of all, we are very grateful for your compliments on our work. Secondly, we also thank you for your valuable comments.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors submitted an interesting manuscript dealing with uncertainty associated with type-1 fuzzy membership functions in mining spatial fuzzy co-location patterns. However, the authors should revise some sections of the manuscript by providing short summaries. Below are some comments and suggestions to improve the overall quality of the manuscript:

Lines 35-37, 48-52: Please provide recent relevant references.

In the introduction, the authors should provide a review of previous studies related to the proposed topic. They should indicate the importance of their proposed method compared to these previous studies. This review is provided in Section 2, but I think the authors should put it in the introduction, before introducing the purpose of the paper.

The section 3 should be very brief, please revise it and provide a short summary.

The section 4 is too long, the authors should summarize their proposed methodology and present most important steps.

Author Response

The authors submitted an interesting manuscript dealing with uncertainty associated with type-1 fuzzy membership functions in mining spatial fuzzy co-location patterns. However, the authors should revise some sections of the manuscript by providing short summaries. Below are some comments and suggestions to improve the overall quality of the manuscript:

Lines 35-37, 48-52: Please provide recent relevant references.

Response:

We have added recent relevant references at lines 35-37 and 48-52.

 

In the section of introduction the authors should provide a review of previous studies which have been conducted and related to the topic proposed by the authors. They should indicate the importance of their proposed method compared to these previous studies. This review is provided in the section 2 but I think the authors should put it in the introduction before introducing the purpose of this paper.

Response:

We have moved the review of previous studies into the introduction, before the statement of the purpose of this paper.

 

The section 3 should be very brief, please revise it and provide a short summary.

Response:

We have simplified Section 3 and added a short summary before the corresponding paragraph.

 

The section 4 is too long, the authors should summarize their proposed methodology and present most important steps.

Response:

We have simplified Section 4 and put a brief description of the important steps at the beginning of the corresponding paragraph.

 

First of all, we are very grateful for your compliments on our work. Secondly, we also thank you for your valuable comments.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors tackle the problem of exhaustively mining co-location patterns using the notion of fuzzy sets. Instead of type-1 fuzzy sets, which assign a single probability for membership, the proposed type-2 fuzzy sets recognise that the single probability in type-1 fuzzy sets may also inherently contain errors. Further, instead of only using location information for co-location, the authors recognise that entities in the same location can have different attributes, which can also be assigned a membership function and mined for 'co-location' with a different 'dist' in attribute space.

The authors devise a complete algorithm framework to implement the mining strategy efficiently for arbitrary databases, consisting of:

1) degree membership evaluation from 'expert evaluation',

2) fitting the degree membership distribution for each attribute using linear regression and elliptic curve modification to model the fitting deviation.

3) Complete generation of all spatial trees to enable exhaustive generation of potential colocation patterns.

4) definition and propagation of the error intervals (fuzzy) to allow evaluation of the error in combination of features and in particular derivation of a prevalence scoring of a colocation pattern of features.

5) implementation of pruning strategy on the prevalence score to suggest a minimal set of 'significant' co-occurring patterns.

Overall comments.

- in general the paper is very detailed and well-written and the choice of example with geology and metals highlights well the need to take into account attribute properties in addition to spatial ones. However it is rather technical and difficult for readers not fully aware of the terminology and mathematical notation used in the fuzzy set literature. It was very helpful for readers to have an easy explanation of the pseudocode algorithms. Similarly it would help to have a much more intuitive explanation of participation ratio, i.e. PR(fuzz_f, f_i) is the probability of instances of f_i in the colocation pattern fuzz_f relative to all instances of f_i in the dataset, especially since this is a standard measure in the literature and is crucial in how the final colocalisation patterns are mined.

- there is a lot of mathematical terminology and notation. It would be useful to have an appendix to summarise and explain in general terminology.

- the main novelty of this work seems to be 1) the usage of an elliptic curve, which gives the significant advantage of needing only 1 free parameter, g, to model the uncertainty and 2) the proposal of a prevalence measure (definition 10) that enables thresholding. My main comments for this are:

- (use of elliptic curve), the one parameter is a significant advantage for fitting but also limits the uncertainty that can be estimated and presumably how the feature e.g. Cu levels is partitioned. I note in particular the usage of 2 thresholds for middle level. How in general should a user create the levels, e.g. should they fit a piecewise linear function first? I also note e.g. for Cu levels, the low covers 6-20, mid 10-80, and high 60-120, that is the concentration intervals overlap. Should this generally be the case? or was this done in order to fit an elliptic curve?

- are there heuristics / rules to find the granule size, i.e. how was 0.1 decided upon?

- (prevalence measure), the prevalence measure is required in order to have a single scalar quantity to threshold on and determine whether the co-location pattern is significant or 'prevalent'. This measure is effectively a linear interpolation over the upper and lower bound with a user-defined cutoff at minprev. Presumably, minprev is not necessarily the same over different FCPs, particularly when multiple variables are now considered together, or is this effectively taken into account by the specification that the elliptic curve represents a minimum confidence interval? From a statistical perspective, my question is whether this type of mining enables equal precision when mining for 4-tuples vs 2-tuples and 1-tuples.

- whilst mentioned in the text, the number of significant FCPs when using the equivalent type-1 fuzzy set was not specifically given. It would be good to add this and comment on whether there is a particular class of FCPs that is likely to be missed, e.g. I might assume worse sensitivity for large k size FCPs ...

- big distance neighbor and small distance neighbor are defined identically in the text. please correct this to reflect that what is meant is that small distance neighbor is the immediate tree connection and big distance neighbor regards clique-clique connectivity.

- there is some confusion between row-filter and join algorithm. Please clarify and use consistent terminology.

- it would be good to add an introduction on the difference between type-2 and type-1 fuzzy sets for readers, with graphical illustration, and explain why the elliptic curve and interval methods are necessary for feasible computation

- lines 248-260 are an overly complex explanation. please explain more conceptually or leave out, as it is obvious from the mathematical definition of prevalence tendency degree.

 

Minor comments

- please define star neighborhood, and mention this is non-complete vs spatial clique approach.

- give names to the pseudocode table for easier finding and reading.

- please define the pruning rate in figures to aid interpretation

- please use the real functional area names i.e. industrial, agricultural, living area,  in figure 5.11, 5.12 to help understanding.

- for section 5.2, the spatial proximity is given and prevalence threshold, what is the min_dist threshold for determining colocation of attributes?

- document compilation error on line 395. the lines don't flow well relative to figure insertion. 

 

 

Author Response

Reviewer #3:

Overall comments.

- in general the paper is very detailed and well-written and the choice of example with geology and metals highlights well the need to take into account attribute properties in addition to spatial ones. However it is rather technical and difficult for readers not fully aware of the terminology and mathematical notation used in the fuzzy set literature. It was very helpful for readers to have an easy explanation of the pseudocode algorithms. Similarly it would help to have a much more intuitive explanation of participation ratio, i.e. PR(fuzz_f, f_i) is the probability of instances of f_i in the colocation pattern fuzz_f relative to all instances of f_i in the dataset, especially since this is a standard measure in the literature and is crucial in how the final colocalisation patterns are mined.

- there is a lot of mathematical terminology and notation. It would be useful to have an appendix to summarise and explain in general terminology.

- the main novelty of this work seems to be 1) the usage of an elliptic curve, which gives the significant advantage of needing only 1 free parameter, g, to model the uncertainty and 2) the proposal of a prevalence measure (definition 10) that enables thresholding. My main comments for this are:

- (use of elliptic curve), the one parameter is a significant advantage for fitting but also limits the uncertainty that can be estimated and presumably how the feature e.g. Cu levels is partitioned. I note in particular the usage of 2 thresholds for middle level. How in general should a user create the levels, e.g. should they fit a piecewise linear function first? I also note e.g. for Cu levels, the low covers 6-20, mid 10-80, and high 60-120, that is the concentration intervals overlap. Should this generally be the case? or was this done in order to fit an elliptic curve?

Response:

It is feasible to fit a piecewise linear function, but it is worth noting that, after fitting, the FOU must still meet the given confidence and connectivity requirements. This would require complex methods to select appropriate parameters for the piecewise linear functions, which would be a very large workload. Our paper focuses on the influence of the uncertainty of the membership degree on fuzzy co-location patterns, so we designed a simple and feasible method instead.

The overlapping of concentration intervals is a common phenomenon. For example, for a specific value of heavy metal content, some people believe that it has a large tendency to belong to low concentration, but also that it has a small tendency to belong to middle concentration.
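
The overlap described above can be illustrated with a minimal sketch. Only the three supports (low 6-20, mid 10-80, high 60-120) come from the reviewer's comment; the trapezoidal shapes, the inner breakpoints, and the names `trapezoid`, `CU_LEVELS` and `memberships` are assumptions made for illustration. The sketch uses plain type-1 memberships and is not the interval type-2 construction of the paper.

```python
def trapezoid(x, a, b, c, d):
    """Membership that rises on [a, b], equals 1 on [b, c], and falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Supports follow the review text; the inner breakpoints are assumed.
CU_LEVELS = {
    "low": (6, 6, 10, 20),
    "mid": (10, 20, 60, 80),
    "high": (60, 80, 120, 120),
}


def memberships(x):
    """Return the (assumed) membership degree of concentration x in each level."""
    return {name: trapezoid(x, *params) for name, params in CU_LEVELS.items()}


# A concentration of 12 lies in both the low and the mid support:
# it leans strongly towards "low" and weakly towards "mid".
print(memberships(12))
```

Under these assumed shapes, any concentration between 10 and 20 has a non-zero degree in both the low and the mid level, which is the situation the response refers to.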

 

- are there heuristics / rules to find the granule size, i.e. how was 0.1 decided upon?

Response:

The granule size should be determined according to people's evaluation habits. If the granule size does not conform to people's habitual evaluation interval (too large or too small), then people's subjective evaluation deviation will increase. We chose a granule size of 0.1, which is more in line with people's evaluation habits.

 

- (prevalence measure), the prevalence measure is required in order to have a single scalar quantity to threshold on and determine whether the co-location pattern is significant or 'prevalent'. This measure is effectively a linear interpolation over the upper and lower bound with a user-defined cutoff at minprev. Presumably, minprev is not necessarily the same over different FCPs, particularly when multiple variables are now considered together, or is this effectively taken into account by the specification that the elliptic curve represents a minimum confidence interval? From a statistical perspective, my question is whether this type of mining enables equal precision when mining for 4-tuples vs 2-tuples and 1-tuples.

Response:

Different interval type-2 fuzzy sets may produce different interval membership degrees. In this paper, we use the same construction method for the interval type-2 fuzzy membership function across different evaluation data. In previous co-location pattern mining studies, min_prev is usually set to a fixed value for all FCPs mined from the same data set, so that the prevalence of those FCPs can be evaluated objectively. This paper follows the same practice, so we think this type of mining enables equal precision when mining for 4-tuples vs 2-tuples and 1-tuples.
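
To make the discussion of a fixed min_prev concrete, here is a minimal sketch of how a single threshold could be applied to an interval of participation indices. The formula is an assumed reading of the reviewer's description ("a linear interpolation over the upper and lower bound with a user defined cutoff at minprev") and may differ from Definition 10 of the paper; the function name `prevalence_tendency` and its parameters are hypothetical.

```python
def prevalence_tendency(pi_lower, pi_upper, min_prev):
    """Score an FCP from its lower/upper participation index and a fixed threshold.

    Assumed reading of the measure, not the paper's exact definition:
    1.0 if the whole interval is at or above min_prev (absolutely prevalent),
    0.0 if the whole interval is below min_prev (absolutely non-prevalent),
    otherwise the fraction of the interval that lies above min_prev.
    """
    if not 0.0 <= pi_lower <= pi_upper <= 1.0:
        raise ValueError("expected 0 <= pi_lower <= pi_upper <= 1")
    if pi_lower >= min_prev:
        return 1.0
    if pi_upper < min_prev:
        return 0.0
    # Here pi_lower < min_prev <= pi_upper, so the denominator is positive.
    return (pi_upper - min_prev) / (pi_upper - pi_lower)


# The same min_prev is applied to patterns of any size k.
for bounds in [(0.5, 0.7), (0.2, 0.6), (0.1, 0.3)]:
    print(bounds, prevalence_tendency(*bounds, min_prev=0.4))
```

With min_prev = 0.4, the first interval is scored as absolutely prevalent, the third as absolutely non-prevalent, and the second receives an intermediate score of about 0.5 under this assumed formula.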

 

- whilst mentioned in the text, the number of significant FCPs when using the equivalent type-1 fuzzy set was not specifically given. It would be good to add this and comment on whether there is a particular class of FCPs that is likely to be missed, e.g. I might assume worse sensitivity for large k size FCPs ...

Response:

We have added relevant comments in Section 6 (Discussion) to explain that using type-1 fuzzy sets may miss important fuzzy co-location patterns.

 

- big distance neighbor and small distance neighbor are defined identically in the text. please correct this to reflect that what is meant is that small distance neighbor is the immediate tree connection and big distance neighbor regards clique-clique connectivity.

Response:

We have corrected this error in Definitions 11 and 12. For an instance s, the small neighbor instance set SNs(s) is the set of all instances that are smaller than s and have a spatial neighbor relationship with s; the big neighbor instance set BNs(s) is the set of all instances that are bigger than s and have a spatial neighbor relationship with s.
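
The two definitions can be sketched as follows. This is a minimal illustration under stated assumptions: instances are ordered by comparing their identifiers, two instances are spatial neighbors when their Euclidean distance is at most `min_dist`, and the function name `neighbor_sets` is hypothetical. The paper may use a different ordering or neighbor relation.

```python
from math import dist


def neighbor_sets(instances, min_dist):
    """Build SNs(s) and BNs(s) for every instance s.

    instances: dict mapping an instance id (e.g. "A1") to its (x, y) location.
    Assumed ordering: ids are compared with Python's default ordering.
    Assumed neighbor relation: Euclidean distance <= min_dist.
    """
    small = {s: set() for s in instances}
    big = {s: set() for s in instances}
    ids = sorted(instances)
    for i, s in enumerate(ids):
        for t in ids[i + 1:]:  # every t here is bigger than s
            if dist(instances[s], instances[t]) <= min_dist:
                big[s].add(t)    # t is a bigger neighbor of s
                small[t].add(s)  # s is a smaller neighbor of t
    return small, big


points = {"A1": (0, 0), "B1": (3, 4), "C1": (100, 100)}
sns, bns = neighbor_sets(points, min_dist=5)
print(sns["B1"], bns["A1"], bns["C1"])
```

Each neighboring pair is recorded exactly once in each direction, so SNs and BNs are disjoint for any given instance and together cover all of its spatial neighbors.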

 

- there is some confusion between row-filter and join algorithm. Please clarify and use consistent terminology.

Response:

We have clarified the description of the row-filter and join algorithm and used consistent terminology in Section 4.2.1.

 

- it would be good to add an introduction on the difference between type-2 and type-1 fuzzy sets for readers, with graphical illustration, and explain why the elliptic curve and interval methods are necessary for feasible computation

Response:

In Section 1 (Introduction), we have added an introduction on the difference between type-2 and type-1 fuzzy sets (lines 53-58).

 

 

- lines 248-260 are an overly complex explanation. please explain more conceptually or leave out, as it is obvious from the mathematical definition of prevalence tendency degree.

 Response:

We have deleted lines 248-260, because we consider this explanation superfluous.

 

Minor comments

- please define star neighborhood, and mention this is non-complete vs spatial clique approach.

Response:

We have added the definition of star neighborhood in Section 4.2.2. (lines 762-763)

 

- give names to the pseudocode table for easier finding and reading.

Response:

We have given a name to each pseudocode table.

 

- please define the pruning rate in figures to aid interpretation

Response:

We have defined the pruning rate in the figures.

 

 

- please use the real functional area names i.e. industrial, agricultural, living area,  in figure 5.11, 5.12 to help understanding.

Response:

We have used the real functional area names in Figures 5.11 and 5.12.

 

- for section 5.2, the spatial proximity is given and prevalence threshold, what is the min_dist threshold for determining colocation of attributes?

Response:

The distance threshold min_dist is 50. We have corrected the mistake in this statement.

 

- document compilation error on line 395. the lines don't flow well relative to figure insertion.

Response:

We have corrected the document compilation error in line 395 so that it can be clearly displayed.

 

First of all, we are very grateful for your compliments on our work. Secondly, we also thank you for your valuable comments.

Reviewer 4 Report

Some comments must be clarified and improved in the revised version of the manuscript:

- Please explain in detail the applicability of the current research work to new locations, situations or environments.

- The novelty of the manuscript must be incorporated in the revised version.

- Which is the innovation of this research work?

- The objective of the research is not well-defined and does not appear in the current manuscript. Please, add and improve.

- I could not detect the limitations of the study. Please, improve.

- How were the inference rules of the systems defined?

- Where did the Authors incorporate the professional expert survey in the fuzzy logic model? It is not clear. Please, clarify.

- The Discussion section must be situated in previous research work related to the area. References are needed. Please improve.

Reconsideration, after major revision. Thank you!

Author Response

Reviewer #4:

Some comments must be clarified and improved in the revised version of the manuscript:

- Please explain in detail the applicability of the current research work to new locations, situations or environments.

Response:

In this paper, our main work is to design a method of mining fuzzy co-location patterns based on interval type-2 fuzzy sets. Following the steps of this method, we extend the traditional co-location pattern mining algorithm to mine fuzzy co-location patterns. Other co-location pattern mining algorithms can be extended in the same way. Our work therefore remains applicable to new locations, situations and environments.

 

- The novelty of the manuscript must be incorporated in the revised version.

Response:

We have added Section 2 (Innovation) to explain the work and novelty of our paper.

 

- Which is the innovation of this research work?

Response:

We describe the innovation of our work in Section 2 (Innovation). It mainly includes:

  1. We design a method to construct a special interval type-2 fuzzy membership function.
  2. We define the concepts of fuzzy membership interval, upper bound participation ratio, lower bound participation ratio of fuzzy features, upper bound participation index, and lower bound participation index for FCP mining. We propose the concepts of absolutely prevalent FCPs, FCPs with prevalence tendency degree, and absolutely non-prevalent FCPs. We propose the prevalence tendency degree to measure the prevalence degree of FCPs.
  3. We propose an FCP mining method based on interval type-2 fuzzy sets and cliques. In addition, we apply interval type-2 fuzzy sets to traditional co-location pattern mining algorithms, forming one FCP mining method based on interval type-2 fuzzy sets and the traditional Join-based algorithm, and another based on interval type-2 fuzzy sets and the traditional Joinless algorithm.

 

- The objective of the research is not well-defined and does not appear in the current manuscript. Please, add and improve.

Response:

We have added a clear statement of the objective of our paper in the last paragraph of Section 1 (Introduction).

 

- I could not detect the limitations of the study. Please, improve.

Response:

We have added a description of the limitations of our paper in Section 7 (Conclusions) (lines 1151-1154).

 

 

 

- How were the inference rules of the systems defined?

Response:

We follow the inference rules of the systems of the previous methods of mining co-location patterns. For the extended part, we verify it through reasoning and experiments to ensure that the method can mine fuzzy co-location patterns.

 

- Where did the Authors incorporate the professional expert survey in the fuzzy logic model? It is not clear. Please, clarify.

Response:

We incorporated the professional expert survey in the fuzzy logic model in Section 4.1.1, step (1), "Count the granular evaluation data". We collected the interval evaluation data of experts and presented them in the appendix. The number in each granule indicates the number of evaluators who consider that an interval value corresponds to the interval membership.

Based on this interval evaluation data, we construct an interval type-2 fuzzy membership function.

 

- The Discussion section must be situated in previous research work related to the area. References are needed. Please improve.

Response:

We have added previous research work related to the area and relevant references in the discussion section.

 

Reconsideration, after major revision. Thank you!

 

First of all, we are very grateful for your compliments on our work. Secondly, we also thank you for your valuable comments.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

No further comments

Author Response

  First of all, we are very grateful for your compliments on our work. Secondly, we also thank you for your valuable comments.

Author Response File: Author Response.docx

Reviewer 4 Report

I have no additional comments. English language and style are fine/minor spell check required.

Author Response

    First of all, we are very grateful for your compliments on our work. Secondly, we also thank you for your valuable comments.

    We have checked the spelling and English style of this paper and made the corresponding modifications.

Author Response File: Author Response.docx
