1. Introduction
Classification is a supervised learning method. It establishes a mapping from a feature space onto classes so that things can be classified and identified. Many classification problems are encountered in various fields, including graphic images [
1], medicine [
2], economics [
3], finance [
4], cyber security [
5], etc. In order to improve the performance and accuracy of classification, many classification methods have been proposed. Classification methods mainly focus on statistical analysis [
6,
7], neural networks [
8,
9,
10], rule reasoning [
11,
12], etc.
In the classification methods of statistical analysis theory, scholars have developed many classification techniques and models, such as decision trees [
13], random forests [
14], Bayesian models [
15], logistic regression models [
16], k-Nearest Neighbor classification [
17,
18], etc. These classical statistical classification methods are built on the basis of the Bayesian decision theory [
19]. The theoretical premise is that the prior probability should be known. However, the prior probability is often difficult to be truly obtained.
Many scholars turned to study classification technologies involving neural networks. Vapnik et al. proposed the support vector machine (SVM) [
20,
21,
22], which is a shallow neural network model. Based on the theory of structural risk minimization, the optimal hyper plane is constructed in the feature space, which makes the classifier globally optimized and achieves excellent classification performance in a small sample environment. SVM has been widely applied in face recognition [
23,
24], image segmentation [
25], pedestrian detection [
26], gene selection [
27,
28], and so on. In recent years, the shallow neural networks have developed into deep neural networks. Hinton et al. proposed the concept of deep learning in 2006 [
29,
30], which solved the optimization problems caused by deep structures to a certain extent. Deep learning has achieved a great success in computer vision [
31,
32,
33], speech recognition [
34], and natural language processing [
35]. However, with the number of deep neural network layers and the number of neurons in each layer increasing, the parameters of the inter-layer connections become very large, which result in being slow in the global convergence speed. The neural network is a black box and lacks theoretical explanations and guidance. Parameter tuning often relies on experience.
It is well known that fuzzy logic can deal with uncertainty and fuzziness [
36,
37]. The classification rules derived from fuzzy logic have good interpretability. The use of rules like “if-then” provides a good insight into the structure of the classifier, which makes the classification results have good interpretability. However, the formation of the rule base is inseparable from the domain knowledge of experts, and the speed of searching and matching in the rule base is slow, while the learning ability is weak.
The classifiers are divided into linear classifiers and nonlinear classifiers. The linear classifiers are established on the basis of the hyper planes. The nonlinear classifiers are built on neural networks. In this paper, we propose a new classifier model based on the set theory and the neighborhood information granulation. The concept of information granulation was introduced by Zadeh [
38], and the concept of granular computing was proposed by T.Y. Lin [
39,
40]. Yao studied the neighborhood systems and neighborhood granular computing [
41,
42,
43], the applications of which were developed by Hu in neighborhood reduction and classification [
44,
45,
46]. Morente-Molinera proposed a multigranular linguistic modeling with fuzzy entropy for supervised classification learning purposes [
47]. Pedrycz developed a certain category of granule classifiers referred to as hyper box-driven classifiers [
48,
49,
50]. We focus on the neighborhood granulation of classification systems and propose a granular structure, granular distance, and granular rule by a neighborhood granulation, so as to build a new granule classifier model. Since the granules are formed with sets, the granule classifier is a set-based classifier. Furthermore, we present the granular similarity for searching and matching in a granular rule base, design a granule classifier, and verify it experimentally. Theoretical analysis and experimental results show that the proposed classifier can achieve a better classification performance under a suitable granulating parameter.
The paper is structured as follows. First, in
Section 2, we introduce the neighborhood granulation of classification systems. Then, we propose a new neighborhood granule classifier in
Section 3. In
Section 4, we present experimental results. Conclusions and future works are covered in
Section 5.
2. Neighborhood Granulation of a Classification System
The rough set theory (RST) [
51] proposed by Pawlak, a mathematician of Poland, is a widely used model in classification systems. In a classification system, the data with real values need to be discretized, but the process of discretization easily leads to a loss of classification information. Aiming at the limitation of Pawlak RST, a neighborhood RST is introduced to granulate these real-value data. In RST, an equivalence class is regarded as a granule. As for neighborhood RST, neighborhood granules are constructed.
Suppose
is a classification system, where
is a set of samples or objects;
is a set of condition features or attributes;
represents a decision feature or attribute;
is a neighborhood parameter [
44]. The data of samples in the condition feature set
A are real values, and the data of samples in the decision feature
D are symbolic or discrete values.
Suppose
is a classification system. For any sample
and a feature subset
, where
, a distance function on the feature subset
P is defined as [
44]:
where
represents the value of the sample
x on a feature
. If
, it is named the Manhattan distance, and if
, it is named the Euclidean distance.
Suppose
is a classification system. For any sample
and a condition feature subset
, the
neighborhood granule of
x on
P is formed by:
which is called a process of neighborhood granulation.
The data on decision features in classification systems are symbolic values. The samples are divided into equivalent classes on the decision features and the neighborhood parameter
. Therefore, these samples are granulated into equivalent granules, which are called decision equivalent granules, and they are expressed as:
It is obvious that a neighborhood granule is degenerated into an equivalent granule when the neighborhood parameter
is zero. Therefore, the equivalent granule is a special neighborhood granule. In a classification system, for any sample
and a condition feature subset
, it determines a neighborhood relation, which is defined as:
The is called a covering of the domain U, which is induced by a neighborhood relation, and the covering is a collection of neighborhood granules. The neighborhood relation is a kind of similarity relation that satisfies reflexive and symmetric properties, rather than the equivalence relation. When a neighborhood parameter is equal to zero, the neighborhood relation is degenerated into an equivalence relation. Therefore, the equivalence relation is a special neighborhood relation. The rough sets based on neighborhood relations can handle the real-value data, while the rough sets based on equivalence relations can only deal with the symbolic data.
A classification system is
. For any sample
and a feature subset
, let
be a neighborhood granule, then the size of granule
is defined as:
where
represents the cardinality of a set. It is easy to know that the size of the neighborhood granule holds the property:
.
Example 1. A classification system is shown in Table 1. Suppose is the sample set, is the feature set, and the neighborhood granulation parameter . This example uses the Euclidean distance for neighborhood granulation. The domain is . We can granulate the samples on feature subset . These neighborhood granules are , , , and .
The sizes of condition granules are calculated as: , , .
According to the neighborhood granulation of feature subset , the condition neighborhood granules are: , , , .
The sizes of condition granules are calculated as: . It is easy to know that if , then . The decision equivalent granules are , , , .
3. The Model of Neighborhood Granule Classifiers
A granular rule is defined in this paper, which is constructed by a condition granule and a decision granule, being formed by a neighborhood granulation of a sample. The condition granule is a precursor of the granular rule, while the decision granule is a consequence of the granular rule. The granular rules of all the samples constitute a granular rule base. Then, two kinds of granular distances and granular similarity measures are defined so that the searching and matching of granules can be performed. A test sample is granulated into a condition granule as a test granule. The test granule is matched with the precursor of a granular rule in a granular rule base. The label of the most numerous classes in the matched granular rules is the predictive label of the test sample.
3.1. Neighborhood Granular Structure
Suppose
is a classification system. For
,
, there exists a neighborhood granule
on
B. The set of all neighborhood granules on
B is called the neighborhood granular swarm, which is defined as:
Suppose
is a classification system. For
, let
be a neighborhood granular swarm; the size of neighborhood granular swarm
is defined as:
It is easy to know that the size of neighborhood granular swarm holds the property: .
Suppose is a classification system. For , , there exists a neighborhood granule on B and a decision equivalent granule on D. A neighbor granular rule is defined as an ordered pair: . The neighborhood granular rule base is defined as: .
The samples of a classification system are granulated into neighborhood granules by neighborhood parameters, and the set of neighborhood granules constitutes a neighborhood granular swarm. Condition granules and decision granules form a granular rule, and the collection of granular rules constitutes a granular rule base. Therefore, the process of classification can be transformed into the process of reasoning, searching, and matching in the granular rule base.
3.2. Granular Distance between Neighborhood Granules
Suppose
is a classification system. For
and
, suppose
,
are two neighborhood granules, then the relative distance between two neighborhood granules is defined as:
Example 2. There are two neighborhood granules such as and , respectively. We can obtain their relative granular distance: .
Proposition 1. The relative distance between two neighborhood granules is a distance metric, which satisfies the following three properties:
- (1)
, non-negativity;
- (2)
, symmetry;
- (3)
, triangle inequality.
Proof. The two properties of (1) and (2) are easily proven by the definition of the relative distance of neighborhood granules. The following proves the property (3).
Given any , let , , then . According to the definition of the relative distance between neighborhood granules, we can know = = ≥ = = . Therefore, . That is . ☐
Proposition 2. The granular relative distance of any two neighborhood granules satisfies: .
Proof. According to the definition of the relative distance of neighborhood granules, it is easy to prove.
Suppose
is a classification system. For
and
, suppose
,
are two neighborhood granules, then the absolute distance between two neighborhood granules is defined as:
☐
Example 3. There are two neighborhood granules such as and , respectively. The sample domain is . We can obtain their absolute granular distance: .
Proposition 3. The absolute distance between two neighborhood granules satisfies: .
Proof. According to the definition of the absolute distance of neighborhood granules, it is easy to prove.
These granular distances of neighborhood granules can be used as the similarity measures of granules, which indicate the degree of similarity of neighborhood granules. The smaller the granular distance is, the greater the similarity is. On the contrary, the larger the granular distance is, the smaller the similarity is. ☐
3.3. Neighborhood Granule Classifiers
Suppose is a classification system. For and , suppose , are two neighborhood granules, then the similarity measures between two neighborhood granules are defined as:
Relative similarity measure: .
Absolute similarity measure: .
Suppose is a classification system. is a training sample set, and is a test sample set. For , is a granular rule, and is a granular rule base. For , is a test neighborhood granule, which is granulated by the training sample set. Suppose is a similarity parameter, , if or , then a granular rule is matched from the granular rule base, and the decision granule of the granular rule is merged into a decision granule set. This matching process is a granular search and is expressed as follows:
Using the relative similarity measure: .
Using the absolute similarity measure: .
Suppose is a classification system. is a training sample set, and is a test sample set. For , is the decision granule set of the test sample y, then the label of the test sample y is the most numerous granule label in .
4. Design of a Neighborhood Granule Classifier
A neighborhood granule is a set. The neighborhood granule classifier is a classifier with set operations. The classifier is divided into the granulation process and the classification process. In this section, we discuss the principle of the neighborhood granule classifier and give the algorithm of the neighborhood granule classification.
4.1. Granulation and Classification of Neighborhood Granules
The neighborhood granule classifier has only the granulation and the classification steps, and it has no training process. The granulation process includes: data preprocessing, partitioning of a training set and a test set, the training set granulated into a granular rule base, and the test set granulated into a set of test granules. The classification process includes: searching and matching in the granular rule base for test granules and determining the labels of test granules. The detailed granulation and classification processes are presented as follows:
Data preprocessing: deleting the data with missing values and normalizing the data with the interval .
Divided with a training set and a test set: is the training set, and is the test set.
The neighborhood granulation of the training set: a granular rule base is constructed according to the Euclidean distance and neighborhood parameters.
The neighborhood granulation of the test set: for a test sample, computing the Euclidean distance between the test sample and each training sample to form a test granule, and all the test samples are granulated into the test granular set.
Searching and matching for the test granule: setting the value of a similarity parameter, for the test granule, computing the similarity between the test granule and the condition granule of a rule in the granular rule base, getting all the granular rules whose similarities are larger than the similarity threshold, and collecting the decision granules of matched granular rules to form a set of decision granules. Note:the elements of a decision granular set may be identical.
Judging the label of the test granule: The label of the test granule is the most numerous granule label in the decision granule set.
From the above analysis, it can be known that the neighborhood granule classifier is similar to the KNN classifier, and it is a lazy learning algorithm.
4.2. The Algorithm for Neighborhood Granule Classification
From the analysis of a structure of neighborhood granules, we propose the distance and similarity measures for neighborhood granules. Therefore, we can design a neighborhood granule classifier. The detailed neighborhood granule classification algorithm is described as follows:
Algorithm 1: Neighborhood granule classification (NGC). |
Input: A training set , a test sample t, a neighborhood parameter , and a similarity parameter . Output: The of a test sample t. (1) Normalization: , . (2) For each training sample (3) Compute a neighborhood granule by the neighborhood parameter ; (4) Achieve an equivalence granule ; (5) Form a granular rule , and insert it into a granular rule base L; (6) End for. (7) For the test sample t, granulate it as according to the training set. (8) For each granular rule (9) Compute the granular distance ; (10) Compute the granular similarity degree ; (11) If , then achieve the granular rule , and insert the decision granule into a target base T; (12) End for. (13) The of the test sample t is the label of a granule from T, and its quantity is the maximum. (14) Return .
|
In the NGC algorithm, the neighborhood granulation process is mainly involved. In Step 3, the time complexity of neighborhood granulation is , while the number of features is m, and that of training samples is n. Therefore, the time complexity of neighborhood granulation for training samples is in Steps 2–6. Step 7 is the neighborhood granulation of the test set, and its time complexity is , while t is the number of test samples and . Steps 8–14 have linear complexity. Therefore, in the worst case, the time complexity of the NGC algorithm is .
5. Experimental Analyses
Due to the different range of values of the datasets, the datasets needed to be normalized and preprocessed. Since values of neighborhood parameter varied between zero and one, we use the max-min normalization method to ensure that all data could be converted into values between [0, 1]. The formula of max-min normalization is the following:
All the classification experiments were implemented by a five-fold cross-validation on the four datasets. We split the dataset into five equally-sized subsets. Of the five subsets, a single subset was retained as the testing data, and the remaining four subsets were used as training data. The cross-validation process was then repeated five times, with each of the five subsets used exactly once as the testing data. The classification accuracy was the mean of five tests. In order to test the classification accuracy of our proposed granule classifier, the values of neighborhood granulation parameters were varied from 0.55–0.95 with an interval of 0.05, which involved a total of nine times for testing. The neighborhood granule classifier needed to set thresholds of granular similarity measure, which started from 0.55–0.95 by intervals of 0.05 with a total of nine changes. The granular similarity measures in these experiments used the absolute similarity measure. The experimental results are shown in
Figure 1,
Figure 2,
Figure 3 and
Figure 4.
As can be seen from
Figure 1, for the glass dataset, when the neighborhood parameter
was 0.95 and the similarity threshold
was 0.95, the classification accuracy reached a maximum of 0.6038. The next values of classification accuracy in maximum were mainly distributed in the corners where granular similarity thresholds and neighborhood granulation parameters were both large. The smallest values of classification accuracy were mainly distributed on the plane with smaller granular similarity thresholds and larger neighborhood granular parameters.
As can be seen from
Figure 2, for the pima dataset, when the neighborhood parameter
was 0.55 and the similarity threshold
was 0.85, the classification accuracy reached a maximum of 0.75. The second largest values of the classification accuracy were basically distributed on the whole plane. The minimum value of classification accuracy was 0.3906, and it was distributed at the point where
was (0.60, 0.95).
As can be seen from
Figure 3, for the segmentation dataset, when the
was (0.55, 0.95), the classification accuracy reached a maximum of 0.8095. The second largest values were mainly distributed on the band plane with larger similarity thresholds. The minimum values of the classification accuracy were mainly distributed on the plane where similarity thresholds and neighborhood parameters were small at the same time, and the sub-minimum values of the classification accuracy were mainly distributed on the band plane where the similarity threshold was small.
As can be seen from
Figure 4, for the wine dataset, there were 11 maximum values of classification accuracy which were equal to 0.9773, mainly distributed on the plane between the neighborhood parameters ranging from 0.70–0.90 and the similarity thresholds ranging from 0.6–0.85. The minimum values of classification accuracy were mainly distributed in corners with neighborhood parameters being smaller and similarity thresholds being smaller at the same time.
From the above experiments, we can see that, when the dataset had a large number of categories, such as glass and segmentation, the classification accuracy was mainly related to the granular similarity threshold. The similarity threshold was larger, and the classification effect was better. When the dataset had a large number of features, such as segmentation and wine, the classification accuracy was mainly related to the neighborhood granulation parameter, and the smaller the neighborhood granulation parameter, the worse the classification effect.
In order to compare our algorithms with other classifiers, we ran some experiments on only four datasets. The classification algorithm uses traditional K-Nearest Neighbor (KNN) classifier, and our proposed Neighborhood Granule Classifier was based on Relative (NGCR) distance and the Neighborhood Granule Classifier on Absolute (NGCA) distance. In
Table 3, the compared results are shown with a fivefold cross-validation testing for the wine dataset. The test average results of the four UCI datasets are illustrated in
Figure 5,
Figure 6,
Figure 7 and
Figure 8.
As can be seen from
Figure 5, for the glass dataset, the classification accuracy of KNN was 0.6415, the maximum classification accuracy of NGCR 0.6981, and the maximum classification accuracy of NGCA 0.6038. For the NGCR classifier, the classification accuracy was higher than KNN when the value of neighborhood parameter was 0.9. The NGCR was better than NGCA in any neighborhood parameter value.
As can be seen from
Figure 6, for the pima dataset, the classification accuracy of KNN was 0.7448. The classification accuracy of NGCR reached a maximum value of 0.7656, when the value of the neighborhood parameter was 0.6; and the classification accuracy of NGCA reached a maximum of 0.75, when the value of the neighborhood parameter was 0.55. Therefore, the NGCR and NGCA granule classifiers were better than the traditional KNN classifier under a suitable neighborhood parameter value.
For the segmentation dataset in
Figure 7, the classification accuracy of KNN was 0.7857; while the value of the neighborhood parameter was 0.55; the classification accuracy of NGCR reached a maximum value of 0.8095; and that result was the same as NGCA. It can be seen that NGCR and NGCA were better than the traditional KNN under a suitable neighborhood parameter value. The performance of NGCR was better than that of NGCA.
As for the wine dataset in
Figure 8, the classification accuracy of KNN was 0.9545; the classification accuracy of NGCR reached the maximum value of 0.9773, while the neighborhood parameter values were 0.75, 0.90 and 0.95; and while the neighborhood parameter values were 0.75–0.9, the classification accuracy of NGCA reached the maximum value of 0.9773. It can be seen that NGCR and NGCA were better than the traditional KNN under suitable neighborhood parameter values.