Article

A Novel Classification Algorithm Based on Multidimensional F1 Fuzzy Transform and PCA Feature Extraction

by Barbara Cardone 1 and Ferdinando Di Martino 1,2,*
1 Dipartimento di Architettura, Università degli Studi di Napoli Federico II, Via Toledo 402, 80134 Napoli, Italy
2 Centro Interdipartimentale di Ricerca A. Calza Bini, Università degli Studi di Napoli Federico II, Via Toledo 402, 80134 Napoli, Italy
* Author to whom correspondence should be addressed.
Submission received: 26 January 2023 / Revised: 21 February 2023 / Accepted: 22 February 2023 / Published: 23 February 2023
(This article belongs to the Special Issue Machine Learning and Deep Learning in Pattern Recognition)

Abstract: The bi-dimensional F1-transform has been applied in image analysis to improve the performance of the F-transform method; however, due to its high computational complexity, the multidimensional F1-transform cannot be used directly in data analysis problems, especially in the presence of a large number of features. In this research, we propose a new classification method based on the multidimensional F1-transform in which the Principal Component Analysis technique is applied to reduce the dataset size. We test our method on various well-known classification datasets, showing that it improves the performance of the F-transform classification method and of other well-known classification algorithms; furthermore, the execution times of the F1-transform classification method are similar to those obtained executing the F-transform and other classification algorithms.

1. Introduction

The Fuzzy Transform technique (F-transform, for short) [1] is a fuzzy regression method introduced to approximate a continuous function of k variables f(x1, x2, …, xk) defined on the domain [a1,b1] × [a2,b2] ×…× [ak,bk] ⊂ Rk. If this function is known at N points pj = (pj1, pj2,…, pjk), j = 1,…,N, it can be approximated by a weighted average whose weights are constants given by the components of the direct F-transform.
The F-transform has been applied to many image and data analysis problems. An extensive description of the F-transform-based techniques used in image and data analysis is given in [2].
In [3] a generalization of the F-transform, called the high-degree F-transform and denoted Fs-transform (s ≥ 0), was proposed; the original F-transform corresponds to the zero-degree F0-transform. In the Fs-transform, with s > 0, the constant components of the direct F-transform are replaced by s-degree polynomial components, with the aim of capturing more information about the original function. The greater the polynomial degree, the finer the approximation of the original function; however, as the polynomial degree increases, the computational complexity of any algorithm that uses the Fs-transform increases considerably. In addition, meeting the constraint of sufficient data density with respect to the fuzzy partitions requires additional memory resources and CPU time [2,4].
Some researchers apply the F1-transform in image analysis, where only two input variables are used and the sufficient data density constraint is always met. In [5,6] an algorithm based on the F1-transform in two variables was applied to an edge detection problem in image processing. In [7] a lossy image compression method based on the F1-transform in two variables is proposed; the authors show that this method improves the decoded image quality obtained using F-transform image compression [8]. In [9] a hybrid deep neural network is proposed in which the Fs-transform is used, in a preprocessing phase, to construct the convolution kernels of the first two layers of the network.
Generally, when the multidimensional F-transform is applied to classification, regression, and prediction problems, the computational complexity grows rapidly with the dimensionality of the data; for this reason, applying high-order F-transforms to data regression or classification has a high computational cost, in terms of memory and time consumption, especially in the presence of massive multidimensional data [4]. An application of the first-order one-dimensional fuzzy transform was proposed in [10], where the one-dimensional F1-transform is applied to seasonal time series weather datasets in a seasonal forecasting model. Comparison tests show that this model improves the forecasting performance obtained by applying the F0-transform model proposed in [11,12]. In [13] the F1-transform is applied to remove seasonal components and noise from time series.
Recently, in [14] a new classification algorithm based on the multidimensional F0-transform for massive datasets, called Multi-dimensional F-transform Classification (MFC, for short), was proposed. In MFC the K-fold cross-validation technique is applied to avoid overfitting the data; the multidimensional direct F0-transform is applied to each fold, and a weighted mean of the multidimensional inverse F0-transforms calculated from the direct F0-transform components obtained in each fold is used to classify data points. Comparisons with well-known classification algorithms showed that MFC has better classification performance than Naive Bayes [15] and Lazy Bk [16] and is comparable with Decision Tree J48 [17] and Multilayer Perceptron [18]. On the other hand, the main drawback of this method is the high computational complexity reached when the number of features increases.
In [19] a hybrid fast classification method based on the F-transform and Principal Component Analysis (PCA, for short) [20,21,22,23] is proposed. This method is tested on image classification; the results show that it improves the success rate and computation time obtained by applying the F-transform algorithm.
PCA is a well-known dimensionality reduction multivariate statistics technique whose goal is to reduce the number of features of a dataset by losing the least amount of information possible. PCA is one of the most used feature extraction techniques in data analysis. Its strong point is to be able to reduce the dimensionality of the data, while preserving their information content.
In this research, we propose a hybrid classification method applied to massive data that combines the PCA and multidimensional F1-transform techniques. The main goals of the proposed method are:
- To improve the MFC classification performances: the application of the multidimensional direct and inverse F1-transform increases the accuracy and precision of the classifier with respect to the multidimensional F-transform.
- To significantly reduce time and memory consumption by executing the PCA feature extraction algorithm in the preprocessing phase, reducing the dimensionality of the data.
In Section 2, the concepts of the multidimensional direct and inverse F1-transform are introduced and the PCA algorithm is summarized. Section 3 focuses on the architecture and functional characteristics of our classification method based on the multidimensional F1-transform. The results of the experimental tests are discussed in Section 4. Finally, Section 5 presents the conclusions and future perspectives.

2. Preliminaries

2.1. Principal Component Analysis

Data dimensionality reduction techniques are divided into feature selection and feature extraction techniques. Feature selection techniques, such as random forest or grid search algorithms, select a subset of the original features in order to reduce the complexity and improve the computational efficiency of the model. Conversely, feature extraction techniques extract information from the original feature set and create a new feature subspace. While feature selection techniques aim to select the most significant features, discarding the less significant ones from the set of original features, feature extraction techniques construct a new reduced set of features, starting from the existing ones, able to synthesize most of the information contained in the original set of features.
The use of feature selection techniques is preferable when the explainability of the model and the semantic meaning of the features are required; feature extraction techniques are used to reduce the model complexity, improving its predictive performance.
As mentioned in the Introduction, PCA is one of the most widely used feature extraction techniques in data analysis: it reduces the dimensionality of the data while preserving their information content.
Let D be the original dataset with s features X1, …, Xs and N instances. The ith instance is characterized by a vector (xi1, xi2,…, xis)T, where xij is the value of the ith instance for the jth feature.
Let mj be the mean value of the jth feature, given by:

$$m_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad j = 1,\dots,s \tag{1}$$

and stj the standard deviation, given by:

$$st_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{ij} - m_j\right)^2}, \qquad j = 1,\dots,s \tag{2}$$
The PCA procedure can be broken down into eight steps (a code sketch follows the list):
1. The aim of this phase is to standardize the range of the initial variables so that each of them contributes equally to the analysis. For each value xij, the normalized value is computed as:

$$z_{ij} = \frac{x_{ij} - m_j}{st_j}, \qquad i = 1,\dots,N, \quad j = 1,\dots,s \tag{3}$$
2. The relationships among features are analyzed by computing the symmetric covariance matrix C = ZTZ/(N − 1), where ZT is the transpose of the normalized matrix Z. The components of C are given by:

$$C_{jk} = \frac{1}{N-1}\sum_{i=1}^{N} z_{ij}\, z_{ik}, \qquad j, k = 1,\dots,s \tag{4}$$
3. In this phase, the s eigenvalues and the s eigenvectors of the covariance matrix are extracted. The eigendecomposition factorizes C as C = VDV−1, where V is the matrix of eigenvectors and D is a diagonal matrix whose diagonal components are the eigenvalues λi, i = 1,…,s, and whose other elements are equal to 0.
4. The eigenvalues on the diagonal of D are associated with the corresponding columns of V; that is, the first element of D is λ1 and the corresponding eigenvector is the first column of V. This holds for all elements of D and their corresponding eigenvectors in V, so C can always be factorized as VDV−1 in this fashion.
5. In this phase the s eigenvalues are sorted in descending order; the corresponding eigenvectors in the matrix V are ordered in the same way, obtaining the matrix V′, whose columns are the ordered eigenvectors.
6. The normalized data matrix Z is transformed into the matrix of the principal components Z′ by multiplying Z by the ordered matrix of eigenvectors V′: Z′ = ZV′.
7. The significant principal components are selected by analyzing the eigenvalues, sorted in descending order. Three heuristic criteria are generally used to choose the number of components:
- Select only the principal components corresponding to the eigenvalues whose sum, compared to the sum of all the eigenvalues, is greater than or equal to a specific threshold, for example, 80% or 90%.
- Adopt the Kaiser criterion, in which only the components corresponding to an eigenvalue greater than or equal to 1 are selected or, equivalently, the components whose variance is greater than the average.
- Build the eigenvalue graph, called the scree plot, and select the number of components corresponding to the elbow point beyond which the curve of the eigenvalues flattens.
8. The reduced dataset is constructed considering only the significant principal components.
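To make the eight steps concrete, the following minimal sketch implements them with numpy; the function name pca_reduce and the 90% variance threshold are illustrative choices, not part of the method described above.

```python
# A minimal sketch of the eight PCA steps above, using numpy only.
# All names are illustrative; X is assumed to be an (N, s) data matrix.
import numpy as np

def pca_reduce(X, variance_threshold=0.9):
    # Step 1: standardize each feature to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the standardized data
    C = (Z.T @ Z) / (len(X) - 1)
    # Steps 3-5: eigendecomposition, then sort eigenpairs in descending order
    eigvals, eigvecs = np.linalg.eigh(C)          # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], eigvecs[:, order]
    # Step 7: keep the components explaining at least `variance_threshold`
    # of the total variance (the first heuristic criterion above)
    k = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), variance_threshold) + 1
    # Steps 6 and 8: project onto the retained principal components
    return Z @ V[:, :k], eigvals                  # reduced data, scree values

# Example: reduce a random 200 x 11 dataset
X = np.random.rand(200, 11)
Z_reduced, scree = pca_reduce(X)
print(Z_reduced.shape, scree[:3])
```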

2.2. Multidimensional F-Transform

Let f: X ⊆ Rn → Y ⊆ R be a continuous n-dimensional function defined on a closed interval X = [a1,b1] × [a2,b2] ×…× [an,bn] ⊆ Rn and known on a discrete set of N points P = {(p11, p12, …, p1n), (p21, p22, …, p2n),…, (pN1, pN2, …,pNn)}.
For each i = 1,…,n, let xi1, xi2, …, ximi, with mi ≥ 2, be a set of mi points of [ai,bi], called nodes, such that xi1 = ai < xi2 <…< ximi = bi.
For each i = 1,…,n let Ai1, Ai2,…, Aimi: [ai, bi] → [0,1] be a family of fuzzy sets forming a fuzzy partition of [ai,bi], where:
  • Aih(xih) = 1 for every h = 1, 2,…, mi;
  • Aih(x) = 0 if x ∉ (xi,h−1, xi,h+1), where we assume xi0 = xi1 = ai and xi,mi+1 = xi,mi = bi for convenience of presentation;
  • Aih(x) strictly increases on [xi,h−1, xih] for h = 2,…, mi and strictly decreases on [xih, xi,h+1] for h = 1,…, mi − 1;
  • $\sum_{h=1}^{m_i} A_{ih}(x) = 1$ for every x ∊ [ai, bi].
The fuzzy sets Ai1, Ai2,…, Aimi are called basic functions.
Let ci = (bi − ai)/(mi − 1). The basic functions Ai1, Ai2,…, Aimi form a uniform fuzzy partition of [ai,bi] if:
1. mi ≥ 3 and the nodes are equidistant, i.e., xih = ai + di ∙ (h − 1), where di = (bi − ai)/(mi − 1) and h = 1, 2, …, mi;
2. Aih(xih − x) = Aih(xih + x) ∀ x ∊ [0, di] and ∀ h = 2,…, mi − 1;
3. Ai,h+1(x) = Aih(x − di) ∀ x ∊ [xih, xi,h+1] and ∀ h = 1, 2,…, mi − 1.
We say that the set P = {(p11, p12, …, p1n), (p21, p22, …, p2n),…,(pN1, pN2, …,pNn)} is sufficiently dense with respect to the set of fuzzy partitions {A11,…,A1m1}, …, {Ai1,…,Aimi}, …, {An1,…,Anmn} if for each combination of basic functions A1h1, A2h2,…, Anhn there exists at least one point pj = (pj1, pj2,…, pjn) ∊ P such that A1h1(pj1)·A2h2(pj2)···Anhn(pjn) > 0. In this case, we can define the direct multidimensional F-transform of f, with the (h1,h2,…,hn)th component Fh1h2…hn given by:

$$F_{h_1 h_2 \dots h_n} = \frac{\sum_{j=1}^{N} f(p_{j1}, p_{j2}, \dots, p_{jn})\, A_{1h_1}(p_{j1})\, A_{2h_2}(p_{j2}) \cdots A_{nh_n}(p_{jn})}{\sum_{j=1}^{N} A_{1h_1}(p_{j1})\, A_{2h_2}(p_{j2}) \cdots A_{nh_n}(p_{jn})} \tag{5}$$
The multidimensional inverse F-transform, calculated at the point pj, is given by:

$$f^{F}_{m_1 m_2 \dots m_n}(p_{j1}, p_{j2}, \dots, p_{jn}) = \sum_{h_1=1}^{m_1} \sum_{h_2=1}^{m_2} \cdots \sum_{h_n=1}^{m_n} F_{h_1 h_2 \dots h_n}\, A_{1h_1}(p_{j1}) \cdots A_{nh_n}(p_{jn}) \tag{6}$$

It approximates the function f at the point pj. In [11,12] the multidimensional inverse F-transform (6) is applied in regression analysis to find dependencies between attributes in datasets.
To illustrate the use of the multidimensional F-transform, consider, as an example, a dataset given by two input features defined on the closed intervals [1.1, 4.9] and [0.1, 1.0], respectively.
Suppose we create for each of the two input variables a fuzzy partition consisting of three basic functions, setting m1 = 3 and m2 = 3. We obtain c1 = 1.9 and c2 = 0.45.
Table 1 shows the values of the three nodes for the two input variables.
Figure 1 shows the points in the input variable plane. The four rectangles are drawn to show that the dataset is sufficiently dense with respect to the two fuzzy partitions {A11, A12, A13} and {A21, A22, A23}. In fact, each rectangle in the figure contains at least one point; this implies that for every combination of basic functions A1h, A2k, h, k = 1,2,3, there exists at least one point pj = (pj1, pj2) such that A1h(pj1)·A2k(pj2) ≠ 0.
Since the data are sufficiently dense with respect to the sets of fuzzy partitions, it is possible to apply Equation (5) to calculate the components of the multidimensional direct F-transform Fh1h2 h1, h2 = 1,2,3. Finally, Equation (6) can be applied to calculate the multidimensional inverse F-transform in a point p; it approximates the function f in that point.
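As a concrete illustration, the following sketch computes Equations (5) and (6) for a two-feature dataset like the example above, assuming uniform triangular basic functions; the data points and function values are synthetic, and all names are illustrative.

```python
# Sketch of the direct (5) and inverse (6) F-transform for two input
# features, assuming uniform triangular basic functions (illustrative code).
import numpy as np

def uniform_partition(a, b, m):
    """Nodes and membership function of a uniform fuzzy partition of [a, b]."""
    nodes = np.linspace(a, b, m)
    d = (b - a) / (m - 1)
    return nodes, lambda h, x: np.maximum(0.0, 1.0 - np.abs(x - nodes[h]) / d)

# Synthetic dataset: N points (p1, p2) with known values f(p1, p2)
rng = np.random.default_rng(0)
P = rng.uniform([1.1, 0.1], [4.9, 1.0], size=(50, 2))
f = P[:, 0] + np.sin(P[:, 1])

nodes1, A1 = uniform_partition(1.1, 4.9, 3)   # m1 = 3 basic functions
nodes2, A2 = uniform_partition(0.1, 1.0, 3)   # m2 = 3 basic functions

# Direct F-transform components F[h1, h2], Equation (5)
F = np.zeros((3, 3))
for h1 in range(3):
    for h2 in range(3):
        w = A1(h1, P[:, 0]) * A2(h2, P[:, 1])
        F[h1, h2] = (f * w).sum() / w.sum()   # density guarantees w.sum() > 0

# Inverse F-transform at a point p, Equation (6): approximates f(p)
p = (2.0, 0.5)
approx = sum(F[h1, h2] * A1(h1, p[0]) * A2(h2, p[1])
             for h1 in range(3) for h2 in range(3))
print(approx, 2.0 + np.sin(0.5))
```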

2.3. High Degree F-Transform

This subsection introduces the concept of the higher-degree fuzzy transform, or Fr-transform. One-dimensional square-integrable functions are considered first.
Let Ah, h = 1,…,n, be the hth basic function defined on [a,b] and L2([xh−1,xh+1]) be the Hilbert space of square-integrable functions f,g: [xh−1,xh+1] → R with the weighted inner product:

$$\langle f, g \rangle_h = \frac{\int_{x_{h-1}}^{x_{h+1}} f(x)\, g(x)\, A_h(x)\, dx}{\int_{x_{h-1}}^{x_{h+1}} A_h(x)\, dx} \tag{7}$$
Let Lr2([xh−1,xh+1]), with r a positive integer, be a linear subspace of the Hilbert space L2([xh−1,xh+1]) with orthogonal basis given by the polynomials {P0h, P1h,…, Prh} obtained by applying Gram–Schmidt orthogonalization to the linearly independent system of polynomials {1, x, x2,…, xr} defined on the interval [xh−1,xh+1]. We have:

$$P_h^0 = 1, \qquad P_h^{s+1} = x^{s+1} - \sum_{j=0}^{s} \frac{\langle x^{s+1}, P_h^j \rangle_h}{\langle P_h^j, P_h^j \rangle_h}\, P_h^j, \qquad s = 0,\dots,r-1 \tag{8}$$
The following lemma holds (cf. Perfilieva et al., 2011 [3], Lemma 1):
Lemma 1.
Let $F_h^r$ be the orthogonal projection of the function f on Lr2([xh−1,xh+1]). Then:

$$F_h^r(x) = \sum_{s=0}^{r} c_{h,s}\, P_h^s(x) \tag{9}$$

where

$$c_{h,s} = \frac{\langle f, P_h^s \rangle_h}{\langle P_h^s, P_h^s \rangle_h} = \frac{\int_{x_{h-1}}^{x_{h+1}} f(x)\, P_h^s(x)\, A_h(x)\, dx}{\int_{x_{h-1}}^{x_{h+1}} \left(P_h^s(x)\right)^2 A_h(x)\, dx} \tag{10}$$

$F_h^r$ is the hth component of the direct Fr-transform of f: Fr[f] = (F1r, F2r,…, Fnr).
The inverse Fr-transform of f at a point x ∊ [a,b] is:

$$f^{r}_{F,n}(x) = \sum_{h=1}^{n} F_h^r(x)\, A_h(x) \tag{11}$$
For r = 0, we have P0h = 1 and the F0-transform coincides with the F-transform in one variable (F0h(x) = ch,0).
For r = 1, we have P1h = (x − xh) and the hth component of the F1-transform is given by:

$$F_h^1(x) = c_{h,0} + c_{h,1}\,(x - x_h) = F_h^0(x) + c_{h,1}\,(x - x_h) \tag{12}$$
If the function f is known on a set of N data points p1,…,pN, the coefficients ch,0 and ch,1 can be discretized in the form:

$$c_{h,0} = \frac{\sum_{i=1}^{N} f(p_i)\, A_h(p_i)}{\sum_{i=1}^{N} A_h(p_i)} \tag{13}$$

$$c_{h,1} = \frac{\sum_{i=1}^{N} f(p_i)\,(p_i - x_h)\, A_h(p_i)}{\sum_{i=1}^{N} A_h(p_i)\,(p_i - x_h)^2} \tag{14}$$
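The discrete coefficients (13) and (14) are cheap to compute; the following one-dimensional sketch (illustrative names, a triangular basic function of width d centred at the node x_h) shows how they yield the local linear model (12):

```python
# A sketch of the discrete F1-transform coefficients (13) and (14) for
# one triangular basic function A_h of width d centred at node x_h.
import numpy as np

def f1_component(p, fp, x_h, d):
    """Return c_h0 and c_h1 for data points p with known values fp."""
    A = np.maximum(0.0, 1.0 - np.abs(p - x_h) / d)   # membership weights
    c0 = (fp * A).sum() / A.sum()                    # Equation (13)
    c1 = (fp * (p - x_h) * A).sum() / (A * (p - x_h) ** 2).sum()  # Eq. (14)
    return c0, c1

p = np.linspace(0.0, 2.0, 41)
fp = p ** 2
c0, c1 = f1_component(p, fp, x_h=1.0, d=1.0)
# F1_h(x) = c0 + c1 * (x - x_h), Equation (12): a local linear model near x_h
print(c0, c1)   # c1 recovers the slope of f near x_h (exactly 2.0 here)
```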
Likewise, let L2([xh1−1, xh1+1] × [xh2−1, xh2+1] ×…× [xhn−1, xhn+1]) be the Hilbert space of square-integrable n-variable functions f: [xh1−1, xh1+1] ×…× [xhn−1, xhn+1] → R with the weighted inner product:

$$\langle f, g \rangle_{h_1 h_2 \dots h_n} = \int_{x_{h_1-1}}^{x_{h_1+1}} \int_{x_{h_2-1}}^{x_{h_2+1}} \cdots \int_{x_{h_n-1}}^{x_{h_n+1}} f(x_1, \dots, x_n)\, g(x_1, \dots, x_n)\, A_{h_1}(x_1)\, A_{h_2}(x_2) \cdots A_{h_n}(x_n)\, dx_1 \cdots dx_n \tag{15}$$

Two functions f, g ∊ L2([xh1−1, xh1+1] ×…× [xhn−1, xhn+1]) are orthogonal if ⟨f, g⟩h1h2…hn = 0.
Let f: X ⊆ Rn → Y ⊆ R be a continuous n-dimensional function defined on the closed set [a1,b1] × [a2,b2] ×…× [an,bn]. For each i = 1,…,n, let Aih, h = 1,…,mi, be the hth basic function defined on the interval [ai,bi], and let L2([xi,h−1, xi,h+1]) be the corresponding Hilbert space of square-integrable functions on [xi,h−1, xi,h+1].
The inverse F1-transform of f at a point x = (x1, x2,…, xn) ∊ [a1,b1] × [a2,b2] ×…× [an,bn] is:

$$f^{1}_{F,n}(x) = \sum_{h_1=1}^{m_1} \sum_{h_2=1}^{m_2} \cdots \sum_{h_n=1}^{m_n} F^{1}_{h_1 h_2 \dots h_n}(x)\, A_{1h_1}(x_1) \cdots A_{nh_n}(x_n) \tag{16}$$

where $F^{1}_{h_1 h_2 \dots h_n}(x)$ is the (h1, h2,…, hn)th component of the direct F1-transform, given by:

$$F^{1}_{h_1 h_2 \dots h_n}(x) = c^{0}_{h_1 h_2 \dots h_n} + \sum_{s=1}^{n} c^{1s}_{h_1 h_2 \dots h_n}\,(x_s - x_{h_s}) \tag{17}$$
If f is known on a set of N n-dimensional data points p1,…,pN, where pi = (pi1, pi2,…, pin), we obtain:

$$c^{0}_{h_1 h_2 \dots h_n} = F_{h_1 h_2 \dots h_n} = \frac{\sum_{j=1}^{N} f(p_{j1}, \dots, p_{jn})\, A_{1h_1}(p_{j1})\, A_{2h_2}(p_{j2}) \cdots A_{nh_n}(p_{jn})}{\sum_{j=1}^{N} A_{1h_1}(p_{j1})\, A_{2h_2}(p_{j2}) \cdots A_{nh_n}(p_{jn})} \tag{18}$$

$$c^{1s}_{h_1 h_2 \dots h_n} = \frac{\sum_{j=1}^{N} f(p_{j1}, \dots, p_{jn})\,(p_{js} - x_{h_s})\, A_{1h_1}(p_{j1}) \cdots A_{nh_n}(p_{jn})}{\sum_{j=1}^{N} (p_{js} - x_{h_s})^2\, A_{1h_1}(p_{j1}) \cdots A_{nh_n}(p_{jn})} \tag{19}$$

where $c^{0}_{h_1 h_2 \dots h_n}$ is the component $F_{h_1 h_2 \dots h_n}$ of the multidimensional discrete direct F-transform of f, given by (5).

3. The F1-Transform Classification Method

The proposed classification method executes the multidimensional F1-transform to classify data points. The method is schematized in Figure 2.
In a preprocessing phase, PCA is executed in order to reduce the number of features. The scree plot method is applied to select the principal components: the eigenvalues are sorted in descending order to create the scree plot, the elbow point is set, and the components corresponding to the eigenvalues below the elbow point on the y-axis are discarded. The output of the preprocessing phase is the new reduced dataset in the transformed coordinates.
The F1-transform classifier follows the strategy adopted in the MFC algorithm [14]. Initially, the classification accuracy threshold θ is set and coarse-grained fuzzy partitions are created (n = 3 basic functions per feature). After creating the fuzzy partitions of the domain of each feature, the algorithm checks that the data are sufficiently dense with respect to the fuzzy partitions. If they are, the direct and inverse F1-transforms are computed; otherwise, the algorithm terminates, because the cardinality of the fuzzy partitions is too fine compared to the density of data points in the feature space, and the direct F1-transform cannot be computed.
The Calculate CA index component measures the accuracy of the classification. To classify a data point we adopt the following method applied in [9].
Let C be the number of classes and let l1, l2,…, lC be the labels of the C classes. Let pj = (pj1, pj2,…, pjn, mj) be the jth data point, where mj ∊ {1,…,C} is the index of the corresponding class.
In this setting, the F1-transform components (18) and (19) become:

$$c^{0}_{h_1 h_2 \dots h_n} = F_{h_1 h_2 \dots h_n} = \frac{\sum_{j=1}^{N} m_j\, A_{1h_1}(p_{j1})\, A_{2h_2}(p_{j2}) \cdots A_{nh_n}(p_{jn})}{\sum_{j=1}^{N} A_{1h_1}(p_{j1})\, A_{2h_2}(p_{j2}) \cdots A_{nh_n}(p_{jn})}$$

$$c^{1s}_{h_1 h_2 \dots h_n} = \frac{\sum_{j=1}^{N} m_j\,(p_{js} - x_{h_s})\, A_{1h_1}(p_{j1}) \cdots A_{nh_n}(p_{jn})}{\sum_{j=1}^{N} (p_{js} - x_{h_s})^2\, A_{1h_1}(p_{j1}) \cdots A_{nh_n}(p_{jn})}$$
Let $f^{1}_{F,n}(p_j)$ be the inverse F1-transform computed by (16). The index of the class assigned to the data point pj is the integer $\hat{m}_j$ ∊ {1, 2,…, C} given by:

$$\hat{m}_j = \min\left(C,\; \left\lceil f^{1}_{F,n}(p_j) \right\rceil\right) \tag{20}$$

where ⌈a⌉ denotes the smallest integer greater than or equal to the positive real number a.
The classification accuracy CA is given by the ratio between the number of data points correctly classified and the total number of data points.
If CA is less than the threshold θ, finer fuzzy partitions with n + 1 fuzzy sets are created and the process is iterated. Otherwise, the F1-transform classifier ends, storing the coefficients of the final direct F1-transform, $c^{0}_{h_1 h_2 \dots h_n}$ and $c^{11}_{h_1 h_2 \dots h_n},\dots,c^{1n}_{h_1 h_2 \dots h_n}$, computed by (18) and (19), respectively.
A new input data point x = (x1, x2,…, xn) is then classified by computing the inverse F1-transform $f^{1}_{F,n}(x)$ by (16) and assigning the class index given by (20).
The F1-transform classification algorithm is schematized below (Algorithm 1):
Algorithm 1. F1-transform classification
1. Execute the PCA feature reduction algorithm
2. Use the scree plot method to reduce the number of features
3. Create the reduced dataset
4. Set the accuracy threshold θ
5. n := 3
6. CA := 0 // initialize the CA index to 0
7. While CA < θ
8.   Create the n-size fuzzy partitions of the feature domains
9.   If data are sufficiently dense with respect to the n-dimensional fuzzy partitions Then
10.    Calculate the direct F1-transform by (17)
11.    Calculate the inverse F1-transform by (16)
12.    Calculate the CA index
13.    If CA ≥ θ Then
14.      Store the direct F1-transform coefficients $c^{0}_{h_1 \dots h_n}$ and $c^{11}_{h_1 \dots h_n},\dots,c^{1n}_{h_1 \dots h_n}$
15.      Return “Data classified”
16.    End If
17.  Else
18.    Return “Data cannot be classified”
19.  End If
20.  n := n + 1
21. End While
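The following condensed sketch mirrors the loop of Algorithm 1 after the PCA step, under the assumption of uniform triangular partitions; it is an illustration of the scheme above, not the authors' implementation, and it assumes integer class labels 1,…,C.

```python
# Condensed sketch of Algorithm 1 after PCA reduction, assuming uniform
# triangular partitions; illustrative code, not the authors' implementation.
import itertools
import numpy as np

def f1_classifier(X, y, theta=0.8, n_max=8):
    """X: (N, k) reduced data; y: integer class indices in 1..C.
    Refine the partitions (m = 3, 4, ...) until the CA index reaches theta."""
    a, b = X.min(axis=0), X.max(axis=0)
    k, C = X.shape[1], int(y.max())
    for m in range(3, n_max + 1):
        nodes = [np.linspace(a[i], b[i], m) for i in range(k)]
        d = (b - a) / (m - 1)
        # per-feature memberships: A[i][j, h] = A_ih(p_ji)
        A = [np.maximum(0.0, 1.0 - np.abs(X[:, [i]] - nodes[i]) / d[i])
             for i in range(k)]
        c0, c1, fhat = {}, {}, np.zeros(len(X))
        for h in itertools.product(range(m), repeat=k):
            w = np.prod([A[i][:, h[i]] for i in range(k)], axis=0)
            if w.sum() == 0.0:          # sufficient-density check fails
                return None             # "Data cannot be classified"
            c0[h] = (y * w).sum() / w.sum()                        # Eq. (18)
            c1[h] = [(y * (X[:, s] - nodes[s][h[s]]) * w).sum()
                     / max(((X[:, s] - nodes[s][h[s]]) ** 2 * w).sum(), 1e-12)
                     for s in range(k)]                            # Eq. (19)
            # accumulate the inverse F1-transform (16) at the data points
            F1 = c0[h] + sum(c1[h][s] * (X[:, s] - nodes[s][h[s]])
                             for s in range(k))
            fhat += F1 * w
        pred = np.clip(np.ceil(fhat), 1, C)   # rule (20), clipped to 1..C
        CA = (pred == y).mean()               # classification accuracy
        if CA >= theta:
            return nodes, c0, c1, CA          # store the final coefficients
    return None
```

Note that the number of components grows as m^k, which is why the PCA reduction of k in the preprocessing phase is essential for keeping the loop tractable.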

4. Experimental Results

We tested the F1-transform classification method on various classification datasets extracted from the UC Irvine Machine Learning Repository. Table 2 shows the size and the number of features of the classification datasets used in these comparison tests.
To measure the classification performances, we compared the F1-transform method with the Support Vector Machine (SVM) [24], Random Forest (RF) [25], Artificial Neural Network (ANN) [26], and the MFC classification algorithm (MFC) [14].
Our comparison tests were executed on an Intel Core i7 processor with a 5.4 GHz clock frequency.
After removing ambiguous data points, each dataset was randomly segmented into a training and a test set, containing 80% and 20% of the data points, respectively.
When running RF, the number of decision trees is set to 100. The radial basis kernel function is used to execute SVM, setting the parameters C and gamma to 1.0 and 0.1, respectively. ANN is executed by constructing a network with two hidden layers and setting the maximum number of epochs to 100.
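For reproducibility, the settings above map naturally onto scikit-learn estimators; the library is our assumption (the paper does not name one), and the hidden layer sizes below are also assumed, since only the number of hidden layers is stated.

```python
# Baseline configurations as described above, assuming scikit-learn
# implementations; the hidden layer sizes are an assumption (only the
# number of hidden layers and the epoch limit are stated in the text).
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rf = RandomForestClassifier(n_estimators=100)        # 100 decision trees
svm = SVC(kernel="rbf", C=1.0, gamma=0.1)            # radial basis kernel
ann = MLPClassifier(hidden_layer_sizes=(100, 100),   # two hidden layers
                    max_iter=100)                    # at most 100 epochs
```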
For brevity, we show the complete results obtained for three classification datasets: the two datasets Red and White Wine Quality and the dataset Adult.

4.1. Red and White Wine Quality Classification Dataset

The red and white wine quality datasets are two classification datasets used to classify the quality of red and white variants of the Portuguese vinho verde wine.
The data points are vectors given by 11 physicochemical features. In Table 3 all the features are described.
The last feature, called quality, is the output feature containing the class of wine quality on a scale from 0 (very bad) to 10 (excellent).
The white wine quality dataset contains 4898 data points, and the red wine quality dataset contains 1599 data points. The two datasets are unbalanced, with over 70% of the data points classified with quality five and six. Sections 4.1.1 and 4.1.2 show the results obtained for the white and for the red wine quality datasets, respectively.

4.1.1. White Wine Quality Dataset—Classification Results

Now we show the results obtained on the white wine quality dataset.
In the preprocessing phase, the PCA algorithm was executed on the training set. In Figure 3 the scree plot is shown. The orange line highlights the elbow point. In the transformed training set, the size of the data points is reduced from eleven to five. Then, the F1-transform classification algorithm is executed on the transformed training set. Finally, the stored final F1-transform coefficients are used to classify the data points in the transformed test set.
We compare the classification results with the ones obtained executing SVM, RF, ANN, and MFC on the original training set.
Table 4 shows the accuracy, precision, recall, and F1-score measures computed on the training and test sets executing the five classification methods.
The results in Table 4 show that the best performances are obtained by executing the F1-transform classification method. The F1-transform method improves the classification accuracy obtained by executing ANN by about 5% on the training and test sets; it improves the one obtained by executing MFC by about 7% on the training set and 6% on the test set.

4.1.2. Red Wine Quality Dataset—Classification Results

The PCA algorithm was executed on the training set. Figure 4 shows the scree plot. The orange line highlights the elbow point.
As well as for the white wine quality training set, the size of the data points in the transformed training set is reduced from eleven to five.
Table 5 shows the Accuracy, Precision, Recall, and F1-score measures computed on the training and test sets executing the five classification methods.
The results of the tests applied on the red wine quality dataset confirm that the best performances are obtained executing the F1-transform classification method.
The F1-transform method improves the classification accuracy obtained by executing ANN by about 4% on the training set and 3% on the test set; it improves the one obtained by executing MFC by about 7% on the training set and 5% on the test set.

4.2. Adult Classification Dataset

The Adult dataset contains information extracted from the United States Census Bureau database, which includes information on residents of various states, and is used to determine whether a citizen earns more or less than 50K USD per year.
The training set contains 32,561 unambiguous data points; the data points are vectors formed by fourteen input features, four continuous and ten categorical. The test set contains 16,282 unambiguous data points. Table 6 shows the description of each feature.
The output class feature, called class, assumes two values, depending on whether the person makes over or under 50K USD a year. The dataset is unbalanced: over 75% of the data points are classified with an annual income under 50K USD.
To execute the F1-transform, all the categorical features were transformed into integers, assigning to each term in the list of values an integer from 1 to the number of unique values.
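A simple way to realize this encoding, assuming the data is loaded in a pandas DataFrame (the names below are illustrative):

```python
# Integer encoding of categorical features as described above, assuming
# the Adult data is in a pandas DataFrame `df` (illustrative sketch).
import pandas as pd

def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        values = sorted(out[col].dropna().unique())
        # map each unique value to an integer from 1 to the number of values
        out[col] = out[col].map({v: i + 1 for i, v in enumerate(values)})
    return out
```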
PCA is executed in order to reduce the number of features. Figure 5 shows the scree plot, with the elbow point set at 2.5. The first nine components are selected, and the input features are reduced from fourteen to nine.
Then, the F1-transform classification method is applied to the reduced training set.
The comparison results with SVM, RF, ANN, and MFC are shown in Table 7.
Also in this case, the best performances are obtained executing the F1-transform.
The F1-transform method improves the classification accuracy obtained by executing ANN by about 5% on the training set and 4% on the test set; it improves the one obtained by executing MFC by about 7% on the training set and 5% on the test set.
Table 8 shows, for each dataset, the final classification accuracy and the CPU time measured executing the five algorithms.
In all cases the F1-transform method improves the final accuracy obtained executing SVM, RF, ANN, and MFC; the final accuracy obtained using the F1-transform improves that obtained using MFC by a value ranging between 2% (dataset Diabetes) and 7% (dataset Adult).
The CPU times employed by executing the F1-transform are comparable with those employed by executing the other four classification algorithms. In fact, even if the construction of the multidimensional F1-transform requires more computational expenditure than the construction of the multidimensional F-transform in the MFC algorithm, the application of the PCA algorithm in the preprocessing phase reduces the size of the data points and, thus, the computational complexity.
In a nutshell, in all tests performed the F1-transform produces better classification accuracy than the other classifiers, with comparable execution times. In particular, the results in Table 8 show that:
- The F1-transform classifier improves the accuracy obtained with MFC by a percentage in the range between 2% and 7%;
- Even if the computation times of the multidimensional F1-transform are higher than those of the multidimensional F-transform, the proposed classification algorithm has execution times similar to those of MFC.
A possible critical point of the F1-transform method is the choice of the elbow point in the scree plot obtained by executing the PCA method in the preprocessing phase. In all the tests performed, the scree plot showed a sudden change in slope that allowed the elbow point to be easily recognized. However, this change of trend may not always be so evident, to the point of preventing a precise definition of the elbow point. In these cases, a solution could be to run the F1-transform multiple times, each time choosing a different number of principal components, and to select the number that produces the greatest accuracy. However, this solution can significantly increase the CPU times of the algorithm.
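As a sketch of this fallback, reusing the illustrative pca_reduce and f1_classifier helpers from the earlier sketches (Z_all is assumed to be the full matrix of ordered principal components and y the class labels):

```python
# Fallback when the scree plot has no clear elbow: train on the first k
# principal components for several k and keep the most accurate model.
# Z_all, y, f1_classifier come from the earlier sketches (assumed names).
best_k, best_ca = None, -1.0
for k in range(2, Z_all.shape[1] + 1):
    result = f1_classifier(Z_all[:, :k], y, theta=0.85)
    # keep the best among the runs that reach the accuracy threshold
    if result is not None and result[-1] > best_ca:
        best_k, best_ca = k, result[-1]
print("selected components:", best_k, "training CA:", best_ca)
```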

5. Conclusions

We propose a classification method based on the multidimensional F1-transform in which the PCA technique is applied in a preprocessing phase to reduce the number of features. The main aim of this research is to improve the accuracy of the MFC classification algorithm, based on the multidimensional F-transform, without increasing the processing times.
We compared the performances of the proposed classification method both with those of MFC and with those of the well-known SVM, Random Forest, and ANN classification algorithms; the comparative tests were performed on classification datasets of the UCI machine learning repository.
The results show that the multidimensional F1-transform classification algorithm improves the classification performance obtained executing SVM, Random Forest, ANN, and MFC; in addition, its execution times are comparable with those obtained running MFC.
We intend to carry out future research to test the use of the classification method based on the multidimensional F1-transform on a more extensive and varied set of datasets, adapting the method to the management of ambiguous data points.

Author Contributions

Conceptualization, B.C. and F.D.M.; methodology, B.C. and F.D.M.; software, B.C. and F.D.M.; validation, B.C. and F.D.M.; formal analysis, B.C. and F.D.M.; investigation, B.C. and F.D.M.; resources, B.C. and F.D.M.; data curation, B.C. and F.D.M.; writing—original draft preparation, B.C. and F.D.M.; writing—review and editing, B.C. and F.D.M.; visualization, B.C. and F.D.M.; supervision, B.C. and F.D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Perfilieva, I. Fuzzy transforms: Theory and applications. Fuzzy Sets Syst. 2006, 157, 993–1023.
2. Di Martino, F.; Sessa, S. Fuzzy Transforms for Image Processing and Data Analysis—Core Concepts, Processes and Applications; Springer Nature: Cham, Switzerland, 2020; p. 217.
3. Perfilieva, I.; Dankova, M.; Bede, B. Towards a higher degree F-transform. Fuzzy Sets Syst. 2011, 180, 3–19.
4. Di Martino, F.; Sessa, S.; Perfilieva, I. A Summary of F-Transform Techniques in Data Analysis. Electronics 2021, 10, 1771.
5. Hodáková, P.; Perfilieva, I. F1-transform of functions of two variables. In Proceedings of the 8th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2013), Milan, Italy, 11–13 September 2013; Atlantis Press: Milan, Italy, 2013; pp. 547–553.
6. Perfilieva, I.; Dankova, M.; Hurtik, P. Differentiation by the F-transform and application for edge detection. Fuzzy Sets Syst. 2014, 288, 96–114.
7. Di Martino, F.; Sessa, S.; Perfilieva, I. First Order Fuzzy Transform for Images Compression. J. Signal Inf. Process. 2017, 8, 178–194.
8. Di Martino, F.; Loia, V.; Perfilieva, I.; Sessa, S. An image coding/decoding method based on direct and inverse fuzzy transforms. Int. J. Approx. Reason. 2008, 48, 110–131.
9. Molek, V.; Perfilieva, I. Deep Learning and Higher Degree F-Transforms: Interpretable Kernels Before and After Learning. Int. J. Comput. Intell. Syst. 2020, 13, 1404–1414.
10. Di Martino, F.; Sessa, S. Seasonal Time Series Forecasting by F1-Fuzzy Transform. Sensors 2019, 19, 3611.
11. Perfilieva, I.; Novák, V.; Dvořák, A. Fuzzy transform in the analysis of data. Int. J. Approx. Reason. 2007, 48, 36–46.
12. Di Martino, F.; Loia, V.; Sessa, S. Fuzzy transforms method in prediction data analysis. Fuzzy Sets Syst. 2011, 180, 146–163.
13. Novák, V.; Perfilieva, I.; Dvořák, A.; Holčapek, M.; Kreinovich, V. Filtering out high frequencies in time series using F-transform. Inf. Sci. 2014, 274, 192–209.
14. Di Martino, F.; Sessa, S. A classification algorithm based on multi-dimensional fuzzy transforms. J. Ambient Intell. Humaniz. Comput. 2022, 13, 2873–2885.
15. Dimitoglou, G.; Adams, J.A.; Jim, C.M. Comparison of the C4.5 and a Naive Bayes classifier for the prediction of lung cancer survivability. Comput. Inf. Sci. 2012, 4, 1–9.
16. Maron, O.; Moore, A.W. The Racing Algorithm: Model Selection for Lazy Learners. In Lazy Learning; Aha, D.W., Ed.; Springer: Dordrecht, The Netherlands, 1997; Volume 11, pp. 193–225.
17. Bhargawa, N.; Sharma, G.; Bhargava, R.; Mathuria, M. Decision tree analysis on J48 algorithm for data mining. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2013, 3, 1114–1119.
18. Chaudhuri, B.B.; Bhattacharya, U. Efficient training and improved performance of multilayer perceptron in pattern classification. Neurocomputing 2007, 34, 11–27.
19. Hurtik, P.; Perfilieva, I. Fast Training and Real-Time Classification Algorithm Based on Principal Component Analysis and F-Transform. In Proceedings of the 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS), Toyama, Japan, 5–8 December 2018; pp. 275–280.
20. Sehgal, S.; Singh, H.; Agarwal, M.; Bhasker, V.; Shantanu. Data analysis using principal component analysis. In Proceedings of the 2014 International Conference on Medical Imaging, m-Health and Emerging Communication Systems (MedCom), Greater Noida, India, 7–8 November 2014; pp. 45–48.
21. Joliffe, I.T.; Morgan, B. Principal component analysis and exploratory factor analysis. Stat. Methods Med. Res. 1992, 1, 69–95.
22. Joliffe, I.T. Principal Component Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2010; p. 516. ISBN 978-1441929990.
23. Joliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. A 2016, 374, 20150202.
24. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
25. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
26. Zhang, G.P. Neural Networks for Classification: A Survey. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2000, 30, 451–462.
Figure 1. Data points and nodes in the example.
Figure 2. Overview of the F1-transform classification method.
Figure 3. Scree plot obtained for the white wine quality training set; the orange line shows the elbow point.
Figure 4. Scree plot obtained for the red wine quality training set; the orange line shows the elbow point.
Figure 5. Adult training set—scree plot; the orange line shows the elbow point.
Table 1. Set of nodes of the two input variables in the example.

Input Variable     Node   Value
First variable     x1     1.1
                   x2     3.0
                   x3     4.9
Second variable    y1     0.1
                   y2     0.55
                   y3     1.0
Table 2. Datasets from the UCI machine learning repository used in the comparison analysis.

Dataset                    Number of Data Points   Number of Features
Adult                      48,842                  14
Balance Scale              625                     4
Bank Marketing             41,188                  17
Breast cancer              286                     9
Echocardiogram             132                     12
Ecoli                      336                     7
Heart disease              303                     14
Hepatitis                  155                     19
Thyroid disease            7200                    21
Wine quality—red wine      1599                    12
Wine quality—white wine    4898                    12
Table 3. Red and white wine datasets: description of the features.

Feature Name            Description                      Type of Field
fixed acidity           Tartaric acid (g/dm3)            Continuous
volatile acidity        Acetic acid (g/dm3)              Continuous
citric acid             Citric acid (g/dm3)              Continuous
residual sugar          Residual sugar (g/dm3)           Continuous
chlorides               Sodium chloride (g/dm3)          Continuous
free sulfur dioxide     Free sulfur dioxide (mg/dm3)     Continuous
total sulfur dioxide    Total sulfur dioxide (mg/dm3)    Continuous
density                 Density (g/cm3)                  Continuous
pH                      pH                               Continuous
sulphates               Potassium sulphate (g/dm3)       Continuous
alcohol                 Alcohol (vol.%)                  Continuous
quality                 Wine quality                     List (an integer from 0 to 10)
Table 4. White wine dataset: classification performance comparison.

Phase      Method         Accuracy   Precision   Recall   F1-Score
Training   SVM            0.75       0.77        0.74     0.75
           RF             0.73       0.76        0.71     0.73
           ANN            0.78       0.79        0.77     0.78
           MFC            0.76       0.77        0.75     0.76
           F1-transform   0.83       0.82        0.83     0.82
Test       SVM            0.74       0.75        0.74     0.74
           RF             0.73       0.74        0.71     0.72
           ANN            0.77       0.80        0.77     0.79
           MFC            0.76       0.76        0.74     0.76
           F1-transform   0.82       0.81        0.82     0.81
Table 5. Red wine dataset: classification performance comparison.

Phase      Method         Accuracy   Precision   Recall   F1-Score
Training   SVM            0.68       0.68        0.67     0.67
           RF             0.70       0.68        0.66     0.67
           ANN            0.72       0.71        0.69     0.72
           MFC            0.69       0.69        0.68     0.69
           F1-transform   0.76       0.74        0.75     0.75
Test       SVM            0.69       0.69        0.67     0.68
           RF             0.69       0.68        0.67     0.68
           ANN            0.72       0.73        0.72     0.72
           MFC            0.70       0.70        0.71     0.70
           F1-transform   0.75       0.76        0.75     0.75
Table 6. Adult dataset: description of the features.

Feature Name      Description               Type of Field
age               Age                       Continuous
workclass         Work class                List
fnlwgt            Survey weight             Continuous
education         Education                 List
education-num     Education number          Continuous
marital status    Marital status            List
occupation        Occupation                List
relationship      Type of relationship      List
race              Race                      List
sex               Gender                    List (Female, Male)
capital-gain      Capital gain              Continuous
capital-loss      Capital loss              Continuous
hours-per-week    Hours of work per week    Continuous
native-country    Native country            List
class             Annual income             List (<=50K, >50K)
Table 7. Adult dataset: classification performance comparison.

Phase      Method         Accuracy   Precision   Recall   F1-Score
Training   SVM            0.87       0.85        0.84     0.84
           RF             0.85       0.84        0.83     0.83
           ANN            0.88       0.86        0.86     0.86
           MFC            0.86       0.85        0.85     0.85
           F1-transform   0.93       0.89        0.90     0.89
Test       SVM            0.87       0.86        0.85     0.85
           RF             0.85       0.83        0.82     0.67
           ANN            0.88       0.87        0.86     0.86
           MFC            0.87       0.86        0.86     0.86
           F1-transform   0.92       0.91        0.90     0.90
Table 8. Classification accuracy and CPU time comparison results.

Dataset                    Metric          SVM      RF       ANN      MFC      F1-tr
Adult                      Accuracy        0.87     0.85     0.88     0.86     0.93
                           CPU time (s)    858.22   849.74   915.19   881.86   877.37
Balance scale              Accuracy        0.95     0.92     0.96     0.95     0.98
                           CPU time (s)    26.48    26.19    28.41    27.63    26.92
Bank Marketing             Accuracy        0.77     0.75     0.78     0.77     0.82
                           CPU time (s)    832.31   823.09   944.56   867.83   864.95
Breast cancer              Accuracy        0.90     0.88     0.91     0.90     0.94
                           CPU time (s)    31.67    31.78    33.59    32.06    31.78
Diabetes                   Accuracy        0.73     0.73     0.76     0.75     0.77
                           CPU time (s)    44.77    43.46    47.09    45.02    44.91
Echocardiogram             Accuracy        0.75     0.73     0.75     0.74     0.78
                           CPU time (s)    31.67    30.93    33.10    31.28    31.32
Ecoli                      Accuracy        0.79     0.78     0.81     0.79     0.83
                           CPU time (s)    32.65    32.38    34.32    32.77    32.82
Heart disease              Accuracy        0.83     0.82     0.85     0.85     0.87
                           CPU time (s)    37.79    37.56    40.18    38.51    38.47
Hepatitis                  Accuracy        0.88     0.89     0.91     0.89     0.93
                           CPU time (s)    42.61    42.49    44.26    42.68    42.73
Thyroid disease            Accuracy        0.95     0.94     0.97     0.96     0.99
                           CPU time (s)    103.34   102.85   113.93   102.98   102.87
Wine quality—red wine      Accuracy        0.68     0.70     0.72     0.69     0.76
                           CPU time (s)    67.52    66.81    69.54    68.15    68.23
Wine quality—white wine    Accuracy        0.75     0.73     0.78     0.76     0.83
                           CPU time (s)    58.60    58.46    59.87    58.72    58.85
