Article

B-MFO: A Binary Moth-Flame Optimization for Feature Selection from Medical Datasets

by Mohammad H. Nadimi-Shahraki 1,2,*, Mahdis Banaie-Dezfouli 1,2, Hoda Zamani 1,2, Shokooh Taghian 1,2 and Seyedali Mirjalili 3,4,*

1 Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad 8514143131, Iran
2 Big Data Research Center, Najafabad Branch, Islamic Azad University, Najafabad 8514143131, Iran
3 Centre for Artificial Intelligence Research and Optimisation, Torrens University Australia, Adelaide 5000, Australia
4 Yonsei Frontier Lab, Yonsei University, Seoul 03722, Korea
* Authors to whom correspondence should be addressed.
Submission received: 20 September 2021 / Revised: 19 October 2021 / Accepted: 19 October 2021 / Published: 25 October 2021
(This article belongs to the Special Issue Advances of Machine and Deep Learning in the Health Domain)

Abstract: Advancements in medical technology have created numerous large datasets that include many features. Not all captured features are necessary, however; datasets often contain redundant and irrelevant features, which reduce the performance of algorithms. To tackle this challenge, many metaheuristic algorithms are used to select effective features. However, most of them are not effective and scalable enough to select effective features from large medical datasets as well as small ones. Therefore, in this paper, a binary moth-flame optimization (B-MFO) is proposed to select effective features from small and large medical datasets. Three categories of B-MFO were developed using S-shaped, V-shaped, and U-shaped transfer functions to convert the canonical MFO from continuous to binary. These categories of B-MFO were evaluated on seven medical datasets, and the results were compared with four well-known binary metaheuristic optimization algorithms: BPSO, bGWO, BDA, and BSSA. In addition, the convergence behavior of B-MFO and the comparative algorithms was assessed, and the results were statistically analyzed using the Friedman test. The experimental results demonstrate the superior performance of B-MFO in solving the feature selection problem for different medical datasets compared to the other algorithms.

1. Introduction

Nowadays, with advances in science and medical technology, numerous large medical datasets including many features have been created, which also contain redundant and irrelevant features. Data-driven decision making for high-risk diseases such as heart disease [1] is a significant trend for which many data mining and machine learning methods have been introduced [2]. Since medical data are obtained from multiple sources, not all captured features are necessary, and some of them are irrelevant or redundant, which may reduce the performance of algorithms in data-driven decision-support software. Irrelevant and redundant data can be removed because they do not improve classification accuracy: irrelevant data have a weak correlation with the class, and redundant data have a strong correlation with one or more other features. For instance, the FSBRR algorithm [3] removes the radius feature of the Breast Cancer Wisconsin Dataset as a redundant feature because it is highly correlated with the smoothness feature. Feature selection can address this problem by finding a subset of the relevant and effective features. Feature selection is used in a variety of real-world applications such as disease diagnosis [4,5], email spam detection [6], text clustering [7,8], and human activity recognition [9].
Based on the strategy used for selecting features, feature selection algorithms can be categorized into three methods [10]: filter-based, wrapper-based, and hybrid. Filter-based methods analyze features based on intrinsic properties of the data, without using a classification algorithm [11], whereas wrapper-based methods use classifiers to assess candidate solutions [12]. Hybrid methods combine the benefits of both approaches. Although wrapper-based methods are computationally expensive and their performance depends on the learning algorithm used, they are usually more accurate than the other two categories [13]. Wrapper-based methods use different search approaches such as exhaustive, random, greedy, heuristic, and metaheuristic search [14]; all but the last are impractical for selecting effective features from medium and large datasets [15]. Thus, a wide range of metaheuristic optimization algorithms has been proposed to solve feature selection problems for applications with large datasets, such as medicine [16].
Metaheuristic optimization algorithms are mostly inspired by nature and can be classified into three categories: evolutionary, physics-based, and swarm intelligence. The ease of development and strong results of swarm intelligence (SI) algorithms on a variety of problems have made some of them very attractive and popular, such as particle swarm optimization (PSO) [17], the grey wolf optimizer (GWO) [18], the whale optimization algorithm (WOA) [19], moth-flame optimization (MFO) [20], and the aquila optimizer (AO) [21]. Since SI algorithms mimic the behavioral models of insects, aquatic animals, terrestrial animals, and birds, they can share information within their swarms, which enhances their robustness [22]. Despite their benefits, these algorithms may suffer from local optima trapping, premature convergence, and unbalanced search strategies [23,24]. Hence, many improvements to these algorithms have been introduced thus far, including optimal control strategies [25], the chaotic whale optimization algorithm (CWOA) [26], the grasshopper optimization algorithm based on opposition-based learning (OBLGOA) [27], the improved grey wolf optimizer (I-GWO) [28], disruption bare-bones particle swarm optimization (DBPSO) [29], the improved krill herd (IKH) [30], particle swarm optimization with the backtracking search optimization algorithm (PSOBSA) [31], and the representative-based grey wolf optimizer (R-GWO) [32].
Since feature selection is an NP-complete problem [33], SI algorithms are widely used to solve it. Many researchers have adapted different SI algorithms from continuous to binary form, such as the wrapper-based binary sine cosine algorithm (WBSCA) [34], the binary grasshopper optimization algorithm (BGOA) [35], the binary butterfly algorithm (BBA) [36], efficient binary symbiotic organisms search (EBSOS) [37], the binary grey wolf optimizer with support vector machine (GWOSVM) [38], and island binary moth-flame optimization (IsBMFO) [39]. However, most binary SI algorithms are not effective or scalable enough to select effective features from large datasets as well as small ones.
Therefore, in this study, a binary moth-flame optimization (B-MFO) is proposed to solve the feature selection problem. The canonical MFO, introduced by Mirjalili [20], was inspired by the transverse orientation mechanism that moths use at night around artificial lights. Many variants of MFO have been developed, such as EMFO [40], IMFO [41], CLSGMFO [42], and an improved MFO [43], motivated by its simplicity, its weaknesses, and its applications. Several researchers have applied S-shaped and V-shaped transfer functions to convert the continuous MFO to binary. In this study, in addition to the S-shaped and V-shaped transfer functions, a U-shaped transfer function was adapted, a novel transfer function with multiple adjustable parameters for solving feature selection problems. Each category contains four versions of the transfer function; therefore, twelve versions of B-MFO were introduced across the three categories. They were evaluated on seven medical datasets: Pima, Lymphography, Breast-WDBC, PenglungEW, Parkinson, Colon, and Leukemia. In addition, the winner versions of B-MFO were compared with the best results gained by four well-known binary metaheuristic optimization algorithms: BPSO [44], bGWO [45], BDA [46], and BSSA [47]. The convergence behavior of the winner versions of B-MFO and the comparative algorithms was evaluated and visualized. Finally, the results were statistically analyzed with the Friedman test.
In the rest of this study, Section 2 discusses the related works. Section 3 describes the canonical MFO algorithm. Then, the proposed B-MFO is presented and evaluated in Section 4 and Section 5. Finally, the conclusion and future works are explained in Section 6.

2. Related Work

There are many discrete problems, such as feature selection [48,49], tour planning [50], complex systems [51], and traveling salesman problems [52], that must be solved with discrete optimization algorithms [53]. To solve feature selection problems, wrapper-based methods widely apply discrete metaheuristic optimization algorithms as search strategies to find effective feature subsets [47,54,55,56,57]. Since the majority of metaheuristic optimization algorithms, such as DA [58], SSA [59], HGSO [60], FFA [61], MTDE [62], QANA [63], and AO [21], were proposed to solve continuous problems arising in engineering [64,65,66,67,68], cloud computing [69], and rail-car fleet sizing [70], they must be converted into binary algorithms for use in wrapper-based methods and for solving discrete problems. A continuous algorithm can be converted to binary form in a variety of ways [71]. JayaX [72] and BitABC [73] use logical operators for the conversion. Another way is to use a transfer function (TF), which converts the continuous search space to a binary one in which search agents can shift to nearer or farther corners of a hypercube by flipping various numbers of bits [44]. Thus, transfer functions apply a mapping to obtain the probability of changing a solution element from 0 to 1 or vice versa.
Many transfer functions, such as S-shaped [44,74], V-shaped [74,75], and U-shaped [76], have been introduced to convert continuous metaheuristic optimization algorithms to binary ones. Binary particle swarm optimization (BPSO) [44] was introduced by Kennedy and Eberhart and applies a sigmoid function; it has been used to solve various discrete optimization problems [77,78,79]. Yuan et al. [80] proposed an improved binary PSO (IBPSO) in which BPSO is combined with the lambda-iteration method to solve the unit commitment problem. BPSO has been applied to various problems such as text clustering [81,82], text feature selection [83], and disease diagnosis [84,85,86].
The binary grey wolf optimizer (bGWO) is another wrapper method for feature selection, proposed by Emary et al. [45]. This binary version of GWO uses the sigmoid transfer function and has been utilized to solve feature selection problems, large-scale unit commitment [87,88], and text classification [89]. To enhance solution quality, Hu et al. [90] introduced new transfer functions and improved them based on an analysis of GWO's parameters. Al-Tashi et al. [87] proposed a new hybrid optimization algorithm named BPSOGWO to find the best feature subset.
Zamani et al. [91] proposed a metaheuristic algorithm named feature selection based on the whale optimization algorithm (FSWOA) to reduce the dimensionality of medical datasets. Hussien et al. proposed two binary variants of WOA (bWOA) [92,93] based on V-shaped and S-shaped transfer functions for dimensionality reduction and classification problems. The binary WOA (BWOA) [94] was suggested by Reddy et al. for solving the PBUC problem; it maps the continuous WOA to binary through various transfer functions.
The binary dragonfly algorithm (BDA) [95] was proposed by Mafarja to solve discrete problems. Sawhney et al. proposed BDFA [96], which incorporates a penalty function for optimal feature selection. Although BDA has good exploitation ability, it suffers from being trapped in local optima. Thus, a wrapper-based approach named the hyper learning binary dragonfly algorithm (HLBDA) [97] was developed by Too et al. to solve the feature selection problem. HLBDA uses a hyper learning strategy to learn from the personal and global best solutions during the search process.
Faris et al. employed the binary salp swarm algorithm (BSSA) [47] in a wrapper feature selection method. Ibrahim et al. proposed a hybrid optimization method for the feature selection problem that combines the salp swarm algorithm with particle swarm optimization (SSAPSO) [98]. The chaotic binary salp swarm algorithm (CBSSA) [99] was introduced by Meraihi et al. to solve the graph coloring problem. CBSSA applies a logistic map to replace the random variables used in SSA, which helps it avoid stagnation in local optima and improves exploration and exploitation. A time-varying hierarchal BSSA (TVBSSA), combined with an RWN classifier, was proposed by Faris et al. [15] to design an improved wrapper feature selection method.

3. The Canonical Moth-Flame Optimization

Moth-flame optimization (MFO) [20] is a nature-inspired algorithm that imitates the transverse orientation mechanism of moths at night around artificial lights. Moths use this mechanism for navigation: by maintaining a constant angle with the light, they fly in a straight line. MFO's mathematical model assumes that the moths' positions in the search space correspond to candidate solutions, which are represented in a matrix, and the corresponding fitness of the moths is stored in an array. In addition, a flame matrix stores the best positions obtained by the moths so far, and an array holds the corresponding fitness of these best positions. To find the best result, moths search around their corresponding flames and update their positions; therefore, moths never lose their best positions. Equation (1) shows the position update of each moth relative to its corresponding flame.
M_i = S(M_i, F_j)
where S is the spiral function, and M_i and F_j represent the i-th moth and the j-th flame, respectively. The main update mechanism is a logarithmic spiral, defined by Equation (2):
S(M_i, F_j) = D_i \cdot e^{bt} \cdot \cos(2\pi t) + F_j
where D_i is the distance between the i-th moth and the j-th flame, computed by Equation (3), and b is a constant defining the shape of the logarithmic spiral. The parameter t is a random number in the range [r, 1], where r is a convergence factor that linearly decreases from −1 to −2 over the course of iterations.
D_i = |M_i - F_j|
To avoid trapping in local optima, each moth updates its position using only one flame. In each iteration, the list of flames is updated and sorted by fitness value. The first moth updates its position according to the best flame, and the last moth updates its position according to the worst flame. In addition, to increase exploitation of the most promising solutions, the number of flames is reduced over the course of iterations by an adaptive mechanism, shown in Equation (4):
flameNo = \mathrm{round}\left(N - iter \cdot \frac{N - 1}{MaxIter}\right)
where N indicates the maximum number of moths, and iter and MaxIter are the current and maximum number of iterations, respectively.
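The adaptive update above is compact enough to sketch directly in code. The following NumPy snippet is a minimal reading of Equations (1)-(4), not the authors' implementation; the helper names are illustrative, and the convergence factor r is fixed per call rather than scheduled inside the functions.

```python
import numpy as np

def flame_count(N, it, max_iter):
    # Equation (4): adaptively reduce the number of flames over iterations.
    return round(N - it * (N - 1) / max_iter)

def spiral_update(moth, flame, r, b=1.0):
    # Equations (1)-(3): logarithmic spiral flight of a moth around a flame.
    D = np.abs(moth - flame)                           # Equation (3)
    t = (r - 1.0) * np.random.rand(*moth.shape) + 1.0  # t drawn from [r, 1]
    return D * np.exp(b * t) * np.cos(2 * np.pi * t) + flame  # Equation (2)

# Example: one update of 20 moths in 10 dimensions at iteration 50 of 300.
moths = np.random.uniform(-1, 1, (20, 10))
flames = moths[np.random.permutation(20)]   # stand-in for fitness-sorted flames
k = flame_count(20, 50, 300)
moths = np.array([spiral_update(m, flames[min(i, k - 1)], r=-1.0 - 50 / 300)
                  for i, m in enumerate(moths)])
```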

4. Binary Moth-Flame Optimization (B-MFO)

In this study, three categories of transfer functions, S-shaped, V-shaped, and U-shaped, are applied to convert the MFO algorithm from continuous to binary for solving the feature selection problem. First, Section 4.1 describes these categories of transfer functions and how they are applied to develop different variants of B-MFO, accompanied by a flowchart and pseudo-code. Then, Section 4.2 explains how B-MFO solves the feature selection problem.

4.1. Developing Different Variants of B-MFO

4.1.1. B-MFO Using S-Shaped Transfer Function

The sigmoid (S-shaped) function shown in Equation (5) is a common transfer function, named S2 [100], originally introduced for developing binary PSO (BPSO) [44].
TF_s(v_i^d(t+1)) = \frac{1}{1 + e^{-v_i^d(t)}}
where v_i^d(t) is the velocity of the i-th search agent in dimension d at iteration t. TF_s converts the velocity to a probability value, and the next position x_i^d(t+1) is obtained from this probability as given in Equation (6), where r is a random value between 0 and 1.
x_i^d(t+1) = \begin{cases} 0, & \text{if } r < TF_s(v_i^d(t+1)) \\ 1, & \text{if } r \ge TF_s(v_i^d(t+1)) \end{cases}
According to Equation (6), the position update of each search agent is computed from the current velocity and the previous position. In some binary metaheuristic algorithms, such as BPSO [44] and BGSA [101], the velocity is used in the transfer function to calculate the probability of changing the position. In other algorithms, such as bGWO [45] and BMFO [102], the transfer function applies the updated position of each search agent to calculate this probability. In addition to the S2 function introduced in Equation (5), three variants of the S-shaped function, named S1, S3, and S4 [74], are obtained by manipulating the coefficient of the velocity in Equation (5). All variants of the S-shaped transfer function are shown in Table 1 and visualized in Figure 1; as the slope of the S-shaped transfer function increases, the probability of changing the position value increases. Thus, among the S-shaped functions, S1 yields the highest probability and S4 the lowest for the same velocity, which affects the position updating of search agents and the search for the optimum solution. Despite its advantages, this category of transfer functions has a shortcoming in metaheuristic algorithms whose search agents are updated based on their velocity: a velocity of zero is converted to one or zero with a probability of 0.5, whereas a search agent with zero velocity should not move [103]. Several researchers have tried to resolve this shortcoming, but could not avoid trapping in local optima.
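A short sketch may make this family concrete. The snippet below implements the four S-shaped variants of Table 1 and the stochastic thresholding of Equation (6); it is a minimal illustration under the definitions above, not the authors' MATLAB code.

```python
import numpy as np

S_SHAPED = {
    "S1": lambda v: 1.0 / (1.0 + np.exp(-2.0 * v)),
    "S2": lambda v: 1.0 / (1.0 + np.exp(-v)),          # the classic sigmoid
    "S3": lambda v: 1.0 / (1.0 + np.exp(-v / 2.0)),
    "S4": lambda v: 1.0 / (1.0 + np.exp(-v / 3.0)),
}

def s_shaped_update(v, variant="S2"):
    p = S_SHAPED[variant](v)                  # probability from Equation (5)
    r = np.random.rand(*np.shape(v))
    return np.where(r < p, 0, 1)              # Equation (6)

# Note the shortcoming discussed above: at v = 0 every variant yields
# p = 0.5, so a zero-velocity agent still changes its bit half the time.
bits = s_shaped_update(np.array([-2.0, 0.0, 2.0]), "S1")
```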

4.1.2. B-MFO Using V-Shaped Transfer Function

The hyperbolic tangent (V-shaped) function [104] named V2, shown in Equation (7), was presented for developing BGSA [101] and uses a new position-updating rule.
TF_v(v_i^d(t+1)) = \left| \tanh(v_i^d(t)) \right|
where v_i^d(t) is the velocity of the i-th search agent in dimension d at iteration t. Since the V-shaped function differs from the S-shaped function, positions are updated with the new rules shown in Equation (8).
x_i^d(t+1) = \begin{cases} \neg x_i^d(t), & \text{if } r < TF_v(v_i^d(t+1)) \\ x_i^d(t), & \text{if } r \ge TF_v(v_i^d(t+1)) \end{cases}
where x_i^d(t) indicates the position of the i-th search agent in dimension d at iteration t, ¬x_i^d(t) represents its complement, and r is a random value between 0 and 1. If the velocity is low, TF_v encourages search agents to stay in their current positions; if the velocity is high, search agents switch to their complements. In addition to the function introduced in Equation (7), three variants of the V-shaped function, named V1, V3, and V4, have been introduced [74]; they are shown in Table 1 and Figure 1. It can be seen that V1 provides a higher probability than V2, V3, and V4 for the same velocity, which affects the position updating of search agents and the search for the optimum solution. The V-shaped transfer function was proposed to tackle some shortcomings of the S-shaped one. Although it solves the zero-velocity problem, algorithms using it still suffer from falling into local optima: if the velocity of a search agent is low in one iteration, the agent remains in the same position with high probability in the next iteration [103]. In this study, in addition to the transfer functions mentioned so far, we utilized a novel U-shaped transfer function to convert the continuous MFO to binary form.
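The difference from the S-shaped rule is easiest to see in code. The sketch below applies the V2 function of Equation (7) and the flip-or-keep rule of Equation (8); it is an illustration, not the authors' implementation.

```python
import numpy as np

def v_shaped_update(x_bin, v):
    p = np.abs(np.tanh(v))                    # Equation (7), variant V2
    r = np.random.rand(*np.shape(x_bin))
    return np.where(r < p, 1 - x_bin, x_bin)  # Equation (8): flip vs. keep

# Zero velocity gives p = 0, so the agent keeps its bit, fixing the
# S-shaped shortcoming; a large |v| makes a flip almost certain.
bits = v_shaped_update(np.array([0, 1, 1, 0]), np.array([0.0, 0.1, 2.5, -2.5]))
```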

4.1.3. B-MFO Using U-Shaped Transfer Function

The U-shaped transfer function [76] has two control parameters, α and β, which define the slope and the width of the U-shaped function's basin, respectively. Equations (9) and (10) define the U-shaped function and its update rule.
TF_u(v_i^d(t+1)) = \alpha \left| \left( v_i^d(t) \right)^{\beta} \right|, \quad \alpha = 1,\ \beta = 1.5,\ 2,\ 3,\ 4
x_i^d(t+1) = \begin{cases} \neg x_i^d(t), & \text{if } r < TF_u(v_i^d(t+1)) \\ x_i^d(t), & \text{if } r \ge TF_u(v_i^d(t+1)) \end{cases}
where v_i^d(t) is the velocity of the i-th search agent in dimension d at iteration t, and r is a uniform random number between 0 and 1. We used the U-shaped transfer function together with the two conditions shown in Equations (11) and (12), which bound it above by 1.
\lim_{v_i^d \to +\infty} U(v_i^d(t)) = 1
\lim_{v_i^d \to -\infty} U(v_i^d(t)) = 1
The variants of the U-shaped transfer function, named U1, U2, U3, and U4, were introduced using different values of the control parameter β [76]; they are shown in Table 1 and Figure 1. In the initial iterations, exploration is important for covering the whole search space; after the switch from exploration to exploitation, in the final iterations, exploitation is essential for finding better solutions. U-shaped transfer functions with different shapes exhibit different exploratory and exploitative behaviors. As illustrated in Figure 1, the U-shaped functions are comparable to the V-shaped ones; however, the U-shaped variants show higher exploration than the V-shaped variants. The U1 function and the V-shaped variants intersect around +0.7 and −0.7; before these points, the exploration of the V-shaped variants is higher, and after them, the U-shaped variants display higher exploration. Therefore, U-shaped transfer functions have sufficient potential to outperform the other transfer functions.
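The family is small enough to sketch directly. In the snippet below, α is fixed to 1 and β selects the variant, as in Equation (9); clipping the output to 1 is an assumption made here so that the probabilities respect the bounds of Equations (11) and (12).

```python
import numpy as np

def u_shaped(v, alpha=1.0, beta=2.0):
    # Equation (9) with the output clipped to 1 (an assumption made here).
    return np.minimum(alpha * np.abs(v) ** beta, 1.0)

# For |v| < 1, a smaller beta (U1) yields higher flip probabilities,
# i.e., more exploration, matching the crossover near +/-0.7 in Figure 1.
for name, beta in [("U1", 1.5), ("U2", 2), ("U3", 3), ("U4", 4)]:
    print(name, u_shaped(np.array([0.25, 0.5, 1.0]), beta=beta))
```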
To map the continuous MFO to a binary one, each dimension of a search agent's position obtained by Equation (2) is converted to a probability value in the range [0, 1] using the variants of TF_s, TF_v, and TF_u. We therefore adapted twelve different transfer functions from the three categories S-shaped, V-shaped, and U-shaped. Using these transfer functions, each search agent's position is mapped to a probability value by Equations (5), (7), and (9); this probability then drives the updates in Equations (6), (8), and (10), producing a new search agent position. We thus compare twelve versions of the proposed B-MFO to find the most suitable one. Applying a proper transfer function is important, since the way a continuous search space is converted to a binary one significantly affects the classifier results in feature selection problems. Algorithm 1 and Figure 2 show the pseudo-code and the flowchart of B-MFO, respectively. The time complexity of B-MFO is O(N · D · T), where N, D, and T signify the population size, the dimension, and the maximum number of iterations, respectively.
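Putting the three categories together, the binarization step can be written as one small function: the continuous position drives a probability, and the category decides whether that probability sets the bit directly (S-shaped) or flips it (V- and U-shaped). This is a sketch with illustrative names, not the paper's code.

```python
import numpy as np

def binarize(x_bin, x_cont, tf, category):
    p = tf(x_cont)                                # probability in [0, 1]
    r = np.random.rand(*np.shape(x_bin))
    if category == "S":
        return np.where(r < p, 0, 1)              # Equation (6)
    return np.where(r < p, 1 - x_bin, x_bin)      # Equations (8) and (10)

# Example with the U3 variant (alpha = 1, beta = 3), clipped to 1:
u3 = lambda x: np.minimum(np.abs(x) ** 3, 1.0)
bits = binarize(np.zeros(5, dtype=int),
                np.array([0.1, -0.4, 0.9, -1.2, 0.0]), u3, "U")
```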

4.2. B-MFO for Solving Feature Selection Problem

The feature selection problem is to select an optimum subset of relevant and effective features to construct a more accurate data model. To formulate the problem, a binary vector representing a subset of features is defined; a transfer function supplies the probability of changing each element of the vector, which can be 0 (not selected) or 1 (selected). The length of the vector equals the number of dimensions of the dataset. In addition, a fitness function is defined to evaluate the subset of features. Feature selection is referred to as a multi-objective optimization problem [105,106], since it usually aims to minimize the number of selected features while maximizing the data model's accuracy. As shown in Equation (13), the objectives are combined in a fitness function, where CE is the classification error, N_sf and N_tf are the number of selected features and the total number of features of the dataset, respectively, and η and λ = 1 − η denote the importance of classification quality and feature reduction, respectively [46].
Fitness = \eta \cdot CE + \lambda \cdot \frac{N_{sf}}{N_{tf}}
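As a sketch, Equation (13) is a one-liner once the classification error is known. The weight η = 0.99 below is a common setting in the cited wrapper literature [46] and is assumed here, since this section does not fix its value.

```python
import numpy as np

def fitness(subset, classification_error, eta=0.99):
    # Equation (13): eta weights the error, lambda = 1 - eta weights the
    # fraction of selected features. eta = 0.99 is an assumed value.
    n_sf = np.sum(subset)          # number of selected features
    n_tf = subset.size             # total number of features
    return eta * classification_error + (1.0 - eta) * n_sf / n_tf

print(fitness(np.array([1, 0, 1, 1, 0]), classification_error=0.08))
```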
Algorithm 1. The pseudo-code of B-MFO.
Input: N (population size), MaxIter (maximum number of iterations), dim (number of dimensions).
Output: The global optimum (best flame).
1:  Procedure B-MFO
2:    Initialize the moth population.
3:    While iter < MaxIter
4:      Update the number of flames (flameNo) using Equation (4).
5:      Calculate the fitness of the moth population M as OM.
6:      If iter == 1
7:        OF = sort(OM_1).
8:        F = sort(M_1).
9:      Else
10:       OF = sort(OM_(iter−1), OM_iter).
11:       F = sort(M_(iter−1), M_iter).
12:     End if
13:     Determine the best flame.
14:     For i = 1 : N
15:       For j = 1 : dim
16:         Update r and t.
17:         Calculate D using Equation (3).
18:         Update M(i, j) using Equations (1) and (2).
19:       End for
20:     End for
21:     Calculate the probability value of M(i, j) using TF_s in Equation (5), TF_v in Equation (7), or TF_u in Equation (9).
22:     Update the new position using Equation (6), (8), or (10).
23:     iter = iter + 1.
24:   End while
25:   Return the global optimum (best flame).
26: End Procedure
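For reference, Algorithm 1 condenses to a few dozen lines of Python. The sketch below is a minimal, self-contained reading of the pseudo-code using the V2 transfer function and the flip rule of Equation (8); the toy fitness counts set bits, standing in for the wrapper fitness of Equation (13), and none of this is the authors' MATLAB code.

```python
import numpy as np

def b_mfo(fitness, dim, N=20, max_iter=300, b=1.0):
    M = np.random.uniform(-1, 1, (N, dim))               # continuous moth positions
    M_bin = (np.random.rand(N, dim) < 0.5).astype(int)   # binary counterparts
    OM = np.array([fitness(m) for m in M_bin])
    F, F_bin, OF = M.copy(), M_bin.copy(), OM.copy()     # flames: best-so-far

    for it in range(1, max_iter + 1):
        flame_no = round(N - it * (N - 1) / max_iter)    # Equation (4)
        # Lines 6-12: merge moths into the flame list, keep the N fittest.
        keep = np.argsort(np.concatenate([OF, OM]))[:N]
        F = np.concatenate([F, M])[keep]
        F_bin = np.concatenate([F_bin, M_bin])[keep]
        OF = np.concatenate([OF, OM])[keep]

        r = -1.0 - it / max_iter                         # convergence factor: -1 -> -2
        for i in range(N):
            j = min(i, flame_no - 1)                     # moth i follows flame j
            D = np.abs(M[i] - F[j])                      # Equation (3)
            t = (r - 1.0) * np.random.rand(dim) + 1.0    # t drawn from [r, 1]
            M[i] = D * np.exp(b * t) * np.cos(2 * np.pi * t) + F[j]  # Equation (2)

        # Lines 21-22: V2 probability (Equation (7)) and Equation (8) flip rule.
        p = np.abs(np.tanh(M))
        flip = np.random.rand(N, dim) < p
        M_bin = np.where(flip, 1 - M_bin, M_bin)
        OM = np.array([fitness(m) for m in M_bin])

    return F_bin[0], OF[0]                               # best flame and its fitness

best_bits, best_fit = b_mfo(lambda bits: bits.sum(), dim=30, N=10, max_iter=50)
```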

5. Experimental Assessment

To execute the proposed B-MFO and the comparative algorithms under fair conditions, all algorithms were implemented on the MATLAB 2018a platform. They were run on Windows 10 with an Intel Core i7-6500U CPU (2.50 GHz) and 8 GB of main memory. The population size (N) and the maximum number of iterations (MaxIter) were set to 20 and 300, respectively, and each algorithm was run 30 times. The proposed B-MFO was experimentally evaluated with all transfer functions of the three categories S-shaped [44], V-shaped [101], and U-shaped [76] for solving the feature selection problem. The experimental results were compared with the best results gained by four well-known binary metaheuristic algorithms: BPSO [44], bGWO [45], BDA [46], and BSSA [47]. The parameters of BPSO and bGWO were set as in the original studies: for BPSO, w decreases from 0.9 to 0.4 and C1 = C2 = 2; for bGWO, a decreases from 2 to 0. The other algorithms did not require any parameter settings.

5.1. Data Description

In this study, seven medical datasets [107,108] were used to evaluate B-MFO and the comparative algorithms on the feature selection problem. Table 2 shows the details of the datasets in terms of the number of features, the number of samples, and the size; a dataset is considered large if it has more than 100 features. In our evaluation, a k-nearest neighbor (k-NN) classifier with a Euclidean distance metric and k_neighbor = 5 [56] was applied as the fitness function to assess the quality of the selected feature subsets. To reduce overfitting, k-fold cross-validation with k_fold = 10 was utilized: the dataset is divided into k folds, the classifier uses k − 1 folds for training and the remaining fold for testing, and the process is repeated until each fold has been used once as test data.
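Under these settings, evaluating one candidate subset can be sketched with scikit-learn (an assumption; the paper's experiments were run in MATLAB). The boolean mask must keep at least one feature.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def subset_error(X, y, subset):
    # 5-NN with Euclidean distance, scored by 10-fold cross-validation;
    # returns the classification error CE used in Equation (13).
    clf = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
    scores = cross_val_score(clf, X[:, subset.astype(bool)], y, cv=10)
    return 1.0 - scores.mean()
```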

5.2. Evaluation Criteria

The proposed B-MFO was compared with the comparative algorithms using various metrics: average accuracy, standard deviation of accuracy, average fitness, standard deviation of fitness, and average number of selected features. Moreover, the performance of the k-NN classifier was measured using sensitivity and specificity derived from the confusion matrix, which contains the actual and predicted classifications given by the classifier. Sensitivity evaluates the model's ability to predict true positives, and specificity measures its ability to predict true negatives. The average accuracies gained by B-MFO and the comparative algorithms were statistically analyzed with the nonparametric Friedman test [109]. In addition, the convergence behavior of B-MFO and the comparative algorithms was visualized.
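As an illustration, the Friedman comparison can be reproduced with SciPy. The snippet below uses the per-dataset average accuracies from Table 3; the paper's test may rank over individual runs, so this is only a sketch.

```python
from scipy.stats import friedmanchisquare

# Average accuracies per dataset (rows of Table 3), one list per algorithm.
bpso = [0.7922, 0.9163, 0.9710, 0.9626, 0.7952, 0.9625, 0.9988]
bgwo = [0.7726, 0.8694, 0.9626, 0.9541, 0.7736, 0.9526, 0.9901]
bda  = [0.7849, 0.9041, 0.9666, 0.9507, 0.7643, 0.9296, 0.9703]
bssa = [0.7798, 0.8882, 0.9655, 0.9567, 0.7793, 0.9535, 0.9954]
bmfo = [0.7902, 0.9095, 0.9719, 0.9692, 0.8603, 0.9694, 0.9998]

stat, p_value = friedmanchisquare(bpso, bgwo, bda, bssa, bmfo)
print(f"Friedman statistic = {stat:.3f}, p = {p_value:.4f}")
```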

5.3. Discussion of the Results

In this section, the best results achieved by B-MFO using the three categories S-shaped, V-shaped, and U-shaped on each dataset are compared with the comparative algorithms in terms of various metrics. Table 3 reports the average accuracy, the standard deviation of accuracy, and the average number of selected features. The average fitness and the standard deviation of fitness are given in Table 4, where bold values mark the best results. In addition, Table 5 reports the specificity and sensitivity achieved by the k-NN classifier on the large datasets, showing that B-MFO presents better results than the comparative algorithms. Our hypothesis is that the sensitivity and specificity of B-MFO are more reliable than those of the comparative algorithms as the size of the dataset increases. According to Figure 3 and the average accuracies obtained, B-MFO outperforms the comparative algorithms, especially on large datasets. Moreover, on most datasets, the minimal number of features selected by B-MFO indicates that it avoids local optima trapping and finds the optimum solution. Figure 4 presents the average number of selected features on the large datasets PenglungEW, Parkinson, Colon, and Leukemia. These results indicate the significant effect of transfer functions on the algorithms' behavior in updating the positions of search agents and finding the optimum solution in the feature selection problem. Among the three categories of transfer functions used by B-MFO, the U-shaped transfer functions outperform the V-shaped and S-shaped ones in terms of maximizing classification accuracy and minimizing the number of selected features, especially on large datasets.
Figure 5 shows the convergence curves of the average fitness achieved by the winner versions of B-MFO and the comparative algorithms. The curves show that B-MFO finds better solutions and provides a balance between exploration and exploitation. To statistically analyze the results, the Friedman test was applied to rank B-MFO and the comparative algorithms. Table 6 presents the results of the Friedman test on the average accuracies, which show that B-MFO ranks first among the compared algorithms.

6. Conclusions and Future Work

Numerous large datasets that include redundant and irrelevant features have been created in the field of medical technology. To select effective features from different medical datasets, this study proposed three categories of binary moth-flame optimization (B-MFO). The canonical MFO was converted from continuous to binary using variants of S-shaped, V-shaped, and U-shaped transfer functions. Each category contains four versions of the transfer function; accordingly, twelve versions of B-MFO were experimentally evaluated on seven medical datasets. Finally, the winner versions of B-MFO were compared with the best results achieved by four well-known binary metaheuristic optimization algorithms: BPSO, bGWO, BDA, and BSSA. The results show that B-MFO outperforms the comparative algorithms in terms of classification accuracy and the number of selected features, especially on large medical datasets. In addition, among the transfer function variants used by B-MFO, the U-shaped functions outperform the V-shaped and S-shaped ones in both classification accuracy and the number of selected features. In future work, B-MFO, particularly with U-shaped transfer functions, can be applied to select effective features in large-scale optimization problems, since it showed strong results on large datasets. In addition, B-MFO can be applied to other domains, such as various engineering applications.

Author Contributions

Conceptualization, M.H.N.-S., M.B.-D.; Methodology, M.H.N.-S., M.B.-D.; Software, M.H.N.-S., M.B.-D.; Validation, M.H.N.-S., M.B.-D., H.Z., S.T.; Formal analysis, M.H.N.-S., M.B.-D., H.Z., S.T.; Investigation, M.H.N.-S., M.B.-D., H.Z., S.T.; Resources, M.H.N.-S., M.B.-D., H.Z., S.T., S.M.; Data curation, M.H.N.-S., M.B.-D.; Writing, M.H.N.-S., M.B.-D.; Original draft preparation, M.H.N.-S., M.B.-D.; Writing—review and editing, M.H.N.-S., M.B.-D., H.Z., S.T., S.M.; Visualization, M.H.N.-S., M.B.-D., H.Z., S.T., S.M.; Supervision, M.H.N.-S.; Project administration, M.H.N.-S., S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sorkhabi, L.B.; Gharehchopogh, F.S.; Shahamfar, J. A systematic approach for pre-processing electronic health records for mining: Case study of heart disease. Int. J. Data Min. Bioinform. 2020, 24, 97.
2. Esfandiari, N.; Babavalian, M.R.; Moghadam, A.-M.E.; Tabar, V.K. Knowledge discovery in medicine: Current issue and future trend. Expert Syst. Appl. 2014, 41, 4434–4463.
3. Zhang, B.; Cao, P. Classification of high dimensional biomedical data based on feature selection using redundant removal. PLoS ONE 2019, 14, e0214406.
4. Arjenaki, H.G.; Nadimi-Shahraki, M.H.; Nourafza, N. A low cost model for diagnosing coronary artery disease based on effective features. IJECCE 2015, 6, 93–97.
5. Dezfuly, M.; Sajedi, H. Predict Survival of Patients with Lung Cancer Using an Ensemble Feature Selection Algorithm and Classification Methods in Data Mining. J. Inf. 2015, 1, 1–11.
6. Mohammadzadeh, H.; Gharehchopogh, F.S. Novel Hybrid Whale Optimization Algorithm with Flower Pollination Algorithm for Feature Selection: Case Study Email Spam Detection. Comput. Intell. 2021, 37, 176–209.
7. Abualigah, L.M.; Khader, A.T.; Al-Betar, M.A. Unsupervised feature selection technique based on genetic algorithm for improving the Text Clustering. In Proceedings of the 2016 7th International Conference on Computer Science and Information Technology (CSIT), Amman, Jordan, 13–14 July 2016; pp. 1–6.
8. Abualigah, L.M.Q. Feature Selection and Enhanced Krill Herd Algorithm for Text Document Clustering; Springer: Berlin, Germany, 2019.
9. Helmi, A.M.; Al-Qaness, M.A.; Dahou, A.; Damaševičius, R.; Krilavičius, T.; Elaziz, M.A. A novel hybrid gradient-based optimizer and grey wolf optimizer feature selection method for human activity recognition using smartphone sensors. Entropy 2021, 23, 1065.
10. Liu, H.; Yu, L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502.
11. Xue, B.; Zhang, M.; Browne, W.N. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Appl. Soft Comput. 2014, 18, 261–276.
12. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
13. Pashaei, E.; Aydin, N. Binary black hole algorithm for feature selection and classification on biological data. Appl. Soft Comput. 2017, 56, 94–106.
14. de Souza, R.C.T.; de Macedo, C.A.; dos Santos Coelho, L.; Pierezan, J.; Mariani, V.C. Binary coyote optimization algorithm for feature selection. Pattern Recognit. 2020, 107, 107470.
15. Faris, H.; Heidari, A.A.; Al-Zoubi, A.M.; Mafarja, M.; Aljarah, I.; Eshtay, M.; Mirjalili, S. Time-varying hierarchical chains of salps with random weight networks for feature selection. Expert Syst. Appl. 2019, 140, 112898.
16. Taghian, S.; Nadimi-Shahraki, M.H. Binary Sine Cosine Algorithms for Feature Selection from Medical Data. Adv. Comput. Int. J. 2019, 10, 1–10.
17. Eberhart, R.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science (MHS'95), Nagoya, Japan, 4–6 October 1995.
18. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
19. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
20. Mirjalili, S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl. Based Syst. 2015, 89, 228–249.
21. Abualigah, L.; Yousri, D.; Elaziz, M.A.; Ewees, A.A.; Al-Qaness, M.A.; Gandomi, A.H. Aquila Optimizer: A novel meta-heuristic optimization algorithm. Comput. Ind. Eng. 2021, 157, 107250.
22. Zamani, H.; Nadimi-Shahraki, M.H.; Gandomi, A. CCSA: Conscious Neighborhood-based Crow Search Algorithm for Solving Global Optimization Problems. Appl. Soft Comput. 2019, 85, 105583.
23. Askarzadeh, A. A novel metaheuristic method for solving constrained engineering optimization problems: Crow search algorithm. Comput. Struct. 2016, 169, 1–12.
24. Shi, Y.; Pun, C.-M.; Hu, H.; Gao, H. An improved artificial bee colony and its application. Knowl. Based Syst. 2016, 107, 14–31.
25. Shang, Y. Optimal Control Strategies for Virus Spreading in Inhomogeneous Epidemic Dynamics. Can. Math. Bull. 2013, 56, 621–629.
26. Oliva, D.; Abd El Aziz, M.; Hassanien, A.E. Parameter estimation of photovoltaic cells using an improved chaotic whale optimization algorithm. Appl. Energy 2017, 200, 141–154.
27. Ewees, A.A.; Abd Elaziz, M.; Houssein, E.H. Improved grasshopper optimization algorithm using opposition-based learning. Expert Syst. Appl. 2018, 112, 156–172.
28. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S. An improved grey wolf optimizer for solving engineering problems. Expert Syst. Appl. 2020, 166, 113917.
29. Liu, H.; Ding, G.; Wang, B. Bare-bones particle swarm optimization with disruption operator. Appl. Math. Comput. 2014, 238, 106–122.
30. Guo, L.; Wang, G.-G.; Gandomi, A.; Alavi, A.H.; Duan, H. A new improved krill herd algorithm for global numerical optimization. Neurocomputing 2014, 138, 392–402.
31. Zaman, H.R.R.; Gharehchopogh, F.S. An improved particle swarm optimization with backtracking search optimization algorithm for solving continuous optimization problems. Eng. Comput. 2021, 1–35.
32. Banaie-Dezfouli, M.; Nadimi-Shahraki, M.H.; Beheshti, Z. R-GWO: Representative-based grey wolf optimizer for solving engineering problems. Appl. Soft Comput. 2021, 106, 107328.
33. Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. From data mining to knowledge discovery in databases. AI Mag. 1996, 17, 37.
34. Taghian, S.; Nadimi-Shahraki, M.H. A binary metaheuristic algorithm for wrapper feature selection. Int. J. Comput. Sci. Eng. 2019, 8, 168–172.
35. Mafarja, M.; Aljarah, I.; Faris, H.; Hammouri, A.I.; Al-Zoubi, A.M.; Mirjalili, S. Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Syst. Appl. 2018, 117, 267–286.
36. Arora, S.; Anand, P. Binary butterfly optimization approaches for feature selection. Expert Syst. Appl. 2018, 116, 147–160.
37. Mohammadzadeh, H.; Gharehchopogh, F.S. Feature Selection with Binary Symbiotic Organisms Search Algorithm for Email Spam Detection. Int. J. Inf. Technol. Decis. Mak. 2021, 20, 469–515.
38. Safaldin, M.; Otair, M.; Abualigah, L. Improved binary gray wolf optimizer and SVM for intrusion detection system in wireless sensor networks. J. Ambient. Intell. Humaniz. Comput. 2020, 12, 1559–1576.
39. Khurma, R.; Alsawalqah, H.; Aljarah, I.; Elaziz, M.; Damaševičius, R. An Enhanced Evolutionary Software Defect Prediction Method Using Island Moth Flame Optimization. Mathematics 2021, 9, 1722.
40. Elsakaan, A.A.; El-Sehiemy, R.A.; Kaddah, S.S.; Elsaid, M.I. An enhanced moth-flame optimizer for solving non-smooth economic dispatch problems with emissions. Energy 2018, 157, 1063–1078.
41. Pelusi, D.; Mascella, R.; Tallini, L.; Nayak, J.; Naik, B.; Deng, Y. An Improved Moth-Flame Optimization algorithm with hybrid search phase. Knowl. Based Syst. 2019, 191, 105277.
42. Xu, Y.; Chen, H.; Heidari, A.A.; Luo, J.; Zhang, Q.; Zhao, X.; Li, C. An efficient chaotic mutative moth-flame-inspired optimizer for global optimization tasks. Expert Syst. Appl. 2019, 129, 135–155.
43. Neshat, M.; Sergiienko, N.; Mirjalili, S.; Nezhad, M.M.; Piras, G.; Garcia, D.A. Multi-Mode Wave Energy Converter Design Optimisation Using an Improved Moth Flame Optimisation Algorithm. Energies 2021, 14, 3737.
44. Kennedy, J.; Eberhart, R. A discrete binary version of the particle swarm algorithm. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics: Computational Cybernetics and Simulation, Orlando, FL, USA, 12–15 October 1997; pp. 4104–4108.
45. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary grey wolf optimization approaches for feature selection. Neurocomputing 2016, 172, 371–381.
46. Mafarja, M.; Aljarah, I.; Heidari, A.A.; Faris, H.; Fournier-Viger, P.; Li, X.; Mirjalili, S. Binary dragonfly optimization for feature selection using time-varying transfer functions. Knowl. Based Syst. 2018, 161, 185–204.
47. Faris, H.; Mafarja, M.M.; Heidari, A.A.; Aljarah, I.; Al-Zoubi, A.; Mirjalili, S.; Fujita, H. An efficient binary Salp Swarm Algorithm with crossover scheme for feature selection problems. Knowl. Based Syst. 2018, 154, 43–67.
48. El Aziz, M.A.; Hassanien, A.E. Modified cuckoo search algorithm with rough sets for feature selection. Neural Comput. Appl. 2016, 29, 925–934.
49. Ibrahim, R.A.; Elaziz, M.A.; Oliva, D.; Cuevas, E.; Lu, S. An opposition-based social spider optimization for feature selection. Soft Comput. 2019, 23, 13547–13567.
50. Dezfouli, M.B.; Nadimi-Shahraki, M.H.; Zamani, H. A novel tour planning model using big data. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing, New York, NY, USA, 28–30 September 2018; pp. 1–6.
51. Shang, Y.; Bouffanais, R. Consensus reaching in swarms ruled by a hybrid metric-topological distance. Eur. Phys. J. B 2014, 87, 1–7.
52. Benyamin, A.; Farhad, S.G.; Saeid, B. Discrete farmland fertility optimization algorithm with metropolis acceptance criterion for traveling salesman problems. Int. J. Intell. Syst. 2020, 36, 1270–1303.
53. Talbi, E.-G. Metaheuristics: From Design to Implementation; Wiley Publishing: Hoboken, NJ, USA, 2009.
54. Hafez, A.I.; Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Sine cosine optimization algorithm for feature selection. In Proceedings of the 2016 International Symposium on INnovations in Intelligent SysTems and Applications, Sinaia, Romania, 2–5 August 2016.
55. Mafarja, M.; Mirjalili, S. Whale optimization approaches for wrapper feature selection. Appl. Soft Comput. 2018, 62, 441–453.
56. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary ant lion approaches for feature selection. Neurocomputing 2016, 213, 54–65.
57. Ghimatgar, H.; Kazemi, K.; Helfroush, M.S.; Aarabi, A. An improved feature selection algorithm based on graph clustering and ant colony optimization. Knowl. Based Syst. 2018, 159, 270–285.
58. Mirjalili, S. Dragonfly algorithm: A new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Comput. Appl. 2015, 27, 1053–1073.
59. Mirjalili, S.; Gandomi, A.H.; Mirjalili, S.Z.; Saremi, S.; Faris, H.; Mirjalili, S.M. Salp Swarm Algorithm: A bio-inspired optimizer for engineering design problems. Adv. Eng. Softw. 2017, 114, 163–191.
60. Hashim, F.A.; Houssein, E.H.; Mabrouk, M.S.; Al-Atabany, W.; Mirjalili, S. Henry gas solubility optimization: A novel physics-based algorithm. Futur. Gener. Comput. Syst. 2019, 101, 646–667.
61. Shayanfar, H.; Gharehchopogh, F.S. Farmland fertility: A new metaheuristic algorithm for solving continuous optimization problems. Appl. Soft Comput. 2018, 71, 728–746.
62. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S.; Faris, H. MTDE: An effective multi-trial vector-based differential evolution algorithm and its applications for engineering design problems. Appl. Soft Comput. 2020, 97, 106761.
63. Zamani, H.; Nadimi-Shahraki, M.H.; Gandomi, A.H. QANA: Quantum-based avian navigation optimizer algorithm. Eng. Appl. Artif. Intell. 2021, 104, 104314.
64. Zamani, H.; Nadimi-Shahraki, M.H.; Taghian, S.; Banaie-Dezfouli, M. Enhancement of bernstain-search differential evolution algorithm to solve constrained engineering problems. Int. J. Comput. Sci. Eng. 2020, 386–396.
65. Ghasemi, M.R.; Varaee, H. Enhanced IGMM optimization algorithm based on vibration for numerical and engineering problems. Eng. Comput. 2017, 34, 91–116.
66. Varaee, H.; Hamzehkolaei, N.S.; Safari, M. A Hybrid Generalized Reduced Gradient-Based Particle Swarm Optimizer for Constrained Engineering Optimization Problems. J. Soft Comput. Civil Eng. 2021, 5, 86–119.
67. Sayarshad, H.R. Using bees algorithm for material handling equipment planning in manufacturing systems. Int. J. Adv. Manuf. Technol. 2009, 48, 1009–1018.
68. Fard, E.S.; Monfaredi, K.; Nadimi-Shahraki, M.H. An Area-Optimized Chip of Ant Colony Algorithm Design in Hardware Platform Using the Address-Based Method. Int. J. Electr. Comput. Eng. 2014, 4, 989.
69. Tadi, A.A.; Aghajanloo, Z. Load Balancing in Cloud Computing using Cuckoo Optimization Algorithm. J. Innov. Res. Eng. Sci. 2018, 4, 4.
70. Zahrani, H.K.; Nadimi-Shahraki, M.H.; Sayarshad, H.R. An intelligent social-based method for rail-car fleet sizing problem. J. Rail Transp. Plan. Manag. 2020, 17, 100231.
71. Taghian, S.; Nadimi-Shahraki, M.H.; Zamani, H. Comparative Analysis of Transfer Function-based Binary Metaheuristic Algorithms for Feature Selection. In Proceedings of the International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey, 28–30 September 2018; pp. 1–6.
72. Aslan, M.; Gunduz, M.; Kiran, M.S. JayaX: Jaya algorithm with xor operator for binary optimization. Appl. Soft Comput. 2019, 82, 105576.
73. Jia, D.; Duan, X.; Khan, M.K. Binary Artificial Bee Colony optimization using bitwise operation. Comput. Ind. Eng. 2014, 76, 360–365.
74. Mirjalili, S.; Lewis, A. S-shaped versus V-shaped transfer functions for binary Particle Swarm Optimization. Swarm Evol. Comput. 2013, 9, 1–14.
75. Nezamabadi-pour, H.; Rostami-Shahrbabaki, M.; Maghfoori-Farsangi, M. Binary particle swarm optimization: Challenges and new solutions. CSI J. Comput. Sci. Eng. 2008, 6, 21–32.
76. Mirjalili, S.; Zhang, H.; Mirjalili, S.; Chalup, S.; Noman, N. A Novel U-Shaped Transfer Function for Binary Particle Swarm Optimisation. In Soft Computing for Problem Solving 2019; Springer: Singapore, 2019.
77. Pedrasa, M.A.A.; Spooner, T.D.; MacGill, I.F. Scheduling of Demand Side Resources Using Binary Particle Swarm Optimization. IEEE Trans. Power Syst. 2009, 24, 1173–1181.
78. Liao, C.-J.; Tseng, C.-T.; Luarn, P. A discrete version of particle swarm optimization for flowshop scheduling problems. Comput. Oper. Res. 2007, 34, 3099–3111.
79. Lin, J.C.-W.; Yang, L.; Fournier-Viger, P.; Hong, T.-P.; Voznak, M. A binary PSO approach to mine high-utility itemsets. Soft Comput. 2016, 21, 5103–5121.
80. Yuan, X.; Nie, H.; Su, A.; Wang, L.; Yuan, Y. An improved binary particle swarm optimization for unit commitment problem. Expert Syst. Appl. 2009, 36, 8049–8055.
81. Bharti, K.K.; Singh, P.K. Opposition chaotic fitness mutation based adaptive inertia weight BPSO for feature selection in text clustering. Appl. Soft Comput. 2016, 43, 20–34.
82. Abualigah, L.; Khader, A.T.; Hanandeh, E.S. A new feature selection method to improve the document clustering using particle swarm optimization algorithm. J. Comput. Sci. 2018, 25, 456–466.
83. Lu, Y.; Liang, M.; Ye, Z.; Cao, L. Improved particle swarm optimization algorithm and its application in text feature selection. Appl. Soft Comput. 2015, 35, 629–636.
84. Sheikhpour, R.; Sarram, M.A.; Sheikhpour, R. Particle swarm optimization for bandwidth determination and feature selection of kernel density estimation based classifiers in diagnosis of breast cancer. Appl. Soft Comput. 2016, 40, 113–131.
85. Gunasundari, S.; Janakiraman, S.; Meenambal, S. Velocity Bounded Boolean Particle Swarm Optimization for improved feature selection in liver and kidney disease diagnosis. Expert Syst. Appl. 2016, 56, 28–47.
86. Zamani, H.; Nadimi-Shahraki, M.H. Swarm Intelligence Approach for Breast Cancer Diagnosis. Int. J. Comput. Appl. 2016, 151, 40–44.
87. Al-Tashi, Q.; Kadir, S.J.A.; Rais, H.M.; Mirjalili, S.; Alhussian, H. Binary Optimization Using Hybrid Grey Wolf Optimization for Feature Selection. IEEE Access 2019, 7, 39496–39508.
88. Panwar, L.K.; Reddy, S.; Verma, A.; Panigrahi, B.K.; Kumar, R. Binary Grey Wolf Optimizer for large scale unit commitment problem. Swarm Evol. Comput. 2018, 38, 251–266.
89. Chantar, H.; Mafarja, M.; Alsawalqah, H.; Heidari, A.A.; Aljarah, I.; Faris, H. Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text classification. Neural Comput. Appl. 2019, 32, 12201–12220.
90. Hu, P.; Pan, J.-S.; Chu, S.-C. Improved Binary Grey Wolf Optimizer and Its application for feature selection. Knowl. Based Syst. 2020, 195, 105746.
91. Zamani, H.; Nadimi-Shahraki, M.H. Feature selection based on whale optimization algorithm for diseases diagnosis. Int. J. Comput. Sci. Inf. Secur. 2016, 14, 1243.
92. Hussien, A.G.; Houssein, E.H.; Hassanien, A.E. A binary whale optimization algorithm with hyperbolic tangent fitness function for feature selection. In Proceedings of the Eighth International Conference on Intelligent Computing and Information Systems, Cairo, Egypt, 5–7 December 2017; pp. 166–172.
93. Hussien, A.G.; Hassanien, A.E.; Houssein, E.H.; Bhattacharyya, S.; Amin, M. S-shaped binary whale optimization algorithm for feature selection. In Recent Trends in Signal and Image Processing; Springer: Singapore, 2019; pp. 79–87.
94. Reddy, K.S.; Panwar, L.; Panigrahi, B.K.; Kumar, R. Binary whale optimization algorithm: A new metaheuristic approach for profit-based unit commitment problems in competitive electricity markets. Eng. Optim. 2019, 51, 369–389.
95. Mafarja, M.M.; Eleyan, D.; Jaber, I.; Hammouri, A.; Mirjalili, S. Binary dragonfly algorithm for feature selection. In Proceedings of the 2017 International Conference on New Trends in Computing Sciences (ICTCS), Amman, Jordan, 11–13 October 2017.
96. Sawhney, R.; Jain, R. Modified Binary Dragonfly Algorithm for Feature Selection in Human Papillomavirus-Mediated Disease Treatment. In Proceedings of the 2018 International Conference on Communication, Computing and Internet of Things (IC3IoT), Tamil Nadu, India, 15–17 February 2018; pp. 91–95.
97. Too, J.; Mirjalili, S. A Hyper Learning Binary Dragonfly Algorithm for Feature Selection: A COVID-19 Case Study. Knowl. Based Syst. 2020, 212, 106553.
98. Ibrahim, R.A.; Ewees, A.; Oliva, D.; Elaziz, M.A.; Lu, S. Improved salp swarm algorithm based on particle swarm optimization for feature selection. J. Ambient. Intell. Humaniz. Comput. 2018, 10, 3155–3169.
99. Meraihi, Y.; Ramdane-Cherif, A.; Mahseur, M.; Achelia, D. A chaotic binary salp swarm algorithm for solving the graph coloring problem. In International Symposium on Modelling and Implementation of Complex Systems; Springer: Cham, Switzerland, 2018; pp. 106–118.
100. Geem, Z.W.; Kim, J.H.; Loganathan, G. A New Heuristic Optimization Algorithm: Harmony Search. Simulation 2001, 76, 60–68.
101. Rashedi, E.; Nezamabadi-Pour, H.; Saryazdi, S. BGSA: Binary gravitational search algorithm. Nat. Comput. 2009, 9, 727–745.
102. Reddy, S.; Panwar, L.K.; Panigrahi, B.K.; Kumar, R. Solution to Unit Commitment in Power System Operation Planning Using Modified Moth Flame Optimization Algorithm (MMFOA): A Flame Selection Based Computational Technique. J. Comput. Sci. 2017, 25, 298–317.
103. Ghosh, K.K.; Singh, P.K.; Hong, J.; Geem, Z.W.; Sarkar, R. Binary Social Mimic Optimization Algorithm With X-Shaped Transfer Function for Feature Selection. IEEE Access 2020, 8, 97890–97906.
104. Wang, L.; Fu, X.; Menhas, M.I.; Fei, M. A Modified Binary Differential Evolution Algorithm; Springer: Berlin, Germany, 2010; pp. 49–57.
105. Michalak, K. Selecting Best Investment Opportunities from Stock Portfolios Optimized by a Multiobjective Evolutionary Algorithm. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, Madrid, Spain, 11–15 July 2015; pp. 1239–1246.
106. Daolio, F.; Liefooghe, A.; Verel, S.; Aguirre, H.; Tanaka, K. Global vs local search on multi-objective NK-landscapes: Contrasting the impact of problem features. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, Madrid, Spain, 11–15 July 2015.
107. Statnikov, A.; Aliferis, C.F.; Tsamardinos, I.; Hardin, D.; Levy, S. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 2004, 21, 631–643.
108. Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml/index.php (accessed on 18 October 2021).
109. Derrac, J.; García, S.; Molina, D.; Herrera, F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 2011, 1, 3–18.
Figure 1. S-shaped, V-shaped, and U-shaped transfer functions.
Figure 2. The flowchart of B-MFO.
Figure 3. Average accuracy obtained by B-MFO and comparative algorithms on large datasets.
Figure 4. Average number of features selected by B-MFO and comparative algorithms on large datasets.
Figure 5. The convergence curves of the winner versions of B-MFO and comparative algorithms.
Table 1. The variants of S-shaped, V-shaped, and U-shaped transfer functions.

No. | S-shaped transfer function | No. | V-shaped transfer function | No. | U-shaped transfer function
S1 | TF_s(x) = 1/(1 + e^{-2x}) | V1 | TF_v(x) = |erf((\sqrt{\pi}/2)x)| = |(2/\sqrt{\pi}) \int_0^{(\sqrt{\pi}/2)x} e^{-t^2} dt| | U1 | TF_u(x) = \alpha |x^{1.5}|
S2 | TF_s(x) = 1/(1 + e^{-x}) | V2 | TF_v(x) = |\tanh(x)| | U2 | TF_u(x) = \alpha |x^{2}|
S3 | TF_s(x) = 1/(1 + e^{-x/2}) | V3 | TF_v(x) = |x / \sqrt{1 + x^{2}}| | U3 | TF_u(x) = \alpha |x^{3}|
S4 | TF_s(x) = 1/(1 + e^{-x/3}) | V4 | TF_v(x) = |(2/\pi) \arctan((\pi/2)x)| | U4 | TF_u(x) = \alpha |x^{4}|
Table 2. The datasets' descriptions.

No. | Medical dataset | No. features | No. samples | Size
1 | Pima | 8 | 768 | Small
2 | Lymphography | 18 | 148 | Small
3 | Breast-WDBC | 30 | 569 | Small
4 | PenglungEW | 325 | 73 | Large
5 | Parkinson | 754 | 756 | Large
6 | Colon | 2000 | 62 | Large
7 | Leukemia | 7129 | 72 | Large
Table 3. The accuracy and number of selected features gained by the winner versions of B-MFO and comparative algorithms.

Dataset (winner) | Metric | BPSO | bGWO | BDA | BSSA | B-MFO
Pima (B-MFO-S1) | Avg accuracy | 0.7922 | 0.7726 | 0.7849 | 0.7798 | 0.7902
Pima (B-MFO-S1) | Std accuracy | 0.0033 | 0.0063 | 0.0119 | 0.0079 | 0.0046
Pima (B-MFO-S1) | Avg no. features | 4.7333 | 7.6000 | 3.2667 | 4.7667 | 5.2667
Lymphography (B-MFO-V3) | Avg accuracy | 0.9163 | 0.8694 | 0.9041 | 0.8882 | 0.9095
Lymphography (B-MFO-V3) | Std accuracy | 0.0099 | 0.0108 | 0.0182 | 0.8882 | 0.0089
Lymphography (B-MFO-V3) | Avg no. features | 8.9333 | 16.9667 | 5.5333 | 9.1000 | 5.3667
Breast-WDBC (B-MFO-U3) | Avg accuracy | 0.9710 | 0.9626 | 0.9666 | 0.9655 | 0.9719
Breast-WDBC (B-MFO-U3) | Std accuracy | 0.0021 | 0.0028 | 0.0078 | 0.0030 | 0.0020
Breast-WDBC (B-MFO-U3) | Avg no. features | 12.8333 | 27.6000 | 2.4000 | 13.8000 | 3.2333
PenglungEW (B-MFO-U2) | Avg accuracy | 0.9626 | 0.9541 | 0.9507 | 0.9567 | 0.9692
PenglungEW (B-MFO-U2) | Std accuracy | 0.0040 | 0.0044 | 0.0126 | 0.0058 | 0.0063
PenglungEW (B-MFO-U2) | Avg no. features | 161.0667 | 322.6667 | 83.5667 | 199.5000 | 81.5333
Parkinson (B-MFO-V2) | Avg accuracy | 0.7952 | 0.7736 | 0.7643 | 0.7793 | 0.8603
Parkinson (B-MFO-V2) | Std accuracy | 0.0243 | 0.0036 | 0.0056 | 0.0126 | 0.0094
Parkinson (B-MFO-V2) | Avg no. features | 376.4333 | 741.2333 | 192.7333 | 332.7667 | 79.1000
Colon (B-MFO-U2) | Avg accuracy | 0.9625 | 0.9526 | 0.9296 | 0.9535 | 0.9694
Colon (B-MFO-U2) | Std accuracy | 0.0056 | 0.0048 | 0.0207 | 0.0051 | 0.0059
Colon (B-MFO-U2) | Avg no. features | 999.9333 | 1948.8667 | 618.4333 | 1152.2000 | 350.7667
Leukemia (B-MFO-U2) | Avg accuracy | 0.9988 | 0.9901 | 0.9703 | 0.9954 | 0.9998
Leukemia (B-MFO-U2) | Std accuracy | 0.0013 | 0.0021 | 0.0167 | 0.0023 | 0.0005
Leukemia (B-MFO-U2) | Avg no. features | 3542.0670 | 6746.9670 | 2283.7330 | 3435.2330 | 669.2333
Table 4. The comparison results between the winner versions of B-MFO and comparative algorithms on fitness.

Dataset (winner) | Metric | BPSO | bGWO | BDA | BSSA | B-MFO
Pima (B-MFO-S1) | Avg fitness | 0.2117 | 0.2347 | 0.2456 | 0.2240 | 0.2143
Pima (B-MFO-S1) | Std fitness | 0.0034 | 0.0068 | 0.0052 | 0.0076 | 0.0046
Lymphography (B-MFO-V3) | Avg fitness | 0.0878 | 0.1387 | 0.1503 | 0.1157 | 0.0925
Lymphography (B-MFO-V3) | Std fitness | 0.0095 | 0.0110 | 0.0189 | 0.0106 | 0.0084
Breast-WDBC (B-MFO-U3) | Avg fitness | 0.0330 | 0.0462 | 0.0571 | 0.0387 | 0.0289
Breast-WDBC (B-MFO-U3) | Std fitness | 0.0019 | 0.0027 | 0.0111 | 0.0033 | 0.0021
PenglungEW (B-MFO-U2) | Avg fitness | 0.0420 | 0.0554 | 0.8845 | 0.0490 | 0.0330
PenglungEW (B-MFO-U2) | Std fitness | 0.0040 | 0.0043 | 0.1006 | 0.0059 | 0.0061
Parkinson (B-MFO-V2) | Avg fitness | 0.2078 | 0.2340 | 2.1607 | 0.2229 | 0.1393
Parkinson (B-MFO-V2) | Std fitness | 0.0241 | 0.0035 | 0.2104 | 0.0135 | 0.0095
Colon (B-MFO-U2) | Avg fitness | 0.0421 | 0.0567 | 6.2540 | 0.0518 | 0.0321
Colon (B-MFO-U2) | Std fitness | 0.0055 | 0.0048 | 0.5740 | 0.0051 | 0.0056
Leukemia (B-MFO-U2) | Avg fitness | 0.0062 | 0.0192 | 22.8667 | 0.0094 | 0.0011
Leukemia (B-MFO-U2) | Std fitness | 0.0013 | 0.0022 | 2.6745 | 0.0023 | 0.0006
Table 5. The comparison results between the winner versions of B-MFO and comparative algorithms on specificity and sensitivity.

Dataset (winner) | Metric | BPSO | bGWO | BDA | BSSA | B-MFO
PenglungEW (B-MFO-U2) | Avg specificity | 0.9975 | 1.0000 | 0.9940 | 0.9980 | 0.9945
PenglungEW (B-MFO-U2) | Avg sensitivity | 0.9722 | 0.9444 | 0.9333 | 0.9500 | 0.9722
Parkinson (B-MFO-V2) | Avg specificity | 0.9004 | 0.8898 | 0.8876 | 0.8915 | 0.9321
Parkinson (B-MFO-V2) | Avg sensitivity | 0.3882 | 0.2913 | 0.3002 | 0.3686 | 0.5510
Colon (B-MFO-U2) | Avg specificity | 0.9467 | 0.9500 | 0.9392 | 0.9467 | 0.9475
Colon (B-MFO-U2) | Avg sensitivity | 0.7712 | 0.6970 | 0.6606 | 0.7227 | 0.8227
Leukemia (B-MFO-U2) | Avg specificity | 1.0000 | 1.0000 | 0.9972 | 0.9957 | 1.0000
Leukemia (B-MFO-U2) | Avg sensitivity | 0.9667 | 0.8413 | 0.7800 | 0.9267 | 0.9947
Table 6. Friedman test for the accuracy obtained by the winner versions of B-MFO and comparative algorithms.

Dataset | BPSO | bGWO | BDA | BSSA | B-MFO
Pima | 4.27 | 1.47 | 3.17 | 2.33 | 3.77
Lymphography | 3.17 | 7.17 | 2.33 | 8.73 | 3.77
Breast-WDBC | 7.17 | 2.33 | 14.30 | 8.73 | 3.77
PenglungEW | 2.33 | 17.40 | 18.90 | 8.73 | 3.77
Parkinson | 10.10 | 22.40 | 23.90 | 13.70 | 7.10
Colon | 10.10 | 26.50 | 28.60 | 13.90 | 7.10
Leukemia | 10.10 | 27.00 | 29.10 | 13.90 | 7.10
Average rank | 6.76 | 14.90 | 17.20 | 10.00 | 5.20
Overall rank | 2 | 4 | 5 | 3 | 1
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
