Article

Binary Ebola Optimization Search Algorithm for Feature Selection and Classification Problems

by Olatunji Akinola 1, Olaide N. Oyelade 1,2 and Absalom E. Ezugwu 1,*

1 School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg 3201, KZN, South Africa
2 Department of Computer Science, Ahmadu Bello University, Zaria 810211, Nigeria
* Author to whom correspondence should be addressed.
Submission received: 20 October 2022 / Revised: 9 November 2022 / Accepted: 17 November 2022 / Published: 19 November 2022
(This article belongs to the Special Issue Evolutionary Algorithms and Large-Scale Real-World Applications)

Abstract:
In the past decade, the extraction of valuable information from online biomedical datasets has exponentially increased due to the evolution of data processing devices and the utilization of machine learning capabilities to find useful information in these datasets. However, these datasets present a variety of features, dimensionalities, shapes, noise, and heterogeneity. As a result, deriving relevant information remains a problem, since multiple features bottleneck the classification process. Despite their adaptability, current state-of-the-art classifiers have failed to address the problem, giving rise to the exploration of binary optimization algorithms. This study proposes a novel approach to binarizing the Ebola optimization search algorithm. The binary Ebola search optimization algorithm (BEOSA) uses two newly formulated S-shape and V-shape transfer functions to investigate mutations of the infected population in the exploitation and exploration phases, respectively. A model is designed to show a representation of the binary search space and the mapping of the algorithm from the continuous space to the discrete space. Mathematical models are formulated to demonstrate the fitness and cost functions used for evaluating the algorithm. Using 22 benchmark datasets consisting of low-, medium- and high-dimensional data, we exhaustively experimented with the proposed BEOSA method and six other recent similar feature selection methods. The experimental results show that the BEOSA and its variant BIEOSA were highly competitive with different state-of-the-art binary optimization algorithms. A comparative analysis of the classification accuracy obtained for eight binary optimizers showed that BEOSA performed competitively compared to other methods on nine datasets. Evaluation reports on all methods revealed that BEOSA was the top performer, obtaining the best values on eight datasets and eight fitness and cost functions. Computation of the average number of features selected showed that BEOSA outperformed other methods on 11 datasets when population sizes of 75 and 100 were used. Findings from the study revealed that BEOSA is effective in handling the challenge of feature selection in high-dimensional datasets.

1. Introduction

Machine learning and data mining are fast-growing topics in research and industry because of the massive amount of data being generated, which needs to be converted into usable information. This conversion process plays an essential part in the process of knowledge discovery, as it comprises a set of repetitive task sequences, including the transformation, reduction, cleansing, and integration of data, among others [1]. These steps are known as pre-processing; their outcome directly impacts the performance of machine learning and data mining algorithms. Due to its importance, data is regarded as the “currency” of the present decade. This makes the correct handling of data a necessity. With the increase in data and the growth of machine learning and data mining, the processing of data is becoming more and more tedious. The increase in dimensionality means that training machine learning and data mining algorithms takes longer, making them more computationally expensive. Researchers have developed different methods to address the problem of dimensionality. One such method is feature selection, which removes noisy data such as unnecessary or useless features that do not assist with the purpose of classification [2].
Feature selection is a pre-processing stage that assists in selecting valuable features and separating them from a set of unwanted ones, thereby improving the performance of classifiers. This method eradicates redundant and irrelevant features, thereby reducing time complexity [3]. Feature selection is generally performed in two ways, namely, wrapper and filter methods [4]. The wrapper method utilizes learning algorithm(s) to choose the subsets of features. This method produces better performance but is more computationally expensive than the filter method. Feature selection under the wrapper technique is treated as an optimization problem [5]. The filter-based technique does not depend on a learning algorithm; rather, it chooses useful features by utilizing information gain, mutual information, etc. [6]. This method is computationally inexpensive but does not produce as good a performance as the wrapper-based techniques. Finding the relevant subset of features is a challenging task, as the main aim is to select the minimum number of features and achieve the maximum possible accuracy. Due to the increased time required to locate the optimal subset of features, feature selection is referred to as an NP-hard problem [7]. Given N features, a total of 2^N − 1 combinations of features must be investigated to locate the best subset [8]. A high-performing metaheuristic is therefore important for this type of problem in order to reduce processing time. The search processes of metaheuristics rely on the trade-off between exploitation (intensification), which conducts a thorough neighborhood search to obtain better possible solutions, and exploration (diversification), which tests candidate solutions not within the neighborhood. These two objectives are the factors that define the ability to find optimal solution(s).
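To make the combinatorial growth concrete, a short calculation of the number of non-empty feature subsets (the helper name is ours, for illustration only):

```python
def subset_count(n_features: int) -> int:
    """Number of non-empty feature subsets: 2^N - 1."""
    return 2 ** n_features - 1

# Exhaustive search quickly becomes intractable as N grows:
for n in (10, 20, 30):
    print(n, subset_count(n))
# 10 -> 1023, 20 -> 1048575, 30 -> 1073741823
```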
Recently, feature selection as an optimization problem has been solved using metaheuristic algorithms because they show better performance than exact methods [8,9,10,11,12,13]. However, due to the no free lunch (NFL) theorem, which proposes that no one algorithm is sufficient to solve all optimization problems, the need to develop new methods, or improve existing ones, that can produce high-quality solutions for the candidate problem becomes unavoidable.
Despite the great effort and advancements in this area, most metaheuristic algorithms have at least one deficiency or shortcoming. Examples of such limitations include getting trapped in local optima, premature convergence, and too many parameters to be tuned. The question that raises a serious research opportunity is: does the good performance and superiority demonstrated by a continuous variant of an optimization algorithm translate into similarly good performance when it is applied to solve binary optimization problems? To answer this question, this paper presents a binary Ebola optimization search algorithm (BEOSA) to solve the feature selection problem and to avoid some of these drawbacks. The baseline EOSA is a recently proposed metaheuristic algorithm [14], a bio-based algorithm inspired by the Ebola virus disease propagation model. The base algorithm was evaluated on 47 classical benchmark functions and 14 CEC benchmark functions and compared with seven well-known techniques, producing superior performance over the other methods in the study. The selection of the EOSA method as the base algorithm for the binary optimization method proposed in this study was motivated by the performance of the algorithm itself and by a recent outstanding report on its immunity-based variant, IEOSA. The performance of this biology-based algorithm stood out among most of the state-of-the-art optimization methods with similar sources of inspiration. Since the algorithm has proven relevant in addressing some very difficult continuous optimization problems, we sought to determine whether its operators and optimization process could find optimal solutions for binary optimization problems. Hence, through exhaustive and rigorous experimentation on several heterogeneous and high-dimensional datasets, this study investigates the influence, impact and benefit of designing and applying a binary variant of the EOSA/IEOSA methods.
Since this method was developed, it has not been utilized to solve the feature selection problem. Because the feature selection problem is binary, we present a binary version of the EOSA to solve it. We also utilize the k-nearest neighbor (kNN) classifier to test the goodness of the selected subset of features. The major contributions of this work are as follows:
  • Proposal of the binary version of the EOSA algorithm (called BEOSA) for feature selection problems.
  • Evaluation of the performance of BEOSA using a convergence curve and other computational analysis metrics.
  • Evaluation and validation of the proposed method with 22 small- and medium-size datasets and 3 high-dimensional datasets.
  • Assessment of the proposed method using seven classifiers to evaluate its performance.
  • Comparison of the efficacy of the BEOSA with other popular feature selection methods.
The remainder of this manuscript is structured as follows: Section 2 presents a review of the relevant literature. Section 3 discusses the methodology used in this study. Section 4 details our proposed BEOSA approach and its application in feature selection. Section 5 centers on the results of the experiments and presents a discussion of this work. Section 6 provides the conclusion.

2. Related Work

A detailed review of studies related to the concepts described in this study is presented in this section. The literature shows that several binary metaheuristic algorithms have been developed to solve the feature selection problem. The feature selection technique based on the wrapper approach uses the binary search capability of metaheuristic algorithms. Swarm- and evolutionary-based algorithms are becoming commonplace methods in the feature selection domain [15].
Particle Swarm Optimization (PSO) [16] is a bio-inspired metaheuristic method which has attracted much attention due to its tested and trusted mathematical modelling. This algorithm has been binarized and enhanced to solve problems in discrete search spaces. A study by Unler and Murat [17] presented a modified discrete PSO that used the logistic regression model and applied it to the feature selection domain. A year later, Chuang et al. [18] proposed an improved BPSO that introduced the catfish effect, called “catfishBPSO”, for feature selection. The BPSO was also improved to tackle the optimization problem of feature selection [19]. Ji et al. [13] proposed an improved PSO, called IPSO, based on a Levy flight local factor, a weighting inertia coefficient based on the global factor and an improvement factor based on the mutation diversity mechanism, to tackle the feature selection problem. This improvement came with shortcomings, however, such as the inclusion of more parameters compared with other improved versions of the PSO, which makes tuning difficult for various application problems and increases computational time. Since every particle in BPSO moves closer to and farther from the hypercube corner, its major shortcoming is stagnation.
The genetic algorithm (GA) is another popular bio-inspired feature selection method which has been widely utilized as a wrapper-based technique. Huang and Wang [20] proposed a GA-based method using the support vector machine (SVM) as a learning algorithm to solve the feature selection problem. The major goal of their work was concurrent parameter and feature subset optimization without reducing the classification accuracy of the SVM. The method reduced the number of feature subsets and improved the accuracy of classification but was outperformed by the Grid algorithm. Later, Nemati et al. [21] presented a hybrid GA and ant colony optimizer (ACO) as a feature selection method to predict protein functions. These two algorithms were combined to enable better and faster capabilities with very low computational complexity. Furthermore, Jiang et al. [22] proposed a modified GA (MGA), i.e., a feature selection method using a pre-trained deep neural network (DNN), to predict the demand for different patients’ key resources in an outpatient department.
Apart from these two notable algorithms, several other nature-inspired methods have been utilized to solve feature selection problems. The binary wrapper-based bat algorithm was developed by Nakamura et al. in 2012 [23]. It uses the optimum-path forest classifier to locate the feature sets that produce maximum classification accuracy. Hancer et al. [24] proposed a binary artificial bee colony (ABC) that employed a similarity search mechanism inspired by evolution to resolve the feature selection problem. Emary et al. [11] proposed the binary ant lion optimizer (BALO), which utilizes a transfer function as a means of moving ant lions within a discrete search space. The binary grey wolf optimizer with two techniques was proposed the following year to locate a subset of features that caters for the two conflicting objectives of the feature selection problem, i.e., to maximize the accuracy of the classification and minimize the number of selected features. However, this method was plagued with premature convergence, despite outperforming the other methods used for comparison in the study. Zhang et al. [25] designed a variation of the binary firefly algorithm called return-cost-based FFA (Rc-FFA), which was able to prevent premature convergence. A binary dragonfly optimizer was developed by Mafarja et al. [26], which employed a time-varying transfer function that improved its exploitation and exploration phases. However, its performance was not close to optimal.
Faris et al. [27] proposed two variants of the salp swarm algorithm (SSA) to solve the feature selection problem. The first utilized eight transfer functions to convert a continuous search space to a binary one, and the other introduced a crossover operator to improve the exploration behavior of the SSA; however, the study did not provide an analysis of the transfer functions. A binary grasshopper optimization algorithm (BGOA) was proposed by Mafarja et al. [28], using a V-shaped transfer function and the sigmoid. This study incorporated the mutation operator to enhance the exploration phase of the BGOA. In Mafarja and Mirjalili [29], two binary versions of the whale optimization algorithm were proposed. The first utilized roulette wheel and tournament selection mechanisms with a random operator in the search process, while the second version employed the mutation and crossover mechanisms to enhance diversification. Kumar et al. [30] proposed a binary seagull optimizer which employed four S- and V-shaped transfer functions to binarize the baseline algorithm, applying it to solve the feature selection problem. The reported results showed competitive performance with other methods; their technique was also evaluated using high-dimensional datasets.
Elgin Christo et al. [31] and Murugesan et al. [32] designed bio-inspired metaheuristics comprising three algorithms each. The former combined glowworm swarm optimization, the lion optimization algorithm and differential evolution, while the latter hybridized the krill herd, cat swarm and bacteria foraging optimizers; both used the AdaBoostSVM classifier as the fitness function and a backpropagation neural network to perform classification, applied to the clinical diagnosis of diseases. The methods showed superior performance over other methods. However, these proposed methods were computationally expensive due to the use of combinations of different metaheuristic methods. Balasubramanian and Ananthamoorthy [33] proposed a bio-inspired method (salp swarm) with kernel-ELM as a classifier to diagnose glaucoma disease from medical images. The results produced by this method showed superior performance over other methods. However, the technique was not tested on collections of large, real-time datasets, because this proved to be more challenging. The different algorithms mentioned above provided better solutions to many of the feature selection problems [34]. Many of these methods, however, could not yield an optimal subset of features for high-dimensional datasets. Additionally, the inference from the NFL theorem that no single algorithm can solve all optimization problems holds in the feature selection domain as well. Hence, a new binary method needs to be developed to solve the optimization problem of feature selection.
Some bio-inspired metaheuristic algorithms are based on susceptible infectious recovery (SIR), the class of models to which the EOSA algorithm belongs. Therefore, reviewing some efforts made using this model in the literature is appropriate here. Some such methods have been proposed to tackle the problem of detection and classification, among which we may cite the SIR model [35]. This approach is based on sample paths and was employed to detect the sources of information in a network. The assumption of that study was that all nodes on the network were initially in a susceptible state, apart from a source that was in an infected state. The susceptible nodes could then become infected by the infected node, which may itself no longer be infected. The result of this simulation revealed that the estimator produced by the reverse-infection algorithm for the tree network was nearer to the real source. A further performance evaluation was conducted on many real-world networks with good outcomes. However, the assumption of only a single source node was the drawback of this model, since, in most real-world scenarios, this is close to impossible. To overcome this problem, Zang et al. [36] utilized a divide-and-conquer approach to find many sources in social networks using the SIR model. The technique showed promising results, with high accuracy in its estimations. However, these methods have not been directly employed in the feature selection optimization problem.
Since the outbreak of the COVID-19 virus in 2020, more SIR model-based methods have been designed to detect or diagnose coronavirus infection in humans. In Al-Betar et al. [37], a new coronavirus herd immunity optimizer (CHIO) was proposed, which drew its inspiration from the concepts of herd immunity and the social distancing strategy used to protect society from contracting the virus. The herd immunity model employed three main kinds of individuals: susceptible, infected and immunized; it was applied to solve engineering optimization problems. This algorithm has since been utilized to solve feature selection and classification problems, including the introduction of a novel COVID-19 diagnostic strategy, known as the patient detection strategy (CPDS) [38], that combined the wrapper and filter methods for feature selection. The enhanced k-nearest neighbor (EKNN) was used for the wrapper method, using the chest CT images of COVID-19 infected and non-infected patients. The results revealed the superiority of the proposed method over other, recently developed ones in terms of accuracy, sensitivity, precision, and time of execution. Similarly, the greedy search operator was incorporated with and without the CHIO to make two wrapper-based methods, which were evaluated on 23 benchmark datasets and a real-world COVID-19 dataset.
Some high-dimensional datasets have been employed to assess the efficacy of the proposed methods. Alweshah [39] boosted the efficiency of the probabilistic neural network (PNN) using CHIO to solve the classification problem. Eleven benchmark datasets were used to assess the classification accuracy of the proposed CHIO-PNN, which produced a summative classification rate of 90.3% on all the datasets used, with a quicker rate of convergence than the other methods. However, the drawback of this method was that it was only applied to low- and medium-dimensional datasets. As such, there is a concern that higher-dimensional datasets may negatively impact its performance.

3. Methodology

This section presents the methodology of the proposed binarization approach for the EOSA algorithm. To achieve the design, an overview of the EOSA algorithm and its immunity-based variant is presented. This is followed by a description of the procedure for the generation and binarization of the search space. The binary variant of EOSA is then formulated and incorporated into the binary search space. The variant can use the proposed transformation functions to map the continuous space to a discrete space. The classification models used to support the feature selection process are also discussed.

3.1. Overview of EOSA and IEOSA

The EOSA metaheuristic [14] was inspired by the classical SIR model and the propagation model of the Ebola virus. Drawing from the natural phenomena associated with the development of immunity by individuals against virus strains and the potential coverage an immune individual provides for a susceptible individual, a new variant was proposed, named the immunity-based variant (IEOSA). Both the base algorithm and the immunity-based variant were exhaustively tested using continuous benchmark functions. The obtained results confirmed their viability. We present a summary of the mathematical models of the methods to support discussion of the techniques for the proposed BEOSA and BIEOSA. The population initialization of EOSA and IEOSA is undertaken as shown in Equations (1) and (2).
$$ind_i = L + rand \times (U - L) \tag{1}$$
$$ind_{i+1} = g \times ind_i \times (1 - ind_i) \tag{2}$$
where g is a constant (set to 3), rand is a randomly generated real number, L is the lower bound and U is the upper bound of the optimization problem. The mutation of infected individuals in the continuous space is described by Equation (3), where Δ is the change factor of an individual and gbest is the global best solution.
$$ind_i^{new} = \Delta \times e^{rand} \times \cos(2\pi \times rand) \times (ind_i - gbest) \tag{3}$$
The calculations for the allocation of individuals to compartments I, R, D, H, V and Q were detailed in [14,40]. Considering the increasing demand for solving binary optimization problems and the outstanding performance reported by the EOSA method, the binary EOSA (BEOSA) is proposed in this study. In the following subsections, we include a detailed discussion on the design of the algorithm for BEOSA and BIEOSA.
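As a minimal sketch of Equations (1)–(3), the initialization and continuous-space mutation steps might look as follows. The function names are ours, g = 3 follows the constant stated above, and the reading of Equation (2) as one logistic-map step applied to the uniform draw is an interpretation:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_population(psize, dim, L=0.0, U=1.0, g=3.0):
    """Population initialization per Equations (1)-(2): a uniform draw
    in [L, U] followed by one logistic-map step with constant g."""
    ind = L + rng.random((psize, dim)) * (U - L)  # Eq. (1)
    return g * ind * (1.0 - ind)                  # Eq. (2)

def mutate_infected(ind, gbest, delta=0.1):
    """Continuous-space mutation of an infected individual, Eq. (3)."""
    return (delta * np.exp(rng.random())
            * np.cos(2.0 * np.pi * rng.random())
            * (ind - gbest))

pop = init_population(psize=10, dim=5)
print(pop.shape)  # (10, 5)
```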

3.2. Binarization of Search Space

The BEOSA search space consists of individuals whose representations are of the binary search space form. The entire population represents individuals whose anatomies are made up of binary digits. This representation is required to aid in the identification and differentiation of selected features from those which are not. Figure 1 presents an illustration of the entire search space for the BEOSA algorithm. First, the population of individuals in the search space is determined according to two parameters, namely, the population size psize and the dimension of the dataset, D. D is obtained by computing the number of features in dataset X, while psize is declared during the initialization of the population. Following an iterative approach, each individual ind_i in the population is initialized to a value of 1 across the whole of the D dimensions in ind_i. It is expected that the application of the BEOSA operation on the search space will result in optimized solutions whose internal representation will have been modified to values of 0 or 1 across the whole of the D dimensions in ind_i.
The complete optimization process, which is expected to run for a number of iterations, will yield output for each individual ind_i similar to what is shown in Figure 2. It is assumed that cells whose values are 1s translate into the features which have been selected. Recall that the dimension D of an arbitrary solution ind_i equals the number of features F in the dataset X. As a result, we simply count the number of 1s in the D dimensions of every ind_i, which represents the instances in the dataset X.
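The counting step described above amounts to masking the dataset's columns with the binary solution; a minimal sketch (the helper name is ours):

```python
import numpy as np

def selected_features(ind):
    """Return the column indices whose cells are 1 in a binary solution."""
    return np.flatnonzero(np.asarray(ind) == 1)

ind = np.array([1, 0, 1, 1, 0, 0, 1])  # an optimized individual, D = 7
idx = selected_features(ind)
print(len(idx))  # 4 features selected
# The reduced dataset is then X[:, idx] for an (n_samples x D) matrix X.
```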
The formalization of the search space is necessary to support the process of binarization of EOSA which is suitable for solving the problem of feature selection. In the following subsection, we describe the composition of the proposed BEOSA method.

3.3. Binarization of EOSA (BEOSA)

The design of the new variant of EOSA, which is able to optimize solutions in a discrete solution space, adds some new operators to the existing ones in the algorithm. The first is the definition of transformation functions which can change the solution representation and optimization process from a continuous form to a discrete one. This is necessary to allow the new method to process problems which are peculiar to feature selection. The second operation modelled to achieve the new variant BEOSA is the modification of the fitness function. Evaluating the solutions to find the global best among all individuals requires that the fitness of the solutions be computed. The definition of the fitness function is presented to suit the problem domain. Furthermore, the design of the BEOSA algorithm and a flowchart are presented and discussed.

3.3.1. Transformation Method

We propose four transformation functions to position infected individuals ind_i in the discrete space. These functions follow the popular S-function and V-function categories, with two functions described for each category. Equations (4) and (5) contain the S1 and S2 functions, which belong to the S-transform family, while Equations (6) and (7) contain the V1 and V2 functions, which belong to the V-function family.
$$S_1 = \frac{1}{1 + e^{-x/2}} \tag{4}$$
$$S_2 = 1 - \frac{1}{1 + e^{x}} \tag{5}$$
$$V_1 = \left| \frac{x}{\sqrt{2 + x^2}} \right| \tag{6}$$
$$V_2 = \left| \tanh(x) \right| \tag{7}$$
In Figure 3, the behavior of the transform functions is plotted to show that they are truly able to generate patterns matching the class of function they belong to. For instance, part (a) of the figure shows that the two S-functions result in an S-shaped pattern when the functions are applied to values in [−6, 6], while a V-shaped pattern is reported for the V-functions when they are applied to the same values. Note that these functions confine their output on the y-axis to values in [0, 1], which is the aim of using the transform functions.
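The [0, 1] confinement can be checked numerically. The sketch below uses reconstructed forms of Equations (4)–(7); the exact expressions should be treated as assumptions and verified against the original equations:

```python
import numpy as np

# Reconstructed transfer functions (assumed forms of Equations (4)-(7)).
def S1(x): return 1.0 / (1.0 + np.exp(-x / 2.0))
def S2(x): return 1.0 - 1.0 / (1.0 + np.exp(x))
def V1(x): return np.abs(x / np.sqrt(2.0 + x * x))
def V2(x): return np.abs(np.tanh(x))

xs = np.linspace(-6.0, 6.0, 25)
for f in (S1, S2, V1, V2):
    y = f(xs)
    assert ((y >= 0.0) & (y <= 1.0)).all()  # outputs confined to [0, 1]

print(S1(0.0), S2(0.0), V1(0.0), V2(0.0))  # 0.5 0.5 0.0 0.0
```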
The aim of applying these transform functions is to ensure that they can help transfer the composition of feature positions in an individual to either 0 or 1. Additionally, these functions can increase the probability of changing the natural composition of that individual, so that it becomes a potential solution for solving feature selection problems. This is illustrated using Equations (8) and (9). The first part of the two equations controls the selection of either the S1 or S2 function when applying the S-function and the use of either T1 or T2 when applying the V-function. A determinant factor rand{0|1} is used to guide this decision, so that if it generates 1, the S2 or T2 function is called as appropriate; otherwise, the S1 or T1 function is called. In the second part of the two equations, the value of the kth position in the representation of individual ind_i is modified to 1 when r > S(ind_i^k) for S-functions or r > T(ind_i^k) for T-functions; otherwise, 0 is assigned to the kth position, where k lies in 0 ≤ k < D and r is a random number generated in [0, 1].
$$S(ind_i^k),\ T(ind_i^k) = \begin{cases} S_2(ind_i^k),\ T_2(ind_i^k), & rand\{0|1\} = 1 \\ S_1(ind_i^k),\ T_1(ind_i^k), & rand\{0|1\} = 0 \end{cases} \tag{8}$$
$$ind_i^k = \begin{cases} 1, & r > S(ind_i^k)\ \lor\ r > T(ind_i^k) \\ 0, & \text{otherwise} \end{cases} \tag{9}$$
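Equations (8) and (9) together can be read as the following per-position update. This is a sketch under reconstructed transfer-function forms, with all function names assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Reconstructed transfer functions (assumed forms of Equations (4)-(7)).
def S1(x): return 1.0 / (1.0 + np.exp(-x / 2.0))
def S2(x): return 1.0 - 1.0 / (1.0 + np.exp(x))
def T1(x): return np.abs(x / np.sqrt(2.0 + x * x))  # V1 in Eq. (6)
def T2(x): return np.abs(np.tanh(x))                # V2 in Eq. (7)

def binarize_individual(ind, use_s=True):
    """Mutate every position k per Equations (8)-(9): a random coin
    selects the function variant, then a random r thresholds it."""
    f1, f2 = (S1, S2) if use_s else (T1, T2)
    out = np.empty(len(ind), dtype=int)
    for k in range(len(ind)):
        f = f2 if rng.integers(0, 2) == 1 else f1  # Eq. (8)
        r = rng.random()
        out[k] = 1 if r > f(ind[k]) else 0         # Eq. (9)
    return out

ind = rng.normal(size=8)         # a continuous-space individual
print(binarize_individual(ind))  # a vector of 0s and 1s
```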
A flowchart of the process of applying the transform functions to achieve the translation of the BEOSA from the continuous space to the discrete space is illustrated in Figure 4. The optimization process begins with a population described as the susceptible group. Based on the natural phenomenon underlying the EOSA method, some individuals are exposed to the virus, leading to some of them being allocated to the infected subgroup. It is these infected individuals that are optimized over a number of iterations. It is expected that during the iterations, almost all the members of the susceptible subgroup will move to the infected subgroup. For each ind_i in the I subgroup, the kth position is mutated using either the S-functions or the V-functions, depending on the satisfiability of the pos_i < THRESHOLD criterion. Note that the pos_i function computes the current position and displacement of individual ind_i. A constant value of 0.5 was assumed for the THRESHOLD parameter during experimentation. The satisfiability of this condition determines whether the S-functions or the V-functions will be applied. The final output of the optimization process is a vector of 0s and 1s, as shown in Figure 4.
The mutation of the values of the kth position in every ind_i in the I subgroup and the termination of the iterative condition lead to the evaluation of the fitness values of each individual in the entire population, thereby determining the current global best solution for solving the feature selection problem. The following subsection discusses the fitness function used in this study.

3.3.2. Fitness and Cost Functions

A combination of the fitness function evaluation and the cost function evaluation was used to locate the best-performing solution to the feature selection problem. The fitness function in Equation (10) evaluates a solution based on its performance on classifier clf over the subset of dataset X indexed by 1(ind_i), with the application of the control parameter ω. The notation 1(ind_i), as used in the equation, returns the number of 1s in the array representing individual ind_i. Note that the notation F returns the number of features selected in the individual, while D represents the dimension of the features in dataset X. For experimental purposes, a value of 0.99 was used for ω.
$$fit = \omega \times \left( 1 - clf\left( X_{:,\,1(ind_i)} \right) \right) + (1 - \omega) \times \frac{F}{D} \tag{10}$$
In Equation (11), the cost function is evaluated from the output of the fitness function, i.e., by simply subtracting the value returned by f i t from 1. Both the fitness and cost function values are graphically applied to analyze and interpret the relevance and quality of every best solution obtained for each dataset.
$$cost = 1 - fit \tag{11}$$
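Equations (10) and (11) translate directly into code. In the sketch below, the classifier is abstracted as any accuracy-returning callable (in the study, a kNN classifier plays this role); the helper names are ours:

```python
import numpy as np

def fitness(ind, X, y, clf_accuracy, omega=0.99):
    """Fitness per Equation (10): omega weights the classification error
    against the fraction of selected features; clf_accuracy is any
    callable returning accuracy on the reduced dataset (e.g. kNN)."""
    idx = np.flatnonzero(ind == 1)          # selected feature columns
    F, D = len(idx), ind.shape[0]
    err = 1.0 - clf_accuracy(X[:, idx], y)  # classification error
    return omega * err + (1.0 - omega) * (F / D)

def cost(fit_value):
    """Cost per Equation (11)."""
    return 1.0 - fit_value

# Toy check with a dummy classifier that always reports 0.9 accuracy:
X = np.ones((5, 4)); y = np.zeros(5)
ind = np.array([1, 0, 1, 0])
f = fitness(ind, X, y, lambda Xs, ys: 0.9)
print(round(f, 4), round(cost(f), 4))  # 0.104 0.896
```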
In the following subsection, we demonstrate how these functions are used in the description of the proposed BEOSA method.

3.3.3. BEOSA Algorithm and Flowchart

The representative models for the binary search space and mathematical models described in the previous subsections are formalized using the algorithm and flowchart presented in this subsection. First, we present the algorithmic formalization as seen in Algorithm 1, which indicates that the values for e p o c h (maximum number of iterations), p s i z e (population size), s r a t e (short distance rate) and l r a t e (long displacement rate) are required for input, while the output of the algorithm is the global best solution, the cost values for each iteration and the feature count obtained for the optimization process. The binarization of the solution space and computation of the fitness values for each solution are listed in Lines 4–5. The current global best solution and the displacement positions for all individuals in the susceptible compartment are computed in Lines 6–7. In Lines 8–34, the iteration for the optimization process is described, given the satisfiability of two conditions: the number of maximum iterations is not reached, and some individuals remain infected. An estimation of the number of individuals to quarantine from the infected is computed, and a declaration of the separation of quarantined from infected individuals is made, in Lines 9–10. Iteration of the infected individuals is declared in Line 11, and the number of newly infected cases in the susceptible group is shown in Line 12. In Lines 13–28, we iterate the newly infected cases and generate the discriminant value in Line 14. If the evaluation of the condition in Line 15 is true, then it implies that the method will search within a local space; otherwise, it will search in a global space. In each case of exploitation and exploration, we compute the anticipated number of infections. In Lines 17–21, we apply the S 1   ( ) or S 2   ( ) function, depending on the value of d . 
Additionally, depending on the satisfiability of the condition in Line 18, the feature position in that individual is mutated to either 1 or 0. A similar procedure is repeated for the exploration phase using the T1() or T2() function, depending on the value of d. Finally, the compartments are updated and the global best solution is determined before executing the next iteration.
Algorithm 1 Pseudocode of the BEOSA Algorithm
  • Input: epoch, psize, srate, lrate
  • Output: gbest, costs, fcount
  • begin
  • Initialize the populations (psize) as S
  • Binarize the solution space S
  • Assign first item in population to first infected case (I)
  • Make newly infected case global best
  • while e < epoch and size (I) > 0 do:
  • Compute individuals to be quarantined
  • I = difference of current infected cases (I) from quarantine cases
  • for i in 1 to size(I) do:
  •   generate new infected (nI) case from S
  •   for i in 1 to size(nI) do:
  •    randomly generate d as either 1 or 0
  •    if displacement(nI[i]) > 0.5 do:
  •     update size of nI using srate
  •     s = use S2(nI[i]) to transform all dimensions if d is 1, otherwise use S1(nI[i])
  •     if s >= rand do:
  •      nI[i] = 1
  •     else:
  •      nI[i] = 0
  •    else:
  •     update size of nI using lrate
  •     t = use T2(nI[i]) to transform all dimensions if d is 1, otherwise use T1(nI[i])
  •     if t >= rand do:
  •      nI[i] = 1
  •     else:
  •      nI[i] = 0
  • Evaluate new fitness of nI[i]
  • add (nI) cases to (I) cases
  • Update all compartments
  • Update best solution so far
  • Increment e by 1
  • End while
  • Compute feature count (fcount)
  • Return best solution, cost of best solution, fcount
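The binarization step iterated in the inner loop of Algorithm 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a standard sigmoid and |tanh| stand in for the paper's newly formulated S1/S2 and T1/T2 transfer functions, and the helper names (`mutate_infected`, `rng`) are ours.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for the transfer functions: the actual S1/S2 (S-shape) and
# T1/T2 (V-shape) formulations are given in the preceding subsections.
def s_shape(x):
    return 1.0 / (1.0 + np.exp(-x))

def v_shape(x):
    return np.abs(np.tanh(x))

def mutate_infected(individual, displacement, d):
    """Binarize one infected individual following Algorithm 1:
    displacement > 0.5 selects exploitation (S-shape transform),
    otherwise exploration (V-shape transform); d picks between the
    two variants of each transfer function."""
    if displacement > 0.5:
        probs = s_shape(individual) if d == 1 else s_shape(-individual)
    else:
        probs = v_shape(individual) if d == 1 else v_shape(-individual)
    # Each dimension is mutated to 1 when the transformed value
    # meets or exceeds a uniform random draw, and to 0 otherwise.
    return (probs >= rng.random(individual.shape)).astype(int)

sol = rng.normal(size=10)                      # continuous-space solution
binary = mutate_infected(sol, displacement=0.8, d=1)
```

The same routine is applied per dimension to every newly infected case before its fitness is re-evaluated.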
Figure 5 is a flowchart of the entire optimization process of the algorithm. The figure provides a graphical representation of the entire algorithm, including the flow of the use of the transformation functions. The branching points for the S-functions and T-functions are clearly shown. The flowchart shows the initialization of the population and the global best updates upon the completion of an iterative process.
In the following subsection, we describe the various classifiers applied in this study to obtain the fitness and cost values of the BEOSA method.

3.4. Feature Selection and Count

The computation of the fitness and cost functions largely depends on the classification accuracy obtained by using a classifier on a selected fragment of the dataset. In this study, an investigative exploration was carried out to determine the influence of different popular classifiers in solving the feature selection problem. Although the K-nearest-neighbor (KNN) was used as the base classifier, we applied the random forest (RF), multi-layer perceptron (MLP), decision tree (DT), support vector machine (SVM), and Gaussian Naive Bayes (GNB) classifiers as well. On this basis, the number of features selected for an arbitrary individual $ind_i$ is computed using Equation (12), where $D$ and $1_{ind_i^k}$ represent the dimension of the feature size in the dataset and the indicator of feature position $k$ holding a 1 in individual $ind_i$, respectively.
$fc_i = \frac{\sum_{k=0}^{D} 1_{ind_i^k}}{D}$
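As a quick illustration of Equation (12), the selected-feature fraction follows directly from a binary individual; the helper name `feature_fraction` is ours, not from the paper.

```python
import numpy as np

# Fraction of selected features for individual ind_i (Equation (12)):
# count the 1-positions and divide by the dimension D.
def feature_fraction(ind):
    ind = np.asarray(ind)
    return int(ind.sum()) / ind.size

fc = feature_fraction([1, 0, 1, 1, 0])  # 3 of D=5 features selected
```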
The following listing summarizes and describes the procedures and parametrization used for each of the classifiers investigated in this study:
(a)
KNN model: this model solves the classification problem by obtaining K-sets of items sharing some similarity. k-fold values of 5, 3 and 2 were investigated to ascertain the most viable settings. For most of the applied datasets, we found a k-fold of 5 to yield optimal performance, whereas in the case of the Iris and Lung datasets using the BSFO algorithm, we found a k-fold of 2 to be optimal.
(b)
DT model: similar to the KNN, this study found that k-fold values of 5 and 2 were more suitable for most algorithms and the datasets studied. A significant number of the experiments showed impressive performance using a k-fold of 5. Meanwhile, the maximum depth used for the decision tree model was 2.
(c)
RF model: the classification task of RF for all of the benchmark datasets that were applied using the proposed algorithm was tested using 300 estimators, while the k-fold used for the cross-validation operation remained at 5.
(d)
MLP model: the MLP model was tested with the settings of 0.001 for the alpha parameter and with hidden layer sizes of the tuple (1000, 500, 100). The model was trained over 2000 epochs with a random state of 4. Additionally, a k-fold of 5 was used for the cross-validation task.
(e)
SVM model: the SVM undertakes its classification operation by identifying a decision boundary that separates items in a dataset into classes as cleanly as possible. The linear function was applied for the kernel settings, while a C value of 1 and a k-fold value of 5 were investigated with the proposed BEOSA and BIEOSA algorithms.
(f)
GNB model: The default values for the parameters of the GNB model were applied for the experimentation, although we manually set the k-fold value to 5 for the cross-validation task. These default parameters demonstrated optimal performance in computing the probability value, which may be described as follows: given class label Y and feature vector X , we can compute the probability of X when that of Y is known, as shown in Equation (13).
$P(Y|X) = \frac{P(X|Y) \, P(Y)}{P(X)}$
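The classifier parametrizations listed above can be sketched with scikit-learn, which we assume here as the backend (the paper does not name the library); the MLP is omitted for brevity, and the Iris loader is only a placeholder dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Settings as quoted in items (a)-(f): DT max depth 2, RF with 300
# estimators, linear-kernel SVM with C=1, default GNB parameters.
classifiers = {
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(max_depth=2),
    "RF": RandomForestClassifier(n_estimators=300),
    "SVM": SVC(kernel="linear", C=1),
    "GNB": GaussianNB(),
}

# 5-fold cross-validation, the k-fold value found optimal for most datasets.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in classifiers.items()}
```

Each mean cross-validated accuracy would then feed the fitness and cost computations described earlier.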
In the next section, we present a detailed discussion of the experimental settings and computational environment with the datasets used to test the method presented in this section.

4. Experimental Setup

A description of the experimental configuration is presented in this section. We first note that the computational environment used for our experiments was a personal computer (PC) with the following configuration: Intel® Core i5-4210U CPU, 1.70 GHz (2.40 GHz max); 8 GB of RAM; Windows 10 OS. We also experimented on a series of computer systems with the following configuration: Intel® Core i5-4200 CPU, 1.70 GHz (2.40 GHz max); 16 GB of RAM; 64-bit Windows 10 OS. The binary metaheuristic algorithms were implemented using Python 3.7.3 and supporting libraries, such as NumPy. While this describes the computational environment, the following subsections detail the parameter settings and the nature of the input supplied during the experiments. This section also presents and justifies the selection of some of the evaluation metrics applied for our comparison of results.

4.1. Dataset

Exhaustive experimentation with BEOSA was carried out using 22 benchmark and popularly available datasets [41]. These datasets have been widely used for comparative analyses of binary metaheuristic algorithms and were therefore considered suitable for testing the efficiency and performance of the method proposed in this study. Table 1 provides some information about the applied datasets. High, moderate and low-dimension datasets are included, making them suitable for experimenting with the BEOSA method on those three dimensions. This became necessary, considering the importance of investigating the suitability of an algorithm on a variety of datasets, high-dimension ones in particular, since these often have similarities with real-life binary optimization problems.
The number of biomedical datasets is growing rapidly; this has led to the generation of high-dimensional features that negatively affect the classifiers of machine learning processes [42]. Many of the feature selection methods described in the literature suffer from population diversity and local optima problems when they are evaluated against high-dimensional datasets, such as the ever-growing body of biomedical datasets. Feature selection is aimed at selecting the most effective features from an original set containing irrelevant elements; this becomes especially challenging with high-dimensional datasets, which is why it is important for us to prove the efficacy of the BEOSA with such data dimensionality.
The Lung, Prostate, Leukemia, KrVsKpEW, Colon and WaveformEW datasets are considered here as high-dimensional datasets with feature sizes ranging between 4 and 7070. Additionally, most of these datasets present binary classification problems or, in the case of Lung and WaveformEW, multi-classification problems. BreastEW, Exactly, Exactly2, M-of-n and Tic-tac-toe are medium-dimensional datasets. Most of the datasets in this category have a number of instances between 9 and 203, and numbers of features mostly above 270, except for Iris, which has about four features. Meanwhile, all are binary classification problems. Low-dimensional datasets are those considered to have <500 instances and probably fewer features. The CongressEW, Iris, HeartEW, Ionosphere, Lymphography, PenglungEW, Sonar, SpectEW, Vote and Zoo datasets are in this category. The Iris dataset demonstrates exceptional characteristics, since only four features exist in that dataset, though it has 150 instances. All are binary classification problems except for PenglungEW, Zoo and Lymphography, which are multi-classification problems.
A description of each of these datasets is included. Most share some biological features, while the rest were collated from various other domains.

4.2. Parameter Configuration and Settings

Eight binary variants of metaheuristic algorithms were employed for a comparative analysis with the method proposed in this study, i.e., the binary dwarf mongoose optimizer (BDMO) [14], the binary simulated normal distribution optimizer (BSNDO), the binary particle swarm optimizer (BPSO), the binary whale optimization algorithm (BWOA), the binary sailfish optimizer (BSFO), the binary grey wolf optimizer (BGWO), BEOSA and BIEOSA. Table 2 lists the parameter settings applied for each of the algorithms. The values for the parameters π, β1, β2, β3 and β4, as used in our experiments with BEOSA and BIEOSA, are listed in the table as 0.1, 0.1, 0.1, 0.1 and 0.1, respectively. Similarly, the values for nb, na, ns, peep, τ and L were set at 3, (N-nb), (n-nb), 1, [0,1] and round(0.6*D*nb) for BDMO. The mean of the population size was computed to set the mo parameter of the BSNDO. Additionally, the BPSO control parameters c1, c2, W and Vmax were initialized at 2, 2, 0.9 and 6. The parameters for the remaining algorithms are shown in the table, with p, l, b, r and C for BWOA being initialized at [0,1], [0,1], 1, [0,1] and 2r. In BSFO, p was 0.1, A was 4 and epsilon was 0.001. Lastly, the BGWO coefficient governing the decreasing attack power was initialized within the bound [2, 0].
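For reference, the scalar settings quoted above can be collected into a single configuration mapping. The structure below is ours and purely illustrative; the values are those stated in the text and Table 2, and parameters defined by expressions or intervals (e.g., na, L, p for BWOA) are omitted.

```python
# Illustrative configuration mapping of the fixed parameter values
# quoted in the text; names mirror the symbols used in Table 2.
params = {
    "BEOSA":  {"pi": 0.1, "beta1": 0.1, "beta2": 0.1, "beta3": 0.1, "beta4": 0.1},
    "BDMO":   {"nb": 3, "peep": 1},
    "BPSO":   {"c1": 2, "c2": 2, "W": 0.9, "Vmax": 6},
    "BSFO":   {"p": 0.1, "A": 4, "epsilon": 0.001},
    # Shared experimental protocol from Section 4.2.
    "common": {"pop_sizes": [25, 50, 75, 100], "iterations": 50, "runs": 10},
}
```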
Population sizes of 25, 50, 75 and 100 were investigated for each of the algorithms to show how this variable affected performance. The training of the algorithms followed 50 iterative processes, and the experiment for each algorithm was repeated over 10 independent runs to determine the average performance. The formulae applied to compute these averages and all other similar metrics used for our comparative analysis are presented in the following subsection.

4.3. Evaluation Metrics

The evaluation metrics presented in the following paragraphs describe the approach used to quantify the obtained values to support our performance comparison. The following metrics are discussed: classification accuracy, mean accuracy, best accuracy and average feature count, obtained using Equations (14)–(17).
(a)
Classification accuracy (CA): this computes the accuracy of classifier $clf$ with dataset $X$ and label $Y$, as described in Equation (14):
$CA = clf(X, Y)$
(b)
Mean accuracy (MA): this computes the mean of all classification accuracies obtained after a certain number of runs of a given algorithm, where $CA_i$ is the accuracy obtained during run $i$ of the $N$ runs, as described in Equation (15):
$Mean_{acc} = \frac{1}{N} \sum_{i=0}^{N} CA_i$
(c)
Best Accuracy (BA): the best of all classification accuracies obtained after a certain number of runs, as described in Equation (16):
$best_{acc} = \max(CA)$
(d)
Average feature count (AFC): obtained by averaging the numbers of selected features over all population groups $PG$, as described in Equation (17):
$AFC = \frac{1}{|PG|} \sum_{i=0}^{|PG|} fc_i$
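Equations (15)–(17) reduce to simple aggregations over per-run and per-individual values, as the short sketch below shows; the sample numbers are hypothetical, not experimental results.

```python
import numpy as np

# Hypothetical per-run classification accuracies (CA_i) and
# per-individual selected-feature fractions (fc_i).
run_accuracies = [0.91, 0.88, 0.93, 0.90]
per_individual_fc = [0.40, 0.35, 0.50, 0.45]

mean_acc = float(np.mean(run_accuracies))    # Mean_acc, Equation (15)
best_acc = float(np.max(run_accuracies))     # best_acc, Equation (16)
afc = float(np.mean(per_individual_fc))      # AFC, Equation (17)
```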
The following section presents the results of all experiments and a comparative analysis of the algorithms. Additionally, the findings derived from the results are highlighted.

5. Results and Discussion of Findings

The results presented in this section are focused on the performance of BEOSA and BIEOSA in comparison with those of similar methods, i.e., the binary dwarf mongoose optimizer (BDMO) [3], the binary simulated normal distribution optimizer (BSNDO) [9], the binary particle swarm optimizer (BPSO) [53], the binary whale optimization algorithm (BWOA) [54], the binary grey wolf optimizer (BGWO) [55] and the binary sailfish optimizer (BSFO) [56] algorithms. The selection of these algorithms was based on their outstanding performance, as described in various reports, and their status as state-of-the-art methods for binary optimization. We note that our evaluations of most of these algorithms applied the same parameterizations, e.g., the number of iterations and parameter settings. The following subsections are organized as follows. Firstly, we provide a comparative analysis of the various methods based on their fitness performance and the number of selected features. We then examine the classification accuracy of each method as compared with others. Next, we compare the cost functions of all methods and show the impact of the choice of classifiers on the feature classification procedure. Finally, we report the computational time of each method and discuss our findings. The following subsections use tabular and graphical means for the sake of clarity.

5.1. Comparative Analysis of Fitness and Cost Functions

BEOSA and BIEOSA are now compared with related algorithms based on the results obtained for fitness and cost functions. The fitness function aims to minimize the objective function, while the cost function aims to maximize it. Table 3 lists the results obtained for each of the binarized algorithms for all benchmark datasets.
The BWOA algorithms performed better on the BreastEW, Lung, Iris, Exactly2, Colon and Vote datasets, with fitness values of 0.0307, 0.0006, 0.0050, 0.2384, 0.0004 and 0.0013, respectively. BWOA showed superiority with six benchmark datasets, while BGWO showed superiority with WaveformEW, yielding a fitness value of 0.1817. BSNDO outperformed the other methods on eight datasets, i.e., Lymphography, M-of-n, PenglungEW, Sonar, SpectEW, Tic-tac-toe, Wine and KrVsKpEW, with fitness values of 0.0380, 0.0046, 0.0013, 0.0047, 0.0948, 0.1647, 0.0298 and 0.0250, respectively. Interestingly, BEOSA outperformed most of the other methods, showing superiority with nine datasets, i.e., CongressEW, Exactly, Exactly2, HeartEW, Ionosphere, Prostate, Wine and Zoo, with fitness values of 0.0575, 0.2620, 0.2384, 0.0772, 0.0722, 0.0002, 0.0298 and 0.0533, respectively. Meanwhile, the associated variant of the proposed algorithm, BIEOSA, was competitive with BEOSA, showing superiority on two datasets. The implication of these findings is that the new method is suitable for minimizing the fitness function, allowing it to solve the difficult problem of feature selection on a wide range of datasets with different dimensionalities.
The values obtained for the cost function are plotted in Figure 6 to show the variation in the performance of the algorithms with the various datasets. A close examination of the plots for the Zoo, Vote, Wine, Sonar and Tic-tac-toe datasets shows that BEOSA yielded outstanding cost values during the iterative process. In the five considered datasets, the BGWO method showed unstable performance on the cost function, whereas the BEOSA, BIEOSA, BDMO, BSNDO, BPSO and BWOA were stable, and BEOSA, BIEOSA and BPSO often yielded similar results. The BEOSA curve was above those of the other methods for the Zoo, Sonar and Tic-tac-toe datasets and was close behind those of other methods for the Vote and Wine datasets. In the second category, we compared the performance curves of all the methods using the M-of-n, Ionosphere, Exactly, Exactly2 and HeartEW datasets. The BGWO maintained its unstable performance along the curve line, whereas all the remaining methods yielded good results. For example, BEOSA and BPSO closely shared the top section of the plots, meaning that their performance on the cost function was superior to those of the other methods. At the same time, both BDMO and BWOA were low in all the plots, showing that their performance in evaluating the cost function was poor. The BIEOSA and BSNDO were average performers in the five datasets. The third category of datasets for comparison comprised CongressEW, Lymphography, BreastEW, Colon and SpectEW. With the high-dimensional Colon dataset, the BEOSA yielded similar results to BPSO and BGWO, even though the latter was unstable, while the variant BIEOSA and BSNDO methods demonstrated average performance. For the BreastEW dataset, both BEOSA and BIEOSA outperformed the other methods, yielding the best cost function curve. The BEOSA curve was just below that of BPSO on the Lymphography dataset, which superseded all other algorithms.
The BIEOSA, BPSO, and BWOA were all plotted at the top section for the CongressEW datasets, while the BEOSA algorithm trailed behind. Similarly, the BEOSA outperformed all methods on the SpectEW datasets, although the BIEOSA algorithm yielded a curve in the lower section.
The performance of the algorithms on the Zoo dataset was as follows: the cost value range for BIEOSA was 0.50–0.52, BSNDO 0.64–0.65, BDMO 0.75–0.76, BWOA 0.80–0.81, BPSO 0.84–0.85, BGWO 0.74–0.95, and BEOSA 0.99–1.0. The Vote dataset yielded the following results: BWOA 0.80, BGWO 0.75–0.93, BSNDO, BDMO and BEOSA all 0.94, BIEOSA 0.94–0.97, and BPSO 0.98. The Wine dataset results were as follows: BDMO 0.58, BGWO 0.58–0.97, BSNDO 0.7750–0.7799, BWOA 0.81, BEOSA 0.94, BPSO 0.88–0.97, and BIEOSA 0.97. Performance with the Sonar dataset was as follows: BDMO was the lowest among all curves at less than 0.55; meanwhile, BSNDO was at 0.76, BIEOSA 0.78, BWOA 0.81, BGWO 0.88–0.87, BPSO 0.88–0.91, and BEOSA 0.86–0.93. For the Tic-tac-toe dataset, BGWO outperformed the other methods by running from 0.62 to 0.82, BDMO was 0.59, BWOA 0.62, BIEOSA 0.63, BSNDO 0.66, BPSO 0.68, and BEOSA 0.73.
The performance of the algorithms on the M-of-n dataset was as follows: the cost function values for BDMO were just above 0.50, while those of BSNDO were 0.62, BWOA 0.72, BIEOSA 0.78, BEOSA 0.83, BGWO 0.80–0.84 with its peak at 0.97, and BPSO 0.92–1.0. The Ionosphere dataset yielded the following results: BGWO began its curve at 0.752 and ended at 0.777, BWOA ran through 0.812, BDMO ran through 0.8125, BIEOSA went from 0.826 to 0.840, BSNDO was above 0.850, the BEOSA curve was just above 0.875, and the BPSO curve started from 0.805 and extended to just above 0.900. The Exactly and Exactly2 datasets yielded the following patterns: the BDMO curves were at 0.577 and 0.45 for Exactly and Exactly2, respectively. The BIEOSA curve was at 0.625 with Exactly and ranged from 0.64 to 0.68 on Exactly2, BWOA was 0.635 with Exactly and 0.47 with Exactly2, BSNDO was 0.635 with Exactly and 0.60 on Exactly2, BGWO ranged from 0.650 to 0.635 on Exactly and from 0.75 to 0.70 on Exactly2, BPSO was 0.675 with Exactly and 0.75 with Exactly2, and BEOSA was just below 0.70 with Exactly and above 0.76 with Exactly2, respectively. The results for HeartEW showed that BWOA and BDMO ranked lowest, with cost function values of around 0.50. BGWO followed, starting at 0.55, peaking at 0.83 and ending at 0.69; BIEOSA was 0.64, BPSO ran from 0.65 to 0.75, and lastly, BEOSA was above all the other algorithms at 0.78.
The CongressEW and Lymphography datasets demonstrated some similarity, with the BSNDO curve at the bottom at 0.62 and 0.52, respectively. This was followed by BDMO, which was at 0.80 and 0.70 with CongressEW and Lymphography. While the BIEOSA, BWOA and BPSO curves were around 0.95 for the CongressEW dataset, the same algorithms were sparsely plotted with the Lymphography dataset at 0.68, 0.80 and 0.90, respectively. Typically for BGWO, it started at 0.89 and ended at 0.86 for CongressEW, and started at 0.84 and ended at 0.78 with the Lymphography dataset. The BEOSA curve was at 0.875 on the CongressEW dataset and 0.80 on the Lymphography dataset. The algorithm curves showed different performance with the Colon and BreastEW datasets. For instance, where the BWOA algorithm curve was below 0.70 with Colon, it shot up above 0.90 with BreastEW. Additionally, the curve of BSNDO was around 0.85 for the Colon graph but fell below 0.70 with the BreastEW graph. BIEOSA also showed some disparity on Colon, where it crossed the graph close to 0.85; meanwhile, with BreastEW, it had a better cost value, running close to 0.95. Characteristically, BGWO zig-zagged its curves, as can be seen with Colon, where it started and ended at 0.92, peaking at around 1.0 and dipping to around 0.85. The same algorithm started at 0.93 and ended at 0.92, with its peak at around 0.94 and trough at 0.86, for the BreastEW dataset. The BDMO curve was just below 0.70 on Colon and around 0.93 on BreastEW. The BPSO and BEOSA curves were around 1.0 with the Colon dataset. Lastly, the SpectEW dataset had some interesting curves for BIEOSA, BPSO and BEOSA, starting from 0.62, 0.83 and 0.90, respectively, and then stabilizing at 0.62, 0.83 and 0.89, respectively. BSNDO and BWOA consistently had curves at 0.75 and 0.80, respectively. BGWO spiked up and down, running from 0.70 to 0.80 with a peak at 0.81. BDMO was just below 0.80.
The takeaway from these cost function evaluations is that whereas the values obtained varied across datasets, both BPSO and BEOSA always performed well, mostly yielding curves above those of the other algorithms. This implies that both algorithms demonstrated superiority compared with the other methods, though in most cases, BEOSA outperformed BPSO.
The implication of these outcomes is that both the BEOSA and BIEOSA methods are relevant binary optimization algorithms with great potential for producing very good performance on heterogeneous datasets with different dimensionalities. The cost function, which evaluates how far an algorithm moves away from the fitness function value, also evaluates the robustness of the algorithm in terms of its ability to sustain a good cost function evaluation; the higher the cost function value, the better the fitness value obtained. Considering the consistently outstanding performance of both BEOSA and BIEOSA on the fitness and cost function evaluations with all datasets, we conclude that the algorithm is very suitable for solving the problem of feature selection with effective minimization and maximization of fitness and cost values, respectively. In the following subsection, we compare the number of selected features obtained for all methods and associate this with the fitness evaluation discussed in this section.

5.2. Comparative Analysis of Selected Features for All Methods

The basis of solving feature selection problems using the binary optimization method is to reduce the number of features used for classification purposes. This is necessary to eliminate the bottleneck which high-dimensional datasets often impose on classifiers. Another benefit of reducing the number of features is to ensure that only relevant ones are used for the classification operation. In this subsection, we evaluate BEOSA and BIEOSA and compare their performance with that of BWOA, BPSO, BSFO, BGWO, BDMO and BSNDO. Table 4 compares the number of features selected for each algorithm across four different population sizes, namely, 25, 50, 75 and 100.
An interesting performance result was observed when the algorithms were compared based on their average number of selected features. For example, the BPSO output a value of 1 for the number of selected features for all population sizes and for all datasets considered during our experiments. While this showed some measure of abnormality in the process of feature selection in the algorithm, we observed more standard performance for all the remaining methods. As an example, consider the outcome of some low-dimensional datasets such as BreastEW, CongressEW, Exactly and Exactly2. The BEOSA and BIEOSA yielded similar results to all the other methods. For BWOA, BGWO, BDMO, BSNDO, BEOSA and BIEOSA, the average numbers of features (with the population sizes yielding this average performance in parentheses) were 17.0 (25), 16.9 (100), 5.5 (50), 3.0 (25), 7.3 (25) and 5.9 (75), respectively. The CongressEW showed 8.9 (100), 10.1 (25), 2.4 (50), 2.4 (50), 5.3 (100) and 4.6 (50) for the same BWOA, BGWO, BDMO, BSNDO, BEOSA and BIEOSA methods. Similarly, the Exactly dataset reported values of 6.2 (50), 8.5 (25), 3.0 (25), 2.2 (25), 4.2 (25) and 2.5 (75), while Exactly2 gave 7.0 (75), 8.3 (100), 3.5 (100), 2.0 (25), 1.7 (100) and 2.5 (25) on the same methods, respectively. These results show that in most cases, population sizes of 25–50 were sufficient to produce the desired results. Even population-intensive algorithms such as BEOSA and BIEOSA demonstrated that their best average feature selection counts could be obtained using a population size range of 25–75.
High-dimensional datasets like Lung, Prostate, Leukemia and Colon, and moderate-dimensional datasets like the PenglungEW, showed how superior the BEOSA and BIEOSA methods were. For instance, the average feature selection numbers in the Lung dataset were 1692.5 (100), 2151.1 (75), 820.3 (50), 2098.4 (25), 403.9 (100) and 685.3 (75) for BWOA, BGWO, BDMO, BSNDO, BEOSA and BIEOSA respectively. Clearly, the BEOSA algorithm yielded the best performance, i.e., 403.9 using 100 as the population size. Additionally, the results obtained for the Prostate dataset showed that the BDMO, BSNDO, BEOSA and BIEOSA methods were able to yield average numbers of selected features of 1402.8 (100), 1478.4 (25), 682.2 (100) and 1141.5 (25), while the BEOSA provided the best performance with a population size of 100. A very impressive result was obtained for BEOSA and BIEOSA on the Leukemia dataset; the BWOA, BGWO, BDMO, BSNDO, BEOSA and BIEOSA algorithms yielded 1708.3 (25), 2320.4 (75), 999.7 (75), 928.5 (25), 50.3 (25) and 589.7 (75), respectively, for the average number of selected features. BEOSA produced an optimal number of 50.3 for selected features with a population size of 25. Moreover, the BIEOSA variant also yielded 589.7 as the average feature size with a population size of 75. Furthermore, the performance of the BWOA, BPSO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA algorithms on the PenglungEW dataset showed values of 170.0 (75), 4.0 (50), 193.0 (50), 23.9 (100), 124.0 (100), 35.0 (75) and 24.0 (25), respectively, as the average number of selected features. Accordingly, BEOSA and BIEOSA performed well using population sizes of 75 and 25, respectively. The performance on the Colon dataset for the BEOSA and BIEOSA methods was also very impressive compared with related methods; the BWOA, BGWO, BDMO, BSNDO, BEOSA and BIEOSA yielded 1016.1 (50), 1301.5 (100), 546.2 (25), 1374.3 (25), 157.1 (75) and 316.5 (100), respectively. 
We observed that both BEOSA and BIEOSA yielded a low average number of selected features, with values of 157.1 and 316.5, respectively, using population sizes of 75 and 100. Note that the population size was not so relevant to the obtained result, since BGWO, which obtained its best average number of selected features with a population size of 100, yielded a far worse result.
The performance of BEOSA and BIEOSA regarding the average number of selected features showed that the proposed method is suitable for selecting the optimal set of features required to achieve improved classification accuracy. An interesting finding revealed by this performance analysis was that BEOSA and BIEOSA are very suitable methods for high-dimensional datasets with a larger number of features to start with. The result also showed that both BEOSA and BIEOSA were very competitive approaches, even when dealing with low-dimensional datasets. In the following subsection, we evaluate and compare the classification accuracy of the selected features by each of the methods discussed in this section.

5.3. Comparative Analysis of the Classification Accuracy of All Methods

The average number of selected features influences the classification accuracy, with a lower number of features being desirable so that the classification operation is not bottlenecked. In this subsection, we evaluate the performance of BIEOSA and BEOSA and compare these approaches to related methods. Moreover, the comparative analysis considers the influence of the population size on performance. As in the previous subsection, population sizes of 25, 50, 75 and 100 were compared for each binary optimizer algorithm.
Table 5 shows the performance of the BreastEW, Exactly2, HeartEW and Ionosphere datasets in relation to BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA. For the BreastEW datasets, values of 0.9351, 0.9535, 0.9272, 0.8921, 0.6930, 0.9430 and 0.9149 were obtained for BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA, respectively. Interestingly, the BEOSA algorithm with a population size of 100 yielded the best overall performance. The best overall performance obtained with the Exactly2, HeartEW and Ionosphere datasets was 0.7660, 0.8074 and 0.9286 using BPSO, BEOSA and BPSO, respectively. The breakdown of results on the Exactly2 dataset showed classification accuracies of 0.7345, 0.7660, 0.7350, 0.7175, 0.6875, 0.5900, 0.7625 and 0.7495 when using BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA, respectively. Values of 0.6963, 0.8019, 0.7222, 0.6963, 0.5722, 0.4815, 0.8074 and 0.6870, and 0.8500, 0.9286, 0.9000, 0.8457, 0.8143, 0.7286, 0.9143 and 0.8729, were obtained for the BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA when using the HeartEW and Ionosphere datasets, respectively.
Similarly, the performance of the Tic-tac-toe, Vote, Wine and Zoo datasets with BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA was observed. The results showed that, with Tic-tac-toe, BEOSA was superior, with a classification accuracy of 0.7964. In contrast, the worst performance on this dataset was observed with the BDMO method, which yielded a value of 0.6219. The BEOSA method showed a classification accuracy of 0.9583 with the Vote dataset, a demonstration of superiority over all other methods; BPSO yielded 0.9450 and BDMO yielded 0.8333. The Wine and Zoo datasets yielded classification accuracies of 0.9556 and 0.9400, respectively, with the BEOSA method. We note that in most cases where BEOSA outperformed the other methods, the population sizes were 75 and 100, which supports the attainment of optimal performance on the high-dimensional datasets.
The performance summary for all the methods showed that the BWOA algorithm achieved optimal classification accuracy on 10 datasets with a population size of 100, while population sizes of 25, 50 and 75 yielded 0, 9 and 1 optimal classifications, respectively. The BPSO method performed best on 10 datasets using a population size of 100; in contrast, population sizes of 25, 50 and 75 obtained the best performance on only 2, 3 and 5 datasets, respectively. Similarly, we observed that BSFO, BGWO, BDMO and BSNDO obtained their best classification accuracy using population sizes of 50, 75, 75 and 100 on 6, 9, 6 and 19 datasets, respectively. The BEOSA and BIEOSA methods obtained their best classification accuracy with population sizes of 100 and 50, doing so on 9 and 7 datasets, respectively. Meanwhile, BEOSA showed that using a population size of 25 or 50 impaired performance, indicating that an increased population size supports improved algorithm performance.
The classification accuracy curves for the Zoo, Vote, Wine, Sonar, Tic-tac-toe, M-of-n, Ionosphere, Exactly, Exactly2, HeartEW, CongressEW, Lymphography, Colon, BreastEW and SpectEW datasets on BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA were analyzed for further understanding of the performance differences. The plots for the classification curve analysis are presented in Figure 7. The curves of all the methods on the Zoo dataset showed that BEOSA and BIEOSA performed better than any of the other methods. Similarly, we observed that the BEOSA method performed well on the Vote, Wine, Sonar, Tic-tac-toe, HeartEW, CongressEW, BreastEW and SpectEW datasets. On the M-of-n, Ionosphere, Exactly and Exactly2 datasets, the BEOSA and BIEOSA methods competed strongly with the BPSO method while outperforming the remaining methods.
The classification accuracy curves for the Zoo dataset show that the BDMO curve runs from 0.59 to 0.61 with a peak at 0.66. BSNDO had a flat curve at 0.65. BIEOSA started from 0.75, peaked at 0.79 and ended at 0.78; BWOA started from 0.87, dipped to 0.78 and ended at that same value; BGWO started from 0.90, dipped and peaked at 0.85 and 0.89, respectively, and ended at 0.86. BPSO and BEOSA topped the plots, with their curves starting from 0.91 and 0.92 and ending at 0.88 and 0.91, respectively. For the Vote dataset, BDMO rose from 0.775 to 0.825, and BIEOSA started from 0.840, peaked at 0.875 and ended below 0.825. BGWO rose from just above 0.850 and terminated above 0.875, while BWOA started from 0.875 and rose slightly to 0.880. BPSO and BSNDO both started just above 0.925 and ended at 0.935 and 0.925, respectively. BEOSA topped the graph, peaking just above 0.950. The performances on Wine and Sonar are similar: the BDMO method runs at the bottom of both graphs, starting from an average accuracy of 0.65 and ending at around 0.66, although its curve peaked above 0.70 for the Sonar dataset. BGWO started from around 0.75 for both Wine and Sonar, and ended just below 0.75 and 0.78, respectively. The BSNDO curves for both datasets run at about 0.75, and BIEOSA similarly started just above 0.80 and ended just below 0.80 in both cases. Characteristically, BPSO and BEOSA topped the graphs for Wine and Sonar, starting from 0.88 and 0.94 on Wine and 0.83 and 0.82 on Sonar, then ending at 0.93 and 0.95 on Wine and 0.88 and 0.86 on Sonar. The Tic-tac-toe dataset has BDMO at the bottom and BEOSA at the top, starting from 0.61 and 0.74 and ending at 0.60 and 0.79, respectively. BSNDO and BIEOSA ended their curves at around 0.64 but started at 0.63 and 0.65, respectively, while BWOA and BGWO started at the same point of 0.68 but ended at 0.66 and 0.69, respectively.
BPSO peaked at an accuracy of 0.75 when a population size of 75 was used.
The experimental results for M-of-n, Ionosphere and Exactly are consistent for BPSO, which tops the graphs of all three datasets: it showed its lowest performance with a population size of 25 in all cases, but reported its best accuracies of 0.87, 0.925 and 0.84 at population sizes of 75, 50 and 75, respectively. BDMO ranked lowest on M-of-n and Exactly and second lowest on Ionosphere, peaking at 0.61 with a population size of 75, at 0.810 with a population size of 75, and at 0.64 with a population size of 50 for M-of-n, Ionosphere and Exactly, respectively. BSNDO reported flat curves on all three datasets. The BIEOSA curves showed peak classification accuracies of 0.74 with a population size of 100, around 0.875 with a population size of 50, and 0.67 with a population size of 100 for M-of-n, Ionosphere and Exactly. The BWOA and BGWO algorithms showed average performance on the three datasets, obtaining peak classification accuracies of 0.79 and 0.80 at population sizes of 50 and 75 on M-of-n, 0.845 and 0.835, both at a population size of 75, on Ionosphere, and 0.67 and 0.69 at population sizes of 50 and 75 on Exactly. BEOSA obtained its best accuracies of 0.85 with a population size of 50, 0.910 with a population size of 50, and 0.74 with a population size of 100 for the M-of-n, Ionosphere and Exactly datasets, respectively. On the Exactly2 and HeartEW datasets, BSNDO produced the same classification accuracy for all population sizes, at around 0.580 and 0.480, respectively. This is followed by BDMO, which obtained its best classification accuracies of 0.685 and 0.57 using population sizes of 75 and 50. The BWOA, BPSO and BIEOSA algorithms overlap in performance on the two datasets, reporting peak accuracies of 0.725, 0.710 and 0.750 using population sizes of 50, 25 and 25 on the Exactly2 dataset. Similarly, BWOA, BPSO and BIEOSA showed peak accuracies of 0.69 using population sizes of 100, 100 and 25 on HeartEW.
BPSO and BEOSA demonstrated strongly competitive performance, with peak accuracy values of around 0.750 on Exactly2 and 0.80 on HeartEW, in both cases with a population size of 100.
The results obtained for the CongressEW, Lymphography and BreastEW datasets showed that the BSNDO algorithm's performance was almost identical across all population sizes, at 0.63, 0.45 and 0.69, respectively. This is followed by the BDMO algorithm, with peak accuracies of 0.82, 0.63 and 0.89 using population sizes of 50, 75 and 25 on the three datasets. BIEOSA obtained its best classification accuracies of 0.90 using a population size of 100 for CongressEW, 0.69 for Lymphography, and 0.91 using a population size of 75 for BreastEW. BWOA and BGWO competed in performance, as seen in their curves for CongressEW, Lymphography and BreastEW, where BWOA had its best accuracies of 0.93, 0.77 and 0.93 using population sizes of 75, 50 and 50. Similarly, BPSO and BEOSA both peaked by obtaining 0.95, between 0.8 and 0.9, and around 0.95, all with a population size of 75, on the three datasets. We also observed the curves for the Colon and SpectEW datasets for all algorithms. In both cases, the BDMO curves rank lowest, with peak performances of 0.75 and 0.735 using population sizes of 25 and 100. BSNDO shows nearly flat curves for both datasets, with peak performances averaging 0.85 and 0.74 for Colon and SpectEW, respectively. On the SpectEW dataset, BWOA, BIEOSA and BGWO all reported peak performances of around 0.80 using population sizes of 100, 50 and 50, while the same algorithms showed different curve patterns on Colon. For instance, BIEOSA's peak accuracies were 0.88 and 0.81 using population sizes of 100 and 50, BWOA's were 0.98 and close to 0.82 using population sizes of 50 and 100, and BGWO's were 0.98 and 0.81 using population sizes of 75 and 50.
The summary of the classification accuracy results for each algorithm across all datasets is consistent with the performance reported for the cost function evaluation. The BPSO and BEOSA algorithms performed very well compared with the other methods, but in most cases, the proposed BEOSA algorithm yielded better performance than BPSO. These consistent performances of BEOSA with regard to fitness function evaluation, cost function evaluation and classification accuracy for the selected feature sizes confirm the relevance of the algorithm in solving the feature selection problem.
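The fitness and cost functions referred to throughout this comparison trade classification quality against subset size. The paper's exact formulations appear in its methodology section; the sketch below shows only the common wrapper-based form, with the weight `alpha = 0.99` chosen purely as an illustrative assumption:

```python
def wrapper_fitness(error_rate, mask, alpha=0.99):
    """Weighted sum of classification error and relative subset size (minimized)."""
    ratio = sum(mask) / len(mask)  # fraction of features retained
    return alpha * error_rate + (1.0 - alpha) * ratio

def wrapper_cost(accuracy, mask, alpha=0.99):
    """Complementary view: reward accuracy and small subsets (maximized)."""
    ratio = sum(mask) / len(mask)
    return alpha * accuracy + (1.0 - alpha) * (1.0 - ratio)

# A mask keeping 2 of 4 features with 10% error:
f = wrapper_fitness(0.1, [1, 0, 1, 0])  # -> 0.104
```

Under such a formulation, two solutions with equal accuracy are ranked by how few features they keep, which is why the selected feature counts and the fitness/cost values move together in the tables.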
The performance superiority demonstrated by the BEOSA and BIEOSA methods in the comparative analysis of this subsection reinforces the argument that the proposed method is suitable for solving the feature selection problem. This finding is supported by the fact that the minimal, optimal number of features selected by the BEOSA and BIEOSA methods was sufficient to yield the best classification accuracy. In the following subsection, we investigate the impact of varying the choice of classifier and whether this choice influences the performance of the binary optimizer method.

5.4. Performance Evaluation of State-of-the-Art Classifiers on Methods

The classification accuracy analysis applied in the comparative analysis discussed in the previous subsection used the KNN method. This subsection presents our investigation of whether using a different classifier from the list of existing state-of-the-art classifiers would improve the classification accuracy of the optimizers. Table 6 presents a comparative analysis of the influence of different classifiers, using the M-of-n dataset as a sample. The KNN, random forest (RF), MLP, decision tree (DTree), SVM and Gaussian naïve Bayes (GNB) classifiers were compared using the accuracy, precision, recall, F1-score and area under the curve (AUC) metrics.
The classification accuracies of the KNN, RF, MLP, DTree, SVM and GNB classifiers were 0.815, 0.835, 0.84, 0.795, 0.845 and 0.83 for BEOSA, and 0.935, 0.665, 0.67, 0.67, 0.67 and 0.67 for BIEOSA, respectively. The results showed that SVM and KNN worked well for BEOSA and BIEOSA, yielding classification accuracies of 0.845 for SVM and 0.935 for KNN, respectively. For the most competitive alternative, the BPSO algorithm, the MLP and SVM classifiers proved more suitable for obtaining better classification accuracy than the other classifiers. For precision, recall, F1-score and AUC, values of 0.875, 1, 0.933333 and 0.993132 were obtained for BEOSA using SVM, while values of 1, 0.866667, 0.928571 and 1 were obtained for BIEOSA using KNN. This result confirms that when a classifier produces a good classification result with a binary optimizer, it also tends to improve the precision, recall, F1-score and AUC metrics.
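The accuracy, precision, recall and F1-score values reported here are standard binary-classification measures derived from confusion-matrix counts (AUC additionally requires ranked scores and is omitted). A self-contained sketch, assuming label 1 is the positive class:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from a binary confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# One false negative out of four samples:
acc, prec, rec, f1 = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])  # -> 0.75, 1.0, 0.5, 0.667
```

Reporting all four together, as Table 6 does, guards against a classifier that scores well on accuracy alone while missing most positive cases.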
To provide a broader view of the performance of the KNN, RF, MLP, DT, SVM and GNB classifiers with population sizes of 25, 50, 75 and 100, we plotted graphs showing how the BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA algorithms performed. Figure 8 shows the results of the comparisons carried out using the CongressEW dataset as a sample. BWOA performed well with SVM, BPSO with KNN, BSFO with MLP, BGWO with RF, BDMO with GNB, BSNDO with GNB, SVM, MLP and RF, BEOSA with KNN, and BIEOSA with SVM. These performance differences are a strong indication that research on the use of a binary optimizer to solve feature selection must not be limited to the performance of the optimizer alone; rather, effort must also be made to select a fitting classifier. Interestingly, we found that KNN and SVM, which are known to work well on most classification tasks, showed good performance with the proposed BEOSA and BIEOSA methods.
The experimental results for the classifiers on the CongressEW dataset with the BWOA algorithm showed that the classification accuracies of GNB and KNN were around 0.90 for a population size of 25, rising to peaks of 0.94 and 0.93 with a population size of 75. RF and SVM yielded the same value, 0.93, with a population size of 25, but peaked at around 0.96 with a population size of 75. In the middle of the curves is the MLP curve, with a peak classification value of 0.94 at a population size of 75; its lowest reported value was 0.84, with a population size of 50. With BSFO and BGWO, the KNN, RF, MLP, DT, SVM and GNB classifiers achieved classification accuracies of 0.86, 0.92, 0.94, 0.86, 0.89 and 0.88 with a population size of 50, and 0.935, 0.968, 0.949, 0.956, 0.962 and 0.953 with a population size of 50, respectively. In contrast, KNN achieved its best performance with a population size of 75. The graph plots for BDMO and BIEOSA demonstrate another interesting aspect of their performance: in each case, all classifiers peaked and dipped with a population size of 50. For instance, for BDMO, the classifiers peaked with a population size of 75 with classification accuracies of 0.781, 0.80, 0.801, 0.822 and 0.82, respectively. BIEOSA yielded the best classification accuracies for all classifiers with a population size of 25, showing values just above 0.925 for KNN, around 0.950 for MLP, DT and SVM, and around 0.975 for RF and GNB. BSNDO obtained curves running consistently at 0.805 for RF, MLP, DT, SVM and GNB, but approximately 0.76 for all population sizes using the KNN classifier. With BPSO and BEOSA, KNN peaked at accuracies of 0.989 and 0.0650 with population sizes of 100 and 75, while GNB peaked at 0.95 and around 0.9540 with a population size of 50 for BPSO and BEOSA. SVM obtained its peak performance values of 0.959 and 0.9575 on BPSO and BEOSA with population sizes of 100 and 50.
With BPSO and BEOSA, MLP peaked with population sizes of 100 and 75 at 0.96 and around 0.9525, while RF peaked at 0.959 and 0.9575 with population sizes of 50 and 75 for BPSO and BEOSA. DT peaked with a similar accuracy to that reported for GNB.
Figure 9 shows the performance of the BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA algorithms on the SpectEW dataset, providing the classification accuracies of the KNN, RF, MLP, DT, SVM and GNB classifiers. From the plots in the figure, it can be seen that the best classification accuracies were obtained with population sizes of 25, 50, 75 and 100 for BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA using SVM, KNN, SVM (also DT and KNN), MLP, RF, KNN, KNN and RF, respectively. The results also showed that for BPSO, BSFO and BIEOSA, the best performance of their respective classifiers was obtained with a population size of 100. In contrast, BSFO, BDMO, BSNDO and BEOSA obtained their best classification accuracies with a population size of 25 using their respective classifiers. We note that BSFO and BGWO also performed well with a population size of 75, while BWOA did well with a population size of 50.
The performance of BWOA on the SpectEW dataset showed that most classifiers achieved their peak accuracies with a population size of 50, except for GNB, which obtained its best output with a population size of 100, albeit at a much lower value of 0.64; KNN, RF, MLP, DT and SVM obtained values of 0.87, 0.84, 0.84, 0.79 and 0.89, respectively. BPSO showed an interesting result when a population size of 50 was used, with all classifiers converging at a classification accuracy of 0.79 as their lowest value. Interestingly, their best accuracies occurred with a population size of 100, with KNN yielding 0.89, RF 0.83, MLP 0.84, DT 0.80, SVM 0.79 and GNB 0.73. The BSFO algorithm is unique in showing recurring overlap among most of the classifiers, as seen with GNB and DT, whose maximum accuracy values were 0.75 with a population size of 75, while the others also peaked at that point but with a classification accuracy of 0.8. Meanwhile, differentiated classification accuracies were observed for all classifiers when using the BGWO algorithm. The RF and GNB classifiers obtained their best performance, 0.83 and 0.7, with a population size of 50. SVM, KNN, DT and MLP obtained their best accuracies of 0.82, 0.81, 0.79 and 0.88 with a population size of 75. For BDMO, the classifiers achieved their best performance as follows: 0.71 with a population size of 75; 0.82 with a population size of 25; 0.8 with a population size of 50; 0.86 with a population size of 25; and 0.53 with a population size of 75. For BSNDO, the best values were 0.83 with a population size of 75, 0.85 with a population size of 25 and 0.80 with a population size of 50. We compared BEOSA and BIEOSA and found a large degree of variance. For instance, whereas KNN obtained its best value of 0.83 with a population size of 75 with BEOSA, the same classifier yielded a value of 0.98 with a population size of 25 for BIEOSA.
Additionally, RF peaked at 0.85 with a population size of 100 and at 0.89 with a population size of 25 for BEOSA and BIEOSA, respectively. MLP obtained its best values of 0.85 and 0.89 with a population size of 25 on BEOSA and BIEOSA, respectively. DT dipped on BIEOSA to an accuracy of 0.78 with a population size of 100, but peaked on BEOSA with an accuracy of 0.85 with a population size of 25. SVM showed good performance with BIEOSA, achieving an accuracy of 0.89 with a population size of 25, whereas with BEOSA, it achieved its best value of 0.80 with all population sizes. GNB performed better on BEOSA, with an accuracy of 0.85 with a population size of 25, but obtained 0.79 on BIEOSA with a population size of 100.
A comparative analysis of the plots for the CongressEW and SpectEW datasets showed that the performance of BEOSA with all of the classifiers was outstanding, standing shoulder-to-shoulder with BPSO and significantly outperforming BWOA, BSFO, BGWO, BSNDO and BDMO. We note that the proposed method proved to be well-rounded and robust. Moreover, the good classification performance, derived from the number of features selected by BEOSA, further confirms the applicability of the method for finding the best number of required features, even in real-life problems. Additionally, the fitness and cost function values were impressive for BEOSA and its variant BIEOSA.
The experiments with different classifiers in this study have shown that the choice of classifier to pair with a binary optimizer must be made carefully, based on empirical investigation, when such hybrid models are deployed to address real-life problems. Having compared the performance of BEOSA with related methods using the values obtained for the fitness and cost functions, the average number of selected features and the classification accuracy, in the following subsection we compare the computational runtime required by each of the algorithms.

5.5. Computational Time Analysis

Computational resources, especially computational time, often play a pivotal role in the choice of an algorithm for time-constrained applications. In cases where computational time is not a constraint, however, the selection of an algorithm is often based purely on performance. This subsection compares the computational time recorded for the binary optimizers considered in this study. Table 7 outlines the performance of all the algorithms with respect to each of the applied benchmark datasets.
The computational time of BSNDO was abnormally distributed. However, we found that BWOA showed reduced computational time on the BreastEW, CongressEW, Exactly, Iris, Exactly2, Ionosphere, Sonar, SpectEW, Tic-tac-toe, Vote, Wine, Zoo and KrVsKpEW datasets. BPSO performed best on the Lymphography dataset, while BDMO reported a reasonable computational time on the HeartEW and M-of-n datasets. The proposed method, BEOSA, demonstrated minimal computational time on the Leukemia and Colon datasets.
Figure 10 graphically illustrates the distribution of computational time for each dataset with respect to all of the tested binary optimization methods. The figure shows that, among the binary optimizers, BSFO often demanded the most computation time, followed by BDMO and then BIEOSA. BSNDO and BGWO were shown to require less runtime. The implication is that the proposed BEOSA algorithm achieved outstanding performance at an average computational time compared with the other binary optimizers.
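Runtime comparisons of this kind are straightforward to reproduce: each optimizer run is wrapped in a wall-clock timer and the elapsed times are averaged over independent repetitions. A minimal sketch, where the `optimizer` callable is a placeholder assumption for any of the binary optimizers compared here:

```python
import time

def timed_run(optimizer, *args, **kwargs):
    """Return an optimizer's result together with its wall-clock runtime in seconds."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = optimizer(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload in place of a real optimizer run:
result, elapsed = timed_run(lambda n: sum(range(n)), 1000)
```

In practice, the timings in Table 7 would be averaged over many runs (e.g. 30), since single-run wall-clock measurements are noisy.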

5.6. Discussion of Findings

As corroborated by the obtained results and discussed in detail in the previous subsections, this study concludes that the proposed BEOSA method demonstrated very promising performance on all benchmark datasets. We have shown that the method produced the optimal average number of selected features on each of the tested datasets. Furthermore, we found that the popular classifiers KNN, MLP, SVM, GNB and RF were relevant in supporting the performance of binary optimizers. This should motivate designers of binary optimizers to investigate which state-of-the-art classifier is suitable for supporting a particular wrapper-based feature selection and classification task. Moreover, when a classifier impedes the performance of a binary optimizer, it diminishes the value of using the algorithm for optimization purposes. Hence, deploying such binary optimizers on real problems must be accompanied by the selection of an appropriate classifier. The average number of features selected across all datasets using the proposed BEOSA demonstrated that the algorithm is suitable for maximizing the cost function and minimizing the fitness function. Our findings also confirm that the novel method used for applying the S-functions and V-functions enhanced the performance of the proposed method.
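The S-functions and V-functions mentioned above map a continuous update value to a bit decision when binarizing the search space. The paper defines its own newly formulated transfer functions; the block below shows only the classical family as a representative sketch, where the S-shape gives the probability of setting a bit to 1 and the V-shape gives the probability of flipping the current bit:

```python
import math
import random

def s_shape(x):
    """Classical S-shaped (sigmoid) transfer: probability that the bit becomes 1."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shape(x):
    """Classical V-shaped transfer: probability that the current bit flips."""
    return abs(math.tanh(x))

def binarize_s(x, rng=random):
    # sample the bit directly from the S-shaped probability
    return 1 if rng.random() < s_shape(x) else 0

def binarize_v(bit, x, rng=random):
    # flip the existing bit with the V-shaped probability
    return 1 - bit if rng.random() < v_shape(x) else bit
```

The design difference matters: S-shaped sampling ignores the current bit (favoring exploration), whereas V-shaped flipping perturbs the current solution (favoring exploitation), which is consistent with assigning the two families to different search phases.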
Research on optimizing the number of selected features for classification operations is aimed at obtaining the best optimizer. Such an optimizer is expected to advance research in the field by ensuring that classification accuracy is maximized while the number of selected features is kept as low as possible. This demonstrates the increasing need for new algorithms capable of solving this multi-objective problem. The algorithm proposed in this study satisfies this condition, since both objectives were achieved. Moreover, we noted that BEOSA also improved the fitness and cost function values; these functions are pivotal in justifying the relevance of a binary optimizer in selecting the optimal number of features required to obtain the best classification result. Furthermore, this study provides a wealth of experimental results, i.e., comparisons of the performance of different classifiers with several binary optimizers. Such comparisons are rare in the literature and, as such, we hope that our work will benefit the community of researchers in the field.

6. Conclusions

This study presented the design of binary variants of the EOSA and IEOSA algorithms, referred to as the BEOSA and BIEOSA optimizers. Using models to represent the binary search space and to map the optimization process from a continuous to a discrete search space, the study showed that the new methods are suitable for feature selection. Furthermore, we investigated the performance impact of using different transfer functions, namely two S-functions and two V-functions, in the exploitation and exploration phases. Exhaustive experimentation was carried out on over 20 datasets with a wide range of heterogeneous features, and a comparative analysis was made with the BDMO, BSNDO, BPSO, BWOA, BSFO and BGWO methods. The performance outcomes showed that both BEOSA and BIEOSA performed reasonably well on most of the datasets and demonstrated competitive results against the other methods. This was shown using the values obtained for the fitness and cost functions and the number of selected features. Furthermore, the study examined the impact of the choice of classifier used for feature classification with respect to the optimizer. The findings showed that KNN and SVM performed the feature classification tasks exceptionally well. Meanwhile, a comparative analysis of runtime and a statistical analysis of the methods were also reported. The results showed that significant performance improvements can be achieved when transfer functions are skillfully formulated and applied. This finding was supported by the fact that separating the applicability of the S-function from the V-function in the exploration and exploitation phases enhanced the performance of the algorithm. This study advances research in this domain through a novel demonstration, i.e., using different transfer functions in the search process across the exploration and intensification phases. Moreover, the formulation of new transfer functions adds to the novelty of the proposed binary methods.
One limitation of this study concerns the performance of the immunity-based method, IEOSA, whose binary variant was unable to compete with the other methods, in contrast with BEOSA, which yielded results similar to those of other state-of-the-art methods. Addressing this limitation will require further fine-tuning of the algorithm. In future work, we propose investigating the use of competing optimization algorithms in hybrid solutions with the BEOSA and BIEOSA methods. This is motivated by the need to capitalize on the advantages of other methods in order to reduce the limitations of the base EOSA method. Future research on the proposed method may also be centred on using deep learning-based feature extraction and classification procedures, which could yield an outstanding hybrid model that, to date, no study has considered. Another direction for future work is to investigate the possibility of swapping the usage of the S-function and V-function and to compare the performance with that described in this study.

Author Contributions

O.A., O.N.O. and A.E.E. contributed to the conception and design of the research work, material preparation, experiments and analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Code Availability (Software Application or Custom Code)

All codes used are available online at their indicated references.

References

  1. Hatamlou, A. Black hole: A new heuristic optimization approach for data clustering. Inf. Sci. 2013, 222, 175–184. [Google Scholar] [CrossRef]
  2. Dash, M.; Liu, H. Feature selection for classification. Intell. Data Anal. 1997, 1, 131–156. [Google Scholar] [CrossRef]
  3. Akinola, O.A.; Agushaka, J.O.; Ezugwu, A.E. Binary dwarf mongoose optimizer for solving high-dimensional feature selection problems. PLoS ONE 2022, 17, e0274850. [Google Scholar] [CrossRef] [PubMed]
  4. Liu, H.; Motoda, H. Feature Selection for Knowledge Discovery and Data Mining; Springer Science & Business Media: Berlin, Germany, 2012; Volume 454. [Google Scholar]
  5. Li, Y.; Li, T.; Liu, H. Recent advances in feature selection and its applications. Knowl. Inf. Syst. 2017, 53, 551–577. [Google Scholar] [CrossRef]
  6. Guyon, I.; De, A.M. An Introduction to Variable and Feature Selection André Elisseeff. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  7. Žerovnik, J. Heuristics for NP-hard optimization problems—simpler is better!? Logist. Sustain. Transp. 2015, 6, 1–10. [Google Scholar] [CrossRef] [Green Version]
  8. Hammouri, A.I.; Mafarja, M.; Al-Betar, M.A.; Awadallah, M.A.; Abu-Doush, I. An improved Dragonfly Algorithm for feature selection. Knowl.-Based Syst. 2020, 203, 106131. [Google Scholar] [CrossRef]
  9. Ahmed, S.; Sheikh, K.H.; Mirjalili, S.; Sarkar, R. Binary Simulated Normal Distribution Optimizer for feature selection: Theory and application in COVID-19 datasets. Expert Syst. Appl. 2022, 200, 116834. [Google Scholar] [CrossRef]
  10. Banka, H.; Dara, S. A Hamming distance based binary particle swarm optimization (HDBPSO) algorithm for high dimensional feature selection, classification and validation. Pattern Recognit. Lett. 2015, 52, 94–100. [Google Scholar] [CrossRef]
  11. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary ant lion approaches for feature selection. Neurocomputing 2016, 213, 54–65. [Google Scholar] [CrossRef]
  12. Emary, E.; Zawbaa, H.M. Feature selection via Lèvy Antlion optimization. Pattern Anal. Appl. 2019, 22, 857–876. [Google Scholar] [CrossRef]
  13. Ji, B.; Lu, X.; Sun, G.; Zhang, W.; Li, J.; Xiao, Y. Bio-Inspired Feature Selection: An Improved Binary Particle Swarm Optimization Approach. IEEE Access 2020, 8, 85989–86002. [Google Scholar] [CrossRef]
  14. Oyelade, O.N.; Ezugwu, A.E.S.; Mohamed, T.I.A.; Abualigah, L. Ebola Optimization Search Algorithm: A New Nature-Inspired Metaheuristic Optimization Algorithm. IEEE Access 2022, 10, 16150–16177. [Google Scholar] [CrossRef]
  15. Xue, B.; Zhang, M.; Browne, W.N.; Yao, X. A Survey on Evolutionary Computation Approaches to Feature Selection. IEEE Trans. Evol. Comput. 2016, 20, 606–626. [Google Scholar] [CrossRef] [Green Version]
  16. Kennedy, J.; Eberhart, R.C. A discrete binary version of the particle swarm algorithm. In Proceedings of the 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation, Orlando, FL, USA, 12–15 October 1997; Volume 5, pp. 4104–4108. [Google Scholar] [CrossRef]
  17. Unler, A.; Murat, A. A discrete particle swarm optimization method for feature selection in binary classification problems. Eur. J. Oper. Res. 2010, 206, 528–539. [Google Scholar] [CrossRef]
  18. Chuang, L.Y.; Tsai, S.W.; Yang, C.H. Improved binary particle swarm optimization using catfish effect for feature selection. Expert Syst. Appl. 2011, 38, 12699–12707. [Google Scholar] [CrossRef]
  19. Mafarja, M.; Jarrar, R.; Ahmad, S.; Abusnaina, A.A. Feature selection using Binary Particle Swarm optimization with time varying inertia weight strategies. In Proceedings of the 2nd International Conference on Future Networks and Distributed Systems, Amman, Jordan, 26–27 June 2018. [Google Scholar] [CrossRef]
  20. Huang, C.; Wang, C. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl. 2006, 31, 231–240. [Google Scholar] [CrossRef]
  21. Nemati, S.; Ehsan, M.; Ghasem-aghaee, N.; Hosseinzadeh, M. Expert Systems with Applications A novel ACO—GA hybrid algorithm for feature selection in protein function prediction. Expert Syst. Appl. 2009, 36, 12086–12094. [Google Scholar] [CrossRef]
  22. Jiang, S.; Chin, K.S.; Wang, L.; Qu, G.; Tsui, K.L. Modified genetic algorithm-based feature selection combined with pre-trained deep neural network for demand forecasting in outpatient department. Expert Syst. Appl. 2017, 82, 216–230. [Google Scholar] [CrossRef]
  23. Nakamura, R.Y.; Pereira, L.A.; Costa, K.A.; Rodrigues, D.; Papa, J.P.; Yang, X.S. BBA: A Binary Bat Algorithm for Feature Selection. In Proceedings of the 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, Ouro Preto, Brazil, 22–25 August 2012; pp. 291–297. [Google Scholar] [CrossRef]
  24. Hancer, E.; Xue, B.; Karaboga, D.; Zhang, M. A binary ABC algorithm based on advanced similarity scheme for feature selection. Appl. Soft Comput. J. 2015, 36, 334–348. [Google Scholar] [CrossRef]
  25. Zhang, Y.; Song, X.F.; Gong, D.W. A return-cost-based binary firefly algorithm for feature selection. Inf. Sci. 2017, 418–419, 561–574. [Google Scholar] [CrossRef]
  26. Mafarja, M.; Aljarah, I.; Heidari, A.A.; Faris, H.; Fournier-Viger, P.; Li, X.; Mirjalili, S. Binary dragonfly optimization for feature selection using time-varying transfer functions. Knowl.-Based Syst. 2018, 161, 185–204. [Google Scholar] [CrossRef]
  27. Faris, H.; Mafarja, M.M.; Heidari, A.A.; Aljarah, I.; Al-Zoubi, A.M.; Mirjalili, S.; Fujita, H. An efficient binary Salp Swarm Algorithm with crossover scheme for feature selection problems. Knowl.-Based Syst. 2018, 154, 43–67. [Google Scholar] [CrossRef]
  28. Mafarja, M.; Aljarah, I.; Faris, H.; Hammouri, A.I.; Al-Zoubi, A.M.; Mirjalili, S. Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Syst. Appl. 2019, 117, 267–286. [Google Scholar] [CrossRef]
  29. Mafarja, M.; Mirjalili, S. Whale optimization approaches for wrapper feature selection. Appl. Soft Comput. 2018, 62, 441–453. [Google Scholar] [CrossRef]
  30. Kumar, V.; Kumar Di Kaur, M.; Singh Di Idris, S.A.; Alshazly, H. A Novel Binary Seagull Optimizer and its Application to Feature Selection Problem. IEEE Access 2021, 9, 103481–103496. [Google Scholar] [CrossRef]
  31. Elgin Christo, V.R.; Khanna Nehemiah, H.; Minu, B.; Kannan, A. Correlation-based ensemble feature selection using bioinspired algorithms and classification using backpropagation neural network. Comput. Math. Methods Med. 2019, 2019, 7398307. [Google Scholar] [CrossRef]
  32. Murugesan, S.; Bhuvaneswaran, R.S.; Khanna Nehemiah, H.; Keerthana Sankari, S.; Nancy Jane, Y. Feature Selection and Classification of Clinical Datasets Using Bioinspired Algorithms and Super Learner. Comput. Math. Methods Med. 2021, 2021, 6662420. [Google Scholar] [CrossRef]
  33. Balasubramanian, K.; Ananthamoorthy, N.P. Correlation-based feature selection using bio-inspired algorithms and optimized KELM classifier for glaucoma diagnosis. Appl. Soft Comput. 2022, 128, 109432. [Google Scholar] [CrossRef]
  34. Agrawal, P.; Abutarboush, H.F.; Ganesh, T.; Mohamed, A.W. Metaheuristic algorithms on feature selection: A survey of one decade of research (2009–2019). IEEE Access 2021, 9, 26766–26791. [Google Scholar] [CrossRef]
  35. Chen, Z.; Zhu, K.; Ying, L. Detecting multiple information sources in networks under the SIR model. IEEE Trans. Netw. Sci. Eng. 2016, 3, 17–31. [Google Scholar] [CrossRef]
  36. Zang, W.; Zhang, P.; Zhou, C.; Guo, L. Locating multiple sources in social networks under the SIR model: A divide-and-conquer approach. J. Comput. Sci. 2015, 10, 278–287. [Google Scholar] [CrossRef]
  37. Al-Betar, M.A.; Alyasseri, Z.A.; Awadallah, M.A.; Abu Doush, I. Coronavirus herd immunity optimizer (CHIO). Neural Comput. Appl. 2021, 33, 5011–5042. [Google Scholar] [CrossRef] [PubMed]
  38. Shaban, W.M.; Rabie, A.H.; Saleh, A.I.; Abo-Elsoud, M.A. A new COVID-19 Patients Detection Strategy (CPDS) based on hybrid feature selection and enhanced KNN classifier. Knowl.-Based Syst. 2020, 205, 106270. [Google Scholar] [CrossRef]
  39. Alweshah, M. Coronavirus herd immunity optimizer to solve classification problems. Soft Comput. 2022. [Google Scholar] [CrossRef]
  40. Oyelade, O.N.; Ezugwu, A.E. Immunity-Based Ebola Optimization Search Algorithm (IEOSA) for Minimization of Feature Extraction with Reduction in Digital Mammography Using CNN Models. Sci. Rep. 2022, 13, 17916. [Google Scholar] [CrossRef]
  41. Dua, D.; Graff, C. UCI Machine Learning Repository; University of California, School of Information and Computer Science: Irvine, CA, USA, 2019; Available online: http://archive.ics.uci.edu/ml (accessed on 12 September 2022).
  42. Elgamal, Z.M.; Yasin, N.M.; Sabri, A.Q.M.; Sihwail, R.; Tubishat, M.; Jarrah, H. Improved equilibrium optimization algorithm using elite opposition-based learning and new local search strategy for feature selection in medical datasets. Computation 2021, 9, 68. [Google Scholar] [CrossRef]
  43. Hong, Z.Q.; Yang, J.Y. Optimal Discriminant Plane for a Small Number of Samples and Design Method of Classifier on the Plane. Pattern Recognit. 1991, 24, 317–324. [Google Scholar] [CrossRef]
  44. Schlimmer, J.C. Concept Acquisition through Representational Adjustment. Doctoral Dissertation, Department of Information and Computer Science, University of California, Irvine, CA, USA, 1987. [Google Scholar]
  45. Raman, B.; Ioerger, T.R. Instance Based Filter for Feature Selection. Mach. Learn. Res. 2002, 1, 1–23. [Google Scholar]
  46. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  47. Sigillito, V.G.; Wing, S.P.; Hutton, L.V.; Baker, K.B. Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Tech. Dig. 1989, 10, 262–266. [Google Scholar]
  48. Cestnik, G.; Konenenko, I.; Bratko, I. Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In Progress in Machine Learning; Bratko, I., Lavrac, N., Eds.; Sigma Press: Wilmslow, UK, 1987; pp. 31–45. [Google Scholar]
  49. Kurgan, L.A.; Cios, K.J.; Tadeusiewicz, R.; Ogiela, M.; Goodenday, L.S. Knowledge Discovery Approach to Automated Cardiac SPECT Diagnosis. Artif. Intell. Med. 2001, 23, 149–169. [Google Scholar] [CrossRef]
  50. Aha, D.W. Incremental constructive induction: An instance-based approach. In Proceedings of the Eighth International Workshop on Machine Learning, Evanston, IL, USA, 1 June 1991; Morgan Kaufmann: San Francisco, CA, USA, 1991; pp. 117–121. [Google Scholar]
  51. Cortez, P.; Cerdeira, A.; Almeida, F.; Matos, T.; Reis, J. Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 2009, 47, 547–553. [Google Scholar] [CrossRef] [Green Version]
  52. Breiman, L.; Friedman, J.H.; Olshen, A.; Stone, J. Classification and Regression Trees; Routledge: Abingdon, UK, 1984. [Google Scholar]
  53. Mirjalili, S.; Lewis, A. S-shaped versus V-shaped transfer functions for binary Particle Swarm Optimization. Swarm Evol. Comput. 2013, 9, 1–14. [Google Scholar] [CrossRef]
  54. Houssein, E.H.; Oliva, D.; Juan, A.A.; Yu, X. Binary whale optimization algorithm for dimensionality reduction. Mathematics 2020, 8, 1821. [Google Scholar] [CrossRef]
  55. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary grey wolf optimization approaches for feature selection. Neurocomputing 2016, 172, 371–381. [Google Scholar] [CrossRef]
  56. Ghosh, K.K.; Ahmed, S.; Singh, P.K.; Geem, Z.W.; Sarkar, R. Improved Binary Sailfish Optimizer Based on Adaptive β-Hill Climbing for Feature Selection. IEEE Access 2020, 8, 83548–83560. [Google Scholar] [CrossRef]
Figure 1. Representation of the search space for all individuals in the population, with an illustration of the binarization procedure of feature indicators for each individual.
Figure 2. Identification of selected features as obtainable in every instance of the entire dataset.
Figure 3. Graphical chart of the values of (a) the S transfer function for both S1(indi) and S2(indi), and of (b) the T transfer functions for both T1(indi) and T2(indi).
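For readers unfamiliar with transfer functions, the binarization idea illustrated in Figure 3 can be sketched with the classical S-shaped (sigmoid) and V-shaped forms from Mirjalili and Lewis [53]. Note that these are the textbook forms, shown only as a hedged illustration; the paper formulates its own S and T variants for BEOSA.

```python
import numpy as np

def s_transfer(x):
    # Classical S-shaped (sigmoid) transfer function [53].
    return 1.0 / (1.0 + np.exp(-x))

def v_transfer(x):
    # Classical V-shaped transfer function [53].
    return np.abs(np.tanh(x))

def binarize(position, transfer, rng):
    # Map a continuous position vector to a binary feature mask:
    # a feature is selected when its transfer value exceeds a
    # uniformly drawn random threshold in [0, 1).
    probs = transfer(np.asarray(position, dtype=float))
    return (probs > rng.random(probs.shape)).astype(int)

rng = np.random.default_rng(42)
mask = binarize([-2.0, 0.0, 3.5], s_transfer, rng)  # e.g. one bit per feature
```

Each bit of `mask` then indicates whether the corresponding feature of the dataset is kept, as depicted in Figures 1 and 2.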
Figure 4. Process flow using BEOSA and BIEOSA to search for the best individual in a discrete search space.
Figure 5. Flowchart of the BEOSA algorithm showing the application of the V-functions and S-functions to transform the feature indicators of individuals in the infected sub-population.
Figure 6. Graph-based comparative analysis of the cost function values obtained for all binary optimization methods on (a) Zoo; (b) Vote; (c) Wine; (d) Sonar; (e) Tic-tac-toe; (f) M-of-n; (g) Ionosphere; (h) Exactly; (i) Exactly2; (j) HeartEW; (k) CongressEW; (l) Lymphography; (m) Colon; (n) BreastEW; and (o) SpectEW datasets.
Figure 7. Graph-based comparative analysis of the classification accuracy performance for all binary optimization methods on (a) Zoo; (b) Vote; (c) Wine; (d) Sonar; (e) Tic-tac-toe; (f) M-of-n; (g) Ionosphere; (h) Exactly; (i) Exactly2; (j) HeartEW; (k) CongressEW; (l) Lymphography; (m) Colon; (n) BreastEW; and (o) SpectEW datasets.
Figure 8. Classification accuracy of the KNN, RF, MLP, decision tree, SVM, and Naïve Bayes models with population sizes of 25, 50, 75, and 100 using the (a) BWOA, (b) BPSO, (c) BSFO, (d) BGWO, (e) BDMO, (f) BSNDO, (g) BEOSA, and (h) BIEOSA algorithms with the CongressEW dataset.
Figure 9. Classification accuracy of the KNN, RF, MLP, decision tree, SVM and Naïve Bayes models with population sizes of 25, 50, 75, and 100, using the (a) BWOA, (b) BPSO, (c) BSFO, (d) BGWO, (e) BDMO, (f) BSNDO, (g) BEOSA and (h) BIEOSA algorithms with the SpectEW dataset.
Figure 10. Comparison of the computational times for BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA on all the benchmark datasets.
Table 1. Datasets and their corresponding details, such as the number of features, classes and instances, and a description of each.
Dataset and References | Number of Features | Number of Instances | Number of Classes | Description
BreastEW | 30 | 569 | 2 | Biology-based and medical-oriented dataset
Lung [43] | 3312 | 203 | 5 | Biology-based and medical-oriented dataset
CongressEW [44] | 16 | 435 | 2 | Congressional voting dataset
Exactly [45] | 13 | 1000 | 2 | Artificial binary classification dataset
Iris [46] | 4 | 150 | 3 | Biology-based dataset
Exactly2 [45] | 13 | 1000 | 2 | Artificial binary classification dataset
HeartEW | 13 | 270 | 2 | Biology-based and medical-oriented dataset
Ionosphere [47] | 34 | 351 | 2 | Electromagnetic dataset
Prostate | 5966 | 102 | 2 | Biology-based and medical-oriented dataset
Lymphography [48] | 18 | 148 | 4 | Biology-based and medical-oriented dataset
M-of-n | 13 | 1000 | 2 | Artificial binary classification dataset
Leukemia | 7070 | 72 | 2 | Biology-based and medical-oriented dataset
PenglungEW | 325 | 73 | 7 | Biology-based and medical-oriented dataset
Sonar | 60 | 208 | 2 | Sonar signal classification dataset
SpectEW [49] | 22 | 267 | 2 | Biology-based and medical-oriented dataset
Colon | 2000 | 62 | 2 | Biology-based and medical-oriented dataset
Tic-tac-toe [50] | 9 | 958 | 2 | Endgame dataset
Vote | 16 | 300 | 2 | Electioneering domain
Wine [51] | 13 | 178 | 3 | Wine dataset showing the results of chemical analysis of wines
Zoo | 16 | 101 | 7 | Biology-based dataset
KrVsKpEW | 36 | 3196 | 2 | Game dataset
WaveformEW [52] | 40 | 5000 | 3 | Generated waveform dataset with three classes of waves, each sampled at 21 intervals; each class is a random convex combination of 2 of 3 base waves
Table 2. Parameters for the BEOSA, BIEOSA, BDMO, BSNDO, BPSO, BWOA, BSFO and BGWO metaheuristic algorithms in this study. N, as used for BDMO and BSNDO, denotes the population size.
Method | Parameter | Value | Definition
BEOSA | π | 0.1 | Recruitment rate
BEOSA | β1, β2, β3, β4 | 0.1, 0.1, 0.1, 0.1 | Contact rates with the infected individuals, the host, the dead, and the recovered individuals
BIEOSA | π | 0.1 | Recruitment rate
BIEOSA | β1, β2, β3, β4 | 0.1, 0.1, 0.1, 0.1 | Contact rates with the infected individuals, the host, the dead, and the recovered individuals
BDMO | nb | 3 | Number of babysitters
BDMO | na | N − nb | Number of alphas
BDMO | ns | N − nb | Number of subordinates
BDMO | peep, τ | 1, rand(0, 1) | Peep sound; tau operator for fitness evaluation
BDMO | L | round(0.6 × D × nb) | Babysitter exchange parameter
BSNDO | mo | mean(N) | Mean position of the population
BPSO | c1, c2 | 2, 2 | Positive learning factor constants 1 and 2
BPSO | W | 0.9 | Initial inertia weight
BPSO | Vmax | 6 | Maximum velocity vector
BWOA | p, l | [0, 1], [−1, 1] | Random number; random number
BWOA | b | 1 | Shape of the spiral
BWOA | r, C | [0, 1], 2r | Random vector; coefficient vector
BSFO | Pp | 0.1 | Percentage of the sardine population
BSFO | A, ε | 4, 0.001 | Coefficients for decreasing power attack
BGWO | a | [2, 0] | Coefficient decreased linearly from 2 to 0
Table 3. Results of fitness and cost functions for BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA on all benchmark datasets.
Dataset | BWOA Fitness/Cost | BPSO Fitness/Cost | BSFO Fitness/Cost | BGWO Fitness/Cost | BDMO Fitness/Cost | BSNDO Fitness/Cost | BEOSA Fitness/Cost | BIEOSA Fitness/Cost
BreastEW | 0.0307/0.9693 | 1.0000/0.0000 | 0.0374/0.9626 | 0.0564/0.9436 | 0.8233/0.1767 | 0.9265/0.0735 | 0.0404/0.9596 | 0.0651/0.9349
Lung | 0.0006/0.9994 | 0.0516/0.9484 | 0.0535/0.9465 | 0.0307/0.9693 | 0.9729/0.0271 | 0.9729/0.0271 | 0.0490/0.9510 | 0.0492/0.9508
CongressEW | 0.0588/0.9412 | 0.2175/0.7825 | 0.0260/0.9742 | 0.0259/0.9741 | 0.9071/0.0929 | 0.9071/0.0929 | 0.0575/0.9425 | 0.1068/0.8932
Exactly | 0.6923/0.3077 | 0.4859/0.5141 | 0.0147/0.9853 | 0.0260/0.9740 | 0.6923/0.3077 | 0.6923/0.3077 | 0.2620/0.7380 | 0.3553/0.6447
Iris | 0.0050/0.9950 | 0.6295/0.3705 | NA/1.0000 | 0.0025/0.9975 | 0.7005/0.2995 | 0.8300/0.1700 | 0.0380/0.9620 | 0.2030/0.7970
Exactly2 | 0.2384/0.7616 | 0.2399/0.7601 | 0.0355/0.9645 | 0.2324/0.7676 | 0.6984/0.3016 | 0.6984/0.3016 | 0.2384/0.7616 | 0.2384/0.7616
HeartEW | 0.2582/0.7418 | 0.4431/0.5569 | 0.3744/0.6256 | 0.1322/0.8678 | 0.5401/0.4599 | 0.4859/0.5141 | 0.0772/0.9228 | 0.2956/0.7044
Ionosphere | 0.0734/0.9266 | 0.2171/0.7829 | 0.1791/0.8209 | 0.1335/0.8665 | 0.8860/0.1140 | 0.0162/0.9838 | 0.0722/0.9278 | 0.1288/0.8712
Prostate | 0.0004/0.9996 | 0.0963/0.9037 | 0.0064/0.9936 | 0.0064/0.9936 | 0.9526/0.0474 | 0.9526/0.0474 | 0.0002/0.9998 | 0.0486/0.9514
Lymphography | 0.2024/0.7976 | 0.3669/0.6331 | 0.3647/0.6353 | 0.1062/0.8938 | 0.5996/0.4004 | 0.0380/0.9620 | 0.1040/0.8960 | 0.3003/0.6997
M-of-n | 0.2506/0.7494 | 0.6252/0.3748 | 0.3678/0.6322 | 0.0054/0.9946 | 0.7281/0.2719 | 0.0046/0.9954 | 0.1581/0.8419 | 0.3678/0.6322
Leukemia | 0.0662/0.9338 | 0.0736/0.9264 | NA/1.0000 | 0.0063/0.9937 | 0.9297/0.0703 | 0.9297/0.0703 | 0.0662/0.9338 | 0.0042/0.9958
PenglungEW | 0.0705/0.9295 | 0.0042/0.9958 | 0.2065/0.7935 | 0.0059/0.9941 | 0.6672/0.3328 | 0.0013/0.9987 | 0.0672/0.9328 | 0.2689/0.7311
Sonar | 0.0724/0.9276 | 0.1194/0.8806 | 0.1946/0.8054 | 0.1946/0.8054 | 0.7626/0.2374 | 0.0047/0.9953 | 0.0717/0.9283 | 0.1889/0.8111
SpectEW | 0.1315/0.8685 | 0.2223/0.7777 | 0.2465/0.7535 | 0.1159/0.8841 | 0.7764/0.2236 | 0.0948/0.9052 | 0.1498/0.8502 | 0.2433/0.7567
Colon | 0.0004/0.9996 | 0.3860/0.6140 | NA/1.0000 | 0.0063/0.9937 | 0.8449/0.1551 | 0.8449/0.1551 | 0.0001/0.9999 | 0.0776/0.9224
Tic-tac-toe | 0.2623/0.7377 | 1.0000/0.0000 | 0.7635/0.2365 | 0.1750/0.8250 | 0.6534/0.3466 | 0.1647/0.8353 | 0.2943/0.7057 | 0.3809/0.6191
Vote | 0.0013/0.9988 | 1.0000/0.0000 | 0.1681/0.8319 | 0.0203/0.9798 | 0.8471/0.1529 | 0.0019/0.9981 | 0.0545/0.9455 | 0.0863/0.9138
Wine | 0.0306/0.9694 | 0.3865/0.6135 | 0.3048/0.6952 | 0.0863/0.9137 | 0.6685/0.3315 | 0.0298/0.9702 | 0.0298/0.9702 | 0.1131/0.8869
Zoo | 0.0520/0.9480 | 0.2005/0.7995 | 0.1992/0.8008 | 0.0545/0.9455 | 0.7500/0.2500 | 0.7500/0.2500 | 0.0533/0.9468 | 0.2017/0.7983
KrVsKpEW | 0.0612/0.9388 | 0.4728/0.5272 | 0.3519/0.6481 | 0.0348/0.9652 | 0.6828/0.3172 | 0.0250/0.9750 | 0.0382/0.9618 | 0.4083/0.5917
WaveformEW | 0.2102/0.7898 | 0.5468/0.4533 | 0.3149/0.6851 | 0.1817/0.8183 | 0.3394/0.6606 | NA/1.0000 | 0.2431/0.7569 | 0.2762/0.7238
Summary | 6/6 | 0/0 | 0/0 | 1/1 | 0/0 | 8/8 | 8/8 | 2/2
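Note that each fitness/cost pair in Table 3 sums to one, i.e., cost = 1 − fitness. A minimal sketch of the conventional wrapper-based feature-selection fitness underlying such tables is shown below; the weighting `alpha` and the exact error/ratio terms are assumptions drawn from the standard formulation in the literature, not necessarily the precise coefficients used by BEOSA.

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    # Conventional wrapper fitness: weighted sum of the classifier's
    # error rate and the ratio of selected features to all features.
    # alpha (assumed) trades accuracy against subset size.
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

def cost(fit):
    # In Table 3, fitness and cost are complementary: cost = 1 - fitness.
    return 1.0 - fit

# Example: a perfect classifier using no features yields fitness 0, cost 1.
f = fitness(0.0, 0, 30)
```

This complementarity is why a low fitness value and a high cost value in Table 3 describe the same result.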
Table 4. Results of the average number of features selected for BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA on all datasets with population sizes of 25, 50, 75 and 100.
Dataset | BWOA | BPSO | BSFO | BGWO | BDMO | BSNDO | BEOSA | BIEOSA (each cell: average number of features at population sizes 25/50/75/100)
BreastEW | 17.0/17.1/17.3/18.4 | 1.0/1.0/1.0/1.0 | 10.5/10.0/11.0/11.5 | 20.1/18.9/18.6/16.9 | 7.1/8.3/5.5/5.6 | 3.0/3.0/3.0/3.0 | 7.3/10.3/9.6/8.1 | 7.7/5.9/10.0/8.1
Lung | 1907.2/1840.7/1693.5/1692.5 | 1.0/1.0/1.0/1.0 | NA/NA/NA/NA | 2165.4/2180.4/2151.1/2168.9 | 847.5/820.3/1161.9/971.2 | 2098.4/2098.4/2098.4/2098.4 | 443.7/857.4/461.9/403.9 | 970.4/1189.7/685.3/699.9
CongressEW | 9.4/9.8/9.1/8.9 | 1.0/1.0/1.0/1.0 | 6.7/6.6/6.0/5.8 | 10.1/11.2/10.2/10.2 | 4.7/2.4/2.9/3.2 | 2.4/2.4/2.4/2.4 | 5.9/5.7/6.4/5.3 | 5.7/4.6/5.1/5.9
Exactly | 7.1/6.2/7.7/7.6 | 1.0/1.0/1.0/1.0 | 0.7/0.7/0.7/0.7 | 8.5/9.1/9.3/9.4 | 3.0/3.9/3.3/3.1 | 2.2/2.2/2.2/2.2 | 4.2/4.3/4.8/4.9 | 3.5/3.6/2.5/2.6
Iris | 3.0/3.0/3.0/2.0 | 1.0/1.0/1.0/1.0 | 1.0/1.0/1.0/1.0 | 1.0/3.0/3.0/2.0 | 2.0/2.0/1.0/1.0 | 1.0/2.0/1.0/2.0 | 1.4/1.3/1.7/1.8 | 2.0/2.0/2.0/2.0
Exactly2 | 7.3/7.0/8.0/5.7 | 1.0/1.0/1.0/1.0 | 4.5/5.0/5.0/4.5 | 8.8/8.9/8.3/8.3 | 4.0/3.7/4.8/3.5 | 2.0/2.0/2.0/2.0 | 2.0/1.9/1.7/1.7 | 2.5/3.7/3.6/3.6
HeartEW | 7.1/7.5/8.2/7.7 | 1.0/1.0/1.0/1.0 | 5.0/5.0/1.0/5.0 | 8.7/9.5/8.3/7.5 | 2.8/2.8/3.7/2.9 | 1.0/1.0/1.0/1.0 | 4.3/3.1/4.8/2.6 | 4.3/3.7/2.7/2.8
Ionosphere | 18.6/17.2/17.6/17.2 | 1.0/1.0/1.0/1.0 | 8.0/14.0/10.0/8.0 | 21.9/21.8/21.1/21.5 | 5.0/8.8/8.3/5.5 | 8.4/8.4/8.4/8.4 | 7.3/7.1/7.6/8.4 | 8.0/10.2/10.6/8.2
Prostate | 3274.1/3257.7/3255.6/3286.4 | 1.0/1.0/1.0/1.0 | 3916/3916/3916/3916 | 3937.1/3949.7/3929.2/3927.2 | 2272.8/1506.3/1685.3/1402.8 | 1478.4/1478.4/1478.4/1478.4 | 1326.1/941.5/895.3/682.2 | 1141.5/1389.3/1437.3/1359.3
Lymphography | 12.0/10.3/10.1/9.6 | 1.0/1.0/1.0/1.0 | 9.0/9.0/9.0/5.0 | 11.7/12.8/12.2/11.4 | 3.9/4.7/5.9/3.5 | 1.0/1.0/1.0/1.0 | 7.3/6.9/7.1/7.1 | 4.5/5.9/6.0/5.6
M-of-n | 8.2/8.1/8.3/7.3 | 1.0/1.0/1.0/1.0 | 1.0/5.0/2.0/6.0 | 8.6/7.8/8.2/8.4 | 2.2/1.9/2.9/2.9 | 0.6/0.6/0.6/0.6 | 7.1/7.5/7.0/6.1 | 4.3/4.0/3.2/4.5
Leukemia | 1708.3/1719.8/1872.7/1778.9 | 1.0/1.0/1.0/1.0 | NA/NA/NA/NA | 2340.6/2334.1/2320.4/2337.8 | 1025.2/1483.3/999.7/1194.3 | 928.5/928.5/928.5/928.5 | 253.6/110.3/121.9/50.3 | 1202.4/865.7/589.7/876.0
PenglungEW | 192.0/190.0/170.0/179.0 | 1.0/1.0/1.0/1.0 | 109.0/4.0/85/67 | 207.0/193.0/212.0/213.0 | 176.2/103.9/189.4/23.9 | 142.0/144.0/124.0/170.0 | 46.0/134.0/35.0/40.0 | 24.0/122.0/92.0/158.0
Sonar | 34.6/34.0/31.6/32.1 | 1.0/1.0/1.0/1.0 | 20.0/10.0/9.0/8.0 | 38.2/38.5/39.6/37.9 | 7.3/13.0/18.6/11.1 | 23.0/23.0/23.0/23.0 | 22.4/23.7/24.3/16.5 | 18.8/18.5/19.3/13.9
SpectEW | 12.2/13.1/12.3/10.2 | 1.0/1.0/1.0/1.0 | 3.0/6.0/4.0/9.0 | 13.4/13.9/15.6/13.6 | 5.7/10.1/6.3/8.4 | 3.1/3.1/3.1/3.1 | 7.7/9.3/7.0/7.5 | 7.9/5.3/5.9/6.3
Colon | 1127.0/1016.1/1066.2/1035.0 | 1.0/1.0/1.0/1.0 | NA/NA/NA/NA | 1302.0/1303.2/1306.3/1301.5 | 546.2/623.6/657.4/727.7 | 1374.3/1374.3/1374.3/1374.3 | 338.5/286.8/197.0/157.1 | 384.3/632.0/472.5/316.5
Tic-tac-toe | 4.7/4.7/5.4/4.7 | 1.0/1.0/1.0/1.0 | 3.0/3.0/4.0/1.0 | 5.6/6.3/6.2/6.1 | 2.3/2.3/1.8/2.1 | 1.4/1.4/1.4/1.4 | 5.5/5.6/5.2/6.4 | 2.5/2.9/3.1/2.7
Vote | 8.5/9.2/8.4/7.5 | 1.0/1.0/1.0/1.0 | 1.0/1.0/1.0/1.0 | 10.8/11.3/11.0/10.2 | 3.4/4.1/4.3/3.7 | 7.0/7.0/7.0/7.0 | 5.1/5.8/5.7/5.1 | 3.5/4.4/5.0/4.0
Wine | 7.4/8.2/8.0/7.3 | 1.0/1.0/1.0/1.0 | 3.0/1.0/3.0/5.0 | 7.7/8.2/8.0/7.3 | 3.0/3.8/3.1/2.6 | 2.6/2.6/2.6/2.6 | 5.0/4.5/4.9/5.4 | 4.3/3.7/3.8/3.8
Zoo | 10.1/8.6/8.8/9.3 | 1.0/1.0/1.0/1.0 | 6.0/1.0/4.0/6.0 | 10.9/10.6/10.0/9.3 | 2.4/1.8/3.9/3.3 | 3.0/3.0/3.0/3.0 | 6.8/7.7/7.3/7.6 | 5.1/4.6/4.5/5.6
KrVsKpEW | 19.0/23.0/21.0/26.0 | 1.0/1.0/1.0/1.0 | 10.0/10.0/10.0/10.0 | 27.0/22.0/27.0/25.0 | 6.1/2.4/4.2/5.7 | 2.4/2.4/2.4/2.4 | 22.6/17.4/20.9/21.1 | 8.0/10.8/8.8/12.8
WaveformEW | 25.8/26.1/27.0/25.7 | 1.0/1.0/1.0/1.0 | NA/NA/NA/NA | 27.0/27.0/27.0/25.0 | 14.0/6.9/1.0/2.1 | 20.0/20.0/25.0/24.0 | 25.0/20.0/26.0/14.0 | 11.0/18.0/4.0/12.8
Table 5. Comparative analysis of classification accuracy obtained for BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA on all datasets using population sizes of 25, 50, 75, and 100.
Dataset | BWOA | BPSO | BSFO | BGWO | BDMO | BSNDO | BEOSA | BIEOSA (each cell: classification accuracy at population sizes 25/50/75/100)
BreastEW | 0.9175/0.9351/0.9184/0.9219 | 0.9368/0.9439/0.9535/0.9421 | 0.8070/0.9035/0.7456/0.8596 | 0.9272/0.9123/0.9132/0.9140 | 0.8921/0.8526/0.7544/0.7912 | 0.6930/0.6930/0.6930/0.6930 | 0.9430/0.9456/0.9377/0.9430 | 0.8974/0.8807/0.9149/0.9026
Lung | 0.9512/0.9659/0.9512/0.9610 | 0.9829/0.9805/0.9780/0.9902 | NA/NA/NA/NA | 0.9512/0.9390/0.9537/0.9512 | 0.9098/0.9073/0.9049/0.9268 | 0.9268/0.9268/0.9268/0.9268 | 0.9829/0.9854/0.9854/0.9829 | 0.9537/0.9415/0.9463/0.9439
CongressEW | 0.9126/0.9333/0.9310/0.9046 | 0.9540/0.9678/0.9609/0.9621 | 0.5517/0.8736/0.8276/0.8276 | 0.9264/0.9195/0.9391/0.9333 | 0.8149/0.6793/0.7483/0.7713 | 0.8276/0.8276/0.8276/0.8276 | 0.9621/0.9563/0.9644/0.9667 | 0.8989/0.9103/0.8862/0.9034
Exactly | 0.6575/0.6715/0.6645/0.6490 | 0.7180/0.7610/0.8285/0.7600 | 0.6900/0.6150/0.6900/0.6900 | 0.6730/0.6605/0.6960/0.6740 | 0.5545/0.5910/0.5655/0.5685 | 0.6250/0.6250/0.6250/0.6250 | 0.6995/0.7130/0.7110/0.7475 | 0.6240/0.6565/0.6435/0.6860
Iris | 0.9333/1.0000/0.9333/0.9000 | 0.9667/0.9333/1.0000/1.0000 | 0.9667/0.9667/0.9667/0.9667 | 0.7333/1.0000/0.9333/1.0000 | 0.9667/0.8333/0.4000/0.5667 | NA/NA/NA/NA | 1.0000/1.0000/1.0000/0.9667 | 0.7333/0.9333/0.8333/0.8000
Exactly2 | 0.7070/0.7275/0.7210/0.7345 | 0.7610/0.7620/0.7610/0.7660 | 0.6250/0.7350/0.7300/0.7300 | 0.7175/0.7120/0.7115/0.7150 | 0.6640/0.6365/0.6875/0.6260 | 0.5900/0.5900/0.5900/0.5900 | 0.7620/0.7625/0.7625/0.7610 | 0.7495/0.7435/0.7375/0.7325
HeartEW | 0.6537/0.6296/0.6667/0.6963 | 0.7833/0.8019/0.7852/0.8019 | 0.7037/0.6111/0.5185/0.7222 | 0.6833/0.6556/0.6241/0.6963 | 0.5481/0.5722/0.5556/0.5315 | 0.4815/0.4815/0.4815/0.4815 | 0.7741/0.7759/0.8074/0.8074 | 0.6907/0.6593/0.6870/0.6685
Ionosphere | 0.8400/0.8157/0.8486/0.8500 | 0.9086/0.9286/0.9057/0.9286 | 0.9000/0.8429/0.8143/0.8286 | 0.8071/0.8386/0.8457/0.8400 | 0.7900/0.8129/0.8143/0.7614 | 0.7286/0.7286/0.7286/0.7286 | 0.8943/0.9143/0.8957/0.9143 | 0.8429/0.8729/0.8443/0.8157
Prostate | 0.9333/0.9286/0.9476/0.9762 | 1.0000/1.0000/1.0000/1.0000 | 1.0000/1.0000/1.0000/1.0000 | 0.9571/0.9810/0.9429/0.9095 | 0.7857/0.7524/0.7714/0.7476 | 0.9048/0.9048/0.9048/0.9048 | 0.9905/1.0000/0.9952/1.0000 | 0.9095/0.8857/0.8857/0.8286
Lymphography | 0.8000/0.8000/0.7667/0.6667 | 0.8533/0.8733/0.8800/0.8700 | 0.6667/0.7333/0.7667/0.6333 | 0.7500/0.7300/0.7333/0.7633 | 0.5633/0.5567/0.6267/0.5233 | 1.0000/0.9333/1.0000/0.9667 | 0.8233/0.8567/0.8500/0.8400 | 0.6567/0.6800/0.6933/0.6900
M-of-n | 0.7425/0.7905/0.7585/0.7765 | 0.8765/0.8705/0.8755/0.8705 | 0.6300/0.6900/0.6600/0.7400 | 0.7950/0.7895/0.8305/0.7905 | 0.5895/0.6050/0.6035/0.6385 | 0.6200/0.6200/0.6200/0.6200 | 0.8320/0.8485/0.8080/0.7995 | 0.6980/0.6855/0.6360/0.7375
Leukemia | 0.9600/0.9467/0.9533/0.9667 | 0.9867/0.9933/0.9867/1.0000 | NA/NA/NA/NA | 0.9800/0.9667/0.9867/0.9800 | 0.9133/0.9200/0.8933/0.9000 | 0.8667/0.8667/0.8667/0.8667 | 1.0000/0.9933/0.9800/0.9933 | 0.9133/0.9600/0.9600/0.9800
PenglungEW | 0.8667/0.8000/0.8000/0.8667 | 0.8667/0.8000/0.8000/1.0000 | 0.8000/0.4000/0.9333/0.6667 | 0.8000/0.8000/0.8667/0.8667 | 0.7333/0.7333/0.6667/0.4667 | 0.6667/0.8000/0.7333/0.9333 | 0.9333/0.8000/0.8667/0.8667 | 0.7333/1.0000/0.8000/0.6667
Sonar | 0.7952/0.8095/0.7714/0.7690 | 0.8738/0.8881/0.8619/0.8857 | 0.7143/0.7857/0.6429/0.6905 | 0.7500/0.8190/0.7929/0.7881 | 0.6286/0.6833/0.7190/0.6429 | 0.7619/0.7619/0.7619/0.7619 | 0.8643/0.8690/0.8786/0.8571 | 0.8167/0.8095/0.8286/0.7810
SpectEW | 0.7759/0.8037/0.8037/0.8167 | 0.8370/0.8463/0.8333/0.8463 | 0.7963/0.7963/0.7963/0.7593 | 0.7963/0.8148/0.7852/0.8111 | 0.6963/0.7278/0.7296/0.7352 | 0.7407/0.7407/0.7407/0.7407 | 0.8500/0.8444/0.8259/0.8444 | 0.7833/0.8130/0.7870/0.8037
Colon | 0.9538/0.9923/0.9077/0.9077 | 1.0000/1.0000/1.0000/1.0000 | NA/NA/NA/NA | 0.9385/0.9769/0.9846/0.9154 | 0.7538/0.7231/0.6846/0.7154 | 0.8462/0.8462/0.8462/0.8462 | 1.0000/1.0000/1.0000/1.0000 | 0.8308/0.8769/0.8692/0.8923
Tic-tac-toe | 0.6818/0.6844/0.7005/0.6870 | 0.7318/0.7198/0.7474/0.7385 | 0.6198/0.5417/0.6979/0.6510 | 0.6828/0.6948/0.7255/0.7078 | 0.5776/0.6219/0.5922/0.6042 | 0.6458/0.6458/0.6458/0.6458 | 0.7396/0.7797/0.7448/0.7964 | 0.6307/0.6698/0.6401/0.6479
Vote | 0.8750/0.8833/0.8917/0.8917 | 0.9350/0.9283/0.9450/0.9450 | 0.8500/0.8500/0.8167/0.8500 | 0.8600/0.8967/0.8850/0.8933 | 0.7733/0.8067/0.8333/0.8300 | 0.9333/0.9333/0.9333/0.9333 | 0.9417/0.9467/0.9583/0.9550 | 0.8417/0.8333/0.8783/0.8200
Wine | 0.7306/0.7083/0.7472/0.7639 | 0.9111/0.9028/0.9278/0.8889 | 0.8889/0.6111/0.6667/0.7222 | 0.7722/0.7389/0.7833/0.7639 | 0.6361/0.6528/0.6444/0.6583 | 0.7222/0.7222/0.7222/0.7222 | 0.9389/0.9194/0.9361/0.9556 | 0.8250/0.7278/0.8111/0.8028
Zoo | 0.8750/0.8700/0.8300/0.8300 | 0.9100/0.9350/0.9250/0.8800 | 0.9000/0.4000/0.6000/0.9000 | 0.8950/0.8450/0.8950/0.8550 | 0.5850/0.5750/0.6700/0.6150 | 0.6500/0.6500/0.6500/0.6500 | 0.9200/0.9400/0.9050/0.9050 | 0.7500/0.7950/0.7500/0.7850
KrVsKpEW | 0.8513/0.8482/0.9468/0.9703 | 0.9703/0.9656/0.9609/0.9671 | 0.6166/0.6964/0.6416/0.7230 | 0.9218/0.7418/0.8451/0.9437 | 0.5604/0.5133/0.5351/0.5535 | 0.6729/0.6729/0.6729/0.6729 | 0.8919/0.8568/0.9103/0.8865 | 0.6501/0.6962/0.6851/0.7351
WaveformEW | 0.7509/0.7769/0.7912/0.7639 | 0.8154/0.8241/0.8295/0.8021 | NA/NA/NA/NA | 0.7743/0.7582/0.7788/0.7913 | 0.3550/0.4640/0.4680/0.6060 | NA/NA/NA/NA | 0.8065/0.7973/0.7919/0.7900 | 0.7038/0.6119/0.6025/0.6222
Summary09110235102625239555650011933695755
Table 6. The classification accuracy, precision, recall, F1-score and area under curve (AUC) report for the M-of-n datasets using KNN, random forest (RF), MLP, Decision Tree (DTree), SVM, and Gaussian naïve Bayes (GNB) classifiers on the BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA and BIEOSA algorithms.
Algorithm | Classifier | Accuracy | Precision | Recall | F1-Score | AUC
BWOA | KNN | 0.665 | 0.7 | 0.785714 | 0.733333 | 0.770667
BWOA | RF | 0.71 | 0.714286 | 0.666667 | 0.666667 | 0.805333
BWOA | MLP | 0.685 | 0.666667 | 0.8 | 0.727273 | 0.821333
BWOA | DTree | 0.72 | 0.8 | 0.533333 | 0.64 | 0.766667
BWOA | SVM | 0.71 | 0.777778 | 0.6 | 0.642857 | 0.826667
BWOA | GNB | 0.72 | 0.714286 | 0.666667 | 0.689655 | 0.832
BPSO | KNN | 0.905 | 0.846154 | 0.866667 | 0.785714 | 0.90522
BPSO | RF | 0.99 | 0.923077 | 0.866667 | 0.857143 | 0.976648
BPSO | MLP | 1 | 1 | 1 | 1 | 1
BPSO | DTree | 0.79 | 0.818182 | 0.666667 | 0.714286 | 0.741333
BPSO | SVM | 1 | 1 | 1 | 1 | 1
BPSO | GNB | 0.955 | 1 | 0.8 | 0.888889 | 1
BSFO | KNN | 0.69 | 0.529412 | 0.6 | 0.5625 | 0.741333
BSFO | RF | 0.725 | 0.666667 | 0.6 | 0.56 | 0.704
BSFO | MLP | 0.725 | 0.857143 | 0.533333 | 0.551724 | 0.74
BSFO | DTree | 0.74 | 0.916667 | 0.733333 | 0.814815 | 0.898667
BSFO | SVM | 0.74 | 0.916667 | 0.733333 | 0.814815 | 0.850667
BSFO | GNB | 0.74 | 0.916667 | 0.733333 | 0.814815 | 0.882667
BGWO | KNN | 0.675 | 0.8 | 0.571429 | 0.666667 | 0.826923
BGWO | RF | 0.71 | 0.7 | 0.666667 | 0.666667 | 0.834667
BGWO | MLP | 0.705 | 0.727273 | 0.533333 | 0.615385 | 0.805333
BGWO | DTree | 0.755 | 0.857143 | 0.6 | 0.642857 | 0.8
BGWO | SVM | 0.755 | 0.857143 | 0.733333 | 0.758621 | 0.806667
BGWO | GNB | 0.755 | 0.777778 | 0.666667 | 0.689655 | 0.846154
BDMO | KNN | 0.6 | 0.692308 | 0.866667 | 0.684211 | 0.821333
BDMO | RF | 0.725 | 0.705882 | 0.866667 | 0.75 | 0.889333
BDMO | MLP | 0.725 | 0.705882 | 0.8 | 0.75 | 0.881333
BDMO | DTree | 0.725 | 0.75 | 0.533333 | 0.615385 | 0.817333
BDMO | SVM | 0.725 | 0.75 | 0.533333 | 0.615385 | 0.897333
BDMO | GNB | 0.725 | 0.75 | 0.866667 | 0.774194 | 0.865333
BSNDO | KNN | 1 | 0.909091 | 0.866667 | 0.866667 | 0.970667
BSNDO | RF | 1 | 1 | 1 | 1 | 1
BSNDO | MLP | 1 | 1 | 1 | 1 | 1
BSNDO | DTree | 0.775 | 1 | 0.533333 | 0.666667 | 0.805333
BSNDO | SVM | 1 | 1 | 1 | 1 | 1
BSNDO | GNB | 0.96 | 1 | 1 | 1 | 1
BEOSA | KNN | 0.815 | 0.866667 | 0.928571 | 0.896552 | 0.98489
BEOSA | RF | 0.835 | 1 | 0.928571 | 0.896552 | 0.95467
BEOSA | MLP | 0.84 | 0.9 | 0.928571 | 0.896552 | 0.971154
BEOSA | DTree | 0.795 | 0.9 | 0.714286 | 0.769231 | 0.896978
BEOSA | SVM | 0.845 | 0.875 | 1 | 0.933333 | 0.993132
BEOSA | GNB | 0.83 | 0.928571 | 0.928571 | 0.928571 | 0.995879
BIEOSA | KNN | 0.935 | 1 | 0.866667 | 0.928571 | 1
BIEOSA | RF | 0.665 | 0.538462 | 0.5 | 0.518519 | 0.748626
BIEOSA | MLP | 0.67 | 0.583333 | 0.466667 | 0.482759 | 0.712912
BIEOSA | DTree | 0.67 | 0.545455 | 0.533333 | 0.5 | 0.717033
BIEOSA | SVM | 0.67 | 0.7 | 0.533333 | 0.583333 | 0.843407
BIEOSA | GNB | 0.67 | 1 | 0.666667 | 0.571429 | 0.769231
Table 7. Comparative analysis of the computational times of BWOA, BPSO, BSFO, BGWO, BDMO, BSNDO, BEOSA, and BIEOSA.
Dataset | BWOA | BPSO | BSFO | BGWO | BDMO | BSNDO | BEOSA | BIEOSA
BreastEW | 2863.07 | 5716.95 | 15,231.68 | 3284.61 | 4693.94 | 0.05 | 7755.12 | 7143.08
Lung | 10,836.27 | 6407.44 | NA | 13,519.02 | 21,901.97 | 0.06 | 27,005.21 | 27,469.88
CongressEW | 2494.41 | 3293.75 | 21,216.45 | 2911.41 | 3167.04 | 0.05 | 3681.29 | 3439.88
Exactly | 4238.07 | 5576.70 | 37,823.17 | 4967.03 | 5139.55 | 0.05 | 6135.80 | 5239.17
Iris | 1539.72 | 1569.79 | 0.0300 | 1248.06 | 2454.02 | 0.05 | 3638.87 | 4367.40
Exactly2 | 4894.75 | 5460.21 | 84,742.30 | 5886.60 | 5831.48 | 0.05 | 5629.12 | 5949.60
HeartEW | 5991.47 | 6180.32 | 11,961.87 | 6103.14 | 4953.76 | 0.05 | 8289.17 | 7775.06
Ionosphere | 2425.15 | 5192.26 | 26,648.48 | 2711.81 | 4576.30 | 0.06 | 6795.18 | 5838.63
Prostate | 13,996.86 | 23,715.34 | 0.0500 | 19,962.22 | 18,691.10 | 0.05 | 14,753.88 | 14,729.31
Lymphography | 7088.64 | 4101.17 | 15,660.77 | 3363.44 | 4113.92 | 24,995.09 | 6342.72 | 6345.36
M-of-n | 3377.35 | 4705.02 | 56,589.18 | 4243.87 | 4042.42 | 0.05 | 5927.31 | 4770.37
Leukemia | 12,557.96 | 15,367.75 | NA | 15,879.91 | 14,626.92 | 0.05 | 10,973.17 | 13,082.37
PenglungEW | 972.86 | 1186.14 | 7022.13 | 1027.13 | 1057.12 | 1613.04 | 1570.83 | 1238.32
Sonar | 2478.74 | 3265.90 | 23,013.84 | 2694.13 | 4308.82 | 0.06 | 4640.60 | 4789.33
SpectEW | 2809.20 | 3386.75 | 15,919.29 | 3107.25 | 3314.52 | 0.04 | 4389.90 | 4118.12
Colon | 9003.78 | 10,541.57 | NA | 11,003.80 | 10,026.80 | 0.06 | 8971.40 | 9562.65
Tic-tac-toe | 7769.29 | 9578.16 | 54,646.60 | 8591.51 | 7463.63 | 0.05 | 13,032.30 | 10,677.70
Vote | 2430.62 | 2837.52 | 24,196.79 | 2561.77 | 2947.24 | 0.05 | 4046.51 | 4197.18
Wine | 2774.64 | 4566.93 | 19,463.48 | 3309.42 | 5377.52 | 0.05 | 7005.92 | 8468.67
Zoo | 2013.56 | 2427.55 | 8370.56 | 2118.25 | 2465.75 | 0.05 | 3384.51 | 3437.60
KrVsKpEW | 8482.48 | 17,028.06 | 173,356.6 | 12,685.90 | 20,658.05 | 0.07 | 25,079.98 | 18,698.16
WaveformEW | 17,449.58 | 32,664.62 | NA | 21,421.47 | 22,591.37 | NA | 26,812.95 | 27,680.63
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Akinola, O.; Oyelade, O.N.; Ezugwu, A.E. Binary Ebola Optimization Search Algorithm for Feature Selection and Classification Problems. Appl. Sci. 2022, 12, 11787. https://0-doi-org.brum.beds.ac.uk/10.3390/app122211787

