Article

Dimension Reduction Using New Bond Graph Algorithm and Deep Learning Pooling on EEG Signals for BCI

1 State Key Laboratory for Manufacturing System Engineering, System Engineering Institute, Xi’an Jiaotong University, Xianning West Road No. 28, Xi’an 710049, China
2 Department of Computing, University of Turku, 20014 Turku, Finland
3 Payame Noor University of Oskou, Oskou 19395-4697, Iran
* Author to whom correspondence should be addressed.
Submission received: 19 August 2021 / Revised: 10 September 2021 / Accepted: 15 September 2021 / Published: 20 September 2021
(This article belongs to the Special Issue Artificial Intelligence on Brain–Computer Interface (BCI))

Abstract

One of the main challenges in studying brain signals is the large size of the data due to the use of many electrodes and the time-consuming sampling. Choosing the right dimension reduction method can substantially reduce the data processing time. Evolutionary algorithms are one of the methods used to reduce the dimensions of EEG brain signals and have shown better performance than other common methods. In this article, (1) a new Bond Graph algorithm (BGA) is introduced that demonstrated better performance on eight benchmark functions compared to the genetic algorithm and particle swarm optimization. Our algorithm converges quickly and does not get stuck in local optima. (2) Reductions of features, electrodes, and the frequency range have been evaluated simultaneously for brain signals (left- and right-hand imagery). BGA and other algorithms are used to reduce the features. (3) Feature extraction and feature selection (with algorithms) for the time domain, frequency domain, wavelet coefficients, and autoregression have been studied, as well as electrode reduction and frequency-interval reduction. (4) First, the features, electrodes, and frequency range are reduced; then, new signals are constructed based on the proposed formulas, a Common Spatial Pattern is used for noise removal and feature extraction, and the result is classified by a classifier. (5) A separate study with a deep-learning pooling method has been implemented as feature selection in several layers with different functions and window sizes; this part also involves feature and frequency-range reduction. All of the above have been evaluated on data set IIa from BCI competition IV (left hand and right hand) using one to three channels, with better results than comparable studies. Our method increased accuracy by 5 to 8% and kappa by 5%.

1. Introduction

Today, EEG is used as a non-invasive method to record brain signals from electrodes on the scalp, reflecting brain activity. Brain signals are used to control systems for both patients and healthy people, playing an essential role in various applications in different fields [1,2,3,4,5,6,7]. The primary advantage of processing brain signals is to discover information for prediction and classification. Processing brain signals involves some general steps: filtering, feature extraction, feature selection, and classification. Most researchers have focused on feature selection, seeking new methods and algorithms for improvement [8,9,10,11,12,13,14].
Feature selection can be applied in three possible ways.
  • In the first method, feature selection (one-dimensional or more) is performed after data preprocessing (e.g., filtering) and before feature extraction (if used). If feature extraction is not used, the selected features are sent directly to the classifiers for classification after reducing the dimensions.
  • In the second method, the following steps are performed respectively: (1) feature extraction, (2) feature selection, and (3) classification.
  • In the third method, after reducing the dimensions of the features (i.e., feature selection), the following steps are performed, respectively: (1) feature extraction, (2) feature selection, and (3) classification.
In some studies, using the first and third methods, a dimensional reduction is used to reduce features, reduce electrodes, or both.
In all of the methods mentioned above, the features from a larger set (including channels, etc.) are converted into better and smaller feature sets by removing the less effective features. In most cases, a certain frequency range (8–30 Hz for the brain signal) is processed for specific purposes (such as left- and right-hand imagery). In most studies, all the features along with specific channels or all channels have been considered [15,16,17,18,19,20]. In addition, in some studies, one-dimensional or two-dimensional reduction methods are performed to reduce the features and the electrodes. However, none of the previous studies examined reducing the frequency range while using only one to three electrodes.
In some articles, certain frequency ranges have been used; in other words, feature reduction is achieved by reducing the frequency range with a single electrode. Given that human brain signals are generally a combination of different domains, examining them as a single domain is challenging and has been the subject of many previous studies [20,21,22,23,24,25,26,27,28,29,30,31,32]. In this study, we propose a new optimization algorithm based on the concept of the Bond Graph, which is a modeling and simulation language for multi-domain systems, and we examine and analyze the reduction of the frequency intervals of brain signals.
The main contributions of this paper are as follows:
  • Introduction of the Bond Graph algorithm (BGA), which demonstrated good performance compared to the genetic algorithm (GA) and particle swarm optimization (PSO) on the benchmark functions. The assessment was performed by testing on eight benchmarks and on EEG brain signals of right- and left-hand imagery (two classes). Important features of our proposed algorithm (i.e., BGA) include fast convergence and not becoming trapped in local optima.
  • The reduction of features, reduction of electrodes, and frequency range have been evaluated simultaneously.
  • Feature extraction and feature selection for time domain, frequency domain, wavelet coefficients, and autoregression have been investigated as feature reduction, electrode reduction, and frequency range reduction.
  • Initially, the features, electrodes, and frequency are reduced, and then, new signals are made based on them using special formulas for each row. Then, a common spatial pattern (CSP) is performed to remove noise and feature extraction, which is followed by classification.
  • A separate study with a deep learning sampling method has been implemented as feature selection in several layers with different functions and different window sizes. This part also includes feature and frequency reduction.
In cases 1 to 4, the proposed BGA, GA, PSO, and Quantum Genetic Algorithm (QGA) are used for evaluation. In case 5, no algorithm was required, but simple computational functions were used.
The main objective of this study is dimension reduction in single and multi-domain brain signals. For this purpose, five general models (seven model scenarios) have been implemented for testing:
  • One filter bank and single channel from a set of two filter banks (i.e., 2 and 5) with two channels (i.e., right and left hemispheres of the brain) have been investigated by four main algorithms, i.e., BGA, GA, PSO, and QGA. The two channels’ features are selected from the right and left hemispheres. The reduction of features for two channels is considered separately for each channel.
  • A combination of filter banks with a single channel from a set of four filter banks (i.e., 2, 5, 6, and 9) and two channels (i.e., right and left hemispheres of the brain) have been investigated with the same algorithms. The reduction of features for two channels is considered separately for each channel.
  • Four general methods, including time domain, frequency domain, wavelet coefficients, and autoregression have been used for feature extraction from a combination of two filter banks (i.e., 2 and 5) as a new signal with a single channel (i.e., right or left hemispheres of the brain), two channels (i.e., right and left hemispheres of the brain), or three channels (i.e., right and left hemispheres and center of the brain). Then, feature selection is done by BGA, GA, PSO, and QGA separately.
  • Feature selection was done by four main algorithms on a combination of filter banks (from filter banks 2, 5, and 6) and two (i.e., right and left hemispheres of the brain) or three channels (i.e., right and left hemispheres and center of the brain). Then, using special formulas, new signals were formed, which were used as input for CSP. Finally, the ELM classifier classified the extracted features from CSP.
  • Feature selection was performed by deep learning sampling with five general functions on all filter banks together or each filter bank individually with all channels in three sampling models. In the first sampling model, each filter bank with the same functions was used in all layers. In the second sampling model, the functions were the same selected for all layers and all filter banks. In the third sampling model, the functions were randomly selected for each layer.
The rest of the article is organized as follows: In Section 2, a summary of previous work on dimension reduction for brain signals is provided. In Section 3, we review the related work, define the Bond Graph, and describe how the proposed algorithm is implemented and used to reduce the dimension of brain signals, as well as how the dimension is reduced by pooling. The configuration of the experiments and the data set used in this article are described in Section 4. Finally, in Section 5, the results of the experiments are analyzed and examined. The last part presents the conclusion.

2. Previous Work

In the article [10], Adham et al. implemented specific individual and shared masks working with the CAR method, standard resource methods, and ELM. The method was a two-dimensional reduction of features and electrodes, which achieved accuracy in the range of 15 to 32%. In the article [33], Jing Luo et al. performed feature extraction by analyzing the wavelet components for two channels (each channel separately) in two modes, first with a dynamic frequency feature selection method and second with the full frequency range (without frequency range selection). They obtained accuracies of 68% and 67% for the two modes, respectively, so the feature selection improved their accuracy. In their work, feature selection was performed after feature extraction, which reduced the channels from 22 to two. Liu et al. [34] used the firefly algorithm to select the features. First, feature extraction was performed by CSP for all channels and for three specific channels individually (left hemisphere, right hemisphere, and center of the brain). The GA, PSO, and firefly algorithm obtained 59.85%, 60%, and 70.20% accuracies, respectively. In the article [35], Bashar Awwad used the sixth-order autoregression method with a sliding sample window. They reduced the features to 20 features with PCA and then classified them with LDA. They obtained accuracies between 46.8% and 59% for one channel and between 48% and 62% for two channels. They reported the average of the best accuracies across different channels as 59.67%. In the article [36], Adham et al. designed an individual and specific mask shared between individuals, so that part of the data is trained on some subjects and tested on untrained subjects. Feature and channel reduction were up to 90%, and the average accuracy was between 73.5% and 74.5%. In the article [37], Nakisa et al. used frequency-temporal and temporal-frequency feature extraction. They implemented five channels with the ACO, DE, GA, SA, and PSO algorithms along with the PNN classifier to select the features. The best result in this study was 65% for four classes. Peterson [38] used the MNE and Infomax preprocessing methods along with spectral power feature extraction, feature selection by GA, and an SVM classifier. In this paper, for two classes, the number of channels was reduced from 32 to two. The subjects' accuracies were between 55% and 67%. By modifying the CSP method with an SVM classifier, Mahnaz Arvaneh [39] obtained 70.90% accuracy for three channels, 79.07% for four to 14 channels, and 81.63% for nine to 19 channels, while the total accuracy for the standard CSP is 79.23%.
Yang et al. [40] used the time-domain parameter method to extract features and a time-space optimizer to select the optimal channels, along with a Fisher analysis classifier. When three channels were selected from 118 channels with BP and TDPS, average results of 71% and 72% were obtained, respectively; when using all the channels, the accuracy was 76%, while their proposed model increased the accuracy up to 78%. Chen et al. [41] extracted features from nine filter banks (bands) using CSP for each filter bank. After feature selection from the features collected from all the filter banks, the NBW classifier was used for classification. In their model, the number of channels was reduced to three or 13 out of 22. The average accuracy was 75% for three specific channels and 87% for 13 channels. In the article [42] by Izabela Rejer, the BSS and PCA methods were used to select the features. Three channels and 12 filter banks (bands) were used; however, only one subject was examined. For different modes, the accuracy was between 55% and 80% for FSS and between 52% and 87% for PCA. Eslahi et al. [43] implemented a modified feature extraction method with wavelet subbands and feature selection by GA with four classifiers. One subject with three channels was used for the experiment, and results between 68% and 84% were obtained for the different classifiers. Wang [44] used the Warp Laser Space Group method. The feature extraction in their study is based on statistical time-domain features, spectral power, autoregression, and wavelet coefficients. The Warp Laser space group with a strategy autoregression model was used to select channels and features. The overall result on the dataset was 83.37%; reducing the number of channels to 17 and 18 channels resulted in an accuracy of 84.7%. Kuman [45] used noise elimination, cross-correlation normalization by selecting effective channels (from the left and right hemispheres), and the calculation of data statistics along with ANN and SVM classifiers. Data were collected by a sensor headset device while imagining left and right finger movements. With 14 channels and 420 features, they obtained an average accuracy of 95%; with 10 features and 14 channels, 96.69%; and with three features and 14 channels, 97.34%. In the article by Kasun Amarasinghe [46], where a sensing headset collected the data, a feature selection method was used in addition to the SVM, ANN, and NB classifiers. With 14 channels and 11 features selected for three classes and one subject, the accuracy was 82.97% for NB, 83.07% for ANN, and 83.26% for SVM.
Chin [47] used a channel selection method for the filter banks. Three to 14 channels were selected for the filter banks, and feature extraction was done by CSP. Channel reduction was based on cross-validation accuracy, and feature reduction was not performed. The overall average for 13 to 14 channels is 84.51%, and for three channels, it is 75%. Ang et al. [48] used feature selection from the filter bank common spatial pattern (FBCSP) method with the NBPW classifier. With the original filter bank CSP (oFBCSP), kappa was 60.7%; with the S filter bank CSP (sFBCSP), kappa was 61.9%; and with the E filter bank CSP (eFBCSP), kappa was 63.5%.
Wang et al. [49] applied the FDCSP frequency-amplitude method. The accuracies obtained for CSP, FDCSP, SCSP, TRCSP, WTRCSP, and FBCSP were 78%, 82.94%, 81.63%, 78.79%, 78.47%, and 79.30%, respectively. Finally, Tang et al. [50] used a neural network model with five CNN layers to extract features, with different classifiers including Power + SVM, CSP + SVM, and Power + SVM. They achieved accuracies of 81.25%, 82.61%, and 77.17%, respectively, for each classifier, and the average of the proposed method for the two subjects was 86.41%.
We summarize the advantages and disadvantages of the previous methods presented above:
Advantages:
  • Most of them use heuristic algorithms, which achieved good performance on their respective tasks.
  • The heuristic algorithms are suitable for selecting features with or without extracting features using conventional methods.
  • In all of them, two general models are used that include (1) feature selection on preprocessed data before the final classification, (2) feature selection after feature extraction on preprocessed data followed by the final classification. It has been demonstrated that the second method is more efficient.
  • Most of them used continuous frequency domain for processing.
  • All of the previous works only used conventional algorithms for the EEG area.
Disadvantages:
  • The heuristic algorithms used do not converge quickly.
  • All of them only used a large frequency domain (8 to 30 Hz) that is not effective enough for feature extraction.
  • Even though the conventional algorithms are well established, there is still a need for new methods with higher performance and accuracy for processing the EEG data.
  • A combination of two general models is not studied.
  • Smaller continuous frequency domains reduce both the noise and the distinctive features between classes and, as a result, reduce the accuracy, whereas larger continuous domains increase both the noise and the distinctive features between classes and likewise reduce the accuracy. However, a combination of small and large frequency domains that reduces the noise while increasing the distinctive features between classes has not been studied.

3. Material and Methods

3.1. Bond Graph Introduction

The Bond Graph is a graphical representation of the dynamic behavior of physical systems across independent domains [51,52,53]. Its graphical form is similar to a flow chart diagram but with a different meaning for analyzing systems. The Bond Graph represents the state space as interactions within, outside, and between systems. It uses a graphical model to show and explain the details of the system and the relationships between subsystems and elements; this relationship determines how the system's calculations are carried out when solving problems. Similar to a block diagram, it uses a single-flow graph and represents one-way information. The Bond Graph can integrate multiple domains in the best possible way.
The basis of the Bond Graph is its bonds. Bonds connect single-port, two-port, and multi-port elements. A bond is a line connecting two or more elements, and a port is the connection point between an element and a bond. A bond carries energy and power in real time. Power variables come in pairs distinguished by the bond (power is calculated from flow and effort); these are the flow and effort variables. For example, the variables of flow and effort are electric current and electric voltage in electrical systems, respectively, and velocity and force in mechanical systems. Figure 1 shows an example of a Bond Graph architecture for the mechanical domain.
The connection model of the sub-models determines the order and direction in which the bonds are computed, and thus how the port variables are calculated. The structure of the formula calculation is determined by the connection model, which contributes to solving the problem. Bond Graphs can be combined with block diagram ports; Bond Graph models can expose power ports, signal ports, and output signals. In the physical domain, the concept of a bond carrying effort (e.g., electric potential) and flow (e.g., electric current) can be used to support the modeling process.
For example, in electrical networks, the port variables move through the Bond Graph elements: the electric voltage appears across the element port, and the electric current flows through the port. A port is an interface from one element to another (the connecting point of bonds). Power is the product of effort (potential) and flow and changes continuously; it is exchanged through the system ports. A power bond indicates that energy is exchanged between the elements, and the direction marked on a bond indicates the positive direction of energy flow. A flow source delivers power to the system, and the other elements absorb power.
The Bond Graph has two junction types.
Connection 1 (the 1-junction): the flow (current) through all connected bonds is the same, and the algebraic sum of the efforts (voltage differences) around a closed loop is zero (Kirchhoff's voltage law).
Connection 0 (the 0-junction): the effort (voltage) across all connected bonds is the same, and the algebraic sum of the flows (currents) at the junction is zero (Kirchhoff's current law).
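As a concrete illustration (a standard textbook example, not specific to this paper), consider a junction with three attached bonds carrying effort–flow pairs $(e_1, f_1)$, $(e_2, f_2)$, and $(e_3, f_3)$, with bond 1 oriented into the junction and bonds 2 and 3 oriented out of it:
$\text{Connection 0:}\quad e_1 = e_2 = e_3, \qquad f_1 - f_2 - f_3 = 0$
$\text{Connection 1:}\quad f_1 = f_2 = f_3, \qquad e_1 - e_2 - e_3 = 0$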
Table 1 describes several domains for the Bond Graph methodology that show the flow and effort variables in the various domains associated with them. The most basic variables of each domain are introduced as flow and effort variables that can be understood in different domains.
We briefly describe the graph structure of the Bond Graph below.
To produce a Bond Graph model, we start with an ideal physical model. In fact, there is a systematic method, which we present here as a procedure. This procedure generally consists of identifying the basic domains and elements, generating the junction structure, placing the elements, and possibly simplifying the diagram. The method is slightly different for mechanical domains compared to other domains; these differences are given between parentheses. This is because elements must be linked to the appropriate common variables: effort variables in non-mechanical domains and velocities (flow variables) in mechanical domains are the common variables we need.
The Bond Graph procedure includes the following steps. Steps 1 and 2 identify the domains and elements; steps 3 to 5 describe the construction of the junction structure.
  • Determine which physical domains exist in the system and identify all the basic elements such as C or C-elements (storage elements such as a capacitor or spring), I or I-elements (storage elements such as an inductor or mass), R or R-elements (dissipate free energy such as resistors), SE (sources effort such as motors), SF (sources flow such as motors), TF (transformer, within the same domain (toothed wheel) or between different domains (electromotor)), and GY (gyrator such as an electromotor, a pump, or a turbine). For this purpose, each element is assigned a unique name to distinguish it from the other.
  • Specify a reference effort in the ideal physical model for each domain (mechanical domains: a reference velocity with direction); note that only the references in the mechanical domains have a direction.
  • Identify all other effort variables (mechanical domains: velocities) and assign unique names to them.
  • Draw these effort variables (mechanical: velocities), as well as the references, graphically as 0-junctions (mechanical: 1-junctions).
  • Identify all the effort differences (mechanical: velocity (=flow) differences) needed to connect the ports of all the elements listed in step 1 to the junction structure.

3.2. Bond Graph Optimization and Algorithm

The optimization algorithm proposed in this paper is based on the Bond Graph methodology and achieves good convergence and performance. The main structure of this optimizer and the steps taken to design our algorithm are presented below:
  • Determining the number and type of the physical domains (models): In our model, four specific domains with different formulas for each domain (type of domains) are introduced.
  • Combination of subdomain details: The subdomains are small linear algebraic equations that form bigger algebraic formulas (by addition of them) for calculating the changes in features. Since the small formulas are linear, they lead to the formation of bigger linear formulas. Variables of the small algebraic formulas are effective element attributes.
  • Identifying the basic elements: In our method, the global best or local best element, the global worst or local worst element, and the global or local average element are introduced as the basic elements.
  • Identifying all the influencers and naming them: Different combination formulas that are specific to each domain (model) are connected based on ports, and the most influential combinations are determined. Each linear algebraic addition forms a port in our model.
  • Determining the algebraic sum of coefficient of each domain: the algebraic sum of the coefficient of each domain is set to zero or one.
  • Calculation of power based on current or energy: Finally, the power is calculated from the obtained values multiplied by random numbers. First, the value obtained from the formula(s) is multiplied by a random number; second, the new element value is calculated by adding this power to the old element value.
In our algorithm, the Bond Graph concepts are partially used with some modifications. This algorithm can be used similarly to other algorithms to optimize problems in all areas. However, unlike other algorithms, the same or different formula can be used for each dimension. Our algorithm can be customized by modification of the base formulas, domain, and effective elements for other optimization problems.
Our four domains are given below with their formulas and combinations:
All coefficients are calculated as follows.
$\alpha = \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4; \quad \alpha = 0 \ \text{or} \ \alpha = 1$
where $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are the coefficients used in the four models.
The values of the coefficients are obtained from the distance between the identified elements and the current element, as follows:
$\alpha_1 = x_{ce} - x_{gb}$
$\alpha_2 = x_{ce} - x_{glb}$
$\alpha_3 = x_{ce} - x_{nlb}$
$\alpha_4 = x_{ce} - x_{gw}$
$\alpha_{all} = \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4, \quad \text{if} \ \alpha = 1$
$\alpha_1 = \alpha_1 / \alpha_{all}, \quad \alpha_2 = \alpha_2 / \alpha_{all}, \quad \alpha_3 = \alpha_3 / \alpha_{all}, \quad \alpha_4 = \alpha_4 / \alpha_{all}$
$\alpha_{all} = \alpha_i + \alpha_j + \alpha_k; \quad i, j, k \in \{1, \dots, 4\}, \ i \neq j \neq k, \quad \text{if} \ \alpha = 0$
$\alpha_d = -\alpha_{all}, \quad d \in \{1, \dots, 4\}, \ d \neq i \neq j \neq k$
where $x_{ce}$, $x_{gb}$, $x_{glb}$, $x_{gw}$, and $x_{nlb}$ are the current element, global best, global–local best, global worst element, and normal local best, respectively. These variables are calculated in a similar way as in the PSO algorithm, except that the normal local best (nlb) is the element closest to the average.
In this section, four domains are considered, each with one formula or a set of formulas for updating the elements of that domain. For each domain, the combination of terms is different. If a domain has more than one formula, the elements are divided between the formulas for the calculation of new elements. The domains and their formulas are given below:
$\Delta E_{Model\_1} = \alpha_1 (x_{ce} - x_{gb}) + \alpha_2 (x_{ce} - x_{glb}) + \alpha_3 (x_{ce} - x_{nlb}) + \alpha_4 (x_{ce} - x_{gw})$
$\Delta E_{Model\_2\_1} = \alpha_1 (x_{ce} - x_{gb}) + \alpha_2 (x_{ce} - x_{glb}) + \alpha_3 (x_{ce} - x_{nlb})$
$\Delta E_{Model\_2\_2} = \alpha_1 (x_{ce} - x_{gb}) + \alpha_2 (x_{ce} - x_{glb}) + \alpha_4 (x_{ce} - x_{gw})$
$\Delta E_{Model\_2\_3} = \alpha_1 (x_{ce} - x_{gb}) + \alpha_3 (x_{ce} - x_{nlb}) + \alpha_4 (x_{ce} - x_{gw})$
$\Delta E_{Model\_2\_4} = \alpha_2 (x_{ce} - x_{glb}) + \alpha_3 (x_{ce} - x_{nlb}) + \alpha_4 (x_{ce} - x_{gw})$
$\Delta E_{Model\_3\_1} = \alpha_1 (x_{ce} - x_{gb}) + \alpha_2 (x_{ce} - x_{glb})$
$\Delta E_{Model\_3\_2} = \alpha_1 (x_{ce} - x_{gb}) + \alpha_3 (x_{ce} - x_{nlb})$
$\Delta E_{Model\_3\_3} = \alpha_1 (x_{ce} - x_{gb}) + \alpha_4 (x_{ce} - x_{gw})$
$\Delta E_{Model\_4\_1} = \alpha_1 (x_{ce} - x_{gb})$
$\Delta E_{Model\_4\_2} = \alpha_2 (x_{ce} - x_{glb})$
$\Delta E_{Model\_4\_3} = \alpha_3 (x_{ce} - x_{nlb})$
$\Delta E_{Model\_4\_4} = \alpha_4 (x_{ce} - x_{gw})$
$\Delta E_{Model\_x} = \Delta E_{Model\_x} \times rand, \quad x = 1, \dots, 4$
$x_{New\_Elements} = x_{Old\_Elements} + \Delta E_{Model\_x}$
In each domain, the number of formulas depends on the design and the model being implemented. In this case, one formula is used for the first domain, four for the second, three for the third, and four for the fourth to change the values of the elements. When more than one formula is used, the elements are distributed among the formulas randomly; it is possible for all of the elements to be assigned to one formula in an iteration, but in practice, the elements are distributed between the different formulas. Different modes of this model can be used: (1) using the same formulas (same domain) to optimize more dimensions at runtime; (2) using different formulas (different domains) to optimize more dimensions at runtime; and (3) using all the different formulas (all domains) to optimize more dimensions at runtime. Due to the high complexity of this algorithm, only the first mode is examined in this article.
The pseudo-code of the Bond Graph algorithm is defined as follows (Figure 2); a minimal code sketch follows the steps below:
(1)
Define algorithm parameters (algorithm variables, elements, element properties, etc.).
(2)
Initialize the defined parameters (initialize the properties of the elements and some fixed parameters such as selecting the domain as a constant).
(3)
Calculate the cost function based on the proposed functions (evaluating the performance of the algorithm).
(4)
Find the best, worst, and average variables locally and globally.
(5)
For the calculation of the changes, the elements are divided between the formulas if more than one formula is used. The user determines how many elements go to each formula and the order of the division (the order is not random but sequential). Finally, the changes are multiplied by a fixed or random value to make them a little more moderate.
(6)
These changes are applied to the values of the old elements, and new elements are obtained.
(7)
Calculate the cost function based on the proposed functions (evaluate the algorithm’s performance).
(8)
Find the variables of best, worst, and average locally and globally.
(9)
Check whether the global best element is better than the target (predicted) value.
(10)
If it is better, go to step 11 and if not, go to step 5.
(11)
Execution is complete; the global best element is the answer to the problem.
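The following is a minimal Python sketch of one possible implementation of the BGA update loop described above, restricted to domain (model) 3. Several details are assumptions on our part because the paper does not pin them down: the "global–local best" is treated as each element's personal best (PSO-style), the "normal local best" is the element whose cost is closest to the population's mean cost, and ΔE is subtracted so that elements move toward the better references.

```python
import numpy as np

def bga_minimize(cost, dim, n_elements=30, n_iter=400, bounds=(-10.0, 10.0), seed=0):
    """Sketch of the Bond Graph algorithm (BGA) restricted to domain/model 3.

    Interpretation notes (assumptions, not taken from the paper):
    - the "global-local best" glb is each element's personal best, as in PSO;
    - the "normal local best" nlb is the element whose cost is closest to the
      population's mean cost;
    - Delta-E is subtracted so that elements move toward the better references.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_elements, dim))        # element property values
    pbest = x.copy()                                        # personal bests
    cost_x = np.apply_along_axis(cost, 1, x)
    cost_p = cost_x.copy()

    for _ in range(n_iter):
        gb = x[np.argmin(cost_x)]                           # global best element
        gw = x[np.argmax(cost_x)]                           # global worst element
        nlb = x[np.argmin(np.abs(cost_x - cost_x.mean()))]  # element nearest the average

        for i in range(n_elements):
            glb = pbest[i]                                  # assumed "global-local best"
            # Distance-based coefficients alpha_1..alpha_4, normalized to sum to 1
            a = np.array([np.linalg.norm(x[i] - gb),
                          np.linalg.norm(x[i] - glb),
                          np.linalg.norm(x[i] - nlb),
                          np.linalg.norm(x[i] - gw)])
            a /= a.sum() + 1e-12
            # Model 3: three two-term formulas; elements are assigned to them in turn
            k = i % 3
            if k == 0:
                dE = a[0] * (x[i] - gb) + a[1] * (x[i] - glb)
            elif k == 1:
                dE = a[0] * (x[i] - gb) + a[3] * (x[i] - gw)
            else:
                dE = a[2] * (x[i] - nlb) + a[3] * (x[i] - gw)
            # "Power" step: scale the change by a random number, then update the element
            x[i] = np.clip(x[i] - rng.random() * dE, lo, hi)

        cost_x = np.apply_along_axis(cost, 1, x)
        improved = cost_x < cost_p
        pbest[improved], cost_p[improved] = x[improved], cost_x[improved]

    best = np.argmin(cost_p)
    return pbest[best], cost_p[best]

# Example: minimize the Sphere benchmark in 35 dimensions
if __name__ == "__main__":
    solution, value = bga_minimize(lambda v: float(np.sum(v ** 2)), dim=35)
    print(value)
```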
We explain the differences between BGA and PSO in the following:
  • Structure of main formula for calculation: In PSO, only the difference between the current particle and the local best particle and global best particle is calculated in each iteration. The coefficients are two fixed random numbers. However, in BGA, for example, in model three, three formulas are used for updating and improving the values of features. The first formula calculates the difference between the current element and the global best and global–local best, and the coefficients α 1 and α 2 are calculated separately for each iteration. The second formula calculates the difference of the current element from the global best and global worst, and the coefficients α 1 and α 4 are calculated separately for each iteration. The third formula calculates the difference between the current element and the normal local best and global worst, and the coefficients α 3 and α 4 are calculated separately for each iteration.
  • Comparison of the main calculation formulas: In PSO, only one formula is used to calculate the velocity. In BGA, different formulas are used for different elements to calculate the element attribute values, and the formulas are chosen randomly for the calculation of each element.

3.3. Common Spatial Pattern (CSP)

The common spatial pattern (CSP) algorithm [41,47,48] is known as an efficient and effective analyzer of EEG signal classes. In other words, it is a feature extraction method that projects signals from several channels into a mapping space that maximizes the difference between classes and minimizes their similarity; this is accomplished by maximizing the variance of one class while minimizing the variance of the other.
The CSP computation is as follows:
$C = \dfrac{E E'}{\mathrm{trace}(E E')}$
where $C$ is the normalized spatial covariance of the input data $E$, the raw data of a single imaging period. $E$ is an $N \times T$ matrix, where $N$ is the number of electrodes (channels) and $T$ is the number of samples per channel. The apostrophe denotes the transpose operator, and $\mathrm{trace}(\cdot)$ is the sum of the elements on the main diagonal of a matrix.
The spatial covariance matrices of the two classes, $\bar{C}_1$ and $\bar{C}_2$, are calculated by averaging over several imaging periods of the EEG data, and the composite spatial covariance $C_c$ is calculated as follows:
$C_c = \bar{C}_1 + \bar{C}_2$
where $C_c$ is real and symmetric and can be factored as:
$C_c = U_c \lambda_c U_c'$
where $U_c$ is the matrix of eigenvectors and $\lambda_c$ is the diagonal matrix of eigenvalues. The whitening transformation $P$ is:
$P = \lambda_c^{-1/2} U_c'$
The whitening transformation equalizes the variances in the space spanned by $U_c$, so that all eigenvalues of $P C_c P'$ are equal to 1.
$S_L = P \bar{C}_L P'$
$S_R = P \bar{C}_R P'$
$S_L$ and $S_R$ are the whitened covariance matrices and share common eigenvectors, so that if
$S_L = B \lambda_L B'$
$S_R = B \lambda_R B'$
then
$\lambda_L + \lambda_R = I$
where $I$ is the identity matrix.
The eigenvalues are arranged in descending order, and the projection matrix $W$ is defined as:
$W = B' P$
The projection of each trial is as follows:
$Z = W E$
where $N$ rows of $W$, $W_p$ ($p = 1, 2, \dots, N$), are selected to represent each imaging period, and the components of the feature vector are calculated from the variances of the corresponding rows $Z_p$ of $Z$. The normalized log-variance is used as the feature:
$f_p = \log\!\left( \mathrm{Var}(Z_p) \Big/ \textstyle\sum_p \mathrm{Var}(Z_p) \right)$
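A minimal numpy sketch of the CSP computation above is given below for reference. Variable names follow the equations; the composite covariance is assumed to be well conditioned (in practice a small regularization term may be added), and keeping the first and last m rows of W is the usual way of obtaining the 2m spatial filters.

```python
import numpy as np

def csp_filters(trials_left, trials_right, m=5):
    """Compute CSP spatial filters from two lists of (channels x samples) trials.

    Follows the equations above; assumes the composite covariance is well
    conditioned (a small regularization term may be needed in practice).
    """
    def mean_cov(trials):
        covs = []
        for E in trials:
            C = E @ E.T
            covs.append(C / np.trace(C))          # normalized spatial covariance
        return np.mean(covs, axis=0)

    C_l, C_r = mean_cov(trials_left), mean_cov(trials_right)
    Cc = C_l + C_r                                # composite covariance
    lam, Uc = np.linalg.eigh(Cc)                  # Cc = Uc diag(lam) Uc'
    P = np.diag(lam ** -0.5) @ Uc.T               # whitening transformation
    S_l = P @ C_l @ P.T
    lam_l, B = np.linalg.eigh(S_l)                # shared eigenvectors of S_l and S_r
    order = np.argsort(lam_l)[::-1]               # descending eigenvalues
    W = B[:, order].T @ P                         # projection matrix W = B' P
    return np.vstack([W[:m], W[-m:]])             # first and last m rows: 2*m filters

def csp_features(W, E):
    """Normalized log-variance features of one trial E (channels x samples)."""
    Z = W @ E
    v = np.var(Z, axis=1)
    return np.log(v / v.sum())
```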

3.4. Spatial and Temporal Frequency Domains

This section lists the formulas of the time-domain, frequency-domain, autoregression, and wavelet-coefficient features only briefly, because most articles explain these methods in detail; the formulas are taken from [28,44].
The feature extraction formulas below cover the time domain and the frequency domain, followed by the formulas belonging to the other methods mentioned above. For the wavelet coefficients, the MATLAB function is used. The optimization algorithms then select features computed from some of these formulas and discard the rest.
$\text{Absolute values sum} = \sum_{i=1}^{N} |x_i|$
$\text{Mean absolute value, } \mathrm{MAV} = \frac{1}{N} \sum_{i=1}^{N} |x_i|$
$\text{Modified mean absolute value type 1, } \mathrm{MAV1} = \frac{1}{N} \sum_{i=1}^{N} w_i |x_i|, \quad w_i = \begin{cases} 1, & 0.25N \le i \le 0.75N \\ 0.5, & \text{otherwise} \end{cases}$
$\text{Modified mean absolute value type 2, } \mathrm{MAV2} = \frac{1}{N} \sum_{i=1}^{N} w_i |x_i|, \quad w_i = \begin{cases} 1, & 0.25N \le i \le 0.75N \\ 4i/N, & i < 0.25N \\ 4(i-N)/N, & i > 0.75N \end{cases}$
$\text{Simple square integral, } \mathrm{SSI} = \sum_{i=1}^{N} x_i^2$
$\text{Variance, } \mathrm{Var} = \frac{1}{N-1} \sum_{i=1}^{N} x_i^2$
The absolute value of the 3rd, 4th, and 5th temporal moments:
$\mathrm{TM}_d = \left| \frac{1}{N} \sum_{i=1}^{N} x_i^d \right|, \quad d = 3, 4, 5$
$\text{Root mean square, } \mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}$
$\text{Waveform length, } \mathrm{WL} = \sum_{i=1}^{N-1} |x_{i+1} - x_i|$
$\text{Average amplitude change, } \mathrm{AAC} = \frac{1}{N} \sum_{i=1}^{N-1} |x_{i+1} - x_i|$
$\text{Difference absolute standard deviation value, } \mathrm{DASDV} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N-1} (x_{i+1} - x_i)^2}$
$\text{Maximum value, } \mathrm{Max} = \max(x_i), \quad i = 1, 2, \dots, N$
$\text{Minimum value, } \mathrm{Min} = \min(x_i), \quad i = 1, 2, \dots, N$
$\text{Standard deviation, } \mathrm{SD} = \mathrm{sd}(x_i), \quad i = 1, 2, \dots, N$
$\text{Median frequency, } \mathrm{MDF}: \ \sum_{j=1}^{\mathrm{MDF}} P_j = \frac{1}{2} \sum_{j=1}^{M} P_j$
$\text{Peak frequency, } \mathrm{PKF} = \max(P_j)$
$\text{Mean power, } \mathrm{MNP} = \frac{1}{M} \sum_{j=1}^{M} P_j$
$\text{Total power, } \mathrm{TTP} = \sum_{j=1}^{M} P_j$
$\text{Autoregressive coefficients: } x_i = \sum_{p=1}^{P} a_p x_{i-p} + w_i, \quad P \text{ is the order of the AR model}$
$\text{Power spectrum, } S_{AA}(f) = \frac{\mathrm{FFT}(A) \, \mathrm{FFT}(A)^{*}}{N}$
$P_s(m) = \sum_{n=1}^{N} r_{xx}(n) \, e^{-j 2\pi m n / N}, \quad m = 0, 1, 2, \dots, N$
where $r_{xx}(t)$ and $r_{xx}(n)$ are autocorrelation functions.
The wavelet coefficients are obtained from the discrete wavelet decomposition of the signal onto the scaling and wavelet basis functions $\phi_{j,s}$ and $\Psi_{j,s}$; in this work, they are computed with the MATLAB wavelet function.
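As an illustration, a short Python sketch of a few of the time-domain features above is given here; it is not the authors' implementation (the paper uses MATLAB) and only covers a subset of the listed features.

```python
import numpy as np

def time_domain_features(x):
    """A subset of the time-domain features listed above for one signal window x
    (the frequency-domain, AR, and wavelet features are computed separately)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    d = np.diff(x)
    return {
        "IAV":   float(np.sum(np.abs(x))),                 # absolute values sum
        "MAV":   float(np.mean(np.abs(x))),                # mean absolute value
        "SSI":   float(np.sum(x ** 2)),                    # simple square integral
        "VAR":   float(np.sum(x ** 2) / (N - 1)),          # variance (as defined above)
        "RMS":   float(np.sqrt(np.mean(x ** 2))),          # root mean square
        "WL":    float(np.sum(np.abs(d))),                 # waveform length
        "AAC":   float(np.sum(np.abs(d)) / N),             # average amplitude change
        "DASDV": float(np.sqrt(np.sum(d ** 2) / (N - 1))), # difference abs. std. deviation
        "MAX":   float(np.max(x)),
        "MIN":   float(np.min(x)),
        "SD":    float(np.std(x, ddof=1)),                 # standard deviation
    }
```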

3.5. Reduce and Extract Features of Brain Signals with Filter Banks and Bands (FBs)

In some studies, feature and electrode reduction methods have been used, in combination or separately, to reduce the data volume before processing. GA, PSO, or other algorithms have been implemented per subject to reduce the dimensions before feature extraction, to select features after feature extraction, or within the classifiers separately. In this paper, our proposed method is inspired by different dimension reduction methods, while being somewhat different from the methods presented in previous articles, with which it has only partial similarities. First, in all our models, bandwidth reduction (filter banks) has been used for feature extraction. Second, in some cases, the reduction of features and electrodes has also been used. Third, feature selection is done after feature extraction. For this purpose, we implemented different models that support all of these items. These models (five general models) are as follows:
(1)
The selection of features is made through a filter bank with a specific channel. Two filter banks and two channels are used for the purpose (separately for individuals by selecting two filters and two specific channels).
(2)
The selection of features is made through a hybrid filter bank with a specific channel, and a new signal from two filter banks along the channel is used for this purpose. (Separately for a person with a combined signal and two specific channels).
(3)
Reduction of features for two channels with two filter banks (in the range of 8 Hz) is intended for frequency reduction. This is a 63% reduction in the frequency range; i.e., instead of the whole frequency domain of 8 to 30 Hz, we use a combination of smaller frequency ranges, for example, 8 to 12 Hz and 20 to 24 Hz, representing a combination of filter banks 2 and 5. Then, we create a signal matrix with dimensions of 10 × 100 for each imaging period (approximately 1.33% of the features of an imaging period for one channel), which is used as the input to CSP to filter and extract features. This is followed by the classification of the features.
(4)
Feature reduction for three channels and three filter banks (a 12 Hz interval) has been used to reduce the frequency range. From the 22 Hz range (8–30 Hz), a 12 Hz range is selected (a 45% reduction in the frequency range). Then, a signal matrix of 18 × 100 is created for each imaging period (generated for all imaging periods), which is 2.4% of one channel; the total number of features is slightly more than for two channels. The extracted features are then used as two-dimensional matrices as input to the CSP to filter and extract the features, after which classification is performed.
(5)
Feature extraction from 1 to 3 channels is done separately (for each channel, feature extraction and feature selection are made separately). Time domain, frequency domain, wavelet coefficients, and autoregression coefficients were used to extract the features. From each feature extraction method, a certain amount is considered for feature selection. In other words, four parts of features (features related to each proposed method) are used for each channel separately. Feature extraction and feature selection have been implemented for two channels, 8 parts, and for 3 channels, 12 parts. The amount of feature selection is determined separately from each method and channel, and all features from all channels and methods are used for classification.
In all of the stated cases, 100 features per channel are selected using the algorithms; in other words, the reduction is from 875 to 100 features. To evaluate the five models mentioned above, we utilized the proposed BGA, PSO, GA, and QGA with the ELM classifier for classification.
If more than one channel is used, 100 features are selected in the same way on each channel. The coefficients determine the location of the features: the values of the element coefficients in our algorithm, the genes of the chromosomes in GA and QGA, and the particle attributes in PSO vary between zero and one during the execution of the algorithm. However, in order to improve the accuracy in the fifth model, the coefficients related to the time domain and autoregression are set to binary values.
For models one and two, filter banks 2 and 5 and channels C3 and C4 have been used, and their evaluation has been performed with four algorithms, namely the proposed algorithm, PSO, GA, and QGA. For the rest of the models, the ELM classifier and the proposed algorithm have been evaluated on the filter banks and subjects. Figure 3 shows an overview of the model, which is described in more detail in the experimental section. In this figure, BGA is used as the main algorithm; the other three algorithms (GA, PSO, and QGA) are used similarly.
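For reference, a minimal sketch of an ELM-style classifier is shown below; it is not the specific ELM implementation used in the paper, and the hidden-layer size and sigmoid activation are assumptions.

```python
import numpy as np

class ELM:
    """Sketch of an extreme learning machine classifier: random input weights,
    a sigmoid hidden layer, and output weights fitted with the pseudoinverse.
    The hidden-layer size and activation are assumptions."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, int)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # hidden activations
        T = np.eye(y.max() + 1)[y]                         # one-hot targets
        self.beta = np.linalg.pinv(H) @ T                  # output weights
        return self

    def predict(self, X):
        H = 1.0 / (1.0 + np.exp(-(np.asarray(X, float) @ self.W + self.b)))
        return np.argmax(H @ self.beta, axis=1)

# Usage on the selected features (two classes, labels 0 = left, 1 = right):
# clf = ELM(n_hidden=200).fit(X_train, y_train)
# accuracy = (clf.predict(X_test) == y_test).mean()
```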

3.6. Reduce Features Using Pooling for Channels

The key idea is a pooling (sampling) operation applied at each layer, as in deep learning. Common pooling forms in deep learning are the mean and maximum functions, which select the average or the largest activation in each pooling region. In stochastic pooling, the activation is selected by sampling from a multinomial distribution formed by the activations in the pooling region. One view of stochastic pooling is that it is similar to standard max pooling applied to many copies of the input image, each with small local variations; this resembles explicit elastic deformation of the input images, which gives excellent performance on some datasets. In addition, using stochastic pooling in a multilayer model produces a large number of variations in the higher layers. In max pooling, only the strongest activation in each region is kept, and whether the remaining activations have any effect is not considered; stochastic pooling instead allows non-maximal activations to be selected when that is useful. An example of this pooling model is shown in Figure 4 [54,55,56].
In some papers, convolutional layers accept arbitrary input sizes but then produce variable output sizes, while the fully connected layers and classifiers need fixed-length vectors. Spatial pyramid pooling improves on the earlier methods and preserves spatial information by pooling in local spatial bins whose sizes are proportional to the image, so the number of bins is fixed.
In this paper, we use pooling to select features randomly and statistically, applying pooling in all layers; each layer reduces the number of features, and after the last layer, classification is performed. Most previous methods use GA or particle swarm optimization to reduce the dimension; in our study, the feature reduction model is based on deep learning pooling instead. This pooling can be applied to samples with any number of channels, i.e., hybrid or individual channels. In this article, pooling is done in two ways: (1) for all channels and features of each filter bank individually, and (2) for all channels and features of all filter banks together.
In addition to the functions used in older models, i.e., maximum, minimum, and average, two new functions are introduced in this article. The first is an intelligent selection function that chooses the maximum, minimum, or average based on a specific rule: first, we compute the average of all the features in the window; then, we compute the absolute difference between each feature and this average, and the feature with the maximum deviation is selected. The second function computes the average of the window without the participation of the maximum value.
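A small Python sketch of the two new window functions is given below; the "intelligent selection" rule is our interpretation of the description above (the value with the largest absolute deviation from the window mean is returned), so the exact behavior of the original implementation may differ.

```python
import numpy as np

def smart_pool(window):
    """'Intelligent selection': return the value of the window farthest from the
    window mean (largest absolute deviation); an interpretation of the rule above."""
    w = np.asarray(window, dtype=float)
    return float(w[np.argmax(np.abs(w - w.mean()))])

def mean_without_max(window):
    """Average of the window computed without the participation of its maximum value."""
    w = np.asarray(window, dtype=float)
    if w.size <= 1:
        return float(w.mean())
    return float((w.sum() - w.max()) / (w.size - 1))
```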
In the following, we briefly present the structure of the sampling (pooling) layer model with classification for our idea; a code sketch of the layer-by-layer reduction is given after this list:
(1)
In general, no preprocessing is applied before the filter banks; after the subject's data are recorded, the specified frequency bands are filtered.
(2)
Select the window size (from 2 to 6) for each pooling layer (fixed for each layer). Three layers are active for models with window sizes 2 to 4, and five layers for window sizes 5 and 6.
(3)
Selection of sampling functions for each layer (maximum, minimum, average function, smart maximum or minimum selection, average function without considering the maximum). All sampling models are supported for all layers. Two sampling models for layers are used. First, the layers use only one identical function. Second, the layers use different functions.
(4)
The output of each layer is the input of the next layer. According to the combination of channels and features, the pooling window is a vector and the output is a single plane; the dimension reduction is applied to the features of each channel separately.
(5)
Perform steps 2 to 4 for all layers.
(6)
The outputs of the last layer from all channels are collected to form a vector. For classification, if the output is two-dimensional (more than one channel), it is converted to a one-dimensional matrix before being sent to the classifiers.
(7)
Selection of the training and testing scheme for classification (10-fold cross-validation on the whole data set): the data set is divided into ten parts; one part is used for testing and the rest for training, and the whole testing procedure is repeated ten times.
(8)
Classification is done in two classes. The results are different for each person. The results of the first random classification are discussed later in this paper.
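The layer-by-layer reduction referenced above can be sketched as follows (the window size, number of layers, and use of max pooling here are illustrative choices, not the exact configuration of any experiment in the paper):

```python
import numpy as np

def pool_layer(features, window, fn):
    """Apply one pooling layer with a fixed window size along the feature axis."""
    features = np.asarray(features, dtype=float)
    n = (features.size // window) * window            # drop the incomplete last window
    blocks = features[:n].reshape(-1, window)
    return np.array([fn(b) for b in blocks])

def reduce_channel(features, window=4, n_layers=3, fn=np.max):
    """Pass one channel's feature vector through several pooling layers,
    using the same function in every layer (first/second sampling models)."""
    out = features
    for _ in range(n_layers):
        out = pool_layer(out, window, fn)
    return out

# Usage: reduce each channel of one trial (channels x features), then concatenate
# reduced = np.concatenate([reduce_channel(trial[ch]) for ch in range(trial.shape[0])])
```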
An overview of our proposed model for reducing the channel dimension is presented in Figure 5 and Figure 6.

4. Experiment and Results

4.1. Case Study and Numerical Results on Proposed Algorithm

The parameters of the benchmark functions are shown in Table 2. These are eight standard functions for testing optimization algorithms. Our algorithm with all of its models (domains), along with GA and PSO [57,58], was run in MATLAB 2016 on a PC, and a comparison was performed. All details of the test scenarios for the algorithms are briefly described in Table 3 [59].

4.2. Bond Graph Algorithm (BGA)

Compared to the other algorithms, the BGA converges rapidly on the benchmark functions and approaches the global optimum; in other words, it avoids falling into local traps (the main targets are domains 3 and 4). Its performance is shown to be better than that of PSO and GA. Figure 7 shows the convergence diagram of the BGA compared to the other two algorithms; the convergence of two benchmark functions over time is illustrated for better understanding.
Domains 3 and 4 had excellent results near the global optimum, whereas the first two domains in our definition had the worst convergence and need to be improved (Figure 7, bottom). Table 4 compares the 35-dimensional results of GA, PSO, and all of the models of the BGA. From this table, we can conclude that the design of the formulas and variables used is very important and influential for convergence.
The BGA optimization algorithm, similar to PSO and GA, is linear with order O(n); only one loop is used for the calculation. BGA is about six times more time consuming than PSO and about half as time consuming as GA. For example, the average execution times for 400 iterations and 30 runs on the Sphere function (the first benchmark) were 0.83 s for GA, 0.42 s for BGA, and 0.07 s for PSO.

4.3. Experiments and Scenarios

In this study, data set IIa from BCI competition IV [60] is used for our experiments. This database contains the following details: (1) Nine subjects participated in the test. (2) The brain signals were recorded with 22 electrodes (channels), and the processing relates to these channels. (3) The training and evaluation data for each participant consist of one session each. (4) Each session contains 288 trials (motor imagery tasks). (5) The imagery time frame of each trial is 3 s (in our study, we consider 3.5 s, of which half a second is for preparation). (6) It includes four classes: right hand, left hand, tongue, and foot. (We have only used the two classes of right hand and left hand in our experiments, similarly to the previous studies in the literature.) (7) A Butterworth filter of order 100 is used for nine filter banks: 4–8 Hz, 8–12 Hz, 12–16 Hz, 16–20 Hz, 20–24 Hz, 24–28 Hz, 28–32 Hz, 32–36 Hz, and 36–40 Hz. The details of the data set are presented in Table 5 [60].
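A minimal Python/SciPy sketch of such a filter bank is shown below. The sampling rate of 250 Hz corresponds to data set IIa; the filter order and the zero-phase second-order-section realization are our choices for numerical stability rather than the paper's exact 100th-order design.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Nine 4 Hz bands, 4-40 Hz, as listed above; data set IIa is sampled at 250 Hz.
BANDS = [(4 + 4 * k, 8 + 4 * k) for k in range(9)]

def filter_bank(eeg, fs=250, order=8):
    """Band-pass each channel of eeg (channels x samples) into the nine bands.

    The filter order and zero-phase second-order-section realization are our
    choices; the paper states a 100th-order Butterworth filter, which is safer
    to realize as cascaded second-order sections.
    """
    out = []
    for lo, hi in BANDS:
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out.append(sosfiltfilt(sos, eeg, axis=-1))
    return np.stack(out)          # shape: (n_bands, channels, samples)
```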
In our experiments, details of the new optimizer paradigm are described in Section 3 for the production of models. This is similar to most heuristic-based algorithms for finding solutions to problems by reducing the dimension for different modes and extracting each case individually. In this approach, a set of features in the elements is used to represent the location of features in the channels.
In the following, we present the details of the models (seven model scenarios) used in our experiments for dimension reduction before the feature extraction (three-dimensional reduction) and the models that are used for dimension reduction after feature extraction:
  • The first model scenario: a filter bank with one channel for each participant is examined by the four algorithms with ELM [61,62] classification. Two filter banks, i.e., 2 and 5, with two channels, i.e., 8 and 12, which represent C3 and C4, respectively, are used.
  • The second model scenario: a new hybrid filter (combination of two, three, or four filter banks) with one channel with four algorithms with ELM classification is examined for each participant. A hybrid filter bank with two channels, i.e., 8 and 12, representing C3 and C4, respectively, is used.
  • The third model scenario: In this model scenario, we first extract the features from channel 8 (C3) or 12 (C4) according to Table 6. Then, from the extracted features, feature selection is done with BGA, GA, PSO, and QGA (Table 6), and then the classification is done.
  • The fourth model scenario: In this mode, first, feature extraction from two channels 8 (C3) and 12 (C4) according to Table 6 is performed. Then, from the extracted features, feature selection (i.e., dimension reduction) is done by the specified algorithms. In this approach, the selection of features for each method and each channel is selected separately (Table 6), and their set is sent to the classifier for classification.
  • The fifth model scenario: This model scenario is the same as the previous model scenario except that in this model scenario, we used three channels. First, feature extraction is done from three channels 8 (C3), 10 (CZ), and 12 (C4), according to Table 6. Then, from the extracted features, feature selection (i.e., dimension reduction) is done by the specified algorithms. In this case, we have four methods and three channels forming 12 parts. From each part, the best features are selected for the classification according to Table 6. We used the ELM classifier for classification.
  • The sixth model scenario: In this model scenario, first, the features of channels 8 and 12 are reduced to 100 features (100 features are selected for each channel). These features are selected from the frequency ranges of 8–12 Hz, 20–24 Hz, and 24–28 Hz, which are within the general range of 8–30 Hz. Subsequently, using specific formulas, a matrix of 10 × 100 is prepared for each imaging period as the input of CSP. The extracted components of CSP are equal to m = 5 (10 features). After feature extraction is done for all imaging periods, the classification operation is performed. The formulas of the rows of the CSP input matrix are as follows (a small construction sketch is given after this list):
    $R_1 = R_{C3, FB5}$
    $R_2 = R_{C3, FB6}$
    $R_3 = R_{C4, FB5}$
    $R_4 = R_{C4, FB6}$
    $R_5 = R_{C3, FB5} + R_{C3, FB6}$
    $R_6 = R_{C4, FB5} + R_{C4, FB6}$
    $R_7 = R_{C3, FB5} + R_{C4, FB5}$
    $R_8 = R_{C3, FB6} + R_{C4, FB6}$
    $R_9 = R_{C3, FB5} + R_{C4, FB6}$
    $R_{10} = R_{C3, FB6} + R_{C4, FB5}$
    where $R_i$ is a row of the CSP input matrix and $R_{Cx, FBy}$ denotes the selected features of channel $Cx$ in filter bank $FBy$. For $R_1$ to $R_4$, the values come from a single electrode and frequency range (one filter bank); for the rest, each row is the sum of two selected electrode and filter-bank combinations.
  • Seventh model scenario: This model scenario is similar to model scenario 6, but here, the features are first selected from three channels, i.e., 8, 10, and 12, each reduced to 100 features. These features are selected from the frequency ranges of 8–12 Hz, 20–24 Hz, and 24–28 Hz, which are within the general range of 8–30 Hz. After that, using specific formulas, a matrix of 18 × 100 is prepared for each imaging period as the input of CSP. The extracted components of CSP are equal to m = 5 (10 features). After feature extraction is done for all imaging periods, the classification operation is performed. The first ten rows use the same formulas as in Part 6, to which eight new formulas have been added. The formulas are as follows:
    $R_{11} = R_{C3, FB2}$
    $R_{12} = R_{C4, FB2}$
    $R_{13} = R_{CZ, FB2}$
    $R_{14} = R_{CZ, FB5}$
    $R_{15} = R_{CZ, FB6}$
    $R_{16} = R_{C3, FB6} + R_{C4, FB2}$
    $R_{17} = R_{C3, FB5} + R_{CZ, FB6}$
    $R_{18} = R_{CZ, FB5} + R_{C4, FB6}$
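As referenced in the sixth scenario, one way of assembling the CSP input matrix from the selected features could look like the following sketch (the dictionary keys and the row recipes are illustrative; only the ten two-channel rows are listed here):

```python
import numpy as np

def build_csp_input(selected, rows):
    """Stack the rows R_i of the CSP input matrix for one imaging period.

    selected: dict mapping (channel, filter_bank) -> vector of 100 selected features,
              e.g. selected[("C3", "FB5")]  (hypothetical keys for illustration).
    rows:     list of row recipes; each recipe lists the (channel, filter_bank)
              terms that are summed, following R1-R18 above.
    """
    return np.vstack([sum(selected[term] for term in recipe) for recipe in rows])

# The ten rows of the two-channel (sixth) scenario:
ROWS_10 = [
    [("C3", "FB5")], [("C3", "FB6")], [("C4", "FB5")], [("C4", "FB6")],
    [("C3", "FB5"), ("C3", "FB6")], [("C4", "FB5"), ("C4", "FB6")],
    [("C3", "FB5"), ("C4", "FB5")], [("C3", "FB6"), ("C4", "FB6")],
    [("C3", "FB5"), ("C4", "FB6")], [("C3", "FB6"), ("C4", "FB5")],
]
# Usage: X = build_csp_input(selected, ROWS_10)   # 10 x 100 matrix fed to CSP
```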
The following are the sampling models used to reduce the dimension based on deep learning sampling:
  • The first sampling model: Dimensional reduction based on deep learning sampling for each filter bank and participant is done separately with different window sizes and different functions. The same function is used in the layers.
  • The second sampling model: Dimensional reduction based on deep learning sampling for all filter banks together and participants separately is done with different window sizes and different functions. The same function is used in the layers.
  • Third sampling model: Dimensional reduction based on deep learning sampling for all filter banks together and participants separately is done with different window sizes and different functions. Different functions randomly selected with non-uniform distribution are used in each layer.
Two diagnostic measurements, i.e., accuracy and kappa, are considered for the analysis of each mental task.
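For reference, the kappa values reported below follow the standard Cohen's kappa definition,
$\kappa = \dfrac{p_o - p_e}{1 - p_e}$
where $p_o$ is the observed accuracy and $p_e$ is the chance agreement (0.5 for two balanced classes).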

4.4. Results on Algorithms

4.4.1. Results of First Model Scenario

Table 7 shows the results of a dimensional reduction to 100 features performed by the four algorithms on filter bank number 2 (frequency range 8 to 12 Hz). The average accuracies for channel 8 were calculated as 61.1%, 61.0%, 61.3%, and 60.0%, and for channel 12 as 61.3%, 61.7%, 62.3%, and 61.4%, respectively, for the four algorithms, i.e., PSO, GA, BGA (with the third model), and QGA [63,64]. The overall best accuracy for channels 8 and 12 is highlighted for each algorithm. The average accuracy of our proposed algorithm is better than that of the others. In channel 8, our algorithm demonstrated a slightly better accuracy than PSO and GA (i.e., by 0.1%) and outperformed QGA by about 1%. In channel 12, our algorithm outperformed the other algorithms by about 1–2%. However, for some test subjects, the highest accuracy was not obtained by our proposed algorithm. For example, in channel 8, the highest accuracy was obtained by GA for the first, seventh, and ninth subjects, i.e., 62.2%, 61.4%, and 60.8%, respectively. However, in the same channel, our algorithm had better accuracy for the second, third, fifth, and eighth subjects, i.e., 61.7%, 61.9%, 60.8%, and 60.3%, respectively. In channel 12, our proposed algorithm had the highest accuracy for the first, second, third, seventh, eighth, and ninth subjects, i.e., 60.5%, 60.3%, 63.0%, 66.6%, 60.4%, and 62.5%, respectively.
Table 8 shows the results of a dimensional reduction to 100 features performed by the four algorithms on filter bank number 5 (frequency range 20 to 24 Hz). The average accuracies for channel 8 were calculated as 60.8%, 61.2%, 61.4%, and 60.1%, and for channel 12 as 61.5%, 61.9%, 63.2%, and 61.1%, respectively, for the four algorithms, i.e., PSO, GA, BGA (with the third model), and QGA. In general, for each channel and subject, the highest accuracies are about 0.2 to 1% higher than those of the other algorithms. Our proposed algorithm performed better in most of the cases. However, for some test subjects, the highest accuracy was not obtained by our proposed algorithm. For example, in channel 8, the highest accuracies were obtained by GA for the first, third, and sixth subjects, i.e., 61.6%, 62.0%, and 63.2%, respectively, and for the ninth subject, the highest accuracy, i.e., 61.8%, was obtained by QGA. In channel 12, the highest accuracy, i.e., 65.2%, was obtained for the fourth subject by the PSO algorithm. The rest of the highest accuracies were obtained by our proposed algorithm in both channels for the different subjects.

4.4.2. Results of Second Model Scenario

In the following, we examine the results of the hybrid signals, i.e., combinations of filter banks (a) 2 and 5, presented in Table 9, (b) 2, 5, and 6, presented in Table 10, and (c) 2, 5, 6, and 9, presented in Table 11. Such combined signals cannot be recorded directly by any acquisition system; hence, we generate them offline. We have highlighted the highest obtained accuracies in these tables, similar to the two tables presented above. In these cases, our proposed algorithm mostly had the highest accuracies for channel 12. In addition, our algorithm demonstrated higher average accuracies in all of the cases. In Table 11, similar to Table 9, the majority of the best accuracies in channel 12 were obtained by our algorithm. According to the results presented in Table 7, Table 8, Table 9, Table 10 and Table 11, the algorithms in general perform better in channel 12 than in channel 8 (1–3% higher accuracy). Reducing the dimensions of the brain signals is effective in avoiding local minima.
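As an illustration of how such offline hybrid signals can be formed, the sketch below simply concatenates the band-pass filtered versions (filter banks) of one trial for one channel. The exact merging rule used in our implementation is not detailed in this section, so concatenation is an assumption made only for illustration.

```python
import numpy as np

def hybrid_signal(fb_signals, banks=(2, 5)):
    """Form an offline 'hybrid' signal by concatenating the band-pass
    filtered versions of one trial for the chosen filter banks.
    `fb_signals` maps a filter-bank index to its 1-D signal."""
    return np.concatenate([fb_signals[b] for b in banks])

# Example: filter-bank versions of one 875-sample trial (random placeholders)
rng = np.random.default_rng(2)
fb_signals = {b: rng.standard_normal(875) for b in (2, 5, 6, 9)}
combo_2_5 = hybrid_signal(fb_signals, banks=(2, 5))           # FB2 & 5
combo_2_5_6_9 = hybrid_signal(fb_signals, banks=(2, 5, 6, 9)) # FB2 & 5 & 6 & 9
```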

4.4.3. Results of Third Model Scenario

Table 12 presents the results related to the extraction and selection of features, including the time domain, frequency domain, autoregression, and wavelet coefficients, on a single channel (C3 = channel 8 or C4 = channel 12) by combining two filter banks, i.e., two and five (making a new signal with eight new frequency ranges). The best average accuracies were obtained by GA on both channels, i.e., 60.97% and 61.90% for channels 8 and 12, respectively. On channel 8, the average accuracies were calculated as 59.84%, 59.78%, and 59.75% for PSO, BGA, and QGA, respectively. On channel 12, the average accuracies were calculated as 61.24%, 61.14%, and 60.89% for PSO, QGA, and BGA, respectively. In channel 12, all other algorithms performed better than our proposed algorithm. In general, our proposed algorithm did not perform well in this model.

4.4.4. Results of Fourth and Fifth Model Scenarios

Table 13 and Table 14 present the results related to the extraction and selection of features, including time domain, frequency domain, autoregression, and wavelet coefficients, on two channels (C3 = 8 and C4 = 12) and three channels (C3 = 8 and CZ = 10 and C4 = 12), respectively. In both cases, two filter banks, i.e., two and five, were combined, making a new signal with eight new frequency ranges.
The GA obtained the best average accuracies in both models (M1 and M2), i.e., 66.16% and 65.45% for two channels and 66.59% and 66.57% for three channels.

4.4.5. Results of Sixth and Seventh Model Scenarios

Table 15 presents the accuracy of the two models with CSP (the sixth and seventh model scenarios defined earlier). In this case, a reduction to 100 features was performed, followed by the creation of new signals using filter banks 2, 5, and 6. As a result, matrices M1 and M2 are created as input to CSP based on two and three primary channels, respectively. Our proposed algorithm shows the best performance, i.e., 73.12% and 79.53% average accuracy for M1 and M2, respectively. Kappa was calculated as 46.24% and 59.05% for M1 and M2, respectively. The average accuracy for M2 was also relatively good for the other three algorithms, i.e., 75.07%, 73.99%, and 71.18% for PSO, GA, and QGA, respectively. In M1, however, the accuracy drops by 4 to 5% when using two channels and three filter banks.
Our proposed algorithm's average accuracy improves by 6.41% from M1 to M2, and its average kappa improves by 12.82%. In PSO, this improvement is 5.53% and 11.06% for the average accuracy and kappa, respectively. In GA, it is 4.7% and 9.43%, and in QGA, it is 5.22% and 10.45%, respectively. This shows that adding channels and filter banks has a significant impact on improving the accuracy.
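As a consistency check, for a balanced two-class task with a chance level of 0.5 (an assumption, since the chance level is not stated explicitly above), kappa reduces to a linear function of accuracy, which reproduces the reported accuracy/kappa pairs:

    kappa = (acc − 0.5) / (1 − 0.5) = 2·acc − 1,
    2(0.7312) − 1 = 0.4624 (M1),   2(0.7953) − 1 ≈ 0.5906 (M2).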

4.5. Results on Pooling for Sampling

4.5.1. Results of First Sampling Model

Figure 8 and Figure 9 show the results of selecting four different window sizes for the participants, all five functions, and all nine filter banks. The first four functions select one of the features to reduce the dimension, and the fifth function takes the average of all of the features in each window. Overall, the fifth function, i.e., the average function, has the best average accuracy, i.e., 73.17%, compared to the other functions over all filter banks and participants. The average accuracies of the remaining functions, i.e., the maximum–minimum automatic function, the no-maximum average function, the maximum function, and the minimum function, are 50.70%, 51.34%, 64.74%, and 63.72%, respectively.
We investigated three well-known functions, i.e., the maximum, minimum, and average functions, for the different participants. For example, for subject one, the accuracies obtained for the maximum, minimum, and average functions were 58.70%, 58.00%, and 71.90%, respectively, while for subject two, these values differed by about 10%. The accuracy of the first function, i.e., the maximum function, is close to 50% in most subjects. The accuracy of the second function, i.e., the minimum function, is in some cases 4 or 5% better than that of the first function. For the second function, the difference in accuracies between subjects reaches 13% and even 31%. This indicates the effectiveness of the maximum function.
The average accuracies of the average function over all subjects for the different filter banks are 61.45%, 54.30%, 62.30%, 73.17%, 62.53%, 59.27%, 62.09%, and 63.07%, respectively. It can therefore be concluded that filter bank 5 (FB5) carries the most information related to the imagination of the left and right hand. According to our experiment, the most important information lies within the 8–30 Hz frequency range, in which filter bank 5 stands out.
Figure 10, Figure 11, Figure 12 and Figure 13 show the results for two filter banks, i.e., 5 and 6, with all the functions and different sampling window sizes for the participants. In practice, we use five different sampling window sizes in the range of 3–7. The average accuracy over all subjects with varying window sizes for all functions and filter bank 5 is presented in Figure 10 and Figure 11.
The highest accuracies obtained for the first, second, third, fourth, and fifth functions on filter bank 5 were related to window sizes 3, 6, 4, 4, and 4, i.e., 56.79%, 57.43%, 64.74%, 63.72%, and 73.17%, respectively (Figure 11). Similarly, the results for filter bank 6 are presented in Figure 12 and Figure 13.

4.5.2. Results of Second and Third Sampling Models

Table 16 presents the results of two different approaches: first, all filter banks with the same function used in all layers, and second, all filter banks with randomly selected functions in each layer. Both cases are applied with different sampling window sizes for the participants. The best-obtained accuracies are highlighted in the table. According to this experiment, we conclude that the window size and the selected functions have a significant impact on the accuracy. The best accuracies in this scenario, i.e., using all filter banks, are in the range of 55–62%, while this range is between 60 and 73% when using only one filter bank, i.e., filter bank 5 (Figure 11). This demonstrates that effective features are selected in filter bank 5, contributing to better accuracies.
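The third sampling model's random, non-uniform choice of pooling function per layer can be sketched as follows. The candidate functions and the probability weights are illustrative assumptions, not the values used in this experiment.

```python
import numpy as np

rng = np.random.default_rng(3)

# Candidate pooling functions; the (non-uniform) selection weights are illustrative.
FUNCS = [np.max, np.min, np.mean]
WEIGHTS = [0.25, 0.25, 0.50]

def random_function_pooling(x, window=4, n_layers=3):
    """Third sampling model: at each layer, draw the pooling function at
    random with non-uniform probabilities and reduce the feature vector
    over non-overlapping windows with that function."""
    for _ in range(n_layers):
        if len(x) < window:
            break
        func = FUNCS[rng.choice(len(FUNCS), p=WEIGHTS)]
        n = (len(x) // window) * window
        x = func(x[:n].reshape(-1, window), axis=1)
    return x

# Example: reduce one channel of 875 features with window size 5
x = rng.standard_normal(875)
print(random_function_pooling(x, window=5).shape)
```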

5. Results and Discussion

5.1. Discussion of Algorithm on Brain Signals

Table 17 shows the best results of the four algorithms for different settings, i.e., different combinations of filter banks and channels. Our proposed algorithm (BGA) on filter bank 5 and channel 12 obtained the best accuracies in most subjects. The best average accuracy in this case is 63.2%. The second-best result is related to GA with filter bank 5 and channel 12 (i.e., 61.9%). Subsequently, BGA and PSO achieved their best results of 61.6% and 61.7%, respectively, when the combination of filter banks 2 and 5 is used. This indicates that the combination of filter banks 2 and 5 provides valuable information, but the results are still 1.5% lower than when only filter bank 5 is used.
In Table 18 and Figure 14, the accuracy results for GA, BGA, and PSO with two channels and the two proposed models (i.e., FM and CSP) are compared with the methods of some articles [33,65]. BGA with CSP obtained the best accuracies in most subjects, with the best average accuracy, i.e., 73.12%, which is about 5–6% higher than that of the three methods from the articles, while the results of GA and PSO are within the same range as those methods.
In Table 18, the statistical significance of the performance differences between RF & DFFS and the other methods of [33] on one side and our fourth model scenario (four feature extraction methods, i.e., time domain, frequency domain, autoregression, and wavelet coefficients, with two channels) and sixth model scenario (CSP with two channels) on the other is evaluated. A paired t-test is used to compute the p-values. A difference is considered statistically significant when the p-value is less than 0.05. BGA achieved p-values of 0.020 and 0.005 in the fourth and sixth model scenarios, respectively.
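A minimal sketch of the paired t-test used for these p-values is shown below; the per-subject accuracy values are placeholders for illustration only, not the numbers reported in Table 18.

```python
from scipy.stats import ttest_rel

# Per-subject accuracies of two methods over the nine subjects
# (placeholder values, not the paper's data).
method_a = [72.0, 68.0, 88.0, 69.0, 68.5, 70.0, 71.0, 70.5, 82.0]
method_b = [64.0, 62.0, 90.0, 62.0, 63.5, 66.0, 60.0, 63.0, 84.0]

# Paired t-test across subjects; p < 0.05 is treated as significant.
t_stat, p_value = ttest_rel(method_a, method_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```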
In Table 19 and Figure 15, the kappa results for GA, BGA, and PSO with three channels and the two proposed models (i.e., FM and CSP) are compared with the methods of some articles that use different channels [39]. BGA with CSP obtained the best kappa in most subjects, with an average kappa of 59.06%, which is only 0.94% less than the standard kappa level (i.e., 60%). This result is significantly higher (by about 18%) than that of the article [39] with three channels, while the difference is only about 0.5–1% compared to the results of [39] that use all channels or 8.55 channels.
In Table 19, the statistical significance of the performance differences between CSP with three channels (C3, C4, and CZ) and two other methods from [39] on one side and our fifth model scenario (four feature extraction methods, i.e., time domain, frequency domain, autoregression, and wavelet coefficients, with three channels) and seventh model scenario (CSP with three channels) on the other has been calculated. A difference is considered statistically significant when the p-value is less than 0.05. BGA with CSP in the seventh model scenario achieved p-values of 0.032, 0.003, and 0.003 against these three reference methods, respectively.
Table 20 and Figure 16 show the kappa results for GA, BGA, and PSO with three channels and the two proposed models (i.e., FM and CSP), compared with the methods of some articles that use different channels and different filter banks [39,48]. The best average kappa in this experiment is related to FBCSP in [48], which was 4.41% higher than that of BGA with CSP. In this scenario, the other methods in general had a 1.5–3% better kappa than BGA with CSP. This is because BGA uses only three channels with 60% of the frequency range, while the other studies use 22 channels with the entire frequency range. Considering this significant difference in channels and frequency range, BGA obtained only 1.5% less kappa than the smallest kappa among the other works.
Table 21 and Figure 17 show the kappa results of various implementation methods with CSP [66]. The kappa values are in the range of 50% to 63%. BGA with CSP, using three channels and 60% of the frequency range, obtained the second-best average kappa, i.e., 59%, whereas all of the other methods use all channels and the entire frequency range.
In Table 21, the statistical significance of the performance between CSP (C3, C4, CZ) [39], two of our model scenarios, and the other methods is evaluated. We used a paired t-test to compute the p-values. All of the methods have a p-value of less than 0.05 (p-value < 0.05).
Figure 18 shows the increase in accuracy over one training run for subjects 3 and 8 with the algorithms using the seventh model scenario (CSP with three channels (CSP2)). BGA shows the best convergence toward the optimal results.
In most studies involving filter banks, all filter banks are first passed through spatial noise-reduction filters, and features are then extracted from them. Next, the features of all filter banks are pooled together, and a subset of features is selected from them. In addition, when reducing the features per channel or selecting channels, a frequency interval of 8 to 30 Hz is usually considered. In this study, two or three filter banks with two or three important channels are used to reduce the noise and extract the most important information. We examined various algorithms on the brain signals and proposed a new algorithm based on the Bond Graph method, i.e., BGA. Our design goal for the new algorithm is to effectively reduce the feature dimension together with the effect of the specified filter banks and channels.
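For reference, a minimal sketch of extracting the filter banks used here (FB2 ≈ 8–12 Hz, FB5 ≈ 20–24 Hz, FB6 ≈ 24–28 Hz) from a 250 Hz trial is given below. The study reports a filter order of 100 (Table 5); the low-order Butterworth filter here is an illustrative assumption chosen only for brevity.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 250  # sampling rate of data set IIa (Hz)
BANDS = {'FB2': (8, 12), 'FB5': (20, 24), 'FB6': (24, 28)}

def filter_bank(trial, bands=BANDS, fs=FS, order=4):
    """Band-pass one EEG trial (channels x samples) into the chosen filter banks.
    A 4th-order Butterworth filter is used here for illustration only."""
    out = {}
    for name, (lo, hi) in bands.items():
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype='band')
        out[name] = filtfilt(b, a, trial, axis=-1)
    return out

# Example: a random 3-channel (C3, CZ, C4) trial of 875 samples
trial = np.random.default_rng(4).standard_normal((3, 875))
banks = filter_bank(trial)   # dict of 3 x 875 arrays, one per filter bank
```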
Most of these algorithms show convergence behavior close to one another. Although the accuracy values for some subjects were low in some cases, the algorithms still stayed far enough away from local minimum traps. In most cases, BGA showed the best performance compared to the other algorithms.

5.2. Discussion of Sampling (Pooling) on Brain Signals

In all of our study cases, sampling is the part of deep learning that reduces the dimension in the layers, and it is used during the deep learning phase. We apply sampling based on four main functions, i.e., the maximum, minimum, maximum–minimum automatic, and no-maximum average functions, to the data along with the filter banks to reduce the dimension of each channel. The number of channels is fixed, and the dimensional reduction of each channel depends on the window size. The number of output features per channel for each window size is as follows: window size 3 yields four features, window size 4 yields one feature, window size 5 yields seven features, window size 6 yields five features, and window size 7 yields three features.
Window size 4 yields the lowest number of features (i.e., one feature per channel), with a 99.88% dimension reduction; in our experiment, a total of 22 features out of 19,250 features are therefore selected for classification with window size 4. Window size 5 yields the highest number of features (i.e., seven features per channel), with a 99.2% dimension reduction; in this case, a total of 154 features out of 19,250 features are selected for classification.
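These reduction rates follow directly from the data set dimensions in Table 5 (22 channels × 875 samples per trial):

    22 × 875 = 19,250 features per trial,
    1 − 22/19,250 ≈ 99.89% reduction (window size 4, one feature per channel),
    1 − 154/19,250 = 99.2% reduction (window size 5, seven features per channel).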
In addition to the earlier functions, we also used an average function for the sampling in each channel. In the average function, only the average value of the features within each window size is selected. So, the number of output samples for the average function is 22 features.
The average accuracies of the functions mentioned above are in the range of 55% to 73%. This can be attributed to the excessive reduction of features in each channel. Therefore, in future work, more features should be retained to examine the full performance. Nevertheless, despite the very small number of selected features, accuracies of 61.45% to 64.74% are obtained when using a single filter bank, which is very good considering how few features were selected.
In the article on the basic quantum neural network method [65], the average accuracy over all individuals based on feature extraction is 66.59%, with all channels and the total data involved in the extraction; this is only 2.2% better than our sampling methods with a single filter bank and significant dimension reduction.
In Jing Luo's article [33], feature selection was performed after feature extraction from two channels, i.e., 8 and 12, using a wavelet packet. An average accuracy in the range of 67–68% was obtained using a random forest classifier in the frequency range of 8–30 Hz, which is only 3–4% better than our sampling methods with a single filter bank and significant dimension reduction.
In Adham Atyabi's article [36], a Mask model was used on 118 channels to reduce the electrodes and features. As a result, the feature reduction rate was 99%, which is close to that of our work (i.e., 99.88%). They obtained accuracies in the range of 63–86% for five subjects, with an average accuracy of 74%, which is only 1% more than our best average accuracy, although the dimensional reduction in our work was greater than in theirs.

5.3. Strength and Weakness of the Bond Graph Algorithm

Based on our experiments, the new BGA achieved very fast convergence compared to GA and PSO on the eight benchmarks when using models 3 and 4. This means that the formulation of these two domains is suitable for convergence, whereas the convergence of models 1 and 2 was worse than that of the others. In our experience, introducing a formula structure is very effective for improving convergence.
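Two of the eight benchmarks used for this convergence comparison, written in their standard forms, are shown below for reference; the 35-dimensional test point matches the benchmark dimension in Table 2.

```python
import numpy as np

def sphere(x):
    """Sphere benchmark: f(x) = sum(x_i^2), global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2)

def ackley(x):
    """Ackley benchmark (standard form), global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / n))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / n)
            + 20.0 + np.e)

# 35-dimensional point at the optimum: both values are (numerically) zero
x = np.zeros(35)
print(sphere(x), ackley(x))
```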
It is possible to tune the parameters to find the most suitable settings for convergence. However, this may not have a significant effect on performance, because the underlying structure remains unchanged.
We obtained the best results for feature selection on one or two channels with model 3 when the selected features are sent directly to the ELM classifier. In addition, we obtained the best results for feature selection on two or three channels with model 3 when the selected features are first sent to the CSP feature extraction and the extracted features are then sent to the ELM classifier. In both cases, the BGA converged to the global optimum significantly faster than the other algorithms.
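A minimal sketch of an Extreme Learning Machine classifier of the kind used for the final classification [61,62] is given below; the hidden-layer size, activation function, and random data are illustrative assumptions rather than the settings used in this study.

```python
import numpy as np

class ELM:
    """Minimal Extreme Learning Machine for binary classification:
    a random hidden layer followed by output weights solved by least squares."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # random hidden-layer outputs
        self.beta = np.linalg.pinv(H) @ y     # output weights (least squares)
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta > 0.5).astype(int)

# Example on random data: 100 trials x 100 selected features, labels 0/1 (left/right)
rng = np.random.default_rng(5)
X, y = rng.standard_normal((100, 100)), rng.integers(0, 2, 100)
train_acc = np.mean(ELM(n_hidden=50).fit(X, y).predict(X) == y)
```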
However, the BGA converges more slowly and may fall into a local optimum when the feature selection is done over four different parts, i.e., the time domain, frequency domain, autoregression, and wavelet coefficients. In this case, if some parts are selected based on the binary model and other parts based on the random mode, the BGA performs poorly compared to the other algorithms. However, if all of the parts are selected based on either the binary or the random model, the results are better than those of the other algorithms. So, to avoid this problem, two different models for feature selection should not be used simultaneously.

6. Conclusions

In this article, a new algorithm called BGA is introduced and implemented to reduce the feature dimensions of brain signals. This algorithm shows better performance than GA and PSO on general benchmark functions. Our algorithm also performs better than the other algorithms when reducing the dimension to 100 features on specific filter banks. Several scenarios for testing this algorithm were implemented, including the following. (1) Reductions of features, electrodes, and the frequency range were evaluated simultaneously for the brain signals. (2) Feature selection (with the algorithms) and feature extraction using the time domain, frequency domain, wavelet coefficients, and autoregression were studied on selected electrodes and filter banks. (3) The features, electrodes, and frequency range were reduced, followed by the construction of new signals based on the proposed formulas; then, CSP was used for feature extraction. (4) Finally, a separate experiment with the deep learning sampling method was implemented as feature selection in several layers, where the dimensional reduction was performed by sampling using three general functions and two new functions. All scenarios were evaluated on the left-hand and right-hand classes using between one and three channels. Our method improved the accuracy by 5 to 8% and the kappa by 5% compared to other studies with the same or similar settings.
For future works, first, we will investigate even smaller sizes and different combinations of filter banks to optimize the noise reduction and increase the distinctive patterns. Second, we aim to examine new functions of deep learning sampling models with the aim of increasing accuracy and performance.

Author Contributions

A.N. conceived the ideas. Z.F., G.A., and F.H. helped to complete ideas. A.N. performed the numerical simulations. Z.F. validated the results. G.A. and A.N. wrote the original draft. F.H. and A.N. edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China, grant number 61876147.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are publicly available at the link [Doi]. For analysis and implementation, please contact the corresponding author for further explanation; due to the Bond Graph methodology and the different models, additional details are needed for implementation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Krauledat, M.; Schröder, M.; Blankertz, B.; Müller, K.R. Reducing Calibration Time for Brain-Computer Interface: A Clustering Approach. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2007; Volume 19, pp. 753–760. [Google Scholar]
  2. Krauledat, M.; Tangermann, M.; Blankertz, B.; Müller, K.R. Towards zero training for brain-computer interfacing. PLoS ONE 2008, 3, e2967. [Google Scholar]
  3. Birbaumer, N.; Cohen, L.G. Brain-Computer interfaces: Communication and restoration of movement in paralysis. J. Physiol. 2007, 579, 621–636. [Google Scholar] [CrossRef] [PubMed]
  4. Varkuti, B.; Guan, C.; Pan, Y.; Phua, K.S.; Ang, K.K.; Kuah, C.W.K.; Chua, K.; Ang, B.T.; Birbaumer, N.; Sitaram, R. Resting state changes in functional connectivity correlate with movement recovery for BCI and robot-assisted upper-extremity training after stroke. Neurorehabil. Neural Repair 2013, 27, 53–62. [Google Scholar] [CrossRef]
  5. McCane, L.M.; Heckman, S.M.; McFarland, D.J.; Townsend, G.; Mak, J.N.; Sellers, E.W.; Zeitlin, D.; Tenteromano, L.M.; Wolpaw, J.R.; Vaughan, T.M. P300-Based brain–computer interface (BCI) event-related potentials (ERPS): People with amyotrophic lateral sclerosis (ALS) vs. age-matched controls. Clin. Neurophysiol. 2015, 126, 2124–2131. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Milovanovic, I.; Robinson, R.; Fetz, E.E.; Moritz, C.T. Simultaneous and independent control of a brain–computer interface and contralateral limb movement. Brain Comput. Interfaces 2015, 2, 174–185. [Google Scholar] [CrossRef] [Green Version]
  7. Yin, E.; Zhou, Z.; Jiang, J.; Chen, F.; Liu, Y.; Hu, D. A novel hybrid BCI speller based on the incorporation of SSVEP into the P300 paradigm. J. Neural Eng. 2013, 10, 026012. [Google Scholar] [CrossRef] [PubMed]
  8. Wang, M.; Daly, I.; Allison, B.Z.; Jin, J.; Zhang, Y.; Chen, L.; Wang, X. A new hybrid BCI paradigm based on P300 and SSVEP. J. Neurosci. Methods 2015, 244, 16–25. [Google Scholar] [CrossRef] [PubMed]
  9. Brumberg, J.S.; Burnison, J.D.; Pitt, K. Using Motor Imagery to Control Brain-Computer Interfaces for Communication. In International Conference on Augmented Cognition; Springer: Berlin/Heidelberg, Germany, 2016; pp. 14–25. [Google Scholar]
  10. Atyabi, A.; Luerssen, M.H.; FitzGibbon, S.P.; Powers, D.M.W. The Impact of PSO Based Dimension Reduction on EEG Classification. In International Conference on Brain Informatics; Springer: Berlin/Heidelberg, Germany, 2012; pp. 220–231. [Google Scholar] [CrossRef]
  11. Jin, J.; Wang, X.; Zhang, J. Optimal Selection of EEG Electrodes via DPSO Algorithm. In Proceedings of the 7th World Congress on Intelligent Control and Automation, Chongqing, China, 25–27 June 2008; pp. 5095–5099. [Google Scholar]
  12. Mistry, K.; Zhang, L.; Neoh, S.C.; Lim, C.P.; Fielding, B. A Micro-GA Embedded PSO Feature Selection Approach to Intelligent Facial Emotion Recognition. IEEE Trans. Cybern. 2016, 47, 1496–1509. [Google Scholar] [CrossRef] [Green Version]
  13. Al-Ani, A. Feature Subset Selection Using Ant Colony Optimization. 2005. Available online: https://opus.lib.uts.edu.au/handle/10453/6181 (accessed on 10 January 2020).
  14. Sabeti, M.; Boostani, R.; Katebi, S.D. A New Approach to Classify the Schizophrenic and Normal Subjects by Finding the Best Channels and Frequency Bands. In Proceedings of the 15th International Conference on Digital Signal Processing, Wales, UK, 1–4 July 2007; pp. 123–126. [Google Scholar] [CrossRef]
  15. Yom-Tov, E.; Inbar, G. Feature selection for the classification of movements from single movement-related potentials. IEEE Trans. Neural Syst. Rehabilitation Eng. 2002, 10, 170–177. [Google Scholar] [CrossRef] [PubMed]
  16. Dias, N.S.; Jacinto, L.R.; Mendes, P.M.; Correia, J.H. Feature Down Selection in Brain Computer Interface. In Proceedings of the 4th International IEEE EMBS Conference on Neural Engineering, Antalya, Turkey, 29 April–2 May 2009; pp. 323–326. [Google Scholar]
  17. Largo, R.; Munteanu, C.; Rosa, A. CAP event detection by wavelets and GA tuning. IEEE Int. Workshop Intell. Signal Process. 2005, 2005, 44–48. [Google Scholar]
  18. Hasan, B.A.S.; Gan, J.Q.; Zhang, Q. Multi-Objective Evolutionary Methods for channel selection in brain computer interface: Some Preliminary Experimental Results. In Proceedings of the 2010 IEEE Congress on Evolutionary Computation (CEC), Barcelona, Spain, 18–23 July 2010; pp. 1–6. [Google Scholar]
  19. Al Moubayed, N.; Hasan, B.A.S.; Gan, J.Q.; Petrovski, A.; McCall, J. Binary-SDMOPSO and its application in channel selection for Brain-Computer Interfaces. In Proceedings of the UK Workshop on Computational Intelligence (UKCI), Colchester, UK, 8–10 September 2010. [Google Scholar] [CrossRef]
  20. Atyabi, A.; Luerssen, M.; Fitzgibbon, S.; Powers, D.M.W. Evolutionary feature selection and electrode reduction for EEG classification. In Proceedings of the IEEE Congress on Computational Intelligence, Washington, DC, USA, 17–18 November 2012. [Google Scholar] [CrossRef]
  21. Atyabi, A.; Luerssen, M.; Fitzgibbon, S.; Powers, D.M.W. Adapting subject-independent task-specific EEG feature masks using PSO. In Proceedings of the IEEE Congress on Computational Intelligence, Washington, DC, USA, 17–18 November 2012. [Google Scholar] [CrossRef] [Green Version]
  22. Atyabi, A.; Luerssen, M.; Fitzgibbon, S.P.; Powers, D.M.W. Dimension reduction in EEG data using Particle Swarm Optimization. In Proceedings of the IEEE World Congress on Computational Intelligence (WCCI), Brisbane, Australia, 10–15 June 2012. [Google Scholar] [CrossRef]
  23. Qi, Y.; Ding, F.; Xu, F.; Yang, J. Channel and Feature Selection for a Motor Imagery-Based BCI System Using Multilevel Particle Swarm Optimization. Comput. Intell. Neurosci. 2020, 2020, 1–11. [Google Scholar] [CrossRef]
  24. Yin, Z.; Liu, L.; Chen, J.; Zhao, B.; Wang, Y. Locally robust EEG feature selection for individual-independent emotion recognition. Expert Syst. Appl. 2020, 162, 113768. [Google Scholar] [CrossRef]
  25. Islam, R.; Tanaka, T.; Molla, K.I. Multiband tangent space mapping and feature selection for classification of EEG during motor imagery. J. Neural Eng. 2018, 15, 046021. [Google Scholar] [CrossRef] [PubMed]
  26. Baig, M.Z.; Javed, E.; Ayaz, Y.; Afzal, W.; Gillani, S.O.; Naveed, M.; Jamil, M. Classification of left/right hand movement from EEG signal by intelligent algorithms. In Proceedings of the 2014 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), Penang, Malaysia, 7–8 April 2014. [Google Scholar] [CrossRef]
  27. Dupres, A.; Cabestaing, F.; Rouillard, J. Supervision of time-frequency features selection in EEG signals by a human expert for brain-computer interfacing based on motor imagery. In Proceedings of the 2016 IEEE Systems, Man and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016. [Google Scholar] [CrossRef] [Green Version]
  28. Phinyomark, A.; Phukpattaranont, P.; Limsakul, C. Feature reduction and selection for EMG signal classification. Expert Syst. Appl. 2012, 39, 7420–7431. [Google Scholar] [CrossRef]
  29. Gan, J.Q. Feature dimensionality reduction by manifold in brain-computer interface design. In Proceedings of the 3rd International Workshop on Brain-Computer Interface, Colchester, UK, 21–24 September 2006; pp. 28–29. [Google Scholar]
  30. Jun, L.; Meichun, L. Common spatial pattern and particle swarm optimization for channel selection in BCI. In Proceedings of the 3rd International Conference on Innovative Computing Information and Control, Dalian, China, 18–20 June 2008; p. 457. [Google Scholar]
  31. Jain, A.; Zongker, D. Feature selection: Evaluation, application, and small sample performance. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 153–158. [Google Scholar] [CrossRef] [Green Version]
  32. Liu, H.; Yu, L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502. [Google Scholar] [CrossRef] [Green Version]
  33. Luo, J.; Feng, Z.R.; Zhang, J.; Lu, N. Dynamic frequency feature selection based approach for classification of motor imageries. Comput. Biol. Med. 2016, 75, 45–53. [Google Scholar] [CrossRef]
  34. Liu, A.; Chen, K.; Liu, Q.; Ai, Q.; Xie, Y.; Chen, A. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata. Sensors 2017, 17, 2576. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Hassan, B.A.S.; Gan, J.Q. Multi-Objective particle Swarm Optimization for channel selection in Brain-Computer Interface. In Proceedings of the UK Workshop on Computational Intelligence (UKCI 2009), Nottingham, UK, 7–9 September 2009. [Google Scholar]
  36. Atyabi, A.; Luerssen, M.H.; Powers, D.M. PSO-based dimension reduction of EEG recordings: Implications for subject transfer in BCI. Neurocomputing 2013, 119, 319–331. [Google Scholar] [CrossRef]
  37. Nakisa, B.; Rastgoo, M.N.; Tjondronegoro, D.; Chandran, V. Evolutionary computation algorithms for feature selection of EEG-based emotion recognition using mobile sensors. Expert Syst. Appl. 2018, 93, 143–155. [Google Scholar] [CrossRef] [Green Version]
  38. Peterson, D.A.; Knight, J.N.; Kirby, M.J.; Anderson, C.W.; Thaut, M.H. Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface. EURASIP J. Adv. Signal Process. 2005, 2005, 218613. [Google Scholar] [CrossRef] [Green Version]
  39. Arvandeh, M.; Guan, C.; Ang, K.; Quek, C. Optimizing the Channel Selection and Classification Accuracy in EEG-Based BCI. IEEE Trans. Biomed. Eng. 2011, 58, 1865–1873. [Google Scholar] [CrossRef] [PubMed]
  40. Yang, Y.; Kyrgyzov, O.; Wiart, J.; Bloch, I. Subject-Specific Channel Selection for Classification of Motor Imagery Electroencephalographic Data. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), Vancouver, BC, Canada, 26–31 May 2013; pp. 1277–1280. [Google Scholar]
  41. Chin, Z.Y.; Ang, K.K.; Wang, C.; Guan, C.; Zhang, H. Multi-Class filter bank common spatial pattern for four-class motor imagery BCI. In Proceedings of the 31st Annual International Conference of IEEE EMBS, Minneapolis, MN, USA, 2–6 September 2009; pp. 571–574. [Google Scholar] [CrossRef]
  42. Rejer, I. EEG Feature Selection for BCI Based on Motor Imaginary Task. Found. Comput. Decis. Sci. 2012, 37, 283–292. [Google Scholar] [CrossRef] [Green Version]
  43. Eslahia, S.V.; Dabanloob, N.J.; Maghooli, K. A GA-based feature selection of the EEG signals by classification evaluation, Application in BCI system. arXiv 2019, arXiv:1903.02081v1. [Google Scholar]
  44. Wang, J.; Xue, F.; Li, H. Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso. BioMed Res. Int. 2015, 2015, 1–13. [Google Scholar] [CrossRef] [Green Version]
  45. Lokman, M.; Dabag, A.; Ozkurt, N.; Najeeb, S.M.M. Feature Selection and Classification of EEG Finger Movement Based on Genetic Algorithm. In Proceedings of the 2018 Innovations in Intelligent Systems and Applications Conference (ASYU), Adana, Turkey, 4–6 October 2018. [Google Scholar]
  46. Amarasinghe, K.; Sivils, P.; Manic, M. EEG Feature Selection for Thought Driven Robots using Evolutionary Algorithms. In Proceedings of the IEEE 9th International Conference on Human System Interactions (HSI), Portsmouth, UK, 7 July 2016. [Google Scholar]
  47. Chin, Z.Y.; Ang, K.K.; Wang, C.; Guan, C. Discriminative Channel Addition and Filter Bank Common Spatial Pattern in Motor imagery BCI. In Proceedings of the 36th Annual IEEE International Conference of Engineering in Medicine and Biology Society (EMBC), Chicago, IL, USA, 26–30 August 2014; pp. 1310–1313. [Google Scholar]
  48. Ang, K.K.; Chin, Z.Y.; Zhang, H.; Guan, C. Filter Bank Common Spatial Pattern (FBCSP) algorithm using online adaptive and semi-supervised learning. In Proceedings of the International Joint Conference on Neural Networks 2011, San Jose, CA, USA, 31 July–5 August 2011; pp. 392–396. [Google Scholar] [CrossRef]
  49. Wang, J.; Feng, Z.R.; Lu, N. Feature extraction by Common Spatial Pattern in Frequency Domain for Motor Imagery Tasks Classification. In Proceedings of the 29th Chinese Control and Decision Conference (CCDC), Chongqing, China, 28–30 May 2017. Electronic ISSN: 1948-9447. [Google Scholar]
  50. Tang, Z.; Li, C.; Sun, S. Single-Trial EEG classification of motor imagery using deep convolutional neural networks. Optik 2017, 130, 11–18. [Google Scholar] [CrossRef]
  51. Broenink, J.F. Bond Graph modelling in Modelica. In Proceedings of the 9th European Simulation Symposium, Passau, Germany, 19–22 October 1997; pp. 137–141. [Google Scholar]
  52. Broenink, J.F. Object-Oriented modelling with bond graph and modelica. In Proceedings of the International Conference on Bond Graph Modelling ICBGM’99, San Francisco, CA, USA, 1 January 1999; Volume 31, pp. 163–168. [Google Scholar]
  53. Broenink, J.F. Introduction to Physical Systems Modelling with Bond Graphs. SiE Whitebook Simul. Methodol. 1999, 31, 2. [Google Scholar]
  54. Zeiler, M.D.; Fergus, R. Stochastic Pooling for Regularization of Deep Convolutional Neural Network. arXiv 2013, arXiv:1301.3557v1. [Google Scholar]
  55. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [Green Version]
  56. Garrett, D.; Peterson, D.A.; Anderson, C.W.; Thaut, M.H. Comparison of Linear, Nonlinear, and feature Selection Methods for EEG Signal Classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2003, 11, 2. [Google Scholar] [CrossRef] [PubMed]
  57. Eberhart, R.; Kennedy, J. A New Optimizer Using Particle Swarm Theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, MHS’95, Nagoya, Japan, 4–6 October 1995. [Google Scholar]
  58. Kennedy, J.; Eberhart, R. A Discrete Binary Version of The Particle Swarm Algorithm. In Proceedings of the 1997 IEEE International Conference on Systems, Man and Cybernetics, Orlando, FL, USA, 12–15 October 1997. [Google Scholar]
  59. Zhang, H.; Zhu, Y.; Chen, H. Root growth model: A novel approach to numerical function optimization and simulation of plant root system. Soft Comput. 2013, 18, 521–537. [Google Scholar] [CrossRef]
  60. Naeem, M.; Brunner, C.; Leeb, R.; Graimann, B.; Pfurtscheller, G. Separability of four-class motor imagery data using independent components analysis. J. Neural Eng. 2006, 3, 208–216. [Google Scholar] [CrossRef] [Green Version]
  61. Huang, G.-B.; Zhou, H.; Ding, X.; Zhang, R. Extreme Learning Machine for Regression and Multiclass Classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2011, 42, 513–529. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  62. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  63. Narayanan, A.; Moore, M. Quantum-Inspired genetic algorithms. In Proceedings of the IEEE International Conference on Evolutionary Computation, Nagoya, Japan, 20–22 May 1996. [Google Scholar] [CrossRef]
  64. Han, K.-H.; Kim, J.-H. Genetic quantum algorithm and its application to combinatorial optimization problem. In Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No.00TH8512), La Jolla, CA, USA, 16–19 July 2000. [Google Scholar]
  65. Gandhi, V.; Prasad, G.; Coyle, D.; Behera, L.; McGinnity, T.M. Quantum Neural Network-Based EEG Filtering for a Brain–Computer Interface. IEEE Trans. Neural Networks Learn. Syst. 2013, 25, 278–288. [Google Scholar] [CrossRef] [PubMed]
  66. Lotte, F.; Guan, C. Regularizing Common Spatial Patterns to Improve BCI Designs: Unified Theory and New Algorithms. IEEE Trans. Biomed. Eng. 2010, 58, 355–362. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. The structure of a mechanical model (left) and the corresponding Bond Graph model (right).
Figure 2. Overview of the Bond Graph algorithm.
Figure 3. The overview of the dimension reduction using the BGA algorithm and proposed methods for feature selection and feature extraction.
Figure 4. The example of random sampling for deep learning. (a) Image. (b) Filter. (c) Rectified linear. (d) Activation. (e) Probabilities. (f) Sampled Activation.
Figure 5. The steps of sampling in different layers using functions.
Figure 6. Reduction of channel dimensions based on the deep learning sampling model on a single filter bank or the entire filter banks.
Figure 7. (Top left) Convergence of GA, PSO, and BGA (third model) on the sphere function during 400 steps. (Top right) Convergence of GA, PSO, and BGA (third model) on the Ackley function during 400 steps. (Bottom left) Convergence of all BGA models (four models (domains)) on the sphere function during 400 steps. (Bottom right) Convergence of all BGA models (four models) on the Ackley function during 400 steps.
Figure 8. Reduction data based on pooling model on some subjects (1, 2, 4, 5, 7, 8) and filter banks with window size 4 and RF for 10 iterations.
Figure 9. Reduction data based on pooling model on average subjects with filter banks and window size 4 and RF for 10 iterations.
Figure 10. Reduction data based on pooling model on some subjects (1, 3, 4, 6, 7, 9) and filter banks with different window sizes for FB 5 and RF for 10 iterations.
Figure 11. Reduction data based on pooling model on average subjects with filter banks with different window sizes for FB 5 and RF for 10 iterations.
Figure 12. Reduction data based on pooling model on some subjects (2, 3, 5, 6, 8, 9) and filter banks with different window sizes for FB 6 and RF for 10 iterations.
Figure 13. Reduction data based on pooling model on average subjects with filter banks with different window sizes for FB 6 and RF for 10 iterations.
Figure 14. The best results of different methods for feature selection (along with feature extraction, feature reduction, channels, etc.) on two channels, 8 and 12, by combining different select bands.
Figure 15. Examining the best kappa of different methods for selecting features (along with feature extraction, reducing features and channels, etc.) on three channels, 8, 10, and 12, by combining different selected bands.
Figure 16. Examining the best kappa of different methods for selecting features (along with feature extraction, reducing features and channels, etc.) on three or more channels by combining different selected bands.
Figure 17. Checking the best kappa results of different methods compared with the proposed method.
Figure 18. Converging of subjects 3 and 8 with algorithms on the seventh model scenario (CSP with three channels (PSO-CSP2, GA-CSP2, BGA-CSP2, and QGA-CSP2)).
Table 1. Define flow and effect variables in different domains.
Energy Domain |  | f(t) | e(t)
Generalized | Name | Generalized flow | Generalized effort
 | Symbol | f(t) | e(t)
Linear mechanical | Name | Velocity | Force
 | Symbol | v(t) | F(t)
Electromagnetic | Name | Current | Voltage
 | Symbol | i(t) | V(t)
Hydraulic pneumatic | Name | Volume flow rate | Pressure
 | Symbol | φ(t) | P(t)
Table 2. Parameters of the benchmark function.
Function | Dimension | Initial Range | Minimum
Sphere | 35 | [−100, 100]^D | 0
SumSquares | 35 | [−10, 10]^D | 0
Rosenbrock | 35 | [−30, 30]^D | 0
Schwefel 2.22 | 35 | [−10, 10]^D | 0
Rastrigin | 35 | [−10, 10]^D | 0
Schwefel | 35 | [−500, 500]^D | −12,569.50
Ackley | 35 | [−32.768, 32.768]^D | 0
Griewank | 35 | [−600, 600]^D | 0
Table 3. Parameters of the algorithm setting.
Function | Population | Details
GA | 40 | Single point crossover (0.8), mutation rate (0.01), generation gap (0.9)
PSO | 40 | Inertia weight (0.6), cognitive and social components (1.8)
BGA | 40 | Selection model 1 to 4, an average of coefficients (1), initial element (2.5%)
Table 4. Comparison of 35-dimensional results of GA and PSO and all of the models of the BGA.
Public FunctionParametersPSOGABGA_Model1BGA_Model2BGA_Model3BGA_Model4
SphereMean4.72 × 10−243.414.20 × 10+41.60 × 10+45.40 × 10−1101.56 × 10−90
Std5.64 × 10−213.745.44 × 10+31.11 × 10+42.00 × 10−1097.43 × 10−90
Min2.45 × 10−322.872.43 × 10+44.28 × 10+38.10 × 10−1201.40 × 10−102
Max2.24 × 10−176.584.89 × 10+44.82 × 10+41.00 × 10−1084.14 × 10−89
SumSquaresMean6.89 × 10−37.158.53 × 10+33.04 × 10+33.80 × 10−1113.92 × 10−90
Std9.46 × 10−32.021.40 × 10+31.35 × 10+31.70 × 10−1102.17 × 10−89
Min5.11 × 10−43.095.06 × 10+38.64 × 10+21.10 × 10−1232.50 × 10−100
Max4.39 × 10−211.201.07 × 10+45.86 × 10+39.30 × 10−1101.21 × 10−88
RosenbrockMean2.48 × 10+31.78 × 10+31.22 × 10+82.02 × 10+733.9133.93
Std1.07 × 10+41.07 × 10+33.21 × 10+72.17 × 10+70.050.06
Min58.938.28 × 10+24.82 × 10+71.16 × 10+733.7733.75
Max5.89 × 10+45.40 × 10+31.71 × 10+81.11 × 10+833.9733.98
Schwefel 2.22Mean0.192.043.71 × 10+114.21 × 10+33.80 × 10−567.44 × 10−46
Std0.350.281.33 × 10+122.29 × 10+41.40 × 10−551.59 × 10−45
Min0.011.581.33 × 10+234.775.31 × 10−619.97 × 10−52
Max1.602.595.91 × 10+121.27 × 10+57.65 × 10−555.47 × 10−45
RastriginMean1.06 × 10+220.528.36 × 10+24.67 × 10+20.000.00
Std18.952.871.02 × 10+21.00 × 10+20.000.00
Min59.3414.046.12 × 10+22.90 × 10+20.000.00
Max1.42 × 10+226.099.83 × 10+26.59 × 10+20.000.00
SchwefelMean−1.00 × 10+50−2.54 × 10+3−7.84 × 10+3−7.99 × 10+3−8.06 × 10+3−7.90 × 10+3
Std1.95 × 10+504.84 × 10+22.83 × 10+22.31 × 10+24.85 × 10+25.41 × 10+2
Min−7.40 × 10+50−3.82 × 10+3−8.65 × 10+3−8.44 × 10+3−9.30 × 10+3−1.01 × 10+4
Max−5.80 × 10+50−1.78 × 10+3−7.32 × 10+3−7.62 × 10+3−7.22 × 10+3−7.07 × 10+3
AckleyMean1.492.5319.3613.97−8.90 × 10−16−8.90 × 10−16
Std0.750.330.542.810.000.00
Min0.051.8118.257.93−8.90 × 10−16−8.90 × 10−16
Max3.713.0720.0118.15−8.90 × 10−16−8.90 × 10−16
GriewankMean0.101.363.64 × 10+21.49 × 10+20.000.00
Std0.160.0952.3870.580.000.00
Min0.011.202.91 × 10+232.940.000.00
Max0.831.544.68 × 10+22.81 × 10+20.000.00
Table 5. The details of data set IIa from BCI competition IV.
Function | Dimension | Initial Range
Volunteers (subjects) | 9 | 9
Electrodes (channels) | 22 | 22
Session–Trials | 1–228 | 1–228
Imagination Time–Sample Rate | 4 s–250 | After 0.5 s, 3.5 s–250
Each Trial Features | 1000 | 875
Classes | Left–right hand, foot, and tongue | Left–right hand
Filtering | - | 9 filter banks
Order of filtering | - | 100
Table 6. Selection features of all features based on methods and channels (C3 = 8, CZ = 10, and C4 = 12).
Channels/Type FeaturesTime and Frequency FormulasAutoregressionCoefficients WaveletFast Fourier Transformation
Selected FeaturesAll FeaturesSelected FeaturesAll FeaturesSelected FeaturesAll FeaturesSelected FeaturesAll Features
Channel C38/12206102020030200
Channel CZ10/10206102020030200
Channel C412/15206102020030200
Table 7. Feature selection accuracy of PSO, GA, BGA, and QGA with 100 features, FB2, and ELM for 100 iterations.
SubjectPSOGABGAQGA
Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12
160.359.162.259.060.060.559.159.8
260.958.960.360.361.760.358.859.5
360.561.661.362.461.963.059.861.7
465.564.563.764.364.265.063.263.8
560.162.358.962.860.862.458.362.0
663.060.161.759.661.960.161.560.4
761.164.461.465.060.866.660.565.0
859.159.158.660.160.360.458.859.7
959.861.660.862.060.162.560.261.1
Ave61.161.361.061.761.362.360.061.4
Table 8. Feature selection accuracy of PSO, GA, and BGA with 100 features, FB5, and ELM for 100 iterations.
SubjectPSOGABGAQGA
Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12
160.558.761.659.460.661.160.458.1
260.759.559.960.060.960.158.359.7
360.362.562.061.860.963.360.561.7
465.065.262.564.963.864.460.263.3
559.661.959.563.560.863.958.962.3
662.861.063.259.462.062.862.460.0
760.764.661.866.062.267.460.265.4
858.558.859.359.560.561.158.458.0
959.461.660.762.461.164.661.861.3
Ave60.861.561.261.961.463.260.161.1
Table 9. Feature selection accuracy of PSO, GA, and BGA with 100 features, FB2 & 5, and ELM for 100 iterations.
SubjectPSOGABGAQGA
FB2 & 5FB2 & 5FB2 & 5FB2 & 5
Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12
161.260.462.5262.0961.760.461.961.4
262.659.859.5359.4461.160.258.358.6
360.862.461.1764.5162.064.061.462.8
466.565.062.7661.4565.165.162.260.7
560.962.959.7762.4062.363.259.161.2
661.161.362.2760.0459.961.661.560.1
760.262.760.6863.8460.766.260.963.8
857.458.859.460.7860.762.456.858.7
959.561.860.6361.9660.863.861.061.5
Ave61.161.761.0061.8361.663.060.361.0
Table 10. Feature selection accuracy of PSO, GA, and BGA with 100 features, FB2 & 5 & 6, and ELM for 100 iterations.
SubjectPSOGABGAQGA
FB 2 & 5 & 6FB 2 & 5 & 6FB 2 & 5 & 6FB 2 & 5 & 6
Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12
160.760.962.4263.3261.160.962.261.5
261.158.960.3159.6361.160.559.260.0
360.661.760.8662.2561.563.559.961.4
465.564.660.2762.4865.264.760.060.8
561.562.259.5263.0361.462.558.061.8
662.060.662.7759.1260.862.561.959.9
760.664.361.9065.6660.066.260.065.3
859.060.059.5659.6461.861.259.158.5
959.661.160.4362.3161.163.060.261.3
Ave61.261.660.8961.8661.662.860.161.2
Table 11. Feature selection accuracy of PSO, GA, and BGA with 100 features, FB2 & 5 & 6 & 9, and ELM for 100 iterations.
SubjectPSOGABGAQGA
FB 2 & 5 & 6 & 9FB 2 & 5 & 6 & 9FB 2 & 5 & 6 & 9FB 2 & 5 & 6 & 9
Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12
160.060.659.7162.2560.661.058.359.9
258.760.359.6060.7562.760.858.659.4
360.361.661.1862.0761.163.061.061.1
461.362.359.7661.1063.962.858.560.2
559.963.161.5563.3261.164.160.161.8
663.560.362.3459.7662.064.762.259.7
761.365.861.7865.8060.366.760.766.2
860.760.260.3661.9962.661.959.558.5
960.561.359.5661.361.463.258.961.4
Ave60.761.760.6561.461.563.159.860.9
Table 12. The results of algorithms with four methods (TD, FD, wavelet, and autoregression) on a single channel (C3 = 8 or C4 = 12) with FB 2 & 5 (25).
SubjectPSOGABGAQGA
Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12Channel 8Channel 12
162.7859.1663.1859.9862.0058.7962.4359.13
257.2460.2156.6061.0458.3059.5756.3859.32
360.5263.1362.2163.8161.8062.1061.0564.75
459.1860.0059.7660.8060.0059.8459.0560.03
559.5961.4860.0362.1057.7360.3857.7759.20
657.5757.9859.7558.3856.6557.8957.7057.94
756.9761.4656.8862.3456.9460.0156.6559.68
860.7159.9561.1262.0960.1259.9060.2960.66
964.0367.7769.1766.5664.4269.5166.5069.61
Ave59.8461.2460.9761.9059.7860.8959.7561.14
Table 13. The results of algorithms with four methods (TD, FD, wavelet, and autoregression) on two channels (C3 = 8 and C4 = 12) with FB 2 & 5 (25).
SubjectPSOGABGAQGA
M1M2M1M2M1M2M1M2
163.1763.7264.6265.4962.1762.8862.0362.83
260.4260.6761.6861.5659.2060.6959.4759.73
384.6484.6685.1085.1084.0583.7984.6184.31
461.4460.7362.4061.3360.3060.2360.1260.52
561.3160.1863.4459.9860.6160.0560.2559.79
662.4961.8562.2562.3461.2561.4861.1262.20
765.1459.1767.4863.1561.1158.4162.6559.60
861.5662.9961.7962.5060.4762.4862.0062.22
968.9468.3866.6867.6068.2166.2070.8266.39
Ave65.4664.7166.1665.4564.1564.0264.7864.18
Table 14. The results of algorithms with four methods (TD, FD, wavelet, and autoregression) on three channels (C3 = 8 and CZ = 10 and C4 = 12) with FB 2 & 5 (25).
SubjectPSOGABGAQGA
M1M2M1M2M1M2M1M2
163.1763.3663.0363.8862.1962.8962.0662.50
262.3061.5061.6561.1561.0959.8160.0860.02
384.3485.2584.8885.0382.5383.1182.8583.06
463.2063.0564.5263.8362.0961.6862.2661.66
559.3159.9259.4259.8559.9058.9660.5659.20
661.8162.3261.7261.9361.2360.9160.8860.76
761.1361.4061.8862.0660.7560.9661.4461.62
865.4164.3965.9165.3465.5563.9366.0663.66
976.0476.7576.3176.0674.2375.2275.1675.04
Ave66.3066.4466.5966.5765.5165.2765.7165.28
Table 15. The accuracy results of algorithms on dimension reduction and features extraction of CSP new model input matrix (using formulas) on two and three channels (C3 = 8 and CZ = 10 and C4 = 12).
SubjectPSOGABGAQGA
M1M2M1M2M1M2M1M2
168.9475.2070.8874.1370.3278.1667.3971.14
267.6772.0368.372.3468.5075.0264.4068.12
381.6681.9671.1775.3789.5090.8567.3272.45
466.18773.1869.7374.8168.4175.2166.4570.92
565.2771.1268.8070.5168.2077.2365.4068.64
665.3674.6167.8373.5869.9077.9764.5270.74
768.9372.7070.7274.5470.7477.0968.3272.17
864.9174.5966.5773.2070.3279.2262.4370.90
976.9880.2869.5177.4782.1885.0267.3875.54
Ave69.5475.0769.2973.9973.1279.5365.9671.18
Table 16. Reduction data based on pooling model on filter banks with different window sizes for all filter banks with the same function in layers or randomly FB 5 and all of FBs and RF for 10 iterations.
Subject FB-SameFB-Hybrid
AvgSize/FunctionF1F2F3F4F5R1_HFR2_HFR3_HFR4_HFR5_HF
Size 30.5450.5410.590.5950.590.5570.5450.5540.5590.554
Size 40.5230.560.5920.5890.6230.5350.5160.5220.5240.534
Size 50.5060.5230.5690.5770.590.5260.5180.5110.5120.511
Size 60.5370.5870.5880.6040.6170.5710.5550.550.5420.578
Size 70.5290.5670.5870.5810.6040.5750.5480.5470.5450.571
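As summarized in Table 16, the pooling model reduces the filter-bank data layer by layer with a chosen window size, using either the same pooling function in every layer (FB-Same) or randomly mixed functions (FB-Hybrid). The snippet below is only an illustrative sketch of that idea under stated assumptions: the function set (max, mean, min), the two-layer depth, and the synthetic 750-sample trial are not taken from the paper.

```python
import numpy as np

def pool_1d(signal, window_size, func=np.mean):
    """Reduce a 1-D feature vector by applying `func` to
    non-overlapping windows of length `window_size`."""
    n_windows = len(signal) // window_size
    trimmed = signal[:n_windows * window_size]
    return func(trimmed.reshape(n_windows, window_size), axis=1)

rng = np.random.default_rng(0)
trial = rng.standard_normal(750)              # hypothetical trial from one filter bank

# "FB-Same"-style: the same function (mean) in both pooling layers, window size 5.
layer1 = pool_1d(trial, window_size=5, func=np.mean)
layer2 = pool_1d(layer1, window_size=5, func=np.mean)

# "FB-Hybrid"-style: a randomly chosen function per layer.
functions = [np.max, np.mean, np.min]
hybrid = trial
for idx in rng.choice(len(functions), size=2):
    hybrid = pool_1d(hybrid, window_size=5, func=functions[idx])

print(trial.shape, layer1.shape, layer2.shape, hybrid.shape)  # (750,) (150,) (30,) (30,)
```

The window size controls how aggressively the data are reduced; with window size 5, a 750-sample trial shrinks to 150 values after one layer and 30 after two.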
Table 17. The best results of different methods for feature selection (with feature extraction, feature reduction, channels, etc.) on a single channel (8 or 12) by combining different selected bands.
Subject | PSO (FB2 & 5 & 6, Ch 8) | PSO (FB2 & 5, Ch 12) | GA (FB5, Ch 8) | GA (FB5, Ch 12) | BGA (FB2 & 5, Ch 8) | BGA (FB5, Ch 12) | QGA (FB2 & 5, Ch 8) | QGA (FB2, Ch 12)
1 | 60.7 | 60.4 | 61.6 | 59.4 | 61.7 | 61.1 | 61.9 | 59.8
2 | 61.1 | 59.8 | 59.9 | 60.0 | 61.1 | 60.1 | 58.3 | 59.5
3 | 60.6 | 62.4 | 62.0 | 61.8 | 62.0 | 63.3 | 61.4 | 61.7
4 | 65.5 | 65.0 | 62.5 | 64.9 | 65.1 | 64.4 | 62.2 | 63.8
5 | 61.5 | 62.9 | 59.5 | 63.5 | 62.3 | 63.9 | 59.1 | 62.0
6 | 62.0 | 61.3 | 63.2 | 59.4 | 59.9 | 62.8 | 61.5 | 60.4
7 | 60.6 | 62.7 | 61.8 | 66.0 | 60.7 | 67.4 | 60.9 | 65.0
8 | 59.0 | 58.8 | 59.3 | 59.5 | 60.7 | 61.1 | 56.8 | 59.7
9 | 59.6 | 61.8 | 60.7 | 62.4 | 60.8 | 64.6 | 61.0 | 61.1
Ave | 61.2 | 61.7 | 61.2 | 61.9 | 61.6 | 63.2 | 60.3 | 61.4
Std | 1.75 | 1.75 | 1.3 | 2.37 | 1.43 | 2.11 | 1.74 | 1.8
The table compares the first and second model scenarios using the algorithms. FB I & J means that the new signal is the combination of filter banks I and J. Ch 8 and Ch 12 denote Channels 8 and 12, respectively.
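As a concrete illustration of the "FB I & J" construction described above, the sketch below band-pass filters one channel into two filter banks and combines them into a new signal. The 4 Hz band edges, the Butterworth filter, and the simple summation rule are assumptions made for illustration only; the paper's own combination formulas may differ.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 250  # sampling rate (Hz) of BCI Competition IV dataset IIa

# Hypothetical 4 Hz-wide filter banks; the paper's exact band edges may differ.
FILTER_BANKS = {2: (8, 12), 5: (20, 24), 6: (24, 28)}

def bandpass(x, low, high, fs=FS, order=4):
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def combine_filter_banks(x, bank_ids):
    """Build a new signal from channel `x` by summing the selected
    band-pass-filtered versions, e.g. bank_ids=(2, 5) for 'FB 2 & 5'."""
    return sum(bandpass(x, *FILTER_BANKS[i]) for i in bank_ids)

# Example: one 3 s trial of channel C3 (electrode 8) made of random data.
c3 = np.random.default_rng(1).standard_normal(3 * FS)
new_signal = combine_filter_banks(c3, (2, 5))
```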
Table 18. The best results of different methods for feature selection (along with feature extraction, feature reduction, channels, etc.) on two channels (8 and 12) by combining different selected bands.
Subject | PSO FM2 | GA FM2 | BGA FM2 | PSO CSP1 | GA CSP1 | BGA CSP1 | RF & DFFS | RF, 8–30 Hz | RQNN
1 | 63.17 | 64.62 | 62.17 | 68.94 | 70.88 | 70.32 | 63.69 | 66.52 | 61.11
2 | 60.42 | 61.68 | 59.20 | 67.67 | 68.30 | 68.50 | 61.97 | 56.62 | 61.11
3 | 84.64 | 85.10 | 84.05 | 81.66 | 71.17 | 89.50 | 91.09 | 90.36 | 79.17
4 | 61.44 | 62.40 | 60.30 | 66.18 | 69.73 | 68.41 | 61.72 | 59.83 | 60.42
5 | 61.31 | 63.44 | 60.61 | 65.27 | 68.80 | 68.20 | 63.41 | 57.63 | 71.53
6 | 62.49 | 62.25 | 61.25 | 65.36 | 67.83 | 69.90 | 66.11 | 65.19 | 61.11
7 | 65.14 | 67.48 | 61.11 | 68.93 | 70.72 | 70.74 | 59.57 | 64.86 | 58.33
8 | 61.56 | 61.79 | 60.47 | 64.91 | 66.57 | 70.32 | 62.84 | 66.57 | 67.36
9 | 68.94 | 66.68 | 68.21 | 76.98 | 69.51 | 82.18 | 84.46 | 77.69 | 79.36
Avg | 65.46 | 66.16 | 64.15 | 69.54 | 69.29 | 73.12 | 68.32 | 67.25 | 66.59
Std | 7.21 | 6.98 | 7.45 | 5.53 | 1.46 | 7.07 | 10.65 | 10.08 | 7.77
p-value | 0.086 | 0.191 | 0.020 | 0.285 | 0.398 | 0.005 | - | 0.242 | 0.199
p-value: the paired t-test between the results of RF & DFFS [33] and those of the fourth (four methods with two channels, FM2) and sixth (CSP with two channels, CSP1) model scenarios, as well as some previous methods.
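The p-values above come from a paired t-test across the nine subjects. A minimal sketch of such a comparison with SciPy is shown below, using the RF & DFFS and BGA-CSP1 columns of Table 18; note that scipy.stats.ttest_rel returns a two-sided p-value by default.

```python
import numpy as np
from scipy.stats import ttest_rel

# Per-subject accuracies copied from Table 18.
rf_dffs  = np.array([63.69, 61.97, 91.09, 61.72, 63.41, 66.11, 59.57, 62.84, 84.46])
bga_csp1 = np.array([70.32, 68.50, 89.50, 68.41, 68.20, 69.90, 70.74, 70.32, 82.18])

# Paired t-test across subjects (two-sided by default).
t_stat, p_value = ttest_rel(bga_csp1, rf_dffs)
print(f"t = {t_stat:.2f}, two-sided p = {p_value:.3f}")
```

On these two columns the two-sided p-value is roughly 0.01; halving it gives a one-sided value close to the 0.005 reported for BGA-CSP1, which suggests a one-sided test may have been used.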
Table 19. The best kappa results of different methods for feature selection (along with feature extraction, feature and channel reduction, etc.) on three channels (8, 10, and 12) by combining different selected bands.
Subject | PSO FM3 | GA FM3 | BGA FM3 | PSO CSP2 | GA CSP2 | BGA CSP2 | CSP ALL | CSP 3.0 Channels | CSP 8.55 Channels
1 | 26.72 | 26.06 | 24.38 | 50.40 | 48.26 | 56.32 | 81.94 | 51.38 | 83.32
2 | 23 | 23.3 | 22.18 | 44.06 | 44.68 | 50.04 | 12.5 | 6.94 | 20.82
3 | 70.5 | 69.76 | 65.06 | 63.92 | 50.74 | 81.7 | 93.04 | 86.1 | 94.28
4 | 26.1 | 29.04 | 24.18 | 46.36 | 49.62 | 50.42 | 45.82 | 36.1 | 41.66
5 | 19.84 | 18.84 | 19.8 | 42.24 | 41.02 | 54.46 | 27.76 | 6.94 | 26.38
6 | 24.64 | 23.44 | 22.46 | 49.22 | 47.16 | 55.94 | 27.76 | 22.22 | 22.22
7 | 22.8 | 23.76 | 21.5 | 45.4 | 49.08 | 54.18 | 59.72 | 15.26 | 56.94
8 | 28.78 | 31.82 | 31.1 | 49.18 | 46.4 | 58.44 | 94.44 | 73.6 | 90.26
9 | 53.5 | 52.62 | 48.46 | 60.56 | 54.94 | 70.04 | 83.32 | 77.76 | 87.50
Avg | 32.88 | 33.18 | 31.01 | 50.14 | 47.98 | 59.06 | 58.47 | 41.81 | 58.15
Std | 16.26 | 15.89 | 14.61 | 6.96 | 3.69 | 9.73 | 29.47 | 29.64 | 29.44
p-value | 0.112 | 0.114 | 0.076 | 0.176 | 0.270 | 0.032 | 0.003 | - | 0.003
p-value: the paired t-test between the results of CSP (C3, C4, CZ; all three channels) [39] and those of the fifth (four methods with three channels, FM3) and seventh (CSP with three channels, CSP2) model scenarios, as well as some previous methods.
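For reference, the kappa values reported here appear consistent with the standard two-class conversion at a 0.5 chance level applied to the accuracies in the earlier tables:

\kappa = \frac{\mathrm{acc} - 0.5}{1 - 0.5} = 2 \cdot \mathrm{acc} - 1

For example, the 90.85% accuracy of BGA (M2) for subject 3 in Table 15 maps to \kappa = 2(0.9085) - 1 = 0.817, i.e., the 81.7 listed for BGA CSP2 in Table 19.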
Table 20. The best kappa results of different methods for feature selection (along with feature extraction, feature and channel reduction, etc.) on three channels, compared with methods that use more than three channels, by combining different selected bands.
Subject | GA FM2 | BGA FM2 | PSO CSP2 | BGA CSP2 | CSP ALL | CSP 8.55 Channels | CSP 13.22 Channels | oFBCSP | sFBCSP | eFBCSP
1 | 26.06 | 24.38 | 50.4 | 56.32 | 81.94 | 83.32 | 83.32 | 72.00 | 72.10 | 74.70
2 | 23.3 | 22.18 | 44.06 | 50.04 | 12.5 | 20.82 | 34.72 | 38.90 | 39.50 | 41.60
3 | 69.76 | 65.06 | 63.92 | 81.7 | 93.04 | 94.28 | 95.82 | 82.20 | 81.60 | 82.40
4 | 29.04 | 24.18 | 46.36 | 50.42 | 45.82 | 41.66 | 44.44 | 38.10 | 38.40 | 40.00
5 | 18.84 | 19.8 | 42.24 | 54.46 | 27.76 | 26.38 | 30.54 | 56.30 | 59.20 | 60.80
6 | 23.44 | 22.46 | 49.22 | 55.94 | 27.76 | 22.22 | 33.34 | 25.50 | 28.70 | 30.90
7 | 23.76 | 21.5 | 45.4 | 54.18 | 59.72 | 56.94 | 69.44 | 80.00 | 83.00 | 84.90
8 | 31.82 | 31.1 | 49.18 | 58.44 | 94.44 | 90.26 | 94.44 | 78.50 | 78.60 | 78.70
9 | 52.62 | 48.46 | 60.56 | 70.04 | 83.32 | 87.50 | 83.32 | 74.70 | 76.00 | 77.20
Avg | 33.18 | 31.01 | 50.15 | 59.06 | 58.48 | 58.15 | 63.26 | 60.70 | 61.90 | 63.50
Std | 15.89 | 14.61 | 6.96 | 9.73 | 29.47 | 29.44 | 25.84 | 20 | 20 | 20
Table 21. The best kappa results of different methods compared with the proposed method [66].
Methods | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | Avg | Std | p-Value
CSP | 77.8 | 2.8 | 93.1 | 40.3 | 9.7 | 43.1 | 62.5 | 87.5 | 87.5 | 56 | 32.09 | 0.013
GLRCSP | 72.2 | 16.7 | 87.5 | 34.7 | 11.1 | 30.6 | 62.5 | 87.5 | 76.4 | 53.2 | 28.47 | 0.028
CCSP1 | 72.2 | 20.8 | 87.5 | 13.9 | 1.4 | 30.6 | 62.5 | 87.5 | 76.4 | 50 | 31.71 | 0.112
CCSP2 | 77.8 | 6.9 | 94.4 | 40.3 | 8.3 | 36.1 | 58.3 | 90.3 | 80.6 | 54.8 | 31.71 | 0.013
DLCSPauto | 77.8 | 2.8 | 93.1 | 40.3 | 13.9 | 43.1 | 63.9 | 87.5 | 87.5 | 56.6 | 31.46 | 0.011
DLCSPcv | 77.8 | 1.4 | 93.1 | 40.3 | 11.1 | 25 | 62.5 | 87.5 | 73.6 | 52.5 | 32.13 | 0.046
DLCSPcvdiff | 77.8 | 1.4 | 93.1 | 40.3 | 11.1 | 25 | 62.5 | 87.5 | 73.6 | 52.5 | 32.13 | 0.046
SSRCSP | 77.8 | 6.9 | 94.4 | 40.3 | 12.5 | 37.5 | 58.3 | 94.4 | 80.6 | 55.9 | 31.49 | 0.008
TRCSP | 77.8 | 8.3 | 93.1 | 41.7 | 25 | 34.7 | 62.5 | 91.7 | 83.3 | 57.6 | 29.41 | 0.005
WTRCSP | 77.8 | 9.7 | 93.1 | 40.3 | 31.9 | 23.6 | 62.5 | 91.7 | 81.9 | 56.9 | 29.54 | 0.009
SRCSP | 77.8 | 26.4 | 93.1 | 33.3 | 26.4 | 27.8 | 56.9 | 91.7 | 84.7 | 57.6 | 27.87 | 0.004
SCSP | 81.9 | 12.5 | 93 | 45.8 | 27.8 | 27.8 | 59.7 | 94.4 | 83.3 | 58.5 | 29.45 | 0.003
SCSP1 | 83.3 | 34.7 | 95.8 | 44.4 | 30.5 | 33.3 | 69.4 | 94.4 | 83.3 | 63.3 | 25.84 | 0.002
SCSP2 | 83.3 | 20.8 | 94.3 | 41.7 | 26.4 | 22.2 | 56.9 | 90.3 | 87.5 | 58.2 | 29.44 | 0.003
BGA-CSP2 | 56.3 | 50 | 81.7 | 50.4 | 54.5 | 55.9 | 54.2 | 58.4 | 70 | 59.1 | 9.73 | 0.032
CSP (C3, C4, CZ) | 51.38 | 6.94 | 86.10 | 36.1 | 6.94 | 22.22 | 15.26 | 73.60 | 77.76 | 41.81 | 29.64 | -
p-value: the paired t-test between the results of CSP (C3, C4, CZ; all three channels) [39] and those of the seventh (CSP with three channels, CSP2) model scenario and some previous methods.