An Efficient Binary Sand Cat Swarm Optimization for Feature Selection in High-Dimensional Biomedical Data
Abstract
1. Introduction
- A novel gene selection approach based on an enhanced binary sand cat swarm optimization (BSCSO) is proposed for high-dimensional biomedical data.
- A pinhole-imaging opposition-based learning (PIOBL) scheme is employed to improve the exploration and convergence behavior of BSCSO.
- A crossover operator is integrated into BSCSO to improve the search performance of the original algorithm.
- An initial population strategy based on differential expression (DE) analysis is used to identify differentially expressed genes (DEGs). The resulting better-initialized population helps the proposed algorithm, called PILC-BSCSO, achieve higher classification accuracy.
- PILC-BSCSO is compared with 11 state-of-the-art methods on three benchmark microarray datasets and achieves higher classification accuracy than all of them.
- The effectiveness of PILC-BSCSO is further assessed on a real Liver Hepatocellular Carcinoma dataset (TCGA-LIHC), on which PILC-BSCSO selects a subset of five marker genes while achieving the best accuracy.
2. Materials and Methods
2.1. Sand Cat Swarm Optimization
Algorithm 1: Pseudo-code of the SCSO algorithm.
Step | Operation |
---|---|
1. | Determine the population size and the maximum number of iterations |
2. | Initialize the sand cat population |
3. | While do |
4. | Evaluate the fitness of each sand cat using the objective function |
5. | Determine |
6. | Calculate when |
7. | For to do |
8. | Calculate and |
9. | For to do |
10. | Randomly select the angle θ using roulette wheel selection |
11. | if then |
12. | |
13. | //update position using (5) |
14. | else |
15. | |
16. | |
17. | //update position using (8) |
18. | End if |
19. | End for |
20. | End for |
21. | |
22. | End while |
23. | Return |
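As a reading aid for Algorithm 1, the sketch below implements one SCSO iteration on a continuous population. It assumes the update rules of the original SCSO [14]: the general sensitivity r_G decreases linearly from s_M = 2 to 0, each cat draws r = r_G·rand and R = 2·r_G·rand − r_G, and the cat exploits (moves around the best position at a random angle θ) when |R| ≤ 1 and explores (moves relative to a randomly chosen cat) otherwise. Equal roulette-wheel weights over the angles 0° to 359°, a synchronous update of all cats, and the names `roulette_wheel` and `scso_step` are illustrative assumptions, not details taken from this paper.

```python
import math
import random


def roulette_wheel(weights, rng):
    """Return an index drawn with probability proportional to its weight."""
    pick = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if pick < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off


def scso_step(population, best, t, max_iter, rng, s_m=2.0):
    """One SCSO iteration on a continuous population (a list of position lists).

    Assumption: all cats are updated from the previous population (synchronous).
    """
    r_g = s_m - s_m * t / max_iter          # general sensitivity: falls from s_M to 0
    angles = list(range(360))               # candidate angles in degrees
    weights = [1.0] * len(angles)           # assumption: equal roulette-wheel weights
    new_population = []
    for pos in population:
        r = r_g * rng.random()              # this cat's sensitivity range
        R = 2 * r_g * rng.random() - r_g    # |R| decides exploitation vs exploration
        new_pos = []
        for j, x in enumerate(pos):
            theta = math.radians(angles[roulette_wheel(weights, rng)])
            if abs(R) <= 1:
                # exploitation: attack around the best position at angle theta
                pos_rnd = abs(rng.random() * best[j] - x)
                new_pos.append(best[j] - r * pos_rnd * math.cos(theta))
            else:
                # exploration: search relative to a randomly chosen cat
                candidate = population[rng.randrange(len(population))]
                new_pos.append(r * (candidate[j] - rng.random() * x))
        new_population.append(new_pos)
    return new_population


rng = random.Random(0)
cats = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(5)]
cats = scso_step(cats, best=[0.5, -1.0, 2.0], t=0, max_iter=10, rng=rng)
```

A quick sanity check of the schedule: when t equals max_iter, r_G is 0, so r = R = 0 and every cat collapses onto the best position.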
2.2. Binary Sand Cat Swarm Optimization for Feature Selection
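Feature selection needs a binary position (1 = gene kept, 0 = gene dropped), and steps 19 and 20 of Algorithm 2 appear to contain such a thresholding rule, although its symbols are lost in this copy. A common way to binarize a continuous metaheuristic, assumed here rather than taken from the paper, is an S-shaped (sigmoid) transfer function: bit j is set to 1 when a uniform random number falls below sigmoid(x_j). The names `sigmoid_transfer`, `binarize`, and `selected_features` are hypothetical.

```python
import math
import random


def sigmoid_transfer(x):
    """S-shaped transfer function: maps a real value to a probability in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # numerically stable branch for large negative x
    return z / (1.0 + z)


def binarize(position, rng):
    """Map a continuous position to a 0/1 mask (1 = gene kept, 0 = gene dropped)."""
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def selected_features(mask):
    """Column indices of the genes switched on in a binary mask."""
    return [j for j, bit in enumerate(mask) if bit == 1]


rng = random.Random(1)
mask = binarize([50.0, -50.0, 0.3, -0.3], rng)
print(mask, selected_features(mask))
```

Large positive components are almost always kept and large negative ones almost always dropped, while values near zero are kept with probability close to one half. A V-shaped transfer function is a frequent alternative in binary metaheuristics.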
2.3. Pinhole Imaging Opposition-Based Learning
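Pinhole imaging opposition-based learning, as introduced by Long et al. [21], produces an opposite candidate by treating the current position as an object projected through a pinhole: for bounds a_j and b_j and a scaling factor k, x*_j = (a_j + b_j)/2 + (a_j + b_j)/(2k) − x_j/k, which reduces to standard opposition-based learning, x*_j = a_j + b_j − x_j, when k = 1. The sketch below implements that formula together with a greedy keep-the-better step of the kind Algorithm 2 uses after generating the opposite solution. How k is set in PILC-BSCSO is not recoverable here, so it is a plain parameter; clipping to the bounds, fitness minimization, and the names `piobl_opposite` and `keep_better` are assumptions.

```python
def piobl_opposite(position, lower, upper, k=1.0):
    """Pinhole-imaging opposite point for each dimension.

    x*_j = (a_j + b_j) / 2 + (a_j + b_j) / (2k) - x_j / k
    With k = 1 this is classic opposition-based learning: a_j + b_j - x_j.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    opposite = []
    for x, a, b in zip(position, lower, upper):
        x_star = (a + b) / 2 + (a + b) / (2 * k) - x / k
        opposite.append(min(max(x_star, a), b))  # clip back into [a, b]
    return opposite


def keep_better(current, candidate, fitness):
    """Greedy selection: return whichever vector has the lower fitness."""
    return candidate if fitness(candidate) < fitness(current) else current


mask = [1, 0, 1, 1, 0]
flipped = piobl_opposite(mask, [0] * 5, [1] * 5)  # k = 1 on a 0/1 mask flips every bit
print(flipped)
```

On a binary mask with bounds [0, 1] and k = 1 the operator simply flips every bit; larger k pulls the opposite point toward the center of the search range instead.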
2.4. Single Point Crossover
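Single-point crossover picks one cut position and swaps the tails of two parent bit strings. In Algorithm 2 (step 24) the operator takes two sand cats, but their identities are lost in this copy; pairing the current cat with the best cat would be one plausible choice (an assumption). The sketch below returns both offspring; the name `single_point_crossover` is hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Single-point crossover of two equal-length binary feature masks."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2:
        raise ValueError("need at least two genes to cut between")
    cut = rng.randint(1, len(parent_a) - 1)  # cut falls strictly between genes
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


rng = random.Random(7)
print(single_point_crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], rng))
```

Because the cut never falls at either end, each child always inherits at least one gene from each parent.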
2.5. The Proposed Algorithm
Algorithm 2: Pseudo-code of the proposed PILC-BSCSO algorithm for feature selection.
Step | Operation |
---|---|
1. | Load the microarray dataset |
2. | Extract the list of DEGs using limma and obtain a reduced dataset with D features |
3. | //Perform PILC-BSCSO algorithm |
4. | |
5. | with the binary value 1 |
6. | While do |
7. | Calculate the fitness of each sand cat using an SVM with 10-fold cross-validation (CV) |
8. | |
9. | when |
10. | For do |
11. | |
12. | For do |
13. | Randomly select the angle θ using roulette wheel selection |
14. | if then |
15. | Update the search agent position using Equation (5) |
16. | else |
17. | Update the search agent position using Equation (8) |
18. | End if |
19. | |
20. | if then else |
21. | End for//j |
22. | if then |
23. | //Perform crossover operator |
24. | = Crossover (, ) |
25. | Calculate the fitness values of using SVM |
26. | if fitness value of is better than fitness values of and then |
27. | |
28. | else if the fitness value of is better than the fitness value of then |
29. | |
30. | End if |
31. | else |
32. | //Perform PIOBL operator |
33. | when |
34. | |
35. | Calculate the fitness values of using SVM |
36. | if the fitness value of is better than the fitness values of then |
37. | |
38. | = |
| | End if |
| | End if |
39. | End for//i |
40. | |
41. | End while |
42. | Return |
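Algorithm 2 scores every candidate gene subset with an SVM under 10-fold cross-validation, but the exact fitness formula is not included in this copy. Wrapper methods in this line of work often minimize a weighted sum of classification error and the fraction of selected genes; the sketch below assumes that form with a weight α = 0.99 (a common but unconfirmed choice), plus a plain, unstratified 10-fold index split. The accuracy itself would come from the classifier; for instance, scikit-learn's `cross_val_score(SVC(), X[:, mask], y, cv=10).mean()` returns a mean 10-fold accuracy for a boolean NumPy `mask`. The names `wrapper_fitness` and `k_fold_indices` are hypothetical.

```python
import random


def wrapper_fitness(accuracy, n_selected, n_total, alpha=0.99):
    """Lower is better: alpha * error + (1 - alpha) * selected-gene ratio."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    error = 1.0 - accuracy
    return alpha * error + (1.0 - alpha) * (n_selected / n_total)


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices once and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


folds = k_fold_indices(62, k=10)  # e.g. the 62-sample colon cancer dataset
print([len(f) for f in folds])
print(wrapper_fitness(accuracy=0.95, n_selected=15, n_total=2000))
```

With α close to 1, accuracy dominates the score and the gene-count term mainly breaks ties between subsets of similar accuracy.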
3. Results
3.1. Experimental Setup
3.2. Experimental Results on Three Benchmark Microarray Datasets
3.3. Experimental Results on Liver Hepatocellular Carcinoma TCGA
4. Discussion
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Yan, C.; Ma, J.; Luo, H.; Patel, A. Hybrid binary Coral Reefs Optimization algorithm with Simulated Annealing for Feature Selection in high-dimensional biomedical datasets. Chemom. Intell. Lab. Syst. 2019, 184, 102–111.
2. Qtaish, A.; Albashish, D.; Braik, M.; Alshammari, M.T.; Alreshidi, A.; Alreshidi, E.J. Memory-Based Sand Cat Swarm Optimization for Feature Selection in Medical Diagnosis. Electronics 2023, 12, 2042.
3. Pashaei, E.; Ozen, M.; Aydin, N. Biomarker discovery based on BBHA and AdaboostM1 on microarray data for cancer classification. In Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Orlando, FL, USA, 16–20 August 2016.
4. Pashaei, E.; Pashaei, E. Gene Selection for Cancer Classification using a New Hybrid of Binary Black Hole Algorithm. In Proceedings of the 28th IEEE Conference on Signal Processing and Communications Applications (SIU2020), Gaziantep, Turkey, 5–7 October 2020.
5. Pashaei, E. Mutation-based Binary Aquila optimizer for gene selection in cancer classification. Comput. Biol. Chem. 2022, 101, 107767.
6. Dabba, A.; Tari, A.; Meftali, S.; Mokhtari, R. Gene selection and classification of microarray data method based on mutual information and moth flame algorithm. Expert Syst. Appl. 2021, 166, 114012.
7. Yan, C.; Ma, J.; Luo, H.; Zhang, G.; Luo, J. A Novel Feature Selection Method for High-Dimensional Biomedical Data Based on an Improved Binary Clonal Flower Pollination Algorithm. Hum. Hered. 2019, 84, 34–46.
8. Hu, B.; Dai, Y.; Su, Y.; Moore, P.; Zhang, X.; Mao, C.; Chen, J.; Xu, L. Feature Selection for Optimized High-Dimensional Biomedical Data Using an Improved Shuffled Frog Leaping Algorithm. IEEE/ACM Trans. Comput. Biol. Bioinform. 2018, 15, 1765–1773.
9. Pashaei, E.; Pashaei, E. Gene selection using hybrid dragonfly black hole algorithm: A case study on RNA-seq COVID-19 data. Anal. Biochem. 2021, 627, 114242.
10. Shreem, S.S.; Ahmad Nazri, M.Z.; Abdullah, S.; Sani, N.S. Hybrid Symmetrical Uncertainty and Reference Set Harmony Search Algorithm for Gene Selection Problem. Mathematics 2022, 10, 374.
11. Chaudhuri, A.; Sahu, T.P. A hybrid feature selection method based on Binary Jaya algorithm for micro-array data classification. Comput. Electr. Eng. 2021, 90, 106963.
12. Zhang, G.; Hou, J.; Wang, J.; Yan, C.; Luo, J. Feature Selection for Microarray Data Classification Using Hybrid Information Gain and a Modified Binary Krill Herd Algorithm. Interdiscip. Sci. Comput. Life Sci. 2020, 12, 288–301.
13. Pashaei, E.; Pashaei, E. Hybrid binary COOT algorithm with simulated annealing for feature selection in high-dimensional microarray data. Neural Comput. Appl. 2023, 35, 353–374.
14. Seyyedabbasi, A.; Kiani, F. Sand Cat swarm optimization: A nature-inspired algorithm to solve global optimization problems. Eng. Comput. 2022, 39, 2627–2651.
15. Kiani, F.; Anka, F.A.; Erenel, F. PSCSO: Enhanced sand cat swarm optimization inspired by the political system to solve complex problems. Adv. Eng. Softw. 2023, 178, 103423.
16. Yu, Y.; Li, Y.; Li, J.; Gu, X.; Royel, S. Nonlinear Characterization of the MRE Isolator Using Binary-Coded Discrete CSO and ELM. Int. J. Struct. Stab. Dyn. 2017, 18, 1840007.
17. Lu, W.; Shi, C.; Fu, H.; Xu, Y. A Power Transformer Fault Diagnosis Method Based on Improved Sand Cat Swarm Optimization Algorithm and Bidirectional Gated Recurrent Unit. Electronics 2023, 12, 672.
18. Zhao, W.; Zhang, Z.; Seyyedabbasi, A. Binary Sand Cat Swarm Optimization Algorithm for Wrapper Feature Selection on Biological Data. Biomimetics 2023, 8, 310.
19. Pashaei, E.; Pashaei, E. Training Feedforward Neural Network Using Enhanced Black Hole Algorithm: A Case Study on COVID-19 Related ACE2 Gene Expression Classification. Arab. J. Sci. Eng. 2021, 46, 3807–3828.
20. Yao, J.; Sha, Y.; Chen, Y.; Zhang, G.; Hu, X.; Bai, G.; Liu, J. IHSSAO: An Improved Hybrid Salp Swarm Algorithm and Aquila Optimizer for UAV Path Planning in Complex Terrain. Appl. Sci. 2022, 12, 5634.
21. Long, W.; Jiao, J.; Liang, X.; Wu, T.; Xu, M.; Cai, S. Pinhole-imaging-based learning butterfly optimization algorithm for global optimization and feature selection. Appl. Soft Comput. 2021, 103, 107146.
22. Shukla, A.K.; Singh, P.; Vardhan, M. A new hybrid wrapper TLBO and SA with SVM approach for gene expression data. Inf. Sci. 2019, 503, 238–254.
23. Yu, Y.; Rashidi, M.; Samali, B.; Yousefi, A.M.; Wang, W. Multi-Image-Feature-Based Hierarchical Concrete Crack Identification Framework Using Optimized SVM Multi-Classifiers and D–S Fusion Algorithm for Bridge Structures. Remote Sens. 2021, 13, 240.
24. Pashaei, E.; Yilmaz, A.; Aydin, N. A combined SVM and Markov model approach for splice site identification. In Proceedings of the 6th International Conference on Computer and Knowledge Engineering (ICCKE 2016), Mashhad, Iran, 20 October 2016.
25. Pashaei, E.; Pashaei, E. Hybrid binary arithmetic optimization algorithm with simulated annealing for feature selection in high-dimensional biomedical data. J. Supercomput. 2022, 78, 15598–15637.
26. Dramiński, M.; Koronacki, J. rmcfs: An R Package for Monte Carlo Feature Selection and Interdependency Discovery. J. Stat. Softw. 2018, 85, 1–28.
27. Kursa, M.B. Praznik: High performance information-based feature selection. SoftwareX 2021, 16, 100819.
28. Bai, Y.; Lu, D.; Qu, D.; Li, Y.; Zhao, N.; Cui, G.; Li, X.; Sun, X.; Liu, Y.; Wei, M.; et al. The Role of ANGPTL Gene Family Members in Hepatocellular Carcinoma. Dis. Markers 2022, 2022, 1844352.
29. Lu, D.; Bai, X.; Zou, Q.; Gan, Z.; Lv, Y. Identification of the association between HMMR expression and progression of hepatocellular carcinoma via construction of a co-expression network. Oncol. Lett. 2020, 20, 2645–2654.
30. Zhang, L.; Fan, Y.; Wang, X.; Yang, M.; Wu, X.; Huang, W.; Lan, J.; Liao, L.; Huang, W.; Yuan, L.; et al. Carbohydrate Sulfotransferase 4 Inhibits the Progression of Hepatitis B Virus-Related Hepatocellular Carcinoma and Is a Potential Prognostic Marker in Several Tumors. Front. Oncol. 2020, 10, 554331.
31. Yao, T.; Hu, W.; Chen, J.; Shen, L.; Yu, Y.; Tang, Z.; Zang, G.; Zhang, Y.; Chen, X. Collagen XV mediated the epithelial-mesenchymal transition to inhibit hepatocellular carcinoma metastasis. J. Gastrointest. Oncol. 2022, 13, 2472–2484.
32. Wu, M.; Lan, H.; Ye, Z.; Wang, Y. Hypermethylation of the PZP gene is associated with hepatocellular carcinoma cell proliferation, invasion and migration. FEBS Open Bio 2021, 11, 826–832.
Dataset Name | No. of Samples | No. of Features | No. of Classes | Distribution of Class Label |
---|---|---|---|---|
Colon cancer | 62 | 2000 | 2 | 40, 22 |
CNS | 60 | 7129 | 2 | 39, 21 |
Breast | 97 | 24,481 | 2 | 51, 46 |
TCGA-LIHC | 421 | 56,602 | 2 | 371, 50 |
Dataset Name | All Features | DEGs | mRMR (50) | mRMR (100) | mRMR (200) | mRMR (300) | MCFS (50) | MCFS (100) | MCFS (200) | MCFS (300) |
---|---|---|---|---|---|---|---|---|---|---|
Colon cancer | 83.87 | 85.48 | 80.64 | 83.87 | 83.87 | 80.64 | 79.03 | 88.70 | 85.48 | 88.70 |
CNS | 68.33 | 90 | 60 | 63.33 | 78.33 | 68.33 | 81.66 | 85.00 | 91.66 | 93.33 |
Breast | 67.01 | 75.25 | 76.28 | 78.35 | 78.35 | 79.38 | 72.16 | 76.28 | 87.62 | 89.69 |
Dataset | Metric | Accuracy (%) BSCSO | Accuracy (%) GA | Accuracy (%) PSO | Accuracy (%) PILC-BSCSO | #Genes BSCSO | #Genes GA | #Genes PSO | #Genes PILC-BSCSO |
---|---|---|---|---|---|---|---|---|---|
Colon | AVG | 97.63 | 91.311 | 94.35 | 99.22 | 8.33 | 133.6 | 70.4 | 15 |
Colon | Best | 100 | 93.54 | 98.38 | 100 | 6 | 113 | 50 | 10 |
Colon | Worst | 93.81 | 85 | 83.87 | 96.9 | 9 | 145 | 89 | 23 |
Colon | STDEV | 2.35 | 3.349 | 4.47 | 1.348 | 1.966 | 12.91 | 15.51 | 5.244 |
Colon | t-test (p-value) | 0.0195 | 0.0066 | 0.0519 | - | 0.0159 | 1.2259 × 10⁻⁵ | 0.0022 | - |
CNS | AVG | 99.34 | 98.332 | 99.16 | 100 | 33.25 | 100.5 | 73.4 | 16.25 |
CNS | Best | 100 | 100 | 100 | 100 | 14 | 45 | 54 | 13 |
CNS | Worst | 98.49 | 95 | 98.333 | 100 | 59 | 144 | 90 | 22 |
CNS | STDEV | 0.755 | 2.041 | 0.914 | 0 | 18.76 | 42.914 | 13.29 | 4.0311 |
CNS | t-test (p-value) | 0.07198 | 0.0622 | 0.0755 | - | 0.0479 | 0.00808 | 0.00118 | - |
Breast | AVG | 91.819 | 91.06 | 96.2 | 96.38 | 11.4 | 62 | 58 | 26.4 |
Breast | Best | 97.926 | 95.87 | 100 | 100 | 5 | 56 | 52 | 15 |
Breast | Worst | 88.7533 | 84.53 | 93.81 | 93.98 | 16 | 66 | 65 | 40 |
Breast | STDEV | 3.808 | 4.36 | 2.61 | 2.5 | 4.722 | 4.32 | 5.09 | 12.30 |
Breast | t-test (p-value) | 0.00097 | 0.008246 | 0.6330 | - | 0.00730 | 0.00036 | 0.00047 | - |
Method | Colon cancer #Genes | Colon cancer ACC (%) | CNS #Genes | CNS ACC (%) | Breast #Genes | Breast ACC (%) |
---|---|---|---|---|---|---|
PILC-BSCSO | 15 | 99.22 | 16.25 | 100 | 26.4 | 96.38 |
BMSCSO [2] | 997.80 | 93.33 | - | - | - | - |
mRMR-MBAO [5] | 16.11 | 95.74 | 21.37 | 88.57 | 23.58 | 89.12 |
SU-RSHSA [10] | 7.59 | 93.17 | 13.15 | 89.36 | 18.31 | 80.40 |
mRMR-DBH [9] | 12 | 97.02 | 39.75 | 97.19 | 14 | 90.21 |
IBCFPA [7] | 25.90 | 92.16 | 25.2 | 84.82 | - | - |
MIM-MFO [6] | 24.25 | 99.19 | 17 | 85.00 | 22.50 | 84.11 |
BCROSAT [1] | 20.5 | 92.31 | 21.40 | 82.00 | - | - |
ISFLA [8] | 37.1 | 89.56 | 41.1 | 77.46 | - | - |
TOPSIS-Jaya [11] | 18.90 | 97.76 | 8.7 | 96.22 | - | - |
IG-MBKH [12] | 17.10 | 96.47 | 14.70 | 90.34 | - | - |
mRMR-BCOOT-CSA [13] | 8.75 | 94.75 | 7 | 93.22 | 15 | 95.54 |
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pashaei, E. An Efficient Binary Sand Cat Swarm Optimization for Feature Selection in High-Dimensional Biomedical Data. Bioengineering 2023, 10, 1123. https://doi.org/10.3390/bioengineering10101123