Article

An Empirical Assessment of Performance of Data Balancing Techniques in Classification Task

1 Symbiosis Centre for Information Technology, Symbiosis International (Deemed University), Pune 411057, India
2 Faculty of Computers and Information, South Valley University, Qena 83523, Egypt
3 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
4 Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
* Authors to whom correspondence should be addressed.
Submission received: 9 March 2022 / Revised: 6 April 2022 / Accepted: 8 April 2022 / Published: 13 April 2022

Abstract

Many real-world classification problems, such as fraud detection, intrusion detection, churn prediction, and anomaly detection, suffer from the problem of imbalanced datasets. Therefore, in all such classification tasks, we need to balance the imbalanced datasets before building classifiers for prediction purposes. Several data-balancing techniques (DBT) have been discussed in the literature to address this issue. However, not much work has been conducted to assess the performance of DBT. Therefore, in this research paper we empirically assess the performance of the data-preprocessing-level data-balancing techniques, namely: Under Sampling (US), Over Sampling (OS), Hybrid Sampling (HS), Random Over Sampling Examples (ROSE), Synthetic Minority Over Sampling Technique (SMOTE), and Clustering-Based Under Sampling (CBUS). We used six different classifiers and twenty-five different datasets with varying levels of imbalance ratio (IR) to assess the performance of DBT. The experimental results indicate that DBT help to improve the performance of the classifiers. However, no significant difference was observed in the performance of US, OS, HS, SMOTE, and CBUS. It was also observed that the performance of DBT was not consistent across varying levels of IR in the dataset or across different classifiers.

1. Introduction

Classification is a supervised machine learning (ML) technique used to predict the class label of unseen data by building a classifier from historical data. Classification algorithms usually work with the assumption that the dataset used to build the classifier is balanced. However, many datasets are highly imbalanced. An imbalanced dataset is one in which one class heavily outnumbers the other classes with respect to the target class variable. For example, consider a dataset that contains 1000 transactions, of which 990 are nonfraudulent and only 10 are fraudulent. This is a good example of a highly imbalanced dataset. Many such examples of imbalanced datasets in classification tasks are discussed in the literature, including software product defect detection [1], survival prediction of hepatocellular carcinoma patients [2], customer churn prediction [3], predicting freshmen student attrition [4], insurance fraud detection [5], and intrusion and crime detection [6]. When we build a classifier using a highly imbalanced dataset, the classifier is usually biased towards the majority class, meaning that it will be better at correctly predicting majority class cases than minority class cases. However, in real life we expect a classifier to be unbiased and equally good at correctly predicting both minority and majority cases. Therefore, balancing imbalanced datasets is one of the most important activities, because it helps to reduce bias in the model prediction and thereby enhances the classifier's performance.
To address the problem of imbalanced datasets in the classification task, several solutions have been proposed in the literature [7,8,9]. These solutions are broadly divided into several categories, namely data-preprocessing-level solutions, cost-sensitive learning methods, algorithm-level solutions, and ensemble methods. The data-preprocessing-level solutions are based on resampling of the original data. Resampling is performed before building the classifier. Therefore, resampling techniques are easy to implement and are independent of the classifier. Cost-sensitive learning approaches take into account the significance of misclassification of majority and minority class instances. Algorithm-level solutions either suggest a new algorithm or modify existing algorithms. Algorithm-level solutions are dependent on algorithms and require a detailed understanding of the algorithm for implementation. Therefore, algorithm-level solutions are less popular compared to resampling techniques. Ensemble solutions combine ensemble (bagging and boosting) models with resampling techniques or a cost-sensitive approach [7,8].
Though several solutions have been proposed in the literature to deal with the imbalanced dataset problem in classification tasks, there is a lack of research assessing the performance of DBT [7]. Because a large number of solutions have been proposed, it is difficult to assess the performance of all of them. Therefore, we limited the scope of this study to assessing the performance of the resampling techniques used to balance imbalanced datasets at the data-preprocessing level. Resampling techniques were chosen because they are the most widely used approach for dealing with imbalanced datasets in classification tasks.
The objectives of this study are: (1) to assess the performance of DBT used to balance imbalanced datasets; (2) to assess whether the performance of DBT is independent of the level of imbalance ratio in the dataset; (3) to assess whether the performance of DBT is independent of the classifier; and (4) to assess whether DBT help to improve the performance of the classifiers.
The rest of the paper is organized as follows. Section 2 describes the theoretical background and related work. In Section 3, we discuss the experimental setup. The results of the experiment are analyzed and discussed in Section 4. Finally, the paper is concluded in Section 5.

2. Background and Related Work

Machine learning algorithms in classification tasks work with the assumption that the data are evenly distributed with respect to the target class variable. However, most classification problems suffer from imbalanced datasets. Therefore, dealing with imbalanced datasets is considered one of the most important activities in classification tasks. To deal with this problem, several solutions have been proposed in the literature [7,8,9].
The data-preprocessing-level solutions deal with imbalanced datasets by resampling the data [9,10]. Resampling is performed by over sampling minority cases, under sampling majority cases, or combining the US and OS strategies. Using resampling techniques, we can balance the dataset to any desired level of imbalance ratio (IR); it is not necessary that the numbers of majority and minority cases be exactly the same. Resampling techniques are broadly divided into three categories, namely US, OS, and HS [9].
Under Sampling (US): In this method, the dataset is balanced by deleting majority class instances [10,11]. The instances are selected randomly and deleted from the dataset until the dataset is balanced. The weakness of this method is that we might lose some potentially useful information required for the learning process when we remove the instances from the majority class data.
Over Sampling (OS): In this method, the dataset is balanced by randomly replicating minority class instances [12]. This method suffers from duplication of information due to the over sampling of minority class instances, which might lead to overfitting of the model. However, unlike the US method, it does not lose any important information.
Hybrid Sampling (HS): In this method, the dataset is balanced by combining the OS and US approaches [13,14].
Random Over Sampling Examples (ROSE): This method is based on a smoothed bootstrap technique [15].
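For illustration, the following R sketch shows how the US, OS, HS, and ROSE strategies can be applied to a binary training set using the ROSE package (ovun.sample() and ROSE()). The objects train_set and Class are placeholder names for a training data frame and its binary target column; this is a minimal sketch, not the exact script used in this study.

```r
# Minimal resampling sketch (assumed objects: train_set with binary factor column Class).
library(ROSE)   # provides ovun.sample() and ROSE()

set.seed(42)

# Under Sampling (US): randomly drop majority-class rows until the classes are balanced.
us_train   <- ovun.sample(Class ~ ., data = train_set, method = "under", p = 0.5)$data

# Over Sampling (OS): randomly replicate minority-class rows until the classes are balanced.
os_train   <- ovun.sample(Class ~ ., data = train_set, method = "over",  p = 0.5)$data

# Hybrid Sampling (HS): combine under and over sampling in a single call.
hs_train   <- ovun.sample(Class ~ ., data = train_set, method = "both",  p = 0.5)$data

# ROSE: draw a synthetic balanced sample via a smoothed bootstrap (kernel density estimate).
rose_train <- ROSE(Class ~ ., data = train_set, p = 0.5)$data

# Check the new class distributions.
lapply(list(US = us_train, OS = os_train, HS = hs_train, ROSE = rose_train),
       function(d) table(d$Class))
```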
Synthetic Minority Over Sampling Technique (SMOTE): This is an over sampling technique in which, instead of replicating minority class instances, new instances are generated synthetically [16]. Synthetic data are generated as follows: first, a minority class instance is selected at random and its k-nearest neighbors are found; new instances are then generated by interpolating between the selected minority class instance and its nearest neighbors. Several variants of the SMOTE method have been proposed in the literature, such as SMOTEBoost [17], MSMOTE [18], and MWMOTE [19].
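The core generation step of SMOTE can be sketched in a few lines of base R. This is a simplified illustration of the interpolation idea only (it assumes a numeric matrix min_X of minority-class instances, a placeholder name, and ignores details such as nominal attributes); it is not the implementation evaluated in this paper.

```r
# Simplified SMOTE-style interpolation (assumed object: min_X, a numeric matrix
# whose rows are the minority-class instances).
smote_point <- function(min_X, k = 5) {
  i   <- sample(nrow(min_X), 1)                     # random minority instance
  d   <- apply(min_X, 1, function(r) sqrt(sum((r - min_X[i, ])^2)))
  nn  <- order(d)[2:(k + 1)]                        # its k nearest minority neighbours
  j   <- sample(nn, 1)                              # one neighbour chosen at random
  gap <- runif(1)                                   # interpolation factor in [0, 1]
  min_X[i, ] + gap * (min_X[j, ] - min_X[i, ])      # synthetic point on the segment
}

# Example: generate 100 synthetic minority instances.
# synthetic <- t(replicate(100, smote_point(min_X)))
```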
One-sided Selection Method (OSS): This method falls under the US techniques category. In this method, borderline and redundant majority class instances are removed [12].
Clustering-Based Under Sampling (CBUS): This method is a US strategy. In this method, US is achieved by creating clusters of majority class instances [20]. The number of clusters should be equal to the number of minority class instances. There are two clustering strategies. In the first strategy, the cluster centre represents the cluster, and in the second strategy, the nearest neighbor of the cluster center represents the cluster.
Clustering-Based Over Sampling and Under Sampling (CBOUS): This method is an extension of the CBUS. In this method, data balancing is achieved by combining US and OS approaches by creating clusters of majority and minority class instances [21].
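A minimal sketch of the CBUS idea (second strategy: each cluster is represented by the majority instance nearest to its centre) is given below. The data frames maj and min_df, holding the majority- and minority-class instances, are placeholder names; the sketch assumes purely numeric features and is not the exact procedure of [20,21].

```r
# Illustrative CBUS sketch (assumed objects: maj and min_df, numeric data frames of
# majority- and minority-class instances with identical feature columns).
set.seed(42)
maj_m <- as.matrix(maj)
k     <- nrow(min_df)                       # one cluster per minority instance
km    <- kmeans(maj_m, centers = k)

# For each cluster, keep the majority instance closest to the cluster centre.
keep <- sapply(seq_len(k), function(c) {
  members <- which(km$cluster == c)
  d <- colSums((t(maj_m[members, , drop = FALSE]) - km$centers[c, ])^2)
  members[which.min(d)]
})

balanced <- rbind(data.frame(maj[keep, ], Class = "majority"),
                  data.frame(min_df,      Class = "minority"))
table(balanced$Class)                       # one retained majority row per minority row
```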
The cost-sensitive methods are based on the assumption that the cost of misclassification of the minority class instances is higher than the cost of misclassification of the majority class instances. Cost-sensitive learning can be incorporated at the data-preprocessing level or at the algorithm level. Cost-sensitive methods are difficult to implement compared to the resampling technique, as detailed knowledge of the algorithm is required if it is to be incorporated into an algorithm. Several cost-sensitive solutions have been discussed in the literature [22,23,24,25,26].
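As a simple illustration of the cost-sensitive idea, a decision tree in R can be trained with instance weights so that errors on minority cases are penalised more heavily; rpart also accepts an explicit loss matrix through its parms argument. The objects train_set and Class, the "positive" label, and the 10:1 weight ratio are placeholders, not values taken from the cited studies.

```r
# Illustrative cost-sensitive decision tree via case weights
# (assumed objects: train_set with binary factor column Class, minority level "positive";
# the 10x weight is an arbitrary example cost ratio).
library(rpart)

w      <- ifelse(train_set$Class == "positive", 10, 1)  # minority errors weigh 10x
fit_cs <- rpart(Class ~ ., data = train_set, weights = w, method = "class")
```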
The algorithm-level solutions deal with imbalanced dataset by proposing a new algorithm or modifying an existing algorithm. Some examples of algorithms modified for dealing with imbalanced dataset are discussed in existing literature [27,28,29,30].
A large number of ensemble solutions have been proposed in the literature to deal with the imbalanced dataset problem in the classification task [31,32,33,34]. In this approach, bagging and boosting algorithms are combined with resampling techniques or cost-sensitive learning methods.
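To make the ensemble idea concrete, the sketch below combines bagging with random under sampling (often called under-bagging): each base tree is grown on a balanced bootstrap sample and the ensemble predicts by majority vote. All object names (train_set, test_set, Class) are placeholders, and this is a minimal sketch rather than any of the published methods cited above.

```r
# Illustrative under-bagging sketch (assumed objects: train_set and test_set,
# both with a binary factor column Class).
library(rpart)

set.seed(42)
n_trees  <- 25
minority <- names(which.min(table(train_set$Class)))
min_rows <- which(train_set$Class == minority)
maj_rows <- which(train_set$Class != minority)

models <- lapply(seq_len(n_trees), function(i) {
  # Balanced bootstrap: resample the minority rows and an equal number of majority rows.
  idx <- c(sample(min_rows, replace = TRUE),
           sample(maj_rows, length(min_rows), replace = TRUE))
  rpart(Class ~ ., data = train_set[idx, ], method = "class")
})

# Majority vote over the individual tree predictions.
votes <- sapply(models, function(m) as.character(predict(m, test_set, type = "class")))
pred  <- apply(votes, 1, function(v) names(which.max(table(v))))
```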
Susan and Kumar reviewed state-of-the-art data-balancing strategies used prior to the learning phase. The study discussed the strengths and weaknesses of the techniques and also reported intelligent sampling procedures based on their performance, popularity, and ease of implementation [35]. Halimu and Kasem proposed a data-preprocessing sampling technique named Split Balancing (sBal) for ensemble methods. The proposed method creates multiple balanced bins, and multiple base learners are then induced on the balanced bins. It was found that the sBal method improves classification performance considerably compared to existing ensemble methods [36]. Tolba et al. used SMOTE, NearMiss, cost-sensitive learning, k-Means SMOTE, TextGan, LoRAS, SDS, Clustering-Based Under Sampling, and a VL strategy to balance the imbalanced dataset for automatic detection of online harassment [37]. Tao et al. proposed a novel SVDD boundary-based weighted over sampling approach for dealing with imbalanced and overlapped dataset classification issues [38]. Islam et al. proposed a k-nearest neighbor over sampling approach (KNNOR) for augmentation and for generating synthetic data points for the minority class [39].
A few papers discuss the performance of resampling techniques; the following observations are based on them. The US technique works well when the number of minority class instances is large (in the hundreds). The OS technique works well when the number of minority class instances is small. When the dataset is very large, a combination of SMOTE and US works well. The study conducted by Lopez et al. [40] compared preprocessing techniques against cost-sensitive learning and found no differences among the data-preprocessing techniques; both preprocessing and cost-sensitive learning were found to be good and equivalent approaches. The study conducted by Thammasiri et al. [4] tested three DBT, namely US, OS, and SMOTE, using four different classifiers, and found that SVM combined with SMOTE gives better results. The study conducted by Burez et al. [41] found that US leads to improved prediction accuracy.
As the focus of this study was to assess the performance of resampling techniques, we assessed the following most-used resampling techniques: (i) Under Sampling (US); (ii) Over Sampling (OS); (iii) Hybrid Sampling (HS); (iv) Random Over Sampling Examples (ROSE); (v) Synthetic Minority Over Sampling Technique (SMOTE); and (vi) Clustering-Based Under Sampling (CBUS).

3. Experimental Setup and Datasets

We used six different classifiers, namely Decision Tree (C4.5), k-Nearest Neighbor (kNN), Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM), to assess the performance of DBT, rather than relying on a single classifier. Using six different classifiers also helps us to understand whether the performance of DBT varies across classifiers or remains the same. In this study, we used 25 different small datasets with varying levels of IR. All datasets were downloaded from the KEEL dataset repository [42]. Information about the datasets is given in Table 1; more details are available at https://sci2s.ugr.es/keel/imbalanced.php (accessed on 3 November 2021). The last column in Table 1 is the Imbalance Ratio (IR), which is the ratio of the number of majority class instances to the number of minority class instances.
We built a total of 1050 classifiers (25 datasets × 7 data-balancing strategies × 6 classifiers) using the open source 'R' software. To build each classification model and assess its performance, the following process was used: (i) divide the dataset into training and test sets, with the training set containing 80% of the data and the test set containing 20%; (ii) apply the DBT to the training set; (iii) build the classification model on the balanced training set; (iv) test the performance of the classification model on the test set. The performance of the classifier was measured using the area under the ROC curve (AUC value) [43]. To train the classifiers, we used the default hyperparameter settings of the caret package in 'R'. No specific hyperparameter tuning was performed, as the objective of this study was not to improve the performance of the classifiers but to assess the performance of DBT. More details about the caret package are given by Kuhn et al. [44].
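For clarity, steps (i)–(iv) above can be sketched in R as follows for a single dataset, a single balancing strategy, and a single classifier. The data frame dataset and its target column Class are placeholders, and the snippet is a simplified illustration using the caret, ROSE, and pROC packages rather than the exact script behind the reported results.

```r
# Illustrative sketch of one run: 80/20 split, balance the training set, train, evaluate AUC.
library(caret)   # createDataPartition(), train()
library(ROSE)    # ovun.sample()
library(pROC)    # roc(), auc()

set.seed(42)

# Imbalance ratio (IR) of the dataset: majority-class count / minority-class count.
ir <- max(table(dataset$Class)) / min(table(dataset$Class))

# (i) Stratified 80/20 train-test split.
idx       <- createDataPartition(dataset$Class, p = 0.8, list = FALSE)
train_set <- dataset[idx, ]
test_set  <- dataset[-idx, ]

# (ii) Apply a DBT to the training set only (hybrid sampling shown as an example).
train_bal <- ovun.sample(Class ~ ., data = train_set, method = "both", p = 0.5)$data

# (iii) Build the classifier with caret's default hyperparameter settings.
fit <- train(Class ~ ., data = train_bal, method = "rf")

# (iv) Evaluate on the untouched test set using the area under the ROC curve.
prob <- predict(fit, newdata = test_set, type = "prob")[, 2]
auc(roc(test_set$Class, prob))
```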
We used the Friedman test to compare the performance of the DBT, based on their average ranks across the datasets, for each classifier [45,46]. The Friedman test helped us to understand whether there was a significant difference in the performance of the DBT for a given classifier [46]. To locate the differences in the performance of the DBT, we applied the post hoc Nemenyi test [46,47], which tells us which DBT differ significantly with respect to their performance in the classification task. We used Kendall's test statistic [48] to measure the agreement between the rankings of the DBT, based on their performance in the classification task, for varying levels of IR in the dataset. Kendall's test has previously been used to assess the performance of imputation methods [49]. If the value of Kendall's 'W' is 1, there is complete agreement over the ranking, and when it is 0, there is no agreement over the ranking.
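The statistical analysis can be sketched as follows, assuming auc_mat is a 25 × 7 matrix of AUC values with datasets in rows and balancing strategies in columns. The Friedman test is available in base R; the Nemenyi post hoc test is shown via the PMCMRplus package (an assumed choice, since the paper does not name the implementation used); Kendall's W is computed directly from its definition.

```r
# Illustrative sketch (assumed object: auc_mat, a 25 x 7 matrix of AUC values,
# rows = datasets, columns = DBT strategies).

# Friedman test: do the strategies differ significantly for a given classifier?
friedman.test(auc_mat)

# Nemenyi post hoc test (assumes the PMCMRplus package provides this function).
# PMCMRplus::frdAllPairsNemenyiTest(auc_mat)

# Kendall's coefficient of concordance W, computed from a matrix of ranks
# (rows = raters, e.g. classifiers or IR ranges; columns = the objects being ranked).
kendall_w <- function(rank_mat) {
  m <- nrow(rank_mat)
  n <- ncol(rank_mat)
  col_sums <- colSums(rank_mat)
  s <- sum((col_sums - mean(col_sums))^2)
  12 * s / (m^2 * (n^3 - n))        # W = 1: complete agreement; W = 0: no agreement
}

# Example: agreement between datasets on the ranking of the 7 strategies
# (rank 1 = highest AUC within a dataset).
ranks <- t(apply(auc_mat, 1, function(x) rank(-x)))
kendall_w(ranks)
```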

4. Results and Discussion

4.1. Performance of DBT

Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 report the performance of the six classifiers, namely DT, kNN, LR, NB, RF, and SVM, for the different data-balancing strategies, namely None, US, OS, HS, ROSE, SMOTE, and CBUS. The performance of the classifier is measured using the area under the receiver operating characteristic curve, i.e., the AUC value. The first column in each table indicates the name of the imbalanced dataset. The second column indicates the performance of the classifier without balancing the imbalanced dataset (None strategy). Columns 3 to 8 indicate the performance of the classifier for the US, OS, HS, ROSE, SMOTE, and CBUS strategies. The mean rank of the DBT, based on their performance over the 25 datasets, is reported in the last two rows of each table along with the Friedman test statistics. Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13 report the post hoc analysis using the Nemenyi multiple comparison test. The Nemenyi statistics are used to identify which DBT differ significantly in performance.
The Friedman test statistics show that the 'p' value is less than 0.05 for all the classifiers, so we can say that there is a statistically significant difference in the performance of the DBT. As the difference in performance is significant, the Nemenyi test was then applied to find which DBT differ significantly. The following are our observations based on the Friedman statistics and the Nemenyi post hoc analysis:
  • For the DT classifier, the performance of the None strategy was poor and significantly different than US, OS, HS, SMOTE, and CBUS. However, no difference in the performance was observed between None and ROSE strategies. Further, no significant difference was observed in the performance of US, OS, HS, SMOTE, and CBUS.
  • For the kNN classifier, the performance of the None strategy was poor and significantly different than OS, HS, ROSE, SMOTE, and CBUS. However, no difference in the performance was observed between the None and US strategy. Further, no significant difference was observed in the performance of OS, HS, ROSE, and CBUS. Significant difference was observed in the performance of US and OS strategies.
  • For the LR classifier, the performance of the None strategy was found to be poor and significantly different than US, OS, HS, and SMOTE. However, no significant difference in the performance was observed between None, ROSE, and CBUS. It was also observed that there was no difference in the performance of US, OS, HS, ROSE, SMOTE, and CBUS.
  • For the NB classifier, the performance of the None strategy was found to be poor and significantly different to US, SMOTE, and CBUS. However, no difference in the performance was observed between None, OS, HS, and ROSE. Further, no significant difference was observed in the performance of US, SMOTE, and CBUS.
  • For the RF classifier, it was found that the performance of the None Strategy was poor and significantly different to the US and CBUS strategies. However, no difference was observed in the performance of the None, OS, HS, ROSE, and SMOTE. Further, no significant difference was observed in the performance of US, HS, SMOTE, and CBUS.
  • For SVM, the performance of the None strategy was poor and significantly different than US, OS, HS, SMOTE, and CBUS. However, no significant difference was observed in the performance of None and ROSE. Further, no significant difference was observed in the performance of US, OS, HS, ROSE, SMOTE, and CBUS.
Therefore, from all the observations above, we can infer that: (i) the performance of the None and ROSE strategies was poor and significantly different from the others; (ii) no significant difference was observed in the performance of the US, OS, HS, SMOTE, and CBUS strategies. Dealing with imbalanced datasets is a very common problem in classification tasks, and which DBT is most suitable for enhancing the performance of the classifier is the most common question that needs to be answered. In this section, we have attempted to answer this question by applying data-preprocessing-level DBT to 25 different datasets using six different classifiers. From the results of the experiment and the statistical analysis, we can infer the following: (i) balancing the imbalanced dataset certainly helps to improve the performance of the classifier; (ii) for the DT classifier, the CBUS and US techniques give better performance; (iii) for Logistic Regression, the SMOTE and OS techniques give better performance; (iv) for the Naïve Bayes classifier, US and SMOTE give better performance; (v) for Random Forest, the CBUS and US techniques give better performance; (vi) for the Support Vector Machine, HS and CBUS give better results; (vii) for the kNN classifier, OS and SMOTE give better results. However, it is important to note that every time we apply a DBT to an imbalanced dataset, there is no guarantee that the same data will be generated or removed to balance it; therefore, the model performance and results could also vary slightly.

4.2. Performance of DBT across the Classifiers

In this section, we assess whether the performance of DBT is consistent across classifiers or varies from classifier to classifier. To do this assessment, we used Kendall's 'W' statistic. When 'W' is 1, there is complete agreement over the ranks, and when 'W' is 0, there is complete disagreement over the ranks. The ranks of the DBT and the results of Kendall's test are shown in Table 14. The results show that there is agreement over the ranking of the data-balancing techniques; however, the concordance coefficient (W = 0.562) indicates that the agreement is only partial. It is observed from Table 14 that the ranking is consistent only for the None and ROSE techniques, whereas there is no consistency in the ranking of the US, OS, HS, SMOTE, and CBUS techniques. The experimental results show that the performance of None and ROSE was poor but consistent, while the performance of the US, OS, HS, SMOTE, and CBUS techniques was not consistent, although their performance was better than that of ROSE and None.
In this section of the paper, we have attempted to answer the following question: Is performance of DBT consistent across classifiers? The results show that the performance of DBT is not consistent across different classifiers.

4.3. Performance of DBT for Varying Levels of IR in the Dataset

Table 15, Table 16, Table 17, Table 18, Table 19 and Table 20 show ranks of DBT for six different classifiers for varying levels of IR in the dataset. In order to assess the performance of DBT for varying levels of IR, we used Kendall’s test statistics. The rows in the tables indicate the DB strategy and the columns indicate the range of IR in the dataset. The values in each cell indicate the rank of DBT for varying levels of IR in the dataset for a given classifier. The last row in each table shows the results of Kendall’s test statistics.
From the results of Kendall’s statistics, we can infer that:
  • For the DT classifier, there is no agreement over the rankings of DBT as the “p” value is greater than 0.05. This means that for the DT classifier, the performance of the data balancing techniques was not consistent for varying imbalance-ratio percentages.
  • For the kNN classifier, there is agreement over the rankings of the data-balancing techniques as the “p” value is less than 0.05. However, the concordance value (w) is 0.593, which indicates that there is partial agreement over the rankings. From the ranks of DBT, it is observed that the performance of None seemed consistent, whereas the performance of other DBT was different for varying imbalance-ratio percentages.
  • For the LR classifier, there is no agreement over the rankings of data-balancing techniques as the “p” value is greater than 0.05. This means that for the LR classifier, the performance of the data-balancing techniques was not consistent for varying imbalance-ratio percentages.
  • For the NB classifier, there is agreement over the rankings of data-balancing techniques as the “p” value is less than 0.05. However, the concordance value (w) is 0.686, which indicates that there is partial agreement over the rankings. From the ranks of DBT, it is observed that performance of the None was consistent, whereas the performance of other data-balancing techniques was different for varying imbalance-ratio percentages.
  • For the RF classifier, there is agreement over the rankings of data-balancing techniques as the “p” value is less than 0.05. However, the concordance value (w) is 0.539, which indicates that there is partial agreement over the rankings. From the ranks of DBT, it is observed that performance of the None, US, CBUS seemed consistent, whereas the performance of other data-balancing techniques was different for varying imbalance-ratio percentages.
  • For the SVM classifier, there is agreement over the rankings of data-balancing techniques as the “p” value is less than 0.05. However, the concordance value (w) is 0.564, which indicates that there is partial agreement over the rankings. From the ranks of DBT, it is observed that only the performance of the None was consistent, whereas the performance of other data-balancing techniques was different for varying imbalance-ratio percentages.
In this section of the paper, we have attempted to answer the following question: Is the performance of the DBT consistent for varying levels of IR in the dataset? The results of the experiment show that for all the classifiers, the performance of the None and ROSE strategy was poor and consistent for varying levels of IR in the dataset. However, performances of the other DBT were not consistent for varying levels of IR in the dataset.

5. Conclusions and Recommendation for Further Work

In this research paper, we have assessed the performance of six different DBT. The assessment was performed using six different classifiers and 25 different datasets with different levels of IR. The performance of the DBT was assessed through the performance of the classifiers, measured using the area under the ROC curve. The experimental results show that: (i) for all six classifiers, the performance of the None and ROSE strategies was poor and significantly different from the others, and there was no significant difference in the performance of the US, OS, HS, SMOTE, and CBUS techniques; (ii) the performance of None and ROSE was poor but consistent across the classifiers, whereas there was no consistency in the performance of the US, OS, HS, SMOTE, and CBUS techniques, although their performance was better than that of the ROSE and None strategies; (iii) there was no agreement over the ranks of the DBT for varying levels of IR in the dataset, except for the None and ROSE strategies; (iv) DBT help to improve the performance of the classifiers, although the performance of ROSE was not significantly different from the None strategy. Thus, from the experimental results, we may infer that DBT help to improve the performance of the classifier in classification tasks, but that the performance of the DBT is not independent of the classification algorithm or of the level of IR in the dataset. These inferences are drawn based on our experimental results.
As stated earlier in the introduction section, we assessed the performance of only data-preprocessing-level data-balancing techniques. However, there is a need to assess the performance of advanced DBT such as algorithm-level solutions, cost-based learning, and ensemble methods.

Author Contributions

Conceptualization, A.J. and S.M.M.; methodology, A.J. and S.M.M.; software, A.J. and S.M.M.; validation, A.J. and S.M.M.; formal analysis, A.J. and S.M.M.; investigation, A.J., H.E., F.K.K. and S.M.M.; resources, A.J., H.E., F.K.K. and S.M.M.; data curation, A.J. and S.M.M.; writing—original draft preparation, A.J., H.E., F.K.K. and S.M.M.; writing—review and editing, A.J., H.E., F.K.K. and S.M.M.; visualization, A.J., H.E., F.K.K. and S.M.M.; supervision, H.E., F.K.K. and S.M.M.; project administration, H.E. and F.K.K.; funding acquisition, H.E. and F.K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research project was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R300).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request.

Acknowledgments

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R300), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare that they have no conflict of interest to report regarding the present study.

References

  1. Siers, M.J.; Islam, M.Z. Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem. Inf. Syst. 2015, 51, 62–71. [Google Scholar] [CrossRef]
  2. Santos, M.S.; Abreu, P.H.; Laencina, P.J.G.; Simão, A.; Carvalho, A. A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients. J. Biomed. Inform. 2015, 58, 49–59. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Zhu, B.; Baesens, B.; Broucke, S.K.L.M. An empirical comparison of techniques for the class imbalance problem in churn prediction. Inf. Sci. 2017, 408, 84–99. [Google Scholar] [CrossRef] [Green Version]
  4. Thammasiri, D.; Delen, D.; Meesad, P.; Kasap, N. A critical assessment of imbalanced class distribution problem: The case of predicting freshmen student attrition. Expert Syst. Appl. 2014, 41, 321–330. [Google Scholar] [CrossRef] [Green Version]
  5. Hassan, A.K.I.; Abraham, A. Modeling insurance fraud detection using imbalanced data classification. In Proceedings of the 7th World Congress on Nature and Biologically Inspired Computing (NaBIC2015), Pietermaritzburg, South Africa, 18 November 2015; pp. 117–127. [Google Scholar]
  6. Hajian, S.; Ferrer, J.D.; Balleste, A.M. Discrimination prevention in data mining for intrusion and crime detection. In Proceedings of the IEEE Symposium on Computational Intelligence in Cyber Security (CICS), Paris, France, 11–15 April 2011; pp. 1–8. [Google Scholar]
  7. Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2012, 42, 463–484. [Google Scholar] [CrossRef]
  8. Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239. [Google Scholar] [CrossRef]
  9. Kotsiantis, S.; Kanellopoulos, D.; Pintelas, P. Handling imbalanced datasets: A review. GESTS Int. Trans. Comput. Sci. Eng. 2006, 30, 1–12. [Google Scholar]
  10. Kotsiantis, S.; Pintelas, P. Mixture of Expert Agents for Handling Imbalanced Data Sets. Ann. Math. Comput. TeleInformatics 2003, 1, 46–55. [Google Scholar]
  11. Tahir, M.A.; Kittler, J.; Mikolajczyk, K.; Yan, F. A multiple expert approach to the class imbalance problem using inverse random under sampling. In Proceedings of the International Workshop on Multiple Classifier Systems, Reykjavik, Iceland, 10–12 June 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 82–91. [Google Scholar]
  12. Kubat, M.; Matwin, S. Addressing the curse of imbalanced training sets: One sided selection. In Proceedings of the 14th International Conference on Machine Learning, Nashville, TN, USA, 8 July 1997; pp. 179–186. [Google Scholar]
  13. Cateni, S.; Colla, V.; Vannucci, M. A method for resampling imbalanced datasets in binary classification tasks for real-world problems. Neurocomputing 2014, 135, 32–41. [Google Scholar] [CrossRef]
  14. Yeh, C.W.; Li, D.C.; Lin, L.S.; Tsai, T.I. A Learning Approach with Under and Over-Sampling for Imbalanced Data Sets. In Proceedings of the 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Kumamoto, Japan, 10–14 July 2016; pp. 725–729. [Google Scholar]
  15. Lunardon, N.; Menardi, G.; Torelli, N. ROSE: A Package for Binary Imbalanced Learning. R J. 2014, 6, 79–89. [Google Scholar] [CrossRef] [Green Version]
  16. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  17. Chawla, N.V.; Lazarevic, A.; Hall, L.O.; Bowyer, K.W. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, Cavtat-Dubrovnik, Dubrovnik, Croatia, 22–26 September 2003; pp. 107–119. [Google Scholar]
  18. Hu, S.; Liang, Y.; Ma, L.; He, Y. MSMOTE: Improving classification performance when training data is imbalanced. In Proceedings of the Second International Workshop on Computer Science and Engineering, Qingdao, China, 28–30 October 2009; pp. 13–17. [Google Scholar]
  19. Barua, S.; Islam, M.M.; Yao, X.; Murase, K. MWMOTE—Majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 2012, 26, 405–425. [Google Scholar] [CrossRef]
  20. Lin, W.; Tsai, C.; Hu, Y.; Jhang, J. Clustering-based undersampling in class-imbalanced data. Inf. Sci. 2017, 409, 17–26. [Google Scholar] [CrossRef]
  21. Jadhav, A. Clustering Based Data Preprocessing Technique to Deal with Imbalanced Dataset Problem in Classification Task. In Proceedings of the IEEE Punecon, Pune, India, 30 November–2 December 2018; pp. 1–7. [Google Scholar]
  22. Fan, W.; Stolfo, S.J.; Zhang, J.; Chan, P.K. AdaCost: Misclassification cost-sensitive boosting. In Proceedings of the Sixteenth International Conference on Machine Learning, San Francisco, CA, USA, 27–30 June 1999; pp. 99–105. [Google Scholar]
  23. Zhou, Z.; Liu, X. Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem. IEEE Trans. Knowl. Data Eng. 2006, 18, 63–77. [Google Scholar] [CrossRef]
  24. Domingos, P. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 15–18 August 1999; pp. 155–164. [Google Scholar]
  25. López, V.; Río, S.D.; Benítez, J.M.; Herrera, F. Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data. Fuzzy Sets Syst. 2015, 258, 5–38. [Google Scholar] [CrossRef]
  26. Sun, Y.; Kamel, M.S.; Wong, A.K.; Wang, Y. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognit. 2007, 40, 3358–3378. [Google Scholar] [CrossRef]
  27. Chen, Z.Y.; Shu, P.; Sun, M. A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. Eur. J. Oper. Res. 2012, 223, 461–472. [Google Scholar] [CrossRef]
  28. Zhang, Y.; Fu, P.; Liu, W.; Chen, G. Imbalanced data classification based on scaling kernel-based support vector machine. Neural Comput. Appl. 2014, 25, 927–935. [Google Scholar] [CrossRef]
  29. Kim, S.; Kim, H.; Namkoong, Y. Ordinal Classification of Imbalanced Data with Application in Emergency and Disaster Information Service. IEEE Intell. Syst. 2016, 31, 50–56. [Google Scholar] [CrossRef]
  30. Godoy, M.D.P.; Fernández, A.; Rivera, A.J.; Jesus, M.J.D. Analysis of an evolutionary RBFN design algorithm, CO2RBFN, for imbalanced data sets. Pattern Recognit. Lett. 2010, 31, 2375–2388. [Google Scholar] [CrossRef]
  31. Seiffert, C.; Khoshgoftaar, T.M.; Hulse, J.V.; Napolitano, A. RUSBoost: A Hybrid Approach to Alleviating Class Imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2010, 40, 185–197. [Google Scholar] [CrossRef]
  32. Wang, S.; Yao, X. Diversity analysis on imbalanced data sets by using ensemble models. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009; pp. 324–331. [Google Scholar]
  33. Barandela, R.; Valdovinos, R.M.; Sánchez, J.S. New applications of ensembles of classifiers. Pattern Anal. Appl. 2003, 6, 245–256. [Google Scholar] [CrossRef]
  34. Liao, J.J.; Shih, C.H.; Chen, T.F.; Hsu, M.F. An ensemble-based model for two-class imbalanced financial problem. Econ. Model. 2014, 37, 175–183. [Google Scholar] [CrossRef]
  35. Susan, S.; Kumar, A. The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art. Eng. Rep. 2021, 3, e12298. [Google Scholar] [CrossRef]
  36. Halimu, C.; Kasem, A. Split balancing (sBal)—A data preprocessing sampling technique for ensemble methods for binary classification in imbalanced datasets. In Computational Science and Technology; Springer: Singapore, 2021; pp. 241–257. [Google Scholar]
  37. Tolba, M.; Ouadfel, S.; Meshoul, S. Hybrid ensemble approaches to online harassment detection in highly imbalanced data. Expert Syst. Appl. 2021, 175, 114751. [Google Scholar] [CrossRef]
  38. Tao, X.; Zheng, Y.; Chen, W.; Zhang, X.; Qi, L.; Fan, Z.; Huang, S. SVDD-based weighted oversampling technique for imbalanced and overlapped dataset learning. Inf. Sci. 2022, 588, 13–51. [Google Scholar] [CrossRef]
  39. Islam, A.; Belhaouari, S.B.; Rehman, A.U.; Bensmail, H. KNNOR: An oversampling technique for imbalanced datasets. Appl. Soft Comput. 2022, 115, 108288. [Google Scholar] [CrossRef]
  40. López, V.; Fernández, A.; Torres, J.G.M.; Herrera, F. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics. Expert Syst. Appl. 2012, 39, 6585–6608. [Google Scholar] [CrossRef]
  41. Burez, J.; Poel, V.D. Handling class imbalance in customer churn prediction. Expert Syst. Appl. 2009, 36, 4626–4636. [Google Scholar] [CrossRef] [Green Version]
  42. Alcalá-Fdez, J.; Fernández, A.; Luengo, J.; Derrac, J.; García, S.; Sánchez, L.; Herrera, F. KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Log. Soft Comput. 2011, 17, 255–287. [Google Scholar]
  43. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159. [Google Scholar] [CrossRef] [Green Version]
  44. Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; Kenkel, B.; R Core Team. Classification and Regression Training. 2022. Available online: https://cran.r-project.org/web/packages/caret/caret.pdf (accessed on 3 November 2021).
  45. Friedman, M. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat. 1940, 11, 86–92. [Google Scholar] [CrossRef]
  46. Brown, I.; Mues, C. An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst. Appl. 2012, 39, 3446–3453. [Google Scholar] [CrossRef] [Green Version]
  47. Nemenyi, P. Distribution-Free Multiple Comparisons. Ph.D. Thesis, University of Princeton, Princeton, NJ, USA, 1963. [Google Scholar]
  48. Kendall, M.G.; Smith, B.B. The Problem of m Rankings. Ann. Math. Stat. 1939, 10, 275–287. [Google Scholar] [CrossRef]
  49. Jadhav, A.; Pramod, D.; Ramanathan, K. Comparison of performance of data imputation methods for numeric dataset. Appl. Artif. Intell. 2019, 33, 913–933. [Google Scholar] [CrossRef]
Table 1. Dataset Information.

Sr. No. | Dataset Name | No. of Samples | No. of Features | Imbalance Ratio
Imbalance ratio from 1–10
1 | Pima | 768 | 8 | 1.87
2 | ecoli1 | 336 | 7 | 3.36
3 | segment0 | 2308 | 19 | 6.02
4 | yeast3 | 1484 | 8 | 8.1
5 | vowel0 | 988 | 13 | 9.98
Imbalance ratio from 10–20
6 | glass2 | 214 | 9 | 11.59
7 | yeast-1_vs_7 | 459 | 7 | 14.3
8 | ecoli4 | 336 | 7 | 15.8
9 | page-blocks-1-3_vs_4 | 472 | 10 | 15.86
10 | glass-0-1-6_vs_5 | 184 | 9 | 19.44
Imbalance ratio from 20–30
11 | yeast-1-4-5-8_vs_7 | 693 | 8 | 22.1
12 | yeast-2_vs_8 | 482 | 8 | 23.1
13 | yeast4 | 1484 | 8 | 28.1
14 | poker-9_vs_7 | 244 | 10 | 29.5
15 | winequality-red-4 | 1599 | 11 | 29.17
Imbalance ratio from 30–40
16 | yeast-1-2-8-9_vs_7 | 947 | 8 | 30.57
17 | winequality-white-9_vs_4 | 168 | 11 | 32.6
18 | yeast5 | 1484 | 8 | 32.73
19 | winequality-red-8_vs_6 | 656 | 11 | 35.44
20 | ecoli-0-1-3-7_vs_2-6 | 281 | 7 | 39.14
Imbalance ratio from 40–50
21 | abalone-21_vs_8 | 581 | 8 | 40.5
22 | yeast6 | 1484 | 8 | 41.4
23 | winequality-white-3_vs_7 | 900 | 11 | 44
24 | winequality-red-8_vs_6-7 | 855 | 11 | 46.5
25 | abalone-19_vs_10-11-12-13 | 1622 | 8 | 49.69
Table 2. Performance of DBT for DT classifier.
DatasetNoneUSOSHSROSESMOTECBUS
Pima0.6352830.6767920.6807550.6713210.6801890.735660.701792
ecoli10.8333330.9411760.9235290.9411760.8921570.8607840.921569
segment00.9769230.9962030.9769230.9769230.8379750.9846150.980915
yeast30.8948860.8645830.8854170.8797350.7916670.8934660.883996
vowel00.9972070.9720670.9972070.9804470.9273740.9694290.963687
glass20.50.7179490.6025640.756410.7179490.6410260.628205
yeast-1_vs_70.50.5225490.5656860.5470590.5058820.5421570.921875
ecoli40.9841270.9365080.9841270.8591270.8921570.9841270.921569
glass-0-1-6_vs_510.985714110.95714310.957143
page-blocks-1-3_vs_410.994318110.89204510.982955
yeast-1-4-5-8_vs_70.50.6022730.5075760.5303030.5113640.553030.549242
yeast-2_vs_80.50.6847830.864130.8586960.8750.7065220.728261
yeast40.50.7555940.5912590.7685310.5052450.7272730.795105
winequality-red-40.50.7349510.5386730.524110.5690940.5595470.579935
poker-9_vs_70.50.8085110.5212770.9787230.8297870.5212770.904255
yeast-1-2-8-9_vs_70.50.5628420.5724040.566940.5027320.633880.618852
winequality-white-9_vs_40.50.92187510.5468750.9531250.968750.921875
yeast50.9340280.8940970.8750.9322920.9479170.8715280.883681
winequality-red-8_vs_60.50.5429140.6606790.6337330.6806390.6367270.715569
ecoli-0-1-3-7_vs_2-60.50.8240740.50.5277780.5092590.5092590.648148
abalone-21_vs_80.50.991150.5176990.9867260.9646020.5265491
yeast60.50.8956990.7073650.8519530.501730.8450320.733564
winequality-white-3_vs_70.50.8238640.5085230.5198860.7329550.5113640.681818
winequality-red-8_vs_6-70.50.5429140.6606790.6337330.6806390.6367270.715569
abalone-19_vs_10-11-12-130.50.6289310.5691820.526730.6116350.5597480.5
Average AUC Score0.650230.7928930.7284260.7599680.738810.7351390.793583
Mean Rank5.66(7)3.42(2)3.86(5)3.70(4)4.34(6)3.64(3)3.38(1)
Friedman test statisticsFriedman chi-squared = 21.081, df = 6, p-value = 0.001774
Table 3. Performance of DBT for kNN classifier.
DatasetNoneUSOSHSROSESMOTECBUS
Pima0.6797170.6301890.6934910.6151890.6524530.6684910.692358
ecoli10.9333330.9411760.9607840.950980.950980.9274510.95098
segment00.9897760.9113920.9695230.9454720.7367090.9821810.903797
yeast30.8205490.8693180.8631630.8593750.8754730.8821020.878788
vowel00.9444440.9329610.9944130.9608940.9664810.944134
glass20.5128210.7948720.8846150.6538460.6666670.8974360.628205
yeast-1_vs_70.50.7039220.7215690.6686270.525490.7098040.515625
ecoli410.9603170.9682540.9682540.950980.9920630.95098
glass-0-1-6_vs_50.50.6142860.5285710.9285710.8714290.9857140.628571
page-blocks-1-3_vs_40.7886360.7806820.8545450.8488640.60.9715910.792045
yeast-1-4-5-8_vs_70.50.5606060.6704550.6363640.6174240.571970.583333
yeast-2_vs_80.8750.8097830.8315220.7826090.8695650.7934780.869565
yeast40.5982520.7370630.8062940.7923080.7835660.8993010.724825
winequality-red-40.50.5362460.5114890.6302590.5042070.5221680.625243
poker-9_vs_70.50.5212770.9787230.936170.9787230.9574470.521277
yeast-1-2-8-9_vs_70.50.6803280.5314210.5819670.5204920.5846990.510929
winequality-white-9_vs_40.50.8281250.593750.6250.843750.5781250.515625
yeast50.81250.9548610.9809030.9791670.9618060.9843750.956597
winequality-red-8_vs_60.50.5119760.6157680.5678640.6137720.5439120.61477
ecoli-0-1-3-7_vs_2-60.50.5277780.5185190.5185190.5462960.5185190.611111
abalone-21_vs_80.750.9159290.991150.9867260.9955750.9778760.938053
yeast60.6411270.8208110.906080.8956990.8541770.8277310.885319
winequality-white-3_vs_70.50.6761360.5994320.7102270.6164770.5710230.715909
winequality-red-8_vs_6-70.50.5119760.6157680.5678640.6137720.5439120.61477
abalone-19_vs_10-11-12-130.50.5613210.5487420.5283020.6477990.5974840.578616
Average AUC Score0.6538460.7317330.7655580.7655650.7505630.7795540.726057
Mean Rank5.92(7)4.66(6)2.80(1)3.74(4)3.66(3)3.24(2)3.98(5)
Friedman test statisticsFriedman chi-squared = 34.192, df = 6, p-value = 6.177 × 10−6
Table 4. Performance of DBT for LR.
DatasetNoneUSOSHSROSESMOTECBUS
Pima0.6941510.7001890.7096230.7284910.6901890.7096230.690755
ecoli10.9705880.9117650.9215690.9117650.9313730.9215690.921569
segment00.9808180.9784810.9782860.9810130.9604670.9846150.991139
yeast30.8499050.8868370.8806820.8863640.8645830.884470.882576
vowel00.9416510.991620.9638420.9219430.8885790.9416510.96648
glass20.5512820.7179490.7692310.756410.7051280.7692310.74359
yeast-1_vs_70.50.8401960.8343140.8401960.7862750.8401960.78125
ecoli40.9920630.8353170.9920630.9920630.9313730.9920630.921569
glass-0-1-6_vs_50.9857140.90.9714290.5428570.9714290.9714290.871429
page-blocks-1-3_vs_40.60.9772730.9886360.7886360.7772730.7886360.9375
yeast-1-4-5-8_vs_70.50.5492420.5833330.6325760.6212120.5984850.560606
yeast-2_vs_80.750.6902170.8043480.8043480.8097830.8043480.722826
yeast40.5017480.6283220.6853150.6870630.6870630.6940560.674825
winequality-red-40.5016180.6768610.770550.7334950.7156960.7770230.76877
poker-9_vs_70.50.8404260.8617020.8191490.8617020.872340.776596
yeast-1-2-8-9_vs_70.50.7800550.6857920.6215850.6379780.6967210.647541
winequality-white-9_vs_40.50.906250.50.9843750.968750.50.78125
yeast50.8715280.9843750.9774310.9114580.9722220.9774310.982639
winequality-red-8_vs_60.5029940.614770.5678640.5888220.5528940.5768460.856287
ecoli-0-1-3-7_vs_2-60.5185190.8611110.5185190.5370370.5555560.5185190.583333
abalone-21_vs_810.9601770.9690270.9646020.9734510.991150.938053
yeast60.6428570.7913990.815620.80870.812160.8208110.789669
winequality-white-3_vs_70.50.6988640.6732950.6732950.8011360.5426140.65625
winequality-red-8_vs_6-70.5029940.614770.5678640.5888220.5528940.5768460.856287
abalone-19_vs_10-11-12-130.50.7547170.7185530.723270.6933960.723270.783019
Average AUC Score0.6743370.8036470.7883560.7771330.7889020.7789580.803432
Mean Rank5.64(7)3.70(4)3.64(2)3.68(3)4.28(6)3.06(1)4.00(5)
Friedman test statisticsFriedman chi-squared = 21.978, df = 6, p-value = 0.001222
Table 5. Performance of DBT for NB classifier.
DatasetNoneUSOSHSROSESMOTECBUS
Pima0.6930190.7084910.6951890.6374530.6597170.7179250.718491
ecoli10.7705880.950980.9176470.8843140.9176470.8843140.85098
segment00.977020.9542360.9706910.9758520.8443040.9782860.954236
yeast30.5293560.8925190.8167610.8366480.865530.8361740.859848
vowel00.9388580.9469270.9720670.9608940.9359090.974860.969274
glass20.8461540.7179490.7179490.7179490.7179490.7692310.717949
yeast-1_vs_70.50.7107840.5774510.5715690.5176470.5715690.859375
ecoli40.6250.9523810.750.851190.9176470.8750.85098
glass-0-1-6_vs_50.5142860.60.5142860.5142860.57142910.514286
page-blocks-1-3_vs_40.5886360.93750.9659090.9488640.6829550.9772730.865909
yeast-1-4-5-8_vs_70.50.5871210.50.5037880.5037880.50.636364
yeast-2_vs_80.50.635870.750.750.739130.750.516304
yeast40.50.6220280.6290210.6272730.6272730.6720280.643357
winequality-red-40.5902910.6252430.5932040.5867310.5754050.6059870.642395
poker-9_vs_70.50.9255320.5106380.5319150.9255320.9148940.787234
yeast-1-2-8-9_vs_70.50.6748630.50.5081970.5696720.5027320.520492
winequality-white-9_vs_410.9062510.968750.8750.9843750.859375
yeast50.50.8923610.8645830.8645830.8593750.8072920.946181
winequality-red-8_vs_60.514970.6826350.5828340.5678640.5379240.5828340.613772
ecoli-0-1-3-7_vs_2-60.50.9074070.50.50.50.50.5
abalone-21_vs_80.9734510.907080.9380530.9469030.9424780.9380530.933628
yeast60.50.8576370.8502220.8536830.7753340.7822540.793129
winequality-white-3_vs_70.7443180.7926140.7159090.8494320.7443180.6960230.596591
winequality-red-8_vs_6-70.514970.6826350.5828340.5678640.5379240.5828340.613772
abalone-19_vs_10-11-12-130.50.6462260.5581760.5487420.5141510.5251570.561321
Average AUC Score0.6328370.7886910.7189370.722990.7143210.7571640.73301
Mean Rank5.54(7)2.72(1)3.98(4)4.02(5)4.60(6)3.44(2)3.70(3)
Friedman test statisticsFriedman chi-squared = 27.272, df = 6, p-value = 0.0001288
Table 6. Performance of DBT for RF.
DatasetNoneUSOSHSROSESMOTECBUS
Pima0.7018870.7317920.7201890.7040570.6863210.7284910.736792
ecoli10.90.9313730.9470590.9411760.9117650.9372550.95098
segment00.976923110.9923080.8658230.9923081
yeast30.8442230.9232950.8754730.8873110.8276520.9029360.904356
vowel00.9972070.9776540.9972070.9972070.9581010.9972070.980447
glass20.50.8333330.6282050.8076920.7179490.6282050.74359
yeast-1_vs_70.6607840.7450980.5833330.5774510.5058820.6549020.84375
ecoli40.9920630.9603170.9920630.9920630.9117650.9920630.912698
glass-0-1-6_vs_50.50.8428570.510.95714310.957143
page-blocks-1-3_vs_411110.9772730.90.988636
yeast-1-4-5-8_vs_70.50.5568180.5037880.5757580.5113640.5189390.556818
yeast-2_vs_80.8750.7010870.7445650.7445650.6521740.7336960.711957
yeast40.5482520.719580.5982520.6860140.5052450.7342660.804895
winequality-red-40.50.6139160.5032360.5129450.6336570.5322010.68123
poker-9_vs_70.50.9468090.50.50.9255320.50.595745
yeast-1-2-8-9_vs_70.5806010.6051910.5833330.5778690.5109290.5724040.610656
winequality-white-9_vs_40.50.9843750.50.50.98437510.84375
yeast50.8732640.9079860.8732640.9322920.9513890.8732640.958333
winequality-red-8_vs_60.6666670.7694610.6636730.6546910.7285430.6516970.751497
ecoli-0-1-3-7_vs_2-60.5092590.9166670.50.50.5277780.50.564815
abalone-21_vs_80.750.946903110.9867260.991150.982301
yeast60.7125560.8524470.7073650.8519530.5830450.7822540.816115
winequality-white-3_vs_70.6193180.6590910.50.50.7471590.5028410.576705
winequality-red-8_vs_6-70.6666670.7694610.6636730.6546910.7285430.6516970.751497
abalone-19_vs_10-11-12-130.50.5141510.50.5078620.5503140.5141510.545597
Average AUC Score0.6949870.8163870.7033870.7439160.7538580.7516770.790812
Mean Rank5.00(7)2.84(2)4.56(5)3.94(3)4.72(6)4.22(4)2.72(1)
Friedman test statisticsFriedman chi-squared = 27.41, df = 6, p-value = 0.0001213
Table 7. Performance of DBT for SVM.
DatasetNoneUSOSHSROSESMOTECBUS
Pima0.6752830.7007550.7051890.6913210.6907550.6857550.681887
ecoli10.9176470.8882350.9117650.9117650.9313730.9117650.911765
segment00.9923080.9924050.9974680.9974680.9630960.9910420.994937
yeast30.8461170.8787880.8806820.8825760.8664770.8806820.878788
vowel00.8833020.9636870.9610490.974860.8997520.9860340.96648
glass20.50.8205130.7692310.7820510.6153850.7692310.705128
yeast-1_vs_70.50.6441180.8519610.7686270.7803920.8401960.90625
ecoli40.9920630.9523810.9920630.9920630.9313730.9920630.992063
glass-0-1-6_vs_510.85714310.9571430.92857110.9
page-blocks-1-3_vs_40.60.7715910.9602270.9829550.7602270.9602270.823864
yeast-1-4-5-8_vs_70.50.5984850.5795450.6098480.6136360.5871210.568182
yeast-2_vs_80.8750.8750.7010870.7119570.864130.6956520.76087
yeast40.50.6283220.6888110.6958040.6853150.6940560.733566
winequality-red-40.50.686570.7737860.7318770.7108410.7770230.778479
poker-9_vs_70.50.8191490.8191490.8085110.8191490.8404260.829787
yeast-1-2-8-9_vs_70.50.6065570.6885250.6379780.573770.6857920.60929
winequality-white-9_vs_410.8906250.50.9843750.9531250.50.90625
yeast50.8732640.9513890.9774310.9722220.9704860.9774310.96875
winequality-red-8_vs_60.50.7634730.5678640.5858280.5409180.5678640.850299
ecoli-0-1-3-7_vs_2-60.50.620370.5185190.5370370.5833330.5185190.601852
abalone-21_vs_80.750.9955750.9867260.9867260.9867260.9867260.986726
yeast60.50.8225410.815620.810430.8190810.817350.80351
winequality-white-3_vs_70.50.5909090.6676140.6619320.5852270.5369320.627841
winequality-red-8_vs_6-70.50.7634730.5678640.5858280.5409180.5678640.850299
abalone-19_vs_10-11-12-130.50.7735850.7012580.702830.6933960.7059750.831761
Average AUC Score0.6761990.7942260.7833370.7985610.7722980.7790290.818745
Mean Rank5.90(7)3.84(5)3.48(3)3.16(1)4.60(6)3.62(4)3.40(2)
Friedman test statisticsFriedman chi-squared = 30.856, df = 6, p-value = 2.7 × 10−5
Table 8. Post hoc analysis using Nemenyi multiple comparison test for DT.
NoneUSOSHSROSESMOTE
US0.004607NANANANANA
OS0.0503430.991424NANANANA
HS0.0226970.9993090.999974NANANA
ROSE0.3173590.7415750.9864010.942875NANA
SMOTE0.016470.9998290.99982910.913892NA
CBUS0.00359610.9864010.9985210.7009120.999549
Table 9. Post hoc analysis using Nemenyi multiple comparison test for kNN.
NoneUSOSHSROSESMOTE
US0.37547NANANANANA
OS6.82 × 10−60.037722NANANANA
HS0.006620.7415750.721504NANANA
ROSE0.0040730.6584190.7980571NANA
SMOTE0.0002330.2323290.9914240.9831780.993325NA
CBUS0.0251910.9244230.4594070.9997160.9985210.89016
Table 10. Post hoc analysis using Nemenyi multiple comparison test for LR.
NoneUSOSHSROSESMOTE
US0.025191NANANANANA
OS0.0183521NANANANA
HS0.02269711NANANA
ROSE0.2814610.9644040.9428750.958006NANA
SMOTE0.0004820.9428750.9644040.9508410.416662NA
CBUS0.1022180.9989750.9971370.9985210.9993090.721504
Table 11. Post hoc analysis using Nemenyi multiple comparison test for NB classifier.
NoneUSOSHSROSESMOTE
US8.02 × 10−5NANANANANA
OS0.1407030.37547NANANANA
HS0.1636160.336191NANANA
ROSE0.7215040.0341690.9508410.964404NANA
SMOTE0.0105480.9024730.9750650.9644040.481232NA
CBUS0.0415880.6798610.9993090.9985210.7610610.999549
Table 12. Post hoc analysis using Nemenyi multiple comparison test for RF.
NoneUSOSHSROSESMOTE
US0.007451NANANANANA
OS0.9914240.072553NANANANA
HS0.5924480.5478040.950841NANANA
ROSE0.9993090.0341690.9999740.862863NANA
SMOTE0.8628630.2644390.997920.9993090.983178NA
CBUS0.0035960.9999950.0415880.4166620.0183520.176038
Table 13. Post hoc analysis using Nemenyi multiple comparison test for SVM.
NoneUSOSHSROSESMOTE
US0.013214NANANANANA
OS0.0014550.997137NANANANA
HS0.0001480.9244230.998521NANANA
ROSE0.336190.8769540.5254940.217261NANA
SMOTE0.0035960.9998290.9999880.9891330.679861NA
CBUS0.0008450.99142410.9997160.4378650.999829
Table 14. Ranks of DBT for different classifiers.

DBT | DT | kNN | LR | NB | RF | SVM | Mean Rank
None | 7 | 7 | 7 | 7 | 7 | 7 | 7
US | 2 | 6 | 4 | 1 | 2 | 5 | 3.333333
OS | 5 | 1 | 2 | 4 | 5 | 3 | 3.333333
HS | 4 | 4 | 3 | 5 | 3 | 1 | 3.333333
ROSE | 6 | 3 | 6 | 6 | 6 | 6 | 5.5
SMOTE | 3 | 2 | 1 | 2 | 4 | 4 | 2.666667
CBUS | 1 | 5 | 5 | 3 | 1 | 2 | 2.833333
Kendall's statistics: W = 0.562, chi-sq = 20.2, p-value = 0.00254
Table 15. Ranks of DBT for DT for varying levels of IR.

DBT | 01–10 | 11–20 | 21–30 | 31–40 | 41–50 | Mean
None | 6 | 4 | 7 | 7 | 7 | 6.2
US | 3 | 5 | 2 | 5 | 1 | 3.2
OS | 1 | 1.5 | 6 | 4 | 6 | 3.7
HS | 5 | 3 | 3 | 6 | 4 | 4.2
ROSE | 7 | 7 | 4 | 2 | 3 | 4.6
SMOTE | 2 | 1.5 | 5 | 3 | 5 | 3.3
CBUS | 4 | 6 | 1 | 1 | 2 | 2.8
Kendall's statistics: W = 0.282, chi-sq = 8.46, p-value = 0.206
Table 16. Ranks of DBT for kNN for varying levels of IR.

DBT | 01–10 | 11–20 | 21–30 | 31–40 | 41–50 | Mean
None | 5 | 7 | 7 | 7 | 7 | 6.6
US | 7 | 4 | 6 | 4 | 6 | 5.4
OS | 1 | 2 | 1 | 2 | 2.5 | 1.7
HS | 6 | 3 | 2.5 | 4 | 4 | 3.9
ROSE | 4 | 5 | 2.5 | 1 | 1 | 2.7
SMOTE | 2 | 1 | 4 | 4 | 5 | 3.2
CBUS | 3 | 6 | 5 | 6 | 2.5 | 4.5
Kendall's statistics: W = 0.593, chi-sq = 17.8, p-value = 0.00679
Table 17. Ranks of DBT for LR for varying levels of IR.

DBT | 01–10 | 11–20 | 21–30 | 31–40 | 41–50 | Mean
None | 6 | 5 | 7 | 7 | 7 | 6.4
US | 3 | 4 | 6 | 1 | 2 | 3.2
OS | 5 | 1 | 3.5 | 6 | 5 | 4.1
HS | 4 | 3 | 3.5 | 3 | 3.5 | 3.4
ROSE | 7 | 6.5 | 2 | 4 | 3.5 | 4.6
SMOTE | 1 | 2 | 1 | 5 | 1 | 2
CBUS | 2 | 6.5 | 5 | 2 | 6 | 4.3
Kendall's statistics: W = 0.401, chi-sq = 12, p-value = 0.0615
Table 18. Ranks of DBT for NB classifier for varying levels of IR.

DBT | 01–10 | 11–20 | 21–30 | 31–40 | 41–50 | Mean
None | 7 | 7 | 7 | 7 | 7 | 7
US | 2 | 2 | 3 | 1 | 1 | 1.8
OS | 3.5 | 3.5 | 5.5 | 3 | 3 | 3.7
HS | 5 | 5.5 | 5.5 | 4 | 2 | 4.4
ROSE | 6 | 5.5 | 4 | 6 | 6 | 5.5
SMOTE | 1 | 1 | 2 | 5 | 5 | 2.8
CBUS | 3.5 | 3.5 | 1 | 2 | 4 | 2.8
Kendall's statistics: W = 0.686, chi-sq = 20.6, p-value = 0.00217
Table 19. Ranks of DBT for RF for varying levels of IR.

DBT | 01–10 | 11–20 | 21–30 | 31–40 | 41–50 | Mean
None | 6 | 5 | 7 | 4 | 6.5 | 5.7
US | 3 | 2 | 2 | 2 | 1 | 2
OS | 2 | 6 | 6 | 6 | 6.5 | 5.3
HS | 5 | 1 | 3 | 6 | 4 | 3.8
ROSE | 7 | 7 | 5 | 3 | 2.5 | 4.9
SMOTE | 4 | 4 | 4 | 6 | 5 | 4.6
CBUS | 1 | 3 | 1 | 1 | 2.5 | 1.7
Kendall's statistics: W = 0.539, chi-sq = 16.2, p-value = 0.0129
Table 20. Ranks of DBT for SVM classifier for varying levels of IR.

DBT | 01–10 | 11–20 | 21–30 | 31–40 | 41–50 | Mean
None | 7 | 6 | 7 | 7 | 7 | 6.8
US | 5 | 5 | 5 | 3.5 | 1 | 3.9
OS | 2 | 1 | 6 | 3.5 | 4 | 3.3
HS | 1 | 3 | 4 | 1 | 3 | 2.4
ROSE | 6 | 7 | 3 | 6 | 6 | 5.6
SMOTE | 3 | 2 | 2 | 5 | 5 | 3.4
CBUS | 4 | 4 | 1 | 2 | 2 | 2.6
Kendall's statistics: W = 0.564, chi-sq = 16.9, p-value = 0.00963
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
