Article

Quantitative Analysis and Prediction of Academic Performance of Students Using Machine Learning

1 School of Fine Art, Shandong University of Technology, Zibo 255000, China
2 School of Civil Engineering and Geomatics, Shandong University of Technology, Zibo 255000, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(16), 12531; https://0-doi-org.brum.beds.ac.uk/10.3390/su151612531
Submission received: 24 July 2023 / Revised: 12 August 2023 / Accepted: 16 August 2023 / Published: 18 August 2023
(This article belongs to the Section Sustainable Education and Approaches)

Abstract

Academic performance evaluation is essential to enhance educational effectiveness and improve the quality and level of education. However, evaluating academic performance is difficult because the education process and learning behavior are complex and nonlinear. Recently, machine learning technology has been adopted in Educational Data Mining (EDM) to predict and evaluate students’ academic performance. This study developed a quantitative prediction model of academic performance and, based on the collected educational data, investigated the performance of various machine learning algorithms and the influencing factors. The results show that machine learning provides an excellent tool to characterize educational behavior and represent the nonlinear relationship between academic performance and its influencing factors. Although the methods differ somewhat in performance, all of them can capture the complex and implicit educational laws and behavior. Furthermore, machine learning methods that fully consider various factors have better prediction and generalization performance. To characterize the educational laws well and evaluate academic performance accurately, it is necessary to consider as many influencing factors as possible in the machine learning model.

1. Introduction

Education plays a significant and indispensable role for individuals and human society. Meanwhile, education also affects a country’s and society’s sustainable development [1,2]. Therefore, education has received extensive attention worldwide from society, the state, and the family. In order to improve the quality and level of education, each country has put forward different educational policies and invested a lot of educational funds. How to evaluate education is very critical to the formulation, implementation, and reform of educational policies. The academic performance of students is one of the most essential components of the education evaluation. The academic performance of the student has gained significant attention in past decades. However, evaluating students’ academic performance is challenging due to the complexity of education, which is affected by various factors such as social status, school, family, friends, instructors, classroom environment, cultural background, etc. Moreover, the influencing factors of academic performance depend on the person and their education environment. For example, Subasi et al. (2023) and Doyumğaç et al. (2021) investigated and studied the students’ perceptions of online/distance education through online photovoice methodology during the COVID-19 pandemic [3,4]. Scientific and reasonable evaluation of academic performance is essential to students, schools, and the whole education system.
The early identification of students at risk of underperformance, provision of assistance, and enhancement of the quality of teaching and learning necessitates using quantitative analysis and assessment for academic performance. In order to reveal the complex relationship between academic performance and its influencing factors, data mining was applied to the education system, and Education Data Mining (EDM) was defined and developed in the last decades [5,6]. EDM, a specialized branch within the field of data mining, focuses on analyzing educational data to gain insights into teaching and learning patterns in the education system. Furthermore, educational data mining (EDM) has emerged as a valuable tool for uncovering hidden patterns in educational data, assessing academic performance, and enhancing students’ academic achievements [7]. EDM has been extensively employed to explore the affective aspects of academic performance and capture educational behaviors during teaching and learning processes [8,9,10,11]. Baker (2010) provided an overview of EDM, encompassing five research directions: relationship identification, model discovery, data distillation for human judgment, clustering analysis, and prediction modeling [11]. Among the five directions of EDM, prediction (evaluation) of academic performance is the most popular application [12,13,14,15]. Thai-nghe et al. (2010) proposed an innovative recommendation system that evaluates students’ academic performance by extracting education-related data [16]. They developed a personalized multiple linear regression model to forecast student achievement in American universities [17]. On the other hand, Polyzou and Karypis (2016) devised a distributed linear model based on historical grades to evaluate students’ academic progress [18]. Bao et al. (2023) adopted the logistic binary regression to predict the student’s academic performance [19]. 
However, it is difficult to characterize and predict academic performance using traditional mathematical and statistical methods. Machine learning has been commonly used to predict and evaluate educational performance in EDM.
With advancements in artificial intelligence and computer science, various machine learning techniques have been widely employed in engineering systems [20,21,22,23,24]. Additionally, machine learning has garnered significant attention in educational data mining (EDM) and is extensively used for predicting students’ academic performance. For instance, decision tree analysis has been utilized to examine the correlation between academic performance and its influencing factors, enabling the prediction of academic success based on educational data [25]. Bhardwaj and Pal (2011) employed the Naive Bayes classification algorithm to forecast students’ academic performance [26]. Arsad et al. (2012) applied neural network algorithms within EDM to predict final grades by leveraging first-year grades as input [27]. Hamsa et al. (2016) developed a hybrid model combining fuzzy genetic algorithms with decision trees to predict students’ academic performance [28]. MohammadNoor et al. (2020) proposed a bagging ensemble model with high accuracy for predicting students’ academic success [29]. Zhang et al. (2020) utilized tree-based machine learning models to evaluate bachelor students’ academic performance within an engineering department in China [30]. To enhance predictive models further, Juan and Héctor (2022) introduced a stacking ensemble technique that achieves high accuracy in identifying student dropout risks [31]. Random forest classifiers have been constructed to predict the academic success and major of university students [32]. Vidhya and Vadivu (2021) devised a novel two-level ensemble classification model to analyze and categorize student data [33]. Meanwhile, Sinem and Sevdaa (2022) developed an innovative hybrid ensemble learning algorithm to forecast students’ future academic achievements [34]. Bansal et al. developed an automated student performance estimation system based on deep machine learning approaches and students’ academic records [35]. The machine learning algorithm is an essential component of EDM and has received increasing attention in educational evaluation. As new machine learning algorithms continue to emerge and are applied to EDM, the performance of EDM has improved significantly. However, the characteristics, scope of application, and data requirements of machine learning algorithms differ greatly, and understanding their advantages and disadvantages is important for improving EDM. In addition, it is also crucial to use different machine learning algorithms to analyze and evaluate the educational process in order to characterize educational behavior and enhance the level and quality of education.
This study used various machine learning methods to capture the complex relationship between academic performance and its influencing factors, and investigated the performance of these methods. Furthermore, feature importance and sensitivity analysis were utilized to analyze and evaluate the educational process. The remainder of this paper is structured as follows. Firstly, academic performance evaluation and various machine learning algorithms are introduced and reviewed in Section 2. Secondly, the idea, theory, framework, and procedure of the machine learning-based academic performance model are presented briefly in Section 3. Then, the model is applied to the collected educational dataset, and the academic performance, the features of various factors, and their comparison are investigated and discussed in Section 4. Finally, some conclusions are drawn in Section 5.

2. Academic Performance Evaluation and Machine Learning

Academic performance evaluation is essential to EDM and critical to evaluating and improving educational quality. Characterizing educational behavior is the core content of the academic performance evaluation. Machine learning has been adopted to evaluate and predict students’ academic performance due to the limitations of traditional mathematical methods and the complex relationship between academic performance and its influencing factors.

2.1. Academic Performance Evaluation

Academic performance is critical for evaluating teaching efficiency and results. The primary purpose of academic performance evaluation is to improve student learning. It is also essential for adapting curriculum and instructional approaches and for determining the effectiveness of programs and classroom practices. Evaluation is a process of gathering information from various sources that accurately characterizes how well students are meeting educational expectations, with the aim of improving the educational strategy (Figure 1). However, it is not easy to evaluate students’ academic performance due to the complexity of the educational process. The Grade Point Average (GPA) is a commonly used index for evaluating students’ academic performance. To characterize and analyze student behavior, statistical and regression methods have been applied to predict academic performance based on student records. Singh et al. (2016) utilized a multiple regression analysis model to capture the complex relationship between students’ academic performance and its influencing factors, such as learning facilities, communication skills, and parent guidance, and to predict the performance of students [36]. Recently, EDM has been developed to evaluate and predict educational performance. Academic performance prediction is one of the most important components of EDM [37]. Moreover, machine learning is a popular technology for predicting academic performance, and EDM has adopted various machine learning methods to evaluate students’ academic performance.

2.2. Machine Learning

Machine learning is a promising tool for characterizing the complex and nonlinear relationship between input and output (Figure 2) and has been applied extensively in various engineering and science fields. In this study, some popular machine learning approaches, such as neural networks, support vector machines, decision trees, gradient methods, etc., were utilized to quantify and analyze students’ academic performance.

2.2.1. Support Vector Machine (SVM)

The SVM was first developed to simulate nonlinear relationships based on the structural risk minimization principle [38]. An advantage of the SVM is that its training reduces to a quadratic optimization problem with a unique solution. The SVM maps the input space into a high-dimensional feature space using a nonlinear mapping defined by an inner-product (kernel) function and then looks for the nonlinear relationship between the input and output. The SVM has theoretical support for problems with small training samples, high dimensionality, nonlinearity, and local optima, and it can find the global optimal solution. Various applications, such as pattern recognition and nonlinear regression, have demonstrated the generalization ability of support vector machines empirically. The SVM model can be represented as follows.
$$y(x) = \sum_{k=1}^{n} \alpha_k K(x, x_k) + b$$
where n denotes the number of training samples, αk denotes the Lagrange multipliers, and b denotes the scalar threshold; αk and b are obtained by the SVM training algorithm. K(x, xk) denotes the kernel function, which the user can select. Details of the SVM algorithm can be found in the literature [39,40].
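As a minimal sketch of how the SVM model above is evaluated, the following code computes the kernel expansion for a single input. The Gaussian (RBF) kernel, the value of gamma, and the support vectors, multipliers, and threshold are all hypothetical placeholders chosen for illustration; they are not taken from the study.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_predict(x, support_vectors, alphas, b, kernel=rbf_kernel):
    """Evaluate y(x) = sum_k alpha_k * K(x, x_k) + b."""
    return sum(a * kernel(x, x_k) for a, x_k in zip(alphas, support_vectors)) + b


# Hypothetical values standing in for a trained model (assumption).
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
alphas = [2.0, -1.0]
b = 0.5

prediction = svm_predict((0.0, 0.0), support_vectors, alphas, b)
```

In practice, αk and b would come from solving the quadratic optimization problem during training rather than being set by hand.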

2.2.2. Artificial Neural Networks

Artificial Neural Networks (ANNs), a subset of machine learning and the core of deep learning algorithms [41], are inspired by the human brain and mimic how biological neurons transmit signals to each other. A typical ANN includes an input layer, one or more hidden layers, and an output layer. Figure 3 shows the architecture of the ANN. The first layer of the neural network is called the input layer and receives the inputs from the outside. The final layer is called the output layer and outputs the processing results. The layers between the input and output layers are called hidden layers. Each layer consists of unit nodes (also called neurons or perceptrons), which are connected to the unit nodes in the previous and subsequent layers. Most neural networks of this kind are fully connected; that is, the unit nodes of each layer are connected to all unit nodes of the layers on either side of it. The neural network assigns each connection a weight that represents the influence of the output of the previous unit on the output of the next unit.
To produce the desired output, neural networks need to be trained beforehand [42]. The learning of neural networks can be divided into supervised, unsupervised, and reinforcement learning, of which supervised learning is the most commonly used. In supervised learning, each training instance consists of an input value and an expected output value (also called a label). The neural network computes an output value from the input value. The error between the output value and the expected value is fed back through the network layer by layer, starting from the output layer. This feedback process is the backpropagation process of the ANN. During backpropagation, the algorithm corrects the weights of the connections between the units according to the error, thus reducing the error between the output and the expectation. The key to the whole training is to find weight values with which the neural network produces the expected output. A trained neural network can then process new input data to generate the corresponding output value.
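The supervised training loop described above can be sketched with a very small network. This is only an illustration under stated assumptions: one input, one hidden layer of tanh units, a linear output, plain stochastic gradient descent, and a synthetic target (y = x squared). None of these choices are taken from the study.

```python
import math
import random


def train_tiny_network(samples, hidden=4, lr=0.05, epochs=500, seed=0):
    """Train a 1-input, 1-hidden-layer, 1-output network with backpropagation."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            # Forward pass: input -> hidden (tanh) -> output (linear).
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            y = sum(w2[j] * h[j] for j in range(hidden)) + b2
            error = y - target
            total += 0.5 * error ** 2
            # Backward pass: the output error is propagated to the hidden layer.
            for j in range(hidden):
                grad_hidden = error * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * error * h[j]
                w1[j] -= lr * grad_hidden * x
                b1[j] -= lr * grad_hidden
            b2 -= lr * error
        losses.append(total / len(samples))
    return losses


# Synthetic training pairs (input, expected output); purely illustrative.
samples = [(i / 4.0, (i / 4.0) ** 2) for i in range(-4, 5)]
losses = train_tiny_network(samples)
```

The list `losses` holds the mean training error per epoch, so comparing its first and last entries shows whether the weight corrections reduced the error.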

2.2.3. Decision Tree

A decision tree is a decision support model that uses a tree-like model of decisions and their possible outcomes, including resource costs, chance event outcomes, and utility. The decision tree is also widely used as a machine learning algorithm [43]. The goal of a decision tree is to construct a mapping model that can determine the target variable based on the input variables. There are two types of decision tree: the classification tree, whose target variable takes a set of discrete values, and the regression tree, whose target variable is continuous (usually real-valued). Figure 4 shows the structure of the decision tree. Decision trees include a variety of specific algorithms, such as Iterative Dichotomiser 3 (ID3), Classification And Regression Tree (CART), Chi-square Automatic Interaction Detector (CHAID), and so on [44,45].
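To make the regression tree idea concrete, the sketch below finds the single best split of one input variable by minimizing the squared error, which is the basic step a CART-style regression tree repeats recursively. The data values are hypothetical and only loosely mimic a period note and a final grade.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimising the squared error."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    candidates = sorted(set(xs))
    if len(candidates) < 2:
        raise ValueError("need at least two distinct input values to split")

    best = None
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    return best[1:]


# Hypothetical period notes (input) and final grades (target).
notes = [5, 6, 7, 12, 13, 14]
grades = [4, 6, 5, 12, 14, 13]
split = best_split(notes, grades)  # (9.5, 5.0, 13.0)
```

A full tree would apply the same search to each resulting subset until a stopping rule, such as a maximum depth, is reached.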

2.2.4. Ensemble Learning

In machine learning, the choice of algorithm is important for obtaining a satisfactory result, but it is not easy to make because of a lack of theoretical guidance. Ensemble learning is a machine learning paradigm that produces a more robust model by combining multiple trained models (often called “weak learners”) (Figure 5). Weak learners perform poorly by themselves due to high bias or high variance. Ensemble methods construct a stronger learner that performs better by reducing bias or variance. Ensemble learning includes three main families of algorithms, i.e., bagging, boosting, and stacking. Bagging trains homogeneous weak learners in parallel, independently of each other, and combines them by a deterministic averaging process. Boosting trains homogeneous weak learners sequentially in an adaptive way, so that each learner depends on the previous ones, and combines them following a deterministic strategy. This study adopted AdaBoost, XGBoost, GradientBoost, and random forest to evaluate academic performance.
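The sequential nature of boosting can be illustrated with a small gradient boosting sketch for the squared error loss, in which each weak learner (a one-split regression stump) is fitted to the residuals left by the learners before it. The data, the number of rounds, and the learning rate are assumptions for illustration; this is not the AdaBoost or XGBoost implementation used in the study.

```python
def fit_stump(xs, residuals):
    """Fit a least-squares stump and return it as a function x -> prediction."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or cost < best[0]:
            best = (cost, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps, history = [], []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def model(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return model, history


# Hypothetical one-dimensional data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 3, 5, 4, 8, 9, 12, 11]
model, history = boost(xs, ys)
```

Here `history` records the training mean squared error after each round; bagging would instead fit the stumps independently on resampled data and average them.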

3. Theoretical Framework: Machine Learning-Based Prediction of Academic Performance

3.1. Main Idea

This study developed a machine learning-based theoretical framework to investigate the performance of various machine learning algorithms, analyze the influencing factors of academic performance, and predict academic performance. Machine learning was used to characterize the complex relationship between academic performance and its influencing factors, and the prediction model of academic performance was developed based on this relationship. Machine learning can reveal and capture educational laws and behavior from the corresponding data and provides an excellent tool to predict and investigate students’ academic performance. The machine learning-based prediction model includes three parts: collection of academic data, representation of academic performance, and prediction of academic performance.

3.2. The Academic Performance Presentation Based on Machine Learning

Students’ academic performance has to do with the students, teachers, school environment, social status, classroom environment, etc. Because the relationship between academic performance and its influencing factors is complex and nonlinear, it is not easy to characterize educational law and evaluate students’ academic performance. This study employed machine learning to represent the complex relationship between academic performance and its influencing factors. The machine learning model of the academic performance of student ML(X) can be presented as follows.
$$\mathrm{ML}(X): \mathbb{R}^N \rightarrow \mathbb{R}$$
$$\mathrm{APS} = \mathrm{ML}(X)$$
where X = (x1, x2, …, xN), and xi (i = 1, 2, …, N) are the influencing factors of students’ academic performance, such as the students, teachers, school environment, social status, classroom environment, etc. APS denotes the academic performance of students.

3.3. Data Generation

Data is critical to machine learning algorithms, and the performance of machine learning depends on the quality of the data. To establish ML(X), some known student and educational information (training samples) is needed. This study collected the necessary student data from the literature. Machine learning was then used to learn the complex and nonlinear relationship between students’ academic performance and its influencing factors.

3.4. Prediction of the Academic Performance

Once the machine learning model ML(X) is obtained, it could be used to predict and investigate students’ academic performance. The educational strategy, policy, and learning habits could be investigated, updated, and improved based on the obtained predictive model. This study utilized machine learning to evaluate the importance and sensitivity of the various influencing factors. Moreover, the performance of machine learning was investigated for the student’s academic performance.

3.5. Procedure of Machine Learning-Based Educational Data Mining

This study employed machine learning to approximate the complex and nonlinear relationship between students’ academic performance and its influencing factors based on collected educational data obtained from the literature. The students’ academic performance and its influencing factors constitute the training samples for machine learning. Based on the training samples, the mapping between the influencing factors and the academic performance of students is learned by the machine learning algorithm. The academic performance of a new, unknown student is then predicted using the obtained machine learning model. Figure 6 shows the flowchart of the machine learning-based academic performance model, and the procedure is briefly presented as follows.
Step 1: Identify the student and collect relevant educational data.
Step 2: Determine and select the influencing factors of academic performance.
Step 3: Process the student data and academic performance, quantify the influencing factors, and generate the dataset (training sample).
Step 4: Select the machine learning algorithm based on the above training samples.
Step 5: Set the machine learning parameters and call the corresponding algorithm based on the above training samples.
Step 6: Train and generate the academic performance model based on machine learning.
Step 7: Predict and evaluate the academic performance of the new unknown student based on the obtained machine learning model.
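The steps above can be sketched end to end in a few lines. Everything in this example is an assumption made for illustration: the records are synthetic rather than the study's dataset, the three factors and the 50/10 split are arbitrary, and a k-nearest-neighbour regressor is swapped in as a simple stand-in for the machine learning algorithms discussed in this paper.

```python
import random

# Steps 1-3: build a synthetic dataset of (factors, final grade) pairs.
# The factors are (first-period note, second-period note, absences); all
# values are randomly generated placeholders, not real student records.
rng = random.Random(42)
records = []
for _ in range(60):
    g1 = rng.randint(5, 18)
    g2 = max(0, min(20, g1 + rng.randint(-2, 2)))
    absences = rng.randint(0, 15)
    final = max(0, min(20, g2 + rng.randint(-1, 1)))
    records.append(((g1, g2, absences), final))

# Hold out part of the data to check generalization on unseen students.
rng.shuffle(records)
train, test = records[:50], records[50:]


# Steps 4-6: choose and "train" a model. A k-nearest-neighbour regressor
# simply stores the training samples, so training is implicit here.
def knn_predict(train_set, x, k=3):
    by_distance = sorted(
        train_set,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Step 7: predict the held-out students and measure the error.
predictions = [knn_predict(train, x) for x, _ in test]
rmse = (sum((p - y) ** 2 for p, (_, y) in zip(predictions, test)) / len(test)) ** 0.5
```

Replacing `knn_predict` with any other regressor leaves the remaining steps unchanged, which is what allows several algorithms to be compared on the same training and testing groups.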

4. Evaluation and Discussion

This study adopted the collected data from the literature to investigate machine learning for academic performance prediction. Students’ academic performance was predicted using machine learning, and the feature importance was illustrated. The performance of each machine learning method was also illustrated and discussed based on the statistical parameters of the result using machine learning technology.

4.1. Dataset

This study collected data on the mathematics curriculum for the 2005–2006 school year in the Alentejo region of Portugal [34,46]. The dataset consists of the final grade and its influencing factors, such as social, emotional, demographic, and school-related attributes, and includes 395 examples from the mathematics course. The final grade was utilized to evaluate students’ academic performance. The influencing factors included the first-period note, the second-period note, the number of school absences, free time after school, etc. Figure 7 shows the statistical properties of the corresponding parameters for the collected student information. The training samples comprised the final grade and the corresponding influencing factors. To understand and verify the performance of machine learning, the 395 students were divided into two groups. One group of 365 students was used for training. The other group of 30 students was used to investigate the generalization performance of machine learning.
The correlation between students’ academic performance and its influencing factors was studied based on the collected student data. Figure 8 shows the correlation coefficients between academic performance and its influencing factors. Some factors have a positive coefficient while others have a negative coefficient. The coefficients between the final grade and the first-period note, the second-period note, the mother’s educational level, the wish to pursue higher education, and the weekly study time are positive and greater than 0.2. In contrast, the coefficients for the number of past class failures and the school are negative. This indicates that the first-period note, the second-period note, the mother’s educational level, the wish to pursue higher education, and weekly study time have a significant positive effect on the final grade, whereas past class failures and school have a significant negative effect. This is consistent with common sense and suggests that evaluating and predicting academic performance based on the selected influencing factors is feasible.
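For reference, a correlation coefficient of the kind discussed above can be computed as follows. The sketch assumes the Pearson coefficient, which the text does not state explicitly, and the three short series are hypothetical values rather than entries from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical values for five students.
study_time = [1, 2, 2, 3, 4]
failures = [3, 2, 1, 1, 0]
final_grade = [6, 9, 10, 12, 15]

r_study = pearson(study_time, final_grade)   # positive
r_failures = pearson(failures, final_grade)  # negative
```

A positive coefficient means the factor and the final grade tend to rise together, and a negative one means the final grade tends to fall as the factor increases.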

4.2. Evaluation of the Academic Performance Using Machine Learning

Machine learning was utilized to mine the complex relationship between students’ academic performance and their influencing factors based on the collected data. Three hundred sixty-five students’ data were the training samples to build the evaluation model using machine learning technology. Based on the procedure of the machine learning-based academic evaluation model, the prediction model was generated.

4.3. Results

4.3.1. Prediction

The obtained machine learning model was used to predict the academic performance of the unknown student. Figure 9a compares the actual final grade with the final grade predicted by the various machine learning approaches. The predicted final grade agrees closely with the actual value for most students. In particular, the academic performance predicted by XGBoost is in excellent agreement with the actual values. To further illustrate the generalization performance, the data of the thirty held-out students were used to predict the final grade. Figure 9b compares the actual final grade with the predicted one for the testing data. The performances predicted by machine learning are almost identical to the actual values. It can be concluded that machine learning can characterize student behavior well and can quantify and evaluate students’ academic performance based on the collected student data.

4.3.2. Performance Comparison

To further verify the machine learning models, the performance of each algorithm was investigated and compared based on the standard deviation, correlation coefficient, and root mean square error. Figure 10 shows the Taylor diagram, which compares the performance of the machine learning methods with each other. Overall, machine learning performs excellently. Figure 10a shows the performance of the machine learning algorithms during the training stage. The correlation coefficients are all higher than 0.9, which indicates that it is feasible to evaluate students’ academic performance using machine learning. Figure 10b shows the performance during the testing stage. The correlation coefficients are again all higher than 0.9, which indicates that machine learning has excellent generalization performance for predicting the final grade and that quantifying and evaluating academic performance using machine learning is reliable. Of course, the performance of the various machine learning algorithms differs. For example, XGBoost has excellent predictive and generalization performance, with a correlation coefficient close to one. The results further show that machine learning technology provides an excellent tool for mining educational laws and predicting students’ academic performance.
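The three statistics summarized by a Taylor diagram can be computed as in the sketch below. A Taylor diagram conventionally uses the centred root mean square difference, which is linked to the two standard deviations and the correlation coefficient by E'² = σp² + σa² − 2·σp·σa·R; whether the figures in this study plot the centred or the plain RMSE is not stated, so both are returned. The grade values are hypothetical.

```python
import math


def taylor_statistics(actual, predicted):
    """Standard deviations, correlation, RMSE, and centred RMS difference."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    covariance = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    corr = covariance / (std_a * std_p)
    rmse = math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)
    centred = math.sqrt(
        sum(((p - mean_p) - (a - mean_a)) ** 2 for a, p in zip(actual, predicted)) / n
    )
    return {"std_actual": std_a, "std_predicted": std_p, "corr": corr,
            "rmse": rmse, "centred_rmsd": centred}


# Hypothetical actual and predicted final grades.
actual = [10, 12, 8, 15, 11]
predicted = [11, 12, 9, 14, 10]
stats = taylor_statistics(actual, predicted)
```

Because the three quantities are tied together by the identity above, a single point on the diagram conveys all of them for one model.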

4.3.3. Sensitivity Analysis

Many factors impact students’ academic performance, and their effects differ. The effect of each factor depends on the evaluation model, the individual, the educational environment, etc. This study investigated the sensitivity of various factors based on the XGBoost model. Figure 11 shows the total sensitivity of each factor of academic performance. The second-period note has the most significant impact on the final grade, followed by the first-period note and the number of absences. In other words, the first-period note, the second-period note, and the number of absences characterize the academic performance well.

4.3.4. Feature Importance Based on Machine Learning

To further investigate the effects of the various influencing factors, the built-in functions of the machine learning models were used to calculate the feature importance. Figure 12 shows the feature importance based on the different machine learning models. Regardless of the model, the second-period note and the number of absences are critical for evaluating the final grade. The importance of the other influencing factors varies between machine learning models. This further indicates that the relationship between academic performance and its influencing factors is complex and difficult to characterize. Therefore, the evaluation model of academic performance should consider as many factors as possible and should not ignore features with lower importance. In Figure 12c,f, the AdaBoost and XGBoost models have more features whose importance is greater than 0; that is, they take more influencing factors into account. The AdaBoost and XGBoost models also perform significantly better than the others (see Figure 10). Therefore, machine learning-based academic performance models should fully consider the various influencing factors.

4.3.5. Feature Analysis of Machine Learning Model for the Academic Performance

This study also investigated the effect of the various influencing factors on students’ academic performance based on the machine learning model using SHAP values. Figure 13 shows the SHAP values of the various factors in the machine learning model. The two key features for academic performance evaluation are the second-period note and the number of absences. This result agrees with the correlation analysis (Figure 8) and the feature importance analysis (Figure 12). Figure 14 shows the influence of the various factors on the final grade. An increase in the second-period note raises the final grade, while the number of absences negatively influences it. This also shows that machine learning characterizes student academic performance and captures each influencing factor’s contribution to the final grade. Machine learning provides a scientific and reasonable tool to evaluate academic performance and mine educational laws.
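SHAP values are based on Shapley values from cooperative game theory. The sketch below computes exact Shapley values by brute force for a tiny model, replacing "absent" features with a baseline value. This is an illustrative simplification and not the estimator used by the SHAP library; the linear model, its coefficients, the student, and the baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(total)
    return contributions


# Hypothetical model: final grade from (second-period note, absences).
def toy_model(features):
    return 0.9 * features[0] - 0.1 * features[1] + 2.0


student = [14, 6]    # hypothetical student
baseline = [10, 4]   # hypothetical reference values
phi = shapley_values(toy_model, student, baseline)
```

The contributions sum to the difference between the prediction for the student and the prediction for the baseline, which is the property that lets each factor's share of a predicted grade be read off directly.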

4.4. Discussion

4.4.1. Complexity of the Academic Performance

It is challenging to predict academic performance due to the complex and nonlinear relation between students’ academic performance and its influencing factors. In this study, machine learning can evaluate the final grade based on the influencing factors. Figure 15 shows the relationship between the number of absences, the second-period note, and the final grade based on the XGBoost model. Their relationship is complex and nonlinear. Meanwhile, the final grade was also impacted by the interaction relation of the factors. It further proved that educational behavior and law are complex, and it is difficult to characterize them using statistical and regression analysis.

4.4.2. Significance of Machine Learning

The student’s academic performance is essential to enhance the educational policy and improve learning behavior and habits. However, it is difficult to characterize educational laws and learning behavior due to the complexity of academic performance. This study investigated the application of machine learning to students’ academic performance. Machine learning provides a promising way to capture and evaluate academic performance and is a helpful, scientific, and reliable approach to characterizing and describing educational laws.

4.5. Implication

This study concludes that machine learning could characterize student behavior well based on the collected student data, and the predicted performances are almost identical to the actual values. Machine learning technology can evaluate and predict academic performance using the collected data. Meanwhile, the first-period note, the second-period note, and the number of absences significantly affect academic performance.

4.6. Limitation

This study developed a machine learning-based academic performance model using data collected from the Alentejo region of Portugal during the 2005–2006 academic year. The sensitivity and feature importance of the influencing factors were investigated. However, predicting academic performance remains challenging due to the complexity of academic performance and the educational process. The developed model should be extended and refined as more academic data become available so that it characterizes the educational law well.

5. Conclusions

Recently, machine learning has been applied to evaluate students’ academic performance in EDM. The performance of the chosen machine learning algorithm is essential to the academic performance model, and the model in turn depends on various factors in the educational process. This study investigated machine learning technology for evaluating students’ academic performance. Based on student data collected from the literature, various machine learning techniques were utilized to predict and quantify students’ academic performance. The performance of each method was illustrated and investigated using sensitivity analysis, feature importance evaluation, SHAP values, etc. The results show that it is challenging to characterize educational behavior and evaluate academic performance due to the complex, uncertain, and nonlinear relationship between academic performance and its influencing factors. However, machine learning technology provides a promising way to quantify and predict students’ academic performance based on the collected data about education and students. Although the machine learning methods differ in performance, each could meet the requirements of academic performance evaluation to varying degrees. The results of this study support the following specific conclusions.
(1) It is challenging to represent the complex relationship between academic performance and its influencing factors. This study investigated machine learning-based academic performance prediction using the collected data. Machine learning provides an excellent and reliable tool to quantify and predict students’ academic performance;
(2) Various factors influence academic performance, and the feature importance depends on the evaluation model. To enhance prediction accuracy and characterize educational behavior scientifically, as many influencing factors as possible should be considered in the machine learning-based evaluation model;
(3) Machine learning is a data-driven approach with excellent interpretability. A machine learning-based academic performance model could be used to explain educational behavior, conduct an in-depth analysis of each factor influencing students, and explore the education evaluation law. Furthermore, machine learning is helpful for other complex problems of education evaluation;
(4) Machine learning is an excellent tool for evaluating students’ academic performance and understanding educational law and learning behavior. A machine learning-based academic performance model is helpful to guide the education process and update the learning strategy;
(5) This study illustrated a quantitative model of academic performance based on machine learning and the collected dataset. Due to the complexity of educational behavior, this study investigated only some machine learning techniques. As more data accumulate, the developed model will be further extended and deepened in future studies. Meanwhile, as machine learning develops, new techniques will be applied to evaluate students’ academic performance.

Author Contributions

Conceptualization, L.Z. (Lihong Zhao) and H.Z.; methodology, H.Z.; software, H.Z.; validation, L.Z. (Lihong Zhao), L.Z. (Lin Zhang) and J.R., formal analysis, L.Z. (Lin Zhang) and J.R.; investigation, L.Z. (Lihong Zhao) and H.Z.; resources, L.Z. (Lihong Zhao); data curation, L.Z. (Lihong Zhao); writing—original draft preparation, H.Z.; writing—review and editing, L.Z. (Lihong Zhao), L.Z. (Lin Zhang) and J.R.; visualization, H.Z.; supervision, H.Z.; project administration, L.Z. (Lihong Zhao); funding acquisition, L.Z. (Lihong Zhao). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 2022 laboratory construction project at Shandong University of Technology under Grant No. 2022017.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bai, L.; Yang, B.; Yuan, S. Evaluating of Education Effects of Online Learning for Local University Students in China: A Case Study. Sustainability 2023, 15, 9860. [Google Scholar] [CrossRef]
  2. Doukanari, E.; Ktoridou, D.; Efthymiou, L.; Epaminonda, E. The Quest for Sustainable Teaching Praxis: Opportunities and Challenges of Multidisciplinary and Multicultural Teamwork. Sustainability 2021, 13, 7210. [Google Scholar] [CrossRef]
  3. Doyumğaç, İ.; Tanhan, A.; Kıymaz, M.S. Understanding the most important facilitators and barriers for online education during COVID-19 through online photovoice methodology. Int. J. High. Educ. 2021, 10, 166–190. [Google Scholar] [CrossRef]
  4. Subasi, Y.; Adalar, H.; Tanhan, A.; Arslan, G.; Allen, K.; Boyle, C.; Lissack, K.; Collett, K.; Lauchlan, F. Investigating students’ experience of online/distance education with photovoice during COVID-19. Distance Educ. 2023. [Google Scholar] [CrossRef]
  5. Sánchez, A.; Vidal-Silva, C.; Mancilla, G.; Tupac-Yupanqui, M.; Rubio, J.M. Sustainable e-Learning by Data Mining—Successful Results in a Chilean University. Sustainability 2023, 15, 895. [Google Scholar] [CrossRef]
  6. Chen, X.; Vorvoreanu, M.; Madhavan, K. Mining social media data for understanding students’ learning experiences. IEEE Trans. Learn. Technol. 2014, 72, 46–259. [Google Scholar] [CrossRef]
  7. Mustafa, Y. Educational data mining: Prediction of students’ academic performance using machine learning algorithms. Smart Learn. Environ. 2022, 9, 11. [Google Scholar] [CrossRef]
  8. Aina, C.; Baici, E.; Casalone, G.; Pastore, F. The determinants of university dropout: A review of the socio-economic literature. Socio-Econ. Plan. Sci. 2021, 79, 101102. [Google Scholar] [CrossRef]
  9. Khan, A.; Ghosh, S.K. Student performance analysis and prediction in classroom learning: A review of educational data mining studies. Educ. Inf. Technol. 2021, 26, 205–240. [Google Scholar] [CrossRef]
  10. Namoun, A.; Alshanqiti, A. Predicting student performance using data mining and learning analytics techniques: A systematic literature review. Appl. Sci. 2021, 11, 237. [Google Scholar] [CrossRef]
  11. Upadhyay, H.; Juneja, S.; Juneja, A.; Dhiman, G.; Kautish, S. Evaluation of ergonomics-related disorders in online education using fuzzy AHP. Comput. Intell. Neurosci. 2021, 2021, 2214971. [Google Scholar] [CrossRef]
  12. Asad, R.; Altaf, S.; Ahmad, S.; Mahmoud, H.; Huda, S.; Iqbal, S. Machine Learning-Based Hybrid Ensemble Model Achieving Precision Education for Online Education Amid the Lockdown Period of COVID-19 Pandemic in Pakistan. Sustainability 2023, 15, 5431. [Google Scholar] [CrossRef]
  13. Mohamada, S.K.; Tasir, Z. Educational data mining: A review. Procedia-Soc. Behav. Sci. 2013, 97, 320–324. [Google Scholar] [CrossRef]
  14. Baker, R.S.J.D.; Yacef, K. The state of educational data mining in. A review and future visions. J. Educ. Data Min. 2009, 1, 3–17. [Google Scholar] [CrossRef]
  15. Papadogiannis, I.; Poulopoulos, V.; Wallace, M. A Critical Review of Data Mining for Education: What has been done, what has been learnt and what remains to be seen. Int. J. Educ. Res. Rev. 2020, 5, 353–372. [Google Scholar] [CrossRef]
  16. Thai-Nghe, N.; Drumond, L.; Krohn-Grimberghe, A.; Schmidt-Thieme, L. Recommender system for predicting student performance. Procedia Comput. Sci. 2010, 1, 2811–2819. [Google Scholar] [CrossRef]
  17. Elbadrawy, A.; Studham, S.; Karypis, G. Personalized Multi-regression models for predicting students’ performance in course activities. In Proceedings of the 5th International Conference on Learning Analytics and Knowledge, Poughkeepsie, NY, USA, 16–20 March 2015; pp. 16–20. [Google Scholar] [CrossRef]
  18. Polyzou, A.; Karypis, G. Grade prediction with models specific to students and courses. Int. J. Data Sci. Anal. 2016, 2, 159–171. [Google Scholar] [CrossRef]
  19. Bao, C.; Li, Y.; Zhao, X. The Influence of Social Capital and Intergenerational Mobility on University Students’ Sustainable Development in China. Sustainability 2023, 15, 6118. [Google Scholar] [CrossRef]
  20. Deng, J.; Gu, D.; Li, X.; Yue, Z.Q. Structural reliability analysis for implicit performance functions using artificial neural network. Struct. Safe 2005, 27, 25–48. [Google Scholar] [CrossRef]
  21. Wang, L.; Wang, C.; Khoshnevisan, S.; Ge, Y.; Sun, Z. Determination of two-dimensional joint roughness coefficient using support vector regression and factor analysis. Eng. Geol. 2017, 231, 238–251. [Google Scholar] [CrossRef]
  22. Zhao, H.; Yin, S.; Ru, Z. Relevance vector machine applied to slope stability analysis. Int. J. Numer. Anal. Method Geomech. 2012, 36, 643–652. [Google Scholar] [CrossRef]
  23. Ren, J.; Zhao, H.; Zhang, L.; Zhao, Z.; Xu, Y.; Cheng, Y.; Wang, M.; Chen, J.; Wang, J. Design optimization of cement grouting material based on adaptive Boosting algorithm and simplicial homology global optimization. J. Build. Eng. 2022, 49, 104049. [Google Scholar] [CrossRef]
  24. He, M.; Zhang, L. Machine learning and symbolic regression investigation on stability of MXene materials. Comput. Mater. Sci. 2021, 196, 110578. [Google Scholar] [CrossRef]
  25. Kabra, R.R.; Bichkar, R.S. Performance prediction of engineering students using decision trees. Int. J. Comput. Appl. 2011, 36, 8–12. [Google Scholar] [CrossRef]
  26. Bhardwaj, B.K.; Pal, S. Data Mining: A prediction for performance improvement using classifcation. Int. J. Comput. Sci. Inf. Secur. 2011, 9, 355–358. [Google Scholar] [CrossRef]
  27. Arsad, P.M.; Buniyamin, N.; Manan, J.-L. Neural network model to predict electrical students’ academic performance. In Proceedings of the 4th International Congress on Engineering Education (ICEED), Georgetown, Malaysia, 5–7 December 2012. [Google Scholar] [CrossRef]
  28. Hamsa, H.; Indiradevi, S.; Kizhakkethottam, J.J. Student academic performance prediction model using decision tree and fuzzy genetic algorithm. Procedia Technol. 2016, 25, 326–332. [Google Scholar] [CrossRef]
  29. MohammadNoor, I.; Abdallah, M.; Ali, B.N.; Abdallah, S. Multi-split optimized bagging ensemble model selection for multi-class educational data mining. Appl. Intell. 2020, 504, 506–4528. [Google Scholar] [CrossRef]
  30. Zhang, W.; Wang, Y.; Wang, S. Predicting academic performance using tree-based machine learning models: A case study of bachelor students in an engineering department in China. Educ. Inf. Technol. 2022, 271, 3051–13066. [Google Scholar] [CrossRef]
  31. Juan, A.T.C.; Héctor, G.C. A stacking ensemble machine learning method for early identifcation of students at risk of dropout. Educ. Inf. Technol. 2023, 1–21. [Google Scholar] [CrossRef]
  32. Cédric, B.; Jefrey, S.R. Predicting University Students’ Academic Success and Major Using Random Forests. Res. High. Educ. 2019, 60, 1048–1064. [Google Scholar] [CrossRef]
  33. Vidhya, R.; Vadivu, G. Towards developing an ensemble based two-level student classifcation model (ESCM) using advanced learning patterns and analytics. J. Ambient. Intell. Humaniz. Comput. 2021, 127, 095–7105. [Google Scholar] [CrossRef]
  34. Sinem, B.K.; Sevda, A. HELA: A novel hybrid ensemble learning algorithm for predicting academic performance of students. Educ. Inf. Technol. 2022, 27, 4521–4552. [Google Scholar] [CrossRef]
  35. Bansal, V.; Buckchash, H.; Raman, B. Computational intelligence enabled student performance estimation in the age of COVID-19. SN Comput. Sci. 2022, 3, 41. [Google Scholar] [CrossRef]
  36. Singh, S.; Malik, S.; Singh, P. Factors Affecting the Academic Performance of Students. J. Educ. Pract. 2016, 114509778. [Google Scholar] [CrossRef]
  37. Bakhshinategh, B.; Zaiane, O.R.; ElAtia, S.; Ipperciel, D. Educational data mining applications and tasks: A survey of the last 10 years. Educ. Inf. Technol. 2017, 23, 537–553. [Google Scholar] [CrossRef]
  38. Vapnik, V.N.; Golowich, S.E.; Smola, A.J. Support vector method for function approximation, regression estimation, and signal processing. Adv. Neural Inf. Process. Syst. 1996, 9, 281–287. [Google Scholar]
  39. Suykens JA, K.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar] [CrossRef]
  40. Smola Alex, J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [Google Scholar] [CrossRef]
  41. Hopfield, J.J. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA 1984, 81, 3088–3092. [Google Scholar] [CrossRef]
  42. Rumelhart, D.E.; Hinton, G.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  43. Rokach, L.; Maimon, O. Data Mining with Decision Trees: Theory and Applications, 2nd ed.; World Scientific Pub Co., Inc.: Singapore, 2014; p. 200814. [Google Scholar] [CrossRef]
  44. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef]
  45. Kamiński, B.; Jakubczyk, M.; Szufel, P. A framework for sensitivity analysis of decision trees. Cent. Eur. J. Oper. Res. 2017, 26, 135–159. [Google Scholar] [CrossRef] [PubMed]
  46. Cortez, P. Student Performance. UCI Machine Learning Repository. 2014. Available online: https://archive.ics.uci.edu/dataset/320/student+performance (accessed on 16 June 2023).
Figure 1. Main component and process of academic performance evaluation.
Figure 2. The concept and idea of machine learning.
Figure 3. ANN structural architecture.
Figure 4. The concepts and structure of decision tree.
Figure 5. The basic idea and structure of ensemble learning.
Figure 6. The flowchart of academic performance prediction using machine learning.
Figure 7. Some statistical properties of the collected student data.
Figure 8. The correlation between academic performance and the statistical parameters of various influencing factors.
Figure 9. Comparison between the actual final grade and the final grade predicted by machine learning.
Figure 10. The performance comparison of various machine learning models.
Figure 11. The sensitivity of the various influencing factors on the final grade.
Figure 12. The feature importance of various influencing factors on the final grade.
Figure 13. The importance of the various influencing factors based on SHAP.
Figure 14. The impact on the final grade in the XGBoost model of academic performance.
Figure 15. The relationship between the number of absences, the second-period note, and the final grade.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhao, L.; Ren, J.; Zhang, L.; Zhao, H. Quantitative Analysis and Prediction of Academic Performance of Students Using Machine Learning. Sustainability 2023, 15, 12531. https://0-doi-org.brum.beds.ac.uk/10.3390/su151612531
