1. Introduction
As the world’s population ages and the prevalence of dementia rises, early detection, prevention, and treatment of neurological aspects of aging, such as cognitive decline and dementia, are becoming increasingly important. Because the degree of deviation from the normal range indicates pathological brain aging, there is growing interest in methods that identify individuals who deviate from a normative brain aging trajectory. The concept of brain age, a biological age estimated from anatomical and/or functional brain imaging data, has garnered significant attention in recent years [1,2]. Deviations of predicted brain age from chronological age have led to the development of personalized biomarkers for describing healthy brain development, abnormal aging, and early signs of clinical neuropsychiatric disorders [3]. Machine learning (ML) techniques can infer an individual’s brain age from neuroimaging data, where brain age is roughly equivalent to the underlying biological age of the brain. Once trained, a brain-age model can be used to assess brain health in independent samples. Individuals whose estimated brain age is below their chronological age have younger-appearing brains than their healthy, age-matched peers, which suggests greater resistance to pathology and neurodegeneration. Conversely, accelerated brain aging is indicated when predicted brain age exceeds chronological age, suggesting that the brain has been subjected to cumulative insults or severe pathology. The brain age gap estimation (BrainAGE) metric [4,5], defined as the difference between predicted brain age and chronological age, has been introduced as a measure of the degree of neuropathology. BrainAGE research using neuroimaging data has yielded important insights into brain pathology in a wide range of neurological diseases, such as Alzheimer’s disease (AD) [6], mild cognitive impairment [6], traumatic brain injury [7], epilepsy [8], and multiple sclerosis [9], as well as psychiatric disorders such as schizophrenia [10], bipolar disorder [11], and major depressive disorder [12].
Various ML algorithms, such as the least absolute shrinkage and selection operator (Lasso) [13,14,15], relevance vector regression (RVR) [1,4,16,17], support vector regression (SVR) [14,18,19,20,21,22], multilayer perceptron (MLP) [23], and extreme gradient boosting (XgBoost) [24,25], have been employed to predict brain age from features extracted from neuroimages. One such feature is the gray matter (GM) density map, which has been used in several studies to predict brain age. Franke et al. [4] developed a brain age prediction system using an RVR approach based on preprocessed GM density maps; trained on a cohort of 410 healthy adults aged 20 to 86 years, it achieved a mean absolute error (MAE) of 4.98 years. Similarly, Le et al. [19] applied SVR to GM density maps from a larger cohort of 964 individuals aged 18 to 60 years and obtained a similar MAE of 4.84 years. Varikuti et al. [15] took a different approach, combining non-negative matrix factorization (NMF) clustering with Lasso regression to build a brain age prediction model from the GM density maps of 693 older individuals (aged 55 to 75 years), and achieved an MAE of 3.6 years. In addition to performing well, the model produced neurobiologically interpretable maps. Combining deformation fields with GM volume has been shown to improve the accuracy of brain age prediction: for example, an RVR approach trained on a large cohort of healthy individuals (aged 20–86 years) achieved a lower MAE (6.90 years) than when using GM volumetric information alone (7.96 years) [16]. Transfer learning has also been employed to improve brain age prediction accuracy. Lin and colleagues [26] used transfer learning to extract features from 594 healthy older individuals aged 50 to 90 years and then used these features as input to RVR, achieving an MAE of 4.51 years. Diffusion tensor imaging (DTI) has been widely used to investigate white matter (WM) microstructure, providing important insights into brain aging. Mwangi et al. [17] applied RVR to popular DTI metrics in a cohort of 188 participants aged 4 to 85 years and found that age-related changes in fractional anisotropy follow non-linear trajectories. Beyond structural changes, alterations in brain structural and functional connectivity have also been examined for brain age prediction. For example, Lin et al. [23] employed an MLP to predict brain age from the structural connectivity networks of 112 healthy participants aged 50.4–79.1 years and reported an MAE of 4.29 years. Resting-state functional MRI (rsfMRI) has also been used for brain age prediction: Vergun and colleagues [22] trained SVR on rsfMRI data from 117 healthy individuals (aged 19–85 years) and found that a linear kernel performed better than a Gaussian kernel. Multi-modal imaging, which combines different imaging techniques, has been shown to provide complementary information and improve the accuracy of brain age prediction. For instance, Anatürk et al. [24] used T1-weighted MRI (T1), DTI, and T2-weighted fluid-attenuated inversion recovery (T2) MRI to extract 1118 GM features and 245 WM features from 537 participants aged 60.34 to 82.76 years; their XgBoost model achieved an MAE of 3.32 years. De Lange et al. [25] further explored multi-modal imaging for brain age prediction, training an XgBoost algorithm on T1, DTI, and rsfMRI data from 610 participants aged 60.34 to 84.58 years, and showed that combining the three modalities was superior to using any single modality. Moreover, Cole [13] applied Lasso regression to six image modalities (T1, T2, susceptibility-weighted imaging (SWI), diffusion MRI (dMRI), task fMRI (tfMRI), and rsfMRI) and likewise found that multi-modal imaging predicted brain age more accurately than single-modality imaging.
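The multi-modal studies above combine features from several modalities. One simple way to do this, often called early (feature-level) fusion, is to standardize each feature and concatenate each subject's per-modality feature vectors into a single input vector. The sketch below assumes hypothetical feature tables keyed by modality name and is not necessarily how any of the cited studies fused their modalities. For brevity it standardizes with the statistics of the rows it is given; a real pipeline should estimate means and standard deviations on the training set only and reuse them for test subjects.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    # Guard against constant features, whose standard deviation is zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def fuse_modalities(modalities, order):
    """Early fusion: concatenate each subject's standardized features across modalities.

    modalities: dict mapping a modality name to a list of per-subject feature lists.
    order: modality names in the order their features should appear.
    """
    standardized = {name: zscore_columns(modalities[name]) for name in order}
    n_subjects = len(standardized[order[0]])
    return [
        [value for name in order for value in standardized[name][i]]
        for i in range(n_subjects)
    ]


# Hypothetical toy data: two subjects, two T1 features and one DWI feature.
features = {
    "T1": [[2.4, 3100.0], [2.6, 2900.0]],
    "DWI": [[0.45], [0.41]],
}
fused = fuse_modalities(features, order=["T1", "DWI"])
print(len(fused), len(fused[0]))  # 2 3
```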
In recent years, researchers have begun to investigate how the choice of ML algorithm affects brain age prediction performance. Structural imaging features have been widely used in such studies because they effectively characterize brain morphology. Lombardi et al. [14] compared several ML strategies, including deep neural networks (DNN), random forest (RF), SVR, and Lasso, using the anatomical features of 2168 participants, and found that DNN outperformed the other methods. Valizadeh et al. [27] applied multiple linear regression (MLR), ridge regression (RR), neural networks (NN), k-nearest neighbors (KNN), support vector machine (SVM), and RF to various combinations of anatomical measures from 3144 participants (aged 7–96 years) and found that the NN and SVM models performed better than the others. In another study, Baecker et al. [18] investigated the impact of input type and model choice using regional and voxel-based anatomical measures from 10,824 participants (aged 47–73 years); input type had a greater impact on performance than model choice, and SVR, RVR, and Gaussian process regression (GPR) all performed similarly. Although these studies offer a wealth of knowledge about anatomically based brain age prediction, their findings cannot simply be extended to other imaging modalities or to multi-modal investigations. A recent study by Niu et al. [20] compared RR, SVR, GPR, and DNN using imaging features from three modalities (T1, DTI, and rsfMRI) in a cohort of 839 young participants. The authors found that GPR with multi-modal features achieved the highest prediction accuracy, while the other three algorithms performed similarly, and suggested that multi-modal imaging features may confer an advantage for age prediction. However, the study’s narrow age span (8–21 years) limits the generalizability of its results.
When examining the effects of ML algorithms, age range is a critical consideration. Brain age prediction models are typically developed for one of three age ranges: childhood through adolescence, middle age through old age, or the entire lifespan. Predicted brain age can reveal how various diseases and cognitive activities have affected the brain over a person’s life, and prediction errors may vary among age groups. Prediction is generally most precise from infancy through adolescence, followed by middle and old age, whereas prediction across the entire lifespan is the most challenging. Moreover, in most studies covering the entire lifespan, the majority of participants are young and relatively few are middle-aged or older, which makes prediction easier. It is therefore not appropriate to directly compare results obtained in different age groups, or in full-lifespan samples with different age distributions. In this study, we focus on predicting the brain age of middle-aged and older adults using a sizable dataset of over 10,000 individuals. To our knowledge, this is the first comprehensive exploration of the relationships between ML algorithms, imaging modalities, and brain age prediction in this age group.
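Because prediction errors vary with age, and a pooled MAE therefore depends on a sample's age distribution, one way to make comparisons fairer is to report MAE separately within age bins. The minimal sketch below uses hypothetical bin edges and ages; subjects whose age falls outside every bin are simply ignored.

```python
from collections import defaultdict
from statistics import mean


def mae_by_age_group(predicted, chronological, bin_edges):
    """MAE computed separately within each age bin [lo, hi).

    bin_edges: increasing ages, e.g. [45, 60, 75] defines bins [45, 60) and [60, 75).
    Subjects outside all bins are skipped.
    """
    errors = defaultdict(list)
    for p, c in zip(predicted, chronological):
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            if lo <= c < hi:
                errors[(lo, hi)].append(abs(p - c))
                break
    return {group: mean(errs) for group, errs in sorted(errors.items())}


# Hypothetical ages (years), for illustration only.
chronological = [48.0, 52.0, 66.0, 71.0]
predicted = [50.0, 51.0, 70.0, 69.0]
by_group = mae_by_age_group(predicted, chronological, bin_edges=[45, 60, 75])
print(by_group)  # {(45, 60): 1.5, (60, 75): 3.0}
```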
To investigate the extent to which the algorithm, the imaging modality, and their interaction affect brain age prediction performance, we conducted a comprehensive experiment using six ML algorithms (Lasso, RVR, SVR, XgBoost, categorical boosting (CatBoost), and MLP) and six imaging modalities (T1, T2, SWI, diffusion-weighted imaging (DWI), tfMRI, and rsfMRI). The study had four goals: first, to determine which ML algorithm predicts brain age most accurately; second, to identify the imaging modality that is most sensitive to brain age; third, to determine whether the ML approach and the imaging modality interact; and fourth, to assess the interpretability of BrainAGE in multi-modal brain age prediction.
4. Discussion
In this study, we investigated the impact of the ML algorithm, the imaging modality, and the interaction between the two on brain age prediction performance. To this end, we derived 2218 imaging-derived phenotypes (IDPs) from six imaging modalities and evaluated seven ML models: six individual models and one ensemble model. Our objectives were, first, to identify the ML method that estimates brain age most accurately; second, to determine which imaging modality is most effective for predicting brain age; third, to examine how different ML algorithms interact with imaging features in brain age prediction; and finally, to assess the interpretability of the BrainAGEs generated by the various ML models.
4.1. Image Modalities and ML Approaches
Although all imaging modalities demonstrated some ability to predict brain age, they were not equally effective. T1 and DWI were the most relevant modalities for brain age prediction, and changes in gray matter morphology and WM microstructure, particularly cortical thickness measurements, were the most critical imaging features. The leave-one-modality-out experiment supported this conclusion. Two factors may explain the superior performance of T1 and DWI. First, clinical and neuropathological studies have shown that cognitive impairment in older adults is often related to brain atrophy and myelin degradation [54,55]. T1-weighted images offer high anatomical resolution, allowing detailed visualization of brain structures, while DWI is sensitive to microstructural changes in white matter and provides information on WM connectivity and integrity, both of which decline with age. This makes T1- and DWI-based IDPs particularly relevant for predicting brain age. Second, T1 (N = 1436) and DWI (N = 675) contributed considerably more IDPs than the other modalities. Increasing the number of features can improve the predictive ability of ML models, because more features can capture more detailed information about the biological and pathological processes reflected in the imaging data. However, large feature sets also raise the risk of overfitting, which can reduce generalizability and degrade performance on new data. The other four modalities explained only a small amount of the variance in age, particularly SWI and tfMRI. Among the seven feature sets, the FSL-based and FreeSurfer-based feature sets had the highest correlations with predicted age (r ranging from 0.74 to 0.84 across the six ML models). These findings are consistent with previous studies showing gray matter morphology [4,15,19] or T1 integrity [25] to be reliable predictors of brain age. Surface-based features were particularly promising: all ML models achieved MAEs below 3.48 years, with the lowest MAE being 3.127 years. Surface-based features have several advantages over volumetric measures for assessing age-related changes in the brain. They capture such changes more accurately and precisely [56,57], are better at detecting local changes in brain morphology and at handling partial volume averaging [58], and are more sensitive to age-related changes, making them reliable predictors of chronological age [59]. For a given set of image features, the performance rankings of the ML methods were relatively similar, indicating that the imaging modality matters more than the choice of ML model. These results were obtained in cognitively normal participants; under disease conditions, the imaging modality might play an even more critical role. For example, in patients with WM disease, a DWI-based brain age may be more meaningful than a cortical-thickness-based brain age.
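The leave-one-modality-out experiment mentioned above can be organized as a loop that refits a model with each modality's features removed in turn and records the resulting MAE. The sketch below is schematic: `fit_predict` is a hypothetical callback standing in for any of the ML models, and the 1-nearest-neighbour predictor and toy data are illustrative assumptions, not the study's models or data.

```python
from statistics import mean


def leave_one_modality_out(train, test, y_train, y_test, fit_predict):
    """MAE with all modalities, then with each modality removed in turn.

    train/test: dict mapping modality name -> list of per-subject feature lists.
    fit_predict(X_train, y_train, X_test) -> list of predicted ages (hypothetical).
    """
    def concat(data, names):
        n = len(next(iter(data.values())))
        return [[v for m in names for v in data[m][i]] for i in range(n)]

    def mae(pred):
        return mean(abs(p - y) for p, y in zip(pred, y_test))

    modalities = sorted(train)
    results = {"all": mae(fit_predict(concat(train, modalities), y_train,
                                      concat(test, modalities)))}
    for left_out in modalities:
        kept = [m for m in modalities if m != left_out]
        results["without " + left_out] = mae(
            fit_predict(concat(train, kept), y_train, concat(test, kept)))
    return results


def nearest_neighbor_predict(X_train, y_train, X_test):
    """Toy predictor: copy the age of the closest training subject (squared Euclidean)."""
    preds = []
    for x in X_test:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds


# Hypothetical toy data: T1 is informative, DWI is misleading.
train = {"T1": [[0.0], [10.0]], "DWI": [[5.0], [6.0]]}
test = {"T1": [[1.0]], "DWI": [[6.0]]}
results = leave_one_modality_out(train, test, [60.0, 70.0], [61.0],
                                 nearest_neighbor_predict)
print(results)  # {'all': 1.0, 'without DWI': 1.0, 'without T1': 9.0}
```

A large MAE increase when a modality is left out (here, T1) is the signal that the modality carries information the others do not.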
Even though features from T1 or DWI showed promising results, adding other modalities may yield more effective predictions than any single modality [13], despite noticeable collinearity among them. When different ML algorithms were applied to features from all imaging modalities, the multi-modal models outperformed those trained on a single modality. One potential confounding factor in age prediction models is the shared variance between chronological age and BrainAGE. To address this issue, an age bias adjustment was applied to remove the shared variance between chronological age and BrainAGE. After correction, the predicted ages from different models were strongly positively correlated (r ranging from 0.93 to 1), and high positive correlations between chronological age and predicted age (r ranging from 0.88 to 0.92) were also observed. These models explained almost 75% of the variance in the test data, with corrected MAEs below 2.71 years (minimum 2.45 years) and corrected R² values above 79.8% (maximum 85.0%).
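This section does not state which age bias adjustment was used. One widely used linear approach regresses BrainAGE on chronological age and subtracts the fitted age-dependent component from each prediction; in practice the slope and intercept are estimated on training or validation data and then applied to test subjects. The sketch below illustrates only that approach, with hypothetical ages showing the typical pattern of overestimating younger and underestimating older subjects.

```python
from statistics import mean


def fit_age_bias(chronological, predicted):
    """Fit BrainAGE = slope * age + intercept by ordinary least squares."""
    gaps = [p - c for p, c in zip(predicted, chronological)]
    mx, mg = mean(chronological), mean(gaps)
    sxx = sum((x - mx) ** 2 for x in chronological)
    sxg = sum((x - mx) * (g - mg) for x, g in zip(chronological, gaps))
    slope = sxg / sxx
    return slope, mg - slope * mx


def correct_predictions(chronological, predicted, slope, intercept):
    """Subtract the fitted age-dependent part of BrainAGE from each predicted age."""
    return [p - (slope * c + intercept) for p, c in zip(predicted, chronological)]


# Hypothetical ages (years): young subjects overestimated, old ones underestimated.
ages = [50.0, 60.0, 70.0, 80.0]
preds = [55.0, 62.0, 68.0, 75.0]
slope, intercept = fit_age_bias(ages, preds)
corrected = correct_predictions(ages, preds, slope, intercept)
print(round(slope, 2), [round(v, 1) for v in corrected])  # -0.34 [49.9, 60.3, 69.7, 80.1]
```

In the fitting sample the corrected gaps are uncorrelated with age, because least-squares residuals are orthogonal to the regressor.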
In this study, the Lasso model outperformed the other single ML models in prediction accuracy. This is likely due to its ability to handle high-dimensional multi-modal IDPs and prevent overfitting: it selects important features while removing irrelevant or redundant ones, which also addresses collinearity among predictor variables. The performance of our Lasso model is comparable to or better than that reported in earlier studies (MAEs ranging from 3.4 to 5.99 years) [13,14,15]. Unlike those studies, we used six imaging modalities and a larger sample (N = 27,842). Ensemble learning is an ML technique that combines predictions from multiple models to produce a more accurate prediction. In our study, the ensemble model (MAE = 2.338 years) outperformed every single ML model, indicating the benefits of ensemble learning. This improvement is likely attributable to the ensemble’s ability to leverage the strengths of multiple models while mitigating their weaknesses. Ensemble learning also enhances the stability of predictions by reducing the variance of individual models and lowering the risk of overfitting, a common issue in ML models.
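The feature-selection behaviour attributed to Lasso above comes from the L1 penalty, which shrinks weak coefficients exactly to zero. The sketch below is a textbook cyclic coordinate-descent solver with soft-thresholding, written in plain Python for illustration; it minimizes (1/(2n))·||y − Xw − b||² + alpha·||w||₁, the same objective form documented for scikit-learn's `Lasso`, but it is not the implementation used in this study, and the toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n))*||y - Xw - b||^2 + alpha*||w||_1 by cyclic coordinate descent.

    X: list of n rows with p features; y: list of n targets.
    Features and target are centered internally; returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    r = [yi - y_mean for yi in y]  # residuals for w = 0
    w = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: its weight stays zero
            # Correlation of feature j with the partial residual that excludes w[j].
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta  # keep residuals in sync with w
                w[j] = new_wj
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_means))
    return w, intercept


# Hypothetical toy data: feature 0 drives the target, feature 1 is irrelevant.
X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0], [5.0, 1.0]]
y = [12.0, 14.0, 16.0, 18.0, 20.0]  # y = 2 * feature0 + 10
weights, intercept = lasso_coordinate_descent(X, y, alpha=0.1)
print([round(v, 2) for v in weights], round(intercept, 2))  # [1.95, 0.0] 10.15
```

The irrelevant feature receives a weight of exactly zero, while the informative one is shrunk slightly (from 2.0 to 1.95) by the penalty.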
4.2. The Interpretability of BrainAGE
Brain age prediction is an important area of research because it provides a biomarker for cognitive aging and age-related neurological diseases. To translate this research tool into clinical applications, it is crucial to identify the factors underlying the BrainAGE biomarker in an interpretable manner. Brain age prediction models strive to estimate brain age accurately while keeping BrainAGE highly interpretable. The MAE is a commonly used metric for evaluating the accuracy of brain age predictions. Interestingly, as the prediction error decreases, the interpretability of BrainAGE first increases and then decreases. If the prediction error were zero, BrainAGE would be zero for every individual, indicating a perfect match between predicted and chronological age, but it would then carry no information to interpret. Finding a balance between accuracy and interpretability is therefore crucial for developing brain age prediction models with practical applications in clinical and research settings.
One possible approach to improving the interpretability of BrainAGE is to enhance its signal-to-noise ratio by reducing unexplained variance while preserving explained variance. However, this cannot be implemented directly in the error function, because interpretability is a population-level concept. We therefore investigated the interpretability of BrainAGE from two perspectives. First, we conducted a correlation analysis between the BrainAGEs predicted by the seven ML models and 217 non-IDPs. Interestingly, the BrainAGEs generated by CatBoost and XgBoost had closer associations with non-IDPs, but these models also had larger brain age prediction errors. A closer relationship with a large number of non-IDPs does not necessarily indicate stronger interpretability; it may instead reflect a larger variance in BrainAGE, which can lead to more false-positive findings. Second, we investigated the associations between the BrainAGEs predicted by the seven ML models and five non-IDPs previously identified in BrainAGE studies. Lasso regression and ensemble learning ranked highest in the comprehensive evaluation of the relationship between BrainAGE and these five non-IDPs. Because the relationship between model error and interpretability is non-linear, reducing model error may not always improve interpretability; nevertheless, our results show that minimizing model error plays a critical role in enhancing the interpretability of BrainAGE. Of the six individual ML models evaluated, Lasso had the lowest prediction error and ranked highest in the comprehensive evaluation with the five BrainAGE-related factors, so it should be the preferred choice among single models. A single ML model often has limitations and may not perform well on all datasets; in this study, no single model showed the strongest correlation with all five factors.
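The correlation analysis described above can be sketched as computing a Pearson correlation between BrainAGE and each non-IDP and ranking the non-IDPs by absolute correlation. This is an illustrative assumption only: the variable names and values below are hypothetical, and the study's exact statistics (covariates, significance testing, multiple-comparison handling) are not specified in this section.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_associations(gaps, non_idps):
    """Return (name, r) pairs for each non-IDP, sorted by descending |r|."""
    pairs = [(name, pearson_r(gaps, values)) for name, values in non_idps.items()]
    return sorted(pairs, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical BrainAGE values (years) and two hypothetical non-IDPs.
gaps = [1.0, 2.0, 3.0, 4.0]
non_idps = {
    "grip_strength": [40.0, 38.0, 39.0, 35.0],
    "systolic_bp": [110.0, 120.0, 130.0, 140.0],
}
ranking = rank_associations(gaps, non_idps)
for name, r in ranking:
    print(name, round(r, 3))
# systolic_bp 1.0
# grip_strength -0.837
```

With many non-IDPs and a noisier (higher-variance) BrainAGE, more associations can cross an arbitrary threshold by chance, which is why a long list of correlates alone is weak evidence of interpretability.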
In contrast, ensemble learning combines the predictions of multiple models to achieve a more robust and accurate prediction. By using diverse models, ensemble learning can capture different aspects of the data, resulting in higher explained variance. In this study, the ensemble reflected the combined ability of the models to explain the five factors, achieving a lower prediction error than Lasso and comparable performance in the comprehensive evaluation. Moreover, the ensemble model performed stably in terms of correlations with non-IDPs, ranking in the middle of the seven models. We therefore recommend ensemble learning when computational efficiency is not a major concern, given its top-ranked performance in the comprehensive evaluation, its lower prediction error compared with Lasso, and its relatively stable performance with respect to non-IDPs.
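This section does not specify how the ensemble combined its member models. The simplest option, shown below as an assumption, is a (weighted) average of each subject's predicted ages; stacking with a learned meta-model is a common alternative. The variance-reduction argument above applies most directly to averaging. The model names are reused from the text, but the prediction values are hypothetical.

```python
def ensemble_average(predictions_by_model, weights=None):
    """Combine per-model predicted ages subject by subject as a weighted mean.

    predictions_by_model: dict model name -> list of predicted ages (same subject order).
    weights: optional dict model name -> non-negative weight; defaults to equal weights.
    """
    names = sorted(predictions_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_subjects = len(predictions_by_model[names[0]])
    return [
        sum(weights[name] * predictions_by_model[name][i] for name in names) / total
        for i in range(n_subjects)
    ]


# Hypothetical predicted ages (years) for two subjects from three models.
predictions = {
    "Lasso": [60.0, 70.0],
    "SVR": [62.0, 68.0],
    "CatBoost": [64.0, 75.0],
}
print(ensemble_average(predictions))  # [62.0, 71.0]
```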
4.3. Limitations
The present study has several limitations. First, the study population included both healthy individuals and individuals with diseases or risk factors that may have altered brain structure or function, which may have reduced sensitivity to aging. Second, this study used UK Biobank (UKB) data from individuals of white British ancestry only, which limits the generalizability of the findings; further research is needed to understand how brain age prediction varies across ethnic and national groups. Third, all findings were based on predefined global and regional imaging features derived from several freely available neuroimaging tools. In contrast, deep learning-based frameworks can learn optimal feature representations directly from the high-dimensional image space, eliminating the need for domain knowledge in feature engineering; such frameworks were not evaluated here.