Using Ensembles of Machine Learning Techniques to Predict Reference Evapotranspiration (ET0) Using Limited Meteorological Data

Salahudin, Hamza; Shoaib, Muhammad; Albano, Raffaele; Inam Baig, Muhammad Azhar; Hammad, Muhammad; Raza, Ali; Akhtar, Alamgir; Ali, Muhammad Usman

doi:10.3390/hydrology10080169


¹ Department of Agricultural Engineering, Bahauddin Zakariya University, Multan 60000, Pakistan

² School of Engineering, University of Basilicata, 85100 Potenza, Italy

³ Department of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China

⁴ School of Science and the Environment, Grenfell Campus, Memorial University, St. John’s, NL A1C 5S7, Canada

* Authors to whom correspondence should be addressed.

Hydrology 2023, 10(8), 169; https://0-doi-org.brum.beds.ac.uk/10.3390/hydrology10080169

Submission received: 22 June 2023 / Revised: 27 July 2023 / Accepted: 9 August 2023 / Published: 11 August 2023

(This article belongs to the Special Issue Big Data and Machine Learning in Hydrology: Recent Advances and Trends)


Abstract:

To maximize crop production, reliable estimates of reference evapotranspiration (ET₀) are crucial for managing water resources and planning crop water needs. The FAO-PM56 method is recommended globally for estimating ET₀ and for evaluating alternative methods because of its extensive theoretical foundation. However, many of the meteorological parameters it requires are difficult to obtain in developing countries, so alternative ways to estimate ET₀ from fewer climatic data are of critical importance. To estimate ET₀ with alternative methods, this study used different climatic parameters, namely temperature, relative humidity (maximum and minimum), sunshine hours, and wind speed, for a period of 20 years from 1996 to 2015. The data were recorded by 11 meteorological observatories situated in various climatic regions of Pakistan. The significance of the climatic parameters was evaluated using sensitivity analysis, performed with the machine learning (ML) techniques of single decision tree (SDT), tree boost (TB) and decision tree forest (DTF). The outcomes indicated that DTF-based models estimated ET₀ with higher accuracy and fewer climatic variables than the other ML techniques used in the study. The DTF technique, with Model 15 as input, outperformed the other techniques on most performance metrics (i.e., NSE = 0.93, R² = 0.96 and RMSE = 0.48 mm/month). The results indicated that DTF with only three climatic variables (mean relative humidity, wind speed and minimum temperature) could estimate ET₀ accurately. Additionally, a non-linear ensemble (NLE) of the ML techniques was used to estimate ET₀ with the best input combination (i.e., Model 15). The NLE approach enhanced modelling accuracy compared with the stand-alone application of the ML techniques (R² Multan = 0.97, R² Skardu = 0.99, R² ISB = 0.98, R² Bahawalpur = 0.98, etc.).
The study results affirmed the use of an ensemble model for ET₀ estimation and suggest applying it in other parts of the world to validate model performance.

Keywords:

reference evapotranspiration; machine learning techniques; ensemble approach; limited meteorological data

1. Introduction

Estimation of reference evapotranspiration (ET₀) has become an essential task. ET₀ is considered a crucial parameter because of its wide range of applications in hydrological studies. These studies are used to estimate the amount of water crops will need, which makes irrigation scheduling possible, supports crop yield, and enables better planning and management of water resources [1]. Accurate estimation of ET₀ has therefore gained importance in agro-meteorological, hydrological and water-balance studies. ET₀ is an important variable in the interaction between flora, atmosphere, and soil. It can also provide an accurate quantification for planning cropland water consumption [2] and effective irrigation [3].

Evapotranspiration is the overall loss of moisture (water) through evaporation from surfaces such as soil and plants [4]. The moisture loss from a well-irrigated grassy surface is referred to as “reference evapotranspiration” (ET₀) [5]. ET₀ can be measured directly with in situ experimental techniques such as the lysimeter method, the Bowen ratio-energy balance methodology, or eddy covariance devices [6]. The author of [7] used a weighing lysimetric method, based on water gain and loss, to estimate ET₀ directly. This method gained significant importance among direct methods (eddy covariance system, Bowen ratio) and was widely used in scientific studies [8]. However, the high capital, operating, and maintenance expenses of these techniques may restrict their practical application, so the best option for quantification is to rely on an indirect technique. One such strategy is to use empirical models, for example those based on temperature, radiation, or mass transfer. Nearly all empirical models determine ET₀ rather than crop-specific ET, because ET is difficult to determine for each crop. ET₀ is therefore first determined using indirect methods, and crop coefficients are then used to estimate the crop evapotranspiration (ET_c) of each desired crop. Several attempts have been made to estimate ET₀ accurately. Among them, the Penman–Monteith (FAO PM56) approach was developed by Allen in 1998 and validated in a variety of climatic conditions. The FAO PM56 method is recommended by the United Nations Food and Agriculture Organization (FAO) as the primary reference approach for determining ET₀ and validating other techniques [9,10,11]. However, many locations throughout the world do not have the entire set of meteorological data needed to calculate ET₀ using the FAO PM56 method.
A substantial obstacle to estimating ET₀ using the FAO PM56 approach is the lack of accessibility to all required information, uncertainty in dependability of climatic data, and unavailability of climatic data for many locations [12,13].

Recent research [14,15] reproduced FAO PM56 ET₀ using machine learning (ML) with a comprehensive set of climatic data, revealing the links and interrelationships among the variables. Similarly, a deep learning neural network model using solar radiation as its only predictor was compared with FAO PM56 for estimating ET₀ [16]. In a semi-arid location, the authors of [14] reported that solar radiation (Rs) was the most important meteorological variable in determining ET₀. It is possible to substitute the variable Rs with the number of sunshine hours (n); however, this is not always a viable option. This is also supported by the authors of [17], who investigated the potential of a deep factorization machine, gradient boosting techniques, and three tree-based ML models for modeling daily ET₀ in the context of a daily time series. According to previous studies [18,19,20], sunshine hours have a stronger relationship with net radiation (Rn) than any other meteorological variable. As a result, this study chose n as an alternative to Rs [21,22]. Net radiation has significant implications for the thermal characteristics of the Earth’s surface and is thus a crucial variable in the study of land-surface phenomena and the wider topic of global climate change. Rn is the difference between inbound and outbound radiation (i.e., reflected shortwave radiation) at the surface of the Earth.

In practical applications, employing stand-alone AI models to process complex datasets can result in inadequate predictive capabilities. This limitation stems from the inability of an individual model to learn the diverse array of intricate patterns in data, and the outcome can be suboptimal predictions. An ensemble of stand-alone prediction models can be used to get around this problem, yielding promising outcomes that surpass the performance of an individual model [23]. Ensembles are used to reduce the bias and variance of a single ML model [24,25]. The author of [26] stated that an ensemble of different ML models yields better results than an individual model. The authors of [27] likewise recommended ensembles of ML models, as they found better results in comparison to an individual model. The studies summarized below illustrate recent uses of the ensemble approach.

The findings of the authors of [28] clearly depicted that ensembles of ML models have the capacity to increase the efficiency of individual models. By applying the ensemble approach, they have improved the efficiency of Artificial Intelligence (AI) models and empirical models up to 22% and 55%, respectively. Furthermore, they also found AI ensemble modeling superior to the empirical models. The ensemble-based genetic programming model was also utilized by the authors of [29] to measure the degree of unpredictability related to the model architecture. The findings support the idea that quantifying the structural ambiguity of the model may be carried out thoroughly, objectively, and realistically by using the projections of these ensemble models. To forecast the model’s dependability, the authors of [30] examined three linear ensembles and one non-linear ensemble technique. The non-linear ensemble surpassed all the other ensembles and the individual statistical and intelligent methods, according to the research. The ensembles created here can also be utilized to replace current techniques in effective ways.

Despite the ease with which weather data are being made accessible recently, many locations still lack reliable and consistent weather information. Insufficient weather observatories were established in Pakistan (the subject of our study), and climatic information for various sites was observed to be inadequate for calculating crop water needs based on ET₀. Consequently, conventional techniques (like PM56) are not suitable to be used owing to exorbitant requirements of input or the absence of weather-related variables, such as Rs. The development of approaches depending on lesser weather-related data inputs and the advancement of ML algorithms for the estimation of ET₀ with limited climate data become tasks of great significance. ML is among the finest solutions for developing an ET₀ model for this purpose. However, the formation of an ML model that can be tested versus a target variable using an established set of input parameters is a key and essential issue that was successfully solved in this work. With less climate data, the constructed ML models were tested at several test sites to confirm their accuracy in predicting ET₀. Moreover, in stand-alone applications, the ML models were prone to poor performance due to their inability to capture the trends and abruptly changing elements, which often reduced modelling performance. The objective of the ensemble technique, as shown by its notation, is to achieve distinctive characteristics for the component models that will result in the varied patterns that are displayed in the dataset [31]. In addition, an ensemble of various ML models increases the predictive ability of the model to draw input–output relations perfectly [32,33]. The selection of the best model to use in an ensemble depends on the outcomes of comparative analysis of the stand-alone performance of the models. The machine learning models with better performance are selected and ensembled into a conjunction to leverage the strengths of each model. 
The current study unifies the three tree-based techniques (TB, SDT and DTF) using an MLP-based non-linear ensemble through a parallel combination of the machine learning models. This arrangement enables the ensemble model to leverage the strength of each technique to enhance the final modelling accuracy.

In view of the above-discussed literature, it is evident that a reliable ensemble of machine learning models can significantly decrease parametric requirements to accurately predict reference evapotranspiration. Most of the existing literature focuses on making hydrological predictions or forecasts by direct modeling from input space to output space, therefore ensemble modelling is still a growing research direction in the field of hydrology. In the context of evapotranspiration estimation, an ensemble of tree-based machine learning techniques is a novel application. Therefore, this study aims to apply a tree-based ML ensemble approach for ET₀ estimation with the following objectives: (i) apply sensitivity analysis using tree-based ML techniques to identify the best indicators of ET₀ in order to reduce parametric requirements (ii) develop an ensemble model and improve ET₀ estimation, (iii) investigate ensemble model performance at various climatic stations. In addition, the studies conducted on ET₀ estimation using ML techniques have been limited to the analysis of only one climatic station or region. For example, the authors of [34] investigated a hybrid neural network approach in a semi-arid station only, and recommended using at least one climatic station from arid, semi-arid and humid regions to propose a generalized conclusion of the developed ensemble approach. Thus, the current study includes climatic stations from each selected region to investigate the performance of the developed tree-based ensemble ML approach.

2. Materials and Methods

2.1. Study Area and Datasets

In this study, 11 climatic stations located in different climatic regions of Pakistan have been studied. The input climatic parameters and daily average values of ET₀ were recorded on a monthly basis in Bhakkar, Jhang, Toba Tek (T.T) Singh, Sahiwal, D.G Khan, Bahawalpur, Rahim Yar (R.Y) Khan, and Jacobabad as arid regions, while Multan, Islamabad, and Skardu were considered as hyper arid, semi-arid, and humid regions, respectively [35]. The monthly dataset duration of climatic stations and their climatic conditions corresponding to each region are explicitly mentioned in Table 1. Figure 1 indicates the geographic position of all the selected climatic stations. Blue dots represent climatic stations near to Multan Station (purple dot), while red dots represent distant climatic stations.

2.2. Methodology

Firstly, the climatic data of Multan station (1996–2015) were divided into 70% training and 30% testing sets, and SDT, TB and DTF were applied to estimate ET₀. This division of data into training and testing sets is practiced by most researchers in hydrology [36,37] and is also regarded as a simplified form of the V-fold rule of data partition [38]. Different input combinations of the meteorological parameters, i.e., T_min, T_max, T_mean, RH_mean, u(x), and n, were formed and used as input in the selected tree-based ML models.
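The data partition and the formation of input combinations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the split is assumed to be chronological (the text does not say whether it was chronological or random), and the enumeration lists every possible subset of the six variables, whereas the study evaluated 17 specific combinations that are not reproduced here.

```python
from itertools import combinations

# Variable names follow the text; "u" stands for the wind speed u(x).
VARIABLES = ["T_min", "T_max", "T_mean", "RH_mean", "u", "n"]

def split_70_30(records, train_fraction=0.7):
    """Split a sequence into leading training and trailing testing parts.

    Assumption: a chronological split; the study may have split differently.
    """
    cut = int(round(len(records) * train_fraction))
    return records[:cut], records[cut:]

def candidate_input_sets(variables, min_size=1):
    """Enumerate all subsets of the variables with at least min_size members."""
    subsets = []
    for size in range(min_size, len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets

# 20 years of monthly records (1996-2015) give 240 time steps.
months = list(range(240))
train, test = split_70_30(months)
all_sets = candidate_input_sets(VARIABLES)
```

With six candidate variables there are 63 non-empty subsets, which shows why a study would test a hand-picked list of combinations rather than all of them.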

Afterward, an effective input parameter combination for ET₀ estimation was selected by developing, training, and testing tree-based ML models (i.e., SDT, TB, and DTF) at Multan station using the input combinations. The tree-based linear and non-linear ensemble models were then developed, the latter using the multi-layer perceptron (MLP) technique. Lastly, the performance of the developed tree-based ensemble model was tested at different weather stations located in various climatic regions (arid, semi-arid, humid) to validate the ensemble model’s results (for details: Sections 3.3 and 3.4). For this purpose, monthly data of climate parameters for the selected stations were applied as input to estimate ET₀ values using the tree-based ensemble model. The FAO-PM56 method, which is described in Section 2.2.1, was used to calculate the ET₀ values indicated in Table 2, which also gives the statistical summary of the dataset for all the selected climatic stations. The remainder of this section discusses the FAO-PM56 method and the machine learning techniques used to estimate ET₀, and then explains the development of non-linear ensemble models based on the best-performing machine learning technique.

2.2.1. FAO-PM56 Method

Using Allen’s [5] FAO-56 PM approach, the ET₀ values for the Multan Station during the course of the research period were calculated using the meteorological variables:

ET_{0} = \frac{0.408 \Delta (R_{n} - G) + \gamma \frac{900}{T_{mean} + 273} U_{2} (e_{s} - e_{a})}{\Delta + \gamma (1 + 0.34 U_{2})}

(1)

e_{s} = \frac{e_{min} + e_{max}}{2}

(2)

e_{a} = \frac{e_{min} \frac{RH_{max}}{100} + e_{max} \frac{RH_{min}}{100}}{2}

(3)

U_{2} = \frac{ws \times 4.87 \times 1000}{3600 \times \ln (67.8 \times 3 - 5.42)}

(4)

where ET₀ is in mm/day; R_n is the net radiation at the surface of the crop (MJ/m² day); G is the soil heat flux density (MJ/m² day); T_mean is the mean air temperature (°C); U₂ is the wind speed (m/s); and e_s, e_a, e_min and e_max are the saturation, actual, minimum and maximum vapor pressures (kPa), respectively. Finally, Δ and γ are the slope of the vapor pressure curve (kPa/°C) and the psychrometric constant (kPa/°C), respectively.
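To make the calculation concrete, the following Python sketch evaluates Equations (1) to (4). Several pieces are assumptions not stated in the text: the saturation vapor pressure and slope formulas are the standard FAO-56 expressions, the wind measurement is taken to be in km/h at 3 m height (inferred from the factors 1000/3600 and 67.8 × 3 in Equation (4)), and the default psychrometric constant is only a placeholder value. The function names are hypothetical.

```python
import math

def sat_vapour_pressure(t_c):
    """Saturation vapor pressure (kPa) at temperature t_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def wind_at_2m(ws_kmh, height_m=3.0):
    """Equation (4): convert wind speed (assumed km/h at height_m) to m/s at 2 m."""
    ws_ms = ws_kmh * 1000.0 / 3600.0
    return ws_ms * 4.87 / math.log(67.8 * height_m - 5.42)

def et0_fao56(t_min, t_max, rh_min, rh_max, u2, rn, g=0.0, gamma=0.066):
    """Equation (1): reference evapotranspiration in mm/day.

    gamma (kPa/deg C) depends on site pressure; 0.066 is a placeholder default.
    """
    t_mean = (t_min + t_max) / 2.0
    e_min = sat_vapour_pressure(t_min)
    e_max = sat_vapour_pressure(t_max)
    e_s = (e_min + e_max) / 2.0                                    # Equation (2)
    e_a = (e_min * rh_max / 100.0 + e_max * rh_min / 100.0) / 2.0  # Equation (3)
    # Slope of the vapor pressure curve (kPa/deg C), standard FAO-56 expression.
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (e_s - e_a))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator

# Illustrative call with made-up inputs (not data from the study).
example_et0 = et0_fao56(t_min=12.3, t_max=21.5, rh_min=63.0, rh_max=84.0,
                        u2=2.078, rn=13.28, g=0.0, gamma=0.0666)
```

Note how many inputs the function needs (two temperatures, two humidities, wind speed, net radiation and soil heat flux), which is the data burden the limited-input ML models aim to reduce.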

2.2.2. Tree-Based Machine Learning Techniques

Tree-based machine learning approaches have a tree-like structure with numerous nodes that examine and categorize the given dataset [39,40]. The objective of this work was to identify the most useful climatic parameters for ET₀ estimation using the techniques of TB, SDT, and DTF. The SDT consists of one decision tree, while TB and DTF are built on multiple trees. TB and DTF differ in how the trees are combined: in TB the error of the previous tree is passed to the next (i.e., a series combination), whereas in DTF the trees are combined in parallel. The background and applied procedure of these techniques can be found in [41]. Beyond finding optimal parameter values, the choice of the underlying algorithm for each ML technique is of critical importance. The selected ML algorithms corresponding to the applied tree-based techniques are given in Table 3.

2.2.3. Development of Ensemble Models

An ensemble process was employed that unites the single outputs of the individual ML models by means of an arbitration process in order to estimate the target value more accurately [30]. The author of [42] has explained the arbitration process, while complete details regarding ensemble modeling, including its diversity and size, are given in [43]. Ensemble modeling is divided into two types: (a) linear ensembles and (b) non-linear ensembles. Linear ensembles (LE) include stacked regression [44], weighted average [45], and simple average methods [46], while a combination of ML techniques is called a non-linear ensemble (NLE). According to recent studies, the non-linear ensemble method is preferred over the linear ensemble method. Linear ensemble methods have the advantage of computational simplicity over NLEs, whereas the latter are regarded as having greater predictive accuracy. In addition, the authors of [28] found NLE modelling superior to LE modelling for ET₀ estimation using pan evaporation data. They also elaborated the superior characteristics of NLEs over the LE approach and recommended applying NLEs to obtain significant results.

The ensemble modeling in this study was organized via one linear (simple averaging) and one non-linear (combined ML techniques) ensemble method in order to allow a better comparison and make a stronger case. NLE methods combine the predictions of individual tree-based models using a non-linear function, e.g., bagging or boosting. The non-linear function can be a weighted sum of the individual model predictions, or it can involve more complex operations such as decision trees, neural networks, or kernel methods. Non-linear ensembles can capture more complex relationships between the input attributes and the target variable, which can result in higher predictive accuracy compared to linear ensembles. In the linear ensemble (LE), a simple averaging method is carried out as:

ET_{LE} = \frac{1}{N} \sum_{i = 1}^{N} ET_{i}

(5)

Here, ET_LE, ET_i, and N denote the output of the ensemble model, the output of the i-th single model, and the total number of selected models, respectively.
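Equation (5) amounts to an element-wise mean over the model outputs. A minimal Python sketch, with a hypothetical function name and made-up numbers rather than values from the study, could look like this:

```python
def linear_ensemble(predictions):
    """Simple-average ensemble (Equation (5)).

    predictions: list of N equal-length sequences, one per model, where each
    sequence holds that model's ET0 estimates for the same time steps.
    """
    n_models = len(predictions)
    return [sum(values) / n_models for values in zip(*predictions)]

# Made-up ET0 estimates from three models for two time steps.
et_sdt = [1.0, 2.0]
et_tb = [3.0, 4.0]
et_dtf = [5.0, 6.0]
et_le = linear_ensemble([et_sdt, et_tb, et_dtf])
```

Every model receives the same weight 1/N, so a poor model degrades the average as much as a good model improves it; this is one motivation for the learned, non-linear combination described next.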

In the non-linear ensemble, on the other hand, the output of each selected ML model is used as a predictor (input) to another ML model, which produces the final ensemble result. In this study, a multi-layer perceptron (MLP) was employed as the ensemble model. The NLE-ET₀ is estimated from the ET₀ outputs of the ML models (SDT, TB, DTF) as:

ET_{NLE} = f (ET_{SDT}, ET_{TB}, ET_{DTF})

(6)

Here, ET_SDT, ET_TB, and ET_DTF present predicted ET₀ by SDT, TB and DTF models, respectively; while ET_NLE is ensemble ET₀ obtained by a non-linear ensemble (NLE) technique. The process continued until each subset had been analyzed once during validation. The general ensemble procedure can be seen in Figure 2.
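As an illustration of Equation (6), the sketch below trains a very small MLP that takes the three base-model estimates as inputs and returns the ensemble estimate. It is a toy stand-in written in plain Python, not the configuration used in the study: the hidden-layer size, learning rate, number of epochs, tanh activation, per-sample gradient descent, and the fixed scaling constant are all assumptions, and the function name is hypothetical.

```python
import math
import random

def train_mlp_ensemble(base_preds, targets, hidden=4, epochs=500,
                       lr=0.05, scale=10.0, seed=0):
    """Fit f(ET_SDT, ET_TB, ET_DTF) -> ET0 with one tanh hidden layer.

    base_preds: list of [ET_SDT, ET_TB, ET_DTF] rows; targets: reference ET0.
    Inputs and targets are divided by `scale` (assumed) to avoid saturation.
    """
    rng = random.Random(seed)
    n_in = len(base_preds[0])
    xs = [[v / scale for v in row] for row in base_preds]
    ys = [t / scale for t in targets]
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h, out = forward(x)
            err = out - y  # derivative of 0.5 * (out - y)^2 w.r.t. out
            for j in range(hidden):
                # Back-propagate through tanh before updating w2[j].
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err

    def predict(row):
        _, out = forward([v / scale for v in row])
        return out * scale

    return predict

# Made-up base-model estimates and reference values (not study data).
base = [[2.0, 2.2, 2.1], [4.0, 3.8, 4.1], [6.0, 6.3, 5.9],
        [8.0, 7.7, 8.2], [3.0, 3.1, 2.9], [5.0, 5.2, 4.8]]
reference = [2.1, 4.0, 6.1, 8.0, 3.0, 5.0]
nle = train_mlp_ensemble(base, reference)
```

In practice the combiner would be trained on base-model predictions for held-out data, so that it does not simply learn to trust whichever base model overfits the training set.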

Researchers [35,47] have confirmed the performance of the MLP (type of ANN) model over other AI models in the selection of a non-linear ensemble approach. For each ML model, the prime parameters of training algorithms, the number of iterations, convergence value and execution times always play a critical role [28]. Thus, this study has employed MLP as an ensemble model to obtain overall ensemble results. The parametric values for the selected ensemble model are given in Table 4 as recommended by the authors of [35].

By calculating the Nash–Sutcliffe efficiency (NSE), coefficient of determination (R²), and root mean squared error (RMSE), the performance of these models was examined. The error values indicate deviation error from the mean-ET₀ value. In addition, the lowest deviation error from the mean, and highest effectiveness of climatic parameters on ET₀ was observed [28]. The RMSE value for each model was calculated using Equation (7), while Equations (8) and (9) were used to determine the Nash–Sutcliffe Efficiency (NSE) and coefficient of determination (R²). The RMSE, NSE, and R² values of both the training and testing datasets are summarized in Table 5, Table 6 and Table 7, respectively.

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (ET_{obs} - ET_{est})^{2}}

(7)

NSE = 1 - \frac{\sum_{i = 1}^{n} (ET_{obs} - ET_{est})^{2}}{\sum_{i = 1}^{n} (ET_{obs} - \overline{ET}_{obs})^{2}}

(8)

R^{2} = \frac{[n \sum_{i = 1}^{n} (ET_{obs} ET_{est}) - (\sum_{i = 1}^{n} ET_{obs}) (\sum_{i = 1}^{n} ET_{est})]^{2}}{[n \sum_{i = 1}^{n} ET_{obs}^{2} - (\sum_{i = 1}^{n} ET_{obs})^{2}] [n \sum_{i = 1}^{n} ET_{est}^{2} - (\sum_{i = 1}^{n} ET_{est})^{2}]}

(9)

The value of RMSE is always non-negative because of the squaring in its formula. RMSE increases as the divergence between observations and predictions grows. Models that produced high RMSE values were therefore rejected, whereas models with low RMSE values were preferred; an RMSE approaching 0 indicates a perfect fit. Figure 3 shows the flow chart of best input combination selection and the non-linear ensemble of tree-based techniques for ET₀ estimation.
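For readers who wish to reproduce the three performance metrics, a minimal Python sketch follows. It assumes the usual forms of the metrics: RMSE as the root of the mean squared error, NSE as one minus the ratio of the residual sum of squares to the variance of the observations, and R² as the squared Pearson correlation between observed and estimated values. The function names and the sample numbers are illustrative only.

```python
import math

def rmse(obs, est):
    """Root mean squared error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def nse(obs, est):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - e) ** 2 for o, e in zip(obs, est))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread

def r_squared(obs, est):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    s_o, s_e = sum(obs), sum(est)
    s_oe = sum(o * e for o, e in zip(obs, est))
    s_oo = sum(o * o for o in obs)
    s_ee = sum(e * e for e in est)
    numerator = (n * s_oe - s_o * s_e) ** 2
    denominator = (n * s_oo - s_o ** 2) * (n * s_ee - s_e ** 2)
    return numerator / denominator

# Made-up example: estimates biased upward by exactly 1 unit.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0, 3.0, 4.0, 5.0]
scores = (rmse(observed, estimated), nse(observed, estimated),
          r_squared(observed, estimated))
```

The example illustrates why the metrics are reported together: a constant bias leaves R² at 1 while RMSE and NSE both register the error.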

3. Results

3.1. Determination of Effective Climatic Parameters

A total of 17 models based on different meteorological input datasets were tested with the selected ML techniques for ET₀ estimation at Multan station. Table 5 shows that, among all the models, Model 15 (Tmin, RHmean, u(x)) had the lowest RMSE, which indicates the smallest deviation from the mean ET₀ values. With this input combination, TB gave the lowest testing RMSE (0.42 mm/month), compared with 0.48 mm/month for DTF and 0.58 mm/month for SDT. The testing NSE values observed for TB, DTF, and SDT with Model 15 were 0.91, 0.93 and 0.90, respectively. Hence, judged by NSE, DTF performed best in estimating ET₀ using the selected input combination. Similarly, Table 6 summarizes the NSE values of the SDT, TB and DTF models for the 17 input combinations, whereas Table 7 presents the summary of results in terms of R².

To validate the results of ET₀ estimation at Multan station (summarized in Table 5, Table 6 and Table 7), comparison of RMSE results obtained through tree-based techniques (SDT, TB and DTF) is graphically presented in Figure 4 to determine effective meteorological input combinations on ET₀ estimation. It can be observed from Figure 4a that testing RMSEs for Model 1, Model 5, Model 10, Model 11, Model 13, and Model 15 under the SDT technique were found to be less than 0.7 mm/month. On the other hand, deviations in ET₀ values were observed above 50% from the mean value when meteorological input combinations based on other models were used in SDT for ET₀ estimation. The testing RMSEs for Model 1, Model 5, Model 13, and Model 15 under TB recorded less than 0.7 mm/month for ET₀ estimation among all other applied models as shown in Figure 4b. For DTF, only Model 1, Model 5, and Model 15 generated testing RMSEs less than 0.7 mm/month as observed in Figure 4c. Similarly, Figure 5 graphically presents the performance of SDT, TB and DTF models in terms of NSE. Model 1 is based on the maximum number of climatic variables including Tmin, Tmax, RHmean, u(x), and n, while Model 5 uses Tmax, Tmin, n, and u(x), as input variables. Therefore, Model 15, having the minimum number of variables, is rendered as the best input combination.

The reason behind this is that some models did not contain temperature as an input parameter which generated more residuals in resulting values and hence the error recorded was highest. For Multan station, which has an arid climatic nature, the change in temperature affected ET₀ and was considered an effective parameter for ET₀ estimation.

In summary, the tree-based ML techniques with Model 15, which has only three input parameters (Tmin, RHmean, u(x)), outperformed the other models and generated the best results for ET₀ estimation. Because the FAO-PM56 method relies not only on meteorological and aerodynamic parameters but also on local calibration, tree-based techniques that depend only on meteorological parameters are a practical alternative for estimating ET₀. A scatter plot of the SDT, TB and DTF estimates in the testing phase, using the Model 15 input combination, against the FAO-PM56 method is presented in Figure 6. The results indicated that Model 15, with only 3 climatic parameters (Tmin, RHmean, u(x)), generated less variance and a higher R². For the TB-based model with input combination 15, an R² value of 0.93 was observed during the testing phase. For SDT, this value was 0.94 and for DTF, it was 0.96. These R² values also support the results above, which indicated that the RMSE increased as more non-effective climatic parameters were included as inputs to the tree-based techniques.
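The comparison across metrics can be tabulated with a short Python sketch. The numbers are the Model 15 testing values quoted in this section; the "count of metrics won" rule is only an illustrative way of summarizing them and is not a procedure described by the authors.

```python
# Testing-phase values for Model 15 as reported in the text.
results = {
    "SDT": {"RMSE": 0.58, "NSE": 0.90, "R2": 0.94},
    "TB": {"RMSE": 0.42, "NSE": 0.91, "R2": 0.93},
    "DTF": {"RMSE": 0.48, "NSE": 0.93, "R2": 0.96},
}

# Lower is better for RMSE; higher is better for NSE and R2.
winners = {
    "RMSE": min(results, key=lambda name: results[name]["RMSE"]),
    "NSE": max(results, key=lambda name: results[name]["NSE"]),
    "R2": max(results, key=lambda name: results[name]["R2"]),
}

# Count how many metrics each technique wins.
wins = {name: 0 for name in results}
for technique in winners.values():
    wins[technique] += 1

best_overall = max(wins, key=wins.get)
```

Under this tally TB has the lowest RMSE while DTF leads on both NSE and R², which matches the statement that DTF performed best on most of the metrics.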

3.2. Ensemble Model Results

After the comparative analysis of DTF, TB and SDT performance across all seventeen input combinations, the best-performing configuration (DTF with Model 15 as the input combination) was carried forward to the ensemble stage. Combining the individual techniques brought the estimates closer to the target values. In addition, the output of the ensemble approach captured seasonal variations well and agreed closely with the target values. The current study applied one linear (simple averaging) and one non-linear (combined ML techniques) ensemble approach to estimate ET₀. The obtained results are shown in Figure 7. The simple linear ensemble-based ET₀ (LE-ET₀) is less accurate (i.e., R² = 0.89) than the non-linear ensemble-based ET₀ (NLE-ET₀) (i.e., R² = 0.97) with respect to PM-ET₀. Similarly, the RMSE of LE-ET₀ (RMSE = 0.38 mm/month) is higher than that of NLE-ET₀ (RMSE = 0.18 mm/month).
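As a rough indication of the size of this improvement, and using only the two RMSE values quoted above, the relative reduction in RMSE obtained by moving from the linear to the non-linear ensemble is approximately:

\frac{RMSE_{LE} - RMSE_{NLE}}{RMSE_{LE}} = \frac{0.38 - 0.18}{0.38} \approx 0.53

That is, the NLE roughly halves the error of the simple-average ensemble at Multan station. This ratio is a derived illustration, not a figure reported by the authors.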

3.3. Testing of the NLE Method at Nearby Climate Stations

In this section, comparison of NLE and FAO-PM56 is presented by considering climatic data from adjacent stations in southern Punjab. These climatic stations include Bhakkar, DG Khan, Jhang, RY Khan, Sahiwal, TT Singh and Bahawalpur. The selected Model 15, with an input combination of Tmin, RHmean and u(x), was used as input to estimate ET₀ by applying an NLE approach and compared with the FAO-PM56 method. The obtained results for selected climatic stations are shown in Figure 8. At Bhakkar station, an MLP-based NLE model was able to reproduce the PM-method ET₀ with a small estimation error (i.e., RMSE = 0.34 mm/month) and high similarity (i.e., R² = 0.96). Similarly, values of RMSE at DG Khan, Jhang, RY Khan, Sahiwal, TT Singh, and Bahawalpur stations were 0.38, 0.36, 0.36, 0.25, 0.32, and 0.33 (mm/month), respectively, whereas R² values were observed to be above 0.96 at all stations.

It can be seen from Figure 8 that the ET₀ obtained through the NLE approach compared well with the FAO-PM56 method. The shape of the trend for each climatic station in Figure 7 reflects: (1) the available data duration of the climatic stations; and (2) the winter and summer seasons. The higher and lower peaks of ET₀ in the results indicate climatic variation over the selected periods. Data periods of differing length were used for the adjacent climatic stations in order to investigate the seasonal changes over the selected period. At each climatic station, the NLE estimates closely overlapped the FAO-PM56 results.

3.4. Testing of NLE Approaches in Faraway Climatic Stations

To investigate NLE performance in other climatic regions, three climatic stations, namely, Jacobabad (arid region); Islamabad (semi-arid region) and Skardu (humid region) were analyzed. Only the effective input meteorological parameters of Tmin, RHmean, and u(x) were used as input (Table 5) to estimate ET₀ by applying an NLE approach and compared with the FAO-PM56 method. It was noted in Figure 9 that ET₀ estimated by NLE approach compared well with the FAO-PM56 method. The higher and lower peaks of ET₀ at each selected station with the NLE and FAO-PM56 method closely overlapped. This indicated that ET₀ obtained through the NLE approach is reliable and acceptable with the use of limited climatic data. Similar to adjacent stations, NLE-based ET₀ showed an excellent resemblance to PM-based ET₀. The RMSE values for Jacobabad, Islamabad, and Skardu were 0.37, 0.32, and 0.19 (mm/month), respectively. The R² scores ranged between 0.96 at Jacobabad and 0.99 at Skardu stations.

3.5. Discussion

Firstly, the findings of this study indicated that DTF outperformed TB and SDT in estimating ET₀ when combinations of climatic parameters were used as input to the machine learning models. Climatic data from different weather stations across diverse climate zones of Pakistan were used. Earlier, for estimation of ET₀, TB was found to outperform SDT and DTF in Pakistan and in other countries including the USA, New Zealand, and China [35,47,48,49,50]. However, DTF has been found to be an effective machine learning technique in other hydrological applications, including rainfall-runoff modelling [51,52]. This result is inconsistent with past investigations in the case of ET₀ and with hydrological applications generally. This contradiction is possibly due to the greater number of climatic variables involved in the estimation of ET₀ as compared to other hydrological applications.

Secondly, an ensemble of machine learning models has been found to enhance modelling performance and accuracy. The ensemble of SDT, TB and DTF by using MLP enhanced the accuracy of ET₀ estimation with minimum parametric requirements. Earlier studies have also shown that an adequate ensemble of machine learning techniques can increase modelling performance as compared to stand-alone applications. Therefore, this finding of the current study is consistent with those of other researchers [28,29,30,32,33].

Thirdly, mean relative humidity, minimum temperature, and wind speed were found to be critical indicators of ET₀ in our study. The findings support the assertion made by the authors of [28] that higher air moisture content gives relative humidity a greater impact in wet locations, whereas as the aridity index increases, air moisture content is constrained and its effect diminishes. Temperature and relative humidity were found to be the most important predictors of ET₀ in another study [53]. The effect of weather parameters on ET₀ estimation in Esfahan province, Iran, was investigated in [54]; that study concluded that minimum air temperature, sunshine hours, and relative humidity were the effective parameters for ET₀ estimation in the region. Similarly, the authors of [49] observed that climatic variables related to relative humidity had a significant influence on ML modelling of ET₀: including relative humidity in machine learning-based models increased performance by up to 24%. These earlier observations support our selection of the best input combination.
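The selection of the best input combination can be retraced from the DTF testing RMSE values reported in Table 5. The short sketch below copies those values and ranks the 17 input combinations; the variable names are illustrative, and ranking on testing RMSE alone is a simplification of the multi-metric comparison described in the study.

```python
# DTF testing RMSE (mm/month) for each input combination, copied from Table 5.
dtf_test_rmse = {
    "Model 1": 0.54, "Model 2": 1.90, "Model 3": 1.39, "Model 4": 1.70,
    "Model 5": 0.51, "Model 6": 1.06, "Model 7": 1.23, "Model 8": 1.04,
    "Model 9": 0.81, "Model 10": 0.99, "Model 11": 1.18, "Model 12": 1.50,
    "Model 13": 0.74, "Model 14": 1.31, "Model 15": 0.48, "Model 16": 1.00,
    "Model 17": 1.17,
}

# Rank the combinations from lowest to highest testing error.
ranked = sorted(dtf_test_rmse.items(), key=lambda item: item[1])
best_model, best_rmse = ranked[0]

print(best_model, best_rmse)  # → Model 15 0.48
for name, value in ranked[:3]:
    print(f"{name}: {value:.2f} mm/month")
```

Model 15 (T_min, RH_mean, u(x)) gives the lowest DTF testing RMSE, ahead of Model 5 and of Model 1, which uses all five inputs.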

It is recommended to employ ML models over empirical and locally calibrated models in cases where climatic data are unavailable, inconsistent or of poor quality. Calibration of ML models in the training phase is critical to avoid over- or underestimation of ET₀ values: ET₀ is underestimated with more training data but overestimated with less training data. ML-based models that use minimum information therefore require sufficient training. For this reason, in order to test the efficacy of the ML-ET₀ models generated, this study evaluated the ML models in diverse climates. The data required for ET₀ estimation using the FAO PM56 and ML models are displayed in Table 8.

Table 8 shows that FAO PM56 depends on numerous parameters that are difficult to obtain, especially in developing countries. As an alternative to the FAO PM56 approach, the ML models use fewer parameters while still yielding accurate ET₀ values.
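To make the data demand of FAO PM56 concrete, the sketch below evaluates the standard daily FAO-56 Penman-Monteith equation as given by Allen et al. [5]. It is a simplified illustration under stated assumptions: net radiation R_n and soil heat flux G are passed in directly (deriving R_n itself requires sunshine hours, latitude and date), actual vapour pressure is taken from mean relative humidity, and the function names and sample values are hypothetical rather than taken from the study.

```python
import math


def saturation_vapour_pressure(t_c):
    """Saturation vapour pressure e°(T) in kPa for air temperature T in °C."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def fao56_pm_et0(t_max, t_min, rh_mean, u2, rn, g=0.0, elevation_m=0.0):
    """Daily reference evapotranspiration (mm/day) from the FAO-56 equation.

    t_max, t_min : daily maximum and minimum air temperature (°C)
    rh_mean      : mean relative humidity (%)
    u2           : wind speed at 2 m height (m/s)
    rn, g        : net radiation and soil heat flux (MJ m^-2 day^-1)
    elevation_m  : station elevation above sea level (m)
    """
    t_mean = (t_max + t_min) / 2.0
    # Mean saturation vapour pressure and actual vapour pressure (kPa).
    e_s = (saturation_vapour_pressure(t_max) + saturation_vapour_pressure(t_min)) / 2.0
    e_a = rh_mean / 100.0 * e_s
    # Slope of the saturation vapour pressure curve (kPa/°C).
    delta = 4098.0 * saturation_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure (kPa) and psychrometric constant (kPa/°C).
    pressure = 101.3 * ((293.0 - 0.0065 * elevation_m) / 293.0) ** 5.26
    gamma = 0.000665 * pressure
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (e_s - e_a))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


# Illustrative inputs only (not data from the study).
et0 = fao56_pm_et0(t_max=21.5, t_min=12.3, rh_mean=73.5, u2=2.078,
                   rn=13.28, g=0.0, elevation_m=100.0)
print(f"ET0 = {et0:.2f} mm/day")
```

Even in this reduced form the calculation needs temperature, humidity, wind, radiation and elevation, whereas the ML models in Table 8 rely on T_min, RH_mean and u(x) only. Wind speeds recorded in knots, as in Table 2, would first need converting to m/s (1 knot ≈ 0.514 m/s) and, where necessary, adjusting to 2 m height.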

4. Conclusions

This paper presents the development of an ensemble-based machine learning model to predict ET₀ with scant climate data. The lengthy process and significant data requirements (not readily accessible in some scenarios) of the recommended FAO-PM56 approach for determining ET₀ served as the impetus for the study. The findings demonstrate that it is possible to predict ET₀ from limited climatic data using a tree-based model. The mean relative humidity, minimum temperature, and wind speed were shown to be the three most important inputs for a precise determination of ET₀ using a tree-based model. This input combination was supported by a sensitivity analysis of the input parameters carried out using tree-based models, in which it gave the lowest RMSE and highest R² values. According to the findings, tree-based models can still predict ET₀ precisely even when only data for these three variables are provided. Furthermore, an ensemble approach was applied to improve ET₀ estimation using only the three effective inputs (T_min, RH_mean, u(x)), and the results showed considerable improvement in ET₀ estimation. The performance of this ensemble model was further investigated at seven adjacent and three faraway climatic stations to include the effects of diverse climatic regions. The results indicate that the ensemble model is useful and reliable, as the obtained ET₀ was well correlated with the standard FAO-PM56 method. Lastly, the study proposes developing different ensemble ML techniques for ET₀ estimation in other parts of the world.

Because ML techniques can handle system uncertainty, the ensemble retains its advantage: when the individual ML techniques already perform well, ensemble modelling produces high-quality results, and when a stand-alone ML technique performs poorly, the ensemble offers greater potential for improvement. The approach suggested here for estimating ET₀ needs to be applied in a number of places with diverse climatic conditions, specifically in areas that have limited climatic data, since wider application will help establish its efficiency and reliability. More recent machine learning approaches also offer scope for further study; this study proposes that an ensemble approach can be built by combining other ML techniques such as ANFIS, SVM, GMDH and CCNN for ET₀ estimation. In addition, the current study used monthly climatic data; we therefore recommend that future research develop an ensemble model based on daily data to increase the accuracy and generalizability of the ensemble model for ET₀ estimation.

Author Contributions

Conceptualization, M.S.; methodology, M.S., R.A. and A.R.; software, M.S., M.H. and H.S.; validation, A.R., M.S. and M.A.I.B.; formal analysis, A.R., M.U.A., H.S. and M.H.; investigation, M.S. and A.R.; resources, M.S., A.R., A.A., R.A. and H.S.; data curation, A.R. and H.S.; writing—original draft preparation, A.R., H.S., M.H. and R.A.; writing—review and editing, A.R., A.A., M.H. and R.A.; visualization, R.A., A.A. and M.U.A.; supervision, M.S. and M.A.I.B.; project administration, M.S.; funding acquisition, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Higher Education Commission (HEC) of Pakistan, grant number 7368.

Data Availability Statement

The data for the study were provided by the Pakistan Meteorological Department (PMD), which the authors gratefully acknowledge. The data can be obtained directly from PMD.

Acknowledgments

We are thankful to the academic editor and reviewers for their insightful reviews and suggestions to improve the quality of our work. The authors also gratefully acknowledge the Pakistan Meteorological Department (PMD) for providing the data for the study.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Lieth, H. Modeling the Primary Productivity of the World. In Primary Productivity of the Biosphere; Lieth, H., Whittaker, R.H., Eds.; Springer: Berlin/Heidelberg, Germany, 1975; pp. 237–263.
2. Zhang, Y.; Sun, A.; Sun, H.; Gui, D.; Xue, J.; Liao, W.; Yan, D.; Zhao, N.; Zeng, X. Error Adjustment of TMPA Satellite Precipitation Estimates and Assessment of Their Hydrological Utility in the Middle and Upper Yangtze River Basin, China. Atmos. Res. 2019, 216, 52–64.
3. Jung, M.; Reichstein, M.; Ciais, P.; Seneviratne, S.I.; Sheffield, J.; Goulden, M.L.; Bonan, G.; Cescatti, A.; Chen, J.; de Jeu, R.; et al. Recent decline in the global land evapotranspiration trend due to limited moisture supply. Nature 2010, 467, 951–954.
4. Goyal, M.R.; Harmsen, E.W. Evapotranspiration: Principles and Applications for Water Management; CRC Press: Boca Raton, FL, USA, 2013.
5. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspiration—Guidelines for Computing Crop Water Requirements; Food and Agriculture Organization: Rome, Italy, 1998.
6. Wang, L.; Iddio, E.; Ewers, B. Introductory overview: Evapotranspiration (ET) models for controlled environment agriculture (CEA). Comput. Electron. Agric. 2021, 190, 106447.
7. Van, B.C.H. Lysimetric measurements of evapotranspiration rates in the eastern United States. Soil Sci. Soc. Am. J. 1961, 25, 138–141.
8. Ding, R.; Kang, S.; Li, F.; Zhang, Y.; Tong, L.; Sun, Q. Evaluating eddy covariance method by large-scale weighing lysimeter in a maize field of northwest China. Agric. Water Manag. 2010, 98, 87–95.
9. Garcia, M.; Dirk, R.; Rick, A.; Carlos, H. Dynamics of Reference Evapotranspiration in the Bolivian Highlands (Altiplano). Agric. For. Meteorol. 2004, 125, 67–82.
10. Gavilán, P.; Lorite, I.J.; Tornero, S.; Berengena, J. Regional Calibration of Hargreaves Equation for Estimating Reference ET in a Semiarid Environment. Agric. Water Manag. 2006, 81, 257–281.
11. McMahon, F.H.S.; Chiew, N.N.; Kamaladasa, H.M.; Malano, T.A. Penman-Monteith, FAO-24 Reference Crop Evapotranspiration and Class—A Pan Data in Australia. Agric. Water Manag. 1995, 28, 9–21.
12. Gocic, M.; Trajkovic, S. Software for estimating reference evapotranspiration using limited weather data. Comput. Electron. Agric. 2010, 71, 158–162.
13. Tabari, H.; Talaee, P. Local calibration of the Hargreaves and Priestley–Taylor equations for estimating reference evapotranspiration in arid and cold climates of Iran based on the Penman–Monteith model. J. Hydrol. Eng. 2011, 16, 837–845.
14. Başağaoğlu, H.; Chakraborty, D.; Winterle, J. Reliable Evapotranspiration Predictions with a Probabilistic Machine Learning Framework. Water 2021, 13, 557.
15. Chakraborty, D.; Başağaoğlu, H.; Winterle, J. Interpretable vs. noninterpretable machine learning models for data-driven hydro-climatological process modeling. Expert Syst. Appl. 2021, 170, 114498.
16. Ravindran, S.M.; Bhaskaran, S.K.M.; Ambat, S.K.N. A Deep Neural Network Architecture to Model Reference Evapotranspiration Using a Single Input Meteorological Parameter. Environ. Process. 2021, 8, 1567–1599.
17. Zhou, Z.; Zhao, L.; Lin, A.; Qin, W.; Lu, Y.; Li, J.; Zhong, Y.; He, L. Exploring the potential of deep factorization machine and various gradient boosting models in modeling daily reference evapotranspiration in China. Arab. J. Geosci. 2020, 13, 1287.
18. Deo, R.C.; Wen, X.; Qi, F. A wavelet-coupled support vector machine model for forecasting global incident solar radiation using limited meteorological dataset. Appl. Energy 2016, 168, 568–593.
19. Wang, L.; Kisi, O.; Zounemat-Kermani, M.; Salazar, G.; Zhu, Z.; Gong, W. Solar radiation prediction using different techniques: Model evaluation and comparison. Renew. Sustain. Energy Rev. 2016, 61, 384–397.
20. Wang, L.; Kisi, O.; Zounemat-Kermani, M.; Hu, B.; Gong, W. Modeling and comparison of hourly photosynthetically active radiation in different ecosystems. Renew. Sustain. Energy Rev. 2015, 56, 436–453.
21. Rahimikhoob, A. Estimation of Evapotranspiration Based on Only Air Temperature Data Using Artificial Neural Networks for a Subtropical Climate in Iran. Theor. Appl. Climatol. 2010, 101, 83–91.
22. Slavisa, T.; Kolakovic, S. Estimating Reference Evapotranspiration Using Limited Weather Data. J. Irrig. Drain. Eng. 2009, 135, 443–449.
23. Lessmann, S.; Bart, B.; Hsin-vonn, S.; Lyn, C.T. Benchmarking State-of-the-Art Classification Algorithms for Credit Scoring: An Update of Research. Eur. J. Oper. Res. 2015, 247, 124–136.
24. Kim, M.; Sung-hwan, M.; Ingoo, H. An Evolutionary Approach to the Combination of Multiple Classifiers to Predict a Stock Price Index. Earth Syst. Appl. 2006, 31, 241–247.
25. Tsai, C.; Yu-chieh, H. Combining Multiple Feature Selection Methods for Stock Prediction: Union, Intersection, and Multi-Intersection Approaches. Decis. Support Syst. 2010, 50, 258–269.
26. Baker, K. Operational Research Society Is Collaborating with JSTOR to Digitize, Preserve, and Extend Access to Operational Research Quarterly (1970–1977). Oper. Res. Q. 1977, 27, 155–167.
27. Makridakis, S.; Andersen, A.; Carbone, R.; Fildes, R.; Hibon, M.; Lewandowski, R.; Newton, J.; Parzen, E.; Winkler, R. The accuracy of extrapolation (time series) methods: Results of a forecasting competition. J. Forecast. 1982, 1, 111–153.
28. Nourani, V.; Elkiran, G.; Abdullahi, J. Multi-station artificial intelligence-based ensemble modeling of reference evapotranspiration using pan evaporation measurements. J. Hydrol. 2019, 577, 123958.
29. Parasuraman, K.; Amin, E. Toward Improving the Reliability of Hydrologic Prediction: Model Structure Uncertainty and Its Quantification Using Ensemble-Based Genetic Programming Framework. Water Resour. Res. 2008, 44, 1–12.
30. Kiran, N.R.; Ravi, V. Software reliability prediction by soft computing techniques. J. Syst. Softw. 2008, 81, 576–583.
31. Sharghi, E.; Nourani, V.; Nazanin, B. Earthfill Dam Seepage Analysis Using Ensemble Artificial Intelligence Based Modeling. J. Hydroinform. 2018, 20, 1071–1084.
32. Finlay, S. Multiple Classifier Architectures and Their Application to Credit Risk Assessment. Eur. J. Oper. Res. 2011, 210, 368–378.
33. Paleologo, G.; André, E.; Gianluca, A. Subagging for Credit Scoring Models. Eur. J. Oper. Res. 2010, 201, 490–499.
34. Sharma, G.; Singh, A.; Jain, S. A hybrid deep neural network approach to estimate reference evapotranspiration using limited climate data. Neural Comput. Appl. 2021, 34, 4013–4032.
35. Raza, A.; Shoaib, M.; Faiz, M.A.; Baig, F.; Khan, M.M.; Ullah, M.K.; Zubair, M. Comparative Assessment of Reference Evapotranspiration Estimation Using Conventional Method and Machine Learning Algorithms in Four Climatic Regions. Pure Appl. Geophys. 2020, 177, 4479–4508.
36. Hammad, M.; Shoaib, M.; Salahudin, H.; Baig, M.A.I.; Khan, M.M.; Ullah, M.K. Rainfall forecasting in upper Indus basin using various artificial intelligence techniques. Stoch. Environ. Res. Risk Assess. 2021, 35, 2213–2235.
37. Quilty, J.; Adamowski, J. Addressing the incorrect usage of wavelet-based hydrological and water resources forecasting models for real-world applications with best practices and a new forecasting framework. J. Hydrol. 2018, 563, 336–353.
38. Stone, M. Cross-Validatory Choice and Assessment of Statistical Predictions. J. R. Stat. Soc. 1974, 36, 111–147.
39. Peng, W.; Juhua, C.; Haiping, Z. An Implementation of IDE3 Decision Tree Learning Algorithm. Mach. Learn. 2009, 9417, 1–20.
40. Sherrod, P.; DTREG Predictive Modeling Software. DevDigital: Nashvilla Software Development. 2009. Available online: https://www.dtreg.com (accessed on 11 April 2023).
41. Raza, A.; Shoaib, M.; Khan, A.; Baig, F.; Faiz, M.A.; Khan, M.M. Application of Non-Conventional Soft Computing Approaches for Estimation of Reference Evapotranspiration in Various Climatic Regions. Theor. Appl. Climatol. 2020, 139, 1459–1477.
42. Vannieuwenhuyse, G. Arbitration and new technologies: Mutual benefits. J. Int. Arbitr. 2018, 35, 119–129.
43. Rokach, L. Ensemble Methods in Supervised Learning. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.; Springer: Boston, MA, USA, 2010; pp. 959–979.
44. Richman, R.; Wüthrich, M.V. Nagging Predictors. Risks 2020, 8, 83.
45. Perrone, M.P.; Copper, L.N. When Networks Disagree: Ensemble Methods for Technical Report Hybrid Neural Networks Unclassified; Brown University Providence Ri Institute for Brain and Neural Systems: Providence, RI, USA, 1992.
46. Benediktsson, J.A.; Sveinsson, J.R.; Ersoy, O.K.; Swain, P.H. Parallel consensual neural networks. IEEE Trans. Neural Netw. 1997, 8, 54–64.
47. Raza, A.; Shoaib, M.; Faiz, M.A.; Shakil, A.; Khan, M.M.; Ullah, M.K.; Sarfraz, H. Comparative Study of Powerful Predictive Modeling Techniques for Modeling Monthly Reference Evapotranspiration in Various Climatic Regions. Fresenius Environ. Bull. 2021, 30, 7490–7513.
48. Tikhamarine, Y.; Malik, A.; Kumar, A.; Souag-Gamane, D.; Kisi, O. Estimation of monthly reference evapotranspiration using novel hybrid machine learning approaches. Hydrol. Sci. J. 2019, 64, 1824–1842.
49. Ferreira, L.B.; da Cunha, F.F.; de Oliveira, R.A.; Filho, E.I.F. Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM—A new approach. J. Hydrol. 2019, 572, 556–570.
50. Kisi, O.; Sanikhani, H.; Zounemat-Kermani, M.; Niazi, F. Long-term monthly evapotranspiration modeling by several data-driven methods without climatic data. Comput. Electron. Agric. 2015, 115, 66–77.
51. Khan, M.T.; Shoaib, M.; Hammad, M.; Salahudin, H.; Ahmad, F.; Ahmad, S. Application of Machine Learning Techniques in Rainfall—Runoff Modelling of the Soan River Basin, Pakistan. Water 2021, 13, 3528.
52. Khan, M.T.; Shoaib, M.; Albano, R.; Inam, M.A.; Salahudin, H.; Hammad, M.; Ahmad, S.; Ali, M.U.; Hashim, S.; Ullah, M.K. Intercomparison and Assessment of Stand-Alone and Wavelet-Coupled Machine Learning Models for Simulating Rainfall-Runoff Process in Four Basins of Pothohar. Atmosphere 2023, 14, 452.
53. Estévez, J.; Pedro, G.; Joaquín, B. Sensitivity Analysis of a Penman–Monteith Type Equation to Estimate Reference Evapotranspiration in Southern Spain. Hydrol. Process. 2009, 23, 3342–3353.
54. Eslamian, S.; Saeid, S.; Alireza, G.; Zareian, M.J.; Alireza, F. Estimating Penman-Monteith Reference Evapotranspiration Using Artificial Neural Networks and Genetic Algorithm: A Case Study. Arab. J. Sci. Eng. 2012, 37, 935–944.

Figure 1. Study Area location.

Figure 2. Mechanism of Ensemble Modeling applied in the Study.

Figure 3. Flow chart of best input combination selection and non-linear ensemble of tree-based techniques for ET₀ estimation.

Figure 4. Training and testing results of RMSE for (a) SDT, (b) TB, and (c) DTF based on input to various models.

Figure 5. Training and testing results of NSE for (a) SDT, (b) TB, and (c) DTF based on input to various models.

Figure 6. Regression comparison of SDT, TB and DTF with FAO-PM56 method.

Figure 7. ET₀ Comparison of LE and NLE approaches with FAO-PM56 for Model 15 at Multan station.

Figure 8. NLE Performance against FAO-PM56 in adjacent climatic stations.

Figure 9. Performance of NLE against FAO-PM56 in faraway climatic stations.

Table 1. Dataset duration and climatic characteristic of selected stations.

Sr. No.	Station Name	Latitude	Longitude	Duration	Years	Climatic Region
1	Multan	30.2705	71.5024	1996–2015	20	Hyper Arid
2	Jhang	31.2781	72.3317	2004–2017	14	Arid
3	T.T. Singh	30.9709	72.4826	2009–2017	9	Arid
4	Sahiwal	30.6682	73.1114	2005–2017	13	Arid
5	Bahawalpur	29.3544	71.6911	1987–2016	30	Arid
6	R.Y. Khan	28.4212	70.2989	2002–2017	16	Arid
7	D.G. Khan	30.0489	70.6455	2003–2017	15	Arid
8	Bhakkar	31.6082	71.0854	2010–2017	8	Arid
9	Jacobabad	28.2823	68.4472	2004–2016	12	Arid
10	Islamabad	33.6844	73.0479	2004–2016	12	Semi-Arid
11	Skardu	35.3247	75.5510	2004–2016	12	Humid

Table 2. Climatic data of other Stations.

Statistical Parameters	T_max (°C)	T_min (°C)	RH_mean (%)	u(x) (knots)	n (hour/day)	ET₀ (mean, mm/day)
Multan
Mean	32.43	18.85	56.56	6.07	7.48	4.78
Median	34.70	20.20	59.00	5.50	7.69	4.75
Maximum	43.80	30.60	80.00	18.78	11.25	10.30
Minimum	18.00	3.80	28.00	0.00	3.13	1.10
Std. Dev.	7.39	8.62	11.83	3.97	1.47	2.61
Toba Tek Singh (T.T. Singh)
Mean	31.7	17.5	65.2	0.83	6.1	3.36
Median	34.3	18.8	67.5	0.7	6.8	3.5
Maximum	41.7	28.4	82.5	2.45	9.7	6.7
Minimum	16.9	2.7	39.5	0.00	0.00	1.00
Std. Dev.	7.13	8.17	10.82	0.61	2.44	1.64
Sahiwal
Mean	31.47	17.55	61.34	1.75	7.33	4.20
Median	34.00	18.60	64.00	1.65	8.00	4.20
Maximum	42.00	28.00	82.00	4.25	10.50	7.50
Minimum	16.40	3.20	33.00	0.10	0.00	1.40
Std. Dev.	7.38	7.90	11.14	0.89	2.56	1.74
Raheem Yar Khan (R.Y. Khan)
Mean	34.29	18.65	57.54	2.20	0.00	4.62
Median	36.60	20.05	59.75	2.15	0.00	4.40
Maximum	44.90	29.60	83.00	7.10	0.00	10.40
Minimum	19.90	4.40	31.00	0.15	0.00	1.40
Std. Dev.	7.39	8.09	10.32	1.25	0.00	2.22
Jhang
Mean	31.71	17.50	62.16	1.01	8.08	4.04
Median	34.15	18.55	65.00	0.90	8.37	4.00
Maximum	42.10	29.00	82.50	3.10	11.34	8.50
Minimum	16.90	3.40	35.00	0.00	3.44	0.90
Std. Dev.	7.17	8.27	11.39	0.73	1.67	2.16
Dera Ghazi Khan (D.G. Khan)
Mean	32.49	19.05	57.08	3.25	7.36	4.79
Median	35.00	20.45	60.00	3.25	8.41	4.95
Maximum	43.70	30.20	76.00	6.10	10.62	9.30
Minimum	17.60	5.00	24.50	0.80	1.37	1.50
Std. Dev.	7.45	7.98	10.66	1.08	2.97	2.13
Bhakkar
Mean	32.59	17.61	60.30	1.02	3.46	3.62
Median	34.60	19.10	62.50	0.95	0.00	3.50
Maximum	44.70	29.50	87.00	3.20	10.10	7.60
Minimum	17.50	3.20	34.00	0.10	0.00	0.90
Std. Dev.	8.01	8.49	11.08	0.67	3.92	1.90
Bahawalpur
Mean	32.49	24.46	20.52	4.98	5.08	4.95
Median	33.00	24.70	5.05	4.10	5.80	5.10
Maximum	44.90	29.60	63.00	11.00	11.40	10.50
Minimum	19.90	4.40	34.00	0.10	0.00	1.50
Std. Dev.	12.43	13.88	20.59	3.36	3.46	2.06
Jacobabad
Mean	33.82	20.29	41.80	2.91	7.64	4.45
Maximum	45.45	30.75	72.85	7.10	8.45	8.98
Minimum	19.95	6.35	12.85	0.15	6.85	1.22
Std. Dev.	7.28	7.83	13.55	1.50	0.44	1.93
Islamabad
Mean	28.62	14.16	49.68	1.61	7.30	3.40
Maximum	40.15	25.35	73.85	7.44	11.15	8.19
Minimum	15.05	−2.90	22.85	0.05	5.35	1.73
Std. Dev.	6.41	7.70	11.07	1.31	1.40	0.76
Skardu
Mean	19.14	4.13	39.21	2.46	5.98	3.22
Maximum	9.24	19.40	81.00	2.04	1.81	2.04
Minimum	−2.70	−17.90	14.00	0.15	2.55	0.37
Std. Dev.	9.60	8.14	14.56	8.54	1.95	2.12

Table 3. Summary of Applied Machine Learning Techniques.

ML Techniques	Learning Algorithm	Optimal Values of Prime Parameters
ML Techniques	Learning Algorithm	Rows in Node	Tree Level	Node Size
SDT	Iterative Dichotomiser 3 (ID3)	5	10	10
TB	Gradient Boosting Algorithm (GBA)	400	5	5
DTF	Random Forest Algorithm (RFA)	200	50	2

Table 4. Parametric values for selected MLP ensemble model.

Parameters	Values	Parameters	Values
Number of layers	3	Number of Iterations	10,000
Min to max neurons	2–20	Convergence tolerance	−1.00 × 10⁻⁵
Neurons in hidden layer	6	Minimum improvement delta	−1.00 × 10⁻⁶
Hidden layer Function	Sigmoid	Minimum gradient	−1.00 × 10⁻⁷
Output layer function	Linear	Maximum execution time	0

Table 5. Results of RMSE (mm/month) for all the meteorological input combinations.

Model	Meteorological Input Dataset	SDT Training	SDT Testing	TB Training	TB Testing	DTF Training	DTF Testing
Model 1	T_min, T_max, RH_mean, u(x), n	0.55	0.66	0.39	0.46	0.38	0.54
Model 2	RH_mean, n	1.05	1.74	1.12	1.41	1.14	1.9
Model 3	RH_mean, n, u(x)	0.79	1.76	0.61	1.64	0.68	1.39
Model 4	RH_mean, u(x)	0.79	1.76	0.6	1.63	0.36	1.7
Model 5	T_max, T_min, n, u(x)	0.42	0.62	0.32	0.4	0.22	0.51
Model 6	T_max, RH_mean, n, u(x)	0.38	0.97	0.3	0.87	0.18	1.06
Model 7	T_max, RH_mean, u(x)	0.38	0.97	0.29	0.86	0.2	1.23
Model 8	T_max, T_min, RH_mean, n	0.4	0.83	0.39	1.21	0.25	1.04
Model 9	T_max, T_min, RH_mean, n, u(x)	0.32	0.83	0.27	0.82	0.18	0.81
Model 10	T_mean, RH_mean, n, u(x)	0.45	0.64	0.29	0.95	0.18	0.99
Model 11	T_mean, RH_mean, u(x)	0.45	0.64	0.28	0.95	0.18	1.18
Model 12	T_mean, RH_mean	0.52	0.7	0.41	1.21	0.27	1.5
Model 13	T_mean, n	0.55	0.64	0.6	0.64	0.62	0.74
Model 14	T_mean, RH_mean, n	0.52	0.7	0.42	1.21	0.51	1.31
Model 15	T_min, RH_mean, u(x)	0.48	0.58	0.38	0.42	0.24	0.48
Model 16	T_min, RH_mean, n, u(x)	0.45	1.17	0.29	1.11	0.19	1
Model 17	T_mean, u(x)	0.45	1.17	0.29	1.12	0.2	1.17

Table 6. Results of NSE for all the meteorological input combinations.

Model	Meteorological Input Dataset	SDT Training	SDT Testing	TB Training	TB Testing	DTF Training	DTF Testing
Model 1	T_min, T_max, RH_mean, u(x), n	0.97	0.93	0.95	0.94	0.99	0.98
Model 2	RH_mean, n	0.71	0.68	0.66	0.58	0.65	0.54
Model 3	RH_mean, n, u(x)	0.83	0.4	0.9	0.42	0.88	0.44
Model 4	RH_mean, u(x)	0.83	0.5	0.9	0.43	0.94	0.46
Model 5	T_max, T_min, n, u(x)	0.95	0.89	0.93	0.91	0.95	0.82
Model 6	T_max, RH_mean, n, u(x)	0.96	0.73	0.98	0.78	0.95	0.68
Model 7	T_max, RH_mean, u(x)	0.96	0.73	0.98	0.78	0.94	0.56
Model 8	T_max, T_min, RH_mean, n	0.96	0.8	0.96	0.58	0.92	0.68
Model 9	T_max, T_min, RH_mean, n, u(x)	0.97	0.8	0.98	0.81	0.93	0.81
Model 10	T_mean, RH_mean, n, u(x)	0.94	0.88	0.98	0.72	0.94	0.71
Model 11	T_mean, RH_mean, u(x)	0.94	0.88	0.98	0.74	0.94	0.6
Model 12	T_mean, RH_mean	0.93	0.86	0.96	0.58	0.92	0.35
Model 13	T_mean, n	0.92	0.88	0.9	0.88	0.9	0.84
Model 14	T_mean, RH_mean, n	0.93	0.86	0.95	0.58	0.93	0.5
Model 15	T_min, RH_mean, u(x)	0.94	0.90	0.96	0.91	0.98	0.93
Model 16	T_min, RH_mean, n, u(x)	0.94	0.61	0.98	0.64	0.95	0.71
Model 17	T_mean, u(x)	0.94	0.61	0.98	0.64	0.93	0.6

Table 7. Results of R² for all the meteorological input combinations.

Model	Meteorological Input Dataset	SDT Training	SDT Testing	TB Training	TB Testing	DTF Training	DTF Testing
Model 1	T_min, T_max, RH_mean, u(x), n	0.96	0.95	0.96	0.95	0.97	0.96
Model 2	RH_mean, n	0.69	0.66	0.64	0.57	0.64	0.53
Model 3	RH_mean, n, u(x)	0.81	0.39	0.88	0.41	0.86	0.43
Model 4	RH_mean, u(x)	0.81	0.49	0.88	0.42	0.92	0.45
Model 5	T_max, T_min, n, u(x)	0.93	0.87	0.91	0.89	0.93	0.80
Model 6	T_max, RH_mean, n, u(x)	0.94	0.71	0.96	0.76	0.93	0.66
Model 7	T_max, RH_mean, u(x)	0.94	0.71	0.96	0.76	0.92	0.55
Model 8	T_max, T_min, RH_mean, n	0.94	0.78	0.94	0.57	0.90	0.66
Model 9	T_max, T_min, RH_mean, n, u(x)	0.95	0.78	0.96	0.79	0.91	0.79
Model 10	T_mean, RH_mean, n, u(x)	0.92	0.86	0.96	0.70	0.92	0.69
Model 11	T_mean, RH_mean, u(x)	0.92	0.86	0.96	0.72	0.92	0.59
Model 12	T_mean, RH_mean	0.91	0.84	0.94	0.57	0.90	0.34
Model 13	T_mean, n	0.90	0.86	0.88	0.86	0.88	0.82
Model 14	T_mean, RH_mean, n	0.91	0.84	0.93	0.57	0.91	0.49
Model 15	T_min, RH_mean, u(x)	0.96	0.93	0.95	0.94	0.97	0.96
Model 16	T_min, RH_mean, n, u(x)	0.92	0.60	0.96	0.63	0.93	0.69
Model 17	T_mean, u(x)	0.92	0.60	0.96	0.63	0.91	0.59

Table 8. Data required for ET₀ estimation using the FAO PM56 and ML models.

Input Data	T_min	T_max	RH_min	RH_max	RH_mean	U(x)	N	Rn	Aerodynamic Factors (R_n, e_s, e_a, e_min, e_max, Δ, Z, and Ɣ)	Adopted Methodology
Climatic and aerodynamic	**	**	**	**	**	**	**	**	**	FAO PM56
Effective variables	**	xx	xx	xx	**	**	xx	xx	xx	ML models

**: parameters required for ET₀ estimation. xx: parameters not required by the best input combination.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
