Article

Above-Ground Biomass Prediction for Croplands at a Sub-Meter Resolution Using UAV–LiDAR and Machine Learning Methods

1 Department of Geosciences and Natural Resources Management (IGN), Copenhagen University, DK-1350 Copenhagen, Denmark
2 Department of Computer Science (DiKU), Copenhagen University, DK-2100 Copenhagen, Denmark
3 Department of Information Systems, University of Münster, 48149 Münster, Germany
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(16), 3912; https://doi.org/10.3390/rs14163912
Submission received: 14 June 2022 / Revised: 7 August 2022 / Accepted: 9 August 2022 / Published: 12 August 2022
(This article belongs to the Section AI Remote Sensing)

Abstract
Current endeavors to enhance the accuracy of in situ above-ground biomass (AGB) prediction for croplands rely on close-range monitoring surveys that use unstaffed aerial vehicles (UAVs) and mounted sensors. In precision agriculture, light detection and ranging (LiDAR) technologies are currently used to monitor crop growth, plant phenotyping, and biomass dynamics at the ecosystem scale. In this study, we utilized a UAV–LiDAR sensor to monitor two crop fields and a set of machine learning (ML) methods to predict real-time AGB over two consecutive years in the region of Mid-Jutland, Denmark. During each crop growing period, UAV surveys were conducted in parallel with destructive AGB sampling every 7–15 days, and the collected AGB samples were used as the ground truth data. We evaluated the ability of the ML models to estimate the real-time values of AGB at a sub-meter resolution (0.17–0.52 m2). An extremely randomized trees (ERT) regressor was selected for the regression analysis, based on its predictive performance for the first year’s growing season. The model was retrained using previously identified hyperparameters to predict the AGB of the crops in the second year. The ERT performed AGB estimation using height and reflectance metrics from LiDAR-derived point cloud data and achieved a prediction performance of R² = 0.48 at a spatial resolution of 0.35 m2. The prediction performance could be improved significantly by aggregating adjacent predictions (R² = 0.71 and R² = 0.93 at spatial resolutions of 1 m2 and 2 m2, respectively) because the individual errors averaged out and the aggregates converged to the reference biomass values. The AGB prediction results were examined as a function of predictor type, training set size, sampling resolution, phenology, and canopy density. The results demonstrated that, when combined with ML regression methods, the UAV–LiDAR method could provide accurate real-time AGB prediction for crop fields at a high resolution, thereby providing a way to map their biochemical constituents.

Graphical Abstract

1. Introduction

Non-destructive crop surveying techniques are currently being developed to aid decision-making in precision agriculture [1]. Developing accurate crop monitoring methods is an important step toward improving climate change adaptation strategies within agroecosystem management and food production systems [2]. It is expected that the results of this study could have particular impacts on the following two research areas: (i) food security assessment [3] in the face of global land use [4] and climate changes [5,6,7]; (ii) crop yield modeling [8,9,10] and nutrient cycle studies [11,12].
The expansion of field studies and crop monitoring systems is therefore strongly encouraged by both the modeling community (to test and calibrate models) and the cultivation community (to enable adaptive harvesting operations and the real-time monitoring of nutritional constituents) [13].
Above-ground biomass (AGB) is a fundamental agronomic parameter that accounts for the content of dry biomass in the standing part of plants. It is an indicator of the phenological and health status of plants, as well as their nutrient contents [14]. As such, accurate AGB mapping is a crucial component for a wide range of scientific disciplines, including precision agriculture and agroecology [15]. Most established techniques for acquiring remote sensing (RS)-based estimates of AGB primarily rely on (i) passive optical and (ii) synthetic aperture radar (SAR) methods. However, these techniques present limitations. SAR signals become saturated with increasing AGB [16,17], and the efficacy of SAR in AGB mapping is sensitive to polarization and the combination of available bands [18]. Furthermore, SAR is primarily suited to mapping highly vegetated areas, such as woodlands and forests [19,20,21], and it has been shown to overestimate AGB in areas with low biomass levels [22,23]. Passive optical methods have been extensively applied in precision agriculture [24,25]. These methods rely on two-dimensional features that are derived from various parameters, such as canopy structure [26], image texture [27,28], and greenness indices [24,29], and they also suffer from substantial signal saturation and limitations that are linked to spectral resolution and radiometric sensor calibration [24,25]. In comparison to SAR and passive optical methods, regression models that are derived from light detection and ranging (LiDAR) data consistently achieve better results [30,31] and have become the preferred option when precise spatial AGB distributions are required [21].
LiDAR is an active sensing technique that is based on pulses of light. The reflections of these pulses are detected by the same device, which allows the precise location of intercepted objects to be determined based on the laser pulse travel time [32]. Variations in illumination and other atmospheric conditions do not affect the resulting data as much as the data from passive optical sensors. Progress in LiDAR technology has enabled advanced terrestrial ecosystem research. High-density LiDAR point cloud data (PCD) are currently used to investigate several variables, such as plant-level morphological traits in agroecosystems [33] and forests [34,35,36], as well as light–ecosystem interactions [37,38]. In fact, the integration of LiDAR systems in ecosystem monitoring at different scales (from the plot to ecosystem or regional level), combined with new modeling techniques [39,40,41], is broadening the field’s perspectives in an unprecedented manner [32]. The use of LiDAR systems in unstaffed aerial vehicles (UAVs) allows for the collection of PCD at sub-cm resolutions, which has enabled the development of a growing number of applications in agroecosystem monitoring [42,43,44]. This data collection technique, combined with machine learning (ML) regression methods, has proved to be useful for the precise AGB quantification of croplands [45,46,47,48] and forest ecosystems [49].
Advanced ML methods can attain better performances in regression tasks than the methods that are commonly used in AGB research (e.g., linear and power regression models that take the architectural and ancillary traits of plants as their inputs [50,51,52]). This is due to the ability of ML methods to account for more complex non-linear relationships between predictors and AGB. There is a growing interest in applying these recently developed ML methods to AGB prediction for land ecosystems. Recent state-of-the-art studies have combined ML methods and UAV-sensed data [53,54,55]. For example, Ma et al. [56] estimated the AGB of croplands using convolutional neural networks and close-range imagery, while Pan et al. [48] used neural networks (with so-called attention-based modules) and rover-borne LiDAR PCD as direct inputs instead of statistics. There have been many studies on AGB prediction using ML methods, ranging from sub-meter resolutions [48] to coarser resolutions [53,54,56]. However, the spatial resolution and spatial distributions of AGB predictions [55] are important factors that have strong influences on the obtained results. The effects of these two factors are usually not considered, which makes it difficult to compare the results that have been reported in the literature.
This study aimed to test the following hypotheses: (i) discrete return UAV–LiDAR surveys are a valid technique for providing accurate estimates of crop structure in dense agricultural crops, which allows for AGB regression analysis; (ii) the distribution of LiDAR PCD features empirically relates to the patterns of AGB distribution across crop fields; (iii) the empirical relationship between the patterns in the remotely sensed data and those in the AGB field-based measurements can therefore be formulated as a supervised regression problem, which can then be learnt by an ML model to predict AGB; and (iv) it is therefore possible to automate the accurate estimation of AGB distributions in dense crops at high spatial resolutions based on UAV–LiDAR surveys and ML regression methods.
We proposed a method for estimating the spatial distributions of AGB in crop fields using UAV–LiDAR for data acquisition and ML methods for regression analysis. Our results showed that this method allowed real-time estimates of AGB in crops to be obtained at sub-meter resolutions and was effective for two different species (barley and winter wheat) and contrasting crop structures.

2. Materials and Methods

This section presents the study area, instrumentation, field measurements, datasets, and methods that were employed in this study.

2.1. Study Area

The study area (Figure 1) was a conventionally managed cropland site, which is located near an Integrated Carbon Observation System (ICOS) class-1 ecosystem station at Voulundgaard (DK-Vng) in Mid-Jutland, Denmark (56.037476N, 9.160709E). It has an area of ca. 13 ha and is located in the eastern part of the Skjern River catchment. The cropland site is a relatively flat plain at an altitude of 64–68 m above sea level. The terrain has smooth undulations and a slight slope to the northwest.
The crops that were investigated were barley (Hordeum vulgare L.) in 2020 and 2021 and winter wheat (Triticum aestivum L.) in 2021. The growing period of the barley lasted from the end of April 2020 (shoot emergence) to the end of August 2020 (harvest) and followed a similar cycle in 2021. In 2021, the growing period of the winter wheat extended from January 2021 (shoot emergence) to the end of August 2021 (harvest). The conventional agricultural practice at the site included the application of fertilizers in the form of slurry (according to local regulations [57]), pesticides, and fungicides throughout the growing season, as well as sufficient irrigation to prevent water stress [58]. The region has a humid temperate climate that is characterized by a mean annual precipitation of 961.0 mm, a mean annual temperature of 8.1 °C, and an overcast or scattered cloud cover (with a mean annual incoming shortwave radiation of 108 W/m2). The ploughing layer (30 cm deep) sits on a sandy loam with pebble inclusions that are around 3–5 cm in diameter. The water table depth lies at 5.5 ± 1 m below ground. A visual documentation of the two crops is shown in Figure 2c,d.

2.2. Instrumentation, Flight Parameters, and Field Measurements

The UAV–LiDAR system that was utilized (i.e., LS Nano M8, LidarSwiss GmbH, CH) was equipped with three built-in devices: (i) a Global Navigation Satellite System (GNSS) receiver; (ii) an inertial measurement unit (IMU) with a mono-antenna navigation system; and (iii) a discrete LiDAR scanner (M8 sensor, Quanergy Systems, Inc., Sunnyvale, CA, USA). The overall setup allowed for synchronization between the three data streams so as to produce a georeferenced point cloud dataset output after processing, as described in [59]. The system was mounted as a payload on a commercial DJI Matrice 600 Pro (Figure 3) at a 90° pitch angle and the same heading and roll angles as the UAV platform. The laser-firing system consisted of a stack of eight lasers, which emitted discrete near-infrared pulses (at 905 nm) in sequence at a constant rate of 53 kHz, with an individual beam spacing (along the direction of the UAV movement) of 3.20°. The beam divergence was 3 mrad, which produced a footprint diameter increase of 3 cm every 10 m along the propagation direction. However, the beam energy was spatially Gaussian distributed [60], which resulted in smaller active laser footprints. During data collection, a differential Global Positioning System (dGPS, Trimble R8) was set up as the receiver base station in real-time kinematic (RTK) positioning mode, which logged real-time corrected satellite coverage data during the UAV–LiDAR survey.
After running a number of tests with this instrumentation and considering our scanner specifications with reference to the available literature [59], the following LiDAR flight parameters were adopted for this study: a flight height of 40 m above ground level; a horizontal speed of 5 m/s; a vertical field of view (FoV) of 21.5°; a horizontal FoV of 180°; a range gate of 2–200 m; LiDAR scan swaths of 50 m wide; a side overlap between individual LiDAR swaths of 20%; and a scan angle of 0–32°. These parameters provided a balanced trade-off between LiDAR penetration through the canopy (i.e., PCD porosity), gap fraction, point cloud density (ca. 200 points/m2), and field coverage. The UAV–LiDAR surveys followed the calendar and frequency of the AGB sampling campaigns (Figure 2a,b), which resulted in a maximum time difference of 48 h between the UAV–LiDAR surveys and AGB collection in cases of adverse weather conditions.
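For reference, the footprint growth implied by these settings can be checked with a few lines of Python (a hedged back-of-the-envelope calculation that ignores the initial aperture diameter):

```python
# Worked check of the laser footprint growth (small-angle approximation,
# initial aperture diameter neglected).
beam_divergence_rad = 3e-3   # 3 mrad divergence
flight_height_m = 40.0       # flight height above ground level

footprint_increase_m = beam_divergence_rad * flight_height_m
print(f"Footprint growth at {flight_height_m:.0f} m AGL: {footprint_increase_m * 100:.0f} cm")
# -> 12 cm, consistent with the stated 3 cm of growth per 10 m of range
```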
Immediately after the UAV–LiDAR surveys, the AGB sampling was carried out. At the study site, scattered georeferenced locations were selected to conduct the AGB collection. Several sampling campaigns were conducted each year (six in 2020 and seven in 2021). With regard to the spatial distribution of the AGB sampling locations (Figure 4), the two major campaigns (i.e., on 22 June 2020 and on 14 July 2021) utilized randomized sampling, while the rest (i.e., minor campaigns) were conducted in areas that showed greater variability. After the AGB collection, each sample was oven-dried for 72 h at 65 °C and weighed according to the reference protocols [61]. This procedure was replicated for every sampling campaign over both growing seasons. While the procedure that was followed in 2020 was used as the baseline reference, this AGB collection method was adapted for the 2021 AGB campaigns in order to enlarge the AGB reference labeled datasets (Table 1). Further details about the datasets are presented in Section 2.4.

2.3. LiDAR Data Processing

The steps in the laser data processing pipeline could be divided into two parts: (i) point cloud scene generation and (ii) point cloud scene processing. Lastly, a post-processing step was included to add the phenological stage information to each of the datasets.

2.3.1. Point Cloud Scene Generation

The raw positioning file (which was logged by the internal IMU and GNSS during the survey) was coupled with the satellite coverage data to produce a processed trajectory file using NAVsolve software (Oxford Technical Solutions, Bicester, UK). We stored the 3D position and UAV attitude data (i.e., longitude, latitude, ellipsoidal height, and orientation angles) at a 10 Hz sampling resolution. Then, in order to filter out LiDAR data that were compromised due to adverse circumstances (e.g., heading deviations that were too large or disturbances in cruise speed that were caused by wind gusts), the UAV trajectory was used to mask out the raw laser files following a quality criterion (Δheading > 25° was taken as the threshold for the rejection of raw laser data). Finally, the processed laser files were output in parallel swaths, according to the UAV–LiDAR survey trajectory plan. In this way, several parallel swaths produced the full PCD scene (Figure 5).
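The heading-based quality filtering can be sketched as follows (a minimal Python/NumPy illustration; the function name, array layout, and threshold handling are assumptions and do not reproduce the NAVsolve or vendor processing):

```python
import numpy as np

def stable_heading_mask(headings_deg, planned_heading_deg, max_dev_deg=25.0):
    """Return a boolean mask (True = keep) over trajectory epochs whose heading stays
    within max_dev_deg of the planned swath heading; laser returns recorded during
    the rejected epochs can then be discarded."""
    # Wrap-around-safe angular deviation in degrees
    deviation = np.abs((headings_deg - planned_heading_deg + 180.0) % 360.0 - 180.0)
    return deviation <= max_dev_deg

# Illustrative usage with a 10 Hz heading log around a 90-degree swath
rng = np.random.default_rng(0)
headings = 90.0 + rng.normal(0.0, 12.0, size=600)   # 60 s of flight at 10 Hz
keep = stable_heading_mask(headings, planned_heading_deg=90.0)
print(f"Kept {keep.mean():.0%} of trajectory epochs")
```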

2.3.2. Point Cloud Scene Processing

At each location within the PCD scene, a PCD clipping was created that covered the same area as each individual AGB sample (i.e., 1 m × 0.35 m in 2020 and 0.5 m × 0.35 m in 2021). These clipped areas were considered separately (as shown in Figure 6a). Then, for each individual PCD clipping, the PCD height value (i.e., ellipsoidal height) was normalized by subtracting the terrain height. Finally, we extracted the statistical metrics of height and reflectance intensity from each individual PCD clipping, which were used as prediction features.
In order to select the prediction features, a set of candidate predictors was initially short-listed based on the Pearson correlation coefficient between each candidate and the target AGB labels. Then, we iteratively removed the considered predictors one by one until we observed a drop in the regression performance using the validation set. Following this process, the PCD features that were finally selected as the predictors were the following (a minimal extraction sketch is given after the list):
  • The metrics of height were the mean, median, standard deviation, variance, skewness, and kurtosis;
  • The metrics of reflectance were the mean and standard deviation.
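A minimal Python sketch of the metric extraction for a single clipping is given below (assuming NumPy and SciPy; the function and field names are illustrative and not the study's own code):

```python
import numpy as np
from scipy import stats

def clipping_features(height_norm, reflectance):
    """Per-clipping predictors computed from the terrain-normalized heights and
    reflectance intensities of the LiDAR returns inside one clipping."""
    return {
        "h_mean": float(np.mean(height_norm)),
        "h_median": float(np.median(height_norm)),
        "h_std": float(np.std(height_norm)),
        "h_var": float(np.var(height_norm)),
        "h_skew": float(stats.skew(height_norm)),
        "h_kurtosis": float(stats.kurtosis(height_norm)),
        "refl_mean": float(np.mean(reflectance)),
        "refl_std": float(np.std(reflectance)),
    }

# Example: a 0.35 m2 clipping holding ~70 returns (at ca. 200 points/m2)
rng = np.random.default_rng(0)
print(clipping_features(rng.gamma(4.0, 0.1, 70), rng.integers(0, 255, 70)))
```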

2.3.3. Data Post-Processing

The data were split into training and validation sets (70% and 30%, respectively). Both datasets were then subjected to stratification [62] to produce separate subgroups of instances. This processing step was inspired by the three main phenological growth stages of crops: (i) “early season”, from shoot emergence to the start of stem elongation; (ii) “rapid development”, from when crops start growing to when they reach their peak leaf area index; and (iii) “maturity”, when nutrients are allocated to grains and senescence occurs [63,64]. Both the training and validation sets were stratified according to the mean AGB value from the training distribution using K-means clustering [65,66] (Figure 7a).
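One plausible implementation of this stratified split is sketched below (assuming scikit-learn, with a one-dimensional K-means on the AGB labels and three clusters mirroring the three growth stages; the exact clustering setup used in the study may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 8 LiDAR-derived predictors and AGB labels per sample
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = rng.gamma(2.0, 150.0, size=200)

# Derive strata from the AGB label distribution with K-means
# (three clusters, mirroring the three phenological growth stages)
strata = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(y.reshape(-1, 1))

# 70/30 split that preserves the proportion of each AGB stratum
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=strata, random_state=0
)
print(len(y_train), len(y_val))
```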

2.4. Datasets

Following the aforementioned procedure, three original datasets were produced, which contained AGB- and LiDAR-derived prediction features. There was one dataset of original samples that corresponded to the year 2020 (barley) and two original datasets that corresponded to the year 2021 (wheat and barley). Data were also collected during 2021 so as to allow for the production of two additional datasets of augmented samples (Figure 6c).
The datasets that were composed of the augmented samples were produced as follows. During the 2021 AGB collection campaigns, samples were located and collected from adjacent plots (three samples per location) along the planting line (Figure 6b). In this way, by combining either two or three original samples, a new augmented instance was obtained, whose AGB value was determined by averaging the respective AGB values of the original samples. The LiDAR-derived metrics of the augmented samples were re-calculated by considering all of the LiDAR returns that were contained in the original samples. The numbers of instances that were included in each dataset, as well as the sample size, are summarized in Table 1.
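The construction of one augmented instance can be sketched as follows (a hedged illustration with hypothetical array layouts; the study's own tooling is not described at code level):

```python
import numpy as np

def build_augmented_instance(original_agb_values, original_point_clouds):
    """Combine two or three adjacent original samples into one augmented instance.

    The augmented AGB label is the mean of the original AGB values, and the
    LiDAR-derived metrics are re-computed over the pooled returns of all samples.
    """
    agb_aug = float(np.mean(original_agb_values))
    pooled = np.vstack(original_point_clouds)        # concatenate all LiDAR returns
    heights = pooled[:, 0]
    metrics = {"h_mean": float(heights.mean()),      # further metrics as in Section 2.3.2
               "h_std": float(heights.std())}
    return agb_aug, metrics

# Example: three adjacent 0.5 m x 0.35 m samples along the planting line
rng = np.random.default_rng(1)
clouds = [np.column_stack([rng.gamma(4.0, 0.1, 35), rng.integers(0, 255, 35)])
          for _ in range(3)]
print(build_augmented_instance([210.0, 190.0, 230.0], clouds))
```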
The augmented datasets were used to test the model’s ability to generalize predictions because (i) the augmented data samples provided a better representation of the vegetation structures of the target plots than the original datasets (i.e., as the augmented samples covered a larger area, there was a higher count of LiDAR returns per sample) and (ii) the augmented datasets had a lower sample variance among the AGB labels (because the standard error decreased when averaged out over a larger area). These two observations were consistent with the fact that the augmented datasets had fewer noisy samples and higher correlations between the AGB and the assessed predictors.

2.5. ML Model Training and Evaluation

Three different regression models were considered. Then, according to their performance using the validation set, the most suitable model was selected for testing. The Huber regressor [67,68] was selected as a representative of the linear methods with regularization. This model is a common choice in AGB research due to its robustness to outliers [69]. Furthermore, we employed extremely randomized trees (ERT) [70], which is a tree-based ensemble method. The XGboost regressor [71] was selected as a representative of the boosting methods.
In contrast to the model parameters (e.g., the weights that were fitted during the training phase), the hyperparameters (e.g., the definition of the specific loss function to be minimized) were external to the learning model in question and were pre-defined to control the learning process [72]. Here, the hyperparameter optimization [73] of the three ML models (Table 2) was conducted using cross-validated grid searches (10 folds) in the training set. The barley20 dataset was used for training and validation. The remaining four datasets from the 2021 growing season were used for refitting (70%) and testing (30%).
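A minimal sketch of such a cross-validated grid search, assuming scikit-learn's GridSearchCV with an illustrative parameter grid (not the study's actual search space), is shown below:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 2020 training set (8 LiDAR-derived predictors)
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(150, 8)), rng.gamma(2.0, 150.0, size=150)

# Illustrative grid only; the study's actual search space is not reproduced here
param_grid = {
    "n_estimators": [100, 500, 1000],
    "max_depth": [5, 7, 10, None],
    "max_features": ["sqrt", "log2", 1.0],
}
search = GridSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_grid,
    cv=10,                                  # 10-fold cross-validation on the training set
    scoring="neg_mean_absolute_error",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_)
```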
After hyperparameter tuning (Table 3), the model selection was conducted by comparing the regression performance of the three models using the 2020 validation set (Figure 7a), all of which were compared to a linear model (the results are shown in Section 3.1). The selected model was then re-fitted to the 70% of unseen data from 2021. The final performance evaluation was conducted using the remaining 30% of the 2021 datasets. The metrics that were used to assess the quality of predictions were the coefficient of determination (R²), the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root mean square error (RMSE). Scores that are reported as average values (to reduce the influence of outliers) are denoted with an overline (e.g., MAPE¯), which indicates that the result is a mean value over 10 repetitions. Figure 7 shows the overall workflow of the applied method.
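The four evaluation scores, and the averaging over 10 repetitions denoted by the overline, can be computed as in the following hedged sketch (scikit-learn metrics on synthetic stand-in data):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

def score(y_true, y_pred):
    """The four evaluation metrics used in this study."""
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE": mean_absolute_percentage_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }

# Mean scores over 10 repetitions (denoted by the overline in the text),
# shown here on synthetic data purely to illustrate the bookkeeping.
rng = np.random.default_rng(3)
X, y = rng.normal(size=(200, 8)), rng.gamma(2.0, 150.0, size=200)
runs = []
for seed in range(10):
    model = ExtraTreesRegressor(n_estimators=200, random_state=seed).fit(X[:140], y[:140])
    runs.append(score(y[140:], model.predict(X[140:])))
mean_scores = {k: float(np.mean([r[k] for r in runs])) for k in runs[0]}
print(mean_scores)
```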

2.6. Description of the Selected ML Model

ERT is an ensemble learning technique that aggregates the results of multiple individually created decision trees to produce regression results [70]. It was originally derived from the random forest (RF) model [76]. Every individual predictor, i.e., binary decision tree, in an ERT is constructed using the whole training set. At each node, the tree selects, from a random subset of candidate feature splits, the split that reduces the error criterion (e.g., MAE or MSE) the most. The random sampling of features and the random splits within the feature ranges lead to diverse and less correlated decision trees. Each tree is considered to be a “weak” regressor performance-wise but, in combination, they create an ensemble that can outperform individual regressors. The final prediction (AGB in our case) is the average of the individual predictions of all of the decision trees in the forest.

2.7. Generation of AGB Prediction Maps

The PCD scenes were subjected to a binary classification, which separated the ground from vegetation LiDAR returns. For this purpose, several filtering algorithms were evaluated [77,78,79]. The cloth simulation filtering algorithm [78] (implemented in R software [80]) and the slope-based filter [77] (implemented in Lidar360 software [81]) produced the best results, in contrast to morphological filters [79], which performed poorly. However, in order to ensure optimal accuracy, the segmentation was completed manually. Next, the ground returns were rasterized using the inverse distance weighting algorithm [82,83] and then extended to the whole region of interest through natural neighbor interpolation [84] in order to produce a digital terrain model (DTM). The DTM was used to normalize the vegetation returns to the ground by subtracting the DTM height from the initial PCD scene. Based on the spatial distributions of the height and reflectance of the normalized vegetation returns, a series of LiDAR-derived prediction features were mapped at a 1 m2 resolution. These prediction feature maps were used as the input for the trained regression model in order to produce the AGB prediction maps (Figure 8).
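A simplified sketch of this final mapping step is given below (assuming that ground filtering and DTM normalization have already been applied; the grid handling and feature ordering are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def predict_agb_map(xy, height_norm, reflectance, model, cell_size=1.0, min_returns=5):
    """Bin terrain-normalized vegetation returns onto a regular grid, compute the
    per-cell predictors (same set and order as used during training), and predict
    AGB per grid cell with the fitted regressor."""
    ix = ((xy[:, 0] - xy[:, 0].min()) // cell_size).astype(int)
    iy = ((xy[:, 1] - xy[:, 1].min()) // cell_size).astype(int)
    agb_map = np.full((iy.max() + 1, ix.max() + 1), np.nan)
    for cx in range(agb_map.shape[1]):
        for cy in range(agb_map.shape[0]):
            sel = (ix == cx) & (iy == cy)
            if sel.sum() < min_returns:          # skip cells with too few LiDAR returns
                continue
            h, r = height_norm[sel], reflectance[sel]
            feats = [h.mean(), np.median(h), h.std(), h.var(),
                     stats.skew(h), stats.kurtosis(h), r.mean(), r.std()]
            agb_map[cy, cx] = model.predict([feats])[0]
    return agb_map
```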

2.8. Uncertainty of AGB Field-Based Measurements

The collection of AGB data is often prone to estimation errors [85]. Therefore, accounting for the uncertainty that is introduced into AGB field-based measurements is a necessary step to avoid the underestimation of errors in AGB prediction mapping products [86]. In this study, an additional experiment was dedicated to accounting for the uncertainty that was introduced by the AGB sampling technique and the effects of crop sparsity on the AGB labels, which is shown in Appendix A.

3. Results

3.1. Model Selection

The extremely randomized trees (ERT), XGboost, and Huber regression models (Table 2) were trained using an optimized set of model-specific hyperparameters (Table 3) and were evaluated based on their regression performance using the validation set, according to the MAE, MAPE, RMSE, and R² metrics. The comparison of their performances using the validation set indicated that the ERT model provided the most accurate predictions across all four metrics. Figure 9 shows the predictive performances of the three considered models using both the training and validation sets. The criterion for model selection was the model performance using the validation set, for which ERT achieved the highest R²¯ value and the lowest RMSE¯, MAPE¯, and MAE¯ values (note that the Huber model consistently achieved the same performance, as it was a deterministic linear model, hence the flat boxplot in the figure). In all four subplots in Figure 9, the horizontal blue and green lines show the performance of the linear model that was employed as the comparative baseline using the training and validation sets, respectively.
Therefore, ERT was selected for the final evaluation using the testing sets. The optimized parameters that were selected for the ERT regression model were the following: the mean square error was used as the criterion to evaluate the quality of a split; the maximum depth that was allowed for any individual tree was limited to seven; the sampling was without bootstrapping; and the maximum number of features to consider when searching for a split was limited to log2(number of features). Finally, the number of trees was set to 1000 [70].
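Assuming a scikit-learn implementation, this configuration corresponds approximately to the following instantiation (a sketch; the mapping of the description above onto scikit-learn parameter names is our assumption):

```python
from sklearn.ensemble import ExtraTreesRegressor

ert = ExtraTreesRegressor(
    n_estimators=1000,          # number of trees in the ensemble
    criterion="squared_error",  # mean square error as the split-quality criterion
    max_depth=7,                # maximum depth allowed for any individual tree
    bootstrap=False,            # no bootstrapping: each tree sees the whole training set
    max_features="log2",        # log2(number of features) considered per split
    n_jobs=-1,
    random_state=0,
)
# ert.fit(X_refit, y_refit)    # re-fitted to the 70% of unseen 2021 data before testing
```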

3.2. AGB Prediction at a Sub-Meter Resolution

The AGB predictions at a 0.35 m2 spatial resolution that were produced by the ERT model using the barley and wheat testing datasets are shown in Figure 10a and Figure 11a, respectively. In both cases, the results were compared to the linear regression model, which was fitted to the same datasets (i.e., fitting to the 70% and predicting the remaining 30% of the 2021 data). The predictive performances of this model are shown in Figure 10b and Figure 11b for comparison.

3.3. Aggregated AGB Predictions

The prediction results using the testing set were subjected to a resampling technique (i.e., random sampling with replacement) and aggregated into groups of i ∈ {1, 2, …, N} samples, where i is the number of pairs (AGB prediction vs. AGB target value) that were aggregated at each step. As shown in Figure 12a and Figure 13a, the mean value of the residual distribution of the barley and wheat testing datasets approached zero. There was a systematic overestimation of 2% in the wheat testing dataset (barley: 2.2 g/m2; wheat: −18.5 g/m2). The residuals were independent and distributed around zero, so they were partially canceled out when averaged over a larger area, which caused the R²¯ score to converge to 1 as the number of aggregated predictions increased (Figure 12b and Figure 13b). Thus, the prediction could be improved by coarsening the spatial resolution (Table 4). This technique is known as the spatial averaging of errors [87].
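A hedged sketch of this resampling-and-aggregation procedure (averaging groups of i randomly drawn prediction and target pairs and re-scoring them) is given below; it reproduces the bookkeeping only, not the study's exact results:

```python
import numpy as np
from sklearn.metrics import r2_score

def aggregated_scores(y_true, y_pred, i, n_groups=1000, seed=0):
    """Score predictions after averaging groups of i randomly resampled
    (target, prediction) pairs drawn with replacement from the testing set."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(y_true), size=(n_groups, i))   # sampling with replacement
    t = np.asarray(y_true)[idx].mean(axis=1)                 # aggregated targets
    p = np.asarray(y_pred)[idx].mean(axis=1)                 # aggregated predictions
    return {"R2": r2_score(t, p),
            "RMSE": float(np.sqrt(np.mean((p - t) ** 2)))}

# Applied to each testing set for i = 1 ... N to trace the curves in Figures 12b and 13b,
# e.g. aggregated_scores(y_test, ert.predict(X_test), i=6) for the ~2 m2 resolution.
```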
By only aggregating two predictions that were randomly sampled for each testing set, the AGB results at a 0.7 m2 resolution for both the barley and wheat datasets improved significantly compared to the respective AGB predictions at a 0.35 m2 resolution (Table 4 and Figure 12b and Figure 13b).
The R² score of the aggregated predictions presented a turning point between 1 m2 and 2 m2 of spatial resolution (i.e., three and six aggregated samples, respectively), reaching optimal predictions (R² ≥ 0.95) from 3 m2 onward for the barley crops (Figure 12b) and from 4 m2 onward for the wheat crops (Figure 13b). At 2 m2 of spatial resolution, the R² reached 0.93 and 0.89 for the barley and wheat testing datasets, respectively (Table 4 and Figure 12b.1 and Figure 13b.1).
The spatial distribution of the AGB regression results is shown as an AGB prediction map in Figure 8c at a 1 m2 spatial resolution.

4. Discussion

4.1. AGB Prediction

This study explored a new method to acquire real-time estimates of the AGB distribution in croplands at a sub-meter resolution using UAV–LiDAR data and ML regression models. The results indicated that local variations in crop morphology (e.g., crop structure and height development) could be captured by a close-range UAV–LiDAR survey, thereby enabling a regression analysis of the AGB. The accuracy that was achieved using the ERT model was on a par with the currently established methods, i.e., multivariate linear regression and power regression models, while attaining resolutions that were one to two orders of magnitude higher than those in the reference studies, e.g., Han et al. (2.4 m2) [53], Ma et al. (12 m2) [56] and Zha et al. (160–300 m2) [54].
Figure 10 and Figure 11 show how much the ERT model predictions diverged from the ideal case (i.e., the line of equality) at a 0.35 m2 resolution when predicting AGB for the same crop species that was used for training and when predicting AGB for a different crop species, respectively. The ERT regression model achieved valid predictions in both cases and the ERT model predictions outperformed those of the linear regression model that was used as the comparative baseline (Figure 10b and Figure 11b). However, the prediction results for different crop species (i.e., training the model with the barley dataset and predicting the AGB for wheat crops) were not as accurate as those for the same species, which was expected. The higher accuracy of the predictions for the same crop species also meant that the aggregated prediction required fewer aggregated samples to converge in the barley set (Figure 12b) than in the wheat set (Figure 13b). The performance reduction when training and predicting for different species could be attributed to (i) the different plant-level AGB distributions between the two crop species, as captured by the AGB labels (Figure 2), and (ii) the different morphology in the canopy structures, as portrayed by the LiDAR-derived predictors. The differences between both crops were retained in the datasets, which caused a dataset shift effect [88]. This shift challenged the accuracy of the AGB predictions since the joint distributions of the predictors and the target AGB values were different for the training and testing phases.
Regarding the importance assessment of the selected predictors (i.e., independent variables), care had to be taken when considering predictors that had high levels of correlation between them, as occurs in LiDAR-derived height metrics. It has been observed empirically [89,90] and analytically [91] that high correlation across prediction features compromises the importance assessment in tree-based methods.
The predictive features that showed the highest correlations with AGB were those that were derived from the height metrics, which showed Pearson correlations of between 0.65 and 0.71. In contrast, the intensity metrics did not correlate consistently with AGB. This was probably due to the low cardinality of the intensity range, which prevented the production of sufficiently descriptive features.
With regard to phenology, the barley dataset that was collected over an extended time period in 2020 covered different crop development phases and presented higher correlation coefficients between height-related predictors and AGB compared to the barley dataset that was collected in 2021, which did not cover all of the phases of vegetation growth. The results showed that it was especially important to use datasets that covered different phases of vegetation development to train the models for AGB predictions in order for the regression models to learn the relationships between the predictive features and the target values. However, once a model was adequately trained, it could predict AGB using datasets that were collected during “snapshot surveys” (i.e., data acquisition campaigns that were conducted in one single day), such as the barley21 dataset.
When considering vegetation structure, it was observed that open canopies and heterogeneous crop patterns were morphological characteristics that compromised the use of LiDAR-derived height metrics as predictors of AGB. Plots that were surrounded by sparse vegetation were commonly reached by LiDAR beams at lower heights than plots in denser areas, thus distorting the relationships between the AGB and height metrics.
Additionally, increased variations in the AGB labels could enlarge the training domain, which favored the regression model’s ability to interpolate predictions using the testing set [92]. Likewise, more variability in the prediction features could make them more descriptive and, therefore, better at capturing the morphological traits of the crops (e.g., attaining a better portrayal of stem density and canopy structure).

4.2. Aggregation of AGB Predictions

It was notable that AGB predictions with significantly higher accuracies could be achieved using slightly coarser spatial resolutions (Figure 12 and Figure 13). The aggregation improved the model’s predictions considerably by only adding two instances (i.e., at a spatial resolution of 0.7 m2), while the 1 m2 resolution was found to be the optimal trade-off between spatial resolution and prediction performance for capturing local variations in AGB (Table 4 and Figure 12b.1 and Figure 13b.1). In terms of the sampling resolution, the datasets that were composed of smaller samples (i.e., 0.5 m × 0.35 m) showed lower correlations between the height-related predictors and AGB than the same datasets after the aggregation of two or three samples (1 m × 0.35 m and 1.5 m × 0.35 m, respectively). The AGB samples with a very high spatial resolution (e.g., 0.175 m2) could suffer from poor PCD representation (i.e., low counts of LiDAR returns), thereby compromising the reliability of the statistics that were extracted. Additionally, the datasets that had small sample sizes presented higher variance in the AGB ground truth values (i.e., the measurements at a 0.175 m2 resolution produced noisier labels).

4.3. Applied ML Methods

Although we considered different ML methods, it was not our intention to systematically compare the performances of a series of ML models to find the lowest possible error rate since the experimental design of this study did not allow for such comparisons. Only relatively few training instances were available and the target AGB values had a high variance (Figure 2a,b). Instead, we aimed to test an ML regression method that utilized features that were derived from point cloud data that represented the structures of crop canopies to predict AGB, taking into account the aforementioned limitations. The suggested method consisted of selecting one representative model per family of methods (Table 2) and the set of parameters (Table 3) that performed the best using the validation set, followed by re-fitting the selected model (i.e., ERT) to an unseen dataset that was collected the following year (Figure 7). As such, the selection of a specific ML model was not relevant in this case. Indeed, by exploring extended parameter sets, one model could slightly outperform the others for individual predictions or a different model could perform better using datasets from subsequent years or another crop type. Achieving greater accuracy and predicting AGB at higher spatial resolutions would require more training data that are representative of different growing conditions and collected in a protocolized manner using both AGB labels and PCD scenes in order to achieve AGB predictions that are robust to inter-annual and inter-species variations (e.g., crop structure, weather conditions, different crops, etc.).

5. Conclusions

In this study, we developed a method that combines UAV–LiDAR surveys and ML techniques to predict the above-ground biomass of standing vegetation for cereal crops at sub-meter resolutions. This method was developed and tested for two crop species at the same field site over two consecutive growing seasons: winter wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.). The ERT model performed real-time AGB estimation, taking as its input the height and intensity metrics that were derived from the point cloud data scene, and achieved a prediction performance of R² = 0.48, RMSE = 207 g, MAE = 162 g, and MAPE = 42% at a spatial resolution of 0.35 m2. However, by aggregating the individual predictions, it was observed that the prediction performance could be increased significantly by coarsening the spatial resolution, as the prediction errors were statistically independent and uncorrelated. At a spatial resolution of 2 m2, the regression performance achieved R² = 0.93, RMSE = 300 g/m2, MAE = 266 g/m2, and MAPE = 13% when the training and testing datasets corresponded to the same crop species. The aggregated AGB predictions also achieved reasonable results at a 2 m2 spatial resolution when the model was trained on one crop type and tested on another: R² = 0.89, RMSE = 400 g/m2, MAE = 351 g/m2, and MAPE = 16%. This slight reduction in the prediction performance was explained by the differences between (i) the canopy structures and (ii) the plant-level biomass distributions of the two crop species under consideration. We encourage the continuation of protocolized field-based data collection campaigns, as well as UAV–LiDAR surveying, as a means to gain valuable data that are representative of crops under different environmental conditions in order to develop AGB regression models that are capable of generalizing predictions that are even more robust to inter-annual and inter-species variations.
More precise AGB estimates allow for real-time evaluations of crop yields, thereby enabling adaptive management practices (such as adjusting harvest dates according to mill capacity) and estimated production, as well as improving early warning systems for food security in climate-vulnerable regions. The method that was introduced here for acquiring real-time AGB estimates for croplands using UAV–LiDAR surveys and ML supervised regression models could pave the way for future studies on predicting the spatial distributions of AGB-related biochemical constituents at sub-meter resolutions by applying this method throughout vegetation growth periods for precision agriculture and agroecological applications. This quantitative method could also assist in management decision-making. Finally, this line of research could help to narrow down the margins of uncertainty that are still present in the estimates of current above-ground carbon stocks and C-turnover values at the ecosystem level, for example.

Author Contributions

Original conceptual framework: K.T., F.C.G., C.I. and T.F.; experimental design: K.T., T.F. and J.C.R.; UAV–LiDAR data collection: J.C.R. and K.T.; field-based data collection and curation: R.J., J.C.R. and K.T.; laser data processing: J.C.R. and K.T.; feature engineering and the training and evaluation of the machine learning models: J.C.R., S.O., L.L., C.I. and F.C.G.; visualisation: J.C.R.; supervision: T.F., K.T., C.I., F.C.G. and S.O.; project administration: K.T. and F.C.G.; writing—original draft preparation: J.C.R.; writing—review and editing: all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This project received funding support from the Talent Program Horizon 2020/Marie Skłodowska-Curie Actions; a Villum Experiment grant from the Velux Foundations, the Drone-Borne LiDAR and Artificial Intelligence for Assessing Carbon Storage (MapCland) project (grant number: 00028314); the Deep Learning for Accurate Quantification of Carbon Stocks in Cropland and Forest Areas (DeepCrop) project (UCPH Strategic plan 2023 Data + Pool); as well as a UAS ability infrastructure grant from the Danish Agency for Science, Technology, and Innovation. The authors also acknowledge the financial support from the Independent Research Fund, Denmark, through the Monitoring Changes in Big Satellite Data via Massively-Parallel Artificial Intelligence project (grant number: 9131-00110B) and the Villum Fonden through the Deep Learning and Remote Sensing for Unlocking Global Ecosystem Resource Dynamics (DeReEco) project (grant number: 34306).

Data Availability Statement

Not applicable.

Acknowledgments

The authors acknowledge the contributions of René Lee, Lars Rasmussen, Rune Skov Maigoord, Binsheng Gao, and Alek Wieckowski, who supported the task of field data acquisition and contributed to this study as fieldwork and laboratory assistants.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AGB    Above-Ground Biomass
dGPS   Differential Global Positioning System
DTM    Digital Terrain Model
ERT    Extremely Randomized Trees
FoV    Field of View
ICOS   Integrated Carbon Observation System
LiDAR  Light Detection and Ranging
MAE    Mean Absolute Error
MAPE   Mean Absolute Percentage Error
ML     Machine Learning
PCD    Point Cloud Data
RMSE   Root Mean Square Error
RS     Remote Sensing
RTK    Real-Time Kinematic
SAR    Synthetic Aperture Radar
UAV    Unstaffed Aerial Vehicle

Appendix A

In order to quantify the effects of crop sparsity on the uncertainty that was contained in the AGB labels, we collected 180 samples from 10 different locations of 1 m2 each (see Figure A1). This extra study contrasted with the main field study, for which sub-meter samples were taken exclusively. To calculate the variance, each of the 18 sub-meter samples at a location was upscaled to g/m2 and then subtracted from the total AGB in the 1 m2 that was measured at that location. Due to oven capacity limitations, this analysis was conducted with wet AGB, assuming that the variations in water content within 1 m2 would be negligible. A higher variance was found in the datasets that were composed of samples with higher spatial resolutions (i.e., the 2021 datasets) and, within those, it was substantially higher for crops with higher sparsity (i.e., wheat) than for the more homogeneous crops (i.e., barley), as was expected.
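The per-location deviation underlying this variance estimate can be sketched as follows (a hedged illustration with hypothetical numbers):

```python
import numpy as np

def sample_deviations(sub_sample_agb_g, sub_sample_area_m2, total_agb_1m2_g):
    """Deviation of each sub-meter wet AGB sample (upscaled to g/m2) from the
    location's total wet AGB over 1 m2, which is set as the 0 g/m2 reference."""
    upscaled = np.asarray(sub_sample_agb_g) / sub_sample_area_m2   # g per sample -> g/m2
    return upscaled - total_agb_1m2_g

# Example for one hypothetical location with 0.5 m x 0.35 m sub-samples
devs = sample_deviations([95.0, 120.0, 88.0], 0.5 * 0.35, total_agb_1m2_g=580.0)
print(devs.round(1), "variance:", round(float(devs.var()), 1))
```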
Figure A1. The variance of measurements of wet AGB value per sample (upscaled to g/m2) with respect to the total wet AGB value in an area of 1 m2 (which was set as a reference at 0 g/m2): (a) the 2020 barley dataset (sample size: 1 × 0.35 m2); (b) the 2021 wheat and barley datasets (sample size: 0.5 × 0.35 m2). In both (a,b), the solid lines represent the estimation of the kernel probability density.

References

  1. Maimaitijiang, M.; Sagan, V.; Sidike, P.; Daloye, A.M.; Erkbol, H.; Fritschi, F.B. Crop monitoring using satellite/UAV data fusion and machine learning. Remote Sens. 2020, 12, 1357. [Google Scholar] [CrossRef]
  2. Gebbers, R.; Adamchuk, V.I. Precision agriculture and food security. Science 2010, 327, 828–831. [Google Scholar] [CrossRef]
  3. Isbell, F.; Adler, P.R.; Eisenhauer, N.; Fornara, D.; Kimmel, K.; Kremen, C.; Letourneau, D.K.; Liebman, M.; Polley, H.W.; Quijas, S.; et al. Benefits of increasing plant diversity in sustainable agroecosystems. J. Ecol. 2017, 105, 871–879. [Google Scholar] [CrossRef]
  4. Lambin, E.F.; Meyfroidt, P. Global land use change, economic globalization, and the looming land scarcity. Proc. Natl. Acad. Sci. USA 2011, 108, 3465–3472. [Google Scholar] [CrossRef]
  5. Challinor, A.J.; Ewert, F.; Arnold, S.; Simelton, E.; Fraser, E. Crops and climate change: Progress, trends, and challenges in simulating impacts and informing adaptation. J. Exp. Bot. 2009, 60, 2775–2789. [Google Scholar] [CrossRef]
  6. Wang, N.; Wang, E.; Wang, J.; Zhang, J.; Zheng, B.; Huang, Y.; Tan, M. Modelling maize phenology, biomass growth and yield under contrasting temperature conditions. Agric. For. Meteorol. 2018, 250, 319–329. [Google Scholar] [CrossRef]
  7. Raza, A.; Razzaq, A.; Mehmood, S.S.; Zou, X.; Zhang, X.; Lv, Y.; Xu, J. Impact of climate change on crops adaptation and strategies to tackle its outcome: A review. Plants 2019, 8, 34. [Google Scholar] [CrossRef] [PubMed]
  8. Deryng, D.; Elliott, J.; Folberth, C.; Müller, C.; Pugh, T.A.; Boote, K.J.; Conway, D.; Ruane, A.C.; Gerten, D.; Jones, J.W.; et al. Regional disparities in the beneficial effects of rising CO2 concentrations on crop water productivity. Nat. Clim. Chang. 2016, 6, 786–790. [Google Scholar] [CrossRef]
  9. Wang, X.; Zhao, C.; Müller, C.; Wang, C.; Ciais, P.; Janssens, I.; Peñuelas, J.; Asseng, S.; Li, T.; Elliott, J.; et al. Emergent constraint on crop yield response to warmer temperature from field experiments. Nat. Sustain. 2020, 3, 908–916. [Google Scholar] [CrossRef]
  10. Jägermeyr, J.; Müller, C.; Ruane, A.C.; Elliott, J.; Balkovic, J.; Castillo, O.; Faye, B.; Foster, I.; Folberth, C.; Franke, J.A.; et al. Climate impacts on global agriculture emerge earlier in new generation of climate and crop models. Nat. Food 2021, 2, 873–885. [Google Scholar] [CrossRef]
  11. Tully, K.; Ryals, R. Nutrient cycling in agroecosystems: Balancing food and environmental objectives. Agroecol. Sustain. Food Syst. 2017, 41, 761–798. [Google Scholar] [CrossRef]
  12. Abalos, D.; van Groenigen, J.W.; Philippot, L.; Lubbers, I.M.; De Deyn, G.B. Plant trait-based approaches to improve nitrogen cycling in agroecosystems. J. Appl. Ecol. 2019, 56, 2454–2466. [Google Scholar] [CrossRef]
  13. EIT-Food. More Crops Consituents Sensing; EIT-Food: Leuven, Belgium, 2022. [Google Scholar]
  14. Weih, M.; Hamnér, K.; Pourazari, F. Analyzing plant nutrient uptake and utilization efficiencies: Comparison between crops and approaches. Plant Soil 2018, 430, 7–21. [Google Scholar] [CrossRef]
  15. Kumar, L.; Mutanga, O. Remote sensing of above-ground biomass. Remote Sens. 2017, 9, 935. [Google Scholar] [CrossRef]
  16. Huete, A.; Liu, H.; Batchily, K.; Van Leeuwen, W. A comparison of vegetation indices over a global set of TM images for EOS-MODIS. Remote Sens. Environ. 1997, 59, 440–451. [Google Scholar] [CrossRef]
  17. Luckman, A.; Baker, J.; Honzák, M.; Lucas, R. Tropical forest biomass density estimation using JERS-1 SAR: Seasonal variation, confidence limits, and application to image mosaics. Remote Sens. Environ. 1998, 63, 126–139. [Google Scholar] [CrossRef]
  18. Hoekman, D.; Quiñones, M. Land cover type and biomass classification using AirSAR data for evaluation of monitoring scenarios in the Colombian Amazon. IEEE Trans. Geosci. Remote Sens. 2000, 38, 685–696. [Google Scholar] [CrossRef]
  19. Attarchi, S.; Gloaguen, R. Improving the estimation of above ground biomass using dual polarimetric PALSAR and ETM+ data in the Hyrcanian mountain forest (Iran). Remote Sens. 2014, 6, 3693–3715. [Google Scholar] [CrossRef]
  20. Joshi, N.P.; Mitchard, E.T.; Schumacher, J.; Johannsen, V.K.; Saatchi, S.; Fensholt, R. L-band SAR backscatter related to forest cover, height and aboveground biomass at multiple spatial scales across Denmark. Remote Sens. 2015, 7, 4442–4472. [Google Scholar] [CrossRef]
  21. Vaglio Laurin, G.; Pirotti, F.; Callegari, M.; Chen, Q.; Cuozzo, G.; Lingua, E.; Notarnicola, C.; Papale, D. Potential of ALOS2 and NDVI to estimate forest above-ground biomass, and comparison with lidar-derived estimates. Remote Sens. 2016, 9, 18. [Google Scholar] [CrossRef]
  22. Viergever, K.M. Establishing the Sensitivity of Synthetic Aperture Radar to Above-Ground Biomass in Wooded Savannas. Ph.D. Thesis, The University of Edinburgh, Edinburgh, UK, 2008. [Google Scholar]
  23. Michelakis, D.; Stuart, N.; Lopez, G.; Linares, V.; Woodhouse, I.H. Local-scale mapping of biomass in tropical lowland pine savannas using ALOS PALSAR. Forests 2014, 5, 2377–2399. [Google Scholar] [CrossRef]
  24. Houborg, R.; McCabe, M.F. High-Resolution NDVI from planet’s constellation of earth observing nano-satellites: A new data source for precision agriculture. Remote Sens. 2016, 8, 768. [Google Scholar] [CrossRef]
  25. Deng, L.; Mao, Z.; Li, X.; Hu, Z.; Duan, F.; Yan, Y. UAV-based multispectral remote sensing for precision agriculture: A comparison between different cameras. ISPRS J. Photogramm. Remote Sens. 2018, 146, 124–136. [Google Scholar] [CrossRef]
  26. Bastin, J.F.; Barbier, N.; Couteron, P.; Adams, B.; Shapiro, A.; Bogaert, J.; De Cannière, C. Aboveground biomass mapping of African forest mosaics using canopy texture analysis: Toward a regional approach. Ecol. Appl. 2014, 24, 1984–2001. [Google Scholar] [CrossRef] [PubMed]
  27. Ploton, P.; Barbier, N.; Couteron, P.; Antin, C.; Ayyappan, N.; Balachandran, N.; Barathan, N.; Bastin, J.F.; Chuyong, G.; Dauby, G.; et al. Toward a general tropical forest biomass prediction model from very high resolution optical satellite images. Remote Sens. Environ. 2017, 200, 140–153. [Google Scholar] [CrossRef]
  28. Hlatshwayo, S.T.; Mutanga, O.; Lottering, R.T.; Kiala, Z.; Ismail, R. Mapping forest aboveground biomass in the reforested Buffelsdraai landfill site using texture combinations computed from SPOT-6 pan-sharpened imagery. Int. J. Appl. Earth Obs. Geoinf. 2019, 74, 65–77. [Google Scholar] [CrossRef]
  29. Yue, J.; Yang, G.; Tian, Q.; Feng, H.; Xu, K.; Zhou, C. Estimate of winter-wheat above-ground biomass based on UAV ultrahigh-ground-resolution image textures and vegetation indices. ISPRS J. Photogramm. Remote Sens. 2019, 150, 226–244. [Google Scholar] [CrossRef]
  30. Saatchi, S.; Marlier, M.; Chazdon, R.L.; Clark, D.B.; Russell, A.E. Impact of spatial variability of tropical forest structure on radar estimation of aboveground biomass. Remote Sens. Environ. 2011, 115, 2836–2849. [Google Scholar] [CrossRef]
  31. Zolkos, S.G.; Goetz, S.J.; Dubayah, R. A meta-analysis of terrestrial aboveground biomass estimation using lidar remote sensing. Remote Sens. Environ. 2013, 128, 289–298. [Google Scholar] [CrossRef]
  32. Calders, K.; Adams, J.; Armston, J.; Bartholomeus, H.; Bauwens, S.; Bentley, L.P.; Chave, J.; Danson, F.M.; Demol, M.; Disney, M.; et al. Terrestrial laser scanning in forest ecology: Expanding the horizon. Remote Sens. Environ. 2020, 251, 112102. [Google Scholar] [CrossRef]
  33. Bates, J.S.; Montzka, C.; Schmidt, M.; Jonard, F. Estimating canopy density parameters time-series for winter wheat using UAS Mounted LiDAR. Remote Sens. 2021, 13, 710. [Google Scholar] [CrossRef]
  34. Ferraz, A.; Saatchi, S.; Mallet, C.; Meyer, V. Lidar detection of individual tree size in tropical forests. Remote Sens. Environ. 2016, 183, 318–333. [Google Scholar] [CrossRef]
  35. Morsdorf, F.; Eck, C.; Zgraggen, C.; Imbach, B.; Schneider, F.D.; Kükenbrink, D. UAV-based LiDAR acquisition for the derivation of high-resolution forest and ground information. Lead. Edge 2017, 36, 566–570. [Google Scholar] [CrossRef]
  36. Schneider, F.D.; Morsdorf, F.; Schmid, B.; Petchey, O.L.; Hueni, A.; Schimel, D.S.; Schaepman, M.E. Mapping functional diversity from remotely sensed morphological and physiological forest traits. Nat. Commun. 2017, 8, 1441. [Google Scholar] [CrossRef]
  37. Schneider, F.D.; Kükenbrink, D.; Schaepman, M.E.; Schimel, D.S.; Morsdorf, F. Quantifying 3D structure and occlusion in dense tropical and temperate forests using close-range LiDAR. Agric. For. Meteorol. 2019, 268, 249–257. [Google Scholar] [CrossRef]
  38. Kükenbrink, D.; Schneider, F.D.; Schmid, B.; Gastellu-Etchegorry, J.P.; Schaepman, M.E.; Morsdorf, F. Modelling of three-dimensional, diurnal light extinction in two contrasting forests. Agric. For. Meteorol. 2021, 296, 108230. [Google Scholar] [CrossRef]
  39. Jin, X.; Kumar, L.; Li, Z.; Xu, X.; Yang, G.; Wang, J. Estimation of winter wheat biomass and yield by combining the aquacrop model and field hyperspectral data. Remote Sens. 2016, 8, 972. [Google Scholar] [CrossRef]
  40. Gastellu-Etchegorry, J.P.; Lauret, N.; Yin, T.; Landier, L.; Kallel, A.; Malenovskỳ, Z.; Al Bitar, A.; Aval, J.; Benhmida, S.; Qi, J.; et al. DART: Recent advances in remote sensing data modeling with atmosphere, polarization, and chlorophyll fluorescence. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 2640–2649. [Google Scholar] [CrossRef]
  41. Demol, M.; Calders, K.; Verbeeck, H.; Gielen, B. Forest above-ground volume assessments with terrestrial laser scanning: A ground-truth validation experiment in temperate, managed forests. Ann. Bot. 2021, 128, 805–819. [Google Scholar] [CrossRef]
  42. Sofonia, J.; Shendryk, Y.; Phinn, S.; Roelfsema, C.; Kendoul, F.; Skocaj, D. Monitoring sugarcane growth response to varying nitrogen application rates: A comparison of UAV SLAM LiDAR and photogrammetry. Int. J. Appl. Earth Obs. Geoinf. 2019, 82, 101878. [Google Scholar] [CrossRef]
  43. Longfei, Z.; Xiaohe, G.; Shu, C.; Guijun, Y.; Meiyan, S.; Quian, S. Analysis of Plant Height Changes of Lodged Maize Using UAV-LiDAR Data. Agriculture 2020, 10, 146. [Google Scholar]
  44. Trepekli, K.; Friborg, T. Deriving Aerodynamic Roughness Length at Ultra-High Resolution in Agricultural Areas Using UAV-Borne LiDAR. Remote Sens. 2021, 13, 3538. [Google Scholar] [CrossRef]
  45. Bendig, J.; Yu, K.; Aasen, H.; Bolten, A.; Bennertz, S.; Broscheit, J.; Gnyp, M.L.; Bareth, G. Combining UAV-based plant height from crop surface models, visible, and near infrared vegetation indices for biomass monitoring in barley. Int. J. Appl. Earth Obs. Geoinf. 2015, 39, 79–87. [Google Scholar] [CrossRef]
  46. Yang, G.; Liu, J.; Zhao, C.; Li, Z.; Huang, Y.; Yu, H.; Xu, B.; Yang, X.; Zhu, D.; Zhang, X.; et al. Unmanned aerial vehicle remote sensing for field-based crop phenotyping: Current status and perspectives. Front. Plant Sci. 2017, 8, 1111. [Google Scholar] [CrossRef] [PubMed]
  47. Lu, N.; Zhou, J.; Han, Z.; Li, D.; Cao, Q.; Yao, X.; Tian, Y.; Zhu, Y.; Cao, W.; Cheng, T. Improved estimation of aboveground biomass in wheat from RGB imagery and point cloud data acquired with a low-cost unmanned aerial vehicle system. Plant Methods 2019, 15, 17. [Google Scholar] [CrossRef] [PubMed]
  48. Pan, L.; Liu, L.; Condon, A.G.; Estavillo, G.M.; Coe, R.A.; Bull, G.; Stone, E.A.; Petersson, L.; Rolland, V. Biomass Prediction With 3D Point Clouds From LiDAR. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 1330–1340. [Google Scholar]
  49. Oehmcke, S.; Li, L.; Revenga, J.; Nord-Larsen, T.; Trepekli, K.; Gieseke, F.; Igel, C. Deep Learning Based 3D Point Cloud Regression for Estimating Forest Biomass. arXiv 2021, arXiv:2112.11335. [Google Scholar]
  50. Forrester, D.I.; Tachauer, I.H.H.; Annighoefer, P.; Barbeito, I.; Pretzsch, H.; Ruiz-Peinado, R.; Stark, H.; Vacchiano, G.; Zlatanov, T.; Chakraborty, T.; et al. Generalized biomass and leaf area allometric equations for European tree species incorporating stand structure, tree age and climate. For. Ecol. Manag. 2017, 396, 160–175. [Google Scholar] [CrossRef]
  51. Herold, A.; Zell, J.; Rohner, B.; Didion, M.; Thürig, E.; Rösler, E. State and change of forest resources. In Swiss National Forest Inventory–Methods and Models of the Fourth Assessment; Springer: Berlin/Heidelberg, Germany, 2019; pp. 205–230. [Google Scholar]
  52. Shendryk, Y.; Sofonia, J.; Garrard, R.; Rist, Y.; Skocaj, D.; Thorburn, P. Fine-scale prediction of biomass and leaf nitrogen content in sugarcane using UAV LiDAR and multispectral imaging. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102177. [Google Scholar] [CrossRef]
  53. Han, L.; Yang, G.; Dai, H.; Xu, B.; Yang, H.; Feng, H.; Li, Z.; Yang, X. Modeling maize above-ground biomass based on machine learning approaches using UAV remote-sensing data. Plant Methods 2019, 15, 10. [Google Scholar] [CrossRef]
  54. Zha, H.; Miao, Y.; Wang, T.; Li, Y.; Zhang, J.; Sun, W.; Feng, Z.; Kusnierek, K. Improving unmanned aerial vehicle remote sensing-based rice nitrogen nutrition index prediction with machine learning. Remote Sens. 2020, 12, 215. [Google Scholar] [CrossRef]
  55. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Beier, C.M.; Klimkowski, D.J.; Volk, T.A. A comparison of machine and deep learning methods to estimate shrub willow biomass from UAS imagery. Can. J. Remote Sens. 2021, 47, 209–227. [Google Scholar] [CrossRef]
  56. Ma, J.; Li, Y.; Chen, Y.; Du, K.; Zheng, F.; Zhang, L.; Sun, Z. Estimating above ground biomass of winter wheat at early growth stages using digital images and deep convolutional neural network. Eur. J. Agron. 2019, 103, 117–129. [Google Scholar] [CrossRef]
  57. Danish Ministry of Environment, Government of Denmark. Order on the Use of Fertilisers by Agriculture for the 2020/2021 Planning Period. Available online: https://www.retsinformation.dk/eli/lta/2020/1166 (accessed on 25 October 2021).
  58. Jensen, R.; Herbst, M.; Friborg, T. Direct and indirect controls of the interannual variability in atmospheric CO2 exchange of three contrasting ecosystems in Denmark. Agric. For. Meteorol. 2017, 233, 12–31. [Google Scholar] [CrossRef]
  59. Davidson, L.; Mills, J.; Haynes, I.; Augarde, C.; Bryan, P.; Douglas, M. Airborne to UAS LiDAR: An analysis of UAS LiDAR ground control targets. In Proceedings of the ISPRS Geospatial Week 2019, Enschede, The Netherlands, 10–14 June 2019. [Google Scholar]
  60. Jutzi, B.; Eberle, B.; Stilla, U. Estimation and measurement of backscattered signals from pulsed laser radar. In Image and Signal Processing for Remote Sensing VIII; SPIE: New York, NY, USA, 2003; Volume 4885, pp. 256–267. [Google Scholar]
  61. Gielen, B.; Acosta, M.; Altimir, N.; Buchmann, N.; Cescatti, A.; Ceschia, E.; Fleck, S.; Hörtnagl, L.; Klumpp, K.; Kolari, P.; et al. Ancillary vegetation measurements at ICOS ecosystem stations. Int. Agrophys. 2018, 32, 645–664. [Google Scholar] [CrossRef]
  62. Sechidis, K.; Tsoumakas, G.; Vlahavas, I. On the stratification of multi-label data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2011; pp. 145–158. [Google Scholar]
  63. Meier, U. Growth Stages of Mono-and Dicotyledonous Plants; Blackwell Wissenschafts-Verlag: Berlin, Germany, 1997. [Google Scholar]
  64. Kuester, T.; Spengler, D.; Barczi, J.F.; Segl, K.; Hostert, P.; Kaufmann, H. Simulation of multitemporal and hyperspectral vegetation canopy bidirectional reflectance using detailed virtual 3-D canopy models. IEEE Trans. Geosci. Remote Sens. 2013, 52, 2096–2108. [Google Scholar] [CrossRef]
  65. Hartigan, J.A. Clustering Algorithms; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1975. [Google Scholar]
  66. Bock, H.H. Clustering methods: A history of k-means algorithms. In Selected Contributions in Data Analysis and Classification; Springer: Berlin/Heidelberg, Germany, 2007; pp. 161–172. [Google Scholar]
  67. Owen, A.B. A robust hybrid of lasso and ridge regression. Contemp. Math. 2007, 443, 59–72. [Google Scholar]
  68. Huber, P.J. Robust statistics. In International Encyclopedia of Statistical Science; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1248–1251. [Google Scholar]
  69. Morsdorf, F.; Meier, E.; Kötz, B.; Itten, K.I.; Dobbertin, M.; Allgöwer, B. LIDAR-based geometric reconstruction of boreal type forest stands at single tree level for forest and wildland fire management. Remote Sens. Environ. 2004, 92, 353–362. [Google Scholar] [CrossRef]
  70. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef]
  71. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K. Xgboost: Extreme gradient boosting. R Package Version 0.4-2 2015, 1, 1–4. [Google Scholar]
  72. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316. [Google Scholar] [CrossRef]
  73. Feurer, M.; Hutter, F. Hyperparameter optimization. In Automated Machine Learning; Springer: Cham, Switzerland, 2019; pp. 3–33. [Google Scholar]
  74. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  75. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  76. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  77. Vosselman, G. Slope based filtering of laser altimetry data. Int. Arch. Photogramm. Remote Sens. 2000, 33, 935–942. [Google Scholar]
  78. Zhang, W.; Qi, J.; Wan, P.; Wang, H.; Xie, D.; Wang, X.; Yan, G. An easy-to-use airborne LiDAR data filtering method based on cloth simulation. Remote Sens. 2016, 8, 501. [Google Scholar] [CrossRef]
  79. Zhao, X.; Guo, Q.; Su, Y.; Xue, B. Improved progressive TIN densification filtering algorithm for airborne LiDAR data in forested areas. ISPRS J. Photogramm. Remote Sens. 2016, 117, 79–91. [Google Scholar] [CrossRef]
  80. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013. [Google Scholar]
  81. GreenValley International, Ltd. LiDAR360; GreenValley International, Ltd.: Berkeley, CA, USA, 2021. [Google Scholar]
  82. Longley, P.A.; Goodchild, M.F.; Maguire, D.J.; Rhind, D.W. Geographic Information Systems and Science; John Wiley & Sons: Hoboken, NJ, USA, 2005. [Google Scholar]
  83. Burrough, P.A.; McDonnell, R.A.; Lloyd, C.D. Principles of Geographical Information Systems; Oxford University Press: Oxford, UK, 2015. [Google Scholar]
  84. Beutel, A.; Mølhave, T.; Agarwal, P.K. Natural neighbor interpolation based grid DEM construction using a GPU. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 172–181. [Google Scholar]
  85. Walter, J.D.; Edwards, J.; McDonald, G.; Kuchel, H. Estimating biomass and canopy height with LiDAR for field crop breeding. Front. Plant Sci. 2019, 10, 1145. [Google Scholar] [CrossRef]
  86. Chen, Q.; Laurin, G.V.; Valentini, R. Uncertainty of remotely sensed aboveground biomass over an African tropical forest: Propagating errors from trees to plots to pixels. Remote Sens. Environ. 2015, 160, 134–143. [Google Scholar] [CrossRef]
  87. Goetz, S.; Dubayah, R. Advances in remote sensing technology and implications for measuring and monitoring forest carbon stocks and change. Carbon Manag. 2011, 2, 231–244. [Google Scholar] [CrossRef]
  88. Quiñonero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; Lawrence, N.D. Dataset Shift in Machine Learning; Mit Press: Cambridge, MA, USA, 2008. [Google Scholar]
  89. Genuer, R.; Poggi, J.M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. 2010, 31, 2225–2236. [Google Scholar] [CrossRef]
  90. Toloşi, L.; Lengauer, T. Classification with correlated features: Unreliability of feature ranking and solutions. Bioinformatics 2011, 27, 1986–1994. [Google Scholar] [CrossRef]
  91. Gregorutti, B.; Michel, B.; Saint-Pierre, P. Correlation and variable importance in random forests. Stat. Comput. 2017, 27, 659–678. [Google Scholar] [CrossRef]
  92. Zhang, H.; Nettleton, D.; Zhu, Z. Regression-Enhanced Random Forests. In Statistics Conference Proceedings; Presentations and Posters; 2017; Volume 9, Available online: https://dr.lib.iastate.edu/entities/publication/8c7c1d24-a466-4e37-a5c0-7f7405fa867e (accessed on 13 June 2022).
Figure 1. The location of the study site (⋆) in Mid-Jutland (DK). The inset shows a top-down view of the field site and the surrounding area. Source: www.icos-cp.eu (accessed on: 8 August 2021) and Google Earth Engine.
Figure 2. The crop development and canopy structure: (a,b) the AGB development during the 2020 (barley) and 2021 (wheat) growing seasons, respectively, with an indication of the dates of the AGB sampling events (see Table 1). The shaded area covers ± the standard deviation; (c,d) the crop structure at the maturity stage of the barley and wheat, respectively.
Figure 3. Instrumentation: (a) the UAV (DJI Matrice 600) that was used in this study with the mounted LiDAR system; (b) the LiDAR system (Nano M8, LidarSwiss GmbH).
Figure 4. The spatial distribution of the AGB sampling locations. Each color indicates one of the original datasets: red represents the barley samples that were collected in 2020 (i.e., barley_20); blue represents the wheat samples that were collected in 2021 (i.e., wheat_21); white represents the barley samples that were collected in 2021 (i.e., barley_21).
Figure 5. The point cloud data (PCD) scenes (the crops are shown at the maturity stage and the PCD scenes are colored by elevation): (a) the barley field in 2020; (b) the wheat field in 2021. In both (a,b), the upper panels show the cross-sectional views of the PCD, with a buffer depth of 0.5 m. The x, y, and z axes indicate easting, northing, and elevation, respectively. It can be seen that there was a higher PCD porosity in (b) than in (a).
Figure 6. The generation of the augmented datasets: (a) the partitioning of the individual LiDAR samples into three adjacent parts; (b) the three original adjacent AGB samples (individual size: 0.175 m2); (c) the shaded area represents each augmented sample (size: 0.35–0.52 m2). The two augmented datasets were produced by sampling (with replacement) the individual original samples and adding either two or three of them together to produce a new instance. Each instance produced in this way was attributed the mean AGB value of the individual samples, while the LiDAR-derived features were recalculated from all of the LiDAR returns contained in those samples.
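The augmentation described in Figure 6 can be sketched in a few lines of Python. The sketch below is illustrative only: it assumes each original sample is stored as a dictionary holding its measured AGB and its LiDAR returns, it pools the returns of the combined samples before recomputing the features, and it omits the adjacency bookkeeping of the actual pipeline; the function and feature names are not taken from the study.

```python
import random
import numpy as np

def compute_features(returns):
    """Illustrative height/reflectance metrics derived from pooled LiDAR returns."""
    z, refl = returns[:, 2], returns[:, 3]
    return {"z_max": z.max(), "z_mean": z.mean(), "z_p90": np.percentile(z, 90),
            "refl_mean": refl.mean(), "refl_std": refl.std()}

def augment(samples, group_size=2, n_new=100, seed=0):
    """Create augmented instances by combining original AGB samples (cf. Figure 6).

    Each sample is assumed to be a dict with keys:
      'agb'     : measured above-ground biomass (g/m2)
      'returns' : (N, 4) array of LiDAR returns (x, y, z, reflectance)
    The augmented AGB is the mean of the combined samples; the LiDAR features
    are recomputed from the pooled returns.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_new):
        # a group of distinct samples is drawn each time; across draws, the same
        # original sample can reappear (sampling with replacement)
        group = rng.sample(samples, group_size)
        agb_mean = float(np.mean([s["agb"] for s in group]))
        pooled = np.vstack([s["returns"] for s in group])
        augmented.append({"agb": agb_mean, "features": compute_features(pooled)})
    return augmented
```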
Figure 7. The data processing pipeline, model training, and the evaluation of the predictions: (a) the 2020 barley dataset was split into training and validation sets. The instances in both sets were stratified according to the mean AGB values from the training distribution. A cross-validated grid search was conducted to optimize the hyperparameters for model selection; (b) the ML model with the best performance was fitted to a new dataset (i.e., 70% of either the 2021 barley or wheat datasets) and the final prediction performance was evaluated using the remaining test set (i.e., 30% of the 2021 datasets).
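Figure 7a stratifies a continuous AGB target before splitting; one minimal way to mimic this with scikit-learn is sketched below, assuming quantile binning of the target. The number of bins and the synthetic placeholder data are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_regression_split(X, y, test_size=0.3, n_bins=5, seed=42):
    """Split a regression dataset while preserving the target distribution,
    by stratifying on quantile bins of the continuous AGB values."""
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)
    return train_test_split(X, y, test_size=test_size, stratify=bins, random_state=seed)

# Synthetic placeholders standing in for the LiDAR predictors and AGB targets:
X = np.random.rand(200, 8)        # 8 height/reflectance features per sample
y = 1200.0 * np.random.rand(200)  # AGB in g/m2
X_train, X_test, y_train, y_test = stratified_regression_split(X, y)
```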
Figure 8. The processing pipeline from the input PCD scene to the output AGB prediction map (in g/m2 at a 1 m2 resolution): (a) the PCD scene processing, including binary classification and digital terrain model (DTM) generation via the interpolation of ground returns; (b) a normalized point cloud with height values that were relative to the ground was used to produce the prediction feature maps for the metrics of height and reflectance; (c) the predictors were input into the trained ML regression model to produce the AGB prediction maps. The example AGB map corresponds to the barley field on 8 July 2021.
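A minimal sketch of steps (b)–(c) of Figure 8 follows, assuming the point cloud has already been height-normalized and is stored as an (N, 4) array of x, y, z, and reflectance values; the per-cell metrics and the variable names (`normalized_pcd`, `trained_ert`) are illustrative, not taken from the study's code.

```python
import numpy as np

def gridded_features(points, cell=1.0):
    """Rasterize a height-normalized point cloud into per-cell predictors (Figure 8b).

    `points` is assumed to be an (N, 4) array of x, y, z (height above ground),
    and reflectance; the per-cell metrics below are illustrative choices."""
    x, y, z, r = points.T
    ix = ((x - x.min()) // cell).astype(int)
    iy = ((y - y.min()) // cell).astype(int)
    cells, features = [], []
    for key in sorted(set(zip(ix, iy))):
        m = (ix == key[0]) & (iy == key[1])
        cells.append(key)
        features.append([z[m].max(), z[m].mean(), np.percentile(z[m], 90),
                         r[m].mean(), r[m].std()])
    return np.array(cells), np.array(features)

# cells, X_map = gridded_features(normalized_pcd, cell=1.0)
# agb_map = trained_ert.predict(X_map)   # AGB in g/m2 per grid cell (Figure 8c)
```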
Figure 9. A comparison of the regression performances of the considered models using the training and validation datasets: (a) R²; (b) RMSE; (c) MAPE; (d) MAE. The blue and green horizontal lines represent the performance of the linear regression baseline model using the training and validation sets, respectively. All scores are the mean values of 10 randomized executions.
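The four scores in Figure 9 can be computed with scikit-learn as in the short helper below (a straightforward sketch, not code from the study; inputs are assumed to be NumPy arrays).

```python
import numpy as np
from sklearn.metrics import (r2_score, mean_squared_error,
                             mean_absolute_error, mean_absolute_percentage_error)

def regression_scores(y_true, y_pred):
    """The four scores reported in Figure 9 for a single model run."""
    return {"R2": r2_score(y_true, y_pred),
            "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),             # g/m2
            "MAE": mean_absolute_error(y_true, y_pred),                      # g/m2
            "MAPE": 100.0 * mean_absolute_percentage_error(y_true, y_pred)}  # %
```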
Figure 10. The AGB predictions for the barley crops at a 0.35 m2 resolution that were produced by the ERT model compared to those that were produced by the linear model (baseline): (a) the regression performance of the ERT model using the testing dataset (R² = 0.48; RMSE = 207 g/m2; MAE = 162 g/m2; MAPE = 42%); (b) the regression performance of the linear model using the testing dataset (R² = 0.1; RMSE = 302 g/m2; MAE = 247 g/m2; MAPE = 34%).
Figure 11. The AGB predictions for the wheat crops at a 0.35 m2 resolution that were produced by the ERT model compared to those that were produced by the linear model (baseline): (a) the regression performance of the ERT model using the testing dataset (R² = 0.20; RMSE = 288 g/m2; MAE = 216 g/m2; MAPE = 23%); (b) the regression performance of the linear model (baseline) using the testing dataset (R² = 0.14; RMSE = 304 g/m2; MAE = 254 g/m2; MAPE = 33%).
Figure 12. An analysis of the aggregated predictions using the barley testing dataset: (a) the residual distribution had a mean value approaching zero (i.e., 2.2 g/m2 in the testing set, where N = 57); (b) the mean R² score converged to 1 as the number of aggregated samples increased. At every step along the x-axis, the data series took the mean (green solid line) of 100 repetitions (at a 1 m2 spatial resolution, the mean R² = 0.71). The light gray line shows the worst performance in each iteration, while the shaded area covers the confidence interval (i.e., ±the standard deviation); (b.1) a scatter plot of the predicted AGB values vs. the AGB field measurements at a 1 m2 spatial resolution.
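A simplified sketch of the aggregation analysis in Figures 12b and 13b is given below. It assumes random grouping of k predictions per aggregate, which is a simplification of the study's aggregation of adjacent predictions, and that the inputs are NumPy arrays.

```python
import numpy as np
from sklearn.metrics import r2_score

def aggregated_r2(y_true, y_pred, k, n_repeats=100, seed=0):
    """Average random groups of k predictions (and their reference AGB values)
    and report the mean and spread of R2 over repeated groupings."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y_true))
        n = (len(idx) // k) * k              # drop the incomplete last group
        groups = idx[:n].reshape(-1, k)
        scores.append(r2_score(y_true[groups].mean(axis=1),
                               y_pred[groups].mean(axis=1)))
    return float(np.mean(scores)), float(np.std(scores))
```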
Figure 13. An analysis of the aggregated predictions using the wheat testing dataset: (a) the residual distribution had a mean value approaching zero with a slight overestimation (i.e., −18.5 g/m2 in the testing set, where N = 183), which represented a systematic error of 2% of the average wheat sample weight; (b) the mean R² score converged to 1 as the number of aggregated samples increased. At every step along the x-axis, the data series took the mean (green solid line) of 100 executions (at a 1 m2 spatial resolution, the mean R² = 0.58). The light gray line shows the worst performance in each iteration, while the shaded area covers the confidence interval (i.e., ±the standard deviation); (b.1) a scatter plot of the predicted AGB values vs. the AGB field measurements at a 1 m2 spatial resolution.
Table 1. A description of the datasets. The barley_20 dataset was used for the training and validation phases, while the barley_21,aug. and wheat_21,aug. datasets were used to test the prediction results.
Growing Season | Dataset | Number of Samples | Sample Size
2020 | barley_20 | 104 | 1 m × 0.35 m
2021 | barley_21 | 142 | 0.5 m × 0.35 m
2021 | barley_21,aug. | 188 | (1–1.5) m × 0.35 m
2021 | wheat_21 | 455 | 0.5 m × 0.35 m
2021 | wheat_21,aug. | 609 | (1–1.5) m × 0.35 m
Table 2. A description of the models that were evaluated. The implementations used standard Python libraries.
Regression Model | Family | Description | Implementation
Extremely Randomized Trees (ERT) | Tree-Based Ensemble | Ensemble of decision trees (parallel setup) [70] in which the output is the average of the individual predictions | scikit-learn
XGBoost | Boosting | Gradient boosting method based on stage-wise additive expansions [74,75] | xgboost
Huber | Linear | Regularized linear regression that is robust to outliers [67,68] | scikit-learn
Linear Regression (Baseline) | Linear | Ordinary least squares linear regression | scikit-learn
Table 3. The models that were evaluated and the considered hyperparameters.
Regression Model | Hyperparameters Included in Cross-Validation a | Total b
Extremely Randomized Trees (ERT) | Criterion {mae; mse}, max. depth (None; 1, …, 9), bootstrap {True; False}, max. features {log2; sqrt} | 17
XGBoost | Booster {gbtree; gblinear; dart}, step size shrinkage (0.1, …, 0.5), learning rate (0.01, …, 0.1), L1 regularization (0, …, 0.5) | 29
Huber | Epsilon (1.1, …, 1.75), alpha (5 × 10⁻⁵, …, 10⁻³), fit intercept {True; False}, tolerance (10⁻⁶, …, 10⁻⁴) | 6
Linear Regression (Baseline) | Fit intercept {True; False} | 1
a The hyperparameters that were included in the cross-validated grid search for parameter selection (values in curly brackets show parameter sets and those in round brackets show the ranges of the search); b the total number of tunable hyperparameters that were considered for each model in the scikit-learn or xgboost Python libraries.
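As an illustration of how the ERT row of Table 3 translates into a cross-validated grid search in scikit-learn, a sketch is given below; the exact value lists, the number of trees, and the scoring metric are assumptions rather than the study's settings.

```python
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Grid mirroring the ERT row of Table 3 (value lists are illustrative).
param_grid = {
    "criterion": ["absolute_error", "squared_error"],  # "mae"/"mse" in older scikit-learn
    "max_depth": [None, 1, 3, 5, 7, 9],
    "bootstrap": [True, False],
    "max_features": ["log2", "sqrt"],
}
ert_search = GridSearchCV(ExtraTreesRegressor(n_estimators=200, random_state=0),
                          param_grid, scoring="neg_root_mean_squared_error", cv=5)
# ert_search.fit(X_train, y_train)   # X_train/y_train: LiDAR features and AGB targets
```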
Table 4. The prediction results as a function of spatial resolution for the barley and wheat crops.
Testing Dataset | Spatial Resolution (m2) | R² | RMSE (g/m2) | MAE (g/m2) | MAPE (%)
Barley Testing Dataset | 0.35 | 0.48 | 207 | 162 | 42.0
Barley Testing Dataset | 1.00 | 0.71 | 232 | 214 | 23.0
Barley Testing Dataset | 2.00 | 0.93 | 300 | 266 | 13.0
Wheat Testing Dataset | 0.35 | 0.20 | 288 | 216 | 23.7
Wheat Testing Dataset | 1.00 | 0.58 | 284 | 264 | 19.7
Wheat Testing Dataset | 2.00 | 0.89 | 400 | 351 | 16.0