Crop Nitrogen Retrieval Methods for Simulated Sentinel-2 Data Using In-Field Spectrometer Data

Perich, Gregor; Aasen, Helge; Verrelst, Jochem; Argento, Francesco; Walter, Achim; Liebisch, Frank

doi:10.3390/rs13122404

¹ Group of Crop Science, Institute of Agricultural Sciences, Department of Environmental Systems Science, ETH Zurich, 8092 Zurich, Switzerland

² Image Processing Laboratory (IPL), University of Valencia Science Park, 46980 Valencia, Spain

³ Water Protection and Substance Flows, Department Agroecology and Environment, Agroscope, 8046 Zürich, Switzerland

* Author to whom correspondence should be addressed.

Remote Sens. 2021, 13(12), 2404; https://doi.org/10.3390/rs13122404

Submission received: 5 May 2021 / Revised: 5 June 2021 / Accepted: 15 June 2021 / Published: 19 June 2021

(This article belongs to the Special Issue Precision Agriculture Using Hyperspectral Images)

Abstract:

Nitrogen (N) is one of the key nutrients supplied in agricultural production worldwide. Over-fertilization can have negative effects at the field and regional levels (e.g., on agro-ecosystems). Remote sensing of the plant N of field crops is a valuable tool for monitoring N flows in agro-ecosystems. However, data available for validating satellite-based remote sensing of N are scarce. Therefore, in this study, field spectrometer measurements were used to simulate data of the Sentinel-2 (S2) satellites developed for vegetation monitoring by the ESA. The prediction performance of normalized ratio indices (NRIs), random forest regression (RFR) and Gaussian processes regression (GPR) for plant-N-related traits was assessed on a diverse real-world dataset including multiple crops, field sites and years. The plant N traits included the mass-based N measure, N concentration in the biomass (N_conc), and an area-based N measure approximating the plant N uptake (NUP). The NRIs performed well, but the RFR and GPR methods outperformed them. Key spectral bands for each trait were identified using the RFR variable importance measure and the Gaussian processes regression band analysis tool (GPR-BAT), highlighting the importance of the short-wave infrared (SWIR) region for estimation of plant N_conc and, to a lesser extent, the NUP. The red edge (RE) region was also important. The GPR-BAT showed that five bands were sufficient for plant N trait and leaf area index (LAI) estimation and that a surplus of bands effectively reduced prediction performance. A global sensitivity analysis (GSA) was performed on all traits simultaneously, showing the dominance of the LAI in the mixed remote sensing signal. To delineate the plant-N-related traits from this signal, regional and/or national data collection campaigns producing large crop spectral libraries (CSL) are needed. An improved database will likely enable the mapping of N at the agro-ecosystem level or for use in precision farming by farmers in the future.

Keywords:

nitrogen; chlorophyll; leaf area index; agro-ecosystem monitoring; spectral indices; random forest; gaussian processes regression; ARTMO toolbox

1. Introduction

1.1. Nitrogen in Agro-Ecosystems

Nitrogen (N) plays a pivotal role in the plant life cycle, because it is one of the main nutrients needed for plant biomass production. It is essential for plant metabolism (e.g., chlorophyll) and major plant cell components such as proteins related to crop growth, development and high crop yield performance [1]. N is one of the most abundant elements in the Earth’s atmosphere [2] and also in plants, where it mainly figures as a building block for the chlorophyll-containing chloroplasts and amino acids that form the plant proteins [3]. Out of all plant proteins, ribulose-1,5-bisphosphate carboxylase-oxygenase (rubisco), the major CO₂-fixing enzyme in plants, is considered to be the most abundant terrestrial protein due to its high concentration in plants [4], which is estimated at about 22% of total leaf N [3]. In a C₃ plant, 1.7% of total leaf N is allocated to plant chlorophyll and approximately 19% of total leaf N is used in the light harvesting complex [3]. This explains the strong correlation between leaf N and chlorophyll often reported in literature [5]. The concentration of N in plant leaves is, however, relatively small, and ranges from 0.2 to 6.4%, depending on plant species [6].

Plants usually take up N in the form of ammonium (NH₄⁺) and nitrate (NO₃⁻) from the soil [7]. This supply is limited and often not sufficient to achieve the desired yield levels in intensive agriculture. Therefore, N is one of the most applied nutrients in the form of fertilizer. This human N input has massively influenced the global N cycle [8], causing negative effects on the agro-ecosystem on regional and global scales [9]. The loss of dissolved nitrate (NO₃⁻), nitrite (NO₂⁻) and volatile losses in the form of ammonia (NH₃) and nitrous oxide (N₂O) from agricultural systems has been found to pollute ground and surface waters [10,11], deteriorate biodiversity [12], and contribute to greenhouse gas emissions [13,14]. For policymakers, it is therefore becoming increasingly important to know the in- and output of N in agricultural systems at the connection between the field and farm level and the regional watershed, as part of a more holistic approach, at the agro-ecosystem level [15]. Such information can be used to better manage focus areas such as watersheds with nitrate problems in drinking water reserves. Better-informed spatial recommendations can facilitate targeted application of N input reduction measures at the field and regional scale.

1.2. Remote Sensing of Plant Nitrogen and Biomass

1.2.1. On the Terminology of Plant Nitrogen Status

Efficient monitoring of N at both the field and regional level is only possible using remote sensing. In the remote sensing literature, however, many different concepts and terms are used to describe plant ‘N status’ [5]. The terms used include N status, N content, N concentration, plant N, plant N uptake (NUP) or just ‘N’; they are often used synonymously and are sometimes confused.

The term ‘N status’ is very popular, and often describes the plant N nutrition relative to the optimum desired for the target yield levels in an agronomic scenario [16,17,18]. This is a result of environmental conditions such as soil available N [19], the plant growth stage [16], and the expected growth performance and yield expectation, and is often used to infer the crops’ fertilizer demand at the field level [20]. It can be evaluated using the plant N concentration (N_conc), which is the amount of N relative to the dry mass per sampled plant unit (leaf, stems or the whole plant). We therefore consider the N_conc to be a mass-based N measure.

Opposed to mass-based N measures are area-based N measures, expressed in N per unit area, e.g., kg N ha⁻¹ [5]. Such area-based N measures can be obtained by multiplying the N_conc with the plant biomass in dry matter [21], whereas the leaf area index (LAI) has also been used to approximate plant biomass on the canopy level [22,23], avoiding growth stage effects in crops’ vegetative growth phase. An area-based N measure represents the total amount of N in the plant, also called nitrogen uptake (NUP), which refers to the total N taken up by the plant. NUP is often measured as ‘aboveground’ NUP, as plant biomass samplings often do not take the root biomass into account [1,20,24,25]. The term NUP is often called ‘N content’ [21]. N content is, however, often used interchangeably with N_conc, as shown in the review in [5], which can be misleading due to the confusion of mass- and area-based N measures.

The concepts and terms used strongly depend on the perspectives of the ‘end users’ and the anticipated application. For an agronomist, the plant N status is indicative of the plants’ demand for N fertilization applications. The N status can thus either be the N_conc or the NUP. In this case, the term ‘N status’ usually refers to NUP as an area-based N measure, since fertilization is usually measured in kg ha⁻¹.

For policymakers, the plant N status in agriculture is more likely to focus on the entire agro-ecosystem, especially taking waterways into account [26], i.e., how much N fertilizer has been brought into the agro-ecosystem by farmers? In particular, the risk of N losses, which pose a risk to the environment, and the economic implications thereof, are of interest for environmental stakeholders and policy makers [27]. The concept of N use efficiency (NUE) can be defined as the fraction of N taken up by crops, opposed to the amount available to the plant from soil or fertilizer application [19], and is often used both from an agronomic and from a policymaker’s perspective.

From a remote sensing perspective, the retrieved signal is a proxy for the total N per area (pixel), weighted by its visibility to the sensor. For optical remote sensing, this means that the sunlit top of the canopy has more influence on the retrieved signal than the shaded parts in the lower canopy and consequently, the N in the upper part of the canopy will have higher influence on the retrieved signal than the N in the lower part. Therefore, remotely sensed plant N mostly refers to area-based N information, where the plant biomass is part of the canopy signal.

1.2.2. Remote Sensing of Crop Nitrogen

Remote sensing of plant N has often focused on the relationship between the plant leaf chlorophyll content (Chl_AB) and plant leaf N concentration [18,28,29]. However, the observed relationships have been found to be moderate, with Pearson correlation coefficients being around 0.65 ± 0.15 [30], which can partially be explained by the small amount of leaf N in the light harvesting complex compared to the total leaf N [3,5].

For remote estimation of plant N, spectral information from the red edge (RE) [17,18,24,31,32] and the near infrared (NIR) wavelength [17,24,33] regions have often been used. Studies have often focused on the spectral wavelengths up to 1000 nm [24,33,34]. The short-wave infrared (SWIR) region has not often been used, despite it showing significant potential for plant N estimation [30,35,36,37]. This spectral region is indicative of nitrogen bonds in amino acids [38], and is thereby more directly connected to N in proteins than the RE spectral region.

Spectral index (SI)-based methods have been in use since the late 1970s [39] and are now widely used in intensive agriculture, e.g., ‘smart farming’ for fertilizer applications, where an increasing number of commercial sensors and remote sensing applications related to plant N exist [32,40,41,42]. In most studies, a correlation between SIs and the trait of interest is established, and subsequent parametric regression allows prediction of plant N. Today, nonparametric methods such as machine learning regression algorithms (MLRAs), including partial least squares regression (PLSR) [43,44], random forest regression (RFR) [44,45], and Gaussian processes regression (GPR) [46,47,48], are more frequently used. Methods based on deep learning, such as neural networks, are being explored [49], but are not used frequently [5]. MLRAs and deep learning-based methods are considered nonparametric regression methods [46]. For a more in-depth discussion of commonly used algorithms for N retrieval from remote sensing data, see the review of [5].

1.2.3. Remote Sensing Plant Biomass

The estimation of plant biomass through remote sensing has been extensively performed [30,48,50,51,52,53,54]. Overall, crop traits such as LAI and canopy cover (CC) related to aboveground plant biomass show higher correlations than methods that estimate plant N [48,52]. So far, plant biomass estimation has mostly been based on SIs [17,53,55,56], and MLRAs seem to be less widely in use [48,57]. For biomass traits such as the LAI, mostly information in the RE and NIR regions has been shown to be of importance [52,55,56], with the visible (VIS) region—mainly the red domain—seeing use as well [43,58]. These are similar regions to those shown to be important for the estimation of plant-N-related traits.

1.2.4. Field Spectrometer for Validating Satellite Measurements

Few studies are available for direct plant N estimation using satellite imagery [59,60]. A limitation for such studies is the need for large and expensive field trials for model training and calibration. This is particularly difficult for small-structured agricultural systems such as those prevailing in Switzerland. Ref. [61] showed, for average field sizes of 1.6 ha, that no monitoring was possible for up to 22% of the fields because no ‘pure’ field pixel was available at 20-m resolution. Increasing the spatial resolution to 10 m reduced the number of fields that could not be monitored to 6.4%. This issue is exacerbated for Switzerland, where the average farm size is just 21 ha [62] and the field sizes are even smaller (around 1.5 ha, Federal Office of Agriculture census data). Therefore, it would be of great interest if field spectrometer (FS) measurements provided the link between satellite and small-plot N fertilization trials that would otherwise be too small for the calibration of satellite measurements. Several studies have accordingly used ground-based FS for the simulation of satellite sensors [24,33,63,64]. In these studies, the FS effectively acted as the tool for simulating and validating the data acquired by the satellite sensor.

1.3. Aims of This Study

To link satellite imagery with plant traits, data simulation using radiative transfer models (RTMs) has been extensively performed [46,65,66,67], but often with only small real-world validation datasets, if any. In this paper, we aim to help close this research gap by applying parametric and nonparametric methods for the estimation of mass- and area-based plant N traits, as well as analysis methods so far only applied in RTM-based studies, to a real-world hyperspectral dataset including multiple crops, test sites and years. We further aim to elicit differences between the sensitivity of wavelength regions for plant-N-related traits as a function of bandwidth and number of bands available for prediction in ground- versus satellite-based sensing. We hypothesize that the SWIR region might be of greater importance for N estimation in satellite-based sensing as opposed to ground-based sensing due to the effect of the canopy area in the satellite-based signal. Ultimately, we aim to estimate the potential of the Sentinel-2 (S2) satellites for plant N estimation in small-scale agricultural agro-ecosystems.

2. Materials and Methods

2.1. The Dataset

The dataset used in this study originates from three datasets containing spectral libraries (FS reflectance data) of main Swiss field crops (corn, potatoes, sugar beet, summer and winter barley, spring wheat, sunflower and winter wheat) from the years 2013–2016 and 2019. All datasets were collected within the eastern regions of the canton of Zürich, Switzerland. Please see the ‘supplementary materials—dataset’ for more information on the dataset (Figures S1–S3 and Table S1) as well as a download link for the data used in this study. The 1st and the 2nd datasets originate from FS measurements taken as part of ground truth data collection for the projects SEON (Swiss Earth Observatory Network) [68] and FLOURISH [69]. The 3rd dataset originates from FS measurements on the winter wheat experiments from the work of [20]. In all cases, spectral reflectance data were collected with an FS (ASD FieldSpec4®, ASD Inc., Malvern Panalytical, Malvern, UK) with a spectral range of 350–2500 nm resampled to 1 nm band intervals. FS measurements were performed using a white reference, and ten measurements distributed across the plot were averaged for each plot. Selected plant traits from this dataset include crop growth stage (BBCH), the mass-based N measure N concentration (N_conc), chlorophyll a+b concentration (Chl_AB) and LAI, as seen in Table 1. To approximate total N and Chl_AB on the canopy level, N_conc and Chl_AB were multiplied with LAI, forming two additional traits: LAI*N_conc and LAI*Chl_AB. The trait LAI*N_conc approximates the nitrogen uptake (NUP) as an area-based N measure. Ref. [22] suggested multiplying LAI with Chl_AB to improve canopy-level N status estimation.
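The LAI scaling described above amounts to one multiplication per plot. The following is a minimal sketch of that step, assuming hypothetical plot values and units; the function name and all numbers are illustrative and are not taken from the dataset.

```python
def lai_scaled_trait(leaf_value, lai):
    """Approximate a canopy-level trait by multiplying a leaf-level value with LAI."""
    if lai < 0:
        raise ValueError("LAI cannot be negative")
    return leaf_value * lai


# Hypothetical plot (illustrative values only, not from the study's data)
n_conc = 3.2   # leaf N concentration, assumed here to be in % of dry mass
chl_ab = 40.0  # leaf chlorophyll a+b, arbitrary units
lai = 2.5      # leaf area index, m2 leaf area per m2 ground

lai_n_conc = lai_scaled_trait(n_conc, lai)  # proxy for N uptake (NUP)
lai_chl_ab = lai_scaled_trait(chl_ab, lai)  # canopy-level chlorophyll proxy
print(lai_n_conc, lai_chl_ab)
```

Because LAI is dimensionless (leaf area per ground area), the scaled trait keeps the unit of the leaf-level value and should be read as a proxy rather than as a measured uptake in kg N ha⁻¹.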

The mentioned traits were determined on 1 to 4 m² plots located within farmers’ fields after being evaluated for crop growth stage according to the BBCH scale [70] and FS measurements. Within the same plots, LAI was non-destructively measured with a LI-COR LAI-2200 (2000) Plant Canopy Analyzer (LI-COR Biosciences, Lincoln, NE, USA), as described in detail by [71]. Total biomass samples (very early growth stages) or leaf subsamples of 10 to 20 of the youngest fully developed leaves (later growth stages) were collected in the measurement plots and subsequently dried for N analysis and freeze dried for Chl_AB. N_conc was measured with an elemental analyzer (Flash EA Series, Thermo Fisher Scientific, Waltham, MA, USA) or EURO EA (HEKAtech GmbH, Wegberg, Germany). Chl_AB was measured using 95% ethanol extraction and subsequent absorbance measurement by a photometer (EnSpire multimode plate reader, Perkin Elmer, Waltham, MA, USA) at 470, 649 and 664 nm using the equations given in [72].

To test the effect of crop canopy structure on the obtained reflection signal [73], four subsets were created: (1) an ‘erectophile’ dataset containing the crop species with erectophile morphology (winter wheat, winter barley, spring wheat and corn), (2) a ‘planophile’ dataset containing the broad-leaved crop species (sugar beet, rapeseed, sunflower and potatoes), (3) a dataset containing only winter wheat and (4) a dataset containing only sugar beet (Table 1).

2.2. Data Analysis

2.2.1. Dataset Pre-Processing

For the analysis, the atmospheric water absorption bands in the wavelength regions 1350–1440 nm, 1790–1990 nm and 2400–2500 nm were omitted from the FS data. Data in the 350–400 nm region were also omitted due to the low signal-to-noise ratio. To speed up computation, the FS data were resampled into 10-nm intervals (in the following referred to as the FS dataset). All dataset pre-processing and subsequent analyses were performed in R statistical software version 4.0.3 [74]. The R package ‘hsdar’ [75] was used to resample the FS dataset to the spectral resolution of the ‘MultiSpectral Instrument’ (MSI, Table 2) of S2 by using the S2 spectral response function provided by ESA (in the following referred to as the S2 dataset).
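The three pre-processing steps (masking noisy regions, coarsening to 10-nm intervals, and weighting by a sensor response function) can be illustrated in a few lines. The sketch below is written in Python for illustration only and is not the ‘hsdar’ workflow used in the study. It assumes inclusive interval bounds, simple bin averaging for the 10-nm resampling, and a made-up triangular response function in place of the ESA one. All function names are hypothetical.

```python
def drop_regions(wavelengths, reflectance, excluded):
    """Remove samples whose wavelength lies inside any excluded interval (bounds inclusive)."""
    kept = [(w, r) for w, r in zip(wavelengths, reflectance)
            if not any(lo <= w <= hi for lo, hi in excluded)]
    return [w for w, _ in kept], [r for _, r in kept]


def bin_average(wavelengths, reflectance, width=10):
    """Average reflectance in consecutive bins; each bin is labelled by its lower edge."""
    bins = {}
    for w, r in zip(wavelengths, reflectance):
        bins.setdefault((w // width) * width, []).append(r)
    edges = sorted(bins)
    return edges, [sum(bins[e]) / len(bins[e]) for e in edges]


def band_reflectance(wavelengths, reflectance, response):
    """Response-weighted mean reflectance for one sensor band."""
    weights = [response.get(w, 0.0) for w in wavelengths]
    total = sum(weights)
    if total == 0:
        raise ValueError("response function does not overlap the spectrum")
    return sum(wt * r for wt, r in zip(weights, reflectance)) / total


# Intervals named in the text; treating the bounds as inclusive is an assumption.
EXCLUDED = [(350, 400), (1350, 1440), (1790, 1990), (2400, 2500)]

# Synthetic 1-nm spectrum standing in for one field spectrometer measurement.
wl = list(range(350, 2501))
refl = [w / 10000 for w in wl]

wl_clean, refl_clean = drop_regions(wl, refl, EXCLUDED)
wl_10nm, refl_10nm = bin_average(wl_clean, refl_clean, width=10)

# Made-up triangular response centred at 665 nm (NOT the ESA response function).
toy_response = {660: 0.5, 665: 1.0, 670: 0.5}
red = band_reflectance(wl, refl, toy_response)
print(len(wl_clean), len(wl_10nm), red)
```

How the interval boundaries are handled changes the number of retained bands, so the band count of this sketch should not be expected to match the FS dataset exactly.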

2.2.2. Normalized Ratio Indices Generation

SIs were calculated as normalized ratio indices (NRIs, Equation (1)) using all possible band combinations of wavelengths λ_{A} and λ_{B}:

NRI = \frac{λ_{A} - λ_{B}}{λ_{A} + λ_{B}}    (1)

This resulted in 14,706 unique NRI combinations for the FS dataset and 66 combinations for the S2 dataset. The calculated NRIs were correlated against the individual crop traits (Table 1) using Pearson’s correlation. The Pearson’s correlation was squared to obtain the coefficient of determination (R²). The NRI scoring the highest R² value was then used to fit a linear regression equation to the trait of interest [24,36,52,76,77].
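The exhaustive NRI search can be sketched as follows. This is an illustrative Python sketch rather than the R code used in the study: it reads λ_{A} and λ_{B} in Equation (1) as the reflectance values at the two wavelengths, and the toy spectra, function names and trait values are all hypothetical. The pair counts are plain combinatorics: 12 bands give 66 unordered pairs, and 14,706 pairs correspond to 172 bands (an inference from the count, since the number of FS bands is not stated in the text).

```python
from itertools import combinations
from math import comb


def nri(refl_a, refl_b):
    """Normalized ratio index of two reflectance values, as in Equation (1)."""
    return (refl_a - refl_b) / (refl_a + refl_b)


def pearson_r2(x, y):
    """Squared Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_nri(spectra, trait):
    """Try every unordered band pair and return ((i, j), R2) of the best one."""
    best_pair, best_r2 = None, -1.0
    for i, j in combinations(range(len(spectra[0])), 2):
        r2 = pearson_r2([nri(s[i], s[j]) for s in spectra], trait)
        if r2 > best_r2:
            best_pair, best_r2 = (i, j), r2
    return best_pair, best_r2


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx


# Number of unordered band pairs for 12 and 172 bands.
print(comb(12, 2), comb(172, 2))

# Toy data: four samples with three bands; the trait is built from bands 0 and 2.
spectra = [[0.10, 0.30, 0.40], [0.20, 0.10, 0.50],
           [0.30, 0.25, 0.45], [0.15, 0.40, 0.60]]
trait = [2 * nri(s[0], s[2]) + 1 for s in spectra]

pair, r2 = best_nri(spectra, trait)
slope, intercept = fit_line([nri(s[pair[0]], s[pair[1]]) for s in spectra], trait)
print(pair, r2, slope, intercept)
```

Because NRI(A, B) = −NRI(B, A), swapping the two bands leaves R² unchanged, which is why only unordered pairs need to be evaluated.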

2.2.3. Random Forest Regression

Random forest regression (RFR) was used as a nonparametric machine learning method to regress the individual crop traits on the spectral data. RFR was performed with ten-fold cross validation [44,78] and implemented using the ‘caret’ [79] and ‘ranger’ [80] packages in R. The optimal value of the model parameter mtry (the number of variables randomly considered as candidates at each split) was taken from the best-performing model in cross-validation. The RF variable importance scores were calculated using the permutation importance [81] and were used to rank the importance of the available spectral bands for the estimation of the trait of interest.
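The two generic building blocks of this step, fold assignment and permutation importance, can be sketched independently of the forest itself. The Python sketch below is model-agnostic and is not the ‘caret’/‘ranger’ implementation. It scores importance as the mean increase in mean squared error after shuffling one band at a time on a given dataset, whereas forest implementations typically compute it on out-of-bag samples. The function names and the toy predictor are assumptions.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mse(predict, rows, targets):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Mean increase in MSE after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    scores = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(predict, shuffled, targets) - baseline
        scores.append(increase / n_repeats)
    return scores


# Toy data: two "bands"; the target depends only on the first one.
rows = [[i / 20, ((7 * i) % 20) / 20] for i in range(20)]
targets = [3 * r[0] for r in rows]


def toy_predict(row):
    """Stand-in for a fitted model (hypothetical; a real forest would go here)."""
    return 3 * row[0]


folds = kfold_indices(len(rows), k=10)
importance = permutation_importance(toy_predict, rows, targets)
print(folds, importance)
```

In this toy case the second band receives an importance of zero because the stand-in model ignores it, while shuffling the first band degrades the predictions.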

2.2.4. Gaussian Processes Regression–Band Analysis Tool

An alternative spectral analysis was conducted in the automated radiative transfer models operator (ARTMO) toolbox [82]. ARTMO consists of a suite of radiative transfer models and post-processing toolboxes, such as the global sensitivity analysis (GSA) toolbox [83], the MLRA toolbox [84] and the emulator toolbox [85]. The MLRA and emulator toolboxes consist of a suite of MLRAs for mapping applications and subsequent analysis. The Gaussian processes regression–band analysis tool (GPR-BAT) [86] included in the MLRA toolbox was used as an additional method to identify the importance of spectral bands for trait estimation. For this, a GPR model was fitted using ten-fold cross validation. GPR was used with an automatic relevance determination (ARD) kernel, where the correlation length scales σ_{i} for each spectral band of the GPR covariance (kernel) function can be directly used as band importance measures [46,87]. Iteratively, the spectral band exhibiting the highest σ_{i} value of the ARD kernel [47,86] was omitted from the GPR model using a sequential backward band removal (SBBR) algorithm [86] until only one spectral band remained. This resulted in an approximation of the influence of each band for the trait of interest [86]. The frequency at which a spectral band was ranked within the top five lowest σ_{i} values for each of the ten cross-validation folds was taken as the importance factor for said spectral band.
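The removal loop and the per-fold frequency count can be sketched without fitting an actual GPR. In the Python sketch below, `fit_length_scales` is a hypothetical stand-in for refitting the ARD-kernel GPR on the remaining bands and returning one σ_{i} per band. It is not an ARTMO function, and the σ values and fold rankings are invented for illustration.

```python
from collections import Counter


def sbbr(bands, fit_length_scales):
    """Sequential backward band removal: repeatedly drop the band with the highest sigma.

    Returns the bands in order of removal (least relevant first) and the last band left.
    """
    remaining = list(bands)
    removed = []
    while len(remaining) > 1:
        sigmas = fit_length_scales(remaining)  # refit on the remaining bands
        worst = max(remaining, key=lambda b: sigmas[b])
        remaining.remove(worst)
        removed.append(worst)
    return removed, remaining[0]


def rank_bands(sigmas):
    """Order bands from most to least relevant, i.e. by ascending sigma."""
    return sorted(sigmas, key=lambda b: sigmas[b])


def top_k_frequency(rankings_per_fold, k=5):
    """Count how often each band appears among the k best-ranked bands across folds."""
    counts = Counter()
    for ranking in rankings_per_fold:
        counts.update(ranking[:k])
    return counts


# Invented sigma values for four band centres (nm); a real run would refit the GPR.
toy_sigmas = {560: 0.5, 705: 0.2, 865: 3.0, 2190: 1.0}


def fit_length_scales(bands):
    """Hypothetical stand-in that returns fixed sigmas instead of refitting a model."""
    return {b: toy_sigmas[b] for b in bands}


removed, last_band = sbbr(toy_sigmas, fit_length_scales)

# Two invented folds, each ranked by ascending sigma; count top-2 appearances.
fold_rankings = [rank_bands(toy_sigmas),
                 rank_bands({560: 0.1, 705: 0.3, 865: 0.9, 2190: 2.0})]
frequency = top_k_frequency(fold_rankings, k=2)
print(removed, last_band, frequency)
```

In a real run the σ_{i} values change after every refit, so the removal order cannot be read off a single fit as it can in this fixed-value toy case.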

2.2.5. Global Sensitivity Analysis

The global sensitivity analysis (GSA) toolbox was originally developed to estimate the key input variables driving the spectral output of radiative transfer models (RTM) by using sensitivity analysis of the input variables [83]. Instead of using RTM spectra, the spectra of the real-world dataset were used to perform a GSA of all sampled crop traits at once using the full FS and S2 datasets where entries for all traits were available. Contrary to the NRI regression, RFR and GPR-BAT, which are univariate analyses in which one target trait is analyzed at a time, the GSA is a multivariate analysis. Since the GSA allows estimation of the contribution of each input variable across the whole spectrum, it can be used to elicit the importance of spectral regions for all traits of interest at once [83,88]. To reduce the large computation time needed for GSA, the input spectra can be approximated using an emulator [88] that fits an ML model emulating the original spectra. Here, multiple emulators from ARTMO’s MLRA toolbox were trained, and the best-performing one (according to an 80/20% training/test set data split) was chosen to approximate the available spectral data. This emulator was then used to conduct the GSA, which effectively varies the target trait of interest along its variance range in a Monte-Carlo simulation, measuring the sensitivity of each spectral band to the variance change of the target trait. For each trait, 1000 iterations were simulated.
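One common way to quantify this kind of Monte-Carlo sensitivity is a variance-based (Sobol) total-order index. The Python sketch below uses the Jansen estimator on a toy emulator for a single band. Whether this matches the exact estimator and sampling scheme of the GSA toolbox is an assumption, and the emulator, trait ranges and sample size are invented for illustration.

```python
import random


def total_order_indices(model, bounds, n=20000, seed=0):
    """Jansen estimator of total-order Sobol indices for independent uniform inputs."""
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    a = [draw() for _ in range(n)]
    b = [draw() for _ in range(n)]
    ya = [model(x) for x in a]
    mean = sum(ya) / n
    var = sum((y - mean) ** 2 for y in ya) / n

    indices = []
    for i in range(len(bounds)):
        acc = 0.0
        for xa, xb, y in zip(a, b, ya):
            # Resample only input i, keep all other inputs fixed.
            mixed = xa[:i] + [xb[i]] + xa[i + 1:]
            acc += (y - model(mixed)) ** 2
        indices.append(acc / (2 * n * var))
    return indices


def toy_emulator(traits):
    """Hypothetical emulator: reflectance of one band as a function of two traits."""
    lai, chl = traits
    return 3.0 * lai + 1.0 * chl


# Both traits are varied over an invented, normalized range of 0 to 1.
sensitivity = total_order_indices(toy_emulator, [(0.0, 1.0), (0.0, 1.0)])
print(sensitivity)
```

For this additive toy emulator the analytic indices are 0.9 and 0.1, so the estimates should land close to those values. Repeating the calculation for every band yields a per-wavelength sensitivity profile for each trait.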

3. Results

3.1. Comparison of Spectral Analysis Methods

The comparison of the NRI regression, RFR and GPR results for both the FS and the S2 dataset and all their subsets is shown in Figure 1. For N_conc, R² values for the FS datasets ranged from 0.33 to 0.59 for the NRI method (p < 0.001). R² values for the RFR method ranged from 0.16 to 0.74 (RMSE = 0.45 to 0.52) and from 0.22 to 0.77 for the GPR method (RMSE = 0.39 to 0.47) and FS datasets. R² values for the S2 datasets ranged from 0.17 to 0.25 for the NRI (p < 0.001); from 0.29 to 0.68 for the RFR (RMSE = 0.44 to 0.55) and from 0.30 to 0.80 for the GPR method (RMSE = 0.38 to 0.45). In the full, erectophile and winter wheat datasets, the ML-based methods RFR and GPR outperformed the NRI method for both the FS and S2 dataset. Overall, the RFR and GPR exhibited similar R² values with the GPR showing slightly higher values. For the NRI method we found generally higher R² values for the FS data than for the S2 resampled data. This was not observed for the RFR and GPR, where the differences in R² were small. In the sugar beet dataset, both ML-based methods showed higher performance on the S2 than for the FS dataset.

For Chl_AB, R² values for the FS datasets ranged from 0.45 to 0.85 for the NRI (p < 0.001); from 0.5 to 0.81 for the RFR (RMSE = 0.39 to 0.66) and from 0.63 to 0.91 for the GPR (RMSE = 0.34 to 0.56). For the S2 datasets, R² values ranged from 0.43 to 0.85 for the NRI (p < 0.001); from 0.48 to 0.80 for the RFR (RMSE = 0.35 to 0.65) and from 0.41 to 0.84 for the GPR (RMSE = 0.44 to 0.71). For each method, the differences between the FS and the S2 datasets were small except for GPR, which showed higher R² values for the FS than the S2 data in the erectophile and sugar beet subsets. The NRI method performed very similarly to the ML-based methods for Chl_AB.

For LAI, R² values for the FS datasets ranged from 0.60 to 0.92 for the NRI (p < 0.001); from 0.74 to 0.91 for the RFR (RMSE = 0.32 to 0.81) and from 0.69 to 0.93 for the GPR (RMSE = 0.29 to 0.58). For the S2 dataset, R² values ranged from 0.54 to 0.90 for the NRI (p < 0.001); from 0.73 to 0.91 for the RFR (RMSE = 0.30 to 0.75) and from 0.69 to 0.89 for the GPR method (RMSE = 0.35 to 0.59). Differences between the FS and the S2 data were very small for LAI, with the FS data exhibiting only slightly higher R² values. Performance of the NRI was overall similar to that of the ML-based methods, except for the full dataset.

For the LAI-scaled trait LAI*N_conc, R² values for the FS data ranged from 0.61 to 0.91 for NRI (p < 0.001); from 0.78 to 0.92 for RFR (RMSE = 1.56 to 3.97) and from 0.81 to 0.90 for GPR (RMSE = 1.40 to 3.35) depending on the subset. For the S2 data, R² values ranged from 0.54 to 0.89 for the NRI (p < 0.001); from 0.80 to 0.93 for the RFR (RMSE = 1.46 to 3.73) and from 0.81 to 0.89 for the GPR (RMSE = 1.47 to 3.51). Differences between the FS and S2 datasets were small.

For the LAI-scaled trait LAI*Chl_AB, R² values for the FS data ranged from 0.59 to 0.91 for the NRI (p < 0.001); from 0.74 to 0.92 for the RFR (RMSE = 1.86 to 5.49) and from 0.83 to 0.89 for the GPR (RMSE = 1.97 to 4.75). For the S2 data, R² values ranged from 0.53 to 0.90 for the NRI (p < 0.001); from 0.77 to 0.92 for the RFR (RMSE = 1.67 to 5.29) and from 0.76 to 0.86 for the GPR (RMSE = 2.31 to 4.81). Differences between the FS and the S2 data were again small.

3.2. Spectral Band Selection

3.2.1. Random Forest Variable Importance

The waveband ranking for N_conc in the full FS dataset (Figure 2, left column) showed the RE spectral region around 710 nm to be of high importance for the RFR model, along with the band at 400 nm and two bands in the SWIR region around 2000 nm. For the full S2 dataset (Figure 2, right column) the two RE bands (RE2 at 740 nm and RE3 at 783 nm, see Table 2), the two NIR bands and SWIR bands were influential for N_conc approximation.

The waveband ranking for Chl_AB on the full FS dataset showed the most influential variable to be the band at 700 nm in the RE region. The VIS region, especially the green to red domain (520 to 660 nm), contained many bands ranked with high importance. The band ranking for Chl_AB in the S2 dataset showed the same wavelength at 705 nm (RE1 band of S2) to be highest ranked followed by the green band at 560 nm. The other bands in the VIS range at 490, 665 and 443 nm (the S2 bands blue, red and coastal aerosol) also seemed to be important variables being less highly ranked, showing a similar pattern as observed for the FS dataset.

For LAI, in the full FS dataset, we found the NIR region at 870 nm to have the highest rank followed by other bands in the NIR, RE and one band in the VIS region at 400 nm. The S2 dataset exhibited the two bands RE3 at 783 nm and NIR1 at 842 nm as being highest ranked for LAI estimation, followed by the NIR2 band at 865 nm. Ranking for the FS dataset showed a similar pattern, where the NIR region between 850 and 870 nm and the RE region at 750 and 760 nm were shown to be the most important.

The two LAI-scaled traits showed bands in the RE and NIR regions between 760 and 940 nm to be of importance for the FS dataset. For LAI*N_conc, the band at 760 nm was the highest ranked, followed by the band at 900 nm, which showed a much lower importance. In the S2 dataset, the RE3 band at 783 nm was ranked the highest, followed by the NIR1 band at 842 nm and the NIR2 band at 865 nm, a pattern very similar to that in the FS dataset.

For LAI*Chl_AB, four bands in the RE (bands 770, 780 nm) and NIR region (bands 810, 890 nm) were ranked highest. For the S2 dataset, the RE3 band at 783 nm was found to be the most important, followed by the NIR1 and NIR2 bands at 842 and 865 nm, respectively. The other S2 bands possessed a much lower variable importance.

3.2.2. Gaussian Processes Regression–Band Analysis Tool

The GPR-BAT performed on the full FS dataset showed the LAI and LAI-scaled traits to be largely invariant to band removal until five bands were left, after which R² values decreased sharply (Figure 3). Prediction performance for N_conc was invariant to band removal until 20 bands, after which GPR R² values increased until five bands were left, after which the R² values decreased sharply again. Chl_AB showed a similar trend, where R² values increased until five bands were left and then sharply decreased. The RMSE values for the traits followed the same trend, albeit inverted. They stayed invariant to band removal (or decreased) until ten to five bands and then sharply increased (Supplementary Figure S4). GPR-BAT R² values for the five most important FS bands were 0.74 for N_conc, 0.75 for Chl_AB, 0.91 for LAI and 0.84 for LAI*N_conc and LAI*Chl_AB. For the S2 dataset, GPR-BAT R² values were 0.77 for N_conc, 0.76 for Chl_AB, 0.81 for LAI, 0.84 for LAI*N_conc and 0.85 for LAI*Chl_AB. Therefore, the top five ranked bands were used for analysis of the GPR-BAT.

Figure 4 shows the frequency of how many times a certain band was ranked from 1st to 5th place across all the ten folds from the cross-validation performed in the GPR-BAT (see Section 2.2.4).

For N_conc, the VIS region around 400 nm and the SWIR region (around 2000 nm) were shown to be of high importance. The green and early red (around 600 nm) and RE (around 700 nm) regions were shown to be of minor importance for the full FS dataset. The S2 dataset showed especially the green band of S2 at 560 nm and, less often, the two NIR bands at 842 and 865 nm to be the most important bands. The S2 RE bands at 705 and 740 nm were also important, albeit mostly at second rank.

The spectral bands with the largest importance for Chl_AB estimation using GPR for the FS dataset were located in the SWIR region around 2400 nm, with other important bands in the green (590 nm), RE (760 nm) and SWIR region at 2000 nm. The S2 dataset showed the most important bands to be the SWIR2 band at 2190 nm and the RE3 band at 783 nm and the NIR2 band at 865 nm. This was a slightly stronger focus on the NIR region compared to the FS dataset.

For LAI, we found a large spread of important bands over the spectrum for the FS dataset. The bands at 570 nm and at 740 nm were ranked 1st the most often. The top ranked bands were also situated in the SWIR region (once at 1670 and 2010 nm) and in the blue VIS region around 420 nm. The S2 resampled dataset showed a strong focus on the RE region, with the RE2 band at 740 nm being the first-ranked band the most often. The water vapor band at 945 nm in the NIR region also exhibited high importance. The S2 green band at 560 nm—the most important region in the FS dataset—was also highly, but not top, ranked.

The distribution of important bands for LAI*N_conc was similar to that for LAI in the FS dataset. The important bands were in the VIS region at 410 and 420 nm, at 770 nm in the RE region and two in the SWIR region at 1670 and 1720 nm. The S2 resampled dataset showed the RE2 band at 740 nm, the RE3 band at 783 nm and the water vapor band at 945 nm to be the most important for the GPR-BAT.

For estimation of LAI*Chl_AB from the FS dataset, important bands were found across the full spectrum with the most important band located at 1350 nm. Other important bands were in the VIS region near 410 nm, one band at 770 nm in the NIR and one in the SWIR region at 1670 nm. The S2 resampled dataset also showed a focus on the RE region as found for LAI and LAI*N_conc. The importance at the end of the NIR region at 1350 nm, which was observed in the FS dataset, was not observed in the S2 dataset.

3.3. Global Sensitivity Analysis

Using the ARTMO toolbox [78], different MLRAs were fitted to the full dataset only, as the data subsets proved too small and resulted in insufficient emulator performance for GSA. For the full FS dataset, a canonical correlation forest was chosen as the best performing emulator, with an RMSE of 3.95 and a normalized RMSE (NRMSE) of 11.8% (reflectance values). The per-wavelength NRMSE ranged from 10% in the NIR plateau up to 17.57% at 720 nm in the RE region (Figure S5). For the S2 resampled dataset, the canonical correlation forest was also identified as the best performing emulator, with an RMSE of 3.57 and an NRMSE of 11.72%. The per-wavelength NRMSE values ranged from 10.26% at the NIR2 band (865 nm) to 12.66% at the RE1 band situated at 705 nm (Figure S6).
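
For readers who want to reproduce emulator error measures of this kind, the sketch below computes the RMSE and a per-band NRMSE. It rests on an assumption: the NRMSE is taken as the RMSE divided by the range of the observed values, expressed in percent. The text does not state which normalizer was used, so the function names and the normalization are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nrmse_percent(observed, predicted):
    """RMSE normalized by the observed range, in percent (assumed normalizer)."""
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return 100.0 * rmse(observed, predicted) / value_range


def per_band_nrmse(observed_spectra, predicted_spectra):
    """NRMSE per spectral band; each spectrum is a list with one value per band."""
    obs_by_band = zip(*observed_spectra)
    pred_by_band = zip(*predicted_spectra)
    return [nrmse_percent(list(o), list(p)) for o, p in zip(obs_by_band, pred_by_band)]


# Hypothetical reflectance values for one band (illustrative, not study data).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(rmse(observed, predicted), nrmse_percent(observed, predicted))
```

Other normalizers (for example the mean of the observations) are also in common use and give different percentages, so reported NRMSE values are only comparable when the normalizer matches.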

The sensitivity of each spectral band for the trait estimation showed LAI to be the most dominant variable, especially in the RE and the NIR region (Figure 5). LAI accounted for up to 93% of the total sensitivity in these regions, independent of the dataset (FS and S2). LAI also exhibited strong sensitivity in the VIS region around 400 nm. Sensitivity for LAI dropped in the SWIR region after 1400 nm but remained large, with a local SWIR peak at 2000 nm. The S2 dataset showed a very similar pattern for LAI in the regions where an S2 band was located.

The sensitivity pattern observed for Chl_AB was very different from the one observed for LAI, exhibiting peaks where LAI showed low sensitivity: at 710 nm (72.04%), in the green region around 550 nm (56%), and in the SWIR region around 1670 nm (61%). The lowest sensitivity was observed in the RE and NIR region, where LAI was dominant. The S2 dataset showed a sensitivity pattern very similar to that of the FS dataset, albeit at a much lower spectral resolution.

For N_conc, a very low sensitivity compared to the other traits was observed, ranging from 0.5 to 16.47%. Sensitivity for N_conc was especially low in the RE and NIR region, where LAI was dominant. The lowest sensitivity was found at 740 nm. The wavelength region with the highest sensitivity for N_conc was the VIS region, with an average sensitivity of 15% and the peak sensitivity for N_conc of 16.47% located at 540 nm. Large parts of the SWIR region from 1400 to 2400 nm showed sensitivities ranging from 10 to 12%. For the S2 dataset, N_conc exhibited very low sensitivity over the whole spectrum, with values ranging from 1.4 to 2.8%.

4. Discussion

4.1. Optimal Analysis Method Depends on Target Trait

For Chl_AB, coefficients of determination (R² values) for the NRI method were in the same range or better than the Random Forest Regression (RFR) and Gaussian processes regression (GPR) approaches (Figure 1). Other studies using NRIs found significant R² values for estimating crop-specific Chl_AB of 0.55 for winter wheat [89], 0.92 for maize, 0.81 for soybean [34] and 0.77 for sugar beet [90], comparable to the values observed for the crop-specific subsets found in this study (0.72 for winter wheat and 0.79 for sugar beet). R² values found for Chl_AB for the full, crop unspecific dataset were, however, much lower (<0.57). This was to be expected, as the large variance between the crops is not only caused by different Chl_AB levels across different crops, but also by the strongly differing canopy architecture, leaf morphology and partly growth stages (Table 1). Due to the mediocre correlation between Chl_AB and plant N status [30], traits such as N_conc and LAI are more interesting from an agronomic viewpoint, since they are more directly related to the plant management decisions of the farmer on the field level.

For N_conc, a mass-based N measure, the ML-based methods RFR and GPR generally performed better than the NRIs, a finding also confirmed by [5,91]. This was most pronounced in the full dataset, where RFR R² was 0.64 (RMSE = 0.52) and GPR R² was 0.74 (RMSE = 0.39). This is in the same range as reported by [44] for RFR applied to pastures (R² = 0.76 and RMSE of 0.38), i.e., grassland consisting of mixed-species stands. A slightly different approach was used by [45], who calculated SIs and subsequently applied RFR for N_conc in winter wheat (R² = 0.87 and RMSE of 0.32), i.e., a single-crop dataset. We found RFR R² values through direct estimation to be 0.74 (RMSE = 0.47) for the crop-specific winter wheat dataset. Ref. [47] reported GPR R² values for mass-based N (in mg g⁻¹) of 0.3 ± 0.07 for a dataset of mixed tree species.

The area-based N trait LAI*N_conc exhibited much higher model performance than the N_conc in our study. This is in line with literature citing the direct estimation of N_conc to be mediocre [30]. However, signal separation remains an issue with this composite parameter, as described further below in the discussion of the spectral regions of interest and the GSA (see Section 4.5).

The biomass-related trait LAI was estimated better with the ML-based methods than the NRI method. Using NRIs, we explained 0.59 of the observed variation in the crop unspecific full dataset, which is less than the R² of 0.71 reported by [92], who also used NRIs in a mixed-crop dataset. The R² of up to 0.92 found in the crop-specific subsets was similar to single-crop values of up to 0.98 found for maize [93]. R² for LAI predicted with the RFR model was 0.77 (RMSE = 0.81) for the full dataset and a maximum of 0.91 for the sugar beet dataset (RMSE = 0.32). These values are comparable to the ones found by [94] for soybean (R² = 0.74 and RMSE of 0.11) and [78] for rice (R² = 0.76 and RMSE of 0.67). Ref. [95] found R² values for LAI in a multi-crop dataset of up to 0.91 using GPR (RMSE = 0.51). This was very similar to the results obtained in this study (full dataset R² of 0.91 using GPR, RMSE = 0.55).

4.2. Low Specificity of Index-Based Methods for Satellite-Based Remote Sensing

The parametric, index-based methods performed very well overall, indicating that they can be readily used for proximal remote sensing tasks, e.g., with an FS. More sophisticated SIs that either take more than two bands into account [28,33,50,58] or are a composition of multiple indices [29,96] can, and regularly do, improve prediction results over NRIs. NRIs are, however, still a capable instrument in remote sensing, offering fast and efficient calculation and easy interpretation. This explains why commercial systems based on spectral indices are already in operation [32,40,41]. For the S2 resampled dataset representing satellite-based remote sensing, performance of the NRIs was low, especially for N_conc. This was mostly due to the unavailability of bands in the S2 sensor that were important in the FS dataset for N_conc (Figures S7 and S8). This is also reflected in the literature, where more sophisticated indices, such as MSAVI, Cl_RedEdge, etc., are often used for N_conc estimation from S2 data instead of NRIs [32,34]. These indices were also calculated in this study, but yielded very low coefficients of determination for N_conc on the full dataset (R² of 0.02), and were sometimes not even significant. The performance on the crop-specific datasets was similarly low. These non-NRI indices were not trait-specific, i.e., multiple indices showed equal prediction performance for N_conc. The same was observed for Chl_AB, where half of the more specialized vegetation indices, such as MSAVI, MCARI and Cl_RedEdge [33,34,89], were situated within <0.1 R² of each other. LAI and Chl_AB in particular shared indices that were indicative of both traits at the same time. This was true for both the FS and the S2 dataset. Another reason for the low specificity of the index-based methods may be the crop-unspecific dataset, as well as the few datapoints from the generative growth stages.
The nonparametric ML methods RFR and GPR showed similar prediction performance between the FS and the S2 datasets (Figure 1). RFR and GPR performed especially well in the prediction for both the mass- and the area-based N trait and LAI in the S2 dataset compared to the NRIs. Coupled with the low specificity of the index-based methods, this indicates that ML-based approaches may be better suited for satellite-based remote sensing applications of plant N than index-based methods. This better performance was especially apparent in the full dataset reflecting all the heterogeneity mentioned above.
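
The exhaustive two-band search behind the NRI results can be sketched as follows. The sketch assumes the common definition NRI(i, j) = (R_i − R_j) / (R_i + R_j) and scores each band pair by the R² of a simple linear fit, computed here as the squared Pearson correlation. Function names and the toy data are hypothetical and are not taken from the R script used in the study.

```python
from itertools import combinations


def nri(r_i, r_j):
    """Normalized ratio index of two reflectances (both assumed positive)."""
    return (r_i - r_j) / (r_i + r_j)


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R² of a simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)


def best_nri(spectra, trait, wavelengths):
    """Return (R², band_i, band_j) for the best-scoring band pair.

    NRI(i, j) = -NRI(j, i), so R² is symmetric and each unordered pair
    only needs to be evaluated once.
    """
    best = None
    for i, j in combinations(range(len(wavelengths)), 2):
        index_values = [nri(s[i], s[j]) for s in spectra]
        score = r_squared(index_values, trait)
        if best is None or score > best[0]:
            best = (score, wavelengths[i], wavelengths[j])
    return best


# Toy data: four samples, three bands (illustrative values, not study data).
wavelengths = [560, 705, 842]
spectra = [
    [0.08, 0.15, 0.40],
    [0.07, 0.12, 0.45],
    [0.06, 0.10, 0.50],
    [0.05, 0.08, 0.55],
]
trait = [1.0, 2.0, 3.0, 4.0]
score, band_i, band_j = best_nri(spectra, trait, wavelengths)
print(score, band_i, band_j)
```

With 12 S2 bands there are only 66 unordered pairs to test, whereas 172 FS bands give 14,706 pairs. This is one way to see why bands that are important in the FS dataset may have no counterpart in the S2 band set.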

4.3. Model Performance on Data Subsets

In the smaller, crop-specific datasets, the relative performance of the NRI increased (Figure 1). Even though random forests (RFs) can work well on small datasets [97], they, like all ML-based methods, generally perform better with larger datasets [98]. The RFR and GPR performances showed large variation in R² and RMSE values on the data subsets (Figures S9 and S10), which were too small to fit a reliable emulator in ARTMO and run a GSA (see Section 2.2.5). This suggests that, when data availability is limited, the computationally far less expensive NRI method performs similarly to the ML-based methods. It is important to note that this is generally only the case for crop-specific datasets, as the NRIs were outperformed by the ML methods on the unspecific full datasets containing several crops (Figure 1). This could be due to the ML-based methods using more input variables (e.g., spectral bands) than the NRI approach. RFR and GPR use all available spectral bands for regression and therefore more of the information available in the spectrum than the index-based methods, which used two bands in this study and usually use up to four bands [28,56]. It is likely that underlying crop architecture and morphology, as affected by growth stage, related background soil signal, plant health status, etc., are used by the RFR and GPR algorithms. The full dataset contains the most variance; therefore, the ML models also perform best on it. For that reason, we see the full dataset as valuable for agro-ecosystem modeling and monitoring. To fully answer these questions, especially with respect to crop-specific datasets, further data collection campaigns are needed to obtain larger crop spectral libraries (CSL) to exploit the power of ML-based algorithms in conjunction with the S2 satellites.

4.4. Influence of Band Number and Bandwidth on Trait Estimation

Trait estimation with the RFR and GPR methods on the S2 dataset showed little performance loss compared to the FS dataset (Figure 1), indicating that the smaller number of S2 bands (n = 12) compared to the FS bands (n = 172) did not deteriorate model performance. This was confirmed in the GPR-BAT, where band removal kept R² and RMSE values relatively stable, or even improved them, until only five bands remained, after which the model performance dropped strongly (Figure 3 and Figure S4). This indicated that GPR was able to predict the traits used in this study optimally using five spectral bands. A similar finding was reported for chlorophyll and LAI by [86], who found these traits to be optimally estimated using four to ten spectral bands. Ref. [47] also found that reducing bands improved estimation of mass-based N (in mg g⁻¹), but, in contrast, found the optimal number of bands to be around 100.

Another reason for the good performance of the ML methods on the S2 dataset may be that these methods cope better with the broad bandwidth of certain S2 bands (Table 2), especially certain NIR and SWIR bands. The effect of the broad bandwidth of the S2 sensor could also be observed in the band importance rankings. This was especially the case for Chl_AB, where the S2 dataset exhibited the same important bands in the VIS range as the FS dataset, but aggregated to the four VIS bands of S2 (Figure 2). In contrast, the hyperspectral FS data showed good performance with the narrowband NRIs compared to the broad bands found on the S2 satellite. It is therefore possible to predict certain traits such as Chl_AB and LAI using only two narrow bands of the spectrum. This also shows that, depending on the user and the trait of interest, input variables can be reduced, saving on computation time or sensor cost.

4.5. Spectral Regions for Trait Estimation

In the multivariate global sensitivity analysis (GSA, Figure 5), the LAI was shown to be the most dominant variable for the spectral reflectance measurements, which confirms findings of other studies conducting GSA [65,88,99]. These studies, however, found LAI to be most important in the short-wave infrared (SWIR) region, a finding we could not confirm in this study, where the red edge (RE) and especially the near-infrared (NIR) regions were most important for the LAI estimation (Figure 2, Figure 4 and Figure 5). Since most studies employing the GSA used simulated data from RTMs, direct comparison to real-world spectral datasets is difficult, but nevertheless very important. The GSA emulator performance found in this study was much lower than that of an emulator fitted to simulated RTM data [88], but can be considered adequate for real-world data, with an NRMSE of 11.72%. The lower performance can be attributed to the full dataset containing different crop species and growth stages. Although the same FS and lab equipment were used to obtain the data of the three original datasets (Section 2.1), different people collected the data. This may have added variation to the dataset. Other effects such as different crop varieties, biotic (pests and diseases) and abiotic stresses (nutrient and water limitation), or the presence of weeds in monocrop stands may cause additional noise compared to simulated data. Such effects can never be fully avoided in real-world spectral data. Despite these limitations of the real-world dataset, the overall GSA patterns for Chl_AB, LAI and N_conc highlighted regions similar to those obtained with RTM data [65,88,99], indicating both the validity of RTMs and the applicability of the real-world dataset used for this study.

Univariate analyses also found the RE and NIR regions to be important for LAI estimation [52,53,55,56]. In the GSA, N_conc was shown to be dominated by the other traits, with only the visible (VIS), RE and SWIR region showing low sensitivity (Figure 5). This was also found in the GSA of the RTM-based studies [65,88].

For the Chl_AB trait, the RE region around 700 nm and the red region around 600 nm were the most important. The RFR and GPR-BAT exhibited contrasting results in band importance with the GPR-BAT exhibiting the far SWIR region around 2400 nm to be of high importance for Chl_AB. Ref. [47] also found the far SWIR region at 2250 nm to be important for Chl_AB in a GPR-BAT analysis.

4.6. Important Bands for Plant N Estimation

In the univariate spectral region analysis on the FS dataset, the VIS region at 400 nm, the RE region at 740 nm and the SWIR region at 2000 nm were shown to be the most influential for sensing N_conc (Figure 2 and Figure 4). The importance of the RE regions in plant N_conc estimation has been shown previously [20,24,31,100]. The VIS region for plant-N-related traits has also been shown to be important [47], but to a lesser extent. The importance of the SWIR region for mass-based plant N estimation has been shown previously [35,46]. Since large proportions of leaf N are bound in proteins [3], which exhibit high spectral absorption in the SWIR region [5,38], the importance of the SWIR bands for N_conc was expected. For area-based plant N, the SWIR regions were found to be important as well [46]. This finding was only partially confirmed in this study, where the spectral regions of interest for LAI*N_conc were primarily observed in the RE and NIR regions and only partially in the early SWIR region around 1650 nm in the GPR-BAT analysis for both the FS and the S2 dataset (Figure 4).

In the multivariate GSA on the S2 dataset, sensitivity for N_conc was even lower than that observed for the FS dataset (Figure 5). In the univariate band analysis, the S2 dataset for the N_conc trait showed the RE bands of S2 and the NIR bands to be the most important, very similar to the FS dataset. Ref. [59] also reported the S2 NIR band in conjunction with the S2 red band to be the most important for N_conc in winter wheat. The importance of the S2 SWIR bands for N_conc was mentioned by [101]. Refs. [24,60] both estimated area- (kg ha⁻¹) and mass-based N (%) in winter wheat and highlighted the use of the three S2 RE bands [60] and a combination of NIR and RE bands [24]. Ref. [34] also used area-based N measurements (g m⁻²) and reported the importance of the S2 RE bands. The LAI*N_conc was also shown to be sensitive in the RE and NIR region for the S2 dataset, a result that is comparable to the literature [24,34,60].

4.7. Field Spectrometer Data for Satellite Data Simulation

The simulation of satellite data based on FS data has been evaluated in previous studies [24,34,59,101]. Of these, [24,34] simulated S2 bands using an FS covering wavelengths up to 1000 nm. In this study, the spectral range was extended to 2500 nm, which was beneficial for mass-based N estimation due to the importance of the SWIR bands covering the protein-specific regions of N. Studies using true satellite imagery and FS data have obtained similar results in remote sensing of plant N (mass- or area-based) and identified similar spectral bands of interest for these traits [32,60,102], highlighting the robustness of the simulation approach followed in this study.
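
The general idea of simulating a broad satellite band from narrowband FS reflectance can be sketched as a weighted average over the band's spectral response. The sketch below is not the resampling procedure used in this study. It assumes a Gaussian response defined by a band center and a full width at half maximum (FWHM), whereas the real S2 response functions are published by ESA and are not Gaussian. The band centers follow the text; the FWHM values, the function names and the toy spectrum are illustrative placeholders.

```python
import math


def gaussian_weights(wavelengths, center, fwhm):
    """Gaussian response weights (assumed shape) evaluated on the FS grid."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-0.5 * ((w - center) / sigma) ** 2) for w in wavelengths]


def simulate_band(wavelengths, reflectance, center, fwhm):
    """Response-weighted mean reflectance for one simulated broad band."""
    weights = gaussian_weights(wavelengths, center, fwhm)
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, reflectance)) / total


# Toy FS spectrum on a 10 nm grid (illustrative, not measured data).
wavelengths = list(range(400, 1001, 10))
reflectance = [w / 1000.0 for w in wavelengths]

# Band centers as named in the text; FWHM values are placeholders only.
bands = {"green_560": (560, 35), "RE1_705": (705, 15), "NIR_842": (842, 115)}
simulated = {
    name: simulate_band(wavelengths, reflectance, center, fwhm)
    for name, (center, fwhm) in bands.items()
}
print(simulated)
```

Because each simulated band is an average over its response window, narrow spectral features inside a broad band are smoothed out. This is consistent with the aggregation of important FS bands into the few S2 bands discussed in Section 4.4.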

Simulation of satellite data allows modeling interactions of important crop traits and the remotely sensed signal without the need for expensive, large-scale field experiments and without concerns about satellite pixel size or mixed pixel effects. This is especially important for small-structured agricultural systems such as those found in Switzerland and southern Germany, where fields are small and inhomogeneous (e.g., with respect to trees within the field, hedges, and soil differences). Intercrops are often grown in the crop cycle between the main crops and have little coverage in national census databases. Therefore, prediction of crop traits on crop-unspecific datasets is very important, particularly if ecological measures fostering inter- or mixed cropping are an integral part of agro-ecosystem policy. Such cropping practices will cause more mixed pixels. It can be assumed that such practices will become more important and more common in the future with Swiss and EU regulatory encouragement of ecological landscape measures [103]. A large CSL would be interesting for ‘end users’ such as governmental institutions to develop monitoring products, strategies and policies on an agro-ecosystem level. For such applications, a CSL including multiple crop species and varieties and information from different growth stages is sufficient to estimate plant-N-related traits, as was shown in this study. This is an important step towards deriving valuable information on N flows, such as the inputs and outputs in agro-ecosystems, supporting the identification of regional hotspots as well as decisions and measures for mitigation.

4.8. Outlook on Remote Sensing of Plant N

Chl_AB and LAI are estimated robustly through remote sensing techniques currently in use [30,34,78,104], a finding confirmed in this study. This comes as no surprise, as the S2 satellites were designed for vegetation monitoring [34]. Future work is needed in the domain of remote sensing of plant N. Especially in small-scaled agricultural systems, the use of FS mounted on tractors [32,40] and unmanned aerial vehicles (UAVs) [20,105] has been proposed as an effective solution for data collection for plant N prediction and modeling. Based on such data, ML methods can be used more efficiently, or alternatively, advanced modeling techniques such as deep learning could be applied. These approaches already show promising performance for plant N estimation [49,60] but are heavily dependent on large quantities of data. An additional benefit of more data collection would be the creation of more specific (e.g., crops, growth stages, climate zones, etc.) datasets forming a CSL to allow better simulation of satellite sensors, advancing the modeling of N-related crop traits. Such a CSL would directly address the bottleneck of the small crop-specific data subsets in this study. These small subsets resulted in variations of ML performance, leading to unreliable prediction. Such datasets hold great potential, as the information about the plant morphology and growth stage would be included in them. A crop-specific CSL would be especially important for agronomists developing models for farmers, which require the highest possible accuracy for crop monitoring and for management decisions such as N fertilization.

5. Conclusions

In this study, we showed the performance of parametric and nonparametric methods for two nitrogen (N) related traits on a diverse real-world spectral library: (1) the mass-based N measure, N_conc, and (2) an area-based N measure, LAI*N_conc. Estimation of plant chlorophyll was shown to be robust, with few spectral bands in the red region around 600 nm and the red edge (RE) region around 700 nm, irrespective of whether a broadband satellite or narrowband hyperspectral field spectrometer (FS) was used. Plant chlorophyll was especially well estimated using normalized ratio indices (NRIs). The leaf area index (LAI) was estimated with good performance for both the ground-based FS and the satellite-based Sentinel-2 (S2) datasets containing single crops and the dataset containing a mixture of crops. LAI was better predicted using machine learning (ML) methods than NRIs. The estimation of N_conc was most successful using the ML algorithms random forest regression (RFR) and Gaussian processes regression (GPR) for both the hyperspectral FS and the S2 dataset. Hyperspectral devices achieved the best estimation results in the visible (VIS) region at 400 nm, and especially in the RE region around 740 nm and the SWIR region around 2000 nm. The broadband S2 sensor needs the SWIR bands for good estimation performance. The ML algorithms were shown to be capable of estimating N_conc in the multiple crop dataset. Scaling the N_conc with LAI approximated the area-based plant N measure N uptake (NUP) and improved the prediction of crop N status by including the plant biomass signal. However, the separation of the biomass signal remains a challenge, and further research is needed. We therefore strongly recommend intensifying the data collection of plant-N-related traits and spectral measurements, as well as sharing available datasets and/or spectral libraries in order to monitor N in agro-ecosystems on a regional or even national scale. Such systems would facilitate more intelligent monitoring and decision support systems for agricultural policies and eventually precision farming.

Supplementary Materials

The following are available online at https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/rs13122404/s1. Figure S1. Location of the measurements where data was sampled for this paper; Figure S2. Detailed BBCH stages of the dataset; Figure S3. Climate diagram showing average temperature and precipitation for the region where the data was collected; Figure S4. RMSE values of the traits as a function of the number of spectral bands obtained using the Gaussian processes regression band analysis tool (GPR-BAT) with the sequential backward band removal (SBBR) algorithm applied; Figure S5. Performance of the emulator (Canonical Correlation Forest) on the full FS dataset with 20 PCA and 80/20% train/test set split; Figure S6. Performance of the emulator (Canonical Correlation Forest) on the S2 resampled full dataset with 20 PCA and 80/20% train/test set split; Figure S7. R² values between the trait of interest and the normalized ratio indices (NRIs) using all possible band combinations of the full field spectrometer dataset; Figure S8. R² values between the trait of interest and the normalized ratio indices (NRIs) using all possible band combinations of the full Sentinel-2 resampled dataset; Figure S9. R² values for prediction on the test set for 100 iterations of random train/test set splits. Train/test set splits were performed by stratification by crop and date combinations; Figure S10. Median centred RMSE values for each trait and dataset combination for the same 100 iterations of random train/test set splits as found in Figure S9; Table S1. The dataset contains the following 11 crops.

Author Contributions

Conceptualization, H.A., A.W. and F.L.; Methodology, G.P. and F.L.; Software, G.P. and J.V.; Validation, G.P.; Formal Analysis, G.P.; Data Investigation, G.P., H.A., J.V., A.W. and F.L.; Resources, A.W. and F.L.; Data Curation, G.P. and F.A.; Writing—Original Draft Preparation, G.P., H.A., A.W. and F.L.; Writing—Review & Editing, G.P., H.A., J.V., F.A., A.W. and F.L.; Visualization, G.P.; Supervision, A.W. and F.L.; Project Administration, A.W. and F.L.; Funding Acquisition, A.W. and F.L. All authors have read and agreed to the published version of the manuscript.

Funding

We acknowledge support from the Swiss Federal Office for Agriculture (Bundesamt für Landwirtschaft, BLW) for funding this work as part of the project ‘DeepField’. Parts of the data were derived from the Swiss Earth Observatory Network (SEON) financed by the Swiss State Secretariat for Education, Research and Innovation (SERI) and ETH-Board as a Cooperation and Innovation Project (KIP) initiated by the Swiss University Conference (SUK) and the Flourish project funded by the European Union’s Horizon 2020 research and innovation program under grant agreement No 644227 (FLOURISH) and from the SERI under Contract Number 15.0029. Jochem Verrelst was supported by the European Research Council (ERC) under the ERC-2017-STGSENTIFLEX project (grant agreement 755617).

Data Availability Statement

The data presented in this study are openly available in: https://0-doi-org.brum.beds.ac.uk/10.3929/ethz-b-000488405 (accessed on 5 June 2021). Please also see the ‘supplementary materials–dataset’ for more information on the dataset.

Acknowledgments

We thank Lukas Prey for providing an R script that served as a basis for calculation of the Normalized Ratio Indices. We also thank Constantin Streit from the Federal Office of Agriculture for providing insights in the national census data. We further thank Johannes Pfeiffer and Moritz Köhle for data collection and cleaning as part of the project FLOURISH. We also thank the students Alexandra Rieder, Annabelle Ehmann and Gabriela Küng from the group of Crop Science who joined data collection campaigns and thus contributed to the collection of the data presented here.

Conflicts of Interest

The authors declare no conflict of interest.

References

Muñoz-Huerta, R.; Guevara-Gonzalez, R.; Contreras-Medina, L.; Torres-Pacheco, I.; Prado-Olivarez, J.; Ocampo-Velazquez, R. A Review of Methods for Sensing the Nitrogen Status in Plants: Advantages, Disadvantages and Recent Advances. Sensors 2013, 13, 10823–10843.
Galloway, J.N.; Cowling, E.B. Reactive Nitrogen and The World: 200 Years of Change. Ambio 2002, 31, 64–71.
Chapin, F.S.; Bloom, A.J.; Field, C.B.; Waring, R.H. Plant Responses to Multiple Environmental Factors Physiological Ecology Provides Tools for Studying How Interacting Environmental Resources Control Plant Growth. BioScience 1987, 37, 49–57.
Kokaly, R.F.; Asner, G.P.; Ollinger, S.V.; Martin, M.E.; Wessman, C.A. Characterizing Canopy Biochemistry from Imaging Spectroscopy and Its Application to Ecosystem Studies. Remote Sens. Environ. 2009, 113, S78–S91.
Berger, K.; Verrelst, J.; Féret, J.-B.; Wang, Z.; Wocher, M.; Strathmann, M.; Danner, M.; Mauser, W.; Hank, T. Crop Nitrogen Monitoring: Recent Progress and Principal Developments in the Context of Imaging Spectroscopy Missions. Remote Sens. Environ. 2020, 242, 111758.
Wright, I.J.; Reich, P.B.; Westoby, M.; Ackerly, D.D.; Baruch, Z.; Bongers, F.; Cavender-Bares, J.; Chapin, T.; Cornelissen, J.H.C.; Diemer, M.; et al. The Worldwide Leaf Economics Spectrum. Nature 2004, 428, 821–827.
Haynes, R. Mineral Nitrogen in the Plant-Soil System; Elsevier: Amsterdam, The Netherlands, 2012; ISBN 978-0-323-14816-0.
Gruber, N.; Galloway, J.N. An Earth-System Perspective of the Global Nitrogen Cycle. Nature 2008, 451, 293–296.
Galloway, J.N.; Townsend, A.R.; Erisman, J.W.; Bekunda, M.; Cai, Z.; Freney, J.R.; Martinelli, L.A.; Seitzinger, S.P.; Sutton, M.A. Transformation of the Nitrogen Cycle: Recent Trends, Questions, and Potential Solutions. Science 2008, 320, 889–892.
Diaz, R.J.; Rosenberg, R. Spreading Dead Zones and Consequences for Marine Ecosystems. Science 2008, 321, 926–929.
Turner, R.E.; Rabalais, N.N.; Justić, D.; Dortch, Q. Global Patterns of Dissolved N, P and Si in Large Rivers. Biogeochemistry 2003, 64, 297–317.
Dise, N.B.; Ashmore, M.; Belyazid, S.; Bleeker, A.; Bobbink, R.; de Vries, W.; Erisman, J.W.; Spranger, T.; Stevens, C.J.; van den Berg, L. Nitrogen as a threat to European terrestrial biodiversity. In The European Nitrogen Assessment; Sutton, M.A., Howard, C.M., Erisman, J.W., Billen, G., Bleeker, A., Grennfelt, P., van Grinsven, H., Grizzetti, B., Eds.; Cambridge University Press: Cambridge, MA, USA, 2011; pp. 463–494. ISBN 978-0-511-97698-8.
Dalal, R.C.; Wang, W.; Robertson, G.P.; Parton, W.J. Nitrous Oxide Emission from Australian Agricultural Lands and Mitigation Options: A Review. Soil Res. 2003, 41, 165–195.
Wrage, N.; Velthof, G.L.; van Beusichem, M.L.; Oenema, O. Role of Nitrifier Denitrification in the Production of Nitrous Oxide. Soil Biol. Biochem. 2001, 33, 1723–1732.
Conway, G.R. Agroecosystem Analysis. Agric. Adm. 1985, 20, 31–55.
Lemaire, G.; Jeuffroy, M.-H.; Gastal, F. Diagnosis Tool for Plant and Crop N Status in Vegetative Stage. Eur. J. Agron. 2008, 28, 614–624.
Prey, L.; Schmidhalter, U. Sensitivity of Vegetation Indices for Estimating Vegetative N Status in Winter Wheat. Sensors 2019, 19, 3712.
Tremblay, N.; Fallon, E.; Ziadi, N. Sensing of Crop Nitrogen Status: Opportunities, Tools, Limitations, and Supporting Information Requirements. HortTechnology 2011, 21, 274–281.
Sharma, L.; Bali, S. A Review of Methods to Improve Nitrogen Use Efficiency in Agriculture. Sustainability 2017, 10, 51.
Argento, F.; Anken, T.; Abt, F.; Vogelsanger, E.; Walter, A.; Liebisch, F. Site-Specific Nitrogen Management in Winter Wheat Supported by Low-Altitude Remote Sensing and Soil Data. Precis. Agric. 2020.
Chen, P.; Haboudane, D.; Tremblay, N.; Wang, J.; Vigneault, P.; Li, B. New Spectral Indicator Assessing the Efficiency of Crop Nitrogen Treatment in Corn and Wheat. Remote Sens. Environ. 2010, 114, 1987–1997.
Baret, F.; Houles, V.; Guerif, M. Quantification of Plant Stress Using Remote Sensing Observations and Crop Models: The Case of Nitrogen Management. J. Exp. Bot. 2006, 58, 869–880.
Jay, S.; Maupas, F.; Bendoula, R.; Gorretta, N. Retrieving LAI, Chlorophyll and Nitrogen Contents in Sugar Beet Crops from Multi-Angular Optical Remote Sensing: Comparison of Vegetation Indices and PROSAIL Inversion for Field Phenotyping. Field Crop. Res. 2017, 210, 33–46.
Prey, L.; Schmidhalter, U. Simulation of Satellite Reflectance Data Using High-Frequency Ground Based Hyperspectral Canopy Measurements for in-Season Estimation of Grain Yield and Grain Nitrogen Status in Winter Wheat. ISPRS J. Photogramm. Remote Sens. 2019, 149, 176–187.
Zhang, H.-Y.; Ren, X.-X.; Zhou, Y.; Wu, Y.-P.; He, L.; Heng, Y.-R.; Feng, W.; Wang, C.-Y. Remotely Assessing Photosynthetic Nitrogen Use Efficiency with in Situ Hyperspectral Remote Sensing in Winter Wheat. Eur. J. Agron. 2018, 101, 90–100.
Chang, N.-B.; Imen, S.; Vannah, B. Remote Sensing for Monitoring Surface Water Quality Status and Ecosystem State in Relation to the Nutrient Cycle: A 40-Year Perspective. Crit. Rev. Environ. Sci. Technol. 2015, 45, 101–166.
Jin, Z.; Archontoulis, S.V.; Lobell, D.B. How Much Will Precision Nitrogen Management Pay off? An Evaluation Based on Simulating Thousands of Corn Fields over the US Corn-Belt. Field Crop. Res. 2019, 240, 12–22.
Stroppiana, D.; Fava, F.; Boschetti, M.; Brivio, P. Estimation of Nitrogen Content in Crops and Pastures Using Hyperspectral Vegetation Indices. In Hyperspectral Remote Sensing of Vegetation; CRC Press: Boca Raton, FL, USA, 2011; pp. 245–262. ISBN 978-1-4398-4537-0.
Fitzgerald, G.; Rodriguez, D.; O’Leary, G. Measuring and Predicting Canopy Nitrogen Nutrition in Wheat Using a Spectral Index—The Canopy Chlorophyll Content Index (CCCI). Field Crop. Res. 2010, 116, 318–324.
Homolová, L.; Malenovský, Z.; Clevers, J.G.P.W.; García-Santos, G.; Schaepman, M.E. Review of Optical-Based Remote Sensing for Plant Trait Mapping. Ecol. Complex. 2013, 15, 1–16.
Erdle, K.; Mistele, B.; Schmidhalter, U. Comparison of Active and Passive Spectral Sensors in Discriminating Biomass Parameters and Nitrogen Status in Wheat Cultivars. Field Crop. Res. 2011, 124, 74–84. [Google Scholar] [CrossRef]
Söderström, M.; Piikki, K.; Stenberg, M.; Stadig, H.; Martinsson, J. Producing Nitrogen (N) Uptake Maps in Winter Wheat by Combining Proximal Crop Measurements with Sentinel-2 and DMC Satellite Images in a Decision Support System for Farmers. Acta Agric. Scand. Sect. B Soil Plant Sci. 2017, 67, 637–650. [Google Scholar] [CrossRef]
Schlemmer, M.; Gitelson, A.; Schepers, J.; Ferguson, R.; Peng, Y.; Shanahan, J.; Rundquist, D. Remote Estimation of Nitrogen and Chlorophyll Contents in Maize at Leaf and Canopy Levels. Int. J. Appl. Earth Obs. Geoinf. 2013, 25, 47–54. [Google Scholar] [CrossRef] [Green Version]
Clevers, J.G.P.W.; Gitelson, A.A. Remote Estimation of Crop and Grass Chlorophyll and Nitrogen Content Using Red-Edge Bands on Sentinel-2 and -3. Int. J. Appl. Earth Obs. Geoinf. 2013, 23, 344–351. [Google Scholar] [CrossRef]
Herrmann, I.; Karnieli, A.; Bonfil, D.J.; Cohen, Y.; Alchanatis, V. SWIR-Based Spectral Indices for Assessing Nitrogen Content in Potato Fields. Int. J. Remote Sens. 2010, 31, 5127–5143. [Google Scholar] [CrossRef]
Thenkabail, P.S.; Mariotto, I.; Gumma, M.K.; Middleton, E.M.; Landis, D.R.; Huemmrich, K.F. Selection of Hyperspectral Narrowbands (HNBs) and Composition of Hyperspectral Twoband Vegetation Indices (HVIs) for Biophysical Characterization and Discrimination of Crop Types Using Field Reflectance and Hyperion/EO-1 Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 427–439. [Google Scholar] [CrossRef] [Green Version]
Yoder, B.J.; Pettigrew-Crosby, R.E. Predicting Nitrogen and Chlorophyll Content and Concentrations from Reflectance Spectra (400–2500 Nm) at Leaf and Canopy Scales. Remote Sens. Environ. 1995, 53, 199–211. [Google Scholar] [CrossRef]
Curran, P.J. Remote Sensing of Foliar Chemistry. Remote Sens. Environ. 1989, 30, 271–278. [Google Scholar] [CrossRef]
Tucker, C.J. Red and Photographic Infrared Linear Combinations for Monitoring Vegetation. Remote Sens. Environ. 1979, 8, 127–150. [Google Scholar] [CrossRef] [Green Version]
Berntsen, J.; Thomsen, A.; Schelde, K.; Hansen, O.M.; Knudsen, L.; Broge, N.; Hougaard, H.; Hørfarter, R. Algorithms for Sensor-Based Redistribution of Nitrogen Fertilizer in Winter Wheat. Precis. Agric. 2006, 7, 65–83. [Google Scholar] [CrossRef]
Meisinger, J.J.; Schepers, J.S.; Raun, W.R. Crop Nitrogen Requirement and Fertilization. In Agronomy Monographs; Schepers, J.S., Raun, W.R., Eds.; American Society of Agronomy, Crop Science Society of America, Soil Science Society of America: Madison, WI, USA, 2015; pp. 563–612. ISBN 978-0-89118-191-0. [Google Scholar]
Tremblay, N.; Wang, Z.; Ma, B.-L.; Belec, C.; Vigneault, P. A Comparison of Crop Data Measured by Two Commercial Sensors for Variable-Rate Nitrogen Application. Precis. Agric. 2009, 10, 145–161. [Google Scholar] [CrossRef]
Hansen, P.M.; Schjoerring, J.K. Reflectance Measurement of Canopy Biomass and Nitrogen Status in Wheat Crops Using Normalized Difference Vegetation Indices and Partial Least Squares Regression. Remote Sens. Environ. 2003, 86, 542–553. [Google Scholar] [CrossRef]
Pullanagari, R.R.; Kereszturi, G.; Yule, I.J. Mapping of Macro and Micro Nutrients of Mixed Pastures Using Airborne AisaFENIX Hyperspectral Imagery. ISPRS J. Photogramm. Remote Sens. 2016, 117, 1–10. [Google Scholar] [CrossRef]
Liang, L.; Di, L.; Huang, T.; Wang, J.; Lin, L.; Wang, L.; Yang, M. Estimation of Leaf Nitrogen Content in Wheat Using New Hyperspectral Indices and a Random Forest Regression Algorithm. Remote Sens. 2018, 10, 1940. [Google Scholar] [CrossRef] [Green Version]
Berger, K.; Verrelst, J.; Féret, J.-B.; Hank, T.; Wocher, M.; Mauser, W.; Camps-Valls, G. Retrieval of Aboveground Crop Nitrogen Content with a Hybrid Machine Learning Method. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102174. [Google Scholar] [CrossRef]
Van Wittenberghe, S.; Verrelst, J.; Rivera, J.P.; Alonso, L.; Moreno, J.; Samson, R. Gaussian Processes Retrieval of Leaf Parameters from a Multi-Species Reflectance, Absorbance and Fluorescence Dataset. J. Photochem. Photobiol. B Biol. 2014, 134, 37–48. [Google Scholar] [CrossRef]
Wang, Z.; Townsend, P.A.; Schweiger, A.K.; Couture, J.J.; Singh, A.; Hobbie, S.E.; Cavender-Bares, J. Mapping Foliar Functional Traits and Their Uncertainties across Three Years in a Grassland Experiment. Remote Sens. Environ. 2019, 221, 405–416. [Google Scholar] [CrossRef]
Chlingaryan, A.; Sukkarieh, S.; Whelan, B. Machine Learning Approaches for Crop Yield Prediction and Nitrogen Status Estimation in Precision Agriculture: A Review. Comput. Electron. Agric. 2018, 151, 61–69. [Google Scholar] [CrossRef]
Aasen, H.; Gnyp, M.L.; Miao, Y.; Bareth, G. Automated Hyperspectral Vegetation Index Retrieval from Multiple Correlation Matrices with HyperCor. Photogramm. Eng. Remote Sens. 2014, 80, 785–795. [Google Scholar] [CrossRef]
Broge, N.H.; Leblanc, E. Comparing Prediction Power and Stability of Broadband and Hyperspectral Vegetation Indices for Estimation of Green Leaf Area Index and Canopy Chlorophyll Density. Remote Sens. Environ. 2001, 76, 156–172. [Google Scholar] [CrossRef]
Gnyp, M.L.; Bareth, G.; Li, F.; Lenz-Wiedemann, V.I.S.; Koppe, W.; Miao, Y.; Hennig, S.D.; Jia, L.; Laudien, R.; Chen, X.; et al. Development and Implementation of a Multiscale Biomass Model Using Hyperspectral Vegetation Indices for Winter Wheat in the North China Plain. Int. J. Appl. Earth Obs. Geoinf. 2014, 33, 232–242. [Google Scholar] [CrossRef]
Gnyp, M.L.; Miao, Y.; Yuan, F.; Ustin, S.L.; Yu, K.; Yao, Y.; Huang, S.; Bareth, G. Hyperspectral Canopy Sensing of Paddy Rice Aboveground Biomass at Different Growth Stages. Field Crop. Res. 2014, 155, 42–55. [Google Scholar] [CrossRef]
Lambert, M.-J.; Traoré, P.C.S.; Blaes, X.; Baret, P.; Defourny, P. Estimating Smallholder Crops Production at Village Level from Sentinel-2 Time Series in Mali’s Cotton Belt. Remote Sens. Environ. 2018, 216, 647–657. [Google Scholar] [CrossRef]
Bendig, J.; Yu, K.; Aasen, H.; Bolten, A.; Bennertz, S.; Broscheit, J.; Gnyp, M.L.; Bareth, G. Combining UAV-Based Plant Height from Crop Surface Models, Visible, and near Infrared Vegetation Indices for Biomass Monitoring in Barley. Int. J. Appl. Earth Obs. Geoinf. 2015, 39, 79–87. [Google Scholar] [CrossRef]
Clevers, J.; Kooistra, L.; van den Brande, M. Using Sentinel-2 Data for Retrieving LAI and Leaf and Canopy Chlorophyll Content of a Potato Crop. Remote Sens. 2017, 9, 405. [Google Scholar] [CrossRef] [Green Version]
Oliveira, R.A.; Näsi, R.; Niemeläinen, O.; Nyholm, L.; Alhonoja, K.; Kaivosoja, J.; Jauhiainen, L.; Viljanen, N.; Nezami, S.; Markelin, L.; et al. Machine Learning Estimators for the Quantity and Quality of Grass Swards Used for Silage Production Using Drone-Based Imaging Spectrometry and Photogrammetry. Remote Sens. Environ. 2020, 246, 111830. [Google Scholar] [CrossRef]
Haboudane, D.; Miller, J.R.; Pattey, E.; Zarco-Tejada, P.J.; Strachan, I.B. Hyperspectral Vegetation Indices and Novel Algorithms for Predicting Green LAI of Crop Canopies: Modeling and Validation in the Context of Precision Agriculture. Remote Sens. Environ. 2004, 90, 337–352. [Google Scholar] [CrossRef]
Crema, A.; Boschetti, M.; Nutini, F.; Cillis, D.; Casa, R. Influence of Soil Properties on Maize and Wheat Nitrogen Status Assessment from Sentinel-2 Data. Remote Sens. 2020, 12, 2175. [Google Scholar] [CrossRef]
Delloye, C.; Weiss, M.; Defourny, P. Retrieval of the Canopy Chlorophyll Content from Sentinel-2 Spectral Bands to Estimate Nitrogen Uptake in Intensive Winter Wheat Cropping Systems. Remote Sens. Environ. 2018, 216, 245–261. [Google Scholar] [CrossRef]
Meier, J.; Mauser, W.; Hank, T.; Bach, H. Assessments on the Impact of High-Resolution-Sensor Pixel Sizes for Common Agricultural Policy and Smart Farming Services in European Regions. Comput. Electron. Agric. 2020, 169, 105205. [Google Scholar] [CrossRef]
Bundesamt für Statistik. Landwirtschaft Und Ernährung—Taschenstatistik 2020; Bundesamt für Statistik BFS: Bern, Switzerland, 2020. [Google Scholar]
Clevers, J.G.P.W.; Kooistra, L. Using Hyperspectral Remote Sensing Data for Retrieving Canopy Chlorophyll and Nitrogen Content. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 574–583. [Google Scholar] [CrossRef]
Frampton, W.J.; Dash, J.; Watmough, G.; Milton, E.J. Evaluating the Capabilities of Sentinel-2 for Quantitative Estimation of Biophysical Variables in Vegetation. ISPRS J. Photogramm. Remote Sens. 2013, 82, 83–92. [Google Scholar] [CrossRef] [Green Version]
Morcillo-Pallarés, P.; Rivera-Caicedo, J.P.; Belda, S.; De Grave, C.; Burriel, H.; Moreno, J.; Verrelst, J. Quantifying the Robustness of Vegetation Indices through Global Sensitivity Analysis of Homogeneous and Forest Leaf-Canopy Radiative Transfer Models. Remote Sens. 2019, 11, 2418. [Google Scholar] [CrossRef] [Green Version]
Myneni, R.B.; Ramakrishna, R.; Nemani, R.; Running, S.W. Estimation of Global Leaf Area Index and Absorbed Par Using Radiative Transfer Models. IEEE Trans. Geosci. Remote Sens. 1997, 35, 1380–1393. [Google Scholar] [CrossRef] [Green Version]
Verrelst, J.; Malenovský, Z.; Van der Tol, C.; Camps-Valls, G.; Gastellu-Etchegorry, J.-P.; Lewis, P.; North, P.; Moreno, J. Quantifying Vegetation Biophysical Variables from Imaging Spectroscopy Data: A Review on Retrieval Methods. Surv. Geophys. 2019, 40, 589–629. [Google Scholar] [CrossRef] [Green Version]
Liebisch, F.; Kung, G.; Damm, A.; Walter, A. Characterization of Crop Vitality and Resource Use Efficiency by Means of Combining Imaging Spectroscopy Based Plant Traits. In Proceedings of the 2014 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Lausanne, Switzerland, 24–27 June 2014; pp. 1–4. [Google Scholar]
Walter, A.; Khanna, R.; Lottes, P.; Stachniss, C.; Nieto, J.; Liebisch, F. Flourish—A Robotic Approach for Automation in Crop Management. In Proceedings of the 14th International Conference on Precision Agriculture, Montreal, QC, Canada, 24–27 June 2018; p. 9. [Google Scholar]
Lancashire, P.D.; Bleiholder, H.; Boom, T.V.D.; Langelüddeke, P.; Stauss, R.; Weber, E.; Witzenberger, A. A Uniform Decimal Code for Growth Stages of Crops and Weeds. Ann. Appl. Biol. 1991, 119, 561–601. [Google Scholar] [CrossRef]
Roth, L.; Aasen, H.; Walter, A.; Liebisch, F. Extracting Leaf Area Index Using Viewing Geometry Effects—A New Perspective on High-Resolution Unmanned Aerial System Photography. ISPRS J. Photogramm. Remote Sens. 2018, 141, 161–175. [Google Scholar] [CrossRef]
Lichtenthaler, H.K.; Buschmann, C. Chlorophylls and Carotenoids: Measurement and Characterization by UV-VIS Spectroscopy. Curr. Protoc. Food Anal. Chem. 2001, 1, F4.3.1–F4.3.8. [Google Scholar] [CrossRef]
Jones, H.G.; Vaughan, R.A. Remote Sensing of Vegetation: Principles, Techniques, and Applications; OUP Oxford: Oxford, UK, 2010; ISBN 978-0-19-920779-4. [Google Scholar]
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013. [Google Scholar]
Lehnert, L.W.; Meyer, H.; Obermeier, W.A.; Silva, B.; Regeling, B.; Bendix, J. Hyperspectral Data Analysis in R: The Hsdar Package. J. Stat. Softw. 2019, 89. [Google Scholar] [CrossRef] [Green Version]
Li, F.; Gnyp, M.L.; Jia, L.; Miao, Y.; Yu, Z.; Koppe, W.; Bareth, G.; Chen, X.; Zhang, F. Estimating N Status of Winter Wheat Using a Handheld Spectrometer in the North China Plain. Field Crop. Res. 2008, 106, 9. [Google Scholar] [CrossRef]
Gnyp, M.L.; Yu, K.; Aasen, H.; Yao, Y.; Huang, S.; Miao, Y.; Bareth, C.G. Analysis of Crop Reflectance for Estimating Biomass in Rice Canopies at Different Phenological Stages—Reflexionsanalyse Zur Abschätzung Der Biomasse von Reis in Unterschiedlichen Phänologischen Stadien. Photogramm. Fernerkund. Geoinf. 2013, 351–365. [Google Scholar] [CrossRef]
Wang, L.; Chang, Q.; Yang, J.; Zhang, X.; Li, F. Estimation of Paddy Rice Leaf Area Index Using Machine Learning Methods Based on Hyperspectral Data from Multi-Year Experiments. PLoS ONE 2018, 13, e0207624. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Kuhn, M. Building Predictive Models in R Using the Caret Package. J. Stat. Softw. 2008, 28, 1–26. [Google Scholar] [CrossRef] [Green Version]
Wright, M.N.; Ziegler, A. Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J. Stat. Softw. 2017, 77, 1–17. [Google Scholar] [CrossRef] [Green Version]
Nicodemus, K.K.; Malley, J.D.; Strobl, C.; Ziegler, A. The Behaviour of Random Forest Permutation-Based Variable Importance Measures under Predictor Correlation. BMC Bioinform. 2010, 11, 110. [Google Scholar] [CrossRef] [Green Version]
Verrelst, J.; Romijn, E.; Kooistra, L. Mapping Vegetation Density in a Heterogeneous River Floodplain Ecosystem Using Pointable CHRIS/PROBA Data. Remote Sens. 2012, 4, 2866–2889. [Google Scholar] [CrossRef] [Green Version]
Verrelst, J.; Rivera, J.P.; Mardashova, M.; Moreno, J. ARTMO’s Global Sensitivity Analysis (GSA) Toolbox to Quantify Driving Variables of Leaf and Canopy Radiative Transfer Models 2015. In Proceedings of the 9th EARSeL SIG Imaging Spectroscopy Workshop, Luxembourg, 14–16 April 2015. [Google Scholar]
Caicedo, J.P.R.; Verrelst, J.; Munoz-Mari, J.; Moreno, J.; Camps-Valls, G. Toward a Semiautomatic Machine Learning Retrieval of Biophysical Parameters. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 1249–1259. [Google Scholar] [CrossRef]
Rivera, J.P.; Verrelst, J.; Gómez-Dans, J.; Muñoz-Marí, J.; Moreno, J.; Camps-Valls, G. An Emulator Toolbox to Approximate Radiative Transfer Models with Statistical Learning. Remote Sens. 2015, 7, 9347–9370. [Google Scholar] [CrossRef] [Green Version]
Verrelst, J.; Rivera, J.P.; Gitelson, A.; Delegido, J.; Moreno, J.; Camps-Valls, G. Spectral Band Selection for Vegetation Properties Retrieval Using Gaussian Processes Regression. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 554–567. [Google Scholar] [CrossRef]
Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; Adaptive Computation and Machine Learning Series; MIT Press: Cambridge, MA, USA, 2006; ISBN 978-0-262-18253-9. [Google Scholar]
Verrelst, J.; Vicent, J.; Rivera-Caicedo, J.P.; Lumbierres, M.; Morcillo-Pallarés, P.; Moreno, J. Global Sensitivity Analysis of Leaf-Canopy-Atmosphere RTMs: Implications for Biophysical Variables Retrieval from Top-of-Atmosphere Radiance Data. Remote Sens. 2019, 11, 1923. [Google Scholar] [CrossRef] [Green Version]
Camino, C.; González-Dugo, V.; Hernández, P.; Sillero, J.C.; Zarco-Tejada, P.J. Improved Nitrogen Retrievals with Airborne-Derived Fluorescence and Plant Traits Quantified from VNIR-SWIR Hyperspectral Imagery in the Context of Precision Agriculture. Int. J. Appl. Earth Obs. Geoinf. 2018, 70, 105–117. [Google Scholar] [CrossRef]
Jay, S.; Gorretta, N.; Morel, J.; Maupas, F.; Bendoula, R.; Rabatel, G.; Dutartre, D.; Comar, A.; Baret, F. Estimating Leaf Chlorophyll Content in Sugar Beet Canopies Using Millimeter- to Centimeter-Scale Reflectance Imagery. Remote Sens. Environ. 2017, 198, 173–186. [Google Scholar] [CrossRef]
Zhou, K.; Cheng, T.; Zhu, Y.; Cao, W.; Ustin, S.L.; Zheng, H.; Yao, X.; Tian, Y. Assessing the Impact of Spatial Resolution on the Estimation of Leaf Nitrogen Concentration Over the Full Season of Paddy Rice Using Near-Surface Imaging Spectroscopy Data. Front. Plant Sci. 2018, 9. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Thenkabail, P.S. Optimal Hyperspectral Narrowbands for Discriminating Agricultural Crops. Remote Sens. Rev. 2001, 20, 257–291. [Google Scholar] [CrossRef]
Daughtry, C.S.T.; Walthall, C.L.; Kim, M.S.; de Colstoun, E.B.; McMurtrey, J.E. Estimating Corn Leaf Chlorophyll Concentration from Leaf and Canopy Reflectance. Remote Sens. Environ. 2000, 74, 229–239. [Google Scholar] [CrossRef]
Yuan, H.; Yang, G.; Li, C.; Wang, Y.; Liu, J.; Yu, H.; Feng, H.; Xu, B.; Zhao, X.; Yang, X. Retrieving Soybean Leaf Area Index from Unmanned Aerial Vehicle Hyperspectral Remote Sensing: Analysis of RF, ANN, and SVM Regression Models. Remote Sens. 2017, 9, 309. [Google Scholar] [CrossRef] [Green Version]
Verrelst, J.; Pablo Rivera, J.; Moreno, J.; Camps-Valls, G. Gaussian Processes Uncertainty Estimates in Experimental Sentinel-2 LAI and Leaf Chlorophyll Content Retrieval. ISPRS J. Photogramm. Remote Sens. 2013, 86, 157–167. [Google Scholar] [CrossRef]
He, L.; Song, X.; Feng, W.; Guo, B.-B.; Zhang, Y.-S.; Wang, Y.-H.; Wang, C.-Y.; Guo, T.-C. Improved Remote Sensing of Leaf Nitrogen Concentration in Winter Wheat Using Multi-Angular Hyperspectral Data. Remote Sens. Environ. 2016, 174, 122–133. [Google Scholar] [CrossRef]
Ziegler, A.; König, I.R. Mining Data with Random Forests: Current Options for Real-World Applications. WIREs Data Min. Knowl. Discov. 2014, 4, 55–63. [Google Scholar] [CrossRef]
Jordan, M.I.; Mitchell, T.M. Machine Learning: Trends, Perspectives, and Prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef]
Mousivand, A.; Menenti, M.; Gorte, B.; Verhoef, W. Global Sensitivity Analysis of the Spectral Radiance of a Soil–Vegetation System. Remote Sens. Environ. 2014, 145, 131–144. [Google Scholar] [CrossRef]
Guo, B.-B.; Qi, S.-L.; Heng, Y.-R.; Duan, J.-Z.; Zhang, H.-Y.; Wu, Y.-P.; Feng, W.; Xie, Y.-X.; Zhu, Y.-J. Remotely Assessing Leaf N Uptake in Winter Wheat Based on Canopy Hyperspectral Red-Edge Absorption. Eur. J. Agron. 2017, 82, 113–124. [Google Scholar] [CrossRef]
Moreno-Martínez, Á.; Camps-Valls, G.; Kattge, J.; Robinson, N.; Reichstein, M.; van Bodegom, P.; Kramer, K.; Cornelissen, J.H.C.; Reich, P.; Bahn, M.; et al. A Methodology to Derive Global Maps of Leaf Traits Using Remote Sensing and Climate Data. Remote Sens. Environ. 2018, 218, 69–88. [Google Scholar] [CrossRef] [Green Version]
Zhao, H.; Song, X.; Yang, G.; Li, Z.; Zhang, D.; Feng, H. Monitoring of Nitrogen and Grain Protein Content in Winter Wheat Based on Sentinel-2A Data. Remote Sens. 2019, 11, 1724. [Google Scholar] [CrossRef] [Green Version]
European Commission. Working with Parliament and Council to Make the CAP Reform Fit for the European Green Deal; European Union: Brussels, Belgium, 2020. [Google Scholar]
Zarco-Tejada, P.J.; Miller, J.R.; Harron, J.; Hu, B.; Noland, T.L.; Goel, N.; Mohammed, G.H.; Sampson, P. Needle Chlorophyll Content Estimation through Model Inversion Using Hyperspectral Data from Boreal Conifer Forest Canopies. Remote Sens. Environ. 2004, 89, 189–199. [Google Scholar] [CrossRef]
Gnyp, M.L.; Panitzki, M.; Reusch, S.; Jasper, J.; Bolten, A.; Bareth, G. Comparison between Tractor-Based and UAV-Based Spectrometer Measurements in Winter Wheat. In Proceedings of the 13th International Conference on Precision Agriculture, St. Louis, MO, USA, 31 July–3 August 2016; p. 10. [Google Scholar]

Figure 1. Coefficient of determination (R²) values for the field spectrometer (FS) dataset (empty bars) and the Sentinel-2 (S2) resampled dataset (hatched bars) for the three methods used: Normalized Ratio Index (NRI, blue), Random Forest Regression (RFR, red) and Gaussian Processes Regression (GPR, green), shown for each of the plant traits described in Table 1.

Figure 2. Variable importance scores of the random forest regression (RFR) on the full dataset for the field spectrometer (FS, left) and the Sentinel-2 (S2, right) resampled data. The colors show the waveband regions: visible (VIS: 400–690 nm), red edge (RE: 700–790 nm), near infrared (NIR: 800–1350 nm) and short-wave infrared (SWIR: 1450–2400 nm). The water absorption bands in the regions 1350–1450, 1790–1990 and >2400 nm were omitted due to their low signal-to-noise ratio.

Figure 3. Prediction performance (R²) for the traits as a function of the number of spectral bands, obtained using the Gaussian processes regression–band analysis tool (GPR-BAT) with the sequential backward band removal (SBBR) algorithm applied (for details on SBBR, see [86]).

Figure 4. Occurrence of the top five ranked bands with the lowest GPR sigma values for the ASD sensor (left) and the S2 resampled sensor (right). Data are from 10-fold cross validation, i.e., 50 (10 folds × 5 ranks) is the maximum possible occurrence.

Figure 5. GSA results for the ASD ground spectrometer (left) and the Sentinel-2 resampled (right) sensor for the full dataset.

Table 1. Datasets and traits used in this study.

Individual Traits	n	Min	Median	Max
N_conc [%]	322	0.68	3.46	5.32
Chl_AB [mg g⁻¹]	194	2.28	5.34	7.11
LAI [m² m⁻²]	272	0.05	2.09	8.63
LAI*N_conc [%]	210	0.17	7.20	41.25
LAI*Chl_AB [mg g⁻¹]	193	0.14	11.25	56.01
Combined Data	n	Min BBCH	Median BBCH	Max BBCH
full dataset	180	15	30	80
erectophile	98	15	31	80
planophile	55	15	22	67
winter wheat	64	15	30	32
sugar beet	45	15	21	38

Table 2. Specifications of the Multispectral Instrument (MSI) on board the Sentinel-2 (S2) satellites (reproduced from the European Space Agency, ESA). Band B10 was not used in the S2 resampled dataset as it lies within a region of atmospheric water absorption.

Band	Band Name	Center Wavelength [nm]	Bandwidth [nm]	Ground Resolution [m]
B01	Coastal aerosol	443	21.00	60
B02	Blue	490	66.00	10
B03	Green	560	36.00	10
B04	Red	665	31.00	10
B05	RE1	705	15.50	20
B06	RE2	740	15.00	20
B07	RE3	783	20.00	20
B08	NIR1	842	106.00	10
B8a	NIR2	865	21.50	20
B09	Water vapour	945	20.50	60
B10	SWIR—cirrus	1375	30.50	60
B11	SWIR1	1610	92.50	20
B12	SWIR2	2190	180.00	20
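
The band specifications in Table 2 can be used to sketch how a hyperspectral field spectrum might be reduced to S2-like bands and how a normalized ratio index (NRI) is then formed from two of them. The following is a minimal illustration only, not the study's own resampling procedure: it assumes a Gaussian spectral response per band with a full width at half maximum equal to the tabulated bandwidth (the real MSI response functions differ), and the toy spectrum and the function names `resample_to_band`, `resample_to_s2` and `nri` are hypothetical.

```python
import math

# Sentinel-2 MSI bands from Table 2 (B10 omitted, as in the S2 resampled dataset):
# band -> (center wavelength [nm], bandwidth [nm])
S2_BANDS = {
    "B01": (443, 21.0), "B02": (490, 66.0), "B03": (560, 36.0),
    "B04": (665, 31.0), "B05": (705, 15.5), "B06": (740, 15.0),
    "B07": (783, 20.0), "B08": (842, 106.0), "B8a": (865, 21.5),
    "B09": (945, 20.5), "B11": (1610, 92.5), "B12": (2190, 180.0),
}


def resample_to_band(wavelengths, reflectance, center, bandwidth):
    """Weighted mean reflectance under an ASSUMED Gaussian response (FWHM = bandwidth)."""
    sigma = bandwidth / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    weights = [math.exp(-0.5 * ((w - center) / sigma) ** 2) for w in wavelengths]
    weighted_sum = sum(wt * r for wt, r in zip(weights, reflectance))
    return weighted_sum / sum(weights)


def resample_to_s2(wavelengths, reflectance):
    """Reduce one spectrum to the twelve S2 bands listed above."""
    return {
        name: resample_to_band(wavelengths, reflectance, center, bandwidth)
        for name, (center, bandwidth) in S2_BANDS.items()
    }


def nri(r_a, r_b):
    """Normalized ratio index of two band reflectances: (a - b) / (a + b)."""
    return (r_a - r_b) / (r_a + r_b)


# Hypothetical toy spectrum at 1 nm steps (400-2400 nm):
# low reflectance in the visible, a high plateau from 700 nm onwards.
wl = list(range(400, 2401))
refl = [0.05 if w < 700 else 0.45 for w in wl]

s2 = resample_to_s2(wl, refl)
print(round(nri(s2["B08"], s2["B04"]), 3))  # NIR1 vs. Red
```

Any pair of the twelve bands can be passed to `nri`, which is how an exhaustive search over band combinations could be set up; which pair works best for a given trait is an empirical question that this sketch does not answer.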

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
