Article

An NDVI Retrieval Method Based on a Double-Attention Recurrent Neural Network for Cloudy Regions

1 School of Geosciences, Yangtze University, Wuhan 430100, China
2 Engineering Research Center of Spatial Information Technology, MOE, Beijing 100048, China
3 College of Resource Environment and Tourism, Capital Normal University, Beijing 100048, China
4 Henan Engineering Research Center of Environmental Laser Remote Sensing Technology and Application, Nanyang 473061, China
* Author to whom correspondence should be addressed.
Submission received: 11 February 2022 / Revised: 24 March 2022 / Accepted: 26 March 2022 / Published: 29 March 2022

Abstract

NDVI is an important parameter for environmental assessment and precision agriculture that describes the status of vegetation well. Nevertheless, clouds in optical images often result in the absence of NDVI information at key growth stages. Integrating SAR and optical image features is likely to address this issue. Although the mapping between different data sources is complex, the rapid progress of deep learning technology provides an alternative approach. In this study, a double-attention RNN architecture based on the recurrent neural network (RNN) and the attention mechanism is proposed to retrieve NDVI data for cloudy regions. Overall, the proposed model retrieves NDVI from two aspects: the temporal domain and the pixel neighbor domain. The performance of the double-attention RNN is validated under different cloud coverage conditions, through input ablation, and in comparative experiments with various methods. The results show that the proposed model guarantees high retrieval accuracy, even under high cloud coverage conditions (R2 = 0.856, RMSE = 0.124). Using SAR images alone yields poor NDVI retrieval results (R2 = 0.728, RMSE = 0.141) with considerable artifacts, which need to be addressed with auxiliary data, such as IDM features. Temporal and pixel neighbor features play an important role in improving the accuracy of NDVI retrieval (R2 = 0.894, RMSE = 0.096). For the missing NDVI values caused by cloud coverage, the double-attention RNN proposed in this study provides a potential solution for information recovery.

Graphical Abstract

1. Introduction

Vegetation is an essential part of the land surface. It is intimately linked to energy transmission and material exchange activities. It has a substantial impact on the conservation of soil and water, as well as on the carbon cycle and climate change [1,2,3]. Accordingly, the implementation of dynamic vegetation monitoring is crucial for regional ecological conservation [4]. Among many quantitative indicators, NDVI, as a representation of the intensity of photosynthesis, has the capability of indicating the growth status of plants to some extent [5].
Currently, optical remote sensing images are widely employed in NDVI calculations [6,7,8]. Unfortunately, the NDVI data of key growth periods, affected by the cloud coverage of optical images, cannot be fully collected, which has an adverse effect on continuous vegetation monitoring. To address this issue, many interpolation and machine learning methods have been proposed to retrieve the missing NDVI information. The interpolation methods employ previous optical images to retrieve missing NDVI either locally or globally [9,10,11]. Local temporal methods rely on an appropriate time step factor to recover missing NDVI data. For example, Bolton et al. [12] employed a cubic smoothing spline function to fill seasonal observation gaps in vegetation, based on local temporal features and an optimal time step parameter. Global temporal approaches, however, use objective functions and optimization methods, such as least squares, to fit the specific NDVI pattern and then complete the recovery of missing NDVI. Commonly utilized functions include the Whittaker function [13,14], the asymmetric Gaussian function [15,16], and Fourier harmonic analysis [17,18]. With simulated global-scale NDVI data, the comparative study of several interpolation algorithms by Julien and Sobrino [17] indicates that, for local temporal methods, the time step factor has a direct impact on the accuracy of NDVI prediction. Since determining the time step factor is a time-consuming procedure, local temporal methods are difficult to apply widely. Global temporal methods perform well when the sequential NDVI is short, while the prediction accuracy is considerably diminished as the sequence lengthens. Generally, these strict requirements limit the performance of interpolation methods, and the intrinsic problem introduced by cloud coverage has not been addressed effectively.
In the last few years, thanks to the successive launch of SAR satellites such as Sentinel-1, machine learning-based NDVI retrieval has become viable. SAR signals are impervious to cloud coverage and shadows and are highly related to the growth status of vegetation [19]. Therefore, support vector machines, random forests, Gaussian processes, and deep learning methods have been used to recover missing NDVI data in cloudy regions [20,21]. In comparison to interpolation methods, machine learning-based approaches generally achieve better results. However, support vector regression and random forest models are constrained by their training data: they are unable to capture the pattern of missing NDVI in cloudy regions, and their generalization is poor. The Gaussian process algorithm requires the selection of a kernel function; clouds and cloud shadows make this procedure more complicated, and parameter tuning also restricts its wider adoption. Furthermore, the abovementioned classical machine learning approaches are unable to fully utilize the temporal information within SAR and optical images, further reducing the retrieval accuracy of NDVI under cloudy conditions. As computing power has greatly improved, deep learning technology has been increasingly used in the field of remote sensing [22,23,24], and its capacity to process massive training data has greatly improved model generalization.
Based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs), some studies have investigated NDVI retrieval methods with SAR and optical images from a single time phase or a temporal perspective, reconstructing the missing information caused by cloud coverage [25,26]. Scarpa et al. [27], for example, integrated Sentinel-1 and Sentinel-2 images and built an NDVI retrieval model for May to November; the importance of SAR data in NDVI retrieval was also analyzed. Yu et al. [28] employed NDVI data generated from MODIS and Sentinel-2 and built several RNN and CNN architectures to compare the retrieval results with those obtained from classical methods. Although existing deep learning approaches take into account the texture and temporal features of NDVI, they are still affected by clouds and by network architecture, which results in a substantial disparity between the retrieval results and the target [29].
Therefore, retrieving NDVI data in cloudy regions and improving the accuracy of the retrieval results remains a challenging problem. In this study, we employ annual sequential NDVI data and SAR images and propose a novel NDVI retrieval model, the double-attention RNN, built on a double-attention mechanism. This model provides an alternative option for recovering missing NDVI in cloudy regions.
More specifically, this paper presents the following contributions:
1. Temporal and pixel neighbor features are jointly considered by the double-attention RNN architecture, which contributes to constructing the relationship between SAR and optical information in cloudy regions.
2. The double-attention mechanism introduced in this study provides a powerful capability for feature extraction, further improving the NDVI retrieval quality.
The rest of the paper is organized as follows: Section 2 presents a brief introduction to the image dataset used and the study area examined in this work. Subsequently, the methodology, the architecture of the double-attention RNN, and the input data generation are demonstrated in Section 3. The results and a comparison with state-of-the-art methods are presented in Section 4. The discussion is presented in Section 5, and conclusions are drawn in Section 6.

2. Study Area and Data Source

2.1. Study Area

The study area of this paper is Erhai Lake, Dali Bai Autonomous Prefecture, Yunnan Province. Erhai Lake is the second largest freshwater lake in Yunnan Province [30]. This region has a northern subtropical plateau monsoon climate with an average annual temperature of 14.9 °C, annual precipitation of 1051.1 mm, and 2227.5 h of sunlight. The lake body of Erhai is long and narrow, and steep mountains with abundant natural vegetation line the lakeshore. The planting pattern in this area is complicated and highly fragmented. Additionally, the study area is dotted with reservoirs and lakes, with an average annual water temperature of 12–21 °C. These unique geographical conditions result in a high cloud coverage rate in this area, which severely impedes remote sensing-related tasks (Figure 1).

2.2. Dataset

This study employed SAR and optical remote sensing images as data sources. After fully balancing the spatial–temporal resolution and acquisition costs, the VV and VH bands of the Sentinel-1 satellite in IW imaging mode, as well as the near-infrared (Band 8) and red (Band 4) bands of the Sentinel-2 satellite, were adopted [31,32]. Because the parameter training of the double-attention RNN relies on a large number of image samples, the Google Earth Engine (GEE) [33] platform was used for batch data preprocessing (including thermal noise removal, radiometric calibration, and terrain correction of the Sentinel-1 images, and radiometric and atmospheric correction of the Sentinel-2 images) and NDVI calculation. To avoid the negative effects of cloud coverage in the study area on the model training procedure, the S2Cloudless algorithm [34] was used to mask cloud and cloud shadow pixels; meanwhile, the enhanced Lee filter was utilized to eliminate speckle noise in the SAR images [35]. The image coverage was eventually determined to be 400 × 400 km, and the image pairs of Sentinel-1 and Sentinel-2 from 2018–2020 were gathered twice a month, for a total of 144 image scenes. The cloud coverage information of the Sentinel-2 images is summarized in Figure 2.
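As an illustration of this preprocessing, the following is a minimal GEE Python API sketch that filters the two collections, pairs Sentinel-2 scenes with s2cloudless probabilities, masks clouds, and computes NDVI. The collection IDs and API calls are standard GEE, but the region geometry, the 40% probability threshold, and the use of linkCollection for the pairing are illustrative assumptions rather than the authors' exact configuration; speckle filtering of the SAR data is omitted here.

```python
import ee

ee.Initialize()

# Illustrative study-area rectangle around Erhai Lake (placeholder coordinates).
region = ee.Geometry.Rectangle([99.9, 25.5, 100.4, 26.1])

# Sentinel-1 GRD scenes in IW mode with VV and VH bands.
s1 = (ee.ImageCollection('COPERNICUS/S1_GRD')
      .filterBounds(region)
      .filterDate('2018-01-01', '2021-01-01')
      .filter(ee.Filter.eq('instrumentMode', 'IW'))
      .select(['VV', 'VH']))

# Sentinel-2 surface reflectance paired with s2cloudless cloud probabilities.
s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(region)
      .filterDate('2018-01-01', '2021-01-01')
      .linkCollection(ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY'),
                      ['probability']))

def masked_ndvi(img):
    # Keep pixels below an assumed 40% cloud probability threshold.
    cloud_free = img.select('probability').lt(40)
    ndvi = img.normalizedDifference(['B8', 'B4']).rename('NDVI')
    return ee.Image(ndvi.updateMask(cloud_free)
                    .copyProperties(img, ['system:time_start']))

ndvi_collection = s2.map(masked_ndvi)
```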
For the purpose of building a high-quality training dataset, ≤10% is determined to be low cloud coverage; otherwise, it indicates high cloud coverage. As shown in Figure 2, images with high cloud coverage are mainly scattered from June to September, covering the key growth stages of various crops. Therefore, the retrieval of missing NDVI values in these areas is especially critical for high-precision crop monitoring. To suppress the interference of cloud pixels in the training procedure, the generated image patches are filtered by cloud masks. The training dataset, validation dataset, and test dataset were all built by random sampling with a proportion of 6:2:2, and the detailed information for the generated dataset is shown in Table 1.
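As a minimal sketch of this random split, the following partitions a set of patch indices with the 6:2:2 proportions from the text; the patch count and seed are placeholders.

```python
import numpy as np

rng = np.random.default_rng(seed=42)             # fixed seed for reproducibility
patch_ids = np.arange(10_000)                    # placeholder patch indices
rng.shuffle(patch_ids)

n = len(patch_ids)
train_ids = patch_ids[: int(0.6 * n)]            # 60% training
val_ids = patch_ids[int(0.6 * n): int(0.8 * n)]  # 20% validation
test_ids = patch_ids[int(0.8 * n):]              # 20% test
```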

3. Methodology

Due to the problems of applicability and architecture that exist in current NDVI retrieval methods, improvement of retrieval accuracy is an urgent issue. The double-attention RNN architecture is proposed in this study, which uses a double-attention mechanism to deeply excavate image features in temporal and pixel neighbor domains.

3.1. The Overall Workflow of the Double-Attention RNN

Figure 3 presents the overall workflow of the double-attention RNN, which consists of the following steps: (1) data preprocessing: SAR-masked NDVI pairs at consistent spatial positions are fed into the architecture, and the inverse difference moment (IDM) feature of the gray-level co-occurrence matrix (GLCM) is computed; (2) pixel vector resampling: the images obtained from data preprocessing are reshaped to meet the input requirements of the double-attention RNN; (3) double-attention RNN training: the parameters of the model are trained with the generated dataset; and (4) double-attention RNN prediction: with the well-trained model, the cloudless NDVI retrieval results are acquired after the forward propagation of the double-attention RNN.
The sequential VV and VH images contain abundant information on ground objects that is unaffected by clouds. The double-attention RNN splits the input images into pixel vectors to reduce the effect of imaging differences with the NDVI targets and to avoid the blurred details caused by direct convolutional operations. The feature vectors are processed in the temporal and pixel neighbor dimensions by the double-attention RNN separately. The temporal encoding, pixel neighbor encoding, and pixel neighbor decoding modules, as the names suggest, adopt RNN-based structures to analyze the sequential information between time phases and between adjacent pixels. With the double-attention mechanism proposed in this paper, pixel neighbor feature extraction is conducted to obtain more comprehensive features. With these well-extracted features, the proposed double-attention RNN is able to retrieve the missing NDVI in cloudy regions.

3.2. Architecture of the Double-Attention RNN

The input of the double-attention RNN is a multitemporal pixel vector. To efficiently extract the contextual information of the adjacent pixels, a pixel neighbor step B is determined to transform the input pixel vector into a resampled vector. The resampling procedure is shown in Figure 4.
Figure 4 depicts the generation of the resampled feature vectors. The lower part includes pixel vectors with T time phases, and there are N pixels in each time phase. According to the predefined step value B, adjacent pixels are intercepted to form resampled feature vectors, which are subsequently fed into the network for feature extraction. At the training stage, batch-size samples are randomly selected from the original pixel vector (drawn from the N − B possible starting positions) to produce a resampled feature vector of size (batch size, B, T). For prediction, a zero-padding operation is applied to the original pixel vectors, sequentially generating resampled feature vectors of shape (N, B, T).
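A minimal NumPy sketch of this resampling follows, assuming the input is a (T, N) pixel-vector array; the window extraction and zero padding mirror the description above, while the function and variable names are our own.

```python
import numpy as np

def resample_training(pixels, B, batch_size, rng=np.random.default_rng(0)):
    """Randomly crop neighbor windows: pixels is (T, N) -> (batch_size, B, T)."""
    T, N = pixels.shape
    starts = rng.integers(0, N - B + 1, size=batch_size)
    windows = np.stack([pixels[:, s:s + B] for s in starts])   # (batch, T, B)
    return windows.transpose(0, 2, 1)                          # (batch, B, T)

def resample_prediction(pixels, B):
    """Zero-pad so every pixel gets a window: (T, N) -> (N, B, T)."""
    T, N = pixels.shape
    half = B // 2
    padded = np.pad(pixels, ((0, 0), (half, B - 1 - half)))    # pad neighbor axis
    windows = np.stack([padded[:, i:i + B] for i in range(N)]) # (N, T, B)
    return windows.transpose(0, 2, 1)                          # (N, B, T)
```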
With resampled feature vectors, the relationship between SAR and optical images is explored by a double-attention RNN, and the architecture is presented in Figure 5. The network generally consists of encoding and decoding parts [36]. With resampled feature vectors of VV and VH bands, the encoding part extracts temporal information and pixel contextual information, together with high-dimensional transformation. Correspondingly, the decoding part reconstructs the encoded abstract features into NDVI data.
More specifically, the encoding part is divided into two sections, temporal feature extraction and pixel neighbor feature extraction, and standard RNN structures are used in this part. Because the double-attention mechanism focuses on the saliency of deep features, temporal feature extraction is applied first to excavate the sequential information within the image features; the refined deep features then provide abundant information for the double-attention mechanism and the following decoding part. In the temporal feature extraction section, with the integrated feature vectors obtained from concatenation, the pixel neighbor dimension is first fixed, and a temporal LSTM block is built to extract temporal information between the features. It is composed of a single-layer LSTM [37] and a fully connected layer. A single-layer LSTM includes three gate units, the input gate (i), the forget gate (f), and the output gate (o), which jointly control the current state (c) and the hidden output, as defined in the following equations:
$i_t = \sigma(W_{Ii} X_t + W_{Hi} H_{t-1} + W_{ci} c_{t-1} + \mathrm{bias}_i)$
$f_t = \sigma(W_{If} X_t + W_{Hf} H_{t-1} + W_{cf} c_{t-1} + \mathrm{bias}_f)$
$o_t = \sigma(W_{Io} X_t + W_{Ho} H_{t-1} + W_{co} c_{t-1} + \mathrm{bias}_o)$
$c_t = i_t \circ \tanh(W_{Ic} X_t + W_{Hc} H_{t-1} + \mathrm{bias}_c) + f_t \circ c_{t-1}$
where $i$, $f$, and $o$ represent the three gate units and $c$ the current state; $W \in \mathbb{R}^{m \times (m+T)}$ are the weights, with corresponding biases $\mathrm{bias} \in \mathbb{R}^{m}$; $H \in \mathbb{R}^{m}$ denotes the output of the hidden layer; $m$ defines the dimension of the hidden layers; $\sigma$ and $\tanh$ represent the sigmoid and tangent activation functions, respectively; and $\circ$ is the Hadamard product operator between tensor vectors.
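As a minimal sketch, the gate equations above (which include peephole-style $c_{t-1}$ terms in the gates) can be written as a custom PyTorch cell. This is our illustrative reading of the equations, not the authors' released code; the hidden output $H_t = o_t \circ \tanh(c_t)$ is the standard LSTM convention assumed here, since the paper does not state it explicitly.

```python
import torch
import torch.nn as nn

class PeepholeLSTMCell(nn.Module):
    """One step of the gate equations: input x_t of size input_dim, hidden size m."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        m = hidden_dim
        # W_I* act on x_t, W_H* on H_{t-1}; the w_c* weights act on c_{t-1}.
        self.x2gates = nn.Linear(input_dim, 4 * m)
        self.h2gates = nn.Linear(m, 4 * m, bias=False)
        self.w_ci = nn.Parameter(torch.zeros(m))
        self.w_cf = nn.Parameter(torch.zeros(m))
        self.w_co = nn.Parameter(torch.zeros(m))

    def forward(self, x, h_prev, c_prev):
        gi, gf, go, gc = self.x2gates(x).add(self.h2gates(h_prev)).chunk(4, dim=-1)
        i = torch.sigmoid(gi + self.w_ci * c_prev)   # input gate
        f = torch.sigmoid(gf + self.w_cf * c_prev)   # forget gate
        c = i * torch.tanh(gc) + f * c_prev          # current state
        o = torch.sigmoid(go + self.w_co * c_prev)   # output gate, peeping at c_{t-1}
        h = o * torch.tanh(c)                        # assumed standard hidden output
        return h, c
```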
Next, a Conv1DReLU block is attached to the temporal LSTM block to augment the feature dimension and add nonlinear relationships between the features; it consists of successive 1D convolutional layers and ReLU activation layers. Then, the high-dimensional features extracted by the Conv1DReLU block are fed into a MaxPool1DFC block, which contains a MaxPool1D layer and a fully connected layer, and refined features with abundant temporal information are acquired. The pixel neighbor feature extraction section first processes the previous features with a pixel neighbor LSTM block, which has the same structure as the temporal LSTM block; the main difference lies in its action scope, the pixel neighbor dimension, from which the contextual information of adjacent pixels is extracted. The subsequently built attention SoftMax block imitates the attention mechanism of the human eye [38]. In this paper, the attention mechanism is implemented by a multilayer perceptron (MLP), which adaptively assigns larger weights to retrieval-related features, as defined in the following equations:
$e_b^T = U_e^{\top} \tanh\left( W_e [H_{b-1}; S_{b-1}] + V_e X_b^T \right)$
$\alpha_b^T = \dfrac{\exp(e_b^T)}{\sum_{b=1}^{B} \exp(e_b^T)}$
where $U_e \in \mathbb{R}^{B}$, $V_e \in \mathbb{R}^{B \times B}$, and $W_e \in \mathbb{R}^{B \times 2m}$ are the parameters of the attention mechanism to be updated, and $\alpha_b^T$ is the attention weight, which is paired with the SoftMax function to ensure that the weights sum to 1.
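A minimal PyTorch sketch of this additive attention over the $B$ neighbor positions follows. The tanh-scored MLP and the softmax normalization follow the equations above; the tensor shapes and the handling of the previous hidden and cell states $H_{b-1}$ and $S_{b-1}$ are our assumptions.

```python
import torch
import torch.nn as nn

class NeighborAttention(nn.Module):
    """Additive attention: score each of B neighbor positions, softmax to weights."""
    def __init__(self, B, m):
        super().__init__()
        self.W_e = nn.Linear(2 * m, B, bias=False)  # acts on [H_{b-1}; S_{b-1}]
        self.V_e = nn.Linear(B, B, bias=False)      # acts on the input features X
        self.U_e = nn.Linear(B, 1, bias=False)      # projects tanh scores to scalars

    def forward(self, x, h_prev, s_prev):
        # x: (batch, B, B) neighbor features; h_prev, s_prev: (batch, m) states.
        state = torch.cat([h_prev, s_prev], dim=-1)                      # (batch, 2m)
        scores = torch.tanh(self.W_e(state).unsqueeze(1) + self.V_e(x))  # (batch, B, B)
        e = self.U_e(scores).squeeze(-1)                                 # (batch, B)
        alpha = torch.softmax(e, dim=-1)                                 # weights sum to 1
        return alpha.unsqueeze(-1) * x                                   # reweighted features
```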
To improve the performance of NDVI retrieval, a decoder part symmetric to the pixel neighbor extraction is designed. The decoder part reconstructs NDVI values from the features acquired by the encoder part. Since features in long sequences are prone to gradient vanishing [36], another attention SoftMax block is used to build the double-attention mechanism. Additionally, owing to the difference in the imaging principles of SAR and optical images, it is difficult to achieve ideal retrieval results using SAR images alone [39]. Therefore, drawing on the abundant texture characterization of the gray-level co-occurrence matrix (GLCM), the IDM features of the masked NDVI are added in the decoder part as auxiliary data, defined as follows:
$h_b^t = \sum_{i,j=0}^{L} \dfrac{P(i, j, d, \theta)}{1 + (i - j)^2}$
where $i$ and $j$ denote pixel values, $L$ represents the gray level, and $P(i, j, d, \theta)$ represents the co-occurrence probability of pixel values $i$ and $j$ in direction $\theta$ at distance $d$. Finally, the IDM features are concatenated with the previous features and fed into a pixel neighbor LSTM block to restore the original shape of the NDVI data, completing the NDVI retrieval task.
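As an aside, this IDM quantity coincides with the GLCM property that scikit-image calls "homogeneity", so a minimal sketch of computing it for an NDVI patch could look as follows; the quantization to 64 gray levels and the distance and angle choices are illustrative assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def idm_feature(ndvi_patch, levels=64, distance=1, angle=0.0):
    """IDM (homogeneity) of a masked NDVI patch: sum P(i,j)/(1+(i-j)^2)."""
    # Quantize NDVI from [-1, 1] to integer gray levels.
    quantized = np.clip((ndvi_patch + 1) / 2 * (levels - 1), 0, levels - 1)
    quantized = quantized.astype(np.uint8)
    glcm = graycomatrix(quantized, distances=[distance], angles=[angle],
                        levels=levels, symmetric=True, normed=True)
    return graycoprops(glcm, 'homogeneity')[0, 0]
```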

4. Results

In this section, three experiments were designed to evaluate the retrieval performance of the double-attention RNN. First, under different cloud coverage conditions, SAR images and IDM images generated from the masked NDVI were used as the input data of the double-attention RNN to analyze the effect of clouds on the NDVI retrieval results. The second experiment modified the input of the double-attention RNN by removing the IDM feature, to assess its importance to the NDVI retrieval task under various cloud coverage conditions. In addition, the double-attention RNN was evaluated against several classical interpolation and machine learning-based methods to further validate its NDVI retrieval performance.
All experiments were carried out on a workstation equipped with an Intel Core i9-10900K CPU @ 3.70 GHz, 32 GB of DDR4 2133 MHz RAM, and an NVIDIA GeForce RTX 3060 with 12 GB of RAM. The operating system was Ubuntu 18.04. The deep models were implemented with PyTorch 1.10 [40] using Python 3.6; the other methods were implemented with MATLAB. For the training procedure, the batch size was set to 128, and the z-score method was employed to reduce the discrepancies between the different image data. The mean squared error was used as the loss function, defined as follows:
$L(y_T, \hat{y}_T) = \dfrac{1}{N} \sum_{i=1}^{N} \left( y_T^{(i)} - \hat{y}_T^{(i)} \right)^2$
where $y_T$ is the cloudless NDVI and $\hat{y}_T$ represents the retrieval result.
The Adam optimizer [41] was used to optimize the whole training procedure, with an initial learning rate of 0.001, β1 = 0.5, and β2 = 0.999. The learning rate was reduced by 10% every 10,000 iterations. After 25 epochs, the loss value no longer declined, and training was complete.
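A minimal PyTorch sketch of this training configuration follows; the optimizer settings, loss, batch size, and schedule are taken from the text, while the stand-in network and random tensors replace the real double-attention RNN and dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in network and random data; the real model is the double-attention RNN.
model = nn.Linear(72, 24)                       # placeholder for the architecture
inputs = torch.randn(1024, 72)                  # placeholder SAR + IDM features
targets = torch.rand(1024, 24)                  # placeholder cloudless NDVI
train_loader = DataLoader(TensorDataset(inputs, targets), batch_size=128)

criterion = nn.MSELoss()                        # mean squared error loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.5, 0.999))
# Reduce the learning rate by 10% every 10,000 iterations, as in the text.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10_000, gamma=0.9)

for epoch in range(25):                         # training stopped after 25 epochs
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()                        # per-iteration decay schedule
```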

4.1. The Performance of the Double-Attention RNN on Various Cloud Coverage Conditions

Two subregions with low and high cloud coverage conditions were chosen to evaluate the performance of the double-attention RNN, and the images were captured in 2019. It is important to note that the target NDVI images were replaced by cloudless images acquired on the nearest date. The optical images and IDM features of the NDVI at low and high cloud coverage levels are shown in Figure 6.
The first row of Figure 6a,b represents subregions of the study area acquired in 2019, where the cloud coverage of the 24 annual images was, on average, lower than 10%. Figure 6c,d show areas with high cloud coverage (higher than 10%), including clouds in both blocky and broken shapes. The cloud coverage mainly affects the textural information of the IDM features in the second row; for example, the clouded areas in the first row correspond to high pixel values and zero values in the IDM features generated from the masked NDVI. The selected subregions contain forestland, water bodies, upland fields, paddy fields, and residential areas, covering the primary land cover types of the study area and meeting the requirements of model validation. With input data for the different cloud coverage conditions, NDVI retrieval results were obtained by the pretrained double-attention RNN; Figure 7 depicts the NDVI retrieval images corresponding to Figure 6.
As shown in the above figure, the first row exhibits the original NDVI images, and the NDVI retrieval results of the double-attention RNN are shown in the second row. The color bar beside the figure represents the value range: a high NDVI value is depicted in red, and a low NDVI value is depicted in green. It can be seen from the color bar of the original NDVI that clouds and water bodies show negative values, while vegetation shows positive values. By visual comparison, the texture of each NDVI retrieval result under low and high cloud conditions is consistent with the original NDVI, and the variation patterns of pixel values follow the same trend. The temporal and pixel neighbor features constructed by the double-attention RNN provide abundant information for mapping to cloud-free NDVI. The NDVI retrieval results under high cloud coverage indicate that the missing information caused by thick clouds of different shapes is also well recovered; cloud pixels in deep green in Figure 7c,d are replaced by meaningful pixels of the retrieved underlying surface, mainly owing to the double-attention mechanism of the proposed architecture, which deeply mines the contextual features within the SAR images when the IDM information is absent.
In addition to visual assessment, R2 and RMSE are used to evaluate the accuracy of the retrieval models, as defined in the equations below, where $\bar{y}_T$ represents the mean NDVI. Image patches of low and high cloud coverage were separately selected from the 2019 study area images, and accuracy curves were generated, as depicted in Figure 8. Equations (9) and (10) are written as follows:
$\mathrm{RMSE} = \sqrt{ \dfrac{1}{N} \sum_{i=1}^{N} \left( y_T^{(i)} - \hat{y}_T^{(i)} \right)^2 }$
$R^2 = 1 - \dfrac{ \sum_{i=1}^{N} \left( y_T^{(i)} - \hat{y}_T^{(i)} \right)^2 }{ \sum_{i=1}^{N} \left( y_T^{(i)} - \bar{y}_T \right)^2 }$
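For concreteness, a minimal NumPy implementation of these two metrics is sketched below.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between cloudless and retrieved NDVI."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination against the mean cloudless NDVI."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```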
As shown in the above figure, the images of these dates contain both low and high cloud coverage subregions, and aggregating the validation within each day is more reasonable, so the evaluation results were computed accordingly. The left and right figures depict the R2 and RMSE evaluation results under low and high cloud coverage conditions, respectively. The results indicate a high correlation between the retrieved NDVI and the target NDVI (for cloud-covered areas, the cloudless NDVI of adjacent dates is used instead) (R2 > 0.7), which reflects the high prediction accuracy of the double-attention RNN. In comparison, the retrieval accuracy under low cloud coverage is higher than under high cloud coverage, with the R2 and RMSE improved by 2% and 9%, respectively. At the same time, combined with the cloud distribution depicted in Figure 2, the decline in R2 values within the cloudy July–September period was probably due to more extensive cloud cover than in other months, which affected the NDVI retrieval accuracy; nevertheless, the double-attention RNN performed well under both cloud conditions. The highest R2 appeared on 16 November 2019, with values of 0.859 and 0.856 in the low and high cloud coverage areas; correspondingly, the lowest R2 values were 0.743 and 0.724, acquired on 2 September 2019. For the RMSE metric, the best values under low and high cloud coverage conditions were obtained on 17 June 2019, at 0.089 and 0.112; the worst RMSE values of the subregions were 0.147 and 0.157, which appeared on the different dates of 2 July and 16 December 2019. The reasons are high cloud coverage on the one hand and crop rotation on the other.

4.2. Evaluation of IDM Features on NDVI Retrieval

In addition to the SAR images, IDM features were introduced in this study to provide auxiliary texture information. To evaluate the influence of the IDM features on NDVI retrieval accuracy, this section presents an ablation experiment in which the overall double-attention RNN architecture was retrained with only SAR images as input. The dataset division principle and training hyperparameters remained the same as previously described. The ablation model was then analyzed together with the double-attention RNN. The test R2-epoch curves during the training of the two models are shown in Figure 9.
Figure 9a depicts the training accuracy curve of the double-attention RNN, and Figure 9b shows the accuracy curve of the ablation model. A comparison of the two curves indicates that the IDM features make the test R2 rise steadily as the epochs increase, while the curve of the ablation model fluctuates greatly. This is because the IDM features provide supplementary information for model training. The double-attention RNN achieves the highest test R2 of 0.862, while the highest test R2 of the ablation model falls to 0.785. It can be concluded that the IDM feature considerably improves the retrieval accuracy of NDVI.
The low and high cloud coverage images acquired on 16 March 2019 were used for visual comparison. Following the configuration of the ablation experiment, NDVI retrieval results with the two inputs (SAR + IDM features, and SAR alone) were generated. In addition, 5000 validation points were randomly scattered on the obtained NDVI images, and four density scatter plots were drawn to evaluate the performance of the models, as shown in Figure 10.
In Figure 10, there are a large number of artifacts in the NDVI retrieval results of the ablation model, whose visual representation is more similar to SAR images than the results of the double-attention RNN in both the low and high cloud coverage regions. Meanwhile, in the low cloud coverage results, the NDVI values of the vegetated areas are suppressed and those of the non-vegetated areas are overestimated by the ablation model. The NDVI values of cropland are also underestimated, since bare soil appears as mixed signals in SAR images, and the ablation model lacks sufficient information to accurately distinguish the blended image features of crops and bare soil. For water bodies in the NDVI image, the value range should be negative, yet the low cloud coverage retrieval results of the ablation model contain no negative pixels, which is obviously incorrect. Within the high cloud coverage areas, the ablation model fails to recover the missing NDVI information in Figure 10l. The ablation model employs no extra data as a complement, and temporal features alone are not enough to precisely characterize the complex nonlinear relationship between SAR and optical images, which eventually leads to deviations between the retrieval results and the cloudless NDVI. In contrast, the double-attention RNN combines the texture information of the IDM features with SAR images, which reconstructs the missing NDVI information caused by clouds of various coverage well and distinguishes the confusing regions between different ground objects, making the retrieved NDVI closer to the actual NDVI.
The density scatter plot is often used in the accuracy evaluation of large numbers of samples and is especially suitable for depicting the relationship between prediction accuracy and spatial distribution. In the above density scatter plots, the horizontal and vertical axes represent the ranges of the cloudless NDVI and the retrieved NDVI, respectively, and the legend on the right shows the occurrence frequency of sample points, ranging from 0 to 1. The density scatter plots of the two models indicate that, in the low cloud coverage areas, most sample values are concentrated in the range of 0.6–0.8, presenting a highlighted tone in the figure; for the ablation model, there are many horizontally distributed points, which are supposed to be water bodies with negative values in the cloudless NDVI but instead show positive values in the ablation model's retrieved NDVI. As for the high cloud coverage areas, the sample points of the ablation model are concentrated in the range of 0.2–0.6 and are more scattered than those of the double-attention RNN, consistent with the results exhibited in the left image. For the double-attention RNN, the distribution of sample points in the density scatter plot is more concentrated than for the ablation model, with fewer outliers. The analysis of the density scatter plots illustrates the importance of the IDM feature in improving the accuracy of NDVI retrieval under various cloud coverage conditions.

4.3. Comparative Evaluation

In the previous experiments, the performance of the double-attention RNN under different cloud coverage conditions was comprehensively evaluated by both quantitative and qualitative analyses, and the importance of IDM features to the NDVI retrieval accuracy was discussed in depth. To further evaluate the double-attention RNN, in this section, three machine learning algorithms (SVM, random forest (RF) regression, and a multilayer perceptron (MLP) network) and two interpolation algorithms (linear interpolation and the Whittaker smoother (WHIT) [42]) were selected for comparative experiments under low and high cloud coverage conditions.
The linear interpolation and WHIT methods employ the temporal correlation of sequential NDVI to retrieve missing information. Using the pixel values of adjacent time phases, the linear interpolation method uses distance measurements to predict the NDVI results. The WHIT algorithm is based on the penalized least squares principle; it uses data fidelity and sequence smoothing filtering to predict sequential NDVI data, and its smoothing and penalty parameters were set to 1 and 2 to retain as much detailed information in the sequential NDVI as possible. The RF and SVM algorithms use SAR and IDM features to retrieve missing NDVI values; for the RF algorithm, the number of trees was set to 100 and the maximum depth of a tree to 25, while an RBF kernel with a grid search was used for the SVM algorithm. The linear and WHIT interpolation methods were implemented using DATimeS [43]; SVM and RF were implemented on the MATLAB platform.
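Although the paper implements these baselines in MATLAB, an equivalent scikit-learn sketch of the RF and SVM configurations might look as follows; the tree count and depth come from the text, while the feature layout and the grid-search value ranges are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Placeholder features: stacked SAR (VV, VH) and IDM values per pixel.
X_train = np.random.rand(1000, 3)
y_train = np.random.rand(1000)  # cloudless NDVI targets

# RF baseline: 100 trees, maximum depth 25, as in the text.
rf = RandomForestRegressor(n_estimators=100, max_depth=25, random_state=0)
rf.fit(X_train, y_train)

# RBF-kernel SVR with a grid search over illustrative C and gamma ranges.
svr = GridSearchCV(SVR(kernel='rbf'),
                   param_grid={'C': [1, 10, 100], 'gamma': ['scale', 0.1, 1.0]},
                   cv=3)
svr.fit(X_train, y_train)
```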
Since the MLP and the double-attention RNN both include fully connected layers, a comparative analysis between the two models can best reflect the advantage of the double-attention mechanism. The MLP network used in this section is overall consistent with the double-attention RNN; it was modified only by replacing the 1D convolutional and pooling layers, the double-attention mechanism, and the RNN structures with a fully connected skeleton. After trial and error, the encoder part of the MLP included four fully connected layers (hidden layer shapes of 128, 128, 200, and 256); correspondingly, four fully connected layers were used in the decoder part (hidden layer shapes of 256, 64, 32, and 24). SAR and IDM features were concatenated in the temporal dimension and fed into the comparative MLP model to retrieve the NDVI. The cloudless sequential NDVI of 2020 was chosen as the retrieval target, and the NDVI retrieval results and error histogram plots for 2 May and 6 July were generated, as shown in Figure 11.
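A minimal PyTorch sketch of this comparative MLP follows, reading the listed hidden layer shapes as the output width of each fully connected layer; the ReLU activations, the input dimension, and the 24-dimensional output (two observations per month) are assumptions not fixed by the text.

```python
import torch.nn as nn

def build_mlp(input_dim=72):
    """Encoder-decoder MLP baseline; input_dim is the concatenated
    SAR + IDM temporal feature length (an illustrative assumption)."""
    return nn.Sequential(
        # Encoder: four fully connected layers (output widths 128, 128, 200, 256).
        nn.Linear(input_dim, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 200), nn.ReLU(),
        nn.Linear(200, 256), nn.ReLU(),
        # Decoder: four fully connected layers (output widths 256, 64, 32, 24).
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 64), nn.ReLU(),
        nn.Linear(64, 32), nn.ReLU(),
        nn.Linear(32, 24),  # 24 retrieved NDVI observations per year
    )
```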
The comparative models rely on the sequential information of previous and future time phases or on image features to retrieve missing NDVI data. As shown in the above figure, under the various cloud conditions, significant discrepancies exist between the NDVI retrieval results of the comparative models and the target NDVI, whereas in the NDVI retrieval results of the double-attention RNN, the forestland and agricultural land are more consistent with the target NDVI. More specifically, although the MLP network integrates SAR with IDM features to build a direct end-to-end mapping from input to target, the temporal and pixel neighbor information within the input data cannot be fully extracted. The low cloud coverage result shows that the NDVI values of vegetation are underestimated, the values of non-vegetated areas are overestimated, and the overall contrast of the retrieved NDVI is low. For high cloud coverage regions, there are many artifacts in the NDVI retrieval results. Therefore, it can be concluded that the MLP network is incapable of constructing an accurate NDVI retrieval model. The linear interpolation and WHIT methods merely use sequential NDVI as input data, so the retrieval results are inevitably affected by clouds in previous images. For example, the bottom left corner of the linear interpolation and WHIT figures is supposed to be forestland, whereas it presents the characteristics of clouds, which is more obvious in Figure 11l,p. Additionally, the contrast of linear interpolation is superior to that of MLP, but its retrieval performance is poor. The WHIT method retrieves cloudless NDVI information well compared to linear interpolation, and its retrieval result for the low cloud coverage region appears considerably better than that of MLP, even closer to the target NDVI; however, it fails to accurately predict NDVI values occluded by thick clouds. From the visual representations, the SVM and RF algorithms fail to retrieve missing NDVI values under the different cloud conditions; the large number of training samples degrades the performance of these models, resulting in inferior retrieval results.
Similarly, 5000 validation points were randomly scattered in the image region and, combined with the target NDVI data, two error histogram plots were generated to evaluate the retrieval performance of each comparative method under the various cloud coverage conditions. The X-axis of the error histogram represents the difference between the retrieved and target NDVI values; the Y-axis represents the ratio of the sample number at each X value to the total number of samples. With the error histogram, the distribution of errors can be directly analyzed to validate the retrieval performance of the models. The error histograms of each comparative method approximately follow a normal distribution, with differences in their expectations and standard deviations. In the error histogram of the double-attention RNN, most deviations are distributed around zero and are more concentrated, reflecting that its retrieval result is more consistent with the target cloudless NDVI. Compared with the double-attention RNN, the error histogram of MLP is wider in shape, and the corresponding expectation moves toward higher error values, which is more apparent in the high cloud coverage histogram; this indicates that the distribution of retrieval errors is more dispersed and the retrieval accuracy of MLP is lower. Among the other comparative methods, more samples of WHIT are distributed near zero error and its histogram curves are more compact under both low and high cloud coverage conditions, while the curves of the other methods deviate even further, reflecting the same results as the corresponding retrieved NDVI images. In a comprehensive comparison of the error histograms, the plots of SVM and RF are flatter in shape, indicating poor NDVI retrieval performance. To fully validate the accuracy of the comparative methods, an accuracy evaluation table with R2 and RMSE as metrics is presented in Table 2.
The above table shows that the double-attention RNN achieves high NDVI retrieval accuracy in each time phase. For temporal information, the LSTM unit in the temporal domain continuously learns the relationships among features of various dates, and the annual variation trend of NDVI is well fitted. The LSTM unit integrating the double-attention mechanism, applied in the pixel neighbor domain, fully exploits the contextual information of pixels while retaining the texture details of NDVI. Regarding the evaluation metrics of MLP, although the SAR and IDM features are concatenated and fed into the network, the stacked fully connected layers are still unable to accurately build the mapping of NDVI between different time phases; meanwhile, the large number of parameters in MLP affects the training procedure, leading to low retrieval accuracy. The WHIT algorithm is especially suitable for sequential data processing, and the validation result shows that its retrieval accuracy is considerably higher than that of the linear interpolation method. However, since the input of WHIT is sequential NDVI without any other auxiliary data, its retrieval accuracy is severely affected by clouds, resulting in higher retrieval accuracy in the first and fourth quarters than in the rest of the year. Thus, the WHIT algorithm is more suitable for regions with few clouds. The SVM, RF, and linear interpolation methods achieve low retrieval accuracy in the evaluation results. Among them, the SVM method achieves the worst validation result; although it is trained with a large number of samples, it is not capable of fitting the abstract sequential NDVI data. Additionally, the computational efficiency of the double-attention RNN and the comparative methods is listed in Table 3.
Table 3 lists the processing/training (1 epoch), testing, and total times, in seconds. Since the retrieval procedure is conducted simultaneously with the processing step, no separate testing times are reported for the linear and WHIT interpolations. The table shows that linear interpolation consumes the least time among all methods, but its performance is poor. The SVM and RF methods require considerable time to train their parameters. The double-attention RNN achieves the fastest testing time and balances accuracy and time consumption well.

5. Discussion

SAR offers a promising avenue for NDVI retrieval in cloudy regions, and with the freely available Sentinel satellite series, the status of vegetation can be thoroughly captured. In this paper, the proposed double-attention RNN uses feature vectors from Sentinel-1 and Sentinel-2 images, and ideal NDVI retrieval results are achieved. The model performance was evaluated with various experiments from different aspects. Beyond the issues addressed in the above sections, the LULC-related influence on the retrieval accuracy and the extensibility of the double-attention RNN are further discussed here.

5.1. The Influence of Crop Types on Retrieval Accuracy

In the above sections, the influence of cloud coverage on the performance of the double-attention RNN was discussed in depth, and ideal NDVI retrieval accuracy was achieved under both thin and thick cloud conditions. Outside the cloud-covered regions, the NDVI retrieval results also present varying characteristics across land cover types: some are underestimated, and some are overestimated. In the study area, natural forestland is the dominant vegetation type and is widely distributed; as a result, the influence of the other crop types is particularly uncertain and needs further analysis. Therefore, the crop classification polygons of 2020 were used for evaluation, and the class-specific accuracy results are plotted in Figure 12.
To exclude the effect of clouds, the evaluation result was generated based on cloudless areas. The crop patches, sorted from greatest to least in number, are maize, paddy, vegetable, tobacco plantation, orchard, flower garden, and greenhouse. To ensure an equal comparison, 5000 validation points were randomly chosen within the extent of the polygons. As shown in the above figure, the mean values of the metrics are presented in histograms, and the positive and negative deviations are depicted with error bars. The results notably indicate that the double-attention RNN performs differently for the various crop types: higher R2 and lower RMSE values appear for maize, orchard, and tobacco plantations (R2 > 0.81, RMSE < 0.11), while lower accuracy is shown for the other crop types. The evaluation results with high accuracy generally have smaller error bars, representing smaller uncertainty in the NDVI retrieval values. This result is probably attributable to the characteristics of the crop objects, as shown in Figure 13.
In the above figure, red regions represent high-percentage vegetation distributions, while regions in green exhibit low vegetation coverage or non-vegetation. The first row displays fields with extensive areas and single crop types; the second row includes crop types with multiple breeds in various shapes. This appearance is generally consistent with the evaluation results in Figure 12. Similar to the forests in the study area, high retrieval accuracy is achieved by the double-attention RNN in fields of single crop types: single-class crops share the same growth state and, being cultivated at large scale in a uniform form, have near-identical spectral and spatial features in both the SAR and NDVI images, so the temporal and pixel neighbor feature extractions of the proposed model fully explore the image features from different perspectives. Crop types with multiple breeds follow another mode: field patches gathered with several kinds of plants lead to high variability in the image representation, and the difference between the backscattering features in the VV and VH bands of Sentinel-1 and the IDM features from the Sentinel-2 optical images is likely to widen; there is no doubt that the retrieval accuracy is affected in these regions. Even so, the double-attention RNN, with its double-attention mechanism, achieves satisfactory performance, although the complex patterns in these areas warrant further study.
From the above analysis, the double-attention RNN can be applied with different crop types, not only to extensive natural vegetation scenes but also to agricultural applications with single-class or multiclass crops. It should be noted that the reconstruction of NDVI for multiclass crop types is a challenging task that needs further study; its influence on the retrieval accuracy of the double-attention RNN lies in the planting structure of the crop types, which controls the image representation, and in the various crop shapes, which affect the feature extraction procedure. This can be addressed in two ways: the first is designing an image preprocessing pipeline that maintains more detailed information on the various crop types so as to provide distinguishable features for the double-attention RNN; the second is adding other auxiliary data, such as the radar vegetation index [44] (see the sketch below), to enhance the retrieval performance of the double-attention RNN.
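As an illustration of the second option, a minimal sketch of a dual-polarization radar vegetation index from the Sentinel-1 VV and VH bands is given below. The formulation RVI = 4σVH/(σVV + σVH) in linear power units is a common dual-pol variant and is our assumption; the paper does not prescribe a specific formula.

```python
import numpy as np

def dual_pol_rvi(vv_db, vh_db):
    """Dual-pol radar vegetation index from VV/VH backscatter given in dB."""
    vv = 10 ** (vv_db / 10.0)  # convert dB to linear power
    vh = 10 ** (vh_db / 10.0)
    return 4.0 * vh / (vv + vh)
```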

5.2. Perspectives of Double-Attention on Vegetation Monitoring

The double-attention RNN proposed in this paper analyzes image features in the temporal and pixel neighbor domains. The LSTM module and the attention principle are combined to build a double-attention mechanism that comprehensively extracts temporal and pixel neighbor features, resulting in refined and accurate NDVI retrieval results. In this paper, the retrieval interval was set to 24 observations per year, i.e., 2 image observations per month. Generally, this already fits many conventional vegetation monitoring applications [45,46], but it is insufficient for precision agriculture, whose tasks often require intensive continuous monitoring. Beyond the monitoring period, the extensive use of UAV images also presents a higher demand for the capability to analyze high-resolution images [47]. The architecture of the double-attention RNN makes it possible to meet these requirements. Regarding the retrieval period, the LSTM module was originally designed to process large-scale, high-dimensional sequential data [48], while the data in vegetation monitoring tasks are obviously shorter than in its primary application scenarios; therefore, the retrieval interval is easy to extend. With respect to high-resolution image processing, as discussed in the above section, the double-attention RNN extracts features in the pixel neighbor domain in addition to the temporal domain, and the attention mechanism allows the architecture to focus on recovering the indispensable parts of high-resolution images. Modifying the pixel neighbor step B makes it possible to adapt the double-attention RNN to image sensors with different spatial resolutions, and the feature vector-based processing mode is a convenient way to retain detailed information, which benefits both satellite and UAV images. Besides NDVI, many other indices are important for monitoring the growth of vegetation. In this paper, to improve the NDVI retrieval performance, the IDM feature of the masked NDVI was used to provide complementary information, which reduced the difficulty of feature mapping between the multimodal Sentinel-1 and Sentinel-2 images. In addition, introducing the complementary texture rather than the original NDVI avoided the risk of the deep model discarding features from the SAR images, so the multisource features contributed equally to the NDVI retrieval task. As a consequence, the double-attention RNN should be able to retrieve other vegetation indices, expanding its applicability to more types of vegetation monitoring.

6. Conclusions

The retrieval of missing NDVI information caused by cloud coverage has potential in the fields of environmental protection and precision agriculture. With SAR and IDM features, we propose a novel double-attention RNN architecture to recover missing NDVI information in cloudy regions. The architecture deeply mines image features from the temporal and pixel neighbor domains successively. With resampled feature vectors and an encoding–decoding architecture, the double-attention RNN first combines a one-dimensional convolutional layer and an LSTM unit to fully extract the temporal information in the sequential input data. Then, focusing on the pixel neighbor domain, the double-attention mechanism and LSTM unit synthesize the two-dimensional information, and the missing information in cloudy NDVI is reconstructed well. The influence of cloud coverage and the importance of IDM features were validated in two experiments. Additionally, a comparative experiment with the linear interpolation, WHIT, MLP, SVM, and RF methods was used to qualitatively and quantitatively evaluate the performance of the double-attention RNN. The experimental results support the following conclusions:
(1) The double-attention RNN is capable of recovering the missing information in NDVI well. For cloudy regions, ideal retrieval accuracy is achieved under high cloud coverage conditions, with R2 > 0.72 and RMSE < 0.15. The highest retrieval accuracy appears in thin cloud coverage regions, with R2 > 0.85 and RMSE < 0.08. Therefore, it can be concluded that the double-attention RNN is suitable for the NDVI retrieval task, despite the negative effects of thin or thick clouds.
(2) According to the results of the ablation and comparative experiments, the IDM feature is important for NDVI retrieval. The results of the double-attention RNN in the comparison experiments (date of retrieved NDVI: 3 November 2020; R2 = 0.894, RMSE = 0.096) further indicate the importance of the temporal and pixel neighbor features, as well as the corresponding double-attention mechanism, in improving NDVI retrieval accuracy.
In conclusion, the double-attention RNN effectively improves the accuracy of the NDVI retrieval task, and its predictions can be used as a complement when NDVI images are obstructed by clouds. Moreover, the double-attention RNN has potential for intensive continuous and high-resolution monitoring with UAV images, which keeps sequential NDVI data intact. However, crop types with multiple breeds and the optimization of the architecture are key issues that urgently need to be solved and require further in-depth study.

Author Contributions

Conceptualization, W.Z.; funding acquisition, W.Z.; methodology, F.D., F.L. and M.Z.; software, R.J.; validation, R.J.; writing—original draft, R.J. and F.D.; writing—review and editing, F.D. and R.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 42071422; the National Key Research and Development Program of China, grant number 2018YFC0706004.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Alencar, A.; Z Shimbo, J.; Lenti, F.; Balzani Marques, C.; Zimbres, B.; Rosa, M.; Arruda, V.; Castro, I.; Fernandes Márcico Ribeiro, J.P.; Varela, V. Mapping three decades of changes in the Brazilian savanna native vegetation using Landsat data processed in the Google Earth Engine platform. Remote Sens. 2020, 12, 924.
2. Ye, C.; Sun, J.; Liu, M.; Xiong, J.; Zong, N.; Hu, J.; Huang, Y.; Duan, X.; Tsunekawa, A. Concurrent and lagged effects of extreme drought induce net reduction in vegetation carbon uptake on Tibetan Plateau. Remote Sens. 2020, 12, 2347.
3. Qiu, L.; Wu, Y.; Shi, Z.; Chen, Y.; Zhao, F. Quantifying the responses of evapotranspiration and its components to vegetation restoration and climate change on the Loess Plateau of China. Remote Sens. 2021, 13, 2358.
4. Hou, W.; Hou, X. Spatial–temporal changes in vegetation coverage in the global coastal zone based on GIMMS NDVI3g data. Int. J. Remote Sens. 2020, 41, 1118–1138.
5. Wen, Y.; Liu, X.; Yang, J.; Lin, K.; Du, G. NDVI indicated inter-seasonal non-uniform time-lag responses of terrestrial vegetation growth to daily maximum and minimum temperature. Glob. Planet. Change 2019, 177, 27–38.
6. Jia, K.; Yao, Y.; Wei, X.; Gao, S.; Jiang, B.; Zhao, X. A review on fractional vegetation cover estimation using remote sensing. Adv. Earth Sci. 2013, 28, 774.
7. Pang, G.; Yang, Q.; Wang, C.; Shan, L.; Wang, B. Influence of parameter determination methods of the pixel dichotomy model on the estimation accuracy of fractional vegetation cover by GF-1 PMS data. Geogr. Geo-Inf. Sci. 2019, 35, 27–33.
8. Long, X.; Li, X.; Lin, H.; Zhang, M. Mapping the vegetation distribution and dynamics of a wetland using adaptive-stacking and Google Earth Engine based on multi-source remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102453.
9. Houborg, R.; McCabe, M.F. Daily retrieval of NDVI and LAI at 3 m resolution via the fusion of CubeSat, Landsat, and MODIS data. Remote Sens. 2018, 10, 890.
10. Gerber, F.; de Jong, R.; Schaepman, M.E.; Schaepman-Strub, G.; Furrer, R. Predicting missing values in spatio-temporal remote sensing data. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2841–2853.
11. Moreno-Martínez, Á.; Izquierdo-Verdiguier, E.; Maneta, M.P.; Camps-Valls, G.; Robinson, N.; Muñoz-Marí, J.; Sedano, F.; Clinton, N.; Running, S.W. Multispectral high resolution sensor fusion for smoothing and gap-filling in the cloud. Remote Sens. Environ. 2020, 247, 111901.
12. Bolton, D.K.; Gray, J.M.; Melaas, E.K.; Moon, M.; Eklundh, L.; Friedl, M.A. Continental-scale land surface phenology from harmonized Landsat 8 and Sentinel-2 imagery. Remote Sens. Environ. 2020, 240, 111685.
13. Kandasamy, S.; Baret, F.; Verger, A.; Neveux, P.; Weiss, M. A comparison of methods for smoothing and gap filling time series of remote sensing observations–application to MODIS LAI products. Biogeosciences 2013, 10, 4055–4071.
14. Atkinson, P.M.; Jeganathan, C.; Dash, J.; Atzberger, C. Inter-comparison of four models for smoothing satellite sensor time-series data to estimate vegetation phenology. Remote Sens. Environ. 2012, 123, 400–417.
15. Jonsson, P.; Eklundh, L. Seasonality extraction by function fitting to time-series of satellite sensor data. IEEE Trans. Geosci. Remote Sens. 2002, 40, 1824–1832.
16. Beck, P.S.; Atzberger, C.; Høgda, K.A.; Johansen, B.; Skidmore, A.K. Improved monitoring of vegetation dynamics at very high latitudes: A new method using MODIS NDVI. Remote Sens. Environ. 2006, 100, 321–334.
17. Julien, Y.; Sobrino, J.A. Optimizing and comparing gap-filling techniques using simulated NDVI time series from remotely sensed global data. Int. J. Appl. Earth Obs. Geoinf. 2019, 76, 93–111.
18. Solano-Correa, Y.T.; Bovolo, F.; Bruzzone, L.; Fernández-Prieto, D. A method for the analysis of small crop fields in Sentinel-2 dense time series. IEEE Trans. Geosci. Remote Sens. 2019, 58, 2150–2164.
19. Liu, Q.; Yang, L.; Liu, Q.; Li, J. Review of forest above ground biomass inversion methods based on remote sensing technology. J. Remote Sens. 2015, 19, 62–74.
20. Pipia, L.; Muñoz-Marí, J.; Amin, E.; Belda, S.; Camps-Valls, G.; Verrelst, J. Fusing optical and SAR time series for LAI gap filling with multioutput Gaussian processes. Remote Sens. Environ. 2019, 235, 111452.
21. Mohite, J.; Sawant, S.; Pandit, A.; Pappula, S. Investigating the performance of random forest and support vector regression for estimation of cloud-free NDVI using Sentinel-1 SAR data. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 1379–1383.
22. Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep learning classification of land cover and crop types using remote sensing data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782.
23. Ienco, D.; Interdonato, R.; Gaetano, R.; Minh, D.H.T. Combining Sentinel-1 and Sentinel-2 satellite image time series for land cover mapping via a multi-source deep learning architecture. ISPRS J. Photogramm. Remote Sens. 2019, 158, 11–22.
24. Sun, J.; Lai, Z.; Di, L.; Sun, Z.; Tao, J.; Shen, Y. Multilevel deep learning network for county-level corn yield estimation in the US Corn Belt. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5048–5060.
25. He, W.; Yokoya, N. Multi-temporal Sentinel-1 and -2 data fusion for optical image simulation. ISPRS Int. J. Geo-Inf. 2018, 7, 389.
26. Cresson, R.; Ienco, D.; Gaetano, R.; Ose, K.; Minh, D.H.T. Optical image gap filling using deep convolutional autoencoder from optical and radar images. In Proceedings of the IGARSS 2019, IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 218–221.
27. Scarpa, G.; Gargiulo, M.; Mazza, A.; Gaetano, R. A CNN-based fusion method for feature extraction from Sentinel data. Remote Sens. 2018, 10, 236.
28. Yu, W.; Li, J.; Liu, Q.; Zhao, J.; Dong, Y.; Wang, C.; Lin, S.; Zhu, X.; Zhang, H. Spatial–temporal prediction of vegetation index with deep recurrent neural networks. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
29. Zhao, W.; Qu, Y.; Chen, J.; Yuan, Z. Deeply synergistic optical and SAR time series for crop dynamic monitoring. Remote Sens. Environ. 2020, 247, 111952.
30. Lin, S.-S.; Shen, S.-L.; Zhou, A.; Lyu, H.-M. Sustainable development and environmental restoration in Lake Erhai, China. J. Clean. Prod. 2020, 258, 120758.
31. Torres, R.; Snoeij, P.; Geudtner, D.; Bibby, D.; Davidson, M.; Attema, E.; Potin, P.; Rommen, B.; Floury, N.; Brown, M. GMES Sentinel-1 mission. Remote Sens. Environ. 2012, 120, 9–24.
32. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P. Sentinel-2: ESA's optical high-resolution mission for GMES operational services. Remote Sens. Environ. 2012, 120, 25–36.
33. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
34. Moorthy, I.; Sturn, T.; Batic, M.; See, L.; Milčinski, G.; Fritz, S. Improving cloud detection in satellite imagery using a citizen science approach. In Proceedings of the Digital Earth Observation, Salzburg, Austria, 1–4 July 2019.
35. Banerjee, S.; Sinha Chaudhuri, S.; Mehra, R.; Misra, A. A Survey on Lee Filter and Its Improved Variants; Springer: Singapore, 2021; pp. 371–383.
36. Cho, K.; van Merriënboer, B.; Gülçehre, Ç.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the EMNLP, Doha, Qatar, 25–29 October 2014.
37. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
38. Hübner, R.; Steinhauser, M.; Lehle, C. A dual-stage two-phase model of selective attention. Psychol. Rev. 2010, 117, 759.
  39. Garioud, A.; Valero, S.; Giordano, S.; Mallet, C. Recurrent-based regression of Sentinel time series for continuous vegetation monitoring. Remote Sens. Environ. 2021, 263, 112419. [Google Scholar] [CrossRef]
  40. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar] [CrossRef]
  41. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2015, arXiv:1412.6980. [Google Scholar]
  42. Vuolo, F.; Ng, W.-T.; Atzberger, C. Smoothing and gap-filling of high resolution multi-spectral time series: Example of Landsat data. Int. J. Appl. Earth Obs. Geoinf. 2017, 57, 202–213. [Google Scholar] [CrossRef]
  43. Belda, S.; Pipia, L.; Morcillo-Pallarés, P.; Rivera-Caicedo, J.P.; Amin, E.; De Grave, C.; Verrelst, J. DATimeS: A machine learning time series GUI toolbox for gap-filling and vegetation phenology trends detection. Environ. Model. Softw. 2020, 127, 104666. [Google Scholar] [CrossRef]
  44. Mandal, D.; Kumar, V.; Ratha, D.; Dey, S.; Bhattacharya, A.; Lopez-Sanchez, J.M.; McNairn, H.; Rao, Y.S. Dual polarimetric radar vegetation index for crop growth monitoring using sentinel-1 SAR data. Remote Sens. Environ. 2020, 247, 111954. [Google Scholar] [CrossRef]
  45. John, E.; Bunting, P.; Hardy, A.; Silayo, D.S.; Masunga, E. A Forest Monitoring System for Tanzania. Remote Sens. 2021, 13, 3081. [Google Scholar] [CrossRef]
  46. Dumitru, C.O.; Schwarz, G.; Pulak-Siwiec, A.; Kulawik, B.; Lorenzo, J.; Datcu, M. Earth Observation Data Mining: A Use Case for Forest Monitoring. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 5359–5362. [Google Scholar] [CrossRef] [Green Version]
  47. Murugan, D.; Garg, A.; Singh, D. Development of an adaptive approach for precision agriculture monitoring with drone and satellite data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 5322–5328. [Google Scholar] [CrossRef]
  48. Baek, Y.; Kim, H.Y. ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Syst. Appl. 2018, 113, 457–480. [Google Scholar] [CrossRef]
Figure 1. The location of the study area. The image in the right column, in false color composition (red: band 8; green: band 4; blue: band 3), is derived from a Sentinel-2 image acquired on 2 February 2018.
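For reference, the NDVI that the model retrieves is computed from the same Sentinel-2 bands shown in Figure 1: band 8 (near-infrared) and band 4 (red). A minimal numpy sketch follows; the epsilon guard against division by zero is an implementation convenience, not part of the paper.

```python
# NDVI = (NIR - Red) / (NIR + Red), with Sentinel-2 band 8 as NIR and band 4 as red.
import numpy as np

def ndvi(b8, b4, eps=1e-6):
    b8, b4 = b8.astype(np.float32), b4.astype(np.float32)
    return (b8 - b4) / (b8 + b4 + eps)  # eps avoids division by zero on nodata

# toy usage with random reflectance values in [0, 1]
print(ndvi(np.random.rand(4, 4), np.random.rand(4, 4)).round(2))
```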
Figure 2. The cloud distribution of Sentinel-2 images in the study area; (a–c) represent the cloud coverage status of the acquisition dates in 2018, 2019, and 2020, respectively. Blue dots indicate dates on which images had low cloud coverage; orange dots indicate dates with high cloud coverage.
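The low/high cloud coverage labeling of acquisition dates in Figure 2 can, in principle, be reproduced from Sentinel-2 scene metadata. Below is a minimal sketch using the Google Earth Engine Python API and the standard CLOUDY_PIXEL_PERCENTAGE property; the 20% threshold and the AOI point are illustrative assumptions, not the paper's exact settings.

```python
# Split Sentinel-2 surface reflectance scenes into low/high cloud coverage sets
# by the scene-level CLOUDY_PIXEL_PERCENTAGE metadata property.
import ee

ee.Initialize()  # assumes prior ee.Authenticate()

aoi = ee.Geometry.Point(100.2, 25.8)  # hypothetical point in the study region
s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterDate('2018-01-01', '2020-12-31')
      .filterBounds(aoi))
low_cloud = s2.filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
high_cloud = s2.filter(ee.Filter.gte('CLOUDY_PIXEL_PERCENTAGE', 20))
print(low_cloud.size().getInfo(), high_cloud.size().getInfo())
```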
Figure 3. Technical flowchart. After image preprocessing, resampled SAR and IDM feature vectors are generated as the input of the double-attention RNN; deep features are then extracted in the temporal and pixel neighbor domains to recover the missing NDVI information.
Figure 4. Input resampling workflow. The double-attention RNN resamples the input pixel vectors with step B, extending feature extraction to both the temporal and pixel neighbor domains. The black grids represent N pixels in T time phases; the two red grids at the bottom of the figure are the pixels to be resampled, and the red pixel vectors at the top are the resampled feature vectors with step B.
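A minimal numpy sketch of the resampling idea in Figure 4: for each target pixel, the feature vectors of its neighbors within step B are gathered across all T time phases, yielding one (neighbors, T, features) sample per pixel. The window shape and the border clipping here are assumptions for illustration, not the paper's exact scheme.

```python
# Gather the step-B neighborhood of one pixel across a (T, H, W, F) time series.
import numpy as np

def resample_neighbors(stack, row, col, B):
    """stack: (T, H, W, F) image time series; returns (neighbors, T, F)."""
    T, H, W, F = stack.shape
    r0, r1 = max(0, row - B), min(H, row + B + 1)   # clip window at image borders
    c0, c1 = max(0, col - B), min(W, col + B + 1)
    window = stack[:, r0:r1, c0:c1, :]              # (T, h, w, F)
    return window.reshape(T, -1, F).transpose(1, 0, 2)

# toy usage: 10 dates, 64x64 pixels, 5 features (e.g., VV, VH, and IDM bands)
stack = np.random.rand(10, 64, 64, 5).astype(np.float32)
print(resample_neighbors(stack, row=32, col=32, B=1).shape)  # (9, 10, 5)
```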
Figure 5. The architecture of the double-attention RNN. The resampled SAR vectors are first concatenated and then processed by the encoding and decoding parts of the double-attention RNN, which extract features from the temporal and the pixel neighbor domains. Visualized samples of the main block features are colorized in viridis to illustrate the overall procedure of the architecture.
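The caption above describes the architecture only at the block level. The PyTorch sketch below shows one plausible reading of the double-attention idea (soft attention over time steps, then over neighbor pixels, feeding an NDVI regression head); the layer sizes, GRU choice, and exact attention placement are assumptions, not the authors' specification.

```python
# A minimal, illustrative encoder + double-attention regression sketch.
import torch
import torch.nn as nn

class DoubleAttentionSketch(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hidden, batch_first=True)
        self.temporal_attn = nn.Linear(hidden, 1)   # scores each time step
        self.neighbor_attn = nn.Linear(hidden, 1)   # scores each neighbor pixel
        self.head = nn.Linear(hidden, 1)            # NDVI regression head

    def forward(self, x):
        # x: (batch, neighbors, time, features) -- resampled pixel vectors
        b, n, t, f = x.shape
        h, _ = self.encoder(x.reshape(b * n, t, f))            # (b*n, t, hidden)
        w_t = torch.softmax(self.temporal_attn(h), dim=1)      # temporal weights
        pixel = (w_t * h).sum(dim=1).reshape(b, n, -1)         # (b, n, hidden)
        w_n = torch.softmax(self.neighbor_attn(pixel), dim=1)  # neighbor weights
        context = (w_n * pixel).sum(dim=1)                     # (b, hidden)
        return self.head(context).squeeze(-1)                  # one NDVI per sample

# toy usage: 8 samples, 9 neighbor pixels, 10 dates, 5 input features
model = DoubleAttentionSketch(in_dim=5)
print(model(torch.randn(8, 9, 10, 5)).shape)  # torch.Size([8])
```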
Figure 6. Examples of low and high cloud coverage areas; (a–d) show Sentinel-2 images (band composition: 8, 4, 3); (e–h) are the corresponding IDM features of NDVI.
Figure 7. NDVI retrieval results of the double-attention RNN under various cloud coverage conditions; (a–d) are the original NDVI images under low and high cloud coverage conditions, respectively; (e–h) show the corresponding retrieval results of the double-attention RNN. The figures are colorized with diverging colormaps: intermediate values are depicted in yellow, and high and low values are shown in red and green, respectively.
Figure 8. Accuracy curves of the NDVI retrieval results in 2019; (a,b) represent the evaluation curves of R2 and RMSE, respectively. The blue line indicates the accuracy under low cloud coverage conditions; the orange line reflects the performance of the double-attention RNN on images with thick cloud coverage.
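The R2 and RMSE curves in Figure 8 (and the values in Table 2) follow the standard definitions of both metrics. A short numpy sketch for a single retrieval date; the simulated reference and retrieved NDVI arrays here are placeholders.

```python
# Standard R2 (coefficient of determination) and RMSE for one retrieval date.
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.random.rand(1000)                        # cloudless reference NDVI
y_pred = y_true + np.random.normal(0, 0.1, 1000)     # retrieved NDVI
print(round(r2_score(y_true, y_pred), 3), round(rmse(y_true, y_pred), 3))
```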
Figure 9. Accuracy comparison on the test dataset; (a,b) depict the test R2 of the double-attention RNN and the comparative model. The blue dotted line shows the test R2 of the double-attention RNN; the rhombic line shows the test R2 of the ablation model.
Figure 10. The NDVI retrieval results of the double-attention RNN with and without IDM features; (a,h) show the NDVI images under low and high cloud coverage conditions; (b,d,i,k) are the retrieval results of the ablation experiment; (c,e,j,l) are detailed views of the same region; (f,m) are the density plots generated by the double-attention RNN; (g,n) are the density plots of the ablation model. The NDVI images are colorized with diverging colormaps, as in Figure 7; the density plots are depicted in viridis.
Figure 11. The NDVI retrieval results of the comparative models; (a,j) represent the optical images under low and high cloud coverage; (e,n) show the cloudless NDVI; (b–d,f–h,k–m,o–q) represent the retrieval results of the double-attention RNN, WHIT, MLP, linear interpolation, SVM, and RF; (i,r) present the error histogram plots, which integrate a bar graph and a line graph to show the error distributions of the comparative methods. The cyan plot depicts the double-attention RNN, the red plot is the error histogram of the MLP network, the yellow plot represents linear interpolation, the error histogram of WHIT is in green, fuchsia is used for SVM, and RF is depicted in chocolate.
Figure 12. Class-specific evaluation results for various crop types. (a) The estimation result of R2. (b) The estimation result of RMSE. The results are calculated as the mean R2 and RMSE between the cloudless target NDVI and the retrieved NDVI regions of the 2020 images.
Figure 13. Characteristics of crop patches in the study area. The retrieved NDVI result of the double-attention RNN is used as the base map, and the various crop types are shown in different colors.
Table 1. The number of image samples in the training, validation, and test datasets.

Dataset    | Sample Number
Train      | 421,839
Validation | 144,656
Test       | 156,785
Table 2. Accuracy evaluation of the comparative models. Bold numbers indicate the best metric values of each model in 2020.

Time              | Double-Attention RNN | MLP           | WHIT          | Linear Interpolation | SVM           | RF
                  | R2      RMSE         | R2      RMSE  | R2      RMSE  | R2      RMSE         | R2      RMSE  | R2      RMSE
3 January 2020    | 0.879   0.092        | 0.613   0.156 | 0.691   0.155 | 0.376   0.227        | 0.415   0.157 | 0.488   0.163
15 January 2020   | 0.883   0.093        | 0.656   0.149 | 0.738   0.139 | 0.508   0.182        | 0.611   0.148 | 0.522   0.158
2 February 2020   | 0.855   0.087        | 0.674   0.162 | 0.757   0.143 | 0.517   0.197        | 0.338   0.295 | 0.464   0.152
17 February 2020  | 0.872   0.100        | 0.691   0.144 | 0.715   0.122 | 0.495   0.169        | 0.397   0.166 | 0.573   0.179
3 March 2020      | 0.891   0.081        | 0.704   0.139 | 0.814   0.118 | 0.523   0.172        | 0.543   0.183 | 0.624   0.181
15 March 2020     | 0.877   0.094        | 0.687   0.157 | 0.802   0.099 | 0.468   0.195        | 0.486   0.152 | 0.441   0.253
2 April 2020      | 0.844   0.107        | 0.663   0.159 | 0.721   0.161 | 0.505   0.188        | 0.607   0.175 | 0.635   0.139
17 April 2020     | 0.826   0.102        | 0.715   0.124 | 0.703   0.133 | 0.601   0.171        | 0.593   0.168 | 0.556   0.157
2 May 2020        | 0.882   0.086        | 0.625   0.135 | 0.608   0.152 | 0.549   0.149        | 0.662   0.149 | 0.481   0.232
19 May 2020       | 0.886   0.091        | 0.642   0.138 | 0.686   0.147 | 0.622   0.154        | 0.497   0.181 | 0.654   0.146
1 June 2020       | 0.875   0.105        | 0.722   0.145 | 0.652   0.158 | 0.553   0.178        | 0.445   0.164 | 0.503   0.186
16 June 2020      | 0.756   0.118        | 0.708   0.131 | 0.598   0.150 | 0.634   0.135        | 0.687   0.142 | 0.637   0.194
1 July 2020       | 0.804   0.104        | 0.692   0.143 | 0.711   0.172 | 0.402   0.193        | 0.569   0.197 | 0.519   0.211
16 July 2020      | 0.853   0.099        | 0.677   0.152 | 0.585   0.156 | 0.530   0.182        | 0.495   0.185 | 0.483   0.188
2 August 2020     | 0.766   0.121        | 0.639   0.172 | 0.735   0.135 | 0.646   0.148        | 0.462   0.183 | 0.522   0.247
15 August 2020    | 0.868   0.103        | 0.701   0.140 | 0.554   0.157 | 0.637   0.164        | 0.571   0.225 | 0.504   0.225
1 September 2020  | 0.841   0.087        | 0.592   0.159 | 0.773   0.127 | 0.592   0.176        | 0.601   0.159 | 0.476   0.195
16 September 2020 | 0.797   0.110        | 0.603   0.177 | 0.817   0.103 | 0.613   0.162        | 0.385   0.246 | 0.507   0.208
1 October 2020    | 0.857   0.083        | 0.647   0.133 | 0.839   0.119 | 0.557   0.167        | 0.493   0.192 | 0.538   0.173
16 October 2020   | 0.869   0.085        | 0.711   0.121 | 0.762   0.111 | 0.582   0.143        | 0.478   0.207 | 0.442   0.199
3 November 2020   | 0.894   0.096        | 0.739   0.132 | 0.722   0.138 | 0.608   0.156        | 0.537   0.213 | 0.596   0.151
15 November 2020  | 0.881   0.099        | 0.690   0.146 | 0.801   0.129 | 0.569   0.166        | 0.522   0.164 | 0.487   0.172
3 December 2020   | 0.875   0.101        | 0.728   0.165 | 0.784   0.131 | 0.661   0.173        | 0.624   0.148 | 0.423   0.210
15 December 2020  | 0.863   0.109        | 0.718   0.151 | 0.826   0.105 | 0.685   0.192        | 0.579   0.151 | 0.531   0.195
Table 3. Computational efficiency of the comparative models. Bold numbers indicate the least time-consuming method.

Method               | Processing/Training for 1 Epoch (s) | Testing (s) | Total (s)
Double-attention RNN | 2543.8                              | 105.2       | 2649.0
MLP                  | 3357.6                              | 278.6       | 3636.2
WHIT                 | 4970.3                              | -           | 4970.3
Linear interpolation | 1427.1                              | -           | 1427.1
SVM                  | 17,249.4                            | 350.7       | 17,600.1
RF                   | 15,493.5                            | 412.5       | 15,906.0