Article

Gully Erosion Monitoring Based on Semi-Supervised Semantic Segmentation with Boundary-Guided Pseudo-Label Generation Strategy and Adaptive Loss Function

Chunhui Zhao, Yi Shen, Nan Su, Yiming Yan and Yong Liu

1 College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
2 Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001, China
3 Heilongjiang Province Hydraulic Research Institute, Harbin 150001, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(20), 5110; https://doi.org/10.3390/rs14205110
Submission received: 2 September 2022 / Revised: 3 October 2022 / Accepted: 5 October 2022 / Published: 13 October 2022

Abstract

Gully erosion is a major threat to ecosystems, potentially leading to desertification, land degradation, and crop loss. Developing viable gully erosion prevention and remediation strategies requires regular monitoring of the gullies. Nevertheless, it is highly challenging to automatically obtain monitoring results for the latest monitoring data using models trained on historical data acquired by different sensors at different times. To this end, this paper presents a novel semi-supervised semantic segmentation method with a boundary-guided pseudo-label generation strategy and an adaptive loss function. The method takes full advantage of the labeled historical data and the unlabeled latest monitoring data to obtain the latest monitoring results of the gullies. The boundary-guided pseudo-label generation strategy (BPGS), guided by the inherent boundary maps of real geographic objects, fuses multiple evidence data to generate reliable pseudo-labels. Additionally, we propose an adaptive loss function based on centroid similarity (CSIM) to further alleviate the impact of pseudo-label noise. To verify the proposed method, two datasets for gully erosion monitoring were constructed from satellite data acquired over northeastern China. Extensive experiments demonstrate that the proposed method is more appropriate for automatic gully erosion monitoring than four state-of-the-art methods, including both supervised and semi-supervised methods.

1. Introduction

As one of the forms of soil degradation, gully erosion not only incises the land surface and nibbles away at fields but also washes out fertile soil and degrades the efficiency of large cultivators [1,2,3]. Regular investigation of gullies is conducive to understanding their evolution and formulating appropriate restoration strategies [4,5,6]. At present, gullies have been investigated in many countries around the world [7,8,9,10]. Nevertheless, these investigations mainly rely on field surveys or manual interpretation of aerial/satellite images, which require enormous human, material, and financial resources. In the past few years, artificial intelligence (AI) in the form of deep learning (DL) has proven able to automatically extract representative abstract features, achieving remarkable results in many fields [11,12]. Therefore, there is an urgent need for a DL-based monitoring method that automatically derives the latest monitoring results of the gullies from the existing historical data.
As shown in Figure 1, the automatic monitoring process of gully erosion is illustrated using northeastern China as an example. In this figure, DL denotes the historical data with labels, whose labels come from a previous manual investigation, whereas DU denotes the latest monitoring data, which is unlabeled and whose monitoring results need to be obtained automatically. DL and DU were acquired over the same area at different times. Because of the different acquisition times, it is also difficult to ensure that the two types of data were collected by the same sensors. As a result, the same geographic object may exhibit significant feature differences between DL and DU. Under these conditions, the automatic monitoring method aims to accurately obtain the latest monitoring results (the monitoring results of DU).
In the field of DL, supervised semantic segmentation methods based on convolutional neural networks (CNNs) have been extensively researched and applied [13,14,15]. For instance, Wang et al. [16] proposed a segmentation method based on object context and a boundary-enhanced loss for earthquake-damaged buildings, aimed at enhancing feature representation and refining the segmented boundaries of damaged buildings; Yu et al. [17] studied the impact of the attention mechanism on segmentation models and designed an attention-gates U-Network (AGs-Unet) for building segmentation; Zhu et al. [18] introduced a multiscale-aware and segmentation-prior conditional random field (MSCRF) model to solve the problem of excessive smoothing of building edges in segmentation results. In fact, for the latest monitoring data (test data), it is difficult to achieve the desired monitoring results relying only on historical data (training data) acquired by different sensors at different times.
To address this issue, we shift our focus to semi-supervised semantic segmentation methods. Compared with supervised methods, semi-supervised methods can provide pseudo-labels for the latest monitoring data to reduce the impact of feature differences [19,20]. Nevertheless, an inherent problem exists: pseudo-labels inevitably contain noise. Therefore, some methods filter the predictions by confidence [21,22,23,24]. In other words, only highly confident predictions are used as pseudo-labels, whereas ambiguous ones are discarded. Obviously, these methods ignore numerous correct predictions with low confidence. Thus, many improved methods have been proposed. For example, Zhang et al. [25] exploited virtual adversarial perturbation and density-aware entropy to find valuable low-confidence predictions as candidate samples; Wang et al. [26] argued that every pixel matters to model training and proposed category-wise queues of negative samples to exploit unreliable pixels; Yao et al. [27] measured the variance of pseudo-labels and regularized the network to learn from more confident pseudo-labels; He et al. [28] presented an effective distribution alignment and random sampling method to create unbiased pseudo-labels that match the true class distribution estimated from the labeled data.
Although many achievements have been made by semi-supervised methods, several limitations remain for high-precision gully erosion monitoring. (1) Due to the enormous differences between the historical data and the latest monitoring data, it is challenging to provide credible pseudo-labels for the latest monitoring data; thus, how to mine more valuable information to improve the reliability of pseudo-labels is a limitation that needs to be addressed. (2) Pseudo-labels inevitably contain noise, and even state-of-the-art methods cannot fully solve this problem; therefore, it is necessary to further reduce the impact of pseudo-label noise. In response to these limitations, a novel semi-supervised semantic segmentation method with a boundary-guided pseudo-label generation strategy and an adaptive loss function is proposed. The main contributions of this paper are summarized as follows:
(1) To improve the reliability of pseudo-labels, we propose a boundary-guided pseudo-label generation strategy (BPGS), which is composed of an object boundary generator and a multi-evidence fusion strategy. First, an object boundary generator based on CNNs and superpixel segmentation is proposed to output the boundary maps of geographic objects, exploiting the structural prior information and neighborhood correlation of pixels. On this basis, guided by these boundary maps, a multi-evidence fusion strategy is designed to fully utilize the historical labels as well as the outputs of the student model and the teacher model to generate high-quality pseudo-labels.
(2) To alleviate the impact of pseudo-label noise, an adaptive loss function based on centroid similarity is developed. In this loss function, centroid similarity (CSIM) is designed to measure the reliability of pseudo-labels and adjust the loss value, so the loss is weighted lower when the reliability is lower. Consequently, the loss function tolerates pseudo-label noise during the training process.
(3) To validate our method, two benchmark datasets for gully erosion monitoring are constructed from satellite data acquired over northeastern China.
The rest of the paper is organized as follows: Section 2 introduces the proposed method in detail; Section 3 displays the datasets, experiments, and results; Section 4 presents a discussion on this paper; Section 5 draws the conclusions and offers some prospects.

2. Method

For gully erosion monitoring, we propose a semi-supervised semantic segmentation method with a boundary-guided pseudo-label generation strategy and an adaptive loss function, illustrated in Figure 2. Our method contains two models with the same backbone: a student model (Ψ_A in Figure 2) and a teacher model (Ψ_B in Figure 2). The historical data with labels (DL) are fed directly into Ψ_A for supervised training. For the latest monitoring data (DU), however, the training process consists of four steps. First, we employ the designed object boundary generator to obtain the boundary maps of DU. Next, by using Ψ_A and Ψ_B to predict DU, we obtain the predictions and confidences of the two models for DU. Then, guided by the boundary maps, a multi-evidence fusion strategy combines the historical labels and the outputs of the two models to generate reliable pseudo-labels. Finally, the proposed centroid similarity automatically adjusts the loss value by measuring the reliability of the pseudo-labels during training. With the above operations, the training of Ψ_A is completed. Furthermore, the proposed method is converted into an iterative training framework for better performance, detailed in Section 2.3.

2.1. Boundary-Guided Pseudo-Label Generation Strategy

This section describes the process of generating high-quality pseudo-labels through the proposed BPGS. The BPGS consists of an object boundary generator and a multi-evidence fusion strategy, corresponding to serial numbers 1–3 in Figure 2. The following subsections describe these two components in detail.

2.1.1. Object Boundary Generator

Most semi-supervised methods ignore structural prior information and pixel neighborhood information when generating pseudo-labels, thus often producing noisy predictions around object boundaries. To address this, we propose the object boundary generator to extract the boundary maps of geographic objects, providing guidance for the subsequent pseudo-label generation.
An object instance often exhibits similar color or texture features [29,30], so grouping spatially continuous pixels with similar features into the same object is a reasonable strategy. Based on this principle, we first design a dual attention network based on CNNs to extract pixels with similar features, in which two attention modules enhance the ability to focus on the gullies. On this basis, a superpixel segmentation method, SLIC [31,32], is applied to further identify spatially continuous pixels. The specific flowchart of the object boundary generator is shown in Figure 3.
As shown in Figure 3, the input is an image of gully erosion, and the output is the boundary map of this image. We first feed the image into the dual attention network, which aggregates pixels with similar features into objects. Then, the SLIC method is employed to refine the output of the dual attention network by mining the spatial relationships between pixels. The refined results are then used as labels to update the parameters of the dual attention network. These operations are repeated until the set termination condition is reached (see Section 3.2.2). Finally, we extract the boundary pixels of the final refined result as the boundary map. To illustrate the object boundary generator in more detail, we next introduce the dual attention network and the SLIC refinement.
(1) Dual attention network: As shown in Figure 3, the dual attention network consists of three parts, taking an image as input and the predictions for this image as output. The first part is feature encoding, consisting of three convolution blocks; each block includes a convolution layer, a batch normalization layer, and a ReLU activation layer. The second part is dual attention enhancement, which helps focus on the information of interest and suppress irrelevant background information; in this part, we adopt a channel attention module and a spatial attention module, as shown in Figure 4. In the third part, a convolution layer, a batch normalization layer, and an argmax operation decode the features. With these three parts, pixels with similar features can be aggregated into an object.
(2) SLIC refinement: The dual attention network aims to aggregate pixels with similar features into one object. However, it is also preferable for objects to be spatially continuous [29], and the SLIC method considers the spatial correlation of pixels when generating irregular pixel regions (superpixels). Therefore, we adopt the SLIC method to refine the output of the dual attention network. Specifically, we first extract the superpixel set $S = \{S_o\}_{o=1}^{O}$ from the input image, where O is the total number of superpixels and $S_o$ denotes the set of indices of the pixels belonging to the oth superpixel. Then, from the predictions of the dual attention network and $S_o$, we obtain $L_o$, the predictions for the pixels belonging to $S_o$. Next, we assign the most frequent category in $L_o$ to all pixels of $S_o$. These steps are iterated until all superpixels have been updated, so that continuous pixels with similar features are aggregated into the same object, as sketched below.

2.1.2. Multi-Evidence Fusion Strategy

Although the object boundary generator is able to exploit the structural prior information and neighborhood correlations of pixels, more useful information should also be mined to improve the reliability of pseudo-labels. Compared with other disasters, such as earthquakes and floods, most gullies do not change significantly in the short term. As shown in Figure 1, although DL and DU were acquired over the same area at different times, the gullies in them had not altered dramatically. This means that the historical labels contain a wealth of valuable information for generating pseudo-labels. Meanwhile, some scholars argue that different model initializations can help models describe the same data from different aspects or perspectives and significantly improve performance [33]. Inspired by this view, Ψ_A and Ψ_B were pre-trained on the historical data from two different initializations, thus providing different supports for the pseudo-labels. In summary, we take the historical labels and the outputs of the two differently initialized models as evidence data for generating pseudo-labels. However, introducing more evidence also brings evidence conflict: when determining whether a pixel belongs to gully erosion, the three sources of evidence may give conflicting conclusions, and the experimental results in Section 4.1 verify that simply taking the intersection of all evidence data does not achieve reliable results. Therefore, a multi-evidence fusion strategy is designed to combine the three sources of evidence in a holistic way.
The multi-evidence fusion strategy is based on the Dempster–Shafer (D–S) evidence theory [34,35,36] and generates reliable pseudo-labels under the guidance of the boundary maps. Specifically, for an input image, we first denote its boundary map as $R = \{R_i\}_{i=1}^{I}$, where I is the total number of objects in the boundary map and $R_i$ is the ith object. According to the D–S evidence theory, the decision fusion frame for any object $R_i$ can be defined as Θ = {GE, NG}, where GE and NG denote gully erosion and non-gully erosion, respectively. Thus, the non-empty subsets ψ of Θ are {GE}, {NG}, and {GE, NG}.
Then, we define the basic probability assignment formula (BPAF) of ψ as m(ψ), obtained by combining the G sources of evidence with Dempster's rule:

$$m(\psi)=\frac{\sum_{\bigcap_{g}\psi_{g}=\psi}\ \prod_{g=1}^{G}m_{g}(\psi_{g})}{\sum_{\bigcap_{g}\psi_{g}\neq\varnothing}\ \prod_{g=1}^{G}m_{g}(\psi_{g})}\qquad(1)$$

where G is the total number of evidence sources and $m_g(\cdot)$ is the BPAF of the gth evidence. On this basis, the corresponding BPAF of $R_i$ is established through the following equations:

$$m_i^g[\{GE\}]=C_i^g\times P_i^g\qquad(2)$$
$$m_i^g[\{NG\}]=(1-C_i^g)\times(1-P_i^g)\qquad(3)$$
$$m_i^g[\{GE,NG\}]=C_i^g+P_i^g-2\,C_i^g P_i^g\qquad(4)$$
where $C_i^g$ and $P_i^g$ are the overall confidence indicators that $R_i$ belongs to GE according to the gth evidence, computed as

$$C_i^g=\frac{1}{N(R_i)}\sum_{q=1}^{N(R_i)}E_i^{g,q}\qquad(5)$$
$$P_i^g=\frac{\phi_i^g}{N(R_i)}\qquad(6)$$

Here, $N(R_i)$ is the total number of pixels of $R_i$; $E_i^{g,q}$ is the confidence that the qth pixel of $R_i$ belongs to GE in the projection of $R_i$ onto the confidences of the gth evidence; and $\phi_i^g$ is the number of pixels belonging to GE in the projection of $R_i$ onto the predictions of the gth evidence. When the historical labels are used as evidence to calculate $E_i^{g,q}$, we set the confidence to 1 for pixels belonging to GE and 0 for pixels belonging to NG.
Finally, $m_i[\{GE\}]$, $m_i[\{NG\}]$, and $m_i[\{GE,NG\}]$ are calculated using Equations (1)–(6). If $m_i[\{GE\}] > m_i[\{NG\}]$ and $m_i[\{GE\}] > m_i[\{GE,NG\}]$ are both satisfied, $R_i$ is assigned to gully erosion; otherwise, $R_i$ is assigned to non-gully erosion. Iterating over all objects of R yields the pseudo-label of the input image, as sketched below.
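To make the fusion concrete, the following minimal sketch evaluates Equations (1)–(6) for a single object $R_i$ under the two-class frame Θ. The function and variable names are illustrative; the combination step is the standard Dempster rule specialized to the subsets {GE}, {NG}, and {GE, NG}.

```python
def bpaf(C, P):
    """Basic probability assignment of one evidence source (Equations (2)-(4));
    C and P are the per-object indicators of Equations (5)-(6)."""
    return {'GE': C * P,
            'NG': (1.0 - C) * (1.0 - P),
            'GE_NG': C + P - 2.0 * C * P}

def ds_combine(m1, m2):
    """Dempster's rule on the frame {GE, NG}: intersect focal elements,
    multiply masses, and renormalize by the non-conflicting mass."""
    conflict = m1['GE'] * m2['NG'] + m1['NG'] * m2['GE']
    norm = 1.0 - conflict
    return {'GE': (m1['GE'] * m2['GE'] + m1['GE'] * m2['GE_NG']
                   + m1['GE_NG'] * m2['GE']) / norm,
            'NG': (m1['NG'] * m2['NG'] + m1['NG'] * m2['GE_NG']
                   + m1['GE_NG'] * m2['NG']) / norm,
            'GE_NG': m1['GE_NG'] * m2['GE_NG'] / norm}

def is_gully(indicators):
    """indicators: list of (C, P) pairs, one per evidence source
    (historical labels, student model, teacher model)."""
    masses = [bpaf(C, P) for C, P in indicators]
    m = masses[0]
    for nxt in masses[1:]:
        m = ds_combine(m, nxt)
    # Decision rule from the text: R_i is gully erosion only if m({GE})
    # exceeds both m({NG}) and m({GE, NG}).
    return m['GE'] > m['NG'] and m['GE'] > m['GE_NG']
```

Because Dempster's rule is commutative and associative, folding it over the three sources in this way does not depend on the order of combination.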

2.2. The Adaptive Loss Function Based on Centroid Similarity

Since it is unlikely that all evidence data will make mistakes at the same time, their similarity also reflects the reliability of the pseudo-labels. Moreover, the centroid reflects the mass distribution of an object, so the closer a pixel lies to the centroid, the more reliable it is as evidence for monitoring [37,38]. Thus, the similarity measure should focus more on the representative pixels of each object. Based on the above analysis, the adaptive loss function based on CSIM is proposed. The adaptive loss ζ consists of a supervised loss $\zeta_{sl}$ and a pseudo-supervised loss $\zeta_{pl}$:
$$\zeta=\zeta_{sl}+\zeta_{pl}$$
The supervised loss $\zeta_{sl}$ is the standard binary cross-entropy loss:

$$\zeta_{sl}=-\frac{1}{|D_L|}\sum_{Z\in D_L}\frac{1}{W\times H}\sum_{r=1}^{W\times H}\left[\upsilon_r^Z\log\left(F_r^Z\right)+\left(1-\upsilon_r^Z\right)\log\left(1-F_r^Z\right)\right]$$
where Z is an image in DL, and W and H represent the width and height of Z, respectively. $F_r^Z\in[0,1]$ and $\upsilon_r^Z\in\{0,1\}$ denote the prediction and label for the rth pixel of Z, respectively; a pixel's label is set to 1 if it belongs to a gully and 0 otherwise. Similarly, the pseudo-supervised loss $\zeta_{pl}$ on DU can be represented as

$$\zeta_{pl}=-\frac{1}{|D_U|}\sum_{X\in D_U}\frac{1}{W\times H}\sum_{r=1}^{W\times H}\left[\upsilon_r^X\log\left(F_r^X\right)+\left(1-\upsilon_r^X\right)\log\left(1-F_r^X\right)\right]\times CSIM_r^X$$
where X is an image in DU, and $F_r^X\in[0,1]$ and $\upsilon_r^X\in\{0,1\}$ represent the prediction and pseudo-label for the rth pixel of X, respectively. $CSIM_r^X$ is the CSIM for the rth pixel of X, i.e., the reliability of its pseudo-label.
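As an illustration, a minimal PyTorch sketch of this CSIM-weighted pseudo-supervised loss is given below, assuming the network outputs per-pixel sigmoid probabilities and the per-pixel CSIM map has already been computed; all tensor names are illustrative.

```python
import torch

def pseudo_supervised_loss(pred, pseudo, csim, eps=1e-7):
    """CSIM-weighted binary cross-entropy over a batch of images.

    pred:   B x H x W sigmoid outputs F in (0, 1)
    pseudo: B x H x W binary pseudo-labels (1 = gully erosion)
    csim:   B x H x W per-pixel CSIM reliability weights
    """
    pred = pred.clamp(eps, 1.0 - eps)  # numerical stability of the logs
    bce = -(pseudo * torch.log(pred) + (1.0 - pseudo) * torch.log(1.0 - pred))
    # Down-weight pixels whose pseudo-labels are judged unreliable.
    return (bce * csim).mean()
```

The total adaptive loss then adds the standard supervised term computed on DL.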
The computation of $CSIM_r^X$ is detailed below. For ease of introduction, we let the rth pixel of X be pixel j and let its projection on the boundary map belong to $R_i$. Meanwhile, $R_i^X$ and $R_i^T$ represent the projections of $R_i$ on X and on the historical labels, respectively, while $R_i^A$ and $R_i^B$ represent the projections of $R_i$ on the predictions of Ψ_A and Ψ_B for X, respectively. On this basis, the specific steps for calculating $CSIM_r^X$ are as follows, with a code sketch after Step 5.
Step 1: Define (ε, μ) as the centroid coordinates of $R_i^X$, calculated as

$$\varepsilon=\frac{\sum_{q=1}^{N(R_i^X)}b_q\,\varepsilon_q}{\sum_{q=1}^{N(R_i^X)}b_q},\qquad\mu=\frac{\sum_{q=1}^{N(R_i^X)}b_q\,\mu_q}{\sum_{q=1}^{N(R_i^X)}b_q}$$
where $N(R_i^X)$ is the total number of pixels of $R_i^X$, $b_q$ is the pixel value of the qth pixel in $R_i^X$, and $\varepsilon_q$ and $\mu_q$ are the X-axis and Y-axis coordinates of the qth pixel, respectively.
Step 2: Compute the Euclidean distance of each pixel in $R_i^X$ from the centroid, obtaining the centroid distance set ED.
Step 3: Constrain the pixel values with the centroid distance, defined as

$$R_i^{A*}(j)=R_i^{A}(j)\times e^{\,1-ED(j)/\max(ED)}$$

where $e^{(\cdot)}$ is the exponential function, $R_i^A(j)$ is the value of pixel j in $R_i^A$, $R_i^{A*}(j)$ is the constrained value, and ED(j) is the distance from pixel j to the centroid. The centroid distance constraint $e^{\,1-ED(j)/\max(ED)}$ varies with the distance to the centroid and reaches its maximum at the centroid. Therefore, it amplifies the contribution of the predictions around the centroid (the more representative predictions) and correspondingly attenuates the effect of predictions far from the centroid.
Step 4: Repeating Step 3 for each pixel of $R_i^A$ yields the constrained object $R_i^{A*}$. Similarly, $R_i^{B*}$ and $R_i^{T*}$ can be obtained.
Step 5: Calculate the reliability of the rth pixel as

$$CSIM_r^X=(1-\kappa)\times SM_i^{AB*}+\kappa\times\frac{SM_i^{AT*}+SM_i^{BT*}}{2}\qquad(11)$$

where $SM_i^{AB*}$ is the structural similarity (SSIM) between $R_i^{A*}$ and $R_i^{B*}$; the specific calculation of SSIM can be found in [39,40]. $SM_i^{AT*}$ and $SM_i^{BT*}$ are calculated in the same way. Furthermore, κ is the similarity regulation indicator: when κ is small, the student model focuses more on the predictions of Ψ_A and Ψ_B; otherwise, it pays more attention to the historical labels.
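For illustration, the following minimal sketch computes $CSIM_r^X$ for one object, assuming the projections $R_i^A$, $R_i^B$, and $R_i^T$ are extracted as equally sized 2-D float crops and the image projection $R_i^X$ is a grayscale crop of at least 7 × 7 pixels (the default SSIM window). The SSIM comes from scikit-image; the names and the data_range setting are assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def centroid_constrain(region, patch):
    """Weight a projection by the centroid distance constraint (Step 3)."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    total = patch.sum()
    cy = (patch * ys).sum() / total                # centroid row (Step 1)
    cx = (patch * xs).sum() / total                # centroid column (Step 1)
    ed = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)  # distance set ED (Step 2)
    return region * np.exp(1.0 - ed / ed.max())    # constraint peaks at centroid

def csim(R_A, R_B, R_T, patch, kappa=0.6):
    """Combine constrained SSIM values as in Equation (11) (Step 5)."""
    A, B, T = (centroid_constrain(r, patch) for r in (R_A, R_B, R_T))
    d = float(np.e)                                # constrained values lie in [0, e]
    sm_ab = ssim(A, B, data_range=d)
    sm_at = ssim(A, T, data_range=d)
    sm_bt = ssim(B, T, data_range=d)
    return (1.0 - kappa) * sm_ab + kappa * (sm_at + sm_bt) / 2.0
```

Every pixel of the object shares this object-level reliability, which then weights its term in the pseudo-supervised loss.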

2.3. Model Training and Testing

2.3.1. Iterative Training Framework

In this section, our method is converted into an iterative training framework for better performance. Either Ψ_A or Ψ_B can be used as the student model first; the pseudo-code with Ψ_A as the first student model is shown in Algorithm 1.
Algorithm 1 Iterative Training Framework
Input: Historical data (labeled data) DL, the latest monitoring data (unlabeled data) DU.
Output: Trained models Ψ_A^V and Ψ_B^V.
1  Initialize Ψ_A^0 and Ψ_B^0 with different pre-trained weights.
2  Train Ψ_A^0 on DL.
3  Train Ψ_B^0 on DL.
4  for n ∈ {1, …, V} do
5      Predict on DU with Ψ_A^{n−1}.
6      Predict on DU with Ψ_B^{n−1}.
7      Use BPGS to fuse the historical labels and the outputs of Ψ_A^{n−1} and Ψ_B^{n−1} into the pseudo-labeled data DPL.
8      Fine-tune Ψ_A^n from Ψ_A^{n−1} on both DL and the latest DPL with the adaptive loss ζ.
9      Predict on DU with Ψ_A^n.
10     Use BPGS to fuse the historical labels and the outputs of Ψ_A^n and Ψ_B^{n−1} into the pseudo-labeled data DPL.
11     Fine-tune Ψ_B^n from Ψ_B^{n−1} on both DL and the latest DPL with the adaptive loss ζ.
12 end for

2.3.2. Testing Process

In the testing phase, we first employ Ψ_A^V and Ψ_B^V to predict the test images. Then, the fusion results of the two models are obtained using Equations (1)–(6). Furthermore, since the test labels are pixel-wise and include boundary pixels, a morphological closing operation is applied to the fusion results. With these steps, we obtain the final monitoring results.
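As an illustration, the closing operation could be applied with OpenCV as in the sketch below; the 5 × 5 kernel size is an assumption rather than a value reported in our experiments.

```python
import cv2
import numpy as np

def close_fusion(fusion, kernel_size=5):
    """Morphological closing (dilation then erosion) of a binary fusion
    result (uint8 array with gully pixels set to 1), filling small holes
    and thin gaps along the recovered gully boundaries."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.morphologyEx(fusion, cv2.MORPH_CLOSE, kernel)
```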

3. Experiments and Results

3.1. Datasets

3.1.1. Dataset Description

Due to the lack of publicly available datasets for gully erosion monitoring, we constructed two benchmark datasets (HC2012 and HC2020) to verify the performance of the proposed method. The HC2012 dataset was constructed from remote-sensing images of Huachuan County, Heilongjiang Province, China, collected by the ZY-3 satellite in 2012. The images include a panchromatic band and multispectral (blue, green, red, and near-infrared) bands, with spatial resolutions of 2.1 m and 5.8 m, respectively. Using ENVI software, the images were fused into pan-sharpened RGB images with a spatial resolution of 2.1 m, as shown in Figure 5a. The HC2020 dataset was constructed from same-site (Huachuan County) remote-sensing images collected by the GF-2 satellite in 2020. These images also cover the panchromatic and multispectral bands, with spatial resolutions of 1 m and 4 m, respectively, and were fused into pan-sharpened RGB images with a spatial resolution of 1 m, as shown in Figure 5b. In addition, the pan-sharpened RGB images of the two datasets were rigorously co-registered to facilitate the implementation of the proposed method.

3.1.2. Label of Dataset

Considering the hardware constraints while preserving as much spatial context of the gullies as possible, we partitioned the original pan-sharpened RGB images into sub-images of 384 × 384 pixels. As a result, the HC2012 dataset contains 981 samples. Since the two datasets were co-registered, the HC2020 dataset contains the same number of samples with identical sizes. We annotated these samples with the assistance of the water department. Figure 6 shows a sample from the HC2020 dataset and its corresponding label.

3.2. Experimental Design and Implementation Details

3.2.1. Experimental Design

To evaluate the proposed method, we designed two experiments. In Experiment 1 (HC2012 to HC2020), the HC2012 dataset was employed as the training set (historical data), while the HC2020 dataset was used as the test set (the latest monitoring data). In Experiment 2 (HC2020 to HC2012), we reversed Experiment 1; that is, the HC2020 dataset and the HC2012 dataset were treated as the training set and the test set, respectively. Moreover, in line with practical monitoring needs, we also provided the images of the test set as unlabeled data to the semi-supervised methods, whereas their labels were only used in the test phase to calculate the evaluation metrics. Carrying out these two experiments helps validate the effectiveness and generality of our method.
In these two experiments, our method was compared with four state-of-the-art methods: two supervised methods, Bisenetv2 [41] and Segformer [42], and two semi-supervised methods, U2PL [26] and DMT [33]. All comparison methods were reproduced using the original authors' open-source code. Due to different data sources and hardware conditions, we made necessary modifications, such as image size, batch size, and number of categories. To ensure fairness, the same data augmentation, learning rate, number of iterations, and optimizer were used, whereas the other hyperparameters followed the settings recommended by the original authors. Moreover, we evaluated the methods with four indicators: precision, recall, F1 score, and intersection over union (IoU); their calculation formulae can be found in Refs. [43,44,45,46], and a sketch of their computation is given below. Compared with precision and recall, the F1 score and IoU are comprehensive indicators, so we pay more attention to them in the experiments.
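For reference, the four indicators can be computed from the pixel-wise confusion counts as in the following minimal sketch (array names are illustrative, and non-empty masks are assumed so that no denominator is zero):

```python
import numpy as np

def evaluate(pred, label):
    """Precision, recall, F1 score, and IoU for binary masks (1 = gully)."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.logical_and(pred, label).sum()    # true positives
    fp = np.logical_and(pred, ~label).sum()   # false positives
    fn = np.logical_and(~pred, label).sum()   # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou
```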

3.2.2. Implementation Details

The proposed method was implemented with the PyTorch 1.7.0 framework in an Ubuntu 18.04 environment. All experiments were performed on a Dell Precision T7920 workstation equipped with an Intel Xeon Silver 4210R CPU, 40 GB of RAM, and an Nvidia GeForce RTX 3090 GPU with 24 GB of memory. In the experiments, Ψ_A^0 and Ψ_B^0 both used DeepLab-v3 with ResNet-101 as the backbone but were initialized with pre-trained weights from COCO [47] and ImageNet [48], respectively. In the training phase, Ψ_A was used as the student model first, V was set to 5, each iteration comprised 20 epochs, and the SGD optimizer with the poly learning rate schedule was used to optimize the network. Common data augmentation methods were adopted, including horizontal flipping, random rotation within the angle range of [−15°, 15°], and random scaling with factors between 0.75 and 1.5. The termination condition of the object boundary generator in Section 2.1.1 was 128 iterations. Furthermore, the similarity regulation indicator κ in Equation (11) was set to 0.6; a detailed analysis can be found in Section 4.2.

3.3. Experiment 1: HC2012 to HC2020

In Experiment 1, the HC2012 dataset was employed as the historical data (training set), while the HC2020 dataset was treated as the latest monitoring data (test set). Figure 7 displays the latest monitoring data and the corresponding labels. Among them, some representative lands are marked with orange, yellow, blue, and red rectangles. On this basis, the monitoring results of five different methods for representative lands are shown in Figure 8.
As shown in Figure 8, two supervised methods (Bisenetv2 and Segformer) manifest poor performance in all representative lands, which is consistent with the previous analysis of supervised methods. Compared with supervised methods, semi-supervised methods can provide pseudo-labels for the latest monitoring data to reduce the impact of feature differences. Thus, three semi-supervised methods (U2PL, DMT, and the proposed method) monitor most of the gully erosion pixels. However, there are plenty of false positive pixels in the results of U2PL and DMT. For example, in the field roads (the golden rectangles in the second and third rows of Figure 8), wasteland (the golden rectangle in the first row), and the edge regions of the gullies (the golden rectangle in the fourth row), U2PL and DMT all present large areas of false positive pixels. These results suggest that it is difficult to acquire reliable monitoring results through the teacher model alone. With the introduction of BPGS and the adaptive loss function, the proposed method can better handle the impact of feature differences on automatic monitoring, thus presenting the best visual results.
To further evaluate the performance of the five methods, precision, recall, F1 score, and IoU were selected as the evaluation indicators for gully erosion monitoring. The quantitative evaluation results of the five methods, averaged over three runs, are shown in Table 1.
As shown in Table 1, the proposed method outperforms the other methods in terms of IoU, precision, and F1 score. In particular, it raises the IoU by 22.1% compared with the second-ranked method (DMT). As state-of-the-art supervised methods, Bisenetv2 and Segformer perform poorly on all evaluation indicators, which is consistent with the visual analysis. Moreover, the recalls of U2PL and DMT are higher than 83%, while their precisions are lower than 45%, suggesting that these two methods improve recall by sacrificing precision. Thus, they perform worse than the proposed method on the more balanced indicators (F1 score and IoU). In contrast, BPGS generates reliable pseudo-labels and the adaptive loss function reduces the impact of pseudo-label noise, so the proposed method acquires the best quantitative evaluation results. In summary, our method is more competent for complex gully erosion monitoring than the other methods.

3.4. Experiment 2: HC2020 to HC2012

In Experiment 2, the HC2020 dataset and the HC2012 dataset were employed as the historical data and the latest monitoring data, respectively. Figure 9 displays the latest monitoring data and the corresponding labels. On this basis, the monitoring results of five different methods for representative lands are shown in Figure 10.
As shown in Figure 10, the supervised methods display massive false negative results, whereas the three semi-supervised methods monitor most of the gully erosion pixels, similar to Experiment 1. In the golden rectangle in the first row of Figure 10, the three semi-supervised methods achieve satisfactory results, while the supervised ones still produce a large number of false negative pixels. These results again justify the choice of semi-supervised methods for gully monitoring. When monitoring field roads with characteristics similar to gullies (the golden rectangles in the second and third rows of Figure 10), U2PL and DMT show false positive results, while the proposed method monitors them correctly. Meanwhile, compared with Experiment 1, the edge regions of the gullies are more likely to show errors due to the reduced image resolution. For example, in the golden rectangle in the fourth row of Figure 10, the three semi-supervised methods display different degrees of false positive and false negative results. Nevertheless, the proposed method still achieves the best visual results in this region. To further evaluate the performance of the methods, the quantitative evaluation results of the five methods, averaged over three runs, are shown in Table 2.
As shown in Table 2, the proposed method outperforms the other methods in terms of IoU, precision, and F1 score. It improves the IoU by 18.3% and the F1 score by 6% in comparison with the second-ranked method (DMT). Due to the feature differences between the historical data and the latest monitoring data, Bisenetv2 and Segformer again perform poorly on all evaluation indicators. U2PL and DMT exhibit evaluation results similar to those in Experiment 1, i.e., higher recall and lower precision. In addition, compared with Experiment 1, the IoUs of all methods degrade to different degrees due to the reduced image resolution. Nevertheless, because BPGS merges more valuable information to generate reliable pseudo-labels and the adaptive loss further reduces the impact of pseudo-label noise, the IoU of the proposed method remains higher than 60%. As a result, the proposed method better addresses the impact of feature differences on automatic monitoring and shows a clear advantage in complex gully erosion monitoring tasks.

4. Discussion

4.1. Ablation Study

To analyze the effectiveness of BPGS and the adaptive loss function in the proposed method, an ablation study was conducted. First, we constructed a baseline method: based on the iterative framework in Section 2.3, the baseline uses the intersection of the evidence data as the pseudo-labels of the latest monitoring data and applies the standard binary cross-entropy loss function. Taking the IoU as the evaluation criterion, the evaluation results of the ablation study, averaged over three runs, are presented in Table 3.
As shown in Table 3, Baseline + BPGS improves the IoU by 11.4% and 9.7% in Experiments 1 and 2, respectively, compared with Baseline. This demonstrates that the proposed BPGS effectively improves the reliability of pseudo-labels by mining more useful information. Additionally, compared with Baseline + BPGS, the proposed method (Baseline + BPGS + Adaptive Loss) attains preferable results. Hence, it is feasible to adjust the loss function according to the similarity of the evidence data, thereby effectively mitigating the impact of pseudo-label noise. Based on the above analysis, it can be concluded that BPGS and the adaptive loss function are both effective and necessary.

4.2. Analysis of the Setting of Similarity Regulation Indicator

In the adaptive loss function, the similarity regulation indicator κ in Equation (11) determines the level of attention paid to the different evidence data. To clarify how κ should be set, the relationship between κ and the IoU is analyzed, as shown in Figure 11.
In Figure 11, the horizontal axis is the similarity regulation indicator κ (sampled at intervals of 0.05), the vertical axis is the IoU, and the results of the two experiments are plotted as two curves with different styles; the highest IoU of each experiment is marked separately. As κ increases, the IoU curves of both experiments first rise gradually and then decline after reaching their peaks. The peaks occur at κ = 0.6 and κ = 0.55, with IoUs of 64.5% and 60.9% in Experiment 1 and Experiment 2, respectively. The detailed κ–IoU values of the two experiments are listed in Table 4.
Analyzing Table 4, we find that when κ is set to 0.6, the IoU in Experiment 2 reaches 60.4%, only 0.5% lower than the corresponding highest IoU. That is to say, ideal monitoring results can be obtained in both experiments by setting κ to 0.6. Therefore, to avoid excessive hyperparameter tuning, we advise setting the similarity regulation indicator κ directly to 0.6 in practical applications.

5. Conclusions

In this paper, we propose a semi-supervised semantic segmentation method with a boundary-guided pseudo-label generation strategy and an adaptive loss function. To the best of our knowledge, this is the first paper to implement automatic gully erosion monitoring from data acquired by different sensors at different times. In this method, the boundary-guided pseudo-label generation strategy (BPGS), composed of the object boundary generator and the multi-evidence fusion strategy, is designed to enhance the reliability of pseudo-labels. Meanwhile, the adaptive loss function based on centroid similarity (CSIM) is proposed to further alleviate the impact of pseudo-label noise. Two experiments carried out on the HC2012 and HC2020 datasets show that the proposed method copes with the impact of feature differences on automatic monitoring better than other state-of-the-art methods, achieving IoUs above 64% and 60%, respectively. Therefore, our method is more suitable for complex gully erosion monitoring. Furthermore, the ablation study demonstrates that BPGS and the adaptive loss function are effective and necessary. In future work, we will further improve the performance and reliability of gully erosion monitoring, thereby continuing to contribute to research on automatic gully erosion monitoring.

Author Contributions

Conceptualization, C.Z. and N.S.; methodology, Y.S. and N.S.; software, Y.S. and Y.Y.; validation, Y.S., Y.Y. and Y.L.; formal analysis, C.Z. and Y.S.; data curation, C.Z. and Y.L.; writing—original draft preparation, Y.S. and N.S.; writing—review and editing, C.Z. and Y.Y.; supervision, C.Z., N.S. and Y.L.; project administration, C.Z. and N.S.; funding acquisition, C.Z. and N.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62071136, No. 62271159, No. 62002083, No. 61971153); Heilongjiang Outstanding Youth Foundation (YQ2022F002); Heilongjiang Postdoctoral Foundation (LBH-Q20085 and LBH-Z20051); Fundamental Research Funds for the Central Universities Grant (3072022QBZ0805, 3072021CFT0801 and 3072022CF0808).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

All authors have reviewed the manuscript and approved its submission to this journal. The authors declare no conflicts of interest regarding the publication of this article and no self-citations in the manuscript.

References

1. Valentin, C.; Poesen, J.; Li, Y. Gully Erosion: Impacts, Factors and Control. Catena 2005, 63, 132–153.
2. Hitouri, S.; Varasano, A.; Mohajane, M.; Ijlil, S.; Essahlaoui, N.; Ali, S.A.; Essahlaoui, A.; Pham, Q.B.; Waleed, M.; Palateerdham, S.K.; et al. Hybrid Machine Learning Approach for Gully Erosion Mapping Susceptibility at a Watershed Scale. ISPRS Int. J. Geo-Inf. 2022, 11, 401.
3. Wang, Z.; Zhang, G.; Wang, C.; Xing, S. Assessment of the Gully Erosion Susceptibility Using Three Hybrid Models in One Small Watershed on the Loess Plateau. Soil Tillage Res. 2022, 223, 105481.
4. Kong, H.; Wu, D.; Yang, L. Quantification of Soil Erosion in Small Watersheds on the Loess Plateau Based on a Modified Soil Loss Model. Water Supply 2022, 22, 6308–6320.
5. Rafique, N.; Bhat, M.S.; Muntazari, T.H. Identification and Mapping of Land Degradation through Remote Sensing in Budgam District of Jammu and Kashmir, India. Indian J. Ecol. 2022, 49, 602–606.
6. Wang, R.; Sun, H.; Yang, J.; Zhang, S.; Fu, H.; Wang, N.; Liu, Q. Quantitative Evaluation of Gully Erosion Using Multitemporal UAV Data in the Southern Black Soil Region of Northeast China: A Case Study. Remote Sens. 2022, 14, 1479.
7. Liu, G.; Zheng, F.; Wilson, G.V.; Xu, X.; Liu, C. Three decades of ephemeral gully erosion studies. Soil Tillage Res. 2021, 212, 105046.
8. Slimane, A.B.; Raclot, D.; Rebai, H.; Le Bissonnais, Y.; Planchon, O.; Bouksila, F. Combining field monitoring and aerial imagery to evaluate the role of gully erosion in a Mediterranean catchment (Tunisia). Catena 2018, 170, 73–83.
9. Evans, M.; Lindsay, J. High resolution quantification of gully erosion in upland peatlands at the landscape scale. Earth Surf. Proc. Land. 2010, 35, 876–886.
10. Li, H.; Cruse, R.M.; Bingner, R.L.; Gesch, K.R.; Zhang, X. Evaluating ephemeral gully erosion impact on Zea mays L. yield and economics using AnnAGNPS. Soil Tillage Res. 2016, 155, 157–165.
11. Guo, Y.; Liu, Y.; Georgiou, T.; Lew, M.S. A review of semantic segmentation using deep neural networks. Int. J. Multimed. Inf. Retr. 2018, 7, 87–93.
12. Shivappriya, S.N.; Priyadarsini, M.J.P.; Stateczny, A.; Puttamadappa, C.; Parameshachari, B.D. Cascade object detection and remote sensing object detection method based on trainable activation function. Remote Sens. 2021, 13, 200.
13. Xie, S.; Hu, H. Facial Expression Recognition Using Hierarchical Features with Deep Comprehensive Multipatches Aggregation Convolutional Neural Networks. IEEE Trans. Multimed. 2018, 21, 211–220.
14. Song, J.; Gao, S.; Zhu, Y.; Ma, C. A Survey of Remote Sensing Image Classification Based on CNNs. Big Earth Data 2019, 3, 232–254.
15. Wang, G.; Fan, B.; Xiang, S.; Pan, C. Aggregating rich hierarchical features for scene classification in remote sensing imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4104–4115.
16. Wang, C.; Qiu, X.; Huan, H.; Wang, S.; Zhang, Y.; Chen, X.; He, W. Earthquake-Damaged Buildings Detection in Very High-Resolution Remote Sensing Images Based on Object Context and Boundary Enhanced Loss. Remote Sens. 2021, 13, 3119.
17. Yu, M.; Chen, X.; Zhang, W.; Liu, Y. AGs-Unet: Building Extraction Model for High Resolution Remote Sensing Images Based on Attention Gates U Network. Sensors 2022, 22, 2932.
18. Zhu, Q.; Li, Z.; Zhang, Y.; Guan, Q. Building Extraction from High Spatial Resolution Remote Sensing Images via Multiscale-Aware and Segmentation-Prior Conditional Random Fields. Remote Sens. 2020, 12, 3983.
19. Zhang, P.; Zhang, B.; Zhang, T.; Chen, D.; Wang, Y.; Wen, F. Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Online, 19–25 June 2021; pp. 12414–12424.
20. Zhang, Q.; Zhang, J.; Liu, W.; Tao, D. Category anchor-guided unsupervised domain adaptation for semantic segmentation. In Proceedings of the Conference and Workshop on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; p. 32.
21. Yang, L.; Zhuo, W.; Qi, L.; Shi, Y.; Gao, Y. ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–23 June 2022; pp. 4268–4277.
22. Xu, Y.; Shang, L.; Ye, J.; Qian, Q.; Li, Y.F.; Sun, B.; Jin, R. Dash: Semi-supervised learning with dynamic thresholding. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 11525–11536.
23. Zou, Y.; Zhang, Z.; Zhang, H.; Li, C.L.; Bian, X.; Huang, J.B.; Pfister, T. Pseudoseg: Designing pseudo labels for semantic segmentation. arXiv 2020, arXiv:2010.09713.
24. Zuo, S.; Yu, Y.; Liang, C.; Jiang, H.; Er, S.; Zhang, C.; Zha, H. Self-training with differentiable teacher. arXiv 2021, arXiv:2109.07049.
25. Zhang, W.; Zhu, L.; Hallinan, J.; Zhang, S.; Makmur, A.; Cai, Q.; Ooi, B.C. Boostmis: Boosting Medical Image Semi-Supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–23 June 2022; pp. 20666–20676.
26. Wang, Y.; Wang, H.; Shen, Y.; Fei, J.; Li, W.; Jin, G.; Le, X. Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–23 June 2022; pp. 4248–4257.
27. Yao, H.; Hu, X.; Li, X. Enhancing Pseudo Label Quality for Semi-Supervised Domain-Generalized Medical Image Segmentation. arXiv 2022, arXiv:2201.08657.
28. He, R.; Yang, J.; Qi, X. Re-Distributing Biased Pseudo Labels for Semi-Supervised Semantic Segmentation: A Baseline Investigation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Online, 19–25 June 2021; pp. 6930–6940.
29. Kanezaki, A. Unsupervised Image Segmentation by Backpropagation. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 1543–1547.
30. Wang, C.; Shi, A.; Wang, X.; Wu, F.; Huang, F.; Xu, L. A novel multi-scale segmentation algorithm for high resolution remote sensing images based on wavelet transform and improved JSEG algorithm. Optik 2014, 125, 5588–5595.
31. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC Superpixels Compared to State-of-the-art Superpixel Methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282.
32. Csillik, O. Fast segmentation and classification of very high resolution remote sensing data using SLIC superpixels. Remote Sens. 2017, 9, 243.
33. Feng, Z.; Zhou, Q.; Gu, Q.; Tan, X.; Cheng, G.; Lu, X.; Ma, L. DMT: Dynamic Mutual Training for Semi-Supervised Learning. Pattern Recogn. 2022, 130, 108777.
34. Wang, C.; Liu, H.; Shen, Y.; Zhao, K.; Xing, H.; Wu, H. High-Resolution Remote-Sensing Image-Change Detection Based on Morphological Attribute Profiles and Decision Fusion. Complexity 2020, 171, 8360361.
35. Shi, A.; Gao, G.; Shen, S. Change detection of bitemporal multispectral images based on FCM and D-S theory. EURASIP J. Adv. Signal Process. 2016, 2016, 96.
36. Dempster, A.P. Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Stat. 1967, 38, 325–339.
37. Wang, C.; Zhang, Y.; Chen, X.; Jiang, H.; Mukherjee, M.; Wang, S. Automatic Building Detection from High-Resolution Remote Sensing Images Based on Joint Optimization and Decision Fusion of Morphological Attribute Profiles. Remote Sens. 2021, 13, 357.
38. Trivedi, M.M.; Mills, J.K. Centroid calculation of the blastomere from 3D Z-Stack image data of a 2-cell mouse embryo. Biomed. Signal Process. Control 2020, 57, 101726.
39. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
40. Brunet, D.; Vrscay, E.R.; Wang, Z. On the mathematical properties of the structural similarity index. IEEE Trans. Image Process. 2011, 21, 1488–1499.
41. Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation. Int. J. Comput. Vision 2021, 129, 3051–3068.
42. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Proceedings of the Conference and Workshop on Neural Information Processing Systems, Vancouver, BC, Canada, 6–14 December 2021; pp. 12077–12090.
43. Pozzer, S.; Rezazadeh Azar, E.; Dalla Rosa, F.; Chamberlain Pravia, Z.M. Semantic segmentation of defects in infrared thermographic images of highly damaged concrete structures. J. Perform. Constr. Facil. 2021, 35, 04020131.
44. Xia, L.; Zhang, R.; Chen, L.; Li, L.; Yi, T.; Wen, Y.; Xie, C. Evaluation of Deep Learning Segmentation Models for Detection of Pine Wilt Disease in Unmanned Aerial Vehicle Images. Remote Sens. 2021, 13, 3594.
45. Peng, X.; Zhong, R.; Li, Z.; Li, Q. Optical remote sensing image change detection based on attention mechanism and image difference. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7296–7307.
46. He, N.; Fang, L.; Plaza, A. Hybrid first and second order attention Unet for building segmentation in remote sensing images. Sci. China Inf. Sci. 2020, 63, 140305.
47. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 5–12 September 2014; pp. 740–755.
48. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
Figure 1. The automatic monitoring process of gully erosion. The input contains historical data with labels (DL) and the latest monitoring data without labels (DU). The output is the monitoring results of DU.
Figure 2. The training flowchart of the proposed method. In the flowchart, our method contains a student model and a teacher model, and the training of the latest monitoring data is conducted sequentially from 1 to 4.
Figure 3. Flowchart of the object boundary generator. Boundary extraction is only performed after iteration has stopped.
Figure 4. The structure of channel attention module and spatial attention module. (a) The channel attention module. (b) The spatial attention module. The channel attention module helps focus on the meaningful channels, while the spatial attention module contributes to extracting features at the location of interest.
Figure 5. The pan-sharpened RGB images of two datasets. (a) HC2012 dataset. (b) HC2020 dataset. The HC2012 dataset was constructed from the remote-sensing image of Huachuan County, Heilongjiang Province, China, collected by the ZY-3 satellite in 2012, whereas the HC2020 dataset was based on the same-site remote-sensing image collected by the GF-2 satellite in 2020.
Figure 6. The sample from the HC2020 dataset and its corresponding label. (a) Sample. (b) Corresponding label. In (b), white and black pixels stand for gully erosion and non-gully erosion, respectively.
Figure 7. The latest monitoring data of Experiment 1 and corresponding labels. Among them, the colored rectangles are the representative lands, and the white as well as black pixels in the labels separately represent gully erosion and non-gully erosion.
Figure 8. The monitoring results of five different methods for representative lands of Experiment 1: (a) original image; (b) Bisenetv2; (c) Segformer; (d) U2PL; (e) DMT; (f) the proposed method. The white, blue, and green pixels in (b–f) separately represent true positive, false positive, and false negative.
Figure 9. The latest monitoring data of Experiment 2 and corresponding labels. Among them, the colored rectangles are the representative lands, and the white as well as black pixels in the labels separately represent gully erosion and non-gully erosion.
Figure 10. The monitoring results of five different methods for representative lands of Experiment 2: (a) original image; (b) Bisenetv2; (c) Segformer; (d) U2PL; (e) DMT; (f) the proposed method. The white, blue, and green pixels in (b–f) separately represent true positive, false positive, and false negative.
Figure 11. The relationship between similarity regulation indicator κ and IoU.
Table 1. Quantitative evaluation results of Experiment 1. The entries in bold denote the best results in Experiment 1.

Methods               Precision   Recall   F1 Score   IoU
Bisenetv2 [41]        50.9%       40.9%    45.4%      18.9%
Segformer [42]        52.7%       39.4%    45.1%      18.6%
U2PL [26]             43.2%       83.8%    57.0%      41.1%
DMT [33]              44.8%       85.0%    58.7%      42.4%
The proposed method   58.7%       60.4%    59.5%      64.5%
Table 2. Quantitative evaluation results of Experiment 2. The entries in bold denote the best results in Experiment 2.

Methods               Precision   Recall   F1 Score   IoU
Bisenetv2             58.8%       26.6%    36.6%      18.1%
Segformer             57.4%       25.9%    35.7%      16.2%
U2PL                  41.9%       82.8%    55.6%      39.3%
DMT                   44.5%       85.5%    58.5%      42.1%
The proposed method   63.9%       65.2%    64.5%      60.4%
Table 3. The results of the ablation study. √ and — separately represent used and not used; the entries in bold denote the best results in the corresponding experiment; and Δ(IoU) is the IoU change compared with the baseline.

Experiment     Method     BPGS   Adaptive Loss   IoU     Δ(IoU)
Experiment 1   Baseline   —      —               44.7%   +0.0%
Experiment 1   Baseline   √      —               56.1%   +11.4%
Experiment 1   Baseline   √      √               64.5%   +19.8%
Experiment 2   Baseline   —      —               43.6%   +0.0%
Experiment 2   Baseline   √      —               53.3%   +9.7%
Experiment 2   Baseline   √      √               60.4%   +16.8%
Table 4. Detailed κ–IoU values in the two experiments. The entries in bold denote the best results in the corresponding experiment.

Experiment 1
κ         0      0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
IoU (%)   51.4   52.1   53.5   54.7   56.3   57.5   59.8   60.7   62.2   63.4   62.7
κ         0.55   0.60   0.65   0.70   0.75   0.80   0.85   0.90   0.95   1.00
IoU (%)   63.3   64.5   62.9   60.3   58.7   56.8   55.4   54.8   54.3   53.6

Experiment 2
κ         0      0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
IoU (%)   49.8   49.6   50.7   51.2   53.9   55.3   57.1   59.5   58.3   59.4   60.1
κ         0.55   0.60   0.65   0.70   0.75   0.80   0.85   0.90   0.95   1.00
IoU (%)   60.9   60.4   59.7   58.1   57.8   55.9   53.4   53.6   52.5   52.1