Article

A Spatial and Temporal Evolution Analysis of Desert Land Changes in Inner Mongolia by Combining a Structural Equation Model and Deep Learning

1
School of Mapping and Geoscience, Liaoning Technical University, Fuxin 123000, China
2
Collaborative Innovation Institute of Geospatial Information Service, Liaoning Technical University, Fuxin 123000, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(14), 3617; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15143617
Submission received: 9 June 2023 / Revised: 15 July 2023 / Accepted: 18 July 2023 / Published: 20 July 2023
(This article belongs to the Special Issue Remote Sensing and Ecosystem Modeling for Nature-Based Solutions)

Abstract

With the wide application of remote sensing technology, target detection based on deep learning has become a research hotspot in the field of remote sensing. In this paper, to address the problems of existing deep-learning-based desert land extraction methods, such as the spectral similarity between desert land and background features and indistinct texture features, we propose an intelligent desert land extraction method for multispectral remote sensing images that takes band information into account. Firstly, we built a desert land intelligent interpretation dataset based on band weighting to enhance the desert land foreground features of the images. On this basis, we introduced deformable convolution, with its adaptive feature extraction capability, into U-Net and developed the Y-Net model to extract desert land from Landsat remote sensing images covering the Inner Mongolia Autonomous Region. Finally, in order to analyze the spatial and temporal trends of the desert land in the study area, we used a structural equation model (SEM) to evaluate the direct and indirect effects of natural conditions and human activities, i.e., population density (PD), livestock volume (LS), evaporation (Evp), temperature (T), days of sandy wind conditions (LD), humidity (RH), precipitation (P), anthropogenic disturbance index (Adi), and cultivated land (CL). The results show that the F1-score of the Y-Net model proposed in this paper is 95.6%, which is 11.5% higher than that of U-Net. Based on the Landsat satellite images, the area of desert land in the study area was extracted for six periods from 1990 to 2020. The results show that the area of desert land in the study area first increased and then decreased. The main influencing factors have been precipitation, humidity, and anthropogenic disturbance, for which the path coefficients are 0.646, 0.615, and 0.367, respectively. This study is of great significance for obtaining large-scale and long-term time series of desert land cover and revealing the inner mechanism of desert land area change.

1. Introduction

Desertification is a process of land degradation and the decrease or destruction of biological potential in arid, semi-arid, and dry sub-humid regions, driven by the coupling of natural and socio-economic factors [1]. Combating desertification is one of the Sustainable Development Goals of the United Nations in the 2030 Agenda for Sustainable Development [2,3] and is one of the most significant potential ecological, social, and economic problems [4,5].
Earlier studies on desert land change relied on manual visual interpretation to monitor its evolution process, and empirical judgment and visual interpretation of remote sensing data to determine the desertification indicators [6,7]. However, these methods are inadequate and incapable of meeting the need for the efficient acquisition of large-scale spatial and temporal data for desert land. Remote sensing has been widely employed in desert land change monitoring [8,9,10]. The three major categories of remote-sensing-based desert land change monitoring methods are classifier approaches, spectral index extraction methods, and deep-learning-based classification/extraction methods. The classifier techniques are based on expert knowledge, and they predetermine the classification sample categories and regulate the training sample selection [8,11,12,13,14]; however, the classifier approaches are not ideal for dynamic desert land extraction and diverse contour features. The spectral index extraction methods extract the desert land from an image using the spectral information. This approach is based on multispectral images, where the spectral bands with sensitive desert land features are selected in order to maximize the use of the spectral information. Nonetheless, the construction of a desert land index is complex and frequently only applicable to a specific area [15,16], which constrains the scope of application.
Intelligent interpretation of remote sensing images based on deep learning has the advantage of rapid extraction of target features over a wide range, enabling regional or even global extraction of desert land [17,18]. Zhang et al. [19] initially proposed the SOF-Net model based on the DeepLab v3+ framework for general land-cover classification. However, on account of the complex structure of the model, the imagery lost too much desert land detail information during the convolution operation, resulting in relatively blurry extraction results. In comparison to the more complex DeepLab v3+, U-Net, with its simple structure, symmetric encoder–decoder framework, and skip-connection approach, improves the preservation of details. U-Net not only preserves the deeper semantic information of the target, but also combines the shallow spatial location information of the target to achieve accurate segmentation of the target objects, achieving an excellent performance in binary classification [20]. However, its segmentation results are hampered by the loss of edge and position information, making it difficult to make accurate judgments about details.
In addition, deep-learning-based methods acquire features primarily based on the target contours, but desert land areas in remote sensing images typically exhibit indistinct texture features, and the standard convolutional neural network (CNN) is limited in modeling geometric transformations, which remains a challenge in complex desert land extraction [21]. The deformable convolution approach makes the accurate extraction of complex scenes of multiple scales and with irregular features possible, and is applicable to desert land extraction without apparent texture features [22,23,24].
In this paper, we propose a multispectral remote sensing image desert land intelligent extraction method based on the spectral information and a deep CNN. Meanwhile, we describe how we built a desert land intelligent interpretation dataset based on band weighting in order to enhance the foreground characteristics of desert land imagery. On this basis, we introduce deformable convolution adaptive feature extraction to U-Net and propose the Y-Net model to extract desert land from Landsat remote sensing images of the Inner Mongolia Autonomous Region. To evaluate the direct and indirect effects of natural conditions and human activities, i.e., population density (PD), livestock volume (LS), evaporation (Evp), temperature (T), days of sandy wind conditions (LD), humidity (RH), precipitation (P), anthropogenic disturbance index (Adi), and cultivated land (CL), a structural equation model (SEM) was used to analyze the spatial and temporal trends of the desert land in the study area.
The remaining sections of this paper are organized as follows. Section 2 of this paper describes the study area and the data collection process. The model construction and evaluation indices are presented in Section 3. The details of the experiments and an analysis are provided in Section 4, including the details of the production of the desert land intelligent interpretation dataset and a comparison of the experimental results. In Section 5, we analyze the effects of the weighted band combination approach, the multiyear changes in the desert land area, and the SEM’s driving factors for the changes in the desert land area. Our conclusion is presented in Section 6.

2. Materials and Methods

2.1. Overview of the Study Area

The Inner Mongolia Autonomous Region (37°24′–53°23′N, 97°12′–126°04′E) is located on the northern border of the People’s Republic of China (Figure 1). It is a vast territory, with a predominantly temperate continental monsoon climate and an extremely fragile ecological environment. It is one of the provinces in China with the greatest concentration of desert land and also the region facing the greatest threat from sand dunes [10,25]. Due to climatic factors and human activities (such as intensive grazing and abandoned arable land), the sand problem in Inner Mongolia has become more severe. Since the 1980s, Inner Mongolia has implemented numerous ecological restoration projects and policies, which have been successful in halting desertification in certain areas, but the overall situation is still extremely serious.

2.2. Data Sources

2.2.1. Multispectral Image Data

In this study, we acquired Landsat 4–5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 Operational Land Imager (OLI) images for the years 1990, 2000, 2005, 2010, 2015, and 2020 with cloud cover of <10%, for a total of 624 images. Images from the vegetation growth period (April to October) were selected to facilitate the identification and extraction of desert land change information [26].

2.2.2. Meteorological and Socio-Economic Data

To evaluate the human contribution, we used socio-economic data obtained from the annual statistical yearbooks of Inner Mongolia from 1990 to 2020, i.e., the four indicators of population density, livestock volume, arable land, and anthropogenic disturbance index. The meteorological data were sourced from the National Meteorological Information Center (http://data.cma.cn/, accessed on 5 February 2023) and were made up of the five indicators of temperature, days of sandy wind conditions, evaporation, precipitation, and humidity.

3. Model Construction

3.1. U-Net

U-Net is a fully convolutional neural network with a symmetric encoder–decoder framework that improves upon the fully convolutional network (FCN) [20,27]. Composed of a compression path, an expansion path, and skip connections, the U-Net model is extensively used in remote sensing detection [28,29,30,31,32] due to its simple structure, requirement for fewer data samples, and high segmentation accuracy. The compression path performs four rounds of 3 × 3 convolution and 2 × 2 max pooling (down-sampling) operations to obtain a high-dimensional feature pyramid that captures the image information. The expansion path performs four deconvolution and up-sampling operations to precisely locate the segmentation results. U-Net also replaces the summation fusion of the FCN with skip connections, through which the details and spatial dimensions of the object of interest are gradually restored using features from the shallower convolutional layers; this is advantageous for image segmentation and yields a segmentation result of the same size as the input image. Finally, a 1 × 1 convolutional layer is connected for dimensionality reduction to generate the image feature map.
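For reference, the following is a minimal PyTorch sketch of the building blocks described above (double 3 × 3 convolution, 2 × 2 max pooling, deconvolution for up-sampling, skip connection by concatenation, and a final 1 × 1 convolution). The class names, depth (two levels instead of four), and channel widths are illustrative assumptions, not the exact configuration used in this paper.

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions with ReLU, as used on each U-Net level."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class TinyUNet(nn.Module):
    """Illustrative two-level U-Net: compression path, expansion path, skip connection."""
    def __init__(self, in_ch=3, n_classes=1):
        super().__init__()
        self.enc1 = DoubleConv(in_ch, 64)
        self.enc2 = DoubleConv(64, 128)
        self.pool = nn.MaxPool2d(2)                          # 2x2 max pooling (down-sampling)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)   # deconvolution (up-sampling)
        self.dec1 = DoubleConv(128, 64)                      # 128 = 64 (skip) + 64 (up-sampled)
        self.head = nn.Conv2d(64, n_classes, kernel_size=1)  # 1x1 conv for the feature map

    def forward(self, x):
        e1 = self.enc1(x)                            # shallow spatial detail
        e2 = self.enc2(self.pool(e1))                # deeper semantic features
        d1 = self.up(e2)
        d1 = self.dec1(torch.cat([e1, d1], dim=1))   # skip connection by concatenation
        return self.head(d1)
```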

3.2. Deformable Convolution

Due to the fixed geometric structure of its building blocks, a CNN only learns features at fixed locations in the input image, and therefore cannot precisely capture the complex and variable contour information of the objects in the input image. In 2017, Dai et al. [33] proposed deformable convolution, which allows the sampling grid of the convolution kernel to be arbitrarily deformed during training by adding offsets to the sampling locations, in contrast to fixed-grid sampling. It has the same input and output as standard convolution, which allows it to address the problem of spatial deformation more effectively and enhance the capability of feature extraction. A comparison of deformable convolution and standard convolution is provided in Figure 2. The dots in Figure 2a represent the sampling positions of regular convolution, Figure 2b shows the deformed sampling points with offsets in deformable convolution, and Figure 2c,d are special cases of Figure 2b, indicating that deformable convolution can be scaled, rotated, and subjected to various transformations with different aspect ratios. In addition, the size of its receptive field varies with the sampling point positions, further demonstrating the ability of deformable convolution to adapt to the spatial deformation of the target.
In order to adjust the input size and channels, a convolution (Conv) is first performed on the input to obtain a convolutional feature map, and this feature map is fed into two parallel branches: one branch learns the input image features to obtain the offsets, and the sampled offsets are used with the other, parallel standard convolution branch to perform the deformable convolution (Figure 3). In deformable convolution, the convolution kernel itself is not deformed; rather, the sampling coordinates are shifted by the offsets obtained after convolution, and the pixels are re-sampled before the image is convolved, effectively expanding the convolution kernel. The convolution kernel is thus offset to sampling points of the input feature map that focus on the region or target of interest, improving the model’s capability for deformable targets. The convolutional layer is responsible for extracting local information to preserve the underlying details, and different image targets have different offset sizes obtained by convolutional learning. The offsets are two-dimensional and are computed on the same channels, so the offset field has 2N channels, where N denotes the number of sampling points in the convolution kernel (e.g., N = 9 for a 3 × 3 kernel); each sampling point is displaced along both H and W. The kernel size and dilation rate of the parallel convolutional layer are set to the same values as those of the deformable convolution. The offsets predicted by the offset convolution are typically small numbers, and feature extraction is accomplished by backpropagation utilizing a bilinear interpolation algorithm [34]. The output of the deformable convolution is then convolved a second time to produce an output with the same dimensions and channels as the input after feature concatenation.
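The two-branch structure in Figure 3 can be sketched with the deformable convolution operator available in torchvision.ops. The block below is only a hedged illustration (module names and channel counts are assumptions): an ordinary convolution predicts the 2N offset channels (18 for a 3 × 3 kernel), which feed a parallel deformable convolution.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """Parallel branches: one conv predicts the offsets, the other applies
    deformable convolution at the offset sampling positions."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # Offset branch: 2N channels (x and y displacement per sampling point, N = k*k)
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        nn.init.zeros_(self.offset_conv.weight)   # start from regular-grid sampling
        nn.init.zeros_(self.offset_conv.bias)
        # Deformable branch: same kernel size and dilation as the offset branch
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x):
        offsets = self.offset_conv(x)          # learned, usually small, displacements
        return self.deform_conv(x, offsets)    # bilinear sampling at the shifted positions

# Example: a 4-band 512x512 patch
x = torch.randn(1, 4, 512, 512)
out = DeformBlock(4, 64)(x)
print(out.shape)  # torch.Size([1, 64, 512, 512])
```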

3.3. Y-Net

On account of the blurred boundaries and unsatisfactory contrast of desert land in Landsat remote sensing images, and the fact that U-Net is insensitive to the detailed features of desert land, it is difficult to achieve an accurate judgment. In view of this, we developed the Y-Net model, which retains the symmetrical structure of the original U-Net model while incorporating deformable convolution to improve the network’s ability to learn the spatial variation of the target object. The Y-Net model can obtain an adaptive perceptual field according to the target’s size and shape, based on fully considering the characteristics of desert land.
The Y-Net model consists of an input layer, a hidden layer, and an output layer, and the hidden layer can be divided into a feature extraction part and an up-sampling part, with the former responsible for the contraction of the network path. The Y-Net model structure is shown in Figure 4. The input image is passed through a deformable convolutional layer with a convolution kernel size of 3 × 3 and a rectified linear unit (ReLU) activation function, which provides a stable and flexible receptive field for the network to capture the global information by learning the initial image features from the target contour, as shown in Equation (1). Y-Net adopts U-Net’s strategy of increasing the number of feature maps while lowering their spatial scale. After the deformable convolution module, the image is down-sampled with the max pooling layer, and the higher-order semantic feature information of the desert land in the input image is obtained with four convolution and down-sampling operations, as shown in Equation (2).
The up-sampling part is made up of up-sampling layers and deconvolutional layers, which play the role of path expansion: they locate pixel points in the network to recover the image resolution and restore the details and spatial dimensions of the objects, combined with the deformable convolution module to reinforce the learned low-order features (i.e., position and texture). The low-order feature map of the desert land is recovered to the same size as the input image after four up-sampling steps, and the feature maps of the same scale from the feature extraction part and the up-sampling process are fused, so that the low-order location information is combined with the high-order semantic information and an end-to-end network is formed. Using the sigmoid activation function, the network classifies each pixel point in the up-sampled feature map to derive the desert land information in the image. In short, the input and output images are of the same dimensions, enabling not only the classification of objects with high-level features but also the precise segmentation of objects with low-level features [35].
X(p) = \sum_{y=0}^{r} \left( \sum_{x=0}^{k} I(x+a,\, y+b)\, M(x,y) \right)     (1)
where X(p) denotes the value of the output preliminary feature image at point p; r is the width of the image; I is the image of a certain band of the input raw remote sensing image of desert land; a and b are the offsets of the band image I in the horizontal and vertical directions, respectively; x and y are the horizontal and vertical coordinates of a point in I, respectively; and M is the convolution kernel.
Y(p) = \sum_{k=1}^{N} W_k\, X\left(p_0 + s_k\, p_k + (1 - c_k)\, \Delta p_k + (1 - s_k)\, c_k\, \Delta m_k\right)     (2)
where Y(p) denotes the higher-order feature image value of the output at point p; W_k is the convolution kernel weight coefficient for grid point k, which denotes the sampling location; p_0 is the original sampling position; s_k denotes the learning rate of p_k; p_k denotes the position of the sampled point in the adaptive deformable convolution kernel; c_k denotes the learning rate of the adjustable quantity Δm_k; Δp_k denotes the offset in the deformable convolution; and Δm_k denotes the adjustable amount; s_k ∈ [0, 1], Δm_k ∈ [0, 1], c_k ∈ [0, 1].
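To make the overall design concrete, the following is a rough, self-contained PyTorch sketch of a Y-Net-like stage under the description above: a deformable 3 × 3 convolution with ReLU replaces the plain convolution block, max pooling contracts the path, and bilinear up-sampling with channel concatenation expands it, ending in a sigmoid per-pixel classifier. The class names, depth (one level instead of four), and channel widths are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class DeformStage(nn.Module):
    """Deformable 3x3 convolution + ReLU, the basic feature unit of this sketch."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 18, 3, padding=1)      # 2N offsets, N = 9
        self.conv = DeformConv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x, self.offset(x)))

class TinyYNet(nn.Module):
    """One contraction level and one expansion level of a Y-Net-like network."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.enc1 = DeformStage(in_ch, 64)       # shallow position/texture features
        self.enc2 = DeformStage(64, 128)         # higher-order semantic features
        self.pool = nn.MaxPool2d(2)
        self.dec1 = DeformStage(128 + 64, 64)    # fuse up-sampled and skip features
        self.head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        up = F.interpolate(e2, scale_factor=2, mode="bilinear", align_corners=False)
        d1 = self.dec1(torch.cat([e1, up], dim=1))
        return torch.sigmoid(self.head(d1))      # per-pixel desert land probability
```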

3.4. Performance Evaluation Indicators

To quantitatively evaluate the performance of the model, four metrics were employed in this study to evaluate the recognition results: IoU, Precision, Recall, and F1-score. The formulations for these metrics are provided in Equations (3)–(6). IoU is the most frequently employed performance evaluation method in image segmentation. It is the ratio of the intersection to the union of the desert land pixels extracted by the model and the desert land pixels in the true value labels, and it measures the degree of correlation between the true and predicted ranges. The greater the IoU, the closer the match between the measured and actual samples, and the more precise the model. Precision is the degree of accuracy of the positive sample prediction results, that is, the probability that a sample predicted to be positive is actually positive; the higher the precision, the more accurate the model retrieval. Recall is the proportion of correctly predicted positive samples to the actual positive samples, that is, the probability that an actually positive sample is predicted to be positive; the higher the recall, the more comprehensive the model retrieval. The F1-score is used to measure the accuracy of the Y-Net model’s prediction; it effectively balances the precision rate and the recall rate, serving as a comprehensive index of both when evaluating a model with various prediction values. In this study, the desert land images obtained through the Y-Net model extraction method and the standard desert land contour images were both binarized so that desert land pixels had a value of 255 and background pixels a value of 0, and the pixel-level comparison of the two images was used for the evaluation indices.
IoU = \frac{TP}{TP + FP + FN}     (3)
Precision = \frac{TP}{TP + FP}     (4)
Recall = \frac{TP}{TP + FN}     (5)
F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}     (6)
In these formulas, TP (true positives) are the samples that are correctly identified as desert land; FN (false negatives) are the samples that are mistaken as background; TN (true negatives) are the samples that are correctly identified as background; and FP (false positives) are the samples that are misidentified as desert land.
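As a worked example of these definitions, a small function such as the following (a NumPy sketch; the function and array names are illustrative) computes the four metrics from a predicted binary mask and a ground-truth mask in which desert land pixels are 255 and background pixels are 0.

```python
import numpy as np

def desert_metrics(pred, truth, positive=255):
    """IoU, Precision, Recall, and F1-score for binary desert land masks."""
    p = (pred == positive)
    t = (truth == positive)
    tp = np.logical_and(p, t).sum()       # desert land correctly identified
    fp = np.logical_and(p, ~t).sum()      # background misidentified as desert land
    fn = np.logical_and(~p, t).sum()      # desert land mistaken as background
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return iou, precision, recall, f1

# Example with a 2x2 mask: one TP, one FP, one FN, one TN
pred = np.array([[255, 255], [0, 0]])
truth = np.array([[255, 0], [255, 0]])
print(desert_metrics(pred, truth))  # (0.333..., 0.5, 0.5, 0.5)
```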

4. Experiments and Analysis

4.1. Desert Land Intelligent Interpretation Dataset

The quality and quantity of the training datasets have a direct impact on the training and prediction accuracy of deep learning models. Currently, the publicly available desert land datasets are limited in quantity, consist of single scenes and categories, and are based solely on visible images. The proposed approach is based on multispectral image bands using a weighted reconstruction method for the band combination, where the bands with a high sensitivity to desert land are given high weights to maximize the information utilization of multispectral image bands. This is done to address the fact that sandy samples constructed from the existing visible image datasets are more similar to the background features. We utilized Landsat multispectral images and investigated the optimal band combination for constructing an intelligent interpretation dataset for desert land.
The dataset was constructed via weighted fusion based on the pixel-level band characteristics, concentrating on the relationships and combinations between spectral bands [36,37,38]. The results of the weighted fusion contain more spectral information than the original image, which can significantly improve the accuracy and reliability of image interpretation, classification, object recognition, and change detection, while also reducing the interference of invalid bands, maximizing the utilization of image information, highlighting the target features, and improving the accuracy of the feature extraction. In this way, we aimed to resolve two issues: the color characteristics of desert land in remote sensing images are similar to those of the background due to the influence of topography and dune shadows, and the relative luminance of sand dunes in the images is variable.
The principal component analysis method is extensively used in multispectral image band selection tasks, as the color combination of the first three principal component images can display as much information as possible [39,40]. The original image was preprocessed using a combination of the principal component analysis band combination method and band-weighted reconstruction to improve the image spectral difference and model extraction accuracy.
The remote sensing reflectance of different types of desert land differs [41]. To achieve an objective evaluation of the sensitivity of the desert land extraction results and to account for the correlation among the bands, we calculated and compared the performance indicator (F1-score) for desert land extraction after training each band individually, as shown in Table 1, to obtain the magnitude of the sensitivity of each band to desert land: B5 > B4 > B1 > B6 > B3 > B7 > B2, so we assigned B5/B4/B1 as the three color channels [41,42].
After image weighting and reconstruction, a new image was generated based on the weighting of the F1-score indicators for B5/B4/B1. The inverse of the variance weighting procedure was selected in the combined prediction model to determine the image band weights [43], and the weights for the three aforementioned bands were then calculated. The inverse of variance weighting method determines the weights based on the magnitude of the error sum of squares, where a high weight is assigned to the small error sum of squares values, as shown in Equations (7) and (8). The corresponding B5/B4/B1 weighting coefficients were calculated as 0.6:0.9:1, and then the image weighting reconstruction was performed using Equation (9).
Q_t = \sum_{i=1}^{n} \left(1 - \hat{y}_t^{(i)}\right)^2     (7)
W_t = \frac{Q_t^{-1}}{\sum_{t=1}^{m} Q_t^{-1}}     (8)
Y = \sum_{t=1}^{m} A_t W_t     (9)
where W_t is the weight of the t-th band image; Q_t is the sum of squares of the differences between the true and predicted values of desert land information in the t-th band image; \hat{y}_t^{(i)} is the extraction accuracy for the i-th desert land test sample in the t-th band image; m is the number of bands; A_t denotes the remote sensing image of the t-th band; and Y is the reconstructed remote sensing image [44].
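As a worked illustration of Equations (7)–(9), the snippet below computes inverse-of-variance weights from per-band extraction accuracies and applies them to reconstruct a weighted image. The per-band accuracy values and array shapes are invented for illustration only and do not reproduce Table 1 or the 0.6:0.9:1 coefficients reported above.

```python
import numpy as np

# Hypothetical per-band extraction accuracies (y_hat) on n test samples;
# the actual values come from the per-band training runs summarized in Table 1.
band_scores = {
    "B5": np.array([0.93, 0.95, 0.94]),
    "B4": np.array([0.91, 0.92, 0.90]),
    "B1": np.array([0.88, 0.90, 0.89]),
}

# Equation (7): sum of squared extraction errors per band
Q = {b: np.sum((1.0 - y) ** 2) for b, y in band_scores.items()}

# Equation (8): inverse-of-variance weights (small error -> large weight)
inv = {b: 1.0 / q for b, q in Q.items()}
W = {b: v / sum(inv.values()) for b, v in inv.items()}

# Equation (9): weighted reconstruction of a 3-band image stack A (H x W per band)
A = {b: np.random.rand(512, 512) for b in band_scores}   # placeholder band images
Y = sum(W[b] * A[b] for b in band_scores)                # reconstructed image
print(W, Y.shape)
```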
We developed a band combination multispectral desert land intelligent interpretation dataset based on the weighted reconstruction method, which maximizes the use of the bands sensitive to desert land information to increase the spectral differentiation between desert land and background features. The original dimensions of the multispectral images acquired by Landsat were 8031 × 7371 pixels. To increase the training speed, the original imagery was cropped into neighborhoods centered on desert land, and the cropped imagery was adjusted to 512 × 512 pixels using bilinear interpolation. In the dataset, a total of 11,672 samples, including 7343 positive samples and 4329 negative samples, were generated based on cloud cover, luminance and obscurity, and background complexity (Table 2).
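A preprocessing step along these lines could be sketched as follows, assuming NumPy arrays and OpenCV's bilinear resize; the crop window size and boundary handling are simplifying assumptions, since the paper only specifies the 512 × 512 output size and bilinear interpolation.

```python
import numpy as np
import cv2

def crop_and_resize(image, center_rc, window=1024, out_size=512):
    """Crop a window centred on a desert land location and resize it to out_size
    with bilinear interpolation (illustrative; boundary handling simplified)."""
    r, c = center_rc
    half = window // 2
    r0, c0 = max(r - half, 0), max(c - half, 0)
    patch = image[r0:r0 + window, c0:c0 + window]
    return cv2.resize(patch, (out_size, out_size), interpolation=cv2.INTER_LINEAR)

scene = np.random.rand(8031, 7371).astype(np.float32)   # one reconstructed Landsat band
sample = crop_and_resize(scene, center_rc=(4000, 3500))
print(sample.shape)  # (512, 512)
```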

4.2. Model Training

The Y-Net model was created using Python 3.7.0 as the development environment and the PyTorch 1.7.1 deep learning framework. The network model was trained using an NVIDIA GeForce RTX 3090 Ti GPU. Following a series of experiments on the model’s operation efficiency, result accuracy, and hardware, the final epoch number was set to 200, the experimental batch size was set to 8, and the Adam optimizer with an initial learning rate of 10−4 was selected as the optimizer.
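The paper does not publish its training code; the sketch below simply mirrors the reported hyper-parameters (Adam optimizer with an initial learning rate of 10−4, batch size 8, 200 epochs) around stand-in model and data objects, which would be replaced by the actual Y-Net and the desert land dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins so the sketch runs; replace with the Y-Net model and the real dataset.
model = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())
dataset = TensorDataset(torch.rand(16, 3, 512, 512),
                        torch.randint(0, 2, (16, 1, 512, 512)).float())

loader = DataLoader(dataset, batch_size=8, shuffle=True)      # batch size 8
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)     # Adam, initial lr 1e-4
criterion = nn.BCELoss()                                      # desert land vs. background

for epoch in range(200):                                      # 200 training epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```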

4.3. Model Comparison

The robustness of the proposed method for desert land extraction in natural environments was evaluated using the test set of 1412 desert land samples. We performed ablation experiments on the U-Net, weighted U-Net (W-U-Net), Y-Net, and weighted Y-Net (W-Y-Net) models (where the original image was the original desert land image with normal B5/B4/B1 spectral fusion, and the W-image was the B5/B4/B1 weighted desert land image). The exact labeling results were used as the ground-truth labels, i.e., the true labels of the images, and comparison graphs of the image quality were constructed.
Table 3 depicts the recognition effect of each network, indicating that the method presented in this paper enhances the extraction of desert land areas. As shown in Table 4, the prediction of 1412 Landsat multispectral desert land images for the two different test sets, weighted and unweighted, by the proposed method and the classical U-Net model demonstrates that the proposed method is superior in terms of IoU, Precision, Recall, and F1-score. The maximum improvement of the F1-score of the network presented in this paper over the classic U-Net model is 11.5%. Compared to W-Y-Net, W-U-Net has a lower accuracy but a shorter prediction time, demonstrating that the model in this research is more sophisticated yet delivers a substantial prediction benefit. All of the U-Net indices are the lowest, showing that, with the most basic structure, it is the least sensitive to desert land. Y-Net has a shorter prediction time and lower accuracy than W-Y-Net, showing that the weighted images highlight the desert land features at the cost of a more complex prediction. These results demonstrate that the proposed strategy has a positive impact.

5. Discussion

5.1. Weighted Band Combination Methods

In deep learning, the metric between the model inference results for the validation set data and the true value is referred to as the loss. We fed the training set images into the model, and the metric between the obtained desert land extraction results and the sample labels was the extraction error, which is expressed as a 1−F1-score value. Based on this, we introduced the inverse of the variance weighting technique [43] into the band weighting process. This method employs the sum of squares of the extraction error as a metric that reflects the sensitivity of the various bands to desert land characteristics. The extraction error is the difference between the model extraction result and the true value in the test set images, and the larger the sum of squares of the extraction error, the less sensitive the band is to the desert land.
In the meantime, selecting the optimal band combination for multispectral data is a crucial prerequisite for image interpretation and thematic information extraction [41], and band combinations selected for feature sensitivity typically perform better than standard-band color composites of conventional optical images [45,46]. The multispectral information is the remote sensing reflectance of the different bands of different ground objects collected by the sensor. In the process of remote sensing image interpretation, as the number of bands sensitive to the target ground objects increases, the ability to extract ground objects at a given image resolution becomes stronger. However, at the same time, the choice of multiple spectral bands also increases the amount of computation. In order to speed up the model operation while also taking into account the correlation, visibility, and data volume between bands, we selected the three bands (B5/B4/B1) with the most prominent sensitivity to desert land for the band combination [47].
The existing common multispectral image fusion methods use the spatial and spectral characteristics of unprocessed panchromatic and multispectral image bands to determine the optimal parameters; however, even within the same image, these characteristics can vary greatly between bands. Since the resolution of each band is not the same, it is not appropriate to give each band the same weight in the selection process [48], and fusion with identical parameters would therefore be suboptimal [46]. In the process of band weighting reconstruction, we assigned corresponding weighting coefficients to the three bands of B5/B4/B1 based on their sensitivity to the desert land in the imagery, so as to accomplish the goal of balancing the overall image characteristics.

5.2. Multiyear Changes in Desert Land Area

To acquire the evolution trend of the desert land area in the study area from 1990 to 2020, we extracted the desert land area for six time phases in the Inner Mongolia region of China during the study period, as depicted in Figure 5. The results indicate that the overall desert land area in the Inner Mongolia region increased from 1990 to 2000. From 2000 to 2020, the desert land area in the study area decreased consistently, and the land desertification was reversed. Due to the intensive land cultivation in Alashan League, Ordos, and Wuhai in western Inner Mongolia, which contributes to a relatively vulnerable ecological environment, the area of desert land continued to expand from 1990 to 2000 [49], as shown in Figure 6. With the introduction of sand prevention and control policies after the year 2000, part of the land was restored, and the desertification process began to reverse. As shown in Figure 6c–f, a large amount of desert land recovered to usable land from 2000 to 2010. The main reason for this is that policies such as enclosure, grazing prohibition, and conversion of farmland to forest (grassland), together with the coupling effect of climate conditions and moisture change, have accelerated the change in desertification degree in the western Alxa and Ordos regions, and the desert land was improved to some extent [50]. From 2010 to 2020, a continuous and stable recovery state appeared, and the degree of desertification clearly improved, indicating that sand control policies and ecosystem restoration projects have made progress in the study area [51].

5.3. Driving Force Analysis

The SEM [52] is a model for constructing and evaluating causal relationships between variables. Intermediate variables can be added to determine the degree of direct or indirect influence of different factors on the desert land area [53]. To quantify and evaluate the coupled influence of multiple natural and human factors on the change of desert land area, we constructed an SEM for multiple variables, i.e., desert land area, natural geographical elements, and socio-economic factors in the study area, and characterized the degree of contribution of the various influencing factors to the change of desert land area using the path coefficients derived from the model. As depicted in Figure 7, a path diagram model was developed between the desert land area and the nine observed variables of population density (PD), livestock volume (LS), evaporation (Evp), temperature (T), days of sandy wind conditions (LD), humidity (RH), precipitation (P), anthropogenic disturbance index (Adi), and cultivated land (CL) [54]. The anthropogenic disturbance index evaluates the degree to which human activities have an impact on natural ecosystems and is divided into four levels. The disturbance index is 0 for natural unused lands such as the Gobi and desert land, 1 for natural ecological lands such as woodland and grassland, 2 for anthropogenically regenerated lands such as cropland and pastureland, and 3 for artificial surfaces such as urban construction land, as shown in Equation (10).
D = \frac{\sum_{i=0}^{3} A_i \times P_i}{3 \sum_{i=1}^{n} P_i}     (10)
where A i is the disturbance index of the ith level and P i is the area of the ith type of land.
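As a worked example of Equation (10), the disturbance index can be computed from the area of each land class and its assigned level; the area figures below are invented purely for illustration.

```python
# Level -> land classes: 0 Gobi/desert land, 1 woodland/grassland,
# 2 cropland/pastureland, 3 urban construction land (artificial surfaces)
areas_by_level = {0: 1200.0, 1: 2500.0, 2: 900.0, 3: 100.0}   # hypothetical areas (km^2)

total_area = sum(areas_by_level.values())
# Equation (10): area-weighted mean disturbance level, normalised by the maximum level 3
D = sum(level * area for level, area in areas_by_level.items()) / (3 * total_area)
print(round(D, 3))  # ~0.326 for these invented areas
```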
In Figure 7, the observed variables are depicted by rectangular black frames, and the solid red arrows indicate the causal relationships between pairs of variables. The fitted model has a p-value of 0.599; p ranges between 0.05 and 1, and the closer p is to 1, the better the model fit. We evaluated the model fit and established that the determined model structure was capable of reflecting the intricate relationship between the nine observed variables and the desert land area. Clearly, five of the observed variables, i.e., humidity, evaporation, days of sandy wind conditions, the anthropogenic disturbance index, and precipitation, can directly affect the change of desert land area. Temperature, livestock volume, arable land, and population density can indirectly affect the desert land area through the direct variables. The following describes the specific performance of the aforementioned variables during the various time periods in the study area. From 1990 to 2000, the change in desert land area in the study area was caused by an increase in temperature, which led to a decrease in humidity, which in turn led to an increase in the days of sandy wind conditions and promoted the continuous expansion of the desert land area.
Meanwhile, as population density increased in the area, a large amount of forest and grassland was cleared for cultivation and pasture, and the anthropogenic disturbance index increased year by year during the study period. As the area of vegetation cover in the study area decreased, the surface roughness decreased, resulting in an increase in the area of desert land [49].
During the study period of 2000–2010, a series of sand control policies and ecosystem restoration projects implemented in the study area from around 2000 caused the climate in the region to begin to turn wet and precipitation to increase [50,51], which is conducive to the growth and recovery of vegetation, thereby reducing the number of sandy wind condition days in the region and controlling the mobility and expansion of land sanding [53,55].
From 2010 to 2020, as a result of the continued development of sand control and sand management in the region and the implementation of ecological projects, the climate became wetter, accelerating the growth of vegetation and the recovery of degraded land [50,56]. The vegetation cover increased, and the decrease in the anthropogenic disturbance index further decreased the area of desert land in the study area.
Comparing the total path coefficients with the correlation coefficients (Table 5) reveals that the anthropogenic disturbance index, precipitation, humidity, and cropland are the primary determinants of desert land area change. Although the Pearson correlation coefficient (r) is greater for the days of sandy wind conditions, cropland, and the anthropogenic disturbance index, the difference between the total path coefficients and the correlation coefficients is large for these variables. The main cause of this difference is the coupling effect among the drivers of the anthropogenic disturbance index, i.e., precipitation, days of sandy wind conditions, and cultivated land.
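For readers who wish to reproduce this kind of analysis, a path model of this form can be specified and fitted in Python with, for example, the semopy package. The sketch below is only illustrative of the workflow: the input file name and the particular direct/indirect structure are assumptions that simplify the model in Figure 7 rather than reproduce it.

```python
import pandas as pd
from semopy import Model

# Columns: desert land area plus the nine observed drivers (one row per year/period)
data = pd.read_csv("drivers.csv")   # hypothetical file of observed variables

# Simplified path structure: five direct drivers of desert land area,
# with temperature and the socio-economic variables acting indirectly
desc = """
DesertArea ~ RH + P + Evp + LD + Adi
RH ~ T + P
LD ~ RH
Adi ~ PD + CL + LS
"""

model = Model(desc)
model.fit(data)            # maximum-likelihood estimation of the path coefficients
print(model.inspect())     # parameter estimates and p-values for each path
```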
Desert land area change is a consequence of the coupling effect of climate change and human activities [57]. In this study, we quantitatively evaluated the interaction between climate change and human activities using the SEM results. As depicted in Figure 8, in terms of the combined effect, the anthropogenic factor is the most direct factor of desert land area change, with a total effect coefficient of 0.392. Human activities such as socio-economic development and ecosystem restoration projects have also played a leading role in desert land area change in the study area during the different historical periods. Other related issues, such as overgrazing or a grazing ban, excessive land reclamation, and ecological restoration projects, have also had significant effects on the sand dunes in Inner Mongolia [58], and the literature [51] has confirmed the efficacy of ecological engineering in enhancing sand dunes, which supports the aforementioned findings.
The influence of human activities on climate obtained through the partial least squares path analysis amounts to 0.687, highlighting the close relationship between climate change and human activities. The anthropogenic factors have influenced the process of desert land change by altering the climate of the study area. Consequently, ecosystem restoration projects, such as afforestation to promote conversion to wetter conditions, have played a significant role in improving the desert land environment in northern China. Since the year 2000, the relevant ecological policies to convert pasture and arable land into grassland and woodland have been favorable strategies for the restoration of vegetation cover in the study area, thereby influencing the humidity of the regional climate through water conservation [59] and creating climatically favorable conditions for the recovery and reversal of desert land. In contrast, natural factors have had a smaller effect, with an overall impact of −0.25. Since there is a strong negative correlation between humidity and desert land area, the contributions of humidity and precipitation account for the majority of the total effect of natural factors on desert land area change. In particular, as precipitation and humidity levels rise, the climate in the area becomes more conducive to the recovery and development of vegetation, which has a containing effect on the desert land area. As shown in Figure 8 and Table 6, human activities play a dominant role in the process of desert land area change, but the interrelationship between human activities and climate change is also crucial to desert land area change [60]. The 30-year desert land change process in the study area has been driven primarily by human activities and secondarily by climate change, which has itself been partly shaped by human activities.

6. Conclusions

In this paper, we proposed Y-Net, a desert land extraction model for multispectral remote sensing images that takes band information into account, and used structural equation modeling to evaluate the spatial and temporal patterns of desert land changes and their drivers in the Inner Mongolia Autonomous Region from 1990 to 2020, reaching the following conclusions.
Combining the inverse of variance weighting method with the information extraction accuracy of each band allowed us to determine the weight value of each band of the desert land remote sensing images. The multispectral remote sensing images of the desert land area were weighted and reconstructed to generate new desert land remote sensing images and a dataset that provides new data support for desert land extraction from remote sensing images using deep learning. Incorporating deformable convolution into the U-Net model enhanced its adaptability to the irregular shapes and indistinct boundaries of desert land in the imagery, which had previously caused low extraction accuracy.
In the extraction task of irregular desert land in complex scenes, the extraction accuracy in this study was 95.1%, which is 11.5% better than that of U-Net. We also obtained the spatial and temporal distribution of desert land in the study area from 1990 to 2020 based on the Y-Net model and quantitatively estimated the driving factors causing the spatial and temporal evolution of desert land in Inner Mongolia. The results indicated that the desert land area in the study area has decreased continuously since 2000, and anthropogenic disturbance and humidity have been the two most influential anthropogenic and natural factors influencing the change of desert land area. Moreover, the coupling effect of human activities and climate change was found to be closely related to the change in desert land area, with climate change constituting a significant background factor in the desert land change process.
In this paper, we presented a method for intelligently extracting desert land from multispectral remote sensing images that takes band information into account. However, future improvement of the expert knowledge through field validation, along with verification of the precision of the constructed data samples, will be required. Moreover, the method proposed in this paper is restricted to Landsat multispectral remote sensing images; data reconstruction and desert land extraction using other imagery will yield results that differ from those presented here. In addition, the structure of the SEM is not unique, and the interactions of the various influencing factors on the desert land area should be examined further to obtain a more precise analysis.

Author Contributions

Conceptualization, B.Z. and X.C.; methodology, B.Z., X.C. and H.Z.; software, X.C. and H.Z.; validation, X.C. and H.Z.; writing—original draft preparation, X.C. and B.Z.; writing—review and editing, B.Z., X.C. and D.R.; visualization, H.Z. and X.C.; project administration, B.Z., X.C. and H.Z.; funding acquisition, X.C., W.S. and J.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 42071343, 42071428 and 42204031.

Data Availability Statement

Not applicable.

Acknowledgments

We are very grateful to all the reviewers, institutions, and researchers for their help and advice on this work.

Conflicts of Interest

All authors declare that they have no conflict of interest.

References

  1. Neely, C.; Bunning, S. Review of evidence on drylands pastoral systems and climate change. Land Water Discuss. Pap. 2009, 6, 103. [Google Scholar]
  2. United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development, 70/1. A/RES/; United Nations: New York, NY, USA, 2015. [Google Scholar]
  3. UNCCD. The Global Land Outlook, 1st ed.; United Nations Convention to Combat Desertification: Bonn, Germany, 2017. [Google Scholar]
  4. PNUMA. Status of Desertification and Implementation of the United Nations Plan of Action to Combat Desertification: Report of the Executive Director. 1991. Available online: https://xueshu.baidu.com/usercenter/paper/show?paperid=4dc7173b5b02abe2fbc0cebcb0c92331&site=xueshu_se (accessed on 8 June 2023).
  5. Wang, T.; Yan, C.; Song, X.; Xie, J. Monitoring recent trends in the area of aeolian desertified land using Landsat images in China’s Xinjiang region. ISPRS J. Photogramm. Remote. Sens. 2012, 68, 184–190. [Google Scholar] [CrossRef]
  6. Wang, H.; Ma, M.; Geng, L. Monitoring the recent trend of aeolian desertification using Landsat TM and Landsat 8 imagery on the north-east Qinghai–Tibet Plateau in the Qinghai Lake basin. Nat. Hazards 2015, 79, 1753–1772. [Google Scholar] [CrossRef]
  7. Zhang, F.; Tiyip, T.; Feng, Z.D.; Kung, H.T.; Johnson, V.C.; Ding, J.L.; Tashpolat, N.; Sawut, M.; Gui, D.W. Spatio-Temporal Patterns of Land Use/Cover Changes Over the Past 20 Years in the Middle Reaches of the Tarim River, Xinjiang, China. Land Degrad. Dev. 2015, 26, 284–299. [Google Scholar] [CrossRef]
  8. Fathizad, H.; Ardakani, M.A.H.; Mehrjardi, R.T.; Sodaiezadeh, H. Evaluating desertification using remote sensing technique and object-oriented classification algorithm in the Iranian central desert. J. Afr. Earth Sci. 2018, 145, 115–130. [Google Scholar] [CrossRef]
  9. Levin, N.; Ben-Dor, E.; Karnieli, A. Topographic information of sand dunes as extracted from shading effects using Landsat images. Remote Sens. Environ. 2004, 90, 190–209. [Google Scholar] [CrossRef]
  10. Rivera-Marin, D.; Dash, J.; Ogutu, B. The use of remote sensing for desertification studies: A review. J. Arid. Environ. 2022, 206, 104829. [Google Scholar] [CrossRef]
  11. Helldén, U.; Tottrup, C. Regional desertification: A global synthesis. Glob. Planet. Chang. 2008, 64, 169–176. [Google Scholar] [CrossRef]
  12. Duan, H.C.; Wang, T.; Xue, X.; Liu, S.L.; Guo, J. Dynamics of aeolian desertification and its driving forces in the Horqin desert land, Northern China. Environ. Monit. Assess. 2014, 186, 6083–6096. [Google Scholar] [CrossRef] [PubMed]
  13. Chasek, P.; Akhtar-Schuster, M.; Orr, B.J.; Luise, A.; Ratsimba, H.R.; Safriel, U. Land degradation neutrality: The science-policy interface from the UNCCD to national implementation. Environ. Sci. Policy 2019, 92, 182–190. [Google Scholar] [CrossRef]
  14. Hanan, N.P.; Prevost, Y.; Diouf, A.; Diallo, O. Assessment of desertification around deep wells in the Sahel using satellite imagery. J. Appl. Ecol. 1991, 28, 173–186. [Google Scholar] [CrossRef]
  15. Wang, J.; Ding, J.; Yu, D.; Ma, X.; Zhang, Z.; Ge, X.; Teng, D.; Li, X.; Liang, J.; Guo, Y.; et al. Machine learning-based detection of soil salinity in an arid desert region, Northwest China: A comparison between Landsat-8 OLI and Sentinel-2 MSI. Sci. Total Environ. 2020, 707, 136092. [Google Scholar] [CrossRef]
  16. Li, J.; Zhao, L.; Xu, B.; Yang, X.; Jin, Y.; Gao, T.; Yu, H.; Zhao, F.; Ma, H.; Qin, Z. Spatiotemporal variations in grassland desertification based on Landsat images and spectral mixture analysis in Yanchi county of Ningxia, China. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2014, 7, 4393–4402. [Google Scholar] [CrossRef]
  17. Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding convolution for semantic segmentation. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1451–1460. [Google Scholar]
  18. Mikulane, S.; Siegmund, A.; del Río, C.; Koch, M.A.; Osses, P.; García, J.-L. Remote sensing based mapping of Tillandsia—A semi-automatic detection approach in the hyperarid coastal Atacama Desert, northern Chile. J. Arid. Environ. 2022, 205, 104821. [Google Scholar] [CrossRef]
  19. Zhang, D.; Gade, M.; Zhang, J. SOFNet: SAR-optical fusion network for land cover classification. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 2409–2412. [Google Scholar]
  20. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  21. Chen, F.; Wu, F.; Xu, J.; Gao, G.; Ge, Q.; Jing, X.-Y. Adaptive deformable convolutional network. Neurocomputing 2021, 453, 853–864. [Google Scholar] [CrossRef]
  22. Li, D.; Li, Y.; Sun, H.; Yu, L. Deep image compression based on multi-scale deformable convolution. J. Vis. Commun. Image Represent. 2022, 87, 103573. [Google Scholar] [CrossRef]
  23. Zhao, S.; Zhang, S.; Lu, J.; Wang, H.; Feng, Y.; Shi, C.; Li, D.; Zhao, R. A lightweight dead fish detection method based on deformable convolution and YOLOV4. Comput. Electron. Agric. 2022, 198, 107098. [Google Scholar] [CrossRef]
  24. Shen, N.; Wang, Z.; Li, J.; Gao, H.; Lu, W.; Hu, P.; Feng, L. Multi-organ segmentation network for abdominal CT images based on spatial attention and deformable convolution. Expert Syst. Appl. 2023, 211, 118625. [Google Scholar] [CrossRef]
  25. Bai, Z.; Han, L.; Jiang, X.; Liu, M.; Li, L.; Liu, H.; Lu, J. Spatiotemporal evolution of desertification based on integrated remote sensing indices in Duolun County, Inner Mongolia. Ecol. Inform. 2022, 70, 101750. [Google Scholar]
  26. Gao, F.; Li, Y.; Zhang, P.; Zhai, Y.; Zhang, Y.; Yang, Y.; An, Y. A high-resolution panchromatic-multispectral satellite image fusion method assisted with building segmentation. Comput. Geosci. 2022, 168, 105219. [Google Scholar] [CrossRef]
  27. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 640–651. [Google Scholar]
  28. Wang, N.; Cheng, J.; Zhang, H.; Cao, H.; Liu, J. Application of U-net model in water extraction from high-resolution remote sensing images. Remote Sens. Land Resour. 2020, 32, 35–42. [Google Scholar]
  29. Zhang, D.; Gade, M.; Zhang, J. SOF-UNet: SAR and Optical Fusion Unet for Land Cover Classification. In Proceedings of the IGARSS 2022–2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022. [Google Scholar]
  30. Pucino, N.; Kennedy, D.M.; Young, M.; Ierodiaconou, D. Assessing the accuracy of Sentinel-2 instantaneous subpixel shorelines using synchronous UAV ground truth surveys. Remote Sens. Environ. 2022, 282, 113293. [Google Scholar] [CrossRef]
  31. Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214. [Google Scholar] [CrossRef]
  32. Li, Y.; Zheng, H.; Luo, G. Extraction and counting of Populus euphratica tree canopy from UAV images with integrated U-Net method. Remote Sens. Technol. Appl. 2019, 34, 939–949. [Google Scholar]
  33. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
  34. Wu, Y.; Zhang, J.; Li, Y.; Huang, K. Research on Building Cluster Recognition Based on Improved U-Net. Remote Sens. Land Resour. 2021, 33, 1. [Google Scholar]
  35. Jin, Q.; Meng, Z.; Pham, T.D.; Chen, Q.; Wei, L.; Su, R. DUNet: A deformable network for retinal vessel segmentation. Knowl. Based Syst. 2019, 178, 149–162. [Google Scholar] [CrossRef] [Green Version]
  36. Metwalli, M.R.; Nasr, A.H.; Allah, O.S.F.; El-Rabaie, S. Image fusion based on principal component analysis and high-pass filter. In Proceedings of the 2009 International Conference on Computer Engineering & Systems, Cairo, Egypt, 14–16 December 2009. [Google Scholar]
  37. Pandit, V.R.; Bhiwani, R.J. Image Fusion in Remote Sensing Applications: A Review. Int. J. Comput. Appl. 2015, 120, 22–32. [Google Scholar]
  38. Liu, S.; Zheng, Y.; Du, Q.; Samat, A.; Tong, X.; Dalponte, M. A novel feature fusion approach for VHR remote sensing image classification. IEEE J. Sel. Top. Appl. Earth Obs. Rem. Sens. 2020, 14, 464–473. [Google Scholar] [CrossRef]
  39. Merembeck, B.F.; Borden, F.Y.; Podwysocki, M.H.; Applegate, D.N. Application of canonical analysis to multispectral scanner data. In Proceedings of the 14th Annual Symposium on Computer Applications in the Mineral Industries, Society of Mining Engineers, American Institute in Mining, Metallurgical and Petroleum Engineers, New York, NY, USA, 24 March 1977. [Google Scholar]
  40. Taylor, M.M. Principal components color display of ERTS imagery. In Third Earth Resources Technology Satellite Symposium; NASA: Washington, DC, USA, 1974. [Google Scholar]
  41. Sheffield, C. Selecting band combinations from multispectral data. Photogramm. Eng. Remote Sens. 1985, 51, 681–687. [Google Scholar]
  42. Chavez, P.S.; Bowell, J.A. Image processing techniques for Thematic Mapper data. Proc. ASPRS-ACSM Tech. Pap. 1984, 2, 728–742. [Google Scholar]
  43. Duan, X. Research on prediction of slope displacement based on a weighted combination forecasting model. Results Eng. 2023, 18, 101013. [Google Scholar] [CrossRef]
  44. Zhang, Z.; Ding, J.; Zhu, C.; Wang, J. Combination of efficient signal pre-processing and optimal band combination algorithm to predict soil organic matter through visible and near-infrared spectra. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2020, 240, 118553. [Google Scholar] [CrossRef]
  45. Hong, Y.; Chen, S.; Chen, Y.; Linderman, M.; Mouazen, A.M.; Liu, Y.; Guo, L.; Yu, L.; Liu, Y.; Cheng, H.; et al. Comparing laboratory and airborne hyperspectral data for the estimation and mapping of topsoil organic carbon: Feature selection coupled with random forest. Soil Tillage Res. 2020, 199, 104589. [Google Scholar] [CrossRef]
  46. Paisley, E.C.; Lancaster, N.; Gaddis, L.R.; Greeley, R. Discrimination of active and inactive sand from remote sensing: Kelso dunes, Mojave desert, California. Remote. Sens. Environ. 1991, 37, 153–166. [Google Scholar] [CrossRef]
  47. Zicari, P.; Folino, G.; Guarascio, M.; Pontieri, L. Discovering accurate deep learning based predictive models for automatic customer support ticket classification. In Proceedings of the SAC’21: The 36th ACM/SIGAPP Symposium on Applied Computing, Virtual Event, 22–26 March 2021. [Google Scholar]
  48. Niño-Adan, I.; Manjarres, D.; Landa-Torres, I.; Portillo, E. Feature weighting methods: A review. Expert Syst. Appl. 2021, 184, 115424. [Google Scholar] [CrossRef]
  49. Xu, D.; Wang, Z. Identifying land restoration regions and their driving mechanisms in inner Mongolia, China from 1981 to 2010. J. Arid. Environ. 2019, 167, 79–86. [Google Scholar] [CrossRef]
  50. Zhao, L.; Jia, K.; Liu, X.; Li, J.; Xia, M. Assessment of land degradation in Inner Mongolia between 2000 and 2020 based on remote sensing data. Geogr. Sustain. 2023, 4, 100–111. [Google Scholar] [CrossRef]
  51. Liang, P.; Yang, X. Landscape spatial patterns in the Maowusu (Mu Us) desert land, northern China and their impact factors. Catena 2016, 145, 321–333. [Google Scholar] [CrossRef]
  52. Zhu, H.; Zhang, B.; Song, W.; Dai, J.; Lan, X.; Chang, X. Power-Weighted Prediction of Photovoltaic Power Generation in the Context of Structural Equation Modeling. Sustainability 2023, 15, 10808. [Google Scholar] [CrossRef]
  53. Kawabata, A.; Ichii, K.; Yamaguchi, Y. Global monitoring of interannual changes in vegetation activities using NDVI and its relationships to temperature and precipitation. Int. J. Remote Sens. 2001, 22, 1377–1382. [Google Scholar] [CrossRef]
  54. Feng, K.; Wang, T.; Liu, S.; Yan, C.; Kang, W.; Chen, X.; Guo, Z. Path analysis model to identify and analyse the causes of aeolian desertification in Mu Us Sandy Land, China. Ecol. Indic. 2021, 124, 107386. [Google Scholar] [CrossRef]
  55. Kozuchowski, K. Contemporary changes of climate in Poland: Trends and variation in thermal and solar conditions related to plant vegetation. Pol. J. Ecol. 2005, 53, 283–297. [Google Scholar]
  56. Zhou, W.; Gang, C.; Zhou, F.; Li, J.; Dong, X.; Zhao, C. Quantitative assessment of the individual contribution of climate and human factors to desertification in northwest China using net primary productivity as an indicator. Ecol. Indic. 2015, 48, 560–569. [Google Scholar] [CrossRef]
  57. Deng, L.; Shangguan, Z.-P.; Li, R. Effects of the grain-for-green program on soil erosion in China. Int. J. Sediment Res. 2012, 27, 120–127. [Google Scholar] [CrossRef]
  58. Huang, L.; Xiao, T.; Zhao, Z.; Sun, C.; Liu, J.; Shao, Q.; Fan, J.; Wang, J. Effects of grassland restoration programs on ecosystems in arid and semiarid China. J. Environ. Manag. 2013, 117, 268–275. [Google Scholar] [CrossRef]
  59. Zhang, Y.; Peng, C.; Li, W.; Tian, L.; Zhu, Q.; Chen, H.; Fang, X.; Zhang, G.; Liu, G.; Mu, X.; et al. Multiple afforestation programs accelerate the greenness in the ‘Three North’ region of China from 1982 to 2013. Ecol. Indic. 2016, 61, 404–412. [Google Scholar] [CrossRef]
  60. Guo, Z.; Wei, W.; Shi, P.; Zhou, L.; Wang, X.; Li, Z.; Pang, S.; Xie, B. Spatiotemporal changes of land desertification sensitivity in the arid region of Northwest China. Acta Geograph. Sin. 2020, 75, 1948–1965. [Google Scholar]
Figure 1. Overview map of the Inner Mongolia Autonomous Region.
Figure 2. Sampling comparison diagram of deformable convolution and standard convolution [33]. (a) Standard convolution sampling. (b) Deformable convolution offset. (c) Deformable convolution panning. (d) Deformable convolution rotation. The green circles represent the standard sampling points of the fixed convolution kernel, the yellow circles represent the new sampling points generated by adding the learned offsets to the kernel, and the black arrows represent the offset vectors.
Figure 3. Deformable convolution model structure diagram. The arrows represent the offset vectors in the deformable convolution, and the black boxes represent the sampling points of the deformable convolution.
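To make the sampling mechanism in Figures 2 and 3 concrete, the sketch below assembles a deformable convolution block in which a regular convolution predicts the per-point offsets that are then applied to shift the kernel sampling positions. This is only an illustrative sketch using torchvision's DeformConv2d, not the authors' implementation; the channel counts and kernel size are assumptions.

```python
# Minimal sketch of a deformable convolution block (illustrative, not the paper's code).
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformConvBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # A regular convolution predicts one (dx, dy) offset per kernel sampling
        # point, i.e. 2 * k * k offset channels (the yellow points in Figure 2b-d).
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        # The deformable convolution then samples the input at the shifted positions.
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(x)          # offset vectors (black arrows in Figure 3)
        return self.deform_conv(x, offsets)    # features sampled at deformed positions

# Example: a 6-band 256x256 patch mapped to 64 feature channels.
y = DeformConvBlock(6, 64)(torch.randn(1, 6, 256, 256))
print(y.shape)  # torch.Size([1, 64, 256, 256])
```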
Figure 4. Y-Net model structure diagram.
Figure 5. Distribution maps of desert land area changes in Inner Mongolia from 1990 to 2020: (a) 1990; (b) 2000; (c) 2005; (d) 2010; (e) 2015; (f) 2020.
Figure 6. Distribution maps of severe desertification in eastern Inner Mongolia: (a) 1990; (b) 2000; (c) 2005; (d) 2010; (e) 2015; (f) 2020.
Figure 7. Model diagram of the structural equation model.
Figure 8. Diagram of the coupling of human activity and climate. The green arrows represent positive correlations with greater influence, and the orange arrows represent negative correlations with less influence.
Table 1. Comparison of the band extraction results.

Band | Precision % | Recall % | F1-Score %
B1   | 76.293      | 80.651   | 78.411
B2   | 68.789      | 63.503   | 66.040
B3   | 76.888      | 70.175   | 73.378
B4   | 83.870      | 81.898   | 82.872
B5   | 86.710      | 80.361   | 83.414
B6   | 80.267      | 72.169   | 76.003
B7   | 73.594      | 67.692   | 70.518
Table 2. Band combination multispectral remote sensing image dataset. Columns: B1, B4, B5, Original-B5B4B1, W-B5B4B1, Label; rows: normal images, low-light images, cloud interference, complex background, negative samples (image thumbnails not reproduced here).
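One plausible way to build the weighted W-B5B4B1 composite in Table 2 is to scale each selected band by a normalized weight derived from its single-band F1-score in Table 1 before stacking. The sketch below shows that scheme only as an illustration; the weighting formula, band scaling, and function names are assumptions, not the paper's exact method.

```python
# Illustrative band-weighting sketch (assumed scheme, not the paper's exact method):
# scale each selected band by its Table 1 F1-score, normalized to sum to 1.
import numpy as np

f1 = {"B5": 83.414, "B4": 82.872, "B1": 78.411}             # per-band F1-scores (Table 1)
weights = {b: v / sum(f1.values()) for b, v in f1.items()}  # normalized weights

def weighted_composite(bands: dict) -> np.ndarray:
    """Stack B5, B4, B1 into a 3-channel composite, each band scaled by its weight."""
    layers = []
    for name in ("B5", "B4", "B1"):
        band = bands[name].astype(np.float32)
        band = (band - band.min()) / (band.max() - band.min() + 1e-6)  # rescale to [0, 1]
        layers.append(weights[name] * band)
    return np.stack(layers, axis=-1)

# Example with random stand-ins for Landsat bands.
bands = {b: np.random.rand(256, 256) for b in ("B5", "B4", "B1")}
composite = weighted_composite(bands)  # shape (256, 256, 3), a "W-B5B4B1"-style input
```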
Table 3. Comparison of the model extraction results. Columns: Original Image, W-Image, U-Net, Y-Net, W-U-Net, W-Y-Net, True Value, shown for four example scenes (image thumbnails not reproduced here). The red boxes represent the image details extracted by the different models at the same location.
Table 4. Comparison of the desert land extraction results.

Model   | IoU % | Precision % | Recall % | F1-Score % | Calculation Time (s)
W-U-Net | 85.2  | 90.0        | 88.4     | 89.2       | 347
W-Y-Net | 88.3  | 96.1        | 94.1     | 95.1       | 386
U-Net   | 73.3  | 89.1        | 78.7     | 83.6       | 281
Y-Net   | 77.2  | 91.8        | 83.2     | 87.3       | 318
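The metrics reported in Table 4 are standard per-pixel segmentation scores computed from the predicted and reference desert land masks. The short sketch below shows one way to compute them from binary arrays; it is an illustration of the metric definitions, not the evaluation code used in the study.

```python
# Sketch of the per-pixel metrics reported in Table 4, computed from 0/1 masks
# (1 = desert land). Assumes both positives and predictions are non-empty.
import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    tp = np.sum((pred == 1) & (truth == 1))   # desert pixels correctly extracted
    fp = np.sum((pred == 1) & (truth == 0))   # background labeled as desert
    fn = np.sum((pred == 0) & (truth == 1))   # desert pixels missed
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"IoU": iou, "Precision": precision, "Recall": recall, "F1": f1}

# Example with random masks (a real evaluation would use model outputs and labels).
rng = np.random.default_rng(0)
pred, truth = rng.integers(0, 2, (512, 512)), rng.integers(0, 2, (512, 512))
print(segmentation_metrics(pred, truth))
```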
Table 5. Comparison table of the combined path coefficients and the absolute values of r for the nine drivers of desert land area.

Factor                           | LD    | P     | RH    | Adi   | Evp   | LS    | CL    | PD    | T
Comprehensive path coefficients  | 0.199 | 0.646 | 0.615 | 0.367 | 0.259 | 0.253 | 0.545 | 0.190 | 0.181
Pearson correlation coefficients | 0.408 | 0.104 | 0.100 | 0.313 | 0.136 | 0.046 | 0.317 | 0.422 | 0.100
Table 6. Direct and indirect effects of climate and human activities on desert land changes.

Factor           | Direct | Indirect | Total
Human activities | 0.367  | 0.025    | 0.392
Climate          | −0.565 | 0.315    | −0.250
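In a structural equation model, the total effect of a factor equals its direct path coefficient plus the indirect effect transmitted through mediating variables. The small check below reproduces the totals in Table 6 under that definition; the variable names are illustrative only.

```python
# Total effect = direct effect + indirect effect, as reported in Table 6.
effects = {
    "Human activities": {"direct": 0.367, "indirect": 0.025},
    "Climate": {"direct": -0.565, "indirect": 0.315},
}
for factor, e in effects.items():
    total = e["direct"] + e["indirect"]
    print(f"{factor}: total effect = {total:+.3f}")
# Human activities: total effect = +0.392
# Climate: total effect = -0.250
```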