Two-Phase Object-Based Deep Learning for Multi-Temporal SAR Image Change Detection

Zhang, Xinzheng; Liu, Guo; Zhang, Ce; Atkinson, Peter M.; Tan, Xiaoheng; Jian, Xin; Zhou, Xichuan; Li, Yongming

doi:10.3390/rs12030548


¹ College of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China

² Chongqing Key Laboratory of Space Information Network and Intelligent Information Fusion, Chongqing 400044, China

³ Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YQ, UK

⁴ UK Centre for Ecology & Hydrology, Library Avenue, Lancaster LA1 4AP, UK

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Remote Sens. 2020, 12(3), 548; https://0-doi-org.brum.beds.ac.uk/10.3390/rs12030548

Submission received: 15 December 2019 / Revised: 29 January 2020 / Accepted: 4 February 2020 / Published: 7 February 2020

(This article belongs to the Special Issue Image Processing and Analysis: Trends in Registration, Data Fusion, 3D Reconstruction and Change Detection)


Abstract

Change detection is one of the fundamental applications of synthetic aperture radar (SAR) images. However, speckle noise present in SAR images has a negative effect on change detection, leading to frequent false alarms in the mapping products. In this research, a novel two-phase object-based deep learning approach is proposed for multi-temporal SAR image change detection. Compared with traditional methods, the proposed approach brings two main innovations. One is to classify all pixels into three categories rather than two: unchanged pixels, changed pixels caused by strong speckle (false changes), and changed pixels formed by real terrain variation (real changes). The other is to group neighbouring pixels into superpixel objects so as to exploit local spatial context. Two phases are designed in the methodology: (1) Generate objects based on the simple linear iterative clustering (SLIC) algorithm, and discriminate these objects into changed and unchanged classes using fuzzy c-means (FCM) clustering and a deep PCANet. The output of this phase is the set of changed and unchanged superpixels. (2) Apply deep learning to the pixel sets over the changed superpixels only, obtained in the first phase, to discriminate real changes from false changes. SLIC is employed again to obtain new superpixels in the second phase. Low rank and sparse decomposition is applied to these new superpixels to suppress speckle noise significantly. A further clustering step is applied to these new superpixels via FCM. A new PCANet is then trained to classify the two kinds of changed superpixels to achieve the final change maps. Numerical experiments demonstrate that, compared with benchmark methods, the proposed approach can distinguish real changes from false changes effectively with significantly reduced false alarm rates, and achieve up to 99.71% change detection accuracy using multi-temporal SAR imagery.

Keywords:

synthetic aperture radar (SAR); change detection; deep learning; superpixel


1. Introduction

With their cloud-penetrating capability, synthetic aperture radar (SAR) images have drawn a large amount of attention over the past decades, for example, in environmental surveillance, urban planning and military applications. Using SAR images for change detection often involves two images acquired over the same area at different times, utilising the information in the differences between them.

Depending on the availability of a difference image (DI), change detection approaches can be divided into two categories. One is post-classification comparison which is undertaken to identify changed and unchanged regions directly from two images that were classified independently before the analysis. In this approach, the change detection result is not influenced by radiation normalization and geometric correction. However, the accuracy of the change detection relies on the quality of the classification results, with errors propagating to the outcome. The other approach is post-comparison analysis, in which change detection is achieved by generating a DI from two multi-temporal images, and obtaining the final change map from it. The classification errors in this case do not accumulate, but the way that the DI is generated may influence the validity of the change detection results [1].

From a machine learning perspective, change detection can also be categorized into supervised and unsupervised approaches, depending on whether labelled data are used or not [2,3]. For supervised methods, features extracted from labelled data are fed into a subsequent classifier. This strategy requires a significant number of ground reference data to train the algorithm, and the labelling process can be extremely labour-intensive and time-consuming [4]. In [5], a context-sensitive similarity measure is presented based on supervised classification to amplify the dissimilarity between changed and unchanged pixels. Unsupervised methods for change detection can be viewed as a clustering approach which divides the data into changed and unchanged classes [6,7]. In [8], the DI is cast into an eigenvector space and k-means clustering is used to partition the space into two clusters. In [9], a modified Markov Random Field (MRF) energy function is employed to update iteratively the membership association of fuzzy c-means (FCM), to cluster the DI into two classes. In [10] a novel method based on spatial fuzzy clustering was used to add spatial information to enhance change detection performance.

Recently, deep learning has gained widespread attention in the field of computer vision and pattern recognition, and demonstrated state-of-the-art prediction accuracy in various challenging tasks, such as target detection, image classification, etc. The major benefit of deep learning is that it can extract abstract and high-level representations that are hard to hand-code through feature engineering [11,12]. In addition, deep networks are often pre-trained using a large-scale dataset (e.g., ImageNet), and fine-tuned to other domains including remote sensing. Convolutional neural networks (CNNs) are considered as the pioneer of deep learning methods which mimic the receptive fields of the human brain neural cortex, with less redundancy and complexity through the weight-sharing architecture [12,13]. Some well-developed CNN models, such as AlexNet [12], VGG [14] and ResNet [15], have been adopted quickly in the remote sensing community to solve real-world challenges (e.g., land cover and land use classification).

Given the advantages of deep learning, some pioneering methods have been proposed for multi-temporal SAR image change detection. In [1], a stack of restricted Boltzmann machine (RBM) networks was used to learn efficiently the relationship between two multi-temporal SAR images for change detection. A dual-channel CNN structure was used to extract features of two SAR images for change detection [16]. A local restricted CNN for SAR image change detection was presented in [17], formed by imposing a spatial constraint on the output layer of the CNN so as to learn from several layered difference images. In [18], a stacked contractive autoencoder (sCAE) using a contractive penalty was proposed to promote local invariance and robustness, such that robust features can be extracted from superpixels of SAR images for change detection. In [19], a deep learning-based weakly supervised framework was developed for urban change detection using multi-temporal polarimetric SAR data. In [20], a transferred multi-level fusion network (MLFN) was trained using a large dataset and fine-tuned to extract features from SAR image patches for sea ice change detection. PCANet is an alternative deep learning model with its convolution filter banks chosen from principal component analysis (PCA) filters, which is suitable for SAR image change detection [21,22]. In PCANet, the cascaded PCA filters and binary quantization (hashing) are used as a data-adapting convolution filter bank in each stage and in the nonlinearity layer [21]. During the PCANet training process, there is no requirement for regularization parameters or numerical optimization solvers, which promotes the efficiency and accuracy of the network. In [22], PCANet was shown to be accurate, with great potential for SAR image change detection. In [23], context-aware saliency detection was employed to obtain training samples for PCANet in SAR image change detection, which reduces the number of training samples required while maintaining the reliability of the training sample sets, leading to less training time and greater computational efficiency. In [24], a morphologically supervised PCANet was designed to overcome the class imbalance problem in SAR image change detection (changed pixels are far less common than unchanged pixels).

Although the above-mentioned deep learning methods exhibit excellent performance in SAR image change detection, there are still some shortcomings. First of all, all the above methods are binary classification algorithms, which separate pixels of the changed class (CC) from pixels of the unchanged class (UC). In reality, variation in pixel values caused by strong speckle noise may lead to allocation to the changed class, potentially producing a large number of false alarms. Here, strong speckle noise refers to speckle with amplitude values similar to, or even larger than, the terrain pixel amplitude values. However, for SAR image change detection, whether speckle is strong or weak is relative to the amplitudes of the terrain pixels. Due to the complexity of the terrain background, some objects have smaller pixel amplitude values in the SAR image and others have larger ones, so it is difficult to define a general value or standard that measures the degree of “strong” speckle. Therefore, in this research, the term “strong speckle” is used only qualitatively. There are actually two kinds of changed pixels: one is produced by real terrain object changes (i.e., real changed class, RCC), and the other is caused by strong speckle noise (i.e., false changed class, FCC). For example, if a building is present at a location in the first temporal SAR image but is no longer present in the second, the change belongs to RCC. FCC means that there is no change in terrain, but a change is caused by speckle noise. For example, the speckle noise at a location may be weak in the first temporal SAR image but very strong at the same location in the second. This kind of strong speckle noise variation is often regarded by the change detection algorithm as a real terrain change, leading to false alarms. Therefore, this kind of change belongs to the FCC. Even though deep learning models have powerful classification capabilities, false alarms due to strong speckle noise will remain. Secondly, in current deep learning-based SAR image change detection, high quality training samples are required to train the networks. Those training samples are commonly taken as rectangular patches centred on the pixels of interest. However, this operation often introduces artefacts on the border of these rectangular patches, which produces uncertainty in the classification maps. For example, unchanged pixels and changed pixels could potentially exist in one image patch simultaneously. Heterogeneous pixels can also be found in one rectangular patch, which increases the difficulty of distinguishing between the CC and UC classes.

In this research, a new framework of two-phase object-based deep learning (TPOBDL) is proposed for SAR image change detection. Object-based deep learning has been shown to be suitable for remote sensing applications [25]. Thus, in TPOBDL, change detection is implemented in an object-based rather than pixel-wise fashion. Superpixel generation is applied to SAR images to acquire image objects (also called superpixels in computer science, and here) using a simple linear iterative clustering (SLIC) algorithm [26]. In fact, all processing steps in TPOBDL are based on image superpixels. Since a superpixel is a local set of homogeneous pixels, superpixels can reflect the local spatial context [27,28,29]. Therefore, this approach can overcome the problems caused by operations involving rectangular patches, such as introducing artefacts and uncertainty in the classification. The proposed approach involves two phases to differentiate RCC and FCC objects in an automated approach. Our two-phase deep learning strategy is, thus: Phase 1 deep learning to classify the objects of CC and those of UC, and Phase 2 deep learning to classify objects of CC into RCC and FCC objects. This two-phase framework reduces the classification difficulty faced by deep learning models at each phase, and is conducive to increasing the overall accuracy of change detection.

Our major contributions are as follows:

(1): Change detection through an object-based rather than pixel-wise approach. Superpixel generation is applied to SAR images to obtain objects via SLIC, such that the local spatial context is captured.
(2): A two-phase approach is designed for multi-temporal SAR image change detection. Deep learning methods are developed to identify objects of FCC and RCC by combining low rank and sparse decomposition (LRSD) with reduced false alarms.

The remainder of this paper is organized as follows. In Section 2, the proposed approach is described in detail. Section 3 presents the experimental datasets and results. The experimental results and the proposed approach are discussed in Section 4. Finally, conclusions are drawn in Section 5.

2. Methodology

2.1. Problem Statement and Overview of the Proposed Method

Consider two SAR images $I_{1}$ and $I_{2}$, both of size $M \times N$, taken from the same location but at different times. Change detection is required to generate a binary change map labelling changed pixels and unchanged pixels between $I_{1}$ and $I_{2}$. Figure 1 shows the scheme of TPOBDL, which consists mainly of two phases of deep learning, described in detail as follows.

2.2. First Phase Deep Learning

2.2.1. Superpixel Generation of Multi-Temporal SAR Images

In existing deep learning-based SAR image change detection methods, the patches used to train and test deep neural networks are mostly rectangular, which is convenient [24]. However, rectangular patches have significant disadvantages for SAR image change detection. Firstly, when the current pixel is near the boundary between changed and unchanged regions, the patch will contain both changed and unchanged pixels, which may introduce uncertainty to the deep neural network and impair the learning process [25]. Secondly, rectangular patch generation ignores the local spatial context, which is valuable for change detection. Instead of taking rectangular patches, in this paper patches are derived from superpixels, within which the pixels are homogeneous. This reduces the likelihood that heterogeneous pixels, or even changed and unchanged pixels, appear in one patch simultaneously. Compared with traditional rectangular patches, superpixel-based patches provide more valid information to the deep learning model. In effect, deep learning based on superpixels is an object-based approach.

In this research, we use SLIC to apply superpixel generation to the two multi-temporal SAR images $I_{1}$ and $I_{2}$. SLIC is chosen for its simplicity, flexibility in compactness, memory efficiency and high accuracy, as applied to SAR image processing [30,31]. First, superpixels of $I_{1}$ are obtained by SLIC. Then the superpixel pattern from $I_{1}$ is copied to $I_{2}$, as shown in Figure 2. Pattern copying ensures that the corresponding two superpixels of $I_{1}$ and $I_{2}$ represent the same local region.
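The pattern copying step can be sketched as follows. This is a minimal illustration, assuming the SLIC result for the first image is already available as an integer label map of the same shape as both images; the function name `paired_superpixels` and the toy arrays are hypothetical and not taken from the original method.

```python
import numpy as np

def paired_superpixels(img1, img2, labels):
    """Return {label: (pixels of img1, pixels of img2)} using one shared label map."""
    if not (img1.shape == img2.shape == labels.shape):
        raise ValueError("images and label map must have the same shape")
    pairs = {}
    for lab in np.unique(labels):
        region = labels == lab  # boolean mask of this superpixel
        # The same mask indexes both images, so the pair covers the same local region.
        pairs[int(lab)] = (img1[region], img2[region])
    return pairs

# Toy 2 x 4 example with two superpixels (labels 0 and 1).
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
img1 = np.arange(8.0).reshape(2, 4)
img2 = img1 + 10.0
pairs = paired_superpixels(img1, img2, labels)
```

Because a single label map is reused, no second segmentation is needed for the later image, and every superpixel of one image has exactly one counterpart in the other.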

The principles of SLIC are briefly described as follows. Firstly, the number of superpixels is set as $v$, which means $I_{1}$ is partitioned into $v$ pixel-blocks at the beginning. The centre of each pixel-block is called a seed. The distance (step length) between two seeds is defined as $Ω = \sqrt{M \times N / v}$. To avoid seeds falling on a contour boundary with a large gradient, each seed is moved to the position with the smallest gradient in its neighbourhood. Then, searching in the neighbourhood of each seed, the distance between a pixel in the neighbourhood and the seed, including the distance in feature (colour) space $d_{c}$ and in geographical space $d_{s}$, is computed by

$$d_{c} = \sqrt{(l_{j} - l_{i})^{2} + (a_{j} - a_{i})^{2} + (b_{j} - b_{i})^{2}} \tag{1}$$

$$d_{s} = \sqrt{(x_{j} - x_{i})^{2} + (y_{j} - y_{i})^{2}} \tag{2}$$

$$D = \sqrt{(d_{c} / Γ)^{2} + (d_{s} / Ω)^{2}} \tag{3}$$

where $d_{c}$ is the feature (colour) distance and $Γ$ is the maximum colour distance in the SLIC algorithm. Colour distances can vary significantly from image to image, therefore the parameter $Γ$ is fixed to a constant; based on the experiments in this research, we set it to 10. $d_{s}$ is the spatial distance, and $D$ is the combined distance metric. $l_{i}$, $a_{i}$ and $b_{i}$ represent the three colour values of the seed in the CIELAB colour space ${[l \; a \; b]}^{T}$, and $(x_{i}, y_{i})$ is the coordinate of the seed. $l_{j}$, $a_{j}$, $b_{j}$, $x_{j}$ and $y_{j}$ are the corresponding parameters of the pixel in the neighbourhood. In this manner, a pixel will be searched many times with different seeds. The seed with the smallest $D$ is taken as the clustering centre of this pixel. Then the seeds are updated. In our experiments, the SLIC algorithm converged within 10 iterations on the SAR images.
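Equations (1)–(3) and the seed assignment can be sketched in a few lines. This is only an illustration under stated assumptions: seeds and pixels are represented as hypothetical `(l, a, b, x, y)` tuples, and the search window around each seed that a full SLIC implementation would use is omitted, so every seed is compared with the pixel.

```python
import math

def seed_spacing(M, N, v):
    """Step length between seeds for an M x N image split into v superpixels."""
    return math.sqrt(M * N / v)

def slic_distance(seed, pixel, gamma, omega):
    """Combined distance D between a seed and a pixel, both given as (l, a, b, x, y)."""
    l_i, a_i, b_i, x_i, y_i = seed
    l_j, a_j, b_j, x_j, y_j = pixel
    d_c = math.sqrt((l_j - l_i) ** 2 + (a_j - a_i) ** 2 + (b_j - b_i) ** 2)  # Eq. (1)
    d_s = math.sqrt((x_j - x_i) ** 2 + (y_j - y_i) ** 2)                      # Eq. (2)
    return math.sqrt((d_c / gamma) ** 2 + (d_s / omega) ** 2)                 # Eq. (3)

def nearest_seed(pixel, seeds, gamma, omega):
    """Index of the seed with the smallest D, i.e. the pixel's clustering centre."""
    return min(range(len(seeds)),
               key=lambda i: slic_distance(seeds[i], pixel, gamma, omega))

# Toy example: a 100 x 100 image with v = 100 superpixels gives a spacing of 10 pixels.
omega = seed_spacing(100, 100, 100)
seeds = [(50.0, 0.0, 0.0, 0.0, 0.0), (50.0, 0.0, 0.0, 10.0, 10.0)]
pixel = (50.0, 0.0, 0.0, 9.0, 9.0)
best = nearest_seed(pixel, seeds, gamma=10.0, omega=omega)
```

Dividing the colour distance by the constant and the spatial distance by the seed spacing puts both terms on a comparable scale before they are combined.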

Superpixels possess a range of geometries and sizes (i.e., numbers of pixels). In contrast, the inputs of the deep neural network are required to be uniform rectangles with the same number of pixels. Thus, the superpixels need to be reshaped into rectangles before being fed into the network. Assume that the input patches are of size $k \times k$. Then, each reshaped superpixel should also have $k^{2}$ pixels. If a superpixel contains $p$ pixels, there are two cases. The first is $p \leq k^{2}$. In this case, a superpixel $S_{n, i}^{m}$ (where $m$ denotes the phase, here $m = 1$; $n = 1, 2$ denotes the image it comes from; and $i = 1, 2, \dots, v$ is the index of the superpixel) is reshaped to a vector $V_{n, i}^{m}$ having $k^{2}$ pixels. The first $p$ pixels of $V_{n, i}^{m}$ are filled by the pixels of $S_{n, i}^{m}$, and the other $k^{2} - p$ pixels are chosen randomly from $S_{n, i}^{m}$. The second case is $p > k^{2}$. In this case, we reshape the superpixel $S_{n, i}^{m}$ into $q + 1$ vectors: $V_{n, i, 1}^{m}$, $V_{n, i, 2}^{m}$, …, $V_{n, i, q}^{m}$, each of which has $k^{2}$ pixels, and an extra vector containing the remaining $p - q k^{2}$ pixels. This extra vector is padded to a vector $V_{n, i, q + 1}^{m}$ of $k^{2}$ pixels in the same way as in the case $p \leq k^{2}$. For a unified description, $V_{n, i}^{m}$ of the case $p \leq k^{2}$ is redefined as $V_{n, i, 1}^{m}$.
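The reshaping rule can be sketched as below. This is a minimal sketch under assumptions that the text does not fix: pixels are taken in their stored order, padding values are drawn at random from the whole superpixel, and no extra vector is produced when the number of pixels is an exact multiple of the vector length. The function name `superpixel_to_vectors` is hypothetical.

```python
import random

def superpixel_to_vectors(pixels, k, seed=0):
    """Split a superpixel's pixel values into vectors of exactly k*k entries.

    Full chunks are copied as they are; a short (or only) chunk is padded with
    values drawn at random from the same superpixel.
    """
    if not pixels:
        raise ValueError("a superpixel must contain at least one pixel")
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    size = k * k
    vectors = []
    for start in range(0, len(pixels), size):
        chunk = list(pixels[start:start + size])
        while len(chunk) < size:   # pad the short chunk up to k*k values
            chunk.append(rng.choice(pixels))
        vectors.append(chunk)
    return vectors

# Ten pixels with k = 2: two full vectors plus one padded vector.
vecs = superpixel_to_vectors(list(range(10)), k=2)
small = superpixel_to_vectors([7, 8, 9], k=2)
```

Each returned vector can then be reshaped to a patch of side length equal to the patch size before it is passed to the network.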

2.2.2. Superpixel DI Generation and FCM

The reshaped superpixel vectors $V_{1, i, h}^{1}$ and $V_{2, i, h}^{1}$ ($h = 1, 2, \dots, q, q + 1$) from $S_{1, i}^{1}$ and $S_{2, i}^{1}$ of $I_{1}$ and $I_{2}$ are fed into the superpixel DI (SPDI) operator $F_{i, h}^{1} = | V_{1, i, h}^{1} - V_{2, i, h}^{1} |$. All $F_{i, h}^{1}$ form an SPDI. The reason for generating the superpixel difference map is to help the FCM algorithm to cluster satisfactorily in the next step. Then all the $F_{i, h}^{1}$ are clustered into three classes by FCM: changed class (CC) $ω_{c}^{1}$, unchanged class (UC) $ω_{u}^{1}$ and intermediate class $ω_{m}^{1}$. Details of FCM can be found in [32].

$F_{i, h}^{1}$ belonging to $ω_{c}^{1}$ or $ω_{u}^{1}$ means that the superpixels $S_{1, i}^{1}$ and $S_{2, i}^{1}$ corresponding to $V_{1, i, h}^{1}$ and $V_{2, i, h}^{1}$ have a high probability of being changed or unchanged, respectively. A pair of superpixels $S_{1, i}^{1}$ and $S_{2, i}^{1}$ with $p \leq k^{2}$ can be assigned directly to one of the three classes, because each such pair has only one set of $V_{1, i, h}^{1}$ and $V_{2, i, h}^{1}$, which forms one $F_{i, h}^{1}$. However, for superpixels $S_{1, j}^{1}$ and $S_{2, j}^{1}$ with $p > k^{2}$, each pair has $q + 1$ sets of $V_{1, j, h}^{1}$ and $V_{2, j, h}^{1}$, which leads to $q + 1$ difference vectors $F_{j, h}^{1}$. Thus, a voting mechanism is employed to determine their classes. Specifically, among the $q + 1$ vectors $F_{j, h}^{1}$, those clustered into $ω_{c}^{1}$ are weighted by 1, those clustered into $ω_{u}^{1}$ are weighted by 0 and those clustered into $ω_{m}^{1}$ are weighted by 0.5. Then, all $q + 1$ weights are summed to give $Λ$, and the class of the superpixel pair $S_{1, j}^{1}$ and $S_{2, j}^{1}$ with $p > k^{2}$ is determined as follows:

$$\text{class of superpixel pair } S_{1, j}^{1} \text{ and } S_{2, j}^{1} = \begin{cases} ω_{c}^{1}, & Λ / (q + 1) \geq 0.8 \\ ω_{m}^{1}, & 0.8 > Λ / (q + 1) \geq 0.5 \\ ω_{u}^{1}, & Λ / (q + 1) < 0.5 \end{cases} \tag{4}$$

The thresholds in Equation (4) follow from the voting mechanism. If $Λ / (q + 1) < 0.5$, UC is the majority among the $q + 1$ vectors $F_{j, h}^{1}$, so the corresponding superpixel pair is identified as UC. If $0.8 > Λ / (q + 1) \geq 0.5$, the intermediate class has the majority, with only a few vectors of the changed class, so the corresponding superpixel pair is judged as the intermediate class. If $Λ / (q + 1) \geq 0.8$, CC is the majority, so the corresponding superpixel pair is judged as CC.
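The voting rule of Equation (4) can be written as a short function. This is a minimal sketch; the string labels `"CC"`, `"UC"` and `"MC"` (for the intermediate class) and the function name are hypothetical names chosen for illustration.

```python
def vote_superpixel_class(vector_labels):
    """Combine the FCM labels of a superpixel pair's q + 1 difference vectors."""
    weights = {"CC": 1.0, "MC": 0.5, "UC": 0.0}   # weights stated in the text
    score = sum(weights[lab] for lab in vector_labels) / len(vector_labels)
    if score >= 0.8:
        return "CC"
    if score >= 0.5:
        return "MC"
    return "UC"

# Four changed vectors and one unchanged vector give a score of exactly 0.8.
mostly_changed = vote_superpixel_class(["CC", "CC", "CC", "CC", "UC"])
all_intermediate = vote_superpixel_class(["MC", "MC", "MC"])
mostly_unchanged = vote_superpixel_class(["UC", "UC", "CC"])
```

Note that a pair whose vectors are split evenly between changed and unchanged also scores 0.5 and is therefore deferred to the intermediate class rather than decided by the clustering alone.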

The $V_{b, i, h}^{1}$ determined as CC and UC are reshaped to patches, which will be fed into the deep learning model as training samples. Those $V_{b, i, h}^{1}$ belonging to the intermediate class will be classified as CC or UC by the trained deep neural network.
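The SPDI and three-class clustering step of this subsection can be sketched with a plain NumPy implementation of standard fuzzy c-means. This is an illustration rather than the configuration used in the paper: the fuzzifier of 2, the random initialisation, the stopping rule, and the rule that names the clusters by ranking their centres from smallest to largest mean difference are all assumptions, as are the function names.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM on the rows of X; returns (centres, membership matrix)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each row sum to 1
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centres, U

def cluster_spdi(V1, V2):
    """Build the SPDI vectors |V1 - V2| and cluster them into UC, MC and CC."""
    F = np.abs(V1 - V2)                          # one row per difference vector
    centres, U = fuzzy_c_means(F, c=3)
    order = np.argsort(centres.mean(axis=1))     # assumption: rank by mean difference
    names = np.empty(3, dtype=object)
    names[order] = ["UC", "MC", "CC"]
    return names[U.argmax(axis=1)], U

# Toy data: 30 vector pairs of length 4.
rng = np.random.default_rng(1)
V1 = rng.random((30, 4))
V2 = rng.random((30, 4))
labels, memberships = cluster_spdi(V1, V2)
```

The membership matrix is returned alongside the hard labels so that the degree of uncertainty of each vector remains available.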

2.2.3. Training PCANet1

As a type of deep learning model, PCANet is easy to train and can be adapted to other tasks. For SAR image change detection, PCANet has been shown to learn non-linear relations from multi-temporal SAR images, which is an advantage compared to other deep neural networks [22], and it has already been employed for this task [22,23,24]. Considering these advantages, we use PCANet here to further classify those superpixel pairs assigned to the intermediate class in the previous step. Since PCANet is also used in the second phase, the network in the first phase is called PCANet1.

First, the $V_{b, i, h}^{1}$ of CC and UC are used as samples to train PCANet1. $V_{1, i, h}^{1}$ and $V_{2, i, h}^{1}$ are reshaped and combined to form the patches $R_{i, h}$ to be fed into the network (Figure 3). If $I_{1}$ is segmented into $v$ superpixels and the $i$-th superpixel is reorganized as $γ_{i}$ vectors, then the number of patches $R_{i, h}$, each of size $2 k \times k$, is $Γ = \sum_{i = 1}^{v} γ_{i}$.

The structure of PCANet1 is shown in Figure 4. It consists of two PCA filter convolution layers followed by a hashing and histogram generation layer. After patch generation, all $R_{i, h}$ have their means removed, and are vectorised and combined as a matrix $Y$:

$$Y = [y_{1, 1}, \dots, y_{1, γ_{1}}, y_{2, 1}, \dots, y_{2, γ_{2}}, \dots, y_{v, 1}, \dots, y_{v, γ_{v}}] \tag{5}$$

where $y_{i, h}$ denotes the mean-removed and vectorised $R_{i, h}$.

Next, we choose the $L_{1}$ principal eigenvectors of $Y Y^{T}$ ($T$ denotes matrix transposition) as the PCA filters $W_{l}^{1}$ of the first layer, that is

$$W_{l}^{1} = \mathrm{mat}(q_{l}(Y Y^{T})) \in ℜ^{2 k^{2} \times 2 k^{2}}, \quad l = 1, 2, \dots, L_{1} \tag{6}$$

where $q_{l}(Y Y^{T})$ denotes the $l$-th principal eigenvector and $\mathrm{mat}(x)$ maps a vector $x \in ℜ^{4 k^{4}}$ into a matrix $W \in ℜ^{2 k^{2} \times 2 k^{2}}$. So, the output of the first layer is

$$R_{i, h}^{l} = R_{i, h} * W_{l}^{1} \tag{7}$$

where the $*$ operator means 2-D convolution. $R_{i, h}^{l}$ forms the input of the second layer.
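Learning a bank of PCA filters from mean-removed patches can be sketched as follows. This is a simplified illustration, not the paper's implementation: each input patch is treated as a single vectorised sample and each eigenvector is reshaped back to the patch shape, which follows the usual PCANet convention and is an assumption here rather than the filter dimensions quoted in Equation (6). The function name `learn_pca_filters` is hypothetical.

```python
import numpy as np

def learn_pca_filters(patches, num_filters):
    """Return the leading eigenvectors of Y Y^T, reshaped to the patch shape.

    patches: array of shape (n, h, w); num_filters must not exceed h * w.
    """
    n, h, w = patches.shape
    Y = patches.reshape(n, h * w).astype(float)
    Y -= Y.mean(axis=1, keepdims=True)        # remove the mean of every patch
    Y = Y.T                                   # columns are the vectorised patches
    _, eigvecs = np.linalg.eigh(Y @ Y.T)      # eigenvalues come in ascending order
    top = eigvecs[:, ::-1][:, :num_filters]   # keep the principal eigenvectors
    return top.T.reshape(num_filters, h, w)

# Toy example: 50 patches of size 6 x 3 (i.e. 2k x k with k = 3), four filters.
rng = np.random.default_rng(0)
patches = rng.normal(size=(50, 6, 3))
filters = learn_pca_filters(patches, num_filters=4)
```

Because the filters are eigenvectors of a symmetric matrix, they are orthonormal, and no iterative optimisation is needed to obtain them.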

In the second layer, all $R_{i, h}^{l}$ have their means removed and are vectorised to be $z_{i, h}^{l}$, which are combined into a matrix $Z^{l} = [z^{l}_{1, 1}, \dots, z^{l}_{1, γ_{1}}, z^{l}_{2, 1}, \dots, z^{l}_{2, γ_{2}}, \dots, z^{l}_{v, 1}, \dots, z^{l}_{v, γ_{v}}]$. Then, all $Z^{l}$ are combined as:

$$Z = [Z^{1}, Z^{2}, \dots, Z^{L_{1}}] \tag{8}$$

The following step is similar to that for the first layer. We choose the $L_{2}$ principal eigenvectors of $Z Z^{T}$ as the PCA filters $W_{p}^{2}$ of the second layer, that is:

$$W_{p}^{2} = \mathrm{mat}(q_{p}(Z Z^{T})) \in ℜ^{2 k^{2} \times 2 k^{2}}, \quad p = 1, 2, \dots, L_{2} \tag{9}$$

Then the outputs of the second convolution layer are:

$$R_{i, h}^{l, p} = R_{i, h}^{l} * W_{p}^{2} \tag{10}$$

After these two convolution layers, every $R_{i, h}$ has $L_{1} L_{2}$ outputs. Each output is binarized by the Heaviside step function $H(\cdot)$ (one for positive input and zero otherwise), and the $L_{2}$ binary outputs derived from the same $R_{i, h}^{l}$ are combined into a single integer value per pixel, in the range $[0, 2^{L_{2}} - 1]$. Thus, we obtain an integer-valued image $T_{i, h}^{l}$:

$$T_{i, h}^{l} = \sum_{p = 1}^{L_{2}} 2^{p - 1} H (R_{i, h}^{l} * W_{p}^{2}) \tag{11}$$

Further, $T_{i, h}^{l}$ is transformed into a histogram $\mathrm{hist}(T_{i, h}^{l})$. Then the feature of input $R_{i, h}$ is defined by PCANet as:

$$κ_{i, h} = [\mathrm{hist}(T_{i, h}^{1}), \mathrm{hist}(T_{i, h}^{2}), \dots, \mathrm{hist}(T_{i, h}^{L_{1}})] \tag{12}$$
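The hashing and histogram stage of Equations (11) and (12) can be sketched as below. This is a minimal sketch assuming that the second-layer responses of one input patch are already stacked in a single array, with one axis for the first-layer filters and one for the second-layer filters, and that one histogram is taken over each whole integer image; the function name and array layout are hypothetical.

```python
import numpy as np

def hash_and_histogram(stage2_outputs):
    """Turn second-layer responses of one input into a PCANet-style feature.

    stage2_outputs: array of shape (L1, L2, H, W).
    """
    L1, L2, H, W = stage2_outputs.shape
    bits = (stage2_outputs > 0).astype(np.int64)           # Heaviside step
    weights = (2 ** np.arange(L2)).reshape(1, L2, 1, 1)    # 2^(p-1) for p = 1..L2
    T = (bits * weights).sum(axis=1)                       # (L1, H, W) integer images
    hists = [np.bincount(T[l].ravel(), minlength=2 ** L2) for l in range(L1)]
    return np.concatenate(hists)                           # feature of Eq. (12)

# With all responses positive, every pixel hashes to the largest code 2^L2 - 1.
feature = hash_and_histogram(np.ones((2, 3, 4, 4)))
```

The resulting feature has one block of histogram bins per first-layer filter, so its length grows linearly with the number of first-layer filters and exponentially with the number of second-layer filters.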

The features obtained as above are fed into a support vector machine (SVM) to train a model which can classify superpixels of the intermediate class as CC or UC. It is worth noting that there are almost no CC objects in the final UC at the end of the first phase, for the following reason. If FCM clustered all superpixel vectors into only two categories, UC and CC, the UC category would probably contain some CC objects. To avoid this problem, the clustering in the first phase produces three categories: UC, CC and the intermediate class. In this way, the UC and CC obtained are of high probability: there are almost no CC objects in UC, and almost no UC objects in CC. CC objects that would easily be assigned to UC in two-category clustering are instead assigned to the intermediate class, which collects the samples with high uncertainty. We then use the high-probability UC and CC objects to train PCANet1, and use the trained PCANet1 to classify the objects of the intermediate class. Because PCANet1 extracts deep features of UC and CC, it can assign objects of the intermediate class to UC or CC well. In summary, we combine FCM and PCANet to ensure that there are almost no CC objects in UC, thereby ensuring an extremely low missed detection rate. However, it is worth noting that the CC of the first phase includes not only the changed pixels caused by real terrain variation, but also changed pixels caused by strong speckle noise.

2.3. Second Phase Deep Learning

As stated above, when SAR images are contaminated by strong speckle noise, the CC of the first phase contains two categories of change. One is false change caused by speckle noise (FCC), and the other is change caused by real terrain variation (RCC). Thus, in the second phase, we aim to separate FCC and RCC, between which the inter-class distance is so small that they are difficult to distinguish. However, the essential difference between the two categories is that the change caused by strong speckle noise is strongly random. If the influence of the random noise can be greatly weakened, the discrimination between RCC and FCC can be increased. Therefore, the second deep learning phase adopts different methods from the first phase. One key step in the second phase is speckle noise suppression based on low rank and sparse decomposition. Details are as follows.

2.3.1. Superpixel Generation on the Updated SAR Images

In the second phase, we first apply a mask to the original SAR images $I_{1}$ and $I_{2}$ to set the pixels classified as UC in the first phase to zero, thus easing the burden on the classifier in this phase. Then SLIC is conducted on these two masked images to generate new superpixel objects denoted by $S_{b, i}^{2}$. The superpixel generation in this phase differs from that in the first phase in two ways. Firstly, it is based on the masked images, so the spatial context of the pixels has altered significantly, leading to different superpixel patterns. Secondly, when applying SLIC in this phase, we set the number of pixels in each superpixel to be smaller than that in the first phase, because the mask operation creates many discontinuous areas. Then we reshape the superpixel objects $S_{b, i}^{2}$ into vectors $V_{b, i, h}^{2}$ using a strategy similar to that in the first phase.

2.3.2. Low Rank and Sparse Decomposition

The principle of using LRSD is that a pair of noisy superpixels from the same unchanged area of $I_{1}$ and $I_{2}$ is inherently highly correlated and therefore has a low rank characteristic. Therefore, to discriminate RCC and FCC, we propose an approach based on LRSD to suppress speckle noise and restore the superpixel objects. The LRSD model provides an effective representation of noisy observed data [33,34]. Low rank and sparse regularization constraints can separate noise effectively from the observed data and recover the underlying data. By optimizing the LRSD model, speckle noise can be separated and the observed objects restored, which may greatly increase the discrimination between RCC and FCC.

At first, we apply a logarithmic operation to each vector of the superpixel objects to convert multiplicative speckle noise into additive noise. Then, each vector can be formulated as follows:

$$V_{b, i, h}^{2} = u_{b, i, h}^{2} + e_{b, i, h}^{2} \tag{13}$$

where $u_{b, i, h}^{2}$ indicates the pixels of the observed objects ideally without any speckle noise, and $e_{b, i, h}^{2}$ indicates additive speckle noise. All vectors $V_{1, i, h}^{2}$ and $V_{2, i, h}^{2}$ are arranged in pairs to construct a matrix $Φ = [V_{1, 1, 1}^{2}, V_{2, 1, 1}^{2}, \dots, V_{1, 1, q_{1}}^{2}, V_{2, 1, q_{1}}^{2}, \dots, V_{1, v, 1}^{2}, V_{2, v, 1}^{2}, \dots, V_{1, v, q_{v}}^{2}, V_{2, v, q_{v}}^{2}]$, as shown in Figure 5. Thus, we can obtain the matrix version of Equation (13) as Equation (14):

$$Φ = U + E \tag{14}$$

where $U = [u_{1, 1, 1}^{2}, u_{2, 1, 1}^{2}, \dots, u_{1, 1, q_{1}}^{2}, u_{2, 1, q_{1}}^{2}, \dots, u_{1, v, 1}^{2}, u_{2, v, 1}^{2}, \dots, u_{1, v, q_{v}}^{2}, u_{2, v, q_{v}}^{2}]$ and $E = [e_{1, 1, 1}^{2}, e_{2, 1, 1}^{2}, \dots, e_{1, 1, q_{1}}^{2}, e_{2, 1, q_{1}}^{2}, \dots, e_{1, v, 1}^{2}, e_{2, v, 1}^{2}, \dots, e_{1, v, q_{v}}^{2}, e_{2, v, q_{v}}^{2}]$.

According to the principle of low rank representation, in order to estimate a low rank matrix

U

and a spare matrix

E

from a noise-contaminated observed

Φ

, we formulate an optimization problem as follows.

\min_{U, E} {‖ U ‖}_{*} + ε (1 - λ) {‖ U ‖}_{2, 1} + ε λ {‖ E ‖}_{2, 1}, subject to Φ = U + E

(15)

where

{‖ \cdot ‖}_{*}

indicates the nuclear norm,

{‖ \cdot ‖}_{2, 1}

indicates the

l_{1}

norm of a vector formed by the

l_{2}

norm of the column vector of the underlying matrix.

{‖ \cdot ‖}_{*}

induces sparsity of the singular values of the matrix, and

{‖ \cdot ‖}_{2, 1}

induces column-wise sparsity of the matrix.

The optimization problem can be solved by an augmented Lagrange algorithm. The augmented Lagrange function of Equation (15) is as follows:

L (U, E, X, μ) = {‖ U ‖}_{*} + ε (1 - λ) {‖ U ‖}_{2, 1} + ε λ {‖ E ‖}_{2, 1} + 〈 X, Φ - U - E 〉 + \frac{μ}{2} {‖ Φ - U - E ‖}_{F}^{2}

(16)

where

X

is the Lagrange multiplier. Given

X = X_{k}

and

μ = μ_{k}

, the key to solving the problem is to solve:

\min_{U, E} L (U, E, X_{k}; μ_{k})

(17)

The solution is obtained through alternating iteration. First, fix

U = U_{k}

, and solve:

E_{k + 1} = \arg \min_{E} L (U_{k}, E, X_{k}; μ_{k})

(18)

Then, fix

E = E_{k + 1}

, and solve:

U_{k + 1} = \arg \min_{U} L (U, E_{k + 1}, X_{k}; μ_{k})

(19)

After LRSD, we utilize column vectors

u_{1, i, h}^{2}

and

u_{2, i, h}^{2}

of low rank matrix

U

to restore

V_{b, i, h}^{2}

, abandoning the noise matrix

E

, as shown in Figure 6.
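To make the alternating scheme of Equations (17)–(19) concrete, the sketch below implements an inexact augmented Lagrange loop in NumPy for the simplified case λ = 1, in which the term ε(1 − λ)‖U‖_{2,1} vanishes and both sub-problems have closed-form solutions (column-wise shrinkage for E, singular value thresholding for U). The multiplier update X ← X + μ(Φ − U − E), the growth factor `rho`, the default parameter values and all function names are assumptions taken from standard practice rather than from the authors' implementation; the general case λ < 1 would need an additional splitting step for U.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    W, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (W * np.maximum(s - tau, 0.0)) @ Vt

def col_shrink(A, tau):
    """Column-wise shrinkage: proximal operator of tau * l2,1 norm."""
    norms = np.linalg.norm(A, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return A * scale

def lrsd(Phi, eps=0.1, mu=1e-2, rho=1.5, mu_max=1e6, tol=1e-7, max_iter=500):
    """Simplified LRSD (lambda = 1): min ||U||_* + eps * ||E||_{2,1}, s.t. Phi = U + E."""
    U = np.zeros_like(Phi)
    E = np.zeros_like(Phi)
    X = np.zeros_like(Phi)  # Lagrange multiplier
    for _ in range(max_iter):
        E = col_shrink(Phi - U + X / mu, eps / mu)  # E-step with U fixed, cf. Eq. (18)
        U = svt(Phi - E + X / mu, 1.0 / mu)         # U-step with E fixed, cf. Eq. (19)
        R = Phi - U - E                             # constraint residual
        X = X + mu * R                              # multiplier update (assumed)
        mu = min(rho * mu, mu_max)                  # penalty growth (assumed)
        if np.linalg.norm(R) <= tol * max(np.linalg.norm(Phi), 1.0):
            break
    return U, E
```

Calling `U, E = lrsd(Phi)` on the log-domain matrix returns the restored low rank part and the discarded noise part; the columns of `U` then play the role of the restored vectors.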

2.3.3. SPDI Generation and FCM

In the second phase, the difference vector is obtained from the superpixel vectors restored by LRSD, and FCM clustering is also adopted. At this stage,

F_{i, h}^{2} = | u_{1, i, h}^{2} - u_{2, i, h}^{2} |

, forming a new SPDI, is taken as the input of FCM, to be clustered into three classes, FCC

ω_{f c}^{2}

, RCC

ω_{r c}^{2}

and the intermediate class

ω_{m c}^{2}

.
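The step above can be sketched as follows: a difference vector is formed for each superpixel and a plain fuzzy c-means is run with three clusters. This is a minimal NumPy illustration, not the implementation used in the experiments. The fuzzifier `m = 2`, the deterministic initialisation from the ranked vector norms and the function names are assumptions, and the assignment of the three resulting clusters to FCC, the intermediate class and RCC is not shown because the text does not specify it here.

```python
import numpy as np

def difference_vectors(u1, u2):
    """F = |u1 - u2| for restored vectors stored as rows (one row per superpixel)."""
    return np.abs(np.asarray(u1, dtype=float) - np.asarray(u2, dtype=float))

def fcm(X, c=3, m=2.0, n_iter=100, tol=1e-6):
    """Fuzzy c-means on the rows of X. Returns (centres, memberships)."""
    X = np.asarray(X, dtype=float)
    # Assumed initialisation: samples at evenly spaced ranks of the row norm.
    order = np.argsort(np.linalg.norm(X, axis=1))
    centres = X[order[np.linspace(0, len(X) - 1, c).astype(int)]].copy()
    for _ in range(n_iter):
        # Distances of every sample to every centre, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)   # memberships sum to 1 per sample
        w = u ** m
        new_centres = (w.T @ X) / w.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centres - centres)
        centres = new_centres
        if shift < tol:
            break
    return centres, u
```

A hard label per superpixel can be read off with `u.argmax(axis=1)`; with the initialisation assumed here, the cluster indices are ordered from small to large difference magnitude.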

2.3.4. Training PCANet2 and Obtaining the Final Change Map

As mentioned earlier, in the second phase, the FCM clusters the superpixel vectors into three categories, which are RCC

ω_{r c}^{2}

, FCC

ω_{f c}^{2}

and the intermediate class

ω_{m c}^{2}

. RCC contains the superpixel vectors that, with high probability, exhibit real changes caused by terrain objects. FCC contains the superpixel vectors that, with high probability, exhibit false changes caused by strong speckle noise. The remaining superpixel vectors have high uncertainty and are difficult to assign to either RCC or FCC; they form the intermediate class. In fact, each superpixel vector of the intermediate class belongs to either RCC or FCC, but FCM cannot identify its category because of its limited clustering ability. Therefore, a deep learning classifier is needed to identify accurately whether these superpixel vectors belong to RCC or FCC. We design a new PCANet model to accomplish this task. To distinguish it from the network of the first phase, we name this PCANet PCANet2; its structure is the same as that of PCANet1.

PCANet2 is trained using the FCC and RCC superpixel vectors obtained by FCM as training samples. The training process is similar to that of PCANet1, except that the training samples of the two deep learning models are different. After training, PCANet2 with the trained SVM can identify whether superpixel vectors of the intermediate class belong to RCC or FCC. Additionally, since the superpixels of this phase are smaller than those of the first phase, the patch size of PCANet2 is also smaller than that of PCANet1. Once the network has extracted the features of all the training samples, the extracted features are employed to train an SVM model. Further, those vectors belonging to the intermediate class

ω_{m c}^{2}

are fed into the PCANet2 with the trained SVM to be classified as FCC or RCC. It is worth noting that the classification task of the PCANet2 is performed only once, without any iteration. In this way, we obtain the result of the second phase, which discriminates between strong-noise-induced changes and real terrain changes. Finally, the real changed pixels of the SAR images are only the pixels of superpixel objects belonging to RCC

ω_{r c}^{2}

. By doing this, the final binary change detection result can be obtained.
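The last step, mapping superpixel classes back to pixels, can be illustrated with a short NumPy sketch. It assumes a hypothetical integer label map `labels` in which each pixel stores the index of its second-phase superpixel and pixels masked out as UC in the first phase carry the value -1; `rcc_ids` is the assumed set of superpixel indices finally assigned to RCC.

```python
import numpy as np

def final_change_map(labels, rcc_ids):
    """Binary change map: 1 for pixels inside RCC superpixels, 0 elsewhere."""
    labels = np.asarray(labels)
    return np.isin(labels, list(rcc_ids)).astype(np.uint8)

# Toy example: superpixel 0 is RCC, superpixels 1 and 2 are FCC, -1 is UC.
toy_labels = np.array([[0, 0, 1],
                       [2, 2, -1]])
toy_map = final_change_map(toy_labels, {0})
```

In the toy example only the two pixels of superpixel 0 are marked as changed, so FCC and UC pixels both end up in the unchanged class of the binary map.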

2.4. Computational Complexity

The analysis of the computational complexity of the method proposed in this paper is as follows. In the first phase, the computational complexity of SLIC is

O (M N)

, the FCM is

O (M N k)

, the PCANet1 is

O (M N k^{2} (L_{1} + L_{2}) + M N k^{4})

and the SVM is

O (M N k^{2})

. In the second phase, due to the masking operation, the number of pixels actually participating in the operation is no longer

M \times N

. For ease of description, it is assumed that the number of pixels actually participating in the operation can be arranged into a rectangle of size

M^{'} \times N^{'}

. Then, the computational complexity of SLIC is

O (M^{'} N^{'})

, the LRSD is

O (M^{'} N^{'} k^{'} + {k^{'}}^{3})

, where

k^{'}

is one dimension of a patch reshaped from a superpixel in the second phase. The computational complexity of FCM is

O (M^{'} N^{'} k^{'})

, the PCANet2 is

O (M^{'} N^{'} {k^{'}}^{2} (L_{1} + L_{2}) + M^{'} N^{'} {k^{'}}^{4})

and the SVM is

O (M^{'} N^{'} {k^{'}}^{2})

. Therefore, the total computational complexity of the proposed method is summed as

O (M N k + M^{'} N^{'} k^{'} + M N k^{2} (L_{1} + L_{2} + k^{2}) + M^{'} N^{'} {k^{'}}^{2} (L_{1} + L_{2} + {k^{'}}^{2}))

3. Experiments and Results

To demonstrate the accuracy and effectiveness of the proposed approach, we compared TPOBDL with other state-of-the-art methods: principal component analysis and k-means clustering (PCAKM) [8], Gabor feature extraction and PCANet (GaborPCANet) [22], neighbourhood-based ratio and extreme learning machine (NR_ELM) [35] and convolutional-wavelet neural network (CWNN) [36].

3.1. Datasets and Experimental Setup

The prerequisite steps for applying SAR images include geometric correction, radiometric correction, and geocoding. In particular, the multi-temporal SAR images should be registered before change detection. Our experimental datasets were registered by the commercial satellite data supplier at high geometric accuracy.

We applied the proposed and benchmark methods to three real space-borne SAR datasets to evaluate the performance of TPOBDL. The three datasets used are co-registered and geometrically corrected SAR images acquired by the COSMO-Skymed satellite sensor, as shown in Figure 7. The images in Figure 7a–c were acquired on 10 June 2016 and those in Figure 7d–f on 26 April 2017. The three areas were selected to represent different landscapes containing a river, a plain, mountains and buildings. They are all of size

400 \times 400 pixels

. All three SAR datasets are clearly affected by speckle noise. Many studies have pointed out that speckle reduction algorithms result in the loss of spatial resolution and feature suppression [35]. This is because a typical speckle reduction algorithm, such as multi-looking processing, usually involves a moving average within a rectangular window, which significantly degrades spatial details such as edges and textures, and can even remove some point-like targets. However, these details are especially useful for change detection. Therefore, no speckle filters were applied to these three SAR datasets prior to our approach. The corresponding ground truth maps, obtained by manual annotation, are shown in Figure 7g–i. In all ground truth maps, white represents pixels of the changed class, and black represents pixels of the unchanged class.

Evaluating the performance of SAR image change detection algorithms is a key issue. Here, we utilized several widely used evaluation metrics, including the false alarm probability

P_{f}

, missing detection probability

P_{m}

, percentage correct classification

P C C

, Kappa coefficient

K C

and

G D / O E

[1,22]. Assume that the actual numbers of pixels belonging to UC and CC are denoted by

N_{u}

and

N_{c}

, respectively, in the ground reference data, then

P_{f} = \frac{F_{n}}{N_{u}} \times 100 %

(20)

P_{m} = \frac{M_{n}}{N_{c}} \times 100 %

(21)

where

F_{n}

denotes the number of unchanged pixels detected as changed, while

M_{n}

represents the number of changed pixels detected as unchanged.

P C C = \frac{(N_{u} + N_{c} - F_{n} - M_{n})}{N_{u} + N_{c}} \times 100 %

(22)

K C = \frac{(P C C - P R E)}{1 - P R E} \times 100 %

(23)

where,

P R E = \frac{(N_{c} + F_{n} - M_{n}) \times N_{c} + (N_{u} + M_{n} - F_{n}) \times N_{u}}{{(N_{c} + N_{u})}^{2}}

(24)

The definition of

G D / O E

is then as follows.

G D / O E = \frac{(N_{c} - M_{n})}{F_{n} + M_{n}} \times 100 %

(25)
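For reference, Equations (20)–(24) can be evaluated with a few lines of Python. The sketch below is illustrative only: the function name and the toy counts are hypothetical, the ratio of Equation (25) is left out, and PCC and PRE are treated as fractions inside the Kappa formula before the result is converted to a percentage, which is our reading of Equation (23).

```python
def change_metrics(n_u, n_c, f_n, m_n):
    """Evaluate Eqs. (20)-(24) from pixel counts.

    n_u, n_c: numbers of unchanged / changed pixels in the ground reference.
    f_n: unchanged pixels detected as changed (false alarms).
    m_n: changed pixels detected as unchanged (missed detections).
    """
    total = n_u + n_c
    p_f = f_n / n_u * 100.0                       # Eq. (20)
    p_m = m_n / n_c * 100.0                       # Eq. (21)
    pcc = (total - f_n - m_n) / total             # Eq. (22), as a fraction
    pre = ((n_c + f_n - m_n) * n_c
           + (n_u + m_n - f_n) * n_u) / total ** 2  # Eq. (24)
    kc = (pcc - pre) / (1.0 - pre)                # Eq. (23), as a fraction
    return {"Pf": p_f, "Pm": p_m, "PCC": pcc * 100.0, "KC": kc * 100.0}

# Hypothetical counts, not taken from the experiments.
toy = change_metrics(n_u=900, n_c=100, f_n=9, m_n=10)
```

With these toy counts, PCC is 98.1% while KC is only about 89.4%, which shows how the Kappa coefficient penalizes errors on the small changed class more strongly than PCC does.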

3.2. Experiments

We analysed and evaluated the final results visually and quantitatively.

The change detection results of multi-temporal SAR dataset C1 are shown in Figure 8 and Table 1. As presented in Figure 8, the change map of PCAKM contains many false alarms, scattered widely across the image with

P_{f}

reaching 39.23%. This is because PCAKM is unable to distinguish the false changes caused by strong speckle noise from the real changes caused by terrain variation, as shown in Figure 8a. In contrast, the false alarms of GaborPCANet, NR_ELM and CWNN are concentrated in the river, as shown in Figure 8b–d. On one hand, PCAKM uses pixel values for change detection, which are affected by strong speckle noise. Thus, the

P_{f}

of PCAKM is very high. However, GaborPCANet and CWNN, two deep learning-based methods, can extract deep features and have a certain speckle noise suppression capability, so the

P_{f}

is greatly reduced compared to PCAKM. Moreover, the extreme learning machine in NR_ELM can also effectively extract features and suppress speckle noise. Therefore, the performance of GaborPCANet, NR_ELM and CWNN is better than that of PCAKM. On the other hand, compared to the original two SAR images, we found that false alarms occur in the river region for the latter three methods. The river region in the two SAR images looks very dark, because the river backscatter of electromagnetic waves is relatively weak. Thus, under strong speckle noise, the signal-to-noise ratio (SNR) in the river region of the SAR image is very low. Therefore, in this case, the difference in values of pixels between the two images in the river region is relatively large, and pixels in the river region are easily classified as CC.

It can be seen that the final change map obtained by the proposed approach TPOBDL is very close to the ground reference, as shown in Figure 8e,f. Compared with the former methods, the

P_{f}

obtained by TPOBDL is only 0.18% (see Table 1), which is a remarkable result. This is because the second phase of TPOBDL uses a dedicated network to separate the pixels of FCC from those of RCC. In addition, compared to CWNN, our approach uses object-based deep learning, which effectively removes the scattered false alarms and demonstrates the advantage of the object-based strategy. Therefore, TPOBDL can effectively eliminate the false alarms caused by strong speckle noise.

As can be seen from Table 1, the quantitative analysis is consistent with the visual analysis. The performance of TPOBDL is better than that of the benchmark algorithms in terms of

P C C

,

P_{f}

,

K C

and

G D / O E

. It is worth noting that although the

P_{m}

of PCAKM, GaborPCANet and NR_ELM is smaller than that of TPOBDL, this comes at the cost of a much larger

P_{f}

. The reason why the

P_{m}

of our method is larger than that of the three benchmark methods is that a few superpixel objects of RCC are mistakenly classified as FCC in the second deep learning phase. Therefore, we need to consider the value of the more convincing

K C

. TPOBDL has the highest value of

K C

(97.84%), which means that the change detection accuracy of TPOBDL is the highest amongst all five methods.

Figure 9 and Table 2 present the final change detection results on dataset C2. In terms of visual comparison, PCAKM still includes many false alarms. The performance of GaborPCANet is better than that of PCAKM in terms of

P_{f}

. However, several false alarms due to speckle noise remain. Moreover, for each of PCAKM, GaborPCANet and NR_ELM, there is an obvious long and narrow area with fewer false alarms in the upper right corner of the change map. Comparing the two original multi-temporal SAR images, we find that this long and narrow area exhibits relatively strong backscattering (visually white), which means that the amplitude values of these pixels are relatively large. This indicates that change detection in areas with strong scattering is less affected by speckle noise because of the high SNR. This situation is exactly the opposite of the high false alarm phenomenon in the river region in the experiments on C1. As for CWNN, it is clear that the value of

P_{f}

due to speckle noise is smaller than that of the other three benchmarks. This benefit arises from the wavelet pooling layers in CWNN, which suppress speckle noise by discarding high-frequency sub-bands while preserving low-frequency sub-bands to extract features. However, TPOBDL has fewer false alarms than CWNN, because it adopts the object-based methodology, which greatly reduces the classification uncertainty induced by rectangular patches. The two-phase deep learning of TPOBDL is effective for change detection not only in low SNR regions, but also in high SNR regions. This is due to the LRSD, which greatly constrains the influence of speckle noise. Among the five methods, TPOBDL has the best performance in terms of

P C C

,

P_{f}

,

G D / O E

and

K C

, reaching 99.43%, 0.26%, 4.70% and 95.67%, respectively.

The results of experiments on dataset C3 are exhibited in Figure 10 and Table 3. The performance of PCAKM is again the poorest. Unlike the first two datasets, C3 contains no weak backscattering regions (like the river in C1) or strong backscattering regions (like the mountain in C2). However, the contrast across the whole scene of C3 is relatively low, which means that classification may be more challenging due to low discrimination. Thus, it can be seen from Table 3 that the

P_{m}

of all methods is relatively high. Still, TPOBDL is superior to CWNN in terms of

P_{m}

under the circumstances, which is opposite to the experiments on C1 and C2. Among the five methods, TPOBDL again produces the best result, with a

P C C

of 98.42%,

P_{f}

of 1.18%,

G D / O E

of 1.59% and

K C

of 89.32%. It is worth noting that in the experiments on C3, TPOBDL again produces the best values of

P C C

,

P_{f}

and

K C

, while also producing a

P_{m}

of 19.64%, which is similar to that of the other methods. The experimental results illustrate the superiority of TPOBDL.

4. Discussion

4.1. Parameters Selection

In the proposed approach, there exist four parameters to be discussed, which are the number of superpixels

S P_{1}

and the patch size

k_{1}

in the first phase, and the equivalents,

S P_{2}

and

k_{2}

, in the second phase. These four parameters affect the ability to learn neighbourhood information in the two-phase object-based deep learning approach. As indicated in [21], when the patch size is set as

5 \times 5

, it leads to an optimal result. Hence, we fix

k_{1} = 5

at the beginning. As for

S P_{1}

and

S P_{2}

, to reduce redundancy and increase superpixel generation efficiency, we assume

S P_{i} \approx (M \times N) / k_{i}^{2}

(i = 1, 2)

, which means that the number of pixels in a superpixel and the number of pixels in a patch should be the same, as far as possible. So we fix

S P_{1} = 6400

. Then, we conduct experiments on

S P_{2} =

17800, 6400, 3200

and

k_{2}

= 3, 5, 7, 9

in pair-wise fashion, respectively. The experimental results are shown in Figure 11 and Figure 12.
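The rule of thumb relating the number of superpixels to the patch size can be checked numerically for the 400 × 400 images used here. The helper name below is hypothetical, and the rounding is only for display.

```python
def superpixel_count(m, n, k):
    """Approximate number of superpixels so that each holds about k*k pixels."""
    return (m * n) / k ** 2

# 400 x 400 image, candidate patch sizes k = 3, 5, 7, 9
counts = {k: round(superpixel_count(400, 400, k)) for k in (3, 5, 7, 9)}
print(counts)  # {3: 17778, 5: 6400, 7: 3265, 9: 1975}
```

The tested settings of 17800, 6400 and 3200 superpixels are therefore close to the values implied by k = 3, 5 and 7, respectively.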

From Figure 11 and Figure 12, we found that when

S P_{2} =

17800

and

k_{2}

= 3

, the values of

P C C

and

K C

were the best. This result is consistent with the principle of the proposed approach. As mentioned before, the spatial context of the pixels is altered significantly by masking in the second phase, and many discontinuous areas may remain after masking. Hence, superpixel objects with a small number of pixels have the benefit of avoiding heterogeneous pixels inside the objects, which reduces classification uncertainty in PCANet2. This reveals that, in the second phase, relatively small superpixels help PCANet2 to exploit more details, which serves the purpose of distinguishing RCC from FCC.

We then fixed the parameters of the second phase as

S P_{2} =

17800

and

k_{2}

= 3

to conduct experiments on

S P_{1} =

17800, 6400, 3200

and

k_{1}

= 3, 5, 7, 9

in a pair-wise fashion, respectively. The experimental results are presented in Figure 13 and Figure 14.

As shown in Figure 13 and Figure 14, there are two pairs of

S P_{1}

and

k_{1}

that obtain a larger PCC and KC than other parameter values. One pair is

S P_{1}

= 6400

and

k_{1}

= 5

, and the other pair is

S P_{1}

= 3200

and

k_{1}

= 7

. This means that superpixels with a relatively large number of pixels are beneficial for classifying UC and CC in the first phase. On closer inspection, these two pairs of parameters adhere to

S P_{i} \approx (M \times N) / k_{i}^{2}

, which indicates that theoretically the number of pixels in a superpixel should be similar to the number of pixels in a patch. Thus, the best parameter combination is

S P_{1}

= 3200

,

k_{1}

= 7

for the first phase, and

S P_{2} =

17800

,

k_{2}

= 3

for the second phase.

4.2. Comparison with Other Methods

Firstly, we compare the proposed approach with four other methods. The experimental results of all methods are presented in Figure 8, Figure 9 and Figure 10, and in Table 1, Table 2 and Table 3. TPOBDL outperforms the other methods in all evaluation indicators except for the missed detection rate. This is because, by using superpixel objects and two phases of PCANet, TPOBDL is more robust to speckle noise, able to extract deep features and capable of learning the nonlinear relations from multi-temporal SAR images efficiently. The patches reshaped from superpixel objects with homogeneous pixels are beneficial to deep feature extraction and PCANet training, which avoids the uncertainty due to rectangular patches.

The two deep learning phases in TPOBDL are important for achieving the desired change detection performance. The first phase classifies pixels into two classes, CC and UC. However, there are actually two kinds of changes in CC: one is induced by strong speckle noise, and the other is induced by real terrain variation. In the second phase, the pixels belonging to UC are set to zero so that PCANet2 can focus on separating these two kinds of change, which are hard to distinguish. PCANet2 therefore faces a more difficult classification task than PCANet1. Hence, we equip the second phase with LRSD to suppress noise and increase the ability to discriminate the two kinds of change. Despite noise interference, multi-temporal SAR images of the same object should have a strong correlation. Based on this principle, we established the LRSD model. LRSD can not only suppress speckle noise, but also highlight the correlation between objects via the low rank constraint, as shown in Figure 15. Through this, TPOBDL achieves the best performance amongst the five methods when facing strong speckle noise. It is worth noting that no speckle filtering is used in TPOBDL.

4.3. Modular Deep Learning Framework for Change Detection

In the proposed approach, PCANet1 in the first phase completes the classification of CC and UC, and PCANet2 in the second phase completes the classification of RCC and FCC. In fact, other deep neural networks can also be used in the first phase instead of PCANet. In the same way, it is not necessary to use PCANet in the second phase. Therefore, the two-phase deep learning framework proposed in this paper can be regarded as a modular structure that does not limit which deep learning models are used. The key to this modular structure is hierarchical classification. Moreover, the advantage of this modular deep learning framework is that the deep neural network in each module completes a specialized and not particularly complicated task, so the difficulty of classification in each module is reduced. For example, in this research, if only one PCANet were used to classify UC, RCC and FCC simultaneously, more misclassifications would be likely, leading to a larger number of false alarms or missed detections. In addition, this modular deep learning-based change detection structure is particularly suitable for engineering implementation.

4.4. Time-Series SAR Images to Suppress Speckle Noise

We used LRSD to strip speckle noise at the beginning of the second phase, so as to differentiate between false change and real change. However, LRSD cannot strip off the speckle noise completely. Thus, improving the speckle noise separation in the second phase without the loss of spatial details will be the subject of our future work. Multi-temporal speckle noise reduction could potentially be used, as it may better preserve spatial details. With multi-temporal SAR image time series, change-detection-aware speckle noise reduction algorithms may also be applied in our future research.

5. Conclusions

In this research, a novel change detection algorithm based on a two-phase object-based deep learning approach for multi-temporal SAR images is presented. An object-based approach is used instead of a pixel-wise approach. The object-based change detection approach can effectively exploit the spatial context of neighbourhood pixels, which is conducive to increasing the ability to identify UC and CC. With superpixel objects, the pixels in each object are generally more homogeneous, which avoids the classification uncertainty caused by heterogeneous pixels and provides high-quality training samples for the subsequent PCANets. In addition, this paper uses a two-phase deep learning framework to implement change detection on multi-temporal SAR images. The first phase of deep learning distinguishes UC from CC. The second phase of deep learning distinguishes RCC from FCC. The two-phase deep learning framework can effectively tackle the classification challenge faced by deep learning in each phase, and can effectively distinguish RCC from FCC, while maintaining a very low false alarm rate under strong speckle noise. The experimental results illustrate that the proposed approach can achieve high accuracy and validity.

Author Contributions

Conceived and designed the scheme, X.Z. (Xinzheng Zhang), G.L.; conducted experiments, G.L.; analysed and discussed the results, X.Z. (Xinzheng Zhang), P.M.A., C.Z.; wrote the first draft, X.Z. (Xinzheng Zhang), G.L.; completed the revised paper, X.Z. (Xinzheng Zhang), C.Z., P.M.A.; gave some suggestions for the paper, X.T., X.J., X.Z. (Xichuan Zhou), Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation of China under Grant No. 61301224. This research was also partly supported by the Basic and Advanced Research Project in Chongqing under Grants No. cstc2017jcyjA1378 and No. cstc2016jcyjA0457.

Acknowledgments

The authors would like to thank the reviewers for valuable suggestions that increased the quality of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Gong, M.; Zhao, J.; Liu, J.; Miao, Q.; Jiao, L. Change Detection in Synthetic Aperture Radar Images Based on Deep Neural Networks. IEEE Trans. Neural Netw. 2016, 27, 125–138.
2. Chavez, P.S.J.; MacKinnon, D.J. Automatic detection of vegetation changes in the southwestern United States using remotely sensed images. Photogram. Eng. Remote Sens. 1994, 60, 1285–1294.
3. Bruzzone, L.; Prieto, D.F. Automatic analysis of the difference image for unsupervised change detection. IEEE Trans. Geosci. Remote Sens. 2000, 38, 1171–1182.
4. Neagoe, V.E.; Stoica, R.M.; Ciurea, A.I.; Bruzzone, L.; Bovolo, F. Concurrent Self-Organizing Maps for Supervised/Unsupervised Change Detection in Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. 2017, 7, 3525–3533.
5. Li, C.; Gao, L. Context-Sensitive Similarity Based Supervised Image Change Detection. In Proceedings of the Int. Joint Conference on Neural Networks, Changsha, China, 13–15 August 2016.
6. Bujor, F.; Trouvé, E.; Valet, L.; Nicolas, J.-M.; Rudant, J.-P. Application of log-cumulants to the detection of spatiotemporal discontinuities in multitemporal SAR images. IEEE Trans. Geosci. Remote Sens. 2004, 42, 2073–2084.
7. Gong, M.; Zhou, Z.; Ma, J. Change detection in synthetic aperture radar images based on image fusion and fuzzy clustering. IEEE Trans. Image Process. 2012, 21, 2141–2151.
8. Celik, T. Unsupervised Change Detection in Satellite Images Using Principal Component Analysis and k-Means Clustering. IEEE Trans. Geosci. Remote Sens. 2009, 6, 772–776.
9. Gong, M.; Su, L.; Jia, M.; Chen, W. Fuzzy Clustering With a Modified MRF Energy Function for Change Detection in Synthetic Aperture Radar Images. IEEE Trans. Fuzzy Syst. 2013, 2, 98–109.
10. Li, Y.; Zhou, L.; Peng, C.; Jiao, L. Spatial Fuzzy Clustering and Deep Auto-encoder for Unsupervised Change Detection in Synthetic Aperture Radar Images. In Proceedings of the Geoscience and Remote Sensing (IGARSS) IEEE International Symposium, Valencia, Spain, 22–27 July 2018.
11. Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning. Available online: http://www.iro.umontreal.ca/~lisa/pointeurs/DL_tutorial.pdf (accessed on 4 February 2020).
12. Krizhevsky, A.; Sutskever, I.; Hinton, G. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, CA, USA, 3–8 December 2012.
13. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158.
14. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the IEEE International Conference on Learning Representation (ICLR), Vancouver, BC, Canada, 7–9 May 2015.
15. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016.
16. Liu, T.; Li, Y.; Xu, L. Dual-Channel Convolutional Neural Network for Change Detection of Multitemporal SAR Images. Available online: https://0-ieeexplore-ieee-org.brum.beds.ac.uk/document/8278979 (accessed on 4 February 2020).
17. Liu, F.; Jiao, L.; Tang, X.; Yang, S.; Ma, W.; Hou, B. Local Restricted Convolutional Neural Network for Change Detection in Polarimetric SAR Images. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 818–833.
18. Lv, N.; Chen, C.; Qiu, T.; Sangaiah, A.K. Deep Learning and Superpixel Feature Extraction Based on Contractive Autoencoder for Change Detection in SAR Images. IEEE Trans. Ind. Inf. 2018, 14, 5530–5538.
19. De, S.; Pirrone, D.; Bovolo, F.; Bruzzone, L. A Novel Change Detection Framework Based on Deep Learning for The Analysis of Multi-Temporal Polarimetric SAR Images. In Proceedings of the Geoscience and Remote Sensing (IGARSS), IEEE International Symposium, Fort Worth, TX, USA, 23–28 July 2017.
20. Gao, Y.; Gao, F.; Dong, J.; Wang, S. Transferred Deep Learning for Sea Ice Change Detection From Synthetic-Aperture Radar Images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1–5.
21. Chan, T.H.; Gao, J.K.; Lu, S.; Zeng, J.; Ma, Z.Y. PCANet: A Simple Deep Learning Baseline for Image Classification? IEEE Trans. Image Process. 2015, 24, 5017–5032.
22. Gao, F.; Dong, J.; Li, B.; Xu, Q. Automatic Change Detection in Synthetic Aperture Radar Images Based on PCANet. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1792–1796.
23. Li, M.; Li, M.; Zhang, P.; Wu, Y.; Song, W.; An, L. SAR Image Change Detection Using PCANet Guided by Saliency Detection. IEEE Geosci. Remote Sens. Lett. 2018, 16, 402–406.
24. Wang, R.; Zhang, J.; Chen, J.; Jiao, L.; Wang, M. Imbalanced Learning-Based Automatic SAR image change Detection by Morphologically Supervised PCA-Net. IEEE Geosci. Remote Sens. Lett. 2018, 16, 554–558.
25. Zhang, C.; Sargent, I.; Pan, X.; Li, H.; Gardiner, A.; Hare, J.; Atkinson, P.M. An object-based convolutional neural network (OCNN) for urban land use classification. Remote Sens. Environ. 2018, 216, 57–70.
26. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC Superpixels Compared to State-of-the-Art Superpixel Methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282.
27. Luo, F.L.; Huang, H.; Duan, Y.L.; Liu, J.M.; Liao, Y.H. Local Geometric Structure Feature for Dimensionality Reduction of Hyperspectral Imagery. Remote Sens. 2017, 9, 790.
28. Zhang, X.; Xia, J.; Tan, X.; Zhou, X.; Wang, T. PolSAR Image Classification via Learned Superpixels and QCNN Integrating Color Features. Remote Sens. 2019, 11, 1831.
29. Zhang, C.; Sargent, I.; Pan, X.; Li, H.; Gardiner, A.; Hare, J.; Atkinson, P.M. Joint Deep Learning for land cover and land use classification. Remote Sens. Environ. 2019, 221, 173–187.
30. Zou, H.; Qin, X.; Zhou, S.; Ji, K. A Likelihood-Based SLIC Superpixel Algorithm for SAR Images Using Generalized Gamma Distribution. Sensors 2016, 16, 1107.
31. Hou, B.; Kou, H.; Jiao, L. Classification of polarimetric SAR images using multilayer autoencoders and superpixels. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3072–3081.
32. Cannon, R.L.; Dave, J.V.; Bezdek, J.C. Efficient Implementation of the Fuzzy c-Means Clustering Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 8, 248–255.
33. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust Recovery of Subspace Structures by Low-Rank Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184.
34. Zhao, Y.; Yang, J. Hyperspectral image denoising via sparse representation and low-rank constraint. IEEE Trans. Geosci. Remote Sens. 2014, 53, 296–308.
35. Gao, F.; Dong, J.; Li, B.; Xu, Q.; Xie, C. Change detection from synthetic aperture radar images based on neighborhood-based ratio and extreme learning machine. J. Appl. Remote Sens. 2016, 10, 046019.
36. Gao, F.; Wang, X.; Gao, Y.; Dong, J.; Wang, S. Sea Ice Change Detection in SAR Images Based on Convolutional-Wavelet Neural Networks. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1230–1244.

Figure 1. The scheme of the proposed approach.

Figure 2. Illustration of copying superpixel pattern from

I_{1}

to

I_{2}

.

Figure 3. Patch generation in stage 1.

Figure 4. The structure of PCANet.

Figure 5. Construction of matrix

Φ

.

Figure 6. Low rank and sparse decomposition (LRSD) of the vectors from superpixel objects.

Figure 7. Synthetic aperture radar (SAR) images (a–f) acquired by the COSMO-SkyMed spaceborne SAR instrument at X-band with a spatial resolution of 3 m. Each of (a–f) is $400 \times 400$ pixels, equivalent to a ground area of 1.2 km $\times$ 1.2 km. (a,d) form dataset C1, which contains a river and mountains; (g) is its ground truth. (b,e) form dataset C2, which contains buildings, roads and mountains; (h) is its ground truth. (c,f) form dataset C3, which contains a plain and buildings; (i) is its ground truth.

Figure 8. Results of experiments on C1; (a) principal component analysis and k-means clustering (PCAKM); (b) GaborPCANet; (c) neighbourhood-based ratio and extreme learning machine (NR_ELM); (d) convolutional-wavelet neural network (CWNN); (e) two-phase object-based deep learning (TPOBDL); (f) ground truth.

Figure 9. Results of experiments on C2; (a) PCAKM; (b) GaborPCANet; (c) NR_ELM; (d) CWNN; (e) TPOBDL; (f) ground truth.

Figure 10. Results of experiments on C3; (a) PCAKM; (b) GaborPCANet; (c) NR_ELM; (d) CWNN; (e) TPOBDL; (f) ground truth.

Figure 11. The influence of different parameters ($SP_{2}$ and $k_{2}$) on PCC.

Figure 12. The influence of different parameters ($SP_{2}$ and $k_{2}$) on KC.

Figure 13. The influence of different parameters ($SP_{1}$ and $k_{1}$) on PCC.

Figure 14. The influence of different parameters ($SP_{1}$ and $k_{1}$) on KC.

Figure 15. (a) A selected object before LRSD; (b) The object after LRSD.

Table 1. Comparison of evaluation metrics amongst PCAKM, GaborPCANet, NR_ELM, CWNN and TPOBDL on dataset C1 using the false alarm probability ($P_{f}$), missing detection probability ($P_{m}$), percentage correct classification ($PCC$), Kappa coefficient ($KC$) and $GD/OE$.

Methods	Results on C1 (%)
Methods	$PCC$	$P_{f}$	$P_{m}$	$GD/OE$	$KC$
PCAKM [9]	60.99	39.24	1.78	0.07	58.87
GaborPCANet [23]	64.67	35.46	4.88	0.08	59.36
NR_ELM [33]	73.85	26.26	9.86	0.11	61.39
CWNN [34]	85.22	14.69	29.18	0.19	65.67
TPOBDL	99.71	0.18	15.10	9.97	97.84

Table 2. Comparison of evaluation metrics amongst PCAKM, GaborPCANet, NR_ELM, CWNN and TPOBDL on dataset C2 using the false alarm probability ($P_{f}$), missing detection probability ($P_{m}$), percentage correct classification ($PCC$), Kappa coefficient ($KC$) and $GD/OE$.

Methods	Results on C2 (%)
Methods	$PCC$	$P_{f}$	$P_{m}$	$GD/OE$	$KC$
PCAKM [9]	55.65	45.24	1.81	0.07	58.13
GaborPCANet [23]	79.64	20.66	6.19	0.14	63.22
NR_ELM [33]	86.99	13.14	7.11	0.21	67.37
CWNN [34]	95.24	4.59	12.41	0.56	78.49
TPOBDL	99.43	0.26	15.02	4.70	95.67

Table 3. Comparison of evaluation metrics amongst PCAKM, GaborPCANet, NR_ELM, CWNN and TPOBDL on dataset C3 using the false alarm probability ($P_{f}$), missing detection probability ($P_{m}$), percentage correct classification ($PCC$), Kappa coefficient ($KC$) and $GD/OE$.

Methods	Results on C3 (%)
Methods	$PCC$	$P_{f}$	$P_{m}$	$GD/OE$	$KC$
PCAKM [9]	62.23	38.29	14.39	0.07	58.50
GaborPCANet [23]	84.61	15.32	18.92	0.16	64.84
NR_ELM [33]	89.54	9.98	31.90	0.21	67.56
CWNN [34]	94.53	5.02	25.90	0.43	75.55
TPOBDL	98.42	1.18	19.64	1.59	89.32
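
The metrics named in the captions of Tables 1–3 can be illustrated with a minimal sketch that derives them from a binary change map and its ground truth. The formulas below are common conventions and are assumptions here, not definitions taken from this section: $P_{f}$ is normalised by the number of truly unchanged pixels, $P_{m}$ by the number of truly changed pixels, $KC$ is Cohen's kappa, and $GD/OE$ is taken as correctly detected changed pixels divided by the overall error (false alarms plus missed detections). The function name `change_detection_metrics` is hypothetical, and the tabulated values may rely on different normalisations.

```python
def change_detection_metrics(pred, truth):
    """Compute change detection metrics from flat 0/1 label sequences.

    pred, truth: sequences of equal length, 1 = changed, 0 = unchanged.
    All formulas are assumed conventions, not the paper's stated definitions.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("empty input")

    n_changed = tp + fn      # truly changed pixels
    n_unchanged = tn + fp    # truly unchanged pixels

    pcc = (tp + tn) / n
    # Assumed normalisation: false alarms over unchanged, misses over changed.
    p_f = fp / n_unchanged if n_unchanged else 0.0
    p_m = fn / n_changed if n_changed else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    pre = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kc = (pcc - pre) / (1 - pre) if pre != 1 else 1.0

    # Assumed: good detections divided by overall error (FP + FN).
    oe = fp + fn
    gd_oe = tp / oe if oe else float("inf")

    return {"PCC": pcc, "P_f": p_f, "P_m": p_m, "KC": kc, "GD/OE": gd_oe}


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    for name, value in change_detection_metrics(pred, truth).items():
        print(f"{name}: {value:.4f}")
```

Multiplying $PCC$, $P_{f}$, $P_{m}$ and $KC$ by 100 gives values on the percentage scale used in the tables. Because changed pixels are usually a small minority in such scenes, $PCC$ can be high even when $P_{m}$ is substantial, which is why $KC$ is reported alongside it.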

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
