Article

When Deep Learning Meets Multi-Task Learning in SAR ATR: Simultaneous Target Recognition and Segmentation

The Department of Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(23), 3863; https://0-doi-org.brum.beds.ac.uk/10.3390/rs12233863
Submission received: 13 October 2020 / Revised: 16 November 2020 / Accepted: 22 November 2020 / Published: 25 November 2020
(This article belongs to the Special Issue Target Recognition in Synthetic Aperture Radar Imagery)

Abstract

With recent advances in deep learning, automatic target recognition (ATR) of synthetic aperture radar (SAR) has achieved superior performance. Rather than being limited to the target category, a SAR ATR system can benefit from simultaneously extracting multifarious target attributes. In this paper, we propose a new multi-task learning approach for SAR ATR that obtains the accurate category and precise shape of targets simultaneously. By introducing deep learning theory into multi-task learning, we first propose a novel multi-task deep learning framework with two main structures: an encoder and a decoder. The encoder is constructed to extract sufficient image features at different scales for the decoder, while the decoder is a task-specific structure that employs these extracted features adaptively and optimally to meet the different feature demands of recognition and segmentation. Therefore, the proposed framework is able to achieve superior recognition and segmentation performance. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show the superiority of the proposed framework in terms of both recognition and segmentation.


1. Introduction

Synthetic aperture radar (SAR) is an important microwave remote sensing system in both military and civilian applications. With its all-weather, day-and-night, high-resolution coherent imaging capability, it can obtain more distinctive information than optical sensors, infrared sensors, etc. [1,2]. Moreover, it is able to acquire abundant backscattering characteristics of the targets. These backscattering characteristics contain unique identifying information about target attributes, which is often difficult to interpret accurately from the perspective of human vision. Besides, real-time processing becomes a hard task as the size and number of SAR images keep increasing. Therefore, SAR automatic target recognition (ATR) has become one of the most crucial and challenging issues in SAR applications.
Basically, the fundamental problem of SAR ATR is to locate and recognize objects of interest in cluttered SAR images [3]. The standard architecture of a SAR ATR system proposed by MIT Lincoln Laboratory has three main stages: detection, discrimination and classification [4]. In the detection stage, a constant false alarm rate (CFAR) detector is employed to localize where a potential target is likely to exist in the SAR image. Then, in the discrimination stage, specific discrimination criteria are adopted to reject cultural and natural clutter false alarms. In the classification stage, an elaborate and efficient classifier provides additional false alarm rejection and categorizes the remaining detections as specific target types.
Many novel classification algorithms and systems have been proposed in recent years and have performed well in applications [5,6,7]. These methods for the classification stage can in general be taxonomized into two mainstream paradigms: template-based and model-based. The template-based taxon is a pattern recognition approach that relies solely on templates to represent the targets [8]. These templates can be two-dimensional target templates or extracted feature vectors. The process of the template-based taxon involves two distinct stages: offline classifier training and online classification. Despite the template-based taxon's simplicity and popularity [9], it may be unable to cope with extended operating conditions (EOCs). Unlike the template-based taxon, the model-based taxon mainly focuses on characterizing the physical structure of the target [10]. Typically, a model-based taxon consists of two main stages: constructing a holistic physical model of the target and predicting, online, the class whose model yields the closest resemblance to the input SAR chips. Although the model-based taxon can circumvent EOCs to some extent, it introduces additional complexity into the SAR ATR system.
In recent decades, deep learning has been applied in signal and image processing and has demonstrated superior performance. For SAR ATR, many studies have proposed deep learning methods with outstanding results [11,12,13,14,15]. Chen et al. [16] proposed an all-convolutional network that replaces all dense layers with convolutional layers, leading to outstanding recognition performance. Wagner et al. [17] proposed a combination of a convolutional neural network and support vector machines to incorporate prior knowledge and enhance robustness against imaging errors. Jiao et al. [18] proposed a multi-scale and multi-scene ship detection approach for SAR images, which can detect small-scale ships and avoid interference from complex inshore backgrounds. Li et al. [19] applied block sparse Bayesian learning (BSBL) to SAR target recognition, taking into account the azimuthal sensitivity of SAR images and the sparse coefficients on a local dictionary.
However, most existing SAR ATR methods focus only on improving detection or recognition performance and still require separate subsystems for different functions. In practice, a complete SAR ATR system needs to acquire multifarious information about a given target, such as its location, category, shape, morphological contour, ambient relationships, etc. Furthermore, when multiple subsystems are employed to achieve the goals of detection, recognition, etc., the complexity of the whole SAR ATR system becomes too high to meet practical demands.
In practice, analyzing the intent of detected targets requires sufficient target attributes, which are composed of multi-dimensional information. For example, the categories and tracks of detected ships must be obtained simultaneously to judge whether they are transporting goods or about to attack. Among the multifarious target attributes, the category and geometric structure of the target carry substantial information [20]. Therefore, it is necessary and valuable to extract the category and geometric structure of a given target with one system, namely one SAR ATR system that handles multiple tasks simultaneously.
Fortunately, multi-task learning (MTL) can handle different related tasks simultaneously: it refers to the joint learning of multiple problems, enforces a common intermediate parameterization and replaces multiple subsystems with one system. By exploiting the relevance of the different tasks, it can improve the generalization performance of the system, because it leverages the domain-specific information contained in the training datasets of related tasks [21,22,23]. Besides, these related tasks are learned simultaneously by extracting and utilizing appropriate shared information across tasks. From a machine learning point of view, MTL can be regarded as a form of inductive transfer, which improves a model by introducing an inductive bias provided by the related tasks [24,25].
Considering the superior performance of deep learning, introducing deep learning theory into the MTL framework gives MTL the capability of adaptive feature learning and powerful feature representation, which promotes its performance [26] and makes the two a perfect match in SAR ATR. Furthermore, a neural-network MTL module can increase the performance of the whole SAR ATR system by exploiting the relevance between tasks.
Therefore, in this paper, we propose a novel multi-task deep learning framework for recognition and segmentation of SAR targets to obtain both category and shape information. First, we construct a multi-task deep learning framework to accomplish target recognition and segmentation in SAR images, which consists of two parts: an encoder and a decoder. Second, a shared encoder is designed to extract effective features at different scales for morphological segmentation and recognition. Then, through two different sub-network structures, the two sub-decoders employ these extracted features adaptively and optimally to meet the different feature demands of recognition and segmentation. Therefore, the proposed multi-task framework is capable of extracting sufficient category and shape information of the SAR target.
The remainder of this paper is organized as follows. An overview of the multi-task deep learning framework is presented in Section 2. The specific design and instantiation of the proposed framework are given in Section 3. Section 4 evaluates the performance of our proposed framework with experiments. Section 5 gives the conclusions.

2. MTL Deep Learning Framework for SAR Target Segmentation and Recognition

As mentioned above, the category and geometric structure of the targets are able to provide sufficient information of the SAR targets in practice. Therefore, we propose an MTL deep learning framework to efficiently extract multifarious attributes of the SAR target and achieve the recognition and segmentation simultaneously.
The proposed MTL deep learning framework mainly consists of two parts, as shown in Figure 1: an encoder and a decoder. The encoder is a special structure utilized to extract optimal image features from the SAR image to achieve accurate recognition and segmentation. The key point of the encoder construction is to provide sufficient image features at different scales for the decoder. The decoder is a task-specific structure that is divided into two sub-decoders. The sub-decoder for precise segmentation is constructed to fuse the extracted features at different scales, which represent the overall contour and local details of the target. Meanwhile, the structure for recognition performs further extraction and fusion of the optimal image features to realize accurate recognition of the targets. Through the above structures, the proposed multi-task deep learning framework can extract optimal features layer by layer from SAR images and employ these extracted features adaptively and optimally to meet the different feature demands of recognition and segmentation.

3. Network Architecture of MTL Deep Learning Framework

In this section, a specific implementation of the proposed MTL deep learning framework and the details of its configuration are presented. First, we elucidate the structure of the specific implementation. Then, the configurations of each layer are presented. Finally, the joint loss of the proposed network and the training implementation are given.

3.1. Specific Implementation

The specific implementation of the proposed MTL deep learning framework is presented in Figure 2.
To gain sufficient image information for the recognition and segmentation of the targets, the encoder is designed with three convolutional layers and three max pooling layers to extract different forms of image features at different scales. A rectified linear unit (ReLU) [27] is adopted as the activation function after each convolutional layer, which increases the nonlinear capability. Batch normalization [28] is adopted before each convolutional layer, which makes the intermediate data distribution more consistent with the distribution of the input data and ensures the nonlinear expression ability of the whole architecture. Therefore, the encoder is able to fit nonlinear data distributions and acquire different forms of optimal image features.
Then, owing to the different structural and feature demands of recognition and segmentation, the decoder is designed separately for the two tasks as two different sub-decoders with two different feature utilizations. The sub-decoder for recognition consists of two convolutional layers and one max pooling layer. At the last convolutional layer, SoftMax is adopted as the classifier to obtain the normalized probability distribution over the recognition results. The sub-decoder for segmentation consists of three transposed convolutional layers [29] and three convolutional layers. After each convolutional layer, there is one skip connection [30] that combines the output with the image features extracted by the encoder, and no activation function follows these convolutional layers. Through these two task-specific decoder structures, the decoder is able to produce accurate recognition and precise segmentation.
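To make the structure above concrete, the following is a minimal PyTorch sketch of the encoder and the two sub-decoders. The layer counts follow the description (three convolutional and three max pooling layers in the encoder; two convolutional layers and one max pooling layer in the recognition sub-decoder; three transposed convolutional and three convolutional layers with skip connections in the segmentation sub-decoder), but the channel widths, kernel sizes and the global average pooling used to produce a class vector are illustrative assumptions, since the exact configuration is only specified in Figure 2.

```python
# Minimal PyTorch sketch of the encoder and the two task-specific sub-decoders.
# Channel widths, kernel sizes and the global average pooling in the recognition
# head are illustrative assumptions; the exact configuration is given in Figure 2.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self, chans=(1, 16, 32, 64)):
        super().__init__()
        self.blocks = nn.ModuleList()
        for cin, cout in zip(chans[:-1], chans[1:]):
            # batch normalization before the convolution, ReLU after, then 2x2 max pooling
            self.blocks.append(nn.Sequential(
                nn.BatchNorm2d(cin),
                nn.Conv2d(cin, cout, kernel_size=3, stride=1, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),
            ))

    def forward(self, x):
        feats = []                          # multi-scale features kept for the skip connections
        for block in self.blocks:
            x = block(x)
            feats.append(x)
        return feats                        # spatial sizes 44x44, 22x22, 11x11 for 88x88 input


class RecognitionDecoder(nn.Module):
    def __init__(self, in_ch=64, num_classes=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 128, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(128, num_classes, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(1),        # collapse the spatial map to one score per class
        )

    def forward(self, feats):
        return self.head(feats[-1]).flatten(1)   # raw logits; SoftMax is applied in the loss


class SegmentationDecoder(nn.Module):
    def __init__(self, chans=(64, 32, 16), num_seg_classes=2):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(chans[0], chans[1], kernel_size=2, stride=2)
        self.conv1 = nn.Conv2d(2 * chans[1], chans[1], kernel_size=3, padding=1)
        self.up2 = nn.ConvTranspose2d(chans[1], chans[2], kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(2 * chans[2], chans[2], kernel_size=3, padding=1)
        self.up3 = nn.ConvTranspose2d(chans[2], chans[2], kernel_size=2, stride=2)
        self.conv3 = nn.Conv2d(chans[2], num_seg_classes, kernel_size=3, padding=1)

    def forward(self, feats):
        f1, f2, f3 = feats                  # 44x44, 22x22 and 11x11 encoder feature maps
        x = self.conv1(torch.cat([self.up1(f3), f2], dim=1))   # skip connection from the encoder
        x = self.conv2(torch.cat([self.up2(x), f1], dim=1))    # skip connection from the encoder
        return self.conv3(self.up3(x))      # per-pixel logits at the 88x88 input resolution


class MultiTaskNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.encoder = Encoder()
        self.rec = RecognitionDecoder(num_classes=num_classes)
        self.seg = SegmentationDecoder()

    def forward(self, x):
        feats = self.encoder(x)             # shared features feed both sub-decoders
        return self.rec(feats), self.seg(feats)


if __name__ == "__main__":
    logits, mask = MultiTaskNet()(torch.randn(2, 1, 88, 88))
    print(logits.shape, mask.shape)         # torch.Size([2, 10]) torch.Size([2, 2, 88, 88])
```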
The details of those layers, activation functions, etc. are described in the following.

3.2. Convolutional Layer and Transposed Convolutional Layer

The convolutional layer is the main component of the whole architecture; it perceives local image information and extracts image features. Sparse connectivity and weight sharing are two properties of the convolutional layer that reduce the number of parameters. Sparse connectivity means that the size of the connection field between the feature maps of the $(l-1)$th layer and the $l$th convolutional layer equals the size of the convolutional kernels. Weight sharing means that each convolutional kernel is applied over the whole spatial area of the layer. Let $x_i^{l-1}$ denote the $i$th feature map of the $(l-1)$th layer, $w_{ij}^{l}$ one convolutional kernel and $b_j^{l}$ the bias of the $l$th convolutional layer. The operation of the $l$th convolutional layer can be presented as
$$x_j^{l} = \sum_{i} x_i^{l-1} * w_{ij}^{l} + b_j^{l}$$
where ∗ denotes the convolution.
Equivalently, in matrix form, the operation of the convolutional layer can be presented as
$$x^{l} = W^{l} x^{l-1} + b^{l}$$
where $x^{l}$ denotes the $n_l$-dimensional output vector, which is reshaped into a matrix; $x^{l-1}$ denotes the $n_{l-1}$-dimensional input vector, which is reshaped from a matrix; and $W^{l}$ denotes the reshaped convolutional kernel, whose size is $n_l \times n_{l-1}$.
The transposed convolutional layer is an up-sampling method that seeks the optimal parameters for up-sampling the images. It is effectively the reverse operation of the convolutional layer: the forward and backward passes of the transposed convolutional layer are the reverse of those of the convolutional layer. The operation of the transposed convolutional layer can be described as follows. First, the input image is padded with zeros to expand its size. Then, the padded input is convolved with the transposed convolutional kernels [29]. After each convolution, the position of the next convolution is shifted by the set stride.
The transposed convolutional layer is a main component of the decoder for segmentation; it up-samples and integrates the extracted feature maps adaptively, layer by layer. The output size of the $l$th transposed convolutional layer with factor $s_T^l$ is equal to that of a convolutional layer with a fractional stride $1/s_T^l$. Given the feature maps of the $(l-1)$th layer as $x^{l-1}$, the operation of the transposed convolutional layer can be presented as
$$x^{l} = (W^{l})^{T} x^{l-1} + b^{l}$$
where $W^{l}$ denotes the reshaped convolutional kernel, whose size is $n_l \times n_{l-1}$.
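The following short sketch illustrates this up-sampling relationship in PyTorch (the kernel size, stride and channel count are arbitrary choices for the example, not the values used in the proposed network):

```python
import torch
import torch.nn as nn

# A stride-2 convolution halves the spatial size; a stride-2 transposed convolution
# doubles it, i.e. it behaves like a convolution with fractional stride 1/2.
down = nn.Conv2d(8, 8, kernel_size=2, stride=2)
up = nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2)

print(down(torch.randn(1, 8, 22, 22)).shape)   # torch.Size([1, 8, 11, 11])
print(up(torch.randn(1, 8, 11, 11)).shape)     # torch.Size([1, 8, 22, 22])
```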

3.3. Batch Normalization and Rectified Linear Unit

Batch normalization is a technique for training a deep learning network. It not only accelerates the convergence of the network, but also alleviates the problem of gradient dispersion to a certain extent, which makes training a deep network easier and more stable [31]. Batch normalization can be divided into three steps. First, given a batch of input images $\mathcal{B} = \{x_1, x_2, \ldots, x_m\}$, the mean and the variance of the training batch are calculated by
$$\mu_{\mathcal{B}} = \frac{1}{m} \sum_{i=1}^{m} x_i$$
$$\sigma_{\mathcal{B}}^{2} = \frac{1}{m} \sum_{i=1}^{m} \left( x_i - \mu_{\mathcal{B}} \right)^{2}$$
where $\mu_{\mathcal{B}}$ is the mean of the batch $\mathcal{B}$ and $\sigma_{\mathcal{B}}^{2}$ is its variance. Then, the batch $\mathcal{B}$ is normalized by $\mu_{\mathcal{B}}$ and $\sigma_{\mathcal{B}}^{2}$ to obtain a zero-mean, unit-variance distribution:
$$\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \varepsilon}}$$
where $\varepsilon$ is a small positive number that avoids division by zero. Finally, the normalized batch is subjected to scale transformation and translation by
$$BN_{\gamma, \beta}(x_i) = \gamma \hat{x}_i + \beta$$
where $\gamma$ is the scale factor, $\beta$ is the translation factor and $BN_{\gamma, \beta}(\cdot)$ denotes the batch normalization operation. The two learnable parameters $\gamma$ and $\beta$ are introduced because restricting the normalized batch to a standard normal distribution would reduce the expressive ability of the network [32].
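These three steps can be written compactly as below (a sketch of the per-batch computation only; a library layer such as torch.nn.BatchNorm2d additionally tracks running statistics for use at inference time):

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize a (m, d) mini-batch with learnable scale gamma and shift beta."""
    mu = x.mean(dim=0)                          # per-feature batch mean
    var = x.var(dim=0, unbiased=False)          # per-feature batch variance
    x_hat = (x - mu) / torch.sqrt(var + eps)    # normalize to zero mean, unit variance
    return gamma * x_hat + beta                 # scale transformation and translation

x = 3.0 * torch.randn(32, 4) + 7.0              # a batch far from zero mean, unit variance
y = batch_norm(x, gamma=torch.ones(4), beta=torch.zeros(4))
print(y.mean(dim=0), y.std(dim=0, unbiased=False))   # approximately 0 and 1 per feature
```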
The rectified linear unit (ReLU) is an activation function with lower computational complexity than other activation functions such as the sigmoid [33], and it alleviates the vanishing-gradient problem to a certain extent. The ReLU is defined as
$$f(x_{ij}) = \mathrm{ReLU}(x_{ij}) = \begin{cases} x_{ij}, & \text{if } x_{ij} > 0 \\ 0, & \text{if } x_{ij} \le 0 \end{cases}$$
The ReLU sets part of the feature-map outputs to zero, which introduces sparsity into the network and alleviates overfitting.

3.4. Max Pooling and SoftMax

The max pooling layer is utilized to integrate the information of the feature maps while reducing the number of parameters and the computational complexity of the whole network. The max pooling layer takes the maximum value within a window of the feature map:
$$p_i = \max_{(u, v) \in P} x_i(u, v)$$
where $(u, v)$ are the coordinates of the pixels in the pooling window, $p_i$ is the output of the max pooling layer and $P$ is the pooling window. Although the max pooling layer has many advantages, it may also discard information that is crucial for segmentation or other tasks.
SoftMax is adopted as a classifier that normalizes the output of the network so that it can be interpreted as a posterior probability, the original intention being to make the effect of each feature on the probability multiplicative. Given the output vector of the network before SoftMax as $x^{L} = [x_1^{L}, x_2^{L}, \ldots, x_C^{L}]$, SoftMax is defined as
$$p(y_i \mid x^{L}) = \frac{\exp(x_i^{L})}{\sum_{k=1}^{C} \exp(x_k^{L})}$$
where $C$ is the number of target types, $y_i$ is the one-hot vector of the target type and $\exp(\cdot)$ denotes the exponential function. Through the SoftMax operation, each element of the output vector gives the probability of the corresponding target type.
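A short numerical illustration of both operations, using PyTorch functional ops on toy tensors:

```python
import torch
import torch.nn.functional as F

# Max pooling keeps the maximum of each 2x2 window of a feature map.
fmap = torch.arange(16.0).reshape(1, 1, 4, 4)
print(F.max_pool2d(fmap, kernel_size=2))        # [[ 5.,  7.], [13., 15.]]

# SoftMax turns raw outputs x^L into a probability distribution over C classes.
x_L = torch.tensor([[2.0, 0.5, -1.0]])
p = F.softmax(x_L, dim=1)
print(p, p.sum())                               # approx. [0.786, 0.175, 0.039], sums to 1.0
```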

3.5. Joint Loss and Backpropagation

The joint loss is the combination of the individual task losses, and it strongly influences the performance of the whole framework. By choosing appropriate weights between the task losses, the joint loss not only considers the differences between tasks, but also takes advantage of the relevance between tasks, which leads to a better performance of the whole framework [34]. Target recognition needs to utilize features of the scattering distribution and target morphology, just like target segmentation [35]. Therefore, there is strong coherence and relevance between the recognition and segmentation of targets in SAR images.
In the proposed multi-task deep learning framework, the joint loss is set as the weighted sum of the recognition loss and the segmentation loss. The recognition loss is set as the cross-entropy cost function, which is presented as
$$L_r(w, b) = - \sum_{i=1}^{C} y_i \log p(y_i \mid x^{L})$$
In essence, target segmentation is a classification at the pixel level. To achieve accurate segmentation, the distance between the segmentation result and the ground truth should be calculated. Therefore, the segmentation loss is set as the cross-entropy cost function over all pixels of a SAR chip. The segmentation loss is averaged so that it is on the same scale as the recognition loss, which leads to better and more robust performance [36]. The segmentation loss is defined as
$$L_s(w, b) = - \frac{1}{n} \sum_{i=1}^{V} s_i \log p(s_i \mid x^{L})$$
where $p(s_i \mid x^{L})$ is the probability vector of the segmentation result over all pixels of the SAR chip, $n$ is the number of pixels in a SAR chip, $s_i$ is the segmentation label in one-hot form and $V$ is the number of segmentation types. Therefore, the joint loss can be presented as
$$L(w, b) = L_r(w, b) + L_s(w, b)$$
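A minimal sketch of this joint loss, assuming the equal weighting shown above; torch.nn.functional.cross_entropy applies SoftMax internally and, for the segmentation branch, averages over all pixels of the chip. The tensor shapes follow the earlier architecture sketch (10 target classes, 2 segmentation types, 88 × 88 chips):

```python
import torch
import torch.nn.functional as F

def joint_loss(class_logits, class_labels, seg_logits, seg_labels):
    l_r = F.cross_entropy(class_logits, class_labels)   # recognition loss over C target types
    l_s = F.cross_entropy(seg_logits, seg_labels)       # per-pixel loss, averaged over the chip
    return l_r + l_s                                    # equal weighting, as above

loss = joint_loss(torch.randn(4, 10), torch.randint(0, 10, (4,)),
                  torch.randn(4, 2, 88, 88), torch.randint(0, 2, (4, 88, 88)))
print(loss)
```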
After the joint loss is obtained, the optimal performance of the whole architecture could be obtained through minimizing the joint loss using backpropagation [37].
First, the total error is computed by comparing the output of the architecture with the ground truth.
$$\delta^{total} = \sum_{i} \left( p(y_i \mid x^{L}) - y_i \right)$$
Then, the error is propagated from the higher layers to the lower layers of the architecture by computing the intermediate error of each layer. When the $l$th layer is a convolutional layer, the intermediate error can be calculated by
$$\delta^{l} = \left( w^{l+1} \right)^{T} \delta^{l+1} \odot f'(x^{l})$$
where $f'$ denotes the first derivative of the ReLU, $\delta^{l}$ denotes the intermediate error of the $l$th convolutional layer and $\odot$ denotes the Hadamard product. For the transposed convolutional layers, the formula is
$$\delta^{l} = w^{l+1} \delta^{l+1} \odot f'(x^{l})$$
The derivatives for updating $w^{l}$ and $b^{l}$ of the $l$th layer can be presented as
$$\frac{\partial L}{\partial w_{ij}^{l}} = x_j^{l-1} \delta_i^{l}$$
$$\frac{\partial L}{\partial b_i^{l}} = \delta_i^{l}$$
This step is the same for the convolutional and transposed convolutional layers. When the backpropagation comes across the max pooling layers, only the unit with the max value in every pooling field receives the error term and the intermediate error on other units is set as zero.
Finally, backpropagation updates the trainable parameters of the architecture by
$$w^{l} \leftarrow w^{l} - lr \times \frac{\partial L}{\partial w^{l}}$$
$$b^{l} \leftarrow b^{l} - lr \times \frac{\partial L}{\partial b^{l}}$$
where $w^{l}$ denotes the convolutional kernels of the $l$th layer, $b^{l}$ denotes the bias of the $l$th layer and $lr$ denotes the learning rate.
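In a modern framework, these error terms and updates are produced by automatic differentiation; the following sketch performs one gradient step equivalent to the update rules above. MultiTaskNet and joint_loss refer to the earlier sketches, and the mini-batch below is random stand-in data:

```python
import torch

model = MultiTaskNet()                                  # architecture sketch from Section 3.1
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

images = torch.randn(4, 1, 88, 88)                      # stand-in mini-batch of SAR chips
labels = torch.randint(0, 10, (4,))                     # class labels
masks = torch.randint(0, 2, (4, 88, 88))                # per-pixel segmentation labels

class_logits, seg_logits = model(images)
loss = joint_loss(class_logits, labels, seg_logits, masks)

optimizer.zero_grad()
loss.backward()     # error terms delta^l are propagated from the output layer backwards
optimizer.step()    # w^l <- w^l - lr * dL/dw^l,  b^l <- b^l - lr * dL/db^l
```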
Through backpropagation, the network gradually reaches its optimal performance and can accomplish accurate and effective target recognition and segmentation simultaneously. Its performance is presented and compared in the next section.

4. Experiments and Results

In this section, the performance of the multi-task deep learning framework is evaluated. First, the dataset is introduced in detail. Then, the data preprocessing steps are described, together with the hyperparameters and setup of the specific implementation of the multi-task deep learning framework. Finally, the results and comparisons for target recognition and segmentation are presented.

4.1. Dataset

The experimental dataset used to evaluate our proposed multi-task deep learning framework is collected from the Moving and Stationary Target Acquisition and Recognition (MSTAR) program, released by the Defense Advanced Research Projects Agency and the Air Force Research Laboratory. The dataset was collected as part of the MSTAR program using the Sandia National Laboratory STARLOS sensor platform [38]. As a benchmark dataset for SAR ATR performance assessment, it contains a significant quantity of SAR images of different types of military vehicles as well as clutter images. Ten classes of ground targets (tank: T62, T72; rocket launcher: 2S1; truck: ZIL131; armored personnel carrier: BTR70, BTR60, BRDM2, BMP2; air defense unit: ZSU23/4; and bulldozer: D7) were captured as 1-ft resolution X-band SAR images with full aspect coverage (0°–360°). They were collected under varying operating conditions, such as different aspect angles, depression angles and serial numbers. The binary segmentation labels were obtained by precise manual annotation using the OpenLabeling tool. The SAR images and corresponding optical images of the targets at similar aspect angles are depicted in Figure 3.
To comprehensively assess the recognition performance, the proposed multi-task deep learning framework was evaluated under the standard operating condition (SOC) and extended operating conditions (EOCs) [38]. SOC refers to the condition in which the serial numbers and target configurations of the training and test sets are the same, but the aspect and depression angles differ. EOC includes three extended operating conditions: depression variant, configuration variant and version variant. The segmentation performance of the proposed framework was assessed both visually and with objective metrics, alongside the assessment of the recognition performance.

4.2. Data Preprocessing

Before assessing the performance of the proposed multi-task deep learning framework, data preprocessing was employed to augment the training images and to manually annotate the segmentation of the training and testing images. The specific processes are as follows. First, we employed data augmentation to generate more training images [39]. The numbers of training and testing images before data augmentation are listed in Table 1. The training images were augmented tenfold by randomly sampling ten 88 × 88 SAR image chips from each original 128 × 128 SAR image, which ensures that the central target remains complete [16]; a sketch of this cropping procedure is given below. Then, the training and testing datasets for segmentation were acquired by manual annotation using the OpenLabeling tool, based on the intensity and the contour of the target and its shadow. The number of segmentation labels equals the number of original images, and when the original images were augmented, the segmentation labels underwent the same augmentation. Therefore, the segmentation was synchronized with the recognition when the proposed network architecture was trained or tested. After the data preprocessing, the proposed multi-task deep learning framework can be trained and evaluated as a single network.
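A sketch of this cropping-based augmentation, under the assumption that the target lies in the central region of the 128 × 128 chip (the function name and random offsets are illustrative):

```python
import numpy as np

def augment_chips(image, n_crops=10, out=88, rng=np.random.default_rng(0)):
    """Randomly sample n_crops out-by-out chips from a 128 x 128 SAR image.

    Because the offsets range over 0..(128 - out), the central 48 x 48 region,
    where the target is assumed to lie, is contained in every chip.
    """
    h, w = image.shape
    chips = []
    for _ in range(n_crops):
        top = rng.integers(0, h - out + 1)
        left = rng.integers(0, w - out + 1)
        chips.append(image[top:top + out, left:left + out])
    return np.stack(chips)

chips = augment_chips(np.random.rand(128, 128))
print(chips.shape)      # (10, 88, 88)
```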

4.3. Network Setup

On the basis of the proposed MTL deep learning framework, a specific implementation was employed to evaluate the framework for SAR ATR, as presented in Figure 2. Three convolutional and three max pooling layers form the feature extractor. Two convolutional layers, one max pooling layer and one SoftMax layer accomplish the recognition task. Meanwhile, three transposed convolutional layers and three convolutional layers are organized to segment the SAR images. The size of the input SAR images is 88 × 88, the stride of every convolutional layer is 1 × 1 and the stride of each max pooling layer is 2 × 2. The other hyperparameters of our network instance are shown in Figure 2. The weights of the convolutional layers are initialized from a Gaussian distribution with zero mean and a standard deviation of 0.01, and the biases are initialized with a small constant value of 0.1. The initial learning rate is set to 0.001 and is reduced by a factor of 0.1 after 5 epochs.
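A sketch of this initialization and learning-rate schedule in PyTorch; MultiTaskNet refers to the earlier architecture sketch, and the choice of plain SGD is an assumption, since the text does not name the optimizer:

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Gaussian initialization (mean 0, std 0.01) for kernels; constant 0.1 for biases
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.1)

model = MultiTaskNet()                  # architecture sketch from Section 3.1
model.apply(init_weights)

optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
# reduce the learning rate by a factor of 0.1 every 5 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
```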

4.4. Recognition Results under SOC

In this SOC experimental setup, the performance of the proposed architecture was assessed on the classification of ten targets in the MSTAR dataset. The training and testing images have the same serial numbers but different depression angles: as listed in Table 1, the training images were captured at a 17° depression angle, while the testing images were captured at a 15° depression angle. A summary of this experimental setup for the training and testing datasets is given in Table 1, where the number for each target serial is the number of original SAR images in the MSTAR dataset before data augmentation. After data augmentation, each target class contains 2700 images.
The recognition result of the proposed multi-task deep learning framework is presented in Table 2. Table 2 is a confusion matrix of the ten targets, which is widely used to present classification performance in SAR ATR [40]. The numbers on the diagonal of the confusion matrix are the percentages of correct recognitions for each target.
In Table 2, the recognition ratios of BTR60, 2S1 and D7 are above 96.5%, the recognition ratios of BRDM_2 and T62 are above 99.5%, and the others reach a 100% recognition ratio. The overall recognition ratio is 99.13%, which is satisfactory. From the recognition result, it is clear that the deep convolutional structure extracts features that are stable across the ten different targets. Therefore, the proposed network architecture achieves a satisfactory performance for ten-target recognition, and these results also verify the superiority of the proposed architecture in the SOC experiment.
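The per-class percentages in such a confusion matrix are the row-normalized confusion counts; a small sketch of the computation (the toy labels below are illustrative, not MSTAR results):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    # rows: true class, columns: predicted class, entries in percent per true class
    cm = np.zeros((num_classes, num_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

print(confusion_matrix(np.array([0, 0, 0, 1, 1, 2]),
                       np.array([0, 0, 1, 1, 1, 2]), num_classes=3))
```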

4.5. Recognition Results under EOC

In realistic battlefield scenarios, target recognition must cope with more varied operating conditions, such as changes in depression angle and target type. Therefore, it is necessary to assess the performance of a SAR ATR algorithm under EOCs. In this section, the stability and effectiveness of the proposed network architecture are evaluated under variations of depression angle, target configuration and target version, denoted EOC-D, EOC-C and EOC-V, respectively.
SAR images are extremely sensitive to changes in depression angle, so it is important to evaluate the performance of the proposed network architecture under depression-angle variation (EOC-D). However, the MSTAR dataset contains only four targets (2S1, BRDM_2, T-72 and ZSU-234) with a large enough depression-angle variation to evaluate EOC-D. The SAR images at 17° depression angle are used as the training dataset and the corresponding SAR images at 30° depression angle as the testing dataset. The training dataset is generated with the same data augmentation as in the SOC experiment. A summary of the training and testing datasets is listed in Table 3. The number of images in each class of the training dataset was augmented to 2700, for a total of 10,800 training images.
The recognition performance of the proposed network architecture under depression-angle variation is presented in Table 4. It can be seen that the recognition performance of the proposed multi-task framework is superior. The overall recognition ratio is above 94.00%, and the recognition ratios of 2S1, BRDM-2 and ZSU-234 at 30° depression angle are higher than 93.00%. The relatively low recognition ratio for T-72 is caused by the differences between the training and testing datasets in depression angle and serial number. From the recognition performance in Table 4, the proposed network architecture remains stable and effective even when the depression angle varies greatly.
The performance of the proposed network architecture under variation of target configuration and version (EOC-C and EOC-V) was also evaluated. Limited by the difficulty of acquiring SAR images of different configurations and versions of targets, the training datasets for EOC-C and EOC-V consist of four targets (BMP-2, BRDM_2, BTR-70 and T-72) at 17° depression angle, and the testing datasets consist of the corresponding SAR images of targets with different configurations and versions. The numbers of training images of the four targets before data augmentation are listed in Table 5, and the testing datasets are listed in Table 6 and Table 7. The number of images in each of the four training classes was augmented to 2700. As shown in Table 5 and Table 6, two configurations of BMP2 and five configurations of T72, captured at 17° and 15° depression angles, are used to evaluate the recognition performance under the EOC of target configuration variation. As shown in Table 5 and Table 7, the testing dataset for EOC-V contains five serial types of T72 different from the training dataset, captured at 17° and 15° depression angles, which are used to evaluate the recognition performance of the proposed multi-task deep learning framework under the EOC of target version variation.
The recognition performance of the proposed network architecture under EOC-C is presented in Table 8: the recognition ratio is 98.36% under target configuration variation, which shows that the proposed network architecture is able to recognize targets with different configurations. As for the recognition performance under EOC-V, presented in Table 9, the recognition ratio reaches 99.21% for the five versions of T72, so the proposed network architecture is also resilient to target version variation.
From the four experiments under SOC, EOC-D, EOC-C and EOC-V, the proposed network architecture obtains superior recognition performance. This demonstrates that the proposed multi-task deep learning framework is able to extract optimal and effective target features from SAR images, features that are also resilient to variations in depression angle, target configuration and version.

4.6. Results of SAR Target Segmentation

As mentioned above, the segmentation of targets in SAR images not only provides more refined morphological structure features, but also provides semantic information at the pixel level. Some examples of the segmentation labels for the targets are presented in Figure 4. In Figure 4, the left image is the original image from the MSTAR dataset and the middle one is the segmentation ground truth. The right image is the original image masked by the ground truth, denoted as the masked original image.
To present the segmentation results visually, some segmentation results of the proposed network architecture for different targets are shown in Figure 5. The first three columns are the original SAR images from the MSTAR dataset, the segmentation ground truth and their corresponding masked original SAR images, respectively. The fourth column is the segmentation results of the proposed multi-task deep learning framework. The last column is the original SAR images masked by the segmentation results. It can be seen that the segmentation results of the proposed multi-task deep learning framework are quite close to the segmentation ground truth in the morphological contour. It can be concluded that the proposed network architecture can segment precisely when the contour and intensity of the targets are varying.
To evaluate the segmentation results more objectively, the pixel accuracy of the segmentation results is employed, which measures how accurately the targets are separated from the background. The pixel accuracy is calculated as
$$P_{pa} = \frac{\mathrm{sum}(P_p)}{P_a}$$
where $P_{pa}$ is the pixel accuracy, $P_p$ denotes the correctly predicted pixels and $P_a$ is the total number of pixels in one SAR image. The higher the pixel accuracy, the better the performance.
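A minimal sketch of this metric for a predicted and a ground-truth binary mask:

```python
import numpy as np

def pixel_accuracy(pred_mask, gt_mask):
    # fraction of pixels whose predicted label matches the ground truth
    return np.mean(pred_mask == gt_mask)

pred = np.array([[1, 1, 0], [0, 0, 0]])
gt = np.array([[1, 0, 0], [0, 0, 0]])
print(pixel_accuracy(pred, gt))     # 0.8333...
```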
The pixel accuracy of the proposed multi-task deep learning framework is presented in the form of a confusion matrix in Table 10. In Table 10, the accuracy for the target or background is above 98.00% and the overall accuracy of the segmentation is higher than 99.00%. From the quantitative analyses, it is quite clear that the proposed network architecture has the ability to segment the targets from the backgrounds precisely and effectively.
From the evaluations of the performance of the target recognition and segmentation, it can be proved that, through the deep learning structure of multiple convolutional layers and the multi-task framework design of the encoder and two sub-decoders, the proposed multi-task deep learning framework can achieve the target recognition and segmentation accurately and effectively and finish those two tasks simultaneously with only one system.

4.7. Comparison of Performance of Segmentation and Recognition

In this section, we compare our proposed algorithm with other recognition and segmentation algorithms. For recognition, seven SAR ATR algorithms are considered: support vector machine (SVM) [41], adaptive boosting (AdaBoost) [41], IGT [41], CGM [42], two DCNNs [44,45] and gcForest [43]. SVM and AdaBoost are traditional algorithms, IGT is based on a probabilistic graphical model, the two DCNNs are state of the art in SAR ATR and gcForest was published recently. For segmentation, two other algorithms are considered, namely maximum between-class variance thresholding (the Otsu method) [46] and the Canny edge detector [47], which are traditional segmentation algorithms for SAR images.
For the recognition comparison, the recognition performances under SOC and EOC are listed in Table 11. The performance of our proposed algorithm is better than that of the other algorithms under SOC and shows a significant improvement under EOC. Therefore, it can be concluded that our proposed algorithm is superior to the other algorithms in recognition performance.
For the segmentation comparison, segmentation results of different SAR images obtained with Otsu, Canny and our proposed algorithm are shown in Figure 6. It is obvious that our proposed algorithm performs better than the other algorithms when the image intensity varies and the contours are complicated. At the same time, the pixel accuracies of Otsu, Canny and our proposed algorithm are listed in Table 12, which shows that the proposed multi-task deep learning framework achieves higher pixel accuracy than the other algorithms. From these comparisons, it can be concluded that the proposed multi-task deep learning framework obtains more accurate segmentation of both the overall contour and the local details of the targets.
From all the comparison experiments above, it is clear that, through its deep learning structure and multi-task capability, the proposed multi-task deep learning framework not only extracts optimal and effective target features to achieve accurate and robust recognition, but also captures the overall contour and local details of the targets to achieve elaborate segmentation at the same time as recognition. All the evaluations and comparison experiments verify that our proposed algorithm is superior in both recognition and segmentation, with the capability of performing them simultaneously.

5. Conclusions

When deep learning meets multi-task learning, multi-task learning acquires the capability of adaptive feature learning and powerful feature representation, promoting the performance of multiple tasks simultaneously in SAR ATR. Hence, we propose a novel multi-task deep learning framework to obtain the accurate category and precise shape of targets simultaneously. With an elaborately designed encoder, optimal image features are extracted at different scales to represent the overall contour and local details of the target. By employing these extracted features adaptively and optimally to meet the different feature demands of recognition and segmentation, the task-specific decoder achieves superior performance in both tasks simultaneously. Extensive experiments were carried out on the MSTAR dataset, and the results clearly show that the proposed framework not only achieves higher recognition performance than existing SAR ATR methods under SOC and EOCs, but also obtains more precise and stable segmentation than other segmentation methods. With the sufficient target attributes extracted by the proposed multi-task framework, it can contribute to the practical application of SAR ATR systems.

Author Contributions

Conceptualization, C.W. and J.P.; methodology, C.W.; software, J.P.; validation, C.W., Z.W., J.P. and J.W.; formal analysis, C.W. and J.W.; investigation, H.Y. and Y.H.; resources, C.W. and Y.H.; data curation, C.W. and Y.H.; writing—original draft preparation, C.W.; writing—review and editing, C.W. and Y.G.; visualization, C.W., Z.W. and Y.H.; supervision, H.Y., Y.G. and J.Y.; project administration, C.W. and H.Y.; and funding acquisition, J.P. and H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant numbers 61901091 and 61671117.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Curlander, J.C.; McDonough, R.N. Synthetic Aperture Radar; Wiley: New York, NY, USA, 1991; Volume 11.
2. Moreira, A.; Prats-Iraola, P.; Younis, M.; Krieger, G.; Hajnsek, I.; Papathanassiou, K.P. A tutorial on synthetic aperture radar. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–43.
3. Dudgeon, D.E.; Lacoss, R.T. An Overview of Automatic Target Recognition. Linc. Lab. J. 1993, 6, 3–10.
4. Bhanu, B. Automatic target recognition: State of the art survey. IEEE Trans. Aerosp. Electron. Syst. 1986, AES-22, 364–379.
5. Zhao, Q.; Principe, J.C. Support vector machines for SAR automatic target recognition. IEEE Trans. Aerosp. Electron. Syst. 2001, 37, 643–654.
6. He, Y.; He, S.Y.; Zhang, Y.H.; Wen, G.J.; Yu, D.F.; Zhu, G.Q. A forward approach to establish parametric scattering center models for known complex radar targets applied to SAR ATR. IEEE Trans. Antennas Propag. 2014, 62, 6192–6205.
7. Zhang, H.; Nasrabadi, N.M.; Huang, T.S.; Zhang, Y. Joint sparse representation based automatic target recognition in SAR images. In Algorithms for Synthetic Aperture Radar Imagery XVIII; International Society for Optics and Photonics: Bellingham, WA, USA, 2011; Volume 8051, p. 805112.
8. Novak, L.M.; Owirka, G.J.; Brower, W.S.; Weaver, A.L. The automatic target-recognition system in SAIP. Linc. Lab. J. 1997, 10, 187–202.
9. El-Darymli, K.; Gill, E.W.; McGuire, P.; Power, D.; Moloney, C. Automatic target recognition in synthetic aperture radar imagery: A state-of-the-art review. IEEE Access 2016, 4, 6014–6058.
10. Ikeuchi, K.; Wheeler, M.D.; Yamazaki, T.; Shakunaga, T. Model-based SAR ATR system. In Algorithms for Synthetic Aperture Radar Imagery III; International Society for Optics and Photonics: Bellingham, WA, USA, 1996; Volume 2757, pp. 376–387.
11. Tait, P. Introduction to Radar Target Recognition; IET: Washington, DC, USA, 2005; Volume 18.
12. Clemente, C.; Pallotta, L.; Gaglione, D.; De Maio, A.; Soraghan, J.J. Automatic Target Recognition of Military Vehicles With Krawtchouk Moments. IEEE Trans. Aerosp. Electron. Syst. 2017, 53, 493–500.
13. Eryildirim, A.; Cetin, A.E. Man-made object classification in SAR images using 2-D cepstrum. In Proceedings of the 2009 IEEE Radar Conference, Pasadena, CA, USA, 4–8 May 2009; pp. 1–4.
14. Sun, Y.; Du, L.; Wang, Y.; Wang, Y.; Hu, J. SAR automatic target recognition based on dictionary learning and joint dynamic sparse representation. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1777–1781.
15. Bau, D.; Zhou, B.; Khosla, A.; Oliva, A.; Torralba, A. Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6541–6549.
16. Chen, S.; Wang, H.; Xu, F.; Jin, Y.Q. Target classification using the deep convolutional networks for SAR images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4806–4817.
17. Wagner, S.A. SAR ATR by a combination of convolutional neural network and support vector machines. IEEE Trans. Aerosp. Electron. Syst. 2016, 52, 2861–2872.
18. Jiao, J.; Zhang, Y.; Sun, H.; Yang, X.; Gao, X.; Hong, W.; Fu, K.; Sun, X. A densely connected end-to-end neural network for multiscale and multiscene SAR ship detection. IEEE Access 2018, 6, 20881–20892.
19. Li, C.; Liu, G. Block Sparse Bayesian Learning over Local Dictionary for Robust SAR Target Recognition. Int. J. Opt. 2020, 2020.
20. Pei, W.; Xin, Z.; Rongkun, P.; Peng, F. Active contour model based on edge and region attributes for target contour extraction in SAR image. J. Image Graph. 2014, 19, 1095–1103.
21. Parameswaran, S.; Weinberger, K.Q. Large Margin Multi-Task Metric Learning; Advances in Neural Information Processing Systems: Vancouver, BC, Canada, 2010; pp. 1867–1875.
22. Ruder, S. An overview of multi-task learning in deep neural networks. arXiv 2017, arXiv:1706.05098.
23. Evgeniou, T.; Pontil, M. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 22–25 August 2004; pp. 109–117.
24. Zhang, Y.; Yeung, D.Y. A convex formulation for learning task relationships in multi-task learning. arXiv 2012, arXiv:1203.3536.
25. Lounici, K.; Pontil, M.; Tsybakov, A.B.; Van De Geer, S. Taking advantage of sparsity in multi-task learning. arXiv 2009, arXiv:0903.1468.
26. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
27. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
28. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
29. Zeiler, M.D.; Krishnan, D.; Taylor, G.W.; Fergus, R. Deconvolutional networks. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2528–2535.
30. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301.
31. Santurkar, S.; Tsipras, D.; Ilyas, A.; Madry, A. How Does Batch Normalization Help Optimization? Advances in Neural Information Processing Systems: Vancouver, BC, Canada, 2018; pp. 2483–2493.
32. Cooijmans, T.; Ballas, N.; Laurent, C.; Gülçehre, Ç.; Courville, A. Recurrent batch normalization. arXiv 2016, arXiv:1603.09025.
33. Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical evaluation of rectified activations in convolutional network. arXiv 2015, arXiv:1505.00853.
34. Sun, Y.; Chen, Y.; Wang, X.; Tang, X. Deep Learning Face Representation by Joint Identification-Verification; Advances in Neural Information Processing Systems: Vancouver, BC, Canada, 2014; pp. 1988–1996.
35. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
36. Nielsen, M.A. Neural Networks and Deep Learning; Determination Press: San Francisco, CA, USA, 2015; Volume 2018.
37. Rezende, D.J.; Mohamed, S.; Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. arXiv 2014, arXiv:1401.4082.
38. Ross, T.D.; Worrell, S.W.; Velten, V.J.; Mossing, J.C.; Bryant, M.L. Standard SAR ATR evaluation experiments using the MSTAR public release data set. In Algorithms for Synthetic Aperture Radar Imagery V; International Society for Optics and Photonics: Bellingham, WA, USA, 1998; Volume 3370, pp. 566–573.
39. Perez, L.; Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv 2017, arXiv:1712.04621.
40. Mossing, J.C.; Ross, T.D. Evaluation of SAR ATR algorithm performance sensitivity to MSTAR extended operating conditions. In Algorithms for Synthetic Aperture Radar Imagery V; International Society for Optics and Photonics: Bellingham, WA, USA, 1998; Volume 3370, pp. 554–565.
41. Srinivas, U.; Monga, V.; Raj, R.G. SAR automatic target recognition using discriminative graphical models. IEEE Trans. Aerosp. Electron. Syst. 2014, 50, 591–606.
42. O’Sullivan, J.A.; DeVore, M.D.; Kedia, V.; Miller, M.I. SAR ATR performance using a conditionally Gaussian model. IEEE Trans. Aerosp. Electron. Syst. 2001, 37, 91–108.
43. Zhang, J.; Song, H.; Zhou, B. SAR Target Classification Based on Deep Forest Model. Remote Sens. 2020, 12, 128.
44. Ding, J.; Chen, B.; Liu, H.; Huang, M. Convolutional neural network with data augmentation for SAR target recognition. IEEE Geosci. Remote Sens. Lett. 2016, 13, 364–368.
45. Morgan, D.A. Deep convolutional neural networks for ATR from SAR imagery. In Algorithms for Synthetic Aperture Radar Imagery XXII; International Society for Optics and Photonics: Bellingham, WA, USA, 2015; Volume 9475, p. 94750F.
46. Kaur, J.; Agrawal, S.; Vig, R. A comparative analysis of thresholding and edge detection segmentation techniques. Int. J. Comput. Appl. 2012, 39, 29–34.
47. Al-Kubati, A.A.M.; Saif, J.; Taher, M. Evaluation of Canny and Otsu image segmentation. In Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering (ICETCEE), Himeji, Japan, 5–7 November 2012.
Figure 1. Proposed MTL deep learning framework.
Figure 2. Specific implementation of the proposed framework.
Figure 3. SAR images and corresponding optical images of targets at similar aspect angles. From left to right: BMP2, BTR70, T72, 2S1, BRDM2, ZSU234, BTR60, D7, T62 and ZIL131.
Figure 4. Some examples of the segmentation labels for ten targets: (a) SAR image; (b) segmentation ground truth; and (c) masked original image.
Figure 5. Some segmentation results of the proposed network architecture for different targets: (a) original SAR image; (b) segmentation ground truth; (c) masked original image; (d) segmentation results; and (e) masked segmentation results.
Figure 6. Some segmentation results for different targets: (a) masked original SAR images; (b) masked segmentation results of the proposed method; (c) masked segmentation results of Otsu; and (d) masked segmentation results of Canny.
Table 1. Number of training and testing images for SOC before the data augmentation.

| Class | Training Depression | Training Number | Testing Depression | Testing Number |
|---|---|---|---|---|
| BMP2-9563 | 17° | 233 | 15° | 196 |
| BTR70-c71 | 17° | 233 | 15° | 196 |
| T72-132 | 17° | 232 | 15° | 196 |
| BTR60-7532 | 17° | 256 | 15° | 195 |
| 2S1-b01 | 17° | 299 | 15° | 274 |
| BRDM2-E71 | 17° | 298 | 15° | 274 |
| D7-92 | 17° | 299 | 15° | 274 |
| T62-A51 | 17° | 299 | 15° | 273 |
| ZIL131-E12 | 17° | 299 | 15° | 274 |
| ZSU234-d08 | 17° | 299 | 15° | 274 |
Table 2. Recognition result of the proposed MTL deep learning framework under SOC (recognition ratio 99.13%).

| Class | BMP2 | BTR70 | T72 | BTR60 | 2S1 | BRDM2 | D7 | T62 | ZIL131 | ZSU234 |
|---|---|---|---|---|---|---|---|---|---|---|
| BMP2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BTR70 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| T72 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BTR60 | 0 | 0.51 | 0 | 97.95 | 0 | 1.54 | 0 | 0 | 0 | 0 |
| 2S1 | 0 | 0.38 | 0.38 | 0.38 | 96.71 | 0 | 0 | 0 | 1.77 | 0.38 |
| BRDM2 | 0 | 0 | 0 | 0 | 0 | 99.64 | 0 | 0 | 0.36 | 0 |
| D7 | 0.36 | 0 | 0 | 0 | 0 | 0 | 97.81 | 0 | 0 | 1.83 |
| T62 | 0 | 0 | 0.37 | 0 | 0 | 0 | 0 | 99.63 | 0 | 0 |
| ZIL131 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 |
| ZSU234 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
Table 3. Number of training and testing images for EOC-D before the data augmentation.

| Class | Training Depression | Training Number | Testing Depression | Testing Number |
|---|---|---|---|---|
| 2S1 | 17° | 299 | 30° | 288 |
| BRDM2 | 17° | 298 | 30° | 287 |
| T72 | 17° | 232 | 30° | 288 |
| ZSU234 | 17° | 299 | 30° | 288 |
Table 4. Recognition result of the proposed MTL deep learning framework under EOC-D (recognition ratio 94.01%).

| Class | 2S1 | BRDM2 | T72 | ZSU234 |
|---|---|---|---|---|
| 2S1 | 99.31 | 0.345 | 0.345 | 0 |
| BRDM2 | 4.88 | 95.12 | 0 | 0 |
| T72 | 11.81 | 0 | 88.19 | 0 |
| ZSU234 | 1.45 | 0 | 5.45 | 93.10 |
Table 5. Number of training images for EOC-C and EOC-V.

| Class | Depression | Number |
|---|---|---|
| BMP2-9563 | 17° | 233 |
| BTR70-c71 | 17° | 233 |
| T72-132 | 17° | 232 |
| BRDM2-E71 | 17° | 256 |
Table 6. Number of testing images for EOC-C.

| Class | Serial No. | Depression | Number |
|---|---|---|---|
| BMP2 | 9566 | 15°, 17° | 428 |
| BMP2 | C21 | 15°, 17° | 429 |
| T72 | 812 | 15°, 17° | 426 |
| T72 | A04 | 15°, 17° | 573 |
| T72 | A05 | 15°, 17° | 573 |
| T72 | A07 | 15°, 17° | 573 |
| T72 | A10 | 15°, 17° | 567 |
Table 7. Number of testing images for EOC-V.

| Class | Serial No. | Depression | Number |
|---|---|---|---|
| T72 | S7 | 15°, 17° | 419 |
| T72 | A32 | 15°, 17° | 572 |
| T72 | A62 | 15°, 17° | 573 |
| T72 | A63 | 15°, 17° | 573 |
| T72 | A64 | 15°, 17° | 573 |
Table 8. Recognition result of the proposed MTL deep learning framework under EOC-C (recognition ratio 98.36%).

| Class | BMP2 | BRDM2 | BTR70 | T72 |
|---|---|---|---|---|
| BMP2sn-9566 | 96.93 | 0.23 | 1.64 | 4.21 |
| BMP2sn-c21 | 96.04 | 0.47 | 0.47 | 3.03 |
| T72sn-812 | 0.00 | 0.47 | 0.47 | 99.06 |
| T72-A04 | 0.17 | 0.17 | 0.00 | 99.65 |
| T72-A05 | 0.00 | 0.00 | 0.00 | 100.00 |
| T72-A07 | 0.17 | 0.00 | 0.00 | 99.83 |
| T72-A10 | 0.00 | 0.00 | 0.00 | 100.00 |
Table 9. Recognition result of the proposed MTL deep learning framework under EOC-V (recognition ratio 99.21%).

| Class | BMP2 | BRDM2 | BTR70 | T72 |
|---|---|---|---|---|
| T72sn-s7 | 1.19 | 0.23 | 0.23 | 98.33 |
| T72-A32 | 0.00 | 0.00 | 0.00 | 100.00 |
| T72-A62 | 1.57 | 0.17 | 0.00 | 98.25 |
| T72-A63 | 0.17 | 0.00 | 0.35 | 99.48 |
| T72-A64 | 0.00 | 0.00 | 0.00 | 100.00 |
Table 10. Pixel accuracies for the targets and the backgrounds (pixel accuracy 99.03%).

| Pixel Accuracy | Target | Background |
|---|---|---|
| Target | 98.92 | 1.08 |
| Background | 0.87 | 99.13 |
Table 11. Recognition performance for various methods.

| Methods | SOC | EOC-D | EOC-V |
|---|---|---|---|
| SVM [41] | 90.00% | 75.00% | 81.00% |
| AdaBoost [41] | 92.00% | 78.00% | 82.00% |
| IGT [41] | 95.00% | 80.00% | 85.00% |
| CGM [42] | 97.00% | 79.00% | 80.00% |
| DCNN [45] | 92.30% | – | – |
| DCNN [44] | 94.56% | – | – |
| gcForest [43] | 96.70% | – | – |
| Proposed method | 99.13% | 94.01% | 99.21% |
Table 12. Pixel accuracies of Otsu, Canny and our proposed algorithm.

| Methods | Pixel Accuracy of Target | Pixel Accuracy of Background | Pixel Accuracy |
|---|---|---|---|
| Otsu | 58.17% | 88.35% | 73.26% |
| Canny | 79.12% | 90.13% | 85.12% |
| Proposed | 98.92% | 99.13% | 99.03% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
