Article

SAR Target Recognition via Supervised Discriminative Dictionary Learning and Sparse Representation of the SAR-HOG Feature

1 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
2 Luoyang Electronic Equipment Test Center, Luoyang 471000, China
* Author to whom correspondence should be addressed.
Submission received: 25 May 2016 / Revised: 22 July 2016 / Accepted: 17 August 2016 / Published: 20 August 2016

Abstract
Automatic target recognition (ATR) in synthetic aperture radar (SAR) images plays an important role in both national defense and civil applications. Although many methods have been proposed, SAR ATR remains very challenging due to the complex application environment. Feature extraction and classification are key points in SAR ATR. In this paper, we first design a novel feature, a histogram of oriented gradients (HOG)-like feature for SAR ATR (called SAR-HOG). Then, we propose a supervised discriminative dictionary learning (SDDL) method to learn a discriminative dictionary for SAR ATR and propose a strategy to simplify the optimization problem. Finally, we propose a SAR ATR classifier based on SDDL and sparse representation (called SDDLSR), in which both the reconstruction error and the classification error are considered. Extensive experiments are performed on the MSTAR database under standard operating conditions and extended operating conditions. The experimental results show that SAR-HOG can reliably capture the structures of targets in SAR images, and SDDL can further capture subtle differences among the different classes. By virtue of the SAR-HOG feature and SDDLSR, the proposed method achieves state-of-the-art performance on the MSTAR database. Especially for the extended operating conditions (EOC) scenario "Training 17°—Testing 45°", the proposed method improves remarkably on previous works.

1. Introduction

Automatic target recognition (ATR) is one of the important applications of synthetic aperture radar (SAR) in civilian and military fields. Although much work has been done over the past few decades [1,2,3,4,5,6,7,8], it is still a highly challenging problem. Generally, the process of SAR ATR includes four sequential stages: detection, discrimination, feature extraction and classification. In the first two stages, the potential regions of interest (ROIs) are located, and the false ROIs are removed. In the last two stages, distinctive features are extracted from the ROIs and then classified. In this paper, feature extraction and classification are studied.
Feature extraction is a key factor in the performance of SAR ATR; its aim is to capture the characteristics of targets. In high resolution SAR images, geometric structures are important features of targets. Inspired by the simplicity and robustness of a local (statistical) feature, i.e., the histogram of oriented gradients (HOG) [9], we adapt HOG to speckled SAR images by using a ratio-based gradient definition (called SAR-HOG). Experiments show that SAR-HOG can indeed depict the geometric properties of targets. In addition to obtaining distinctive features of targets, classification is used to capture the differences among the different classes. Inspired by the effectiveness of the sparse representation classifier (SRC) [10] and dictionary learning [11], we further propose supervised discriminative dictionary learning (SDDL) to learn a discriminative dictionary by combining the merits of supervised dictionary learning (SDL) [12] and discriminative dictionary learning (DDL) [13]. In addition, in order to amplify the differences while properly suppressing the common features of the different classes, we represent the learned dictionary as the concatenation of class-specific sub-dictionaries and a shared sub-dictionary. Meanwhile, we propose a new strategy to combine the dictionary and the parameters of the classifier, which simplifies the optimization procedure. Based on the proposed SDDL and SRC, we propose the SAR ATR classifier, i.e., supervised discriminative dictionary learning and the sparse representation classifier (SDDLSR). Finally, SAR-HOG and SDDLSR are applied to the public MSTAR database.
The main contributions of this paper can be summarized as follows: (1) We propose a novel local feature for SAR ATR named SAR-HOG, which can effectively capture the main structures of targets in speckled SAR images. (2) We propose SDDL to learn a discriminative dictionary for SAR ATR. This is the first work to combine SDL and DDL. Meanwhile, we propose a new strategy to simplify the optimization problem. (3) We propose a SAR ATR classifier, SDDLSR, based on SDDL and SRC, in which both the reconstruction error and the classification error are considered. (4) We perform extensive experiments to demonstrate the performance of the proposed method on the MSTAR database under the standard operating conditions (SOC) and extended operating conditions (EOC) scenarios. In addition, the performance of the proposed method with a reduced training set is also evaluated. Experiments show that our method achieves state-of-the-art SAR ATR performance.
The remainder of the paper is organized as follows. In Section 2, we briefly introduce the related work, including the feature extraction methods and classification methods for SAR ATR. The novel local feature SAR-HOG is detailed in Section 3. The proposed supervised discriminative dictionary learning and the whole SAR ATR algorithm are detailed in Section 4. Experimental results on the MSTAR database are given in Section 5. Section 6 finally concludes our work.

2. Related Work

2.1. Work Related to Feature Extraction for SAR ATR

The present feature extraction methods for SAR images can be roughly grouped into three categories. (1) Using raw images or transformed images: Zhao et al. [1] used the raw images as the feature of targets. Srinivas et al. [2] adopted wavelet decomposition images. Dong et al. [3] utilized the monogenic signal to capture the characteristics of the SAR image. Generally, features represented by raw images or transformed images have a high dimension; thus, linear or nonlinear dimensionality reduction techniques are usually used. Mishra [4] compared the linear methods, i.e., principal component analysis (PCA) and linear discriminant analysis (LDA), for SAR ATR. Huang et al. [5] proposed a nonlinear method using tensor global and local discriminant embedding for dimensionality reduction in SAR ATR. (2) Using scattering center features: Zhou et al. [6] adopted scattering center features at different target poses for classification. However, an offline global scattering model is needed to establish SAR image templates. (3) Using global or local statistical features: Clemente et al. [7] used the pseudo-Zernike moments as global statistical features for SAR ATR. Local (statistical) features arise from small parts of an image, e.g., HOG [9], the scale-invariant feature transform (SIFT) [14], etc. Local features are usually invariant to image rotation, image scaling and minor changes in viewing direction [15,16]. For optical images, local feature extraction methods usually detect key points first and then compute the local descriptors [15,16]. However, there are two problems when applying the existing local feature extraction methods to SAR ATR. On the one hand, for limited resolution SAR images, the pixels of targets are usually limited, and key point detectors cannot always reliably capture the structures of targets [9]. On the other hand, gradient computation by difference in local feature extraction is not suitable for SAR images due to speckle noise. So far, only a few local features have been investigated for SAR images. Dai et al. [17] adopted the multilevel local pattern histogram for SAR terrain and land-use classification. Cui et al. [18] proposed a ratio-detector-based feature extraction method for very high resolution SAR image patch indexing. Dellinger et al. [19] proposed a SIFT-like algorithm called SAR-SIFT for the registration of SAR images. These investigations indicate that local features have great potential in SAR applications. Reviews of local feature extraction can be found in [15,16]. Apart from the above hand-designed feature extraction methods, convolutional neural networks (CNNs) have been used for automatic feature extraction in SAR ATR [8]. However, CNNs need a huge number of training samples to achieve the desired performance. In short, feature extraction for SAR images is still an open problem.

2.2. Work Related to Classification for SAR ATR

Classification in SAR ATR normally refers to supervised classification. Many classic classifiers have been used in SAR ATR, such as SVM [1], kNN [3], etc. The earlier mentioned dimensionality reduction methods PCA and LDA can also serve as classifiers [4]. Recently, the sparse representation classifier proposed by Wright et al. [10] has been successfully applied to SAR applications, such as polarimetric SAR image classification [20] and SAR ATR [3]. The success of SRC is largely guaranteed by the high redundancy and low coherency of the dictionary atoms. However, the dictionary in SRC, constructed by stacking training samples, may be optimal neither for reconstruction problems (i.e., denoising, inpainting) nor for discriminative problems (i.e., classification) [21]. Thus, dictionary learning [11] is used to learn a more representative and compact dictionary. Several dictionary learning methods have been proposed for reconstruction tasks, such as K-SVD [22] and online dictionary learning [23]. However, these methods are not necessarily suitable for discriminative tasks [24]. Therefore, discriminative dictionary learning methods have been proposed for solving discriminative problems. Ramirez et al. [13] proposed an incoherent dictionary learning method by introducing an incoherence term to encourage the independence of sub-dictionaries corresponding to different classes; thus, the dictionary becomes more discriminative. Gao et al. [25] further developed this method: they explicitly represented the similar atoms existing in sub-dictionaries as a shared sub-dictionary, while also imposing incoherence constraints among class-specific sub-dictionaries. Although DDL can efficiently improve the performance of the dictionary, it does not use the class labels in the training set. Mairal et al. [12] proposed supervised dictionary learning, which uses class labels to improve the classification performance. Zhang et al. [26] proposed a method called discriminative K-SVD (DK-SVD) to jointly learn the dictionary and classifier by minimizing the sum of the reconstruction and classification errors. Inspired by DK-SVD, Jiang et al. [27] further proposed a method called label consistent K-SVD (LCK-SVD): they enforced a label consistency constraint on the dictionary and combined the dictionary and the parameters of the classifier into a single parameter to simplify the optimization. A survey of supervised dictionary learning and sparse representation can be found in [28].

3. SAR-HOG

Because a SAR image is far more aspect sensitive than an optical image, it is more sensitive to changes in depression angle and in target pose [29]. Therefore, we should pay more attention to the stable SAR image pixels, which are mainly dominated by strong backscatter returns from structures with aspect insensitivity. HOG is a simple and efficient local feature. It suggests that, for capturing stable structures of targets, one should use fine-scale derivatives, fine orientation binning, a dense grid and high-quality normalized descriptor blocks [9]. Inspired by this idea, we propose a HOG-like local feature for SAR images, called SAR-HOG, based on a ratio-based gradient definition.
SAR-HOG computation includes three steps: gradient computation; orientation binning; and normalization and feature description [9]. In the following, we detail each step.

3.1. Gradient Computation

The original HOG uses the simplest scheme to compute the gradients, i.e., 1-D $[-1, 0, 1]$ masks are applied to the raw image without smoothing [9]. Because of the multiplicative speckle noise existing in SAR images, the gradient by difference is not a constant false alarm rate operator. It is more suitable to use the ratio instead of the difference to compute the gradients of SAR images [19]. Here, we use the simplest ratio of average (ROA) [30], i.e.,
$$R_i = \frac{M_1(i)}{M_2(i)} \quad (1)$$
where $R_i$ denotes the ratio, and $M_1(i)$ and $M_2(i)$ denote the local means on opposite sides of the current pixel along direction $i$; $i = 1$ denotes the horizontal direction and $i = 3$ the vertical direction. Figure 1 shows the scheme of the ROA. The average region for the $M_1(i)$ calculation is $\max\left[\frac{win-1}{2}, 1\right] \times win$ (or $win \times \max\left[\frac{win-1}{2}, 1\right]$), where $win$ denotes the odd size of the average region.
Inspired by SAR-SIFT [19], we define the horizontal gradient G H and vertical gradient G V as:
$$G_H = \log R_1, \quad G_V = \log R_3 \quad (2)$$
Then, the gradient magnitude G m and orientation G θ can be computed by:
$$G_m = \sqrt{G_H^2 + G_V^2}, \quad G_\theta = \operatorname{atan}\left(\frac{G_V}{G_H}\right) \quad (3)$$
where $\operatorname{atan}(\cdot)$ denotes the inverse tangent function.
It should be noted that the original HOG computes gradients without smoothing, which is equivalent to computing $M_1(i)$ and $M_2(i)$ from a single pixel each, i.e., $win = 1$. However, this does not work well for SAR images in practice, mainly because of speckle noise. We find that a moderately-sized averaging region improves the performance.
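To make the gradient step concrete, the following is a minimal numpy sketch of Equations (1)-(3), assuming scipy is available. The function name, the use of scipy.ndimage.correlate with one-sided kernels for the local means, and the full-range arctan2 orientation (instead of the plain atan above) are illustrative choices of this sketch, not the authors' code.

```python
import numpy as np
from scipy.ndimage import correlate

def roa_gradients(img, win=11):
    """Ratio-of-average (ROA) gradients for a speckled SAR image.

    Sketch of Equations (1)-(3): M1 and M2 are local means over
    max[(win - 1)/2, 1] x win regions on opposite sides of each pixel;
    the gradient component is the log of their ratio.
    """
    img = np.asarray(img, dtype=float) + 1e-8   # guard against zero means
    half = max((win - 1) // 2, 1)

    # Vertical direction (i = 3): averaging regions above / below the pixel.
    kv = np.zeros((2 * half + 1, win))
    kv[:half, :] = 1.0 / (half * win)           # region above the center
    m1_v = correlate(img, kv, mode='nearest')
    m2_v = correlate(img, kv[::-1, :], mode='nearest')   # region below

    # Horizontal direction (i = 1): regions left / right of the pixel.
    kh = kv.T
    m1_h = correlate(img, kh, mode='nearest')            # region to the left
    m2_h = correlate(img, kh[:, ::-1], mode='nearest')   # region to the right

    g_h = np.log(m1_h / m2_h)                   # Equation (2)
    g_v = np.log(m1_v / m2_v)
    g_m = np.sqrt(g_h ** 2 + g_v ** 2)          # Equation (3)
    g_theta = np.arctan2(g_v, g_h)              # full-range orientation
    return g_m, g_theta
```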

3.2. Orientation Binning

Given $G_m$ and $G_\theta$, we calculate the histogram of oriented gradients in local spatial regions (called cells) as in [9]. Specifically, the SAR image is divided into small cells (see Figure 2). For all of the pixels within a cell, the orientations are quantized into a fixed number of angular bins, and the magnitudes are accumulated into the orientation bins. In the original HOG, 6-8 pixel-wide cells and nine angular bins do best [9]. However, we find that smaller cells work better, and the number of angular bins should be somewhat larger.
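A minimal sketch of the binning step follows; the hard bin assignment (the original HOG interpolates between neighboring bins) and the default values (8-pixel cells, 11 bins, matching Table 3) are assumptions of this sketch.

```python
import numpy as np

def cell_histograms(g_m, g_theta, cell=8, n_bins=11):
    """Magnitude-weighted orientation histograms per cell.

    Orientations are folded to [0, pi) (unsigned gradients) and
    quantized into n_bins angular bins; each pixel votes into its
    cell's histogram with weight g_m.
    """
    h, w = g_m.shape
    ny, nx = h // cell, w // cell
    theta = np.mod(g_theta, np.pi)
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros((ny, nx, n_bins))
    for i in range(ny):
        for j in range(nx):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            np.add.at(hist[i, j], bins[rows, cols].ravel(),
                      g_m[rows, cols].ravel())
    return hist
```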

3.3. Normalization and Feature Description

Generally, $G_m$ of a SAR image has a large dynamic range, especially for metallic targets in SAR ATR. Therefore, effective local contrast normalization is very important for good performance. Specifically, the cells are grouped into larger spatially-connected blocks (see Figure 2), and the histogram entries of the cells in each block are concatenated into a vector. Then, each vector is normalized to have unit $l_2$ norm, i.e.,
$$v_i \leftarrow \frac{v_i}{\max\left(\|v_i\|_2, \varepsilon\right)} \quad (4)$$
where $v_i$ denotes the vector corresponding to the $i$-th block, and $\varepsilon$ is a small number, which can be chosen to be 0.2-times the mean value of $\|v_i\|_2$ over all of the blocks [31]. The normalization here differs slightly from that in [9] and better avoids poor results for blocks with uniform $G_m$. In the original HOG, 2-3-cell blocks work best [9]. In Section 5, we can see that the block size should be adjusted according to the targets in SAR images.
Finally, all of the normalized vectors corresponding to the blocks are concatenated to yield the SAR-HOG descriptor. It should be noted that, in order to improve the performance, the blocks typically overlap, meaning that each cell contributes more than once to the final descriptor (see Figure 2). In the original HOG, the block overlap (stride) is half of the block size. We find that this is still suitable for SAR images.
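The block grouping, normalization (Equation (4)) and concatenation can be sketched as follows; the stride is expressed in cells (2 cells is half of a 4-cell block, i.e., 16 pixels for 8-pixel cells), and the per-run $\varepsilon$ of 0.2 times the mean block norm follows the text above.

```python
import numpy as np

def sar_hog_descriptor(hist, block=4, stride=2):
    """Overlapping block normalization and concatenation.

    Cells are grouped into block x block regions with a stride of half
    the block size; each block vector v_i is rescaled by
    max(||v_i||_2, eps) as in Equation (4), with eps set to 0.2 times
    the mean block norm over all blocks [31].
    """
    ny, nx, _ = hist.shape
    blocks = [hist[i:i + block, j:j + block].ravel()
              for i in range(0, ny - block + 1, stride)
              for j in range(0, nx - block + 1, stride)]
    blocks = np.array(blocks)
    norms = np.linalg.norm(blocks, axis=1)
    eps = 0.2 * norms.mean()
    blocks /= np.maximum(norms, eps)[:, None]
    return blocks.ravel()              # the final SAR-HOG descriptor
```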
In Section 5, we can see that SAR-HOG can reliably capture the structures of targets in SAR images, and this local feature can directly improve SAR ATR performance.

4. SAR ATR Algorithm via Supervised Discriminative Dictionary Learning of SAR-HOG

In some SAR ATR scenarios, targets from different classes may have similar physical structures. For example, in military vehicle SAR ATR [29], the main battle tanks T62 and T72 have similar structures, e.g., tread, turret, armor, etc. These similar structures correspond to similar image features. In this paper, we propose the SDDL method to further capture the subtle differences among the different classes. In the following, we first review the sparse representation classifier and dictionary learning, then detail the proposed SDDL method and the optimization procedure, and finally present the complete SAR ATR algorithm.

4.1. Review of the Sparse Representation Classifier and Dictionary Learning

SRC is a generic classifier and has been used in many applications. Given sufficient training samples of $K$ classes, a dictionary $D = [D_1, \cdots, D_K] \in \mathbb{R}^{m \times N}$ is constructed by stacking the samples, where $D_k = [x_1^k, \dots, x_i^k, \dots, x_{n_k}^k]$ is the $k$-th sub-dictionary, $x_i^k$ represents the $i$-th training sample of dimension $m$ in the $k$-th class, $i \in \{1, \cdots, n_k\}$, and $N = \sum_{k=1}^{K} n_k$. A test sample $x_t$ is decomposed onto $D$ by solving the following optimization problem [10]:
$$\min_{\alpha_t \in \mathbb{R}^N} \|x_t - D \alpha_t\|_2^2 + \lambda \|\alpha_t\|_1 \quad (5)$$
where $\alpha_t$ is the sparse code of $x_t$ and $\lambda$ is the regularization parameter controlling the sparsity of $\alpha_t$; $\|\cdot\|_2$ and $\|\cdot\|_1$ denote the $l_2$ norm and $l_1$ norm, respectively.
Then, $x_t$ is approximated by $\hat{x}_t^k = D\,\delta_k(\alpha_t)$, where $\delta_k(\alpha_t)$ is a vector whose nonzero entries are the entries of $\alpha_t$ associated with the $k$-th class. $x_t$ is classified by assigning it to the class that minimizes the residual between $x_t$ and $\hat{x}_t^k$ [10]:
$$\hat{k} = \arg\min_{k \in \{1,\dots,K\}} \left\|x_t - \hat{x}_t^k\right\|_2^2 \quad (6)$$
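As a concrete illustration of Equations (5) and (6), here is a small SRC sketch; sklearn.linear_model.Lasso stands in for a generic $l_1$ solver (its alpha is rescaled to match Equation (5)), and the labels array marking the class of each dictionary atom is a convention of this sketch.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(x_t, D, labels, lam=0.3):
    """Sparse representation classifier, Equations (5) and (6).

    D (m x N) stacks the training samples column-wise; labels[j] gives
    the class of atom j. Lasso's alpha is lam / (2m) so that its
    objective matches ||x - D a||_2^2 + lam * ||a||_1 up to scale.
    """
    m = len(x_t)
    lasso = Lasso(alpha=lam / (2 * m), fit_intercept=False, max_iter=10000)
    alpha_t = lasso.fit(D, x_t).coef_                     # Equation (5)
    residuals = [np.linalg.norm(x_t - D[:, labels == k]
                                @ alpha_t[labels == k]) ** 2
                 for k in range(labels.max() + 1)]        # Equation (6)
    return int(np.argmin(residuals))
```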
In order to have a representative and compact dictionary, dictionary learning is applied to learn a dictionary from the training set, which can be formulated as follows [11]:
$$\min_{D, \{A_k\}} \sum_{k=1}^{K} \frac{1}{2}\left\|X_k - D A_k\right\|_F^2 + \lambda\,\psi(A_k) \quad \text{s.t. } \|D(:,j)\|_2 = 1, \ \forall j \quad (7)$$
where $D \in \mathbb{R}^{m \times P}$ is the learned dictionary with $P$ atoms; $X_k \in \mathbb{R}^{m \times n_k}$ and $A_k \in \mathbb{R}^{P \times n_k}$ denote the $k$-th class training samples matrix and the corresponding sparse codes matrix, respectively; $\psi(\cdot)$ is the sparsity-inducing regularization function, usually the $l_1$ norm; $\|\cdot\|_F$ denotes the Frobenius norm. To avoid the trivial solution, each atom is constrained to have unit norm. Generally, the dictionary size is much smaller than the number of samples, i.e., $P \ll N$, so the learned dictionary is compact.
The above dictionary learning method is usually used for reconstruction tasks. In fact, for discriminative tasks [24], e.g., SAR ATR, DDL can be used to learn a more discriminative dictionary. In [25], class-specific sub-dictionaries and a shared sub-dictionary are jointly learned from the training samples. Meanwhile, incoherence constraints are enforced on sub-dictionaries. Mathematically, this dictionary learning is formulated as follows:
$$\min_{D_0, \{D_k\}, \{A_k\}} \sum_{k=1}^{K} \frac{1}{2}\left\|X_k - [D_0, D_k] A_k\right\|_F^2 + \lambda \|A_k\|_1 + \sum_{k=0}^{K} \frac{\mu_k}{2}\left\|D_k^T D_k - I_k\right\|_F^2 + \frac{\eta_k}{2}\left\|D_k^T D_{\bar{k}}\right\|_F^2 \quad \text{s.t. } \|D_k(:,j)\|_2 = 1, \ \forall k, j \quad (8)$$
where $D_0 \in \mathbb{R}^{m \times P_0}$ denotes the shared sub-dictionary, which encodes the common features among the different classes; $\{D_k\} \in \mathbb{R}^{m \times P_k}$ denote the class-specific sub-dictionaries, which encode the subtle feature differences among the different classes; the complete dictionary is $D = [D_0, D_1, \dots, D_K] \in \mathbb{R}^{m \times P}$; $m$ is the feature dimension; $P = P_0 + P_1 + \cdots + P_K$ is the size of $D$; $D_{\bar{k}}$ is the sub-matrix obtained by removing $D_k$ from $D$; $I_k \in \mathbb{R}^{P_k \times P_k}$ is the identity matrix; $\lambda, \{\mu_k\}, \{\eta_k\}$ are regularization parameters.
In the above formulation, the first two terms represent the reconstruction error, similar to Equation (7). The third term enforces self-incoherence on each sub-dictionary in order to make the learned dictionary stable, and the fourth term enforces the incoherence constraint among sub-dictionaries to make the dictionary more discriminative.
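The two incoherence terms can be written compactly as below; subdicts, mu and eta are illustrative names, and for simplicity a single (mu, eta) pair is used instead of per-class weights.

```python
import numpy as np

def incoherence_penalties(subdicts, mu, eta):
    """Self- and cross-incoherence terms of Equation (8).

    subdicts is [D_0, D_1, ..., D_K]. The mu-term pushes each
    D_k^T D_k toward the identity (stable, near-orthonormal atoms);
    the eta-term penalizes D_k^T D_kbar, the correlation between D_k
    and the remaining sub-dictionaries.
    """
    D_full = np.hstack(subdicts)
    total, col = 0.0, 0
    for Dk in subdicts:
        Pk = Dk.shape[1]
        D_rest = np.hstack([D_full[:, :col], D_full[:, col + Pk:]])
        total += 0.5 * mu * np.linalg.norm(Dk.T @ Dk - np.eye(Pk)) ** 2
        total += 0.5 * eta * np.linalg.norm(Dk.T @ D_rest) ** 2
        col += Pk
    return total
```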
Although the dictionary learned by DDL is more discriminative, DDL does not use the class labels in the training set. SDL [12] jointly learns the dictionary and classifier by adding a classification error term to Equation (7), which requires solving:
$$\min_{D, \{A_k\}, W} \sum_{k=1}^{K} \frac{1}{2}\left\|X_k - D A_k\right\|_F^2 + \lambda\,\psi(A_k) + \gamma\,\mathcal{L}\!\left(Y_k, W, A_k\right) + \frac{\rho}{2}\|W\|_F^2 \quad \text{s.t. } \|D(:,j)\|_2 = 1, \ \forall j \quad (9)$$
where $Y_k = y_k \mathbf{1}^T \in \mathbb{R}^{K \times n_k}$ is the class label matrix associated with $X_k$, and $y_k$ is usually a one-of-$K$ indicator vector; $\mathcal{L}(\cdot)$ measures the classification error between $Y_k$ and the prediction of the classifier with model parameter $W \in \mathbb{R}^{K \times P}$ based on $A_k$, and it can be a logistic, hinge or square loss function; $\gamma$ and $\rho$ are regularization parameters; the last term is a regularizer for stability.

4.2. Supervised Discriminative Dictionary Learning

Inspired by DDL and SDL, we combine Equations (8) and (9) to propose the supervised discriminative dictionary learning method. Specifically, we represent the dictionary as the concatenation of the class-specific sub-dictionaries $\{D_k\}$ and a shared sub-dictionary $D_0$, i.e., $D = [D_0, D_1, \dots, D_K]$, enforce the incoherence constraints on the sub-dictionaries, adopt the square loss function to measure the classification error and omit the regularization term $\|W\|_F^2$, as in LCK-SVD. Therefore, the dictionary, classifier and sparse codes can be obtained by solving:
$$\min_{D_0, \{D_k\}, \{A_k\}, W} \sum_{k=1}^{K} \frac{1}{2}\left\|X_k - [D_0, D_k] A_k\right\|_F^2 + \lambda \|A_k\|_1 + \sum_{k=0}^{K} \frac{\mu_k}{2}\left\|D_k^T D_k - I_k\right\|_F^2 + \frac{\eta_k}{2}\left\|D_k^T D_{\bar{k}}\right\|_F^2 + \sum_{k=1}^{K} \frac{\gamma_k}{2}\left\|Y_k - W A_k\right\|_F^2 \quad \text{s.t. } \|D_k(:,j)\|_2 = 1, \ \forall k, j \quad (10)$$
where $A_k \in \mathbb{R}^{(P_0 + P_k) \times n_k}$ and $\{\gamma_k\}$ are regularization parameters.
The above problem can be solved iteratively and alternately, fixing $2K+1$ of the $2K+2$ unknowns each time and solving for the remaining one. However, such a solution is very likely to get stuck in a local minimum. Therefore, we reformulate the objective function. It is worth pausing to study the form of Equation (10), as it offers a new perspective on the classification error. The term $\sum_{k=1}^{K} \frac{\gamma_k}{2}\|Y_k - W A_k\|_F^2 + \lambda \|A_k\|_1$ in Equation (10) has the same form as dictionary learning (see Equation (7)): the classifier parameter $W$ can be regarded as a dictionary, and the class labels $\{Y_k\}$ as training samples. Thus, the classification error can also be treated as a reconstruction error. It should be noted that $\{Y_k\}$ are class labels, and the columns of $Y_k = y_k \mathbf{1}^T$ are usually the sparsest one-of-$K$ vectors $y_k$; here, we represent $\{y_k\}$ by a group of dense orthonormal vectors. Based on this understanding, we further represent $W$ as the concatenation of class-specific sub-dictionaries $\{W_k\}$ and a shared sub-dictionary $W_0$, i.e., $W = [W_0, W_1, \dots, W_K]$, and also enforce incoherence constraints on the sub-dictionaries of $W$. Because $D$ and $W$ are learned from two training sets of different natures, it is reasonable to further enforce incoherence constraints between $D$ and $W$. In addition, we set the regularization parameters so as to simplify the problem: each $\gamma_k$ is set to one, so the reconstruction error term and the classification error term have equal weight, and the incoherence constraint terms for $D$ and $W$ share the same weights. Therefore, Equation (10) can be developed into the following problem:
$$\begin{aligned}
\min_{D_0, \{D_k\}, W_0, \{W_k\}, \{A_k\}} \; & \sum_{k=1}^{K} \frac{1}{2}\left\|X_k - [D_0, D_k] A_k\right\|_F^2 + \lambda \|A_k\|_1 + \sum_{k=0}^{K} \frac{\mu_k}{2}\left\|D_k^T D_k - I_k\right\|_F^2 + \frac{\eta_k}{2}\left\|D_k^T D_{\bar{k}}\right\|_F^2 \\
& + \sum_{k=1}^{K} \frac{1}{2}\left\|Y_k - [W_0, W_k] A_k\right\|_F^2 + \sum_{k=0}^{K} \frac{\mu_k}{2}\left\|W_k^T W_k - I_k\right\|_F^2 + \frac{\eta_k}{2}\left\|W_k^T W_{\bar{k}}\right\|_F^2 \\
& + \sum_{k=0}^{K} \mu_k \left\|D_k W_k^T\right\|_F^2 + \eta_k \operatorname{Tr}\!\left(D_k^T D_{\bar{k}} W_{\bar{k}}^T W_k\right) - \frac{\mu_k P_k}{2} \\
\text{s.t. } \; & \|D_k(:,j)\|_2 = 1, \ \forall k, j
\end{aligned} \quad (11)$$
where $W_0 \in \mathbb{R}^{K \times P_0}$ and $W_k \in \mathbb{R}^{K \times P_k}$; $\operatorname{Tr}(\cdot)$ denotes the trace. The third line of Equation (11) represents the incoherence constraints between $D$ and $W$; this particular form is adopted to enable the mathematical simplification below. We then combine the parameters in Equation (11) as:
$$\tilde{X}_k = \begin{bmatrix} X_k \\ Y_k \end{bmatrix}, \quad \tilde{D}_0 = \begin{bmatrix} D_0 \\ W_0 \end{bmatrix}, \quad \tilde{D}_k = \begin{bmatrix} D_k \\ W_k \end{bmatrix} \quad (12)$$
where $\tilde{X}_k \in \mathbb{R}^{(m+K) \times n_k}$, $\tilde{D}_0 \in \mathbb{R}^{(m+K) \times P_0}$ and $\tilde{D}_k \in \mathbb{R}^{(m+K) \times P_k}$. The simplified form is obtained as follows:
$$\min_{\tilde{D}_0, \{\tilde{D}_k\}, \{A_k\}} \sum_{k=1}^{K} \frac{1}{2}\left\|\tilde{X}_k - [\tilde{D}_0, \tilde{D}_k] A_k\right\|_F^2 + \lambda \|A_k\|_1 + \sum_{k=0}^{K} \frac{\mu_k}{2}\left\|\tilde{D}_k^T \tilde{D}_k - I_k\right\|_F^2 + \frac{\eta_k}{2}\left\|\tilde{D}_k^T \tilde{D}_{\bar{k}}\right\|_F^2 \quad \text{s.t. } \|\tilde{D}_k(:,j)\|_2 = 1, \ \forall k, j \quad (13)$$
with the complete quasi-dictionary $\tilde{D} = [\tilde{D}_0, \tilde{D}_1, \dots, \tilde{D}_K] \in \mathbb{R}^{(m+K) \times P}$, and $\tilde{D}_{\bar{k}}$ the sub-matrix obtained by removing $\tilde{D}_k$ from $\tilde{D}$. We can see that Equation (13) has the same form as Equation (8).
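The simplification is just a vertical stacking of variables, as the following sketch shows: with $\tilde{D}_k = [D_k; W_k]$, one has $\tilde{D}_k^T \tilde{D}_k = D_k^T D_k + W_k^T W_k$, which is why the separate incoherence terms and cross terms of Equation (11) collapse into the single terms of Equation (13). Function and variable names here are our own.

```python
import numpy as np

def stack_problem(X_list, Y_list, D0, D_list, W0, W_list):
    """Variable combination of Equation (12).

    Each feature matrix X_k is stacked on its label matrix Y_k, and
    each dictionary block on the matching classifier block, turning
    the joint objective of Equation (11) into the plain form of
    Equation (13). Quasi-atoms are renormalized to unit l2 norm to
    respect the constraint of Equation (13).
    """
    X_tilde = [np.vstack([Xk, Yk]) for Xk, Yk in zip(X_list, Y_list)]
    unit = lambda M: M / np.linalg.norm(M, axis=0, keepdims=True)
    D0_tilde = unit(np.vstack([D0, W0]))
    D_tilde = [unit(np.vstack([Dk, Wk])) for Dk, Wk in zip(D_list, W_list)]
    return X_tilde, D0_tilde, D_tilde
```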

4.3. Optimization Procedure

The objective function in Equation (13) is not convex. We can use a method similar to that in [25] to solve the problem. The main idea is that each $A_k$, $\tilde{D}_k$ and $\tilde{D}_0$ is updated alternately while keeping all of the other variables fixed, and the updates iterate until the stopping criterion is reached. Specifically, each iteration consists of the following three steps.
  • Updating the sparse codes $\{A_k\}$: With all of the other variables in Equation (13) fixed, the $k$-th sparse codes matrix $A_k$ is obtained by solving the following problem:
$$\min_{A_k} \frac{1}{2}\left\|\tilde{X}_k - [\tilde{D}_0, \tilde{D}_k] A_k\right\|_F^2 + \lambda \|A_k\|_1 \quad (14)$$
    This is the standard sparse coding problem (see Equation (5)). In this paper, we use the fast implementation of the LARS algorithm [31], a variant for solving the Lasso.
  • Updating the class-specific sub-dictionaries $\{\tilde{D}_k\}$: With the sparse codes $\{A_k\}$, the shared sub-dictionary $\tilde{D}_0$ and the other $K-1$ class-specific sub-dictionaries fixed, the $k$-th class-specific sub-dictionary $\tilde{D}_k$ is obtained by solving the following problem:
$$\min_{\tilde{D}_k} \frac{1}{2}\left\|\tilde{X}_k - \tilde{D}_0 A_k^0 - \tilde{D}_k A_k^1\right\|_F^2 + \frac{\mu_k}{2}\left\|\tilde{D}_k^T \tilde{D}_k - I_k\right\|_F^2 + \frac{\eta_k}{2}\left\|\tilde{D}_k^T \tilde{D}_{\bar{k}}\right\|_F^2 \quad \text{s.t. } \|\tilde{D}_k(:,j)\|_2 = 1, \ \forall j \quad (15)$$
    where $A_k^0 \in \mathbb{R}^{P_0 \times n_k}$, $A_k^1 \in \mathbb{R}^{P_k \times n_k}$ and $A_k = [A_k^{0T}, A_k^{1T}]^T$. We solve this problem with the gradient descent algorithm, with the step size chosen according to the Armijo rule, as in [13,25].
  • Updating the shared sub-dictionary $\tilde{D}_0$: When $\{A_k\}$ and $\{\tilde{D}_k\}$ are fixed, $\tilde{D}_0$ is obtained by solving the following problem:
$$\min_{\tilde{D}_0} \sum_{k=1}^{K} \frac{1}{2}\left\|\tilde{X}_k - \tilde{D}_k A_k^1 - \tilde{D}_0 A_k^0\right\|_F^2 + \frac{\mu_k}{2}\left\|\tilde{D}_0^T \tilde{D}_0 - I_0\right\|_F^2 + \frac{\eta_k}{2}\left\|\tilde{D}_0^T \tilde{D}_{\bar{0}}\right\|_F^2 \quad \text{s.t. } \|\tilde{D}_0(:,j)\|_2 = 1, \ \forall j \quad (16)$$
    We also solve this problem by gradient descent, as in [13,25].
The above iterative process stops when the number of iterations reaches the predefined value $M$, or when the relative changes between two successive estimates of $\tilde{D}$ and $\{A_k\}$ are both less than given constants, i.e.,
$$\frac{\|\tilde{D}^s - \tilde{D}^{s-1}\|_F}{\|\tilde{D}^s\|_F} < thres_{\tilde{D}} \;\wedge\; \frac{\sum_{k=1}^{K}\|A_k^s - A_k^{s-1}\|_F}{\sum_{k=1}^{K}\|A_k^s\|_F} < thres_A \quad (17)$$
where $s$ denotes the $s$-th iteration, $\wedge$ denotes the AND operation, and $thres_{\tilde{D}}$ and $thres_A$ denote the convergence thresholds, which are set to small numerical constants. Although the solutions obtained by this algorithm are not exact, they do satisfy the constraints [25], and the experiments in Section 5 demonstrate their good performance.
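A skeleton of the alternating loop and the stopping test of Equation (17) is sketched below; the three callbacks stand in for the LARS step (Equation (14)) and the two gradient descent updates (Equations (15) and (16)), and are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def converged(D_new, D_old, A_new, A_old, thres_D=1e-4, thres_A=1e-4):
    """Stopping test of Equation (17): both relative changes are small."""
    d_change = np.linalg.norm(D_new - D_old) / np.linalg.norm(D_new)
    a_change = (sum(np.linalg.norm(An - Ao) for An, Ao in zip(A_new, A_old))
                / sum(np.linalg.norm(An) for An in A_new))
    return d_change < thres_D and a_change < thres_A

def sddl_loop(update_codes, update_class_dict, update_shared_dict,
              D_tilde, A_list, max_iter=20):
    """Alternating optimization skeleton of Section 4.3 / Algorithm 1."""
    K = len(A_list)
    for _ in range(max_iter):
        D_old = D_tilde.copy()
        A_old = [A.copy() for A in A_list]
        for k in range(K):
            A_list[k] = update_codes(k, D_tilde)             # Equation (14)
        for k in range(K):
            D_tilde = update_class_dict(k, D_tilde, A_list)  # Equation (15)
        D_tilde = update_shared_dict(D_tilde, A_list)        # Equation (16)
        if converged(D_tilde, D_old, A_list, A_old):
            break
    return D_tilde, A_list
```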
Following [27], once $\tilde{D}$ is obtained, the final desired dictionary $\hat{D}$ and the classifier parameter $\hat{W}$ can be derived from $\tilde{D}$ as follows:
$$\hat{D} = \left[\frac{d_1}{\|d_1\|_2}, \dots, \frac{d_j}{\|d_j\|_2}, \dots, \frac{d_P}{\|d_P\|_2}\right], \quad \hat{W} = \left[\frac{w_1}{\|d_1\|_2}, \dots, \frac{w_j}{\|d_j\|_2}, \dots, \frac{w_P}{\|d_P\|_2}\right] \quad (18)$$
where $d_j = \tilde{D}(1{:}m, j)$, $w_j = \tilde{D}(m{+}1{:}m{+}K, j)$, $\hat{D} \in \mathbb{R}^{m \times P}$ and $\hat{W} \in \mathbb{R}^{K \times P}$.
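Equation (18) amounts to splitting each learned quasi-atom and rescaling both halves by the norm of the dictionary half, e.g.:

```python
import numpy as np

def split_quasi_dictionary(D_tilde, m):
    """Equation (18): recover the dictionary D-hat and classifier W-hat.

    The first m rows of each quasi-atom form d_j and the remaining K
    rows form w_j; both are divided by ||d_j||_2 so that the final
    dictionary has unit-norm atoms, following [27].
    """
    D_part, W_part = D_tilde[:m, :], D_tilde[m:, :]
    norms = np.linalg.norm(D_part, axis=0, keepdims=True)
    return D_part / norms, W_part / norms
```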
As for the initialization of the above iterative process, we initialize $\{D_k\}$ and $\{A_k\}$ with K-SVD using the training samples of each class, and we initialize $D_0$ using the training samples from all of the classes. $W_0$ and $\{W_k\}$ can be initialized with a ridge regression model, as in [27]; alternatively, since $W$ can also be regarded as a dictionary, K-SVD can be employed. In this paper, we use the latter method. The initialized $\{\tilde{X}_k\}$, $\tilde{D}_0$ and $\{\tilde{D}_k\}$ are then obtained according to Equation (12).
Setting the parameters by cross-validation over the regularization parameters $\lambda$, $\{\mu_k\}$, $\{\eta_k\}$ and the dictionary size $P$ would be cumbersome. Fortunately, because Equations (8) and (13) have the same form, we can follow the parameter setting of [25]. For the sparse regularization parameter $\lambda$, it has been shown experimentally that good performance is achieved when it is set to 0.3 [25]; we also set $\lambda = 0.3$ for simplicity. In [25], $\{\mu_k\}$ and $\{\eta_k\}$ are set as follows:
$$\mu_k = a \cdot \frac{n_k^2}{K P_k^2}, \quad \eta_k = b \cdot \frac{n_k}{P_k (P - P_k)} \quad (19)$$
where $a$ and $b$ are controllable parameters. This formulation normalizes the incoherence terms by the number of samples and the sub-dictionary size of each class. In [25], the ratio between $\eta_k$ and $\mu_k$ is simply set to two, and $b$ is set to 0.1. We adopt the formulation of Equation (19) and likewise constrain $\eta_k / \mu_k = 2$; $b$ is set by experiment, as illustrated in Section 5. As for the size of the dictionary, [25] shows that larger class-specific sub-dictionaries lead to better performance, while a moderately-sized shared sub-dictionary yields the desired performance. A similar rule holds for our method. It should be noted that $P$ grows linearly with the number of classes, so $P$ should be adjusted according to the actual SAR ATR problem.
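Under our reading of Equation (19), which we could not verify against [25] (so treat the exact expressions as an assumption), the weights could be computed as:

```python
import numpy as np

def sddl_weights(n, P_list, b=0.1):
    """Regularization weights in the spirit of Equation (19).

    n[k] is the sample count and P_list[k] the sub-dictionary size for
    k = 0, ..., K (this sketch simply lets the caller supply n[0] for
    the shared block). eta_k / mu_k is fixed to 2 and b = 0.1, as in
    the text; the exact normalization is an assumption.
    """
    P = sum(P_list)
    eta = np.array([b * n[k] / (P_list[k] * (P - P_list[k]))
                    for k in range(len(P_list))])
    mu = eta / 2.0              # the paper constrains eta_k / mu_k = 2
    return mu, eta
```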
The overall optimization procedure for solving Equation (13) is summarized in Algorithm 1.
Algorithm 1: Supervised discriminative dictionary learning (SDDL).
Input:
  feature vectors of the ROIs: { X k } , k = 1 , , K ; class labels { Y k } ; regularization parameters, λ , { μ k } , { η k } ; the size of sub-dictionaries, { P k } , k = 0 , , K ;
  the stopping thresholds t h r e s D ˜ and t h r e s A or maximum iterations M.
Initialization:
  Initialize { D k } , D 0 , { W k } , W 0 and { A k } with K-SVD.
  Initialize { X ˜ k } , D ˜ 0 and { D ˜ k } by Equation (12).
Repeat
  for k = 1 to K do
   Update the sparse codes $A_k$ of $\tilde{X}_k$ with the LARS algorithm (see Equation (14)).
  end for
  for k = 1 to K do
   Update each $\tilde{D}_k$ by using the gradient descent algorithm (see Equation (15)).
  end for
  Update $\tilde{D}_0$ by using the gradient descent algorithm (see Equation (16)).
Until reaching the stopping criterion (see Equation (17)).
  Output: The desired dictionary D ^ and the classifier parameter W ^ obtained by Equation (18).

4.4. SAR ATR Algorithm

Given the training slice images of ROIs and the relevant class labels from $K$ classes, we first extract features from the slice images and then use the SDDL method (Algorithm 1) to jointly learn a discriminative dictionary $\hat{D}$ and classifier $\hat{W}$. Next, we decompose the feature vector of a test sample $x_t$ onto $\hat{D}$ to obtain its sparse code $\alpha_t$ by solving the problem of Equation (5), with $D$ replaced by $\hat{D}$. Finally, we identify $x_t$ based on $\alpha_t$ using the following decision rule:
$$\hat{k} = \arg\min_{k \in \{1,\dots,K\}} \left\|x_t - \hat{D}\,\delta_k(\alpha_t)\right\|_2^2 + \left\|y_k - \hat{W}\,\delta_k(\alpha_t)\right\|_2^2 \quad (20)$$
where $\alpha_t \in \mathbb{R}^{P_0 + P_k}$. By comparing Equations (6) and (20), we can see that both the reconstruction error and the classification error are considered.
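A sketch of the decision rule of Equation (20) follows; here atom_labels marks each atom of $\hat{D}$ as shared (0) or class-specific ($k$), Y_codes holds the dense label vectors $y_k$, and including the shared atoms in $\delta_k(\alpha_t)$ reflects our reading of the $P_0 + P_k$ code dimension above.

```python
import numpy as np

def sddlsr_decide(x_t, alpha_t, D_hat, W_hat, atom_labels, Y_codes):
    """Decision rule of Equation (20).

    atom_labels[j] in {0, ..., K} marks atom j of D_hat as shared (0)
    or class-specific (k); Y_codes[:, k-1] is the label vector y_k.
    delta keeps the coefficients of the shared and k-th atoms, and the
    class minimizing reconstruction plus classification error wins.
    """
    K = int(atom_labels.max())
    scores = []
    for k in range(1, K + 1):
        delta = np.where((atom_labels == 0) | (atom_labels == k),
                         alpha_t, 0.0)
        rec = np.linalg.norm(x_t - D_hat @ delta) ** 2
        cls = np.linalg.norm(Y_codes[:, k - 1] - W_hat @ delta) ** 2
        scores.append(rec + cls)
    return int(np.argmin(scores)) + 1         # classes numbered 1..K
```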
The proposed SAR ATR algorithm via supervised discriminative dictionary learning and sparse representation is summarized in Algorithm 2. Here, the proposed method is denoted by SDDLSR.
Algorithm 2: SAR ATR via SDDLSR.
Input:
  Slice images of ROIs from the training set and the testing set and class labels in the training set.
Feature extraction:
  Compute features of slice images. In this paper, SAR-HOG features are computed following the description in Section 3.
Classification:
  Learn the dictionary D ^ and the classifier parameter W ^ using Algorithm 1.
  Decompose the test sample $x_t$ onto $\hat{D}$ and identify it by the decision rule (see Equation (20)).
Output: The identity of $x_t$.

5. Experiments and Discussions

In order to verify the effectiveness and robustness of the proposed method, we perform experiments using the Moving and Stationary Target Automatic Recognition (MSTAR) public database [29]. This database is a gallery collected using an X-band HH-polarization SAR with 1 ft × 1 ft resolution for multiple targets. The SAR images are captured at various depression angles (15°, 17°, 30°, 45°) over a 0°-360° range of aspect angle. The sizes of the images are all around 128 × 128 pixels. In this paper, these images are cropped to 64 × 64 pixels to further avoid the influence of clutter. The SAR ATR performance of the proposed method is attributed to two factors, i.e., the SAR-HOG feature and the SDDLSR classifier. Therefore, we demonstrate the effectiveness of the SAR-HOG feature and of SDDLSR separately.
In the following, we first detail the experiment setup and parameter setting. Then, we evaluate the effectiveness of the SAR-HOG feature by using SDDLSR as the baseline classifier in Section 5.3. In Section 5.4 and Section 5.5, we demonstrate the effectiveness of the SDDLSR classifier based on the same SAR-HOG feature. Here, some classic classification methods, including SVM, kNN, SRC and the SDL method LCK-SVD, are compared with the proposed SDDLSR classifier. The performance of the proposed method with a reduced training set is evaluated in Section 5.6. The time consumption is illustrated in Section 5.7. All of the experiments are performed in MATLAB on a common PC with an Intel Core i7 processor (3.40 GHz) and 8.00 GB of memory.

5.1. Experiment Setup

Experiments are performed under standard operating conditions (SOC) and extended operating conditions (EOC) [29]. In SOC scenarios, the testing conditions are very close to the training conditions. In EOC scenarios, by contrast, the testing conditions differ substantially from the training conditions. Therefore, EOC scenarios are closer to real-world battlefield scenarios.
In the experiments under SOC, the images of ten targets acquired at a 17° depression angle are used for training, while the images acquired at a 15° depression angle are used for testing [3,29]. The numbers of images for training and testing are tabulated in Table 1. It should be noted that BMP2 and T72 have several variants with small structural modifications, denoted by different serial numbers (SN). Here, only SN_132 of T72 and SN_9563 of BMP2 acquired at a 17° depression angle are used for training.
In the experiments under EOC, the EOC difference in depression angle (denoted by EOC_d) is considered. Four targets, 2S1, BRDM2, ZSU23/4 and T72 (SN_A64), at different depression angles under Scene 1 are used [29]. Specifically, the images acquired at a 17° depression angle are used for training, and the images acquired at 30° and 45° depression angles are used for testing. The two cases are denoted by "Training 17°—Testing 30°" and "Training 17°—Testing 45°", respectively. The numbers of images for training and testing for the four targets are tabulated in Table 2.
For SAR ATR, the training samples acquired from the real battlefield are usually very limited. Therefore, we further study the performance of the SAR ATR methods with a small training set. Specifically, the case "Training 17°—Testing 30°" is considered, and the training samples are gradually reduced: each time, 50 samples per class are randomly removed from the previous training set, and the remainder is used as the current training set. For each training size, experiments are repeated 100 times to obtain the mean performance.

5.2. Parameter Setting

The parameters in the proposed method include the parameters of SAR-HOG and the parameters of SDDL. The setting principles have been given in Section 3 and Section 4.3. The parameter setting in SAR-HOG actually depends on the SAR image resolution and the sizes of the targets in the images. Here, we further illustrate the parameter setting. The fixed parameter values are tabulated in Table 3.
We use the experiment setup "Training 17°—Testing 30°" and the proposed SDDLSR method to choose the parameter values. The experiment results are recorded in Figure 3 and Figure 4 and Table 4, Table 5 and Table 6. It should be noted that, when one parameter is tested, the remaining parameters are fixed at the values in Table 3.
Figure 3a shows that properly increasing the average region size $win$ for computing $M_1(i)$ and $M_2(i)$ improves the recognition rate, up to $win = 11$ pixels. Figure 3b shows that setting the number of bins to 11 obtains the best performance. Table 4 and Table 5 record the recognition rate and time consumption as the cell and block sizes change, respectively. We can see that when the cell size is 4-8 pixels and the block size is 2-7 cells, with blocks no larger than 32 pixels, the recognition rates are no less than 0.96. In addition, although the longer SAR-HOG feature associated with smaller cell and block sizes tends to yield a higher recognition rate, it also costs more time. Therefore, we choose 8 × 8-pixel cells and 4 × 4-cell blocks in the experiments as a compromise between recognition rate and time consumption. Table 6 shows that setting the stride to 16 pixels (i.e., half of the block size) leads to desirable performance. As for the parameters $\lambda$ and $b$ in SDDL, $\lambda$ is set to the empirical value 0.3 [25], and $b = 0.1$ is an ideal value, as shown in Figure 4a. As for the size of the dictionary, Figure 4b shows that setting $P_k$, $k = 0, 1, \dots, K$, to 96 works best. For the stopping criterion, we find that the performance of Algorithm 1 is satisfactory within 20 iterations. Small $thres_{\tilde{D}}$ and $thres_A$, e.g., $10^{-4}$, can guarantee the convergence of Algorithm 1, but such a setting generally results in more than one hundred iterations. Therefore, we simply set the maximum number of iterations $M = 20$ in the experiments.

5.3. Effectiveness of SAR-HOG

The effectiveness of the SAR-HOG feature is first evaluated using the experiment setups under SOC and EOC_d. Specifically, the commonly-used features, i.e., raw intensity images [1] and wavelet decomposition images [2], are compared with the SAR-HOG feature by using SDDLSR as the baseline classifier. The "intensity image" feature is self-explanatory. The "wavelet decomposition images" are the LL, LH and HL (L = low, H = high) sub-bands obtained after a multi-level wavelet decomposition using the 2-D reverse biorthogonal wavelet [2]. The parameters for computing SAR-HOG are listed in Table 3. SDDLSR is implemented as Algorithm 2, whose fixed parameters are also listed in Table 3.
Table 7 and Table 8 record the results of Algorithm 2 with the intensity image as the input feature under SOC and EOC_d, respectively; Table 9 and Table 10 record the corresponding results with wavelet decomposition images; and Table 11 and Table 12 the results with SAR-HOG. The results include the confusion matrices and the overall recognition rates. The confusion matrix is a matrix of fractions, with each column representing a possible decision class and each row representing a class presented to the ATR system. The diagonal elements of the confusion matrix are the fractions of correct decisions, and the mean value of the diagonal elements is the overall recognition rate.
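For clarity, the construction of the confusion matrix and the overall recognition rate used throughout this section can be sketched as follows (assuming every class appears in the test set):

```python
import numpy as np

def confusion_and_rate(true_labels, pred_labels, K):
    """Row-normalized confusion matrix and overall recognition rate.

    Rows are true classes, columns are decisions; entries are
    fractions of each row's samples, so the diagonal holds per-class
    correct rates and its mean is the overall recognition rate.
    """
    C = np.zeros((K, K))
    for t, p in zip(true_labels, pred_labels):
        C[t, p] += 1
    C /= C.sum(axis=1, keepdims=True)
    return C, C.diagonal().mean()
```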
By comparing the results in Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12, we can see that the SAR-HOG feature leads to the best performance under SOC and EOC_d with the same classifier SDDLSR. For the SOC scenario, the recognition rates using raw intensity images and wavelet decomposition images are 0.9406 and 0.9380, compared to 0.9624 using SAR-HOG. For the EOC_d scenario "Training 17°—Testing 30°", the recognition rates using raw intensity images and wavelet decomposition images are 0.9165 and 0.9028, compared to 0.9661 using SAR-HOG. For the most challenging EOC_d scenario "Training 17°—Testing 45°", the recognition rate using SAR-HOG is 0.8086, which is 9.00% and 10.89% better than the results of 0.7186 with raw intensity images and 0.6997 with wavelet decomposition images, respectively. Thus, we can conclude that a good feature can indeed improve SAR ATR performance, and SAR-HOG has remarkable merit. The following experiments are all based on SAR-HOG and further verify its effectiveness for SAR ATR.

5.4. Ten Targets’ ATR under SOC

For the problem of ten targets’ ATR under SOC, we take SAR-HOG as the input feature and compare the proposed SDDLSR classifier with the SVM, kNN, SRC and LCK-SVD classifiers. The parameters in computing SAR-HOG and the parameters used in SDDLSR (Algorithm 2) are listed in Table 3. The implementations of SVM, kNN, SRC and LCK-SVD are illustrated in Section 5.7. The experimental results are recorded in Table 11 and Table 13, Table 14, Table 15 and Table 16.
By comparing the results in Table 11 and Table 13, Table 14, Table 15 and Table 16, we can see that SDDLSR achieves the highest recognition rate of 0.9624, which is 2.37%, 1.97%, 1.43% and 1.23% better than SVM, kNN, SRC and LCK-SVD, respectively. These experiments show that SDDL can further capture the subtle differences among the different classes by learning a discriminative dictionary, and the sparse codes of the testing samples obtained by sparse decomposition become more distinctive.
So far, the best published result under SOC has been obtained by sparse representation on Riemannian manifolds (SRCR) (see Table VII in [3]): a recognition rate of 0.9448 for ten targets. Compared with this result, the proposed method improves the recognition rate by 1.36%.

5.5. Four Targets' ATR under EOC

As described in Section 3, SAR images have remarkable aspect sensitivity. A change of depression angle from 17° to 30° can lead to drastic changes in the SAR image, not to mention a change from 17° to 45°. The stable structures of the targets are therefore more difficult to capture, and a more efficient feature extraction method should be employed. For the problem of four targets' ATR under EOC, we also take SAR-HOG as the input feature and compare the proposed SDDLSR classifier with the SVM, kNN, SRC and LCK-SVD classifiers. The parameters for computing SAR-HOG and the parameters used in SDDLSR (Algorithm 2) are listed in Table 3. The implementations of SVM, kNN, SRC and LCK-SVD are described in Section 5.7. The experimental results of the proposed SDDLSR method and the reference methods are tabulated in Table 12 and Table 17, Table 18, Table 19 and Table 20.
By comparing the results in Table 12 and Table 17, Table 18, Table 19 and Table 20, we can see that, although the experiment setting "Training 17°—Testing 30°" has a 13° depression angle difference, the results based on the SAR-HOG feature are acceptable on the whole. SDDLSR achieves the best recognition rate of 0.9661, compared to 0.9132 for SVM, 0.9253 for kNN, 0.9305 for SRC and 0.9262 for LCK-SVD. In addition, it is no surprise that the performance of all methods degrades under the setting "Training 17°—Testing 45°", with the recognition rate decreasing by more than 10% for every method. The proposed method still achieves the highest recognition rate of 0.8086, which is 8.25%, 8.50%, 7.84% and 7.10% better than SVM, kNN, SRC and LCK-SVD, respectively.
So far, the best published results under EOC_d have been obtained by the TJSR method (see Figure 7 in [3]): recognition rates for three targets (2S1, BRDM2 and ZSU23/4) of 0.9524 and 0.7073 under the EOC_d scenarios "Training 17°—Testing 30°" and "Training 17°—Testing 45°", respectively. Compared with these results, the recognition rates of the proposed method improve by 1.37% and 10.13%, respectively. These experiments indicate that our method is better suited to EOC scenarios with a large depression angle difference.

5.6. Experiment on Small Training Set

The recognition rates of the different methods versus the reduced training set under "Training 17°—Testing 30°" are shown in Figure 5. We can see that the recognition rates of all methods degrade with the reduced size of the training set. When about one-third of the original training samples (395 samples) remain, the recognition rate of our method is 0.9314, compared to 0.8633 for SVM, 0.8980 for kNN, 0.9158 for SRC and 0.8874 for LCK-SVD. The experimental results indicate that the proposed method performs best with a reduced training set.

5.7. Time Consumption

The time consumption of SAR ATR tasks has two main parts: the training time and the testing time. Generally, the classifier can be trained off-line, so the testing time is what matters for real-time operation of SAR ATR. For the proposed method, the training time is the time consumed by Algorithm 1, and the testing time is that of sparse representation classification (see Equation (20)). In the proposed method, the SAR-HOG calculation and the gradient descent algorithm are written in our unrefined MATLAB code, and the LARS algorithm in Algorithm 2 is implemented with the SPAMS software [31]. For the reference methods, SVM is implemented with the well-known LIBSVM software [32]; kNN with the functions in the MATLAB Statistics and Machine Learning Toolbox; SRC with the SPAMS software; and LCK-SVD with the code supplied by [27]. The time consumption of the different methods under "Training 17°—Testing 30°" is recorded in Table 21. It can be seen that the computing time of SAR-HOG and the training time of the proposed method are acceptable, and the testing performance is satisfactory. Note that the training stage of SRC is just the dictionary construction by stacking the training samples, while the training stage of LCK-SVD is discriminative dictionary learning.

6. Conclusions

In this paper, we proposed a SAR ATR method via supervised discriminative dictionary learning and sparse representation based on the SAR-HOG feature. First, we adapted the local HOG feature to SAR images by using a ratio-based gradient definition to deal with speckle noise. Then, we combined the merits of discriminative dictionary learning and supervised dictionary learning to propose supervised discriminative dictionary learning, which yields the discriminative dictionary and the classifier simultaneously. Finally, we sparsely decomposed the test sample onto the learned dictionary and identified it based on the sum of the reconstruction error and the classification error.
In order to demonstrate the effectiveness and robustness of the proposed method, we performed extensive experiments on the MSTAR database under SOC and EOC, using several classification methods, including SVM, kNN, SRC and LCK-SVD, as references. From the experimental results, we can draw the following conclusions: (1) SAR-HOG can effectively capture the features embedded in the stable pixels corresponding to stable structures of targets; (2) SDDL can further capture the subtle differences among the different classes, and the proposed strategy can effectively simplify the optimization problem; (3) compared with the published best results (see Figure 7 and Table VII in [3]), the proposed method obtains a higher recognition rate, especially for the EOC scenario "Training 17°—Testing 45°".
In the future, the proposed method can be extended to other applications. On the one hand, SAR-HOG can be used as an effective local feature for target detection, image indexing, SAR image registration, and so on. On the other hand, SDDL can be used to learn a representative and discriminative dictionary for discriminative tasks, and SDDLSR can serve as a general classifier for target detection (two-class classification) and SAR image classification problems.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 41171317, in part by the Key Project of the NSFC under Grant No. 61132008 and in part by the Major Research Plan of the NSFC under Grant No. 61490693.

Author Contributions

Shengli Song proposed the general idea of SAR-HOG and SDDL for SAR ATR and performed the experiments. Bin Xu and Jian Yang reviewed the idea and provided many suggestions and in-depth analysis. The manuscript was written by Shengli Song.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhao, Q.; Principe, J.C. Support vector machine for SAR automatic target recognition. IEEE Trans. Aerosp. Electron. Syst. 2001, 2, 643–654.
  2. Srinivas, U.; Monga, V.; Raj, R.G. SAR automatic target recognition using discriminative graphical models. IEEE Trans. Aerosp. Electron. Syst. 2014, 1, 591–606.
  3. Dong, G.G.; Kuang, G.Y.; Wang, N.; Zhao, L.J.; Lu, J. SAR target recognition via joint sparse representation of monogenic signal. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 7, 3316–3328.
  4. Mishra, A.K. Validation of PCA and LDA for SAR ATR. IEEE TENCON 2008, 10, 1–6.
  5. Huang, X.Y.; Qiao, H.; Zhang, B. SAR target configuration recognition using tensor global and local discriminant embedding. IEEE Geosci. Remote Sens. Lett. 2016, 2, 222–226.
  6. Zhou, J.X.; Shi, Z.G.; Cheng, X.; Fu, Q. Automatic target recognition of SAR images based on global scattering center model. IEEE Trans. Geosci. Remote Sens. 2011, 10, 3713–3729.
  7. Clemente, C.; Pallotta, L.; Proudler, I.; Maio, A.D.; Soraghan, J.J.; Farina, A. Pseudo-Zernike based multi-pass automatic target recognition from multi-channel SAR. IET Radar Sonar Navig. 2015, 9, 457–466.
  8. Ding, J.; Chen, B.; Liu, H.W.; Huang, M.Y. Convolutional neural network with data augmentation for SAR target recognition. IEEE Geosci. Remote Sens. Lett. 2016, 3, 1–5.
  9. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 25 June 2005.
  10. Wright, J.; Yang, A.; Ganesh, A.; Sastry, S.; Ma, Y. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 2, 210–227.
  11. Olshausen, B.A.; Field, D.J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 1996, 6, 607–609.
  12. Mairal, J.; Ponce, J.; Sapiro, G.; Zisserman, A.; Bach, F. Supervised dictionary learning. In Proceedings of Neural Information Processing Systems 21, Vancouver, BC, Canada, 8–10 December 2008.
  13. Ramirez, I.; Sprechmann, P.; Sapiro, G. Classification and clustering via dictionary learning with structured incoherence and shared features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010.
  14. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 10, 91–110.
  15. Weinmann, M. Visual features: From early concepts to modern computer vision. In Advanced Topics in Computer Vision, Advances in Computer Vision and Pattern Recognition; Farinella, G.M., Battiato, S., Cipolla, R., Eds.; Springer-Verlag: London, UK, 2013; pp. 1–34.
  16. Tuytelaars, T.; Mikolajczyk, K. Local invariant feature detectors: A survey. Found. Trends Comput. Graph. Vis. 2008, 1, 177–280.
  17. Dai, D.; Yang, W.; Sun, H. Multilevel local pattern histogram for SAR image classification. IEEE Geosci. Remote Sens. Lett. 2011, 3, 225–229.
  18. Cui, S.; Dumitru, C.O.; Datcu, M. Ratio-detector-based feature extraction for very high resolution SAR image patch indexing. IEEE Geosci. Remote Sens. Lett. 2013, 9, 1175–1179.
  19. Dellinger, F.; Delon, J.; Gousseau, Y.; Michel, J.; Tupin, F. SAR-SIFT: A SIFT-like algorithm for SAR images. IEEE Trans. Geosci. Remote Sens. 2015, 1, 453–466.
  20. Yang, F.; Gao, W.; Xu, B.; Yang, J. Multi-frequency polarimetric SAR classification based on Riemannian manifold and simultaneous sparse representation. Remote Sens. 2015, 7, 8469–8488.
  21. Sun, X.; Nasrabadi, N.M.; Tran, T.D. Task-driven dictionary learning for hyperspectral image classification with structured sparsity constraints. IEEE Trans. Geosci. Remote Sens. 2015, 8, 4457–4471.
  22. Aharon, M.; Elad, M.; Bruckstein, A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 2006, 11, 4311–4322.
  23. Mairal, J.; Bach, F.; Ponce, J.; Sapiro, G. Online dictionary learning for sparse coding. Proc. ICML 2009, 6, 689–696.
  24. Mairal, J.; Bach, F.; Ponce, J. Task-driven dictionary learning. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 4, 791–804.
  25. Gao, S.; Tsang, I.W.H.; Ma, Y. Learning category-specific dictionary and shared dictionary for fine-grained image categorization. IEEE Trans. Image Process. 2014, 2, 623–634.
  26. Zhang, Q.; Li, B. Discriminative K-SVD for dictionary learning in face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010.
  27. Jiang, Z.L.; Lin, Z.; Davis, L.S. Label consistent K-SVD: Learning a discriminative dictionary for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 11, 2651–2664.
  28. Gangeh, M.J.; Farahat, A.K.; Ghodsi, A.; Kamel, M.S. Supervised dictionary learning and sparse representation: A review. Comput. Sci. 2015, 2, 1–56.
  29. Keydel, E.; Lee, S.; Moore, J. MSTAR extended operating conditions: A tutorial. Proc. SPIE 1997, 4, 228–242.
  30. Touzi, R.; Lopes, A.; Bousquet, P. A statistical and geometrical edge detector for SAR images. IEEE Trans. Geosci. Remote Sens. 1988, 11, 764–773.
  31. Mairal, J.; Bach, F.; Ponce, J. Sparse modeling for image and vision processing. Found. Trends Comput. Graph. Vis. 2014, 12, 85–283.
  32. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 3, 1–27.
Figure 1. Scheme of the ratio of average (ROA). (a) The ratio of the local means for the horizontal direction. (b) The ratio of the local means for the vertical direction.
Figure 2. Scheme of SAR-HOG.
Figure 3. The recognition rate versus the average region size and the number of histogram bins. (a) The recognition rate versus the average region size win, for win = 1, 3, 5, 7, 9, 11, 13 and 15. (b) The recognition rate versus the number of histogram bins, for 3, 5, 7, 9, 11, 13 and 17 bins.
Figure 4. The recognition rate versus b and P_k in supervised discriminative dictionary learning (SDDL). (a) The recognition rate versus b (b = 0.01, 0.05, 0.1, 0.5, 1). (b) The recognition rate versus P_k (P_k = 32, 48, 64, 80, 96, 112, 128).
Figure 5. Recognition rates of different methods versus the training set.
Table 1. The number of images for training and testing for the ten targets under standard operating conditions (SOC).

Target | Training (17°) | Testing (15°)
2S1 | 299 | 274
BRDM2 | 298 | 274
BTR60 | 256 | 195
D7 | 299 | 274
T62 | 299 | 273
ZIL131 | 299 | 274
ZSU23/4 | 299 | 274
BTR70 | 233 | 196
T72 | 232 (SN_132) | 196 (SN_132), 195 (SN_812), 191 (SN_s7)
BMP2 | 233 (SN_9563) | 195 (SN_9563), 196 (SN_9566), 196 (SN_c21)
Table 2. The number of images for training and testing for the four targets under extended operating conditions (EOC).

Target | Training (17°) | Testing (30°) | Testing (45°)
2S1 | 299 | 288 | 303
BRDM2 | 298 | 287 | 303
ZSU23/4 | 299 | 288 | 303
T72 (SN_A64) | 299 | 288 | 303
Table 3. The parameters used in the proposed method.

Parameter | win | Bins | Cell (pixels) | Block (cells) | Stride (pixels) | λ | b | P_k | M
Value | 11 | 11 | 8 × 8 | 4 × 4 | 16 | 0.3 | 0.1 | 96 | 20
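For convenience, the settings in Table 3 can be collected into a single configuration. The feature-dimension arithmetic below assumes 64 × 64 pixel target chips; this chip size is our inference from the "-" entries in Table 4 (blocks larger than the image), not a value stated in Table 3.

```python
# Parameter settings from Table 3, gathered in one place. The 64x64 chip
# size is an inference from Table 4 (entries marked "-" when a block
# exceeds the image), not something Table 3 states.
PARAMS = dict(win=11, bins=11, cell=8, block=4, stride=16,
              lam=0.3, b=0.1, P_k=96, M=20)

chip = 64                                       # assumed chip size (pixels)
block_px = PARAMS["block"] * PARAMS["cell"]     # 32-pixel blocks
n_blocks = ((chip - block_px) // PARAMS["stride"] + 1) ** 2   # 3*3 = 9
dim = n_blocks * PARAMS["block"] ** 2 * PARAMS["bins"]        # 9*16*11
print(dim)  # 1584-dimensional SAR-HOG feature under these assumptions
```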
Table 4. The recognition rate versus different (cell, block) combinations.

Cell (pixels) \ Block (cells) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
4 | 0.9661 | 0.9783 | 0.9783 | 0.9739 | 0.9704 | 0.9652 | 0.9513 | 0.9235 | 0.9139
6 | 0.9609 | 0.9713 | 0.9731 | 0.9661 | 0.9383 | 0.9261 | 0.9304 | 0.9348 | 0.9287
8 | 0.9661 | 0.9714 | 0.9661 | 0.9522 | 0.9357 | 0.9453 | 0.9470 | - | -
10 | 0.9505 | 0.9392 | 0.9383 | 0.9427 | 0.9505 | - | - | - | -
12 | 0.9392 | 0.9340 | 0.9314 | 0.9349 | - | - | - | - | -

"-" denotes that the block size is larger than the image size. The items whose recognition rates are greater than 0.96 are in bold.
Table 5. The time consumption (min) per iteration of Algorithm 1 versus different (cell, block) combinations.

Cell (pixels) \ Block (cells) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
4 | 36.5591 | 24.3237 | 28.3787 | 19.0616 | 16.4551 | 9.5129 | 16.6214 | 5.3417 | 8.0049
6 | 4.8953 | 5.0265 | 3.2544 | 2.5986 | 1.1269 | 2.0317 | 0.2095 | 0.3557 | 0.6282
8 | 2.0408 | 1.1931 | 1.1706 | 0.5677 | 0.0744 | 0.1299 | 0.2157 | - | -
10 | 0.5514 | 0.3604 | 0.2184 | 0.0379 | 0.0747 | - | - | - | -
12 | 0.2355 | 0.0817 | 0.0255 | 0.0404 | - | - | - | - | -

"-" denotes that the block size is larger than the image size. The items whose corresponding recognition rates in Table 4 are greater than 0.96 are in bold.
Table 6. The recognition rate versus the stride.

Stride (pixels) | 0 | 8 | 16 | 24
Recognition rate | 0.9505 | 0.9505 | 0.9661 | 0.9653
Table 7. The result of supervised discriminative dictionary learning and sparse representation (SDDLSR) based on the raw intensity image under SOC.

Class | 2S1 | BRDM2 | BTR60 | D7 | T62 | ZIL131 | ZSU23/4 | BTR70 | T72 | BMP2
2S1 | 0.9380 | 0 | 0 | 0 | 0.0073 | 0 | 0.0547 | 0 | 0 | 0
BRDM2 | 0.0051 | 0.9590 | 0 | 0 | 0 | 0 | 0 | 0.0308 | 0 | 0.0051
BTR60 | 0 | 0.0154 | 0.8769 | 0 | 0 | 0.0103 | 0 | 0.0872 | 0.0051 | 0.0051
D7 | 0 | 0 | 0 | 0.9818 | 0.0109 | 0.0073 | 0 | 0 | 0 | 0
T62 | 0 | 0 | 0 | 0.0440 | 0.9413 | 0.0037 | 0.0110 | 0 | 0 | 0
ZIL131 | 0 | 0 | 0.0036 | 0.0073 | 0 | 0.9891 | 0 | 0 | 0 | 0
ZSU23/4 | 0.0146 | 0 | 0 | 0 | 0 | 0 | 0.9854 | 0 | 0 | 0
BTR70 | 0 | 0 | 0.0051 | 0 | 0 | 0 | 0 | 0.9898 | 0 | 0.0051
T72 | 0 | 0.0086 | 0.0120 | 0 | 0 | 0 | 0 | 0.0258 | 0.8746 | 0.0790
BMP2 | 0 | 0.0221 | 0.0017 | 0 | 0 | 0 | 0 | 0.0358 | 0.0801 | 0.8603
Recognition rate: 0.9406

Rows denote the true class; columns denote the predicted class (the same convention holds for Tables 8–20).
Table 8. The result of SDDLSR based on the raw intensity image under EOC_d.

Training (17°)–Testing (30°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.9757 | 0.0243 | 0 | 0
BRDM2 | 0.0523 | 0.8014 | 0.0070 | 0.1394
ZSU23/4 | 0.0382 | 0.0069 | 0.9514 | 0.0035
T72 (SN_A64) | 0.0104 | 0 | 0.0521 | 0.9375
Recognition rate: 0.9165

Training (17°)–Testing (45°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.7459 | 0.2013 | 0.0462 | 0.0066
BRDM2 | 0.0528 | 0.8218 | 0.0066 | 0.1188
ZSU23/4 | 0.1254 | 0.0792 | 0.7195 | 0.0759
T72 (SN_A64) | 0.1155 | 0.0495 | 0.2475 | 0.5875
Recognition rate: 0.7186
Table 9. The result of SDDLSR based on the wavelet decomposition image under SOC.

Class | 2S1 | BRDM2 | BTR60 | D7 | T62 | ZIL131 | ZSU23/4 | BTR70 | T72 | BMP2
2S1 | 0.9782 | 0 | 0 | 0 | 0.0036 | 0 | 0.0182 | 0 | 0 | 0
BRDM2 | 0.0036 | 0.9307 | 0.0109 | 0 | 0 | 0 | 0 | 0.0109 | 0.0365 | 0.0073
BTR60 | 0 | 0.0154 | 0.9538 | 0 | 0 | 0 | 0 | 0.0256 | 0 | 0.0051
D7 | 0 | 0 | 0 | 0.9854 | 0 | 0.0146 | 0 | 0 | 0 | 0
T62 | 0 | 0 | 0 | 0.0256 | 0.9634 | 0.0110 | 0 | 0 | 0 | 0
ZIL131 | 0 | 0 | 0 | 0.0109 | 0 | 0.9891 | 0 | 0 | 0 | 0
ZSU23/4 | 0.0073 | 0 | 0 | 0 | 0 | 0 | 0.9927 | 0 | 0 | 0
BTR70 | 0 | 0.0051 | 0.0204 | 0 | 0 | 0 | 0 | 0.9745 | 0 | 0
T72 | 0 | 0.0309 | 0.0241 | 0 | 0 | 0 | 0 | 0.0241 | 0.8986 | 0.0223
BMP2 | 0 | 0.0716 | 0.0153 | 0 | 0 | 0 | 0 | 0.0937 | 0.1056 | 0.7138
Recognition rate: 0.9380
Table 10. The result of SDDLSR based on the wavelet decomposition image under EOC_d.

Training (17°)–Testing (30°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.9618 | 0.0208 | 0.0069 | 0.0104
BRDM2 | 0.0244 | 0.9686 | 0.0035 | 0.0035
ZSU23/4 | 0.0347 | 0.0208 | 0.9271 | 0.0174
T72 (SN_A64) | 0.0938 | 0.0243 | 0.1285 | 0.7535
Recognition rate: 0.9028

Training (17°)–Testing (45°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.8548 | 0.1221 | 0.0099 | 0.0132
BRDM2 | 0.0594 | 0.9340 | 0.0033 | 0.0033
ZSU23/4 | 0.1023 | 0.1584 | 0.6634 | 0.0759
T72 (SN_A64) | 0.2739 | 0.2409 | 0.1386 | 0.3465
Recognition rate: 0.6997
Table 11. The result of SDDLSR based on SAR-HOG under SOC.

Class | 2S1 | BRDM2 | BTR60 | D7 | T62 | ZIL131 | ZSU23/4 | BTR70 | T72 | BMP2
2S1 | 0.9891 | 0 | 0 | 0 | 0 | 0 | 0.0109 | 0 | 0 | 0
BRDM2 | 0 | 0.9635 | 0.0073 | 0 | 0 | 0 | 0 | 0.0182 | 0.0073 | 0.0036
BTR60 | 0.0051 | 0 | 0.9333 | 0 | 0 | 0 | 0 | 0.0410 | 0.0103 | 0.0103
D7 | 0 | 0 | 0 | 0.9927 | 0 | 0.0073 | 0 | 0 | 0 | 0
T62 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
ZIL131 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
ZSU23/4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
BTR70 | 0 | 0.0051 | 0.0051 | 0 | 0 | 0 | 0 | 0.9694 | 0 | 0.0204
T72 | 0 | 0.0172 | 0.0155 | 0 | 0 | 0 | 0 | 0.0086 | 0.8883 | 0.0704
BMP2 | 0 | 0.0290 | 0.0051 | 0 | 0 | 0 | 0 | 0.0239 | 0.0545 | 0.8876
Recognition rate: 0.9624
Table 12. The result of SDDLSR based on SAR-HOG under EOC_d.

Training (17°)–Testing (30°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.9792 | 0.0035 | 0.0104 | 0.0069
BRDM2 | 0.0070 | 0.9930 | 0 | 0
ZSU23/4 | 0.0035 | 0.0069 | 0.9722 | 0.0174
T72 (SN_A64) | 0.0417 | 0 | 0.0382 | 0.9201
Recognition rate: 0.9661

Training (17°)–Testing (45°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.8515 | 0.0495 | 0.0660 | 0.0330
BRDM2 | 0.0165 | 0.9571 | 0 | 0.0264
ZSU23/4 | 0.0297 | 0.0165 | 0.7954 | 0.1584
T72 (SN_A64) | 0.1452 | 0.0495 | 0.1749 | 0.6304
Recognition rate: 0.8086
Table 13. The result of SVM based on SAR-HOG under SOC.

Class | 2S1 | BRDM2 | BTR60 | D7 | T62 | ZIL131 | ZSU23/4 | BTR70 | T72 | BMP2
2S1 | 0.9854 | 0 | 0 | 0 | 0 | 0 | 0.0146 | 0 | 0 | 0
BRDM2 | 0 | 0.9526 | 0.0109 | 0 | 0 | 0 | 0 | 0.0219 | 0.0036 | 0.0109
BTR60 | 0.0051 | 0.0205 | 0.8769 | 0 | 0 | 0 | 0 | 0.0410 | 0.0410 | 0.0154
D7 | 0 | 0 | 0 | 0.9854 | 0.0073 | 0.0073 | 0 | 0 | 0 | 0
T62 | 0 | 0 | 0 | 0.0037 | 0.9963 | 0 | 0 | 0 | 0 | 0
ZIL131 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
ZSU23/4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
BTR70 | 0 | 0.0051 | 0.0306 | 0 | 0 | 0 | 0 | 0.9490 | 0 | 0.0153
T72 | 0 | 0.0103 | 0.0137 | 0 | 0 | 0 | 0 | 0.0120 | 0.8986 | 0.0653
BMP2 | 0 | 0.0494 | 0.0307 | 0 | 0 | 0 | 0 | 0.0477 | 0.1295 | 0.7428
Recognition rate: 0.9387
Table 14. The result of kNN based on SAR-HOG under SOC.

Class | 2S1 | BRDM2 | BTR60 | D7 | T62 | ZIL131 | ZSU23/4 | BTR70 | T72 | BMP2
2S1 | 0.9927 | 0 | 0 | 0 | 0 | 0 | 0.0073 | 0 | 0 | 0
BRDM2 | 0 | 0.9051 | 0.0255 | 0 | 0 | 0 | 0 | 0.0292 | 0.0182 | 0.0219
BTR60 | 0 | 0.0051 | 0.9128 | 0 | 0 | 0 | 0 | 0.0462 | 0.0154 | 0.0205
D7 | 0 | 0 | 0 | 0.9927 | 0 | 0.0073 | 0 | 0 | 0 | 0
T62 | 0 | 0 | 0 | 0.0037 | 0.9963 | 0 | 0 | 0 | 0 | 0
ZIL131 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
ZSU23/4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
BTR70 | 0 | 0.0153 | 0.0204 | 0 | 0 | 0 | 0 | 0.9439 | 0 | 0.0204
T72 | 0 | 0.0275 | 0.0395 | 0 | 0 | 0 | 0 | 0.0052 | 0.8162 | 0.1117
BMP2 | 0 | 0.0324 | 0.0136 | 0 | 0 | 0 | 0 | 0.0273 | 0.0596 | 0.8671
Recognition rate: 0.9427
Table 15. The result of the sparse representation classifier (SRC) based on SAR-HOG under SOC.

Class | 2S1 | BRDM2 | BTR60 | D7 | T62 | ZIL131 | ZSU23/4 | BTR70 | T72 | BMP2
2S1 | 0.9927 | 0 | 0 | 0 | 0 | 0 | 0.0073 | 0 | 0 | 0
BRDM2 | 0 | 0.9124 | 0.0109 | 0 | 0 | 0 | 0 | 0.0438 | 0.0255 | 0.0073
BTR60 | 0 | 0.0103 | 0.9385 | 0 | 0 | 0 | 0 | 0.0359 | 0.0051 | 0.0103
D7 | 0 | 0 | 0 | 0.9818 | 0.0109 | 0.0073 | 0 | 0 | 0 | 0
T62 | 0 | 0 | 0 | 0.0037 | 0.9963 | 0 | 0 | 0 | 0 | 0
ZIL131 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
ZSU23/4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
BTR70 | 0 | 0.0051 | 0.0102 | 0 | 0 | 0 | 0 | 0.9643 | 0 | 0.0204
T72 | 0 | 0.0258 | 0.0378 | 0 | 0 | 0 | 0 | 0.0120 | 0.8333 | 0.0911
BMP2 | 0 | 0.0290 | 0.0102 | 0 | 0 | 0 | 0 | 0.0392 | 0.0596 | 0.8620
Recognition rate: 0.9481
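Since SRC is the closest baseline to the proposed SDDLSR, a minimal sketch of its decision rule may help the comparison: the query feature is sparse-coded over the matrix of stacked training features (columns assumed L2-normalized), and the class whose atoms yield the smallest reconstruction residual wins. The plain OMP solver and the sparsity level k below are illustrative assumptions, not the paper's exact solver and settings.

```python
import numpy as np

def omp(D, y, k):
    """Plain orthogonal matching pursuit: y ~ D @ x with at most k nonzeros."""
    residual, support = y.copy(), []
    for _ in range(k):
        # Greedily pick the atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        x_s, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ x_s
    x = np.zeros(D.shape[1])
    x[support] = x_s
    return x

def src_predict(D, labels, y, k=10):
    """SRC rule: sparse-code y over the stacked training features D (one
    L2-normalized column per training sample) and assign the class whose
    own columns give the smallest reconstruction residual."""
    x = omp(D, y, k)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```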
Table 16. The result of label consistent K-SVD (LC-KSVD) based on SAR-HOG under SOC.

Class | 2S1 | BRDM2 | BTR60 | D7 | T62 | ZIL131 | ZSU23/4 | BTR70 | T72 | BMP2
2S1 | 0.9781 | 0 | 0 | 0 | 0 | 0 | 0.0219 | 0 | 0 | 0
BRDM2 | 0 | 0.9453 | 0.0146 | 0 | 0 | 0 | 0 | 0.0219 | 0.0146 | 0.0036
BTR60 | 0 | 0.0051 | 0.9436 | 0 | 0 | 0 | 0 | 0.0103 | 0.0256 | 0.0154
D7 | 0 | 0 | 0 | 0.9854 | 0.0036 | 0.0109 | 0 | 0 | 0 | 0
T62 | 0 | 0 | 0 | 0.0073 | 0.9927 | 0 | 0 | 0 | 0 | 0
ZIL131 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
ZSU23/4 | 0.0036 | 0 | 0 | 0 | 0.0036 | 0 | 0.9927 | 0 | 0 | 0
BTR70 | 0 | 0 | 0.0204 | 0 | 0 | 0 | 0 | 0.9694 | 0 | 0.0102
T72 | 0 | 0.0086 | 0.0275 | 0 | 0 | 0 | 0 | 0.0086 | 0.8814 | 0.0739
BMP2 | 0 | 0.0290 | 0.0375 | 0 | 0 | 0 | 0 | 0.0426 | 0.0784 | 0.8126
Recognition rate: 0.9501
Table 17. The result of SVM based on SAR-HOG under EOC_d.

Training (17°)–Testing (30°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.9549 | 0 | 0.0243 | 0.0208
BRDM2 | 0.0105 | 0.9895 | 0 | 0
ZSU23/4 | 0 | 0 | 0.8993 | 0.1007
T72 (SN_A64) | 0.0625 | 0 | 0.1285 | 0.8090
Recognition rate: 0.9132

Training (17°)–Testing (45°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.7723 | 0.1089 | 0.0264 | 0.0924
BRDM2 | 0.0363 | 0.9571 | 0 | 0.0066
ZSU23/4 | 0.0099 | 0.0759 | 0.5578 | 0.3564
T72 (SN_A64) | 0.1815 | 0.1056 | 0.0957 | 0.6172
Recognition rate: 0.7261
Table 18. The result of kNN based on SAR-HOG under EOC_d.

Training (17°)–Testing (30°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.9653 | 0 | 0.0174 | 0.0174
BRDM2 | 0.0732 | 0.8955 | 0 | 0.0314
ZSU23/4 | 0.0139 | 0 | 0.9583 | 0.0278
T72 (SN_A64) | 0.0417 | 0 | 0.0764 | 0.8819
Recognition rate: 0.9253

Training (17°)–Testing (45°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.8152 | 0.0528 | 0.0759 | 0.0561
BRDM2 | 0.0627 | 0.8383 | 0.0066 | 0.0924
ZSU23/4 | 0.0858 | 0.0594 | 0.6568 | 0.1980
T72 (SN_A64) | 0.0891 | 0.0627 | 0.2640 | 0.5842
Recognition rate: 0.7236
Table 19. The result of SRC based on SAR-HOG under EOC_d.

Training (17°)–Testing (30°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.9514 | 0 | 0.0139 | 0.0347
BRDM2 | 0.0488 | 0.9408 | 0.0035 | 0.0070
ZSU23/4 | 0.0035 | 0.0139 | 0.9444 | 0.0382
T72 (SN_A64) | 0.0451 | 0 | 0.0694 | 0.8854
Recognition rate: 0.9305

Training (17°)–Testing (45°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.8647 | 0.0726 | 0.0297 | 0.0330
BRDM2 | 0.0990 | 0.8845 | 0.0033 | 0.0132
ZSU23/4 | 0.0627 | 0.0891 | 0.6007 | 0.2475
T72 (SN_A64) | 0.0924 | 0.0660 | 0.2706 | 0.5710
Recognition rate: 0.7302
Table 20. The result of LC-KSVD based on SAR-HOG under EOC_d.

Training (17°)–Testing (30°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.9549 | 0.0104 | 0.0139 | 0.0208
BRDM2 | 0.0174 | 0.9756 | 0 | 0.0070
ZSU23/4 | 0 | 0.0069 | 0.9375 | 0.0556
T72 (SN_A64) | 0.0208 | 0.0035 | 0.1389 | 0.8368
Recognition rate: 0.9262

Training (17°)–Testing (45°):
Class | 2S1 | BRDM2 | ZSU23/4 | T72 (SN_A64)
2S1 | 0.7360 | 0.1221 | 0.0528 | 0.0891
BRDM2 | 0.0462 | 0.8581 | 0.0066 | 0.0891
ZSU23/4 | 0.0627 | 0.0264 | 0.6865 | 0.2244
T72 (SN_A64) | 0.1320 | 0.0660 | 0.1320 | 0.6700
Recognition rate: 0.7376
Table 21. The time consumption of different methods (s).

Method | SAR-HOG Computation per Image | Training Time | Testing Time
SVM | 0.9660 | 5.7660 | 11.2620
kNN | 0.9660 | 0.6180 | 1.0440
SRC | 0.9660 | 0.0362 | 1.1820
LC-KSVD | 0.9660 | 113.76 | 0.8160
proposed | 0.9660 | 1404.72 | 0.8280

SAR-HOG feature extraction is common to all methods, so its per-image time is identical across rows.
