Article

Generative Adversarial Networks Based on Transformer Encoder and Convolution Block for Hyperspectral Image Classification

1 School of Artificial Intelligence, Xidian University, Xi’an 710071, China
2 College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(14), 3426; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14143426
Submission received: 10 June 2022 / Revised: 11 July 2022 / Accepted: 13 July 2022 / Published: 16 July 2022

Abstract: Hyperspectral image (HSI) classification can reach high accuracy when sufficient labeled samples are available for training. However, the performance of existing methods drops sharply when they are trained on only a few labeled samples. Existing few-shot methods usually require an additional dataset to improve classification accuracy, which introduces a cross-domain problem because of the significant spectral shift between the target domain and the source domain. Considering these issues, we propose a new method that requires no external dataset by combining a Generative Adversarial Network, a Transformer Encoder and convolution blocks in a unified framework. The proposed method has both a global receptive field, provided by the Transformer Encoder, and a local receptive field, provided by the convolution blocks. Experiments conducted on the Indian Pines, PaviaU and KSC datasets demonstrate that our method exceeds the results of existing deep learning methods for hyperspectral image classification in the few-shot learning problem.

Graphical Abstract

1. Introduction

By analyzing hyperspectral images (HSIs), we can explore both abundant spatial information and rich spectral information [1,2,3]. Compared with RGB images, HSIs can be applied in fields such as mineral detection, disaster prevention and precision agriculture by precisely classifying each pixel [4,5,6]. In environmental protection, HSIs can be used to detect gas [7], oil spills [8], water quality [9,10] and vegetation coverage [11,12].
Each pixel of a hyperspectral image contains hundreds of spectral bands, so the image forms a three-dimensional data cube, and every spectral band in the cube can be seen as a 2D image. By analyzing the vast amount of information in this 3D cube, each pixel can be assigned a unique label and the various classes can be discriminated as accurately as possible. With the rapid improvement of classification accuracy, HSIs have become a foundation for applications in the military, agriculture and astronomy.
In the early days, researchers mainly focused on traditional machine learning methods such as logistic regression [13], neural networks [14], principal component analysis (PCA) [15] and support vector machines (SVM) [16]. However, these methods cannot fully utilize the non-linear information in high-dimensional hyperspectral data.
In the deep learning era, the convolutional neural network (CNN) has achieved satisfactory results through a variety of models. A CNN can effectively capture features from raw pixels by exploiting the shape, layout and texture of ground objects, combining both spatial and spectral information. In [17], a 2D-CNN and a 1D-CNN are combined to extract more useful classification features from the spatial and spectral information. Since a 3D-CNN has advantages in processing 3D information, Li et al. [18] and Chen et al. [19] developed classification frameworks consisting of 3D convolution blocks to process the cubes around each pixel. In [20], Xu used a dual-channel model to combine a 3D-CNN and a 2D-CNN to learn useful spatial and spectral information from HSIs; the extracted information is then merged and passed into a classification block formed of fully connected layers to improve the accuracy. Recently, state-of-the-art methods in hyperspectral image classification have been able to reach 99% classification accuracy when sufficient labeled data are available.
However, these good results are obtained only under the condition of sufficient labeled data. While a human can recognize new classes after seeing a few labeled samples, the performance of these methods decreases sharply when labeled samples are scarce. Labeling data manually is time-consuming and costly, and waiting until enough pixels have been labeled before training makes real-time classification impossible. Learning how to obtain good results when only a few labeled samples per class are available has therefore attracted more and more attention. So-called few-shot classification means that each class is given K labeled samples as training data to make predictions on the whole dataset. The value of K is usually set to a small number, which is 5, 10, 15, 20 and 25 in our experimental settings.
To solve the few-shot problem, unlabeled data and outer datasets have been considered [21]. Semi-supervised and active learning methods have been proposed based on the assumption that there is no severe shift between the two data distributions, i.e., the target domain data and the source domain data. VSCNN [22] uses active learning to select valuable samples from an uncertain dataset to form the training set and improve small-sample classification. However, affected by environmental conditions such as illumination or the atmosphere, pixels from the target domain and the source domain usually exhibit a significant spectral shift even when they share the same class label. Domain-adaptation methods have been proposed to solve this cross-domain problem.
DCFSL [23] combines few-shot learning and a domain adaptation strategy in a conditional adversarial manner to address the issue that the target domain and the source domain may have different data distributions. MDL4OW [24] improves classification accuracy by identifying unknown classes; it uses the statistical model EVT (extreme value theory) to estimate an unknown score and introduces a new evaluation metric to assess the accuracy. These methods try to solve the few-shot learning problem within frameworks that utilize other datasets. However, the fitness of the outer dataset remains a burden in the few-shot problem.
In combination with metric learning, domain adaptation can solve the few-shot learning problem without involving an external dataset. Metric learning learns a relationship between sample pairs by mapping the samples into a metric space in which samples of the same class are as close as possible and samples of different classes are as far apart as possible. S-DMM [25] proposes a model based on metric learning that learns the similarity between sample pairs using a Siamese network and an auto-encoder, and it addresses cross-scene HSI classification with a deep learning approach. However, metric learning methods have the defect of being very time-consuming.
Solving the few-shot problem while being time-efficient and without using other data is not a trivial task. Nevertheless, the above methods cannot meet the requirements. In summary, the few-shot problem in classifying HSI faces the following challenges:
How to reach a high accuracy in the few-shot problem. Considering the cost of manually labeling every pixel in a hyperspectral image, reaching a satisfactory accuracy with only a few training samples can bring great economic benefits. However, this is difficult because the network relies on learning the distribution of labeled samples to make predictions; if the amount of training data is not large enough, it is very difficult for the network to achieve high accuracy.
How to solve the few-shot problem without involving an outer dataset. Because of the severe shift in sample distribution between the source dataset and the target dataset, finding an appropriate outer dataset to serve as the source is hard. Rather than relying on additional datasets, it may be a better choice to achieve high accuracy without including an outer dataset; in this way, searching for a useful source dataset for every target dataset is not required.
How to solve the few-shot problem quickly. Existing methods usually require a huge amount of time because of drawbacks inherent in the methods themselves, such as metric learning. In some cases, classifying an HSI as quickly as possible is very important.
Considering the above problems, in this paper we propose a new method that exploits the benefits of both convolution blocks and Transformer Encoders to solve few-shot learning. Convolution blocks offer shared weights, spatial subsampling and local receptive fields, while Transformer Encoders offer dynamic attention, better generalization and global context fusion. Combined with a generative adversarial network, this method ensures the similarity between generated and original samples. We do not use any other dataset or unlabeled data in this paper to solve the few-shot learning problem. The main contributions of our paper are as follows:
(1)
For the first time, a convolution block, a Transformer Encoder and a Generative Adversarial Network are combined to realize few-shot classification of HSIs. With this model, we can learn the data distribution from only a few samples and reach a high accuracy on different datasets, while avoiding the use of outer datasets.
(2)
We solve the few-shot problem with better time efficiency. Considering the time consumption of training Transformers, we speed up training by combining the Transformer Encoder with convolution blocks.
(3)
The method proposed in this paper achieves good classification results on the Indian Pines, PaviaU and KSC datasets compared with other few-shot learning methods.

2. Related Work

2.1. Transformer Combined with Convolution

Convolution blocks can capture local features efficiently through local receptive fields. Self-attention-based architectures, such as Transformers [26], have complementary advantages: they can capture global information through dynamic attention and global context fusion. Wang et al. [27] combine an LSTM with a CNN by replacing the feature fusion block with an LSTM. SENet [28] uses squeeze and excitation operations to obtain the relationships between channels, and CBAM [29] improves on SENet by adding spatial attention. The Split-Attention block is introduced in ResNeSt [30] to extract the attention between multi-layer feature maps. Xu et al. [31] combine the Swin Transformer and UperNet for segmentation in hyperspectral image classification. TRS [32] combines ResNet with the Transformer by replacing the convolutions in ResNet with a Multi-Head Self-Attention layer. SATNet [33] improves the self-attention mechanism by introducing a spectral attention mechanism to extract spectral-spatial features. HSI-BERT [34] was the first attempt at a self-attention-based architecture for HSI classification, proposing BERT as the framework; it obtains good results when samples are sufficient. Recently, ViT [35] applied a Transformer to image classification and obtained state-of-the-art performance by training and testing on the ImageNet dataset. The Transformer originates from natural language processing; its main idea is to split images into patches, treat these patches as tokens and feed them repeatedly into standard Transformer layers. CvT [36] combines the convolution block and the Transformer Encoder and has the advantage of learning local and global relations efficiently.

2.2. Generator Combined with Self-Attention

The generative adversarial network (GAN) [37] is an efficient approach to few-shot learning: it generates additional samples so that the discriminator can achieve a better classification result. However, a GAN is hard to train because information cannot flow efficiently between the generator and the discriminator, which is essential for generating samples whose distribution is similar to that of real samples. Initially, GANs were used to solve small-sample problems in which the training set is a small percentage of the whole dataset. CA-GAN [38] uses collaborative and competitive training together with joint spatial–spectral hard attention modules, suppressing less useful features and emphasizing more discriminative ones. SaGAAN [39] adds a cross-domain loss term to obtain high-quality generated samples and includes a self-attention mechanism to reduce unintended noise. However, because the few-shot problem fixes a low number of samples per class as the training set, these methods deteriorate greatly under this condition.

3. Methodology

In order to learn the data distribution with only a few samples, the neighborhood around a pixel is taken as a whole to represent that pixel's label and is fed into the network as a training sample. The cube around a pixel has a spatial window of W × W (W pixels in width and W in height); since the spectrum has N channels, the whole cube has a size of W × W × N. Our network uses a dual-channel block and a fusion block to make the classification. The dual-channel block learns the spatial and spectral information around the labeled pixel and compresses the cube to an appropriate size. The output of the dual-channel block is the input of the fusion block, which performs the final classification. A generator is used to produce cubes of the same size to promote classification accuracy; in the ablation experiments, the classification results are improved by the generator. The whole network is shown in Figure 1.
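As a concrete illustration of the input construction, the sketch below extracts a W × W × N cube around a labeled pixel by zero-padding the scene borders. It is a minimal example assuming a NumPy height × width × bands layout; the function and variable names are ours for illustration, not the authors' code.

```python
import numpy as np

def extract_patch(hsi, row, col, window=15):
    """Return the window x window x N cube centred on (row, col).

    `hsi` is an H x W x N array; the scene border is zero-padded so that
    every labeled pixel yields a full-size training cube.
    """
    half = window // 2
    padded = np.pad(hsi, ((half, half), (half, half), (0, 0)), mode="constant")
    # The original pixel (row, col) sits at (row + half, col + half) in the
    # padded array, so the slice starting at `row` is centred on it.
    return padded[row:row + window, col:col + window, :]

# Example: a 15 x 15 x 200 cube for an Indian Pines-sized scene
# hsi = np.zeros((145, 145, 200)); cube = extract_patch(hsi, 10, 20)
```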

3.1. Transformer Encoder

The Transformer Encoder, as a self-attention-based architecture, has achieved great success in natural language processing (NLP), where models with over 100B parameters have been trained on large text corpora. The main idea of the Transformer Encoder is the self-attention mechanism, which learns three matrices representing the query, key and value and then captures long-distance relationships using Equation (1):
A(x_1) = \sum_{i=1}^{n} \left( \frac{Q_1 K_i^{T}}{\sqrt{d}} \right) V_i,        (1)

\mathrm{Attention}(x_1, x_2, \ldots, x_n) = \mathrm{softmax}\left( A(x_1), A(x_2), \ldots, A(x_n) \right),        (2)
where V, K and Q are the value, key and query, respectively, d is the dimension of Q and K, and n is the sequence length of x. By learning the attention matrix of each node, the Transformer Encoder can obtain the global relationship between nodes. The architecture of the Transformer Encoder used in this paper is shown in Figure 2.
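For reference, the following is a minimal single-head scaled dot-product self-attention module in PyTorch, corresponding to the standard formulation behind Equations (1) and (2); the class name and layer choices are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class SingleHeadSelfAttention(nn.Module):
    """Minimal scaled dot-product self-attention over a token sequence."""
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)   # query projection
        self.to_k = nn.Linear(dim, dim)   # key projection
        self.to_v = nn.Linear(dim, dim)   # value projection
        self.scale = dim ** -0.5          # 1 / sqrt(d)

    def forward(self, x):                 # x: (batch, n_tokens, dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v                   # (batch, n_tokens, dim)
```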

3.2. Spatial Feature Extraction

First, the spectral information is compressed by applying 1 × 1 convolution kernels without changing the original spatial relationships. The number of spectral bands is reduced from N to 3 so that this channel can focus on spatial correlations; after this compression, each pixel in a patch can provide more useful spatial information for few-shot classification. The compressed cube is passed through three successive hybrid convolution and Transformer Encoder layers. This channel splits the cube into M patches, each p pixels wide and p pixels high, where M = (W/p)². The relationship between patches is learned by the Transformer through dot-product attention, and a convolution operation is applied after every Transformer Encoder so that local attention is obtained alongside global attention. A ReLU activation function follows each convolution, and a batch normalization layer follows each ReLU. The detailed structure of Spatial Feature Extraction is shown in Table 1.
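The sketch below illustrates one such hybrid stage of the spatial channel under our reading of Table 1: patch tokens are formed from the 3-band map, self-attention is applied over the tokens, and the map is then refined with a 3 × 3 convolution, ReLU and batch normalization. The patch size, head count and module names are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

class SpatialStage(nn.Module):
    """One hybrid Transformer-Encoder + convolution stage of the spatial channel."""
    def __init__(self, channels=3, window=15, patch=3):
        super().__init__()
        self.patch = patch
        token_dim = channels * patch * patch
        self.attn = nn.MultiheadAttention(token_dim, num_heads=1, batch_first=True)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.act = nn.ReLU()
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):                               # x: (B, 3, W, W)
        b, c, w, _ = x.shape
        p, m = self.patch, w // self.patch
        # split the W x W map into (W/p)^2 non-overlapping patch tokens
        t = x.unfold(2, p, p).unfold(3, p, p)           # (B, c, m, m, p, p)
        t = t.permute(0, 2, 3, 1, 4, 5).reshape(b, m * m, c * p * p)
        t, _ = self.attn(t, t, t)                       # global attention over patches
        # fold the tokens back into a (B, c, W, W) feature map
        x = t.reshape(b, m, m, c, p, p).permute(0, 3, 1, 4, 2, 5).reshape(b, c, w, w)
        return self.bn(self.act(self.conv(x)))          # local refinement

# The full channel would first compress the N bands to 3 with a 1 x 1
# convolution, e.g. nn.Conv2d(200, 3, kernel_size=1), then stack three stages.
```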

3.3. Spectral Feature Extraction

Two-dimensional convolution and self-attention blocks are used in this channel to compress the N bands gradually in order to learn spectral information. The input cube of size W × W × N (W pixels in both width and height, N spectral bands) is split into M groups, each p pixels wide and p pixels high, where M = (W/p)². The bands of every pixel learn their relationship with the bands of the other pixels in the same group through group-divided attention, which is shown in Figure 3. This group-divided self-attention makes the remaining spectral information focus on the pixels in the original p × p group and learn the spectral correlation between those pixels. Before every self-attention block, the cube is convolved with a kernel of size 3 and stride 1 in order to introduce the spatial subsampling, joint weighting and local receptive fields of the convolution block into this channel and to retain suitable spectral information. A ReLU activation function follows each convolution, and a batch normalization layer follows each ReLU.
The dual-channel block is shown in Figure 4. To illustrate it clearly, a 3D cube sampled from Indian Pines is used as an example. This cube has a size of 15 × 15 × 200 (15 pixels in both width and height, and 200 spectral bands). The detailed structure of Spectral Feature Extraction is shown in Table 2.
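A possible implementation of the group-divided self-attention of Figure 3 is sketched below: the cube is split into (W/p)² groups of p × p pixels, and attention is computed only among the spectral vectors inside each group. The module name and the single-head choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GroupDividedSelfAttention(nn.Module):
    """Self-attention restricted to the p x p pixels of each spatial group."""
    def __init__(self, bands, patch=3):
        super().__init__()
        self.p = patch
        self.attn = nn.MultiheadAttention(bands, num_heads=1, batch_first=True)

    def forward(self, x):                               # x: (B, bands, W, W)
        b, n, w, _ = x.shape
        p, m = self.p, w // self.p
        # gather the p*p spectral vectors of each group as a short sequence
        g = x.unfold(2, p, p).unfold(3, p, p)           # (B, n, m, m, p, p)
        g = g.permute(0, 2, 3, 4, 5, 1).reshape(b * m * m, p * p, n)
        g, _ = self.attn(g, g, g)                       # attention within the group only
        # scatter the groups back into a (B, bands, W, W) cube
        return g.reshape(b, m, m, p, p, n).permute(0, 5, 1, 3, 2, 4).reshape(b, n, w, w)
```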

3.4. Fusion Block

The spectral channel produces an output of size p × p × f, where f = (W/p)² × 3, and the spatial channel produces an output of size W × W × 3. We reshape the output of the spatial channel into a cube of size p × p × f and concatenate it with the spectral channel output along the last dimension. Because these two channels already encode the positional structure, we can abandon the position embedding that ViT [35] uses to retain positional information. To fuse the information extracted from the spatial and spectral channels, a fully connected layer is applied to the last dimension to obtain an output of size p × p × f. After that, the p × p × f cube and an extra learnable embedding are passed through three successive hybrid convolution and Transformer Encoder layers. This extra learnable embedding serves as a classification token whose output is passed through an MLP to obtain the prediction. The Fusion Block is shown in Figure 5.
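The sketch below mirrors the fusion steps described above: reshape the spatial output, concatenate the two channel outputs along the last dimension, fuse them with a fully connected layer, prepend a learnable classification token and classify from that token. For brevity it uses plain Transformer Encoder layers in place of the hybrid convolution and Transformer layers; the layer sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Fuse the two channel outputs and classify from a learnable token."""
    def __init__(self, p=3, f=75, num_classes=16, depth=3):
        super().__init__()
        self.fuse = nn.Linear(2 * f, f)                         # merge the two channels
        self.cls_token = nn.Parameter(torch.zeros(1, 1, f))     # extra learnable embedding
        layer = nn.TransformerEncoderLayer(d_model=f, nhead=1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(f, num_classes)                   # MLP on the class token

    def forward(self, spectral_out, spatial_out):
        # spectral_out: (B, p*p, f); spatial_out: (B, W, W, 3), reshaped to (B, p*p, f)
        b = spectral_out.shape[0]
        spatial_out = spatial_out.reshape(b, spectral_out.shape[1], -1)
        x = self.fuse(torch.cat([spectral_out, spatial_out], dim=-1))
        cls = self.cls_token.expand(b, -1, -1)
        x = self.encoder(torch.cat([cls, x], dim=1))            # token prepended to sequence
        return self.head(x[:, 0])                               # prediction from the token
```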

3.5. Generator

In the NLP domain, the Transformer treats words as a sequence and computes the importance between each pair of words. However, a hyperspectral image cannot be divided pixel by pixel, as this would produce a sequence that is too long to process in terms of both computational cost and efficiency. Inspired by GANs that generate images layer by layer, we iteratively upscale the feature map to reduce the computational cost and enhance the generation ability of the network. We apply PixelShuffle and a Transformer Encoder iteratively, making the resolution four times larger at each iteration. The Generator is shown in Figure 6, and its detailed structure is shown in Table 3. The framework of our system is shown in Algorithm 1.
Algorithm 1 The framework of our system
Input: The training data selected from K classes; the class labels of the training samples; the test data from K classes.
Output: The labels of the test data.
1: Extract the spatial information through Spatial Feature Extraction.
2: Extract the spectral information through Spectral Feature Extraction.
3: Fuse the two channels' information through the Fusion Block.
4: Generate random noise from a uniform distribution.
5: Generate samples from the generator using the random noise.
6: Train the network with the generated samples.
RETURN: The labels of the test data.
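To make the generator description concrete, the sketch below shows one upsampling stage built from per-pixel self-attention followed by PixelShuffle, which doubles height and width (a four-fold increase in resolution), together with a comment-level outline of the adversarial training in Algorithm 1. Channel counts, loss choices and names are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class UpsampleStage(nn.Module):
    """One generator stage: per-pixel self-attention, then PixelShuffle upsampling."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads=1, batch_first=True)
        self.shuffle = nn.PixelShuffle(2)          # (B, C, H, W) -> (B, C/4, 2H, 2W)

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)           # (B, H*W, C): one token per pixel
        t, _ = self.attn(t, t, t)
        x = t.transpose(1, 2).reshape(b, c, h, w)
        return self.shuffle(x)                     # resolution grows four-fold per stage

# Outline of the adversarial training in Algorithm 1 (loss choices illustrative):
#   noise = torch.rand(batch_size, noise_dim)      # step 4: uniform random noise
#   fake_cubes = generator(noise)                  # step 5: generated samples
#   real_logits = discriminator(real_cubes)        # dual-channel + fusion block
#   fake_logits = discriminator(fake_cubes)        # step 6: train with generated samples
```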

4. Experiments

In this section, three widely used HSI benchmark datasets are selected to conduct the classification experiments. The experiments are implemented with the PyTorch open-source framework on an NVIDIA RTX 3080 Ti GPU.
The Indian Pines dataset was gathered in 1992 by AVIRIS (the Airborne Visible/Infrared Imaging Spectrometer) over northwestern Indiana and has 224 bands covering the visible and infrared spectrum from 400 to 2500 nm. Because of atmospheric absorption, 200 of the original 224 spectral bands are used in this paper. The scene has a size of 145 × 145 pixels, some of which are labeled with 16 classes. Table 4 shows the data division of the Indian Pines dataset for this experiment.
The Pavia University dataset was gathered by the ROSIS (Reflective Optics System Imaging Spectrometer) sensor in 2002 during a flight campaign over the University of Pavia campus and has 115 bands covering the visible and infrared spectrum from 430 to 860 nm, with a ground resolution of 1.3 m. Affected by noise and water absorption, some bands were discarded, and 103 spectral bands are used in this paper. The scene has a size of 610 × 340 pixels, some of which are labeled with 9 classes. Table 5 shows the training and testing data division of the Pavia University dataset.
The Kennedy Space Center dataset was gathered by NASA AVIRIS in 1996 and has 224 bands covering the visible and infrared spectrum from 400 to 2500 nm, with a ground resolution of 18 m. Because of water absorption, the affected and low-SNR bands were discarded, and 176 spectral bands are used in this paper. The scene has a size of 512 × 614 pixels, some of which are labeled with 13 classes. Table 6 shows the training and testing data division of the KSC dataset.
To demonstrate how our method performs, we compare it with eight methods: SVM [40], 2D-CNN [41], 3D-CNN [18], HSI-BERT [34], CA-GAN [38], DCFSL [23], VSCNN [22] and S-DMM [25]. DCFSL, VSCNN and S-DMM are few-shot learning methods for hyperspectral image classification that obtain good results. DCFSL utilizes other datasets by combining few-shot learning and a domain adaptation strategy in a conditional adversarial manner. VSCNN uses active learning to select valuable samples from an uncertain dataset to form the training set and improve the few-shot learning ability. S-DMM learns the similarity between sample pairs using a Siamese network and an auto-encoder based on metric learning.
For fairness, all methods use their optimal parameters. The experiments on IP and PU are divided into five groups by the number of training samples, with 5, 10, 15, 20 and 25 training samples per class, respectively. The experiments on the Kennedy Space Center dataset are divided into three groups, with 15, 20 and 25 training samples per class. Taking five per class as an example, five samples are randomly selected from every class for training, and the remaining samples are used as the testing set. We adopt the overall accuracy (OA) as the evaluation metric to measure classification performance. All results are averaged over 10 independent training runs.
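The sampling and evaluation protocol can be summarized by the short sketch below, which randomly draws K samples per class for training, keeps the rest for testing and computes the overall accuracy; this is our own illustration of the procedure, not the authors' code.

```python
import numpy as np

def sample_k_shot(labels, k=5, seed=0):
    """Pick k training pixels per class at random; the rest form the test set."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels[labels >= 0]):       # assume -1 marks unlabeled pixels
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:k])
        test_idx.extend(idx[k:])
    return np.asarray(train_idx), np.asarray(test_idx)

def overall_accuracy(pred, truth):
    """OA: fraction of correctly classified test pixels."""
    return float(np.mean(pred == truth))

# Reported numbers average OA over 10 independent runs, e.g. seeds 0..9.
```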
The results of these experiments are shown in Table 7, Table 8 and Table 9. From the tables, we can see that the accuracy increases as more samples are labeled. Our proposed method performs best in all the conducted experiments, which demonstrates its capability regardless of the number of labeled samples. Methods that obtain good results on a single dataset but cannot adapt to the others lack the adaptation ability that is essential in few-shot learning; because we cannot predict which dataset we will encounter, good results are needed on different datasets.
Given 15 labeled training samples per class, the corresponding classification maps of all the compared methods on the IP dataset are shown in Figure 7, and those of PU and KSC are shown in Figure 8 and Figure 9, respectively. It can clearly be seen that our classification map matches the ground truth best in all the images, which means that the other methods assigned more incorrect labels to the pixels. Moreover, Table 10, Table 11 and Table 12 show the per-class accuracy with 15 labeled training samples on the different datasets.
Our method achieves better results on most land-cover classes. In particular, on the Indian Pines dataset, our method obtains the highest classification results on 13 of the 16 classes. For the classes “Corn-notill”, “Soybean-mintill” and “Woods”, where the ratio between the number of testing samples and the number of training samples is huge, our method obtains classification results of 78.77%, 81.39% and 97.28%, respectively. Our method shows a great improvement over the other methods in categories 3 and 11.
On the Pavia University dataset, our method obtains the highest classification results on four of the nine classes. For the classes “Meadows” and “Bare Soil”, where the ratio between the number of testing samples and the number of training samples is huge, our method obtains classification results of 97.57% and 100.0%, respectively. Our method shows a great improvement over the other methods in category 2.
On the KSC dataset, our method obtains the highest classification results on 6 of the 13 classes. For the classes “Scrub” and “Water”, where the ratio between the number of testing samples and the number of training samples is huge, our method obtains classification results of 99.87% and 100.0%, respectively. Our method shows a great improvement over the other methods in category 2.
It can be seen that our method makes full use of a small number of training samples to extract effective features. In terms of AA, our method reaches the highest value on the Indian Pines and KSC datasets, and in terms of kappa, it reaches the highest performance on all three datasets. The classification results of the individual categories are relatively balanced despite the unbalanced proportions of training and testing samples. The ablation experiments are shown in Table 13, Table 14 and Table 15: introducing the generative adversarial network improves the OA by around 2%, showing that the classification results improve greatly after adding the generator.

5. Conclusions

In this paper, we propose a new method that combines a Generative Adversarial Network, convolution block and Transformer Encoder in a unified framework. The proposed method has both a global receptive field provided by the Transformer Encoder and a local receptive field provided by the convolution block. In order to perform better in the few-shot learning problem, the Generative Adversarial Network is used to provide more training data. Experiments conducted on the Indian Pines, PaviaU and KSC datasets demonstrate that our method exceeds the results of existing deep learning methods for hyperspectral image classification in the few-shot learning problem.

Author Contributions

Conceptualization, Z.X.; Data curation, Z.X. and Z.C.; Formal analysis, J.L.; Funding acquisition, J.B. and L.J.; Methodology, J.B.; Resources, Z.C.; Supervision, L.J.; Visualization, Z.X.; Writing—original draft, J.L.; Writing—review & editing, J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61772401, in part by the Key Research and Development Program of Shaanxi under Grant 2022GY-062 and Grant 2020GXLH-Y-023, in part by the Science and Technology Project of Hunan Provincial Water Resources Department under Grant XSKJ2021000-39, in part by the Scientific Research Project of Department of the Natural Resources of Hunan Province under Grant 202211, in part by the Fund of National Key Laboratory of Science and Technology on Remote Sensing Information and imagery Analysis, Beijing Research Institute of Uranium Geology under Grant 6142A010409. This paper is also supported by the Science and Technology on Communication Information Security Control Laboratory.

Data Availability Statement

Not Applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chang, C.I. Hyperspectral Data Exploitation: Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2007. [Google Scholar]
  2. Bai, J.; Ding, B.; Xiao, Z.; Jiao, L.; Chen, H.; Regan, A.C. Hyperspectral image classification based on deep attention graph convolutional network. IEEE Trans. Geosci. Remote. Sens. 2021, 60, 1–16. [Google Scholar] [CrossRef]
  3. Bai, J.; Yuan, A.; Xiao, Z.; Zhou, H.; Wang, D.; Jiang, H.; Jiao, L. Class incremental learning with few-shots based on linear programming for hyperspectral image classification. IEEE Trans. Cybern. 2020, 52, 5474–5485. [Google Scholar] [CrossRef] [PubMed]
  4. Makki, I.; Younes, R.; Francis, C.; Bianchi, T.; Zucchetti, M. A survey of landmine detection using hyperspectral imaging. ISPRS J. Photogramm. Remote Sens. 2017, 124, 40–53. [Google Scholar] [CrossRef]
  5. Gevaert, C.M.; Suomalainen, J.; Tang, J.; Kooistra, L. Generation of spectral–temporal response surfaces by combining multispectral satellite and hyperspectral UAV imagery for precision agriculture applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 3140–3146. [Google Scholar] [CrossRef]
  6. Brown, A.J.; Walter, M.R.; Cudahy, T. Hyperspectral imaging spectroscopy of a Mars analogue environment at the North Pole Dome, Pilbara Craton, Western Australia. Aust. J. Earth Sci. 2005, 52, 353–364. [Google Scholar] [CrossRef]
  7. Kuflik, P.; Rotman, S.R. Band selection for gas detection in hyperspectral images. In Proceedings of the 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, Eilat, Israel, 14–17 November 2012; pp. 1–4. [Google Scholar]
  8. Salem, F.; Kafatos, M.; El-Ghazawi, T.; Gomez, R.; Yang, R. Hyperspectral image analysis for oil spill detection. In Proceedings of the Summaries of NASA/JPL Airborne Earth Science Workshop, Pasadena, CA, USA, 27 February–2 March 2001; pp. 5–9. [Google Scholar]
  9. Awad, M. Sea water chlorophyll-a estimation using hyperspectral images and supervised artificial neural network. Ecol. Inform. 2014, 24, 60–68. [Google Scholar] [CrossRef]
  10. Jay, S.; Guillaume, M. A novel maximum likelihood based method for mapping depth and water quality from hyperspectral remote-sensing data. Remote Sens. Environ. 2014, 147, 121–132. [Google Scholar] [CrossRef]
  11. Jänicke, C.; Okujeni, A.; Cooper, S.; Clark, M.; Hostert, P.; van der Linden, S. Brightness gradient-corrected hyperspectral image mosaics for fractional vegetation cover mapping in northern california. Remote Sens. Lett. 2020, 11, 1–10. [Google Scholar] [CrossRef]
  12. Li, J.; Pang, Y.; Li, Z.; Jia, W. Tree species classification of airborne hyperspectral image in cloud shadow area. In International Symposium of Space Optical Instrument and Application; Springer: Berlin/Heidelberg, Germany, 2018; pp. 389–398. [Google Scholar]
  13. Li, J.; Bioucas-Dias, J.M.; Plaza, A. Semisupervised hyperspectral image classification using soft sparse multinomial logistic regression. IEEE Geosci. Remote Sens. Lett. 2012, 10, 318–322. [Google Scholar]
  14. Zhong, Y.; Zhang, L. An adaptive artificial immune network for supervised classification of multi-/hyperspectral remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2011, 50, 894–909. [Google Scholar] [CrossRef]
  15. Kang, X.; Xiang, X.; Li, S.; Benediktsson, J.A. PCA-based edge-preserving features for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7140–7151. [Google Scholar] [CrossRef]
  16. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790. [Google Scholar] [CrossRef] [Green Version]
  17. Zhang, H.; Li, Y.; Zhang, Y.; Shen, Q. Spectral-spatial classification of hyperspectral imagery using a dual-channel convolutional neural network. Remote Sens. Lett. 2017, 8, 438–447. [Google Scholar] [CrossRef] [Green Version]
  18. Li, Y.; Zhang, H.; Shen, Q. Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote Sens. 2017, 9, 67. [Google Scholar] [CrossRef] [Green Version]
  19. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [Google Scholar] [CrossRef] [Green Version]
  20. Xu, Y.; Li, Z.; Li, W.; Du, Q.; Liu, C.; Fang, Z.; Zhai, L. Dual-Channel Residual Network for Hyperspectral Image Classification With Noisy Labels. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5502511. [Google Scholar] [CrossRef]
  21. Bai, J.; Huang, S.; Xiao, Z.; Li, X.; Zhu, Y.; Regan, A.C.; Jiao, L. Few-shot hyperspectral image classification based on adaptive subspaces and feature transformation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17. [Google Scholar] [CrossRef]
  22. Hu, L.; Luo, X.; Wei, Y. Hyperspectral Image Classification of Convolutional Neural Network Combined with Valuable Samples. J. Phys. Conf. Ser. 2020, 1549, 52011. [Google Scholar] [CrossRef]
  23. Li, Z.; Liu, M.; Chen, Y.; Xu, Y.; Li, W.; Du, Q. Deep Cross-Domain Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5501618. [Google Scholar] [CrossRef]
  24. Liu, S.; Shi, Q.; Zhang, L. Few-shot hyperspectral image classification with unknown classes using multitask deep learning. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5085–5102. [Google Scholar] [CrossRef]
  25. Miao, J.; Wang, B.; Wu, X.; Zhang, L.; Hu, B.; Zhang, J.Q. Deep Feature Extraction Based on Siamese Network and Auto-Encoder for Hyperspectral Image Classification. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 397–400. [Google Scholar]
  26. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  27. Wang, Q.; Liu, S.; Chanussot, J.; Li, X. Scene classification with recurrent attention of VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1155–1167. [Google Scholar] [CrossRef]
  28. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  29. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  30. Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Lin, H.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R.; et al. Resnest: Split-attention networks. arXiv 2020, arXiv:2004.08955. [Google Scholar]
  31. Xu, Z.; Zhang, W.; Zhang, T.; Yang, Z.; Li, J. Efficient transformer for remote sensing image segmentation. Remote Sens. 2021, 13, 3585. [Google Scholar] [CrossRef]
  32. Zhang, J.; Zhao, H.; Li, J. TRS: Transformers for Remote Sensing Scene Classification. Remote Sens. 2021, 13, 4143. [Google Scholar] [CrossRef]
  33. Qing, Y.; Liu, W.; Feng, L.; Gao, W. Improved Transformer Net for Hyperspectral Image Classification. Remote Sens. 2021, 13, 2216. [Google Scholar] [CrossRef]
  34. He, J.; Zhao, L.; Yang, H.; Zhang, M.; Li, W. HSI-BERT: Hyperspectral image classification using the bidirectional encoder representation from transformers. IEEE Trans. Geosci. Remote Sens. 2019, 58, 165–178. [Google Scholar] [CrossRef]
  35. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  36. Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. Cvt: Introducing convolutions to vision transformers. arXiv 2021, arXiv:2103.15808. [Google Scholar]
  37. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar]
  38. Feng, J.; Feng, X.; Chen, J.; Cao, X.; Zhang, X.; Jiao, L.; Yu, T. Generative adversarial networks based on collaborative learning and attention mechanism for hyperspectral image classification. Remote Sens. 2020, 12, 1149. [Google Scholar] [CrossRef] [Green Version]
  39. Zhao, W.; Chen, X.; Chen, J.; Qu, Y. Sample generation with self-attention generative adversarial Adaptation Network (SaGAAN) for Hyperspectral Image Classification. Remote Sens. 2020, 12, 843. [Google Scholar] [CrossRef] [Green Version]
  40. Archibald, R.; Fann, G. Feature selection and classification of hyperspectral images with support vector machines. IEEE Geosci. Remote Sens. Lett. 2007, 4, 674–677. [Google Scholar] [CrossRef]
  41. Makantasis, K.; Karantzalos, K.; Doulamis, A.; Doulamis, N. Deep supervised learning for hyperspectral data classification through convolutional neural networks. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4959–4962. [Google Scholar]
Figure 1. The overall framework of the proposed method for few-shot classification. As can be seen, the framework is a classic Generative Adversarial Network framework.
Figure 2. The Transformer Encoder module in this paper.
Figure 3. The Group-divided Self-attention.
Figure 4. The dual-channel module that extracts spatial features and spectral features.
Figure 5. The Fusion Block that learns the class token to obtain the classification result.
Figure 6. The Generator.
Figure 7. The classification maps for the IP with compared methods. (a) Source Image. (b) Ground Truth. (c) SVM. (d) 2D-CNN. (e) 3D-CNN. (f) HSI-BERT. (g) CA-GAN. (h) DCFSL. (i) VSCNN. (j) S-DMM. (k) Our proposed method. (l) Legend.
Figure 8. The classification maps for the PU with compared methods. (a) Source Image. (b) Ground Truth. (c) SVM. (d) 2D-CNN. (e) 3D-CNN. (f) HSI-BERT. (g) CA-GAN. (h) DCFSL. (i) VSCNN. (j) S-DMM. (k) Our proposed method. (l) Legend.
Figure 9. The classification maps for the KSC with compared methods. (a) Source Image. (b) Ground Truth. (c) SVM. (d) 2D-CNN. (e) 3D-CNN. (f) HSI-BERT. (g) CA-GAN. (h) DCFSL. (i) VSCNN. (j) S-DMM. (k) Our proposed method. (l) Legend.
Table 1. The detailed structure of Spatial Feature Extraction.
Stage | Output | Spatial Feature Extraction
S1 | 15 × 15 × 200 | (1 × 1, 3, stride 1)
S2 | 15 × 15 × 3 | Attention; (3 × 3, 3, stride 1)
S3 | 15 × 15 × 3 | Attention; (3 × 3, 3, stride 1)
S4 | 15 × 15 × 3 | Attention; (3 × 3, 3, stride 1)
Table 2. The detailed structure of Spectral Feature Extraction.
Stage | Output | Spectral Feature Extraction
S1 | 15 × 15 × 200 | (1 × 1, 97, stride 1)
S2 | 15 × 15 × 97 | Reshape; Linear; Reshape
S3 | 25 × 27 × 3 × 97 | Group-divided Self-attention; Reshape
S4 | 15 × 15 × 97 | (3 × 3, 47, stride 1)
S5 | 15 × 15 × 47 | Reshape; Linear; Reshape
S6 | 25 × 27 × 3 × 47 | Group-divided Self-attention; Reshape
S7 | 15 × 15 × 47 | (3 × 3, 27, stride 3)
S8 | 5 × 5 × 27 | Reshape; Linear; Reshape
S9 | 25 × 3 × 27 | Group-divided Self-attention; Reshape
Table 3. The detailed structure of the Generator.
Stage | Output | Generator
S1 | 256 × 4 × 4 | Attention; Deconvolution
S2 | 64 × 8 × 8 | Attention; Deconvolution
S3 | 16 × 16 × 16 | Attention; Deconvolution
S4 | 15 × 32 × 32 | Attention; Deconvolution
S5 | 5 × 15 × 200 | Attention; Deconvolution
Table 4. The land cover category and data division of the Indian Pines Dataset.
Class No. | Class Name | Training | Test
1 | Alfalfa | 15 | 31
2 | Corn-notill | 15 | 1413
3 | Corn-mintill | 15 | 815
4 | Corn | 15 | 222
5 | Grass-pasture | 15 | 468
6 | Grass-trees | 15 | 715
7 | Grass-pasture-mowed | 15 | 13
8 | Hay-windrowed | 15 | 463
9 | Oats | 15 | 5
10 | Soybean-notil | 15 | 957
11 | Soybean-mintill | 15 | 2440
12 | Soybean-clean | 15 | 578
13 | Wheat | 15 | 190
14 | Woods | 15 | 1250
15 | Buildings-Grass-Trees-Drives | 15 | 371
16 | Stone-Steel-Towers | 15 | 78
Total | | 240 | 10,009
Table 5. The land cover category and data division on the Pavia University dataset.
Class No. | Class Name | Training | Test
1 | Asphalt | 15 | 6616
2 | Meadows | 15 | 18,634
3 | Gravel | 15 | 2084
4 | Trees | 15 | 3049
5 | Painted metal sheets | 15 | 1330
6 | Bare Soil | 15 | 5014
7 | Bitumen | 15 | 1315
8 | Self-Blocking Bricks | 15 | 3667
9 | Shadows | 15 | 932
Total | | 135 | 42,641
Table 6. The land cover category and data division on the Kennedy Space Center dataset.
Class No. | Class Name | Training | Test
1 | Scrub | 15 | 746
2 | Willow swamp | 15 | 228
3 | Cabbage palm hammock | 15 | 241
4 | Cabbage palm/oak hammock | 15 | 237
5 | Slash pine | 15 | 146
6 | Oak/broadleaf hammock | 15 | 214
7 | Hardwood swamp | 15 | 90
8 | Graminoid marsh | 15 | 416
9 | Spartina marsh | 15 | 505
10 | Cattail marsh | 15 | 389
11 | Salt marsh | 15 | 404
12 | Mud flats | 15 | 488
13 | Water | 15 | 912
Total | | 195 | 5016
Table 7. The classification results of our proposed and other leading methods on Indian Pines (%).
Number | SVM | 2D-CNN | 3D-CNN | HSI-BERT | CA-GAN | DCFSL | VSCNN | S-DMM | Ours
5 | 42.01 ± 1.77 | 39.81 ± 1.10 | 47.66 ± 1.83 | 40.08 ± 1.70 | 51.52 ± 1.81 | 67.23 ± 1.21 | 66.70 ± 1.56 | 60.70 ± 1.85 | 76.25 ± 0.92
10 | 51.34 ± 1.28 | 55.61 ± 1.89 | 54.46 ± 1.00 | 52.79 ± 0.77 | 70.77 ± 1.11 | 72.14 ± 0.66 | 80.06 ± 1.14 | 64.89 ± 0.94 | 86.28 ± 0.77
15 | 57.41 ± 1.40 | 57.72 ± 1.90 | 58.94 ± 1.27 | 58.50 ± 1.56 | 75.52 ± 1.28 | 77.45 ± 1.78 | 83.06 ± 1.04 | 67.04 ± 1.65 | 87.47 ± 1.45
20 | 64.32 ± 1.11 | 60.79 ± 1.53 | 65.89 ± 1.08 | 62.03 ± 1.31 | 80.54 ± 1.06 | 82.18 ± 1.02 | 86.13 ± 1.35 | 68.73 ± 1.24 | 89.26 ± 0.53
25 | 68.11 ± 0.63 | 64.39 ± 1.62 | 73.23 ± 1.30 | 66.87 ± 1.42 | 83.38 ± 0.86 | 83.20 ± 0.62 | 88.22 ± 1.40 | 69.05 ± 0.59 | 93.01 ± 1.14
Table 8. The classification results of our proposed and other leading methods on Pavia University (%).
Number | SVM | 2D-CNN | 3D-CNN | HSI-BERT | CA-GAN | DCFSL | VSCNN | S-DMM | Ours
5 | 62.53 ± 1.87 | 63.74 ± 1.30 | 62.92 ± 1.01 | 18.14 ± 1.59 | 64.63 ± 0.60 | 78.03 ± 1.39 | 71.95 ± 1.78 | 76.64 ± 0.62 | 85.95 ± 0.58
10 | 71.92 ± 1.97 | 66.37 ± 0.72 | 72.43 ± 1.64 | 58.12 ± 1.15 | 72.55 ± 1.40 | 85.86 ± 1.73 | 75.45 ± 1.09 | 83.26 ± 0.77 | 91.40 ± 1.39
15 | 78.67 ± 1.79 | 77.53 ± 1.50 | 75.24 ± 0.84 | 75.31 ± 1.59 | 76.81 ± 0.91 | 90.71 ± 0.56 | 81.63 ± 1.81 | 88.30 ± 1.03 | 93.20 ± 0.59
20 | 80.19 ± 1.60 | 79.51 ± 1.67 | 80.87 ± 1.08 | 76.10 ± 1.45 | 83.82 ± 0.85 | 93.68 ± 0.91 | 83.52 ± 1.36 | 92.26 ± 0.80 | 94.69 ± 0.76
25 | 81.72 ± 0.87 | 83.77 ± 1.70 | 82.97 ± 0.88 | 79.11 ± 1.62 | 84.99 ± 0.78 | 94.66 ± 0.83 | 87.19 ± 1.72 | 93.37 ± 1.97 | 96.38 ± 0.42
Table 9. The classification results of our proposed and other leading methods on the Kennedy Space Center dataset (%).
Number | SVM | 2D-CNN | 3D-CNN | HSI-BERT | CA-GAN | DCFSL | VSCNN | S-DMM | Ours
15 | 84.83 ± 1.51 | 80.53 ± 1.31 | 87.18 ± 1.00 | 82.93 ± 0.94 | 91.17 ± 1.54 | 97.59 ± 1.03 | 80.15 ± 0.62 | 95.83 ± 1.68 | 98.39 ± 0.63
20 | 86.49 ± 1.94 | 82.26 ± 0.84 | 90.53 ± 1.41 | 86.47 ± 0.67 | 94.34 ± 0.62 | 98.10 ± 1.32 | 85.44 ± 1.07 | 97.96 ± 1.32 | 99.54 ± 0.40
25 | 89.26 ± 0.54 | 86.23 ± 1.43 | 91.22 ± 1.00 | 90.16 ± 1.52 | 97.04 ± 1.87 | 99.16 ± 0.84 | 87.15 ± 1.92 | 98.08 ± 0.62 | 99.84 ± 0.15
Table 10. The classification results of our proposed and other leading methods on the Indian Pines dataset (%).
IP15 | SVM | 2D-CNN | 3D-CNN | HSI-BERT | CA-GAN | DCFSL | VSCNN | S-DMM | Ours
class1 | 54.84 ± 0.58 | 70.97 ± 0.66 | 83.87 ± 1.02 | 93.55 ± 1.25 | 100.0 ± 0.0 | 100.0 ± 0.0 | 90.32 ± 1.08 | 91.67 ± 1.80 | 100.0 ± 0.0
class2 | 30.71 ± 1.85 | 34.89 ± 1.53 | 38.08 ± 0.85 | 42.60 ± 1.58 | 61.78 ± 0.65 | 60.79 ± 1.68 | 75.94 ± 1.78 | 47.18 ± 0.86 | 78.77 ± 1.03
class3 | 43.93 ± 0.85 | 48.47 ± 1.71 | 41.84 ± 1.32 | 48.22 ± 0.83 | 68.22 ± 0.77 | 78.77 ± 1.05 | 85.03 ± 1.61 | 44.88 ± 1.29 | 92.15 ± 1.42
class4 | 57.21 ± 1.08 | 58.56 ± 1.96 | 52.70 ± 0.56 | 81.98 ± 0.55 | 92.34 ± 1.58 | 94.59 ± 1.89 | 95.95 ± 0.95 | 33.04 ± 0.74 | 99.1 ± 0.9
class5 | 62.82 ± 0.56 | 81.41 ± 1.69 | 74.79 ± 1.96 | 86.54 ± 0.72 | 82.69 ± 1.86 | 85.68 ± 0.85 | 91.03 ± 1.78 | 78.44 ± 1.46 | 95.30 ± 0.66
class6 | 81.82 ± 0.50 | 91.75 ± 1.21 | 87.27 ± 1.84 | 87.83 ± 0.62 | 89.51 ± 1.55 | 96.64 ± 1.02 | 97.34 ± 1.38 | 92.50 ± 1.66 | 95.94 ± 0.73
class7 | 84.62 ± 1.05 | 92.31 ± 0.92 | 100.0 ± 0.0 | 92.31 ± 1.57 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0
class8 | 88.12 ± 1.82 | 86.83 ± 0.56 | 94.38 ± 1.75 | 68.03 ± 0.84 | 99.78 ± 0.21 | 92.22 ± 1.93 | 97.84 ± 1.23 | 85.26 ± 1.94 | 100.0 ± 0.0
class9 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0
class10 | 52.77 ± 1.08 | 63.64 ± 1.63 | 64.26 ± 0.53 | 59.98 ± 1.52 | 76.28 ± 0.51 | 71.89 ± 1.85 | 80.88 ± 1.51 | 66.74 ± 0.91 | 86.00 ± 0.32
class11 | 52.75 ± 0.66 | 48.69 ± 1.48 | 41.43 ± 0.54 | 46.68 ± 0.93 | 64.22 ± 1.60 | 65.66 ± 0.99 | 73.32 ± 1.08 | 70.39 ± 1.03 | 81.39 ± 1.18
class12 | 52.08 ± 0.92 | 47.58 ± 0.56 | 41.70 ± 1.74 | 39.45 ± 1.03 | 78.72 ± 0.66 | 73.18 ± 0.85 | 88.41 ± 0.53 | 40.82 ± 1.11 | 73.18 ± 1.29
class13 | 93.68 ± 0.76 | 97.89 ± 1.75 | 99.47 ± 0.53 | 86.32 ± 0.95 | 99.47 ± 0.53 | 100.0 ± 0.0 | 98.95 ± 1.04 | 99.49 ± 0.51 | 100.0 ± 0.0
class14 | 80.80 ± 0.78 | 58.80 ± 1.49 | 84.24 ± 1.76 | 70.48 ± 0.81 | 82.32 ± 1.23 | 93.28 ± 0.69 | 84.24 ± 1.50 | 81.35 ± 0.69 | 97.28 ± 1.12
class15 | 42.32 ± 1.11 | 57.68 ± 1.46 | 70.89 ± 1.45 | 62.53 ± 1.21 | 92.99 ± 1.68 | 87.87 ± 0.50 | 86.52 ± 0.58 | 68.35 ± 1.49 | 83.83 ± 0.81
class16 | 88.46 ± 1.57 | 94.87 ± 1.12 | 97.44 ± 1.60 | 84.62 ± 0.56 | 92.31 ± 1.29 | 100.0 ± 0.0 | 98.72 ± 1.24 | 98.80 ± 0.74 | 100.0 ± 0.0
OA | 57.41 ± 1.40 | 57.72 ± 1.90 | 58.94 ± 1.27 | 58.50 ± 1.56 | 75.52 ± 1.28 | 77.45 ± 1.78 | 83.06 ± 1.04 | 67.04 ± 1.65 | 87.47 ± 1.45
AA | 66.68 ± 1.67 | 70.90 ± 1.36 | 73.27 ± 1.30 | 71.94 ± 1.31 | 81.21 ± 0.84 | 87.54 ± 0.57 | 90.28 ± 0.58 | 74.93 ± 1.08 | 92.68 ± 0.47
kappa | 52.29 ± 1.29 | 52.82 ± 1.09 | 54.06 ± 1.68 | 53.63 ± 1.05 | 72.69 ± 0.57 | 74.65 ± 0.72 | 80.89 ± 0.63 | 62.44 ± 1.63 | 85.78 ± 1.31
Table 11. The classification results of our proposed and other leading methods on the Pavia University dataset (%).
PU15 | SVM | 2D-CNN | 3D-CNN | HSI-BERT | CA-GAN | DCFSL | VSCNN | S-DMM | Ours
class1 | 66.28 ± 0.50 | 40.10 ± 0.77 | 70.41 ± 1.25 | 68.91 ± 1.50 | 60.16 ± 0.67 | 74.55 ± 0.86 | 83.27 ± 1.63 | 96.97 ± 0.82 | 89.07 ± 1.13
class2 | 82.10 ± 0.87 | 93.15 ± 1.82 | 73.10 ± 1.44 | 87.44 ± 1.53 | 72.83 ± 0.92 | 97.20 ± 1.33 | 76.96 ± 0.54 | 81.15 ± 1.29 | 97.57 ± 1.46
class3 | 64.78 ± 1.19 | 83.01 ± 1.62 | 73.80 ± 0.61 | 33.59 ± 1.20 | 98.03 ± 0.73 | 80.57 ± 0.58 | 81.91 ± 0.91 | 92.69 ± 1.19 | 67.08 ± 0.69
class4 | 85.93 ± 1.12 | 90.03 ± 0.67 | 89.37 ± 1.07 | 69.86 ± 1.42 | 89.44 ± 0.83 | 94.62 ± 1.54 | 86.86 ± 0.54 | 97.50 ± 1.41 | 88.03 ± 0.69
class5 | 99.32 ± 0.68 | 98.5 ± 1.5 | 96.39 ± 1.97 | 92.18 ± 1.00 | 99.7 ± 0.29 | 100.0 ± 0.0 | 99.55 ± 0.45 | 100.0 ± 0.0 | 100.0 ± 0.0
class6 | 72.34 ± 1.61 | 52.11 ± 0.66 | 69.68 ± 0.64 | 48.54 ± 1.09 | 79.94 ± 1.35 | 90.37 ± 1.25 | 82.81 ± 1.44 | 84.73 ± 0.98 | 93.80 ± 0.78
class7 | 87.22 ± 1.52 | 68.44 ± 1.30 | 86.46 ± 1.50 | 66.92 ± 0.51 | 90.04 ± 0.69 | 92.47 ± 1.40 | 77.94 ± 1.52 | 97.71 ± 0.58 | 99.47 ± 0.53
class8 | 78.13 ± 1.82 | 76.85 ± 0.57 | 77.09 ± 1.83 | 83.50 ± 1.54 | 81.95 ± 1.65 | 81.62 ± 1.25 | 93.58 ± 1.32 | 93.23 ± 1.42 | 93.07 ± 1.13
class9 | 99.89 ± 0.10 | 100.0 ± 0.0 | 86.05 ± 0.94 | 88.73 ± 1.03 | 97.32 ± 1.72 | 100.0 ± 0.0 | 71.84 ± 1.55 | 99.89 ± 0.10 | 96.67 ± 0.57
OA | 78.67 ± 1.79 | 77.53 ± 1.50 | 75.24 ± 0.84 | 75.31 ± 1.59 | 76.81 ± 0.91 | 90.71 ± 0.56 | 81.63 ± 1.81 | 88.30 ± 1.03 | 93.20 ± 0.59
AA | 81.78 ± 1.06 | 78.02 ± 0.64 | 80.26 ± 1.41 | 71.08 ± 0.71 | 76.94 ± 0.76 | 90.20 ± 0.67 | 83.86 ± 0.55 | 93.76 ± 0.77 | 91.60 ± 0.55
kappa | 72.32 ± 0.88 | 70.17 ± 0.60 | 68.43 ± 1.65 | 67.00 ± 1.33 | 71.02 ± 1.51 | 87.73 ± 1.67 | 76.46 ± 1.70 | 84.90 ± 1.65 | 91.00 ± 1.07
Table 12. The classification results of our proposed and other leading methods on the Kennedy Space Center (%).
KSC15 | SVM | 2D-CNN | 3D-CNN | HSI-BERT | CA-GAN | DCFSL | VSCNN | S-DMM | Ours
class1 | 70.38 ± 0.85 | 86.23 ± 1.60 | 89.41 ± 1.92 | 85.66 ± 0.71 | 88.20 ± 0.71 | 96.92 ± 1.68 | 97.15 ± 0.61 | 96.01 ± 1.49 | 99.87 ± 0.12
class2 | 81.14 ± 1.70 | 74.44 ± 1.09 | 86.40 ± 1.50 | 90.35 ± 1.03 | 85.53 ± 1.04 | 86.40 ± 0.90 | 91.28 ± 1.48 | 88.84 ± 1.03 | 100.0 ± 0.0
class3 | 94.19 ± 1.75 | 72.46 ± 1.98 | 85.06 ± 1.21 | 49.79 ± 0.78 | 95.02 ± 0.76 | 98.76 ± 0.94 | 80.09 ± 0.92 | 99.19 ± 0.81 | 96.68 ± 0.44
class4 | 43.04 ± 1.95 | 76.29 ± 0.76 | 54.01 ± 0.66 | 51.90 ± 0.89 | 90.72 ± 0.59 | 82.28 ± 1.34 | 42.29 ± 1.21 | 54.96 ± 1.08 | 86.08 ± 1.18
class5 | 73.97 ± 1.05 | 42.55 ± 1.29 | 83.56 ± 0.56 | 58.90 ± 1.54 | 90.41 ± 1.19 | 91.78 ± 1.66 | 58.09 ± 0.61 | 80.79 ± 0.86 | 93.84 ± 1.35
class6 | 66.36 ± 1.24 | 46.89 ± 1.03 | 76.64 ± 0.89 | 89.72 ± 1.21 | 94.39 ± 0.53 | 97.66 ± 0.77 | 70.59 ± 0.73 | 96.35 ± 1.07 | 100.0 ± 0.0
class7 | 94.44 ± 0.99 | 78.82 ± 0.65 | 100.0 ± 0.0 | 94.44 ± 1.13 | 100.0 ± 0.0 | 100.0 ± 0.0 | 70.00 ± 1.34 | 100.0 ± 0.0 | 97.78 ± 0.40
class8 | 91.11 ± 1.40 | 76.16 ± 0.80 | 92.55 ± 1.86 | 76.68 ± 1.71 | 86.78 ± 0.75 | 100.0 ± 0.0 | 62.81 ± 1.30 | 99.29 ± 0.50 | 96.63 ± 0.92
class9 | 87.52 ± 1.55 | 84.80 ± 1.65 | 60.59 ± 1.97 | 76.44 ± 0.90 | 86.73 ± 1.39 | 100.0 ± 0.0 | 74.55 ± 0.88 | 100.0 ± 0.0 | 99.60 ± 0.40
class10 | 87.92 ± 0.91 | 75.26 ± 1.22 | 93.32 ± 0.97 | 96.92 ± 1.01 | 86.12 ± 0.69 | 99.74 ± 0.26 | 61.48 ± 1.49 | 100.0 ± 0.0 | 99.49 ± 0.51
class11 | 98.27 ± 1.20 | 97.24 ± 1.90 | 93.07 ± 1.15 | 96.78 ± 1.80 | 92.57 ± 0.74 | 100.0 ± 0.0 | 78.68 ± 1.24 | 100.0 ± 0.0 | 100.0 ± 0.0
class12 | 87.91 ± 1.98 | 74.33 ± 1.98 | 93.85 ± 0.59 | 71.31 ± 1.77 | 88.52 ± 1.08 | 99.18 ± 0.81 | 78.24 ± 1.31 | 98.99 ± 0.87 | 97.95 ± 1.26
class13 | 97.81 ± 1.96 | 92.17 ± 1.30 | 100.0 ± 0.0 | 97.37 ± 0.56 | 100.0 ± 0.0 | 100.0 ± 0.0 | 99.89 ± 0.10 | 100.0 ± 0.0 | 100.0 ± 0.0
OA | 84.83 ± 1.51 | 80.53 ± 1.31 | 87.18 ± 1.00 | 82.93 ± 0.94 | 91.17 ± 1.54 | 97.59 ± 1.03 | 80.15 ± 0.62 | 95.83 ± 1.68 | 98.39 ± 0.63
AA | 82.62 ± 0.74 | 75.20 ± 0.85 | 85.27 ± 1.16 | 79.71 ± 0.81 | 84.64 ± 0.84 | 96.36 ± 1.74 | 74.24 ± 1.72 | 93.42 ± 0.72 | 97.53 ± 1.11
kappa | 83.17 ± 1.90 | 78.23 ± 0.91 | 85.73 ± 0.60 | 80.99 ± 1.97 | 90.20 ± 1.67 | 97.31 ± 1.83 | 77.81 ± 1.96 | 95.35 ± 0.53 | 98.20 ± 1.20
Table 13. The ablation experiments on Indian Pines (OA) (%).
Number | Without Generator | With Generator
5 | 74.24 ± 2.15 | 76.25 ± 0.92
10 | 84.52 ± 1.43 | 86.28 ± 0.77
15 | 85.76 ± 1.87 | 87.47 ± 1.45
20 | 87.15 ± 1.51 | 89.26 ± 0.53
25 | 90.43 ± 1.65 | 93.01 ± 1.14
Table 14. The ablation experiments on the Pavia University dataset (OA) (%).
Number | Without Generator | With Generator
5 | 83.34 ± 2.12 | 85.95 ± 0.58
10 | 88.04 ± 2.21 | 91.40 ± 1.39
15 | 91.56 ± 2.38 | 93.20 ± 0.59
20 | 92.14 ± 2.98 | 94.69 ± 0.76
25 | 94.89 ± 2.12 | 96.38 ± 0.42
Table 15. The ablation experiments on the Kennedy Space Center (OA) (%).
Number | Without Generator | With Generator
15 | 96.82 ± 1.32 | 98.39 ± 0.63
20 | 97.54 ± 1.63 | 99.54 ± 0.40
25 | 98.12 ± 1.32 | 99.84 ± 0.15
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
