Article

AL-MRIS: An Active Learning-Based Multipath Residual Involution Siamese Network for Few-Shot Hyperspectral Image Classification

1 School of Information Engineering, China University of Geosciences (Beijing), Beijing 100083, China
2 Institute of Telecommunication and Navigation Satellites, China Academy of Space Technology, Beijing 100094, China
3 College of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
* Author to whom correspondence should be addressed.
Submission received: 29 January 2024 / Revised: 6 March 2024 / Accepted: 8 March 2024 / Published: 12 March 2024

Abstract:
In hyperspectral image (HSI) classification scenarios, deep learning-based methods have achieved excellent classification performance but often rely on large-scale training datasets to ensure accuracy. In practical applications, however, the acquisition of labeled hyperspectral samples is time-consuming, labor-intensive and costly, which leads to a scarcity of labeled samples. Such few-shot conditions limit model training and ultimately affect HSI classification performance. To solve these issues, an active learning (AL)-based multipath residual involution Siamese network for few-shot HSI classification (AL-MRIS) is proposed. First, an AL-based Siamese network framework is constructed. The Siamese network, which has a relatively low demand for sample data, is adopted for classification, and the AL strategy is integrated to select more representative samples, improving the model’s discriminative ability and reducing the cost of labeling samples in practice. Then, a multipath residual involution (MRIN) module is designed for the Siamese subnetwork to obtain comprehensive HSI features. The involution operation is used to capture fine-grained features and effectively aggregate the contextual semantic information of the HSI through dynamic weights. The MRIN module comprehensively considers the local features, dynamic features and global features through multipath residual connections, which improves the representation ability of HSIs. Moreover, a cosine distance-based contrastive loss is proposed for the Siamese network. By utilizing the directional similarity of high-dimensional HSI data, the discriminability of the Siamese classification network is improved. Extensive experimental results show that the proposed AL-MRIS method achieves excellent classification performance with few-shot training samples and obtains the highest classification accuracy compared with several state-of-the-art classification methods.


1. Introduction

Hyperspectral images (HSIs) are multiband images that not only contain rich spatial features but also include rich spectral information carried by tens or even hundreds of continuous narrow bands at each pixel. In recent years, HSIs have been widely used in forest monitoring [1], marine biological estimation [2] and geological exploration [3]. HSI classification is a basic and important technique that involves detailed processing and in-depth analysis of hyperspectral data, with the purpose of identifying and classifying the various substances in an image pixel by pixel.
In recent years, many technologies for HSI classification have been developed, and methods based on deep learning, especially deep convolutional neural networks (CNNs), have been at the forefront of this field. These advanced CNN-based methods have shown outstanding accuracy when processing HSI data [4]. However, deep learning architectures often contain a large number of parameters to be optimized [5], and effectively training these parameters generally requires large-scale training datasets. In practice, obtaining sufficient HSI training samples is a considerable challenge, and with a limited training sample size, deep learning models easily suffer from overfitting [6,7].
Researchers have proposed various solutions to address these problems under limited training samples or few-shot conditions. Meta-learning, also known as ‘learning to learn’, has shown remarkable results in mitigating the overfitting that is prone to occur during few-shot learning. Gao et al. used a model-agnostic meta-learning (MAML) algorithm to apply meta-learning to HSI classification with small training samples [8]. Li et al. combined MAML and regularized fine-tuning to enhance the generalization ability and accomplish few-shot HSI classification [9]. Zhang et al. designed a few-shot HSI classification method based on Bayesian meta-learning [10]. However, meta-learning models are often designed to adapt quickly to new tasks at the cost of generalizability. The performance of MAML is highly dependent on the task distribution; a significant difference between the task distribution and the training distribution may degrade model performance. In addition, MAML has a relatively high computational cost, usually requiring iterations and updates over a large amount of data.
When target dataset labels are scarce but other datasets have abundant labels, researchers have proposed a series of cross-domain learning strategies. Cao et al. proposed a cross-domain few-shot HSI classification method combining transformers and CNNs [11]. Li et al. proposed a dual-graph cross-domain few-shot learning framework; in few-shot HSI classification, the combination of few-shot learning (FSL) and domain adaptation (DA) can reduce the negative impact of domain drift on FSL [12]. Zhang et al. proposed a deep cross-domain small-sample HSI classification method based on FSL and DA that integrates information from the source domain and target domain at the feature level [13]. Wang et al. proposed a cross-domain few-shot HSI classification method with a weak parameter-sharing mechanism to narrow the distance between two domains and used local spatial–spectral alignment to reduce classification errors [14]. For cross-domain few-shot HSI classification, Wang et al. proposed a class-wise metric module and an asymmetric domain adversarial module so that the feature extractor can pay more attention to discriminative local information between classes [15]. Huang et al. proposed a cross-domain learning strategy for few-shot HSI classification that uses a kernel triplet loss to characterize complex nonlinear relationships between samples [16]. Zhang et al. proposed a few-shot HSI classification model that combines graph information aggregation cross-domain learning and domain alignment [17]. However, efficient cross-domain learning requires considerable computational resources, especially when deep learning models are involved, which can cause application difficulties when resources are limited. In cross-domain learning, there may be a bias in the data distribution between different domains, and when the difference is significant, the model’s performance in the target domain may decrease. Moreover, when the source and target domains have different feature representations, the model may fail to fully utilize in the target domain the knowledge learned from the source domain.
Self-supervised learning is also an indispensable tool for assisting in the classification of unlabeled data. Its core objective is to explore the characteristics of unlabeled data through designed proxy tasks, with the aim of enhancing the representation ability for recognition and classification. Among contrastive learning approaches, the Siamese network performs outstandingly when only few-shot samples are available. Li et al. proposed a new few-shot HSI classification framework based on self-supervised learning (SSL); to fully exploit the few annotated samples from novel classes, an SSL scheme with contrastive learning was designed to mine category-invariant features and learn more discriminative individual knowledge [18]. Li et al. proposed a two-branch deep learning network with shared feature extractors to improve the performance of few-shot HSI classification [19]. Cao et al. designed a Siamese network using 3D convolutional networks and combining contrastive information and label information; it performs HSI classification well when only a few training samples are available [20]. By combining extended morphological profiles (EMPs), Siamese CNNs and spectral–spatial feature fusion, Huang et al. proposed an EMP-based method for HSI classification with limited training samples [21]. To augment the limited training samples in a CNN-based Siamese classification framework, Wang et al. proposed two augmentation methods, named SR-PDA and DR-PDA, to generate training sample pairs [22]. Xue et al. designed a two-branch lightweight spectral–spatial Siamese network that consists of 1D and 2D convolutions and uses different patch sizes as the input [23]. In contrastive learning, how to select or generate sample pairs is an important problem: if negative samples are selected improperly, the model may be prevented from learning effective representations, and training may even become unstable. The Siamese networks above randomly select training samples rather than selecting representative samples, and their contrastive loss functions are all based on the Euclidean distance, which does not fully represent or discriminate the characteristics of hyperspectral images.
To ensure the accuracy of HSI classification tasks and mitigate the impact of scarce training samples, researchers have also widely applied AL to HSI classification. Hou et al. proposed an integrated framework to improve HSI classification performance with small training samples; the framework utilizes the spatial and spectral information extraction capabilities of deep learning, the high-information sample selection mechanism of AL and prototype learning [24]. Ma et al. jointly used iterative training sampling and AL to iteratively update and enhance the initial training sample set and improve HSI classification accuracy with small training samples [25]. Li et al. combined semi-supervised clustering and the AL strategy to develop an efficient prototype network-based framework that can extract representative features from few-shot samples to enhance representation ability [26]. Wang et al. developed an adversarial AL strategy that captures the variability of HSI features and uses advanced features to obtain heuristics through adversarial learning [27]. However, these AL algorithms tend to select difficult or marginal samples, which can bias the training data and result in poor model performance on more general data.
From the above analysis, it can be seen that existing few-shot HSI classification methods have not fully explored and utilized the rich features of HSIs. Considering the limited human and material resources, it is more practical to select representative samples for targeted labeling. Therefore, the performance of few-shot HSI classification still has considerable room for improvement. Considering the actual labeling cost and taking advantage of the great potential of Siamese networks in few-shot HSI classification, we integrate the Siamese network with the AL strategy. The involution operation [28,29], which has excellent performance in image processing, is adopted to improve the classification performance. On this basis, an active learning (AL)-based multipath residual involution Siamese network (AL-MRIS) is proposed for few-shot HSI classification. AL-MRIS not only fully considers the data characteristics under few-shot conditions but also efficiently learns and identifies rich information features, which improves the accuracy and efficiency of HSI classification. First, the initial training samples are randomly selected, and a series of positive and negative sample pairs is constructed as the input of the Siamese network. The Siamese network effectively extracts comprehensive features from HSIs via a multipath residual involution (MRIN) module to improve the representation ability. Moreover, the AL strategy is utilized to select the sample with the highest prediction probability in each class; after these samples are manually annotated with their real labels, they are added to the training set. By updating the training sample set and the network model, the performance of the classification model can be maximized while taking the labeling cost into account.
In summary, the main contributions of the work are as follows:
  • An active learning-based multipath residual involution Siamese network for few-shot HSI classification, AL-MRIS, is proposed. In the AL-MRIS method, the multipath residual involution (MRIN) module can comprehensively consider the local features, dynamic features and global features of HSIs. Moreover, to address the sample scarcity problem, the AL strategy is integrated into the Siamese network to make the training samples more representative to improve the classification performance.
  • An AL-based Siamese network framework is constructed. The Siamese network can extract information beyond labels from the data itself, thereby achieving better classification performance, especially for few-shot training samples. Moreover, by integrating with AL, representative samples can be selected more effectively, thus improving the ability of the Siamese network to discriminate features while reducing the practical labeling cost.
  • The multipath residual involution (MRIN) module is proposed. The MRIN module captures fine-grained features via an involution operation and effectively aggregates the contextual semantic information of the HSI through dynamic weights. Moreover, the MRIN module comprehensively considers local features, dynamic features and global features through multipath residual connections, which improves the representation ability of HSIs.
  • A cosine distance-based contrastive loss (CD loss) for Siamese networks is proposed. The CD loss utilizes the directional similarity of high-dimensional HSI data and improves the discriminability of the Siamese classification network.
The remainder of this paper is organized as follows: Section 2 provides a brief introduction to related technologies. Section 3 provides a detailed description of our proposed AL-MRIS method and its internal modules. Section 4 shows the classification results, and several key parameters are discussed and analyzed. Finally, Section 5 provides the conclusion.

2. Related Works

This section provides a brief overview of related work, including two key technologies: the involution network and the Siamese network.

2.1. Involution Network

The involution network is a deep learning network proposed by Li et al. in 2021 [28] to improve flexibility when processing spatial information. Compared with traditional CNNs, the involution network introduces a new operation called ‘involution’. At its core is the involution layer, which replaces traditional convolution operations with local interactions of adaptive weights.
The traditional convolution operation slides a convolution kernel over the input image, taking element-by-element products and summing the results to obtain the output. However, this approach limits the ability of the convolution kernel to adapt to diverse visual patterns at different spatial locations, and the convolution receptive field is locally limited, preventing it from handling small objects and blurred images well. In addition, traditional convolutional filters exhibit redundancy, which affects the flexibility of the convolutional kernel. In contrast, the involution operation divides the network computation neatly into two parts, ‘kernel generation’ and ‘Multiply–Add’, which markedly reduces the number of parameters and computations and thus improves the efficiency of the network. Compared with traditional convolution, involution has ‘spatial-specific’ and ‘channel-agnostic’ characteristics; it can adaptively assign weights to different positions and prioritize the most informative visual elements in the spatial domain.
The involution operator is a nonlinear transformation that processes features with a learnable weight matrix. Its uniqueness lies in the dynamism of the weight matrix, which is adjusted according to the input features. This dynamic adjustment enables the involution operator to better adapt to different types of features and improve the effectiveness of feature extraction. The involution kernel is a trainable matrix that multiplies the features element by element and then sums the results to generate the output feature.
As shown in Figure 1, let $X_{Inv} \in \mathbb{R}^{H \times W \times C}$ represent the input feature, where $H$ and $W$ represent its height and width, respectively, and $C$ represents the number of input channels. Within the input feature tensor $X_{Inv}$, the feature vector $(X_{Inv})_{i,j} \in \mathbb{R}^{C}$ represents the pixel located at position $(i,j)$. The output of the involution operation $Y_{Inv}$ can be defined by the following Formula (1):

$$Y_{Inv_{i,j,k}} = \sum_{(u,v) \in \Delta_K} \mathcal{H}_{i,j,\,u+\lfloor K/2 \rfloor,\,v+\lfloor K/2 \rfloor,\,\lceil kG/C \rceil}\, (X_{Inv})_{i+u,\,j+v,\,k}, \tag{1}$$

where $\mathcal{H} \in \mathbb{R}^{H \times W \times K \times K \times G}$ is the involution kernel, $\mathcal{H}_{i,j,\cdot,\cdot,g} \in \mathbb{R}^{K \times K}$, $g = 1, 2, \ldots, G$, and $G$ is the number of groups sharing the same involution kernel. $\Delta_K$ is the set of neighborhood offsets convolved over the central pixel, represented as the Cartesian product in Formula (2):

$$\Delta_K = \left[ -\lfloor K/2 \rfloor, \ldots, \lfloor K/2 \rfloor \right] \times \left[ -\lfloor K/2 \rfloor, \ldots, \lfloor K/2 \rfloor \right]. \tag{2}$$

The shape of the involution kernel $\mathcal{H}$ depends on the shape of the input feature $X_{Inv}$. The kernel generation function is denoted as $\phi$, and the kernel at each position $(i,j)$ is calculated via Formula (3):

$$\mathcal{H}_{i,j} = \phi\left( (X_{Inv})_{\Psi_{i,j}} \right), \tag{3}$$

where $\Psi_{i,j}$ indexes the set of pixels on which $\mathcal{H}_{i,j}$ is conditioned. Formally, the kernel generation function is $\phi: \mathbb{R}^{C} \rightarrow \mathbb{R}^{K \times K \times G}$, specifically defined as Formula (4):

$$\mathcal{H}_{i,j} = \phi\left( (X_{Inv})_{i,j} \right) = \omega_1 \sigma\left( \omega_0 (X_{Inv})_{i,j} \right), \tag{4}$$

where $\omega_0 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $\omega_1 \in \mathbb{R}^{K \times K \times G \times \frac{C}{r}}$ represent two linear transformations, which together form a bottleneck layer ($r$ is the reduction ratio), and $\sigma$ denotes batch normalization followed by a nonlinear activation function.
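To make the ‘kernel generation’ and ‘Multiply–Add’ steps concrete, the following is a minimal PyTorch sketch of the involution operation described by Formulas (1)–(4). It is an illustrative re-implementation, not the original authors’ code; the class name, the reduction ratio $r = 4$ and the other defaults are our assumptions.

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Minimal involution sketch (Formulas (1)-(4)): a bottleneck generates
    a K x K kernel at every spatial position, shared by each of G groups."""
    def __init__(self, channels, kernel_size=3, groups=1, reduction=4):
        super().__init__()
        self.K, self.G = kernel_size, groups
        # Kernel generation phi = omega_1( sigma( omega_0(x) ) ), Formula (4)
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),   # omega_0
            nn.BatchNorm2d(channels // reduction),           # sigma: BN + ReLU
            nn.ReLU(inplace=True),
        )
        self.span = nn.Conv2d(channels // reduction,
                              kernel_size * kernel_size * groups, 1)  # omega_1
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        B, C, H, W = x.shape
        # H_{i,j}: one K*K kernel per position and group, Formula (3)
        kernel = self.span(self.reduce(x)).view(B, self.G, 1,
                                                self.K * self.K, H, W)
        # Gather the K x K neighborhood Delta_K of every pixel, Formula (2)
        patches = self.unfold(x).view(B, self.G, C // self.G,
                                      self.K * self.K, H, W)
        # Multiply-Add over the neighborhood offsets, Formula (1)
        return (kernel * patches).sum(dim=3).view(B, C, H, W)
```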

2.2. Siamese Network

The core of the Siamese network structure is to map input sample pairs into the same feature space through two subnetworks with shared weights. This allows the network to learn a common representation that places similar inputs closer together in the feature space [20]. The Siamese network is usually used for metric learning and similarity comparison. The two inputs of a sample pair are passed through the shared-weight subnetworks, representations of the two vectors are obtained and a similarity score between these two vector representations is calculated. This score indicates the degree of similarity of the two inputs, as shown in Figure 2. The Siamese network usually uses a contrastive loss function to analyze the similarity scores of the two input samples. By adjusting the network parameters, the similarity scores of similar input pairs increase and those of dissimilar input pairs decrease, improving the classification accuracy of the model. During the training stage, rather than directly classifying input samples, the Siamese network evaluates the similarities between input samples through learning, which enables it to learn representations of the input data effectively without requiring a large number of labeled samples. In practical applications, the Siamese network is widely used in face verification [30], signature verification [31], target tracking [32] and other fields.
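As an illustration of the weight-sharing structure described above, a minimal PyTorch sketch of a Siamese wrapper follows; the backbone subnetwork and the use of cosine similarity as the score are placeholders rather than a specific published design.

```python
import torch.nn as nn

class SiameseNet(nn.Module):
    """Two weight-sharing subnetworks map a sample pair into a common
    feature space; a similarity score is computed on the two embeddings."""
    def __init__(self, subnetwork: nn.Module):
        super().__init__()
        self.subnetwork = subnetwork  # one module used twice = shared weights

    def forward(self, x1, x2):
        f1 = self.subnetwork(x1)      # embedding of the first input
        f2 = self.subnetwork(x2)      # embedding of the second input
        score = nn.functional.cosine_similarity(f1, f2, dim=1)
        return f1, f2, score          # score in [-1, 1]
```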

3. Our Proposed AL-MRIS Method

For few-shot HSI classification, an AL-based multipath residual involution Siamese network, named AL-MRIS, is proposed. The flowchart of the AL-MRIS method is shown in Figure 3. The multipath residual involution (MRIN) module comprehensively considers the local, dynamic and global features in HSIs. Moreover, to solve the sample scarcity problem, the Siamese network is integrated with AL so that representative samples can be selected more effectively, improving the discriminative ability of the Siamese network while reducing the practical labeling cost.

3.1. The Multipath Residual Involution (MRIN) Module

To make full use of the rich features in HSIs, inspired by [28], a multipath residual involution (MRIN) module is proposed. The MRIN module adopts an involution operation to capture fine-grained features and effectively aggregates the contextual semantic information of HSIs through dynamic weights. Moreover, the MRIN module comprehensively considers local features, dynamic features and global features through multipath residual connections, which improves the representation ability of HSIs. A block diagram of the MRIN module is shown in Figure 4.
In the proposed MRIN module, there are three branches: a local feature branch, a dynamic feature branch and a global feature branch. The middle gray arrow branch is the dynamic feature branch, which extracts the core dynamic spectral–spatial features via an involution operation. Through involution, the adaptive convolution kernel can dynamically adjust its weights within the receptive field to adapt to different spatial structures in HSIs. This expands the spatial range of the utilized image, effectively aggregates the contextual semantic information in HSIs and effectively extracts HSI features in nonuniform and complex regions. The involution operation enhances the representation through nonlinear activation functions and has advantages in processing irregularly shaped regions and local structural information in HSIs. The involution operation is followed by a 1 × 1 Conv operation to maintain channel consistency. The upper purple arrow branch is the local feature branch, which uses a 3 × 3 Conv operation to extract local spatial features. The lower yellow arrow branch is the global feature branch, which directly uses a skip connection to transmit the information of the preceding layer and fuse the global features. The three branches are connected through a residual addition operation. The multipath residual connection not only helps the network maintain the integrity of information between different layers but also integrates different features to enhance the representation ability of the model. In addition, this approach accelerates convergence, alleviates the problem of gradient vanishing or explosion and can be applied to deeper network structures.
Assuming that $X \in \mathbb{R}^{H \times W \times C}$ represents the input feature of the MRIN module, the output of the MRIN module $CIN(X)$ can be represented by the following Formula (5):

$$CIN(X) = \sigma\left( Conv_{1\times1}\left( Inv(X) \right) \right) + \sigma\left( Conv_{3\times3}(X) \right) + X, \tag{5}$$

where $\sigma(\cdot)$ indicates the batch normalization and activation functions, $Inv(\cdot)$ represents the involution operation, $Conv_{1\times1}(\cdot)$ represents a 2D convolution with a kernel size of 1 × 1 to maintain channel consistency and $Conv_{3\times3}(\cdot)$ represents a 2D convolution with a kernel size of 3 × 3.

To achieve better feature aggregation, the proposed AL-MRIS network concatenates MRIN modules three times in series, and the concatenated module output $Ը$ can be represented by the following Formula (6):

$$Ը = CIN_{tri}(X) = CIN\left( CIN\left( CIN(X) \right) \right). \tag{6}$$
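A minimal PyTorch sketch of Formulas (5) and (6) follows. It reuses the illustrative Involution2d class sketched in Section 2.1; the channel width of 96 is taken from the feature size mentioned in Section 3.2.2, and the remaining details are assumptions.

```python
import torch.nn as nn

class MRIN(nn.Module):
    """Sketch of Formula (5): dynamic (involution + 1x1 conv), local
    (3x3 conv) and global (identity skip) branches fused by addition.
    Involution2d is the illustrative sketch from Section 2.1."""
    def __init__(self, channels):
        super().__init__()
        self.dynamic = nn.Sequential(                 # sigma(Conv1x1(Inv(X)))
            Involution2d(channels),
            nn.Conv2d(channels, channels, 1),         # channel consistency
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.local = nn.Sequential(                   # sigma(Conv3x3(X))
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # CIN(X) = sigma(Conv1x1(Inv(X))) + sigma(Conv3x3(X)) + X
        return self.dynamic(x) + self.local(x) + x

# Formula (6): three MRIN modules concatenated in series
mrin_tri = nn.Sequential(MRIN(96), MRIN(96), MRIN(96))
```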

3.2. AL-Based Siamese Network

In practice, the acquisition of real labeled hyperspectral samples is time-consuming, labor-intensive and costly; therefore, an AL-based Siamese network framework is proposed for few-shot HSI classification. The Siamese network can extract information beyond labels from the data itself, thereby achieving better classification performance. To a certain extent, the construction of sample pairs in the Siamese network achieves data augmentation and alleviates the overfitting problem during network training. Moreover, by integrating AL, representative samples can be selected more effectively, improving the ability of the Siamese network to discriminate features while reducing the practical labeling cost. In summary, as shown in Figure 5, the AL-based Siamese network framework includes three main steps: Siamese network learning using the training sample set, AL selection of newly labeled training samples and updating of the training set.

3.2.1. Construct Sample Pairs Based on the Training Sample Set

Initially, one sample from each class is randomly selected to constitute the training set $Set_{ori\_Label}$. Considering the spatial information of the HSI, the training set includes $N_{Cclass}$ labeled 3D data blocks in $\mathbb{R}^{H \times W \times C}$. Each block consists of a central labeled pixel and its surrounding neighborhood pixels, denoted as $X = \{x_1, x_2, \ldots, x_{N_{Cclass}}\}$. The labels corresponding to these data blocks are represented by the set $Y = \{Y_1, Y_2, \ldots, Y_s, \ldots, Y_{N_{Cclass}}\}$, where $Y_s \in \{1, 2, \ldots, N_{Cclass}\}$ and $N_{Cclass}$ is the total number of classes in the HSI. The inputs of the proposed AL-MRIS algorithm are sample pairs, and a set of sample pairs $Set_{pairs}$ is constructed by traversing all possible combinations in $Set_{ori\_Label}$, achieving data augmentation to a certain extent. The sample pairs in $Set_{pairs}$ are denoted as $(X_n, X_m)$, and their corresponding labels are given in Formula (7):

$$Y_{n,m} = Label(X_n, X_m) = \begin{cases} 1 & \text{if } Y_n = Y_m \\ 0 & \text{if } Y_n \neq Y_m \end{cases}. \tag{7}$$

The sample pair label $Y_{n,m}$ indicates whether the classes of the two samples in the pair are consistent.
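Under the naming assumptions above (the 3D blocks and their labels stored as parallel lists), this pair-construction step might be sketched in Python as follows:

```python
from itertools import combinations

def build_sample_pairs(blocks, labels):
    """Traverse all pairwise combinations of the labeled 3D data blocks
    and attach the pair label Y_{n,m} defined in Formula (7)."""
    pairs = []
    for n, m in combinations(range(len(blocks)), 2):
        y_nm = 1 if labels[n] == labels[m] else 0   # 1: same class, 0: different
        pairs.append(((blocks[n], blocks[m]), y_nm))
    return pairs
```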

3.2.2. Siamese Network Learning Using the Training Set

The two samples in a training pair $(X_n, X_m)$ are separately input into the two subnetworks of AL-MRIS, which learn to deeply mine the rich information inherent in the samples themselves, decreasing the intraclass distance and increasing the interclass distance. The subnetworks of AL-MRIS are mainly composed of three serially connected MRIN modules. The input $X$ is passed through the three serial MRIN modules to obtain the output $Ը$. Then, $Ը$ is passed through the $Cov\_spe(\cdot)$ operation and combined with itself by a residual connection to obtain the advanced feature output $Թ(X)$, as represented in Formula (8):

$$Թ(X) = Cov\_spe\left( Cov\_spe(Ը) + Ը \right), \tag{8}$$

where $Cov\_spe(\cdot)$ includes a 7 × 1 × 1 convolutional layer, a batch normalization layer and a ReLU activation layer.

The subnetwork of AL-MRIS learns the advanced features $Թ(X)$ from the input patch $X$ with a simple network structure and high learning efficiency. Next, an adaptive average pooling operation is utilized to extract a 1 × 1 × 96 feature vector from $Թ(X)$, and a fully connected layer transforms it into a prediction vector $f(X)$ with a size of 1 × C, completing the preliminary training on the initial training set, which can be represented by the following Formula (9):

$$f(X) = Linear\left( AvgPool2d\left( Թ(X) \right) \right), \tag{9}$$

where $AvgPool2d(\cdot)$ represents the adaptive average pooling operation and $Linear(\cdot)$ represents the fully connected layer.
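The following PyTorch sketch illustrates Formulas (8) and (9). Interpreting Cov_spe as a 3D convolution over the spectral dimension (suggested by the 7 × 1 × 1 kernel) and the channel width of 96 are our assumptions about the tensor layout.

```python
import torch.nn as nn

class SubnetHead(nn.Module):
    """Sketch of Formulas (8)-(9): Cov_spe (7x1x1 conv + BN + ReLU) applied
    with a residual connection, then pooling and a fully connected layer."""
    def __init__(self, channels=96, num_classes=9):
        super().__init__()
        self.cov_spe = nn.Sequential(
            nn.Conv3d(channels, channels, (7, 1, 1), padding=(3, 0, 0)),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)           # adaptive average pooling
        self.fc = nn.Linear(channels, num_classes)    # 1 x C prediction vector

    def forward(self, z):                             # z: output of the MRIN stack
        adv = self.cov_spe(self.cov_spe(z) + z)       # Formula (8)
        return self.fc(self.pool(adv).flatten(1))     # Formula (9)
```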
During the contrastive learning process, the parameters $\theta$ are updated as shown in Formula (10):

$$\theta^{*} = \arg\min_{\theta} L_{contra}\left( f(X_n), f(X_m), Y_{n,m}; \theta \right). \tag{10}$$
For HSI classification, a cosine distance-based contrastive loss (CD loss) for Siamese networks is proposed to minimize the intraclass distance and maximize the interclass distance.
The CD loss utilizes the directional similarity of high-dimensional HSI data and improves the discriminability of the Siamese classification network. It can be represented by the following Formula (11):

$$L_{contra} = \frac{1}{2}\left[ Y_{n,m}\, d^{2} + \left( 1 - Y_{n,m} \right) \max\left( margin - d,\, 0 \right)^{2} \right], \tag{11}$$

where the margin is a constant set to 1.25, used to maintain the lower bound of the distance between negative sample pairs, and $d$ represents the cosine distance between the feature vectors $f(X_n)$ and $f(X_m)$ of a sample pair, expressed by Formula (12):

$$d = 1 - \frac{f(X_n) \cdot f(X_m)}{\max\left( \left\| f(X_n) \right\|_2,\, \epsilon \right) \cdot \max\left( \left\| f(X_m) \right\|_2,\, \epsilon \right)}, \tag{12}$$

where $\epsilon = 1 \times 10^{-8}$ is a small constant to avoid division-by-zero errors.
The cosine distance metric enhances the network model’s sensitivity to the directional similarity of HSIs rather than simply focusing on the numerical magnitude. The goal is to minimize the contrastive loss function $L_{contra}$ by optimizing the parameters $\theta$ so that the model brings similar samples closer in the representation space while pushing dissimilar samples apart. The cosine distance effectively anchors the learning focus of the Siamese network, making directional similarity rather than absolute feature value differences the basis for discrimination, thus forming meaningful and distinctive mappings in the representation space.
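A minimal PyTorch implementation of the CD loss in Formulas (11) and (12) could read as follows, taking the cosine distance as one minus the ε-guarded cosine similarity as discussed above:

```python
import torch

def cd_contrastive_loss(f1, f2, y_pair, margin=1.25, eps=1e-8):
    """CD loss sketch: pull similar pairs (y=1) together and push
    dissimilar pairs (y=0) beyond the margin in cosine distance."""
    # Formula (12): cosine distance with eps guarding against zero division
    n1 = torch.clamp(f1.norm(dim=1), min=eps)
    n2 = torch.clamp(f2.norm(dim=1), min=eps)
    d = 1.0 - (f1 * f2).sum(dim=1) / (n1 * n2)
    # Formula (11): contrastive loss averaged over the batch of pairs
    loss = 0.5 * (y_pair * d.pow(2)
                  + (1 - y_pair) * torch.clamp(margin - d, min=0).pow(2))
    return loss.mean()
```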
In the classification learning process, the class label with the highest predicted probability value is selected as the predicted label $Y$, as represented in Formula (13):

$$Y = \arg\max_{q} f_q(X), \quad q \in \{1, 2, \ldots, N_{Cclass}\}. \tag{13}$$
The cross-entropy loss is used for retraining $\theta$. The cross-entropy loss function $L_{cross\text{-}entropy}$ measures the distance between the predicted labels and the real labels, as shown in Formula (14):

$$L_{cross\text{-}entropy} = -\sum_{q=1}^{N} Y_q \log \hat{Y}_q, \tag{14}$$

where $Y_q$ is the real label of the $q$-th sample, $\hat{Y}_q$ is the predicted label of the $q$-th sample and $N$ is the number of training samples. Next, the parameters $\theta$ are updated to decrease the value of the cross-entropy loss function, as shown in Formula (15):

$$\theta^{*} = \arg\min_{\theta} L_{cross\text{-}entropy}\left( f(X_q), Y_q; \theta \right), \tag{15}$$

where $X_q$ and $Y_q$ represent the $q$-th training sample and its label, respectively.

3.2.3. AL Selection of Newly Labeled Training Samples

After the training of the Siamese network, the remaining unlabeled samples are input into the model to obtain the classification prediction probabilities $P_{prediction}$. Then, for each class, the sample with the highest prediction probability is selected, and manual annotation is used to obtain the corresponding real label. This can be represented by the following Formula (16):

$$P_{prediction} = \max\left( f(X_{unlabeled}) \right) = \left\{ p_{1,max},\, p_{2,max},\, \ldots,\, p_{t,max},\, \ldots,\, p_{c,max} \right\}, \tag{16}$$

where $p_{t,max}$ denotes the maximum predicted probability value in class $t$. Then, through manual annotation, the real labels $P_{true}$ can be obtained as in Formula (17):

$$P_{true} = Manual\left( P_{prediction} \right). \tag{17}$$
The AL stage can select representative samples more effectively, thus improving the ability of the Siamese network to discriminate features while reducing the practical labeling cost.
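A sketch of this selection step is given below; the model is assumed to output class logits for the batch of unlabeled blocks, and the manual-annotation step is represented only by a comment.

```python
import torch

@torch.no_grad()
def select_al_samples(model, unlabeled_blocks, num_classes):
    """For each class t, pick the unlabeled sample with the highest
    predicted probability p_{t,max} (Formula (16)); its real label is
    then obtained by manual annotation (Formula (17))."""
    probs = torch.softmax(model(unlabeled_blocks), dim=1)  # (N, num_classes)
    selected = []
    for t in range(num_classes):
        idx = probs[:, t].argmax().item()  # most confident sample for class t
        selected.append(idx)               # sent to an annotator for its true label
    return selected
```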

3.2.4. Updating the Training Set

The newly labeled samples $Set_{new\_Label}$ are added to the original training set $Set_{ori\_Label}$ to update the training sample set, which can be represented by the following Formula (18):

$$Set_{ori\_Label} = Set_{ori\_Label} \cup Set_{new\_Label}. \tag{18}$$
The whole network is updated using the latest training sample set data, and continuous iterative optimization is performed. The updated network is used for the next round of uncertainty evaluation and sample selection. Through continuous iteration, the training sample set gradually expands, and the classification performance of the network is also enhanced. After the training sample set expands to a certain size, the whole network parameters are fixed, and then the unlabeled samples are passed through the network to obtain the final classification result.

4. Experiments and Results

In this section, three real HSI datasets are used to validate the classification performance of the proposed network. For each dataset, initially, one sample from each class is randomly selected to constitute the training set, and in every round, the AL stage selects one more sample from each class to add to the training set. Eventually, three samples from each class are included in the final training set. Ten repeated experiments were conducted, with the average value serving as the final experimental result. The proposed AL-MRIS utilizes a sliding window of size 11 × 11 to generate a series of data blocks. In the contrastive learning process, the learning rate and the weight decay parameter were set to $5 \times 10^{-5}$ and 0, respectively. During the classification process, the learning rate and the weight decay parameter were set to $1 \times 10^{-3}$ and $5 \times 10^{-5}$, respectively. All the experiments were carried out on a server equipped with 80 GB of memory and an RTX 3080 GPU and were implemented in Python.
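For reference, the optimizer settings above could be written as follows; the choice of the Adam optimizer and the placeholder model are our assumptions, while the learning rates and weight decay values are those reported in the text.

```python
import torch
import torch.nn as nn

model = nn.Linear(96, 9)  # placeholder for the AL-MRIS network

# Contrastive learning stage: lr = 5e-5, weight decay = 0
contrastive_optim = torch.optim.Adam(model.parameters(), lr=5e-5, weight_decay=0.0)

# Classification stage: lr = 1e-3, weight decay = 5e-5
classification_optim = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-5)
```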

4.1. Datasets

The University of Pavia (PU) dataset was captured in an urban scene in the city of Pavia using the ROSIS (Reflective Optics System Imaging Spectrometer) sensor. The spatial resolution is 1.3 m. The dataset contains spectral wavelengths ranging from 430 to 860 nanometers. The dataset has a size of 610 × 340 pixels and 103 spectral bands. The PU dataset included 9 different land cover classes, and the names of these classes and the number of samples are shown in Table 1. Additionally, Figure 6 presents the pseudocolor image and the corresponding ground truth image to more intuitively showcase the characteristics of the PU dataset.
The Indian Pines (IP) dataset was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines area in northwestern Indiana in 1996 with a spatial resolution of 20 m; it is one of the earliest HSI datasets. The dataset originally included 220 bands; after the noisy and water absorption bands were removed, the remaining 200 bands (bands 1 to 103, 109 to 149 and 164 to 219) were used for the experiments. The IP dataset has a size of 145 × 145 pixels, and the scene covers sixteen different classes. The detailed class names and sample quantities can be found in Table 2. In addition, Figure 7 shows the pseudocolor image of the IP dataset and the corresponding ground truth image.
The Salinas Valley (SA) dataset was captured by the AVIRIS sensor in the Salinas Valley area of California, United States, in 1998. The spatial resolution is 3.7 m. The dataset was stripped of 20 water absorption bands, leaving 144 usable bands. The size of the SA dataset is 512 × 217 pixels. In the data scene, 16 different classes are covered. Table 3 provides detailed information on the names and numbers of samples for each class, and Figure 8 shows the pseudocolor image and ground truth image.

4.2. Evaluation Metrics

Three common classification metrics were used to evaluate the classification performance: overall accuracy (OA) [33], average accuracy (AA) [34] and Kappa coefficient (Kappa) [33]. OA represents the proportion of correctly classified labels to the total number of labels. AA is the mean value of the classification accuracies for all the classes, while Kappa measures the agreement between the predicted labels and the real labels. By using these metrics, the classification performance can be evaluated more comprehensively.
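The three metrics can be computed from a confusion matrix, as in the following short sketch:

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Compute OA, AA and the Kappa coefficient from predictions."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                     # confusion matrix
    n = cm.sum()
    oa = np.trace(cm) / n                                 # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))            # average accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)                          # Kappa coefficient
    return oa, aa, kappa
```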

4.3. Comparison of Different Classification Methods

To further verify the effectiveness of the proposed AL-MRIS method, several state-of-the-art classification methods, including DRIN [29], 3DCSN [20], S3Net [23], ALPN [26], FAAL [27], CFSL [16] and Gia-CFSL [17], were used for comparison. The corresponding classification maps are shown in Figure 9, Figure 10, Figure 11 and Figure 12. Compared with the classification maps of the other methods, the classification map obtained by the proposed AL-MRIS method is closest to the ground truth map. The corresponding classification accuracy results are shown in Table 4, Table 5 and Table 6. The proposed AL-MRIS method achieves optimal OA, AA and Kappa results on all three datasets. For the PU dataset, the OA of the proposed AL-MRIS method reaches 84.71%; this value is 5.34% higher than that of the CFSL method (79.37%), which has the highest accuracy among the comparison methods, and 47.53% higher than that of the FAAL method (37.18%). For the IP dataset, the OA of the AL-MRIS method reached 75.31%, which was 6.84% higher than that of the CFSL method (68.47%), which has the highest accuracy among the comparison methods. Compared with that of FAAL (32.42%), the OA of the AL-MRIS method was 42.89% higher. For the SA dataset, the OA of the AL-MRIS method was 90.18%, which was 1.34% higher than that of S3Net (88.84%) and 27.61% higher than that of FAAL (62.57%). Meanwhile, Table 4 shows that for the PU dataset the proposed AL-MRIS method achieves the highest classification accuracy in Asphalt, Meadows, Painted metal sheets, and Bare Soil. Figure 10 shows the partial enlarged PU classification maps, and Asphalt (in red) and Meadows (in light green) in the AL-MRIS classification map are the closest to the ground truth distribution.
The reasons why the proposed AL-MRIS method achieves the best classification results are as follows. The MRIN module in the proposed AL-MRIS method comprehensively considers the local features, dynamic features and global features through multipath residual connections, which improves the representation ability of HSIs. Moreover, by integrating the Siamese network with AL, representative samples can be selected more effectively, improving the discriminative ability of the Siamese network. In addition, the CD loss utilizes the directional similarity of high-dimensional HSI data, improving the discriminability of the Siamese classification network.

4.4. Parameter Discussions

This section discusses and analyzes the key parameters of the proposed AL-MRIS method using the PU dataset. Initially, one sample from each class was randomly selected, and through the AL stage, three samples from each class were ultimately included in the final training set.

4.4.1. Impacts of the MRIN Module Number

To analyze the impacts of the MRIN module number, the number of concatenated MRIN modules was set to 1, 2, 3, 4 and 5; the classification results are shown in Figure 13. When the concatenation number is three, the OA, AA and Kappa all yield optimal performances.
The reasons for this difference are as follows: when the concatenation number is one, only a single MRIN module is used, and the features of the input image are not fully mined, leading to the lowest classification results. When the concatenated number is more than one, the classification performance is improved because the concatenated modules can obtain more rich and complex comprehensive information by gradually extracting and combining the features. However, when the concatenation number is too large, the gradient is easily diluted or amplified during backpropagation, leading to difficulties or degradation in network training. Therefore, as shown in Figure 13, when the concatenation number is 3, the network can make full use of image features while avoiding the problem of gradient disappearance or gradient explosion, thus achieving the best classification performance. According to the results and the analysis, to achieve better feature aggregation, the proposed AL-MRIS network concatenates MRIN modules three times in series.

4.4.2. Comparison of Different Distance-Based Contrastive Losses

To verify the advantages of the proposed cosine distance-based contrastive loss (CD loss), comparison experiments with three other distance-based contrastive loss functions (the Euclidean distance-based, Minkowski distance-based and Jensen–Shannon divergence-based contrastive losses) were conducted. The classification accuracy results for the different distance-based contrastive losses are listed in Table 7. As shown in Table 7, the classification results of the proposed CD loss are higher than those of every comparison loss in terms of OA, AA and Kappa. The OA of the CD loss was 84.71%, which was 1.94% higher than that of the Euclidean distance-based contrastive loss (82.77%), 4.64% higher than that of the Minkowski distance-based contrastive loss (80.07%) and 5.40% higher than that of the Jensen–Shannon divergence-based contrastive loss (79.31%). For a more intuitive presentation, Figure 14 displays a comparison of the OA values.
The reason why the CD loss can achieve the best classification performance is that the cosine distance enhances the network model’s sensitivity to the direction similarity for HSIs rather than simply focusing on the numerical magnitude. For high-dimensional HSI data, the direction can better capture the similarity than the distance. Therefore, the CD loss improves the discriminability of the Siamese classification network.

4.4.3. Influences of Different Training Sample Numbers

This section discusses the influence of different training sample numbers on the classification methods. In the PU dataset, two, three, four, five and six labeled samples per class were used as training samples, and the proposed AL-MRIS method was compared with four representative methods: 3DCSN, ALPN, DRIN and Gia-CFSL. The OA results for the classification methods with different numbers of training samples are shown in Figure 15. The classification performance of all the methods gradually improves with an increasing number of training samples because more information becomes available for the classification process. Moreover, the proposed AL-MRIS method outperforms all the comparison methods for all training sample numbers and achieves the highest OA, which indicates that the proposed AL-MRIS method has excellent classification performance and is especially suitable for few-shot classification.

4.4.4. Ablation Experiments

To verify the validity of the Siamese network framework, the AL strategy and the MRIN module in the proposed AL-MRIS method, ablation experiments were carried out on the PU dataset, and the classification results are shown in Table 8. As shown in Table 8, compared with the full AL-MRIS method, when the Siamese network framework was removed, the OA decreased by 17.99%, the AA decreased by 20.12% and the Kappa decreased by 22.40%. This is because the Siamese network framework achieves data augmentation to a certain extent and can extract information beyond labels from the data itself, thereby achieving better classification performance; through learning, the Siamese network explores the rich information inherent in the samples themselves, decreasing the intraclass distance and increasing the interclass distance. When the AL strategy was abandoned, the OA decreased by 10.20%, the AA decreased by 9.26% and the Kappa decreased by 10.99%, which shows that through the AL strategy, representative samples can be selected more effectively, improving the network’s discriminative ability. When the MRIN module was replaced with a normal 3D convolution, the OA decreased by 20.29%, the AA decreased by 31.41% and the Kappa decreased by 27.30%. This finding verifies that the MRIN module comprehensively considers local features, dynamic features and global features, which improves the representation ability of HSIs.

5. Conclusions

For few-shot HSI classification, an active learning (AL)-based multipath residual involution Siamese network, named AL-MRIS, is proposed. In the AL-MRIS method, an AL-based Siamese network framework is constructed so that representative samples can be selected more effectively, improving the discriminative ability of the Siamese network. An MRIN module is designed to comprehensively consider local features, dynamic features and global features, which improves the representation ability of HSIs. Moreover, a CD loss function for Siamese networks is proposed to utilize the directional similarity of high-dimensional HSI data, which improves the discriminability of the Siamese classification network. Extensive experimental results show that the proposed AL-MRIS method achieves excellent classification performance with only a few training samples. In the future, we will try to integrate AL-MRIS with cross-domain learning to further improve few-shot HSI classification performance.

Author Contributions

Conceptualization, J.Y. and J.Q. (Jia Qin); Methodology, J.Y. and J.Q. (Jia Qin); Original draft preparation, J.Q. (Jia Qin); Review and editing, J.Y., J.Q. (Jinxi Qian) and A.L.; Valuable advice, J.Q. (Jinxi Qian) and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Nos. 62001434, 62071084).

Data Availability Statement

The three real HSI datasets analyzed during the research can be found at http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes, which was accessed on 1 May 2023.

Acknowledgments

The authors wish to express gratitude for the valuable comments and suggestions provided by the editors and the anonymous reviewers.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Janne, M.; Sarita, K.-S.; Sonja, K.; Topi, T.; Pekka, H.; Peter, K.; Laura, P.; Arto, V.; Sakari, T.; Timo, K.; et al. Tree species classification from airborne hyperspectral and LiDAR data using 3D convolutional neural networks. Remote Sens. Environ. 2021, 256, 112322.
  2. Teng, M.Y.; Mehrubeoglu, R.; King, S.A.; Cammarata, K.; Simons, J. Investigation of epifauna coverage on seagrass blades using spatial and spectral analysis of hyperspectral images. In Proceedings of the 2013 5th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Gainesville, FL, USA, 26–28 June 2013.
  3. Kirsch, M.; Lorenz, S.; Zimmermann, R.; Tusa, L.; Möckel, R.; Hödl, P.; Booysen, R.; Khodadadzadeh, M.; Gloaguen, R. Integration of Terrestrial and Drone-Borne Hyperspectral and Photogrammetric Sensing Methods for Exploration Mapping and Mining Monitoring. Remote Sens. 2018, 10, 1366.
  4. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 277–281.
  5. Tao, H.; Duan, Q.; Lu, M.; Hu, Z. Learning discriminative feature representation with pixel-level supervision for forest smoke recognition. Pattern Recognit. 2023, 143, 109761.
  6. Deng, B.; Jia, S.; Shi, D. Deep Metric Learning-Based Feature Embedding for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 58, 1422–1435.
  7. Guo, A.J.X.; Zhu, F. A CNN-Based Spatial Feature Fusion Algorithm for Hyperspectral Imagery Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7170–7181.
  8. Gao, K.; Liu, B.; Yu, X.; Zhang, P.; Tan, X.; Sun, Y. Small sample classification of hyperspectral image using model-agnostic meta-learning algorithm and convolutional neural network. Int. J. Remote Sens. 2021, 42, 3090–3122.
  9. Li, W.; Liu, Q.; Zhang, Y.; Wang, Y.; Yuan, Y.; Jia, Y.; He, Y. Few-Shot Hyperspectral Image Classification Using Meta Learning and Regularized Finetuning. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14.
  10. Zhang, J.; Liu, L.; Zhao, R.; Shi, Z. A Bayesian Meta-Learning-Based Method for Few-Shot Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5500613.
  11. Cao, M.; Zhao, G.; Dong, A.; Lv, G.; Guo, Y.; Dong, X. Few-Shot Hyperspectral Image Classification Based on Cross-Domain Spectral Semantic Relation Transformer. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 9–12 October 2023; pp. 1375–1379.
  12. Li, Z.; Liu, M.; Chen, Y.; Xu, Y.; Li, W.; Du, Q. Deep Cross-Domain Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5501618.
  13. Zhang, C.; Zhong, S.; Gong, C. Feature Integration-Based Training for Cross-Domain Hyperspectral Image Classification. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 3572–3575.
  14. Wang, B.; Xu, Y.; Wu, Z.; Zhan, T.; Wei, Z. Spatial–Spectral Local Domain Adaption for Cross Domain Few Shot Hyperspectral Images Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5539515.
  15. Wang, W.; Liu, F.; Liu, J.; Xiao, L. Cross-Domain Few-Shot Hyperspectral Image Classification with Class-Wise Attention. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5502418.
  16. Huang, K.-K.; Yuan, H.T.; Ren, C.X.; Hou, Y.E.; Duan, J.L.; Yang, Z. Hyperspectral Image Classification via Cross-Domain Few-Shot Learning with Kernel Triplet Loss. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5530818.
  17. Zhang, Y.; Li, W.; Zhang, M.; Wang, S.; Tao, R.; Du, Q. Graph Information Aggregation Cross-Domain Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 1912–1925.
  18. Li, Z.; Guo, H.; Chen, Y.; Liu, C.; Du, Q.; Fang, Z. Few-Shot Hyperspectral Image Classification with Self-Supervised Learning. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5517917.
  19. Li, Y.; Zhang, L.; Wei, W.; Zhang, Y. Deep Self-Supervised Learning for Few-Shot Hyperspectral Image Classification. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 501–504.
  20. Cao, Z.; Li, X.; Jiang, J.; Zhao, L. 3D convolutional Siamese network for few-shot hyperspectral classification. J. Appl. Remote Sens. 2020, 14, 048504.
  21. Huang, L.; Chen, Y. Dual-Path Siamese CNN for Hyperspectral Image Classification with Limited Training Samples. IEEE Geosci. Remote Sens. Lett. 2021, 18, 518–522.
  22. Wang, W.; Chen, Y.; He, X.; Li, Z. Soft Augmentation-Based Siamese CNN for Hyperspectral Image Classification with Limited Training Samples. IEEE Geosci. Remote Sens. Lett. 2022, 19, 5508505.
  23. Xue, Z.; Zhou, Y.; Du, P. S3Net: Spectral–Spatial Siamese Network for Few-Shot Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5531219.
  24. Hou, W.; Chen, N.; Peng, J.; Sun, W. A Prototype and Active Learning Network for Small-Sample Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5510805.
  25. Ma, K.Y.; Chang, C.-I. Iterative Training Sampling Coupled with Active Learning for Semisupervised Spectral–Spatial Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 8672–8692.
  26. Li, X.; Cao, Z.; Zhao, L.; Jiang, J. ALPN: Active-Learning-Based Prototypical Network for Few-Shot Hyperspectral Imagery Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 5508305.
  27. Wang, G.; Ren, P. Hyperspectral Image Classification with Feature-Oriented Adversarial Active Learning. Remote Sens. 2020, 12, 3879.
  28. Li, D.; Hu, J.; Wang, C.; Li, X.; She, Q.; Zhu, L.; Zhang, T.; Chen, Q. Involution: Inverting the Inherence of Convolution for Visual Recognition. arXiv 2021, arXiv:2103.06255.
  29. Meng, Z.; Zhao, F.; Liang, M.; Xie, W. Deep Residual Involution Network for Hyperspectral Image Classification. Remote Sens. 2021, 13, 3055.
  30. Wu, H.; Xu, Z.; Zhang, J.; Yan, W.; Ma, X. Face recognition based on convolution siamese networks. In Proceedings of the 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, China, 14–16 October 2017.
  31. Dey, S.; Dutta, A.; Toledo, J.I.; Ghosh, S.K.; Llados, J.; Pal, U. SigNet: Convolutional Siamese Network for Writer Independent Offline Signature Verification. arXiv 2017, arXiv:1707.02131.
  32. Chen, Z.; Zhong, B.; Li, G.; Zhang, S.; Ji, R. Siamese Box Adaptive Network for Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
  33. Foody, G.M. Status of land cover classification accuracy assessment. Remote Sens. Environ. 2002, 80, 185–201.
  34. Richards, J.A. Classifier performance and map accuracy. Remote Sens. Environ. 1996, 57, 161–166.
Figure 1. Schematic diagram of involution. The input is H i , j , and the output is Y I n v .
Figure 1. Schematic diagram of involution. The input is H i , j , and the output is Y I n v .
Remotesensing 16 00990 g001
Figure 2. Siamese network structure.
Figure 2. Siamese network structure.
Remotesensing 16 00990 g002
Figure 3. Flowchart of AL-MRIN. The red and green regions represent the two Siamese subnetworks, and the blue regions represent the active learning strategy.
Figure 3. Flowchart of AL-MRIN. The red and green regions represent the two Siamese subnetworks, and the blue regions represent the active learning strategy.
Remotesensing 16 00990 g003
Figure 4. Diagram of the MRIN module. The middle gray arrow branch is a dynamic feature branch, the upper purple arrow branch is a local feature branch and the lower yellow arrow branch is a global feature branch.
Figure 4. Diagram of the MRIN module. The middle gray arrow branch is a dynamic feature branch, the upper purple arrow branch is a local feature branch and the lower yellow arrow branch is a global feature branch.
Remotesensing 16 00990 g004
Figure 5. The framework of the AL-based Siamese network.
Figure 5. The framework of the AL-based Siamese network.
Remotesensing 16 00990 g005
Figure 6. The pseudocolor image and ground truth image of the PU dataset. (a) The pseudocolor image. (b) The ground truth image.
Figure 6. The pseudocolor image and ground truth image of the PU dataset. (a) The pseudocolor image. (b) The ground truth image.
Remotesensing 16 00990 g006
Figure 7. The pseudocolor image and ground truth image of the IP dataset. (a) The pseudocolor image. (b) The ground truth image.
Figure 7. The pseudocolor image and ground truth image of the IP dataset. (a) The pseudocolor image. (b) The ground truth image.
Remotesensing 16 00990 g007
Figure 8. The pseudocolor image and ground truth image of the SA dataset. (a) The pseudocolor image. (b) The ground truth image.
Figure 8. The pseudocolor image and ground truth image of the SA dataset. (a) The pseudocolor image. (b) The ground truth image.
Remotesensing 16 00990 g008
Figure 9. Classification maps of the PU dataset: (a) ground truth, (b) DRIN, (c) Sia-3DCNN, (d) 3DCSN, (e) S3Net, (f) ALPN, (g) FAAL, (h) CFSL, (i) Gia-CFSL and (j) proposed AL-MRIS.
Figure 10. Partial enlarged classification maps obtained for the PU dataset: (a) ground truth, (b) DRIN, (c) Sia-3DCNN, (d) 3DCSN, (e) S3Net, (f) ALPN, (g) FAAL, (h) CFSL, (i) Gia-CFSL and (j) proposed AL-MRIS.
Figure 11. Classification maps of the IP dataset: (a) ground truth, (b) DRIN, (c) Sia-3DCNN, (d) 3DCSN, (e) S3Net, (f) ALPN, (g) FAAL, (h) CFSL, (i) Gia-CFSL and (j) proposed AL-MRIS.
Figure 12. Classification maps of the SA dataset: (a) ground truth, (b) DRIN, (c) Sia-3DCNN, (d) 3DCSN, (e) S3Net, (f) ALPN, (g) FAAL, (h) CFSL, (i) Gia-CFSL and (j) proposed AL-MRIS.
Figure 13. Classification results for different numbers of MRIN modules.
Figure 14. OA results for different distance-based contrastive losses.
Figure 15. OA results with different numbers of training samples.
Table 1. The number of samples per class in the PU dataset.

Label | Class | Number of Samples
1 | Asphalt | 6631
2 | Meadows | 18,649
3 | Gravel | 2099
4 | Trees | 3064
5 | Painted metal sheets | 1345
6 | Bare Soil | 5029
7 | Bitumen | 1330
8 | Self-Blocking Bricks | 3682
9 | Shadows | 947
— | Total (9 classes) | 42,776
Table 2. The number of samples per class in the IP dataset.

Label | Class | Number of Samples
1 | Alfalfa | 40
2 | Corn-notill | 1428
3 | Corn-mintill | 830
4 | Corn | 237
5 | Grass-pasture | 483
6 | Grass-trees | 730
7 | Grass-pasture-mowed | 28
8 | Hay-windrowed | 478
9 | Oats | 20
10 | Soybean-notill | 972
11 | Soybean-mintill | 2455
12 | Soybean-clean | 593
13 | Wheat | 205
14 | Woods | 1265
15 | Buildings-Grass-Trees-Drives | 386
16 | Stone-Steel-Towers | 93
— | Total (16 classes) | 10,249
Table 3. The number of samples per class in the SA dataset.

Label | Class | Number of Samples
1 | Brocoli_green_weeds_1 | 2006
2 | Brocoli_green_weeds_2 | 3723
3 | Fallow | 1973
4 | Fallow_rough_plow | 1391
5 | Fallow_smooth | 2675
6 | Stubble | 3956
7 | Celery | 3576
8 | Grapes_untrained | 11,268
9 | Soil_vinyard_develop | 6200
10 | Corn_senesced_green_weeds | 3275
11 | Lettuce_romaine_4wk | 1065
12 | Lettuce_romaine_5wk | 1924
13 | Lettuce_romaine_6wk | 913
14 | Lettuce_romaine_7wk | 1067
15 | Vinyard_untrained | 7265
16 | Vinyard_vertical_trellis | 1804
— | Total (16 classes) | 54,081
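The experiments in Tables 4–6 draw only three labeled training samples per class from these datasets. A minimal NumPy sketch of such a few-shot split from a ground-truth map follows; `few_shot_split` is a hypothetical helper, and the paper's exact sampling procedure may differ.

```python
import numpy as np

def few_shot_split(gt, n_per_class=3, seed=0):
    """Draw n_per_class labeled pixels per class from a ground-truth
    map (0 = unlabeled background); remaining labeled pixels form
    the test set. Returns flat pixel indices."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(gt[gt > 0]):
        idx = rng.permutation(np.flatnonzero(gt == c))
        train_idx.extend(idx[:n_per_class])
        test_idx.extend(idx[n_per_class:])
    return np.array(train_idx), np.array(test_idx)
```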
Table 4. Classification accuracies of the different methods on the PU dataset (three training samples per class). The best results are shown in bold typeface.

Class / Metric | DRIN | Sia-3DCNN | 3DCSN | S3Net | ALPN | FAAL | CFSL | Gia-CFSL | Proposed AL-MRIS
OA (%) | 72.06 ± 4.59 | 63.73 ± 6.19 | 65.27 ± 5.01 | 75.24 ± 6.42 | 69.19 ± 8.29 | 37.18 ± 7.68 | 79.37 ± 1.05 | 73.58 ± 3.47 | 84.71 ± 3.58
AA (%) | 80.42 ± 2.49 | 68.09 ± 4.47 | 71.99 ± 3.08 | 81.75 ± 3.67 | 71.54 ± 8.29 | 23.82 ± 7.86 | 81.07 ± 2.13 | 74.97 ± 1.20 | 82.27 ± 5.07
Kappa × 100 | 65.70 ± 4.99 | 52.65 ± 7.96 | 57.78 ± 6.41 | 68.83 ± 7.18 | 61.29 ± 5.65 | 38.46 ± 5.01 | 74.29 ± 1.94 | 65.97 ± 3.79 | 79.81 ± 4.71
Asphalt | 87.02 | 65.51 | 68.88 | 63.99 | 54.13 | 5.35 | 62.02 | 73.31 | 93.49
Meadows | 41.75 | 25.53 | 66.12 | 74.20 | 74.27 | 23.51 | 82.64 | 68.35 | 82.01
Gravel | 64.02 | 67.55 | 73.95 | 79.76 | 82.37 | 76.65 | 77.93 | 87.58 | 55.56
Trees | 87.97 | 59.26 | 65.59 | 70.81 | 93.39 | 2.23 | 75.81 | 73.71 | 80.92
Painted metal sheets | 96.34 | 100 | 98.95 | 95.97 | 99.70 | 99.08 | 99.32 | 99.92 | 100
Bare Soil | 80.00 | 69.11 | 86.64 | 77.79 | 82.12 | 81.32 | 82.40 | 85.35 | 100
Bitumen | 60.28 | 88.62 | 97.89 | 89.74 | 93.28 | 100 | 93.51 | 96.75 | 96.83
Self-Blocking Bricks | 99.13 | 36.62 | 56.71 | 35.23 | 69.40 | 8.56 | 42.42 | 54.96 | 94.45
Shadows | 98.83 | 64.08 | 54.23 | 79.74 | 58.81 | 1.61 | 56.68 | 60.51 | 74.89
Table 5. Classification accuracies of the different methods on the IP dataset (three training samples per class). The best results are shown in bold typeface.

Class / Metric | DRIN | Sia-3DCNN | 3DCSN | S3Net | ALPN | FAAL | CFSL | Gia-CFSL | Proposed AL-MRIS
OA (%) | 63.91 ± 3.28 | 50.34 ± 2.81 | 59.62 ± 3.20 | 66.66 ± 1.88 | 60.98 ± 1.09 | 32.42 ± 4.26 | 68.47 ± 2.79 | 56.00 ± 5.62 | 75.31 ± 2.09
AA (%) | 78.16 ± 1.67 | 64.84 ± 3.49 | 74.33 ± 2.47 | 79.66 ± 2.35 | 69.49 ± 1.38 | 45.28 ± 3.21 | 77.13 ± 2.43 | 68.62 ± 3.48 | 78.69 ± 2.09
Kappa × 100 | 56.87 ± 3.42 | 44.68 ± 2.94 | 54.83 ± 3.49 | 62.85 ± 2.05 | 55.71 ± 1.29 | 52.92 ± 3.65 | 64.84 ± 2.56 | 50.60 ± 5.97 | 71.86 ± 2.26
Alfalfa | 100 | 90.69 | 97.67 | 100 | 100 | 72.09 | 83.72 | 95.34 | 100
Corn-notill | 30.31 | 39.92 | 31.01 | 55.12 | 39.18 | 29.37 | 53.89 | 59.37 | 72.75
Corn-mintill | 57.67 | 21.88 | 61.54 | 54.17 | 22.88 | 2.52 | 81.98 | 77.53 | 69.93
Corn | 100 | 54.27 | 90.17 | 82.48 | 75.53 | 20.08 | 38.88 | 94.44 | 97.31
Grass-pasture | 63.12 | 60.41 | 64.58 | 84.61 | 69.10 | 10.98 | 71.45 | 65.62 | 91.68
Grass-trees | 95.59 | 93.94 | 91.61 | 91.75 | 95.59 | 64.99 | 81.15 | 76.25 | 96.50
Grass-pasture-mowed | 100 | 100 | 100 | 100 | 100 | 50.00 | 100 | 100 | 100
Hay-windrowed | 95.36 | 79.36 | 88.21 | 99.21 | 74.05 | 62.76 | 99.36 | 95.57 | 99.13
Oats | 100 | 88.23 | 100 | 100 | 100 | 73.68 | 100 | 100 | 100
Soybean-notill | 70.89 | 65.12 | 53.56 | 98.71 | 40.80 | 28.08 | 75.23 | 46.59 | 72.51
Soybean-mintill | 57.25 | 27.36 | 47.96 | 46.31 | 41.91 | 58.34 | 63.25 | 35.91 | 69.05
Soybean-clean | 56.61 | 38.13 | 20.11 | 56.21 | 25.97 | 11.48 | 68.64 | 75.59 | 63.03
Wheat | 100 | 74.25 | 76.73 | 99.62 | 82.58 | 36.66 | 100 | 92.43 | 99.47
Woods | 99.68 | 69.17 | 34.07 | 89.31 | 87.71 | 81.79 | 31.69 | 81.11 | 94.16
Buildings-Grass-Trees-Drives | 63.44 | 66.57 | 64.75 | 72.45 | 47.64 | 16.95 | 77.75 | 75.14 | 94.07
Stone-Steel-Towers | 99.67 | 97.77 | 95.55 | 98.89 | 98.87 | 70.73 | 100 | 100 | 100
Table 6. Classification accuracies of the different methods on the SA dataset (three training samples per class). The best results are shown in bold typeface.

Class / Metric | DRIN | Sia-3DCNN | 3DCSN | S3Net | ALPN | FAAL | CFSL | Gia-CFSL | Proposed AL-MRIS
OA (%) | 87.42 ± 5.07 | 85.62 ± 2.07 | 88.20 ± 1.97 | 88.84 ± 3.21 | 79.84 ± 0.61 | 62.57 ± 4.28 | 79.57 ± 2.45 | 87.44 ± 1.91 | 90.18 ± 1.79
AA (%) | 90.13 ± 2.49 | 88.55 ± 2.05 | 91.44 ± 1.48 | 92.39 ± 1.24 | 91.83 ± 0.39 | 62.47 ± 3.65 | 92.19 ± 2.18 | 91.13 ± 2.22 | 92.79 ± 2.25
Kappa × 100 | 86.08 ± 5.54 | 84.02 ± 2.27 | 86.90 ± 2.16 | 87.62 ± 3.53 | 77.83 ± 0.64 | 58.06 ± 4.89 | 77.63 ± 2.61 | 86.03 ± 2.13 | 89.05 ± 1.99
Brocoli_green_weeds_1 | 98.05 | 90.01 | 93.07 | 100 | 97.95 | 99.28 | 93.96 | 92.82 | 99.70
Brocoli_green_weeds_2 | 100 | 98.92 | 98.41 | 83.81 | 95.13 | 95.07 | 100 | 100 | 99.11
Fallow | 99.89 | 99.13 | 99.34 | 93.76 | 97.87 | 65.84 | 89.05 | 92.56 | 100
Fallow_rough_plow | 98.56 | 98.71 | 71.02 | 99.07 | 97.91 | 69.58 | 100 | 100 | 98.41
Fallow_smooth | 94.65 | 89.98 | 83.73 | 99.96 | 82.91 | 54.41 | 93.75 | 90.15 | 95.72
Stubble | 97.69 | 87.63 | 94.38 | 95.73 | 83.74 | 91.26 | 100 | 87.26 | 90.40
Celery | 98.82 | 92.95 | 98.99 | 100 | 88.17 | 98.65 | 100 | 48.16 | 99.32
Grapes_untrained | 18.19 | 88.55 | 85.51 | 61.43 | 54.81 | 85.83 | 7.81 | 87.84 | 91.49
Soil_vinyard_develop | 99.90 | 98.45 | 99.22 | 100 | 96.98 | 13.31 | 99.88 | 98.72 | 96.09
Corn_senesced_green_weeds | 89.19 | 32.51 | 90.04 | 94.05 | 57.25 | 81.58 | 97.92 | 94.16 | 95.56
Lettuce_romaine_4wk | 98.87 | 92.11 | 96.61 | 100 | 91.92 | 73.32 | 99.43 | 100 | 97.54
Lettuce_romaine_5wk | 96.04 | 89.71 | 92.09 | 94.29 | 95.47 | 55.21 | 99.74 | 89.48 | 94.41
Lettuce_romaine_6wk | 98.13 | 99.56 | 98.91 | 82.09 | 98.02 | 97.77 | 100 | 100 | 100
Lettuce_romaine_7wk | 98.21 | 98.21 | 92.41 | 98.03 | 95.12 | 39.96 | 99.71 | 97.78 | 97.35
Vinyard_untrained | 98.03 | 70.43 | 60.51 | 80.99 | 91.85 | 42.18 | 99.46 | 65.88 | 84.23
Vinyard_vertical_trellis | 96.51 | 78.54 | 82.15 | 98.45 | 84.58 | 13.44 | 100 | 90.28 | 99.67
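The OA, AA and Kappa × 100 values reported in Tables 4–6 follow the standard definitions computed from a confusion matrix. The following is a minimal sketch of those computations; `classification_metrics` is a hypothetical helper name.

```python
import numpy as np

def classification_metrics(conf):
    """OA, AA and Cohen's kappa from a confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    conf = conf.astype(float)
    n = conf.sum()
    oa = np.trace(conf) / n                            # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))     # mean per-class accuracy
    pe = (conf.sum(axis=0) @ conf.sum(axis=1)) / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa * 100, aa * 100, kappa * 100             # as reported (x100)
```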
Table 7. Classification accuracy results for the PU dataset with different distance-based contrastive losses. The best results are shown in bold.

Metric | Euclidean Distance | Manhattan Distance | Jensen–Shannon Distance | Cosine Distance
OA (%) | 82.77 | 80.07 | 79.31 | 84.71
AA (%) | 79.96 | 75.29 | 76.18 | 82.27
Kappa × 100 | 77.32 | 74.01 | 72.34 | 79.81
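Table 7 compares distance metrics inside the contrastive loss, with cosine distance performing best. A minimal sketch of a cosine-distance contrastive loss in the standard margin form follows; the margin value and function name are illustrative, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def cosine_contrastive_loss(z1, z2, same_class, margin=0.5):
    """Contrastive loss with cosine distance d = 1 - cos(z1, z2).
    Similar pairs (same_class = 1) are pulled together; dissimilar
    pairs are pushed beyond a margin (margin value is an assumption)."""
    d = 1.0 - F.cosine_similarity(z1, z2, dim=1)        # d in [0, 2]
    pos = same_class * d.pow(2)                         # pull similar pairs
    neg = (1 - same_class) * F.relu(margin - d).pow(2)  # push dissimilar pairs
    return (pos + neg).mean()
```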
Table 8. Classification results of the ablation experiments on the PU dataset. The best results are shown in bold.

Metric | No Siamese | No AL | No MRIN | Proposed AL-MRIS
OA (%) | 66.72 | 74.51 | 64.42 | 84.71
AA (%) | 62.15 | 73.01 | 50.86 | 82.27
Kappa × 100 | 57.41 | 68.82 | 52.51 | 79.81