Article

Feature Matching for Remote-Sensing Image Registration via Neighborhood Topological and Affine Consistency

1 Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430073, China
2 Electronic Information School, Wuhan University, Wuhan 430072, China
3 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(11), 2606; https://doi.org/10.3390/rs14112606
Submission received: 27 April 2022 / Revised: 20 May 2022 / Accepted: 26 May 2022 / Published: 29 May 2022
(This article belongs to the Special Issue Advances in Hyperspectral Remote Sensing: Methods and Applications)

Abstract: Feature matching is a key step in feature-based image registration: it establishes reliable correspondences between the feature points extracted from two images. To eliminate false matches from the initial match set, we propose a simple and efficient method. The key principle of our method is to maintain topological and affine-transformation consistency among neighborhood matches. We formulate this problem as a mathematical model and derive a closed-form solution with linear time and space complexity. More specifically, our method can remove mismatches from thousands of hypothetical correspondences within a few milliseconds. We conducted qualitative and quantitative experiments on different types of remote-sensing datasets. The experimental results show that our method is general: it can deal with all kinds of remote-sensing image pairs, whether the deformation is rigid or non-rigid, including image pairs affected by shadows, projection distortion, noise, and geometric distortion. Furthermore, it is two orders of magnitude faster than state-of-the-art methods while being more accurate, and it can be used in real-time applications.

Graphical Abstract

1. Introduction

Image registration [1,2] is one of the most basic and challenging tasks in remote-sensing image processing. Its main goal is to geometrically align two images of the same or a similar scene [3] acquired from different viewpoints, at different times, or even by different imaging modalities. It is an essential prerequisite for the joint analysis and processing of two or more images. Image registration is widely used in many remote-sensing applications, such as image stitching, image fusion, change detection, and environmental monitoring [4,5,6,7,8].
Image registration methods can be roughly divided into two categories: region-based methods and feature-based methods. Region-based methods establish correspondences from the pixel intensities in the overlapping region of the two images using a similarity measure; such methods usually do not detect features. Feature-based methods extract feature points and descriptors from the images and then establish correspondences based on their similarity or on geometric relationship constraints [3]. These methods need to detect features.
Feature-based methods are more robust when dealing with remote-sensing images with image distortion, significant illumination changes, and significant noise effects. In this paper, we focus on feature-based image registration methods. We transform the problem of image registration into the problem of matching between discrete feature points in two images.
These discrete feature points are obtained by general-purpose feature detectors, such as the scale-invariant feature transform (SIFT) [9], speeded-up robust features (SURF) [10], and oriented FAST and rotated BRIEF (ORB) [11]. In the feature-matching process, it is often necessary to compare the feature points of the two images one by one, so the space of all possible matches is large. Assuming there are $N$ feature points to be matched in each of the two images, a total of $N!$ permutations [12] would need to be considered, which is inevitably time-consuming.
To overcome this challenge, researchers filter out untrusted outliers through similarity constraints, thereby obtaining an initial set of putative correspondences containing only similar descriptors. Our method then only needs to judge the correctness of the initial correspondences in this hypothetical set, remove the false matches, and establish accurate correspondences. This strategy greatly reduces the amount of computation.
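For illustration, the following minimal Python sketch shows how such a putative set might be built with OpenCV's SIFT and Lowe's ratio test; the function name and the ratio threshold of 0.8 are our own illustrative choices, not values from this paper.

```python
# A minimal sketch of step one: building the putative set S from SIFT
# descriptors with Lowe's ratio test. The 0.8 threshold is illustrative.
import cv2

def putative_matches(img1, img2, ratio=0.8):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # Keep the two nearest candidates per descriptor and apply the ratio test.
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    matches = []
    for pair in knn:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            m = pair[0]
            matches.append((kp1[m.queryIdx].pt, kp2[m.trainIdx].pt))
    return matches  # list of hypothetical correspondences (x_i, y_i)
```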
Researchers have designed a variety of algorithms to address the problem of removing false matches [13,14,15]. However, designing a general and efficient matching algorithm that handles the various feature-matching problems of real scenes remains a challenging and urgent task when facing the registration of remote-sensing images. Remote-sensing images mainly have the following characteristics. First, remote-sensing images are mostly pictures of cities, mountains, farmland, and other complex scenes taken by an unmanned aerial vehicle (UAV) in the air.
Second, remote-sensing images pose many challenges that affect image registration, such as a large field of view, low image quality, serious geometric distortion, and obvious viewpoint changes. Researchers have proposed a series of methods based on the consistency of the neighborhood topology to address these problems. For example, Ma et al. [16] proposed the Locality Preserving Matching (LPM) method. By observing image pairs under various transformation types, they found that, although the Euclidean distances between the feature points change, the topological structure formed by a feature point and its neighboring points does not change easily, and LPM exploits this property to match remote-sensing images.
However, methods based on neighborhood topology consistency fail in the following two cases: (1) Some false matches and their neighborhood point pairs can also remain similar in relative directions and lengths. (2) Some correct matches cannot maintain the relative orientation consistency of neighborhood topology when there are severe geometric distortions and obvious viewpoint changes between image pairs. Therefore, using only neighborhood topology consistency cannot solve the above problems well.
To solve the above problems, in this paper we propose a method that combines neighborhood topological consistency with affine constraints for the feature matching of remote-sensing images. We found that affine consistency is a more general property within a small local area of an image, and a general affine transformation [17] can model the topology of a small local region of the image well. Thus, further introducing affine consistency between matching points and their neighboring point pairs handles the above two cases well. The proposed method is more general and can handle remote-sensing image pairs with multiple transformation types by preserving topological consistency and affine constraints among neighborhood matches.
Our method makes the following two contributions: (1) We introduce local affine constraints into the neighborhood-topology-consistency approach. This allows us to deal with severe geometric distortion and obvious viewpoint changes, which previous methods could not overcome. (2) We formulate the problem as a mathematical model, derive its solution, and conduct qualitative, quantitative, and robustness experiments on four remote-sensing-image datasets. The experimental results show that our method has better matching performance than other mainstream methods and can handle matching problems in milliseconds.
The organization of this article is as follows. Related work and background are presented in Section 2. Section 3 proposes a strongly constrained remote-sensing image registration method based on preserving neighborhood topological consistency and affine transformation model constraints. In Section 4, we present our qualitative, quantitative, and robustness experimental results on four different types of remote-sensing-image datasets. Finally, Section 5 concludes the paper.

2. Related Works

Image registration is widely used in computer vision [18,19], pattern recognition [20,21,22], medical imaging [23,24], and especially remote-sensing image analysis [25,26,27,28]. Detailed introductions to image registration methods can be found in several literature reviews [3,29]. In the past few decades, image registration technology has developed rapidly, and many image-matching methods have been proposed. These methods are roughly divided into region-based methods and feature-based methods. Region-based methods do not detect image features but operate directly at the pixel level.
For example, Zhou et al. [30] proposed a cross-weather road-scene registration method based on a latent generative model of intensity constancy, which is a dense matching method. Region-based approaches include three types of methods: correlation-like methods [31], Fourier methods [32], and mutual-information methods [33]. They can perform well when the image contains few detailed features. However, feature-based methods show better results when dealing with remote-sensing images that have significant detailed texture features, and they are more popular due to their lower complexity when the image suffers from unfavorable factors, such as low quality, shadows, noise, obvious viewpoint changes, and geometric distortions.
We pay more attention to feature-based matching methods. That is, we establish the correct correspondences between pixels or points of two images by extracting image feature points and their descriptors. The spatial transformation is then estimated from the results and can be used to align the given pair of images. We also call this process feature matching; its main purpose is to establish a one-to-one correspondence between the feature point sets of an image pair. Feature matching can be divided into indirect matching and direct matching according to whether local descriptors are used.
Direct matching uses spatial geometric relations and optimization methods to establish correspondences directly from two given feature sets. Loiola et al. [34] proposed a direct matching method based on quadratic assignment. Sofka et al. [35] studied the modeling of uncertainty in the registration process and proposed a covariance-driven correspondence method in an EM-like framework. Indirect matching establishes the correct correspondences by first matching descriptors according to their similarity and distance in the measurement space and then removing false matches.
Some methods are based on parametric models; for example, Maier et al. [36] introduced a guided matching method based on statistical optical flow, and RANSAC and its variants MLESAC [37] and PROSAC [38] are classic examples. These methods have been widely used in image registration in the remote-sensing field. However, when there are many outliers or non-rigid transformations, these methods are no longer applicable.
To establish a robust feature-point correspondence between remote-sensing image pairs, we follow a two-step strategy [39,40]: (1) compute an initial set of hypothetical correspondences, and (2) remove the outliers from the hypothetical set according to the geometric constraints between point pairs to establish accurate correspondences. For the first step, researchers have proposed a variety of classic and widely used feature-matching algorithms.
For example, Lowe [9] proposed the SIFT algorithm, the essence of which is to find key points in different scale spaces and compute their dominant orientations. By comparing the feature vectors of the key points pairwise, several pairs of mutually matching feature points are found, and the correspondence between the image pair is established. Rublee et al. [11] proposed the ORB algorithm, which uses the FAST algorithm to detect feature points, the BRIEF algorithm to compute the descriptor of each feature point, and finally obtains correspondences according to a preset threshold on descriptor similarity.
To remove the outliers from the hypothetical set in the second step, researchers have proposed a number of methods. In remote-sensing image registration, Ma et al. [41] proposed a guided locality-preserving matching method for robust feature matching of remote-sensing images. Wu et al. [42] proposed an estimation method based on weighted total least squares (WTLS) and introduced it into remote-sensing image registration, which yields more robust model-parameter estimation and higher registration precision.
Ma et al. [43] proposed a robust remote-sensing image feature-matching algorithm based on Lq estimation, which achieved better results. In addition, some nonparametric interpolation methods have been proposed [44], especially for remote-sensing image registration. These methods insert prior knowledge into nonparametric functions and apply when the motion field of the correspondences is slow and smooth.
However, these methods have high computational complexity, and their long running times hinder their application in real-time tasks. LPM [16] further preserves the distribution of the transformed neighborhood point pairs, relaxing the constraint that the distances between all feature correspondences are preserved. Considering the topological constraints of adjacent elements, LPM converts the hypothetical matches into displacement vectors, where the head and tail of each vector correspond to the spatial positions of the two corresponding feature points in the two images, and defines neighborhood topological consistency based on the length ratios and included angles of the vectors to eliminate mismatches.
Zhou et al. [45] proposed a non-rigid feature-matching method for remote-sensing images through probabilistic inference with global and local regularization. These methods focus on the constraints of the neighborhood structure, which greatly inspired our method.

3. Methods

This section describes a method for establishing precise correspondences between the feature points extracted from two identical or similar images. The method follows a two-step strategy. First, we construct a hypothetical set of matching point pairs. Fortunately, mature feature-matching algorithms such as SIFT and ORB can filter out the clearly dissimilar point pairs between the two feature point sets and retain all sufficiently trusted initial matching pairs.
Thus, the first step is easy to complete, and the focus of our research is the second step. We use a well-designed geometric relationship to constrain the correspondences in the hypothetical matching set and eliminate the mismatched point pairs that do not preserve the spatial neighborhood structure and the local affine transformation constraints. Below, we discuss in detail how to eliminate false matches in this second step.

3.1. Problem Formulation

Suppose there is a set of $N$ hypothetical feature-point correspondences $S = \{(x_i, y_i)\}_{i=1}^{N}$ extracted from two given identical or similar images, where $x_i$ and $y_i$ denote the two-dimensional coordinate vectors of the feature points extracted from the two images with the SIFT algorithm. The $i$-th hypothetical feature-point correspondence is denoted by $(x_i, y_i)$. Our goal is to eliminate the mismatched point pairs in the hypothetical set $S$ and thereby construct accurate feature-point correspondences.
If an image pair has only undergone a rigid transformation, the distances between all feature correspondences are preserved. We use $I$ to represent the unknown inlier set [16], and its optimal solution can be expressed as follows:
$$I^* = \arg\min_{I} C(I; S, \lambda) \quad (1)$$
where the cost function is denoted by C and defined as:
$$C(I; S, \lambda) = \sum_{i \in I} \sum_{j \in I} \big( d(x_i, x_j) - d(y_i, y_j) \big)^2 + \lambda \left( N - |I| \right) \quad (2)$$
In Equation (2), $d$ represents the Euclidean distance between two feature points, and $|\cdot|$ denotes the cardinality of a set. In this cost function, the first term penalizes any match that does not preserve the distances between point pairs, and the second term discourages outliers [16]. The parameter $\lambda > 0$ balances these two terms. In the case of an ideal rigid transformation, the optimal solution should reach zero penalty, meaning that the first term of the cost function $C$ should be zero.
However, simple rigid transformations are rare in reality. The distance relationships between the feature point pairs cannot be well preserved when a given image pair undergoes a complex non-rigid transformation, especially for matching point pairs separated by a long distance. Even when there is a clear viewpoint change and non-rigid deformation between the two images, the local neighborhood structure of the transformed feature point pairs is well preserved, owing to the physical constraints among the regions around the feature points. Therefore, by preserving only local structures, the cost function in Equation (2) becomes:
$$C(I; S, \lambda) = \sum_{i \in I} \frac{1}{2K} \left( \sum_{j \mid x_j \in N_{x_i}} \big( d(x_i, x_j) - d(y_i, y_j) \big)^2 + \sum_{j \mid y_j \in N_{y_i}} \big( d(x_i, x_j) - d(y_i, y_j) \big)^2 \right) + \lambda \left( N - |I| \right) \quad (3)$$
A feature point set does not have an inherent neighborhood definition. Therefore, we define a simple neighborhood representation: for each point in the feature point set, we search for its $K$ nearest points under the Euclidean distance. In Equation (3), $N_{x_i}$ denotes the neighborhood of feature point $x_i$, and $1/2K$ is a normalization factor for the first term.
In the case of non-rigid transformations, the absolute distance $d$ between feature point pairs is not well preserved. To facilitate subsequent calculations, LPM quantizes the distance metric $d$ into two levels:
$$d(x_i, x_j) = \begin{cases} 0, & x_j \in N_{x_i} \\ 1, & x_j \notin N_{x_i} \end{cases} \quad (4)$$
$$d(y_i, y_j) = \begin{cases} 0, & y_j \in N_{y_i} \\ 1, & y_j \notin N_{y_i} \end{cases} \quad (5)$$
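As a concrete illustration, the neighborhoods and the quantized distances of Equations (4) and (5) can be computed with a k-d tree, consistent with the complexity analysis in Section 3.7; the helper names below are ours.

```python
# A sketch of the K-nearest neighborhoods and the quantized distance of
# Equations (4) and (5); scipy's k-d tree is assumed to be available.
import numpy as np
from scipy.spatial import cKDTree

def neighborhoods(points, K):
    tree = cKDTree(points)
    # Query K + 1 neighbors because each point's nearest neighbor is itself.
    _, idx = tree.query(points, k=K + 1)
    return idx[:, 1:]                    # N x K array of neighbor indices

def quantized_d(j, Ni):
    return 0 if j in Ni else 1           # Equations (4) and (5)
```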

3.2. Consistency of Neighborhood Structure

Equation (3) eliminates mismatches by minimizing the cost function $C$. This process essentially preserves only the distance relationship of the local structure, without considering the constraints on the topological structure between the neighborhoods of the feature point pairs. The mismatch-elimination performance degrades greatly in complex scenes where the two images contain many similar patterns, as shown in Figure 1. Therefore, we strengthen the topological constraints to improve the ability to eliminate mismatches.
As shown in Figure 1, we use $x_i$ and $y_i$ to represent feature points of the two given images, and $(x_i, y_i)$ is a pair of feature correspondences in the hypothetical set $S$. The upper part shows a pair of correct matching points, also called inliers, connected by a blue line, with their neighborhood points connected by blue dotted lines. The lower part shows a pair of false matching points, also called outliers, connected by a red line, with their neighborhood points connected by red dotted lines.
For each feature point $x_i$ and $y_i$, we search for the three nearest points under the Euclidean distance as the neighborhood of the feature point, following the neighborhood definition given above. We use green lines to connect the feature points and their neighborhood points to represent the topological structure of the feature point neighborhood. The difference between the topological structures of the upper and lower parts is clearly visible.
For the correct matching point pair in the upper part, the topological structures of the neighborhoods of $x_i$ and $y_i$ are almost the same, as are the directions and distances of the neighborhood points; the false matching point pair in the lower part shows the opposite. It is precisely this obvious difference that our algorithm exploits, using the constraints of the neighborhood topology to eliminate mismatches.
One of the properties of similar triangles is that corresponding angles are equal and the lengths of corresponding sides are proportional. LRO [46] observed that, although the absolute coordinates of a pair of correct matching points differ between their respective images, the angles formed between the feature points and their neighborhood points should remain the same. Unlike LRO, we add a further geometric constraint: the length ratios between neighborhood points should also remain the same.
As shown in Figure 2, we extend the properties of similar triangles to the triangles formed by a feature point and any two adjacent neighborhood points, constructing local neighborhood similar triangles of matching point pairs. Given the feature point $x_i$ and its neighborhood points $x_i^1, x_i^2, x_i^3$, they form three triangles: $\triangle x_i x_i^1 x_i^2$, $\triangle x_i x_i^2 x_i^3$, and $\triangle x_i x_i^1 x_i^3$. Correspondingly, given the feature point $y_i$ and its neighborhood points $y_i^1, y_i^2, y_i^3$, they form three triangles: $\triangle y_i y_i^1 y_i^2$, $\triangle y_i y_i^2 y_i^3$, and $\triangle y_i y_i^1 y_i^3$. From this, we obtain the following expression:
$$\triangle x_i x_i^1 x_i^2 \sim \triangle y_i y_i^1 y_i^2, \quad \triangle x_i x_i^2 x_i^3 \sim \triangle y_i y_i^2 y_i^3, \quad \triangle x_i x_i^1 x_i^3 \sim \triangle y_i y_i^1 y_i^3 \quad (6)$$

3.3. Affine Transformation Model

The consistency of the topological structure of the neighborhood around matching points cannot be maintained well when the two given images suffer severe geometric distortion and large viewpoint changes. When there is geometric distortion between the image to be matched and the reference image, the transformation model is the geometric model chosen to best fit the changes between the two images.
Due to the large number of outliers in the initial hypothetical set $S$, we cannot establish an accurate global transformation model. However, the topological structure of a local region of the image can still be kept consistent by an affine transformation model.
As shown in Figure 3, an affine transformation is a linear transformation between 2D coordinates that preserves the straightness and parallelism of lines under translation, scaling, flipping, rotation, and shearing; that is, it keeps the relative positional relationships within the 2D images unchanged. Equation (7) gives the formula of the affine transformation, where $(t_x, t_y)$ is the translation and the parameters $a_i$ capture the linear part of the transformation (scaling, flipping, rotation, and shearing). $(x, y)$ and $(x', y')$ represent the coordinates of the feature points $x_i$ and $y_i$, respectively.
By computing the parameters $t_x, t_y, a_i\ (i = 1, \ldots, 4)$, we can find the affine transformation $T$ between the two images. Based on this, we designed a neighborhood-based affine transformation model to further verify the correspondences that do not conform to the consistency of the neighborhood topology.
$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} a_1 & a_2 & t_x \\ a_3 & a_4 & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \quad (7)$$
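As an illustration, the six parameters of Equation (7) can be recovered from three point correspondences by solving a small linear system; the sketch below is our own and assumes the three source points are not collinear.

```python
# A sketch of estimating the affine model T of Equation (7) from three
# neighborhood correspondences; assumes the source points are not collinear.
import numpy as np

def estimate_affine(src, dst):
    """src, dst: 3x2 arrays of corresponding points; returns the 3x3 matrix T."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 0, 0, 1, 0])     # x' = a1*x + a2*y + tx
        A.append([0, 0, x, y, 0, 1])     # y' = a3*x + a4*y + ty
        b.extend([xp, yp])
    a1, a2, a3, a4, tx, ty = np.linalg.solve(np.array(A), np.array(b))
    return np.array([[a1, a2, tx],
                     [a3, a4, ty],
                     [0.0, 0.0, 1.0]])
```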

3.4. Affine Transformation Constraint

As shown in Figure 4, the left part of the first row shows the initial matching results of the two given images using SIFT. False matches are connected by red lines, and correct matches are connected by blue lines. The right part of the first row shows the vector cluster map of the initial matching result, where red represents false matching vectors and blue represents correct matching vectors. The left part of the second row shows the matching result obtained by the LPM [16] algorithm.
We selected a pair of false matching points $(x_i, y_i)$ and displayed their four neighborhood points in yellow on the left part of the first row; these false neighborhood point pairs are connected by red dashed lines. In particular, for clarity, we filtered out the interfering point pairs around this mismatched pair, cropped the surrounding region, and pasted it onto the left part of the first row. The right part of the second row shows the vector cluster map of the matching result of the LPM algorithm.
The left part of the third row shows the matching result obtained by our algorithm. Clearly, our algorithm can eliminate mismatched points that the LPM algorithm cannot. The right part of the third row shows the vector cluster map of the matching result of our algorithm. The left part of the fourth row shows the vector diagram of the matching point pair $(x_i, y_i)$ and its neighborhood point pairs.
Since $v_i$ and $v_i^m\ (m = 1, 2, 3, 4)$ have no significant differences in length or direction, LPM incorrectly judges that the matching point pair $(x_i, y_i)$ maintains the consistency of the neighborhood topology, and thus it is not eliminated.
We found that affine consistency is a more general property within a small local area of an image: a general affine transformation can model the topology of a small local region well. The affine consistency of TAT is a stronger constraint than the relative directional consistency of LPM. For the mismatched points that LPM cannot eliminate, the correspondences between their neighborhood point pairs are also wrong. Within this neighborhood range, although the mismatched points preserve the relative directional consistency of the neighborhood, they fail to satisfy a correct affine transformation model.
The right part of the fourth row shows that our algorithm randomly selects three pairs of neighborhood points and computes the affine transformation $T_m$ according to Equation (7). We substitute $T_m$ into Equation (16) to obtain $y_i'$. As shown in the lower part of Figure 5, these neighborhood point pairs are false correspondences, the local neighborhoods of falsely matched point pairs do not satisfy a correct affine transformation model, and the symmetric transfer error $d_m$ (computed by Equation (17)) between the mismatched point $y_i$ and $y_i'$ exceeds the set threshold. Thus, our algorithm eliminates this pair of false matching points, which improves the precision of the matching algorithm.
Figure 6 shows roughly the same layout as Figure 4. The difference is that we selected a pair of correct matching points $(x_i, y_i)$ from the matching results of our algorithm in the left part of the third row and displayed their four neighborhood points in blue on the left part of the first row; these correct neighborhood point pairs are connected by blue dashed lines.
Clearly, this pair of correct matching points was erroneously eliminated from the matching results obtained by LPM in the left part of the second row. From the left part of the fourth row, we can see that the two images exhibit an obvious change in perspective: there is serious geometric distortion, and the viewpoints within the images have changed greatly. As a result, the consistency of the topological structure of the neighborhood around the matching points cannot be maintained well. There are significant differences between $v_i$ and $v_i^m$ in length and direction, and thus LPM eliminates the correct matching point pair $(x_i, y_i)$.
However, these neighborhood point pairs are correct correspondences, and the local neighborhood of the correctly matched point pair satisfies a correct affine transformation model; thus, the symmetric transfer errors $d_m(x_i, y_i)$ are small. Our algorithm keeps this pair of correct matching points, which improves the recall of the matching algorithm.

3.5. Neighborhood Topology Consistency and Affine Transformation Model

Based on the problems and solutions described above, we propose a new mismatch-elimination method based on neighborhood topology consistency and affine transformation model constraints, called TAT. Accordingly, we redesign the cost function in Equation (1). We first define a set of feature correspondences $S_i = \{(x_i^m, y_i^m)\}_{m=1}^{L}$ for the neighborhood points of the feature points $x_i$ and $y_i$, where the neighborhood points satisfy $x_i^m \in N_{x_i}$ and $y_i^m \in N_{y_i}$, each $(x_i^m, y_i^m)$ is itself a hypothetical correspondence, and $L$ is the number of elements in the hypothetical set $S_i$.
Figure 2 shows a similar triangle composed of the $i$-th hypothetical correspondence of the hypothetical set $S = \{(x_i, y_i)\}_{i=1}^{N}$ and its neighborhood points, where the common elements of $N_{x_i}$ and $N_{y_i}$ are denoted $\{(x_i^m, y_i^m)\}_{m=1}^{L}$. The vector formed by the feature point $x_i$ and its neighborhood point $x_i^m$ is denoted $\overrightarrow{x_i x_i^m}$, and the vector $\overrightarrow{y_i y_i^m}$ is obtained in the same way. To express the consistency of the neighborhood topology, we use the similarity measure of the neighborhood similar triangles, expressed through the similarity of the corresponding angles and of the length ratios of the corresponding sides. The angles between the vectors $\overrightarrow{x_i x_i^m}$ and $\overrightarrow{x_i x_i^{(m+1)}}$, and between $\overrightarrow{y_i y_i^m}$ and $\overrightarrow{y_i y_i^{(m+1)}}$, are defined as:
$$\theta_x(i, m) = \arccos \frac{\left\langle \overrightarrow{x_i x_i^m},\, \overrightarrow{x_i x_i^{(m+1)}} \right\rangle}{\left| \overrightarrow{x_i x_i^m} \right| \cdot \left| \overrightarrow{x_i x_i^{(m+1)}} \right|} \quad (8)$$
$$\theta_y(i, m) = \arccos \frac{\left\langle \overrightarrow{y_i y_i^m},\, \overrightarrow{y_i y_i^{(m+1)}} \right\rangle}{\left| \overrightarrow{y_i y_i^m} \right| \cdot \left| \overrightarrow{y_i y_i^{(m+1)}} \right|} \quad (9)$$
The ratios of the lengths of the corresponding sides $\overrightarrow{x_i x_i^m}$ and $\overrightarrow{y_i y_i^m}$, and $\overrightarrow{x_i x_i^{(m+1)}}$ and $\overrightarrow{y_i y_i^{(m+1)}}$, are defined as:
$$l_m(i, m) = \frac{\left| \overrightarrow{x_i x_i^m} \right|}{\left| \overrightarrow{y_i y_i^m} \right|} \quad (10)$$
$$l_{m+1}(i, m) = \frac{\left| \overrightarrow{x_i x_i^{(m+1)}} \right|}{\left| \overrightarrow{y_i y_i^{(m+1)}} \right|} \quad (11)$$
where $\arccos(\cdot)$ denotes the inverse cosine function, $\langle \cdot, \cdot \rangle$ denotes the inner product, $|\cdot|$ denotes the length of a vector, and $m \in \{1, \ldots, L\}$. In particular, to ensure that the subscript $m$ does not overflow, we let $m + 1 = 1$ when $m = L$.
The similarity of two corresponding angles is denoted $s_\theta(i, m)$, and the similarity of the ratios of the two corresponding side lengths is denoted $s_l(i, m)$:
$$s_\theta(i, m) = 1 - \frac{\left| \theta_x(i, m) - \theta_y(i, m) \right|}{\max\left( \theta_x(i, m), \theta_y(i, m) \right)} \quad (12)$$
$$s_l(i, m) = 1 - \frac{\left| l_m(i, m) - l_{m+1}(i, m) \right|}{\max\left( l_m(i, m), l_{m+1}(i, m) \right)} \quad (13)$$
Finally, we obtain the similarity of the neighborhood similar triangles, $s_\Delta(i, m)$:
$$s_\Delta(i, m) = \frac{s_\theta(i, m) + s_l(i, m)}{2} \quad (14)$$
where $s_\Delta(i, m) \in [0, 1]$ and a larger value indicates a higher similarity. In practice, error interference between images makes it impossible for the similarity to reach 1; thus, with a predefined threshold $\tau_1$, we construct a neighborhood-topology-consistency loss function $s_{TAT}(i, m)$:
$$s_{TAT}(i, m) = \begin{cases} f(i, m), & s_\Delta(i, m) \le \tau_1 \\ 0, & s_\Delta(i, m) > \tau_1 \end{cases} \quad (15)$$
If the similarity $s_\Delta(i, m)$ is higher than the threshold $\tau_1$, the $i$-th feature correspondence incurs no penalty. Otherwise, it is necessary to further verify whether the topological structure of the feature point pair conforms to an affine transformation model.
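For concreteness, the computation of Equations (8)–(14) for a single match can be sketched as follows; the function name is ours, and a small epsilon guarding the divisions is our own numerical safeguard.

```python
# A sketch of the neighborhood-triangle similarity s_Delta of Equations
# (8)-(14) for match (x_i, y_i), with ordered neighbor arrays xs, ys.
import numpy as np

def triangle_similarity(xi, yi, xs, ys, m, eps=1e-12):
    L = len(xs)
    m1 = (m + 1) % L                          # let m + 1 = 1 when m = L
    vx, vx1 = xs[m] - xi, xs[m1] - xi         # vectors x_i x_i^m, x_i x_i^(m+1)
    vy, vy1 = ys[m] - yi, ys[m1] - yi

    def ang(u, v):
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)
        return np.arccos(np.clip(c, -1.0, 1.0))

    th_x, th_y = ang(vx, vx1), ang(vy, vy1)                     # Eqs. (8), (9)
    s_theta = 1 - abs(th_x - th_y) / (max(th_x, th_y) + eps)    # Eq. (12)
    l_m = np.linalg.norm(vx) / (np.linalg.norm(vy) + eps)       # Eq. (10)
    l_m1 = np.linalg.norm(vx1) / (np.linalg.norm(vy1) + eps)    # Eq. (11)
    s_l = 1 - abs(l_m - l_m1) / (max(l_m, l_m1) + eps)          # Eq. (13)
    return (s_theta + s_l) / 2                                  # Eq. (14)
```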
As shown in Figure 5, there may be an affine transformation between the two neighborhood structures. The upper part of the figure shows a pair of correct matching points and their corresponding neighborhood points; a correct match should preserve the correct neighborhood point correspondences within the scope of its local neighborhood. The lower part shows a pair of false matching points and their respective neighborhood points.
In that case, the correspondences between the neighborhood points are also false and do not satisfy the affine transformation model. By choosing three pairs of neighborhood point correspondences from the hypothetical set $S_i$ and substituting them into Equation (7), we obtain a set of affine transformation models $T = \{T_n\}_{n=1}^{N_T}$, where $N_T = \binom{L}{3}$. If $(x_i, y_i)$ is a pair of false matches, the computed affine transformation $T$ yields a false corresponding point $y_i'$, and there is a large distance error $d$ between this point and the false matching point $y_i$; if $(x_i, y_i)$ is a pair of correct matches, it satisfies the following:
$$y_i = T \cdot x_i \quad (16)$$
With $T_m$ randomly selected from the affine transformation model set $T$, we construct a symmetric transfer error function:
$$d_m(x_i, y_i) = \left\| y_i - T_m \cdot x_i \right\| + \left\| x_i - T_m^{-1} \cdot y_i \right\| \quad (17)$$
In practice, a matching point pair cannot maintain exactly zero error, so we set a predefined threshold $\tau_2$ to construct a metric function $f(i, m)$:
$$f(i, m) = \begin{cases} 1, & d_m(x_i, y_i) > \tau_2 \\ 0, & d_m(x_i, y_i) \le \tau_2 \end{cases} \quad (18)$$
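A short sketch of this test, reusing the affine estimate above; the helper name is ours.

```python
# A sketch of the symmetric transfer error of Equation (17) and the
# penalty f(i, m) of Equation (18) under a sampled affine model T_m.
import numpy as np

def transfer_penalty(xi, yi, Tm, tau2=10.0):
    xh = np.array([xi[0], xi[1], 1.0])                 # homogeneous coordinates
    yh = np.array([yi[0], yi[1], 1.0])
    fwd = np.linalg.norm(yh - Tm @ xh)                 # ||y_i - T_m x_i||
    bwd = np.linalg.norm(xh - np.linalg.inv(Tm) @ yh)  # ||x_i - T_m^{-1} y_i||
    return 1 if fwd + bwd > tau2 else 0                # Equation (18)
```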
Finally, we sum the losses over the whole neighborhood to obtain the loss function $d_{TAT}$ of the $i$-th hypothetical correspondence, defined as:
$$d_{TAT}(x_i, y_i) = \sum_{m=1}^{L} s_{TAT}(i, m) \quad (19)$$
Ideally, for a correct pair of matching points $(x_i, y_i)$, the function $d_{TAT}(x_i, y_i)$ should equal 0.
By adding the loss function $d_{TAT}$ to Equation (3), we obtain the cost function of the TAT algorithm:
$$C(I; S, \lambda) = \sum_{i \in I} \frac{1}{2K} \left( \sum_{j \mid x_j \in N_{x_i}} \big( d(x_i, x_j) - d(y_i, y_j) \big)^2 + \sum_{j \mid y_j \in N_{y_i}} \big( d(x_i, x_j) - d(y_i, y_j) \big)^2 + d_{TAT}(x_i, y_i) \right) + \lambda \left( N - |I| \right) \quad (20)$$

3.6. A Closed-Form Solution

We associate the hypothetical set $S$ with a binary vector $\mathbf{p}$ of size $N \times 1$, where $p_i = 1$ indicates that the $i$-th correspondence $(x_i, y_i)$ is a correct match (an inlier) and $p_i = 0$ indicates an incorrect match (an outlier). Substituting the binary vector $\mathbf{p}$ and the distances $d$ of Equations (4) and (5) into Equation (20), the cost function $C$ simplifies to:
$$C(\mathbf{p}; S, \lambda, \tau_1, \tau_2) = \sum_{i=1}^{N} \frac{p_i}{2K} \left( \sum_{j \mid x_j \in N_{x_i}} d(y_i, y_j) + \sum_{j \mid y_j \in N_{y_i}} d(x_i, x_j) + d_{TAT}(x_i, y_i) \right) + \lambda \left( N - \sum_{i=1}^{N} p_i \right) \quad (21)$$
The term $\sum_{j \mid x_j \in N_{x_i}} d(y_i, y_j)$ can be rewritten as:
$$\sum_{j \mid x_j \in N_{x_i}} d(y_i, y_j) = \sum_{j \mid x_j \in N_{x_i},\, y_j \in N_{y_i}} d(y_i, y_j) + \sum_{j \mid x_j \in N_{x_i},\, y_j \notin N_{y_i}} d(y_i, y_j) = 0 + \mathrm{count}\left( \left\{ j \mid x_j \in N_{x_i},\, y_j \notin N_{y_i} \right\} \right) = K - \mathrm{count}\left( \left\{ j \mid x_j \in N_{x_i},\, y_j \in N_{y_i} \right\} \right) = K - n_i \quad (22)$$
where $\mathrm{count}(\cdot)$ denotes the number of elements in a set, and $n_i$ represents the number of elements shared by the two neighborhoods $N_{x_i}$ and $N_{y_i}$. Similarly, $\sum_{j \mid y_j \in N_{y_i}} d(x_i, x_j)$ can be rewritten as:
$$\sum_{j \mid y_j \in N_{y_i}} d(x_i, x_j) = K - n_i = \sum_{j \mid x_j \in N_{x_i}} d(y_i, y_j) \quad (23)$$
Substituting Equation (23) into Equation (21) yields the updated cost function $C$:
$$C(\mathbf{p}; S, \lambda, \tau_1, \tau_2) = \sum_{i=1}^{N} \frac{p_i}{K} \left( \sum_{j \mid x_j \in N_{x_i}} d(y_i, y_j) + d_{TAT}(x_i, y_i) \right) + \lambda \left( N - \sum_{i=1}^{N} p_i \right) \quad (24)$$
In the calculation, we search for the $K$ nearest neighbors of each feature point $x_i$ and $y_i$ under the Euclidean distance to form the respective neighborhoods $N_{x_i}$ and $N_{y_i}$. However, the hypothetical matches are not evenly distributed across the image, and different constructions of the hypothetical set $S$ lead to different proportions of false matches, so the optimal value of $K$ is not fixed. To solve this problem, we use a varying $K$, called a multi-scale neighborhood representation [16]. The sizes are defined as $K = \{K_q\}_{q=1}^{M}$, where $N_{x_i}^{K_q}$ is the neighborhood composed of the $K_q$ nearest neighbors of feature point $x_i$ under the Euclidean distance. Substituting this into Equation (24), the cost function $C$ under the multi-scale neighborhood representation becomes:
$$C(\mathbf{p}; S, \lambda, \tau_1, \tau_2) = \sum_{i=1}^{N} p_i \sum_{q=1}^{M} \frac{1}{M K_q} \left( \sum_{j \mid x_j \in N_{x_i}^{K_q}} d(y_i, y_j) + d_{TAT}(x_i, y_i) \right) + \lambda \left( N - \sum_{i=1}^{N} p_i \right) \quad (25)$$
where $1/M$ is a normalization factor. We seek an objective function that is invariant to translation, scaling, flipping, and rotation. Minimizing Equation (25) eliminates the mismatches in the hypothetical set $S$ and establishes accurate correspondences. To optimize the objective function in Equation (25), we group the terms related to the binary vector $\mathbf{p}$ and obtain the reorganized cost function $C$:
$$C(\mathbf{p}; S, \lambda, \tau_1, \tau_2) = \sum_{i=1}^{N} p_i (c_i - \lambda) + \lambda N \quad (26)$$
where
$$c_i = \sum_{q=1}^{M} \frac{1}{M K_q} \left( \sum_{j \mid x_j \in N_{x_i}^{K_q}} d(y_i, y_j) + d_{TAT}(x_i, y_i) \right) \quad (27)$$
Equation (27) measures whether the $i$-th correspondence $(x_i, y_i)$ satisfies both the consistency of its neighborhood topology and the geometric constraints of the affine transformation model. Clearly, a correct match makes the cost value equal to or close to zero, whereas a false match makes it equal to or close to one.
Once the hypothetical set $S$ is given, the neighborhood relationships of the feature points are fixed, so we can compute the cost value $c_i$ of each correspondence in advance. Only the variable $p_i$ in Equation (26) is then unknown, and we only need to solve for it: if the cost value $c_i$ of a correspondence is less than $\lambda$, the corresponding term of Equation (26) is negative and reduces the objective function; conversely, if $c_i$ is greater than $\lambda$, the term is positive and increases the objective function. Therefore, we use a simple criterion to determine the optimal solution for $\mathbf{p}$:
$$p_i = \begin{cases} 1, & c_i \le \lambda \\ 0, & c_i > \lambda \end{cases}, \quad i = 1, \ldots, N \quad (28)$$
Finally, the optimal inlier set $I^*$ is represented as:
$$I^* = \left\{ i \mid p_i = 1,\ i = 1, \ldots, N \right\} \quad (29)$$
The procedure of our TAT is summarized in Algorithm 1. In Equation (28), the parameter $\lambda$ serves as a threshold to judge whether a hypothetical correspondence is correct. In particular, when $c_i = \lambda$, the first term equals zero, and the value of $p_i$ can be set arbitrarily.
Algorithm 1: The TAT Algorithm
Input: hypothetical set $S = \{(x_i, y_i)\}_{i=1}^{N}$; parameters $K$, $\lambda$, $\tau_1$, $\tau_2$
Output: inlier set $I^*$
1. Construct the neighborhoods $\{N_{x_i}^{K_q}, N_{y_i}^{K_q}\}_{q=1, i=1}^{M, N}$ on $S$;
2. Compute the losses $\{d_{TAT}(x_i, y_i)\}_{i=1}^{N}$ using Equation (19);
3. Calculate the costs $\{c_i\}_{i=1}^{N}$ using Equation (27);
4. Determine $I^*$ using Equations (28) and (29).
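Combining the sketches above, a compact, unoptimized version of Algorithm 1 might look as follows. Sampling a single random affine model per neighbor, rather than evaluating the full model set $T$ with $|T| = \binom{L}{3}$, is a simplification of ours, and the helper names come from the earlier sketches.

```python
# An illustrative, unoptimized sketch of Algorithm 1 built on the helpers
# sketched earlier (neighborhoods, triangle_similarity, estimate_affine,
# transfer_penalty); one random affine model per neighbor replaces the
# full model set T.
import numpy as np

def tat_loss(xi, yi, xs, ys, tau1, tau2):
    # d_TAT of Equation (19): accumulate s_TAT over the L shared neighbor pairs.
    loss, L = 0.0, len(xs)
    for m in range(L):
        if L >= 3 and triangle_similarity(xi, yi, xs, ys, m) <= tau1:
            pick = np.random.choice(L, 3, replace=False)
            Tm = estimate_affine(xs[pick], ys[pick])
            loss += transfer_penalty(xi, yi, Tm, tau2)
    return loss

def tat_inliers(X, Y, Ks=(4, 6, 8), lam=0.6, tau1=0.6, tau2=10.0):
    N, M = len(X), len(Ks)
    c = np.zeros(N)
    for Kq in Ks:                                    # multi-scale neighborhoods
        Nx, Ny = neighborhoods(X, Kq), neighborhoods(Y, Kq)
        for i in range(N):
            shared = [j for j in Nx[i] if j in set(Ny[i])]   # the set S_i
            d_tat = tat_loss(X[i], Y[i], X[shared], Y[shared], tau1, tau2)
            c[i] += (Kq - len(shared) + d_tat) / (M * Kq)    # Equation (27)
    return np.flatnonzero(c <= lam)                  # Equations (28) and (29)
```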

3.7. Computational Complexity

Using a k-d tree to search for the $K_q$ nearest neighbors of each feature point, with $N$ feature-point correspondences in the hypothetical set $S$, has a time complexity of about $O((K_q + N) \log N)$. We applied the multi-scale neighborhood representation $K = \{K_q\}_{q=1}^{M}$. The step in line 1 of Algorithm 1 is the most time-consuming part of the entire TAT algorithm, and its time complexity reaches $O((\sum_{q=1}^{M} K_q + MN) \log N)$.
In particular, since $K_q < K_M$ for $q < M$, the $K_q$ neighborhood $N_{x_i}^{K_q}$ is contained in the $K_M$ neighborhood $N_{x_i}^{K_M}$, so the $K_q$ neighborhood of each feature point can be obtained directly from the already computed $K_M$ neighborhood. The computation of the cost value $c_i$ in Equation (27) includes calculating the corresponding angles of the neighborhood structure, the length ratios of the corresponding sides, and the symmetric transfer errors of the feature points; its time complexity is less than $O(3N \sum_{q=1}^{M} K_q)$.
Therefore, the total time complexity of our TAT algorithm is about $O((\sum_{q=1}^{M} K_q + MN) \log N + 3N \sum_{q=1}^{M} K_q)$. Since both $M$ and $\sum_{q=1}^{M} K_q$ are constants much smaller than $N$, omitting these smaller terms, the final time complexity of the TAT algorithm can be expressed as $O(N \log N)$. Due to the memory required to store the neighborhoods $N_{x_i}^{K_q}$ and $N_{y_i}^{K_q}$, the space complexity of our TAT is $O(M K_M N)$.
Since $M K_M$ is a constant much smaller than $N$, the time and space complexities of our algorithm can be expressed as $O(N \log N)$ and $O(N)$, respectively. In other words, our TAT algorithm has linear time and space complexity in the size of the given hypothetical set. This is crucial for complex scenarios and applications that require real-time performance.

4. Experiment Results and Discussion

In this section, we test our TAT on four remote-sensing datasets and compare it with other mainstream image-matching algorithms, including RANSAC [13], ICF [40], GS [47], MAGSAC++ [48], and LPM [16]. Below, we first describe the implementation details of our algorithm; we then present the experimental datasets, which contain different types of remote-sensing images; and finally, we report in detail the qualitative, quantitative, and robustness results of our method.

4.1. Implementation Details

Our TAT starts from the initial matching point pairs obtained by SIFT; by enforcing the neighborhood topology constraints and the local-area affine transformation constraints on the matching point pairs, it eliminates the false matching point pairs and obtains accurate correspondences. We implemented our algorithm in Python. The experiments were conducted on a PC with a 2.60 GHz Intel(R) Core(TM) i5-7200U CPU and 4 GB of memory. To establish the ground truth, i.e., to determine the correctness of each correspondence, we first used TAT to establish rough correspondences and then confirmed the results manually, checking both the preserved and the removed correspondences produced by TAT.
Parameter Setting: Our method has four main parameters: $K$, $\tau_1$, $\tau_2$, and $\lambda$. Parameter $K$ controls the sizes of the multi-scale neighborhoods, i.e., the numbers of neighborhood points. Parameter $\tau_1$ is the similarity threshold for the neighborhood similar triangles: when the similarity is lower than $\tau_1$, it is necessary to further verify whether the topological structure of the feature point pair conforms to an affine transformation model. Parameter $\tau_2$ is the threshold on the symmetric transfer error: a matching point incurs no penalty when the error is below $\tau_2$. Parameter $\lambda$ is the threshold distinguishing inliers from outliers: a matching point pair is judged correct when its cost $c_i$ is below $\lambda$. In our experiments, we set $K = \{4, 6, 8\}$, $\tau_1 = 0.6$, $\tau_2 = 10$, and $\lambda = 0.6$.
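Under these settings, an end-to-end run chaining the sketches from Section 3 could look as follows; the image file names are hypothetical placeholders.

```python
# An illustrative end-to-end run combining the earlier sketches
# (putative_matches and tat_inliers); the file names are placeholders.
import cv2
import numpy as np

img1 = cv2.imread("pair_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("pair_b.png", cv2.IMREAD_GRAYSCALE)
S = putative_matches(img1, img2)                 # step 1: putative set
X = np.array([x for x, _ in S])
Y = np.array([y for _, y in S])
inliers = tat_inliers(X, Y, Ks=(4, 6, 8), lam=0.6, tau1=0.6, tau2=10.0)
print(f"kept {len(inliers)} of {len(S)} putative matches")
```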

4.2. Datasets

In order to fully test the performance of our algorithm, we conducted experiments on four different types of remote-sensing-image datasets. These images have different transformation characteristics and different matching difficulties.
(1) UAV: This test dataset contains 80 pairs of 600 × 377 color images, which we captured with an unmanned aerial vehicle (UAV) over farmland. The feature-matching task for this type of image is usually automatic crop monitoring. Most of these images contain only grassland. Due to the diversity of crops and changes in the shooting angle of the UAV, these image pairs suffer from projection distortion and dissimilarities in the local field of view.
(2) PAN: This test dataset contains 76 pairs of images ranging in size from 600 × 700 to 700 × 700, all panchromatic (PAN) aerial photographs taken with a frame camera over Tokyo, Japan, and Wuhan, China, at different times. There are obvious differences in appearance and viewpoint between these image pairs, for instance around mountains and buildings. Such changes often appear in feature matching for change detection and cannot be modeled by simple rigid transformations, which greatly increases the difficulty of feature matching.
(3) SAR: This test dataset contains 53 pairs of 800 × 800 SAR images captured over Nantong, Jiangsu Province, China. Each SAR image pair was obtained by SAR sensors mounted on a satellite and on a UAV, respectively. Because the SAR imaging process introduces substantial noise, the image quality is low, and there is an affine transformation between the images, which makes feature matching very difficult. This type of matching task usually arises in positioning and navigation, i.e., matching UAV images in real time against satellite images stored in a database to obtain a precise current position.
(4) CIAP: This test dataset contains 92 pairs of 700 × 700 CIAP images with a small overlapping area at the edges. After orthorectification, these image pairs differ only by a simple rigid transformation. This type of matching task usually appears in image stitching. The images are public (example data from Erdas) and were taken in eastern Illinois, IL, USA.

4.3. Qualitative Results

Figure 7 shows the results of our method in feature-matching experiments on several pairs of representative images selected from the four remote-sensing datasets. For clearer visibility, we show only true positives, false positives, and false negatives in the results. The four rows, from top to bottom, correspond to the UAV, PAN, SAR, and CIAP datasets, with two image pairs per row. The UAV images in the first row were captured by drones in the air and exhibit projection distortion and dissimilarity in the local field of view.
The PAN images in the second row contain a large number of buildings, with obvious differences in appearance and large viewpoint changes; the SAR images in the third row, coming from different imaging sensors, show obvious noise effects; the CIAP images in the fourth row have a small overlapping area at the edges. The feature-matching problems presented by these four types of remote-sensing datasets are widespread and pressing in image-matching tasks.
In each pair of feature-matching results, the left part shows the feature correspondences established by our algorithm: blue represents true positives, green represents false positives, and red represents false negatives. The right part shows a vector cluster map of the feature correspondences, where black represents true negatives. We use the classic SIFT algorithm to construct the initial matching so that the putative set contains as many initial correspondences as possible; however, this matching strategy is simple and weakly constrained. The initial inlier ratios of the putative sets are therefore relatively low, which makes removing the mismatches and establishing accurate correspondences very challenging.
In particular, each SAR image pair is obtained by SAR sensors mounted on a satellite and on a UAV, respectively. This results in noisy, low-quality images and affine transformations between the image pairs, which makes our affine-consistency-based approach all the more necessary. By carefully observing each pair of SAR images, we find that the viewpoint changes noticeably due to the different altitudes of the UAV and the satellite, similar to what is shown in Figure 6 in Section 3.4. In this case, the relative directional consistency cannot be well maintained for some correct matching point pairs, and LPM erroneously eliminates them.
However, their surrounding affine relationships remain correct, and our TAT method preserves them, thus improving the matching accuracy. The experimental results show that our method finds almost all correct matches and removes false matches well. This demonstrates that our method handles these matching problems across the various image types of the remote-sensing field, with high accuracy and good generality.

4.4. Quantitative Results

As shown in Figure 8, the first row presents the initial inlier percentages of the four remote-sensing datasets, namely UAV, PAN, SAR, and CIAP. The second to fourth rows show, respectively, the precision, recall, and running time of several algorithms, including RANSAC [13], ICF [40], GS [47], MAGSAC++ [48], LPM [16], and our TAT, tested on these four datasets.
Overall, the first row shows that the initial inlier percentages of the UAV and PAN datasets are relatively high, although neither exceeds 30%. The initial inlier percentages of the SAR and CIAP datasets are relatively low, with averages of 12.43% and 15.12%, respectively. A lower initial inlier percentage means more mismatches, which makes the matching task more challenging and undermines the robustness of the four comparison methods.
From the precision and recall curves in the second and third rows, all methods perform worst on the SAR dataset: the SAR images come from different imaging sensors, and heavy noise degrades the image quality, which greatly increases the difficulty of matching. In contrast, all methods perform best on the PAN and CIAP datasets. Although these two types of images show distinct differences in the appearance of buildings and mountains and have small overlapping areas at the edges, the underlying transformations are fairly simple rigid ones, which helps the algorithms eliminate mismatches.
Regarding the characteristics of each algorithm, RANSAC and ICF perform poorly on the SAR and CIAP datasets, so they are significantly affected by the initial inlier percentage. Compared with the other methods, RANSAC exhibits the worst matching performance when the initial inlier percentage of an image pair is low, whereas ICF maintains a high recall. MAGSAC++ improves on RANSAC by eliminating the need to set the inlier threshold manually, which yields good performance on the CIAP dataset, where the inlier ratio is low but the transformations are simple and rigid.
However, its performance degrades in the presence of severe non-rigid transformations, such as in the UAV dataset; especially on SAR, where the noise is obvious and the image quality is low, its precision and recall are poor. Compared with RANSAC and ICF, GS shows better results when the initial inlier percentage is relatively low and maintains precision and recall as high as possible. However, GS is not affine-invariant and cannot automatically estimate the factors of the affinity matrix, so it is still unsatisfactory compared with LPM and TAT.
In general, LPM and TAT show the best matching results on the four datasets, mainly owing to the neighborhood-consistency constraints. It is worth mentioning that, because our TAT adds the strong constraints of the affine transformation model, it performs better in both precision and recall when individual image pairs undergo severe affine distortion and large viewpoint changes.
The last row shows the running times of the six algorithms on the datasets. The experimental results show that LPM and TAT are highly efficient, mainly because both formulate the feature-matching problem as a mathematical model and derive a closed-form solution with linear time and space complexity; the matching problem can thus be processed in milliseconds. Although our TAT is more time-consuming than LPM, it is an order-of-magnitude improvement over RANSAC, ICF, GS, and MAGSAC++ and can still satisfy real-time matching tasks.

4.5. Robustness Test

We conducted experiments to test the robustness of our TAT. We keep the number of true matches in the hypothetical set constant, change the inlier ratio by increasing the number of hypothetical matches, and then test the matching performance of the six methods separately. Specifically, for each feature point, we search for the $L$ most similar corresponding feature points with SIFT. With the initial one-to-one correspondences, there are $N$ hypothetical matches in the hypothetical set; when each feature point corresponds to $L$ candidates, there are $LN$ hypothetical matches.
We set different values of $L$ to obtain different mean inlier ratios on the four remote-sensing datasets. As shown in Table 1, the inlier ratio clearly decreases as $L$ increases. In this way, we can increase the difficulty of matching by increasing the number of false matches in the hypothetical set. On the other hand, searching for the $L$ most similar corresponding points may make it more likely that the correct match is found, since the correct match for a feature point is not necessarily its most similar candidate.
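The one-to-L putative sets of this test can be produced with the same matcher used earlier, simply raising k; this is a sketch, and the helper name is ours.

```python
# A sketch of building the one-to-L hypothetical set for the robustness
# test: each feature keeps its L most similar SIFT candidates.
import cv2

def one_to_L_matches(kp1, des1, kp2, des2, L=3):
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=L)
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt)
            for cands in knn for m in cands]       # roughly L*N matches
```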
The robustness experiment results are shown in Figure 9. The first row shows the mean and standard deviation of the inlier ratios of the four remote-sensing datasets. From the precision and recall shown in the second and third rows, respectively, the matching performance of all methods clearly deteriorates as the inlier ratio decreases. Specifically, the precision and recall of RANSAC drop significantly on all four datasets. The precision of ICF drops significantly, although its recall remains relatively high. GS consistently performs better than RANSAC and ICF. MAGSAC++ also maintains good results because it is less affected by the inlier ratio. In contrast, TAT maintains better and more stable matching performance than LPM and the other methods.

5. Conclusions

In this paper, we proposed a new and efficient method for removing mismatches for robust feature matching on remote-sensing images. The principle of our method is that, between two images of the same or a similar scene, the neighborhood topologies of corresponding feature points should be consistent, and affine constraints hold within local areas of the images. We formulated these two constraints as a mathematical model and derived a closed-form solution with linear time complexity.
We tested our method on four different types of remote-sensing-image datasets and obtained qualitative and quantitative feature-matching results. The results show that our method is general and can handle various remote-sensing image-matching tasks. Moreover, in terms of running time, our method processes image-matching tasks quickly and efficiently and can be used for real-time matching tasks.

Author Contributions

Methodology, Software, Data Curation, and Writing—Original Draft, X.G.; Formal analysis, F.Y.; Writing—Review and Editing, J.M., J.J. and T.L.; Writing—Review and Editing and Supervision, H.Z. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62171328, 62171327, 62072350, and 61773295; in part by the Key Scientific and Technological Research Project of the Hubei Provincial Education Department under Grant D20201507; and in part by the first batch of the application basic technology and science research foundation of the Hubei Nuclear Power Operation Engineering Technology Research Center under Grant B210610.

Data Availability Statement

The data used in this study can be requested by contacting the first author.

Acknowledgments

We sincerely thank the authors of RANSAC, ICF, GS, MAGSAC++, and LPM for providing their algorithm codes to facilitate the comparative experiments, and we thank the authors of reference [41] for collecting remote-sensing datasets for our experiments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ma, J.; Jiang, X.; Fan, A.; Jiang, J.; Yan, J. Image matching from handcrafted to deep features: A survey. Int. J. Comput. Vis. 2021, 129, 23–79.
2. Jiang, X.; Ma, J.; Xiao, G.; Shao, Z.; Guo, X. A review of multimodal image matching: Methods and applications. Inf. Fusion 2021, 73, 22–71.
3. Zitova, B.; Flusser, J. Image registration methods: A survey. Image Vis. Comput. 2003, 21, 977–1000.
4. Wong, A.; Clausi, D.A. ARRSI: Automatic registration of remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2007, 45, 1483–1493.
5. Ma, J.; Chen, C.; Li, C.; Huang, J. Infrared and visible image fusion via gradient transfer and total variation minimization. Inf. Fusion 2016, 31, 100–109.
6. Jiang, J.; Ma, J.; Chen, C.; Wang, Z.; Cai, Z.; Wang, L. SuperPCA: A superpixelwise PCA approach for unsupervised feature extraction of hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4581–4593.
7. Bang, S.; Kim, H.; Kim, H. UAV-based automatic generation of high-resolution panorama at a construction site with a focus on preprocessing for image stitching. Autom. Constr. 2017, 84, 70–80.
8. Yang, T.; Li, J.; Yu, J.; Wang, S.; Zhang, Y. Diverse scene stitching from a large-scale aerial video dataset. Remote Sens. 2015, 7, 6932–6949.
9. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
10. Bay, H.; Tuytelaars, T.; Gool, L.V. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
11. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
12. Wang, C.; Wang, L.; Liu, L. Progressive mode-seeking on graphs for sparse feature matching. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 788–802.
13. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395.
14. Ma, J.; Zhao, J.; Tian, J.; Bai, X.; Tu, Z. Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit. 2013, 46, 3519–3532.
15. He, Z.; Shen, C.; Wang, Q.; Zhao, X.; Jiang, H. Mismatching Removal for Feature-Point Matching Based on Triangular Topology Probability Sampling Consensus. Remote Sens. 2022, 14, 706.
16. Ma, J.; Zhao, J.; Jiang, J.; Zhou, H.; Guo, X. Locality preserving matching. Int. J. Comput. Vis. 2019, 127, 512–531.
17. Lin, H.; Du, P.; Zhao, W.; Zhang, L.; Sun, H. Image registration based on corner detection and affine transformation. In Proceedings of the 2010 Third International Congress on Image and Signal Processing, Yantai, China, 16–18 October 2010; pp. 2184–2188.
18. Liu, Y.; De Dominicis, L.; Wei, B.; Chen, L.; Martin, R.R. Regularization based iterative point match weighting for accurate rigid transformation estimation. IEEE Trans. Vis. Comput. Graph. 2015, 21, 1058–1071.
19. Wang, G.; Wang, Z.; Chen, Y.; Zhao, W. A robust non-rigid point set registration method based on asymmetric Gaussian representation. Comput. Vis. Image Underst. 2015, 141, 67–80.
20. Guo, X.; Cao, X. Good match exploration using triangle constraint. Pattern Recognit. Lett. 2012, 33, 872–881.
21. Yang, Y.; Ong, S.H.; Foong, K.W.C. A robust global and local mixture distance based non-rigid point set registration. Pattern Recognit. 2015, 48, 156–173.
22. Gao, Y.; Ma, J.; Yuille, A.L. Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples. IEEE Trans. Image Process. 2017, 26, 2545–2560.
23. Ma, J.; Jiang, J.; Liu, C.; Li, Y. Feature guided Gaussian mixture model with semi-supervised EM and local geometric constraint for retinal image registration. Inf. Sci. 2017, 417, 128–142.
24. Wang, G.; Wang, Z.; Chen, Y.; Zhou, Q.; Zhao, W. Removing mismatches for retinal image registration via multi-attribute-driven regularized mixture model. Inf. Sci. 2016, 372, 492–504.
25. Yang, K.; Pan, A.; Yang, Y.; Zhang, S.; Ong, S.H.; Tang, H. Remote-sensing image registration using multiple image features. Remote Sens. 2017, 9, 581.
26. Wei, Z.; Han, Y.; Li, M.; Yang, K.; Yang, Y.; Luo, Y.; Ong, S.H. A small UAV based multi-temporal image registration for dynamic agricultural terrace monitoring. Remote Sens. 2017, 9, 904.
27. Fan, Z.; Liu, Y.; Liu, Y.; Zhang, L.; Zhang, J.; Sun, Y.; Ai, H. 3MRS: An Effective Coarse-to-Fine Matching Method for Multimodal Remote Sensing Imagery. Remote Sens. 2022, 14, 478.
28. Ma, J.; Ma, Y.; Li, C. Infrared and visible image fusion methods and applications: A survey. Inf. Fusion 2019, 45, 153–178.
29. Oliveira, F.P.; Tavares, J.M.R. Medical image registration: A review. Comput. Methods Biomech. Biomed. Eng. 2014, 17, 73–93.
30. Zhou, H.; Ma, J.; Tan, C.C.; Zhang, Y.; Ling, H. Cross-weather image alignment via latent generative model with intensity consistency. IEEE Trans. Image Process. 2020, 29, 5216–5228.
31. Le Moigne, J.; Campbell, W.J.; Cromp, R.F. An automated parallel image registration technique based on the correlation of wavelet features. IEEE Trans. Geosci. Remote Sens. 2002, 40, 1849–1864.
32. Reddy, B.S.; Chatterji, B.N. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Process. 1996, 5, 1266–1271.
33. Rangarajan, A.; Chui, H.; Duncan, J.S. Rigid point feature registration using mutual information. Med. Image Anal. 1999, 3, 425–440.
34. Loiola, E.M.; de Abreu, N.M.M.; Boaventura-Netto, P.O.; Hahn, P.; Querido, T. A survey for the quadratic assignment problem. Eur. J. Oper. Res. 2007, 176, 657–690.
35. Sofka, M.; Yang, G.; Stewart, C.V. Simultaneous covariance driven correspondence (CDC) and transformation estimation in the expectation maximization framework. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 18–23 June 2007; pp. 1–8.
36. Maier, J.; Humenberger, M.; Murschitz, M.; Zendel, O.; Vincze, M. Guided matching based on statistical optical flow for fast and robust correspondence analysis. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 101–117.
37. Torr, P.H.; Zisserman, A. MLESAC: A new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 2000, 78, 138–156.
38. Chum, O.; Matas, J. Matching with PROSAC-progressive sample consensus. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 220–226.
39. Ma, J.; Zhao, J.; Tian, J.; Yuille, A.L.; Tu, Z. Robust point matching via vector field consensus. IEEE Trans. Image Process. 2014, 23, 1706–1721.
40. Li, X.; Hu, Z. Rejecting mismatches by correspondence function. Int. J. Comput. Vis. 2010, 89, 1–17.
41. Ma, J.; Jiang, J.; Zhou, H.; Zhao, J.; Guo, X. Guided locality preserving feature matching for remote-sensing image registration. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4435–4447.
42. Wu, T.; Ge, Y.; Wang, J.; Stein, A.; Song, Y.; Du, Y.; Ma, J. A WTLS-based method for remote-sensing imagery registration. IEEE Trans. Geosci. Remote Sens. 2014, 53, 102–116.
43. Ma, J.; Zhou, H.; Zhao, J.; Gao, Y.; Jiang, J.; Tian, J. Robust feature matching for remote-sensing image registration via locally linear transforming. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6469–6481.
44. Wang, G.; Wang, Z.; Chen, Y.; Liu, X.; Ren, Y.; Peng, L. Learning coherent vector fields for robust point matching under manifold regularization. Neurocomputing 2016, 216, 393–401.
45. Zhou, H.; Ma, J.; Yang, C.; Sun, S.; Liu, R.; Zhao, J. Nonrigid feature matching for remote-sensing images via probabilistic inference with global and local regularizations. IEEE Geosci. Remote Sens. Lett. 2016, 13, 374–378.
46. Ma, J.; Ye, X.; Zhou, H.; Mei, X.; Fan, F. Loop-closure detection using local relative orientation matching. IEEE Trans. Intell. Transp. Syst. 2021.
47. Liu, H.; Yan, S. Common visual pattern discovery via spatially coherent correspondences. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 1609–1616.
48. Barath, D.; Noskova, J.; Ivashechkin, M.; Matas, J. MAGSAC++, a fast, reliable and accurate robust estimator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1304–1312.
Figure 1. Different neighborhood topological constructions shown for an inlier (above) and an outlier (below). Blue: true match; red: false match; yellow: neighborhood points; and green: topological construction.
Figure 2. The neighborhood triangles formed by x_i and y_i and their neighborhood points. The same color indicates corresponding similar triangles.
Figure 3. Schematic diagram of affine transformation. An affine transformation of an image preserves straightness, parallelism, and relative position. Affine transformations include rotation, shearing, translation, and scaling.
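For concreteness, the affine transformation illustrated in Figure 3 can be written in the standard planar form below, where the linear part A captures rotation, scaling, and shearing, and t is the translation:

```latex
% Planar affine transformation: straight lines and parallelism are preserved.
\begin{equation*}
\begin{pmatrix} x' \\ y' \end{pmatrix}
= A \begin{pmatrix} x \\ y \end{pmatrix} + t
= \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
  \begin{pmatrix} x \\ y \end{pmatrix}
+ \begin{pmatrix} t_x \\ t_y \end{pmatrix}
\end{equation*}
```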
Figure 4. Comparison I of LPM and TAT: eliminating false matching point pairs. Initial matching result of the SIFT algorithm (row 1, left); matching result of the LPM algorithm (row 2, left); matching result of the TAT algorithm (row 3, left); vector cluster maps of the matching results (rows 1 to 3, right); schematic diagram of the displacement vectors of the LPM algorithm (row 4, left); and schematic diagram of the affine model constraint of the TAT algorithm (row 4, right).
Figure 5. Schematic diagram of our affine transformation constraint. The affine transformation model T constructed from the neighborhood points of a correctly matched point yields a small or zero symmetric transfer error d (above). Conversely, a mismatched point yields a large symmetric transfer error d (below).
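A minimal sketch of the symmetric transfer error d from Figure 5, assuming the local affine model T(p) = Ap + t has already been estimated from the neighborhood (the function and variable names are ours, for illustration only):

```python
import numpy as np

def symmetric_transfer_error(x, y, A, t):
    """Symmetric transfer error d for a putative match (x, y) under a local
    affine model T(p) = A @ p + t (A is assumed invertible): the forward
    residual in the second image plus the backward residual in the first."""
    forward = A @ x + t                   # T(x), predicted position of y
    backward = np.linalg.solve(A, y - t)  # T^{-1}(y), predicted position of x
    return np.linalg.norm(forward - y) + np.linalg.norm(backward - x)
```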
Figure 6. Comparison II of LPM and TAT: keeping correct matching point pairs. Initial matching result of the SIFT algorithm (row 1, left); matching result of the LPM algorithm (row 2, left); matching result of the TAT algorithm (row 3, left); vector cluster maps of the matching results (rows 1 to 3, right); schematic diagram of the displacement vectors of the LPM algorithm (row 4, left); and schematic diagram of the affine model constraint of the TAT algorithm (row 4, right).
Figure 7. Qualitative matching results of our TAT on typical image pairs from the four test datasets: UAV (first row), PAN (second row), SAR (third row), and CIAP (fourth row). The initial inlier percentages on the eight image pairs (from top to bottom, left to right) are 30.12%, 26.73%, 27.65%, 26.41%, 10.57%, 13.19%, 15.24%, and 11.64%. The head and tail of each arrow in the motion field correspond to the positions of the corresponding feature points in the two images (blue = true positive, black = true negative, green = false negative, and red = false positive). For visibility, at most 150 randomly selected matches are presented in each image pair, and the true negatives are not shown. Best viewed in color.
Figure 8. Quantitative comparisons of RANSAC [13], ICF [40], GS [47], MAGSAC++ [48], LPM [16], and TAT on the four test datasets: UAV, PAN, SAR, and CIAP. (Top to Bottom) Initial inlier ratio, precision, recall, and run time with respect to the cumulative distribution.
Figure 9. Robustness test of RANSAC [13], ICF [40], GS [47], MAGSAC++ [48], LPM [16], and TAT on the four test datasets: UAV, PAN, SAR, and CIAP. (Top to Bottom) Initial inlier ratio, precision, and recall with respect to different values of L, i.e., the number of nearest neighbors used for putative set construction. Error bars: means and standard deviations on the corresponding dataset.
Table 1. Average median inlier percentage of the putative sets for different values of L on the four datasets.

L        1         2         3         4         5
UAV      27.39%    13.12%    8.24%     6.87%     5.38%
PAN      29.46%    15.31%    10.12%    8.03%     6.13%
SAR      12.43%    6.08%     4.15%     3.21%     2.92%
CIAP     15.12%    8.11%     5.14%     4.22%     3.31%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
