1. Introduction
Remote sensing refers to the detection of targets through various sensors mounted on different aircraft, and the obtained remote sensing image data often carry a large amount of information, requiring the use of different efficient processing techniques for practical purposes [
1,
2]. Image matching is an important technical foundation in remote sensing image processing, and the effectiveness of registration often directly affects the subsequent image processing results [
3,
4,
5,
6].
The work related to image matching can be traced back to P.E. Anuta’s work on cross correlation image detection [
3] and Brown’s summary of previous work [
4], which has since produced numerous influential works. For image pairs with significant deformation, Flusher proposed an adaptive mapping method that segments the image into multiple sub-blocks, calculates the similarity between the sub-blocks, matches them using the similarity of the sub-blocks, and registers the original image using the coordinate relationship between the sub-blocks [
7]. Studholme proposed a normalized mutual information method that overcomes the sensitivity of traditional mutual information methods to the size of overlapping parts between images and achieves good matching results [
8]. David Lowe proposed the classic Scale Invariant Feature Transform (SIFT) by combining the scale space pyramid and Gaussian kernel to improve the features invariance and detection rate of feature points [
9,
10]. The SIFT feature maintains the same changes in image grayscale, angle, scale, etc., and performs well in natural image matching. Some techniques based on SIFT have been developed over the past years. Ke and Sukthankar introduced principal component analysis to replace the weighted histograms and reduce the dimension of features to 20 [
11]. Speeded-up robust features (SURFs) were presented based on a Hessian matrix and image convolutions to faster the computing [
12]. Dellinger et al. proposed an improved SIFT registration algorithm for Synthetic Aperture Radar (SAR) images called SAR-SIFT [
13]. This algorithm uses a new gradient calculation method to effectively suppress the noise impact of SAR images. Alcantarilla et al. proposed a novel multiscale 2D feature detection and description algorithm in nonlinear scale spaces, named KAZE [
14,
15]. The results revealed a step forward in performance both in detection and description compared to some traditional methods. Gong et al. utilized two matching strategies, feature matching and regional matching, and then proposed a rough-to-accurate registration method, which first uses the SIFT algorithm to coarsely register the image and then uses mutual information methods to evaluate the initial matching points and find accurate matching point pairs [
16]. Wu et al. improved the Random Sample Consistency (RANSAC) algorithm and proposed using a fast iterative strategy to filter feature point pairs to obtain more accurate transformation parameters in order to achieve the accurate matching of remote sensing images [
17]. In order to effectively solve the problem of significant changes in image grayscale leading to the failure of traditional registration algorithms, Ma et al. proposed a Position Scale and Orientation Scale Invariant Feature Transform (PSO-SIFT) algorithm based on the SIFT algorithm, which uses a new gradient definition to calculate feature point descriptors [
6]. Accurate matching between remote sensing image pairs with significant grayscale changes has been achieved. Wu et al. proposed the Particle Swarm Optimization Sample Consistency (PSOSAC) algorithm, which uses particle swarm optimization to optimize the model parameters solved using RANSAC based on the RANSAC algorithm to obtain more accurate matching results [
18]. There are some other techniques that have been also introduced to achieve better performances [
19,
20,
21,
22,
23,
24].
Most of these researches are based on a Gaussian Scale Space and have shown success in many problems. However, it is still to be further explored in more complex scenes, such as heterogeneous image matching, multi-modal image matching, and medical image matching [
25,
26,
27]. A remote sensing image is much different from a nature image, not only in acquisition and transport but also in information presentation and vision characteristics [
1,
28,
29]. It is necessary to develop more suitable and efficient feature describing methods for remote sensing image matching. Scale selection techniques can be found in various image processing tasks. Olson presented a method where the scale of the filtering and feature detection is varied locally according to the distance to the scene pixel estimated through stereoscopy [
30]. Energy models inspired from Markov random fields and segmentation quality measures can be introduced to determine scales for different purposes [
31,
32]. PDE-constrained optimization has generated a great deal of attention in many fields [
33,
34]. In related problems, a certain functional is aimed to be optimized with one or more PDEs serving as constraints.
This paper aimed to present a novel scale optimization selection technique for local feature describing in remote sensing image matching. A nonlinear partial differential equation is configured with the noise estimation results to generate a continuous nonlinear scale space. Then, a PDE-constrained model is proposed to determine the optimized scale levels. It can be numerically solved based on Additive Operator Splitting (AOS) and subset selection. After some calculation steps similar to those of KAZE and SIFT are performed on the scale space, the feature points can be found for remote sensing image matching.
The rest of this paper is arranged as follows. In
Section 2, a linear scale space and nonlinear scale space are introduced. In
Section 3, an evolution partial differential equation configured with noise estimation is applied to describe a nonlinear scale space. And then, a PDE-constrained model is presented to optimally select the scale levels. In
Section 4, the several experiments implemented in this study to verify the accuracy and efficiency of proposed algorithm are discussed.
2. Fundamentals and Basis
2.1. Linear Scale Space
Scale space theory has been introduced in many models, and it can help to distinguish the features at different levels. After a scale parameter is added to the related model, a multi-scale representation sequence can be computed following a change in the parameter. Many tasks can be achieved by analyzing or processing the sequence, such as extracting features.
The scale space method can simulate the process of people observing objects from near to far, and the continuous change in scale can effectively describe the essential features of the image. The Scale Invariant Feature Transform (SIFT) uses a Gaussian pyramid to represent the linear scale space, which means to convolve a serial of Gaussian kernel functions with the original image function to obtain smoothed images at different scales [
10,
35]. It can be formulated as
Here,
t denotes the scale parameter, and
x denotes the pixel position.
denotes the image domain, and
denotes the time interval.
is the original image, and
means the related linear scale space. The value of t represents the degree of smoothness of the original image. The larger the value, the higher the degree of smoothness, and then, different scale image features can be obtained through some specific steps, which are the following: calculating the Difference of Gaussian (DoG), searching for extreme points, determining the key points (interesting points) and the main direction, and generating feature descriptors.
The discrete difference of Gaussian space can be denoted as
Here,
can be preset or determined according to the image data. The first two rows in
Figure 1 show different Gaussian kernels and the corresponding smoothed images. The last row shows the difference between two neighbor images. With the scale parameter
t increasing, the original image will be gradually smoothed, and the feature information will be redundant. A downsampling technique is adopted in SIFT to obtain a more compact version (small-sized) for later calculations.
Based on the scale space and the difference space, the original SIFT can be summarized as two main steps (see [
10,
36]).
2.1.1. Calculating the DoG and Determining the Key Points
Here, H means the Hessian of the DoG, and denotes a set of extreme points found in some basis subareas of the image domain. For example, a subarea can be set to a circle-centered pixel x with radius d, that is, . The second derivative of the DoG is actually the Hessian in a 3D space (spatial 2D and scale 1D), and it is different from the Hessian in the 2D spatial space. ∧ denotes the Boolean operator AND. denotes the trace operator. r is a threshold used to determine good key points.
2.1.2. Orientation Assignment and Generating Feature Descriptors
Here, means the preset eight direction angles. is a subdomain of , which is the neighborhood centered keypoint x. denotes the rotate matrix with angle . ∇ denotes the gradient operator. The image is divided into sub-regions, and eight direction statistics are calculated on each corresponding gradient field. Then, the statistics on 16 sub-regions are collected to form a feature vector with dimensions .
2.2. Nonlinear Scale Space
Stable feature points at different scales can be extracted on the previously established linear scale space. However, while smoothing noise, Gaussian convolution also smooths image details and edges, affecting the accuracy and uniqueness of feature points. Some methods have been introduced to improve this defect in the past years and achieved good results in applications [
11,
14,
19,
37,
38]. The KAZE features are extracted from a nonlinear scale space that is generated using a related partial differential equation. And it has been validated to have a comparable efficiency on some problems in image denoising and restoration. In general, a nonlinear scale space can be denoted as the solution of a nonlinear partial differential equation:
Here,
means the diffusion velocity, and the equation will degenerate into a linear version if
c degenerates into a constant. Unlike the linear scale space, there is no explicit formulation of the nonlinear scale space but an implicit formulation described using the above Equation (
5).
It is clear that the continuous nonlinear scale space is determined using the diffusion velocity and evolution time. There are lots of different configurations for them. For the velocity, there are several useful formulations, such as the following [
14]:
As there is no analytical solution for Equation (
5), it is necessary to introduce some numerical technique such as Additive Operator Splitting (AOS) to approximate the solution at a different scale
t. The discretization of the equation can be denoted as
and an efficient semi-implicit scheme can be obtained as
Here, is a matrix that encodes the image conductivities for each dimension, and this scheme is advantageous for large time steps. After taking some steps similar to those in SIFT, the image features can be generated.
Figure 2 shows three nonlinear scale spaces based on the velocity
with different values of
k. It can be found that the total variation decreased at different rates. And if such a space is divided according to a fixed mode (such as uniformly or proportionally in the scale parameter), the differences between two adjacent parts are various and difficult to be balanced. It is not advantageous to detect the image feature points and achieve a satisfactory performance in image matching.
3. Proposed Method
It can be found that the continuous nonlinear scale space is absolutely determined through the velocity and evolution time. Different diffusion velocities are very different in smoothing or preserving the potential features, so a proper velocity should be selected for each image. Of course, the noise level is also an important factor affecting the determination. The number of scale levels is another important parameter affecting the feature point detection. It is easy to know that if more scale levels are set, a higher computation cost is required though less feature points may be lost. Even if a proper number of levels is set, it is still difficult to determine which levels should be set. Though some experience and recommend selection methods can be taken from previous works, we still tried to present an optimization model to solve these difficulties.
The surroundings in remote sensing images is a complex variable, and the signal is susceptible to being polluted. There are also many changes in the type and intensity of image features; a flexible pattern of an alternating scale space is advantageous for the detection of more characteristics, which benefits later applications.
It is important to know some information about noise and feature intensity before the parameters of the scale space are determined. Here, we introduce a filter to compute a low-density part of the image [
39]:
Then, the noise mean, noise variance, and Signal-to-Noise Ratio (SNR) can be estimated by
Based on these, a new parameter configuration is formulated as (
11), for Equation (
5) to generate a continuous nonlinear scale space:
Here, means a reference SNR value that can be preset using some prior knowledge of the image data.
However, it is still difficult to determine which scale levels should be selected to compute the feature descriptor for the interesting points. In this paper, it is assumed that the interesting points are uniformly distributed in the scale space, so we tried to divide the scale space into several parts in which the changed amount of some metric is equal. Then, a related PDE-constrained optimization problem can be expressed as
Here, denotes the image energy needed to measure the variant of . The parameters and are preset. The initial condition means that the scale space should be generated from the observed image . div denotes the divergence operator. This model tries to determine N scale levels that uniformly divide the scale space .
To find the optimal solution, the continuous scale space
must be solved from Equation (13), which requires a fair amount of calculation. To reduce the cost for solving the whole scale space, a discrete version is introduced to approximate the scale space, and then the model (
12) can be rewritten as
where
denotes a small enough time step, and (15) means to solve Equation (13) solved using the AOS scheme.
Attributed to the reference step size , the original optimized scale levels are approximated by . The variables to be solved are changed from continuous type to integer type, and then the cost will be greatly reduced.
However, it is still not easy to properly choose N levels from the level set with about elements, and this is similar to a subset selection problem to some extent. To efficiently solve such a problem, an iterative approximation is introduced to the proposed algorithm, Algorithm 1.
After setting the initial image
, maximal scale level
T, maximal allowed error
, maximal iteration number
, initial solution
and then applying Algorithm 1, optimal scale levels
will be determined, and the nonlinear scale space can be uniformly divided into several parts in terms of total variation. As shown in
Figure 3, the nonlinear scale space of a remote sensing image is divided by five optimal scale levels, and the differences between two adjacent scales are almost uniform.
Algorithm 1 Optimal Scale Iterative Approximation |
Require: , T, , , , initial solution . Ensure: Optimal solution - 1:
for to N do - 2:
; - 3:
end for - 4:
; - 5:
while and do - 6:
for to do - 7:
; - 8:
; - 9:
end for - 10:
, ; - 11:
for to N do - 12:
; - 13:
end for - 14:
end while
|
4. Experiments and Results
In this study, we arranged several experiments to illustrate the performance and advantages of the proposed method. In real-world remote sensing, the same scene is often observed from different positions/viewpoint withs different surroundings. Subsequently, it may be more difficult to detect the feature points and search the matches. Here, we focused on remote sensing images collected with optical sensors. For a comprehensive comparison, the number of feature points was used to evaluate the cost. The match number and correct number are used to evaluate the efficiency. We also present the time cost of the comparison methods.
In the first experiment, we prepared one remote sensing images and generated four pairs for the test. One of each pair of images was the original image, and the other was generated by rotating and re-scaling the original image. The detected feature points and matching results are shown in
Figure 4 and
Figure 5. The detailed data (including time cost) can be found in
Table 1 and
Table 2.
For the first column of
Figure 4, the rotation angle was
, and the scaling factor was
. For the next three columns, the scaling factor was fixed, and the rotation angle was set to be
,
, and
, respectively. The rows indicate different methods: SIFT, SURF, BRISK, KAZE, and the proposed method.
It can be found that SIFT finds the most feature points but few matches. Surf finds fewer feature points but obtains more matches than SIFT ol average. BRISK finds the least matches, but there is no error. KAZE can find more correct matches than the three previously mentioned methods but less than the proposed method. In general, because of the optimized scale space, the proposed method achieves the best performance with cost and efficiency being well balanced.
In the second experiment, we tried to test the performances of the different methods under a scaling scene. The rotation angle was fixed to be
. The original image was scaled using different factor separately to generate four image pairs. The detected feature points and matching results are shown in
Figure 6 and
Figure 7. The detailed data (including time cost) can be found in
Table 3 and
Table 4.
The columns correspond to different scaling factors (, , , ), and the rows correspond to different methods (SIFT, SURF, BRISK, KAZE, proposed). It can be found that SIFT finds more feature points but fewer matches. Surf finds more matches than SIFT but shows less accuracy, especially when the scaling factor is smaller. BRISK finds the second most feature points but is not advantageous in terms of the number of correct matches. KAZE and the proposed method still find the most correct matches, but KAZE holds no obvious advantage in the match number compared with the other three methods. Our method finds not as many feature points but the most correct matches.
In the third experiment, the rotation angle was fixed to be
, and the scale factor was fixed to be
. Gaussian noise was added to the original image to prepare one image of the image pair, and then, the polluted image was rotated and scaled to generate the other image of the pair. The detected feature points and matching results are shown in
Figure 8 and
Figure 9. The detailed data (including time cost) can be found in
Table 5 and
Table 6.
The columns correspond to different noise intensities (SNR = 20 db, 18 db, 16 db, 14 db), and the rows correspond to different methods (SIFT, SURF, BRISK, KAZE, proposed).
As shown in
Figure 9, SIFT and BRISK find the most key points in each pair, but there is no advantage in finding correct matches. SURF finds the least key points, but there is no obvious disadvantage in finding correct matches. KAZE finds a few more key points than SURF but achieves a similar correct number of matches. The proposed method finds key points a little more than KAZE but achieves the largest correct number of matches.
In the fourth experiment, the rotation angle is fixed to be
and the scale factor is fixed to be
. Four optical remote sensing images are prepared to be scaled and rotated. The feature detection and matching identify are performed on four image pairs. The results are shown in
Figure 10 and
Figure 11,
Table 7 and
Table 8.
Similarly, the columns are corresponding to different remote sensing image (rs1, rs2, rs3, rs4) and rows are related to different methods (SIFT, SURF, BRISK, KAZE, proposed).
As shown in
Figure 11, SIFT finds the most feature points in each pair but no more matches than other methods. BRISK finds the second most feature points and achieve the least correct match number in average. SURF finds more feature points and correct matches than KAZE. The proposed method finds key points a little more than KAZE and achieves the biggest correct number.
From above experiments, the proposed method shows the significantly best comprehensive performance exceed other four.
5. Conclusions
In this paper, a constrained optimization model is introduce to determine some scale slices of a nonlinear scale space for remote sensing image feature detection and matching. With noise estimation introduced to determine the parameter values, a partial differential equation is applied to generate a continuous nonlinear scale space advantageous to represent the important features. The continuous scale space is approximated by a discrete one by Additive Operator Splitting solving the PDE. Then an discrete model is present to fast approach the optimal scale levels for feature detection and image matching. Four experiments are arranged to test the performance of proposed method from views of rotation, scaling, noise and the mixed version. The results show the accuracy and efficiency of the propose method. About 30% improvement in correct matches number with only a small increase in time cost can be found in the proposed method results against the others. In the future work, more efforts will be made to improve the proposed method and develop more efficient and robust method for complex scenes.