1. Introduction
Image registration is an important task in image processing, with applications such as automatic change detection, data fusion, motion tracking, and image mosaicking. When images of the same location taken on different dates or from different viewpoints need to be compared, they must be geometrically registered to one another. In the registration process, the image to be registered, called the sensed image, is aligned to a reference image. Accurate registration allows the detection and monitoring of changes between images that might otherwise be obscured by subpixel misregistration.
In remote sensing applications, the registration process is most often carried out manually, which is not feasible for large numbers of images. Hence, automated techniques are needed that require little or no operator supervision. Automatic image registration is a classical problem and has been an active research topic [1,2,3,4,5,6,7,8]. Although considerable research has been conducted, significant challenges remain. There is, therefore, a critical need for an accurate, robust, and efficient image registration algorithm that requires minimal or no operator supervision to register multitemporal remote sensing images.
A review of the most common image registration techniques was conducted in [9,10,11,12]. The existing methods can be broadly divided into two major categories: area-based and feature-based techniques. Several studies have examined improvements in accuracy and reliability and means of mitigating the limitations of both groups of registration techniques.
Feature-based techniques involve two stages. First, features are extracted from the reference and sensed images. Second, corresponding points are matched to obtain correct matches and the optimal transformation model between the image pair. The feature space includes corners, edges, and regions from the reference and sensed images. Because corners are invariant to imaging geometry, they can achieve more precise results [12].
Point feature methods first detect a set of candidate feature points by applying an interest operator to both images [2,5,6,7,8,9,10,11,12,13,14,15]. In [16,17,18,19], the authors review different feature detectors and suggest combining various detectors to obtain the best performance. Comparisons between detectors have shown that performance declines with increasing viewpoint-change effects. Although repeatability is the most commonly employed measure for assessing the accuracy of extracted interest points, their spatial distribution plays a crucial role in registration: the points should be distributed as widely as possible throughout the entire image to ensure precise registration [16]. The most popular algorithm is the Harris corner detector [20], which remains a reference technique. The success of the Harris detector resides in its efficiency at detecting stable and repeatable interest points under various image deformations [21,22]. Once feature points have been detected, feature matching is performed by comparing local descriptors such as the scale-invariant feature transform (SIFT) or speeded-up robust features (SURF). The SIFT descriptor was developed by Lowe [23] to achieve feature invariance to image scaling, rotation, and illumination variations. The SIFT operator can find optimal point correspondences between images despite errors arising from small geometric distortions [24]. However, registration based on the SIFT algorithm suffers from low precision and slow speed for remote sensing images [25,26].
By comparison, area-based techniques compute a measure of similarity from raw pixel values over a small template in the reference image and compare this with a similar region in the sensed image. A review of the related work indicates that classical algorithms are highly sensitive to changes in pixel intensity introduced by local distortions, illumination differences, shadows, and changes in viewing angle, in addition to the use of images from different sensors. Typical similarity measures are cross-correlation, with or without prefiltering, and Fourier invariance properties [27,28], such as phase correlation (PC) used as a similarity measure [29]. Arya proposed a normalized correlation algorithm based on M-estimators that reduces the effect of noisy pixels [30]. In [31], M-estimators in intensity correlation were found to perform better than the mutual information (MI) measure for medical images. Georgescu and Meer suggested implementing a gradient descent algorithm using M-estimators and radiometric correction [32]. Eastman [33] surveyed several image registration schemes in major satellite ground systems; registration was performed on a database of carefully selected control point image chips, and normalized correlation was used in local regions after topographic correction and cloud masking.
Researchers have begun to attach importance to the application of phase correlation in image registration [3,34,35]. Leprince et al. [36] built a workflow for automated correction, called COSI-Corr, based on phase correlation. Scheffler et al. [37] proposed a phase-correlation-based approach called AROSICS that uses cloud masking. Fourier-based registration in the frequency domain can be adapted to handle images that are translated, rotated, and scaled with respect to each other [38,39,40]. Chen et al. [39] proposed a matching algorithm based on both the shift-invariant Fourier transform and the scale-invariant Mellin transform to match images that are rotated and scaled. Abdelfattah and Nicolas [41] demonstrated improved precision of the invariant Fourier–Mellin transform (FMT) over standard correlation for the registration of interferometric synthetic aperture radar images.
Among the many previously proposed approaches, increased relief displacement, the requirement for more control points, and increased data volume remain challenges for the subpixel registration of VHR remote sensing images taken from different viewpoints. There is thus a need for accurate, robust, and automatic subpixel correlation algorithms that produce a high number of correct matches with homogeneous geometric distribution and have low sensitivity to noise, outliers, and occlusions. To achieve these objectives, an appropriate method should be selected for each of the following four components [42]: (1) a feature space, (2) a search space, (3) a search strategy, and (4) a similarity metric.
Here, we propose a hybrid algorithm that combines Harris corner detection and Fourier phase correlation for coregistering very high-resolution remote sensing images. This approach uses phase correlation to reduce sensitivity to noise and Harris corners to improve efficiency [43]. The proposed method comprises three major steps: (1) feature extraction; (2) Fourier phase matching; and (3) phase-matching optimization. First, corners are generated by the Harris corner detector; these provide useful reference points but still contain subpixel shifts. We then refine the corner locations to subpixel accuracy using optimization techniques. The proposed phase correlation approach is formulated as an optimization problem that minimizes a cost function.
The proposed method is presented in Section 3, followed by the experimental results and discussion in Section 4. The conclusions are presented in Section 5.
3. Materials and Methods
Our coregistration approach can be divided into four major steps. In the first step, corners are extracted from a set of reference and sensed images. The corresponding points are then identified on the sensed and reference images using phase correlation. After obtaining the corresponding points, accurate subpixel measurement is performed by minimizing a cost function. Following point-wise correspondence, outliers must be addressed prior to computing the spatial transformation model in order to achieve more accurate registration. All experiments were performed in MATLAB on a desktop with a 64-bit Windows 10 OS, a 2.30 GHz Intel i5-6200U CPU, and 12 GB of memory.
3.1. Harris Corners Extraction
The fundamental step in image registration is choosing which features to use for robust matching in order to determine the optimal transformation between the two images [11]. To achieve reliable matching, corner points are chosen as matching features because they are among the most stable features in an image; they are invariant to scale, rotation, and viewpoint changes [42]. Because computing the optimal transformation parameters depends on these features, a sufficient number of invariant features, evenly distributed across the image, are required.
The proposed approach detects interest points using a suitable corner detector. Compared with earlier detectors, the Harris corner detector has been shown to provide finer and more stable results [65,66]. Here, we detect accurate corners using the Harris detector to significantly enhance registration performance. Moreover, since precise homologous feature points with a homogeneous spatial distribution over the images are important for reliable determination of the spatial transformation model parameters [67], our approach first subdivides the image into nine subregions and then extracts the Harris corner points from each subregion. This process ensures a uniform distribution of points throughout the image and avoids out-of-memory errors.
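The per-subregion extraction described above can be sketched as follows. This is a minimal Python illustration (the paper's experiments were run in MATLAB); the function names and the Sobel/Gaussian implementation of the Harris response are illustrative, not the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img, sigma=1.0, k=0.04):
    """Harris corner response: det(M) - k * trace(M)^2 of the
    smoothed structure tensor M built from image gradients."""
    img = img.astype(float)
    Ix = sobel(img, axis=1)
    Iy = sobel(img, axis=0)
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

def corners_by_subregion(img, grid=(3, 3), per_region=50):
    """Detect Harris corners separately in each subregion of a
    grid (3 x 3 = nine subregions in the paper) so that the
    points are spread evenly across the whole image."""
    h, w = img.shape
    pts = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            y0, y1 = i * h // grid[0], (i + 1) * h // grid[0]
            x0, x1 = j * w // grid[1], (j + 1) * w // grid[1]
            r = harris_response(img[y0:y1, x0:x1])
            # keep the strongest responses within this subregion
            idx = np.argsort(r, axis=None)[::-1][:per_region]
            ys, xs = np.unravel_index(idx, r.shape)
            pts.extend((x + x0, y + y0) for x, y in zip(xs, ys))
    return np.array(pts)
```

Selecting the top responses per subregion, rather than globally, is what enforces the homogeneous spatial distribution the method relies on.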
3.2. Point Correspondence Using Fourier Phase Matching
Following the extraction of distinctive control points from the reference image, the search for corresponding control points (CPs) is performed using Fourier phase matching, which consists of searching for the best match for each corner in the reference image among a set of candidate points in the sensed image. The candidate corner with the maximum magnitude of the inverse Fourier transform of the normalized cross-power spectrum is considered the correct corresponding point.
Phase correlation methods used to estimate displacement with subpixel accuracy [2,18,19,21,29] are affected by aliasing and edge effects, which are mostly present at higher frequencies. Limiting the bandwidth of the magnitude of the cross-spectrum may, therefore, be essential to reduce these artifacts. Stone recommended masking the phase-correlation matrix by retaining only the frequencies at which the magnitude of the cross-spectrum exceeds a certain threshold [63].
Additionally, image features close to the image edges can degrade the ability to identify translational motion between two images. Applying a 2D spatial Blackman or Blackman–Harris window greatly reduces image-boundary effects in the Fourier domain [63]. In [59], however, the Blackman window was found to remove a significant amount of the signal energy, whereas the Hann window achieves a good compromise between noise reduction and loss of the desired phase information [49]. The idea behind denoising is to choose frequency masking such that the undesired signal is removed while the necessary information is preserved.
The more successful phase matching methods typically apply Hann or Blackman windows to reduce the impact of edge effects on the phase correlation process. Although phase matching can provide accurate results, it is important to improve image clarity using windowing, and the identification of efficient image denoising techniques remains a critical challenge. To choose a suitable window for the reference and sensed images, we evaluated the effect of applying the Blackman and Hann windows prior to the Fourier transform, with the aim of minimizing the loss of image intensity information and therefore obtaining a more accurate estimate of the translational shifts.
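The windowing step amounts to multiplying each square template by a separable 2-D taper before the FFT. A minimal sketch (function names are illustrative):

```python
import numpy as np

def window2d(n, kind="blackman"):
    """Separable 2-D taper built from a 1-D Blackman or Hann window."""
    w = np.blackman(n) if kind == "blackman" else np.hanning(n)
    return np.outer(w, w)

def taper(patch, kind="blackman"):
    """Weight a square template by a 2-D window before the FFT to
    suppress edge effects in the cross-spectrum."""
    return patch * window2d(patch.shape[0], kind)
```

For example, `taper(patch, "hann")` drives the template smoothly to zero at its borders, which removes the artificial discontinuities the periodic FFT would otherwise see.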
3.3. SubPixel Translation Estimation
Among the existing subpixel registration techniques, Fourier domain methods achieve excellent robustness against correlated and frequency-dependent noise [4,19,20,21,22]. Most of these methods are in fact variations of the original phase-correlation method [28]. The translation $(\Delta x,\Delta y)$ encoded in the normalized cross-spectrum $R\left[{w}_{x},{w}_{y}\right]$ can be recovered in either the spatial or the frequency domain. In the first approach, the inverse Fourier transform $r\left[\Delta x,\Delta y\right]$ of the normalized cross-power spectrum is computed [15,19], giving a function that is approximately zero everywhere except at the displacement $({x}_{o},{y}_{o})$ needed to optimally register the two images. As a result, this approach is limited to integer-pixel accuracy. Obtaining subpixel accuracy requires finding the displacement $\left(\Delta x,\Delta y\right)$ that maximizes the inverse Fourier transform of the normalized cross-power spectrum $r\left[\Delta x,\Delta y\right]$, defined as the phase-only correlation (POC) function. The normalized cross-spectrum of the images is defined by:

$R\left[{w}_{x},{w}_{y}\right]=\frac{{\widehat{p}}_{1}\left[{w}_{x},{w}_{y}\right]\,{\widehat{p}}_{2}^{\ast}\left[{w}_{x},{w}_{y}\right]}{\left|{\widehat{p}}_{1}\left[{w}_{x},{w}_{y}\right]\,{\widehat{p}}_{2}^{\ast}\left[{w}_{x},{w}_{y}\right]\right|}$
where ${w}_{x}$ and ${w}_{y}$ are the frequency variables along the columns and rows, respectively, and ${\widehat{p}}_{1}$ and ${\widehat{p}}_{2}$ are the Fourier transforms of the two templates.
The POC function is given by

$r\left[\Delta x,\Delta y\right]=\frac{1}{{N}_{x}{N}_{y}}\sum_{k=-{M}_{x}}^{{M}_{x}}\;\sum_{l=-{M}_{y}}^{{M}_{y}}R\left[k,l\right]\,{e}^{\,j2\pi \left(\frac{k\Delta x}{{N}_{x}}+\frac{l\Delta y}{{N}_{y}}\right)}$

with ${N}_{x}=2{M}_{x}+1$ and ${N}_{y}=2{M}_{y}+1$, where ${N}_{x}$ and ${N}_{y}$ are, respectively, the width and height of the input images.
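The integer-pixel version of this recovery — the peak of the inverse FFT of the normalized cross-power spectrum — can be sketched as follows (a Python illustration with hypothetical function names, not the authors' implementation; the sign convention assumed is that `p2` is `p1` circularly shifted by the returned amount):

```python
import numpy as np

def phase_correlation(p1, p2):
    """Pixel-level shift estimate: locate the peak of the inverse FFT
    of the normalized cross-power spectrum (the POC function)."""
    F1, F2 = np.fft.fft2(p1), np.fft.fft2(p2)
    cross = F1 * np.conj(F2)
    # normalized cross-spectrum: keep only the phase
    R = cross / np.maximum(np.abs(cross), 1e-12)
    poc = np.real(np.fft.ifft2(R))
    peak = np.unravel_index(np.argmax(poc), poc.shape)
    # wrap peak indices above N/2 to negative displacements
    dy, dx = (p - n if p > n // 2 else p for p, n in zip(peak, p1.shape))
    return (-dy, -dx), poc   # shift such that p2 ~ roll(p1, shift)
```

Because the peak location is an array index, the estimate is quantized to whole pixels, which is exactly why the subpixel optimization of the next subsections is needed.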
The problem of finding the extrema of the cost function with subpixel accuracy is equivalent to finding the zeros of ${r}^{\prime}\left[\Delta x,\Delta y\right]$. Writing $R\left[k,l\right]={R}_{r}\left[k,l\right]+j{R}_{i}\left[k,l\right]$ and taking the real part of the POC sum, the derivative of $r\left[\Delta x,\Delta y\right]$ takes the form of a vector-valued bivariate gradient:

${r}^{\prime}\left[\Delta x,\Delta y\right]={\left[\frac{\partial r}{\partial \Delta x},\;\frac{\partial r}{\partial \Delta y}\right]}^{T}$

with

$\frac{\partial r}{\partial \Delta x}=-\frac{1}{{N}_{x}{N}_{y}}\sum_{k=-{M}_{x}}^{{M}_{x}}\;\sum_{l=-{M}_{y}}^{{M}_{y}}\frac{2\pi k}{{N}_{x}}\left({R}_{r}\left[k,l\right]\mathrm{sin}\,{\theta}_{kl}+{R}_{i}\left[k,l\right]\mathrm{cos}\,{\theta}_{kl}\right)$

and

$\frac{\partial r}{\partial \Delta y}=-\frac{1}{{N}_{x}{N}_{y}}\sum_{k=-{M}_{x}}^{{M}_{x}}\;\sum_{l=-{M}_{y}}^{{M}_{y}}\frac{2\pi l}{{N}_{y}}\left({R}_{r}\left[k,l\right]\mathrm{sin}\,{\theta}_{kl}+{R}_{i}\left[k,l\right]\mathrm{cos}\,{\theta}_{kl}\right)$

where ${\theta}_{kl}=2\pi \left(\frac{k\Delta x}{{N}_{x}}+\frac{l\Delta y}{{N}_{y}}\right)$.
This stage computes the subpixel displacements that minimize the cost function. The process is iterated until it converges to a close neighborhood of the optimum or reaches the maximum number of iterations, yielding a fast and accurate solution. The drawback of phase correlation is that it cannot cope with large amounts of noisy data; therefore, both a good initial estimate of the displacement and an optimal filter that reduces noise without losing the desired information are required.

The derivative of the cost function has increased high-frequency content with respect to $r\left[\Delta x,\Delta y\right]$; thus, ${r}^{\prime}\left[\Delta x,\Delta y\right]$ is very sensitive to noise and border effects. For this reason, two methods are used to minimize the objective function in an accurate local search: the first uses only function evaluations, and the second requires the first derivative to update the solution point.
3.3.1. Nelder–Mead (NM) Optimization
The Nelder–Mead (NM) method [68] solves multidimensional unconstrained minimization of an objective function of n variables using (n + 1) vertices arranged as a simplex. For two variables, the simplex is a triangle. The algorithm begins with a simplex built from three starting points: the first is the integer solution $(\Delta {x}_{0},\Delta {y}_{0})$, and the remaining two are $(\Delta {x}_{0}\pm 0.5,\Delta {y}_{0})$ and $(\Delta {x}_{0},\Delta {y}_{0}\pm 0.5)$, where the sign is selected depending on which neighbor is best. At each iteration, the NM algorithm moves a single vertex of the current simplex towards a better region of the search space. Finally, the vertex of the triangle that yields the best objective value is returned.
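A sketch of this refinement using SciPy's Nelder–Mead implementation with the half-pixel initial simplex described above (the helper names `poc_value` and `refine_nm` are illustrative; the POC is evaluated directly from the normalized cross-spectrum so it can be sampled at fractional shifts):

```python
import numpy as np
from scipy.optimize import minimize

def poc_value(shift, R):
    """Evaluate the POC function r(dx, dy) at a (possibly fractional)
    shift directly from the normalized cross-spectrum R."""
    ny, nx = R.shape
    fy = np.fft.fftfreq(ny)[:, None]   # cycles per pixel, rows
    fx = np.fft.fftfreq(nx)[None, :]   # cycles per pixel, columns
    dx, dy = shift
    return np.real(np.mean(R * np.exp(2j * np.pi * (fy * dy + fx * dx))))

def refine_nm(R, dx0, dy0):
    """Refine the integer shift (dx0, dy0) by Nelder-Mead, starting
    from the simplex {(dx0, dy0), (dx0+0.5, dy0), (dx0, dy0+0.5)}."""
    simplex = np.array([[dx0, dy0], [dx0 + 0.5, dy0], [dx0, dy0 + 0.5]])
    res = minimize(lambda s: -poc_value(s, R), [dx0, dy0],
                   method="Nelder-Mead",
                   options={"initial_simplex": simplex,
                            "xatol": 1e-5, "fatol": 1e-10})
    return res.x
```

Negating `poc_value` turns the POC maximization into the minimization form that Nelder–Mead expects.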
3.3.2. The TwoPoint Step Size (TPSS) Gradient
The two-point step size (TPSS) method [21] is an optimization algorithm used to minimize the POC cost at a point ${x}_{k}$ by iteratively moving in the direction of steepest descent until convergence is achieved. The updating equation for any gradient algorithm has the iterative form

${x}_{k+1}={x}_{k}-{\alpha}_{k}\nabla r\left({x}_{k}\right)$

where $\nabla r\left({x}_{k}\right)$ is the gradient vector of r at ${x}_{k}$, ${x}_{0}$ is a given starting point, and the scalar ${\alpha}_{k}$ is given by

${\alpha}_{k}=\frac{{s}_{k-1}^{T}{s}_{k-1}}{{s}_{k-1}^{T}{y}_{k-1}}$

where ${s}_{k-1}={x}_{k}-{x}_{k-1}$ and ${y}_{k-1}=\nabla r\left({x}_{k}\right)-\nabla r\left({x}_{k-1}\right)$. Iterations stop when $\Vert {x}_{k+1}-{x}_{k}\Vert \le \epsilon$, where $\epsilon$ is a prespecified small scalar.
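The TPSS iteration above (the Barzilai–Borwein step size) can be sketched generically for any differentiable objective; this is an illustrative implementation, not the authors' code, and the first step uses a fixed fallback step because two iterates are needed before $\alpha_k$ can be formed:

```python
import numpy as np

def tpss_minimize(grad, x0, alpha0=0.1, eps=1e-8, max_iter=200):
    """Two-point step size gradient iteration:
    x_{k+1} = x_k - alpha_k * grad(x_k), with
    alpha_k = (s^T s) / (s^T y) built from the last two iterates."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    x = x_prev - alpha0 * g_prev          # plain step to get two points
    for _ in range(max_iter):
        g = grad(x)
        s = x - x_prev                    # s_{k-1}
        y = g - g_prev                    # y_{k-1}
        denom = s @ y
        alpha = (s @ s) / denom if abs(denom) > 1e-15 else alpha0
        x_prev, g_prev = x, g
        x = x - alpha * g
        if np.linalg.norm(x - x_prev) <= eps:
            break
    return x
```

Unlike classical steepest descent, the step size is chosen from the previous two iterates rather than by a line search, which is what makes the method cheap per iteration.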
3.4. Detection of Outliers
The choice of the best key points plays a crucial role in achieving accurate registration, and coregistration can be inaccurate if outliers are present. Thus, dealing with outliers is necessary prior to computing a spatial transformation model [69,70]. To identify and handle outliers efficiently, robust algorithms are required. Our approach uses the Random Sample Consensus (RANSAC) algorithm to filter outliers efficiently. The RANSAC algorithm, proposed by Fischler and Bolles [71], is one of the most robust techniques for detecting and handling a large percentage of outliers in a key point set. The method repeatedly selects a minimal set of points and evaluates its consistency with the remaining data until an optimal solution is found. The basic steps of RANSAC are as follows: a minimal random sample is drawn from all point-wise correspondences to estimate a model by minimizing a cost function, namely the sum of the distance errors of the inliers; all other point correspondences are checked against the fitted model, and a point with a residual larger than a threshold is considered an outlier; an estimated model with sufficiently many inliers is classified as the optimal transformation; finally, all inliers are used to re-estimate the model. The RANSAC algorithm is listed in Algorithm 1.
Algorithm 1: RANSAC
Input: P: set of all points; ε: tolerance. Output: M: new model; k: tentative inliers in P.
1. Select the N random data points required to determine the model parameters.
2. Estimate the parameters of the model.
3. Count how many data points fit the model within the user-defined tolerance ε. Call this k.
4. If k is large enough, accept the model fit and exit with success.
5. Repeat steps 1 through 4.
6. Otherwise, the algorithm exits with failure.
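For intuition, the sample/score/refit loop of Algorithm 1 can be sketched for the simplest model, a pure translation between matched point sets (an illustrative Python sketch; the paper's pipeline applies RANSAC before estimating richer transformation models):

```python
import numpy as np

def ransac_translation(src, dst, tol=1.0, n_iter=300, seed=0):
    """RANSAC for a pure-translation model: sample one correspondence,
    hypothesize t = dst - src, count inliers, keep the best consensus
    set, then refit t on all inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        i = rng.integers(len(src))            # minimal sample: one pair
        t = dst[i] - src[i]
        inliers = np.linalg.norm(src + t - dst, axis=1) <= tol
        if inliers.sum() > best.sum():        # keep largest consensus set
            best = inliers
    t = (dst[best] - src[best]).mean(axis=0)  # refit with all inliers
    return t, best
```

For an affine or polynomial model, the only changes are the minimal sample size (three or more pairs) and the model-fitting step inside the loop.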
3.5. Transformation Model
Precise registration requires choosing an optimal transformation that correctly models the distortions expected within the images [11]; the transformation is selected depending on the nature of the distortions. The most commonly used transformations are global: a single function maps the entire image. Generally, the registration of satellite imagery has been modeled as a global deformation, mainly using affine and polynomial transformations [1,72]. For very high spatial resolution imagery over rugged terrain, a global transformation may not handle the local distortions. Global transformations that can accommodate slight local variations include the thin-plate spline (TPS) [1,73] and multiquadric (MQ) transformations. To overcome the problem of geometric distortions, several local mapping functions have been proposed; these transformations are more accurate but are time-consuming and need large numbers of evenly distributed features to represent the local variation [74]. In [75], Goshtasby found that TPS and MQ functions are effective for registering remote sensing images with few local geometric differences and a small number of widely spread points. By comparison, piecewise linear (PL) transformation is more appropriate for images with local geometric differences because local accuracy is preserved in a small neighborhood surrounding the corresponding control points [76,77]. When the correspondences are noisy, weighted mean and weighted linear methods are preferred over PL, TPS, and MQ because inaccuracies in the control points are smoothed and the rational weight adapts to the density and organization of the control points [77]. Here, we use TPS and polynomial transformations to register VHR remote sensing images.
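A TPS mapping fitted to matched control points can be sketched with SciPy's radial basis function interpolator (the paper performed this step in PCI Geomatica; this is only an illustrative stand-in, with toy coordinates):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# matched control points: positions in the sensed image (src) and
# their counterparts in the reference image (dst)
src = np.array([[0.0, 0.0], [0.0, 100.0], [100.0, 0.0],
                [100.0, 100.0], [50.0, 50.0], [25.0, 60.0]])
dst = src + np.array([2.0, -1.5])          # toy case: a known shift

# thin-plate-spline interpolant mapping sensed to reference coordinates
tps = RBFInterpolator(src, dst, kernel="thin_plate_spline")
mapped = tps(np.array([[25.0, 75.0], [80.0, 10.0]]))
```

Because the TPS includes an affine polynomial part, it reproduces purely affine motion (such as the toy shift above) exactly, while its radial terms absorb the slight local variations mentioned in the text.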
3.6. Workflow of the Proposed Approach
The proposed hybrid algorithm used for subpixel coregistration can be summarized in the following steps (Figure 1):
1. Extract Harris corners from the sensed and reference images for each of the nine subregions.
2. For each point CP$_{i}$ in the reference image, weight the reference template image ${p}_{1}\left[x,y\right]$ and the candidate template image ${p}_{2}\left[x,y\right]$ for each of the k-nearest neighbors in the sensed image by a Blackman window.
3. Compute the discrete Fourier transform ${\widehat{p}}_{k}$ of each filtered image ${p}_{k}\left[x,y\right]$.
4. Compute the normalized cross-spectrum $R\left[k,l\right]$ and the POC function $r\left[\Delta x,\Delta y\right]$ between ${\widehat{p}}_{1}$ and ${\widehat{p}}_{2}$.
5. Take the candidate point with the maximum magnitude of the phase-only correlation as the corresponding point.
6. Eliminate pairs whose score is less than 0.3 and, among many-to-one matches, those with the minimum score.
7. Remove outliers using the RANSAC algorithm.
8. Compute the displacement $\left(\Delta {x}_{0},\Delta {y}_{0}\right)$ for each corresponding point pair using phase correlation.
9. Using $\Delta {x}_{0}$ and $\Delta {y}_{0}$ as initial approximations, apply the two optimization algorithms to find the $\left(\Delta x,\Delta y\right)$ that maximizes the POC function $r\left[\Delta x,\Delta y\right]$.
10. Compute the parameters of the transformation model using least squares minimization.
11. Apply the model transformation to align the sensed image with the reference image.
3.7. Evaluation Criteria
We evaluated the overall performance of our proposed coregistration workflow using the precision of phase matching and the root mean square error (RMSE) of measured displacements before and after the registration.
The RMSE can be written generically as

$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[{\left({xt}_{i}-{xr}_{i}\right)}^{2}+{\left({yt}_{i}-{yr}_{i}\right)}^{2}\right]}$

where $i$ is the CP number, $n$ is the number of evaluated points, ${xt}_{i}$ and ${yt}_{i}$ are the true coordinates, and ${xr}_{i}$ and ${yr}_{i}$ are the retransformed coordinates for CP$_{i}$.
Feature matching precision is used to evaluate matching performance. Matching precision is the ratio of the number of correct matches to the total number of matches:

$\mathrm{Precision}=\frac{{N}_{c}}{{N}_{c}+{N}_{w}}$

where ${N}_{c}$ is the number of correct matches and ${N}_{w}$ is the number of wrong matches.
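Both evaluation criteria are straightforward to compute; a short sketch with illustrative function names:

```python
import numpy as np

def rmse(true_xy, retrans_xy):
    """Root mean square error over n control points, combining the
    x and y residuals of each point under a single square root."""
    d = np.asarray(true_xy) - np.asarray(retrans_xy)
    return np.sqrt(np.mean(np.sum(d ** 2, axis=1)))

def matching_precision(n_correct, n_wrong):
    """Ratio of correct matches to the total number of matches."""
    return n_correct / (n_correct + n_wrong)
```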
4. Results and Discussion
In this section, we first present the results obtained using the proposed method to automatically find corresponding points and accurately register the sensed image. We then evaluate the effects of different window functions on the correlation measures. After identifying a suitable window function, in a third series of experiments we apply the Nelder–Mead and TPSS optimization methods to improve the matching accuracy. The proposed methods were evaluated against ground-truth subpixel shifts.
4.1. Descriptions of Experimental Data
The reliability and validity of the proposed method were evaluated with two sets of VHR remote sensing images acquired from different sensors: a Pleiades multispectral scene and an aerial image of the cities of Fes and Rabat, respectively. Their parameters are given in Table 1 and Table 2, and the aerial imagery is displayed in Figure 2. Each set contains a reference image and a sensed image registered based on the metadata and georeferencing information. The approximate height range at Fes is between 180 and 837 m; the elevation at the Rabat test site ranges from 2 to 332 m.
4.2. LargeScale Displacements Estimation with PixelLevel Accuracy
In this section, we evaluate the effect of the window functions on phase matching and identify the corresponding control points using phase correlation based on the window function that provides the largest number of matches.
4.2.1. Effect of Two Window Functions on the Correlation Measures
Here, phase matching results for a pair of aerial images (2014–2016) are presented and discussed. To choose the optimal window, we applied the Blackman and Hann window functions to templates of the sensed and reference images centered on 128 Harris points. A correlation was considered successful if the magnitude of the POC function was at least 0.3. After outlier removal, 110 corresponding point pairs remained. Examples of the template images of reference points are shown in Figure 3. The experimental results show that using the Blackman window for both reference and sensed images generated a greater number of correct matches than the Hann window, whereas the Hann window produced more accurate results. Both window functions are robust to noise and yield more accurate matching. The results are summarized in Table 3.
4.2.2. Performance of Phase Matching
Table 4 shows the results of phase matching alone and of our algorithm for the aerial images. For the first algorithm, which uses phase correlation only, the experimental results confirm that phase matching alone may not always provide precise correlation, particularly for multipixel displacements between the reference and sensed images; this confirms the results of Leprince et al. [49]. To improve the matching accuracy, we applied a hybrid algorithm that combines phase correlation with a feature-based technique. After extracting interest points with the Harris corner detector, we efficiently identified the corresponding feature matches using phase correlation with a template image centered on each Harris corner. The Harris corner first provides a good initial position, from which the match is refined using the phase matching method. Our approach avoids decorrelation caused by multipixel displacements and produces displacement estimates with pixel precision.
The results of the correlation process for our approach and the SURF-based method are shown in Table 5. The matching score of the proposed approach indicates better performance than that of the SURF descriptor: the average matching precisions are about 0.1 and 0.02, respectively, for the aerial images, and about 0.11 and 0.15, respectively, for the satellite images. The main difference between the SURF-based method and ours is the number of correct matches. Figure 4 shows that our approach provides the largest number of corresponding feature points for both aerial and satellite images. In terms of computation time, for satellite images our approach requires an average of 237.5 min owing to the large number of corresponding points, whereas SURF-based matching obtains a result in a shorter time.
The success of the proposed approach is largely due to the use of the Blackman and Hann windows applied to 16 × 16 pixel patches, which limit edge effects, and to the RANSAC algorithm, which handles outliers; larger patch sizes increase the number of incorrectly matched points. To reach this performance, the results of many phase matching trials were examined under various experimental conditions. The experiments demonstrated that, with VHR remote sensing images, the proposed method finds a large number of correct matches with a homogeneous distribution.
4.3. Enhanced SubPixel Displacement Estimation
To achieve more accurate results, we tested two optimization approaches based on maximizing the POC function. To assess the quality and accuracy of this automated process on the aerial and satellite images, we manually relocated the corresponding points in the sensed image for 40 control points.
A comparison of the two-point step size and Nelder–Mead algorithms is shown in Table 6. The analysis shows that both optimization algorithms provide the same accuracy of displacement estimation. The RMSE values obtained with the NM algorithm are similar to those of the solution without optimization, even though both algorithms are initialized with the same displacement generated by phase correlation. In addition, when the NM algorithm iteratively reduces the east/west and north/south displacement errors to improve the minimum of the POC function, the estimates retain the same interest point localizations as the Harris detector.
The difference between the two-point step size and Nelder–Mead algorithms lies in the computation time: the two-point step size algorithm (107 min) is more time-consuming than the Nelder–Mead algorithm, which required an average of 3 min for the same number of corners (in addition to the preprocessing and corresponding-point phase matching steps). Our subpixel approach based on Nelder–Mead optimization thus performs better in terms of the number of correct matches, localization accuracy, and computation time.
4.4. Validation of the Proposed Registration Method
Following subpixel refinement, the corresponding points were used to estimate the transformation model for each data set. The proposed approach found 150,619, and 33,565 corresponding points in the 2014 aerial image, the 2016 aerial image, and the 2014 satellite image, respectively. In this work, we registered the three images using TPS and polynomial transformations. This step was performed using PCI Geomatica.
The estimated misregistration, measured on a set of independent check points, was computed using TPS and first-order polynomial transformations separately in each of the x and y dimensions. As shown in Table 7, the first-degree polynomial transformation provides better performance than TPS, which is time-consuming for large images. The proposed approach aligned the remote sensing images with an average accuracy of 0.3 pixels in the x and y directions for the 2016 aerial image. Performance was lower for the 2014 aerial image, which was coregistered with the 2009 aerial image acquired with an analogue camera. For the satellite image, the accuracy was about 0.6 pixels.
5. Conclusions
Very high-resolution remote sensing images usually contain obvious geometric distortion in local areas, introduced by the increased spatial resolution and lower sensor altitude. To coregister these images more effectively, this paper proposes a hybrid registration approach that combines Harris feature corners and phase correlation and has two advantages. First, the proposed approach uses Harris feature points, which improve the reliability and efficiency of Fourier phase matching. Second, it is less sensitive to image noise, and subpixel resolution is feasible.
Because the Harris corner locations are at the pixel level, two optimization algorithms were used to refine the localization to subpixel accuracy. The proposed approach was used to coregister the aerial and satellite imagery and was validated in terms of precision and robustness to noise.
The experimental results show that both the Blackman and Hann window functions are robust to noise and yield more accurate matching. Using the Blackman window for both reference and sensed images generates a larger number of correct matches than the Hann window and achieves a correct matching rate of 0.8. In addition, the correct matching rate of the proposed approach is superior to that of the SURF-based method. The experiments also demonstrate that the proposed approach based on NM optimization performs better in terms of the number of correct matches, localization accuracy, and computation time. The sensed image was then registered using polynomial and TPS transformations, with the parameters of the polynomial mapping function estimated via weighted least squares. Experiments confirmed that the proposed approach can achieve registration accuracy better than 0.5 pixels and is robust to noise. Our approach offers a better compromise between accurate results and a high number of correspondences. Its advantages are: (1) it produces a high number of true correspondences matched to subpixel locations; (2) its phase matching, being corner-based, is precise and stable; and (3) it reduces the computation cost because the correlation is corner-based, and it allows accurate error evaluation because errors can be detected and measured more easily.
Three limiting factors of our approach need to be addressed. First, the adopted phase correlation approach recovers only the translation between the reference and sensed images. Second, the high computational processing time could be reduced using a coarse-to-fine registration strategy and more capable hardware. Third, the control points used to assess the optimization algorithms were relocated manually, so their localization may contain errors, because it is difficult to localize the same corner with subpixel accuracy; the performance and efficacy of the optimization methods must therefore be revalidated with simulated data.
Our future work will focus on the Fourier–Mellin transformation to match images that are translated, rotated, and scaled, with the aim of improving subpixel image registration accuracy for satellite and drone imagery.