1. Introduction
With the continuous progress of satellite and sensor technology, remote sensing image data with high spectral and spatial resolution can be acquired simultaneously. The spectral resolution can reach the nanometer level, while the spatial resolution can reach the submeter level. However, due to transmission bottlenecks and signal-to-noise ratio (SNR) limitations [
1], the acquired remote sensing data exhibit complementary characteristics: low-resolution multispectral (LRMS) images obtained at the expense of spatial resolution, and panchromatic (PAN) images with lower spectral resolution but higher spatial resolution. To achieve high-resolution multispectral (HRMS) images, image fusion technology is required. By merging PAN images with LRMS images, the complementary information of the two can be integrated and the redundant information removed. This process is usually called pansharpening [
2], which, as one branch of image fusion, aims to obtain HRMS images by injecting spatial details from PAN into LRMS. Furthermore, more accurate scene descriptions and more reliable interpretations can be provided.
Many studies have put forward theories on and methods for pansharpening. Component substitution (CS) methods and multiresolution analysis (MRA) methods are widely used traditional fusion methods [
3,
4]. The classic CS-based methods include intensity hue saturation (IHS) transform [
5], principal component analysis (PCA) [
6], Gram–Schmidt (GS) transform [
7], band-dependent spatial detail (BDSD) [
8], and so on. In general, if the spectral response of the MS bands does not completely overlap with that of the PAN band, CS-based fusion will produce severe spectral distortion. To address this problem, MRA methods based on spatial detail injection have been proposed. Within this family, the detailed spatial information of the PAN image obtained by multiscale decomposition is combined with the MS bands to obtain the fusion result. However, these methods also suffer from spatial distortion, such as ringing or aliasing effects and blurring of contours and textures, due to their dependence on filtering operations. Wavelet transform [
9], Laplace pyramid transform [
10], curvelet transform [
11], and contourlet transform [
12] are all well-known MRA-based methods. Generalized Laplacian pyramid (GLP) [
13,
14], which also belongs to this family, has been widely used in pansharpening. Especially when the frequency response of the filter matches the modulation transfer function (MTF) of the MS sensor [
15], the spatial details missing in the MS image can be extracted from the PAN image using the MTF-GLP filter. Other approaches have also been proposed, such as fusion methods based on Bayesian theory [
16,
17]. Regularized solutions have been applied to the ill-posed problem of reconstructing HRMS images, exploiting approaches based on total variation penalty terms [
18] and on sparse representation of signals or compressed sensing theory [
19,
20,
21]. Although these methods achieve good performance, their practical application is limited by the computational burden of the underlying optimization techniques.
In summary, the main aim of existing pansharpening methods is to inject the spatial details extracted from PAN images into LRMS images to the greatest extent possible while maintaining the spectral information. According to the literature [
3,
4], the classic pansharpening framework can be roughly divided into two parts: extraction of spatial details from the PAN image and injection of the extracted details into LRMS images upsampled to the size of the PAN image. In the injection step, an additive or multiplicative framework [
22] can be used to weight the extracted detail information. As for the extraction step, pansharpening methods can be divided into CS-based and MRA-based methods according to how they extract the spatial details. CS-based methods obtain low-resolution PAN images through a linear combination of the MS bands, while MRA-based methods produce low-resolution PAN images through multiscale decomposition. According to the spectral response characteristics of satellite sensors, there is a non-linear relationship between the PAN band and the MS bands [
23]. Accordingly, the approximate low-resolution version of the PAN image generated by CS-based methods cannot fully reproduce the gray-scale features of the original image, often causing spectral distortion in the fusion result. The main features distinguishing the MRA-based methods are the filters used for detail extraction and the injection gain schemes. It is also noteworthy that the injection coefficients can be estimated either by a global method based on the entire image or by a local content-adaptive (CA) method. The global method has only one injection coefficient per MS band, so it has low computational cost. A nonlinear image decomposition framework based on morphological operators was proposed in reference [
24], which is a global detail injection method. Compared to the global method, the injection coefficients of the CA method are spatially variable and have a better fusion effect, especially in terms of spectral fidelity [
13,
25]. Local estimation of coefficients is usually achieved through sliding windows, but the cost of this scheme is high; therefore, a pansharpening method that calculates the injection gains on non-overlapping blocks was proposed in reference [
26]. However, the fixed structure of sliding windows ignores the actual spatial arrangement of the ground objects [
23]. To solve this problem, the BDSD method based on image clustering was proposed in reference [
27] and Gram–Schmidt adaptive and GLP pansharpening algorithms based on binary partition tree (GSA-BPT) were proposed in reference [
28], achieving good performance. A Gram–Schmidt adaptive with histogram adjustment (GSA-HA) method was proposed in reference [
29], where the k-means algorithm is used to segment the PAN image through pixel clustering, and the weighted sum of each group of pixels is then calculated by multiple regression to reduce the spectral distortion of the fusion results. However, because the spatial structure differences between the PAN and MS images and the relationships between MS bands are ignored during coefficient injection, these methods still suffer from injection bias and spectral distortion, or require manual intervention during pansharpening.
In this paper, in order to improve the fusion quality of MS and PAN images while reducing the spectral distortion, a pansharpening method based on local adaptive spectral modulation (LASM) and cooperation with segmentation is proposed. This method has an adaptive spectral modulation system and can adjust the segments according to the fusion feedback. In this method, the k-means algorithm is used to segment MS images to obtain connected component groups with similar spectral characteristics. The local injection coefficient matrix is estimated based on each group. MTF-GLP technology is used to extract the spatial details of MS and PAN images. To modulate the spectral information in the fusion result, LASM coefficients are constructed based on extracted image details and the spectral relationship between MS bands. By measuring the distance between fused HRMS images and upsampled LRMS images, the optimal number of connected component groups is adaptively selected to make the spectral features of the fused image as close as possible to the original LRMS image. Through the cooperation between fusion and segmentation, the local injection coefficient matrix and LASM coefficient matrix are estimated based on the connected component groups to optimize the pansharpening result, and the parameters of the segmentation algorithm are adjusted according to feedback from the fusion image. Finally, experimental results on GeoEye-1 and QuickBird satellite data sets show that the proposed pansharpening method can effectively enhance the spatial detail information and reduce the spectral distortion of fusion images.
This paper is organized as follows: The second part introduces the pansharpening problem and the key technology of pansharpening. The proposed LASM and the cooperative approach between pansharpening and segmentation are presented in the third part. A performance comparison and analysis are provided in the fourth part by experimental results on degraded and real image data from different satellites. The final part presents the study’s conclusions.
2. Model for the Pansharpening Problem
The pansharpening problem of MS and PAN images needs a fusion model to achieve a balance between injecting spatial details and preserving spectral information. This model can be either a global model based on the whole image or a local model based on the image context, such as spectral [
30] or spatial [
31] information.
Fusion of MS and PAN images yields a high-spatial-resolution MS image $\widehat{MS}$. While maintaining the spectral content of the MS image, the spatial details of the PAN image are injected into the MS bands so that the fusion image preserves the spectral diversity of the original MS image and reaches the spatial resolution of the PAN image. The definition of $\widehat{MS}$ is

$$\widehat{MS}_k = \widetilde{MS}_k + g_k \odot D_k, \quad k = 1, \dots, N, \tag{1}$$

where $N$ is the number of MS bands, $\widetilde{MS}_k$ is the $k$-th band of the MS image upsampled to the PAN image size, $g_k$ represents the $k$-th band of the injection gain matrix, $D_k$ denotes the detail image of band $k$ extracted from the PAN image, $\odot$ represents the pixel-by-pixel multiplication between the injection coefficient matrix and the detail image, and $D_k$ is obtained by subtracting the corresponding low-resolution version of the PAN image from the histogram-matched PAN image.
Most existing fusion models only consider modulation coefficient estimation for the spatial detail part, but some fusion models add coefficient modulation for the spectral part. In references [
32,
33], spectral modulation coefficients are introduced into the pansharpening method so that the spectral information of the MS image can be better preserved. On this basis, the fusion model can be expressed as

$$\widehat{MS}_k = S_k \odot \widetilde{MS}_k + g_k \odot D_k, \quad k = 1, \dots, N, \tag{2}$$

where $S_k$ denotes the spectral modulation coefficient matrix for the $k$-th band.
In this paper, the MRA-based scheme is adopted. The primary steps of the MRA-based pansharpening approach include the following: first, the original MS image is interpolated to obtain the upscaled MS image with the same size as the PAN image, then the low-resolution version of the PAN image is calculated by multiscale decomposition, and the corresponding injection gain matrix and spectral modulation coefficient matrix are calculated. Finally, spectral modulation and detail injection are completed according to Equation (2) to obtain the fusion image. The key technologies during the pansharpening process—detail image estimation, injection coefficient construction, and spectral modulation coefficient construction—are detailed below.
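The steps above can be sketched in code. The following is a minimal illustration of this Equation (2)-style pipeline, not the authors' implementation: the low-pass, gain, and spectral modulation estimators are passed in as callables, and the toy estimators at the bottom are deliberately crude placeholders.

```python
import numpy as np

def mra_pansharpen(ms_up, pan, lowpass, est_gain, est_sm):
    """Generic MRA-style fusion: fused_k = S_k * MS_k + g_k * D.

    ms_up    : (N, H, W) MS bands already upsampled to the PAN size
    pan      : (H, W) panchromatic image
    lowpass  : callable returning the low-resolution version of `pan`
    est_gain : callable (ms_band, pan_low) -> injection gain for that band
    est_sm   : callable (ms_band, pan, pan_low) -> spectral modulation matrix
    """
    pan_low = lowpass(pan)            # multiscale decomposition step
    detail = pan - pan_low            # spatial detail image D
    fused = np.empty_like(ms_up)
    for k, band in enumerate(ms_up):
        g = est_gain(band, pan_low)       # injection gains
        s = est_sm(band, pan, pan_low)    # spectral modulation
        fused[k] = s * band + g * detail  # modulation + detail injection
    return fused

# Toy usage with deliberately crude, global placeholder estimators:
rng = np.random.default_rng(0)
pan = rng.random((8, 8))
ms_up = rng.random((4, 8, 8))
blur = lambda img: np.full_like(img, img.mean())          # "low-pass" stub
gain = lambda b, pl: np.cov(b.ravel(), pl.ravel())[0, 1] / (pl.var() + 1e-12)
no_sm = lambda b, p, pl: np.ones_like(b)                  # no modulation
out = mra_pansharpen(ms_up, pan, blur, gain, no_sm)
```

In a real pipeline, `lowpass` would be the MTF-matched GLP filter and the two estimators would be computed per segment, as described in the following subsections.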
2.1. Detail Image Estimation
Since the relationship between the low-resolution version of the PAN image and the MS bands is non-linear, the weighted sum of MS bands cannot properly describe low-resolution PAN images with different land covers. However, the estimation of low-resolution PAN images directly affects the extraction of image details, so the MRA-based multiscale decomposition method is used in this paper to calculate the low-resolution version of PAN images.
The performance of the MRA-based method can be improved by frequency analysis of images according to the filter whose frequency response amplitude matches the MTF of the imaging system. A clearer geometric structure can be produced through this method than with an ideal MRA filter. In fact, the spatial frequency response of the Gaussian filter can be adjusted to match the MTF of the sensor. In this way, we can extract the detail information from the PAN image that cannot be obtained by the MS sensors because of the coarse spatial resolution.
MTF-GLP technology is used to calculate the detailed images [
28] in this paper. Before the introduction of multiresolution wavelet analysis, Burt and Adelson [
10] proposed the Laplacian pyramid (LP), a band-pass image decomposition based on the Gaussian pyramid (GP). The construction of a Laplacian layer proceeds as follows: the original image is low-pass filtered and downsampled to obtain a coarse-scale approximation. This approximation is then interpolated, filtered, and subtracted from the original image, which is equivalent to band-pass filtering. The next level of decomposition is carried out on the low-pass approximation, and the multiscale decomposition is completed iteratively. This method has been proven suitable for the fusion of remote sensing images [
14].
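As a concrete illustration of the decomposition loop just described, the toy sketch below builds and inverts a Laplacian pyramid using simple 2× block averaging for the low-pass step; actual GLP implementations replace this with MTF-matched Gaussian filters and proper interpolation.

```python
import numpy as np

def downsample2(img):
    """Crude low-pass + decimation: average 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])

def upsample2(img, shape):
    """Nearest-neighbour expansion back to `shape` (interpolation step)."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels):
    """Iterate: low-pass + decimate, re-expand, subtract -> band-pass layers."""
    layers, current = [], img.astype(float)
    for _ in range(levels):
        coarse = downsample2(current)                              # GP step
        layers.append(current - upsample2(coarse, current.shape))  # LP layer
        current = coarse
    layers.append(current)  # coarsest low-pass approximation
    return layers

def reconstruct(layers):
    """Invert the pyramid by expanding and adding back each band-pass layer."""
    current = layers[-1]
    for band in reversed(layers[:-1]):
        current = band + upsample2(current, band.shape)
    return current
```

Because each layer stores exactly what the expansion step discards, the reconstruction is exact for image sizes divisible by the decimation factor at every level.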
First, we calculate the low-resolution version $P_L$ of the PAN image:

$$P_L = P * h_k, \tag{3}$$

where $h_k$ denotes the linear time-invariant filter and $*$ represents the convolution operation. The frequency response of $h_k$ approximates a Gaussian shape and matches the gain at the Nyquist cutoff frequency of the exact MTF of the sensor that acquires the $k$-th MS band [15]. We subtract the low-resolution image $P_{L,k}$ from the PAN image $P_k$ to yield the spatial detail image:

$$D_k = P_k - P_{L,k}, \tag{4}$$

where $P_k$ is the PAN image after histogram equalization according to $\widetilde{MS}_k$ and $P_{L,k}$ denotes the low-resolution version of $P_k$.
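A hedged sketch of this detail-extraction step follows. The rule used to pick the Gaussian width — matching the filter's continuous frequency response $\exp(-2\pi^2\sigma^2 f^2)$ to the sensor's published gain at the MS Nyquist frequency — is a standard design choice, but the specific `gnyq` default, the moment-based histogram matching, and all function names here are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at ~3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def mtf_sigma(gnyq, ratio):
    """Std. dev. of a Gaussian whose frequency response exp(-2*pi^2*s^2*f^2)
    equals `gnyq` at the MS Nyquist frequency f = 1/(2*ratio), in PAN pixels."""
    f_nyq = 1.0 / (2.0 * ratio)
    return np.sqrt(-np.log(gnyq) / (2.0 * np.pi ** 2)) / f_nyq

def mtf_lowpass(img, gnyq, ratio):
    """Separable Gaussian filtering matched to the sensor MTF gain at Nyquist."""
    k = gaussian_kernel1d(mtf_sigma(gnyq, ratio))
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def detail_image(pan, ms_band_up, gnyq=0.3, ratio=4):
    """Equation-(4)-style detail: matched PAN minus its MTF low-pass.
    `gnyq`/`ratio` defaults are illustrative, not sensor-specific values."""
    p = (pan - pan.mean()) / (pan.std() + 1e-12)   # match first two moments
    p = p * ms_band_up.std() + ms_band_up.mean()   # crude histogram matching
    return p - mtf_lowpass(p, gnyq, ratio)
```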
2.2. Injection Coefficient Construction
Spectral characteristics will change according to different objects, regions, or environments [
34], and therefore the spectral relationship between the MS and PAN images is not fixed. Consequently, if we inject the spatial details obtained from the PAN image into each MS band without considering the differences between local areas, both the spectral and spatial quality of the fusion will suffer. Estimating an appropriate injection coefficient matrix is therefore a crucial step; the spatial details obtained from the PAN image can then be weighted by their respective coefficients and injected into each MS band. In this paper, the regression-based model of reference [28] is employed to estimate the injection coefficients: regression analysis between the low-resolution PAN image and the MS bands is performed and extended so that the injection coefficients are estimated on image regions composed of pixels with similar spectral characteristics. In this case, the injection coefficient matrix is calculated by

$$g_k = \frac{\mathrm{cov}(\widetilde{MS}_k, P_L)}{\mathrm{var}(P_L)}, \tag{5}$$

in which $\mathrm{cov}(\cdot,\cdot)$ denotes the covariance and $\mathrm{var}(\cdot)$ is the variance. According to Equation (5), the locally implemented expression on the local region is

$$g_k(i,j) = \frac{\mathrm{cov}\!\big(C_{\widetilde{MS}_k}(i,j),\, C_{P_L}(i,j)\big)}{\mathrm{var}\!\big(C_{P_L}(i,j)\big)}, \tag{6}$$

where $C_{\widetilde{MS}_k}(i,j)$ and $C_{P_L}(i,j)$ represent the connected component groups containing pixel $(i,j)$ in images $\widetilde{MS}_k$ and $P_L$, respectively. Since all pixels in the same connected component group adopt the same gain coefficient after localization, the injection weights of the detail information can be adaptively adjusted according to the spectral characteristics of the local region.
In this paper, the k-means algorithm is used to segment MS images into connected component groups according to spectral characteristics, and the local injection coefficients are estimated based on each connected component group.
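The segmentation-then-estimation idea can be sketched as follows. For brevity, this sketch clusters pixels by spectral similarity only and estimates one cov/var gain per cluster; the paper's method additionally splits each cluster into spatially connected component groups, which is omitted here, and all function names are illustrative.

```python
import numpy as np

def kmeans_labels(pixels, k, iters=20, seed=0):
    """Minimal k-means on per-pixel spectral vectors (pixels: (M, n_bands))."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                      # nearest-center assignment
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(0)
    return labels

def local_gains(ms_band, pan_low, labels):
    """One cov/var regression coefficient per group, broadcast to every
    pixel of that group (a simplified, per-cluster version of Eq. (6))."""
    flat_ms, flat_p = ms_band.ravel(), pan_low.ravel()
    g = np.zeros_like(flat_ms, dtype=float)
    for j in np.unique(labels):
        m = labels == j
        var = flat_p[m].var()
        cov = ((flat_ms[m] - flat_ms[m].mean())
               * (flat_p[m] - flat_p[m].mean())).mean()
        g[m] = cov / var if var > 0 else 0.0
    return g.reshape(ms_band.shape)
```

When an MS band is an exact linear rescaling of the low-resolution PAN image, every group recovers the same regression slope, which is a quick sanity check on the estimator.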
2.3. Construction of Spectral Modulation Coefficient
According to literature [
32], when extracting spatial information from the PAN image, certain spatial details already contained in the MS image should be removed. The authors therefore introduced a spectral modulation (SM) scheme into the fusion model that utilizes the Gaussian function to obtain the specific spatial details of the PAN and MS images; the SM coefficients are then constructed by removing the details in the MS image from the details in the PAN image. Desirable results for the preservation of spectral information have been obtained by this method. The SM coefficient can be expressed as

$$SM(i,j) = \frac{\big(P - G_\sigma * P\big)(i,j) - \big(I - G_\sigma * I\big)(i,j)}{\max_{n=1,\dots,N} MS_n(i,j)}, \tag{7}$$

in which $P$ represents the PAN image, $I$ denotes the intensity component of the MS image, $N$ is the number of MS channels, and $\max_{n=1,\dots,N} MS_n(i,j)$ is the maximum value of the corresponding MS bands at pixel $(i,j)$. The Gaussian convolution kernel $G_\sigma$ acts as a low-pass filter, $*$ represents the convolution operation, and $\sigma$ represents the scale factor in the Gaussian function.
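As a rough illustration only: one plausible realization of such an SM coefficient, assuming that the "details" are Gaussian high-pass residuals, that the intensity component is a simple band mean, and that the result is normalized by the per-pixel maximum over MS bands. This is a guess at the general structure, not the exact formula of reference [32].

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian low-pass (the G_sigma * img term)."""
    r = int(3 * sigma + 0.5)
    x = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def sm_coefficient(pan, ms_up, sigma=2.0):
    """Hypothetical SM map: (PAN details - intensity details), normalized by
    the per-pixel maximum over MS bands. ms_up: (N, H, W)."""
    intensity = ms_up.mean(0)                          # simple intensity proxy
    d_pan = pan - gaussian_blur(pan, sigma)            # details in PAN
    d_i = intensity - gaussian_blur(intensity, sigma)  # details in MS intensity
    return (d_pan - d_i) / (ms_up.max(0) + 1e-12)      # normalized difference
```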
In this paper, an improved spectral modulation approach based on SM is proposed to preserve spectral information.
4. Experimental Results and Comparisons
Degraded and real data sets from different satellites are used for the experiments and analysis in this part. First, an experiment on the degraded data sets is performed to assess the performance of the proposed method; the fusion results can be compared with the reference images through visual inspection and objective evaluation indices. Second, our proposed method is applied to the real data sets for performance evaluation. Seven classic and popular pansharpening methods are used for comparison, including à trous wavelet transform (ATWT) [
4], GS [
7], MTF matched GLP with context-based decision (MTF-GLP-CBD) [
25], BDSD [
8], morphological filter-half gradients (MF-HG) [
24], GSA-BPT [
28], and GSA-HA [
29].
4.1. Data Sets
(1) GeoEye-1 data set: The experimental data set from the GeoEye-1 satellite is shown in
Figure 6a,e and Figure 6c,g, which are the degraded and real image pairs, respectively. This data set was acquired over Hobart, Australia on 24 February 2009 and provides 0.5 m PAN and 2 m LRMS images.
(2) QuickBird data set: As shown in
Figure 6b,f and Figure 6d,h, both the degraded and real data sets from the QuickBird satellite are used for reduced- and full-resolution assessment. The QuickBird data set was captured over Sundarbans, India, on 21 November 2002. The spatial resolutions of the PAN and LRMS images are 0.7 m and 2.8 m, respectively.
The data sets collected from the GeoEye-1 and QuickBird satellites consist of one PAN band and four MS bands (R, G, B, and NIR), but only the R, G, and B bands are used for visual display. Because the data sets contain no corresponding reference images for evaluating pansharpening performance, the original PAN and LRMS images were processed with MTF filtering and decimation to obtain the degraded data for fusion, so that the original LRMS image can be adopted as the reference. Both the degraded and real data from the GeoEye-1 and QuickBird satellites were used to measure the performance of the pansharpening methods. The sizes of the PAN and LRMS images used in this paper are 256 × 256 and 64 × 64, respectively, and the size of the reference image is 256 × 256.
4.2. Quality Indices
Evaluating the performance of the pansharpening methods mainly entails subjective visual analysis and objective evaluation. In this paper, six evaluation indices are selected for the reduced-resolution evaluation of the degraded data sets, and three indices are used for the full-resolution assessment of the real data sets.
(1) Reduced-resolution assessment: The six indices are the correlation coefficient (CC) [
37], structural similarity (SSIM) [
38], spectral angle mapper (SAM) [
39], root mean square error (RMSE) [
40], erreur relative globale adimensionnelle de synthèse (ERGAS) [
3], and universal image quality index (UIQI) [
41]. CC and SAM are used to measure spectral quality. Three indices account for spatial quality: SSIM, RMSE, and ERGAS. SSIM reflects the structural similarity between the reference image and the fusion image, while RMSE and ERGAS quantify the difference between the two. UIQI is a global index used to measure both spatial and spectral quality.
(2) Full-resolution assessment: For the real data experiments, due to the lack of corresponding reference images, the quality with no reference (QNR) index [
42] and its two component indices, the spectral distortion index $D_\lambda$ and the spatial distortion index $D_s$, are applied to the quality assessment. QNR, which is computed from $D_\lambda$ and $D_s$, is a global measurement based on the correlation, luminance, and contrast between two images.
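As a concrete reference for the indices above, the sketch below computes SAM and ERGAS (reduced-resolution indices, standard definitions) and UIQI together with the usual QNR combination rule $QNR = (1 - D_\lambda)^\alpha (1 - D_s)^\beta$; the aggregation of UIQI values over band pairs into $D_\lambda$ and $D_s$ follows reference [42] and is omitted here. The `1e-12` guards are implementation conveniences, not part of the definitions.

```python
import numpy as np

def sam(ref, fus, eps=1e-12):
    """Mean spectral angle, in degrees, between reference and fused spectra.
    ref, fus: (n_bands, H, W) arrays."""
    num = (ref * fus).sum(0)
    den = np.sqrt((ref ** 2).sum(0) * (fus ** 2).sum(0)) + eps
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)).mean())

def ergas(ref, fus, ratio=4):
    """Relative dimensionless global error; `ratio` = PAN/MS resolution ratio."""
    terms = [np.mean((r - f) ** 2) / (r.mean() + 1e-12) ** 2
             for r, f in zip(ref, fus)]          # (RMSE_k / mean_k)^2 per band
    return 100.0 / ratio * np.sqrt(np.mean(terms))

def uiqi(x, y):
    """Universal image quality index: correlation, luminance, and contrast."""
    mx, my, vx, vy = x.mean(), y.mean(), x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 4 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2) + 1e-12)

def qnr(d_lambda, d_s, alpha=1.0, beta=1.0):
    """Combine the spectral and spatial distortion indices; ideal value is 1."""
    return (1 - d_lambda) ** alpha * (1 - d_s) ** beta
```

Identical inputs give the ideal values (SAM and ERGAS of 0, UIQI of 1), which is a convenient sanity check when wiring up an evaluation script.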
4.3. Experiments on Degraded Data
As
Figure 6 shows, two groups of degraded images from different satellites are used to test the proposed method.
Figure 6a,e are the first group of images, from the GeoEye-1 satellite data set, and Figure 6b,f are the second group, from the QuickBird satellite data set. The fusion results of these two groups of degraded data for the proposed pansharpening method and the comparison methods are shown in
Figure 7 and Figure 8, respectively. The first image in
Figure 7 and
Figure 8 is the reference image. The objective evaluation results are listed in
Table 2 and
Table 3.
By analyzing the fusion results in terms of subjective visual effects, it is easy to see that GS and MTF-GLP-CBD have relatively serious spectral distortion and blurred details. The fusion results obtained by ATWT, MF-HG, and GSA-HA maintain the spectral characteristics well, and the spatial details are relatively clear. The fusion results achieved by BDSD and GSA-BPT have excessive enhancement of some edge details compared to the reference image, which leads to certain spectral distortion. By comparison, the proposed method shows better agreement with the reference image in terms of both spatial and spectral characteristics.
As listed in
Table 2 and
Table 3, the evaluation indices of the fusion images from different approaches that are tested on the degraded image pairs can be compared. According to the objective quantitative results in
Table 2, all the objective indices obtained by our proposed method, with the exception of SAM, are superior to those of the other seven methods. Our method obtained the second-best SAM value; the best value was achieved by MF-HG. In
Table 3, our method achieved the best values on five indices: CC, SSIM, RMSE, ERGAS, and UIQI. As the colors in this group of images are relatively uniform, the minimum SAM value is obtained by GS, and our proposed method achieves a near-optimal value. As noted in the literature [
43], the evaluation of pansharpening has two main aspects. One is the injection of spatial information, which mainly reflects the improvement in spatial resolution. The other is the preservation of spectral information, which refers to the degree of damage to the original spectrum. Normally the two cannot both be optimal and some compromise is required: when the fusion results have better spatial quality, there is usually some sacrifice in spectral preservation.
4.4. Experiments on Real Data
As shown in
Figure 6c,g,d,h, two groups of real data sets from different satellites were applied to evaluate the performance of the proposed method in practical applications. The first set of images is from the GeoEye-1 satellite, as illustrated in
Figure 6c,g, and the second set of real data shown in
Figure 6d,h was collected from the QuickBird satellite.
Figure 9 and
Figure 10 show the fusion images of the comparison methods and the proposed method based on the two groups of real data in
Figure 6, respectively. The corresponding objective evaluation indices are given in
Table 4 and
Table 5.
Based on a subjective visual evaluation of
Figure 9, it can be seen that the fusion result produced by GS suffers from some blurring and a certain degree of spectral distortion. The fusion result of MTF-GLP-CBD has some intensity distortion and excessive contrast in local areas. The MF-HG fusion result has relatively clear spatial details but slight spectral distortion. BDSD produces excessive spatial details in local areas, resulting in some color distortion. The fusion result of GSA-BPT suffers from some color degradation and spectral distortion in local areas. The fusion results produced by ATWT, GSA-HA, and our proposed method achieve better visual effects than the other methods.
Figure 10 shows the fusion results from the second group of the real data set. Subjectively, the fusion image of GS has darker colors and suffers from serious spectral distortion. It can be seen that the result from ATWT has relatively clear spatial details, but spectral distortion occurs in some local areas. The results produced by MTF-GLP-CBD and MF-HG have relatively high contrast, which leads to some intensity and spectral distortion. The fusion results of GSA-BPT, GSA-HA, BDSD, and our proposed method have relatively better visual quality. The BDSD result shows clearer spatial details as compared to the other three methods, but it is difficult to distinguish the spectral preservation from the visual effects of these four methods.
Table 4 and
Table 5 list the objective assessment of
Figure 9 and
Figure 10, respectively. As can be seen from
Table 4, the optimal values of the $D_s$ and QNR indices are obtained by GSA-HA and ATWT, respectively. The QNR index of our proposed method achieves a suboptimal value, and the $D_s$ value obtained by the proposed method is also slightly higher than the optimal one. The best $D_\lambda$ is obtained by our method. In this group of images, although the spectral distortion is reduced, some spatial information is also lost. Due to the introduction of spectral modulation in our method, the spatial quality may not be good enough in some cases.
Table 5 shows the objective assessment of
Figure 10, from which we can see that our proposed method achieves the best values of all three indices.
5. Conclusions
A pansharpening method based on local adaptive spectral modulation (LASM) construction and cooperation with segmentation is proposed in this paper. The k-means algorithm is used to segment low-resolution multispectral (LRMS) images; pixels with similar spectral characteristics are clustered into the same connected component group, then local injection coefficients are estimated based on the connected component groups. In this paper, we propose LASM for the modulation of spectral information in the fusion result, and the construction of LASM is based on extraction of details from original images and the local spectral relationships between multispectral (MS) bands. After the detail injection and spectral modulation are completed according to the fusion model, the optimal number of segments can be adaptively chosen by measuring the distance between the fusion result and the upsampled LRMS image, then we can make the spectral characteristics of the high-resolution multispectral (HRMS) image as close as possible to those of the original LRMS image. Using the idea of cooperation between pansharpening and segmentation, image segmentation is applied to optimize the fusion result, and the parameters of the segmentation algorithm are adjusted according to the feedback from the fusion image. Finally, experiments on degraded and real data sets from the GeoEye-1 and QuickBird satellites demonstrate the effectiveness and superiority of our proposed method. Compared with seven of the classic and state-of-the-art pansharpening methods, our method has advantages in spatial detail injection and spectral information preservation, while reducing spectral distortion.
However, in some cases, although the spectral distortion is reduced, some spatial information is also lost. Because our method emphasizes spectral preservation, its spatial quality may not be good enough in some cases, and its generalization ability remains to be improved. In this paper, segmentation is not a focus of discussion but only an optimization aid. In future work, we will first study the influence of spectral characteristics and ground object complexity on pansharpening and design different fusion strategies for different image content; we will then investigate more segmentation methods that can cooperate with pansharpening to further improve the fusion quality.