1. Introduction
Chinese paintings, as one of the most important forms of artistic expression in traditional Chinese culture, are among the most valuable treasures of human civilization. However, these ancient paintings suffer from various undesirable degradation patterns and face the threat of destruction due to natural environmental factors or improper human preservation measures. Stains, one of the most common types of degradation, are spots formed by paper contamination. They not only affect the appearance of a painting but sometimes even cover the text, patterns, and colors on paper cultural relics. Scholars performed a sampling survey in the Zhengzhou Museum (in the first group of Class A museums in China) and found that 96% of ancient paintings were contaminated by stains, of which 68% were severely contaminated and 28% were slightly contaminated [1]. To restore their original appearance, traditional manual restoration adopts physical or chemical methods to remove these stains [2,3]. However, these approaches require repeated trials, and the repair process carries many risks: any carelessness may cause permanent, irreversible damage to the ancient paintings.
Virtual restoration can reduce human interference and provide a reference for physical repair [4]. It refers to the removal of undesirable degradations from the collected images of a painting to restore its original appearance, which improves the artistic value and facilitates subsequent interpretation. It has become an important field in the digital preservation of cultural relics [5,6], and most studies adopt standard three-channel red–green–blue (RGB) images [7,8,9]. With the development of imaging spectroscopy, hyperspectral images have also been introduced for restoration [2,10,11]. Considering the types of image used, virtual restoration is classified into two categories in this paper: RGB-based methods without any other auxiliary information source and hyperspectral-based methods.
RGB-based methods re-estimate the color components of degraded regions with inpainting algorithms, based on the assumption that the pixels in these regions share similar geometrical structures or statistical properties with those of a painting's undegraded parts [12]. Inpainting algorithms mainly include two types: diffusion-based and exemplar-based algorithms [12,13]. Among diffusion-based inpainting algorithms, Giakoumis et al. [14] applied a controlled anisotropic diffusion algorithm to inpaint cracks on digitized paintings. Wu et al. [7] replaced orthogonalized diffusion with cross diffusion to improve the filling order of the curvature-driven diffusion (CDD) algorithm and remove fissures in murals. These algorithms are well suited to removing small areas of degradation, but when applied to large regions they tend to produce a blurry image. Among exemplar-based inpainting algorithms, Pei et al. [9] proposed an annular scan synthesis method that gradually fills a region from the boundary inward to remove stains and crevices on murals, but this method requires adding some auxiliaries to connect the breach. To ensure the connection of structures, recent studies have preferred priority-defined exemplar-based algorithms, which specify the filling order of target pixels so that inpainting proceeds along the structure propagation [15,16,17,18,19]. For example, the Criminisi algorithm [20], which defines pixel priority with a confidence term and a data term, has proven effective in the restoration of murals [16,17]. Recently, sparse representation-based inpainting methods have also been introduced to solve restoration problems [8,21]. Although exemplar-based and sparse representation-based methods can inpaint large regions using the known texture parts of the image, the lack of information inside the degradations means they still face a huge challenge when applied to images with complex structures. Regarding stains, although the covered pigment information remains, the presence of stains changes the color of the image, which makes it difficult to use the residual information in restoration. Additionally, if the stained areas are directly removed and inpainted with existing methods from the literature [9,17], it is difficult to accurately restore the original internal structures.
Hyperspectral-based methods can provide additional information for stain removal, since hyperspectral data have a wide spectral range and the covered pigment information remains. However, current studies have mainly focused on the visual enhancement of blurred patterns and texts in cultural relics by mining information from hyperspectral data [22,23,24,25]. Regarding virtual restoration, Hou et al. [2] performed maximum noise fraction (MNF) transformation on the hyperspectral data to concentrate the stain information in one component and then performed inverse MNF transformation after discarding that component. Although stains on ancient paintings can be removed this way, the restoration results depend on whether the stain component mainly contains stain information. Furthermore, studies have shown that the cover effect of stains and artifacts decreases at longer wavelengths when compared to the RGB image of the hyperspectral data [11,26]. Based on this feature, Kim et al. [11] proposed replacing the gradient of the visible bands with that of invisible near-infrared bands capturing fewer artifacts, and then reconstructing the image from the newly generated gradients. This approach can effectively remove ink bleed, corrosion, and foxing in old documents. However, compared to ancient documents, paintings are more complex, with varied colors and structural information, and this method struggles to restore their original appearance during stain removal.
To effectively remove the stains on Chinese paintings, this paper proposes restoring a painting's original appearance with the aid of selected hyperspectral feature bands. We demonstrate that, for Chinese paintings as well, the cover effect of stains decreases at longer wavelengths when compared to the RGB image of the hyperspectral data. Based on this feature, we propose a patch-based color constrained Poisson editing method to remove the stains of Chinese paintings. The main contributions of the proposed method are twofold. On the one hand, the introduction of feature bands provides prior information for the restoration, which overcomes the limits of RGB-based methods caused by the lack of information inside the degradations. On the other hand, the construction of color constraints based on the searched matches makes Poisson editing suitable for the removal of stains. It is worth noting that although hyperspectral data comprise hundreds of bands, this paper focuses on restoration of the RGB image (R: 640.31 nm, G: 549.79 nm, B: 460.20 nm) of the hyperspectral data, since it is the most natural visualization of the paintings.
The rest of this paper is organized as follows. In Section 2, we propose the algorithms used for virtual restoration, including the preprocessing, feature band selection, and patch-based color constrained Poisson editing. Section 3 describes the experimental results and the comparison with other restoration methods. Section 4 discusses the parameter setting and the effect of feature band selection. Finally, the conclusions drawn from the study are presented in Section 5.
2. Methods
We found that Chinese paintings share a feature with old documents: the cover effect of stains decreases in longer wavelength images when compared to the RGB image. This is demonstrated by three examples in Figure 1 (the hyperspectral images of the paintings are from the Digital Heritage and Virtual Restoration Laboratory of Beijing University of Civil Engineering and Architecture); although the stains are obvious at the wavelengths of the RGB image, they are almost invisible at the wavelength of 877.4 nm, where the covered information is exposed. Based on this feature, we propose the patch-based color constrained Poisson editing method to reconstruct the manually marked stain regions of the RGB image with selected hyperspectral feature bands that are less affected by stains.
The workflow is summarized in Figure 2. First, the collected hyperspectral images were preprocessed, which consists of radiometric correction and image denoising. Second, rules were established to select three feature bands capturing fewer stains. Then, the feature bands, combined with the RGB image, were used to construct the color constraint. On the one hand, the feature bands that can expose the covered information were used to search for the optimal patch from the nonstained region for each pixel located in the manually marked stain region. The patch from the RGB image corresponding to the optimal patch location was used to construct the constraint value for the pixel in the stain region, based on the assumption that the relative spatial locations of the optimal patch coincide between the RGB image and the feature bands. On the other hand, each stain region of the feature bands was segmented into blocks by quadtree to determine the constraint locations. Finally, Poisson editing was used to reconstruct the stain region of the RGB image with the established color constraint to remove the stains, in which the contaminated RGB image was regarded as the target image and the feature bands were regarded as the source image. The details are described in the following sections.
2.1. Data Acquisition and Preprocessing
The paintings processed in this paper are from the Capital Museum of China, and the hyperspectral data of these paintings were captured by the Digital Heritage and Virtual Restoration Laboratory of Beijing University of Civil Engineering and Architecture. The hyperspectral images were captured with the Themis Vision Systems VNIR400H, a pushbroom scanning hyperspectral imaging system. This system captures images in 1040 wavelength bands from 377.45 nm (ultraviolet) to 1033.10 nm (near infrared), with a sampling interval of 0.6 nm and a spectral resolution of 2.8 nm. The system uses two halogen lamps as the light source and has an instantaneous field angle of 30°. When the hyperspectral image of a painting was captured, the painting was placed on an object support facing the camera, which was held vertically. The distance from the painting to the camera was about 0.8 m, and each collected image is 1392 × 1000 pixels. To avoid other light sources disturbing the measurement, only the two halogen lamps were used to illuminate the painting during collection. As a painting is usually large, it is impossible to capture the whole painting within one image; the collection of a painting is therefore completed with more than one image, and the collected images overlap about 30% with their adjacent images. In this paper, we selected one or two stain-contaminated images from each painting as the experimental data.
The captured image is always disturbed by the uneven intensity distribution of the light source and by dark current noise. To reduce these effects, radiometric correction was performed with the collected standard white board data and dark current data according to the following formula:

R = (I_o − I_d) / (I_w − I_d),   (1)

where R is the calibrated image, I_o represents the collected original image, I_d indicates the dark current data, and I_w is the standard white board image. After removal of seriously noisy bands, the bands from 433.57 to 974.96 nm (851 bands in total) were chosen as the input data for our approach. To further reduce image noise and improve image quality, MNF transformation was applied, in which the first ten components with large eigenvalues were chosen for inverse MNF transformation, since the transformed components show steadily decreasing image quality with increasing component number [27].
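As a minimal sketch, the flat-field radiometric correction of Equation (1) can be implemented as follows (array shapes, names, and the zero-division guard are our assumptions, not details from the original pipeline):

```python
import numpy as np

def radiometric_correction(raw, dark, white):
    """Flat-field calibration of a hyperspectral cube (Equation (1)).

    raw, dark, white: float arrays of shape (rows, cols, bands) holding the
    original image, the dark current data, and the standard white board data.
    Returns the calibrated reflectance R = (raw - dark) / (white - dark).
    """
    denom = white - dark
    denom[denom == 0] = np.finfo(float).eps  # guard against division by zero
    return (raw - dark) / denom
```

The correction is applied per pixel and per band, so the same function serves the whole cube at once.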
2.2. Feature Band Selection
The selected feature bands not only serve as the source image providing the gradient in the subsequent fusion but are also used to search for the optimal matched patch to construct the color constraint. Thus, the feature bands should be minimally affected by the stains in order to expose more of the information covered by them. Although the cover effect of stains decreases at longer wavelengths, pigments tend to have a high reflectance at longer wavelengths, so different types of pigments or paper become less separable. For example, in Figure 1e,f, the paper and the blue color material appear very similar at the wavelength of 877.4 nm compared with the wavelength of 497.3 nm. This reduced separability of different color materials may make match searching difficult in these bands. Thus, the separability of different color materials is another factor to consider when selecting the feature bands.
If a band is less affected by the stains, the same color material shares similar gray values in that band with and without stain cover. As shown in Figure 3, the gray values for the same color material differ markedly below a wavelength of 700 nm due to the stain cover, but they become much closer as the wavelength increases. Thus, this paper proposes using the gray value difference between samples with and without stain cover to determine the band sections. Due to the high correlation between adjacent bands of hyperspectral data, we used the band average method [28] and took the average value of every five adjacent bands to reduce the computational burden. Furthermore, each band was normalized to the 0–1 range with its maximum and minimum values to unify the subsequent quantization. With this assumption, the band sections were estimated using Equation (2):

d_ij = | ḡ_ij^s − ḡ_ij^n |,   (2)

where ḡ_ij^s and ḡ_ij^n indicate the average gray value of the j-th color material with and without stain cover at band i, respectively, and d_ij indicates the difference between them.
Using the difference in Equation (2), the less affected band section S_j for the j-th color material was obtained by setting a defined threshold T and taking S_j = { i | d_ij < T }, where T was empirically set to 0.05 throughout the experiments presented here. The stain region might cover multiple color materials, so to ensure that all color materials below the stain were properly accounted for, we take the intersection of the band sections S_j of the various color materials j with Equation (3), where m indicates the number of color material types:

S = S_1 ∩ S_2 ∩ … ∩ S_m.   (3)
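The thresholding and intersection of Equations (2) and (3) can be sketched as follows (the input layout, function name, and symbols are our assumptions; the bands are assumed to be already averaged in groups of five and normalized to [0, 1] as described above):

```python
import numpy as np

def band_sections(stained, clean, threshold=0.05):
    """Band section shared by all color materials (Equations (2)-(3)).

    stained, clean: arrays of shape (m, n_bands) holding, per color material,
    the average gray value with and without stain cover.
    Returns indices of bands whose gray value difference stays below the
    threshold for every material (intersection of per-material sections).
    """
    diff = np.abs(stained - clean)            # d_ij, Equation (2)
    keep = np.all(diff < threshold, axis=0)   # intersection over materials, Equation (3)
    return np.flatnonzero(keep)
```

A band survives only if it is weakly affected for every color material under the stain, mirroring the intersection in Equation (3).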
To ensure that the selected feature bands had separability for different color materials and were weakly correlated with each other so as to provide more information, we used the between-class variance and the correlation coefficient to determine the band combinations from randomly selected samples. A total of 800 pixels were selected for each color material, distributed in both stained and nonstained regions. The rule established to determine the band combinations from the band section S is:

OBP = ( σ²_i1 + σ²_i2 + σ²_i3 ) / ( |r_i1,i2| + |r_i1,i3| + |r_i2,i3| ),   (4)

where OBP is the value calculated for a combination of three different bands according to the rule. σ²_i1, σ²_i2, and σ²_i3 indicate the between-class variance at bands i1, i2, and i3, respectively; they describe the separability of the different color materials, and the larger the value, the better the separability. r_i1,i2, r_i1,i3, and r_i2,i3 indicate the correlation coefficients between these three bands, where the smaller the correlation coefficient, the lower the data redundancy. The between-class variance was computed as shown in Equation (5), where n_j is the sample number of color material j, μ_ij the mean of the samples of color material j at band i, and μ_i the mean of all the samples at band i:

σ²_i = (1 / Σ_j n_j) Σ_{j=1..m} n_j ( μ_ij − μ_i )².   (5)

The correlation coefficient was computed as shown in Equation (6), where x_i and x_k indicate the samples' gray values at bands i and k, respectively:

r_i,k = Σ ( x_i − μ_i )( x_k − μ_k ) / ( √Σ( x_i − μ_i )² · √Σ( x_k − μ_k )² ).   (6)

With these definitions, the three-band combination that maximizes OBP was selected as the feature bands.
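The selection rule can be sketched with an exhaustive search over three-band combinations (the exact weighting in Equation (4), a ratio of summed between-class variances to summed absolute correlations, is our reading of the rule; all names are ours):

```python
import numpy as np
from itertools import combinations

def select_feature_bands(samples, labels, candidate_bands):
    """Pick the three-band combination maximizing OBP (Equations (4)-(6)).

    samples: (n_samples, n_bands) gray values; labels: material id per sample;
    candidate_bands: band indices from the band section S.
    """
    labels = np.asarray(labels)

    def between_class_variance(x):
        # Equation (5): class-size-weighted squared deviation of class means
        mu = x.mean()
        return sum((x[labels == j].size / x.size) * (x[labels == j].mean() - mu) ** 2
                   for j in np.unique(labels))

    bcv = {i: between_class_variance(samples[:, i]) for i in candidate_bands}
    best, best_val = None, -np.inf
    for i1, i2, i3 in combinations(candidate_bands, 3):
        # Equation (6) via the Pearson correlation of the sample vectors
        r12 = abs(np.corrcoef(samples[:, i1], samples[:, i2])[0, 1])
        r13 = abs(np.corrcoef(samples[:, i1], samples[:, i3])[0, 1])
        r23 = abs(np.corrcoef(samples[:, i2], samples[:, i3])[0, 1])
        obp = (bcv[i1] + bcv[i2] + bcv[i3]) / (r12 + r13 + r23 + 1e-12)
        if obp > best_val:
            best, best_val = (i1, i2, i3), obp
    return best
```

With 800 samples per material and a band section already reduced by averaging, the exhaustive search over triples remains tractable.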
2.3. Patch-Based Color Constrained Poisson Editing to Remove Stains
2.3.1. The Principle of Poisson Editing
Poisson editing is a classical algorithm that reconstructs an image in the gradient domain [29], and this idea has been widely used in seamless image composition and in cloud removal for remote sensing images [30,31]. It interpolates inward from the boundary of the target image and enforces consistency between the spatial variation of the reconstructed image and that of the source image, so that the colors of the reconstructed image change slowly from the boundary of the target image toward the source content. The idea is shown in Figure 4. The RGB image with stains is regarded as the target image I_T, and the selected feature bands are regarded as the source image I_S. The aim is to reconstruct the pixel values f of the stain region Ω according to the target image's pixel values f* located on the stain region's boundary ∂Ω and the gradient field V from the source image. To obtain an accurate and optimized reconstruction result (i.e., the solution of the unknown function f), the problem is formulated as:

min_f ∬_Ω |∇f − V|², with f|_∂Ω = f*|_∂Ω,   (7)

where ∇ = (∂/∂x, ∂/∂y) is the gradient operator and can be computed from the following finite difference formula:

∇f(x, y) ≈ ( f(x + 1, y) − f(x, y), f(x, y + 1) − f(x, y) ).   (8)

Equation (7) aims to derive a result f whose gradient is as close to the guidance vector field V as possible, where V is computed from the grayscale image converted from the source image. The solution of Equation (7) is the unique solution of the following Poisson equation with Dirichlet boundary conditions:

Δf = div V over Ω, with f|_∂Ω = f*|_∂Ω,   (9)

where Δ is the Laplacian operator and div V = ∂u/∂x + ∂v/∂y indicates the divergence of the gradient vector V = (u, v). The specific solution process can be referenced from the literature [29].
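The guidance field V of Equation (8) amounts to forward differences of the grayscale source image, which can be sketched as follows (function and variable names are ours):

```python
import numpy as np

def guidance_field(source):
    """Forward-difference gradient of a grayscale source image, as in
    Equation (8). Returns the horizontal and vertical components (u, v);
    the last column/row is zero-padded where no forward neighbor exists.
    """
    u = np.zeros_like(source)
    v = np.zeros_like(source)
    u[:, :-1] = source[:, 1:] - source[:, :-1]   # f(x+1, y) - f(x, y)
    v[:-1, :] = source[1:, :] - source[:-1, :]   # f(x, y+1) - f(x, y)
    return u, v
```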
When Poisson editing was directly introduced into stain removal, it achieved the purpose of removing the stains, since the source image captures fewer stains, but it could not restore the original color covered by the stains because of the large color difference between the target image and the source image, which correspond to different wavelengths, as shown in Figure 5b. Although the stains could be removed or diluted, the original color was not effectively restored. For example, the stained region in the top left corner of Figure 5a should have been filled with blue color material pixels in Figure 5b but was filled with black color material pixels. Bie et al. [32] introduced new energy terms to improve Poisson editing by drawing specific color strokes, which allows control of the color of the composited images. This approach inspired us; however, to restore the paintings' original color, we cannot casually specify the color of the stained regions. To solve this problem, we searched for an optimal matched patch in the nonstained region for each pixel in the stained region of the feature bands and selected some constructed pixels from the RGB image, based on the searched patch, as a color constraint for reconstructing the image.
2.3.2. Color Constraint Construction
To restore the original appearance covered by the stains, this paper first uses the gray values of the feature bands to search for the optimal matched patch in the nonstained region for each pixel in the stained region, as shown in Figure 6a. Although exhaustive patch searching can obtain a global optimum, the computational cost is high, especially for large images. Considering the search time, this paper chose the PatchMatch algorithm [33] to search for the optimal matched patch (with a patch size of 7 × 7) in the feature bands, with the search based on the similarity measure of the sum of squared differences (SSD). Each pixel in the stained region thereby obtains a matched patch, and each pixel overlaps several neighboring patches, as shown in Figure 6a: the target pixel, marked with a red star, overlaps the neighboring green patch in the stained region. Next, we obtain the locations at which the neighboring patches' matches overlap with the target pixel, marked by the red circles in the nonstained region in Figure 6a. Then, the averaged color votes [33] of the RGB image at these locations are used to generate the color constraint of this pixel, as shown in Figure 6b. However, if all of the constructed pixels based on the searched patches were selected as the color constraint, the reconstructed image would rely solely on the constructed color constraint, and the gradient from the source image would not contribute. This means that once a match error occurs, it is hard to correct the error with the gradient, as shown in Figure 7. The pixels marked by the red box in Figure 7a are the constructed pixels based on the searched patches, and we can see that this region, which should be filled with paper pixels, is filled with a red color material due to a match error. Figure 7b is the image reconstructed with the full set of constructed pixels as the color constraint, namely based on Figure 7a, where this error persists. To balance the color constraint and the gradient, we selected only some of the constructed pixels as the final constraints, as shown in Figure 7c. Although the selected color constraint pixels (marked with white blocks) were still located in the incorrectly matched areas, the error was reduced in the reconstructed Figure 7d, which is based on the partial color constraint. In this paper, the partial color constraint was selected through quadtree segmentation, where the stained region of the feature bands was segmented into blocks, and then a quarter of each block's pixels (discussed in further detail below) was randomly selected as the final constraint, as shown in Figure 7c.
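An illustrative sketch of the match-and-vote step follows. Note its simplifications: a brute-force SSD search stands in for PatchMatch [33], a single feature band replaces the three selected bands, and the returned color is a single center vote rather than the averaged color votes of the full method; all names and shapes are our assumptions.

```python
import numpy as np

def nearest_patch_color(feature, rgb, mask, p, size=7):
    """For a stained pixel p, find the most similar patch (SSD on the
    feature band) whose window lies entirely in the nonstained region, and
    return the RGB color at the matched center as a color vote.

    feature: (H, W) feature-band image; rgb: (H, W, 3); mask: True = stained.
    """
    r = size // 2
    H, W = feature.shape
    y0, x0 = p
    target = feature[y0 - r:y0 + r + 1, x0 - r:x0 + r + 1]
    best, best_ssd = None, np.inf
    for y in range(r, H - r):
        for x in range(r, W - r):
            win = mask[y - r:y + r + 1, x - r:x + r + 1]
            if win.any():          # candidate patch must be fully nonstained
                continue
            cand = feature[y - r:y + r + 1, x - r:x + r + 1]
            ssd = np.sum((cand - target) ** 2)
            if ssd < best_ssd:
                best, best_ssd = (y, x), ssd
    return rgb[best]
```

Because the SSD is computed on the feature band, where the stain is nearly invisible, the search can match the target patch even though its center pixel is stained in the RGB image.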
2.3.3. Image Reconstruction with the Color Constraint
This paper aims to reconstruct the stained regions, while the values of the nonstained regions remain unchanged. As mentioned above, the original Poisson editing cannot restore the paintings' original color, since the target image and the feature bands have different colors. Thus, this paper improves Poisson editing by constructing the color constraint f′, and the new method can be formulated as:

min_f ∬_Ω |∇f − V|² + λ ∬_{Ω_c} |f − f′|², with f|_∂Ω = f*|_∂Ω,   (10)

where Ω_c denotes the constraint locations and λ is used to balance the color constraint; λ is set to 0.1 in this paper. The implementation of Equation (10) can be discretized on the pixel grid. Let p indicate a pixel located in a region contaminated by a stain; N_p indicates the 4-connected neighbor set of pixel p, and pixel q is one of its 4-connected neighbors (i.e., q ∈ N_p). (p, q) indicates a pixel pair, and v_pq is the gradient of the source image, which is computed as v_pq = I_S(p) − I_S(q). Let f_p be the value of the image function f at pixel p. The aim of this paper is to obtain f_p for all p ∈ Ω. After discretization, Equation (10) can be converted into Equation (11):

min_{f|_Ω} Σ_{(p,q), p∈Ω} ( f_p − f_q − v_pq )² + λ Σ_{p∈Ω_c} ( f_p − f′_p )², with f_q = f*_q for all q ∈ ∂Ω.   (11)

Equation (11) can be further generalized into Equation (12), where |N_p| indicates the number of neighbors in N_p and [·] equals 1 when the condition holds and 0 otherwise. Equation (12) can be solved with conjugate gradient methods:

( |N_p| + λ·[p ∈ Ω_c] ) f_p − Σ_{q∈N_p∩Ω} f_q = Σ_{q∈N_p∩∂Ω} f*_q + Σ_{q∈N_p} v_pq + λ f′_p·[p ∈ Ω_c], for all p ∈ Ω.   (12)
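The linear system can be sketched as follows for a single channel (run once per RGB band): a dense assembly of the constrained equations with a plain conjugate gradient solver. The helper, its names, and the assumption that the stain region does not touch the image border are ours.

```python
import numpy as np

def poisson_edit_with_constraint(target, source, mask, constraint, cmask, lam=0.1):
    """Reconstruct the stained region of `target` from the gradients of
    `source`, Dirichlet values on the region boundary, and a soft color
    constraint at selected pixels (a sketch of the constrained system).

    mask: True inside the stain region; cmask: True at constrained pixels
    (a subset of mask); constraint: the constraint values f'.
    """
    H, W = target.shape
    idx = -np.ones((H, W), dtype=int)
    ys, xs = np.nonzero(mask)
    idx[ys, xs] = np.arange(len(ys))
    n = len(ys)

    A = np.zeros((n, n))
    b = np.zeros(n)
    for k, (y, x) in enumerate(zip(ys, xs)):
        A[k, k] = 4.0                                  # |N_p| for 4-connectivity
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            qy, qx = y + dy, x + dx
            b[k] += source[y, x] - source[qy, qx]      # v_pq from the source gradient
            if mask[qy, qx]:
                A[k, idx[qy, qx]] -= 1.0               # unknown neighbor in the region
            else:
                b[k] += target[qy, qx]                 # Dirichlet boundary value f*
        if cmask[y, x]:                                # soft color constraint term
            A[k, k] += lam
            b[k] += lam * constraint[y, x]

    # plain conjugate gradient (A is symmetric positive definite)
    f = np.zeros(n)
    r = b - A @ f
    p = r.copy()
    rs = r @ r
    for _ in range(10 * n):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        f += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < 1e-10:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new

    out = target.copy()
    out[ys, xs] = f
    return out
```

A production version would use a sparse matrix, but the assembled equations are exactly those of the discretized system: one row per stained pixel, with the constraint term only added where the quadtree selection placed a color constraint.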