1. Introduction
Recently, screen content images (SCIs) have been widely applied as a form of information representation in modern society owing to the popularization of multimedia applications, including remote screen sharing, cloud and mobile computing, commodity advertisements on online shopping websites and real-time online teaching [1,2]. In many actual engineering applications, including compression, storage, transmission and display, the visual quality of SCIs is inevitably degraded by distortions such as noise, blur, contrast variation, blockiness and quantization loss. Undoubtedly, this quality degradation significantly affects the visual perception of observers. Thus, it is necessary and meaningful to develop quality evaluation methods for SCIs in actual engineering applications.
Over recent decades, a large number of image quality assessment (IQA) methods have been elaborately designed and applied in the field of digital image processing. The peak signal-to-noise ratio (PSNR) is a conventional IQA method that has been applied extensively. However, it has inferior prediction performance since it only measures pixel-wise differences and does not take into account the perceptual properties of human vision. To overcome this drawback, the research community has proposed many advanced full-reference (FR) IQA metrics that require the entire information of the reference image. These metrics skillfully model intrinsic properties of the human visual system (HVS); representative metrics include structural similarity (SSIM) [3], feature similarity [4], visual information fidelity [5], gradient magnitude similarity deviation (GMSD) [6], the internal generative mechanism (IGM) metric [7] and deep similarity [8]. In [3], the quality of an image is measured by combining changes in luminance, contrast and structure. In [4], two complementary low-level features, namely phase congruency and the image gradient magnitude, are adopted to characterize local image quality. In [5], the loss of image information is quantified and used to assess the visual quality of an image. In [6], the standard deviation of the gradient magnitude similarity map is calculated as the quality index of an image. In [7], according to the IGM theory, an autoregressive prediction method decomposes an image into predicted and disorderly parts, whose distortions are measured by structural similarity and the PSNR, respectively. In [8], the local similarities of features generated by a convolutional neural network (CNN) are calculated and pooled to assess the quality of an image.
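To make the GMSD idea [6] concrete, the following is a minimal sketch rather than the reference implementation: it computes a per-pixel gradient-magnitude similarity map between a reference and a distorted image and pools it with the standard deviation. The constant `c` and the use of `np.gradient` in place of the Prewitt filtering of the original method are simplifying assumptions.

```python
import numpy as np

def gradient_magnitude(img):
    # Central-difference gradients (the original GMSD uses Prewitt filters).
    gy, gx = np.gradient(img.astype(np.float64))
    return np.sqrt(gx**2 + gy**2)

def gmsd(ref, dist, c=170.0):
    """Gradient Magnitude Similarity Deviation: the standard deviation of the
    per-pixel gradient-magnitude similarity map; lower means better quality."""
    m_r = gradient_magnitude(ref)
    m_d = gradient_magnitude(dist)
    gms = (2 * m_r * m_d + c) / (m_r**2 + m_d**2 + c)  # similarity in (0, 1]
    return gms.std()
```

For identical images the similarity map is exactly 1 everywhere, so the deviation is 0; distortion makes the map spatially non-uniform and the deviation grows.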
Additionally, alongside the FR IQA metrics, some reduced-reference (RR) IQA metrics [9] and no-reference/blind IQA metrics [10] have also been presented over recent decades. RR IQA metrics only need partial information from the reference image, while no-reference (NR) IQA metrics need no information from the reference image. Many blind IQA methods first extract quality-aware features and then feed these features into a machine learning model to obtain the quality assessment result. Mittal et al. [11] presented a blind IQA metric called BRISQUE, in which the naturalness of an image is quantified using natural scene statistics (NSS) features of locally normalized luminance values. Li et al. [12] presented a blind IQA metric that adopts two types of features, namely luminance features represented by the luminance histogram and structural features denoted by the histogram of the local binary pattern (LBP) of the normalized luminance. Li et al. [13] designed a blind IQA metric based on structural features denoted by the gradient-weighted histogram of the LBP computed from gradient values. In [14,15], statistical histograms of the texture information of an image are extracted as quality-aware features to describe the degree of distortion. In [16], NSS features extracted from reference images are used to learn a multivariate Gaussian model, which is then used to evaluate the quality of distorted images.
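Several of the blind metrics above (e.g., [12,13]) use LBP histograms as structural features. A minimal sketch of the basic 8-neighbor LBP and its normalized histogram follows; this illustrates the descriptor family only, not the specific gradient-weighted variants of those papers.

```python
import numpy as np

def lbp_8(img):
    """Basic 8-neighbour local binary pattern: each interior pixel is encoded
    by thresholding its 8 neighbours against the centre value (one bit each)."""
    img = img.astype(np.float64)
    c = img[1:-1, 1:-1]
    code = np.zeros_like(c, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(img, bins=256):
    """Normalised histogram of LBP codes, usable as a quality-aware feature."""
    h, _ = np.histogram(lbp_8(img), bins=bins, range=(0, 256))
    return h / h.sum()
```

On a constant region every neighbor test succeeds, so all pixels receive the all-ones code 255; textured regions spread mass over many codes, which is what makes the histogram sensitive to distortion.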
Although the IQA methods mentioned above achieve superior performance, they were specifically developed to predict the quality of natural images and cannot precisely assess the quality of SCIs. The reason is that SCIs have some distinctly different characteristics compared to natural images. Firstly, their contents are different. Generally, texts, natural images, slides and logos are mixed in an SCI, so an SCI has sharp edges, simple shapes, thin lines and a small number of colors. Two typical examples of SCIs are shown in Figure 1. In contrast, a natural image contains continuous-tone content with slowly varying edges, complicated structures, thick lines and more colors. Secondly, their statistical distributions are different. In general, after the luminance values of a natural image are processed by the mean subtracted contrast normalized (MSCN) operation, their statistical distribution can be modeled by a Gaussian function [11]. By comparison, for an SCI, this statistical distribution behaves like a Laplacian contour [17] and its curve varies dramatically: the center of the curve has a sharp peak and the remaining parts are still wavy [18]. Thirdly, their image activity levels [19] are different. Because the pixel values of an SCI have greater variations in local regions, the activity measurement value of an SCI is greater than that of a natural image [18]. As SCIs and natural images have these different properties, users have completely different viewing experiences regarding their quality degradation. Therefore, the existing IQA methods developed for natural images are inappropriate for assessing the quality of SCIs.
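The MSCN operation referred to above [11] can be sketched as follows: each pixel is normalized by a local Gaussian-weighted mean and standard deviation, and it is the histogram of the resulting coefficients that is near-Gaussian for natural images and Laplacian-like for SCIs. The window size and `sigma` below follow common BRISQUE practice but are assumptions here.

```python
import numpy as np

def mscn(img, ksize=7, sigma=7 / 6, eps=1.0):
    """Mean subtracted contrast normalized (MSCN) coefficients:
    (img - local_mean) / (local_std + eps), with Gaussian local statistics."""
    img = img.astype(np.float64)
    # Separable 1-D Gaussian window, normalised to sum to 1.
    ax = np.arange(ksize) - ksize // 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()

    def blur(x):
        # Convolve rows then columns with the 1-D Gaussian (reflect padding).
        p = ksize // 2
        xp = np.pad(x, p, mode="reflect")
        out = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, xp)
        return np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, out)

    mu = blur(img)
    sigma_map = np.sqrt(np.maximum(blur(img**2) - mu**2, 0))
    return (img - mu) / (sigma_map + eps)
```

The output is roughly zero-mean; fitting a parametric density (Gaussian, Laplacian, or generalized Gaussian) to its histogram is what yields the NSS features discussed above.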
To date, a few algorithms have been proposed to perform the quality evaluation of SCIs. The earliest study of the quality assessment of SCIs was conducted by Yang et al. [18], who proposed an FR screen content image quality assessment (SCIQA) method called SPQA. In this method, for textual layers of SCIs, both luminance and sharpness similarities are calculated, while for pictorial layers, only the sharpness similarity is computed. The respective quality values of the textual and pictorial layers are combined into the overall quality score of a distorted SCI by employing a weighting activity map. However, the predictive performance of the SPQA method needs further improvement. Fang et al. [20] proposed an FR SCIQA method in which the similarity of structural features denoted by gradient information is calculated to estimate the quality of textual regions, and the similarities of luminance features and of structural features denoted by LBP features are computed to predict the quality of pictorial regions. Ni et al. [21] explored the edge variation of SCIs in depth and employed three edge characteristics, namely the contrast, width and direction of edges, extracted from a parametric edge model. Fu et al. [22] adopted a two-scale difference-of-Gaussian (DOG) filter to extract the edges of an SCI; the similarities of small-scale edges are calculated and combined using larger-scale edges as weighting values. Wang et al. [23] designed an FR SCIQA method based on edge characteristics extracted from gradient values, including the edge sharpness, the edge brightness change, the edge contrast change and the edge chrominance. In [24], the local similarities of two chrominance components and of Gabor features generated by the imaginary part of the Gabor filter are computed and combined to produce the assessment score. In [25], statistical features of the primary visual and uncertainty information are used to design an RR SCIQA metric. Wang et al. [26] proposed an RR quality assessment method for compressed SCIs in which wavelet-domain features, including the mean, variance and entropy of wavelet coefficients, are used to learn a regression model. Rahul et al. [27] presented an RR SCIQA method based on feature points identified by cascaded DOG filters. The aforementioned SCIQA methods [21,22,23,24,25,26,27] share a common drawback: they employ the same feature representation to characterize the quality degradation of the entire content of an SCI and do not take different steps to deal with its different contents. Since human eyes have obviously different visual experiences of distortions in the textual and pictorial content of SCIs, it is unreasonable to employ the same features to denote the quality degradation of both. Additionally, these FR or RR methods require the entire or partial information of reference SCIs, which cannot be acquired in the majority of actual cases.
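The DOG filtering used for edge extraction in [22,27] can be illustrated with a short sketch: an image is blurred at two scales and the blurs are subtracted, giving a band-pass response that is near zero in flat areas and large in magnitude near edges. The scale ratio `k = 1.6` is a common choice, not a value taken from those papers.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding."""
    radius = int(3 * sigma)
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    xp = np.pad(img.astype(np.float64), radius, mode="reflect")
    out = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, xp)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, out)

def dog_edges(img, sigma=1.0, k=1.6):
    """Difference-of-Gaussians band-pass response: fine-scale blur minus
    coarse-scale blur, which peaks in magnitude around edges."""
    return gaussian_blur(img, sigma) - gaussian_blur(img, k * sigma)
```

A two-scale scheme such as [22] runs this at a small and a larger `sigma` and uses one response to weight the other.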
Gu et al. [28] put forward an NR SCIQA model in which one free-energy feature and twelve structural degradation features are extracted to train the assessment model. Yue et al. [29] designed a blind SCIQA method based on a CNN, in which both the predicted and unpredicted parts obtained according to the IGM theory are input into the CNN. However, in [28,29], predictive values generated by objective FR SCIQA methods, rather than subjective rating values, are used as training labels, which may result in a deviation. In [30], local and global sparse representations are used to design an NR SCIQA model. Lu et al. [31] performed blind quality assessment of SCIs based on statistical orientation features and structural features denoted by the LBP histograms of nine gradient maps. Min et al. [32] proposed an NR quality evaluation method for compressed SCIs in which the features of corners and edges at multiple scales are integrated by a multi-scale weighting strategy. Fang et al. [33] presented a blind SCIQA model that considers both local features denoted by histograms of locally normalized luminance values and global features denoted by histograms of texture features extracted from gradient maps. Gu et al. [17] developed a blind assessment model of SCIs comprising four elements, namely picture complexity, screen content statistics, brightness and sharpness. Although these existing blind evaluation models, specifically developed for SCIs, obtain better prediction performance than traditional evaluation models of natural images, they still cannot achieve high prediction accuracy, and there remains a great deal of room to enhance their performance. Thus, the blind quality assessment of SCIs remains a challenging problem that needs to be investigated further in depth by the research community.
To further improve the predictive accuracy of existing blind evaluation methods for SCIs, in this study we propose a blind SCIQA method based on regionalized structural features (BSRSF), which are closely relevant to the intrinsic quality of SCIs. Firstly, considering the very different characteristics of the textual and pictorial content in an SCI, the SCI is segmented into two completely different types of regions: textual regions and pictorial regions. Secondly, to derive the respective assessment values of the textual and pictorial regions, their features are extracted by different methods according to their characteristics and then separately supplied to machine learning models, i.e., support vector regression (SVR). Specifically, given the noticeable structural information contained in textual regions, structural information is used as the quality-aware feature of textual regions. For pictorial regions, since human vision is sensitive to texture information and luminance variation, texture features are used as structural features, while luminance information is used as an auxiliary feature. Finally, an activity weighting strategy is proposed to fuse the assessment values of the textual and pictorial regions into the final assessment value of the degraded SCI. Experimental results show that the proposed BSRSF method achieves better prediction performance than other existing blind SCIQA methods on SIQAD and SCID, which are often employed as validation databases for SCIs. In contrast to the existing blind SCIQA methods, the main contributions of the proposed BSRSF metric are as follows:
- (1)
We propose improved histograms of oriented gradients, extracted from multi-order derivatives. In the proposed method, these histograms are adopted as structural features to predict the quality of textual regions of SCIs.
- (2)
We extract texture features from both the spatial and shearlet domains as structural features of pictorial regions. The statistical histograms of the local derivative pattern are used as texture features in the spatial domain. We propose a new local pattern descriptor called the shearlet local binary pattern to represent texture features in the shearlet domain. To the best of our knowledge, this is the first attempt to extract texture features from the shearlet domain.
- (3)
We propose an activity weighting strategy to combine the visual quality of the textual and pictorial regions. This strategy is based on the activity degree of the different regions in the SCI, with weights extracted from the gradient values of the SCI.
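To illustrate the shape of contribution (3), the sketch below fuses two per-region quality scores with gradient-based activity weights. The exact activity measure and weighting formula of BSRSF are not reproduced here; the mean gradient magnitude and the simple normalized-weight fusion are illustrative assumptions only.

```python
import numpy as np

def activity(region):
    """Illustrative activity measure: mean gradient magnitude of the region.
    (The specific activity definition used by BSRSF may differ.)"""
    gy, gx = np.gradient(region.astype(np.float64))
    return np.sqrt(gx**2 + gy**2).mean()

def fuse_scores(q_text, q_pict, text_region, pict_region):
    """Combine per-region quality scores with activity-based weights:
    the more active region contributes more to the overall score."""
    a_t, a_p = activity(text_region), activity(pict_region)
    w_t = a_t / (a_t + a_p + 1e-12)  # normalised textual weight
    return w_t * q_text + (1 - w_t) * q_pict
```

Because textual regions of SCIs typically have much higher activity than pictorial regions, a scheme of this kind lets the textual quality dominate the fused score, in line with the activity observation in [18,19].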
The remaining content of this paper is organized as follows. The detailed content of the proposed BSRSF method is presented stage-by-stage in Section 2. Experimental results are given in Section 3. Finally, the conclusions of this paper are presented in Section 4.