1. Introduction
With the rapid development and popularization of multimedia information technology, high-quality images have become one of the indispensable media contents in our daily lives. Multi-exposure fusion (MEF) is an effective technique for enhancing image quality [1]: it offers a new way to capture as much natural scene information as possible in both the bright and dark regions [2]. However, the weight assignment process of MEF algorithms introduces quality degradation that affects human visual perception, especially structure and detail loss caused by underexposure and overexposure [3]. It is therefore urgent to design objective quality assessment methods specific to MEF images. The performance of any objective quality assessment method is best gauged by its correlation with human subjective opinions of quality, which are obtained through subjective quality assessment; averaging these opinions yields the mean opinion score (MOS), which represents the perceptual subjective quality.
Up to now, many studies have addressed the quality assessment problem for ordinary images. Image quality assessment (IQA) methods can be divided into three categories: full-reference (FR), reduced-reference (RR), and no-reference/blind (NR) [4]. FR and RR methods need pre-defined reference information for comparison, while blind IQA (BIQA) methods do not. Since reference images are difficult to obtain in real applications, developing BIQA methods is more meaningful. Generally, BIQA methods rely on quality-sensitive features to learn a quality prediction model, so the core of designing a BIQA method is feature extraction.
Some excellent BIQA methods for ordinary images have been built on natural scene statistics (NSS) analysis [5,6,7,8,9,10]. One famous example is the blind/referenceless image spatial quality evaluator (BRISQUE) [5], which uses scene statistics of locally normalized luminance coefficients to quantify the loss of naturalness; its features are the parameters of NSS models fitted to the empirical distributions of locally normalized luminances and their pairwise products in the spatial domain. Moorthy et al. [6] proposed another NSS-based method, the distortion identification-based image verity and integrity evaluation (DIIVINE), which combines distortion identification with distortion-specific IQA and mainly extracts summary statistics from an NSS wavelet coefficient model. Differently from DIIVINE, Saad et al. [7] designed the blind image integrity notator using discrete cosine transform (DCT) statistics (BLIINDS-II) in the DCT domain, where the estimated parameters of a DCT coefficient-based NSS model form the features for perceptual quality prediction. Li et al. [8] extracted an effective structural feature to perceive structural degradation, using the local binary pattern (LBP) to encode image gradient information (GLBP); the resulting method is denoted the gradient-weighted histogram of GLBP (GWH-GLBP). Liu et al. [9] focused on gradient orientation, which had not been deeply explored before, and also deployed a relative gradient magnitude feature accounting for perceptual masking, yielding the oriented gradients IQA (OG-IQA) method. Oszust et al. [10] proposed a hand-crafted blind image assessment measure with local descriptors and derivative filters (SCORER), which emphasizes local features carried by image derivatives of different orders for quality prediction. In addition, Liu et al. [11] constructed the CurveletQA method, utilizing the curvelet transform to extract a set of statistical features, including the coordinates of the maxima of the log-histograms of curvelet coefficient values and the energy distributions over orientation and scale. Gu et al. [12] combined local and global considerations to develop a no-reference image quality metric for contrast distortion (NIQMC), whose basic principle is that an image containing more valuable information has better quality.
However, the above-mentioned BIQA methods target ordinary images and cannot accurately predict the quality of MEF images. Meanwhile, with the emergence of various multimedia forms, the tone-mapped image has also attracted considerable attention. It is transformed from a high dynamic range (HDR) image, which is likewise a quality enhancement technology [13]. For tone-mapped images, Gu et al. [14] established a blind tone-mapped quality index (BTMQI) by analyzing image information, naturalness, and structure. Kundu et al. [15] extracted NSS features in the spatial domain and HDR-specific gradient features in the gradient domain to design the HDR image gradient-based evaluator (HIGRADE). Although tone-mapped HDR technology shares a similar goal with MEF technology, the latter bypasses the creation of HDR images, so the images produced by the two technologies present different phenomena.
Overall, the aforementioned methods can accurately evaluate the quality of ordinary and tone-mapped images, but their performance on the available MEF database is not always excellent because of MEF-specific distortions such as detail loss from overexposure and underexposure. It is therefore desirable to propose IQA methods dedicated to MEF images. Currently, there are several FR methods [3,16,17,18,19] for MEF images. For example, Ma et al. [3] proposed an FR IQA method based on structural similarity [20]. Xing et al. [16] designed a method for MEF images utilizing contrast structure and contrast saturation. Deng et al. [17] extracted color-, structure-, and texture-related features for quality regression. Rahman et al. [18] proposed a quality map fusion approach to obtain the true reference of the source MEF images for perceptual quality assessment. Martinez et al. [19] combined multi-scale computation and structural similarities for quality prediction. However, there are almost no specific BIQA methods designed for MEF images.
To propose an appropriate BIQA method for MEF images, a critical issue is how to extract effective feature vectors that distinguish MEF images with good/poor representations. Motivated by this, a curvature and entropy statistics-based blind MEF image quality assessment (CE-BMIQA) method is proposed in this paper. The main contributions are summarized as follows:
- (1)
To characterize the structure and detail distortion introduced by inappropriate exposure conditions, histogram statistics of surface type maps generated from the mean and Gaussian curvatures, together with entropy statistics in the spatial and spectral domains, are extracted to form the quality-aware feature vectors.
- (2)
Since contrast variation is a key factor affecting the quality of MEF images, contrast energy weights are designed to aggregate the above curvature features. Furthermore, a multi-scale scheme is adopted to perceive image distortion at different resolutions, simulating the multi-channel characteristics of the human visual system.
- (3)
Considering the significance for multimedia applications of bridging the gap between BIQA methods and MEF images, a novel CE-BMIQA method specialized for MEF images is proposed. Experimental results on the available MEF database demonstrate the superiority of the proposed CE-BMIQA method over state-of-the-art BIQA methods.
The remainder of this paper is organized as follows: the proposed CE-BMIQA method is analyzed in Section 2. The experimental results and analysis, including the database and experimental protocols, performance comparison, impacts of the individual feature sets and scale number, and computational complexity, are provided in Section 3. Further discussion is given in Section 4. Finally, the conclusion is drawn in Section 5.
2. Proposed CE-BMIQA Method
As mentioned in the introduction, MEF technology merges a sequence of images with different exposure levels into a single MEF image, which can be formulated as

I(r) = \sum_{v=1}^{k} \omega_v(r) X_v(r),

where k is the number of multi-exposure source images, X_v(r) and \omega_v(r) represent the luminance value and the weight of the r-th pixel in the v-th exposure image, respectively, and I denotes the output MEF image.
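Under this formulation, fusion is a per-pixel weighted average of the exposure stack. A minimal NumPy sketch (the function name and toy data are ours, not from the paper):

```python
import numpy as np

def fuse_exposures(stack, weights):
    """Per-pixel weighted average of a multi-exposure stack.

    stack, weights: arrays of shape (k, H, W). Weights are
    normalized so they sum to 1 at every pixel before fusing.
    """
    weights = weights / np.sum(weights, axis=0, keepdims=True)
    return np.sum(weights * stack, axis=0)

# Toy example: two constant "exposures" with uniform weights.
stack = np.stack([np.full((4, 4), 0.2), np.full((4, 4), 0.8)])
weights = np.ones_like(stack)
fused = fuse_exposures(stack, weights)  # every pixel -> 0.5
```

Real MEF algorithms differ only in how the weight maps ω_v are computed; the aggregation step itself is this weighted sum.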
After the weight assignment process, the MEF image is prone to quality degradation in some areas due to unreasonably distributed weights among the source multi-exposure images, usually expressed as structure and information loss caused by underexposure and overexposure. Figure 1 depicts three MEF images generated by Li's algorithm [21], Raman's algorithm [22], and local energy weighting [23]; the corresponding mean opinion score (MOS) values are also provided, where a higher MOS value indicates better visual quality. From Figure 1, the following observations can be drawn: Figure 1a is the best, with abundant detail information and proper exposure that keep the scene fully recognizable; Figure 1b has lower brightness due to underexposure and cannot preserve the fine details in the crowd, grass, and balloon areas; Figure 1c yields annoying, unnatural artifacts near the edges of the sky and balloon, which are regarded as pseudo contours. The MOS values of the three MEF images are 8.74, 3.17, and 2.70, respectively. Therefore, structure and information preservation directly affect the visual perceptual quality of MEF images, and the form of distortion in a MEF image is directly tied to its final predicted quality.
Generally, a BIQA method includes two stages, a training stage and a testing stage; the schematic flowchart is shown in Figure 2, and the specific conceptual steps are given in Table 1. To meet the quality discrimination requirement, a novel BIQA method for MEF images is proposed, dubbed CE-BMIQA. Figure 3 depicts the feature extraction process of the CE-BMIQA method; refer to Figure 2 and Table 1 for the other processes. Specifically, two types of effective features (i.e., the histogram statistics of surface type maps and entropy statistics) are extracted to discriminate the distortions described above. Then, a feature-weighting approach based on contrast energy is designed to capture the structure loss of the image under contrast changes. Finally, all features extracted under a multi-scale scheme are aggregated to establish the mapping relationship with subjective scores. The detailed implementation of the proposed CE-BMIQA method is given in the following three subsections.
2.1. Curvature Statistics
Generally, a MEF image I(x, y) can be regarded as a surface with irregular concave and convex structures, and the points on the image can be classified into different types according to their geometric characteristics, known as surface types (STs) [24]. Since STs change when distortion is introduced, they are utilized as the basis for the final statistical feature computation to perceive structural loss. First, the mean curvature Mc and Gaussian curvature Gc of I(x, y) are calculated to determine the type of each point in the MEF image [25], which is expressed as

M_c = \frac{(1 + g_y^2)\, g_{xx} - 2 g_x g_y g_{xy} + (1 + g_x^2)\, g_{yy}}{2\,(1 + g_x^2 + g_y^2)^{3/2}}, \qquad G_c = \frac{g_{xx}\, g_{yy} - g_{xy}^2}{(1 + g_x^2 + g_y^2)^2},

where gx and gy are the first-order partial derivatives, and gxx, gyy, and gxy are the second-order partial derivatives, respectively.
To calculate gx, gy, gxx, gyy, and gxy, the MEF image I(x, y) is first smoothed by a binomial filter, which can be represented as S = ss^T, where the vector s collects the normalized binomial coefficients (for a 5-tap filter, s = \frac{1}{16}[1, 4, 6, 4, 1]^T).
Then, the derivative estimation window masks are built from three discrete orthogonal vectors l_0, l_1, l_2 defined over the window positions u = -W, \ldots, W:

l_0(u) = \frac{1}{2W+1}, \qquad l_1(u) = \frac{u}{\sum_u u^2}, \qquad l_2(u) = \frac{u^2 - W(W+1)/3}{\sum_u \left(u^2 - W(W+1)/3\right)^2}.
Finally, gx, gy, gxx, gyy, and gxy are obtained by convolving the smoothed image I_S with the separable masks:

g_x = (l_0 l_1^T) * I_S, \quad g_y = (l_1 l_0^T) * I_S, \quad g_{xx} = (l_0 l_2^T) * I_S, \quad g_{yy} = (l_2 l_0^T) * I_S, \quad g_{xy} = (l_1 l_1^T) * I_S,

where ∗ represents the convolution operation.
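The curvature computation can be sketched as follows. For brevity, this illustration replaces the paper's binomial-filter window operators with Gaussian pre-smoothing and simple finite differences via `np.gradient`, so it is an approximation of the scheme above, not the exact implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def curvatures(img, sigma=1.0):
    """Mean (Mc) and Gaussian (Gc) curvature of the image surface.

    Illustrative version: Gaussian smoothing + np.gradient stand in
    for the binomial filter and derivative window masks.
    """
    s = gaussian_filter(np.asarray(img, dtype=float), sigma)
    gy, gx = np.gradient(s)       # first-order partials (rows, cols)
    gyy, _ = np.gradient(gy)      # second-order partials
    gxy, gxx = np.gradient(gx)
    denom = 1.0 + gx**2 + gy**2
    Mc = ((1 + gy**2) * gxx - 2 * gx * gy * gxy
          + (1 + gx**2) * gyy) / (2 * denom**1.5)
    Gc = (gxx * gyy - gxy**2) / denom**2
    return Mc, Gc
```

For a perfectly flat image both curvatures vanish everywhere, which is a quick sanity check on the implementation.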
Figure 4 gives the three mean and Gaussian curvature maps corresponding to Figure 1. It is quite obvious that the mean curvature maps emphasize more structural details of the MEF images than the Gaussian curvature maps, and all maps have distinctive appearances because of the varying distortions.
With the above procedures, ST maps can be computed from different value combinations of Mc and Gc, whose signs jointly determine the local surface shape. In total, nine STs are obtained from the above curvature assignment process, and their complete definitions are listed in Table 2. From Table 2, it can be observed that each point in the MEF image is categorized into one of eight fundamental STs (i.e., Peak, Ridge, Saddle Ridge, Flat, Minimal, Pit, Valley, and Saddle Valley) or one non-shaped ST (defined as "None") for the combination of Mc = 0 and Gc > 0.
Figure 5a–c shows the three ST maps corresponding to Figure 1; they are clearly sensitive to the distortions present in the three MEF images, so they can serve as an effective basis for distortion identification. To quantify the differences between ST maps clearly, their histograms are given in Figure 5d–f, where the horizontal axis represents the ST label and the vertical axis the number of pixels with each label. The ST histogram statistics vary greatly when MEF images are corrupted by complex distortions of different intensities.
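The sign-based ST classification can be sketched as below. The label numbering follows the classical Besl–Jain convention, with ST = 4 as the invalid Mc = 0, Gc > 0 ("None") case; we assume this matches the layout of Table 2:

```python
import numpy as np

def surface_type(Mc, Gc, eps=1e-6):
    """Map each pixel to one of 9 surface types via sign(Mc), sign(Gc).

    Labels 1..9 follow the Besl-Jain ordering: 1 Peak, 2 Ridge,
    3 Saddle Ridge, 4 None (invalid), 5 Flat, 6 Minimal, 7 Pit,
    8 Valley, 9 Saddle Valley. eps is the zero-tolerance.
    """
    sm = np.where(Mc < -eps, -1, np.where(Mc > eps, 1, 0))
    sg = np.where(Gc < -eps, -1, np.where(Gc > eps, 1, 0))
    # 3x3 sign grid -> labels 1..9
    return 3 * (sm + 1) + (1 - sg) + 1
```

For example, a pixel with Mc < 0 and Gc > 0 maps to label 1 (Peak), and a flat region (Mc = Gc = 0) maps to label 5 (Flat).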
Moreover, contrast variation is also a key factor affecting the visual quality of a MEF image. To simultaneously capture the structural details affected by contrast changes, a novel contrast-energy-weighted ST histogram statistic is presented, which can be expressed as

h(j) = \frac{1}{N} \sum_{i=1}^{N} W_C(i)\, \delta\!\left(ST(i) - j\right),

where N is the number of pixels in the MEF image, j is the index of the possible STs, \delta(\cdot) equals 1 when its argument is 0 and 0 otherwise, and W_C is the contrast energy map predicting local contrast [26], which is defined as

W_C = \frac{\gamma \cdot \varphi(I)}{\varphi(I) + \gamma \cdot o} - \mu, \qquad \varphi(I) = \sqrt{(I * d_h)^2 + (I * d_v)^2},

where dv and dh are the vertical and horizontal second derivatives of a Gaussian function, respectively, o is the contrast gain, γ is the maximum value of φ(I), I represents the MEF image, and μ is the noise threshold. Note that the parameters o and μ in Equation (10) are set according to the recommendations given in [26].
Finally, since the special case (i.e., ST = 4) is invalid, an 8-dimensional curvature statistics feature set is obtained and denoted as F1.
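A minimal sketch of the contrast-energy-weighted ST histogram that forms F1, assuming the normalization by the pixel count N described above:

```python
import numpy as np

def weighted_st_histogram(st_map, wc):
    """Contrast-energy-weighted histogram over the 8 valid surface
    types; the invalid label ST = 4 is dropped, giving the
    8-dimensional feature set F1. Normalization by the pixel
    count N follows the equation above (assumed exact form)."""
    N = st_map.size
    labels = [1, 2, 3, 5, 6, 7, 8, 9]   # skip invalid ST = 4
    return np.array([wc[st_map == j].sum() / N for j in labels])
```

With a uniform contrast-energy map the result reduces to the plain normalized ST histogram, which makes the weighting easy to sanity-check.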
2.2. Entropy Statistics
As can be found from Figure 1, the amount of information contained in each MEF image is impaired to different degrees by underexposure and overexposure, and this can be measured by entropy: information loss lowers the entropy value. The entropy values of the MEF images in Figure 1 are 7.6106, 7.2598, and 6.8534, respectively. Our basic goal is to utilize entropy statistics as properties of distorted MEF images. Compared with global entropy, local entropy better discriminates the information distribution in the spatial domain and eliminates the special case in which MEF images with the same global entropy appear distinctly different in visual quality. Since local entropy is more sensitive to quality degradation, the entropy of each MEF image block is calculated to analyze the distribution of pixel values in the spatial domain, which is defined as

E_s = -\sum_{n} p(n) \log_2 p(n),

where n is the value of a pixel in the MEF image block, and p(n) is the probability density of that pixel value within the block.
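Block-wise spatial entropy can be sketched as follows; the non-overlapping 8 × 8 partition and the boundary handling are our assumptions for illustration:

```python
import numpy as np

def block_entropy(block):
    """Shannon entropy (bits) of the pixel values in one block."""
    _, counts = np.unique(block, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def local_spatial_entropy(img, bs=8):
    """Entropy of each non-overlapping bs x bs block (partial
    boundary blocks are discarded in this sketch)."""
    H, W = img.shape
    return np.array([[block_entropy(img[i:i + bs, j:j + bs])
                      for j in range(0, W - bs + 1, bs)]
                     for i in range(0, H - bs + 1, bs)])
```

A constant block has entropy 0, while a block split evenly between two values has entropy 1 bit, matching the intuition that information loss lowers the entropy.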
In addition, block-based entropy in the spectral domain (i.e., after the discrete cosine transform (DCT)) is computed from the probability distribution of DCT coefficients within 8 × 8 blocks. To calculate the spectral entropy, the DCT coefficient matrix C of each block is obtained and then normalized to produce a probability distribution map, which is expressed as

P(a, b) = \frac{C(a, b)^2}{\sum_{a}\sum_{b} C(a, b)^2},

where a and b are integers between 1 and 8. Then, the local spectral entropy is defined as

E_f = -\sum_{a}\sum_{b} P(a, b) \log_2 P(a, b).
To observe the behavior of the spatial and spectral entropy values under the diverse distortions produced by different MEF algorithms, the above calculations are conducted on the MEF images in Figure 1. As shown in Figure 6, different MEF algorithms influence the spatial and spectral entropy values differently, which is especially reflected in the shape of the histograms. Therefore, the mean and skew values of the spatial and spectral entropies are used as the 4-dimensional entropy statistics feature set, denoted as F2.
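A sketch of the block spectral entropy and the mean/skew pooling that yields F2. Excluding the DC coefficient before normalization is an assumption borrowed from related spatial–spectral entropy methods, not stated explicitly in the text:

```python
import numpy as np
from scipy.fftpack import dct
from scipy.stats import skew

def block_spectral_entropy(block):
    """Spectral entropy of one 8x8 block: squared DCT coefficients
    are normalized into a probability map, then Shannon entropy is
    taken. Dropping the DC term is an assumption."""
    C = dct(dct(np.asarray(block, dtype=float), axis=0, norm='ortho'),
            axis=1, norm='ortho')
    C[0, 0] = 0.0                       # drop DC (assumed)
    P = C**2 / max(np.sum(C**2), 1e-12)
    P = P[P > 0]
    return float(-np.sum(P * np.log2(P))) if P.size else 0.0

def entropy_features(ent_values):
    """F2-style pooling: mean and skew of a set of entropy values."""
    ent_values = np.asarray(ent_values, dtype=float).ravel()
    return np.array([np.mean(ent_values), skew(ent_values)])
```

A constant block carries all its energy in the DC coefficient, so its spectral entropy is 0, consistent with a completely information-free region.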
2.3. Quality Regression
By aggregating the curvature and entropy statistics feature sets, a 12-dimensional feature vector is obtained and denoted as F = {F1, F2}. Moreover, the multi-scale space of an image can capture content from the fine level to the coarse level, inspired by the processing mechanism of the low-level retina in the human visual system (HVS). Therefore, the above feature extraction process is carried out on l scales. After feature extraction, random forest (RF) regression is utilized to map the feature space to the quality space, so that the final quality score Q is computed as

Q = f_m\!\left(F^{1}, F^{2}, \ldots, F^{l}\right),

where f_m(·) is the trained function for quality regression, and F^l is the quality-aware feature vector extracted on the l-th scale.
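The regression stage can be sketched with scikit-learn's random forest; the data below is synthetic and the hyper-parameters are illustrative, not the paper's settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch of the quality-regression stage: the per-scale 12-D feature
# vectors F^1..F^l are concatenated and mapped to MOS by a random
# forest. Feature values and MOS here are random placeholders.
rng = np.random.default_rng(0)
n_images, n_scales, dim = 60, 3, 12
X = rng.normal(size=(n_images, n_scales * dim))   # stacked F^1..F^l
mos = rng.uniform(1, 9, size=n_images)            # subjective scores

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[:50], mos[:50])          # training stage
scores = rf.predict(X[50:])       # testing stage
```

Since RF predictions are averages over training targets, the predicted scores always stay within the range of the training MOS values.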
5. Conclusions
In this paper, a curvature and entropy statistics-based blind multi-exposure fusion (MEF) image quality assessment (CE-BMIQA) method is proposed for distinguishing the differences among image fusion algorithms. First, histograms of surface type maps based on the mean and Gaussian curvatures are calculated to form the curvature statistics features, so as to identify the structural degradation of MEF images. Moreover, a novel contrast energy weighting approach is presented to capture the structure loss accompanying contrast variation. Then, the mean and skew of the spatial and spectral entropy values of MEF image blocks are used as another type of quality-aware feature. Finally, a multi-scale scheme is adopted to perceive the distortion of MEF images from the fine level to the coarse level. Experimental results on the benchmark MEF image database demonstrate that the proposed CE-BMIQA method achieves better overall performance than the compared methods.
In addition, the CE-BMIQA method still performs unevenly across the visual quality of individual scenes. The main reason is that scene content differs: some scenes have fewer over-exposed and under-exposed regions, so the distortion perception based on contrast and detail information in the CE-BMIQA method cannot accurately predict their quality. In such cases, more attention should perhaps be paid to color and aesthetic distortions. In future work, we will consider the distortion types of MEF images more comprehensively and integrate additional cues.