Abstract

In this paper, an accurate and efficient image matching method based on phase correlation is proposed to estimate disparity with subpixel precision, which is used for the stereovision of narrow baseline remotely sensed images. The multistep strategy is adopted in our technical frame; thus the disparity estimation is divided into two steps: integer-pixel prematching and subpixel matching. Firstly, integer-pixel disparity is estimated by employing a cross-based local matching method. Then the relationship of corresponding points is established under the guidance of integer-pixel disparity. The subimages are extracted through selecting the corresponding points as the center. Finally, the subpixel disparity is obtained by matching the subimages utilizing a novel variant of phase correlation approach. The experiment results show that the proposed method can match different kinds of large-sized narrow baseline remotely sensed images and estimate disparity with subpixel precision automatically.

1. Introduction

Stereovision is an advanced task in remote sensing and photogrammetry [1]. The aim of stereovision is to estimate the disparity through matching two or more images of same scene in different views and extract digital elevation model (DEM) through the disparity [2]. Intuitively, the disparity represents the displacement vectors between corresponding pixels that horizontally shift from the left image to the right image [3]. The stereovision of narrow baseline remotely sensed imagery is a new research hotspot for stereovision in recent years [4]. Automatic subpixel image matching is the essential technique for narrow baseline stereovision [5]. We will brief review the stereovision with different baseline and subpixel image matching method in the remaining part of this section, respectively.

1.1. Principle of Stereovision

The process of DEM extracting can be represented by a stereovision model [6], as shown in Figure 1. B and H express the baseline and the altitude of a satellite. L and R are the left and right images captured by the satellite from different views; h represents the height of ground object to be measured. A and B are the 3D points of the real world scene, which are projected to the three 2D locations a, , and in L and R. The 2D points of A and B in L are coincident; thus the disparity d of the two corresponding points a and equals the distance between and in R. D represents the mapping of d in the 3D scene; according to the similar triangles of geometry principle, the equation is established. Assume that the ground sample distance (GSD) of the sensed image is G (meters/pixel); the height h of ground object can be generated: . Because the altitude of satellite is much larger than the height of ground object, that is, , the height h can be approximated as . It can be known that the precision of h is proportional to the precision of disparity and GSD and is inversely proportional to the B/H ratio. In most real sensors, H and G are limited and fixed; therefore, only two aspects can be used to enhance the precision of h: Improve the precision of disparity to subpixel level [7]; increase the length of baseline, that is, increasing the B/H ratio. In traditional stereovision of remotely sensed imagery, B/H ratio is usually set to about 1, called stereovision of wide baseline remotely sensed imagery. Even if the disparity precision is only up to the integer-pixel level, the DEM can be extracted accurately.

1.2. Comparison of Wide Baseline and Narrow Baseline Stereo

The stereovision of wide baseline remotely sensed imagery can not perform well in urban areas with dense buildings [8]. The captured process of the wide baseline remotely sensed image pair is shown in Figure 2(a). Due to the long baseline and large viewing angle, there is a large occlusion area and large geometric distortion between the images. In addition, the interval time of two imaging processes is long; the illumination condition has changed greatly. The above interference factors will lead to a high false matching rate in stereovision of wide baseline remotely sensed imagery. Therefore, it requires a lot of manual and postcorrection to extract DEM [9]. In order to solve this problem of DEM extraction in urban areas, stereovision of narrow baseline remotely sensed imagery technique has emerged and gradually become a new research focus in stereovision and remote sensing [10, 11]. The captured process of the narrow baseline remotely sensed image pair is shown in Figure 2(b). Due to the short baseline and small viewing angle, there is a small occlusion area and little geometric distortion between images. When (i.e., a very narrow baseline), then occlusions will be minimized, essentially because each image view becomes more geometrically similar. In addition, the interval time of two imaging processes is short; it can be considered that the illumination condition is similar to the same time. Due to above advantages, the stereovision of narrow baseline remotely sensed imagery is suitable for extracting DEM in the urban areas [8]. Two sets of simulated remotely sensed image pairs provided by Beijing Institute of Space Mechanics and Electricity are shown in Figure 3. Figure 3(a) is simulated wide baseline remotely sensed image pair with a B/H ratio of 1, and Figure 3(b) is simulated narrow baseline remotely sensed image pair with a B/H ratio of 0.05. From the comparison of Figures 3(a) and 3(b), we can see that the occlusion range and the geometric distortion of narrow baseline image pair are less, and there is a higher degree of similarity between left and right image. It is beneficial to improve the accuracy of image matching; thus fully automatic DEM extraction can be achieved through stereovision of narrow baseline remotely sensed imagery.

However the reduction of B/H ratio will inevitably lead to the reduction of precision for DEM extracting. If the stereovision of narrow baseline is going to extract the DEM as precise as the stereovision of wide baseline, the disparity needs to achieve pixel precision, where is the ratio of wide baseline and narrow baseline [6]. For example, when B/H is 0.05, the precision of disparity needs to achieve 1/20 pixels to satisfy the DEM extraction. Therefore, the key technique for the stereovision of narrow baseline remotely sensed imagery is the high precision subpixel image matching [5, 7].

1.3. A Survey of Subpixel Image Matching Method

Automatic subpixel image matching is one of the most essential techniques in stereovision. It can be divided into three major categories: interpolation based method, fitting based method, and phase correlation method [12]. Interpolation based method includes the original image interpolation method and the matching cost interpolation method. The matching cost interpolation method is the representative algorithm, which combines the advantages of high efficiency and accuracy [13, 14]. The initial cost volume is interpolated by various interpolation function, and the subpixel disparity is estimated by searching the extremum of interpolated cost volume. At present, this method is usually used for the auxiliary system of driverless vehicles, robot navigation, and unmanned aerial vehicle. The original image interpolation method significantly increases the computing load, limits the possible precision to the chosen upsampling rate, and also may introduce interpolation artifacts [5]. The fitting based method generally postprocess the cost volume or disparity plane by fitting method to estimate the subpixel disparity. The cost volume fitting method fits the peak neighborhood of cost volume into a parabola; then the subpixel disparity is estimated by searching the extreme of parabola [15, 16]. The disparity plane fitting method models the disparity field by segmentation constraints, and subpixel disparity is obtained by fitting disparity plane [1720]. The fitting based method is efficient, but the disparity precision is low. Phase correlation method provides high efficiency and accuracy via fast Fourier transform and other supplementary approaches under ideal conditions. In general, the main peak location of the inverse Fourier transform of the normalized cross-power spectrum was interpolated with the parabolic, Gaussian, and sinc functions to get the subpixel disparity [2124].

In this paper, a subpixel image matching method based on phase correlation is proposed to estimate the disparity with subpixel precision for stereovision of narrow baseline remotely sensed imagery. The rest of this paper is organized as follows. The principle of phase correlation is introduced briefly in Section 2. A novel improved phase correlation method with subpixel precision is described in detail in Section 3. The complete algorithm is described in Section 4. Experimental evaluation is presented in Section 5. Finally, the paper is concluded in Section 6.

2. Phase Correlation

Phase correlation method is based on the well-known Fourier-domain shift property. This states that a translation between two images in the spatial domain will be expressed in the frequency domain as a linear phase difference between their Fourier transform [5]. Let and be two images with translation relationship, the size of image is , , , and are horizontal and vertical translations, respectively. Let and denote the 2D discrete Fourier transform (DFT) of two images:where and are spatial domain variables, , . and are frequency domain variables, , . The normalized cross-power spectrum of two images is calculated aswhere is complex conjugate of . The phase correlation of two images is calculated by 2D inverse discrete Fourier transforms (IDFT) of :where is IDFT function. is a 2D pulse function which exists as peak at position . The peak location corresponds to the translation between images.

Phase correlation is a common method to estimate the translation between images. Only phase information of images is utilized; thus it is almost not affected by noise and radiation difference and more robustness than spatial domain method. But the digital image is a discrete function whose Fourier transform is a discrete transform, so only the integer-pixel translation can be obtained by searching the peak location.

3. Improved Phase Correlation with Subpixel Precision

To estimate the subpixel translation, various algorithms have been proposed. In [23], Foroosh et al. achieved the subpixel estimation by approximating the Dirichlet function derived from the normalized cross-power spectrum. Nagashima et al. propose a fitting method to enhance the matching precision from integer-pixel level to subpixel level, which fits a curve surface by using the neighborhood data around phase correlation peak [21]. However, these methods only take into account the peak and its neighborhood; the distribution of phase correlation function is neglected; thus, the precision of translation estimation is low in stereovision of remotely sensed imagery. An improved phase correlation method used for subpixel estimation is designed in this section. The flow chart is shown in Figure 4. Due to the periodicity of DFT, an image can be considered to “wrap around” at an edge; therefore, discontinuities, which are not supposed to exist in real world, occur at every border in 2D DFT computation [21]. A 2D Hanning window function is applied to reduce the effect of discontinuities at image borders. The 2D Hanning window function is defined as

The subpixel translation between two images can be obtained by downsampling high resolution images with integer-pixel translation. Therefore, the phase correlation of two images can be approximated by 2D sinc function:

In theory, sinc function can approximate the distribution of phase correlation in ideal condition. However the phase correlation peak is decreased in remotely sensed images matching due to the interference of random noise, image aliasing, edge effects, and other influences. Therefore, the sinc function can not describe the distribution of phase correlation accurately. In our method, the sinc function is improved by introducing phase correlation coefficient:where is the phase correlation coefficient, . When the translation is integer-pixel and there is no interference factor, ; when the translation is subpixel or there are interference factors, . Due to this improved phase correlation method is applied to the stereovision of narrow baseline remotely sensed imagery, and the image pair is treated by epipolar rectification before matching. Thus, the phase correlation can be separated in spatial domain, and a 1D function expression is given which only exists as translation in horizontal direction:

The peak position is the translation which is expected to be estimated. To locate the high precision subpixel peak position, a peak evaluation method based on uniformly spaced sampling is proposed. Assume that is the real peak position of sampled-data, as shown at the bottom left corner of Figure 4, and and are a pair of points located pixels away from . The phase correlation values , , and of three sample points used for peak evaluation are satisfying the following relationship:

Add (8c) to (8a); the above equation group can be rewritten as

Equation (9b) can be rewritten by using the difference product formula:

Add (10a) to (10b); the equation group ((10a) and (10b)) can be rewritten as

The above equation can be expressed as a linear expression of :

Through (12), the horizontal translation with subpixel precision can be estimated:

To reduce the influence of noise, the peak evaluation is performed by selecting multiple sets of uniformly spaced sampled-data which are all the data on the axis, and the least square is applied to solve the optimal translation. Assume that sets of uniformly spaced sampled-data are selected for peak evaluation; , are determined as follows: , , , . Equation (12) can be rewritten aswhere , and . The objective of (14) is . Assume that and , then the horizontal translation with subpixel precision can be obtained by solving the least square linear equation:

4. Complete Subpixel Image Matching Algorithm

The input of our algorithm is rectified narrow baseline remotely sensed image pair, and output is the disparity with subpixel precision. Because the stability of phase correlation method is poor when the disparity search range is large [9], therefore, a multistep strategy is adopted in our technical frame, and the disparity estimation is divided into two steps: integer-pixel prematching and subpixel matching. The complete framework of our algorithm is shown in Figure 5. Firstly, integer-pixel disparity is estimated by employing an effective cross-based matching algorithm [25]. Then relationship of corresponding points is established under the guidance of integer-pixel disparity. The subimages are extracted through selecting the corresponding points as center. Finally, subpixel disparity is obtained by matching the subimages utilizing the improved phase correlation proposed in Section 3.

Assume that a narrow baseline remotely sensed image pair consists of left image and right image , which can be expressed as a classic stereo model . The disparity is used to describe the geometric position variation between and , which consists of two parts, integer-pixel disparity and subpixel disparity ; that is, . The details of our subpixel image matching algorithm are described as follows from step (a) to step (g).

(a) Assume is the pixel which is to be matched. In the first step, the cross-based support window of is defined. Firstly, an upright cross is established for . It consists of two orthogonal line segments: horizontal segment and vertical segment . A quadruple denotes the left, right, up, and bottom arm length, respectively, and the length of each arm is determined by searching for the extreme point in that direction based on the color similarity. The upright cross skeleton is defined as

Then the cross-based support window is defined based on the upright cross skeleton. is a combination of each horizontal segment where traverses over the vertical segment . The formula of is defined as

(b) The matching cost of is calculated. Firstly, the initial matching cost of each pixel in the support window is calculated:where is horizontal gradient, balances the color and gradient terms, and , are truncation values. Then the matching cost of is aggregated:

(c) The integer-pixel disparity of is estimated by winner-takes-all strategy:where is the searching range of disparity.

(d) The relationship of corresponding points is established under the guidance of integer-pixel disparity. Assume that the integer-pixel disparity of pixel in is ; the corresponding point of in is .

(e) The subimages and with same size are extracted through selecting the corresponding points as the center. Because the selection of the two subimages is based on the guidance of the integer-pixel disparity, thus there is only a small range of horizontal translation between and .

(f) The subpixel horizontal translation between and is estimated by using the improved phase correlation method proposed in Section 3. The is equivalent to the subpixel disparity of the center pixel .

(g) Finally, the disparity result is obtained by combining the integer-pixel disparity and the subpixel disparity; that is, .

5. Experiments and Analysis

We test the proposed subpixel matching method for stereovision of narrow baseline remotely sensed imagery on a standard personal computer with Intel(R) Core(TM) i7 CPU, and the algorithm is implemented utilizing VS2010+OpenCV. The results analysis includes two aspects: precision analysis and complexity analysis. Constant parameter settings are used for all experiments. The parameters of this method are set as , , and .

5.1. Test on Simulated Narrow Baseline Remotely sensed Images Generated by Translation

To evaluate the matching precision of proposed method, six urban scenes of narrow baseline remotely sensed image pairs, including Pleiades, SPOT-5, SPOT-6, WorldView-3, and GF-2, are used to perform image matching. For each image pair, the original image is utilized as the reference image (often called the left image), and the original image is moved 8.738 pixels along the x-axis as the target image (called the right image). Thus the true disparity is known to be 8.738 pixels for all the matching points of each image pair. Table 1 shows the general information of the test image pairs used in these experiments.

Figure 6 shows the distribution of matched points for the test narrow baseline remotely sensed image pairs. Because the image size is different for each image pair, the step size of grid is different too. The step sizes of grid in Tests 1–6 are set to 540 × 540 pixels, 160 × 160 pixels, 180 × 180 pixels, 170 × 170 pixels, 300 × 300 pixels, and 380 × 380 pixels.

In order to evaluate the matching precision of our method quantitatively, interpolation based method [14], fitting based method [15], and traditional phase correlation method [23] are used to compare with our method. Root mean square error (RMSE) and mean error of matched points are used as the assessment indices to evaluate these methods. Figure 7 shows quantitative evaluation results for the four subpixel image matching methods. Here the traditional phase correlation is called traditional PC for short. Figure 8 shows statistical results of the quantitative evaluation. Figure 8(a) shows statistical results where the RMSE is less than or equal to 0.05 pixels. Figure 8(b) shows statistical results where the RMSE is greater than 0.05 pixels and less than 0.1 pixels. Figure 8(c) shows statistical results where the RMSE is greater than or equal to 0.1 pixels. From the experimental results, one can see that the two PC methods have a high matching precision, and the precision is better than that of interpolation method and fitting method. In addition, the matching precision is related to the spatial resolution of the images. High spatial resolution corresponds to a high matching precision. Because the peak evaluation method based on uniformly spaced sampling is designed to fit the phase correlation function, our method outperforms the traditional PC method, about 66% matched points of our method with RMSE less than or equal to 0.05 pixels.

5.2. Test on Simulated Narrow Baseline Remotely sensed Images

To evaluate the precision of DEM extraction by using the disparity estimated by our method, a simulated narrow baseline remotely sensed image pair provided by Beijing Institute of Space Mechanics and Electricity is used to perform image matching. The simulated narrow baseline remotely sensed image pair is generated by SE-WORKBENCH software. The attached parameters file of the image pair not only provides the true disparity, but also provides the true elevation of targets and buildings in the scene. Therefore, the precision of disparity and elevation estimated by our method can be evaluated carefully. Table 2 shows the general information of the test image pair used in these experiments. The B/H ratio of the image pair is 0.05, and spatial resolution is 0.3 meters/pixel. Overlap ratio of left and right images is 60%, and the overlap area includes 22 targets and 40 buildings. Figure 9 shows the simulated narrow baseline remotely sensed image pair and distribution of targets and buildings. Figure 9(a) is the image pair, Figure 9(b) is the distribution of targets, and Figure 9(c) is the distribution of buildings. The targets and buildings are marked with red numbers. Figure 10 shows the matching results. Figure 10(a) is the distribution of matched points for the buildings. Figure 10(b) is the distribution of matched points for the targets.

To evaluate the matching precision of our method quantitatively, interpolation based method [14], fitting based method [15], and traditional phase correlation method [23] are used to compare with our method. RMSE of disparity and mean elevation error are used as the assessment indices to evaluate the precision of four methods, where the elevation error is the absolute difference between the true and the extracted elevation. The simulated narrow baseline remotely sensed image pair is matched 300 times by each subpixel image matching method. Figures 11 and 12 show the statistical results of four methods for targets and buildings, respectively. The results show that the RMSE and the mean elevation error obtained by our method are lower than the other three methods for most targets and buildings. Table 3 shows the average values of RMSE and mean elevation error. The true average elevation of 22 targets is 67.818 m, and the true average elevation of 40 buildings is 112.353 m. According to the statistical results in Table 3, the mean elevation error of targets estimated by our method is 0.303 m, and the mean elevation error of buildings estimated by our method is 0.406 m. Such precision can satisfy the requirements for stereovision of narrow baseline remotely sensed imagery; therefore, it is proved that the image matching method based on phase correlation proposed in this paper has practical application value.

5.3. Computational Complexity

Assume that there are points needed to be matched. In integer-pixel prematching step of our algorithm, the complexity of constructing the cross-based support window for each point is , and the total complexity for all points of this prematching step is , where is the disparity range.

In the subpixel matching step, the core algorithm is based on the phase correlation; therefore, the complexity is unrelated to the disparity range but related to the size of subimage. Assume that there are pixels in the extracted subimage. The complexity of fast DFT is , the complexity of Hanning window is , the complexity of normalized cross-power spectrum is , and the complexity of IDFT of normalized cross-power spectrum is . The peak evaluation is a least square fitting operation for sets of uniformly spaced sampled-data essentially. The minimum value of is 1; that is, the peak position is located at the most left or right of sampled-data. However, the maximum value of is ; this means the peak position is located at the center of sampled-data; thus . The complexity of subpixel peak position evaluation utilizing the least square method is , which is equivalent to . The complexity of a single point matching in subpixel matching step is ; thus the complexity of total points is . The complexity of complete image matching algorithm is . Because the complexity of subpixel matching is higher than that of integer-pixel prematching, the complexity of our algorithm is .

In order to clearly illustrate the complexity of our algorithm, we compare with the classical normalized cross correlation (NCC) algorithm [2]. Assume that the support window of NCC is ; in the practical image matching, . When , the complexity of our method is similar to that of NCC. Generally, the size of subimage is 30 × 30~40 × 40 pixels; the number of pixels in subimage is 900~1600; thus the value range of is 9.814~10.644. The disparity range of narrow baseline stereo image pair is usually within 20 pixels. In our experiment, the disparity range is 17 pixels; is established; therefore, the complexity of our method is similar to that of the classical NCC method.

6. Conclusion

Through the analysis of narrow baseline remotely sensed imagery stereovision, a subpixel image matching approach based on improved phase correlation is proposed. The cross-based local matching method is employed for prematching the image pair, and the obtained integer-pixel disparity reduces the search range of subpixel matching. A high precision subpixel matching step is implemented under the guidance of integer-pixel disparity. A peak evaluation method based on uniformly spaced sampling is designed to improve the precision of disparity estimation. The experimental results show that our method has superior performance on precision and efficiency, and it is beneficial to the stereovision of narrow baseline remotely sensed imagery.

Competing Interests

The authors declare no conflict of interests.

Authors’ Contributions

Ning Ma proposed the idea of the method. Ning Ma and Yubo Men performed the validation experiment of the method. Ning Ma, Chaoguang Men, and Xiang Li wrote this paper. All the authors read and approved the final paper.

Acknowledgments

This work was supported by National Natural Science Foundation of China (Grant nos. 11547157 and 61100004), Natural Science Foundation of Heilongjiang Province of China (Grant no. F201320), and Harbin Municipal Science and Technology Bureau (Grant no. 2014RFQXJ073).