Article

Semi-Global Stereo Matching Algorithm Based on Multi-Scale Information Fusion

1 State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401, China
2 Hebei Key Laboratory of Robot Sensing and Human-Robot Integration, Tianjin 300401, China
3 School of Mechanical Engineering, Hebei University of Technology, Tianjin 300401, China
* Author to whom correspondence should be addressed.
Submission received: 15 November 2022 / Revised: 4 January 2023 / Accepted: 10 January 2023 / Published: 12 January 2023
(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)

Abstract

Semi-global matching (SGM) has been widely used in binocular vision. Despite its good efficiency, SGM still has difficulty in dealing with low-texture regions. In this paper, an SGM algorithm based on multi-scale information fusion (MSIF), named SGM-MSIF, is proposed by combining multi-path cost aggregation and cross-scale cost aggregation (CSCA). Firstly, stereo pairs at different scales are obtained by Gaussian pyramid down-sampling, and the initial matching cost volumes at each scale are computed by combining the census transform and color information. Then, the multi-path cost aggregation of SGM is introduced into the cost aggregation at each scale, and the aggregated cost volumes are fused by CSCA. Thirdly, the disparity map is optimized by an internal left-right consistency check and median filtering. Finally, experiments are conducted on the Middlebury datasets to evaluate the proposed algorithm. Experimental results show that the average error matching rate (EMR) of the proposed SGM-MSIF algorithm is reduced by 1.96% compared with SGM. Compared with the classical cross-scale stereo matching algorithm, the average EMR of SGM-MSIF is reduced by 0.92%, while the processing efficiency is increased by 58.7%. In terms of overall performance, the proposed algorithm outperforms the classic SGM and CSCA algorithms. It achieves high matching accuracy and high processing efficiency for binocular vision applications, especially those with low-texture regions.

1. Introduction

As a significant research topic in binocular vision, dense stereo matching is widely used in photogrammetry, 3D reconstruction and autonomous driving [1,2,3,4]. Although plenty of high-performance stereo matching algorithms have emerged in recent years, improving stereo matching accuracy in low-texture regions remains a challenging task.
According to the constraint range of the matching cost function, Scharstein and Szeliski [5] divided stereo matching algorithms into local and global methods. Local stereo matching algorithms only use the local information around pixels, such as the sum of absolute differences [6] and the sum of squared differences [7], to compute disparity. These local algorithms can be implemented very fast but have low matching accuracy. Global stereo matching algorithms integrate all pixel information in an image by constructing a global matching cost function, and a high-precision disparity map can be obtained by minimizing this function. Commonly used global methods include dynamic programming [8], belief propagation [9] and graph cuts [10]. These global algorithms have high matching accuracy but low processing efficiency. Overall, it is difficult to balance accuracy and efficiency by using local or global stereo matching algorithms alone.
To improve the efficiency of global stereo matching algorithms while ensuring matching accuracy, Hirschmüller [11] proposed the semi-global matching (SGM) algorithm. In this algorithm, an efficient one-dimensional (1D) path aggregation scheme replaces two-dimensional (2D) energy minimization: inference is performed along each scanline independently, and the results from multiple directions are fused to obtain the final cost volume of each pixel. However, the mutual-information-based cost computation of SGM is computationally expensive because it requires an iterative process, which affects the overall efficiency of the algorithm. Moreover, since adjacent pixels influence each other only along the same scanline, SGM is prone to streaking artifacts in the disparity map. Facciolo et al. [12] proposed the more global matching (MGM) algorithm. By incorporating messages from the nodes visited in the previous scanline, this approach integrates 2D information into the processing of SGM's 1D paths, which effectively reduces the streaking artifacts of SGM. Yang et al. [13] proposed a semi-global and block matching algorithm, in which an adaptive block matching strategy replaces dense matching to achieve accurate disparity estimation in water areas. Bu et al. [14] introduced a local edge-aware filtering method into SGM to enhance the interaction of neighboring scanlines, which effectively improved the quality of the disparity map. Ma et al. [15] proposed an omni-directional SGM (OmniSGM). By constructing a cost volume update scheme, OmniSGM aggregates costs from paths in all directions and encourages beneficial information to propagate throughout the image, achieving excellent matching results. The common shortcoming of these methods, however, is that they only consider the original image at a single scale and ignore information at other scales. Therefore, they still struggle in low-texture regions.
Inspired by the human visual system, which observes a scene from coarse to fine, Zhang et al. [16] introduced the cross-scale concept into the cost aggregation of stereo matching and proposed the cross-scale cost aggregation (CSCA) framework. Under this framework, cost aggregation methods such as the bilateral filter (BF) [17], guided filter (GF) [18], non-local cost aggregation (NL) [19] and minimum spanning tree (ST) [20] were used for multi-scale stereo matching. Their results demonstrated that disparity maps obtained by fusing a stereo matching algorithm with multi-scale information are significantly more accurate than those obtained by algorithms that consider only a single scale. Inevitably, however, as the number of image scales increases, CSCA has high complexity and low matching efficiency. Yao et al. [21] improved CSCA by adding a Laplacian pyramid transform in the cross-scale cost aggregation stage and obtained a better disparity map. Zhao et al. [22] introduced a fast guided filter into CSCA and used different cost aggregation methods in different scale spaces, which not only improved the quality of the disparity map but also reduced the running time. Although the aforementioned algorithms improve the quality of the disparity map to varying extents, their matching accuracy in low-texture regions is still low, and their real-time performance needs to be improved.
In this paper, an SGM algorithm based on multi-scale information fusion (SGM-MSIF) is proposed to improve the matching accuracy in low-texture regions and the processing efficiency simultaneously. Compared with the classic SGM and CSCA algorithms, SGM-MSIF introduces a novel cost aggregation method and an internal left-right consistency check. These approaches significantly improve the matching accuracy in low-texture regions and reduce the complexity of SGM-MSIF.
The rest of this paper is organized as follows. In Section 2, the flow chart and specific steps of the proposed algorithm are described. The calculation method of the initial matching cost volumes, the proposed cross-scale multi-path cost aggregation method and the principle of internal left-right consistency check in disparity optimization are illustrated. Then, experiments are designed, and the matching accuracy and processing efficiency of the algorithm are tested in Section 3. Conclusions of this article are summarized in Section 4.

2. SGM-MSIF Algorithm

Similar to the classic SGM algorithm, the proposed SGM-MSIF algorithm consists of four steps: cost computation, cost aggregation, disparity computation and disparity optimization. Figure 1 shows the flowchart of the proposed SGM-MSIF algorithm, and the detailed steps are described as follows:
(1)
Matching cost computation. The initial cost volumes at S + 1 scales are calculated by combining the census transform and the color information of the images.
(2)
Improved cross-scale multi-path cost aggregation. The initial cost volumes at different scales are aggregated by multi-path cost aggregation to obtain the aggregated cost volumes at each scale. Then, the aggregated cost volumes of all scales are fused to obtain the final cost volume containing multi-scale information.
(3)
Disparity computation. The winner-take-all (WTA) principle is adopted to calculate the initial disparity map.
(4)
Disparity optimization. The initial disparity map is optimized by internal left-right consistency check, weighted median filter and median filter.

2.1. Matching Cost Computation

In this step, after obtaining stereo pairs at different scales [23], the best-matched point needs to be found for every pixel within the disparity search range. The initial matching cost measures the similarity of corresponding pixels under different disparities. The traditional census transform [24], which is based on relative brightness differences within a window, handles brightness inconsistencies between the two images of a stereo pair well and has a certain robustness to image noise. However, it is prone to ambiguity in repetitive-texture regions. The cost computation method based on the absolute color difference of a single pixel [25] can reduce this ambiguity to a certain extent, and it quickly evaluates the color difference between pixels in the left and right images. Therefore, in this paper, the census transform and color information are combined to obtain the initial matching cost, so as to make up for the weakness of the traditional census method in repetitive-texture regions.
In the traditional census transform, the grayscale differences between the target pixel to be matched and its neighboring pixels are calculated first and converted into a bit string; the matching cost is then the Hamming distance between the bit strings of corresponding pixels. The principle of the census transform is shown in Figure 2, where Left and Right denote the stereo pair, u and v denote the column and row of a pixel, respectively, and d denotes the disparity. Firstly, the gray value of each pixel in the neighborhood window is compared with that of the central pixel: it is counted as 0 if the gray value is larger than that of the central pixel, and as 1 otherwise. The resulting bits are concatenated in order into a bit string, which is the census transform result of the central pixel. Then the Hamming distance between the two pixels within the disparity search range is calculated and denoted as C_cen(p, d), the census cost of pixel p = (u, v) under disparity d. In this way, the census initial matching cost volume C_cen with dimension W × H × L is obtained, where W, H and L are the width, height and disparity range of the image, respectively.
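As a concrete illustration of this step, the following sketch computes the census descriptor of a pixel and the Hamming cost C_cen(p, d) between two descriptors. It is a minimal C++ sketch rather than the authors' implementation: the 9 × 9 window follows the setting in Section 3.1, while the descriptor packing, border handling and function names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <bitset>
#include <cstdint>
#include <vector>

struct CensusDescriptor { uint64_t lo = 0, hi = 0; };   // 80 comparison bits for a 9 x 9 window

// Census transform of pixel (u, v): a neighbor brighter than the center maps to 0,
// all other neighbors map to 1 (the convention described for Figure 2).
CensusDescriptor censusTransform(const std::vector<uint8_t>& img, int W, int H,
                                 int u, int v, int win = 9) {
    CensusDescriptor desc;
    const int r = win / 2;
    const uint8_t center = img[v * W + u];
    int bit = 0;
    for (int dv = -r; dv <= r; ++dv) {
        for (int du = -r; du <= r; ++du) {
            if (du == 0 && dv == 0) continue;
            const int uu = std::clamp(u + du, 0, W - 1);   // replicate the image border
            const int vv = std::clamp(v + dv, 0, H - 1);
            const uint64_t b = (img[vv * W + uu] > center) ? 0u : 1u;
            if (bit < 64) desc.lo |= b << bit; else desc.hi |= b << (bit - 64);
            ++bit;
        }
    }
    return desc;
}

// C_cen(p, d): Hamming distance between the descriptor of p = (u, v) in the left
// image and the descriptor of q = (u - d, v) in the right image.
int censusCost(const CensusDescriptor& left, const CensusDescriptor& right) {
    return int(std::bitset<64>(left.lo ^ right.lo).count() +
               std::bitset<64>(left.hi ^ right.hi).count());
}
```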
For the matching cost calculation based on color information, the average absolute difference of the three color components of the left and right image pixels is taken as the cost value to compute the initial cost volume C_cor, which is expressed as:
$C_{cor}(p, d) = \frac{1}{3} \sum_{k = R, G, B} \left| I_k^{Left}(p) - I_k^{Right}(q) \right|$  (1)
where p = (u, v) is a pixel to be matched in the left image, q = (u − d, v) is the corresponding search point of p in the right image, I_k is the k-th color component, and C_cor(p, d) is the cost value of p under disparity d obtained from the average color difference.
Since the ranges of the two matching cost values C_cen(p, d) and C_cor(p, d) are different, normalization is required before combining the two methods. The normalization expression is shown in Equation (2):
$\rho(C, \lambda) = 1 - \exp\left(-\frac{C}{\lambda}\right)$  (2)
where ρ is the normalized matching cost value, and C and λ are the matching cost value and normalization parameter of the corresponding method, respectively. The combined initial matching cost is then defined as:
$C(p, d) = \rho\left(C_{cen}(p, d), \lambda_{cen}\right) + \rho\left(C_{cor}(p, d), \lambda_{cor}\right)$  (3)
where λ_cen and λ_cor are the normalization parameters of C_cen(p, d) and C_cor(p, d), respectively.
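A minimal sketch of Equations (1)–(3) is given below. It assumes interleaved 8-bit RGB images stored row-major; the default λ values follow Table 1, and all function names and the data layout are illustrative assumptions rather than the authors' implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Equation (1): mean absolute difference of the R, G, B components between
// p = (u, v) in the left image and q = (u - d, v) in the right image.
double colorCost(const std::vector<uint8_t>& left, const std::vector<uint8_t>& right,
                 int W, int u, int v, int d) {
    const int uq = std::max(u - d, 0);                 // clamp the search point at the border
    double sum = 0.0;
    for (int k = 0; k < 3; ++k)
        sum += std::abs(int(left[(v * W + u) * 3 + k]) - int(right[(v * W + uq) * 3 + k]));
    return sum / 3.0;
}

// Equation (2): rho(C, lambda) = 1 - exp(-C / lambda) maps both cost terms into [0, 1).
double rho(double C, double lambda) { return 1.0 - std::exp(-C / lambda); }

// Equation (3): combined initial matching cost C(p, d). Default lambdas follow Table 1.
double combinedCost(double censusCost, double colorCost,
                    double lambdaCen = 50.0, double lambdaCor = 30.0) {
    return rho(censusCost, lambdaCen) + rho(colorCost, lambdaCor);
}
```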

2.2. Improved Cross-Scale Multi-Path Cost Aggregation

The initial cost volume obtained by the above method is very sensitive to noise, since only the correlation of local pixels is considered [26]. To improve the robustness of the algorithm, the energy-function formulation of the SGM algorithm is adopted: the dynamic programming problem in two-dimensional (2D) space is simplified into a series of one-direction dynamic programming problems in one-dimensional (1D) space. A multi-path cost aggregation based on one-direction dynamic programming is used to optimize the initial matching cost volume at each scale, which greatly improves processing efficiency while preserving matching accuracy. The optimized cost volumes at each scale are then fused through cross-scale cost aggregation to compensate for the weakness of single-scale stereo matching in low-texture regions.
For the multi-path cost aggregation based on one-direction dynamic programming, the disparity of each pixel is obtained by minimizing its energy function. In SGM [11], the energy function E(D) is defined as:
$E(D) = \sum_{m} C(m, D_m) + \sum_{n \in N_m} P_1 T\left[\left|D_m - D_n\right| = 1\right] + \sum_{n \in N_m} P_2 T\left[\left|D_m - D_n\right| > 1\right]$  (4)
where the first term is the sum of the initial matching costs of all pixels under the disparity map D. The second and third terms are smoothness terms that penalize disparity changes between pixel m and the pixels n in its neighborhood N_m. P_1 and P_2 (P_2 > P_1) are penalty coefficients, and T is an indicator function: the second term is applied when the absolute difference between the disparities D_m of pixel m and D_n of pixel n equals 1, and the third term is applied when the absolute difference is larger than 1.
One-dimensional cost aggregation is performed along each path around a pixel to obtain the cost value of that path, and the cost values of all paths are then summed to obtain the aggregated matching cost of the pixel. The cost aggregation of pixel p along path r under disparity d is given in Equation (5):
$L_r(p, d) = C(p, d) + \min\left\{ L_r(p - r, d),\ L_r(p - r, d - 1) + P_1,\ L_r(p - r, d + 1) + P_1,\ \min_{t} L_r(p - r, t) + P_2 \right\} - \min_{t} L_r(p - r, t)$  (5)
where L_r(p, d) is the aggregated cost along path r, p − r is the previous pixel of p along path r, and min_t L_r(p − r, t) is the minimum cost of pixel p − r over all disparities. The penalty coefficients P_1 and P_2 are determined by the color differences β_l and β_r of the pixels: β_l is the color difference between p and p − r in the left image, and β_r is the color difference between the corresponding pixels q and q − r in the right image, as shown in Equation (6):
$\begin{cases} P_1 = \Pi_1 / 10,\ P_2 = \Pi_2 / 10, & \text{if } \beta_l > \tau \text{ and } \beta_r > \tau \\ P_1 = \Pi_1,\ P_2 = \Pi_2, & \text{if } \beta_l < \tau \text{ and } \beta_r < \tau \\ P_1 = \Pi_1 / 4,\ P_2 = \Pi_2 / 4, & \text{otherwise} \end{cases}$  (6)
where Π_1 and Π_2 are two constants and τ is the color-difference threshold. The aggregated costs of all paths are summed to obtain the multi-path cost aggregation result C̃(p, d) of pixel p under disparity d:
$\tilde{C}(p, d) = \sum_{r} L_r(p, d)$  (7)
In multi-path cost aggregation, increasing the number of paths can effectively reduce the EMR of the stereo matching algorithm. However, as the number of paths increases, the processing efficiency of the algorithm decreases significantly. Considering both the matching accuracy and the real-time performance of the algorithm, a four-path cost aggregation is used, as shown in Figure 3.
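The recursion of Equation (5) can be evaluated with a single scan along each path. The sketch below shows one left-to-right scanline pass together with the penalty selection of Equation (6); it is an illustrative sketch, not the authors' code. For brevity the penalties are passed in as constants per call, whereas in the full algorithm they vary per pixel pair through Equation (6); the cost-volume layout (row-major W × H × L floats) is also an assumption. Running the same routine over the four paths of Figure 3 and summing the results gives Equation (7).

```cpp
#include <algorithm>
#include <vector>

// Equation (6): scale the penalties according to the color differences beta_l
// (left image) and beta_r (right image) against the threshold tau.
void adaptivePenalties(double betaL, double betaR, double pi1, double pi2, double tau,
                       double& P1, double& P2) {
    if (betaL > tau && betaR > tau)      { P1 = pi1 / 10.0; P2 = pi2 / 10.0; }
    else if (betaL < tau && betaR < tau) { P1 = pi1;        P2 = pi2;        }
    else                                 { P1 = pi1 / 4.0;  P2 = pi2 / 4.0;  }
}

// One left-to-right pass of Equation (5) over scanline v (one of the four paths
// of Figure 3). C is the initial cost volume, L the aggregated cost volume,
// both stored as cost[(v * W + u) * numDisp + d].
void aggregateLeftToRight(const std::vector<float>& C, std::vector<float>& L,
                          int W, int v, int numDisp, float P1, float P2) {
    for (int u = 0; u < W; ++u) {
        const int base = (v * W + u) * numDisp;
        if (u == 0) {                                   // no predecessor: copy the raw cost
            for (int d = 0; d < numDisp; ++d) L[base + d] = C[base + d];
            continue;
        }
        const int prev = (v * W + (u - 1)) * numDisp;
        float minPrev = L[prev];                        // min_t L_r(p - r, t)
        for (int d = 1; d < numDisp; ++d) minPrev = std::min(minPrev, L[prev + d]);
        for (int d = 0; d < numDisp; ++d) {
            float best = L[prev + d];                                   // same disparity
            if (d > 0)           best = std::min(best, L[prev + d - 1] + P1);
            if (d < numDisp - 1) best = std::min(best, L[prev + d + 1] + P1);
            best = std::min(best, minPrev + P2);                        // larger disparity jump
            // Equation (5): add the raw cost and subtract min_t L_r(p - r, t)
            // so the aggregated values stay bounded along the path.
            L[base + d] = C[base + d] + best - minPrev;
        }
    }
}
```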
In CSCA [16], the cost aggregation process is regarded as a weighted least squares (WLS) optimization problem for the initial matching cost volume. The optimization function at a single scale is defined as:
$\tilde{C}(i, l) = \arg\min_{z} \frac{1}{Z_i} \sum_{j \in N_i} K(i, j) \left\| z - C(j, l) \right\|^2$  (8)
where C̃(i, l) is the aggregated cost of pixel i under disparity l, N_i is the neighborhood of pixel i, C(j, l) is the cost of pixel j in the neighborhood N_i under disparity l, K(i, j) is the kernel function measuring the similarity between pixels i and j, Z_i = Σ_{j ∈ N_i} K(i, j) is the normalization constant, and z is the target cost value.
The optimal solution is obtained by setting the partial derivative with respect to z in Equation (8) to zero, which gives Equation (9):
$\tilde{C}(i, l) = \frac{1}{Z_i} \sum_{j \in N_i} K(i, j)\, C(j, l)$  (9)
To overcome the high EMR of single-scale stereo matching in low-texture regions, constraint terms are added between different scales, as in CSCA [16], and the aggregated cost volumes at the S + 1 scales are coupled together. The multi-scale optimization function is expressed in Equation (10):
$\hat{V} = \arg\min_{\{z^s\}_{s=0}^{S}} \left( \sum_{s = 0}^{S} \frac{1}{Z_{i^s}^{s}} \sum_{j^s \in N_{i^s}} K(i^s, j^s) \left\| z^s - C^s(j^s, l^s) \right\|^2 + \gamma \sum_{s = 1}^{S} \left\| z^s - z^{s - 1} \right\|^2 \right)$  (10)
where s ∈ {0, 1, 2, …, S} is the scale index of the left and right images obtained by Gaussian pyramid down-sampling, i^s and j^s denote pixels i and j at scale s, N_{i^s} is the neighborhood of i^s, l^s is the disparity l at scale s, C^s is the cost volume at scale s and C^0 is the cost volume at the finest scale. K(i^s, j^s) is the kernel function measuring the similarity between pixels i^s and j^s, and Z^s_{i^s} = Σ_{j^s ∈ N_{i^s}} K(i^s, j^s) is the corresponding normalization constant. To account for the relationship between adjacent scales, a constraint parameter γ (γ > 0) is introduced to control the regularization strength. {z^s}_{s=0}^{S} are the target cost values at each scale, and the S + 1 elements of the solution vector V̂ = [Ĉ^0(i^0, l^0), Ĉ^1(i^1, l^1), …, Ĉ^S(i^S, l^S)]^T are the multi-scale fused cost values at each scale. To obtain the optimal solution, let F({z^s}_{s=0}^{S}) denote the objective function in Equation (10); Equation (11) is obtained by taking the partial derivative with respect to z^s for s ∈ {1, 2, …, S − 1}:
$\frac{\partial F}{\partial z^s} = \frac{2}{Z_{i^s}^{s}} \sum_{j^s \in N_{i^s}} K(i^s, j^s) \left( z^s - C^s(j^s, l^s) \right) + 2\gamma \left( z^s - z^{s - 1} \right) - 2\gamma \left( z^{s + 1} - z^s \right)$  (11)
Substituting Equation (9), Equation (11) can be simplified as:
$\frac{\partial F}{\partial z^s} = 2 \left( -\gamma z^{s - 1} + (1 + 2\gamma) z^s - \gamma z^{s + 1} - \tilde{C}^s(i^s, l^s) \right)$  (12)
Setting $\partial F / \partial z^s = 0$ gives:
$-\gamma z^{s - 1} + (1 + 2\gamma) z^s - \gamma z^{s + 1} = \tilde{C}^s(i^s, l^s)$  (13)
Similarly, when s = 0 and s = S, we get:
$(1 + \gamma) z^0 - \gamma z^1 = \tilde{C}^0(i^0, l^0)$  (14)
$-\gamma z^{S - 1} + (1 + \gamma) z^S = \tilde{C}^S(i^S, l^S)$  (15)
where C̃^s(i^s, l^s) is the known result of the multi-path cost aggregation at scale s. Setting Ṽ = [C̃^0(i^0, l^0), C̃^1(i^1, l^1), …, C̃^S(i^S, l^S)]^T, the S + 1 elements of Ṽ are the multi-path aggregated cost values at each scale. The above process yields S + 1 linear equations, which can be written in matrix form as:
$A \hat{V} = \tilde{V}$  (16)
In Equation (16), A is the (S + 1) × (S + 1) tridiagonal coefficient matrix assembled from Equations (13)–(15). Since γ > 0, A is strictly diagonally dominant, so its inverse always exists. Thus,
$\hat{V} = A^{-1} \tilde{V}$  (17)
In the proposed algorithm, the multi-path aggregated cost volumes C̃^0(p^0, d^0), C̃^1(p^1, d^1), …, C̃^S(p^S, d^S) of pixel p at all scales are obtained through Equation (7). Substituting them into Equation (17) yields the final cost volumes that fuse multi-scale information. To obtain a disparity map with the same resolution as the original image, the cost volume Ĉ^0 at scale 0 is selected for the subsequent disparity computation.
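For each pixel and disparity, Equations (13)–(15) form a small tridiagonal linear system in the S + 1 scales, which can be solved directly instead of forming A^{-1} explicitly. The sketch below uses the Thomas algorithm under the assumption that the S + 1 aggregated costs for one (pixel, disparity) entry are passed in as a vector; the function name and layout are illustrative, not the authors' implementation. In the full algorithm the routine is applied independently to every (pixel, disparity) entry of the aggregated cost volumes.

```cpp
#include <vector>

// Equations (13)-(17): for one (pixel, disparity) entry, costs[s] holds the
// multi-path aggregated cost C_tilde^s at scale s (s = 0..S) and gamma is the
// regularization strength of Equation (10). The system A V_hat = V_tilde is
// tridiagonal, so it is solved with the Thomas algorithm instead of inverting A.
std::vector<double> crossScaleFuse(const std::vector<double>& costs, double gamma) {
    const int S = int(costs.size()) - 1;
    if (S <= 0) return costs;                    // single scale: nothing to fuse
    // Tridiagonal coefficients: sub-diagonal a, diagonal b, super-diagonal c.
    std::vector<double> a(S + 1, -gamma), b(S + 1, 1.0 + 2.0 * gamma), c(S + 1, -gamma);
    b[0] = 1.0 + gamma;                          // Equation (14), first row
    b[S] = 1.0 + gamma;                          // Equation (15), last row
    a[0] = 0.0;
    c[S] = 0.0;

    // Thomas algorithm: forward elimination followed by back substitution.
    std::vector<double> cp(S + 1), dp(S + 1), x(S + 1);
    cp[0] = c[0] / b[0];
    dp[0] = costs[0] / b[0];
    for (int s = 1; s <= S; ++s) {
        const double m = b[s] - a[s] * cp[s - 1];
        cp[s] = c[s] / m;
        dp[s] = (costs[s] - a[s] * dp[s - 1]) / m;
    }
    x[S] = dp[S];
    for (int s = S - 1; s >= 0; --s) x[s] = dp[s] - cp[s] * x[s + 1];
    return x;   // x[0] is the finest-scale fused cost used for disparity computation
}
```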

2.3. Disparity Computation and Disparity Optimization

In this step, the initial disparity map is obtained through the WTA strategy [27] by selecting, for each pixel, the disparity with the minimum cost value in Ĉ^0. However, due to the relative position of the two cameras, some regions of the stereo pair are occluded and their disparities are invalid. Therefore, the initial disparity map needs to be optimized.
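A minimal sketch of the WTA disparity computation is shown below; the row-major W × H × L cost-volume layout and the function name are assumptions carried over from the earlier sketches.

```cpp
#include <vector>

// Winner-take-all: for each pixel, keep the disparity with the minimum cost
// in the fused cost volume C_hat^0 (row-major W x H x numDisp floats).
std::vector<int> winnerTakeAll(const std::vector<float>& cost, int W, int H, int numDisp) {
    std::vector<int> disparity(W * H, 0);
    for (int p = 0; p < W * H; ++p) {
        int best = 0;
        for (int d = 1; d < numDisp; ++d)
            if (cost[p * numDisp + d] < cost[p * numDisp + best]) best = d;
        disparity[p] = best;
    }
    return disparity;
}
```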
The left-right consistency check is a common method for detecting occluded regions and removing mismatched pixels in stereo matching. The left and right disparity maps must be obtained before the check [28]. Let p = (u, v) be a pixel in the left image with disparity D_p, and let q = (u − D_p, v) be the corresponding point of p in the right image, whose disparity in the right disparity map is D_q. If D_p and D_q satisfy the condition:
$\left| D_p - D_q \right| \leq 1$  (18)
then pixel p passes the left-right consistency check and D_p is a valid disparity. Otherwise, pixel p is regarded as a mismatched pixel and its disparity is invalid; it is replaced by the smaller of the two valid disparities in the same row that are closest to pixel p.
The efficiency of the traditional left-right consistency check is low, since two rounds of cost computation and cost aggregation are required to obtain the left and right disparity maps. When computing the initial matching cost, matching the right image against the left image and matching the left image against the right image are equivalent for a given pixel pair. Therefore, to further improve the processing efficiency of the algorithm, an internal left-right consistency check is used for disparity optimization in this paper: the right cost volume is derived directly from the left cost volume. The principle of this cost space conversion is shown in Figure 4, and its expression is:
$\hat{C}^{Right}\left((u, v), d\right) = \hat{C}^{Left}\left((u + d, v), d\right)$  (19)
where Ĉ^Right and Ĉ^Left are the multi-scale-fused cost volumes of the right and left images, respectively, and Ĉ^Right((u, v), d) and Ĉ^Left((u + d, v), d) are the cost values of pixels (u, v) and (u + d, v) under disparity d. Since Ĉ^Left is known, the cost volume of the right image is easily obtained through the conversion of Equation (19), and the left and right disparity maps then follow from the WTA principle. This process requires only one cost computation and one cost aggregation, which improves the processing efficiency of the algorithm while preserving matching accuracy.
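The sketch below illustrates the internal left-right consistency check: the right cost volume is generated from the left one via Equation (19), both disparity maps are computed by WTA, and pixels violating Equation (18) are invalidated. It reuses the winnerTakeAll sketch given above; all names and the handling of border pixels are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Defined in the WTA sketch above.
std::vector<int> winnerTakeAll(const std::vector<float>& cost, int W, int H, int numDisp);

// Internal left-right consistency check: derive the right cost volume from the
// left one via Equation (19), run WTA on both, and invalidate pixels that fail
// Equation (18). Invalid pixels are later filled from valid neighbors in the row.
std::vector<int> internalLeftRightCheck(const std::vector<float>& leftCost,
                                        int W, int H, int numDisp, int invalid = -1) {
    // Equation (19): C_hat_right((u, v), d) = C_hat_left((u + d, v), d).
    std::vector<float> rightCost(leftCost.size());
    for (int v = 0; v < H; ++v)
        for (int u = 0; u < W; ++u)
            for (int d = 0; d < numDisp; ++d) {
                const int uL = std::min(u + d, W - 1);   // clamp at the right image border
                rightCost[(v * W + u) * numDisp + d] = leftCost[(v * W + uL) * numDisp + d];
            }

    std::vector<int> dispL = winnerTakeAll(leftCost, W, H, numDisp);
    const std::vector<int> dispR = winnerTakeAll(rightCost, W, H, numDisp);

    // Equation (18): keep D_p only if |D_p - D_q| <= 1 with q = (u - D_p, v).
    for (int v = 0; v < H; ++v)
        for (int u = 0; u < W; ++u) {
            const int dp = dispL[v * W + u];
            const int uq = u - dp;
            if (uq < 0 || std::abs(dp - dispR[v * W + uq]) > 1)
                dispL[v * W + u] = invalid;
        }
    return dispL;
}
```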
Moreover, to better preserve edge information, a weighted median filter with bilateral-filter weights [29] is used to optimize the disparity map, and a median filter [30] is adopted to remove noise. The final refined disparity map is then used for the experimental analysis.

3. Experimental Design and Results Analysis

In this section, the experimental design based on the Middlebury datasets is first introduced to examine the effectiveness of the proposed method. The experimental results of our method are then compared with those of several typical methods from the literature.

3.1. Experimental Design

The Middlebury datasets [31] provide a standard benchmark for evaluating stereo matching algorithms; they contain original images of different sizes in various scenes together with ground truth maps generated by structured light. Four datasets, Cones, Teddy, Tsukuba and Venus, from the Middlebury 2001 and 2003 releases [5,32], are selected for the experiments; they have been widely used to evaluate stereo matching algorithms. Figure 5 shows the left images and the corresponding ground truth maps of the four datasets.
The related algorithms are implemented in C++ with the OpenCV library on Ubuntu 18.04. The experimental hardware is an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz with 8 GB RAM. In the initial matching cost computation, a 9 × 9 neighborhood window is used for the census transform. Table 1 lists the parameter settings that have a significant impact on the experimental results.
To reflect the improvements of the proposed algorithm in cost aggregation and disparity optimization, SGM-MSIF is compared with SGM [11], MGM [12], OmniSGM [15], CSCA + BOX (the fastest configuration of CSCA [16]) and CSCA + GF (the most accurate configuration of CSCA [16]). Three evaluation indicators are used for comparison: the visual quality of the disparity maps, the error matching rate (EMR) in three regions (Non-occ, All and Disc) and the running time. The EMR threshold is set to 1, i.e., a pixel is judged as mismatched when the difference between the estimated disparity and the ground truth disparity provided by the dataset is greater than 1. To fairly compare processing efficiency, all implementations are tested without any acceleration techniques.
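For reference, the EMR used in the comparison can be computed as in the sketch below, where a pixel inside the evaluated region counts as mismatched if its disparity differs from the ground truth by more than the threshold of 1. The data layout and names are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Error matching rate in percent over one evaluation region (Non-occ, All or Disc).
// regionMask[i] != 0 marks pixels that belong to the region being evaluated.
double errorMatchingRate(const std::vector<int>& estimated,
                         const std::vector<int>& groundTruth,
                         const std::vector<uint8_t>& regionMask,
                         int threshold = 1) {
    long long evaluated = 0, mismatched = 0;
    for (std::size_t i = 0; i < estimated.size(); ++i) {
        if (!regionMask[i]) continue;
        ++evaluated;
        if (std::abs(estimated[i] - groundTruth[i]) > threshold) ++mismatched;
    }
    return evaluated ? 100.0 * double(mismatched) / double(evaluated) : 0.0;
}
```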

3.2. Experimental Result Analysis

Figure 6 shows the disparity maps generated by SGM, CSCA + BOX, CSCA + GF and SGM-MSIF (this paper). To clearly show the stereo matching results, the mismatched pixels of different regions are marked with different colors in Figure 6: magenta for all regions (All), yellow for non-occluded regions (Non-occ) and cyan for discontinuous regions (Disc). The magenta pixels are mainly concentrated in the occluded regions, whose disparities are difficult to obtain because these regions have no corresponding pixels in the right image. In general, however, the number of mismatched pixels in Non-occ regions obtained by SGM-MSIF is significantly smaller than that of the other three algorithms. In the low-texture regions of the images, the disparity maps obtained by SGM-MSIF are significantly better than those of the other algorithms, and in the Disc regions of Teddy and Tsukuba the mismatched pixels are also clearly reduced. The red rectangles in Figure 6 show that our method successfully recovers the disparities not only in large homogeneous regions but also in discontinuous regions. These results indicate that incorporating multi-scale information into multi-path cost aggregation effectively increases the accuracy of the disparity map and reduces the EMR in low-texture regions.
Table 2, Table 3 and Table 4 present the EMRs of the six algorithms in the Non-occ, All and Disc regions, respectively, and Figure 7 shows a histogram of the EMRs in the Non-occ regions. In Table 2 and Figure 7, the proposed algorithm achieves the best average EMR and the smallest EMRs on the Cones and Teddy datasets. Although the EMRs of our method on Tsukuba and Venus are not the lowest, the differences from the other algorithms are very small; further improvement is difficult because the matching accuracies on these datasets are already very high. In the Non-occ regions, compared with the traditional SGM and CSCA + GF (the most accurate configuration of CSCA [16]) algorithms, the average EMR of SGM-MSIF is reduced by 1.96% and 0.92%, respectively. In Table 3 and Table 4, SGM-MSIF also achieves the best average EMRs and the smallest EMRs on three of the four datasets. Moreover, in the Disc regions, compared with SGM and CSCA + GF, the average EMR of SGM-MSIF is reduced by 6.68% and 1.68%, respectively. Overall, compared with the other five algorithms, our algorithm achieves excellent matching accuracy, which indicates that SGM-MSIF effectively improves the matching accuracy and robustness of stereo matching in low-texture regions.
Table 5 lists the running times of the six algorithms on the different datasets. The traditional SGM algorithm runs fastest, but its matching accuracy is the worst. Compared with CSCA + BOX and CSCA + GF, the average running time of SGM-MSIF is reduced by 17.3% and 58.7%, respectively. This shows that introducing multi-path cost aggregation into cross-scale cost aggregation and using the internal left-right consistency check in disparity optimization significantly improve the processing efficiency of the stereo matching algorithm.

4. Conclusions

In this paper, an SGM algorithm based on multi-scale information fusion (SGM-MSIF) is proposed by combining multi-path cost aggregation with the cross-scale cost aggregation (CSCA) framework. In terms of matching accuracy, the average error matching rates of the proposed algorithm are reduced by 1.96% and 0.92% compared with the classic SGM and CSCA + GF algorithms, respectively. In terms of matching efficiency, the average runtime of the proposed algorithm is reduced by 58.7% compared with the CSCA + GF algorithm. The overall performance of the proposed SGM-MSIF algorithm is superior to that of the classic SGM and CSCA algorithms.
In the SGM-MSIF algorithm, a novel cost aggregation method is proposed that combines multi-path cost aggregation based on one-direction dynamic programming with cross-scale cost aggregation, which significantly improves the matching accuracy in low-texture regions. We also present an internal left-right consistency check in which the right cost volume is converted from the left cost volume, which effectively reduces the complexity of SGM-MSIF and improves the stereo matching efficiency. The experimental results show that the proposed SGM-MSIF algorithm clearly improves the matching accuracy in low-texture regions and raises the processing efficiency of cross-scale stereo matching. It can provide a theoretical reference for the 3D reconstruction of low-texture scenes. In the future, combining this algorithm with machine learning [33] is a possible way to further improve the accuracy and efficiency of stereo matching algorithms.

Author Contributions

Conceptualization, C.D.; Funding acquisition, B.S.; Methodology, C.D. and B.S.; Project administration, B.S.; Software, C.D., D.L. and H.Z.; Supervision, B.S.; Writing—original draft, C.D.; Writing—review & editing, C.D., J.L. and B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. U20A20201), the National Key R&D Program of China (Grant No. 2019YFB1312102), the Key R&D Program of Hebei Province (Grant No. 20311803D).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank the editor and anonymous reviewers for providing helpful suggestions for improving the quality of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, P.; Zhang, L.; Wang, M. Measurement of Large-Sized-Pipe Diameter Based on Stereo Vision. Appl. Sci. 2022, 12, 5277.
  2. Do, P.N.B.; Nguyen, Q.C. A review of stereo-photogrammetry method for 3-D reconstruction in computer vision. In Proceedings of the 19th International Symposium on Communications and Information Technologies (ISCIT), Ho Chi Minh City, Vietnam, 25–27 September 2019; pp. 138–143.
  3. Huynh, T.H.; Yoo, M. A Taillight Matching and Pairing Algorithm for Stereo-Vision-Based Nighttime Vehicle-to-Vehicle Positioning. Appl. Sci. 2020, 10, 6800.
  4. Zhou, S.; Zhang, G.; Yi, R.; Xie, Z. Research on Vehicle Adaptive Real-time Positioning Based on Binocular Vision. IEEE Intell. Transp. Syst. Mag. 2021, 14, 47–59.
  5. Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42.
  6. Vázquez-Delgado, H.D.; Pérez-Patricio, M.; Aguilar-González, A.; Arias-Estrada, M.O.; Palacios-Ramos, M.A.; Camas-Anzueto, J.L.; Perez-Cruz, A.; Velazquez-Trujillo, S. Real-time multi-window stereo matching algorithm with fuzzy logic. IET Comput. Vis. 2021, 15, 208–223.
  7. Nazmi, Z.A.M.; Hamzah, R.A.; Zarina, M.N.; Madih, Z. Disparity Map from Stereo Images for Three-dimensional Surface Reconstruction. Eng. Sci. 2022, 19, 167–174.
  8. Hallek, M.; Boukamcha, H.; Mtibaa, A.; Atri, M. Dynamic programming with adaptive and self-adjusting penalty for real-time accurate stereo matching. J. Real Time Image Process. 2022, 19, 233–245.
  9. Pan, C.; Liu, Y.; Huang, D. Novel belief propagation algorithm for stereo matching with a robust cost computation. IEEE Access 2019, 7, 29699–29708.
  10. Lu, B.; Sun, L.; Yu, L.; Dong, X. An improved graph cut algorithm in stereo matching. Displays 2021, 69, 102052.
  11. Hirschmüller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 328–341.
  12. Facciolo, G.; De Franchis, C.; Meinhardt, E. MGM: A significantly more global matching for stereovision. In Proceedings of the BMVC 2015, Swansea, UK, 7–10 September 2015.
  13. Yang, W.; Li, X.; Yang, B.; Fu, Y. A novel stereo matching algorithm for digital surface model (DSM) generation in water areas. Remote Sens. 2020, 12, 870.
  14. Bu, P.; Zhao, H.; Yan, J.; Jin, Y. Collaborative semi-global stereo matching. Appl. Opt. 2021, 60, 9757–9768.
  15. Ma, Y.; Tian, A.; Bu, P.; Liu, B.; Zhao, Z. Omni-Directional Semi-Global Stereo Matching with Reliable Information Propagation. Appl. Sci. 2022, 12, 11934.
  16. Zhang, K.; Fang, Y.; Min, D.; Sun, L.; Yang, S.; Yan, S.; Tian, Q. Cross-scale cost aggregation for stereo matching. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1590–1597.
  17. Fan, R.; Liu, Y.; Bocus, M.J.; Wang, L. Real-time subpixel fast bilateral stereo. In Proceedings of the 2018 IEEE International Conference on Information and Automation (ICIA), Wuyishan, China, 11–13 August 2018; pp. 1058–1065.
  18. Hamzah, R.A.; Ibrahim, H.; Hassan, A.H.A. Stereo matching algorithm based on per pixel difference adjustment, iterative guided filter and graph segmentation. J. Vis. Commun. Image Represent. 2017, 42, 145–160.
  19. Yang, Q. Stereo Matching Using Tree Filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 834–846.
  20. Gao, S.; Ge, H.; Zhang, H.; Zhang, Y. A nonlocal method with modified initial cost and multiple weight for stereo matching. J. Sens. 2017, 2017, 9374870.
  21. Yao, L.; Liu, Z.; Wang, B. Stereo matching based on pyramid transform cross-scale cost aggregation. J. Syst. Simul. 2016, 28, 2227–2234.
  22. Zhao, Y.; Zhuang, Z.; Xu, X.; Zhang, Y.; Lv, X. Improved stereo matching algorithm based on cross-scale cost aggregation. Comput. Integr. Manuf. Syst. 2020, 26, 947–953.
  23. Li, S.; Hao, Q.; Kang, X.; Benediktsson, J.A. Gaussian pyramid based multiscale feature fusion for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3312–3324.
  24. Nguyen, D.M.; Hanca, J.; Lu, S.P.; Munteanu, A. Robust stereo matching using census cost, discontinuity preserving disparity computation and view-consistent refinement. In Proceedings of the 2015 International Conference on 3D Imaging (IC3D), Liege, Belgium, 14–15 December 2015; pp. 1–8.
  25. Mei, X.; Sun, X.; Zhou, M.; Jiao, S.; Wang, H.; Zhang, X. On building an accurate stereo matching system on graphics hardware. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 467–474.
  26. Yao, M.; Ouyang, W.; Xu, B. Hybrid cost aggregation for dense stereo matching. Multimed. Tools Appl. 2020, 79, 23189–23202.
  27. Yan, T.; Gan, Y.; Xia, Z.; Zhao, Q. Segment-based disparity refinement with occlusion handling for stereo matching. IEEE Trans. Image Process. 2019, 28, 3885–3897.
  28. Vieira, G.d.; Soares, F.A.A.M.N.; Laureano, G.T.; Parreira, R.T.; Ferreira, J.C. A Segmented Consistency Check Approach to Disparity Map Refinement. Can. J. Electr. Comput. Eng. 2018, 41, 218–223.
  29. Li, Y.; Wu, M.; Liu, K.; Yu, W. Anisotropic Stereo Matching Fusing Multi-Scale Information. Available online: http://kns.cnki.net/kcms/detail/11.5946.TP.20210624.1009.002.html (accessed on 14 November 2022).
  30. Erkan, U.; Gökrem, L.; Enginoğlu, S. Different applied median filter in salt and pepper noise. Comput. Electr. Eng. 2018, 70, 789–798.
  31. Scharstein, D.; Szeliski, R. Middlebury Stereo Evaluation—Version 3. Available online: http://vision.middlebury.edu/stereo/ (accessed on 14 November 2022).
  32. Scharstein, D.; Szeliski, R. High-accuracy stereo depth maps using structured light. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), Madison, WI, USA, 18–20 June 2003; Volume 1, pp. 195–202.
  33. El-kenawy, E.-S.M.; Abutarboush, H.F.; Mohamed, A.W.; Ibrahim, A. Advance artificial intelligence technique for designing double T-shaped monopole antenna. Comput. Mater. Con. 2021, 69, 2983–2995.
Figure 1. The flowchart of SGM-MSIF algorithm.
Figure 2. Schematic diagram of census transform.
Figure 3. 4-path cost aggregation.
Figure 4. Schematic diagram of cost space conversion.
Figure 5. Middlebury stereo evaluation datasets. (a) Left images of the stereo pairs from four datasets: Cones, Teddy, Tsukuba and Venus. (b) Ground truth maps of these images.
Figure 6. Disparity maps obtained by four different algorithms: (a) SGM (traditional SGM algorithm [11]); (b) CSCA + BOX (the fastest matching speed under CSCA [16]); (c) CSCA + GF (the best matching accuracy under CSCA [16]); and (d) SGM-MSIF (this paper). Magenta represents All error. Yellow represents Non-occ error and cyan represents Disc error.
Figure 7. EMRs in Non-occ regions.
Table 1. Parameter settings of SGM-MSIF.
λ_cen    λ_cor    Π_1    Π_2    τ     γ
50       30       1.0    3.0    15    0.3
Table 2. Comparison of EMRs in Non-occ regions of six algorithms (%).
Algorithms               Cones    Teddy    Tsukuba    Venus    Avg
SGM [11]                 4.30     7.84     4.56       0.98     4.42
MGM [12]                 5.36     9.51     2.53       1.27     4.67
OmniSGM [15]             3.78     6.62     1.48       1.07     3.24
CSCA + BOX [16]          3.92     9.98     4.52       0.79     4.80
CSCA + GF [16]           3.25     6.41     3.44       0.40     3.38
SGM-MSIF (this paper)    2.77     4.75     1.85       0.47     2.46
Table 3. Comparison of EMRs in All regions of six algorithms (%). “/”: the results are not available in the references.
Algorithms               Cones    Teddy    Tsukuba    Venus    Avg
SGM [11]                 10.33    14.64    5.80       1.80     8.14
MGM [12]                 /        /        /          /        /
OmniSGM [15]             /        /        /          /        /
CSCA + BOX [16]          9.68     15.20    5.27       1.12     7.82
CSCA + GF [16]           8.90     11.45    3.85       0.54     6.19
SGM-MSIF (this paper)    8.75     10.90    2.62       1.11     5.85
Table 4. Comparison of EMRs in Disc regions of six algorithms (%). “/”: the results are not available in the references.
Algorithms               Cones    Teddy    Tsukuba    Venus    Avg
SGM [11]                 12.82    21.16    16.94      12.72    15.91
MGM [12]                 /        /        /          /        /
OmniSGM [15]             /        /        /          /        /
CSCA + BOX [16]          10.70    20.26    13.49      6.40     12.71
CSCA + GF [16]           8.76     16.54    13.55      4.77     10.91
SGM-MSIF (this paper)    8.18     13.33    9.61       5.78     9.23
Table 5. Comparison of the running time of six algorithms (s). “/”: the results are not available in the references.
Algorithms               Cones    Teddy    Tsukuba    Venus    Avg
SGM [11]                 1.75     1.74     0.38       1.01     1.22
MGM [12]                 /        /        /          /        /
OmniSGM [15]             /        /        /          /        /
CSCA + BOX [16]          3.79     4.01     1.90       2.76     3.12
CSCA + GF [16]           9.00     9.24     2.18       4.53     6.24
SGM-MSIF (this paper)    3.55     3.63     1.24       1.91     2.58

