Article

A Review of Depth and Normal Fusion Algorithms

1  Center for Vision, Automation and Control, Austrian Institute of Technology, Vienna 1210, Austria
2  Institute of Computer Graphics and Vision, Graz University of Technology, Graz 8010, Austria
*  Author to whom correspondence should be addressed.
Submission received: 11 November 2017 / Revised: 21 December 2017 / Accepted: 26 January 2018 / Published: 1 February 2018
(This article belongs to the Section Physical Sensors)

Abstract

Geometric surface information such as depth maps and surface normals can be acquired by various methods such as stereo light fields, shape from shading and photometric stereo techniques. We compare several algorithms which deal with the combination of depth with surface normal information in order to reconstruct a refined depth map. The reasons for performance differences are examined from the perspective of alternative formulations of surface normals for depth reconstruction. We review and analyze methods in a systematic way. Based on our findings, we introduce a new generalized fusion method, which is formulated as a least squares problem and outperforms previous methods in the depth error domain by introducing a novel normal weighting that performs closer to the geodesic distance measure. Furthermore, a novel method is introduced based on Total Generalized Variation (TGV) which further outperforms previous approaches in terms of the geodesic normal distance error and maintains comparable quality in the depth error domain.

1. Introduction

Measuring the depth of a scene accurately is essential for many tasks, including applications in industrial environments, object recognition and security assurance. Usually the depth is measured by stereo cameras, structure from motion (SfM), time of flight (ToF) sensors, or light field cameras. These methods deliver accurate absolute depth maps but lack detail in high-frequency depth structures. The reason for this lies in the dependency on the presence of structural information in the image as well as in the analysis routine, which is usually based on hypothesis testing and therefore limited in range and step sizes. Stereo matching methods include correlation-based techniques [1], semi-global matching [2,3], block-based matching [4,5] and stereo matching for micro array cameras [6]. Several methods were introduced to retrieve depth information from light field data using epipolar image slope analysis [7], structure tensors [8], fine-to-coarse approaches [9,10] and line consistency metrics [11]. Structure from motion techniques were presented in [12,13] and time of flight techniques in [14]. In contrast to depth-based approaches, methods that recover surface normals, such as photometric stereo [15], show high-frequency details but lack an absolute depth reference. Shape from shading was used to retrieve surface normals in [16]. A robust normal reconstruction using photometric stereo information with a Markov Random Field (MRF) was introduced in [17]. Methods have also been presented which retrieve surface normals from a calibrated stereo setup, e.g., by estimating the homography between two matched patches [18,19] or by using the affine transform data between two projections [20], in addition to the reconstruction of a sparse depth map retrieved from a stereo correspondence analysis. A learning-based method using a tandem of convolutional neural networks to estimate depth and surface normals from image data simultaneously was introduced in [21]. Combining depth and surface normal data allows precise depth reconstructions for low- as well as high-frequency components in the depth map.
Depth maps and surface normals were previously combined in various ways. Shape from shading was used under general illumination in [22], while photometric stereo normals were incorporated in [23,24,25]. Another approach was presented in [26], where the tangent plane of the given normals was projected onto the measured normal field. This normal constraint was previously used in several algorithms (e.g., [27,28,29,30]). The method described in [31] uses a standard depth constraint and forces the Laplacian of the optimal solution to be in the proximity of the derivative of the given normals. In [32], a depth and photometric stereo fusion algorithm was introduced, which uses an additional Laplacian smoothing term and adaptive pixel-wise weighting parameters to preserve surface discontinuities. A Laplacian smoothing term was also added in [33]. An extended penalty is chosen in [34], where the normal is enforced to be close to the normal from the initial depth map, while 2nd-order spherical harmonics are used to constrain the normals according to the observed shading in the input image, a smoothness function enforces the similarity of 1st-order neighbors and an additional term constrains the normals to unit length. A method to refine depth by photometric stereo information using RGB-D cameras was introduced in [35], where an energy function is optimized for the depth, smoothness, shading and temporal aliasing of a scene. Surface normals from polarization cues were used to enhance the depth map in [36]: in an iterative process, the depth is refined with corrected surface normal information and a depth fidelity constraint, which enforces consistency between the surface obtained from the normals and accurate regions in the depth map. An approach for inferring surface normals from light field data, as well as a hybrid setup combining depth maps with surface normals using a block coordinate descent algorithm, was demonstrated in [37].
Even though several methods to combine surface depth with surface orientation data have emerged, a thorough analysis and classification of the properties of those approaches was missing. In this paper, our first main contribution comprises an in-depth comparison of several variational methods using depth maps and surface orientation data, as well as a classification and evaluation of weighting terms for surface orientations using gradients or surface normals. We analyze the orientation weighting terms of common methods and explain their differences with respect to the geodesic distance weighting. We show that methods which behave closer to this natural surface normal weighting term perform better, especially in regions with steep depth edges. Based on these findings, we introduce our second main contribution, a new generalized formulation of a previously introduced method [26], which outperforms other methods regarding the error in the depth domain. Our third main contribution is a novel gradient-based approach, which uses TGV and outperforms other methods in the domain of the geodesic error of the resulting normals.

2. Depth and Surface Normal Cues

At present, 3D models are used for a wide range of analysis tasks. Depth models are constructed by acquisition devices using stereo systems, light field cameras, time of flight (ToF) sensors, or other range scanning techniques. Common methods show a high precision in the absolute depth measurement, but a low quality in fine relative details. These errors are major obstructions for tasks such as finding defects in objects. Measuring the normal fields of objects with methods such as photometric stereo [15] or shape from shading [38] allows the reconstruction of surfaces with highly precise local details. On the downside, those methods show errors in the low frequency domain and therefore result in a low absolute depth accuracy.
Combining depth maps with surface normal information allows an exact 3D reconstruction both in absolute depth and fine surface details. This can be achieved by optimizing energy functions by variational methods, where the solution is penalized for deviating from the depth model and from the surface normals. In state-of-the-art techniques, the surface normal component is either used directly or by converting it to gradient information, where the x- and y-component can be treated independently. Such an independent treatment can be beneficial for applications where data components are missing, as for example line-scanners [39].

3. Notations and Preliminaries

In this section, we introduce the essential notations used across this paper. By default, we assume discretized surface structures of the size of $M \times N$ pixels. In order to access image locations, we define the following index set $I = \{(i,j) : 1 \le i \le M,\ 1 \le j \le N\}$.
The discrete depth map of our scene is scalar valued in each pixel and defined as follows:
$Z = (Z_{i,j})_{(i,j) \in I} \in \mathbb{R}^{M \times N}.$   (1)
Variables with a bold font refer to surface structures where each pixel is vector valued. Hence, the surface gradient field G in x- and y-direction is defined as:
$\mathbf{G} = (\mathbf{G}_{i,j})_{(i,j) \in I} \in \mathbb{R}^{M \times N \times 2}, \quad \text{where} \quad \mathbf{G}_{i,j} = (G_{i,j,x}, G_{i,j,y}).$   (2)
The gradient of the depth map is computed using standard finite differences:
$\nabla Z = ((\nabla Z)_{i,j})_{(i,j) \in I}, \quad \text{where} \quad (\nabla Z)_{i,j} = ((\partial_x Z)_{i,j}, (\partial_y Z)_{i,j}),$   (3)
and the gradient operator in x- and y-direction $\nabla : \mathbb{R}^{M \times N} \to \mathbb{R}^{M \times N \times 2}$ is given by:
$(\partial_x Z)_{i,j} = \begin{cases} Z_{i+1,j} - Z_{i,j} & \text{if } 1 \le i < M, \\ 0 & \text{otherwise}, \end{cases} \qquad (\partial_y Z)_{i,j} = \begin{cases} Z_{i,j+1} - Z_{i,j} & \text{if } 1 \le j < N, \\ 0 & \text{otherwise}. \end{cases}$   (4)
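For illustration, a minimal NumPy sketch of this forward-difference gradient operator, together with its adjoint $\nabla^*$ (needed later for the optimality conditions of the least squares methods in Section 4), could look as follows; the function names are ours and the sketch is not taken from the original implementation:

```python
import numpy as np

def grad(Z):
    """Forward-difference gradient of an M x N depth map Z with zero
    boundary handling, as in Equation (4) (axis 0 = x, axis 1 = y)."""
    Gx = np.zeros_like(Z)
    Gy = np.zeros_like(Z)
    Gx[:-1, :] = Z[1:, :] - Z[:-1, :]   # (d_x Z)_{i,j} = Z_{i+1,j} - Z_{i,j}
    Gy[:, :-1] = Z[:, 1:] - Z[:, :-1]   # (d_y Z)_{i,j} = Z_{i,j+1} - Z_{i,j}
    return Gx, Gy

def grad_adj(Gx, Gy):
    """Adjoint of grad (a negative divergence), used in the optimality
    conditions of the least squares formulations in Section 4."""
    A = np.zeros_like(Gx)
    A[1:, :] += Gx[:-1, :]
    A[:-1, :] -= Gx[:-1, :]
    A[:, 1:] += Gy[:, :-1]
    A[:, :-1] -= Gy[:, :-1]
    return A
```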
The surface normal field is defined as follows:
$\mathbf{N} = (\mathbf{N}_{i,j})_{(i,j) \in I} \in \mathbb{R}^{M \times N \times 3}, \quad \text{where} \quad \mathbf{N}_{i,j} = (N_{i,j,x}, N_{i,j,y}, N_{i,j,z}) \in \mathbb{R}^3.$   (5)
By definition, we have $|\mathbf{N}_{i,j}|_2 = 1$. The relation between the surface gradient estimation and the surface normals is defined for all $(i,j) \in I$ as follows:
$N_{i,j,x} = \frac{-(\partial_x Z)_{i,j}}{|((\nabla Z)_{i,j}, 1)|_2}, \quad N_{i,j,y} = \frac{-(\partial_y Z)_{i,j}}{|((\nabla Z)_{i,j}, 1)|_2}, \quad \text{and} \quad N_{i,j,z} = \frac{1}{|((\nabla Z)_{i,j}, 1)|_2}.$   (6)
Furthermore, we are using two specific surface tangent vectors, which are aligned with the x- and y-vector respectively and defined as follows:
$\mathbf{T} = (\mathbf{T}_{i,j})_{(i,j) \in I} \in \mathbb{R}^{M \times N \times 2 \times 3}, \quad \text{where}$   (7)
$\mathbf{T}_{i,j} = (\mathbf{T}_{i,j,x}, \mathbf{T}_{i,j,y}) \quad \text{and}$   (8)
$\mathbf{T}_{i,j,x} = (1, 0, (\partial_x Z)_{i,j}) \quad \text{and} \quad \mathbf{T}_{i,j,y} = (0, 1, (\partial_y Z)_{i,j}).$   (9)
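The conversions between normals, gradients and tangent vectors used throughout Section 4 can be sketched in a few lines; the sign convention follows Equation (6), and the helper names are ours:

```python
import numpy as np

def normals_from_grad(Gx, Gy):
    """Unit surface normals from gradients, following Equation (6)."""
    denom = np.sqrt(Gx**2 + Gy**2 + 1.0)
    return np.stack([-Gx / denom, -Gy / denom, 1.0 / denom], axis=-1)

def grad_from_normals(N):
    """Gradients from unit normals: G_x = -N_x/N_z, G_y = -N_y/N_z
    (Equations (13) and (14))."""
    return -N[..., 0] / N[..., 2], -N[..., 1] / N[..., 2]

def tangents_from_grad(Gx, Gy):
    """Surface tangent vectors aligned with the x- and y-axes,
    following Equations (7)-(9)."""
    ones, zeros = np.ones_like(Gx), np.zeros_like(Gx)
    Tx = np.stack([ones, zeros, Gx], axis=-1)
    Ty = np.stack([zeros, ones, Gy], axis=-1)
    return Tx, Ty
```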

4. Depth and Surface Normal Fusion Algorithms

In this paper, we analyze state-of-the-art methods in a systematic way and introduce two novel approaches. The described hybrid depth and surface normal methods are categorized in terms of their penalty functions. State-of-the-art approaches used similar depth penalty terms and differed in the surface orientation weighting and regularization. We organize the described methods in two categories: (i) gradient-based and (ii) normal-based approaches as well as with respect to their treatment of flat and steep surface regions. While the presented methods show similar behaviors in flat areas, they differ in the penalization of steep regions. We show the quadratic penalty functions of the methods presented in Figure 1 and Figure 2 for lateral and polar deviations, which are illustrated in Figure 3. We will show that the behavior of the energy function for different inclination angles correlates with the quantitative and qualitative depth reconstruction performance of each individual method. We argue that the geodesic distance function shows the most natural behavior with the favorable property of penalizing the distance of the normal orientation independent of the steepness of the edges. Due to this ideal behavior, we use the function to evaluate other distance measures in Section 5. Without loss of generality, we assume a dense depth and normal map for the algorithms described in this paper. In case of sparse input data, we suggest an extension based on Poisson surface reconstruction [40] to deal with sparse data.
An overview of the presented methods is given in Table 1. The first method we present in Section 4.2 is the construction of depth from only the gradient surface orientation information (i.e., no prior information about the absolute depth is being used). Using only the surface orientation for the depth reconstruction results in large-scale low-frequency errors (and therefore depth offsets). Later we overcome this problem by introducing an additional depth constraint in all following methods.
Second, we introduce the gradient-based approach with a depth constraint formulated as a least squares problem in Section 4.3. The respective contour plot of the orientation distance measure shows a strong penalization of steep edges in contrast to flat surface regions. It is easy to see (Figure 1) that the same angular deviation caused by noisy normals may produce anything from a small to an arbitrarily large error in the gradient domain, depending on the inclination angle. For demonstrative purposes, we additionally show two extensions of the gradient-based method with regularization terms. One enforces gradient-based smoothing. The other one enforces smoothness with a Laplacian term and can be used for the reconstruction with sparse depth and surface normal data.
Third, the method of Heber [41] for combining depth with surface orientations is shown in Section 4.4, which scales the given normal by the length of the optimized gradient.
Fourth, a review of the method of Nehab [26] is given in Section 4.5, which reprojects the tangents of the optimized surface onto the given normal.
Fifth, as one of our main contributions in this paper, we introduce a new penalty function in form of a generalized method of Nehab in Section 4.6. Using a novel parametrization moves the penalty function closer to the geodesic normal energy and hence penalizes deviations in steep edges and flat regions more equally.
Last, we introduce our second main contribution, a novel Total Generalized Variation (TGV) model in Section 4.7, which penalizes the distance of the gradients of the surface orientation and gives significantly improved reconstruction results. The reason for this lies in the decoupling of the gradient through the TGV term.

4.1. Geodesic Distance

In the 3D space, the geodesic distance is the most natural surface normal penalty as distances between surface normals are weighted equally, independent of the surface slope angle with respect to the observer. Therefore it is used in this paper as a comparison measure for the evaluation in Section 5.
The geodesic distance $d_{i,j}$ is defined as the inverse cosine of the point-wise dot product between the given normal $\hat{\mathbf{N}} \in \mathbb{R}^{M \times N \times 3}$ and the estimated solution normal field $\mathbf{N} \in \mathbb{R}^{M \times N \times 3}$ as follows:
$d_{i,j} = \operatorname{acos}(\langle \hat{\mathbf{N}}_{i,j}, \mathbf{N}_{i,j} \rangle), \quad \forall (i,j) \in I,$   (10)
where $\langle \cdot, \cdot \rangle$ denotes the standard dot product, defined as $\langle a, b \rangle = \sum_{i=1}^{n} a_i b_i = \|a\|\,\|b\| \cos(\phi)$, where $\phi$ describes the angle between $a$ and $b$. We can formulate the distance in Equation (10) by utilizing the surface gradient estimation $\nabla Z \in \mathbb{R}^{M \times N \times 2}$ for the surface normal $\mathbf{N}$ with the relations shown in Equation (6) as follows:
$d_{i,j} = \operatorname{acos} \left\langle \hat{\mathbf{N}}_{i,j}, \frac{\left( -(\nabla Z)_{i,j},\, 1 \right)}{\left| \left( (\nabla Z)_{i,j}, 1 \right) \right|_2} \right\rangle.$   (11)
The surface orientation weighting of the geodesic distance is illustrated in the contour plot in the first row of Figure 1. The first column shows the polar deviation of the coordinates and the second column shows the lateral deviation, parameterized by the inclination angle α and the deviation angle β , as illustrated in Figure 3. For the following methods a balanced weighting over all inclination angles both for the polar and lateral deviation is favored, as provided by the geodesic distance.

4.2. Gradient-Based Method with Surface Orientation Constraint Only

As is typical for photometric stereo methods, depth can be partly recovered from surface orientations only. In order to provide a complete context for the methods considered in this paper, we show here a method that uses solely gradient-based data. Gradient-based methods have previously been frequently utilized for depth reconstruction (e.g., [16,42,43,44,45,46,47]). Given a gradient field $\hat{\mathbf{G}} \in \mathbb{R}^{M \times N \times 2}$, we calculate the surface gradients for the estimated depth map $Z$ in x-direction $(\partial_x Z)$ and y-direction $(\partial_y Z)$, respectively. Combining relations from Equation (6), the relations between surface normals and gradients are as follows:
$|((\nabla Z)_{i,j}, 1)|_2 = \frac{-(\partial_x Z)_{i,j}}{N_{i,j,x}} = \frac{-(\partial_y Z)_{i,j}}{N_{i,j,y}} = \frac{1}{N_{i,j,z}}, \quad \forall (i,j) \in I,$   (12)
hence the surface gradients are given as:
$(\partial_x Z)_{i,j} = -\frac{N_{i,j,x}}{N_{i,j,z}} \quad \text{and} \quad (\partial_y Z)_{i,j} = -\frac{N_{i,j,y}}{N_{i,j,z}}, \quad \forall (i,j) \in I,$   (13)
and the given gradient fields correspond to the surface normals by:
$\hat{G}_{i,j,x} = -\frac{\hat{N}_{i,j,x}}{\hat{N}_{i,j,z}} \quad \text{and} \quad \hat{G}_{i,j,y} = -\frac{\hat{N}_{i,j,y}}{\hat{N}_{i,j,z}}, \quad \forall (i,j) \in I,$   (14)
in x- and y- direction respectively. Our goal is to compute a depth map Z such that
$(\nabla Z)_{i,j} \approx \hat{\mathbf{G}}_{i,j}, \quad \forall (i,j) \in I.$   (15)
The comparison of the resulting penalty between the measured and the given gradients is illustrated in Figure 4a. Since Equation (15) is an overdetermined system of linear equations, it can be solved as the following least squares problem:
$\min_Z \frac{1}{2} \left\| \nabla Z - \hat{\mathbf{G}} \right\|^2,$   (16)
whose global minimizer $Z_{min}$ satisfies the first order sufficient optimality condition:
$\nabla^* \left( \nabla Z_{min} - \hat{\mathbf{G}} \right) = 0,$   (17)
where $\nabla^* : \mathbb{R}^{M \times N \times 2} \to \mathbb{R}^{M \times N}$ denotes the adjoint of the $\nabla$ operator, with $\nabla^* = (\partial_x^*, \partial_y^*)$. We compute the minimizer using a standard conjugate gradient method.
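A possible matrix-free implementation of this normal-equation solve, reusing the grad/grad_adj helpers sketched in Section 3 together with SciPy's conjugate gradient routine, is outlined below; it is a minimal sketch under our own conventions, not the implementation used for the evaluation:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def depth_from_gradients(Gx_hat, Gy_hat, maxiter=2000):
    """Solve the normal equations grad^T (grad Z - G_hat) = 0 of
    Equation (17) with conjugate gradients. The solution is only
    determined up to a constant depth offset."""
    M, N = Gx_hat.shape

    def apply_A(z):
        Gx, Gy = grad(z.reshape(M, N))
        return grad_adj(Gx, Gy).ravel()

    A = LinearOperator((M * N, M * N), matvec=apply_A, dtype=np.float64)
    b = grad_adj(Gx_hat, Gy_hat).ravel()
    z, _ = cg(A, b, maxiter=maxiter)
    return z.reshape(M, N)
```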
It is well known that reconstructing the depth using only surface normal data usually results in errors in the low frequency domain. In the past, this has been improved by different approaches, such as introducing additional boundary conditions [44]. We resolve this problem by hybrid depth and surface normal formulations which proved very efficient in finding an accurate surface reconstruction.

4.3. Gradient-Based Method

In this section, we discuss a hybrid gradient-based method which reconstructs depth using a gradient-based algorithm (similar to the one from Section 4.2) extended by the use of the given initial depth $\hat{Z}$. Also here, the gradient of the estimated depth map $Z$ is forced to be in the proximity of a measured gradient $\hat{\mathbf{G}}$ in x- and in y-direction. Therefore, we formulate an overdetermined system of equations as follows:
$Z \approx \hat{Z} \quad \text{and} \quad (\nabla Z)_{i,j} \approx \hat{\mathbf{G}}_{i,j}, \quad \forall (i,j) \in I.$   (18)
The corresponding least squares problem is given as:
$\min_Z \frac{1}{2}\|Z - \hat{Z}\|^2 + \frac{\lambda}{2}\|\nabla Z - \hat{\mathbf{G}}\|^2,$   (19)
where $\lambda > 0$ is used to balance between the depth and the orientation constraints. The global optimizer $Z_{min}$ is found by a standard conjugate gradient method, with the optimality condition given as follows:
$Z_{min} - \hat{Z} + \lambda \nabla^* (\nabla Z_{min} - \hat{\mathbf{G}}) = 0.$   (20)
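A corresponding sketch of this hybrid solve, again assuming the grad/grad_adj helpers from Section 3, is given below; the value of lam is illustrative only:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def fuse_depth_gradients(Z_hat, Gx_hat, Gy_hat, lam=1.0, maxiter=500):
    """Minimize 1/2||Z - Z_hat||^2 + lam/2||grad Z - G_hat||^2 by solving
    the optimality condition of Equation (20):
    (I + lam * grad^T grad) Z = Z_hat + lam * grad^T G_hat."""
    M, N = Z_hat.shape

    def apply_A(z):
        Z = z.reshape(M, N)
        Gx, Gy = grad(Z)
        return (Z + lam * grad_adj(Gx, Gy)).ravel()

    A = LinearOperator((M * N, M * N), matvec=apply_A, dtype=np.float64)
    b = (Z_hat + lam * grad_adj(Gx_hat, Gy_hat)).ravel()
    z, _ = cg(A, b, x0=Z_hat.ravel(), maxiter=maxiter)
    return z.reshape(M, N)
```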
The contour plots of the surface orientation penalty function corresponding to the gradient-based method are shown in Figure 1. Note that with this method, the penalty is notably stronger for deviations in steep regions than in flat regions.
For demonstrative purposes, we introduce the extension of the gradient-based method in Equation (19) with a regularization term. We use a Laplace 2nd-order method which is driven by the derivative of the given gradient field. We formulate the following least squares problem:
$\min_Z \frac{1}{2}\|Z - \hat{Z}\|^2 + \frac{\lambda}{2}\|\nabla Z - \hat{\mathbf{G}}\|^2 + \frac{\lambda_R}{2}\|\Delta Z - \nabla \hat{\mathbf{G}}\|^2,$   (21)
where $\lambda_R > 0$ balances the regularization, $\nabla \hat{\mathbf{G}}$ can be decomposed into $(\partial_x \hat{G}_x, \partial_y \hat{G}_y)$ and $\Delta$ denotes the Laplace operator, which is defined as follows:
$\Delta_x = \partial_x \partial_x \quad \text{and} \quad \Delta_y = \partial_y \partial_y.$   (22)
Such a gradient-based regularization can be applied to all presented methods.
The presented gradient-based approaches require dense depth and surface orientation data. A possibility to cope with sparse data is an additional smoothness assumption coupled with pixel-wise weighting parameters. Hence, we add another term to Equation (19) with a Laplacian smoothness assumption as follows:
$\min_Z \frac{1}{2}\|\lambda_1 \odot (Z - \hat{Z})\|^2 + \frac{1}{2}\|\lambda_2 \odot (\nabla Z - \hat{\mathbf{G}})\|^2 + \frac{1}{2}\|\lambda_3 \odot (\Delta Z)\|^2.$   (23)
The weighting parameters $\lambda_1$ and $\lambda_2$ can be given a priori by the confidence of a data point and $\lambda_3$ by the inverse confidence, and they are defined as follows:
$\lambda_1 = (\lambda_{1,i,j})_{(i,j) \in I} \in \mathbb{R}^{M \times N},$   (24)
$\lambda_2 = (\lambda_{2,i,j})_{(i,j) \in I} \in \mathbb{R}^{M \times N \times 2}, \quad \text{where} \quad \lambda_{2,i,j} = (\lambda_{2,i,j,x}, \lambda_{2,i,j,y}), \quad \text{and}$   (25)
$\lambda_3 = (\lambda_{3,i,j})_{(i,j) \in I} \in \mathbb{R}^{M \times N \times 2}, \quad \text{where} \quad \lambda_{3,i,j} = (\lambda_{3,i,j,x}, \lambda_{3,i,j,y}).$   (26)
In case of stereo or light field methods for depth reconstruction, the parameters can be assessed based on the matching confidence. Unknown points would have a confidence of zero. The same extension for sparse data is applicable for all following methods. Therefore, without loss of generality, we discuss the weighting of the surface orientation term with a focus on dense depth and surface orientation data without an additional smoothness assumption.

4.4. The Method of Heber

A hybrid variational refinement model was described by Heber [41], where an initial rough depth $\hat{Z}$ is given and refined with surface normal information:
$\min_Z \frac{1}{2}\|Z - \hat{Z}\|^2 + \frac{\lambda}{2} \left\| (-\nabla Z, 1) - \hat{\mathbf{N}} \odot |(\nabla Z, 1)|_2 \right\|_2^2, \quad \text{where}$   (27)
$|(\nabla Z, 1)|_2 = \left( |((\nabla Z)_{1,1}, 1)|_2,\ |((\nabla Z)_{1,2}, 1)|_2,\ \ldots,\ |((\nabla Z)_{M,N}, 1)|_2 \right)$   (28)
defines the vector of pointwise 2-norms and the symbol $\odot$ denotes the operator for the point-wise product, also known as the Hadamard product. This method is conceptually similar to the gradient-based method described in Section 4.3; also here, the same depth constraint ensures a result in the proximity of an initial depth solution $\hat{Z}$. However, the surface orientation constraint used in Heber's method exploits the given normal field $\hat{\mathbf{N}}$ directly instead of the gradient field $\hat{\mathbf{G}}$. Here, the normalization by division by the vector length is overcome by multiplying the term on the other side by $|(\nabla Z, 1)|_2$, which leads to a convex problem [41].
An illustration of the comparison of the measured $\nabla Z$ and the given normal $\hat{\mathbf{N}}$ is shown in Figure 4b. The given normal is scaled to the length of the measured gradient $\nabla Z$. The contour map in Figure 1 demonstrates the normal weighting of Heber's energy term. Here, similar to the gradient-based method, a deviation in steep edges is penalized more than a deviation in flat regions. Also, a different weighting applies depending on whether the estimated gradients are steeper or flatter than the given values.
As the energy function from Equation (27) is convex and differentiable, this algorithm can be solved by an (accelerated) gradient descent method or a (fast) proximal gradient approach. For our evaluation, we used a plain gradient descent approach. Nesterov [48] proposed an accelerated gradient descent method with a simple weighted gradient step, followed by additional sliding, based on the last estimation. A fast proximal gradient method has an additional extrapolation step compared to the proximal gradient method. An example is the Fast Iterative Shrinkage Thresholding Algorithm (FISTA) [49] (see Appendix A.1).

4.5. The Method of Nehab

The method of Nehab [26] combines depth and surface normals by solving a system of linear equations, consisting of depth and surface orientation constraints. This method is similar to the gradient-based method described in Section 4.3, only the surface normal information is leveraged in a different way, making different trade-offs between flat and steep gradients.
In this method, the surface normal constraint optimizes the sum of squared projections of a set of surface tangent vectors $\mathbf{T}$, as defined in Equation (7) through Equation (9), of the reconstructed surface $Z$ onto the given normal field $\hat{\mathbf{N}}$. This surface normal penalty has been adopted previously in several approaches (e.g., [21,27,28,29,30]). The projection is illustrated in Figure 4. Note that the lowest penalty 0 is reached when the tangent vector is precisely orthogonal to the given normal vector $\hat{\mathbf{N}}$. We first consider the formulation as described by Nehab [26] and formulate an overdetermined sparse linear system of equations:
$Z \approx \hat{Z}, \quad \langle \hat{\mathbf{N}}_{i,j}, \mathbf{T}_{i,j,x} \rangle = \hat{N}_{i,j,x} + \hat{N}_{i,j,z} (\partial_x Z)_{i,j} \approx 0, \quad \text{and} \quad \langle \hat{\mathbf{N}}_{i,j}, \mathbf{T}_{i,j,y} \rangle = \hat{N}_{i,j,y} + \hat{N}_{i,j,z} (\partial_y Z)_{i,j} \approx 0, \quad \forall (i,j) \in I,$   (29)
which leads to the following least squares problem:
$\min_Z \frac{1}{2}\|Z - \hat{Z}\|^2 + \frac{\lambda}{2}\left\|\hat{N}_z \odot \partial_x Z + \hat{N}_x\right\|^2 + \frac{\lambda}{2}\left\|\hat{N}_z \odot \partial_y Z + \hat{N}_y\right\|^2,$   (30)
where the parameter $\lambda \in [0, 1]$ weights the influence of the initial depth and the given normals. The global optimizer $Z_{min}$ is calculated with a standard conjugate gradient method, with the optimality condition given as follows:
$Z_{min} - \hat{Z} + \lambda \left( \partial_x^* \left( \hat{N}_z \odot (\hat{N}_z \odot \partial_x Z_{min} + \hat{N}_x) \right) + \partial_y^* \left( \hat{N}_z \odot (\hat{N}_z \odot \partial_y Z_{min} + \hat{N}_y) \right) \right) = 0.$   (31)
The weighting of the surface normal information is explained in more detail in Figure 4c and further illustrated in a contour plot in Figure 2. The polar deviation from the given surface orientation in steep and flat regions is penalized similarly to the method of Heber (see Figure 1, bottom row, left). However, the lateral deviations are weighted equally, which is the same behavior as that of the ideal geodesic distance function (see Figure 1, top row, right).
Note that the approach of Nehab, as described in Equation (29), corresponds to the gradient-based method with an additional local $N_z$-weighting applied to both sides of the surface normal constraint of Equation (18). This can be shown by utilizing the equivalences defined in Equations (13) and (14) as follows:
$\hat{N}_{i,j,x} = -\hat{N}_{i,j,z} (\partial_x Z)_{i,j} \quad \text{and} \quad \hat{N}_{i,j,y} = -\hat{N}_{i,j,z} (\partial_y Z)_{i,j}, \quad \forall (i,j) \in I,$   (32)
and formulating a corresponding overdetermined linear system of equations:
$Z \approx \hat{Z}, \quad \hat{N}_{i,j,z} (\partial_x Z)_{i,j} \approx \hat{N}_{i,j,z} \hat{G}_{i,j,x}, \quad \text{and} \quad \hat{N}_{i,j,z} (\partial_y Z)_{i,j} \approx \hat{N}_{i,j,z} \hat{G}_{i,j,y}, \quad \forall (i,j) \in I.$   (33)
Compared with the gradient-based method, the $N_z$-weighting inherent to Nehab's method improves the behavior of the penalty function by weakening the influence of regions with steep edges. The rationale behind the $N_z$-weighting follows from the fact that flat regions exhibit a high value of $N_z$ (close to 1) while steep regions receive low $N_z$ values (close to 0). Such a local weighting helps prevent the over-penalization of steep regions observed in the gradient-based method.

4.6. Generalized Nehab

In order to allow the normal weighting to reach a closer proximity to the geodesic distance, we propose a generalization of the method of Nehab. We extend the concept of the $N_z$-weighting explained in Equation (33) by introducing an additional exponent $r$ that controls the influence of the $N_z$-weighting such that:
$Z \approx \hat{Z}, \quad (\hat{N}_{i,j,z})^r (\partial_x Z)_{i,j} \approx (\hat{N}_{i,j,z})^r \hat{G}_{i,j,x}, \quad \text{and} \quad (\hat{N}_{i,j,z})^r (\partial_y Z)_{i,j} \approx (\hat{N}_{i,j,z})^r \hat{G}_{i,j,y}, \quad \forall (i,j) \in I.$   (34)
We optimize the corresponding least squares problem with a standard conjugate gradient method:
$\min_Z \frac{1}{2}\|Z - \hat{Z}\|^2 + \frac{\lambda}{2}\left\|(\hat{N}_z)^r \odot \partial_x Z - (\hat{N}_z)^r \odot \hat{G}_x\right\|^2 + \frac{\lambda}{2}\left\|(\hat{N}_z)^r \odot \partial_y Z - (\hat{N}_z)^r \odot \hat{G}_y\right\|^2,$   (35)
where we find a global optimizer $Z_{min}$ that satisfies the following optimality condition:
$Z_{min} - \hat{Z} + \lambda \left( \partial_x^* \left( (\hat{N}_z)^r \odot \left( (\hat{N}_z)^r \odot \partial_x Z_{min} - (\hat{N}_z)^r \odot \hat{G}_x \right) \right) + \partial_y^* \left( (\hat{N}_z)^r \odot \left( (\hat{N}_z)^r \odot \partial_y Z_{min} - (\hat{N}_z)^r \odot \hat{G}_y \right) \right) \right) = 0.$   (36)
The influence of varying $r$ on the behavior of the surface orientation penalty function is shown in Figure 2. It can be seen that with $r = 0.5$ the generalized method of Nehab penalizes steeper slopes more strongly than the original method of Nehab (equivalent to $r = 1$) and more weakly than the gradient-based method (equivalent to $r = 0$). On the other hand, with $r = 1.6$ the proposed method exhibits a more tolerant behavior towards steeper inclination angles than the original method of Nehab, resulting in a global behavior that is reasonably close to the geodesic penalty in both the polar and lateral deviation directions. The value of the parameter $r$ might still be further optimized.
Regarding the optimal choice of $r$, we have found that $r = 1.6$ behaves in the vicinity of the geodesic distance. If the structure of the data shows strong noise on edges steeper than approximately 50 degrees, a lower choice of $r$ can be considered.
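A sketch of how the generalized least squares problem of Equation (35) can be solved with a matrix-free conjugate gradient method is given below; it reuses the helpers sketched earlier, assumes $\hat{N}_z > 0$, and the clipping threshold as well as the default parameter values are our own safeguards rather than part of the method:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def fuse_generalized_nehab(Z_hat, N_hat, lam=1.0, r=1.6, maxiter=500):
    """Generalized Nehab fusion: the gradient constraint of Equation (35)
    is weighted pixel-wise by (N_z)^r; r = 1 recovers the method of Nehab
    and r = 0 the plain gradient-based method."""
    M, N = Z_hat.shape
    Gx_hat, Gy_hat = grad_from_normals(N_hat)        # G = (-N_x/N_z, -N_y/N_z)
    W2 = np.clip(N_hat[..., 2], 1e-6, None) ** (2.0 * r)   # squared weights

    def apply_A(z):
        Z = z.reshape(M, N)
        Gx, Gy = grad(Z)
        return (Z + lam * grad_adj(W2 * Gx, W2 * Gy)).ravel()

    A = LinearOperator((M * N, M * N), matvec=apply_A, dtype=np.float64)
    b = (Z_hat + lam * grad_adj(W2 * Gx_hat, W2 * Gy_hat)).ravel()
    z, _ = cg(A, b, x0=Z_hat.ravel(), maxiter=maxiter)
    return z.reshape(M, N)
```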

4.7. Total Generalized Variation

In this section, we introduce another novel method to reconstruct a refined surface with a hybrid depth and surface normal approach that is based on Total Generalized Variation (TGV), as introduced in [50]. TGV is an extension of Total Variation (TV), a very popular and efficient regularization technique currently used in many image processing applications; however, TV is known for producing staircasing artifacts in sloped regions of the solution. In contrast, TGV overcomes the staircasing problem by allowing solutions of higher order. Our formulation is a gradient-based approach, which restricts the surface to be close, with a quadratic penalty, to an initial depth solution $\hat{Z}$, while enforcing the auxiliary gradient field $\mathbf{G}$ to be in the proximity of the given gradient field $\hat{\mathbf{G}}$, propagated through the TGV penalty. Thereby, we simultaneously reconstruct the surface $Z$ and the auxiliary gradient field $\mathbf{G}$, such that our discrete model is given as follows:
$\min_{Z, \mathbf{G}} \ \alpha_1 \|\nabla Z - \mathbf{G}\|_{2,1} + \alpha_0 \|\nabla \mathbf{G}\|_{2,1} + \frac{\alpha}{2}\|Z - \hat{Z}\|^2 + \frac{\beta}{2}\|\mathbf{G} - \hat{\mathbf{G}}\|^2,$   (37)
where the gradient operator $\nabla : \mathbb{R}^{M \times N \times 2} \to \mathbb{R}^{M \times N \times 4}$ computes finite differences and $\nabla \mathbf{G}$ can be decomposed into $(\nabla G_x, \nabla G_y)$, where $\nabla$ was defined in Equation (4). Therefore, the first and second order components of our TGV regularization have the following form:
$\|\nabla Z - \mathbf{G}\|_{2,1} = \sum_{i,j} \sqrt{\left((\partial_x Z)_{i,j} - G_{i,j,x}\right)^2 + \left((\partial_y Z)_{i,j} - G_{i,j,y}\right)^2}, \quad \text{and}$   (38)
$\|\nabla \mathbf{G}\|_{2,1} = \sum_{i,j} \sqrt{(\partial_x G_x)_{i,j}^2 + (\partial_y G_x)_{i,j}^2 + (\partial_x G_y)_{i,j}^2 + (\partial_y G_y)_{i,j}^2}.$   (39)
The function in Equation (37) consists of a depth and an orientation constraint as well as the TGV regularization terms. The depth constraint (i.e., the third term of Equation (37)) enforces the depth solution $Z$ to stay in the vicinity of our noisy initial depth map $\hat{Z}$. The orientation constraint (i.e., the fourth term of Equation (37)) enforces $\mathbf{G}$ to stay in the proximity of the given surface gradient estimation $\hat{\mathbf{G}}$. The TGV regularization part comprises a first and a second order term (the first two terms of Equation (37)). The latter represents the TV component and penalizes the $\ell_1$-norm of the second order gradient field, which forces the gradient field $\mathbf{G}$ to be piecewise constant. The former enforces the approximated gradient of the depth map $Z$ to stay in the proximity of the auxiliary gradient field $\mathbf{G}$, also through an $\ell_1$ penalty.
Given our primal problem in Equation (37) we formulate a primal-dual (PD) problem, which belongs to the class of saddle-point problems, as follows:
$\min_x \max_y \ (Kx)^T y - F^*(y) + H(x),$   (40)
where $H$ describes the depth and orientation constraints, $F$ defines the TGV component with its convex conjugate $F^*$, and the linear operator $K$ is defined as:
$K = \begin{pmatrix} \nabla & -I \\ 0 & \nabla \end{pmatrix}.$   (41)
Our primal variable is denoted as $x \in \mathbb{R}^{M \times N \times 3}$ and $y \in \mathbb{R}^{M \times N \times 6}$ represents the dual variable:
$x = \begin{pmatrix} Z \\ G_x \\ G_y \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \quad y_1 = \begin{pmatrix} y_{1,1} \\ y_{1,2} \end{pmatrix}, \quad y_2 = \begin{pmatrix} y_{2,1} \\ y_{2,2} \\ y_{2,3} \\ y_{2,4} \end{pmatrix},$   (42)
where $y_1$ and $y_2$ hold the dual variables of the first and the second term in Equation (37), respectively; $y_1$ holds one component for each of the x- and y-directions, while $y_2$ holds four components, one for each derivative direction of $G_x$ and $G_y$. For $F(x) = \alpha \|x\|_p$, where $\|\cdot\|_p$ is an $\ell_p$-norm, we calculate the convex conjugate as follows:
$F^*(y) = \delta_{\|\cdot\|_2 \le \alpha}(y) = \begin{cases} 0, & \text{if } |y_{i,j}|_2 \le \alpha \ \ \forall (i,j) \in I, \\ \infty, & \text{otherwise}, \end{cases}$   (43)
which is the indicator function of the polar ball. Therefore, $F^*$ and $H$ are defined as follows:
$F^*(y) = \delta_{\|\cdot\|_2 \le \alpha_1}(y_1) + \delta_{\|\cdot\|_2 \le \alpha_0}(y_2), \quad \text{and}$   (44)
$H(x) = \frac{\alpha}{2}\|Z - \hat{Z}\|^2 + \frac{\beta}{2}\|\mathbf{G} - \hat{\mathbf{G}}\|^2.$   (45)
An optimal solution to our hybrid formulation is found with the PD algorithm, as described in Appendix A.2. For a more detailed description of the PD algorithm, see [51].
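A compact primal-dual sketch of this TGV fusion model is given below. It reuses the grad/grad_adj helpers from Section 3 and follows one common ordering of the primal and dual updates; the parameter values and the bound on $\|K\|^2$ are placeholder assumptions and not the settings used in our evaluation:

```python
import numpy as np

def tgv_fuse(Z_hat, Gx_hat, Gy_hat, alpha1=1.0, alpha0=2.0,
             alpha=10.0, beta=10.0, iters=500):
    """Primal-dual iterations for the TGV model of Equation (37)."""
    M, N = Z_hat.shape
    # primal variables: depth Z and auxiliary gradient field G
    Z, Gx, Gy = Z_hat.copy(), Gx_hat.copy(), Gy_hat.copy()
    Zb, Gxb, Gyb = Z.copy(), Gx.copy(), Gy.copy()        # extrapolated points
    # dual variables: y1 (2 channels) for ||grad Z - G||, y2 (4) for ||grad G||
    y1 = np.zeros((M, N, 2))
    y2 = np.zeros((M, N, 4))
    L2 = 17.0                          # rough upper bound on ||K||^2
    tau = sigma = 1.0 / np.sqrt(L2)    # primal / dual step sizes

    def proj(y, radius):
        """Pointwise projection onto the 2-ball (prox of the conjugate)."""
        norm = np.sqrt(np.sum(y**2, axis=-1, keepdims=True))
        return y / np.maximum(1.0, norm / radius)

    for _ in range(iters):
        # dual ascent, evaluated at the extrapolated primal point
        dZx, dZy = grad(Zb)
        y1 = proj(y1 + sigma * np.stack([dZx - Gxb, dZy - Gyb], axis=-1), alpha1)
        gxx, gxy = grad(Gxb)
        gyx, gyy = grad(Gyb)
        y2 = proj(y2 + sigma * np.stack([gxx, gxy, gyx, gyy], axis=-1), alpha0)
        # primal descent followed by the prox of the quadratic data terms
        Z_old, Gx_old, Gy_old = Z, Gx, Gy
        Z = Z - tau * grad_adj(y1[..., 0], y1[..., 1])
        Z = (Z + tau * alpha * Z_hat) / (1.0 + tau * alpha)
        Gx = Gx - tau * (-y1[..., 0] + grad_adj(y2[..., 0], y2[..., 1]))
        Gy = Gy - tau * (-y1[..., 1] + grad_adj(y2[..., 2], y2[..., 3]))
        Gx = (Gx + tau * beta * Gx_hat) / (1.0 + tau * beta)
        Gy = (Gy + tau * beta * Gy_hat) / (1.0 + tau * beta)
        # extrapolation for the next dual step
        Zb, Gxb, Gyb = 2 * Z - Z_old, 2 * Gx - Gx_old, 2 * Gy - Gy_old
    return Z, Gx, Gy
```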

5. Evaluation

To evaluate all considered algorithms, we need two data structures, namely an initial depth map $\hat{Z}$ and surface orientation information such as surface normals $\hat{\mathbf{N}}$ or gradients $\hat{\mathbf{G}}$. Initial depth maps can be provided by stereo or light field correspondence analysis or other depth scanning methods. An estimate of the surface orientation is calculated using e.g., photometric stereo or polarization imaging. Our discussed hybrid algorithms differ mainly in how the surface orientation information is taken into account. In particular, the penalty functions associated with the surface orientation constraints vary significantly for different methods, as illustrated in Figure 1 through Figure 4c. We proposed two novel methods in Section 4.6 and Section 4.7, namely the generalized method of Nehab and the TGV approach. The former balances the surface orientation information with an improved weighting function within the least squares framework. The latter uses a TGV regularization term combined with a gradient-based approach.

Qualitative and Quantitative Evaluation

In order to perform a comprehensive quantitative evaluation of the above mentioned methods, we use synthetic evaluation data that comprises full ground truth (GT) information. The initial depth $\hat{Z}$ for the synthetic evaluation data is given by a ground truth depth map $\hat{Z}_{GT}$, which is discretized after adding noise:
$\hat{Z} = \frac{1}{k} \left[ k \cdot \left( \hat{Z}_{GT} + \mathrm{noise} \right) \right].$   (46)
In this study, we considered normally distributed additive noise, as this type of noise simulates well the behavior of matching errors of stereo or light field methods combined with image sensor noise. The noise used in our evaluations has a maximum amplitude of 7% of the depth range. The constant $k$ defines the number of discretization steps and $[\cdot]$ rounds to the nearest integer $n \in \mathbb{Z}$. These discretization artifacts attempt to simulate the output of the discrete regularized correspondence analysis, which is often applied in real-world scenarios (e.g., as described in [37]).
For the orientation constraints, we assume surface normals $\hat{\mathbf{N}}$ derived from the normals $\hat{\mathbf{N}}_{GT}$ of the ground truth depth model by adding normally distributed noise with a maximum amplitude of 23% of the normal range in the spherical coordinate system:
$\hat{\mathbf{N}} = \frac{\hat{\mathbf{N}}_{GT} + \mathrm{noise}}{\left\| \hat{\mathbf{N}}_{GT} + \mathrm{noise} \right\|}$   (47)
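A simple way to generate such degraded inputs from ground truth data is sketched below; unlike Equation (47), the sketch perturbs the Cartesian normal components directly, and the mapping of the stated maximum amplitudes to a noise standard deviation is our own rough choice:

```python
import numpy as np

def make_synthetic_inputs(Z_gt, N_gt, k=32, depth_noise=0.07,
                          normal_noise=0.23, rng=None):
    """Degrade ground truth depth and normals: additive Gaussian noise plus
    quantization with step 1/k for the depth (cf. Equation (46)), additive
    noise plus renormalization for the normals (cf. Equation (47))."""
    rng = rng or np.random.default_rng(0)
    depth_range = Z_gt.max() - Z_gt.min()
    Z_noisy = Z_gt + rng.normal(0.0, depth_noise * depth_range / 3.0, Z_gt.shape)
    Z_hat = np.round(k * Z_noisy) / k
    N_noisy = N_gt + rng.normal(0.0, normal_noise / 3.0, N_gt.shape)
    N_hat = N_noisy / np.linalg.norm(N_noisy, axis=-1, keepdims=True)
    return Z_hat, N_hat
```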
In Figure 5, examples of the evaluation data $\hat{Z}$ and $\hat{\mathbf{N}}$ as well as the corresponding ground truth depth $\hat{Z}_{GT}$ are shown. All 3D datasets used for evaluations were taken from the Stanford 3D scanning repository [52] and rendered with POV-Ray [53].
Qualitative comparisons of the methods described in Section 4 are presented in Figure 6 and Figure 7. The left column shows the color coded depth reconstructions as delivered by different methods. The middle two columns show the vertical and horizontal depth profiles as marked in Figure 5c. In each of these plots, the red, gray and black lines indicate the reconstructed depth $Z$, the initial depth $\hat{Z}$ and the ground truth depth $\hat{Z}_{GT}$, respectively. The error maps showing signed distances to the ground truth depth are provided in the right column. The corresponding geodesic distances from the ground truth surface normals $\hat{\mathbf{N}}_{GT}$ for each method are displayed in Figure 8. Results of the quantitative evaluations are shown in Table 2 for gradient-based methods and in Table 3 for normal-based methods. For the evaluation we used three datasets from [52]: Buddha (shown in the qualitative analysis), Dragon, and Armadillo. The tables report two decimal digits for the depth evaluation and, due to the different value range, four decimal digits for the normal evaluation. The averages are calculated using four digits after the decimal point; rounding errors can cause differences in the last digit. The MSE distance in the depth domain is calculated as the quadratic distance between the ground truth depth and the depth result:
$MSE_Z = d_Z = \frac{1}{MN} \left\| \hat{Z}_{GT} - Z \right\|^2$   (48)
The geodesic distance is calculated as described in Section 4.1 by using the following equation:
$GEO_N = d_N = \frac{1}{MN} \sum_{(i,j) \in I} \operatorname{acos} \left\langle \mathbf{N}_{i,j}, \frac{\left( -(\partial_x \hat{Z}_{GT})_{i,j},\, -(\partial_y \hat{Z}_{GT})_{i,j},\, 1 \right)}{\left| \left( (\nabla \hat{Z}_{GT})_{i,j}, 1 \right) \right|_2} \right\rangle$   (49)
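Both error measures can be computed in a few lines, reusing the grad and normals_from_grad helpers from Section 3; this is a sketch of the definitions in Equations (48) and (49), not the exact evaluation code:

```python
import numpy as np

def mse_depth(Z_gt, Z):
    """MSE in the depth domain, Equation (48)."""
    return np.mean((Z_gt - Z) ** 2)

def geodesic_normal_error(Z_gt, Z):
    """Mean geodesic (angular) distance between the normals of the
    reconstructed depth and the ground truth normals, both derived
    from the depth maps, cf. Equation (49)."""
    N_gt = normals_from_grad(*grad(Z_gt))
    N_rec = normals_from_grad(*grad(Z))
    dots = np.clip(np.sum(N_gt * N_rec, axis=-1), -1.0, 1.0)
    return np.mean(np.arccos(dots))
```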
As can be seen in Figure 7 (1st row) and Figure 8b, using surface orientation information alone in the gradient-based formulation provides visually pleasing detail reconstructions ($d_N = 0.2365$), though it performs worst in terms of the absolute distance to the ground truth depth with an average MSE of $d_Z = 66.68$. In Figure 7 (2nd row) and Figure 8c, one can see that adding a depth constraint to the same gradient-based formulation improves the result drastically ($d_Z = 2.04$ and $d_N = 0.2951$). Nevertheless, this method still shows a rather low performance around steep edges. Note that each of the demonstrated methods could be used to reconstruct surfaces from surface normals alone simply by dropping the depth term. The method of Nehab, shown in Figure 7 (3rd row) and Figure 8d, exhibits the capability to improve the result over the previous methods by exploiting a better surface orientation weighting strategy ($d_Z = 0.14$, $d_N = 0.2532$). These methods are optimized using a least squares solver. The evaluation of Heber's method is shown in Figure 7 (4th row) and Figure 8e. The method is optimized using gradient descent and reaches average results of $d_Z = 1.79$ and $d_N = 0.3049$, which is significantly worse than the method of Nehab. The results of our generalized method of Nehab with a parametrization $r = 1.6$ are shown in Figure 6 (1st row) and Figure 8f. This method improves robustness against noise in the surface normals over the standard method of Nehab and outperforms all other evaluated methods in the absolute depth error domain with $d_Z = 0.11$. As the normal weighting here is in closer vicinity to the geodesic penalty than that of the method of Nehab, this approach reaches an improved normal error of $d_N = 0.2442$. This method is optimized using a least squares solver. Our novel TGV method, shown in Figure 6 (2nd row) and Figure 8g, provides by far the best normal accuracy of $d_N = 0.0666$ and performs among the best in the depth domain ($d_Z = 0.20$). It is optimized with a primal-dual algorithm.
For completeness, we additionally show the results of two regularized gradient methods in Table 3. First, we regularize with a smoothness term as shown in Equation (23), which can be used for sparse data. In our dense case, where we used scalars for the weighting parameters, the normal accuracy shows a minor improvement due to smoothing at the cost of a weaker depth accuracy. Second, we regularize with the gradient, as shown in Equation (21). With this we can reach an improvement both in the depth and normal accuracy. These regularization terms could be used for all presented methods, but doing so would exceed the scope of this paper. Note that our novel TGV method shows a significantly better performance in both accuracy measurements.
As can be seen in Table 2 and Table 3, the two best performing methods are our generalized method of Nehab with $r = 1.6$ (normal-based) and our novel TGV method (gradient-based). We show the convergence of both in Figure 9 with respect to the depth error defined in Equation (48) and the normal error defined in Equation (49). While the generalized Nehab method converges in both terms after approximately 25 iterations, the TGV method settles at around the same iteration with a slightly worse depth error, but continues to substantially reduce the normal error. Note that a gradient-based formulation was chosen to demonstrate the favorable properties of our novel TGV approach. It significantly improves the results compared with the standard gradient-based method. Alternatively, for even better performance, other penalty functions, such as the generalized method of Nehab, can be chosen; this will be a matter of future research.
We demonstrated the described algorithms with different surface orientation weightings on a real world example, as shown in Figure 10. This object was acquired with a multi-line scanner using the hybrid light field and photometric stereo acquisition framework described in [37]. In this very specific case, we only have one normal direction. An example of how to deal with such data in the gradient-based approach is to weight the gradient vector in Equation (23) with zero in the missing direction. Specifics of the multi-line-scanning environment are out of the scope of this paper and are a matter of future research.

6. Conclusions

We presented a review and classification of methods combining depth and surface orientation data (normals or gradients), in order to reach an improved surface depth estimation. State-of-the-art methods differ mostly in the formulation of the surface orientation constraint (see Section 4) and capabilities of the method-specific solvers.
We illustrated the differences between various formulations of the surface orientation constraint and explained performance discrepancies. Furthermore, we used our findings to introduce a generalization of the method of Nehab (see Section 4.6) that significantly outperforms other methods in terms of absolute depth accuracy. Additionally, we introduced a novel method based on TGV (see Section 4.7), which outperforms all other methods in the surface normal domain and shows a competitive performance in the depth accuracy. While our generalized Nehab method converges faster (see Figure 9) and gives the most accurate result in the depth domain, our TGV based approach refines the surface orientation further and converges at the most accurate orientation result with a high accuracy in the depth domain.
Further research will include the specialization on line-scanning algorithms, TGV weighting adaption and computational acceleration. With specialized hybrid algorithms that fit data from line scanning sensors we will determine a solution with incomplete surface orientation data. The surface orientation constraint of our TGV formulation is currently gradient-based. Plugging in another formulation with a better balanced normal weighting could improve the results even further and will be a matter of future research. Furthermore we will focus on computational acceleration of the proposed algorithms, where we will exploit their inherent structure to achieve efficient parallelization.

Acknowledgments

This work is supported by the research initiative 'Intelligent Vision Austria' with funding from the Austrian Federal Ministry of Science, Research and Economy and the Austrian Institute of Technology.

Author Contributions

D. Antensteiner and S. Štolc conceived and designed the experiments; D. Antensteiner performed the experiments; D. Antensteiner and S. Štolc analyzed the data; T. Pock contributed reagents/materials/analysis tools; D. Antensteiner wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
CRF    Conditional Random Field
GT     Ground Truth
FISTA  Fast Iterative Shrinkage Thresholding Algorithm
ISTA   Iterative Shrinkage Thresholding Algorithm
MRF    Markov Random Field
MSE    Mean Squared Error
PD     Primal-Dual
SfF    Shape from Focus
SfM    Structure from Motion
TGV    Total Generalized Variation
ToF    Time of Flight
TV     Total Variation

Appendix A. Optimization Theory

In this section we show two advanced optimization methods used to fuse depth and surface orientation as well as the type of functions they can optimize.

Appendix A.1. Accelerated Proximal Gradient Method

The fast iterative shrinkage thresholding algorithm (FISTA) minimizes an objective function by using proximal operators (see [54]). It works on general conditions as non-smooth convex functions and can be very fast, as there exist simple proximal operators for various energy functions. Proximal gradient methods consider optimization problems with the objective split in two components:
$\min_x F(x) + H(x),$
where $F : \mathbb{R}^N \to \mathbb{R}$ and $H : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ are closed proper convex functions and where one of the functions ($F$) is additionally differentiable. The proximal gradient method uses a step size $\lambda_k > 0$ and is denoted as follows:
$x^{k+1} \leftarrow \mathrm{prox}_{\lambda_k H}\left( x^k - \lambda_k \nabla F(x^k) \right).$
If $\nabla F$ is Lipschitz continuous with a constant $L$ and a fixed step size $\lambda_k = \lambda \in (0, 1/L]$ is used, the method converges in $O(1/k)$. Accelerated proximal gradient methods have an additional extrapolation step in the algorithm and are denoted as follows:
$y^{k+1} \leftarrow x^k + w_k (x^k - x^{k-1}),$
$x^{k+1} \leftarrow \mathrm{prox}_{\lambda_k H}\left( y^{k+1} - \lambda_k \nabla F(y^{k+1}) \right),$
where $\lambda_k$ denotes the step size and $w_k$ an extrapolation parameter which is described as:
$w_k = \frac{k-1}{k+2}.$
FISTA is one example of such an accelerated proximal gradient method and is described in detail in [49]. In [41], the FISTA algorithm is used to optimize the method of Heber, which we discuss in Section 4.4.
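A generic sketch of such an accelerated proximal gradient loop is shown below; grad_F, prox_H and the step size are placeholders to be supplied by the concrete energy function:

```python
import numpy as np

def accelerated_proximal_gradient(grad_F, prox_H, x0, step, iters=200):
    """FISTA-style loop: extrapolate, take a gradient step on the smooth
    part F, then apply the proximal map of the non-smooth part H."""
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, iters + 1):
        w = (k - 1.0) / (k + 2.0)          # extrapolation weight
        y = x + w * (x - x_prev)
        x_prev = x
        x = prox_H(y - step * grad_F(y), step)
    return x
```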

Appendix A.2. Primal-Dual

Primal-dual methods can be used to solve a large number of optimization problems. The considered saddle-point problem, which is a primal-dual formulation, has the following primal form:
$\min_{x \in X} F(Kx) + H(x).$
The corresponding saddle point problem is described as:
$\min_{x \in X} \max_{y \in Y} \langle Kx, y \rangle + H(x) - F^*(y).$
We are using the Chambolle-Pock [55] primal-dual method and compute the proximal maps for $H$ and $F^*$. A proximal descent is computed in the primal variable $x$ and, in an alternating manner, a proximal ascent in the dual variable $y$:
$x^{k+1} \leftarrow \mathrm{prox}_{\tau H}\left( x^k - \tau K^T y^k \right), \qquad y^{k+1} \leftarrow \mathrm{prox}_{\sigma F^*}\left( y^k + \sigma K (2x^{k+1} - x^k) \right).$
The convergence rate is $O(1/N)$ and can be improved to $O(1/N^2)$ if $H$ or $F^*$ is uniformly convex. If both the primal and the dual problem are uniformly convex, the algorithm shows a linear convergence rate $O(e^{-N})$ [55,56,57]. We discuss the primal-dual optimization of our hybrid TGV formulation in Section 4.7.

References

  1. Hirschmüller, H.; Innocent, P.R.; Garibaldi, J. Real-Time Correlation-Based Stereo Vision with Reduced Border Errors. Int. J. Comput. Vis. 2002, 47, 229–246. [Google Scholar] [CrossRef]
  2. Hirschmuller, H. Stereo Processing by Semiglobal Matching and Mutual Information. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 328–341. [Google Scholar] [CrossRef] [PubMed]
  3. Haller, I.; Pantilie, C.; Oniga, F.; Nedevschi, S. Real-Time Semi-Global Dense Stereo Solution with Improved Sub-Pixel Accuracy. In Proceedings of the 2010 IEEE Intelligent Vehicles Symposium, San Diego, CA, USA, 21–24 June 2010; pp. 369–376. [Google Scholar]
  4. Je, C.; Park, H.M. Optimized Hierarchical Block Matching for Fast and Accurate Image Registration. Signal Process. Image Commun. 2013, 28, 779–791. [Google Scholar] [CrossRef]
  5. Yang, X.; Chen, X.; Xi, J. Block Based Dense Stereo Matching Using Adaptive Cost Aggregation and Limited Disparity Estimation. In Proceedings of the 2017 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Turin, Italy, 22–25 May 2017; pp. 1–6. [Google Scholar]
  6. Chen, X.; Li, D.; Zou, J. Depth Estimation of Stereo Matching Based on Microarray Camera. In Proceedings of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017; pp. 108–112. [Google Scholar]
  7. Bolles, R.C.; Baker, H.H.; Marimont, D.H. Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion; Springer: Berlin, Germany, 1987; Volume 1, pp. 7–55. [Google Scholar]
  8. Wanner, S.; Straehle, C.; Goldluecke, B. Globally Consistent Multi-Label Assignment on the Ray Space of 4D Light Fields. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 1011–1018. [Google Scholar]
  9. Kim, C.; Zimmer, H.; Pritch, Y.; Sorkine-Hornung, A.; Gross, M. Scene Reconstruction from High Spatio-Angular Resolution Light Fields. ACM Trans. Graph. 2013, 32, 73:1–73:12. [Google Scholar] [CrossRef]
  10. Tosic, I.; Berkner, K. Light Field Scale-Depth Space Transform for Dense Depth Estimation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Columbus, OH, USA, 23–28 June 2014; pp. 435–442. [Google Scholar]
  11. Tao, M.W.; Wang, T.C.; Malik, J.; Ramamoorthi, R. Depth Estimation for Glossy Surfaces with Light-Field Cameras. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 1–14. [Google Scholar]
  12. Longuet-Higgins, H.C. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1987; pp. 61–62. [Google Scholar]
  13. Snavely, N.; Seitz, S.M.; Szeliski, R. Photo Tourism: Exploring Photo Collections in 3D; ACM: New York, NY, USA, 2006; Volume 25, pp. 835–846. [Google Scholar]
  14. Park, J.; Kim, H.; Tai, Y.W.; Brown, M.S.; Kweon, I. High Quality Depth Map Upsampling for 3D-ToF Cameras. In Proceedings of the 2011 IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1623–1630. [Google Scholar]
  15. Woodham, R.J. Photometric Method for Determining Surface Orientation from Multiple Images. Int. Soc. Opt. Photonics 1980, 19, 191139. [Google Scholar] [CrossRef]
  16. Frankot, R.T.; Chellappa, R. A Method for Enforcing Integrability in Shape From Shading Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 1988, 10, 439–451. [Google Scholar] [CrossRef]
  17. Wu, T.P.; Tang, C.K. Dense Photometric Stereo Using a Mirror Sphere and Graph Cut. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 140–147. [Google Scholar]
  18. Malis, E.; Vargas, M. Deeper Understanding of the Homography Decomposition for Vision-Based Control; Research Report RR-6303; INRIA: Rocquencourt, France, 25 September 2007. [Google Scholar]
  19. Köser, K. Geometric Estimation with Local Affine Frames and Free-form Surfaces. Ph.D. Thesis, Christian-Albrechts-Universität zu Kiel, Kiel, Germany, 2008. [Google Scholar]
  20. Barath, D.; Molnar, J.; Hajder, L. Novel Methods for Estimating Surface Normals from Affine Transformations. In Communications in Computer and Information Science; Braz, J., Pettré, J., Richard, P., Kerren, A., Linsen, L., Battiato, S., Imai, F., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 316–337. [Google Scholar]
  21. Eigen, D.; Fergus, R. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
  22. Wu, C.; Wilburn, B.; Matsushita, Y.; Theobalt, C. High-Quality Shape from Multi-View Stereo and Shading Under General Illumination. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 969–976. [Google Scholar]
  23. Zhang, L.; Curless, B.; Hertzmann, A.; Seitz, S.M. Shape and Motion under Varying Illumination: Unifying Structure from Motion, Photometric Stereo, and Multi-view Stereo. In Proceedings of the 9th IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; pp. 618–625. [Google Scholar]
  24. Joshi, N.; Kriegman, D.J. Shape from Varying Illumination and Viewpoint. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–7. [Google Scholar]
  25. Esteban, C.H.; Vogiatzis, G.; Cipolla, R. Multiview Photometric Stereo. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 548–554. [Google Scholar] [CrossRef] [PubMed]
  26. Nehab, D.; Rusinkiewicz, S.; Davis, J.; Ramamoorthi, R. Efficiently Combining Positions and Normals for Precise 3D Geometry. ACM Trans. Graph. 2005, 24, 536–543. [Google Scholar] [CrossRef]
  27. Haque, S.M.; Chatterjee, A.; Govindu, V.M. High Quality Photometric Reconstruction Using a Depth Camera. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2283–2290. [Google Scholar]
  28. Park, J.; Sinha, S.N.; Matsushita, Y.; Tai, Y.W.; Kweon, I.S. Robust Multiview Photometric Stereo Using Planar Mesh Parameterization. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1591–1604. [Google Scholar] [CrossRef] [PubMed]
  29. Han, Y.; Lee, J.Y.; Kweon, I.S. High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 1617–1624. [Google Scholar]
  30. Chatterjee, A.; Govindu, V.M. Photometric refinement of depth maps for multi-albedo objects. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 933–941. [Google Scholar]
  31. Shi, B.; Inose, K.; Matsushita, Y.; Tan, P.; Yeung, S.K.; Ikeuchi, K. Photometric Stereo Using Internet Images. In Proceedings of the 2014 2nd International Conference on 3D Vision, Tokyo, Japan, 8–11 December 2014. [Google Scholar]
  32. Yu, H. Edge-preserving Photometric Stereo via Depth Fusion. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), CVPR ’12, Providence, RI, USA, 16–21 June 2012; IEEE Computer Society: Washington, DC, USA, 2012; pp. 2472–2479. [Google Scholar]
  33. Ti, C.; Yang, R.; Davis, J.; Pan, Z. Simultaneous Time-of-Flight sensing and photometric stereo with a single ToF sensor. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 4334–4342. [Google Scholar]
  34. Yu, L.F.; Yeung, S.K.; Tai, Y.W.; Lin, S. Shading-Based Shape Refinement of RGB-D Images. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR’13, Portland, OR, USA, 23–28 June 2013; IEEE Computer Society: Washington, DC, USA, 2013; pp. 1415–1422. [Google Scholar]
  35. Wu, C.; Zollhöfer, M.; Nießner, M.; Stamminger, M.; Izadi, S.; Theobalt, C. Real-Time Shading-Based Refinement for Consumer Depth Cameras. ACM Trans. Graph. 2014, 33. [Google Scholar] [CrossRef]
  36. Kadambi, A.; Taamazyan, V.; Shi, B.; Raskar, R. Polarized 3D: High-Quality Depth Sensing with Polarization Cues. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3370–3378. [Google Scholar]
  37. Antensteiner, D.; Štolc, S.; Valentin, K.; Blaschitz, B.; Huber-Mörk, R.; Pock, T. High-Precision 3D Sensing with Hybrid Light Field and Photometric Stereo Approach in Multi-Line Scan Framework. In Proceedings of the IS&T International Symposium on Electronic Imaging: Intelligent Robotics and Industrial Applications using Computer Vision 2017, Burlingame, CA, USA, 29 January–2 February 2017. [Google Scholar]
  38. Horn, B.K.P.; Brooks, M.J. (Eds.) Shape from Shading; MIT Press: Cambridge, MA, USA, 1989. [Google Scholar]
  39. Štolc, S.; Soukup, D.; Holländer, B.; Huber-Mörk, R. Depth and All-In-Focus Imaging by a Multi-Line-Scan Light-Field Camera. J. Electron. Imaging 2014, 23, 19–23. [Google Scholar] [CrossRef]
  40. Kazhdan, M.; Bolitho, M.; Hoppe, H. Poisson Surface Reconstruction. In Proceedings of the Fourth Eurographics Symposium on Geometry Processing, SGP’06, Cagliari, Sardinia, 26–28 June 2006; Eurographics Association: Aire-la-Ville, Switzerland, 2006; pp. 61–70. [Google Scholar]
  41. Heber, S. Variational Shape from Unconventional Imaging Devices. Ph.D. Thesis, Graz University of Technology, Graz, Austria, 18 May 2015. [Google Scholar]
  42. Scherr, T. Gradient-Based Surface Reconstruction and the Application to Wind Waves. Master’s Thesis, Ruprecht-Karls-University Heidelberg, Heidelberg, Germany, 3 November 2017. [Google Scholar]
  43. Rostami, M.; Michailovich, O.; Wang, Z. Gradient-Based Surface Reconstruction Using Compressed Sensing. In Proceedings of the 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 913–916. [Google Scholar]
  44. Harker, M.; O’Leary, P. Least Squares Surface Reconstruction from Measured Gradient Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, Anchorage, AK, USA, 23–28 June 2008. [Google Scholar]
  45. Du, Z.; Robles-Kelly, A.; Lu, F. Robust Surface Reconstruction from Gradient Field Using the L1 Norm. In Proceedings of the 9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications (DICTA 2007), Glenelg, Australia, 3–5 December 2007; pp. 203–209. [Google Scholar]
  46. Agrawal, A.; Chellappa, R.; Raskar, R. An Algebraic Approach to Surface Reconstruction from Gradient Fields. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05), Beijing, China, 17–21 October 2005; Volume 1, pp. 174–181. [Google Scholar]
  47. Horn, B.K.P.; Brooks, M.J. The Variational Approach to Shape from Shading. Comput. Vis. Graph. Image Process. 1986, 33, 174–208. [Google Scholar] [CrossRef]
  48. Nesterov, Y.E. Introductory Lectures on Convex Optimization: A Basic Course; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2003. [Google Scholar]
  49. Beck, A.; Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
  50. Bredies, K.; Kunisch, K.; Pock, T. Total Generalized Variation. SIAM J. Imaging Sci. 2010, 3, 492–526. [Google Scholar] [CrossRef]
51. Chambolle, A.; Pock, T. An Introduction to Continuous Optimization for Imaging. Acta Numer. 2016, 25, 161–319. [Google Scholar]
52. The Stanford 3D Scanning Repository. Available online: http://graphics.stanford.edu/data/3Dscanrep/ (accessed on 10 November 2017).
53. Persistence of Vision Pty. Ltd. (2004) [Computer Software]. Available online: http://www.povray.org/download/ (accessed on 10 November 2017).
  54. Parikh, N.; Boyd, S. Proximal Algorithms; Now Publishers Inc.: Breda, The Netherlands, 2013. [Google Scholar]
55. Chambolle, A.; Pock, T. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. J. Math. Imaging Vis. 2011, 40, 120–145. [Google Scholar]
  56. Nesterov, Y. Smooth Minimization of Non-Smooth Functions. Math. Program. 2005, 103, 127–152. [Google Scholar] [CrossRef]
  57. Beck, A.; Teboulle, M. Gradient-Based Algorithms with Applications to Signal Recovery Problems; Cambridge University Press: Cambridge, UK, 2010; pp. 42–88. [Google Scholar]
Figure 1. Quadratic penalty functions of the surface orientation constraint visualized for deviations in polar (left column) and lateral (right column) directions. Shown are the geodesic energy (top row), the gradient-based method (middle row) and the method of Heber (bottom row). Colors of the contours indicate the respective penalty values obtained for different combinations of the inclination and deviation angles. Note that the range is clipped between 0 (blue) and 1 (yellow). See Figure 3 for the explanation of the parametrization used.
Figure 2. Quadratic penalty functions of the surface orientation constraint visualized for deviations in polar (left column) and lateral (right column) directions. Shown are the method of Nehab (top row) and the generalized method of Nehab with r = 0.5 (middle row) as well as r = 1.6 (bottom row). A more natural weighting, close to the geodesic distance, can be achieved by adapting the parameter r. A well-balanced distance measure, such as the one obtained with r = 1.6, has the potential to perform better in steep regions (high inclination angles) while preserving performance in flat regions. Colors of the contours indicate the respective penalty values obtained for different combinations of the inclination and deviation angles. Note that the range is clipped between 0 (blue) and 1 (yellow). See Figure 3 for the explanation of the parametrization used.
Figure 3. The distance between two surface normals N ^ and N is expressed by means of a horizontal polar coordinate system. The distances in Figure 1 and Figure 2 are measured in lateral and polar directions using the angles α and β . The former describes the angle between the given normal N ^ and the upright vector O, the latter defines the angular deviation between the normals N ^ and N .
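Written out in the caption's notation, the two angles can be expressed directly through the unit normals (a sketch; we assume unit-length vectors and take the upright direction as O = (0, 0, 1)^T):

    \alpha = \arccos( \hat{N} \cdot O ), \qquad \beta = \arccos( \hat{N} \cdot N ).

In this reading, the geodesic normal error GEO_N reported in Table 2 and Table 3 is the angle β between a reconstructed normal and its ground truth counterpart, presumably averaged over all pixels of the object.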
Figure 4. Comparison of the penalties of (a) the gradient-based method; (b) the method of Heber; and (c) the method of Nehab. Gradient-based method: comparison of the gradient field components ( ∂ x Z , ∂ y Z ) with the given gradient field ( G ^ x , G ^ y ) . Method of Heber: scaling of the normal vector N ^ by the length of ( ∂ x Z , ∂ y Z , 1 ) . Method of Nehab: distance by projection of the tangents T x and T y of the optimized surface onto the given normal field N ^ .
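For reference, the three penalties can be written as quadratic energies over the depth map Z (a sketch in the notation of Figure 4; the exact scalings and sign conventions of the original formulations are not reproduced here and may differ in detail):

    E_{grad}(Z)  = \sum_p \big( \partial_x Z - \hat{G}_x \big)^2 + \big( \partial_y Z - \hat{G}_y \big)^2,
    E_{Heber}(Z) = \sum_p \big\| ( \partial_x Z, \partial_y Z, 1 )^T - \| ( \partial_x Z, \partial_y Z, 1 ) \| \, \hat{N} \big\|^2,
    E_{Nehab}(Z) = \sum_p \big( \hat{N} \cdot T_x \big)^2 + \big( \hat{N} \cdot T_y \big)^2, \quad T_x = (1, 0, \partial_x Z)^T, \; T_y = (0, 1, \partial_y Z)^T.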
Figure 5. Example evaluation data, consisting of a noisy quantized depth Z ^ and noisy surface normals N ^ . The ground truth depth Z ^ G T is shown with horizontal and vertical cross-section lines, which indicate the positions of the profiles evaluated in Figure 6. (a) Initial depth Z ^ ; (b) Given surface normals N ^ ; (c) Ground truth depth Z ^ G T .
Figure 6. Qualitative evaluation of our novel approaches, as introduced in Section 4.6 and Section 4.7. The following methods are demonstrated: (1st row) the generalized method of Nehab with r = 1.6 , (2nd row) the TGV approach. The left column shows the color coded depth reconstructions as delivered by different methods. The middle two columns show the vertical and horizontal depth profiles as marked in Figure 5c. In each of these plots, the red, gray and black lines indicate the reconstructed depth Z, the initial depth Z ^ and the ground truth depth Z ^ G T , respectively. The error maps showing signed distances to the ground truth depth are provided in the right column.
Figure 7. Qualitative evaluation of the state-of-the-art methods described in Section 4. The following methods are demonstrated: (1st row) the gradient-based method using surface orientations only, (2nd row) the combined gradient-based method, (3rd row) the method of Nehab, and (4th row) the method of Heber. The left column shows the color coded depth reconstructions as delivered by different methods. The middle two columns show the vertical and horizontal depth profiles as marked in Figure 5c. In each of these plots, the red, gray and black lines indicate the reconstructed depth Z, the initial depth Z ^ and the ground truth depth Z ^ G T , respectively. The error maps showing signed distances to the ground truth depth are provided in the right column.
Figure 8. Geodesic distances between the ground truth normals and the normals derived from the initial depth map Z ^ , as well as the distances to the normals obtained with the presented methods (in the same order as in Figure 6 and Figure 7). (a) Initial depth Z ^ ; (b) Surface orientation only; (c) Gradient-based; (d) Method of Nehab; (e) Method of Heber; (f) Generalized Nehab r = 1.6 (ours); (g) TGV (ours).
Figure 9. Convergence analysis of the two best-performing methods: the generalized Nehab method with r = 1.6 (a) and the TGV method (b). Distances to the ground truth depth and surface normals are shown after each iteration, plotted on a logarithmic y-axis for better visibility.
Figure 10. Real world evaluation of a coin object acquired by a multi-line scanner. The acquisition setup was previously described in detail in [37]. (a) Z ^ ; (b) Gradient-based; (c) Method of Nehab; (d) Generalized Nehab (ours); (e) TGV (ours).
Table 1. Overview of the presented methods. The behavior of the surface orientation constraint in regions with different orientations is visualized for each method in Figure 1 and Figure 2.
Method                    | Orientation Penalty
Surface orientation only  | Gradient-based
Gradient-based            | Gradient-based
Gradient-based            | Gradient-based + regularization with zero Laplace (Equation (23))
Gradient-based            | Gradient-based + regularization with gradient (Equation (21))
Method of Heber           | Scaled normal
Method of Nehab           | Projection of surface tangents to the given normal field
Generalized Nehab (ours)  | Projection of surface tangents to the given normal field with additional N z weighting
TGV (ours)                | Gradient-based + TGV
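The additional N z weighting listed for the generalized Nehab method is steered by the exponent r illustrated in Figure 2. One possible way to write such a weighting is sketched below; the exact placement of the exponent is an assumption made for illustration and is not taken from the table:

    E_{gen}(Z) = \sum_p \hat{N}_z^{\, r} \Big[ \big( \hat{N} \cdot T_x \big)^2 + \big( \hat{N} \cdot T_y \big)^2 \Big],

with r = 1.6 being the setting evaluated in Figure 6 and Table 3.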
Table 2. Gradient-based: quantitative evaluation of the distance of the optimized depth values to the ground truth depth. The evaluations were performed on objects from the Stanford database [52], which were rendered using POV-Ray [53]. Evaluated are the MSE to the ground truth depth and the geodesic distance to the ground truth normals.
Dataset   | Z ^    | Surface Orientation Only | Gradient-Based | Gradient-Based + Reg. with Laplacian Smoothness (Equation (23)) | Gradient-Based + Reg. with Gradient (Equation (21)) | TGV (Ours)

Depth [MSE_Z]
Dragon    | 4.23   | 34.05  | 2.04 | 2.15 | 1.85 | 0.19
Buddha    | 4.85   | 117.29 | 2.12 | 2.25 | 2.01 | 0.22
Armadillo | 4.60   | 48.71  | 1.95 | 2.06 | 1.83 | 0.18
Average   | 4.53   | 66.68  | 2.04 | 2.15 | 1.90 | 0.20

Normals [GEO_N]
Dragon    | 0.8226 | 0.2776 | 0.3344 | 0.3200 | 0.3017 | 0.0664
Buddha    | 0.8767 | 0.1922 | 0.2535 | 0.2339 | 0.2125 | 0.0668
Armadillo | 0.8611 | 0.2397 | 0.2973 | 0.2797 | 0.2599 | 0.0666
Average   | 0.8535 | 0.2365 | 0.2951 | 0.2779 | 0.2580 | 0.0666
Table 3. Normal based: quantitative evaluation of the distance of the optimized depth values to the ground truth depth. The evaluations were performed on objects from the Stanford database [52], which were rendered using POV-Ray [53]. Evaluated are the MSE to the ground truth depth and the geodesic distance to the ground truth normals.
Dataset   | Z ^    | Method of Heber | Method of Nehab | Generalized Nehab r = 1.6 (Ours)

Depth [MSE_Z]
Dragon    | 4.23   | 2.01   | 0.13   | 0.10
Buddha    | 4.85   | 1.60   | 0.15   | 0.12
Armadillo | 4.60   | 1.75   | 0.13   | 0.10
Average   | 4.53   | 1.79   | 0.14   | 0.11

Normals [GEO_N]
Dragon    | 0.8226 | 0.3474 | 0.2941 | 0.2849
Buddha    | 0.8767 | 0.2579 | 0.2102 | 0.2013
Armadillo | 0.8611 | 0.3094 | 0.2553 | 0.2464
Average   | 0.8535 | 0.3049 | 0.2532 | 0.2442
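For reproducibility, the two error measures reported in Table 2 and Table 3 can be evaluated with a few lines of NumPy. The sketch below assumes depth maps and unit normal fields stored as arrays; the helper normals_from_depth is hypothetical and uses one common finite-difference convention, which may differ from the discretization used in the paper:

```python
import numpy as np

def mse_depth(z, z_gt):
    # Mean squared error between reconstructed and ground truth depth (MSE_Z rows).
    return np.mean((z - z_gt) ** 2)

def normals_from_depth(z):
    # Hypothetical helper: unit surface normals from a depth map via finite differences
    # (one common convention; the paper's own discretization may differ).
    gy, gx = np.gradient(z)
    n = np.dstack((-gx, -gy, np.ones_like(z)))
    return n / np.linalg.norm(n, axis=2, keepdims=True)

def mean_geodesic_error(n, n_gt):
    # Mean geodesic (angular) distance between two unit normal fields,
    # in radians (GEO_N rows).
    cos = np.clip(np.sum(n * n_gt, axis=2), -1.0, 1.0)
    return np.mean(np.arccos(cos))
```

Here mse_depth corresponds to the MSE_Z rows and mean_geodesic_error to the GEO_N rows; normals_from_depth can be used to obtain normals from the initial depth map Z ^ for the first column of the normal error tables.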
