Article

Urban Scene Vectorized Modeling Based on Contour Deformation

Lingjie Zhu, Shuhan Shen, Xiang Gao and Zhanyi Hu
1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
3 College of Engineering, Ocean University of China, Qingdao 266100, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2020, 9(3), 162; https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9030162
Submission received: 1 February 2020 / Revised: 26 February 2020 / Accepted: 7 March 2020 / Published: 10 March 2020

Abstract:
Modeling urban scenes automatically is an important problem for both GIS and non-GIS specialists, with applications such as urban planning, autonomous driving, and virtual reality. In this paper, we present a novel contour deformation approach that generates regularized and vectorized 3D building models from an orthophoto and a digital surface model (DSM). The proposed method has four major stages: dominant direction extraction, target alignment direction assignment, contour deformation, and model generation. To begin with, we extract the dominant directions of each building contour in the orthophoto. Then every edge of the contour is assigned one of the dominant directions via a Markov random field (MRF). Taking the assigned direction as the target, we define a deformation energy based on the Advanced Most-Isometric ParameterizationS (AMIPS) to align the contour to the dominant directions. Finally, the aligned contour is simplified and extruded into 3D models. Through the alignment deformation, we are able to straighten the contour while keeping the sharp turning corners. Experiments on a public dataset show that our contour deformation based urban modeling approach is accurate and robust compared with the state of the art.


1. Introduction

In recent decades, remote sensing technology has developed significantly. From costly satellites to low-cost unmanned aerial vehicles (UAVs), from high-resolution multi-spectral images to LiDAR, we can easily capture both 2D and 3D data of our environment. Extracting semantic, geometric, and topological information from these heterogeneous data automatically has therefore become an important research field in both the GIS and non-GIS communities. Obtaining a precise and compact geometric representation of a large-scale urban scene is one of the core problems in urban reconstruction [1] and is referred to as vectorized modeling by many researchers. It not only has direct applications in GIS, such as urban planning, navigation, and real estate, but is also beneficial for model storage, transmission, and rendering. Apart from that, structured and vectorized representations of objects are also in high demand in 3D computer vision. In this paper, we aim to generate compact building models from the orthophoto and digital surface model (DSM), as shown in Figure 1.
Aerial images and LiDAR point clouds are two common data types for mapping the outdoor environment. LiDAR measures distance with great precision, but the irregular form of point clouds places an extra burden on further processing. Thanks to the growing body of work on deep learning, semantic information can be easily extracted from images, while structure-from-motion (SfM) and multi-view stereo (MVS) enable us to reconstruct the scene from images with ease. Therefore, we choose the orthophoto and DSM generated from aerial images as input.
Semantic segmentation [2,3] is the foundation of our system. With the help of deep learning, geographic information practitioners are able to predict the semantics of aerial images effortlessly. DeepLab [4,5] is one of the most acknowledged neural network architectures, featuring dilated convolution, spatial pyramid pooling, and an encoder-decoder structure. Liu et al. [3] used a self-cascaded architecture and achieved the highest overall accuracy in the ISPRS 2D semantic labeling contest [6]. Typically, the boundary of each semantic region is a chain of rasterized pixels that is too dense for a modern GIS system. The Ramer–Douglas–Peucker (RDP) algorithm [7] is widely used to simplify such a polyline: it approximates the curve with fewer points by iteratively decimating the vertex with the minimum distance to the approximating polyline. Poullis et al. [8] use a Gaussian mixture model and Markov random field (MRF) to classify the boundary points before applying the RDP algorithm.
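For concreteness, the following is a minimal C++ sketch of the classic distance-threshold RDP recursion (the comparison in Section 3.2 uses a variant that targets a fixed vertex count instead); the point type and all function names are our own illustrative choices, not from [7]:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Perpendicular distance from p to the line through a and b.
static double pointLineDist(const Pt& p, const Pt& a, const Pt& b) {
    double dx = b.x - a.x, dy = b.y - a.y;
    double len = std::hypot(dx, dy);
    if (len == 0.0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Classic RDP: if the farthest interior point deviates more than eps from the
// approximating segment, keep it and recurse on both halves; otherwise replace
// the whole run with a single segment.
static void rdp(const std::vector<Pt>& pts, std::size_t lo, std::size_t hi,
                double eps, std::vector<Pt>& out) {
    double dmax = 0.0;
    std::size_t imax = lo;
    for (std::size_t i = lo + 1; i < hi; ++i) {
        double d = pointLineDist(pts[i], pts[lo], pts[hi]);
        if (d > dmax) { dmax = d; imax = i; }
    }
    if (dmax > eps) {
        rdp(pts, lo, imax, eps, out);
        rdp(pts, imax, hi, eps, out);
    } else {
        out.push_back(pts[hi]);  // drop all interior points of this run
    }
}

std::vector<Pt> simplify(const std::vector<Pt>& pts, double eps) {
    std::vector<Pt> out;
    if (pts.empty()) return out;
    out.push_back(pts.front());
    if (pts.size() > 1) rdp(pts, 0, pts.size() - 1, eps, out);
    return out;
}
```

Because the output vertexes are always a subset of the input, rounded input corners can never be sharpened; this limitation is revisited in Section 3.2.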
3D geometry is another important aspect of our system. Modern SfM and MVS pipelines [9,10,11,12,13,14,15,16] can generate accurate models from images. However, dense triangle surface meshes or point clouds are not suitable for a modern GIS system. Turning these dense outputs into a compact form, also known as vectorization, has been attracting increasing attention [17,18,19,20,21,22,23]. Generally, existing methods fall into two categories: (1) bounding volume slicing and selection [17,18,24], which slices the bounding volume with planar primitives and selects the polytopes inside the building or the faces on the building; (2) contour regularization and extrusion [19,21,25,26], which usually regularizes the contours of a building first under the 2.5D scene assumption, then simplifies the contours and extrudes them into 3D space. The levels of detail (LODs) defined by CityGML [27] are widely recognized as the standard for how the urban environment should be described, represented, and exchanged in the digital world; both [17,21] focus on generating models adhering to this standard. We follow the second, contour regularization path and generate LOD0 and LOD1 models as shown in Figure 1d.
In this paper, we propose a novel contour deformation based urban vectorized modeling method, as shown in Figure 1. We take the segmented orthophoto as input and first extract the contour of each building. Then the contour normals are smoothed with bilateral filtering, and dominant directions are detected from them with the RANSAC algorithm. After that, each edge is assigned one of the dominant directions through an MRF. By defining a deformation energy on the triangulation of the contour polygon, we align the boundary edges to the dominant directions. Finally, the contour is vectorized into the polygonal LOD0 model and extruded to the LOD1 model using the DSM. Our main contributions include:
- An effective bilateral smoothing and RANSAC based dominant direction detection method.
- An efficient deformation energy optimization defined on the contour triangulation to align the boundary to the target directions.
- A novel deformation based building modeling method, which enables us to generate compact LOD0 and LOD1 models from the orthophoto and DSM.

2. Proposed Method

2.1. Overview

The proposed modeling method takes the orthophoto and DSM as input and outputs vectorized LOD0 and LOD1 models. As shown in Figure 2, our algorithm has four major stages:
- Firstly, the dominant directions of the building contour are detected by running RANSAC on the bilaterally smoothed normals (Figure 2b).
- Then each edge of the contour is assigned one of the dominant directions as its alignment target through an MRF formulation (Figure 2c).
- With the target directions and a deformation energy defined on the contour triangle mesh, we align the boundary edges to their target directions (Figure 2d).
- Finally, compact LOD0 and LOD1 models are generated by connecting the corner vertexes and extruding them to their averaged heights in the DSM (Figure 2e,f).
Generally, an urban scene contains elements such as roads, buildings, and vegetation [17,21]. Among them, buildings are the most important category for urban vectorized modeling; therefore, we focus only on buildings in this paper. In the following sections, each building is reconstructed separately, with its related data isolated from the orthophoto and DSM.

2.2. Dominant Directions Detection

Strong directional regularity is a common property of buildings. The Manhattan assumption, which assumes that the whole scene follows three global orthogonal directions, is widely used in urban modeling [24,26]. The less restrictive Atlanta assumption [28] requires only that each building has its own local orthogonal directions. Here we do not impose any restriction on the dominant directions and detect them from the bilaterally smoothed contour normals with the RANSAC algorithm.
Figure 3a shows the building contour $C = \{c_i\}$ extracted from the segmentation in Figure 1. Due to the rasterization of the image, the boundary normals $\{n_i\}$ of all boundary edges $\{e_i = \overline{c_i c_{i+1}}\}$ are limited to a few discretized directions [8]. To alleviate this problem, we propose a bilateral smoothing of the normals that weights both the angle similarity and the geodesic distance:

$$\hat{n}_i = \frac{\sum_{n_k \in N_i} w(i,k)\, n_k}{\left\| \sum_{n_k \in N_i} w(i,k)\, n_k \right\|},$$

where $N_i = \{ n_k \mid \mathrm{dist}(e_i, e_k) < thre_d \}$ is the set of neighboring normals and $\mathrm{dist}(e_i, e_k)$ measures the distance between $e_i$ and $e_k$ along the contour. The composite weight $w(i,k)$ is given by:

$$w(i,k) = \exp\left( -\frac{\mathrm{dist}(e_i, e_k)}{\sigma_d^2} - \frac{\angle(n_i, n_k)}{\sigma_a^2} \right),$$

where $\angle(n_i, n_k)$ represents the angle between $n_i$ and $n_k$. In our experiments, the distance variation $\sigma_d$ and the angle variation $\sigma_a$ are set to 0.5 m and 10°, respectively. Figure 3b shows the effect of our distance and angle weighted bilateral smoothing of the normals.
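As an illustration only, the bilateral smoothing above can be sketched as follows. This is our own minimal C++ sketch rather than the paper's implementation; it assumes a closed contour with precomputed unit edge normals and edge midpoint arc lengths, and all helper names are ours. With the settings above, sigma_d would be 0.5 and sigma_a would be 10° expressed in radians:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

static double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
static Vec2 normalize(Vec2 v) {
    double n = std::sqrt(dot(v, v));
    return {v.x / n, v.y / n};
}
// Angle between two unit vectors, in radians.
static double angle(Vec2 a, Vec2 b) {
    return std::acos(std::max(-1.0, std::min(1.0, dot(a, b))));
}

// Bilateral smoothing of contour edge normals: each normal is replaced by the
// normalized weighted sum of its neighbors within geodesic distance thre_d
// along the contour, weighted by both spatial distance and angle similarity.
std::vector<Vec2> smoothNormals(const std::vector<Vec2>& n,   // unit edge normals n_i
                                const std::vector<double>& s, // arc length of each edge midpoint
                                double total,                 // total contour length
                                double thre_d, double sigma_d, double sigma_a) {
    const std::size_t m = n.size();
    std::vector<Vec2> out(m);
    for (std::size_t i = 0; i < m; ++i) {
        Vec2 acc{0.0, 0.0};
        for (std::size_t k = 0; k < m; ++k) {
            double d = std::fabs(s[i] - s[k]);
            d = std::min(d, total - d);          // geodesic distance on the closed contour
            if (d >= thre_d) continue;           // outside the neighborhood N_i
            double w = std::exp(-d / (sigma_d * sigma_d)
                                - angle(n[i], n[k]) / (sigma_a * sigma_a));
            acc.x += w * n[k].x;
            acc.y += w * n[k].y;
        }
        out[i] = normalize(acc);  // n-hat_i = sum(w n_k) / ||sum(w n_k)||
    }
    return out;
}
```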
With the smoothed contour normals, the RANSAC algorithm is used to detect the dominant directions $D = \{d_i\}$, with the inlier set defined as $I_i = \{ \hat{n}_k \mid \angle(\hat{n}_k, d_i) < thre_a \}$. Specifically, in the $i$th iteration, with the remaining normals $R_i = \{\hat{n}_k\} \setminus \bigcup_{k=1}^{i-1} I_k$, we keep generating candidate directions $\{d_i^c\}$ and collecting their corresponding inlier sets $\{I_i^c\}$ until the missing probability [29] $p_m$ falls below a threshold $thre_p$:

$$p_m = \left( 1 - \frac{\max(\{|I_i^c|\})}{|R_i|} \right)^{|\{d_i^c\}|}.$$

The best candidate $I_i = \operatorname{argmax}(\{|I_i^c|\})$ is selected to compute the $i$th dominant direction:

$$d_i = \frac{\sum_{\hat{n}_k \in I_i} \hat{n}_k}{\left\| \sum_{\hat{n}_k \in I_i} \hat{n}_k \right\|}.$$

In our experiments, the iteration stops when $|I_i|$ is less than $|\{n_i\}| / 16$. Figure 3c shows the directions detected from the smoothed contour normals.
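A minimal sketch of this one-point RANSAC loop, following the description above (hypothesize from a random remaining normal, track the best inlier set, test the missing probability from [29], average and remove the inliers), is given below; the sampling details and all names are our own simplifications, not the exact implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Vec2 { double x, y; };

static double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
static double angleBetween(Vec2 a, Vec2 b) {
    return std::acos(std::max(-1.0, std::min(1.0, dot(a, b))));
}

// One-point RANSAC over the smoothed normals: hypothesize a direction from a
// random remaining normal, keep the hypothesis with the largest inlier set once
// the missing probability drops below thre_p, average the inliers into a
// dominant direction, remove them, and repeat.
std::vector<Vec2> detectDominantDirections(std::vector<Vec2> remaining,
                                           double thre_a,   // inlier angle threshold
                                           double thre_p) { // missing probability threshold
    const std::size_t minInliers = remaining.size() / 16;   // stop when |I_i| < |{n_i}|/16
    std::vector<Vec2> dirs;
    while (true) {
        std::vector<std::size_t> best;
        std::size_t candidates = 0;
        double pm = 1.0;
        while (pm >= thre_p && !remaining.empty()) {
            Vec2 d = remaining[std::rand() % remaining.size()];
            std::vector<std::size_t> inliers;
            for (std::size_t k = 0; k < remaining.size(); ++k)
                if (angleBetween(remaining[k], d) < thre_a) inliers.push_back(k);
            if (inliers.size() > best.size()) best = inliers;
            ++candidates;
            // p_m = (1 - |I_best| / |R_i|)^{#candidates}, cf. [29]
            pm = std::pow(1.0 - double(best.size()) / double(remaining.size()),
                          double(candidates));
        }
        if (best.empty() || best.size() < minInliers) break;
        Vec2 acc{0.0, 0.0};
        for (std::size_t k : best) { acc.x += remaining[k].x; acc.y += remaining[k].y; }
        double n = std::sqrt(dot(acc, acc));
        dirs.push_back({acc.x / n, acc.y / n});              // d_i = mean of inlier normals
        for (std::size_t j = best.size(); j-- > 0; )         // remove inliers, back to front
            remaining.erase(remaining.begin() +
                            static_cast<std::ptrdiff_t>(best[j]));
    }
    return dirs;
}
```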

2.3. Alignment Direction

With the detected dominant directions $D$, we adopt an MRF formulation similar to [8] to assign each boundary edge $e_i$ a target direction $d_i$, which is later used to drive the deformation in Section 2.4.

For each building contour $C$, an undirected dual graph $G = (V, E)$ is constructed on it, meaning that each edge $e_i$ is treated as a vertex in $V$ and each vertex $c_i$ is treated as an edge in $E$. Accordingly, our label set is $D = \{d_i\}$ and a labeling configuration is $f: V \to D$. The data term measures the difference between the observed data $\hat{n}_i$ and the mapped label $f(e_i)$:

$$E_{data}(f) = \sum_{e_i \in V} \left\| \hat{n}_i - f(e_i) \right\|.$$

The smoothness term favors connected edges $(e_i, e_j) \in E$ with similar orientations $f(e_i), f(e_j) \in D$ and penalizes otherwise:

$$E_{smooth}(f) = \sum_{(e_i, e_j) \in E} 2\, e^{-\frac{\|\hat{n}_i - \hat{n}_j\|^2}{\sigma^2}} \cdot \left\| f(e_i) - f(e_j) \right\|,$$

where $\sigma$ controls the smoothness uncertainty. Intuitively, if two neighboring edge normals $\hat{n}_i$ and $\hat{n}_j$ are close, there is a higher probability that $f(e_i)$ and $f(e_j)$ are similar. The last, label term favors a lower number of labels in a labeling configuration $f$:

$$E_{label}(f) = \sum_{d_i \in D} (1 - h_i)^2 \cdot \zeta_i(f),$$

where $h_i = |I_i| / \sum_{i=1}^{|D|} |I_i|$ measures the relative portion of a dominant direction on the contour and $\zeta_i(f)$ indicates whether label $d_i$ exists in $f$:

$$\zeta_i(f) = \begin{cases} 1, & \exists\, e_k : f(e_k) = d_i, \\ 0, & \text{otherwise}. \end{cases}$$

The label term penalizes heavily when a label corresponds to a direction with a small portion $h_i$, and thus tends to keep only the few most significant directions in $D$. The overall energy function for the graph cut is then:

$$E(f) = E_{data}(f) + \kappa_1 E_{smooth}(f) + \kappa_2 E_{label}(f),$$

where $\kappa_1$ and $\kappa_2$ are the balance coefficients of the smoothness term and the label term, respectively; they are set to 1 and 10 in the following sections. As proven in [8], the formulation is regular and can be minimized by the $\alpha$-expansion algorithm [30,31]. Figure 3d shows how each boundary edge is assigned a target dominant direction.
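The actual minimization is performed with α-expansion via the max-flow library [30,31] (see Section 3). Purely to make the objective concrete, the following sketch evaluates $E(f)$ for a given labeling on the closed-contour dual graph; the types and the convention that f[i] indexes into $D$ are hypothetical:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double x, y; };
static double dist2(Vec2 a, Vec2 b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Evaluate E(f) = E_data + k1*E_smooth + k2*E_label on the dual graph of a
// closed contour, where graph vertex i stands for boundary edge e_i and graph
// edges connect consecutive boundary edges.
double mrfEnergy(const std::vector<Vec2>& nhat,  // smoothed normals n-hat_i
                 const std::vector<Vec2>& dirs,  // dominant directions D
                 const std::vector<double>& h,   // relative portions h_i
                 const std::vector<int>& f,      // labeling: f[i] indexes into dirs
                 double sigma, double k1, double k2) {
    const std::size_t m = nhat.size();
    double data = 0.0, smooth = 0.0, label = 0.0;
    std::vector<bool> used(dirs.size(), false);
    for (std::size_t i = 0; i < m; ++i) {
        data += std::sqrt(dist2(nhat[i], dirs[f[i]]));   // ||n-hat_i - f(e_i)||
        std::size_t j = (i + 1) % m;                     // neighboring edge on the contour
        smooth += 2.0 * std::exp(-dist2(nhat[i], nhat[j]) / (sigma * sigma))
                      * std::sqrt(dist2(dirs[f[i]], dirs[f[j]]));
        used[f[i]] = true;
    }
    for (std::size_t d = 0; d < dirs.size(); ++d)        // sum of (1 - h_i)^2 * zeta_i(f)
        if (used[d]) label += (1.0 - h[d]) * (1.0 - h[d]);
    return data + k1 * smooth + k2 * label;
}
```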

2.4. Deformation Formulation

This section introduces our dominant direction driven deformation formulation, which regularizes the contour and greatly helps the subsequent modeling step. For each contour $C$, we use the constrained Delaunay triangulation [32] to triangulate the bounded area, as shown in Figure 4. The generated mesh $M = \{t_1, \ldots, t_l\}$ contains vertexes $\{v_1, \ldots, v_m\}$ and boundary edges $\{e_1, \ldots, e_n\}$.

Our goal is to align the normal $\hat{n}_i$ of each $e_i$ on $C$ to the target direction assigned under the configuration $f$ in Section 2.3. Therefore, we define an align energy that measures the difference between the edge normals $\hat{n}_i$ and their target dominant directions $d_i \in D$:

$$E_{align} = \sum_{i=1}^{|C|} \left( \|e_i\| \cdot \left\| \hat{n}_i - f(e_i) \right\| \right)^2.$$

By changing the positions of $\{v_1, \ldots, v_m\}$, we can deform $M$ to align the contour $C$ to the major directions $D$ and minimize $E_{align}$. The deformation is a piecewise linear function $g: \mathbb{R}^2 \to \mathbb{R}^2$ defined on $M$, which maps every original triangle $t = \triangle v_p v_q v_r$ to an output triangle $t' = \triangle v'_p v'_q v'_r$, as illustrated in Figure 4. We adopt the Advanced Most-Isometric ParameterizationS (AMIPS) [33] to achieve as low a distortion as possible. Restricted to each $t$, the mapping is an affine transformation $g_t(x) = J_t x + b_t$, where $J_t$ is a $2 \times 2$ matrix and also the Jacobian [33] of $g_t$:

$$J_t = \left[ \overrightarrow{v'_p v'_q},\ \overrightarrow{v'_p v'_r} \right] \cdot \left[ \overrightarrow{v_p v_q},\ \overrightarrow{v_p v_r} \right]^{-1}.$$

With the singular values of $J_t$ denoted as $\sigma_1$ and $\sigma_2$, the AMIPS [33] conformal and area distortion energies defined on $t$ are:

$$\delta_{conf,t} = \frac{\sigma_1}{\sigma_2} + \frac{\sigma_2}{\sigma_1} = \frac{\mathrm{trace}(J_t J_t^T)}{\det J_t},$$

$$\delta_{area,t} = \det J_t + (\det J_t)^{-1}.$$

The rigid energy, which measures how rigid the mapping from each original triangle $t$ to its deformed counterpart $t'$ is, is defined as:

$$E_{rigid} = \sum_{t \in M} \exp\left( \alpha\, \delta_{conf,t} + (1 - \alpha)\, \delta_{area,t} \right).$$

The overall deformation energy, balanced by $\lambda$, is then:

$$E_{deform} = E_{rigid} + \lambda E_{align}.$$

To minimize $E_{deform}$, we use the MATLAB optimization toolbox; the closed-form gradient is given in Appendix A. The conformal/area distortion balance $\alpha$ is set to 0.5 and the alignment/rigidness balance $\lambda$ is set to $1 \times 10^5$. Figure 5 shows the process of the energy minimization.
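To make the per-triangle terms concrete, a small C++ sketch is given below. It computes $J_t$ from the rest and deformed triangles and evaluates the exponential term of $E_{rigid}$; the minimal matrix type is ours, and the code assumes orientation-preserving triangles ($\det J_t > 0$):

```cpp
#include <cmath>

struct Vec2 { double x, y; };
struct Mat2 { double a, b, c, d; };  // row-major [[a, b], [c, d]]

Mat2 mul(Mat2 m, Mat2 n) {
    return {m.a * n.a + m.b * n.c, m.a * n.b + m.b * n.d,
            m.c * n.a + m.d * n.c, m.c * n.b + m.d * n.d};
}
Mat2 inverse(Mat2 m) {
    double det = m.a * m.d - m.b * m.c;
    return {m.d / det, -m.b / det, -m.c / det, m.a / det};
}

// Jacobian of the affine map taking rest triangle (p, q, r) to (p2, q2, r2):
// J_t = [q'-p', r'-p'] * [q-p, r-p]^{-1}
Mat2 jacobian(Vec2 p, Vec2 q, Vec2 r, Vec2 p2, Vec2 q2, Vec2 r2) {
    Mat2 deformed = {q2.x - p2.x, r2.x - p2.x, q2.y - p2.y, r2.y - p2.y};
    Mat2 rest     = {q.x - p.x,   r.x - p.x,   q.y - p.y,   r.y - p.y};
    return mul(deformed, inverse(rest));
}

// Per-triangle term of E_rigid: exp(alpha*conf + (1-alpha)*area), with
// conf = trace(J J^T)/det(J) and area = det(J) + 1/det(J).
double rigidTerm(Mat2 J, double alpha) {
    double det  = J.a * J.d - J.b * J.c;  // assumes det > 0 (no triangle flip)
    double conf = (J.a * J.a + J.b * J.b + J.c * J.c + J.d * J.d) / det;
    double area = det + 1.0 / det;
    return std::exp(alpha * conf + (1.0 - alpha) * area);
}
```

Summing rigidTerm over all triangles of $M$ gives $E_{rigid}$; adding $\lambda E_{align}$ yields the objective handed to the optimizer.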

2.5. Model Generation

After the deformation optimization in Section 2.4, the building contour $C$ becomes $C'$ and is aligned to the dominant directions. LOD0 and LOD1 models adhering to the CityGML [27] standard can be easily extracted from it. Specifically, we traverse the contour $C'$ and keep only the corner vertexes $\{c'_i\}$, i.e., those with a large angle between their neighboring edges. Connecting the corner vertexes gives us the LOD0 model as a simplified polygon. The LOD1 model is obtained by extruding the LOD0 contour to the averaged height of the building [17,21]. Figure 6 shows the process of model generation.
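A minimal sketch of this model generation step (corner keeping followed by extrusion) is shown below; the turning-angle threshold minTurn and all names are our own illustrative choices, not the paper's exact criteria:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double x, y; };
struct Vec3 { double x, y, z; };

// Traverse the deformed contour and keep only the corner vertexes: a vertex
// survives if its two incident edges turn by more than minTurn radians.
std::vector<Vec2> extractCorners(const std::vector<Vec2>& c, double minTurn) {
    const std::size_t m = c.size();
    std::vector<Vec2> corners;
    for (std::size_t i = 0; i < m; ++i) {
        Vec2 prev = c[(i + m - 1) % m], next = c[(i + 1) % m];
        Vec2 a{c[i].x - prev.x, c[i].y - prev.y};   // incoming edge
        Vec2 b{next.x - c[i].x, next.y - c[i].y};   // outgoing edge
        double cross = a.x * b.y - a.y * b.x;
        double dotab = a.x * b.x + a.y * b.y;
        if (std::fabs(std::atan2(cross, dotab)) > minTurn)
            corners.push_back(c[i]);                // significant turn: keep as LOD0 vertex
    }
    return corners;
}

// LOD1: extrude the LOD0 polygon to the building's averaged DSM height.
// Vertical wall faces connect each roof vertex to its ground counterpart.
std::vector<Vec3> extrudeLOD1(const std::vector<Vec2>& lod0, double avgHeight) {
    std::vector<Vec3> roof;
    for (const Vec2& v : lod0) roof.push_back({v.x, v.y, avgHeight});
    return roof;
}
```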

3. Results and Discussion

The proposed method is implemented in C++. The max-flow library [30,31] is used to solve the MRF in Section 2.3, and the MATLAB optimization toolbox is used to solve the energy minimization in Section 2.4. Qualitative as well as quantitative assessments are conducted on the public ISPRS dataset [6] in this section. Experiments show that our contour deformation optimization framework generates regular and compact building models compared with state-of-the-art methods.

3.1. Effect of Alignment Deformation

To demonstrate the effect of the alignment deformation, we conduct a simple experiment that aligns a contour to the axes, as shown in Figure 7. The contour is extracted from our own real-world orthophoto of a high-rise building, which is sampled from the dense textured mesh reconstructed from aerial images by Pix4D [34]. Due to the high level of noise and inaccuracy inherited from the mesh reconstruction and orthophoto segmentation, the extracted building contour is uneven and zigzagged, as illustrated in Figure 7. To make matters worse, the corners of small protrusions are rounded (green rectangle in Figure 7), although such corners are essential for vectorized modeling.
For each border edge normal, we simply assign the nearest axis as its target direction. As shown in Figure 7, the jaggy input contour is gradually aligned to the axes as the deformation energy drops. The closeup on the right shows the optimization process in the area of the green rectangle on the left. The energy drops dramatically in the first few iterations, as shown in Figure 7d. Since we set the alignment and rigidness balance $\lambda$ to $1 \times 10^5$, we can also observe that the alignment is reached quickly at first, with the rigidness restored in the following iterations.

3.2. Quality Comparison

There are three major aspects in evaluating the modeling quality: contour accuracy, contour complexity, and regularity. Both general urban modeling algorithms [8,21] and a general polyline simplification algorithm [7] are evaluated and compared on the public Vaihingen dataset [6]. The input orthophotos shown in Figure 1 and Figure 8 were generated via dense image matching with Trimble INPHO 5.3 and Trimble INPHO OrthoVista [6]. The sampling step of both the true orthophoto (TOP) and the DSM is 9 cm in Figure 1 and Figure 8.
For LOD0 and LOD1 generation, the problem is usually treated as contour extraction and simplification. The RDP algorithm [7] iteratively removes the vertex closest to the curve until the target number of vertexes or the minimum distance threshold is reached. The simplified curve is a subset of the points of the original polyline, which makes it easily affected by a few extreme points, as shown in Table 1.
Poullis et al. [8] conduct an orientation detection and classification before applying the RDP algorithm [7]. This makes the method aware of the turning points of the global orientation. Unfortunately, the simplified curve is still a subset of the input curve. Therefore, neither RDP [7] nor Poullis et al. [8] can capture the positions of the corners well, due to the rounded corners in the segmentation mask, as shown in Table 1.
The last column in Table 1 demonstrates that Zhu et al. [21] can generate contours with strong regularity. However, the method fails when the scene does not satisfy the Manhattan assumption. The missing part in the lower left corner of the second building in Table 1 is caused by missed short segments.
As illustrated in Table 1, our method has the highest IoU while maintaining the sharp corners of the contours. Thanks to the deformation optimization process, we are able to recover the positions of the corner vertexes more accurately. Table 2 lists more results on the buildings in Figure 1, where the orthogonal neighboring edges are correctly reconstructed. In addition, Figure 8 shows the result on another block from the Vaihingen dataset [6]: on the left is the input orthophoto of a residential area, and on the right is the output LOD1 model overlaid on the semantic segmentation map [3].

4. Conclusions

In this paper, we turn dense 2D orthophotos of the urban scene into compact 3D polygon models automatically [17,21], which is suitable for the efficient representation, processing, and rendering of large-scale scenes. The generated models can be used in various fields such as urban planning, navigation, emergency simulation, and risk reduction [1].
Specifically, we propose a novel deformation based contour simplification approach that generates vectorized LOD0 and LOD1 building models from orthophotos. To begin with, building contours are extracted from the semantically labeled orthophotos and processed separately. For each building, we first extract the dominant directions by applying the RANSAC algorithm to the bilaterally smoothed contour edge normals. Then each edge normal is assigned one of the dominant directions as its alignment target by formulating the task as an MRF labeling problem [8]. Finally, a deformation energy combining the edge normal alignment and the AMIPS [33] rigidness is defined on the contour triangle mesh. By minimizing the deformation energy and connecting the corner vertexes, we can generate compact LOD0 and LOD1 models easily.
Compared with the classic RDP algorithm [7] and more recent contour based methods [8,21], we are able to enhance the global regularity and retain the contour topology at the same time. The proposed deformation approach can aggregate different constraints into one optimization framework, and it shows great potential in urban scene modeling, which is rich in regularities such as orthogonality, parallelism, and collinearity. In the future, we would like to add more common regularity constraints and take the generated models to higher LODs.

Author Contributions

Methodology, Lingjie Zhu and Shuhan Shen; software, Lingjie Zhu; validation, Xiang Gao; investigation, Xiang Gao; resources, Shuhan Shen; data curation, Shuhan Shen; writing—original draft preparation, Lingjie Zhu; writing—review and editing, Shuhan Shen and Xiang Gao; visualization, Lingjie Zhu; supervision, Zhanyi Hu; project administration, Zhanyi Hu; funding acquisition, Zhanyi Hu. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China under Grants 61991423, 61873265, and 61421004.

Acknowledgments

We would like to acknowledge the public dataset provided by ISPRS [6].

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The essential parts of the deformation energy $E_{deform}$ are the MIPS energy $\delta_{conf,t}$ and the area energy $\delta_{area,t}$ defined on each triangle $t$. According to [33], their derivatives with respect to a variable $x$ are:

$$\frac{\partial \delta_{conf,t}}{\partial x} = \frac{2\, \mathrm{trace}\left( J_t^T \cdot \frac{\partial J_t}{\partial x} \right)}{\det J_t} - \delta_{conf,t}\, \mathrm{trace}\left( J_t^{-1} \cdot \frac{\partial J_t}{\partial x} \right),$$

$$\frac{\partial \delta_{area,t}}{\partial x} = \left( \det J_t - (\det J_t)^{-1} \right) \mathrm{trace}\left( J_t^{-1} \cdot \frac{\partial J_t}{\partial x} \right).$$

The overall derivatives can then be obtained easily by the chain rule, using the identity $\frac{\partial \det J_t}{\partial x} = \det J_t\, \mathrm{trace}\left( J_t^{-1} \cdot \frac{\partial J_t}{\partial x} \right)$.

References

  1. Musialski, P.; Wonka, P.; Aliaga, D.G.; Wimmer, M.; Gool, L.V.; Purgathofer, W. A Survey of Urban Reconstruction. Comput. Graph. Forum 2013, 32, 146–177.
  2. Rouhani, M.; Lafarge, F.; Alliez, P. Semantic segmentation of 3D textured meshes for urban scene analysis. ISPRS J. Photogramm. Remote Sens. 2017, 123, 124–139.
  3. Liu, Y.; Fan, B.; Wang, L.; Bai, J.; Xiang, S.; Pan, C. Semantic labeling in very high resolution images via a self-cascaded convolutional neural network. ISPRS J. Photogramm. Remote Sens. 2018, 145, 78–95.
  4. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
  5. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
  6. ISPRS. ISPRS 2D Semantic Labeling Contest. Available online: http://www2.isprs.org/commissions/comm3/wg4/semantic-labeling.html (accessed on 12 December 2019).
  7. Ramer, U.; Douglas, D.; Peucker, T. Ramer–Douglas–Peucker Algorithm. Comput. Graph. Image Process. 1972, 1, 244–256.
  8. Poullis, C. A Framework for Automatic Modeling from Point Cloud Data. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2563–2575.
  9. Furukawa, Y.; Ponce, J. Accurate Camera Calibration from Multi-View Stereo and Bundle Adjustment. Int. J. Comput. Vis. 2009, 84, 257–268.
  10. Furukawa, Y.; Curless, B.; Seitz, S.M.; Szeliski, R. Towards Internet-scale multi-view stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 1434–1441.
  11. Agarwal, S.; Furukawa, Y.; Snavely, N.; Simon, I.; Curless, B.; Seitz, S.M.; Szeliski, R. Building Rome in a Day. Commun. ACM 2011, 54, 105–112.
  12. Cui, H.; Shen, S.; Gao, W.; Hu, Z. Efficient Large-Scale Structure from Motion by Fusing Auxiliary Imaging Information. IEEE Trans. Image Process. 2015, 22, 3561–3573.
  13. Langguth, F.; Sunkavalli, K.; Hadap, S.; Goesele, M. Shading-aware multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 469–485.
  14. Cui, H.; Gao, X.; Shen, S.; Hu, Z. HSfM: Hybrid Structure-from-Motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2393–2402.
  15. Hofer, M.; Maurer, M.; Bischof, H. Efficient 3D scene abstraction using line segments. Comput. Vis. Image Underst. 2017, 157, 167–178.
  16. Bódis-Szomorú, A.; Riemenschneider, H.; Gool, L.V. Efficient edge-aware surface mesh reconstruction for urban scenes. Comput. Vis. Image Underst. 2017, 157, 3–24.
  17. Verdie, Y.; Lafarge, F.; Alliez, P. LOD Generation for Urban Scenes. ACM Trans. Graph. 2015, 34, 30:1–30:14.
  18. Nan, L.; Wonka, P. PolyFit: Polygonal Surface Reconstruction from Point Clouds. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2372–2380.
  19. Kelly, T.; Femiani, J.; Wonka, P.; Mitra, N.J. BigSUR: Large-scale Structured Urban Reconstruction. ACM Trans. Graph. 2017, 36, 204:1–204:16.
  20. Nguatem, W.; Mayer, H. Modeling Urban Scenes from Pointclouds. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3857–3866.
  21. Zhu, L.; Shen, S.; Gao, X.; Hu, Z. Large Scale Urban Scene Modeling from MVS Meshes. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 614–629.
  22. Zeng, H.; Wu, J.; Furukawa, Y. Neural procedural reconstruction for residential buildings. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 737–753.
  23. Li, M.; Rottensteiner, F.; Heipke, C. Modelling of buildings from aerial LiDAR point clouds using TINs and label maps. ISPRS J. Photogramm. Remote Sens. 2019, 154, 127–138.
  24. Li, M.; Wonka, P.; Nan, L. Manhattan-World Urban Reconstruction from Point Clouds. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 54–69.
  25. Li, M.; Nan, L.; Smith, N.; Wonka, P. Reconstructing building mass models from UAV images. Comput. Graph. 2016, 54, 84–93.
  26. Zhu, L.; Shen, S.; Hu, L.; Hu, Z. Variational Building Modeling from Urban MVS Meshes. In Proceedings of the IEEE International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; pp. 318–326.
  27. CityGML. Available online: http://www.opengeospatial.org/standards/citygml (accessed on 4 April 2019).
  28. Joo, K.; Oh, T.H.; Kweon, I.S.; Bazin, J.C. Globally Optimal Inlier Set Maximization for Atlanta World Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 2019.
  29. Schnabel, R.; Wahl, R.; Klein, R. Efficient RANSAC for Point-Cloud Shape Detection. Comput. Graph. Forum 2007, 26, 214–226.
  30. Boykov, Y.; Veksler, O.; Zabih, R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 1222–1239.
  31. Boykov, Y.; Kolmogorov, V. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1124–1137.
  32. The CGAL Project. The Computational Geometry Algorithms Library. Available online: https://www.cgal.org (accessed on 8 November 2019).
  33. Fu, X.M.; Liu, Y.; Guo, B. Computing locally injective mappings by advanced MIPS. ACM Trans. Graph. 2015, 34, 71.
  34. Pix4D. Available online: https://pix4d.com/ (accessed on 18 November 2019).
Figure 1. Introduction to our contour deformation based vectorized urban modeling. (a) is the orthophoto with RGB and infra-red channels from ISPRS [6]. (b) is the segmentation result for (a) using [3]; the blue region represents the building. (c) is the DSM of the area. (d) shows the output LOD0 (lower half) and LOD1 (upper half) models from (c) in 3D.
Figure 2. Overview of the proposed method of contour deformation based urban modeling. (a) is the input segmentation of a building from Figure 1. (b) the dominant directions detected from the bilaterally smoothed normals using RANSAC; five dominant directions are detected (color coded). (c) each edge is assigned a target direction to align to through a Markov random field (MRF) formulation. (d) the contour aligned to the detected directions through our deformation optimization. (e,f) the generated vectorized building LOD0 polygon model and LOD1 model extruded to the averaged height in the DSM.
Figure 3. Dominant directions detection and alignment direction. (a) initial normals orthogonal to the edges have limited variation due to rasterization. (b) bilaterally smoothed ($thre_d$ = 0.5 m) normals capture the dominant directions in Section 2.2. (c) the four dominant directions (outliers in black) detected using RANSAC in Section 2.2. (d) the target alignment direction found for each edge with the MRF formulation in Section 2.3.
Figure 4. Mapping of a triangle $t$ to the deformed triangle $t'$ through an affine transformation $J_t$.
Figure 5. Optimization of the energy $E_{deform}$. While the energy drops at each iteration, the contour is gradually aligned to its target directions assigned in Section 2.3.
Figure 6. Generating LOD models from the deformed contour in Figure 5. (a) is the deformed contour with corner vertexes in green circles. (b) is the LOD0 model, a simplified polygon reducing the vertexes from 499 to 8. (d) is the LOD1 model obtained by extruding the contour to the averaged height in (c).
Figure 7. Deformation energy minimization aligning a large contour to the nearest axes. The right side of the first three rows shows close-ups of the areas in the green rectangles on the left. (a) The input jaggy contour triangle mesh has 1424 vertexes, 2508 faces, and 338 boundary edges; the initial energy is 1,150,510. (b) The contour mesh at the 2nd iteration, with an energy of 170,687. (c) The contour mesh at the 27th iteration, with an energy of 23,598. (d) The energy minimization curve from the first iteration to the last: it drops rapidly in the first few iterations and eventually levels off.
Figure 8. Modeling result on another block. (a) The input orthophoto with infra-red channel of another urban area. (b) Our generated LOD1 model of the scene overlaid on the segmentation.
Table 1. Contour quality comparison of different methods on the buildings in Figure 3. In each row from left to right are: the ground truth building contour provided by ISPRS [6], our LOD0 model, the RDP algorithm [7] output with the number of vertexes set to match ours, the result by Poullis et al. [8], and Zhu et al. [21].

Ground Truth | Ours LOD0 | RDP [7] | Poullis et al. [8] | Zhu et al. [21]
[building 1 contour images omitted]
IoU: — | 0.96 | 0.94 | 0.95 | 0.87
[building 2 contour images omitted]
IoU: — | 0.94 | 0.92 | 0.93 | 0.78
Table 2. Generated models for different buildings in Figure 2. In each column from top to bottom are: the building segmentation (contour), the generated vectorized polygon LOD0 model, and the extruded LOD1 model.

Region  | B | C | D
Contour | [image] | [image] | [image]
LOD0    | [image] | [image] | [image]
LOD1    | [image] | [image] | [image]
