Automated Matching of Multi-Scale Building Data Based on Relaxation Labelling and Pattern Combinations

Zhang, Yunfei; Huang, Jincai; Deng, Min; Chen, Chi; Zhou, Fangbin; Xie, Shuchun; Fang, Xiaoliang

doi:10.3390/ijgi8010038

Open AccessArticle

Automated Matching of Multi-Scale Building Data Based on Relaxation Labelling and Pattern Combinations

¹

School of Traffic & Transportation Engineering, Changsha University of Science & Technology, Changsha 410114, China

²

Department of Geo-Informatics, Central South University, Changsha 410083, China

³

State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China

⁴

Zhongnan Engineering Corporation Limited, Power China, Changsha 410014, China

^*

Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2019, 8(1), 38; https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi8010038

Submission received: 4 November 2018 / Revised: 24 December 2018 / Accepted: 10 January 2019 / Published: 16 January 2019

Download

Browse Figures

Versions Notes

Abstract

:

With the increasingly urgent demand for map conflation and timely data updating, data matching has become a crucial issue in big data and the GIS community. However, non-rigid deviation, shape homogenization, and uncertain scale differences occur in crowdsourced and official building data, causing challenges in conflating heterogeneous building datasets from different sources and scales. This paper thus proposes an automated building data matching method based on relaxation labelling and pattern combinations. The proposed method first detects all possible matching objects and pattern combinations to create a matching table, and calculates four geo-similarities for each candidate-matching pair to initialize a probabilistic matching matrix. After that, the contextual information of neighboring candidate-matching pairs is explored to heuristically amend the geo-similarity-based matching matrix for achieving a contextual matching consistency. Three case studies are conducted to illustrate that the proposed method obtains high matching accuracies and correctly identifies various 1:1, 1:M, and M:N matching. This indicates the pattern-level relaxation labelling matching method can efficiently overcome the problems of shape homogeneity and non-rigid deviation, and meanwhile has weak sensitivity to uncertain scale differences, providing a functional solution for conflating crowdsourced and official building data.

Keywords:

building matching; relaxation labelling; pattern combinations; multiple scales; map conflation

1. Introduction

In the context of urbanization and big data, rapid development of geospatial data acquisition has caused an explosive growth of geospatial datasets. In particular, with the popularity of volunteered geographic information (VGI), geospatial data updating is undergoing a significant transformation from top-down active updating to bottom-up crowd updating [1]. On the one hand, crowdsourced geospatial data play an important role in generating 3D city models or highly precise road maps [2,3,4]. On the other hand, it is also essential to utilize the existing geospatial data to enrich the user-generated building information [5]. In practical applications, different departments or volunteers collect various geospatial datasets with different accuracies, different scales, or different thematic focuses. These heterogeneous geospatial datasets cannot effectively interoperate with each other, leading to the problems of information isolation. It is a crucial issue in the GIS community to conflate multi-source and heterogeneous geospatial datasets into an enriched or timely data product with higher geometric precision and detailed attribute information [6,7]. Data matching aims to identify the associated point, linear, or area objects between two or more geospatial datasets in the same or overlapping region, which is widely regarded as the essential step of data conflation and map updating. The subject of the article is building polygon matching.

Due to amateur participation and limited quality inspection, data inconsistencies (such as uneven scale differences, large geometric discrepancies, and diverse semantic descriptions) possibly occur between crowdsourced and authoritative geospatial data [8]. Researchers have found that crowdsourced datasets are characterized by inconsistent levels of detail (LODs) and inexplicit scales [9]. This inconsistency in LODs may result in the prevalence of 1:M and M:N correspondences between various datasets. Moreover, user-generated content may be less geometrically accurate and semantically unstructured, especially in rural or undeveloped areas [10]. This brings about large challenges for matching crowdsourced and authoritative datasets. In particular, the matching of multi-scale building polygons is associated with some complex difficulties. Firstly, the shape and orientation homogenization of building objects and the non-rigid deviations between different datasets make it difficult to distinguish 1:1 matching objects with similar shapes and layouts [11]. As shown in Figure 1a, most building polygon objects are represented as regularized rectangles with non-rigid deviations, which can hardly be matched based on shape similarity comparison. Secondly, the inconsistent LODs lead to low similarity of 1:M and M:N matching objects. As illustrated in Figure 1b, there are many 1:0, 1:M, N:1, and M:N correspondences with low similarity. Hence, this paper focus on the particular issue of matching multi-scale building polygons with an improved relaxation labelling approach.

The remainder of the paper is organized as follows. In Section 2, the state of art information about data matching is reviewed. Section 3 presents an improved building matching method based on relaxation labelling and pattern combinations. Section 4 elaborates the experimental results and undertakes some evaluation analysis to demonstrate the efficiency and robustness of the proposed method. Finally, the conclusions and future works are outlined at the end of the paper.

2. Literature Review

Data matching was developed to find identical objects or positions between multiple geospatial datasets that reflect the same real-world entity [12,13,14]. Finding inconsistencies between up-to-date images and existing geospatial datasets was a significant means of achieving change detection and data updating [15,16]. This paper tackles the matching of vector building objects, and thus undertakes a brief review into vector data matching.

According to object types, vector data matching can be classified into point, linear, and area object matching. For point matching, multi-criteria-decision methods have been developed to match POIs (Points of Interest) from different social network platforms [17,18]. For linear matching, most studies have compared multiple similarities (e.g., geometry, semantic, topology) and applied node-based, arc-based, polygon-based, or hybrid matching strategies [19,20,21,22,23]. Recent advances have introduced relaxation labelling, logistic regression, and genetic algorithms to improve the performance of road network matching [24,25,26]. Du et al. [27] designed ontology descriptions and fusion operators to integrate authoritative and crowdsourced road data. Liu et al. [28] proposed a progressive buffering method to update road maps with OSM (OpenStreetMap) data. Yang et al. [29,30] utilized spatial clustering and shape fitting to mine pattern-related correspondences between heterogeneous POIs and the road network. Ai et al. [31] constructed bend hierarchical trees to match contour data and river networks for geometry inconsistency detection.

Although numerous studies have been devoted to matching various crowdsourced or authoritative road datasets, little attention has been paid to area object matching (e.g., building polygons) [32]. Building polygon matching mainly concentrates on identifying object-level correspondences [33] and conjugate-point pairs between matched objects [34]. Ai et al. [35] and Fan et al. [32] calculated Fourier descriptors and turning functions to measure the shape similarity of building polygon objects. Wang et al. [36] proposed a back-propagation neural network approach for adaptively determining the weights of multiple similarities. Du et al. [37] considered location and lexical information to determine candidate-matching buildings, and then utilized spatial reasoning to refine them. Samal et al. [38] and Kim et al. [39] incorporated the geographic context to amend shape homogenization and non-rigid deviations. Zhang et al. [40] combined shape similarities and contextual information to find neighboring-compatible matching pairs. The above studies only computed the matching confidence of 1:1 matching objects, and ignored the total matching probability of 1:M, N:1, and M:N matching objects. Huh et al. [41] applied the graph embedding technique and agglomerative hierarchical clustering to explore multi-scale corresponding building objects. However, due to shape homogenization, non-rigid deviations, and multi-scale complex correspondences, it is still a challenging task to match multi-scale building polygon data. This paper thus proposes a novel approach for matching multi-scale building polygons based on relaxation labelling and pattern combinations. The proposed method mainly extends the relaxation labelling matching model [40,42] in consideration of simultaneous matches with multiple objects (i.e., pattern combinations).

3. Methodology

Traditional relaxation labelling methods [24,40] ignore the total matching confidences of 1:M and M:N correspondences, and may reduce the average probability that multiple objects simultaneously match with an identical one. Hence, this paper explores potential pattern combinations to model 1:M and M:N matching relations. Generally, the proposed approach consists of two main steps, outlined as follows:

●: Detect candidate-matching building objects based on buffering analysis, aggregate neighboring objects into pattern combinations, and calculate the geo-similarities between candidate-matching objects and pattern combinations to initialize the matching matrix;
●: Compute the contextual compatibilities between neighboring matching pairs to iteratively update the initial matching matrix, select the matching pairs based on the convergent matching matrix, and refine them through a matching conflict detection.

Let two building datasets be R = {r_i|i = 1, …, m} and T = {t_j|j = 1, …, n}. We suppose that R is the reference dataset and T is the target one. On the one hand, due to inconsistent LODs between crowdsourced and authoritative building datasets, it is difficult to assert whether T has a higher LOD than S, leading to the problem of matching direction [24]. On the other hand, since traditional relaxation labelling methods ignore total matching confidences for multiple cardinalities, this paper aggregates candidate-matching objects into pattern combinations to represent simultaneous matches with multiple objects. As illustrated in Figure 2, rp₁…rp_m_’ and tp₁…tp_m_’ are the aggregated pattern combinations in R and T. This paper calculates not only the matching probabilities of individual objects (r_i, t_j), but also calculates those of pattern combinations (r_i, tp_j) or (rp_i, t_j) to constitute a matching matrix. After that, the compatible influences of neighboring objects and pattern combinations are taken into account to heuristically update the initial matching matrix for a global matching consistency.

3.1. Matching Probability Initialization in Consideration of Multi-Scale Pattern Combinations

According to the framework of relaxation labelling matching, the matching matrix was first formed based on local geo-similarities (e.g., position, orientation, area, and shape). One contribution of this paper is that we aggregated neighboring candidate-matching objects into pattern combinations and calculated the probabilities of individual objects and pattern combinations to model 1:1, 1:M, and M:N correspondences. To reduce the computational complexity, a buffering analysis was executed to filter out the impossible matching features.

3.1.1. Detection of Candidate-Matching Objects and Pattern Combinations

A buffer operation can feasibly detect all possible matching objects between two geospatial datasets in the same spatial reference [28]. Previous studies have used an empirical buffering threshold for candidate-matching detection [21,24]. We constructed the Delaunay triangle network (DT_R) for the building centroids in R. The buffering threshold ε was then set as the average edge length of DT_R after deleting all global and local long edges according to the method of Deng et al. [43]. The candidate-matching set of each object r_i was determined as those objects in T that fall in the ε-buffer region of r_i, that is, CR_i = {t_j|t_j ∈ T and t_j in Buffer (r_i, ε)}. Similarly, the candidate-matching set of objects t_j in dataset T can be determined as CT_j = {r_i|r_i ∈ R and r_i in Buffer (t_j, ε)}.

Due to inconsistent LODs, one building object may match with multiple objects in another geo-spatial dataset. As illustrated in Figure 3a, object r_i has four candidate-matching objects; that is, CR₁ = {t₁, …, t₄}. Theoretically, there are C₄⁰ + C₄¹ + C₄² + C₄³ + C₄⁴ = 2⁴ matching situations, as shown in Figure 3b–f. However, the object-level probabilistic matching strategy may identify one 1:1 matching pair but ignore other matching objects with slightly lower probabilities. Hence, it is necessary to aggregate neighboring candidate-matching objects as a whole pattern combination to model multi-scale relaxation labelling correspondences.

To measure the total similarity with more than one candidate-matching object, we should aggregate them into one whole polygon, named a pattern combination. In detail, for one candidate-matching set, we iteratively aggregated the two closest building objects and finally obtained one polygon object to represent the pattern combination of that candidate-matching set. For example, as shown in Figure 4a, we first aggregated closer t₁ and t₂ as M(t₁, t₂) and then aggregated closer M(t₁, t₂) and t₃ as M(t₁, t₂, t₃). To aggregate two individual building objects, a convex hull approximation algorithm was developed, which encompasses the following three steps:

●: Step 1: Initialize the convex hull of two building objects as the merged polygon;
●: Step 2: Iteratively add the convex points closest to the intermediate lines to refine the merged polygon;
●: Step 3: Insert the inner concave points within objects t₁ and t₂ to further refine the merged polygon.

As shown in Figure 4b, for two objects t₁ = <u₁, …, u₈> and t₂ = <v₁, …, v₈>, the convex hull was first initialized as the pattern combination M⁽¹⁾(t₁, t₂) = <u₁, u₂, v₁, v₂, v₄, v₅, v₇, u₈>, of which two segments s₁ = <u₂, v₁> and s₂ = <v₇, u₈> were found as the intermediate lines that connect the vertexes of t₁ and t₂. Secondly, the convex points closest to the intermediate lines (e.g., u₇, u₄, v₈ and u₅) were iteratively added to refine the pattern combination as M⁽²⁾(t₁, t₂). Finally, the inner concave points within objects t₁ and t₂ (e.g., u₃, v₃ and v₆) were inserted to refine the pattern combination as M(t₁, t₂).

Suppose that one object r_i has P candidate-matching objects (|CR_i| = P), the candidate-matching pattern combination set of r_i is defined as CRP_i = {CRP_i²…CRP_i^p…CRP_i^P}, where CRP_i^p means the pattern combination subset that contains p polygon objects. Similarly, suppose that one object t_j has Q candidate-matching objects (|CT_j| = Q), the candidate-matching pattern combination set of t_j is defined as CTP_j = {CTP_j², …, CTP_j^q…CTP_j^Q}, where CTP_j^q means the pattern combination subset that contains q polygon objects. Hence, based on candidate-matching detection and neighboring pattern combinations, the candidate-matching table was created as c_Table = {(r_i, t_j)|r_i ∈ R, t_j ∈ CR_i∪CRP_i or r_i ∈ CT_j∪CTP_j, t_j ∈ T}, which is simplistically described as a candidate-matching table c_Table = (r_i, t_j), i = 1…(m + m’), j = 1…(n + n’).

3.1.2. Calculation of Geo-Similarities and Initial Matching Probabilities

For each candidate matching pair (r_i, t_j) in the candidate-matching table c_Table, four geometry similarities, namely the position similarity sim_p, orientation similarity sim_o, area similarity sim_a, and shape similarity sim_s, were calculated to initialize the candidate-matching matrix. The formulas are listed as follows.

\begin{matrix} s i m_{p} (i, j) = 1 - \frac{d s (i, j)}{ε}, s i m_{a} = \frac{\min {a r e a (i), a r e a (j)}}{\max {a r e a (i), a r e a (j)}} \\ s i m_{o} (i, j) = 1 - \frac{d r (i, j)}{π / 2}, s i m_{s} = \frac{\min {e l g (i), e l g (j)}}{\max {e l g (i), e l g (j)}} \end{matrix}

(1)

As shown in Figure 5a, ds(i, j) means the centroid distance between two objects r_i and t_j, dr(i, j) denotes their wall statistical weighting orientation (WSW) [44] difference, elg(*) indicates the major–minor axis ratio of the minimum enclosing rectangle (MER), and area(*) measures the polygon area. After that, the initial matching probability is calculated by:

\begin{matrix} p_{i j} = \frac{s i m (i, j)}{\sum_{t_{k} \in C R_{i} \cup C R P_{i}} s i m (i, k)} \\ s i m (i, j) = s i m_{p} (i, j)^{*} s i m_{o} (i, j)^{*} s i m_{a} (i, j)^{*} s i m_{s} (i, j) \end{matrix}

(2)

where t_k ∈ CR_i∪CRP_i, sim(i, j) is the total geo-similarity of (r_i, t_j), so does sim(i, k). sim_p, sim_o, sim_a, and sim_s are the position, orientation, area, and shape similarities calculated by Formula (1).

It is time-consuming to consider all possible candidate-matching pattern combinations. Hence, the candidate-matching pairs with sim_p < T_pos, sim_o < T_dir, sim_a < T_area, and sim_s < T_shp were eliminated from the c_Table and the initial candidate-matching matrix was constructed as m_Matrix = (p_ij)_M×N, M ≤ m + m’, N ≤ n + n’.

3.2. Matching Probability Relaxation Based on Multi-Scale Contextual Information

The initial matching probability only considers the local geometric similarities and ignores the impacts of neighboring matching relations. The relaxation labelling algorithm models the contextual information of neighboring candidate-matching pairs as a compatibility coefficient to heuristically update the local-similarity-based matching probability.

3.2.1. Definition of the Neighboring Relation and Compatibility Coefficient

As no explicit neighboring relations are stored in the building datasets, some researchers have utilized the fixed radius, triangulation network, and proximity graph to search the local neighboring objects [38,40]. For the reference building object r_i in R, we constructed the Delaunay triangulation and defined the first-order connected objects and pattern combinations as its neighboring set Ɲ(r_i); for target building object t_j in T, the objects and pattern combinations in buffer(t_j, ε) were determined as the neighboring set Ɲ(t_j). Specifically, for one pattern combination r_u’ = {r_a, …, r_b}, Ɲ(r_u’) was defined as an subtraction set, Ɲ(r_a)∪…∪Ɲ(r_b)-{r_a…r_b}.

Based on the definition of neighboring relations, the contextual information of neighboring candidate-matching pairs was quantified as a compatibility coefficient according to Formula (3) [40], which combines both the relative and absolute position, orientation, shape, and area relations.

C (i j; h k) = s i m (i, j) \cdot \prod_{u = 1}^{4} r e l_{u} (i j; h k)

(3)

where C(ij; hk) denotes the compatibility coefficient between two neighboring candidate-matching pairs (r_i, t_j) and (r_h, t_k), sim(i, j) is the absolute geo-similarity calculated by Formula (2), and rel_u(ij; hk) (u = 1…4) represents four relative geometric relations between them, which are calculated as Formulae (4)–(7), respectively.

\begin{array}{l} r e l_{1} (i j; h k) = {r_{d i s}}^{*} {r_{d i r}}^{*} r_{r a t i o} \\ {\begin{cases} r_{d i s} = 1 - \frac{| d s (i, h) - d s (j, k) |}{\max_{r_{m} \in N (r_{i}), t_{n} \in N (t_{j})} {d s (i, m), d s (j, m)}} \\ r_{d i r} = 1 - \frac{θ (i h, j k)}{π / 2} \\ r_{r a t i o} = 1 - \frac{d s (i, h)}{\max_{r_{m} \in N (r_{i})} {d s (i, m)}} \end{cases} \end{array}

(4)

r e l_{2} (i j; h k) = 1 - \frac{| d r (i, h) - d r (j, k) |}{π / 2}

(5)

r e l_{3} (i j; h k) = \frac{1}{1 + {(e l g (i / h) - e l g (j / k))}^{2}}

(6)

r e l_{4} (i j; h k) = \frac{1}{1 + {(a r e a (i / h) - a r e a (j / k))}^{2}}

(7)

rel₁(ij; hk), rel₂(ij; hk), rel₃(ij; hk), and rel₄(ij; hk) represent the relative position, orientation, shape, and area relations between two neighboring candidate-matching pairs (r_i, t_j) and (r_h, t_k), respectively. As depicted in Figure 5b, ds(i, h) and dr(i, h) mean the centroid distance and orientation difference between two neighboring building objects r_i and r_h, Ɲ(r_i) indicates the neighboring set of object r_i, and θ(ih; jk) is the angle between two centroid vectors

\vec{r_{i} r_{h}}

and

\vec{t_{j} t_{k}}

. elg(i/h) and area(i/h) are the elongation index ratio and area ratio between two neighboring building objects r_i and r_h, so do elg(j/k) and area(j/k).

3.2.2. Relaxation of the Matching Matrix and Selection of Final Matching Pairs

The compatibility coefficient calculated above measures the relative geometric consistency between two neighboring candidate-matching pairs. However, there are possibly more than one neighboring candidate-matching pairs. The relaxation labelling process incorporates the impacts of all neighboring candidate-matching pairs (also called the sub-support indexes) into a total support index and updates the geometric-based matching matrix. In light of multi-scale 1:M and M:N correspondences, the impacts of neighboring pattern combinations should be also integrated into the total support index. In detail, the total support index of one candidate-matching pair (r_i, t_j) was computed as follows:

●: Calculate the sub-support indexes of all neighboring individual objects and pattern combinations according to Formula (8), which are partitioned into two separate queues of Q_ij and Q’_ij. Q_ij stores the sub-support indexes of neighboring individual objects and Q’_ij stores the sub-support indexes of neighboring pattern combinations. For the candidate-matching pair (r_i, t_j) in Figure 6a, two sub-support index queues Q_ij and Q_ij’ are listed in Figure 6b.

$q_{i j, h} = \max_{s_{k} \in C R_{h} \cup C R P_{h}} {C (i j; h k) P (h, k)}, r_{h} \in N_{i} or N P_{i}$

(8)
●: Traverse Q_ij’ in ascending order of the included object number of the neighboring pattern combination, and judge whether its sub-support index is larger than the average support index of its included objects. If yes, the sub-support index of that pattern combination will replace the corresponding sub-support indexes of the included objects in Q_ij. As elaborated in Figure 6b, the sub-support index q_ij,x’ of r_x’ = {r_a, r_b} is larger than the average of q_ij_,a and q_ij_,b, so q_ij_,x’ will substitute the values of q_ij_,a and q_ij_,b in Q_ij.
●: When all elements of Q_ij’ are traversed and judged by the above steps, the mean of the sub-support indexes in Q_ij is computed as the total support index q_ij of (r_i, t_j).

Once the total support indexes of all candidate-matching pairs were determined, the iteration of the geo-similarity matching matrix was executed according to Formula (9) [24]. The relaxation labelling process will continue until the probability differences between two iterations are less than a predefined threshold δ.

p_{i j}^{r + 1} = \frac{p_{i j}^{r} + q_{i j}^{r}}{1 + \sum_{h \in C S_{i}} q_{i h}^{r}}

(9)

Based on the convergent matching matrix, the matching pairs with the maximum matching probability per row were first selected and inserted into a queue MQueue. Then, check each matching pair A in MQueue. When there is another matching pair B in MQueue that contains one or more common building objects, it may be two conflict matching pairs or an M:N matching pair divided by several 1:M or M:1 matching pairs. To distinguish the above conditions, we calculated the overall geometry similarity Sim (A∪B) of merging A and B. If Sim (A∪B) is larger than both Sim (A) and Sim (B), A and B are merged into one M:N matching pair A∪B; otherwise, the one with the higher matching probability is reserved and the other with the lower probability is replaced by the pair with the next highest probability. After that, MQueue was determined as the final matching pair set between two building datasets R and T.

4. Experiment and Analysis

One mock building dataset and two real building datasets in Xi’an City, China and Dallas in the United States of America were used to verify the efficiency and reliability of the proposed method. As listed in Table 1, the spatial scales of two Xi’an building datasets were known as 1:250,000 and 1:200,000, but uncertain scale differences were found in the simulated dataset and Dallas dataset from web sources. In Table 1, T_pos, T_dir, T_area, T_shp are four similarity thresholds for the candidate-matching pairs filtering from Section 3.1.2, and δ is the convergent threshold for the relaxation labelling process outlined in Section 3.2.2.

Figure 7 illustrates the initial matching result of the mock building data based on local geometric similarity and the matching results generated by the object-level relaxation labelling method and the proposed pattern-level method. It can be seen from Figure 7a that several obvious errors occur in the similarity-based matching results because of non-rigid deviations and shape homogenization. Although the object-level relaxation labelling method is a great improvement in terms of finding a contextual matching consistency, as shown in Figure 7b, there still are some 1:M and M:N matching pairs ignored by the object-level matching strategy. Figure 7c indicates that various 1:M and M:N matching pairs are correctly recognized by the proposed method based on relaxation labelling and pattern combinations. The matching results of the Xi’an data and Dallas data are shown in Figure 8, which demonstrates the proposed method can robustly identify the complex correspondence of building objects with large and uncertain scale differences.

By comparing the matching results with manual matching, three accuracy indicators—precision P, recall R, and F value—were calculated according to Formula (10). tp denotes true matching pairs recognized by our method, fp denotes false matching pairs identified by our method, and fn means true matching pairs omitted by our method. It can be seen from Table 2 that both the precision and recall rates are higher than 90%, and obvious higher accuracies were obtained by our method than the object-level relaxation labelling method.

P = \frac{t p}{t p + f p}, R = \frac{t p}{t p + f n}, F = \frac{2 \times P \times R}{P + R}

(10)

Figure 9 displays the probability change of candidate-matching pairs of objects #125 and #26 during the relaxation labelling process. As shown in Figure 9a, the relaxation labelling process correctly distinguishes fake matching pairs (#125 → #141) and true matching pairs (#125 → #140) by integrating contextual information of neighboring matching pairs to heuristically amend the local-similarity-based matching matrix. This indicates the proposed method can efficiently overcome incorrect matching caused by shape homogenization and non-rigid deviation. Meanwhile, as depicted in Figure 9b, the pattern-level matching method correctly identifies a complex 1:M matching pair of (#26 → #64, 65, 70, 69) by considering pattern combinations in the relaxation labelling model. This demonstrates our method can efficiently find various 1:M and M:N correspondences possibly missed by the object-level matching method.

The matching probability distributions of object #81 and #17 in Figure 10a are elaborated in Figure 10b,c, respectively. It can be seen that both 1:1 object correspondence (#81 → #30) and 1:M pattern combination matching (#17 → #32, 33, 34, 35, 36, 37) are reliably identified in terms of a large probability gap. That indicates our method is relatively less sensitive to spatial scale differences than the object-level relaxation labelling method.

Table 3 shows two matching examples of pattern combinations. For object r_A, Table 3 lists the top four candidate correspondences. It can be seen that the matching probability of (r_A → {t_a, t_b, t_c, t_d}) was computed as 0.218, apparently higher than that of other candidate-matching objects or pattern combinations. For object r_B and r_C, the best matching pairs with the highest matching probabilities were computed as M₁ = (r_B → {t_e, t_f}) and M₂ = (r_C → {t_e, t_g}), respectively. Because M₁ and M₂ contain a common object t_e, these two matching pairs were finally merged into an M:N matching pair M = ({r_A, r_B} → {t_e, t_f, t_g}) by comparing their geo-similarities. This further indicates the good performance of our method in identifying complex 1:M and M:N matching relations between heterogeneous multi-scale building data.

5. Conclusions

Object matching is an essential step of map conflation and data updating. Multi-scale building data matching is confronted with such problems as shape homogenization, non-rigid deviations, and uncertain scale differences. This paper proposed an automated building matching method based on relaxation labelling and pattern combinations. The proposed method extended the object-level relaxation labelling matching method and constructed a probabilistic matching model considering both 1:1 object matching and 1:M pattern combination matching. The experimental results show that the relaxation labelling process takes contextual information into account, reliably increasing the matching accuracy and insensitivity to non-rigid deviations and shape homogenization. Moreover, pattern combinations are novelly integrated into the probabilistic matching model, efficiently improving the performance to match crowdsourced building datasets with large and uncertain spatial scale differences. Future work will focus on null matching modelling and optimized selection of candidate-matching pairs to improve the robustness of our method.

Author Contributions

Y.Z. and F.Z. proposed the matching method; J.H. and S.X. performed the experiments; M.D. and C.C. discussed the idea and analyzed the data; Y.Z. and X.F. wrote the paper.

Funding

This research was funded by [National Nature Science Foundation of China] grant number [41601495 and 41671446], [Natural Science Foundation of Hunan Province] grant number [2018JJ3525], [National Key Research and Development Program of China] grant number [2017YFB0503500], [Open Research Fund of State Key Laboratory Of Information Engineering in Surveying, Mapping and Remote Sensing] grant number [17S01], and [Open Fund of Engineering Laboratory of Spatial Information Technology of Highway Geological Disaster Early Warning in Hunan Province (Changsha University of Science & Technology)] grant number [kfj170604 and kfj170605].

Acknowledgments

We thank the anonymous reviewers for their constructive comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Jiang, B. Volunteered geographic information and computational geography: New perspectives. In Crowdsourcing Geographic Knowledge; Sui, D., Elwood, S., Goodchild, M., Eds.; Springer: Dordrecht, The Netherlands, 2013; pp. 125–138. [Google Scholar]
Goetz, M. Towards generating highly detailed 3D CityGML models from OpenStreetMap. Int. J. Geogr. Inf. Sci. 2013, 27, 845–865. [Google Scholar] [CrossRef]
Bergman, C.; Oksanen, J. Conflation of OpenStreetMap and Mobile Sports Tracking Data for Automatic Bicycle Routing. Trans. GIS 2016, 20, 848–868. [Google Scholar] [CrossRef] [Green Version]
Yang, W.; Ai, T.; Lu, W. A Method for Extracting Road Boundary Information from Crowdsourcing Vehicle GPS Trajectories. Sensors 2018, 18, 1261. [Google Scholar] [CrossRef] [PubMed]
Hecht, R.; Kunze, C.; Hahmann, S. Measuring Completeness of Building Footprints in OpenStreetMap over Space and Time. ISPRS Int. J. Geo-Inf. 2013, 2, 1066–1091. [Google Scholar] [CrossRef] [Green Version]
Ruiz, J.J.; Ariza, F.J.; Ureña, M.A.; Blázquez, E.B. Digital map conflation: A review of the process and a proposal for classification. Int. J. Geogr. Inf. Sci. 2011, 25, 1439–1466. [Google Scholar] [CrossRef]
Dalyot, S.; Dahinden, T.; Schulze, M.J.; Boljen, J.; Sester, M. Integrating network structures of different geometric representations. EMP Surv. Rev. 2013, 45, 428–440. [Google Scholar] [CrossRef]
Sester, M.; Arsanjani, J.J.; Klammer, R.; Burghardt, D.; Haunert, J.H. Integrating and generalising volunteered geographic information. In Abstracting Geographic Information in a Data Rich World; Burghardt, D., Duchêne, C., Mackaness, W., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 119–155. [Google Scholar]
Touya, G.; Brando-Escobar, C. Detecting level-of-detail inconsistencies in Volunteered Geographic Information data sets. Cartogr. Int. J. Geogr. Inf. Geovis. 2013, 48, 134–143. [Google Scholar] [CrossRef]
Barron, C.; Neis, P.; Zipf, A. A Comprehensive Framework for Intrinsic OpenStreetMap Quality Analysis. Trans. GIS 2014, 18, 877–895. [Google Scholar] [CrossRef]
Xu, J.; Wu, F.; Qian, H.; Ma, F. Settlement matching algorithm using spatial similarity relations as constraints. Geomat. Inf. Sci. Wuhan Univ. 2013, 38, 484–488. [Google Scholar]
Saalfeld, A. Conflation: Automated map compilation. Int. J. Geogr. Inf. Sci. 1988, 2, 217–228. [Google Scholar] [CrossRef]
Kieler, B.; Huang, W.; Haunert, J.; Jiang, J. Matching river datasets of different scales. In Lecture Notes in Geoinformation and Cartography; Sester, M., Bernard, L., Paelke, V., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 135–154. [Google Scholar]
Pourabdollah, A.; Morley, J.; Feldman, S.; Jackson, M. Towards an authoritative OpenStreetMap: Conflating OSM and OS OpenData National Maps’ road network. ISPRS Int. J. Geo-Inf. 2013, 2, 704–728. [Google Scholar] [CrossRef]
Chen, C.-C.; Knoblock, C.; Shahabi, C. Automatically conflating road vector data with orthoimagery. GeoInformatica 2006, 10, 495–530. [Google Scholar] [CrossRef]
Zhang, J. A Congruent Hybrid Model for Conflation of Satellite Image and Road Database. Ph.D. Thesis, Technical University of Munich, Munich, German, 2013. [Google Scholar]
McKenzie, G.; Janowicz, K.; Adams, B. A weighted multi-attribute method for matching user-generated Points of Interest. Cartogr. Geogr. Inf. Sci. 2014, 41, 125–137. [Google Scholar] [CrossRef] [Green Version]
Scheffler, T.; Schirru, R.; Lehmann, P. Matching points of interest from different social networking sites. In KI 2012: Advances in Artificial Intelligence; Goebel, R., Tanaka, Y., Wahlster, W., Siekmann, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 45–248. [Google Scholar]
Xiong, D.; Sperling, J. Semiautomated matching for network database integration. ISPRS J. Photogramm. Remote Sens. 2004, 59, 35–46. [Google Scholar] [CrossRef]
Zhang, M.; Meng, L. An iterative road-matching approach for the integration of postal data. Comput. Environ. Urban Syst. 2007, 31, 598–616. [Google Scholar] [CrossRef]
Mustière, S.; Devogele, T. Matching Networks with Different Levels of Detail. GeoInformatica 2008, 12, 435–453. [Google Scholar] [CrossRef]
Fan, H.; Yang, B.; Zipf, A.; Rousell, A. A polygon-based approach for matching OpenStreetMap road networks with authority data. Int. J. Geogr. Inf. Sci. 2016, 30, 748–764. [Google Scholar] [CrossRef]
Safra, E.; Kanza, Y.; Sagiv, Y.; Doytsher, Y. Ad hoc matching of vectorial road networks. Int. J. Geogr. Inf. Sci. 2013, 27, 114–153. [Google Scholar] [CrossRef]
Yang, B.; Zhang, Y.; Luan, X. A probabilistic relaxation approach for matching road networks. Int. J. Geogr. Inf. Sci. 2013, 27, 319–338. [Google Scholar] [CrossRef]
Tong, X.; Liang, D.; Jin, Y. A linear road object matching method for conflation based on optimization and logistic regression. Int. J. Geogr. Inf. Sci. 2014, 28, 824–846. [Google Scholar] [CrossRef]
Chehreghan, A.; Abbaspour, R.A. A geometric-based approach for road matching on multi-scale datasets using a genetic algorithm. Cartogr. Geogr. Inf. Sci. 2018, 45, 255–269. [Google Scholar] [CrossRef]
Du, H.; Anand, S.; Alechina, N.; Morley, J.; Hart, G.; Leibovici, D.; Jackson, M.; Ware, M. Geospatial information integration for authoritative and crowd sourced road vector data. Trans. GIS 2012, 16, 455–476. [Google Scholar] [CrossRef]
Liu, C.; Xiong, L.; Hu, X.; Shan, J. A Progressive Buffering Method for Road Map Update Using OpenStreetMap Data. ISPRS Int. J. Geo-Inf. 2015, 4, 1246–1264. [Google Scholar] [CrossRef] [Green Version]
Yang, B.; Zhang, Y.; Lu, F. Geometric-based approach for integrating VGI POIs and road networks. Int. J. Geogr. Inf. Sci. 2014, 28, 126–147. [Google Scholar] [CrossRef]
Yang, B.; Zhang, Y. Pattern-mining approach for conflating crowdsourcing road networks with POIs. Int. J. Geogr. Inf. Sci. 2015, 29, 786–805. [Google Scholar] [CrossRef]
Ai, T.; Yang, M.; Zhang, X.; Tian, J. Detection and correction of inconsistencies between river networks and contour data by spatial constraint knowledge. Cartogr. Int. J. Geogr. Inf. Geov. 2015, 42, 79–93. [Google Scholar] [CrossRef]
Fan, H.; Zipf, A.; Fu, Q.; Neis, P. Quality assessment for building footprints data on OpenStreetMap. Int. J. Geogr. Inf. Sci. 2014, 28, 700–719. [Google Scholar] [CrossRef]
Gösseln, G.; Sester, M. Integration of geoscientific datasets and the German digital map using a matching approach. In Proceedings of the XXth International Society for Photogrammetry and Remote Sensing Congress, Istanbul, Turkey, 12–23 July 2004; pp. 1249–1254. [Google Scholar]
Huh, Y.; Yu, K.; Heo, J. Detecting conjugate-point pairs for map alignment between two polygon datasets. Comput. Environ. Urban Syst. 2011, 35, 250–262. [Google Scholar] [CrossRef]
Ai, T.; Cheng, X.; Liu, P.; Yang, M. A shape analysis and template matching of building features by the Fourier transform method. Comput. Environ. Urban Syst. 2013, 41, 219–233. [Google Scholar] [CrossRef]
Wang, Y.; Chen, D.; Zhao, Z.; Ren, F.; Du, Q. A Back-Propagation Neural Network-Based Approach for Multi-Represented Feature Matching in Update Propagation. Trans. GIS 2015, 19, 964–993. [Google Scholar] [CrossRef]
Du, H.; Alechina, N.; Jackson, M.; Hart, G. A Method for Matching Crowd-sourced and Authoritative Geospatial Data. Trans. GIS 2016. [Google Scholar] [CrossRef]
Samal, A.; Seth, S.; Cueto, K. A feature-based approach to conflation of geospatial sources. Int. J. Geogr. Inf. Sci. 2004, 18, 459–489. [Google Scholar] [CrossRef]
Kim, J.O.; Yu, K.; Heo, J.; Lee, W.H. A new method for matching objects in two different geospatial datasets based on the geographic context. Comput. Geosci. 2010, 36, 1115–1122. [Google Scholar] [CrossRef]
Zhang, X.; Ai, T.; Stoter, J.; Zhao, X. Data matching of building polygons at multiple map scales improved by contextual information and relaxation. ISPRS J. Photogramm. Remote Sens. 2014, 92, 147–163. [Google Scholar] [CrossRef]
Huh, Y.; Kim, J.; Lee, J.; Yu, K.; Shi, W. Identification of multi-scale corresponding object-set pairs between two polygon datasets with hierarchical co-clustering. ISPRS J. Photogramm. Remote Sens. 2014, 88, 60–68. [Google Scholar] [CrossRef]
Zhang, Y.; Huang, J.; Deng, M.; Fang, X.; Hu, J. Relaxation Labelling Matching for Multi-scale Residential Datasets Based on Neighboring Patterns. Geomat. Inf. Sci. Wuhan Univ. 2018, 43, 1098–1105. [Google Scholar]
Deng, M.; Liu, Q.; Cheng, T.; Shi, Y. An adaptive spatial clustering algorithm based on Delaunay triangulation. Comput. Environ. Urban Syst. 2011, 35, 320–332. [Google Scholar] [CrossRef]
Duchêne, C.; Bard, S.; Barillot, X.; Ruas, A.; Trevisan, J.; Holzapfel, F. Quantitative and qualitative description of building orientation. In Proceedings of the Fifth Workshop on Progress in Automated Map Generalisation, Paris, France, 28–30 April 2003; ICA, Commission on Map Generalisation: Barcelona, Spain, 2003. [Google Scholar]

Figure 1. Complex difficulties in matching multi-scale building polygons: (a) shape homogenization and non-rigid deviations between building features; (b) 1:M and M:N matching caused by inconsistent levels of detail (LODs).

Figure 2. The relaxation labelling matching model in consideration of individual objects and pattern combinations: (a) graphical representation; (b) matrix representation.

Figure 3. The possible matching combinations of one building object r_i: (a) one building object r_i and its candidate-matching objects t₁…t₄; (b) r_i matching with none; (c) r_i matching with one object; (d) r_i matching with two objects; (e) r_i matching with three objects; (f) r_i matching with four objects.

Figure 4. Aggregating neighboring candidate-matching objects into pattern combinations: (a) aggregating two or more objects based on centroid distances; (b) aggregating two objects based on convex hull approximation.

Figure 5. Calculation of absolute geometric similarity and the relative compatibility coefficient: (a) absolute geometric similarity between candidate-matching pairs; (b) relative compatibility coefficient between neighboring candidate-matching pairs. MER: minimum enclosing rectangle.

Figure 6. Integrating the sub-support indexes of neighboring individual objects and pattern combinations into a total support index: (a) neighboring individual objects and pattern combinations of (r_i, t_j); (b) integrating all sub-support indexes into a total support index.

Figure 7. Matching results comparison of the mock building data: (a) the initial matching pairs based on local geometric similarity; (b) the identified matching pairs by the object-level relaxation labelling method; (c) the identified matching pairs by the pattern-level relaxation labelling method.

Figure 8. Matching results of the Xi’an and Dallas data: (a) the matching results for the Xi’an data with obvious scale differences; (b) the matching results for the Dallas data with uncertain scale differences.

Figure 9. Matching probability change during the relaxation labelling process: (a) probability change of candidate-matching pairs of object #125; (b) probability change of candidate-matching pairs of object #26.

Figure 10. Comparing the probabilities of matching with an individual object and simultaneously matching with multiple objects: (a) object #81 and #17 in R and their candidate-matching objects in T; (b) the matching probability comparison of all candidate-matching pairs of #81; (c) the matching probability comparison of all candidate-matching pairs of #17.

Table 1. Experimental data statistics and parameter settings.

Test Data	R (Scale/Object Number)	T (Scale/Object Number)	T_pos	T_dir	T_area	T_shp	δ
Mock data	Small scale/25	Large scale/42	0.75	0.95	0.8	0.75	5‰
Xi’an data	1: 250,000 (83)	1: 200,000(118)	0.70	0.95	0.65	0.72	5‰
Dallas data	OpenStreetMap/83	Web data/80	0.87	0.93	0.9	0.87	5‰

Table 2. Matching accuracy statistics of precision, recall, and F value.

Data		Precision P	Recall R	F Value
Mock data	Object-level method	100%	68.57%	81.35%
Mock data	The proposed method	100%	100%	100%
Xi’an data		93.56%	91.47%	92.5%
Dallas data		98.12%	95.45%	96.93%

Table 3. Examples of candidate-matching pattern combinations.

Data R	Candidate Correspondences in Data T	Matching Probability
r_A	{t_a}	0.011
	{t_a, t_b}	0.0469
	{t_a, t_b, t_c}	0.138
	{t_a, t_b, t_c, t_d}	0.218
r_B	{t_e}	0.0585
	{t_f}	0.1424
	{t_e, t_f}	0.308
	{t_e, t_f, t_g}	0.051
r_C	{t_e}	0.070
	{t_g}	0.130
	{t_e, t_g}	0.290
	{t_e, t_f, t_g}	0.055

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, Y.; Huang, J.; Deng, M.; Chen, C.; Zhou, F.; Xie, S.; Fang, X. Automated Matching of Multi-Scale Building Data Based on Relaxation Labelling and Pattern Combinations. ISPRS Int. J. Geo-Inf. 2019, 8, 38. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi8010038

AMA Style

Zhang Y, Huang J, Deng M, Chen C, Zhou F, Xie S, Fang X. Automated Matching of Multi-Scale Building Data Based on Relaxation Labelling and Pattern Combinations. ISPRS International Journal of Geo-Information. 2019; 8(1):38. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi8010038

Chicago/Turabian Style

Zhang, Yunfei, Jincai Huang, Min Deng, Chi Chen, Fangbin Zhou, Shuchun Xie, and Xiaoliang Fang. 2019. "Automated Matching of Multi-Scale Building Data Based on Relaxation Labelling and Pattern Combinations" ISPRS International Journal of Geo-Information 8, no. 1: 38. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi8010038

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Automated Matching of Multi-Scale Building Data Based on Relaxation Labelling and Pattern Combinations

Abstract

1. Introduction

2. Literature Review

3. Methodology

3.1. Matching Probability Initialization in Consideration of Multi-Scale Pattern Combinations

3.1.1. Detection of Candidate-Matching Objects and Pattern Combinations

3.1.2. Calculation of Geo-Similarities and Initial Matching Probabilities

3.2. Matching Probability Relaxation Based on Multi-Scale Contextual Information

3.2.1. Definition of the Neighboring Relation and Compatibility Coefficient

3.2.2. Relaxation of the Matching Matrix and Selection of Final Matching Pairs

4. Experiment and Analysis

5. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI