Article

Vectorization of Floor Plans Based on EdgeGAN

1 Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan 528400, China
2 School of Automation, Guangdong University of Technology, Guangzhou 510006, China
* Author to whom correspondence should be addressed.
Submission received: 3 April 2021 / Revised: 5 May 2021 / Accepted: 11 May 2021 / Published: 12 May 2021
(This article belongs to the Special Issue Machine Learning and Accelerator Technology)

Abstract:
A 2D floor plan (FP) often contains structural, decorative, and functional elements and annotations. Vectorization of floor plans (VFP) is an object detection task that involves localizing and recognizing the different structural primitives in a 2D FP; the detection results can be used to generate 3D models directly. The conventional VFP pipeline consists of a series of carefully designed, complex algorithms with insufficient generalization ability and suffers from low computing speed. Considering that VFP is not well suited to deep-learning-based object detection frameworks, this paper proposes a new VFP framework based on a generative adversarial network (GAN). First, a private dataset called ZSCVFP is established. Unlike current public datasets, which contain no more than 5000 black-and-white samples, ZSCVFP contains 10,800 colorful samples disturbed by decorative textures in different styles. Second, a new edge-extracting GAN (EdgeGAN) is designed for this task by innovatively formulating VFP as an image translation task that projects the original 2D FP into a primitive space. The output of EdgeGAN is a primitive feature map (PFM), each channel of which contains only one category of detected primitives in the form of lines. A self-supervising term is introduced into the generative loss of EdgeGAN to ensure the quality of the generated images. EdgeGAN is faster than the conventional and object-detection-framework-based pipelines with minimal performance loss. Lastly, two inspection modules, which are also suitable for conventional pipelines, are proposed to check the connectivity and consistency of the PFM based on the subspace connective graph (SCG). The first module contains four criteria that correspond to the sufficient conditions of a fully connected graph. The second module classifies the category of all subspaces via a single graph neural network (GNN), and the predictions should be consistent with the text annotations in the original FP (if available). Because the GNN treats the adjacency matrix of the SCG as weights directly, it can utilize the global layout information and achieve higher accuracy than other common classifying methods. Experimental results illustrate the efficiency of the proposed EdgeGAN and inspection approaches.

1. Introduction

A 2D floor plan (FP) often contains structural, decorative, and functional elements and annotations. Figure 1 depicts that the vectorization of FP (VFP) aims to detect the different structural primitives in an FP and assemble them into a 2D floor vector graph (FVG) that can be stretched into a 3D model. Manual methods require meticulous measurements; thus, VFP has attracted remarkable attention over the past 20 years [1]. VFP remains challenging because of the diversity of drawing styles and standards.
The conventional pipeline of VFP [2] (Figure 2) relies on a sequence of low-level image processing heuristics. Many researchers have devoted themselves to designing complicated algorithms that parse local geometric constructions and retrieve structural elements based on drawing features and pixel information. Lu et al. proposed a self-incremental axis-net-based hierarchical recognition model to recognize dimensions, coordinate systems, and structural components [3], and integrated architectural information dispersed in multiple drawings and tables under the guidance of semantics and prior domain knowledge [4]. In their later work [5], the concept of primitive recognition and integration was proposed for the first time. Zhu [6] proposed a shape-operation graph to recognize walls and parse the topology of the entire layout based on structural primitives. Jiang [7] focused on the recovery of distortion to obtain exact sizes. Gimenez et al. [8] also discussed methods that can be used to recognize walls, openings, and spaces. Special segmentation and recognition methods for text annotations were proposed to obtain high-level semantic information such as the scale [9], measurements [10], and types of subspaces [11]. Text annotations can be recognized accurately with the development of optical character recognition [12], especially methods based on deep learning (DL) [13].
Artificial neural networks have been applied to VFP with the development of DL. Dodge et al. [14] used a fully convolutional neural network (CNN) to detect structural elements and achieved mean intersection-over-union scores of 89.9% on R-FP and 94.4% on the public CVC-FP dataset. Liu et al. [15] applied CNNs to translate a rasterized image into a set of junctions that represent low-level geometric and semantic information (e.g., wall corners or door endpoints). Moreover, they formulated an integer program to aggregate the junctions into a set of simple primitives (e.g., wall lines, door lines, or icon boxes) and produce an FVG with consistent constraints between topology and geometry. DL-based object detection frameworks can only detect doors and windows because there is no suitable annotation to describe the complex geometrical characteristics of architectural primitives; thus, they can only replace some modules of the conventional pipeline. Faster RCNN [16] and YOLO [17], as well as other anchor-based frameworks, propose numerous boxes and combine them based on intersection over union (IoU). In a PFM, walls are described in the form of lines, and if inflated boxes are used as ground truth, sloping or curved walls cannot be localized accurately. Anchor-free frameworks, for instance CenterNet [18] and CornerNet [19], cannot solve this problem either. Subspace segmentation is a typical semantic segmentation task, which can be achieved by a U-Net [20] or a generative adversarial network (GAN) [21] in an end-to-end manner. Owing to the lack of a large-scale segmentation dataset, only one study has exploited this method, on the mixed dataset PYTH [22], most samples of which are not public. Therefore, this study develops a special edge extraction GAN (EdgeGAN) to detect architectural primitives, which is a compromise between the two approaches.
GAN, a new learning framework for generative models, has drawn great attention since it was proposed by Goodfellow et al. [21] in 2014. GAN has sprouted many branches, including conditional GAN [23,24], Wasserstein GAN [25,26], and pix2pix [27], and has been used successfully in image translation, style migration, denoising, super-resolution and repair, image matting, semantic segmentation, and dataset expansion [28,29]. GAN provides a general-purpose solution for translating an input image into a corresponding output image with the same setting, mapping pixels to pixels.
One important milestone of GAN for image translation is pix2pix, introduced by Isola et al. [27], which is developed from conditional GAN [24]. The most common generator architecture is the encoder–decoder or its improved version, "U-Net", with skip connections between mirrored layers in the encoder and decoder stacks [20]. Wang et al. [30] expanded pix2pix to high-resolution image synthesis and semantic manipulation by introducing a new robust adversarial learning objective together with new multiscale generator and discriminator architectures. In another work by Wang et al. [31], a video-to-video translation framework with a spatial–temporal adversarial objective achieved high-resolution, photorealistic, and temporally coherent video results on a diverse set of input formats, including segmentation masks, sketches, and poses.
CycleGAN is another important milestone, for unpaired image-to-image translation [32]. Two independent works proposed the same method inspired by different motivations, namely DualGAN [33] and DiscoGAN [34]. Pix2pix learns the forward mapping (i.e., $y = G(x)$), whereas CycleGAN learns two cycle mappings (i.e., $x = F(y) = F(G(x))$ and $y = G(x) = G(F(y))$) with the input $x$ and output $y$ unpaired. Considering that pixel-level annotation is impossible for most tasks, CycleGAN has a wider range of applications, although it requires more training samples.
In this work, a new VFP framework is proposed based on pix2pix. The main contributions of this work are presented as follows:
(1)
A colorful and larger dataset called ZSCVFP is established. Unlike current public datasets, which only contain black-and-white FPs without decorative disturbance or style variation, such as CVC-FP [14] and CubiCasa5K [35], the FPs in ZSCVFP are drawn with decorative disturbance in different styles, thereby making the extraction of primitives more difficult. The ground-truth annotations in the form of points and lines, together with the corresponding images, are provided. Furthermore, ZSCVFP has a total of 10,800 samples, which is far more than the 121 and 5000 samples of CVC-FP and CubiCasa5K, respectively.
(2)
VFP is innovatively formulated as an image translation task, and EdgeGAN, based on pix2pix, is designed for the new task. EdgeGAN projects the FPs into the primitive space. Each channel of the primitive feature map (PFM) only contains lines that represent one category of primitives. A self-supervising term is added to the generative loss of EdgeGAN to enhance the quality of the PFM. Unlike conventional pipelines (even when some modules are replaced with deep-learning methods), which consist of a series of carefully designed algorithms, EdgeGAN obtains the FVG in an end-to-end manner and is about 15 times as fast as the conventional pipeline. To the best of the authors' knowledge, this study is the first to apply GAN to VFP.
(3)
Four criteria, which are sufficient conditions for a fully connected graph, are given to inspect the connectivity of subspaces segmented from the PFM. The connective inspection can provide auxiliary information for the designers to adjust the FVG.
(4)
The graph neural network (GNN) is used to predict the categories of subspaces segmented from the PFM. Given that the GNN treats the adjacency matrix of the connective graph as weights directly, it can utilize global layout information and achieve higher accuracy than other common classifying methods.
This work is organized as follows. Section 2 establishes the ZSCVFP dataset and introduces the goal of the new VFP framework. Section 3 presents the main algorithms. Section 4 provides the experimental results. Finally, Section 5 draws conclusions.

2. Problem Description

In this section, the ZSCVFP dataset and the goal of the new VFP framework are introduced.

Framework Based on EdgeGAN

As mentioned, current public datasets are all black and white without decorative disturbance. However, the original FPs provided by customers in practical applications are complex and diverse. Thus, the new dataset ZSCVFP is established for this reason. ZSCVFP contains 8800 FPs in the training set and 2000 FPs in the test set. For a given FP $X \in \mathbb{R}^{w \times h \times 3}$, where $w$ and $h$ are the width and height, respectively, the pseudo-annotations of walls, windows, and doors are given in the form of a point set $P = \{p_1, p_2, \ldots\}$ and three line sets $L_{wall} = \{w_1, w_2, \ldots\}$, $L_{window} = \{v_1, v_2, \ldots\}$, and $L_{door} = \{d_1, d_2, \ldots\}$, respectively. The elements of $L_{wall}$, $L_{window}$, and $L_{door}$ are paired points from $P$. The corresponding PFM $Z \in \mathbb{R}^{w \times h \times 3}$ is also provided in the dataset, as shown in the center subfigure of Figure 1.
The walls' annotations are obtained by a conventional pipeline developed in our previous work. The doors and windows are annotated manually with a tool (Figure 3). When the annotations are inconsistent, the windows and doors are adjusted according to the walls to preserve the geometrical constraints on the primitives. This adjustment slightly reduces the accuracy of the annotations.
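For concreteness, a single ZSCVFP annotation can be viewed as a point set plus index pairs into that set, as in the illustrative sketch below; the field names and the exact layout are assumptions rather than the dataset's actual schema.

```python
# Hypothetical annotation record for one FP (coordinates in pixels).
# Each line primitive is a pair of indices into the point set P,
# matching the (P, L_wall, L_window, L_door) formulation above.
annotation = {
    "image": "fp_000123.png",                                   # assumed file name
    "width": 1024,
    "height": 768,
    "points": [(102, 87), (640, 87), (640, 512), (102, 512)],   # point set P
    "wall":   [(0, 1), (1, 2), (2, 3), (3, 0)],                 # L_wall
    "window": [(1, 2)],                                         # L_window
    "door":   [(3, 0)],                                         # L_door
}
```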
In the new framework based on EdgeGAN, the generated PFM is denoted as $Y = G_1(X) \in \mathbb{R}^{w \times h \times n_c}$, where $n_c$ is the number of categories of primitives to be recognized. For the ZSCVFP dataset, $n_c = 3$. Each channel of $Y$ is a binary image that corresponds to one primitive category. The final goal of the task, which is to extract $H = (P, L_{wall}, L_{window}, L_{door})$ from $Y$, is very easy if the quality of $Y$ is good enough.
The set of text annotations detected in $X$ is denoted as $T = \{t_1, t_2, \ldots\}$, and the set of subspaces extracted from $Y$ is denoted as $S = \{s_1, s_2, \ldots, s_{n-1}, s_n\}$. For each subspace $s_i$, the feature vector consists of the number of windows, the number of doors, the ratio of area, etc. The feature matrix of $S$ is denoted as $X_G \in \mathbb{R}^{n \times m}$, where $m$ is the length of the feature vector and $n$ is the number of subspaces. The probability matrix predicted by a GNN $G_2$ is denoted as $C = G_2(X_G) \in \mathbb{R}^{n \times n_s}$, where $n_s$ is the number of classes.
The formal representation of the new task’s goal can be summarized as follows:
(1)
Design a generator $G_1$ to obtain a PFM that is robust to decorative disturbances in various styles;
(2)
Search for efficient criteria to inspect whether S is fully connected;
(3)
Design a GNN $G_2$ to predict the categories of the subspaces.

3. Methods

In this section, the EdgeGAN is designed first. Then, the SCG of VFP is defined, and some connective criteria are given based on it. Lastly, a classifying GNN for subspaces is presented.

3.1. EdgeGAN

EdgeGAN learns a mapping from the input FP $X$ to the generated PFM $Y = G_1(X)$, with $Z$ denoting the ground-truth PFM. The architecture of EdgeGAN is depicted in Figure 4. Two convolution layers, six ResNet blocks, and two deconvolution layers are connected in series with skip connections, which is a typical realization of U-Net [20] that has been used widely [27].
Two special kernels are defined as
$$K_1 = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad K_2 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$
The generative loss function of EdgeGAN is defined as
$$l_G = \frac{1}{N}\sum_{i=1}^{N}\Big[\,\underbrace{-\log D(X, Y)}_{G\_BCE\_loss} + \lambda_1 \underbrace{\left| Z - Y \right|}_{G\_L1\_loss} + \lambda_2 \underbrace{\left| Y - F(Y) \right|}_{G\_filter\_loss}\Big],$$
and the discriminative loss function is defined as
$$l_D = \frac{1}{N}\sum_{i=1}^{N}\Big[\,\underbrace{-\log\big(1 - D(X, Y)\big)}_{D\_fake\_loss} \; \underbrace{-\, \log D(X, Z)}_{D\_real\_loss}\Big],$$
where $N$ is the batch size and $F(Y)$ is a filter function defined as
$$F(Y) = \mathrm{clip}\big(\mathrm{maxpooling2D}(Y, K_1) + \mathrm{maxpooling2D}(Y, K_2),\; 0,\; 1\big).$$
In the loss functions, $G\_BCE\_loss$, $D\_fake\_loss$, and $D\_real\_loss$ are all binary cross-entropy (BCE) losses; $G\_L1\_loss$ and $G\_filter\_loss$ are L1 losses; and $\lambda_1$ and $\lambda_2$ are their weights. The three BCE terms, which constitute the standard GAN loss and are designed for the maximin optimization problem
$$\min_G \max_D \left\{ \mathbb{E}_{Z \sim P(Z)}\big[\log D(Z)\big] + \mathbb{E}_{X \sim P(X)}\big[\log\big(1 - D(G(X))\big)\big] \right\},$$
guide the generator $G_1$ to generate a better PFM $Y$ and the discriminator to recognize the difference between the distribution of $Y$ and that of the ground truth $Z$. Additionally, $G\_L1\_loss$ provides the pixel-level supervision that suits a pix2pix task. $G\_filter\_loss$ is a new term that composes a self-supervised loss on $Y$. In $F(Y)$, $\mathrm{maxpooling2D}(A, K)$ denotes a max-pooling operation with kernel $K$ on the input multichannel image $A$. With the two special kernels $K_1$ and $K_2$, $\mathrm{maxpooling2D}$ extracts the horizontal and vertical lines, respectively, as illustrated in Figure 5b,c. The horizontal and vertical maps are then added. Because the elements of $\mathrm{maxpooling2D}(Y, K_1) + \mathrm{maxpooling2D}(Y, K_2)$ at line intersections can be greater than 1, a clip function is designed to truncate them. With the clip function $\mathrm{clip}(A, a, b)$, elements of $A$ smaller than $a$ become $a$, and elements larger than $b$ become $b$. The clip operation keeps the filtered PFM a valid probability map. The adding and clipping operations combine these lines into a new PFM in which many isolated points have been filtered out, as illustrated in Figure 5d.
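The following PyTorch snippet is a minimal sketch of one possible reading of $F(Y)$, in which $\mathrm{maxpooling2D}(\cdot, K)$ is taken as a max over the support of the kernel, so $K_1$ and $K_2$ correspond to a 1×3 and a 3×1 pooling window, respectively; the function name and tensor layout are assumptions.

```python
import torch
import torch.nn.functional as F

def filter_pfm(y: torch.Tensor) -> torch.Tensor:
    """Sketch of F(Y) for a generated PFM y of shape (N, n_c, H, W) in [0, 1]."""
    # maxpooling2D(Y, K1): max over the horizontal 1x3 support of K1.
    horizontal = F.max_pool2d(y, kernel_size=(1, 3), stride=1, padding=(0, 1))
    # maxpooling2D(Y, K2): max over the vertical 3x1 support of K2.
    vertical = F.max_pool2d(y, kernel_size=(3, 1), stride=1, padding=(1, 0))
    # At line intersections the sum can exceed 1, so clip back to [0, 1]
    # to keep the filtered PFM a valid probability map.
    return torch.clamp(horizontal + vertical, 0.0, 1.0)
```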
With the self-supervised loss, the generator learns to generate PFMs of higher quality. However, because $K_1$ and $K_2$ are designed for horizontal and vertical lines, the filter does not work for irregular (sloping or curved) walls.
In each training batch, the generator and discriminator are updated alternately. $\lambda_2$ is set to 0 in the first several epochs so that $G\_L1\_loss$ plays the leading role in the initial stage of training. Once the PFM can be generated roughly, the self-supervising loss gradually comes into play.
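A sketch of one training step under these settings is given below. The generator G, the discriminator D (assumed to output probabilities in [0, 1]), the optimizers, and the 10-epoch warm-up threshold follow the description above but are otherwise illustrative; filter_pfm refers to the filter sketched earlier.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, x, z_gt, epoch, lambda1=10.0, lambda2=100.0):
    """One alternating update; x is the input FP, z_gt the ground-truth PFM."""
    # Warm-up: the self-supervising term is disabled in the first epochs so that
    # G_L1_loss plays the leading role until the PFM can be generated roughly.
    lam2 = 0.0 if epoch < 10 else lambda2

    # ---- discriminator update (D_real_loss + D_fake_loss) ----
    y_fake = G(x).detach()
    d_real, d_fake = D(x, z_gt), D(x, y_fake)
    loss_D = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # ---- generator update (G_BCE_loss + lambda1*G_L1_loss + lambda2*G_filter_loss) ----
    y = G(x)
    d_out = D(x, y)
    g_bce = F.binary_cross_entropy(d_out, torch.ones_like(d_out))
    g_l1 = torch.mean(torch.abs(z_gt - y))
    g_filter = torch.mean(torch.abs(y - filter_pfm(y)))   # filter_pfm: previous sketch
    loss_G = g_bce + lambda1 * g_l1 + lam2 * g_filter
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_G.item(), loss_D.item()
```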

3.2. Criteria for Connective Inspection

The set of subspaces extracted from a vector graph is denoted as $S = \{s_1, s_2, \ldots, s_{n-1}, s_n\}$, where $s_i$, $i = 1, 2, \ldots, n-1$, are the internal subspaces and $s_n$ is the subspace outside the external contour, as shown in Figure 6 and Figure 7. Because the regions annotated with "AC" in Figure 6 are spaces for outdoor air-conditioner units, they are ignored in Figure 7 and Figure 8. The undirected graph of $S$ is written as $H = \{S, D, W\}$, where $D \subseteq S \times S$ and $W \subseteq S \times S$ are the sets of connected subspace pairs: $(i, j) \in D$ and $(j, i) \in D$ if subspaces $i$ and $j$ are connected by a door, and $(i, j) \in W$ and $(j, i) \in W$ if subspaces $i$ and $j$ are connected by a window. The adjacency matrix is denoted as $M_H \in \mathbb{R}^{n \times n}$. The elements $m_{ij}^H$, $1 \le i, j \le n$, of $M_H$ have the following properties:
(1)
$m_{ij}^H = 1$ if $(i, j) \in D$; $m_{ij}^H = 0.5$ if $(i, j) \in W$; otherwise, $m_{ij}^H = 0$;
(2)
$m_{ii}^H = 1$;
(3)
$m_{ij}^H = m_{ji}^H$, that is, $M_H$ is symmetric.
The subgraph without windows and its adjacency matrix are denoted as $G = \{S, D\}$ and $M_G \in \mathbb{R}^{n \times n}$, respectively. The elements satisfy $m_{ij}^G = 1$ if $m_{ij}^H = 1$; otherwise, $m_{ij}^G = 0$. The Laplacian matrix of $G$ is defined as $L_G = \mathrm{diag}\{\sum_{j=1, j \neq i}^{n} m_{ij}^G\} - M_G$, and its eigenvalues are denoted as $\lambda_1(L_G) \le \lambda_2(L_G) \le \cdots \le \lambda_n(L_G)$. If $\lambda_2(L_G) > 0$, then $G$ is a connected graph.
The degrees of internal and external connectivity of each subspace are denoted as $C_i^{inner} = \sum_{j=1}^{n-1} m_{ij}^G$ and $C_i^{external} = m_{in}^G$, respectively. The criteria for the inspection of connectivity include the following:
(1)
There is at least one door on the external contour, i.e., $\sum_{i=1}^{n-1} C_i^{external} \ge 1$;
(2)
The number of doors on the external contour is usually not more than two, i.e., $\sum_{i=1}^{n-1} C_i^{external} \le 2$;
(3)
Each subspace, except those with special architectural functionality (for example, regions for air conditioners and pipes), has at least one door, that is, $C_i^{inner} \ge 1$ and $\min_{i}\{\max_{j}\{m_{ij}^G\}\} \ge 1$, where $i, j = 1, 2, \ldots, n-1$;
(4)
$G$ is a connected graph, that is, $\lambda_2(L_G) > 0$.
All four criteria are sufficient conditions for a fully connected graph. Furthermore, Criterion (4) is a sufficient condition for Criteria (1)–(3), but its computation is more complicated than that of the other criteria.
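For illustration, the four criteria can be evaluated directly from $M_G$ as in the NumPy sketch below; it uses the standard graph Laplacian computed after removing the self-loops, checks Criterion (3) for every internal subspace without the special-functionality exemption, and its function name and tolerance are assumptions.

```python
import numpy as np

def check_connectivity(M_G: np.ndarray) -> dict:
    """Evaluate Criteria (1)-(4) for the door-only adjacency matrix M_G
    (n x n, m_ii = 1, index n-1 = subspace outside the external contour)."""
    n = M_G.shape[0]
    A = M_G - np.eye(n)                        # drop the self-loops
    C_external = A[:n - 1, n - 1]              # doors to the outside subspace s_n
    L = np.diag(A.sum(axis=1)) - A             # standard graph Laplacian of G
    lam2 = np.sort(np.linalg.eigvalsh(L))[1]   # algebraic connectivity
    return {
        "c1_entrance_exists": bool(C_external.sum() >= 1),
        "c2_at_most_two_entrances": bool(C_external.sum() <= 2),
        "c3_every_subspace_has_a_door": bool(np.all(A[:n - 1, :].sum(axis=1) >= 1)),
        "c4_connected": bool(lam2 > 1e-9),
    }
```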

3.3. Classifying of Subspaces Based on GNN

A GNN with K layers is defined as
$$H^{(k+1)} = \sigma\big(M_H H^{(k)} W^{(k)}\big),$$
where $k = 0, 1, \ldots, K-1$ is the layer index, $W^{(k)} \in \mathbb{R}^{d_k \times d_{k+1}}$ contains the weight parameters to be learned, $d_k$ is the output dimension of the $k$th layer of the GNN, and $\sigma(\cdot)$ is the activation function.
The input of the GNN is the feature matrix $X_G \in \mathbb{R}^{n \times m}$ of $G$, and the output is the classification probability matrix $C_G \in \mathbb{R}^{n \times n_s}$, where $m$ is the length of the feature vector, $n$ is the number of subspaces, and $n_s$ is the number of categories. The input dimension of the first layer is $d_0 = m$, and the last output is $H^{(K)} = C_G$ with $d_K = n_s$.
The BCE loss function adopted to train the GNN is as follows:
$$l_{G_2} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\bar{C}_G \log H^{(K)} + \big(1 - \bar{C}_G\big)\log\big(1 - H^{(K)}\big)\Big],$$
where $\bar{C}_G$ is the one-hot labeled category. Considering that the number of subspaces varies across FPs, $M_G$ is expanded to $\bar{M}_G \in \mathbb{R}^{20 \times 20}$ with $\bar{M}_G = \mathrm{diag}\{M_G, I_{20-n}\}$, and $X_G$ is expanded to $\bar{X}_G$ with $\bar{X}_G = \mathrm{diag}\{X_G, 0_{20-n}\}$. The output dimension of the last layer becomes $d_K + 1$, and the label vector becomes $\bar{C}_G = \begin{bmatrix} C_G \\ d_K \cdot 1_{(20-n) \times 1} \end{bmatrix}$. The labels of the subspaces are coded from 0 to $d_K - 1$; thus, the new virtual subspaces are labeled with $d_K$.
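A minimal PyTorch sketch of such a GNN is given below; only the propagation rule $H^{(k+1)} = \sigma(M_H H^{(k)} W^{(k)})$ is taken from the text, while the hidden sizes, the ReLU/sigmoid choices, and the names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SubspaceGNN(nn.Module):
    """Each layer computes H^(k+1) = sigma(M_H @ H^(k) @ W^(k))."""
    def __init__(self, dims=(5, 32, 32, 8)):   # m = 5 features -> n_s + 1 classes (assumed)
        super().__init__()
        self.weights = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(dims[k], dims[k + 1]))
             for k in range(len(dims) - 1)])

    def forward(self, M_H: torch.Tensor, X_G: torch.Tensor) -> torch.Tensor:
        H = X_G                                   # (n, m) subspace features
        for k, W in enumerate(self.weights):
            H = M_H @ H @ W                       # propagate over the connective graph
            H = torch.relu(H) if k < len(self.weights) - 1 else torch.sigmoid(H)
        return H                                  # (n, n_s + 1) class probabilities

# Usage with the padded matrices described above (shapes 20 x 20 and 20 x 5):
# probs = SubspaceGNN()(M_H_padded, X_G_padded)
```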

4. Experimental Results and Discussion

In this section, three experiments are conducted to illustrate the proposed methods. First, EdgeGAN is compared with the DL-based pipeline on the ZSCVFP dataset. Second, the usage of the connective criteria is demonstrated with an example. Lastly, the GNN is compared with four common classifying methods to validate its advantage in terms of structural information.

4.1. EdgeGAN

In this experiment, all training runs are executed on the hardware platform "CPU Intel Core i9-9900K, 64 GB memory, and GPU NVIDIA RTX2080TI×2," and the software is "Python 3.6, Pytorch 1.4.0 [36], Cuda 10.0, and Cudnn 7.4.2 [37]." The maximal training epoch is 220, and the batch size is 128. $\lambda_1$ is always set to 10, and $\lambda_2$ is set to 0 in the first 10 epochs and 100 in the subsequent epochs. The learning rate is set to 0.0002 for the first 20 epochs and then decreased linearly to 0 in the subsequent epochs. The training curves are recorded in Figure 8. $G\_filter\_loss$ is 0 in the first 10 epochs and decreases gradually thereafter. $G\_L1\_loss$ stays stable at approximately 1.38 after the 20th epoch; thus, it is not a suitable measure of accuracy. The corresponding evolutionary process of $Y$ is depicted in Figure 9.
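The learning-rate schedule described above (0.0002 for the first 20 epochs, then a linear decay to 0 by epoch 220) can be expressed with a LambdaLR scheduler, as in the sketch below; the use of Adam and its betas is an assumption, not a setting reported here.

```python
import torch

# Placeholder module standing in for the EdgeGAN generator G_1.
G = torch.nn.Conv2d(3, 3, 3)

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
sched_G = torch.optim.lr_scheduler.LambdaLR(
    opt_G,
    # Multiplicative factor on the base lr: 1.0 for epochs 0-19,
    # then decayed linearly so that it reaches 0 at epoch 220.
    lr_lambda=lambda epoch: 1.0 if epoch < 20 else max(0.0, (220 - epoch) / 200.0))
```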
The quality of generated images can be divided into three levels:
(1)
Level 1: The generated images are free of noisy points and have high-quality lines, and the recognition accuracy of the primitives is close to that of the conventional pipeline. The proportion of Level 1 is approximately 40%. These images can be used to obtain vector graphics with a few manual adjustments, similar to the conventional pipeline. Figure 10 compares the numbers of adjusting operations counted by a decoration designer on 100 FPs with Level 1 results. Although the results of EdgeGAN satisfy the requirements of the application, its performance is still slightly weaker than that of the DL-based pipeline. The mean number of operations of the DL-based pipeline (16.50) is close to that of EdgeGAN (16.67). However, the standard deviation of EdgeGAN (8.34) is much larger than that of the DL-based pipeline (4.4628), which means that the latter is more stable. Moreover, 30 PFMs generated by the DL-based pipeline need fewer than eight operations, whereas only 21 PFMs generated by EdgeGAN do, which means that the former has a higher rate of excellence. Considering that the pseudo-ground-truth annotations are themselves obtained with the conventional pipeline and suffer from inaccuracy, these results are reasonable. The performance of EdgeGAN can be improved if it is trained on a larger and higher-quality dataset.
(2)
Level 2: In addition to inaccurate primitives, some noisy points, broken lines, redundant lines, or unaligned lines are present in the generated images, as shown by the lines in the main body of Figure 11. The proportion of Level 2 is approximately 55%. The self-supervising loss can relieve but cannot eliminate this phenomenon, so some postprocessing methods are necessary to address these problems. Solving this problem within EdgeGAN itself would be more direct but remains challenging.
(3)
Level 3: Serious defects in quality or accuracy, with a proportion of approximately 5%, are observed for the sloping walls in Figure 11. The reason is that the number of samples with sloping walls is less than 100, which is far fewer than the number of samples with horizontal and vertical walls.
On a single RTX2080TI, the frame rate of EdgeGAN and its postprocessing is approximately 32 fps, whereas the frame rate of the DL-based pipeline on an Intel Core i9-9900K CPU is approximately 2 fps. Although EdgeGAN can obtain the PFM at a much higher speed, a gap still exists between the overall accuracy and quality of the generated images and the requirements of applications.

4.2. Connectivity of Subspaces

The adjacency matrix of the vector graph in Figure 6 is as follows. Notably, subspaces 1, 2, and 4 are ignored.
$$M_G = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1
\end{bmatrix}.$$
Thus, the Laplacian matrix is
$$L_G = \begin{bmatrix}
-1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0
\end{bmatrix}.$$
The graph is not connected because the condition $\lambda_2(L_G) > 0$ is not satisfied. $M_G$ also shows the presence of five disconnected components. The other criteria can be calculated easily from $M_G$ as well.
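The same conclusion can be reproduced numerically; the snippet below rebuilds the example adjacency matrix with 0-based indices and checks Criterion (4) with the standard graph Laplacian.

```python
import numpy as np

# Adjacency matrix of the example in Figure 6 (nine remaining subspaces).
M_G = np.eye(9)
for i, j in [(3, 4), (3, 5), (6, 7), (6, 8)]:     # door connections
    M_G[i, j] = M_G[j, i] = 1

A = M_G - np.eye(9)                    # drop self-loops
L = np.diag(A.sum(axis=1)) - A         # standard graph Laplacian
eig = np.sort(np.linalg.eigvalsh(L))
print(eig[1] > 1e-9)                   # False: Criterion (4) fails
print(int((eig < 1e-9).sum()))         # 5 -> five disconnected components
```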

4.3. Classifying of Subspaces Based on GNN

A new dataset that contains feature matrices annotated with subspace types is established to validate the advantage of the GNN. The distributions of instances in the dataset are listed in Table 1. The features used here are the window ratio, area ratio, number of doors, number of windows, and number of edges. Four widely used methods [38], namely, C4.5, Iterative Dichotomiser 3 (ID3), a basic backpropagation (BP) neural network, and classification and regression tree (CART), are compared with the GNN. The input of these four methods is the feature vector of a single subspace, which means that they can only predict the type of one subspace independently. The input dimension of the BP network with one hidden layer is 5, the output dimension is 7, and the number of neurons in the hidden layer is 20. Part of the decision tree obtained by CART is shown in Figure 12.
Only the GNN considers the connective graph, and it achieves higher accuracy than the other methods. The results are listed in Table 2. The confusion matrices of CART and the GNN are depicted in Figure 13 and Figure 14, respectively. The accuracies for the study room and the kitchen are improved dramatically.

5. Conclusions

EdgeGAN generates the PFM in an end-to-end manner at a frame rate of 32 fps on an RTX2080TI GPU, which is much faster than the DL-based pipeline's 2 fps because many modules of that pipeline can only run on a CPU. Although the accuracy of EdgeGAN is slightly lower than that of the DL-based pipeline, especially on sloping walls, its potential can be further exploited given a larger and higher-quality training set. Four connective criteria are proposed to inspect the connectivity of the subspaces segmented from one FP. These criteria are also suitable for postprocessing the results of traditional methods and object detection frameworks. The GNN utilizes the connectivity information to predict the categories of subspaces and achieves 4.69% higher accuracy than the other classification approaches. The category information of the subspaces can be checked against the descriptive text annotations of the FP.
In this study, PFM generation and subspace segmentation are performed separately; the computing speed and performance can be improved further if they are realized in an end-to-end manner based on a one-stage framework. Thus, we will develop a one-stage multitask framework that performs primitive detection, subspace segmentation, optical character recognition, and consistency inspection simultaneously in a future study. Furthermore, to improve the quality of the PFM for irregular walls, deep active contour methods, such as deep snake [39] and deep level set loss [40], will also be exploited.

Author Contributions

Conceptualization, data curation, K.Z.; methodology, S.D.; project administration, W.L.; software, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Guangdong Basic and Applied Basic Research Projects (2019A1515111082, 2020A1515110504), Fund for High-Level Talents Afforded by University of Electronic Science and Technology of China, Zhongshan Institute (417YKQ12, 419YKQN15), Social Welfare Major Project of Zhongshan (2019B2010, 2019B2011), Achievement Cultivation Project of Zhongshan Industrial Technology Research Institute (419N26), and Young Innovative Talents Project of Education Department of Guangdong Province (419YIY04).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

All abbreviations used in this work are listed below.
FP: floor plan
VFP: vectorization of floor plans
FVG: floor vector graph
PFM: primitive feature map
SCG: subspace connective graph
GAN: generative adversarial network
GNN: graph neural network
EdgeGAN: edge extraction GAN
ZSCVFP: the private dataset established in this work

References

  1. Lewis, R.; Séquin, C. Generation of 3D building models from 2D architectural plans. Comput. Aided Des. 1998, 30, 765–779. [Google Scholar] [CrossRef]
  2. Gimenez, L.; Hippolyte, J.-L.; Robert, S.; Suard, F.; Zreik, K. Review: Reconstruction of 3D building information models from 2D scanned plans. J. Build. Eng. 2015, 2, 24–35. [Google Scholar] [CrossRef]
  3. Lu, T.; Tai, C.-L.; Su, F.; Cai, S. A new recognition model for electronic architectural drawings. Comput. Aided Des. 2005, 37, 1053–1069. [Google Scholar] [CrossRef]
  4. Lu, T.; Tai, C.-L.; Bao, L.; Su, F.; Cai, S. 3D Reconstruction of Detailed Buildings from Architectural Drawings. Comput. Aided Des. Appl. 2005, 2, 527–536. [Google Scholar] [CrossRef]
  5. Lu, T.; Yang, H.; Yang, R.; Cai, S. Automatic analysis and integration of architectural drawings. Int. J. Doc. Anal. Recognit. 2006, 9, 31–47. [Google Scholar] [CrossRef]
  6. Zhu, J. Research on 3D Building Reconstruction from 2D Vector Floor Plan Based on Structural Components Recognition. Master’s Thesis, Tsinghua University, Beijing, China, 2013. [Google Scholar]
  7. Jiang, Z. Research on Floorplan Image Recognition Based on Shape and Edge Features. Master’s Thesis, Harbin Institute of Technology, Harbin, China, 2016. [Google Scholar]
  8. Gimenez, L.; Robert, S.; Suard, F.; Zreik, K. Automatic reconstruction of 3D building models from scanned 2D floor plans. Autom. Constr. 2016, 63, 48–56. [Google Scholar] [CrossRef]
  9. Tombre, K.; Tabbone, S.; Pelissier, L.; Lamiroy, B.; Dosch, P. Text/Graphics Separation Revisited. In International Workshop on Document Analysis Systems; Springer: Berlin/Heidelberg, Germany, 2002; pp. 200–211. [Google Scholar]
  10. Ahmed, S.; Weber, M.; Liwicki, M.; Dengel, A. Text/Graphics Segmentation in Architectural Floor Plans. In Proceedings of the 2011 International Conference on Document Analysis and Recognition, Beijing, China, 18–21 September 2011; pp. 734–738. [Google Scholar]
  11. Ahmed, S.; Liwicki, M.; Weber, M.; Dengel, A. Automatic Room Detection and Room Labeling from Architectural Floor Plans. In Proceedings of the 10th IAPR International Workshop on Document Analysis Systems, Gold Coast, Australia, 27–29 March 2012; pp. 339–343. [Google Scholar]
  12. Smith, R. An overview of the Tesseract OCR engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; Volume 2, pp. 629–633. [Google Scholar]
  13. Long, S.; He, X.; Yao, C. Scene Text Detection and Recognition: The Deep Learning Era. Int. J. Comput. Vis. 2021, 129, 161–184. [Google Scholar] [CrossRef]
  14. Dodge, S.; Xu, J.; Stenger, B. Parsing floor plan images. In Proceedings of the 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), Nagoya, Japan, 8–12 May 2017; pp. 358–361. [Google Scholar]
  15. Liu, C.; Wu, J.; Kohli, P.; Furukawa, Y. Raster-to-Vector: Revisiting Floorplan Transformation. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2214–2222. [Google Scholar]
  16. Ren, S.; He, K.; Girshick, R.; Jian, S. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 1137–1149. [Google Scholar] [CrossRef] [Green Version]
  17. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934v1. [Google Scholar]
  18. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Object Detection with Keypoint Triplets. arXiv 2019, arXiv:1904.08189v1. [Google Scholar]
  19. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. arXiv 2019, arXiv:1808.01244v2. [Google Scholar]
  20. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Cham, Switzerland, 5–9 October 2015; pp. 234–241. [Google Scholar]
  21. Goodfellow, I.J.; Pouget-abadie, J.; Mirza, M.; Xu, B.; Warde-farley, D. Generative Adversarial Nets. arXiv 2014, arXiv:1406.2661v1. [Google Scholar]
  22. Sandelin, F. Semantic and Instance Segmentation of Room Features in Floor Plans Using Mask R-CNN. Master’s Thesis, Uppsala University, Uppsala, Sweden, 2019. [Google Scholar]
  23. Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784v1. [Google Scholar]
  24. Odena, A.; Olah, C.; Shlens, J. Conditional Image Synthesis with Auxiliary Classifier GANs. In Proceedings of the International Conference on Machine Learning, ICML 2017, Sydney, Australia, 6–11 August 2017; Volume 6, pp. 4043–4055. [Google Scholar]
  25. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875v3. [Google Scholar]
  26. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved Training of Wasserstein GANs. arXiv 2017, arXiv:1704.00028v3. [Google Scholar]
  27. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976. [Google Scholar]
  28. Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative Adversarial Networks: An Overview. IEEE Signal Process. Mag. 2018, 35, 53–65. [Google Scholar] [CrossRef] [Green Version]
  29. Hong, Y.; Hwang, U.; Yoo, J.; Yoon, S. How Generative Adversarial Networks and Their Variants Work. ACM Comput. Surv. 2019, 52, 1–43. [Google Scholar] [CrossRef] [Green Version]
  30. Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 8798–8807. [Google Scholar]
  31. Wang, T.; Liu, M.; Zhu, J.; Liu, G.; Tao, A.; Kautz, J.; Catanzaro, B. Video-to-Video Synthesis. arXiv 2018, arXiv:1808.06601v2. [Google Scholar]
  32. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar]
  33. Yi, Z.; Zhang, H.; Tan, P.; Gong, M. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2868–2876. [Google Scholar]
  34. Kim, T.; Cha, M.; Kim, H.; Kwon, J.; Jiwon, L. Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. arXiv 2017, arXiv:1703.05192. [Google Scholar]
  35. Kalervo, A.; Ylioinas, J.; Häikiö, M.; Karhu, A.; Kannala, J. CubiCasa5K: A Dataset and an Improved Multi-task Model for Floorplan Image Analysis. arXiv 2019, arXiv:1904.01920. [Google Scholar]
  36. Facebook. Available online: https://Pytorch.Org/ (accessed on 13 October 2020).
  37. Nvidia. Available online: https://Developer.Nvidia.Com/Zh-Cn/Cuda-Toolkit (accessed on 25 June 2020).
  38. Li, H. Statistical Learning Method; Tsinghua Press: Beijing, China, 2019. [Google Scholar]
  39. Zambaldi, V.; Raposo, D.; Santoro, A.; Bapst, V. Relational Deep Reinforcement Learning. arXiv 2018, arXiv:1806.01830v2. [Google Scholar]
  40. Kim, Y.; Kim, S.; Kim, T.; Kim, C. CNN-Based Semantic Segmentation Using Level Set Loss. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 8–10 January 2019; pp. 1752–1760. [Google Scholar]
Figure 1. Reconstructing the 3D model from a 2D floor plan.
Figure 2. Conventional pipeline of VFP.
Figure 3. The annotation tool for primitives.
Figure 4. Architecture of EdgeGAN.
Figure 5. The self-supervising filter of EdgeGAN.
Figure 6. Segmentation of FP.
Figure 7. Subspace connective graph.
Figure 8. The curve of loss.
Figure 9. Generated images in epoch 10, 60, 110, 160, 210.
Figure 10. Comparison between conventional pipeline and EdgeGAN.
Figure 11. The undetected sloping walls.
Figure 12. Decision tree of CART.
Figure 13. Confusion matrix of CART.
Figure 14. Confusion matrix of GNN.
Table 1. Number of instances in the dataset.

Subspace Type      Training Set    Test Set
master bedroom     809             200
balcony            1242            315
bathroom           1143            287
study room         174             46
living room        809             200
second bedroom     2358            587
kitchen            805             200
Table 2. Accuracy of subspace decision.

Method      C4.5      ID3       BP        CART      GNN
Accuracy    74.82%    75.49%    79.13%    79.66%    84.35%

Share and Cite

MDPI and ACS Style

Dong, S.; Wang, W.; Li, W.; Zou, K. Vectorization of Floor Plans Based on EdgeGAN. Information 2021, 12, 206. https://0-doi-org.brum.beds.ac.uk/10.3390/info12050206
