Road extraction is widely used in urban applications such as road map updating, geographic information updating, car navigation, and geometric correction of urban remote sensing images [1
]. Road region segmentation based on remote sensing images [4
] has unique difficulties, as illustrated in Figure 1: (1) roads are long and narrow, so although they occupy a small proportion of the pixels, they often span the whole image; (2) the topological connectivity is complex, especially at road intersections; (3) the geometric features resemble those of water systems and railways; (4) the texture features are easily confused with the surrounding background; (5) the extracted roads are often disconnected because of occlusion by trees, shadows, buildings, etc. These characteristics distinguish road extraction from generic segmentation tasks and make it challenging for the currently popular semantic segmentation methods.
Recently, many semantic segmentation methods have been published. The fully convolutional network (FCN) [5
] was the first model for encoder-decoder supervised learning with pre-training, but it cannot fully capture contextual semantic relationships because pooling discards spatial information. As a result, researchers proposed efficient multi-scale contextual semantic fusion modules, such as DeepLab’s dilated convolution [6
], pyramid scene parsing network’s (PSPNet’s) pyramid pooling module [9
], and encoder-decoder networks for effective fusion of low-level and high-level features at different resolutions, such as U-Net [10
], LinkNet [11
], and D-LinkNet [3
]. In particular, D-LinkNet, a typical road extraction network, is lightweight, although its use of dilated convolution loses some local information. The emerging attention mechanism [12
] for global information learning has also achieved success in the field of semantic segmentation, such as Non-local [15
], PSANet [16
], A2Net [17
], EMANet [18
], and HsgNet [19
]. Graph convolution networks (GCN) [20
] have also attracted attention because of their strong relational reasoning ability. However, it is still difficult to apply the above methods to the extraction of complex and occluded roads whose features resemble the background, especially when the goal is to improve road connectivity.
Meanwhile, with the development of location big data, some scholars infer the distribution of the road network from the GPS trajectory data of vehicles [21
]. In [26
], researchers used GPS data as an additional input to reduce the road disconnection caused by occlusion, texture similarity, and geometric feature similarity. Introducing such multi-source location data can evidently improve road connectivity, which provides a direction for this paper. However, another problem found during our research also deserves attention. When a remote sensing image is acquired, cloud occlusion and illumination conditions make the intensity distribution uneven and reduce the contrast, which makes pixel classification difficult [27
]. Therefore, the enhancement of data pre-processing is another focus of this paper.
In this paper, we aim to increase the accuracy of road extraction and improve road connectivity by addressing the above problems. We propose to improve the hidden representations of the model by fusing location data, and to reduce the road disconnection caused by occlusion, shadow, cloud, etc. We study the general model of the regression method and integrate data pre-processing and post-processing modules into it. In data pre-processing, histogram equalization [27
] is adopted to enhance the remote sensing images and increase their contrast and feature differences; in data post-processing, an Iteration Reinforcement (IteR) module is designed that fuses the original information with the prediction output and repeatedly self-corrects the predicted feature map.
The specific contributions of this paper are as follows:
(1) We propose a new road extraction method based on location data fusion and design a road extraction network based on D-LinkNet, called Fusion Network (FuNet) for short. In addition, we study general data pre-processing and post-processing methods for the proposed network. We add the post-processing Iteration Reinforcement (IteR) module to the output of the network to concatenate, fuse, and retrain on the original input data and the output of the network.
(2) We design an IteR module to perform data post-processing. IteR consists of n basic blocks. Through multiple iterative optimizations, the prediction converges to a stable result, and the connectivity of the extracted roads improves along with the overall recognition rate of the model. The basic block structure is introduced to improve the performance of the model. The proposed module is general.
(3) The histogram equalization algorithm is used to pre-process the remote sensing images and improve their contrast. Unlike commonly used data augmentation methods such as image rotation, cropping, and zooming, it compensates for the limited training set caused by the difficulty of annotating images for semantic segmentation. The proposed method is general.
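As an illustration of the pre-processing step in contribution (3), the following is a minimal sketch of global histogram equalization for a single-channel 8-bit image, written with NumPy. It is not the implementation used in this paper: the function name `equalize_histogram`, the synthetic `tile`, and the choice to work on one channel are assumptions made only for illustration, and how a multi-band remote sensing image would be handled is left open.

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit single-channel image."""
    if gray.dtype != np.uint8:
        raise ValueError("expected an 8-bit image")
    hist = np.bincount(gray.ravel(), minlength=256)  # pixels per grey level
    cdf = hist.cumsum()                              # cumulative pixel count
    cdf_min = cdf[cdf > 0][0]                        # CDF at the darkest occupied level
    total = gray.size
    if total == cdf_min:
        return gray.copy()                           # constant image: nothing to stretch
    # Look-up table that makes the output CDF approximately linear.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# Hypothetical low-contrast tile: grey levels squeezed into [100, 131].
rng = np.random.default_rng(0)
tile = rng.integers(100, 132, size=(64, 64), dtype=np.uint8)
enhanced = equalize_histogram(tile)
```

In this sketch the narrow input range is stretched over the full 8-bit range, which is the contrast gain that the pre-processing step relies on.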
In this paper, we compare and analyze a number of advanced road extraction methods on the public data set BeiJing DataSet [1
] to verify the effectiveness and advancement of (1)–(3). We also discuss how the performance of the proposed model changes under different conditions, including before and after applying histogram equalization, with and without the IteR module, and with different numbers of basic blocks in the IteR module. Based on this discussion, we give some practical suggestions for application.
This paper is organized as follows: In Section 2, related work is introduced. In Section 3, the proposed methodology based on iteration reinforcement is detailed. The experiment and results are shown in Section 4. The discussion is presented in Section 5. Finally, the conclusion is drawn in Section 6.
2. Related Work
With the rapid development of machine learning and deep learning, progress has been made in road extraction. However, it is still difficult to extract road regions from remote sensing imagery, and research on road connectivity in particular remains relatively scarce.
Among traditional machine learning approaches, Song and Civco [32
] proposed a method to detect road regions using shape index features and support vector machines (SVM). Das et al. [33
] designed a multi-level framework based on two significant road features to extract roads from high-resolution multispectral images using probabilistic SVM. Alshehhi and Marpu [2
] presented an unsupervised road extraction method based on hierarchical image segmentation. More recently, a road segmentation method combining a shallow convolutional neural network with a multi-feature view was published. The network used abstract features extracted from a derived representation of the input image and added gradient information as additional image features to obtain better results [34
]. These methods rely on prior knowledge and additional features, whereas deep learning is widely used in road extraction because it learns features automatically. In deep learning, Saito [35
] used a CNN to extract roads directly from the original images and achieved better results on the Massachusetts Roads Dataset. RoadTracer [36
], proposed by Bastani et al., outputs the road network directly through an iterative search process guided by a CNN-based decision function. Xia et al. [37
] also used a DCNN directly to extract roads and tested it on GF-2 images. More recently, some scholars have introduced deep transfer learning and ensemble learning into road extraction in stages to improve the completeness of the road network [38
]. In the above studies, roads are extracted directly by deep convolutional neural networks. However, with the continuous progress of deep learning in computer vision, researchers have begun to build more innovative methods on top of it. At present, given the particular nature of road extraction, four deep-learning-based semantic segmentation directions are worthy of further study.
The first direction is the multi-scale and multi-dimensional information fusion network, typically represented by dilated convolution, such as U-Net [10
], LinkNet [11
], and D-LinkNet [3
]. They concatenate feature maps of different resolutions to integrate low-level detail and high-level semantic information. In particular, D-LinkNet, proposed by Zhou et al. [3
], won first place in the 2018 DeepGlobe Road Extraction Challenge by enlarging the receptive field and fusing multi-scale contextual semantic information. However, because a dilated kernel samples the input discontinuously, not all pixels take part in the computation, which loses spatial information and makes the approach poorly suited to road extraction, which requires learning global information.
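The kernel discontinuity mentioned above can be made concrete with a small sketch. The code below is illustrative only and is not taken from D-LinkNet: the helper names `dilated_taps` and `dilated_response` and the toy 5 × 5 image are assumptions. It shows which input positions a single 3 × 3 kernel with dilation 2 actually reads, and how a one-pixel-wide road can fall between the taps at a given kernel position.

```python
import numpy as np

def dilated_taps(kernel_size: int, dilation: int) -> np.ndarray:
    """Boolean mask of the input positions read by one dilated kernel."""
    span = dilation * (kernel_size - 1) + 1      # side of the receptive field
    mask = np.zeros((span, span), dtype=bool)
    mask[::dilation, ::dilation] = True          # taps sit `dilation` pixels apart
    return mask

def dilated_response(image, kernel, dilation, top, left):
    """Response of a dilated kernel placed with its corner at (top, left)."""
    k = kernel.shape[0]
    span = dilation * (k - 1) + 1
    window = image[top:top + span:dilation, left:left + span:dilation]
    return float(np.sum(window * kernel))

taps = dilated_taps(kernel_size=3, dilation=2)
coverage = taps.sum() / taps.size                # 9 of 25 positions are read

# Toy example: a one-pixel-wide vertical road in column 1.
thin_road = np.zeros((5, 5))
thin_road[:, 1] = 1.0
kernel = np.ones((3, 3))
missed = dilated_response(thin_road, kernel, dilation=2, top=0, left=0)
seen = dilated_response(thin_road, kernel, dilation=1, top=0, left=0)
```

In a real network the kernel slides over the image and several layers are stacked, so other positions do read the road; the sketch only shows that each individual dilated response skips the pixels between its taps.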
The second direction is networks based on the attention mechanism [12
]. Non-local [15
], PSANet [16
], OCNet [39
], and CCNet [40
] were the first models to introduce self-attention in 2018, followed by Local RelationNet [41
] in 2019; these models achieved good results in learning global and long-distance spatial information. A2Net [17
] and CGNL [42
] optimized the self-attention mathematically. SGR [43
], Beyond Grids [44
], GloRe [45
], LatentGNN [46
], APCNet [47
], and EMANet [18
] explored and practiced the “low rank” reconstruction. DANet [48
] and cross attention network [49
] further demonstrated that attending to information along the feature channels helps improve semantic segmentation accuracy. Attention-based learning of global information and long-distance semantics is effective [50
], which compensates for the information lost by dilated convolution. However, although attention mechanisms can learn global information, they also introduce information redundancy.
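To make the trade-off between global context and redundancy concrete, here is a minimal sketch of dot-product self-attention over all positions of a feature map. It is a simplification and not the exact block of any of the cited networks: the learned projections used in practice are replaced by identity mappings, the scaling by the square root of the channel count is borrowed from scaled dot-product attention, and the name `self_attention_2d` is hypothetical.

```python
import numpy as np

def self_attention_2d(feat: np.ndarray):
    """Self-attention over every position of a (C, H, W) feature map.

    Simplifying assumption: query, key and value are the features themselves
    (no learned projections), and a residual connection adds the input back.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                     # (N, C) with N = H * W
    scores = x @ x.T / np.sqrt(c)                    # (N, N) pairwise similarity
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax per row
    out = weights @ x                                # each position mixes all others
    return feat + out.T.reshape(c, h, w), weights

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 3, 3))                # toy feature map
refined, weights = self_attention_2d(feat)
```

Every position attends to every other position, so the weight matrix has N × N entries for N = H × W positions; this quadratic growth is one way to read the redundancy noted above.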
The third direction is graph convolution. The graph convolution network (GCN) [20
] has become a very popular approach to semantic relation reasoning for image segmentation in recent years. Unlike CRFs [52
] and the random walk network [53
], GCN is better at learning global and long-distance spatial information. Wang et al. [55
] proposed using a GCN in video recognition to capture the relations between objects. In a work published at CVPR 2020, the authors used graph convolution for semantic sketch segmentation, with two graph convolution branches extracting intra-stroke and inter-stroke features, respectively [56
]. In addition, the popular methods such as GAT [57
], GAE [58
], and GGN [59
] also use GCN as the model for their basic blocks. However, the above methods have not been tested on road extraction, and in particular not on improving road connectivity.
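As a reminder of how a graph convolution propagates information between nodes, the sketch below implements one layer with the widely used symmetric normalization, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). Whether the cited works use exactly this propagation rule is an assumption, and the three-node path graph and the name `gcn_layer` are illustrative only.

```python
import numpy as np

def gcn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph convolution layer with symmetric normalization and ReLU."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)                      # node degrees including self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)

# Toy path graph 0 - 1 - 2 with one-hot node features and identity weights.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
feats = np.eye(3)
weight = np.eye(3)

one_hop = gcn_layer(adj, feats, weight)          # information from direct neighbours
two_hop = gcn_layer(adj, one_hop, weight)        # second layer reaches two hops
```

After one layer, node 2 has received nothing from node 0; after two layers it has, which is the sense in which stacked graph convolutions capture long-distance relations.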
The fourth direction, which is also one of the issues considered in this paper, is the effective improvement of road connectivity. Early work used traditional methods that improve road connectivity with manually designed finite element models combined with contextual prior knowledge, such as High-order CRF [60
], Junction-point processes [61
], and so on. In recent years, Batra et al. [62
] addressed the topological connectivity of roads by tracing the annotation direction, imitating how humans annotate roads. Some researchers generate the road network by smoothing and denoising GPS data [21
]. In [26
], remote sensing images and GPS data were jointly input into a model for the first time to improve its road extraction ability. By introducing GPS data, we can reduce the disconnection of extracted roads caused by occlusion from trees, buildings, shadows, and cloud.
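One simple way to see the disconnection problem is to count the connected components of a predicted road mask. The sketch below is only an illustration and is not a metric defined in this paper: the function name `count_road_components`, the use of 8-connectivity, and the toy masks are all assumptions.

```python
from collections import deque
import numpy as np

def count_road_components(mask) -> int:
    """Number of 8-connected groups of road pixels in a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    count = 0
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            count += 1                           # new component found
            seen[r, c] = True
            queue = deque([(r, c)])
            while queue:                         # breadth-first flood fill
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count

# Toy example: one straight road, then the same road with one occluded pixel.
road = np.zeros((5, 9), dtype=bool)
road[2, :] = True
occluded = road.copy()
occluded[2, 4] = False                           # hypothetical tree canopy

n_full = count_road_components(road)
n_broken = count_road_components(occluded)
```

A single missed pixel splits one road into two components, which is why small occlusions can have a large effect on the topology of the extracted network.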
The performance of the semantic segmentation methods above improves if data augmentation and data post-processing are integrated. Data augmentation remains a powerful way to improve the accuracy of semantic segmentation. Because annotated data are difficult to obtain, traditional augmentation methods such as cropping, rotation, and scaling have only a limited effect. In particular, when sunlight or weather lowers the contrast of the acquired remote sensing images, enhancing the image contrast lets the model identify the target object more easily [27
]. Likewise, data post-processing is a common way to improve semantic segmentation, and many post-processing methods exist [62
]. In [63
], a refinement pipeline is introduced to iteratively enhance the prediction output, and the refinement is performed by the whole model. During the optimization, the predicted segmentation results are concatenated with the original input images and then fed back into the model. The approach improves the performance of the model after multiple iterations, but the computational cost is very high. Refinement over several iterations is also adopted in [21
]. Instead of concatenating the prediction with the original image as in the previous method, it concatenates the prediction with the decoder's output feature maps, and the model performs better after multiple iterations. A stacked multi-branch convolutional module is iterated instead of the entire network, which effectively exploits the mutual information and reduces the computation. Some scholars also employ a post-processing probability layer combined with deep learning to optimize road segmentation [64
]. Mnih and Hinton [65
] use RBMs as the basic blocks of a deep neural network and combine pre-processing and post-processing methods to further improve the accuracy of road segmentation. The contributions of these scholars have inspired the research in this paper.
In this paper, we construct a road extraction network that combines histogram equalization pre-processing with fused location data, and we strengthen the learning of the output by embedding a general IteR module at the end of the network. The IteR module is inspired by [62
], but the entire network is not iterated, to avoid excessive computation; instead, the prediction output is fused with the original image, and the iteration is repeated to achieve self-correction. Experimental results show that the proposed road extraction network, with its IteR-based post-processing module and data pre-processing method, is effective. Compared with the other methods in our experiments, it achieves the best results: the accuracy of road extraction is increased, and road connectivity is improved.
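The control flow of such an iterative refinement can be sketched as follows. This is a schematic under strong assumptions and not the IteR module itself: the real basic blocks are trained network layers, whereas here each hypothetical `basic_block` is a per-pixel linear map with random weights followed by a sigmoid, and the names `iterative_refine`, `image`, and `blocks` are introduced only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def basic_block(image, pred, weights, bias):
    """Hypothetical basic block: fuse the image with the current prediction."""
    stacked = np.concatenate([image, pred[None]], axis=0)    # (C + 1, H, W)
    logits = np.tensordot(weights, stacked, axes=([0], [0])) + bias
    return sigmoid(logits)                                   # (H, W) probabilities

def iterative_refine(image, initial_pred, blocks):
    """Apply n basic blocks in sequence, re-fusing the original image each time."""
    pred = initial_pred
    history = [pred]
    for weights, bias in blocks:
        pred = basic_block(image, pred, weights, bias)
        history.append(pred)
    return pred, history

rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))                # toy 3-channel input
initial_pred = rng.random((8, 8))            # stand-in for the backbone output
n = 3                                        # number of basic blocks
blocks = [(rng.standard_normal(4), 0.0) for _ in range(n)]

final_pred, history = iterative_refine(image, initial_pred, blocks)
```

Only the small blocks are applied repeatedly while the backbone runs once, which reflects the design choice of avoiding iteration over the entire network.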