Machine Recognition of Map Point Symbols Based on YOLOv3 and Automatic Configuration Associated with POI

Zhang, Huili; Zhou, Xiaowen; Li, Huan; Zhu, Ge; Li, Hongwei

doi:10.3390/ijgi11110540



The School of Geo-Science & Technology, Zhengzhou University, Zhengzhou 450000, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2022, 11(11), 540; https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi11110540

Submission received: 13 August 2022 / Revised: 18 October 2022 / Accepted: 24 October 2022 / Published: 28 October 2022


Abstract


This study addresses the needs of autonomous machine mapping, in particular the need to improve the efficiency of map point symbol recognition and configuration. An intelligent recognition method for point symbols was developed using the You Only Look Once Version 3 (YOLOv3) algorithm together with the Convolutional Block Attention Module (CBAM). The recognition results were then associated with points of interest (POI) to achieve automatic configuration. To quantitatively compare the recognition effectiveness of the proposed algorithm with that of the comparison algorithms for map point symbols, recall, precision and mean average precision (mAP) were employed as evaluation metrics. The experimental results indicate that the recognition efficiency of point symbols is enhanced compared to the original YOLOv3 algorithm, and that the mAP is increased by 0.55%. Compared to the Single Shot MultiBox Detector (SSD) algorithm and the Faster Region-based Convolutional Neural Network (Faster RCNN) algorithm, the proposed method performed well on precision, recall and mAP, achieving 97.06%, 99.72% and 99.50%, respectively. On this basis, the recognized point symbols are associated with POI: keyword matching assigns coordinates to the point symbols and enriches their attribute information. This enables automatic configuration of point symbols and achieves a relatively good map configuration result.

Keywords:

YOLOv3; attention mechanism; map point symbol recognition; POI; keyword matching; automatic configuration

1. Introduction

Maps are abstract representations of the actual world [1]. Map symbols convey the meaning of cartographic objects through their shapes and colors. In automatic mapping, symbol recognition and positioning provide geographic information of symbols and location information of geographic features. Different symbols are used to express the meaning of various objects, and various combinations of symbols depict the actual world. Point symbols can represent objects that are actually distributed through the world as points, lines, and areas. Therefore, automatic extraction of map point symbols is a crucial process in cartography [2]. Currently, cartography is still at the stage of human–computer interaction. People’s understanding of symbols and geographical environment is transformed into graphical expression. Through the design, configuration and related relationship processing of symbols, maps with human cognitive characteristics are made and stored in the computer. Computers do not yet possess human cognitive and thinking capabilities, and the barriers to automatic mapping have so far been difficult to break through. In recent years, with the advancement of information technology, deep learning methods have been used to intelligently analyze and extract rich semantic information from maps. For instance, object detection [3], semantic segmentation [4], and image classification [5] are frequently used for remote sensing image processing. In the geographic information domain, by comparison, traditional template matching [6,7] is commonly used. By comparing the template image with the test image, the similarity between the two is calculated to locate the predefined target. 
In addition, there is a statistical analysis [8,9] method for characterizing texture images with model coefficients, a structural analysis [10,11] method for recognizing complex objects based on image structural features, a mathematical morphological method [12,13,14] for studying image spatial shape and structure using set theory, and a deep learning method [15,16,17] for machines to simulate the operation of human brain thinking, etc. Traditional Convolutional Neural Networks (hereinafter referred to as CNN) and Deep Neural Networks (DNN) [18,19,20] struggle to comprehend the complex semantic information of symbols, whereas the development of deep learning [21] in the field of graphics is gradually resolving traditional graphics problems. Image symbols have made significant breakthroughs in addressing complex spatial relationships and semantic information, enabling map point symbol recognition. This study is devoted to studying map point symbols [22], that is, learning the symbols’ characteristics and corresponding semantic information. The scene information on a large-scale urban map is relatively simple, while the objects represented by point symbols are extremely rich. Point of interest (hereinafter referred to as POI) objects such as schools, hospitals, and shopping malls contain abundant human activity characteristics, which reduce symbol recognition errors caused by background information interference. The You Only Look Once Version 3 (hereinafter referred to as YOLOv3) [23,24] algorithm has the advantage of recognizing small targets [25,26] with respect to both real-time performance and accuracy. An attention module can be added to enhance the symbol feature extraction capability for problems such as symbol deformation during map scanning. The algorithm can quickly and precisely determine the point symbol category and corresponding semantic information on a large-scale urban map, completing the recognition of map point symbols.

The artificial intelligence algorithm trains a large number of samples to learn the color, texture, shape, and other features of various point symbols, enabling the computer to recognize map symbols and retrieve their semantic information. For cartography applications, the computer is unable to automatically obtain the spatial location information of the symbols, making their configuration on the map difficult. Nowadays, various mapping software [27,28], such as ArcGIS, Adobe Illustrator and so on allow for the interactive configuration of maps. The time and labor costs associated with a large number of human–computer interactions, as well as the resulting precision deviation, hinder the development of map digitization and intelligence [29]. Conventional simulated annealing algorithms, genetic algorithms [30,31] and particle swarm simulated annealing algorithms [32] can better resolve the problem of location conflicts between map symbols and annotations [33,34], but cannot match their semantic information. Therefore, it is difficult to fully automate the process from symbol recognition to configuration. The configuration of point symbols differs from that of annotations, which lack precise geographical coordinates. When configuring, they prioritize addressing the correlation between one another. However, point symbols are positioned to the corresponding position based on their coordinates, and the configuration emphasizes the correspondence of symbolic semantic information. Since the computer is capable of mapping point symbols and semantic acquisition, the automatic production of point symbols can be completed if it can independently obtain the symbol space position information. POI is associated with point symbol coordinates and their attribute information (name, address, category, etc.) [35,36]. 
If the acquired POI data are matched to the semantics of the computer-recognized map symbols [37], the spatial position of each symbol is specified and the intelligent configuration of point symbols can be completed.

The previous map point symbol recognition and configuration was merely a vectorized transformation of a specific map, but the map vectorization process frequently results in the distortion of image information, accuracy errors, scanning errors, etc., which prevent the map symbols from accurately representing the geographic information. Traditional map vectorization converts raster data into vector data, which is a representation of cartographic information in a vector format. However, this method does not acquire the cognition of map symbols, and needs to be handled by professionals, which is time-consuming and limited in application. This study is associated with POI data and contains a vast amount of attribute information. The point symbols are configured to the Map World, and the attribute information therein can assist different practitioners in completing the analysis operation. It also reduces the data acquisition work, not only the vectorization conversion of specific maps. In this study, by acquiring POI data for the region of interest and specified categories, thematic maps with different styles of the specified region can be configured, including a nautical map, a transportation map, a tourism map, an administrative map, an industrial map, an agricultural map, etc. Personnel from various industries create thematic maps according to their needs, and mine the potential information of spatial targets by means of spatial analysis such as network analysis, spatial information classification and spatial statistical analysis. This study is applied to many fields, such as the making of thematic maps as well as the analysis of urban development based on the density and distribution of the objects represented by point symbols. In addition, it enables urban development planning based on spatial analysis, etc. 
Given the significance of large-scale artificial intelligence mapping, this study has the following specific objectives: (1) to produce a dataset for map point symbol recognition; (2) to evaluate the point symbol recognition accuracy of the algorithm used in this study against that of the comparison algorithms; (3) to assess the recognition effect; and (4) to associate map point symbols with POI and to automate their matching.

2. Materials and Methods

The point symbols in large-scale city maps are regular in shape, and are generally composed of circles, rectangles, various letters and Chinese characters [38]. These have specific meanings [39,40] and represent different urban infrastructures [41]. Therefore, it is indispensable to recognize and configure the point symbols on the urban map. This section will describe the following aspects: (1) the YOLOv3 network structure for machine recognition of map point symbols; (2) the method to improve the accuracy of map point recognition; (3) machine automatic positioning configuration of map point symbols.

2.1. YOLOv3 Network Structure for Machine Recognition of Map Point Symbols

At present, deep learning methods are commonly employed to recognize map point symbols. When selecting a model, the following requirements should be satisfied according to the actual needs of a given situation: ① the network structure is small and occupies little memory; ② the operation speed is fast and meets real-time requirements; ③ the prediction accuracy is high [42], so that accurate target detection is achieved; ④ the small target recognition effect is good [43], which meets the requirements of map point symbol recognition. Considering model size, operation speed, prediction accuracy and small target recognition effect, a CNN can be applied well to map point symbol recognition. The You Only Look Once [44,45,46] (hereinafter referred to as YOLO) model is a fast detection model based on deep learning, which can directly obtain the location and class information of targets in images. YOLOv3 is an improved version of the YOLO algorithm, whose core idea is to divide the input image into multiple regular grids. By learning the target features in the dataset, it determines the target category and the grid cell where the center of the target is located, thereby determining the area range of the target. YOLOv3 is relatively mature, and it overcomes the shortcomings of You Only Look Once Version 1 (YOLOv1) and You Only Look Once Version 2 (YOLOv2) in detecting small targets. To this end, it introduces the feature pyramid network structure [47,48] to achieve multi-scale detection [49]. The Single Shot MultiBox Detector (hereinafter referred to as SSD) [51] algorithm, which is likewise one-stage [50], has the advantage of a rapid recognition speed. Its core concept is to employ multiple feature maps for regression of target locations and classification of categories. 
SSD achieves accuracy competitive with methods that use an additional target proposal step while being much faster, and it provides a unified framework for training and prediction. Although SSD employs a feature pyramid structure, its recognition of small targets is less satisfactory. The two-stage [52] Faster Region-based Convolutional Neural Network (hereinafter referred to as Faster RCNN) [53] algorithm, proposed in 2015, is the first truly end-to-end deep learning detection algorithm. It generates a series of candidate bounding boxes as sample candidate boxes, and a convolutional neural network then classifies the targets they contain. Its biggest innovation is the Region Proposal Network (RPN), which generates candidate boxes through an anchor mechanism instead of selective search. Feature extraction, candidate box selection, bounding box regression, and classification are thereby all integrated into one network, resulting in a more accurate and efficient detection process. The Faster RCNN framework, however, is complex, with a long detection cycle and poor real-time performance. As a classic of the YOLO series, YOLOv3 is both real-time and accurate, and it has been utilized extensively in image detection, video detection, and real-time camera detection. Additionally, owing to its flexibility, it plays an important role in industrial engineering, video tracking, and other fields, such as underwater microbial detection, traffic sign detection, video vehicle detection, face recognition, mineral detection, etc. To investigate the recognition of point symbols, this study therefore employs the YOLOv3 algorithm, which is among the most widely applied versions of the YOLO series.

YOLOv3 employs Darknet-53 [24] as its backbone network for feature extraction. Darknet-53 contains 53 convolutional layers and employs a large number of convolutional kernels to extract point symbol features; the network structure is depicted in Figure 1. The network is divided into two components: feature extraction (the green box in Figure 1) and prediction (the purple box in Figure 1). For the input 416 × 416 × 3 image, feature extraction is first performed by the Darknet-53 network. The residual convolution in the network convolves the image, and the height and width of the incoming feature layer are compressed to produce a new feature layer. This feature layer is passed through a 1 × 1 and a 3 × 3 convolution, and the result is added to it to form the residual structure. The network is deepened by repeating the 1 × 1 and 3 × 3 convolutions and the superposition of residual edges. The feature extraction process of the Darknet-53 network involves continuous downsampling, where the height and width of the image are continuously compressed, the number of channels is continuously expanded, and the acquired feature layers reflect the characteristics of the input image. Each convolution of the Darknet-53 network utilizes the DarknetConv2D (hereinafter referred to as DBL) structure, which comprises Conv, Batch Normalization (hereinafter referred to as BN), and Leaky ReLU. Conv is the convolution layer, which employs various convolution kernels to process the image and extract distinct features. BN is the normalization layer, which normalizes the feature map. Leaky ReLU is the activation function, which is piecewise linear and keeps a small slope for negative inputs. With DBL, each convolution is regularized, and BN and Leaky ReLU are applied after the convolution is completed. The Darknet-53 network stacks a large number of residual structures to increase the depth of the network, which facilitates the extraction of higher-level semantic features. 
It also helps prevent vanishing or exploding gradients, reduces the training difficulty of the deep network, and improves the recognition accuracy. Upon completion of the feature extraction, five convolutions are performed on the final three feature layers. Following the convolution, the feature map branches are stacked by convolution and upsampling. One additional 3 × 3 convolution and one 1 × 1 convolution are then performed, and the results are fed into the multiscale detection network for classification and regression prediction. For each image input into the detection network, the center coordinates and class of each target symbol are predicted. The three output feature maps with different resolutions (13 × 13, 26 × 26 and 52 × 52) correspond to large-, medium-, and small-sized target prediction, respectively, so that targets of different sizes in the input images can be detected.
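The three output resolutions follow directly from the input size and the downsampling factor of each detection branch. The sketch below reproduces that arithmetic; the strides (32, 16, 8) and the three anchors per scale are assumptions taken from the commonly used YOLOv3 configuration rather than from this text, and the class count of 12 is the number of symbol types in this study's dataset.

```python
# Sketch: grid sizes and output tensor shapes of the three YOLOv3 detection heads.
# Assumptions (not stated in the text): strides 32/16/8 and 3 anchors per scale.

INPUT_SIZE = 416        # input height and width, as given in the text
NUM_CLASSES = 12        # point symbol classes in this study's dataset
ANCHORS_PER_SCALE = 3   # assumption: common YOLOv3 setting
STRIDES = {"large": 32, "medium": 16, "small": 8}  # assumption: common strides


def head_shapes(input_size=INPUT_SIZE, num_classes=NUM_CLASSES):
    """Return {target size: (grid, grid, channels)} for each detection head."""
    # Each anchor predicts 4 box offsets, 1 objectness score and the class scores.
    channels = ANCHORS_PER_SCALE * (4 + 1 + num_classes)
    return {
        name: (input_size // stride, input_size // stride, channels)
        for name, stride in STRIDES.items()
    }


if __name__ == "__main__":
    for name, shape in head_shapes().items():
        print(name, shape)
```

Under these assumptions, the coarsest grid (13 × 13) handles large targets and the finest grid (52 × 52) handles small targets such as point symbols.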

2.2. Method to Improve the Accuracy of Map Point Symbols Recognition

The addition of an attention mechanism to the network can enhance the robustness of point symbols recognition. In deep learning, the attention mechanism [54,55] is a resource allocation scheme that allocates computing resources to more important tasks in the case of limited computing power, and can solve the problem of information overload simultaneously. Its essential function is to filter useful information from a large amount of information, to pay more attention to target details, and to suppress irrelevant information. The addition of an attention module to a network can efficiently extract target features to enhance recognition accuracy. The commonly used attention mechanisms include Squeeze-and-Excitation (SENet) [56], Spatial Attention Module [57], Self-Attention [58], Convolutional Block Attention Module (hereinafter referred to as CBAM) [59,60] and so on. CBAM is a feedforward convolutional neural network attention module that successively integrates the Channel Attention Module (hereinafter referred to as CAM) and the Spatial Attention Module (hereinafter referred to as SAM). As a lightweight module, CBAM can be seamlessly integrated into any CNN architecture for end-to-end training. CBAM does not have a large proportion of convolutional structure inside the module, but consists primarily of a small number of pooling layers and feature fusion operations. This structure avoids the large amount of computation caused by convolutional multiplication, reduces the complexity of the module, and decreases the computational effort. The CAM in CBAM focuses more on the channel level, with the aim of providing different feature channels with different weights in order to selectively emphasize or ignore certain features. SAM identifies the regions that are required for the task and prioritizes the target regions, looking for the most important parts of the network to process. 
By inserting two attention modules in turn, each branch of the network pays greater attention to the characteristics of channels and spaces. The feature information is assisted in transferring effectively on the network by enhancing or suppressing the relevant feature information. The simultaneous attention allocation to the two dimensions enhances the improvement of model performance. The CBAM is therefore added to the neural network in this study to improve the network performance.

The CBAM structure is shown in Figure 2. When the incoming feature map F (H × W × C) passes through the CAM module (the green box in Figure 2), it first undergoes global maximum pooling and global average pooling over height and width. Two 1 × 1 × C feature maps are obtained by compressing the two spatial dimensions. The pooled feature maps pass through a shared multilayer perceptron (MLP). After its dimensionality reduction and dimensionality increase operations, the two outputs are added together. After Sigmoid function [61,62] activation, the weight of each channel of the feature map is normalized. Multiplying the normalized weights by the input feature map yields the optimized feature F’ and completes the CAM implementation. The optimized feature map from the CAM module is then fed into the SAM module (the purple box in Figure 2). Initially, the feature map undergoes global maximum pooling and global average pooling along the channel dimension, with the pooling operation highlighting regions with large activation levels. Here, two H × W × 1 feature maps are obtained, and these feature maps are enhanced in terms of spatial location. Then, the two feature maps are stacked and reduced to a single feature map with a 7 × 7 convolution. The number of channels is adjusted using a 1 × 1 convolution, and finally the weights are normalized using a Sigmoid activation function. Multiplying the normalized weights by the feature map input to the SAM yields the final optimized feature F″, completing the entire CBAM implementation. The Sigmoid function is a common S-shaped function in biology, also known as the S-shaped growth curve. In deep learning, it is often used as the activation function of neural networks because it is monotonically increasing, as is its inverse. 
The Sigmoid activation function gives neural networks good convergence and mapping precision, and it is frequently used in the hidden layers of CNN models. The output range of the Sigmoid function is limited to between 0 and 1, so it normalizes the output of each neuron and the data do not easily diverge during transfer. To address scan deformation and symbol superposition in map scanning, this study introduces the CBAM attention mechanism into YOLOv3, which causes the network to place more emphasis on the key information of the target and to suppress secondary information. It efficiently extracts the target features, reduces the time cost, and improves the accuracy of target detection and target classification [63].
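The two attention steps described above can be summarized compactly. The formulation below follows the way CBAM is commonly written in the literature and is offered as a sketch: \sigma denotes the Sigmoid function, \otimes element-wise multiplication with broadcasting, f^{7 \times 7} a convolution with a 7 × 7 kernel, and F'' the final output of the SAM. The additional 1 × 1 channel adjustment mentioned in the text is not part of this common formulation and is omitted here.

```latex
M_c(F)   = \sigma\big( MLP(AvgPool(F)) + MLP(MaxPool(F)) \big), \quad M_c(F) \in \mathbb{R}^{1 \times 1 \times C}
F'       = M_c(F) \otimes F
M_s(F')  = \sigma\big( f^{7 \times 7}([AvgPool(F'); MaxPool(F')]) \big), \quad M_s(F') \in \mathbb{R}^{H \times W \times 1}
F''      = M_s(F') \otimes F'
```

In the first line the pooling runs over the spatial dimensions (H × W), whereas in the third line it runs over the channel dimension (C), which is why the two attention maps have complementary shapes.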

2.3. Machine Automatic Positioning Configuration of Map Point Symbols

There is a wealth of information contained within each symbol on the map. If the symbol configuration is simply symbolized [64], the meaning of the symbol itself will be lost. Therefore, when configuring the point symbols [65], it is necessary to associate symbol semantics. The configuration of point symbols is automatically carried out on the basis of recognition. During configuration, symbols are automatically positioned by coordinates and cover rich semantic information. However, when point symbols are recognized, their location information cannot be obtained, making automatic configuration challenging. The POI data contains geographic coordinate information and corresponding attribute information. It is associated with the results of point symbol recognition to solve the problem of spatial location acquisition. Figure 3 illustrates the map point symbol machine automatic positioning configuration process. The association between point symbol recognition results and POI is mainly performed by keyword matching [66], and so keyword matching is the key step of point symbol configuration. Regular expressions [67] are widely used in text detection because of their superior expressive power and descriptive flexibility, and so this study uses regular expressions for keyword matching. Additionally, Non-deterministic Finite Automaton (hereinafter referred to as NFA) [68] and Deterministic Finite Automaton (hereinafter referred to as DFA) [69] are the main methods of solving the regular expression matching problem. The NFA matches the corresponding documents based on expressions (Regex-Directed), while the DFA matches the corresponding regular expressions by text (Text-Directed). The transfer function of NFA may determine multiple successor states, and its processing speed is slow. Additionally, the rule parsing when matching keywords consumes a great deal of memory, seriously affecting the efficiency of the algorithm. 
The transfer function of DFA determines only the unique successor state, without considering alternative paths and backtracking process. Compared to NFA, it reduces the state transfer time, improves the keyword matching speed, and has a significant advantage in matching performance. Accordingly, it is applicable to the matching of point symbol recognition results with POI data.

The association of map point symbols with POIs is accomplished primarily through two steps: candidate keyword selection and keyword matching. The selection of candidate keywords is not yet well-defined, and the application of filtering rules could result in omissions. Therefore, descriptive texts that appear frequently and can express such point symbols are selected manually as candidate words. There is no limit to the number of selected words, and efforts are made to include all words associated with these symbols in the POI. This study performs keyword matching using the DFA-based regular expression algorithm. A DFA moves from one state to another via a series of events, i.e., state → event → state. Its principle is that any element in a finite set has one of two states, continue or end; elements are retrieved sequentially until the end state is reached. The principal steps involved in DFA-based keyword matching are as follows. First, the NFA is constructed, which serves as the basis for DFA construction. The NFA engine employs a so-called “greedy” backtracking matching algorithm that tests all possible extensions of the regular expression in a specified order and accepts the first match. The NFA may visit exactly the same state multiple times, that is, it repeatedly backtracks when encountering unmatched text and repeats the matching process until it succeeds or all texts fail to match. The NFA has the advantage of supporting specific extensions of regular expressions, such as capturing sub-expression matches and matching back-references. When the NFA traverses the text to be matched, a set of active states is activated for the given states and inputs. A DFA is a single-valued mapping, i.e., for a certain state, there is only one state transition for each input. 
Therefore, DFA defines a unique definite state on this state set, and the current state of DFA corresponds to the set of current active states by NFA. When DFA performs matching, it begins with the initial state, sequentially reads the text to be matched, and performs character jumping in order to match candidate keywords. At the end of each jump, it determines whether or not the current state is accepted. If the state is accepted, the match is successful; otherwise, the state transition continues until all matches are completed. Finally, all matching information is output. The DFA-based keyword matching rules are depicted in Algorithm 1.

Algorithm 1: DFA-based keyword matching rules.

Input: Candidate Keywords A = {a_{1}, a_{2}, \dots, a_{n}}, POI to Be Matched T

Step 1: Initialize the result set R;

Step 2: Construct NFA;

Step 3: NFA traverses the item T to be matched, and activates the active state set;

Step 4: Read the characters in T and jump, match all keywords in A, and check whether the status is accepted;

Step 5: If the match is successful, the result is stored in R, and the next text match is performed;

Step 6: All matches are completed, and the set R is returned.

Output: Result Set R
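The matching idea in Algorithm 1 can be illustrated with a small trie-based DFA, in which every character read causes exactly one state transition and a state is either "continue" or "end". This is a simplified sketch and not the authors' implementation: it builds the DFA directly from the keyword list instead of deriving it from an NFA, and the function names, the `END` marker, and the sample POI names are hypothetical.

```python
# Sketch of trie-based DFA keyword matching (illustrative, not the authors' code).
END = "__end__"  # hypothetical marker for an accepting ("end") state


def build_dfa(keywords):
    """Build a nested-dict DFA: each character leads to exactly one next state."""
    root = {}
    for word in keywords:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})
        state[END] = True  # reaching this state means a keyword is matched
    return root


def match_keywords(dfa, text):
    """Return the sorted list of keywords that occur anywhere in text."""
    found = set()
    for start in range(len(text)):
        state = dfa
        for pos in range(start, len(text)):
            state = state.get(text[pos])
            if state is None:   # no transition for this character: stop this scan
                break
            if END in state:    # accepting state: record the matched keyword
                found.add(text[start:pos + 1])
    return sorted(found)


def associate(poi_names, symbol_keywords):
    """Map each POI name to the symbol classes whose keywords it contains."""
    result = {}
    for symbol, words in symbol_keywords.items():
        dfa = build_dfa(words)
        for name in poi_names:
            if match_keywords(dfa, name):
                result.setdefault(name, []).append(symbol)
    return result


if __name__ == "__main__":
    # Hypothetical candidate keywords and POI names, for illustration only.
    keywords = {"school": ["University", "School"], "hospital": ["Hospital"]}
    print(associate(["Example University", "Example Hospital"], keywords))
```

Because no backtracking is needed within a scan, the cost of each scan is bounded by the length of the longest keyword, which is the property that makes the DFA attractive for matching many POI names.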

3. Experiments

This study compares the proposed algorithm with other algorithms using the labeled dataset. The recognition effect on map point symbols is tested to evaluate the efficiency of the proposed algorithm. The crawled POI are then matched with the map point symbols, which are configured in the map. This section presents the experimental data, the machine recognition of map point symbols, and their automatic positioning configuration.

3.1. Experimental Data

3.1.1. Point Symbols Sample Dataset

In this study, three different styles of Chinese provincial atlases are examined; point symbols of the same type do not necessarily indicate exactly the same information. Point symbols were obtained using a scanner and labeled in the standard format of the PASCAL VOC [70] target detection dataset. LabelImg is used to mark the bounding boxes of the target symbols. There are a total of 6675 images and 12 commonly used types of symbols, and the number of labeled symbols in each image varies between 1 and 11. The labeled point symbol names are school, hospital, market, government sector, bus station, mountain, post office, port, edifice, bank, PSB, and hotel. Figure 4 illustrates the number of labeled point symbols for each type. The XML file generated after labeling contains the labeled symbol category (category name), as well as screen coordinates: the coordinates of the upper left corner of the labeled box (X_{min}, Y_{min}) and the coordinates of the lower right corner (X_{max}, Y_{max}).
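As an illustration of the annotation layout, the sketch below reads the category name and the two corner coordinates from a PASCAL VOC style XML string using the Python standard library. The sample annotation, its file name, and its coordinate values are made up for demonstration and do not come from the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in PASCAL VOC layout; file name and values are made up.
SAMPLE_XML = """
<annotation>
  <filename>sample_0001.jpg</filename>
  <size><width>416</width><height>416</height><depth>3</depth></size>
  <object>
    <name>school</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>80</xmax><ymax>92</ymax></bndbox>
  </object>
  <object>
    <name>hospital</name>
    <bndbox><xmin>200</xmin><ymin>150</ymin><xmax>236</xmax><ymax>184</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return a list of (category, x_min, y_min, x_max, y_max) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Upper left corner (xmin, ymin) and lower right corner (xmax, ymax).
        coords = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((name, *coords))
    return boxes


if __name__ == "__main__":
    for box in parse_voc(SAMPLE_XML):
        print(box)
```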

The dataset is divided into a training set, a validation set and a test set. There are 15,484 labeled boxes in the dataset. Table 1 displays the precise subdivision of the dataset. In target detection, overfitting [71,72] frequently occurs, in which the recognition effect is good on the training set while the performance is poor on the test set. The main causes are a small number of training samples and noise interference. The map point symbols are regular shapes with specific styles and are little affected by background information, and the lower noise aids point symbol feature extraction and reduces recognition errors. At the same time, the data volume of this study is large, and each type of symbol in the dataset is labeled in large batches. The selected samples are sufficiently representative, and the labeled boxes completely cover the target symbols and are accurately labeled. The large amount of data is the main means of avoiding overfitting of the model. To prevent errors in point symbol recognition due to symbol deformation during map scanning, and to increase data diversity and improve model generalization and robustness, online data augmentation [73,74,75] is applied through image scaling, flipping, and color gamut distortion. Because the sample data are not identical in each training session, the probability of overfitting is reduced. The result is used as the dataset for point symbol recognition; the augmentation generates new images only during training, without increasing the amount of stored data, thus saving storage space.
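When images are flipped or scaled during training, the labeled boxes must be transformed in the same way. The sketch below shows only the box arithmetic for a horizontal flip and a uniform scaling; the function names and the 50% flip probability are assumptions for illustration, and the image transformation itself is omitted.

```python
import random


def hflip_box(box, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box about the vertical center line."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def scale_box(box, factor):
    """Scale all box coordinates by the same factor as the image."""
    return tuple(round(v * factor) for v in box)


def augment_box(box, image_width, rng):
    """Flip the box with probability 0.5 (assumed value), as done on the fly."""
    if rng.random() < 0.5:
        return hflip_box(box, image_width)
    return box


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demonstration repeatable
    print(augment_box((48, 60, 80, 92), 416, rng))
```

Because the transformation is drawn anew each time a sample is loaded, no augmented copies need to be stored on disk.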

3.1.2. POI Dataset

POI data for this research were obtained from the Auto Navi Map using crawler technology. The selected area is Longzihu College Park, Jinshui District, Zhengzhou City, Henan Province. Since this area contains numerous schools and business districts involving many fields such as commerce, transportation, health care and education, and the corresponding map point symbols are rich, it is used to study the configuration of point symbols. The acquired fields include name, id, type, typecode, pname, cityname, adname, address, pcode, citycode, adcode, x and y. Since the obtained POI data cover too many types, and the majority of them have no specific classification meaning for the configured point symbols, only the POI types related to the recognized point symbols are classified, as reported in Table 2.

We cleanse the acquired POI data (handling missing values, removing outliers and extreme values, correcting, checking, etc.) and retain only POIs with complete attribute information. In the crawled area, 615 valid POIs in 9 categories are associated with the recognized point symbols. The coordinates are converted from GCJ-02 to the World Geodetic System 1984 (WGS84) by defining the projection.
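The cleansing step can be illustrated with a short sketch that drops records with missing attributes, unparsable or out-of-range coordinates, and duplicate ids. The list of required fields, the coarse longitude and latitude bounds and the function name are assumptions made for this example; the exact rules applied in this study are not specified.

```python
# Assumed subset of the crawled fields that must be present and non-empty.
REQUIRED_FIELDS = ("name", "id", "type", "typecode", "address", "x", "y")


def clean_pois(records, lon_range=(73.0, 136.0), lat_range=(3.0, 54.0)):
    """Keep records with complete attributes, valid coordinates and a unique id.

    lon_range and lat_range are a coarse, assumed bounding box used only to
    reject obviously wrong coordinates.
    """
    seen_ids = set()
    cleaned = []
    for rec in records:
        values = [rec.get(field) for field in REQUIRED_FIELDS]
        if any(v is None or not str(v).strip() for v in values):
            continue  # missing or empty attribute
        try:
            lon, lat = float(rec["x"]), float(rec["y"])
        except (TypeError, ValueError):
            continue  # coordinate is not a number
        if not (lon_range[0] <= lon <= lon_range[1]):
            continue
        if not (lat_range[0] <= lat <= lat_range[1]):
            continue
        if rec["id"] in seen_ids:
            continue  # duplicate record
        seen_ids.add(rec["id"])
        cleaned.append({**rec, "x": lon, "y": lat})
    return cleaned


if __name__ == "__main__":
    sample = [
        {"name": "Boxue Road Police Station", "id": "B0FFG9W5HA",
         "type": "public security police", "typecode": "130501",
         "address": "200 m east of the intersection", "x": "113.802418",
         "y": "34.775305"},
        {"name": "No address", "id": "X1", "type": "t", "typecode": "0",
         "address": "", "x": "113.8", "y": "34.7"},
    ]
    print(len(clean_pois(sample)))
```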

3.2. Map Point Symbols Machine Recognition

3.2.1. Evaluation Criteria

In this study, the performance of the algorithm was evaluated quantitatively using 3 metrics: recall, precision, and mean Average Precision (hereinafter referred to as mAP) [72]. The following formulas are used to calculate recall and precision:

Recall = \frac{TP}{TP + FN} \times 100\%,

(1)

Precision = \frac{TP}{TP + FP} \times 100\%,

(2)

where TP represents the number of predicted boxes that correctly match a Ground Truth box at a confidence level above the specified threshold; FP represents the number of predicted boxes that do not correspond to any Ground Truth box (false detections); and FN represents the number of Ground Truth boxes that are not detected.
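One common way to obtain these counts is to keep detections above a confidence threshold and match them greedily to Ground Truth boxes by intersection over union (IoU). The sketch below follows that convention for a single class in a single image; the IoU threshold of 0.5, the greedy matching rule and all function names are assumptions, since the matching procedure used in this study is not described.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_detections(detections, ground_truth, conf_threshold=0.5, iou_threshold=0.5):
    """Return (TP, FP, FN). detections: list of (box, confidence)."""
    kept = sorted((d for d in detections if d[1] >= conf_threshold),
                  key=lambda d: d[1], reverse=True)
    matched = set()
    tp = fp = 0
    for box, _ in kept:
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue  # each Ground Truth box can be matched only once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
    fn = len(ground_truth) - len(matched)
    return tp, fp, fn


def recall_precision(tp, fp, fn):
    """Recall and precision in percent, as in Equations (1) and (2)."""
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((20, 20, 30, 30), 0.3)]
    print(recall_precision(*count_detections(dets, gt)))
```

In the usage example, the third detection falls below the confidence threshold, so its Ground Truth box counts as undetected and lowers recall rather than precision.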

The mAP is obtained on the basis of recall and precision and is the mean value of the target Average Precision (hereinafter referred to as AP) of all categories. AP is the area enclosed under the precision–recall curve, and the larger the area shown, the higher the accuracy. Combining these two parameters to measure network performance, the mAP is calculated as:

mAP = \frac{\sum AP}{N},

(3)

where AP represents the average recognition accuracy of each type of target, and N represents the total number of recognized categories.
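To make Equation (3) concrete, the following sketch computes AP for one category as the area under its precision-recall curve and then averages over categories. It uses the all-point interpolation with a monotone precision envelope, which is one widely used convention and an assumption here, because the interpolation scheme of this study is not stated; the category names in the usage example are hypothetical.

```python
def average_precision(scored_hits, num_ground_truth):
    """AP for one category as the area under the precision-recall curve.

    scored_hits: list of (confidence, is_true_positive) for every detection.
    """
    if num_ground_truth == 0:
        return 0.0
    ordered = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    recalls, precisions = [], []
    tp = fp = 0
    for _, is_tp in ordered:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    area, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def mean_average_precision(ap_per_category):
    """mAP = sum of AP over categories divided by the number of categories N."""
    return sum(ap_per_category.values()) / len(ap_per_category)


if __name__ == "__main__":
    ap_hospital = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
    print(mean_average_precision({"hospital": ap_hospital, "hotel": 1.0}))
```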

3.2.2. Model Comparison and Experimental Analysis

To verify the efficacy of the proposed YOLOv3 algorithm with the CBAM attention mechanism, this study conducts ablation experiments, in which the effect of the CBAM attention mechanism on the performance of the object detection algorithm is evaluated in the same experimental environment. In the experiment, Python3.6.4 (Guido van Rossum, Dutch), Pytorch1.7.0 (Facebook AI Research, CA, USA), and Cuda11.6 (NVIDIA, CA, USA) are used on an Ubuntu18.04.6 (Mark Shuttleworth, South Africa) system. The model runs on an NVIDIA GeForce RTX 3060 GPU (NVIDIA, CA, USA), and the results of 100 rounds of training are depicted in Figure 5 and Table 3.

The experimental plots of precision, recall rate, and mAP values for the original YOLOv3 algorithm and the proposed algorithm in this study are depicted in Figure 5, and the values of each metric are shown in Table 3. Compared with the original algorithm, the mAP value of the proposed method is improved, and precision and recall are also superior to the original YOLOv3 algorithm. It can also be seen from Table 3 that the mAP value of the proposed method is 0.55% higher than that of the original algorithm, reaching 99.36%, and precision and recall also increase by 0.36% and 0.63%, respectively. The superiority of each parameter of the algorithm proposed in this study compared with the original algorithm demonstrates that the algorithm can effectively improve the recognition of map point symbols.

To highlight the advantages of the algorithm presented in this study for the recognition of point symbols, the proposed method is compared with other mainstream deep learning networks, namely the SSD and Faster RCNN models. The SSD uses the lightweight network MobileNetV2, which is small in size, reduces the amount of computation, and saves time and cost. Faster RCNN uses VGG16, a deeper network that has good generalization performance and aids in learning target features. The experimental results are presented in Table 4.

It can be seen from Table 4 that the transmission rate of the proposed method falls between those of Faster RCNN and SSD. As a one-stage algorithm, it is better suited to real-time use than the two-stage Faster RCNN; however, there is still a significant gap compared with the SSD algorithm, which uses a lightweight network. Furthermore, the proposed method has a significant advantage over the other algorithms in terms of precision and mAP value. Although its recall rate is 0.22% lower than that of Faster RCNN, it still reaches 99.72%. The precision of Faster RCNN is only 91.29%, which is 5.77% lower than that of the proposed method. Additionally, the proposed method is clearly superior to SSD in precision, recall and mAP value. Overall, the algorithm presented in this study is more effective and more suitable for recognizing map point symbols.

3.2.3. Visualization Results

The receptive field in deep learning represents the range of an image perceived by different neurons in a network. When the receptive field is large, an extensive image range can be accessed and features with higher semantic levels can be obtained. Conversely, when the receptive field is small, the features tend to be more local and detailed. For a map of a given scale, the distance between the human eye and the map produces a different visual perception range, so the target size changes and different map reading effects result. Similarly, the visual perception of the computer is influenced by the visual distance, which affects the recognition results. In this subsection, from a qualitative perspective, four models are used to recognize maps with different styles, and the recognition effects for targets of different sizes, as affected by visual distance at the same scale, are illustrated in Figure 6 and Figure 7.

It can be seen from Figure 6 that model (b) outperforms the other models in recognizing point symbols within the target area and detects all the symbols to be recognized. In contrast, both (a) and (c) exhibit detection errors, such as missed detections, position shifts, and repeated detections. Model (d) does not detect symbols at this visual distance and performs worst.

It can be seen from Figure 7 that when recognizing large and clear target symbols, (a,b) correctly recognize all symbols within the target detection area with accurate position detection. In contrast, (c) shows deviations in symbol position (e.g., the symbol of “Lankai Hotel” has a large deviation) and (d) shows a symbol misdetection (the symbol indicating “City Eight Hospital”). Thus, (a,b) have a superior detection effect at this visual distance.

Overall, across these two map styles and visual distances, the proposed method has obvious advantages over the other algorithms in map point symbol recognition, regardless of the size of the target symbol. Because the proposed method adds the CBAM attention module, which integrates the SAM attention module and the CAM attention module, the feature extraction network pays more attention to the characteristics of the target itself, ignores unimportant information, and improves its global feature extraction ability, thereby strengthening small target recognition. In general, the proposed method achieves high accuracy in map point symbol recognition, and the point symbols identified by this method can be used for subsequent symbol configuration.

3.3. Map Point Symbols Machine Automatic Localization Configuration

3.3.1. Matching of Point Symbols to POI

The automatic configuration of map point symbols is an essential step to realizing the intelligence of cartography. By converting the information on paper maps or digital maps into a series of attribute values that computers can store and manipulate, it provides data support for operations such as spatial analysis.

The AutoNavi API is used to crawl the POI in Longzihu College Park, Jinshui District, Zhengzhou City, Henan Province. The crawled data contain not only the types associated with the learned point symbols, but also a large number of data types without semantic information. Not all of these data can be associated with point symbols and configured on the map; therefore, keyword classification matching is required to filter the relevant data and combine them with point symbols to complete the configuration.

When performing keyword matching, the matching field is chosen by its degree of association with the symbol name. Consider two POI records: {name: Boxue Road Police Station; id: B0FFG9W5HA; type: government agencies and social groups; public security agencies; public security police; typecode: 130501; pname: Henan Province; cityname: Zhengzhou; adname: Jinshui District; address: 200 m east of the intersection of Jinshui East Road and Boxue Road; pcode: 410000; citycode: 371; adcode: 410105; x: 113.802418; y: 34.775305} and {name: NEW ZEALAND; id: B0FFGG7SP4; type: commercial residence; building; commercial office building; typecode: 120201; pname: Henan Province; cityname: Zhengzhou; adname: Jinshui District; address: Northeast of Huxin Ring Road and Huxin 2nd Road angular; pcode: 410000; citycode: 371; adcode: 410105; x: 113.803559; y: 34.796053}. Extracting keywords only from the “name” field cannot cover all types of POIs, and some may be missed; the name of the second record, for example, gives no indication of its category. The “type” field indicates the type of each POI, so one or more keywords associated with point symbols can be selected from this field for matching. Symbols differ between maps of different types, and even maps of the same type but with different mapping styles do not use exactly identical symbols. Therefore, based on the recognized point symbols, symbols of various types that share a common style are selected to configure the map, resulting in a uniform and beautiful cartographic effect. The selected keywords and symbol styles are reported in Table 5.

Regular expressions are used to filter the pre-defined characters, the above keywords are taken as the matching objects, and keyword matching is performed with a deterministic finite automaton (DFA). Candidate keywords may include letters, numbers and Chinese characters. There is no length requirement for candidate keywords, but excessively long or numerous keywords may affect the matching efficiency. Since each type of symbol corresponds to one or more keywords, the relationship between keywords and symbol types is one-to-one or many-to-one. Each POI is matched against all keywords in the template one by one to determine the category to which it belongs. All matched POI data are classified, and the POI attributes are appended to the semantics of the point symbols of the corresponding category, thereby completing the matching between point symbols and POIs.
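The matching procedure can be sketched as a keyword trie that is walked like a deterministic automaton, which is one common reading of DFA-based keyword matching; the implementation used in this study is not shown. The keyword template below is hypothetical and is not the content of Table 5, and the function names and the normalization rule are likewise assumptions.

```python
import re

END = "__end__"  # marks an accepting state; cannot collide with a single character

# Hypothetical template mapping keywords to symbol categories.
KEYWORDS = {
    "public security": "police station",
    "hospital": "hospital",
    "hotel": "hotel",
}


def normalize(text):
    """Lowercase, then replace anything but letters, digits, CJK and spaces."""
    return re.sub(r"[^0-9a-z\u4e00-\u9fff ]+", " ", text.lower())


def build_dfa(keyword_to_category):
    """Build a trie whose nodes act as states; accepting states store a category."""
    root = {}
    for keyword, category in keyword_to_category.items():
        state = root
        for ch in normalize(keyword):
            state = state.setdefault(ch, {})
        state[END] = category
    return root


def match_categories(poi_type, dfa):
    """Return the categories of all keywords found in the POI type field."""
    text = normalize(poi_type)
    found = []
    for start in range(len(text)):
        state = dfa
        for ch in text[start:]:
            if ch not in state:
                break  # no transition: restart from the next position
            state = state[ch]
            category = state.get(END)
            if category is not None and category not in found:
                found.append(category)
    return found


if __name__ == "__main__":
    dfa = build_dfa(KEYWORDS)
    print(match_categories(
        "government agencies and social groups; public security agencies; "
        "public security police", dfa))
```

Several keywords may map to the same category, which gives the many-to-one relationship described above, and a POI whose type field contains none of the keywords is simply left unmatched.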

3.3.2. Point Symbols Machine Automatic Positioning Configuration

The city map reflects the city’s basic appearance and various facilities, with the city as its scope. The accuracy and aesthetics of the point symbol configuration affect the accuracy and readability of the map. Therefore, this study realizes the automatic configuration of point symbols based on recognition. Since the candidate keywords are associated with point symbols, the successfully matched POIs are of the same type as the point symbols associated with those keywords. Additionally, POI attribute information (name, address, geographic coordinates, city zip code, and other information in the POI data) is added to the semantics of these symbols. Thus, the point symbols carry the semantic information of the recognition result as well as the attribute information of the matched POI, realizing the mutual mapping between POI and point symbol [76,77]. Automatic positioning is performed by matching geographic coordinates, selecting the corresponding symbol style by symbol category, and displaying the attribute information contained in the POI, thus completing the automatic location configuration of point symbols. This study develops a script in Visual Studio Code, specifies the access path for Cesium, and completes the configuration of point symbols in combination with the Map World service. Loading Map World in Cesium solves the problem of geographic coordinate registration, and the images and annotations that come with Map World enrich the map display after the point symbols are configured.
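The configuration script of this study is not shown. As a hedged illustration of the data preparation only, the sketch below converts matched POIs into GeoJSON point features, a format that Cesium can load as a data source. The "category" field, the icon file names and the function name are hypothetical, and the coordinates are assumed to have already been converted to WGS84.

```python
import json


def pois_to_geojson(matched_pois, icon_by_category):
    """Build a GeoJSON FeatureCollection with one point feature per matched POI."""
    features = []
    for poi in matched_pois:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON stores positions as [longitude, latitude].
                "coordinates": [float(poi["x"]), float(poi["y"])],
            },
            "properties": {
                "name": poi["name"],
                "type": poi["type"],
                "address": poi["address"],
                "category": poi["category"],
                "icon": icon_by_category[poi["category"]],
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    matched = [{
        "name": "Boxue Road Police Station",
        "type": "public security police",
        "address": "200 m east of the intersection of Jinshui East Road and Boxue Road",
        "x": 113.802418,
        "y": 34.775305,
        "category": "police station",  # hypothetical result of keyword matching
    }]
    icons = {"police station": "police.png"}  # hypothetical symbol style file
    print(json.dumps(pois_to_geojson(matched, icons), ensure_ascii=False, indent=2))
```

Keeping the symbol style as a property of each feature lets the viewer choose the icon by category while the POI attributes remain available for later queries.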

Figure 8 depicts the result of the point symbol configuration on the Map World image and the vector base map. Since the selected point symbols are regular shapes, each symbol is displayed with its center point at the geographic coordinates of the corresponding POI. In practical applications, in order to emphasize certain types of symbols or to represent hierarchical features, it is common for symbols of different categories to be of varying sizes. This study sets uniform symbol sizes for point symbols but permits variable-sized symbol configurations. Symbol sizes within the same type can also be set independently to express various features. As depicted in Figure 9, the attribute information is stored in the symbol of the corresponding type by loading the corresponding fields: name, type, address, pname, cityname, adname, pcode, citycode and adcode. After this, attribute information can be queried. Configuring map point symbols with this method fully satisfies the real-time configuration of numerous recognized point symbols during zooming in, zooming out, and roaming. Because it uses latitude and longitude coordinates to locate point symbols, it can be applied to the point symbol configuration of maps of any region. Moreover, if more POI data are acquired, they can be added via their coordinate information to enrich the configured map. Since POI data of many types can be obtained according to the POI classification code list, it is possible to create thematic maps with one or more symbol types. Such maps can visually display the spatial geographic distribution, quantity, and spatial structure of the mapped objects, thereby completing their graphical representation.

4. Conclusions

For efficient recognition and configuration of point symbols in maps, this study proposes a two-step mapping method that intelligently recognizes point symbols and then configures them. On the one hand, the CBAM attention mechanism is added to the network model on the basis of the YOLOv3 algorithm. The CBAM module contains few convolutions and only a small number of pooling layers and feature fusion structures. This avoids the heavy computation caused by convolution multiplication, reduces the complexity of the module, and decreases the amount of calculation. At the same time, two analysis dimensions, spatial attention and channel attention, are introduced to realize a sequential attention structure from channel to space. Hence, the algorithm is optimized so that the network pays greater attention to the target details, suppresses irrelevant information, effectively extracts the target features, and avoids the problem of overfitting. On the other hand, the dataset is expanded through data enhancement to ensure sufficient training data, thereby enhancing the accuracy of symbol recognition and providing data support for subsequent symbol configuration. Compared to the original algorithm, the proposed algorithm proved to be more accurate at recognizing map point symbols, and the mAP was improved by 0.55%. In comparison to the classic SSD algorithm and the Faster RCNN algorithm, the precision, recall, and mAP values of this algorithm are excellent, reaching 97.06%, 99.72%, and 99.50%, respectively. The map point symbols are then configured automatically on the basis of recognition. The point symbol recognition result includes symbol style and semantic information, and it is combined with POI to obtain spatial location information, realizing automatic positioning and configuration of point symbols. Keyword matching is performed by selecting the keywords and the corresponding field of the POI. 
Since the chosen keywords are associated with point symbols, a successful match is considered to be of the same type as the associated point symbol. The POI attribute information is added to the corresponding point symbols’ semantic information to complete the matching of each category of point symbol. The matched point symbols possess semantic information regarding symbol recognition as well as rich attribute information in POIs. As a result, the matched geographic coordinates are positioned on the Map World and loaded with corresponding attributes to complete the automatic configuration of point symbols. The configured point symbols can realize the attribute information query function. It is demonstrated through experiments that map point symbols can be effectively recognized using deep learning methods. Moreover, through keyword matching and POI association, a good point symbol map configuration effect is obtained.

In map-making, people use abstract graphics and symbols to represent the real objective world, reflecting the cognitive consensus of cartographers; when users read a map, they read the map symbols to form their individual cognition of the objective world. Different cartographic groups may design different map symbols for the same geographical object, but this does not affect the public’s comprehension of the meaning of the symbols. In this sense, map language is a universal language that transcends national boundaries and geographic regions. Visual variables, which are variations in graphic or color factors that cause visual differences between symbols, lead directly to differences in symbol design. In terms of basic elements, such as shape, size, and color, there is a high degree of cognitive consensus between Chinese and foreign scholars. Nevertheless, differences exist in how these elements are understood and applied, so map symbols built from different combinations of visual variables differ, and the representation of similar point symbols varies from country to country. This study investigates the recognition and configuration of point symbols in Chinese maps and proposes applying the YOLOv3 algorithm to recognize map point symbols and associating them with POI to obtain coordinates, thereby completing the automatic positioning and configuration of point symbols. When recognizing map point symbols, the recognition effect correlates strongly with the annotated dataset. The neural network extracts the features of the labeled symbols in the dataset and predicts the location and category of the given candidate regions. Point symbols that are not labeled cannot be recognized, because the machine has not been trained on their feature information. The technical ideas and methods of this study can be extended to the maps of other countries for point symbol recognition and configuration. 
By building different datasets, the neural network learns the features of the various types of symbols they contain. These can then be applied to the map to be recognized, and the configuration task can be accomplished through the association with POI. However, this study has a number of limitations. The point symbols selected in this paper are regular shapes whose geometric centers correspond to the geographic coordinates configured on Map World. However, not all point symbols take the geometric center as their locating point, and the influence of the locating point of the map symbol on the configuration effect has not been considered. This study adjusts the size of the symbol according to its visual effect. Nonetheless, symbols will overlap when two or more nearby objects are given excessively large symbols. This study has not considered this situation and is merely an initial attempt to implement the proposed method. In summary, this study has investigated the intelligent identification and efficient configuration of map point symbols. It is a valuable attempt to automate map production; however, additional in-depth research is required to improve the efficiency of intelligent identification and configuration of map point symbols.

Author Contributions

Conceptualization, Huili Zhang; methodology, Huili Zhang, Ge Zhu and Xiaowen Zhou; software, Huan Li and Ge Zhu; validation, Huili Zhang and Huan Li; formal analysis, Xiaowen Zhou; writing—original draft, Huili Zhang and Xiaowen Zhou; writing—review and editing, Huili Zhang and Hongwei Li; visualization; Huan Li and Ge Zhu; supervision, Huan Li and Hongwei Li. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by “Theory and Method of Map and Spatial Cognition under Human-Machine-Environment Collaboration” of the High-level Talents Research Project of Zhengzhou University (Zhengzhou University, grant number: 135-32310276) and “Research on Machine Map Theory and Modeling Methods” of the Key Program of the National Natural Science Foundation of China (National Natural Science Foundation of China, grant number: 42130112).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The first author may provide all data supporting the findings of this study upon reasonable request.

Acknowledgments

We sincerely thank each anonymous reviewer who provided comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

Li, S.; Chen, Y.; Zhou, D. Symbolic representation on geographic concepts and their mutual relationships. In Geoinformatics 2006: Geospatial Information Science; SPIE: Bellingham, WA, USA, 2006.
Ahmed, M.; Ward, R. An expert system for general symbol recognition. Pattern Recognit. 2000, 33, 1975–1988.
Zaidi, S.; Ansari, M.; Aslam, A.; Kanwal, N.; Asghar, M.; Lee, B. A survey of modern deep learning based object detection models. Digit. Signal Process. 2022, 126, 103514.
Chen, G.; Tan, X.; Guo, B.; Zhu, K.; Liao, P.; Wang, T.; Wang, Q.; Zhang, X. SDFCNv2: An Improved FCN Framework for Remote Sensing Images Semantic Segmentation. Remote Sens. 2021, 13, 4902.
Song, S.; Yu, H.; Miao, Z.; Zhang, Q.; Lin, Y.; Wang, S. Domain Adaptation for Convolutional Neural Networks-Based Remote Sensing Scene Classification. Geosci. Remote Sens. Lett. IEEE 2019, 16, 1324–1328.
Simistira, F.; Papavassiliou, V.; Katsouros, V.; Carayannis, G. A System for Recognition of On-Line Handwritten Mathematical Expressions. In Proceedings of the International Conference on Frontiers in Handwriting Recognition, Bari, Italy, 18–20 September 2012; pp. 193–198.
Cao, C.; Zheng, J.; Huang, Y. A Special Symbol Recognition and Location Algorithm Based on Muti-Template Matching. Comput. Appl. Softw. 2021, 38, 175–180.
Sadahiro, Y. A Statistical Method for Determining the Size of Map Labels. Theory Appl. GIS 2009, 3, 33–44.
Aly, W.; Uchida, S.; Fujiyoshi, A.; Suzuki, M. Statistical Classification of Spatial Relationships among Mathematical Symbols. In Proceedings of the International Conference on Document Analysis & Recognition, Bari, Italy, 18–20 September 2012; pp. 1350–1354.
Bi, J.; Tian, L.; Zhang, G. Intelligent Recognition of Map Point Symbols Based on Cognitive Theory. Hydrogr. Surv. Charting 2016, 36, 65–67.
Sun, C.; Shi, K.L.; Yong, J.H. Algorithm for Recognizing Symbols from Vector Engineering Drawings Based on a Two-Layer Structure. J. Comput. Aided Des. Comput. Graph. 2017, 29, 2171–2179.
Datta, R.; Mandal, P.; Chanda, B. Detection and identification of logic gates from document images using mathematical morphology. In Proceedings of the Computer Vision, Pattern Recognition, Image Processing & Graphics, Patna, India, 16–19 December 2015.
Ullah, I.; Lee, H.J. An Approach of Locating Korean Vehicle License Plate Based on Mathematical Morphology and Geometrical Features. In Proceedings of the International Conference on Computational Science & Computational Intelligence, Las Vegas, NV, USA, 15–17 December 2016; pp. 836–840.
Liu, Y.; Liu, G.; Zheng, Z. Application of Mathematical Morphology in Airfield Target Recognition. J. Proj. Rocket. Missiles Guid. 2005, 25, 66–68.
Yun, D.Y.; Seo, S.K.; Zahid, U.; Lee, C.J. Deep Neural Network for Automatic Image Recognition of Engineering Diagrams. Appl. Sci. 2020, 10, 4005.
Quan, Y.; Shi, Y.; Miao, Q.; Qi, Y. A Combinatorial Solution to Point Symbol Recognition. Sensors 2018, 18, 3403.
Guo, M.; Bei, W.; Huang, Y.; Chen, Z.; Zhao, X. Deep learning framework for geological symbol detection on geological maps. Comput. Geosci. 2021, 157, 104943.
Zhang, Y.; Cai, J.; Cai, H. CNN-Based Symbol Recognition in Piping Drawings. In Construction Research Congress 2020; American Society of Civil Engineers: Reston, VA, USA, 2020.
Hou, X. A Sketch Recognition Algorithm Based on Bayesian Network and Convolution Neural Network. J. Jilin Univ. Inf. Sci. Ed. 2019, 23, 261–267.
Wang, X.; Zhang, P.; Zhao, Q.; Pan, J.; Yan, Y. Improved End-to-End Speech Recognition Using Adaptive Per-Dimensional Learning Rate Methods. IEICE Trans. Inf. Syst. 2016, 99, 2550–2553.
Riba, P.; Dutta, A.; Llados, J.; Fornes, A. Graph-based deep learning for graphics classification. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017), Kyoto, Japan, 9–15 November 2017; Volume 2, pp. 29–30.
Zhou, X.; Li, D.; Xue; Wang, Y.; Shao, Z. GeoAI Framework of Intelligent Recognition for Ubiquitous Map Imagery: Current State and Prospect. Geomat. Inf. Sci. Wuhan Univ. 2022, 47, 1–10.
Li, J.; Huang, S. YOLOv3 Based Object Tracking Method. Electron. Opt. Control 2019, 26, 87–93.
Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
Lin, F.; Zheng, X.; Wu, Q. Small object detection in aerial view based on improved YoloV3 neural network. In Proceedings of the 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA), Dalian, China, 25–27 August 2020; pp. 522–525.
Luo, J.; Huang, J.; Bai, X. Road Small Target Detection Method Based on Improved YOLOv3. J. Chin. Comput. Syst. 2022, 43, 449–455.
Yan, R.; Ren, F.; Yang, Y.; Tao, T.T. Automatic Configuration Method of Map Symbol from ArcGIS to CorelDRAW. J. Geomat. 2017, 42, 69–73.
Yang, Q. Map symbolization of basic geographic information database under ArcGIS software platform. Sci. Tech. Inf. Gansu 2016, 45, 22–24+50.
Bartonek, D.; Andelova, P. Method for Cartographic Symbols Creation in Connection with Map Series Digitization. ISPRS Int. J. Geo Inf. 2022, 11, 105.
Cao, Z.; Zhao, S.; Yao, Z.; Chen, W. Automatic Military One-point Located Symbols Placement Based on the Genetic Algorithm. In Proceedings of the 2010 International Conference on Computational Intelligence and Vehicular System (CIVS2010), Shanghai, China, 24–26 April 2015; pp. 160–164.
Ware, J.; Jones, C.; Thomas, N. Automated map generalization with multiple operators: A simulated annealing approach. Int. J. Geogr. Inf. Sci. 2003, 17, 743–769.
Yang, T. PSO-SA based annotation configuration for highly dense targets. Electron. Des. Eng. 2020, 28, 182–187.
Huang, H.; Guo, Q.; Sun, Y.; Liu, Y. Reducing Building Conflicts in Map Generalization with an Improved PSO Algorithm. ISPRS Int. J. Geo Inf. 2017, 6, 127.
Li, L.; Yu, Z.H.; Zhu, H.H.; Kuai, X. Handling Graphic Conflicts between Cartographic Features: Exemplifying Geolinear Features (Road, River and Boundary). Acta Geod. Cartogr. Sin. 2015, 44, 563–569.
Liu, K.; Yin, L.; Lu, F.; Mou, N. Visualizing and exploring POI configurations of urban regions on POI-type semantic space. Cities 2020, 99, 102610.
Zhang, Z.; Zou, C.; Ding, R.; Chen, Z. VCG: Exploiting visual contents and geographical influence for Point-of-Interest recommendation. Neurocomputing 2019, 357, 53–65.
Tian, J.P.; You, X.; Jia, F.L.; Xia, Q. Cognitive Semantic Analysis and Dynamic Generation of Cartographic Symbols. Acta Geod. Cartogr. Sin. 2017, 46, 928–938.
Cao, Y.; Jiang, N.; Zahng, Y.; Zhang, X. Constitution Variables and Generation Modes of Electronic Map Symbols. Acta Geod. Cartogr. Sin. 2012, 41, 784–790.
Yuan, L.; Uttal, D. Analogy Lays the Foundation for Two Crucial Aspects of Symbolic Development: Intention and Correspondence. Topics in Cognitive Science 2017, 9, 738–757.
Ma, C.Y.; Liu, Y.L. Map Visual Art Oriented by Structuralism and Deconstruction of Symbol Philosophy. Geomat. Inf. Sci. Wuhan Univ. 2006, 31, 552–556.
Xia, X.; Ye, Y.; Pi, L. Study and Thinking of the Development of Chinese Modern City Maps. J. Geo Inf. Sci. 2016, 18, 77–87.
Mao, J.; Zhang, X.; Ji, Y.; Zhang, Z.; Guo, Z. Improved High Precision Aircraft Target Detection Method of YOLT. J. Phys. Conf. Ser. 2021, 1955, 12027–12028.
Ye, K.; Fang, Z.; Huang, X.; Ma, X.; Ji, J.; Xie, Y. Research on small target detection algorithm based on improved yolov3. In Proceedings of the 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Harbin, China, 25–27 December 2020; pp. 1467–1470.
Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo Algorithm Developments. Procedia Comput. Sci. 2022, 199, 1066–1073.
Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
Zhang, Y.; Han, J.H.; Yong, W.K.; Moon, Y.S. A New Architecture of Feature Pyramid Network for Object Detection. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 1224–1228.
Lin, T.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
Li, X.; Fu, C.; Li, X.; Wang, Z. Improved Faster R-CNN for Multi-Scale Object Detection. J. Comput. Aided Des. Comput. Graph. 2019, 31, 1095–1101.
Li, K.; Wang, X.; Lin, H.; Li, L.; Yang, Y.; Meng, C.; Gao, J. Survey of One-Stage Small Object Detection Methods in Deep Learning. J. Front. Comput. Sci. Technol. 2022, 16, 41–58.
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A. SSD: Single Shot MultiBox Detector; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9905, pp. 21–37.
Wu, J.; Sun, Y.; Tang, G.; Xu, X. Analyses of Time Efficiency and Speed-ups in Inference Process of Two-Stage Object Detection Algorithms. In Proceedings of the 2018 IEEE 4th International Conference on Computer and Communications (ICCC), Chengdu, China, 7–10 December 2018; pp. 1498–1502.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst. 2015, 39, 1137–1149.
Niu, Z.; Zhong, G.; Yu, H. A Review on the Attention Mechanism of Deep Learning. Neurocomputing 2021, 452, 48–62.
Guo, M.; Xu, T.; Liu, J.; Liu, Z.; Jiang, P.; Mu, T.; Zhang, S.; Martin, R.; Cheng, M.; Hu, S. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
Yan, J.; Peng, Z.; Yin, H.; Wang, J.; Wang, X.; Shen, Y.; Stechele, W.; Cremers, D. Trajectory prediction for intelligent vehicles using spatial-attention mechanism. IET Intell. Transp. Syst. 2020, 14, 1855–1863.
Jia, H.; Wang, Y.; Cong, R.; Lin, Y. Neural Network Text Classification Algorithm Combining Self-Attention Mechanism. Comput. Appl. Softw. 2020, 37, 200–206.
Li, H.; Wu, X.J.; Durrani, T. NestFuse: An Infrared and Visible Image Fusion Architecture based on Nest Connection and Spatial/Channel Attention Models. IEEE Trans. Instrum. Meas. 2020, 69, 9645–9656.
Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11211, pp. 3–19. [Google Scholar] [CrossRef] [Green Version]
Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018, 107, 3–11. [Google Scholar] [CrossRef] [PubMed]
Sheng, F.; Yin, Y.; Qin, C.; Zhang, K. Research and Implementation Based on Transcendental Function Coprocessor Sigmoid Function. Microelectron. Comput. 2018, 35, 11–14. [Google Scholar] [CrossRef]
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
Zhang, J.; Zhu, Y. A method based on graphic entity for visualizing complex map symbols on the web. Cartogr. Geogr. Inf. Sci. 2015, 42, 44–53. [Google Scholar] [CrossRef]
Zuo, X.; Nie, J. Algorithm of symbol generation and configuration of land polygons in present land-use map. Rans. Nonferrous Met. Soc. China 2011, 21, 743–747. [Google Scholar] [CrossRef]
Li, K.; Lan, J. Efficient Unfixed Keywords Matching Algorithm Based on TCAM. Comput. Eng. 2012, 38, 269–271. [Google Scholar] [CrossRef]
Fu, Z.; Li, J. Survey on high performance regular expression matching algorithms. Comput. Eng. Appl. 2018, 54, 1–13. [Google Scholar] [CrossRef]
Zu, Y.; Yang, M.; Xu, Z.; Wang, L.; Tian, X.; Peng, K.; Dong, Q. GPU-based NFA Implementation for Memory Efficient High Speed Regular Expression Matching. ACM Sigplan Not. 2012, 47, 129–139. [Google Scholar] [CrossRef]
Ficara, D.; Giordano, S.; Procissi, G.; Vitucci, F.; Antichi, G.; Di Pietro, A. An Improved DFA for Fast Regular Expression Matching. Acm Sigcomm Comput. Commun. Rev. 2008, 38, 31–40. [Google Scholar] [CrossRef] [Green Version]
Everingham, M.; Van Gool, L.; Williams, C.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef] [Green Version]
Bejani, M.; Ghatee, M. A systematic review on overfitting control in shallow and deep neural networks. Artif. Intell. Rev. 2021, 54, 6391–6438. [Google Scholar] [CrossRef]
Kim, H.C.; Jae, K.M. A comparison of methods to reduce overfitting in neural networks. Int. J. Adv. Smart Converg. 2020, 9, 173–178. [Google Scholar] [CrossRef]
Lin, C.; Shan, C.; Zhao, G.; Yang, Z.; Peng, J.; Chen, S.; Huang, R.; Li, Z.; Yi, X.; Du, J. Review of Image Data Augmentation in Computer Vision. J. Front. Comput. Sci. Technol. 2021, 15, 583–611. [Google Scholar] [CrossRef]
Li, G.; Yang, Y.; Qu, X.; Cao, D.; Li, K. A deep learning based image enhancement approach for autonomous driving at night. Knowl. Based Syst. 2020, 213, 106617. [Google Scholar] [CrossRef]
Dvornik, N.; Mairal, J.; Schmid, C. On the Importance of Visual Context for Data Augmentation in Scene Understanding. IEEE Trans. Pattern Anal. 2021, 43, 2014–2028. [Google Scholar] [CrossRef] [Green Version]
Katsumata, Y.; Taniguchi, A.; Hafi, L.E.; Hagiwara, Y.; Taniguchi, T. SpCoMapGAN: Spatial Concept Formation-based Semantic Mapping with Generative Adversarial Networks. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 7927–7934. [Google Scholar] [CrossRef]
Huang, L.; Fang, W.; Cui, Z. Ontology mapping model with uncertainty in semantic integration. Comput. Eng. Appl. 2009, 45, 140–144. [Google Scholar] [CrossRef]

Figure 1. YOLOv3 network structure.

Figure 2. Workflow diagram of CBAM module.

Figure 3. Automatic positioning and configuration process for map point symbols.

Figure 4. Example of labeling in the point symbols dataset.

Figure 5. Training results of YOLOv3 and the method proposed in this study: (a) precision comparison chart; (b) recall comparison chart; (c) mAP comparison chart.

Figure 6. Point symbol recognition results of the four models on a map with smaller targets: (a–d) recognition results of YOLOv3, the proposed method, Faster RCNN, and SSD, respectively.

Figure 7. Point symbol recognition results of the four models on a map with larger targets: (a–d) recognition results of YOLOv3, the proposed method, Faster RCNN, and SSD, respectively, on a city map in a different style with more prominent targets.

Figure 8. Rendering of the point symbol configuration on the Map World image and vector base maps: (a) configuration result on the Map World image map; (b) configuration result on the Map World vector map.

Figure 9. Attribute Information Query.

Table 1. Experimental dataset partition.

Type	Training Set	Validation Set	Test Set	Labeled Boxes
Number	5406	601	668	15,484
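
As a quick arithmetic check on Table 1, the split sizes can be converted into proportions. The sketch below uses only the counts from the table; the variable names are illustrative, and the boxes-per-image figure assumes that the 15,484 labeled boxes cover all three splits, which the table does not state explicitly.

```python
# Image counts copied from Table 1; the dictionary name is illustrative.
split_counts = {"training": 5406, "validation": 601, "test": 668}

total_images = sum(split_counts.values())

# Share of each split as a percentage of all images.
split_share = {
    name: round(100 * count / total_images, 2)
    for name, count in split_counts.items()
}

# Average labeled boxes per image, ASSUMING the 15,484 boxes span all splits.
boxes_per_image = round(15484 / total_images, 2)

print(total_images, split_share, boxes_per_image)
```

Under these counts the partition is close to an 81/9/10 split between training, validation, and test images.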

Table 2. POI type division table.

Symbol Category	Type Description
bank	bank, 24 h self-service banking
edifice	building, office building
government sector	committee, service center, office
hospital	clinic, pharmacy, hospital, health service station
hotel	hotel, guesthouse, apartment
market	shopping center, shopping mall
post office	post office
PSB	police office, police station, security kiosk, public security bureau

Table 3. Comparison of ablation experiment results.

Model	Precision (%)	Recall (%)	mAP (%)
YOLOv3	96.70	99.09	98.95
YOLOv3 (CBAM)	97.06	99.72	99.50
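
The gains reported for the CBAM variant can be recomputed directly from the two rows of Table 3. The following is a minimal sketch with illustrative variable names; it reports absolute differences in percentage points, not relative changes.

```python
# Metric values (in percent) copied from Table 3.
baseline = {"precision": 96.70, "recall": 99.09, "mAP": 98.95}
with_cbam = {"precision": 97.06, "recall": 99.72, "mAP": 99.50}

# Absolute gain of YOLOv3 (CBAM) over YOLOv3, in percentage points.
gains = {
    metric: round(with_cbam[metric] - baseline[metric], 2)
    for metric in baseline
}

print(gains)
```

The mAP difference obtained this way is 0.55 percentage points, which agrees with the increase stated in the abstract.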

Table 4. Comparison of detection results of different models.

Model	Backbone	Input	FPS	Precision (%)	Recall (%)	mAP (%)
YOLOv3 (CBAM)	Darknet53	416 × 416	7.92	97.06	99.72	99.50
Faster RCNN	VGG16	600 × 600	2.13	91.29	99.94	99.01
SSD	MobileNetV2	300 × 300	20.46	94.23	92.93	97.44

Table 5. Keywords and symbols styles associated with point symbols.

Point Symbols Name	Keywords	Styles
bank	bank, ATM
edifice	business office
government sector	relevant government agency, district/county/town/provincial municipal government and related unit
hospital	hospital, clinic, pharmacy
hotel	hotel, guesthouse, guest house
market	shopping mall
post office	post office
PSB	public security police, social security agency, fire department
school	institution of higher learning, secondary school, primary school
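
The keyword association in Table 5 can be illustrated with a small matching routine. This is only a sketch under stated assumptions, not the authors' implementation: it uses a subset of the Table 5 keywords, whole-word regular-expression matching is one possible matching rule, and the function name `match_symbol`, the record fields `type`, `lon`, `lat`, and the sample POI records are all hypothetical.

```python
import re
from typing import Optional

# A subset of the keyword lists in Table 5 (symbol category -> keywords).
SYMBOL_KEYWORDS = {
    "bank": ["bank", "ATM"],
    "hospital": ["hospital", "clinic", "pharmacy"],
    "hotel": ["hotel", "guesthouse", "guest house"],
    "market": ["shopping mall"],
    "post office": ["post office"],
}


def match_symbol(poi_type: str) -> Optional[str]:
    """Return the first symbol category with a keyword found in poi_type."""
    for symbol, keywords in SYMBOL_KEYWORDS.items():
        for keyword in keywords:
            # Whole-word, case-insensitive match to avoid partial hits.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            if re.search(pattern, poi_type, flags=re.IGNORECASE):
                return symbol
    return None


# Hypothetical POI records; types and coordinates are made up for illustration.
pois = [
    {"type": "24 h self-service bank ATM", "lon": 113.62, "lat": 34.75},
    {"type": "community clinic", "lon": 113.65, "lat": 34.76},
    {"type": "public park", "lon": 113.60, "lat": 34.74},
]

# Attach a symbol category to every POI whose type matches a keyword,
# so the symbol inherits the POI coordinates.
configured = []
for poi in pois:
    symbol = match_symbol(poi["type"])
    if symbol is not None:
        configured.append({**poi, "symbol": symbol})

print(configured)
```

With the sample records, the first two POIs receive the `bank` and `hospital` symbols and the third is left unconfigured because no keyword matches. Whole-word matching is chosen here so that a string such as "riverbank" does not trigger the `bank` category.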

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, H.; Zhou, X.; Li, H.; Zhu, G.; Li, H. Machine Recognition of Map Point Symbols Based on YOLOv3 and Automatic Configuration Associated with POI. ISPRS Int. J. Geo-Inf. 2022, 11, 540. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi11110540

