Next Article in Journal
A Spatiotemporal Dilated Convolutional Generative Network for Point-Of-Interest Recommendation
Next Article in Special Issue
Assessing Safety and Suitability of Old Trails for Hiking Using Ground and Drone Surveys
Previous Article in Journal
Vegetation Phenological Changes in Multiple Landforms and Responses to Climate Change
Previous Article in Special Issue
Spatiotemporal Analysis of Tourists and Residents in Shanghai Based on Location-Based Social Network’s Data from Weibo
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Smart Tour Route Planning Algorithm Based on Naïve Bayes Interest Data Mining Machine Learning

1
Tourism Department, Leshan Vocational and Technical College, Leshan 614000, China
2
Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China
3
Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
4
Institute of Information Engineering, Zhengzhou University of Industrial Technology, Zhengzhou 451159, China
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2020, 9(2), 112; https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9020112
Submission received: 1 January 2020 / Revised: 8 February 2020 / Accepted: 17 February 2020 / Published: 19 February 2020
(This article belongs to the Special Issue Smart Tourism: A GIS-Based Approach)

Abstract

:
A smart tour route planning algorithm based on a Naïve Bayes interest data mining machine learning is brought forward in the paper, according to the problems of current tour route planning methods. A machine learning model of Naïve Bayes interest data mining is set up by learning a mass of training data on tourists’ interests and needs. Through the recommended interest tourist site classifications from the machine learning module, the optimal tourist site mining algorithm based on the membership degree searching propagating tree of a tourist’s temporary accommodation is set up, which mines and outputs the optimal tourist sites. The mined optimal tourist sites are taken as seed points to set up a tour route planning algorithm based on the optimal propagating tree of a closed-loop structure. Through the proposed algorithm, an experiment is designed and performed to output optimal tour routes conforming to tourists’ needs and interests, including the propagating tree closed-loop structures, a minimum heap of propagating tree weight function value, and a weight function value complete binary tree. We prove that the proposed algorithm has the features of intelligence and accuracy, and it can learn tourists’ needs and interests to output optimal tourist sites and tour routes and ensure that tourists can get the best motive benefits and travel experience in the tour process, by analyzing the experiment data and results.

Graphical Abstract

1. Introduction

Tour route planning is an important and indispensable content for smart tourism research and tourism geographic information system (GIS) development. Tourists are the most critical component of tourism activities, and they play a vital part in the considerable development and progress of smart tourism and tourism economies. The satisfaction degree of the motive benefits obtained in the whole tour process will directly influence a tourist’s subjective evaluations on a certain tourism city, as well as its tourist sites and tourism facilities and, thus, indirectly influence the subsequent tourists when making their travel schedule. Before visiting an unfamiliar city, tourists will initially make proper travel schedule according to their interests, time, cost budget, etc., in which the tour route is dispensable. A superior tour route will help tourists find the best motive benefits and travel experience [1,2,3].
In the development process of smart tourism and tourism GIS, embedding smart tour route planning and recommending function is an important way for tourism recommendation system to realize intelligence, the core technique is in the designing and developing of smart tour route planning algorithms. Traditional tour route planning depends on two methods, one is tourists making schedules by themselves, the other is tourists purchasing tour routes provided by a travel agency [4,5,6]. For the first method, under the condition of tourists’ being unfamiliar with the tourism city, they usually receive tourism information from websites, books, magazines, other tourist’s recommendations, etc. They then plan the trip according to the information that they obtain as well as their subjective needs.
For the second method, the tour routes planned by the travel agency usually contain the hottest, highest star-rated, and most visited tourist sites, but neglect the less popular, low star-rated, and least visited ones. These are unlikely to covers all of the tourist sites of the city. Moreover, in the tour route planned by a travel agency, one or more tourist sites may not be of interest for a particular set of tourists, but, they have to passively accept the route and pay for all of the tourist sites in order to join in the trip, which will decrease the satisfaction degree for the whole trip.
We analyze the traditional tour route planning methods and existing problems. We conclude that, firstly, it is necessary and important to provide tourists with optimal tourist sites of interest and tour routes to achieve the best motive benefits and travel experiences. Tourist interest should be considered to be the core factor in developing smart tourism recommendation systems, and it is also the principal condition for developing tour route planning algorithms. Secondly, providing tourists with optimal tourist sites and tour routes as well as tour decision support efficiently and in accordance with their needs and interests, is the aim of smart tourism development [7,8,9,10]. By setting up a tourist interest machine learning model based on tourism big data, individual needs and interests tendencies can be predicted and output, which is the front end for smart GIS or smart tourism recommendation systems. Thirdly, the issue of confirming the specific optimal tourist sites should be considered to ensure that each tourist site conforms to tourist interests on the basis of predicting needs and interests tendencies as well as tourist site interest classifications [11,12]. Meanwhile, tourist site geographic position and distribution, neighbourhood relationship with temporary accommodation, decreasing travel budget, and expenditure should be optimized. Fourthly, the mined optimal tourist sites are set as discrete seed points. Combining with objectively extant factors that will influence tourists’ motive benefits in the tour process, a tour route planning algorithm that is based on an optimal propagating tree closed-loop structure is designed. Through the algorithm, optimal tour routes, tour guide maps, and decision support services are all provided for tourists. This is the back end for a smart GIS or smart tourism recommendation system [13,14,15].
According to the developing algorithm and system, a smart tour route planning algorithm based on machine learning of Naïve Bayes interest data mining is brought forward in the paper. The first core of the algorithm is in building a tourist interest machine learning module by mining different feature attributes of sample tourists within tourism big data [16]. According to the critical information of tourists’ tour schedules, temporary accommodation locations, interest tendency output by the machine learning module, etc., a smart machine uses an optimal tourist searching and mining algorithm to output optimal tourist sites, which conform to tourists’ needs and interests within a nearest neighbourhood buffer. Subsequently, optimal tour routes, guide maps and decision supports are output for tourists. In the first step, by collecting a sufficiently large quantity of tourism big data, a smart machine quantifies the effective feature attributes and tourist site classifications [17,18,19].
In the system, a Naïve Bayes algorithm is used to mine the relationship model between tourist feature attributes and interest tendency. This relationship model is the front end machine learning module for the system, which realizes the function where a tourist inputs basic information and then the system outputs the interest tourist site classification and interest tendency [20]. In the second step, a tourist site of interest propagating tree model of cross-cluster neighbourhood buffer with a temporary accommodation clustering center is designed and built according to the information of tourists’ feature attributes, time schedule, budget and accommodation location, etc. A propagating tree algorithm is used to search each optimal tourist site for the optimal geographic distribution to conform to the tourists’ needs and interests in different tourist site clusters. We take the mined optimal tourist sites as the critical nodes in the tour route. The tour route planning algorithm is built with the starting point of temporary accommodation [21,22,23].
While considering that the trip in the whole tour route will be influenced by factors, like geographic information services, traffic information services, physical qualifications, tourist site attraction indexes, etc., a tour route planning algorithm should combine with these factors, as they are indispensable objective conditions in the trip process, and they conform to the tour reality [24,25,26]. The tour routes output by the smart machine have the following features: (1) all tourist site classifications and specific tourist sites conform to the tourist’s needs and interests; (2) all tourist sites are nearest to temporary accommodation, which demands the lowest expenditure; (3) temporary accommodations are the both starting point and terminal point of the whole trip, which conforms to the schedule; (4) the algorithm combines with factors that influence the motive benefits of the trip, which conforms to the tour reality; and, (5) the smart machine not only outputs optimal tour routes, but also outputs sub-optimal ones, guide maps, and decision supports. This mode is user-friendly, as tourists will be able to make the final decisions. The main research contents of the paper include the following sections.
  • The design and foundation of a Naïve Bayes interest data mining machine learning module. The machine learning module that is designed in the paper is aimed to mine tourist interest tendencies by training with a sufficient quantity of tourism interest data. Tourists provide basic information and then the smart machine will output the tourist site of interest classification distribution matrix with the interest tendency from the highest to lowest in the element order. According to the output matrix with tourist site classifications and tourist site quantity proportions, a tourist can choose one certain matrix row with elements according to their needs and interests.
  • The foundation of an optimal tourist site mining algorithm that is based on the membership degree searching propagating tree. According to the mining results and the matrix row elements that tourists select, the smart machine continues to mine optimal tourist sites with temporary accommodation clustering centers, which have the optimal geographic distribution within a neighbourhood buffer and can decrease the trip expenditure.
  • The algorithm modeling of tour route planning that is based on an optimal closed-loop structure. We set the mined optimal tourist sites as tour route nodes in the trip. Starting from the temporary accommodation, an integrated closed-loop tour route is built via passing all of the nodes and going back to the temporary accommodation. The modeling process applies the method of closed-loop structure iterating propagating tree motive weight function value to get the minimum heap R of function values and complete binary tree, and then find the optimal tour routes and sub-optimal tour routes. The algorithm combines with factors that influence the motive benefits in the tour process and conforms to the tour reality. The output tour routes and guide maps can be used directly in a tourist’s decision support in choosing proper tour route.
  • The performance of a sample experiment and analyzing of the data result. We take the tourism city Zhengzhou as our study area. We set the basic information provided by one tourist as an example to carry out the experiment, including testing the function of an interest mining machine learning module and output results, mining for optimal tourist sites, and iterating feasible tour routes. Finally, the data results are analyzed and concluded to testify the effectiveness and feasibility of the algorithm built in the paper.

2. The Design and Foundation of Naïve Bayes Interest Data Mining Machine Learning

The design thought for smart tour route recommendation systems is in training a sufficient quantity of easily obtained tourism interest big data and setting up a machine learning module to obtain tourists’ interest tendencies on tourist site classifications, and mining and recommending optimal tourist sites of interest according to their schedule. Thus, tourism big data mining is an important method for obtaining tourist needs and interests tendencies. An interest machine learning module forms the front end of smart tour route recommendation system. We apply a sufficient quantity of feature attributes, as training data and each training datum have a uniform feature label. Each group of feature attributes relates to one or more tourist site of interest classifications.
Applying a sufficient quantity of training data to build a machine learning module and predict the classification of unknown samples is the key method of machine learning. Naïve Bayes algorithms are a classical statistical classification method, which used in data predicting, classifying, and regressing [27,28,29,30]. It has a good performance and low error rate in independent data samples’ predicting and classifying. As each tourist is an independent individual and their relationship models between different tourist’s feature attributes and interest classifications are independent respectively, which matches the conditions of the Naïve Bayes algorithm [31,32,33]. Thus, in the study, the Naïve Bayes algorithm is used as a basic mode to build the interest machine learning module.

2.1. Machine Learning Module Design and Training Data Collecting

The foundation of the machine learning module is based on tourism interest big data. Text data that conforms to its function are collected and the valuable information is mined from the data. The valuable information is the training data that will be used in building the machine learning module. Through the process of data denoising, cleaning, integrating and grouping, etc., the training data is precisely processed. The valuable information data should be noted in text format and stored item by item in a database. Each item contains tourists’ feature attribute information, and each feature attribute relates to one or more elements in the tourist site classification vector. The final output result of the machine learning module is the predicted descending order basic vector with the tourist interest tendency elements from highest to lowest.
Definition 1.
Bayes formula. We set Ω as the sample space of experiment E . A is the event of E . B 1 , B 2 , …, B n is a partition of Ω , and P ( A ) > 0 , P ( B i ) > 0 , i ( 0 , n ] Z + . Formula (1) is called the Bayes formula. A single sample relates to Formula (2). The Bayes algorithm is based on the hypothetical prior probability. The observed probability of different data is given to calculate the posterior probability.
P ( B i | A ) = P ( A / B i ) P ( B i ) j = 1 n P ( A / B j ) P ( B j ) , i ( 0 , n ] Z +
P ( B | A ) = P ( A | B ) P ( B ) P ( A )
Definition 2.
Tourist feature attribute vector X . We group plentiful tourist feature attributes into different classifications and extract specific classification attributes for the sample training data. The vector that is composed by feature attributes is called the tourist feature attribute vector X .
Vector X describes the common features of tourists and it is the basic information of tourists. As for one tourist, the n dimension feature vector X = { x i | i ( 0 , n ] Z + } is used to describe the tourist’s n features x 1 , x 2 , …, x n . An arbitrary training sample’s tourist feature attribute vector X relates to one or more tourist site classifications. We set k as the total quantity of the obtained training samples.
Definition 3.
Tourist site classification vector C . According to tourist site characteristics and features, the main intentions of tourists visiting tourist sites and the actual situations of visitors can be used to group city tourist sites into m classifications. The m dimension of a vector that is composed by m tourist site classifications is called the tourist site classification vector C .
Vector C contains tourist site classifications reflecting on tourists’ interest tendencies which are based on feature attribute vectors, and then C = { c j |   j ( 0 , m ] Z + } . It contains tourist site of interest classifications c 1 , c 2 , …, c m relating to tourist feature attributes, in which an arbitrary c j relates to one certain or more tourist feature attribute vectors X . We set m as the total quantity of tourist site classifications, and 0 < m < < k . As to one certain tourism city, we define the specific tourist site of the No. c j tourist site classification as c j s . The total quantity of tourist sites in c j is s j , and then s ( 0 , s j ] Z + .
Definition 4.
Training sample data vector D . The vector D = { X , C } composed of elements x i in the tourist feature attribute vector X and elements c j in the related tourist classification C is called the training sample data vector D .
According to the definition, the total quantity of the training sample data vectors is k . The elements of D are D = { x 1 , x 2 , , x n , c j } , and j ( 0 , m ] Z + . Element c j in vector D is the mined tourist site classification, which is the most interesting one for our certain tourist.
Definition 5.
Predicted descending order basic vector E . The smart recommendation system learns tourist interest big data and output interest tourist classification of one certain sample tourist. The Naïve Bayes algorithm outputs posterior probability values and arranges the tourist site of interest classifications in descending order and then stores them in element sequences into a vector, and this vector is called the predicted descending order basic vector E .
Vector E contains m elements of E j , j ( 0 , m ] Z + , and they are all posterior probability values. This vector is the critical content for the smart recommendation system to output the sample tourist’s site of interest classifications, and it is also the base for the smart recommendation system to output specific interest sites in proportion according to tourists’ needs and interests as well as their schedule, etc. Thus, it is the core for the smart recommendation system front end development. The vector E = ( E 1 , E 2 , E m ) might totally form A m m kinds of tour sequences, and tourists’ interest tendency on m tourist site classifications declines from the left element to the right one in the vector.
Definition 6.
Tourist site of interest classification distribution matrix A . According to the predicted descending order basic vector E and tourist feature attribute X , the specific quantity of tourist site in each classification, which meet tourists’ needs and interests can be confirmed. We store the tourist site quantity into a p × m dimension matrix A . This matrix is called the tourist site of interest classification distribution matrix A .
Matrix A is used to store and display the proportion of tourist site classifications and quantity output by smart machine under the conditions of the same tourist’s feature attributes and interests, in which the matrix row represents the proportion classification, the matrix column represents the confirmed specific tourist site quantity of one certain tourist site classification.
According to the definition, the smart machine learns from the training data to predict the interests and needs of tourist samples and obtains p kinds of proportions of tourist site classifications and quantities. Each proportion could be provided for tourists to make a decision, as the tourist site quantity is arranged in sequence in vector E . As the interest tendency declines from the left element to the right one in the matrix row, the allocated tourist site quantity from the designed algorithm should also decrease in the same order as the elements in the matrix row. Formula (3) is the structure of matrix A . Element a w j represents the quantity of the No. c j classification tourist site in the No. w kind of tourist site classification and quantity proportion, and w ( 0 , p ] Z + , j ( 0 , m ] Z + . Vector A w represents the No. w kind of proportion in matrix A .
A = a 11 a 12 a 13 a 1 m a 21 a 22 a 23 a 2 m a p 1 a p 2 a p 3 a p m
We calculate from the first nonzero tourist site classification a w j 0 in the same tourist site proportion of the matrix row. We set the total quantity of nonzero tourist site classifications as c o u n t 1 , and c o u n t [ 0 , m ] Z + . In the aspect of tourist site selecting, element a w j should meet the condition of the tourists’ actual schedule and plans, such as appropriate budget, travel time, physical condition, etc. In terms of the time schedule, take a one-day trip as an example. The quantity of tourist sites to be visited should be set within the range of a maximum value to match the tourist’s conditions. Matrix A meets the conditions of formula (4).
A = ( a w j ) s . t . 0 a w j max a w j , 0 < j = 1 m a w j 5 s . t . 1 < c o u n t 1 m , a w j Z + , c o u n t 1 Z +
Plentiful training sample data of the interest machine learning model come from the mainstream tourism website, known as “FengWo”, including tourists’ travel journals, travel trajectories, evaluations on tourist sites and routes, etc. As to our particular tourism city’s tourist sites, text data of k tourists’ travel strategies, evaluation data, and travel journal information on the website are crawled and processed, and the crawled arbitrary one tourist’s data should simultaneously contain n feature attributes x 1 , x 2 , …, x n of the tourist feature attribute vector X , and the tourist site classification c j of the tourist site classification vector C . The obtained text data are cleaned, integrated, and grouped to obtain the training data for the machine learning model and stored in text format in vector D . We set the sub-division attribute of the tourist feature attribute x i as x i c , i ( 0 , n ] Z + , c ( 0 , max c ] Z + . The quantity of samples for the sub-division attribute x i c is k i c , and the quantity of sample for tourist site classification c j is k j r , r ( 0 , max r ] Z + , which meets formula (5).
c = 1 max c k i c = k , r = 1 max r k i r = k s . t . i , i ( 0 , n ] Z + , c ( 0 , max c ] Z + , r ( 0 , max r ] Z +
Figure 1 shows the storage format of feature attribute vector X , tourist site classification vector C , and training sample data vector D , according to the definition of machine learning module and training data feature attributes.

2.2. The Foundation of Naïve Bayes Interest Mining Machine Learning Mode

Plentiful training sample data are used to set up the Naïve Bayes interest mining machine learning model by learning and mining valuable information from tourism interest big data [34,35,36]. The process of setting up the model conforms to the basic principle of machine learning algorithms. First, the machine learning module learns plentiful tourists’ interest data, including feature attribute data and interest tendency data. The machine learning module calls an algorithm to batch process the training data. With the quantity of input training data increasing, the stability and robustness will enhance simultaneously, and finally become completely stable. We input the experiment sample data into the machine learning module and they will be processed and calculated to output the predicated descending basic vector E . Vector E is the prediction of the experiment sample object on the tourism interest tendency. Storing tourist site classifications in the vector E element order is the main purpose of the machine learning module. According to the sample object’s feature attributes, the specific quantities and the proportion of tourist sites for each tourist site classification is simultaneously output.
As to the confirmed m tourist site classifications c 1 , c 2 , …, c m , the machine learning module calculates the posterior probability of a sample object X = { x 1 , x 2 , , x n } and sets the tourist site classification with the maximum posterior probability as the predicted tourist site classification assigned to the sample object, in which the condition for the machine learning to judge a sample object being assigned to the tourist site classification c j is when, and only when, P ( c j | X ) > P ( c ¬ j | X ) , j ( 0 , m ] Z + . The predicted assigned tourist site classification c j contains the maximum posterior assumption. P ( c j | X ) is obtained from formula (1).
As to all classifications, P ( X ) is the constant term, and the total calculation turns to search the maximum P ( X | c j ) P ( c j ) . If the tourist site classification c j does not appear with equal probability, there should exist one c j that meets the condition P ( c j ) = P ( c ¬ j ) at the least. Tourists’ feature attribute data are independent, respectively. We set each sample’s label and count the statistics data, and the prior probability of tourist site classification P ( c j ) meets formula (4), in which k j r is the quantity of samples belonging to classification c j , and k is the total quantity of training sample data, r ( 0 , max r ] Z + . Under the Naïve assumption of classification conditional independence, P ( X | c j ) meets the condition of formula (6), in which P ( x i | c j ) is assessed by training sample data.
P ( c j ) = k j r / k P ( X | c j ) = i = 1 n P ( x i | c j ) s . t . j ( 0 , m ] Z + , r ( 0 , max r ] Z + , i ( 0 , n ] Z +
As feature attributes x i and x ¬ i are mutually exclusive and independent, and in the same attribute x i , the sub-division attributes x i c and x i ¬ c are also mutually exclusive and independent, there is P ( x i | c j ) = k j i / k j r , in which k j i is the quantity of samples belonging to tourist site classification c j of sub-division attribute x i c in attribute x i , and then k j i ( 0 , k i c ] Z + . Here is the foundation process of the Naïve Bayes machine learning model.
Step 1 We set up training sample data form T . The training sample data vector D is used to set up the data form T , and all sample data are stored in the text format. The format of the data form is T = { No . k , x 1 , x 2 , , x n , c j } . Each training sample data relates to one data form, and the k training samples generate a data form with the capacity of k .
Step 2 We calculate the tourist site classification prior probability and conditional probability, respectively. The training sample data of each tourist site classification c j obtained from tourism big data mining are known and of non-equal probability. We calculate the prior probability P ( c j ) of the tourist site classification c j and conditional probability P ( X | c j ) of the training samples, respectively.
Sub-step 1: We confirm the feature attributes x i and sub-division attributes x i c of the experiment sample object X a , a ( 0 , k ] Z + . The feature attributes x i and sub-division attributes x i c of the experiment sample object X a are identical to training samples’ feature attributes. We store the feature attributes x i values and the sub-division attributes x i c values in the text format in the database. The final form data format of X a is T a = X a { x 1 , x 2 , , x n , c j } .
Sub-step 2: We calculate the prior probability of the tourist site classification c j . The prior probability of the tourist site classification c j meets the condition of P ( c j ) = k j r / k .
Sub-step 3: We calculate the conditional probability value P ( x i | c j ) of the feature attribute. We calculate the conditional probability of sample object X a feature attribute P ( x i | c j ) = k j i / k j r and obtain m × n conditional probability values.
Sub-step 4: We calculate the conditional probability value P ( X | c j ) of a sample object. We calculate the conditional probability of the sample object P ( X a | c j ) = i = 1 n P ( x i | c j ) and find m conditional probability values.
Step 3: We maximize the value P ( X | c j ) P ( c j ) and j ( 0 , m ] Z + . We calculate m values of P ( X | c j ) P ( c j ) and obtain max P ( X | c j ) P ( c j ) as the maximum value. This maximum value reflects that under the condition of k sample objects’ tourism interest big data, the machine learning module can learn and predict the most probable tendency of tourist site classifications to be selected by tourists.
Step 4: We set up a predicted descending order basic vector E . We set up a 1 × m dimension empty predicted descending order basic vector E 0 . From the maximum value max P ( X | c j ) P ( c j ) to the minimum value min P ( X | c j ) P ( c j ) , we store m values of P ( X | c j ) P ( c j ) in m elements of vector E 0 . Each P ( X | c j ) P ( c j ) value relates to one tourist site classification c j and then we obtain the predicted descending order basic vector E . Vector E is the predicted ranking result on the interest tendency of the tourist sample object X a .
Step 5: We set up the tourist site of interest classification distribution matrix A . Take one day as the basic study unit, according to the formula (4) condition, the total quantity of tourist sites recommended by the smart machine cannot exceed 5. While considering the travel experience’s abundance and sufficiency, at least two classifications’ quantity of specific tourist sites cannot be 0, thus, c o u n t 1 ( 1 , m ] Z + .
According to the data format of the main tourism websites and the factors mostly considered by tourists before traveling, the feature attributes are confirmed as the four contents:
  • x 1 : tourist age
  • x 2 : tourist income (monthly pay, unit: ten thousand yuan)
  • x 3 : travel budget (single person per day, unit: ten thousand yuan)
  • x 4 : tourism season
And city tourist site classification includes four contents:
  • c 1 : park and greenland
  • c 2 : venue
  • c 3 : amusement park
  • c 4 : shopping
According to the machine learning modeling conditions, we divide the crawled feature attributes into further sub-division attributes. The principle is as follows.
x 1 :{ x 11 : the middle and old aged ( 46 a g e ); x 12 : the youth ( 18 a g e < 46 );
x 13 : the early youth ( 13 a g e < 18 ); x 14 : children ( 0 a g e < 13 )};
x 2 :{ x 21 : i n c o m e 0.2 ; x 22 : 0.2 < i n c o m e 0.5 ; x 23 : 0.5 < i n c o m e 1.0 ; x 24 : i n c o m e > 1.0 };
x 3 :{ x 31 : exp e n s e 0.02 ; x 32 : 0.02 < exp e n s e 0.05 ;
x 33 : 0.05 < exp e n s e 0.1 ; x 34 : exp e n s e > 0.1 };
x 4 :{ x 41 : s p r i n g ; x 42 : s u m m e r ; x 43 : a u t u m n ; x 44 : w i n t e r }.
Confirm the parameter value k = 1000 , that is, the training sample data contains 1000 valuable data for vector D , conforming to the conditions. We confirm the tourist site of interest classification distribution matrix via the Naïve Bayes learning algorithm. If the tourist sample chooses the tourist site classification and quantity of the tourist site in matrix row w , then w ( 0 , p ] Z + . The quantity of tourist site for each classification is a w j , j ( 0 , m ] Z + .

3. Smart Tour Route Planning Algorithm Modeling

The tourist site of interest classification that was obtained from the Naïve Bayes machine learning mining model conforms to tourists’ interests and needs. Under the condition, we design an optimal tourist site mining algorithm based on the membership degree searching propagating tree to mine tourist sites with optimal geographic distribution. The mined optimal tourist sites are designed as nodes of tour routes to develop a smart tour route planning algorithm combined with factors, such as tourism GIS services, traffic information services, and tourist site information services, which influence tourists’ travel experiences. The algorithm can output optimal tour routes, which conform to actual conditions, meet tourists’ interests and motive benefits, and decrease travel expenditures. Meanwhile, sub-optimal tour routes are also provided for tourists.

3.1. Optimal Tourist Site Mining Algorithm based on Membership Degree Searching Propagating Tree

The tourist site of interest classification distribution matrix A formed by the Naïve Bayes machine learning mining process is the critical model for smart machine to learn tourists’ needs and interests. In the matrix A , arbitrary row A w represents one feasible sort of tourist site classification and quantity. Each row’s classification and quantity can meet the needs of tourists, but they differ in specific tourist sites, which will output different tour routes. According to the definition, as to one row A w of matrix A , the feasible sort of classification and quantity is j = 1 m C s j a w j , s . t . a w j 0 , but not all the sorts are the optimal ones. The tourists start from temporary accommodation in the city, visit all selected tourist sites, and finally return to temporary accommodation, and the whole process forms an integrated tour route. The selected tourist sites should meet the needs and interests while costing the minimum expenditure. Thus, within the neighbourhood range of a temporary accommodation center, the nearer to the center, the more beneficial of the tourist site will be. Thus, in j = 1 m C s j a w j , there are different sorts of tourist classifications and quantities and only one sort is optimal in geographic distribution. The membership degree relationship is used to set up the neighbourhood searching arc for the seed tourist site and to iterate the tourist sites to generate the propagating tree. The process of searching for subordinate seed tourist sites is in the range of one tourist site cluster or cross-cluster, that is, the tourist sites may belong to the same cluster or different clusters. The final output result of the process is a propagating tree with optimal geographic distributed tourist sites, and notes on the tree are the mined optimal tourist sites.
Definition 7.
Tourist site clustering center K . The temporary accommodation which is confirmed and checked in before the trip is set as the starting point and terminal point of the whole tour activity. The temporary accommodation is the first critical point of the planned tour route, which is called the tourist site clustering center K . The center K is determined by the temporary accommodation’s location, here defined as the longitude and latitude ( l , B ) .
The center K will change with the tourist’s decision on the temporary accommodation and will directly influence the propagating tree’s formation, shape and distribution, and influence the mined optimal tourist sites.
Definition 8.
Seed tourist site G e and seed tourist site vector G w . Starting from tourist site clustering center K , searched and confirmed optimal tourist sites within neighbourhood via objective function and membership degree are called seed tourist site G e . Under the condition of one sort of tourist site classification and quantity, the searched seed tourist sites for each tourist site classification c j is a w j , and j = 1 m a w j , s . t . w , e ( 0 , j = 1 m a w j ] Z + is the total quantity of seed tourist sites, according to the tourist site of interest classification distribution matrix A and its arbitrary row vector A w . We store j = 1 m a w j seed tourist sites in the sequence of the propagating tree’s nodes in the vector element from left to right in order, and this vector is called the seed tourist site vector G w .
Under the condition of the confirmed tourist site clustering center K , matrix A can generate p seed tourist site vectors G w , and w ( 0 , p ] Z + , according to the definition. Each G w stores the searched optimal tourist sites of row A w .
Definition 9.
Subordinate seed tourist site G e and non-subordinate seed tourist site ¬ G e . As to one seed tourist site G e , the searched and confirmed tourist site which is closest to starting center K or seed tourist site G e and it will be listed to store in propagating tree nodes via subordinate function and membership degree relationship model is called subordinate seed tourist site G e . In the same searching process, other tourist sites that are not listed to store in propagating tree nodes are called non-subordinate seed tourist site ¬ G e .
Definition 10.
Seed tourist site searching arc. We set initial seed tourist site G e as the circle center, and take the neighbourhood radius confirmed by the objective function as the arc. The arc is used to search the subordinate seed tourist site G e . The combined structure of the radius and arc is called the seed tourist site searching arc. The seed tourist site searching arc is the direction and path to search the seed tourist site. In one searching process, a smart machine scans all of the tourist sites, and seed tourist site will be bound to pass the seed tourist site searching arc with a minimum objective function value.
Definition 11.
Optimal tourist site propagating tree t r e e w . The structure tree searched and confirmed by seed tourist sites and subordinate seed tourist site generation model is called the optimal tourist site propagating tree. The nodes of the optimal tourist site propagating tree are tourist sites, which are optimally geographic distributed, conform to tourists’ interests and feature attributes, and cost the least expenditure.
According to the definition, under the condition of the confirmed tourist site clustering center K , matrix A can generate p optimal tourist site propagating trees t r e e w , and w ( 0 , p ] Z + . The optimal tourist site mining algorithm that is based on the membership degree searching propagating tree is designed and developed, according to definition and the thought of optimal tourist site propagating tree modeling.
Step 1. Confirm the propagating tree universe of discourse
According to the definition of the tourist site classification vector C , as to one certain tourism city, the specific tourist site of No. c j tourist site classification is c j s . The tourist site quantity of c j is s j , s ( 0 , s j ] Z + .
We set the city tourist site set as C s = { c 11 , c 12 , , c 1 s 1 , c 21 , c 22 , , c 2 s 2 , , c m 1 , c m 2 , , c m s m } R s , which is called the universe of discourse. c j s = ( c j s 1 , c j s 2 , , c j s u ) T R s is the feature vector of samples to be observed, and it relates to one point of universe of discourse feature space, that is, the tourist site in city geographic space. c j s α is the feature attribute value of No. α dimensions of feature vectors. As to the tourist site itself, feature attribute values contain the tourist site’s longitude l , tourist site’s latitude B , and tourist site’s attraction index ε .
Step 2. Divide the propagating tree universe of discourse into clusters
Starting from the tourist site clustering center K , we divide the propagating tree universe of discourse into m clusters c j , and each division cluster c j relates to one tourist site classification, which forms a tourist site cluster c j . Thus, the clusters of propagating tree universe of discourse are c 1 , c 2 , …, c m , and they meet the formula (7) conditions.
c 1 c 2 c m = C c j c ¬ j = , s . t . j , ¬ j ( 0 , m ] Z + c j , c j C , s . t . j ( 0 , m ] Z +
In the process of searching seed tourist sites starting from the tourist site clustering center K , the searched subordinate seed tourist site G e and initial seed tourist site G e may be in the same classification cluster or in the different classification clusters, and the searching process should meet the constraint conditions. Here, the seed tourist sites in the same classification cluster are noted as G e + , while in the different classification cluster, they are noted as G e .
Step 3. Set up the objective function and subordinate function
The space searching relationship of the tourist site clustering center K , seed tourist site G e , and subordinate seed tourist site G e is determined by the clustering principle of K , G e , and G e . The principle is the second order Minkowski distance. According to the definition of tourist site feature attributes, the Minkowski distance between the tourist site clustering center K and the first mined seed tourist site G 1 , and the Minkowski distance between the seed tourist site G e and subordinate seed tourist site G e are determined by their feature attributes. Other than the tourist site’s longitude l , latitude B , and tourist site attraction index ε , there exist factors that influence the process to search the subordinate seed tourist site.
Definition 12.
Membership degree direct influence factor λ v 1 . In the process of single searching, the factors that directly influence whether one certain tourist site is the subordinate seed tourist site G e of initial seed tourist site G e or not are called the membership degree direct influence factors λ v 1 , v 1 ( 0 , max v 1 ] Z + .
Definition 13.
Membership degree indirect influence factor δ v 2 . In the process of single searching, the factors that indirectly influence whether one certain tourist site is the subordinate seed tourist site G e of initial seed tourist site G e or not are called the membership degree indirect influence factors δ v 2 , v 2 ( 0 , max v 2 ] Z + .
The membership degree direct influence factors λ v 1 include the ferry distance between the tourist site clustering center K and tourist site (km), the ferry distance between the two tourist sites (km), the quantity of subways and bus lines between the ferry interval, the taxi fee of the ferry interval and the road traffic jam index, according to the actual travel process and city tourism service. The membership degree indirect influence factors δ v 2 include the quantity of traffic light between the tourist site clustering center K and the tourist site, the quantity of traffic light between two tourist sites, the average walking distance from a tourist site to the nearest subway or bus station (km), the average waiting time for a taxi (h), and the average quantity of traffic jammed roads. According to definition, factor s λ v 1 and δ v 2 are represented in text format. The symbol “ d i r + ” stands for factors λ v 1 , and the symbol “ i n d i r ” stands for factors δ v 2 . The text format is defined as <Factor, Relationship, Algorithm, Attribute>, and each factor is represented, as follows.
< Direct factor 1: < λ 1 , ferry distance, temporary accommodation → tourist site S 1 (km, S 1 R + ),
λ 1 = S 1 1 , d i r + >;
  Indirect factor 1: < δ 1 , quantity of traffic light, temporary accommodation → tourist site N 1
  ( N 1 Z + ), δ 1 = 0.01 N 1 , i n d i r >>
< Direct factor 2: < λ 2 , ferry distance, tourist site → tourist site S 2 (km, S 2 R + ), λ 2 = S 2 1 , d i r + >;
  Indirect factor 2: < δ 2 , quantity of traffic light, tourist site → tourist site N 2 ( N 2 Z + ),
   δ 2 = 0.01 N 2 , i n d i r >>
< Direct factor 3: < λ 3 , quantity of subway and bus line N 3 ( N 3 Z + ), λ 3 = 0.1 N 3 , d i r + >
  Indirect factor 3: < δ 3 , ferry distance, tourist site → nearest subway or bus station S 3 (km,
S 3 R + ), δ 3 = 0.01 S 3 , i n d i r >>
< Direct factor 4: < λ 4 , taxi fee of ferry distance, c o s t ( c o s t R + ), λ 4 = c o s t 1 , d i r + >
  Indirect factor 4: < δ 4 , average waiting time of taxi, t (h, t R + ), δ 4 = 0.01 t , i n d i r >>
< Direct factor 5: < λ 5 , road traffic jam index, d ( d R + ), λ 5 = 1 d , d i r + >
  Indirect factor 5: < δ 5 , average quantity of traffic jam, N 4 ( N 4 Z + ), δ 5 = 0.01 N 4 , i n d i r >>.
According to the definition, the city tourist site set C can be stored as a u × j = 1 m s j dimension matrix. The matrix’s columns relates to specific tourist sites, while the rows relates to the feature attribute. The feature attributes include the membership degree direct influence factors λ v 1 , membership degree indirect influence factors δ v 2 , tourist site longitude l , tourist site latitude B , the tourist site attraction index ε , and max u = max v 1 + max v 2 + 3 . According to Definitions 12 and 13, factors λ v 1 and δ v 2 of the tourist site clustering center K and first searched seed tourist site, seed tourist site, and subordinate seed tourist site are determined by the tourist site clustering center K and relative tourist sites, in which, if one point changes, the values of factors λ v 1 and δ v 2 will change simultaneously, thus the values of factors λ v 1 and δ v 2 are fluctuating. The objective function of the tourist site clustering center K and first searched seed tourist site, seed tourist site, and subordinate seed tourist site are determined by feature attributes, as shown in formula (8).
σ ( K , c j s ) = v 1 = 1 max   v 1 λ v 1 + v 2 = 1 max   v 2 δ v 2 + 0.01 ( ( Δ l K , c j s ) 2 + ( Δ B K , c j s ) 2 ) 1 / 2 + | Δ ε K , c j s | σ ( c j s , c j s ) = v 1 = 1 max   v 1 λ v 1 + v 2 = 1 max   v 2 δ v 2 + 0.01 ( ( Δ l c j s , c j s ) 2 + ( Δ B c j s , c j s ) 2 ) 1 / 2 + | Δ ε c j s , c j s | J ( σ ( K , c j s ) , σ ( c j s , c j s ) ) = σ ( K , c j s ) + σ ( c j s , c j s ) s . t . j j   o r   s s
Definition 14.
Objective function descending order vector Q . In the process of single searching subordinate seed tourist site G e , we store the searched objective function values in a vector in the sequence of elements from left to right in descending order, and this vector is called the objective function descending order vector Q .
Definition 15.
Objective function fluctuating curve. In the process of single searching a subordinate seed tourist site G e , the fluctuating curve, which reflects objective function values tendency, is called the objective function fluctuating curve.
The objective function fluctuating curve changes with the tourist site clustering center K and the selected tourist site classification and quantity A w . When K or A w changes greatly, the objective function fluctuating curve tendency will also change greatly. In the process of searching the subordinate seed tourist site G e starting from the clustering center K or seed tourist site G e , in one time of searching, one group of objective function values will be generated. The objective function fluctuating curve visually reflects the affinities relationship between the seed tourist site and other tourist sites in one single searching process.
Definition 16.
Seed tourist site full rank for classification c j . As to one certain nonzero tourist site classification a w j 0 of tourist site classification and quantity A w in matrix A , during the searching process, when the quantity of the searched seed tourist site for this classification reaches a w j , the seed tourist site for the classification c j is full rank under the condition of A w , and it is noted as c j . When the seed tourist site for the classification is full rank, the propagating tree will not accept further searched seed tourist sites of the same classification.
One single searching process only confirms and mines one tourist site as the subordinate tourist site and, meanwhile, the other tourist sites are non-subordinate tourist sites. The subordinate function μ ( ( K , c j s ) , c j s ) = μ ( K , c j s ) ( c j s ) , which is noted by the membership degree that represents the subordinate relationship between tourist site c j s and initial seed tourist site c j s or the tourist site clustering center K in one single searching process. The subordinate function is formula (9).
μ ( ( K , c j s ) , c j s ) = μ ( K , c j s ) ( c j s ) = 1 ,   c o n d i t i o n 1 0 ,   c o n d i t i o n 2
Definition 17.
Membership degree distribution matrix μ ( c ) . When the tourist site c j s is the subordinate seed tourist site for the clustering center K or initial seed tourist site c j s , the membership degree value of tourist site c j s is 1, or the value is 0. One single searching process can confirm one tourist site’s membership degree value as 1, and other tourist sites’ values as 0. The matrix that represents the subordinate relationship of all tourist sites via subordinate function values is called the membership degree distribution matrix μ ( c ) .
As shown in formula (10), it represents the distribution of seed tourist sites. The matrix row is one sort of tourist site classification and quantity. The matrix column is the membership degree of the No. s tourist site for the sort of tourist site classification and quantity. The quantity of column is max s j , and vacant elements are noted as 0. When the clustering center K or A w changes, the membership degree distribution matrix will also change.
μ ( c ) = μ ( c 11 ) μ ( c 12 ) μ ( c 13 ) μ ( c 1 s 1 ) μ ( c 21 ) μ ( c 22 ) μ ( c 23 ) μ ( c 2 s 2 ) μ ( c m 1 ) μ ( c m 2 ) μ ( c m 3 ) μ ( c m s m )
Step 4. Set up the optimal tourist site mining algorithm
Objective function descending order vector Q stores objective function values. If the tourist site classification relating to the first element of objective function value is not full rank and not listed in the previous seed tourist sites, and then the tourist site relating to the objective function value is mined as subordinate seed tourist site G e of the tourist site clustering center K or initial seed tourist site G e , if in the same cluster, note it as G e + , if in the different cluster, we note it as G e . Tourist sites relating to the objective function values on other elements are non-subordinate seed tourist sites ¬ G e . Starting from the clustering center K , the process of searching the seed tourist site vector G w and obtaining the objective function descending order vector Q , as well as the membership degree distribution matrix μ ( c ) is as follows.
Sub-step 1. Confirm matrix A . The tourist selects one sort of tourist site classification and quantity vector A w .
Sub-step 2. We set up 1 × j = 1 m a w j dimension seed tourist site vector G w , 1 × j = 1 m s j dimension objective function descending order vector Q and m × max s j dimension membership degree distribution matrix, and set all elements as 0.
Sub-step 3. We set up the Open list and Closed list. The open list is used to store all non-seed tourist sites to be searched. The Closed list is used to store all searched seed tourist sites. The storage format of the Open list and Closed list is the same as the tourist site classification vector C , and the elements for the two lists are set in the sequence of the tourist site classification and order. The Open list and Closed list contain j = 1 m s j elements, respectively, according to the definition. We store all elements of the city tourist site set C s in the Open list.
Sub-step 4. Search and confirm the No.1 seed tourist site G 1 . Here is the definition of seed tourist site searching angle.
Definition 18.
Seed tourist site searching angle φ . Starting from one certain central point, we draw a ray l 1 directing to the geographic north and another ray l 2 connecting with the central point and another point. The included angle from the north ray l 1 to ray l 2 in a clockwise direction is called the searching angle. If the central point is the clustering center K or initial seed tourist site G e , the other point is one tourist site c j s to be searched, and the included angle from the ray of the clustering center K or initial seed tourist site G e to the ray of the tourist site c j s is called the seed tourist site searching angle φ , noted as φ ( K , c j s ) or φ ( G e , c j s ) .
The process of searching the No.1 seed tourist site is as follows.
(I) The clustering center K is set as the central point to confirm the j = 1 m S j searching angle φ ( K , c 11 ) , φ ( K , c 12 ) , …, φ ( K , c m s m ) for tourist sites;
(II) search and calculate the objective function value σ ( K , c 11 ) in the direction of the searching angle φ ( K , c 11 ) and objective function value σ ( K , c 12 ) in the direction of the searching angle φ ( K , c 12 ) ;
① if σ ( K , c 11 ) σ ( K , c 12 ) , store φ ( K , c 11 ) into the first element of vector Q , and store φ ( K , c 12 ) into the second element of vector Q ;
② if σ ( K , c 11 ) < σ ( K , c 12 ) , store φ ( K , c 12 ) into the first element of vector Q , and store φ ( K , c 11 ) into the second element of vector Q ;
(III) search and calculate the objective function value σ ( K , c 13 ) on the direction of the searching angle φ ( K , c 13 ) :
① if σ ( K , c 11 ) σ ( K , c 12 ) σ ( K , c 13 ) , keep the first and second element unchanged, and store σ ( K , c 13 ) into the third element of vector Q ;
② if σ ( K , c 11 ) σ ( K , c 13 ) σ ( K , c 12 ) , keep the first element unchanged, and descend σ ( K , c 12 ) to the third element of vector Q ;
③ if σ ( K , c 13 ) σ ( K , c 11 ) σ ( K , c 12 ) , descend σ ( K , c 11 ) and σ ( K , c 12 ) to the second and third elements of vector Q , and ascend σ ( K , c 13 ) to the first element of vector Q ; and,
④ as to σ ( K , c 11 ) < σ ( K , c 12 ) , the comparison method of σ ( K , c 13 ) and other two values is the same as step (III) sub-steps ①–③.
(IV) Return to step (I)–(III) and continue searching and comparing the objective function values of other searching angles, store the function values into vector Q , and finally find the objective function descending order vector Q 1 and objective function fluctuating curve c u r v e 1 searched by the central point of the clustering center K .
(V) Extract the first element value of vector Q 1 , and its searching angle’s related tourist site is Q 11 . Enter the following judgment steps:
① Search the Closed list. If Q 11 appears in the Closed list, jump to the second element Q 12 of vector Q 1 ;
② If Q 12 appears in the Closed list, continue to jump to the third element Q 13 of vector Q 1 ;
③ Start searching from tourist site Q 11 , according to the method of step (V) sub-steps ① and ②, if tourist site Q 1 v 1 appears in the Closed list, continue searching; if one certain tourist site Q 1 v 1 does not appear in the Closed list, then jump to step ④, v 1 ( 0 , j = 1 m s j ] Z + ;
④ Judge and confirm the tourist site classification c j for tourist site Q 1 v 1 :
(i) If the tourist site classification c j is not full rank ¬ c j , and then confirm tourist site Q 1 v 1 , as the No.1 seed tourist site G 1 and store it into the first element of the seed tourist site vector G w . Confirm the seed tourist site’s membership degree to the clustering center K is 1. The other tourist sites’ membership degrees are all 0. Store G 1 into the Closed list and delete G 1 from the Open list; and,
(ii) If the tourist site classification c j is full rank c j , return to step (V) sub-steps ①–③ and search the next tourist site Q 1 v 2 which does not appear in the Closed list. Enter the judgment of step (V) sub-step ④. Repeat the process until the seed tourist seed is searched and confirmed, and then store it into the first element of vector G w .
Sub-step 5. Search and confirm the No.2 seed tourist site and subsequent seed tourist sites.
(I) According to Sub-step 4, set the initial seed tourist site G 1 as the central point. Search the No.2 seed tourist site G 2 in the whole geographic range and store it into vector. Confirm the membership degree of the seed tourist site to initial seed tourist site G 1 as 1, other tourist sites’ membership degrees as set as 0. Store G 2 into the Close list, and delete it from the Open list;
① If the tourist site classification for the seed tourist site G 1 is not full rank ¬ c j , that is, G 1 and G 2 are in the same cluster, note G 2 as G 1 + * ; and,
② If the tourist site classification for the seed tourist site G 1 is full rank c j , that is, G 1 and G 2 are in two different clusters, note G 2 as G 1 * .
(II) Set the initial seed tourist site G 2 as the central point. Search the No.3 seed tourist site G 3 in the whole geographic range and store it into vector. Confirm the membership degree of the seed tourist site to initial seed tourist site G 2 as 1, and set the other tourist sites’ membership degrees as set as 0. Store G 3 into the Close list, and delete it from the Open list. The method to note the G 3 cluster is the same as Sub-step 5 step (I); and,
(III) According to Sub-step 5 step (I) and (II), search and store subsequent seed tourist sites until each tourist site of interest classification c j gets to full rank c j , j = 1 , 2 , , m , and also the seed tourist site vector G w is full rank. The method to note cluster is the same as Sub-step 5 step (I). In the process of searching the seed tourist site, the objective function descending order vector and objective function curve relating to each seed tourist site are also obtained. Figure 2 shows the process of searching and mining subordinate seed tourist sites with previously searched seed tourist sites as the central points.
Step 5. Generate the optimal tourist site propagating tree
Starting from the clustering center K , generate the optimal tourist site propagating tree in the sequence of the seed tourist site vector G w element. This tree is the tendency of optimal tourist sites that meet tourists’ needs and interests and have the optimal geographic distribution. It is also the visualized process for a smart machine to output the optimal tourist sites according to the selected tourist classification and quantity.
Step 6. Generate the membership degree distribution matrix μ ( c ) .
Based on the searched seed tourist site vector G w , the membership degree distribution matrix μ ( c ) is generated. This matrix can intuitively reflect the quantity of the seed tourist sites as well as their distribution of each tourist site classification.

3.2. Tour Route Planning Algorithm Modeling based on Optimal Closed-loop Structure

The smart machine automatically plans optimal tour routes that meet tourists’ best motive benefits, according to the tourists’ interests learned from the Naïve Bayes machine learning module and optimal tourist sites searched by the membership degree searching propagating tree. All of the designed and developed algorithms are based on one-day trips. Within one day, the smart machine confirms no more than five optimal tourist sites for tourists and ensures that all of the mined tourist sites not only meet tourists’ needs and interests, cost the least expenditure with the optimal geographic distribution, but also consider tourists’ physical conditions, which helps tourists to have sufficient time to visit all the recommended tourist sites. Starting from temporary accommodation K , the whole trip of ferrying from one tourist site to another and visiting each tourist site and then returning to K is an integrated closed-loop process, in which the quantity of visited optimal tourist sites is set as τ , being noted as τ = j = 1 m a w j , τ ( 0 , 5 ] Z + . Under the condition of confirmed K , there will be A τ τ sorts of tour routes, but not all of the tour routes can meet the tourists’ best motive benefits, there should be optimal ones and sub-optimal ones. The optimal ones will be the first important recommendation to tourists, while the sub-optimal ones will also be recommended to tourists. The attraction and motive benefits of one tour route for tourists depends on the influence of all the factors on the tour route, including factors λ v 1 and δ v 2 in the actual trip, which are extracted to set up the objective function J ( σ ( K , c j s ) , σ ( c j s , c j s ) ) .
Definition 19.
Generation tree of the closed-loop structure O ω . Starting from the temporary accommodation K , the whole trip of ferrying from one tourist site to another and visiting each tourist site and then returning to K is an integrated closed-loop structure, and this structure is called a generation tree closed-loop structure O ω .
According to the quantity of tourist sites τ , the A τ τ quantity of closed-loop structures can be confirmed, ω ( 0 , A τ τ ] Z + , τ ( 0 , 5 ] Z + . One closed-loop structure relates to one tour route generation tree.
Definition 20.
Generation tree sub-unit H ( ) . In the whole trip process of one closed-loop structure, tourists will pass τ + 1 independent ferry intervals, and each ferry interval is called generation tree sub-unit H ( ) .
According to the closed-loop structure, the ferry interval between K and tourist, between two tourist sites, and between tourist site and K are noted as H ( K , G e ) , H ( G e , G e + 1 ) , and H ( G e , K ) . A generation tree sub-unit is the basic unit structure to output a sub-unit motive function value and generation tree motive function value. Here, it is defined that generation tree sub-units are independent from each other; tourists’ motive benefit obtained in one sub-unit has no relationship with another sub-unit.
Definition 21.
Sub-unit motive function I ( ) . In each sub-unit, the function is designed with the same initial motive iteration value I 0 to iterate with the membership degree direct influence factors λ v 1 and indirect influence δ v 2 and output motive iteration value of independent interval H ( ) . This function is called the sub-unit motive function I ( ) , as shown in formula (11).
The sub-unit motive function I ( ) reflects the motive benefits of the ferry interval. The higher the function value is, the bigger the influence of factors on motive benefits will be, and the more satisfaction tourists will have. In the ferry interval of a sub-unit, the motive function I ( ) is a monotone increasing function whose values will increase with tourists ferry distance increases. It finally outputs a maximum value of the interval, which is the sub-unit motive function I ( ) value. Different sub-units have different function values, thus the whole trip’s sub-unit motive function I ( ) values fluctuate with distance. Sub-unit motive function I ( ) value has the feature of non-direction, that is, in the same sub-unit, function I ( ) value remains unchanged back and forth.
I ( K , G e ) = v 1 = 1 max v 1 I 0 λ v 1 + v 2 = 1 max v 2 I 0 δ v 2 + I 0 ( ( Δ l K , G e ) 2 + ( Δ B K , G e ) 2 ) 1 / 2 + I 0 | Δ ε K , G e | I ( G e , G e ) = v 1 = 1 max v 1 I 0 λ v 1 + v 2 = 1 max v 2 I 0 δ v 2 + I 0 ( ( Δ l G e , G e ) 2 + ( Δ B G e , G e ) 2 ) 1 / 2 + I 0 | Δ ε G e , G e | I ( G e , K ) = v 1 = 1 max v 1 I 0 λ v 1 + v 2 = 1 max v 2 I 0 δ v 2 + I 0 ( ( Δ l G e , K ) 2 + ( Δ B G e , K ) 2 ) 1 / 2 + I 0 | Δ ε G e , K |
Definition 22.
Sub-unit motive weight h ( ) . The reciprocal of the sub-unit motive function I ( ) value is defined as the sub-unit motive weight h ( ) . The sub-unit motive weight h ( ) is the edge weight for two connecting point in the closed-loop. It is used as an edge weight parameter to search the optimal closed-loop structure.
According to the definition, the sub-unit motive weight h ( ) meets formula (12). The sub-unit motive weight h ( ) also has the non-direction feature. Thus, the graph that is composed by the clustering center K and seed tourist sites G e is connected and non-direction graph.
h ( K , G e ) = 1 I ( K , G e ) , h ( G e , G e ) = 1 I ( G e , G e ) , h ( G e , K ) = 1 I ( G e , K )
Definition 23.
Generation tree weight function L ( ) . The function which is iterated by the τ + 1 sub-unit motive weight h ( ) and reflects the motive benefits of one generation tree closed-loop’s tour route is called the generation tree weight function L ( ) , as shown in formula (13). One generation tree weight function L ( ) relates to one tour route, and the lower the function value is, the more motive benefits the tourists will get from the tour route.
According to definition, in one closed-loop structure, the generation tree weight function L ( ) is a monotone increasing function whose value increases with tourists’ ferrying distance increases, and finally outputs a maximum value. A τ τ function L ( ) values are the elements for generation tree weight function minimum heap.
L ( K , K ) = h ( K , G e ) + e = 1 τ h ( G e , G e ) + h ( G e , K ) s . t . | e e | ( 0 , τ 1 ] Z +
Definition 24.
Generation tree weight function minimum heap R . The minimum heap, which is formed by generation tree weight function values stored as array elements, is called the generation tree weight function minimum heap R .
According to the seed tourist site quantity τ and generation tree quantity A τ τ , the minimum heap meets the following conditions:
(1) it contains A τ τ elements;
(2) set n = A τ τ , its element serial numbers k 1 , k 2 , …, k n meet: k i k 2 i , k i k 2 i + 1 , 1 i n / 2 ;
(3) the level of parent node is No.0. The height of the tree is d , and the other nodes are either on the No. d level or on the No. d 1 level;
(4) when d 1 , there are 2 d 1 nodes on the No. d 1 level;
(5) the branch nodes of the No. d 1 level all gather on the left of the tree;
(6) element value of each node is smaller than its child nodes; and,
(7) of all the node elements in the same level, left element is smaller than the right one.
According to the the definition, tour route planning algorithm modeling that is based on optimal closed-loop structure is set up. The basic thought is, motive weights between clustering center K and each seed tourist site G e , seed tourist site G e , and seed tourist site G e are confirmed by sub-unit motive function. By searching the A τ τ generation tree weight function values, a minimum heap sorting algorithm is used to confirm the minimum heap with weight function values in ascending order, and finally confirm the optimal tour routes and sub-optimal tour routes. The specific steps for the algorithm are as follows.
Step 1. Confirm the algorithm parameters:
(I) Confirm λ v 1 and δ v 2 . Extract the basic geographic information data of a certain tourism city and confirm the membership degrees direct influence factors λ v 1 and indirect influence factors δ v 2 between the clustering center K and each seed tourist site, seed tourist site G e , and seed tourist site G e ;
(II) Confirm l , B and ε . Extract the basic geographic information data and confirm the longitude and latitude coordinates ( l , B ) of the clustering center K and each seed tourist site G e . Mine the tourism data information and obtain tourist site attraction indexes. Set the attraction index of the clustering center K as ε K = 0 , as it is the starting point of the tour route.
Step 2. Iterate and calculate the sub-unit motive function values. From formula (11), the C τ + 1 2 motive function I ( ) values between the clustering center K and each seed tourist site G e , seed tourist site G e , and seed tourist site G e .
Sub-step 1 Confirm the τ motive function values between the clustering center K and each seed tourist site G e . The clustering center K is the starting point and terminal point of the tour route;
Sub-step 2. Confirm C τ 2 motive function values between arbitrary two seed tourist sites.
Step 3. Confirm the sub-unit motive weight. According to the sub-unit motive function values, confirm the C τ + 1 2 sub-unit motive weights between the clustering center K and each seed tourist site G e , seed tourist site G e , and seed tourist site G e . The motive weight value is the edge weight of the connected and non-direction graph composed of the clustering center K and each seed tourist site G e .
Step 4. Search generation tree weight function minimum heap R . Through an edge correcting method to search the A τ τ generation tree weight function values relating to A τ τ generation tree closed-loop’s tour routes. Search and obtain the generation tree weight function minimum heap R sorted by the generation tree weight function values in array via a sorting algorithm.
Sub-step 1. Set up a generation tree basic structure loop. Define a virtual closed-loop circle and evenly place points of the clustering center K and all seed tourist sites G e on the circle. The connecting arc or line between two points can be clipped or connected in accordance with algorithm conditions, as shown in Figure 3. For the convenience of setting up the algorithm, note the clustering center K as v 1 , seed tourist site G 1 as v 2 , and son on, and the seed tourist site G τ as v τ + 1 .
Sub-step 2. Search the initial generation tree closed-loop structure O 1 , and set:
O 1 = v 1 , v 2 , , v i , , v j , , v τ + 1 , v 1 , 1 < i j < τ + 1 , and i , j , τ Z + .
(I) in structure O 1 , search the τ + 1 sub-unit motive weights h ( ) of adjacent v i and v i + 1 ;
(II) iterate the generation tree weight function value L 1 ( K , K ) of the closed-loop structure O 1 ; and,
(III) store the weight function value L 1 ( K , K ) into the parent node R 1 of minimum heap R .
Sub-step 3. Search the next generation tree closed-loop structure O 2 . Find i , j and i , j meet the following conditions:
(1) 1 < i + 1 < j < τ + 1 ;
(2) h ( v i , v j ) + h ( v i + 1 , v j + 1 ) < h ( v i , v i + 1 ) + h ( v j , v j + 1 ) .
Clip and rebuild the closed-loop structure O 1 :
(I) delete sub-unit H ( v i , v i + 1 ) in O 1 ;
(II) delete sub-unit H ( v j , v j + 1 ) in O 1 ;
(III) add sub-unit H ( v i , v j ) ; and,
(IV) add sub-unit H ( v i + 1 , v j + 1 ) .
The structure of the rebuilt generation tree closed-loop is:
O 2 = v 1 , v 2 , , v i , v j , v j + 1 , , v i + 1 , v j + 1 , v j + 2 , , v τ + 1 , v 1 . Search the weight function value of generation tree closed-loop structure O 2 .
(V) In structure O 2 , search the τ + 1 sub-unit motive weights h ( ) of adjacent v i and v i + 1 ;
(VI) iterate the generation tree weight function value L 2 ( K , K ) of the closed-loop structure O 2 ; and,
(VII) compare the generation tree weight function value L 1 ( K , K ) and L 2 ( K , K ) , and update the generation tree weight function minimum heap R :
① If L 1 ( K , K ) L 2 ( K , K ) :
(i) keep the weight function value L 1 ( K , K ) storing in the parent node R 1 of minimum heap R unchanged; and,
(ii) store the weight function value L 2 ( K , K ) into the child node R 2 of parent node R 1 in minimum heap R .
② If L 1 ( K , K ) > L 2 ( K , K ) :
(i) delete the parent node R 1 value L 1 ( K , K ) ; and,
(ii) store the weight function value L 2 ( K , K ) into the parent node R 1 in minimum heap R ; and,
(iii) store the weight function value L 1 ( K , K ) into the child node R 2 of parent node R 1 in minimum heap R .
Sub-step 4. Return to Sub-step 3 and use the same method to search the next generation tree closed-loop structure O 3 .
(I) in structure O 3 , search τ + 1 sub-unit motive weights h ( ) of adjacent v i and v i + 1 ;
(II) iterate the generation tree weight function value L 3 ( K , K ) of the closed-loop structure O 3 ; and,
(III) compare the generation tree weight function values L 1 ( K , K ) , L 2 ( K , K ) and L 3 ( K , K ) , and then update the generation tree weight function minimum heap R :
① If L 1 ( K , K ) L 2 ( K , K ) :
(i) if L 1 ( K , K ) L 2 ( K , K ) L 3 ( K , K ) , keep the weight function values L 1 ( K , K ) and L 2 ( K , K ) storing unchanged, store the weight function value L 3 ( K , K ) into the child node R 3 of parent node R 1 in minimum heap R ;
(ii) if L 1 ( K , K ) L 3 ( K , K ) < L 2 ( K , K ) , keep the weight function value L 1 ( K , K ) storing unchanged, delete the child node R 2 value and store the weight function value L 3 ( K , K ) into the child node R 2 of parent node R 1 , store the weight function value L 2 ( K , K ) into the child node R 3 of parent node R 1 in minimum heap R ; and,
(iii) if L 3 ( K , K ) < L 1 ( K , K ) L 2 ( K , K ) , delete the child node R 1 and R 2 values, store the weight function value L 3 ( K , K ) into the parent node R 1 . Store the weight function value L 1 ( K , K ) and L 2 ( K , K ) into the child node R 2 and R 3 of parent node R 1 respectively in minimum heap R .
② If L 1 ( K , K ) > L 2 ( K , K ) :
(i) if L 3 ( K , K ) L 1 ( K , K ) > L 2 ( K , K ) , keep the weight function values L 1 ( K , K ) and L 2 ( K , K ) storing unchanged, store the weight function value L 3 ( K , K ) into the child node R 3 of parent node R 1 in minimum heap R ;
(ii) if L 1 ( K , K ) > L 3 ( K , K ) L 2 ( K , K ) , keep the weight function value L 2 ( K , K ) storing unchanged, delete the child node R 2 value, and store the weight function value L 3 ( K , K ) into the child node R 2 of parent node R 1 , store the weight function value L 1 ( K , K ) into the child node R 3 of parent node R 1 in minimum heap R ; and,
(iii) if L 1 ( K , K ) > L 2 ( K , K ) > L 3 ( K , K ) , delete the child node R 1 and R 2 values, store the weight function value L 3 ( K , K ) into the parent node, and store the weight function values L 1 ( K , K ) and L 2 ( K , K ) into the child node R 3 and R 2 of parent node R 1 , respectively in minimum heap R .
Sub-step 5. Return to Sub-step 3, and use the same method to search all generation tree closed-loop structures O 4 O τ and find the rebuilt generation tree weight function minimum heap R . As step 4 ends, enter Step 5.
Step 5. Output tour route sorting heap relating to generation tree weight function minimum heap R . The weight function value L ω ( K , K ) relates to the generation tree closed-loop structure O ω , which relates to the tour route. According to the algorithm rule, the weight function value that is stored in the parent node of minimum heap R relates to the optimal tour route. As its output generation tree motive weight function value is the minimum one, the iteration value of all sub-unit motive function values is the maximum one. In the aspect of the comprehensive output result, the optimal tour route performs best on tourist site classification, tourist quantity, confirmed specific tourist sites, tourist sites distribution, tour sequence, GIS service, traffic information service, and tourist site star level, etc. The two child nodes of the parent node relate to sub-optimal tour routes. A smart machine will output the visualized results for tourists according to the input conditions.

4. Sample Experiment and Result Analysis

A sample experiment is carried out to testify the Naïve Bayes machine learning algorithm, optimal tourist site mining algorithm based on membership degree searching propagating tree, and optimal tour route planning algorithm based on closed-loop structure to testify the algorithm feasibility. A mainstream and popular tourism website is used to collect the data for mining interest information. We take one tourism city as an example, and choose certain typical tourist sites in the downtown area as the research range [37,38,39,40]. The experiment chooses one tourist as the study object. Via the Naïve Bayes machine learning algorithm, the tourist site of interest classifications are learned and confirmed. Subsequently, we search and mine the optimal tourist sites via temporary accommodation as a clustering center. According to the mined tourist sites, the optimal tour routes are planned for tourists, and the relative guide maps are also provided. Finally the experiment results are analyzed and concluded. Further research directions are also concluded on the aspect of the algorithm and method modeling.

4.1. Research Range and Data Sampling

We take the tourism city Zhengzhou as an example, and 25 typical tourist sites in the downtown area are selected as experiment samples. All of the selected tourist sites meet the following conditions: First, all of the tourist sites are located in the downtown area, that is, tourists can get access to any one of them by taking urban transport such as bus, subway, taxi, etc., but not including tourist sites in the outskirts districts and counties where urban transport does not have access. Second, the tourist sites have an attraction index, a certain amount of travelling volume and value for visiting. Third, city roads and avenues with each other connect the tourist sites, as tourists can ferry from one tourist site to another freely. Fourth, the tourist sites are independent with each other in geographic space, the travel experience tourists get for one tourist site does not influence the travel experience in another tourist site. While considering all of the conditions, the experiment applies the algorithm built in the research to crawl tourism big data and mine interest knowledge. From the Zhengzhou GIS database and “Baidu” map, GIS service data, traffic information data, tourist site information data that are used to confirm factors λ v 1 and δ v 2 are mined and selected as the experiment basic data.

4.1.1. Tourist Site Basic Data

According to the tourist site selecting conditions, the tourist site classification vector C is confirmed. By the means of tourist site feature classifying, we set m = 4 . We classify typical tourist sites of Zhengzhou city into four groups, that is c 1 : Park and Greenland classification; c 2 : Venue classification; c 3 : Amusement park classification; and, c 4 : Shopping center classification. According to Zhengzhou tourism statistics data, the selected typical tourist sites are as follows.
c 1 = { c 11 : Renmin park; c 12 : Bishagang park; c 13 : Zijinshan park; c 14 : Lvcheng square; c 15 : Botanic park; c 16 Forest park; c 17 : Zoo};
c 2 = { c 21 : Henan museum; c 22 : Zhengzhou museum; c 23 : Zhengzhou science and technology museum; c 24 : Erqi memorial; c 25 : Aquarium; c 26 : Zhongyuan tower};
c 3 = { c 31 : Century park; c 32 : Water park; c 33 : Children park; c 34 : Bar street; c 35 : Fengle park}; and,
c 4 = { c 41 : Dehua street; c 42 : Erqi Wanda; c 43 : Zhongyuan Wanda; c 44 : Wangfujing; c 45 : Dennis; c 46 : CC mall; c 47 : Guomao}.
We take Zhengzhou city’s main roads and avenues that connect all of the tourist sites as basic structure to output the map of tourist sites’ geographic distribution, as Figure 4 shows, including all tourist sites or tourist attractions, which are represented by black dots in Figure 4a, and the selected typical tourist sites in Figure 4b.

4.1.2. Interest Mining Machine Learning Modeling Data

We take mainstream and popular tourism website “Fengwo”, “Xiecheng”, etc., as tourism big data sources and crawl text data from the websites to mine critical information. After data cleaning, data integration, and data protocol, k = 1000 tourist samples are finally selected to set the machine learning algorithm. Each tourist information contains the feature attributes x 1 , x 2 , …, x n demanded by vector X and tourist classification c j of vector C . Store tourist information as the format of training sample data vector D = { x 1 , x 2 , , x n , c j } , in which x i contains specific feature attributes. According to the tourist quantity contained by the feature classification in vector X , relative tourist site classification quantity and Naïve Bayes interest minging algorithm, the experiment calculates the conditional probability of the feature attribute vector X and the prior probability of tourist site classification, as listed in Table 1. In the Table, the first line is prior probability of tourist site classification P ( c j ) ; the other lines are the conditional probability of x i i in the feature attribute vector X .
When a tourist’s basic conditions are given, the machine learning module will output a group of predicted descending order basic vector E , according to the prior probability of tourist site classification and the conditional probability of feature attributes. Vector E represents the interest tendency on tourist site classifications of the tourist under the condition of basic needs, and the tourist site classification elements are arranged by the interest tendency.

4.1.3. Algorithm Influence Factors λ v 1 and δ v 2 Data

The factors used in algorithm modeling include a membership degree direct influence factor λ v 1 , membership degree indirect influence factor δ v 2 , longitude and latitude, and attraction index, according to the modeling process of the optimal tourist site mining algorithm and optimal tour route planning algorithm based on a closed-loop structure. The Zhengzhou GIS database and “Baidu” map are used to extract these factors, including: (1) traffic light quantity between two tourist sites; (2) bus and subway quantity between two tourist sites; (3) distance from tourist site to the nearest bus or subway station; (4) taxi fee between two tourist sites; (5) road traffic jam index; (6) average waiting time of each road for vehicles; and, (7) specific traffic jam roads. The longitude and latitude of the clustering center K and tourist sites are extracted from the “GPSspg” website, and each tourist site’s attraction index is crawled from mainstream and popular tourism websites. Table 2 shows the longitude and latitude ( l , B ) and attraction index a for tourist sites.

4.2. Sample Experiment and Result Analysis

The sample experiment is designed via the obtained basic data. The basic thought of the experiment is to confirm one tourist as a research object. We set the tourists’ temporary accommodation as the clustering center K . The tourist’s needs and interests confirm the feature attribute vector X . Predicted descending order basic vector E is output by a Naïve Bayes machine learning module, based on which, according to the quantity of specific tourist sites to be visited, a smart machine outputs the tourist site of interest classification distribution matrix A . Via matrix A and a confirmed sorting of tourist site classification and quantity, the smart machine searches and mines tourist sites with optimal geographic distribution to meet the tourists’ needs and interests while using an optimal tourist site mining algorithm based on a membership degree searching propagating tree. Meanwhile, the process of searching and mining tourist sites are controlled and monitored. Generation tree weight function minimum heap R is output by the optimal tour route planning algorithm based on a closed-loop structure, according to the mined optimal tourist sites.

4.2.1. Sample Experiment

Tourist Site of Interest Classification Mining Result

The input tourists’ basic conditions are noted as feature attribute vector X , the experiment confirms X = { x 12 : 18 a g e < 46 ; x 22 : 0.2 < i n c o m e 0.5 ; x 32 : 0.02 < exp e n s e 0.05 ; x 41 : s p r i n g } . The selected temporary accommodation’s longitude and latitude are K = ( 113.678 , 34.751 ) . Via the Naïve Bayes machine learning module, the conditional probability of the feature attribute vector X and tourist site prior probability; Table 3 result data are calculated and output. The data are the conditional probability for the feature attribute vector X under the condition of tourist site classification.
According to Table 3, the values of the product probability are:
P ( X | c 1 ) P ( c 1 ) = 0.00534 ; P ( X | c 2 ) P ( c 2 ) = 0.00566 ; P ( X | c 3 ) P ( c 3 ) = 0.00050 ; and, P ( X | c 4 ) P ( c 4 ) = 0.00024 .
According to the calculated values, smart machine outputs predicted a descending order basic vector E = { c 2 , c 1 , c 3 , c 4 } . Vector E shows that, under the condition of output information, the tourist’s most interested tourist site classification type is Venue, then Park and Greenland, and last Amusement park and Shopping center. We suppose the tourist selects four tourist sites to visit in one day, τ = 4 . The output tourist site of interest classification distribution matrix A is formula (14).
A = 1 2 1 0 0 2 2 0 2 2 0 0 1 1 1 1
A smart machine will recommend the sort of tourist site classification and quantity row by row in matrix A , and specifically recommends the first row. The tourist considers his own interests, time schedule, budget, physical condition, etc., to select one sort of tourist site classification and quantity. We take the first row as the experimental example. The tourist is willing to visit one tourist site of Park and Greenland, two tourist sites of Venue, and one tourist site of Amusement park.

Optimal Tourist Site Mining Result

We take the tourist’s temporary accommodation K = ( 113.678 , 34.751 ) as the central point. We search optimal tourist sites under this sort of tourist site classification and quantity via an optimal tourist site mining algorithm and output seed tourist site cluster table, optimal tourist site generation tree t r e e 1 , objective function value table, objective function fluctuating curve, and membership degree distribution matrix. We define the attraction index of the clustering center K as a K = 0 . Table 4 shows the obtained central seed tourist sites and objective function values when searching one tourist site of Park and Greenland, two tourist sites of Venue, and one tourist site of Amusement park, meanwhile it is also the list of seed tourist sites.
The first line of Table 4 shows each seed tourist site searched from K and subsequent seed tourist sites. Each column shows the objective function values when searching its subordinate seed tourist site, in which the objective function value is 0 when searching itself. Each seed tourist site is noted as the subordinate seed tourist site of the previous one, either belonging to the same cluster or a different cluster. According to the searched objective function values, each objective function descending order vector Q is obtained when searching one seed tourist site. Figure 5 shows the objective function fluctuating curves when searching seed tourist sites with different starting points.
According to the Table 4 data and Figure 5 objective function fluctuating curves, the objective function descending order vectors generated by searching seed tourist sites with different central points are shown, as follows.
(1) Central point K :
Q 1 = { 4.026 , 2.933 , 2.910 , 2.361 , 2.096 , 1.957 , 1.946 , 1.941 , 1.886 , 1.865 , 1.844 , 1.788 , 1.786 , 1.742 , 1.715 , 1.705 , 1.685 , 1.671 , 1.654 , 1.633 , 1.591 , 1.524 , 1.396 , 1.372 , 1.297 } (2) Central point c 24 :
Q 2 = { 5.979 , 2.994 , 2.301 , 1.767 , 1.553 , 1.551 , 1.511 , 1.508 , 1.507 , 1.494 , 1.450 , 1.446 , 1.404 , 1.398 , 1.381 , 1.359 , 1.339 , 1.335 , 1.323 , 1.308 , 1.166 , 1.103 , 1.102 , 0.982 , 0.000 } (3) Central point c 11 :
Q 3 = { 2.261 , 2.203 , 1.713 , 1.708 , 1.545 , 1.492 , 1.484 , 1.478 , 1.459 , 1.448 , 1.448 , 1.431 , 1.423 , 1.411 , 1.328 , 1.324 , 1.320 , 1.281 , 1.235 , 1.223 , 1.129 , 1.093 , 1.052 , 0.858 , 0.000 } (4) Central point c 32 :
Q 4 = { 1.673 , 1.636 , 1.608 , 1.556 , 1.490 , 1.484 , 1.484 , 1.453 , 1.380 , 1.374 , 1.350 , 1.341 , 1.308 , 1.304 , 1.290 , 1.283 , 1.272 , 1.266 , 1.264 , 1.220 , 1.041 , 1.006 , 1.002 , 0.893 , 0.000 }
According to the definition of the membership degree distribution matrix μ ( c ) , the matrix μ ( c ) generated by searching optimal tourist site structure tree is shown in formula (15).
μ ( c ) = 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
We analyze the result data, starting from the temporary accommodation K , the searched and mined optimal tourist sites are c 24 : Erqi memorial, c 11 : Renmin park; c 32 : Water park, and c 21 : Henan memorial. Figure 6 shows the process of generating an optimal tourist site structure tree. Figure 6d is the final generated structure tree.

Optimal Tour Route Searching Result

Starting from the temporary accommodation K , the tourist visits four tourist sites of c 24 : Erqi memorial, c 11 : Renmin park; c 32 : Water park, c 21 : Henan memorial, and finally returns to the starting point K , which forms an integrated generation tree closed-loop. According to the tour route planning algorithm based on optimal closed-loop structure, sub-unit motive weight values for the interval of temporary accommodation K and tourist sites, one tourist site and another one, are confirmed, as shown in Table 5. Via the optimal tour route planning algorithm, taking basic Zhengzhou city’s GIS data, membership degree direct influence factors λ v 1 and indirect influence factors δ v 2 , the generation tree weight function minimum heap R is searched and confirmed, and then optimal tour routes and sub-optimal tour routes are output, as shown in Figure 6. We set the arbitrary sub-unit motive function initial value as I 0 = 1.000 . According to formula (11), each sub-unit’s motive function values increase with the tourist’s ferry distance increasing, the maximum value is obtained at the terminal tourist site of the sub-unit. Figure 7 shows the sub-unit motive function I ( ) fluctuating curves (green color), sub-unit motive weight value h ( ) fluctuating curves (blue color), and generation tree weight function L ( ) fluctuating curves (brown color).
T the initial heap is obtained according to the generation tree weight function values L ( ) and generation tree weight function minimum heap R algorithm, and then the minimum heap is output as Figure 8a. The weight function value complete binary tree is output as Figure 8b. As to the minimum heap R and complete binary tree, the value of the starting element of minimum heap R and parent node of binary tree are 2.563, whose related tour routes are the No.(1) and No.(24) tour routes. The two routes’ generation tree weight function value L ( ) is the minimum value, and thus it relates to the maximum sub-unit motive function I ( ) iteration value, that is, the tourist can get the best motive benefits from the two optimal routes. The value 2.662 on the minimum heap’s third and fourth elements and further child nodes on complete binary tree relates to sub-optimal tour routes. According to the optimal and sub-optimal tour routes, related generation tree closed-loop structures and guide maps are output, as shown in Figure 9, in which Figure 9a relates to the No.(1) tour route, Figure 9b relates to No.(24) tour route, Figure 9c relates to the No.(7) tour route, and Figure 9d relates to the No.(23) tour route.

Algorithm Effectiveness Comparison

We compare algorithm in this research with the other shortest route search algorithms on the aspect of time complexity and space complexity. The proposed algorithm in this research is in the experimental group, while the A * search algorithm, Dijkstra search algorithm and Floyd search algorithm are in the control group. Under the same initial conditions of starting point, selected tourist sites, terminal point, and weight values, the algorithms in the experiment group and control group are used to search and output the same results in Table 6. Four algorithms traverse all closed-loop structures and output the minimum heap via the minimum heap sorting algorithm in order to find optimal and sub-optimal routes, thus the results are the superposition with minimum heap sorting algorithm. In the sample experiment, the starting point and terminal point are K , n = 6 . Table 7 shows the time complexity (TC) and space complexity (SC) for each algorithm to obtain the same results in Table 6.

4.2.2. Experiment Result Analysis and Discussion

(1) Experiment basic data analysis
The experiment takes the tourism city of Zhengzhou as an example. By setting and confining the research range, the tourist sites in the downtown area of the city are confirmed as the study objects. The selected and confirmed tourist site objects have the feature of urban public transport accessibility and urban road connectivity, which ensure the feasibility and practicability of the research result. The 25 tourist sites that are extracted from all tourist sites and tourist attractions in downtown of Zhengzhou city are the most popular, typical, and representative, which can cover four tourist site classifications, optimize the tourist site storage data of smart machine, and ensure that the tourist sites provided for tourists are the optimal ones. As to the randomness of temporary accommodation K selected by tourists, the selected tourist sites are separated in geographic distribution, which cover Zhengzhou city’s different districts and ensures that arbitrary locations can search all of the tourist sites and mine the optimal ones.
The text format information mined from the “FengWo” tourism website is the basic data for setting up a Naïve Bayes machine learning module. In this study, 1000 tourists’ traveling data text format information covering n feature attributes from “FengWo” tourism website were extracted and processed. By calculating the conditional probability of feature attribute vector X and the prior probability of tourist site classification to set up the interest machine learning model. When tourists input basic information, the smart machine outputs tourist site classifications according to the interest tendency. Table 1 shows the conditional probability of feature attribute vector X and prior probability of tourist site classification, counting from the 1000 tourist samples. Thus, the Naïve Bayes machine learning module has fairly strong generalization ability.
The extracted membership degree direct influence factors and indirect influence factors include GIS service information data, traffic service information data, and tourist site information, which are the critical and important factors influencing tourist motive benefits during the trip. They act on the generation of motive iteration function values and generation tree weight function values. The experiment data are extracted from the “Baidu” map and Zhengzhou city’s GIS database, reflecting the real world and actual trip situation, thus the tour routes planned with actual spatial information data can be directly provided as recommendations for tourists by the smart machine.
(2) Tourist site of interest classification mining result analysis
We analyze the tourist site of interest classification mining result, when the sample tourist confirms the feature attributes, the interest mining machine learning module outputs conditional probability of feature attribute vector X in Table 3. The machine then outputs the probability product P ( X | c i ) P ( c i ) and predicted descending order basic vector E . The smart machine outputs the tourist site of interest classification distribution matrix A , according to the selected sort of tourist site classification and quantity. On the aspect of the tourist site of interest classification mining result, when an arbitrary item of tourists’ feature attribute or confirmed quantity of tourist sites to be visited change, the tourist site of interest classification distribution matrix A will also change, which will influence the result of the optimal tourist site mining and optimal tour route planning. The sort of tourist site classification and quantity in matrix A will be displayed for tourists.
(3) Optimal tourist site mining result analysis
According to the optimal tourist site mining algorithm, smart machines starts to search from temporary accommodation. Searching for one subordinate seed tourist site, one group of objective function values is obtained. Table 4 shows the mined four subordinate seed tourist sites when smart machine searches the three tourist site classifications under the condition of the first row of matrix A . The first line of the Table shows the mined subordinate seed tourist sites K , c 24 ( G 1 + ), c 11 ( G 2 ), and c 32 ( G 3 ). The searching process ends with c 21 ( G 4 ). Table 8 is the summary of the searching process for optimal tourist sites.
Each seed tourist site relates to one column of objective function values, and the bold black value relates to the mined subordinate seed tourist site. Since the algorithm confines the tourist site classification, the maximum value of the column objective function values might not be the seed tourist site objective value, as a certain classification might not be the interesting one or the value’s tourist site classification is full rank. When searching the central point itself, the objective function is defined as 0.000. We analyze the Figure 5 objective function fluctuating curves with different central points, and each curve contains the maximum value and the minimum value. Since the tourist site of interest does not include c 4 , the smart machine automatically neglects the objective function values of c 4 . Starting from the K central point, the minimum value 1.297 appears in tourist site c 16 , and the maximum value 2.933 appears in tourist site c 24 , it meets the condition, so the K central point’s subordinate seed tourist site is c 24 . Starting from the c 24 central point, the minimum value 0.982 appears in tourist site c 35 , and the maximum value 2.301 appears in tourist site c 11 , it meets the condition, so the c 24 central point’s subordinate seed tourist site is c 11 . Starting from the c 11 central point, the minimum value 0.858 appears in tourist site c 35 , and the maximum value 1.545 appears in tourist site c 32 , it meets the condition, so the c 11 central point’s subordinate seed tourist site is c 32 . Starting from the c 32 central point, the minimum value 0.893 appears in tourist site c 15 , and the maximum value 1.673 appears in tourist site c 21 , it meets the condition, so the c 32 central point’s subordinate seed tourist site is c 21 . We analyze the membership degree distribution matrix μ ( c ) , element 1 relates to the seed tourist site. The matrix μ ( c ) intuitively shows the distribution of seed tourist sites stored in computer, when the location of K , tourists’ needs, and interests, and the selected sort of tourist site classification and quantity change, the element distribution in matrix μ ( c ) will also change. Figure 6 shows the whole process of searching seed tourist sites. From the Figure, the method and process to search seed tourist sites conforms to the algorithm developed in the research, and the mined tourist sites are optimally distributed in geographic space, which can meet tourists’ interests and cost the minimum expenditure.
(4) Optimal tour route searching result analysis
After analyzing the Table 5 data, it is observed that the minimum sub-unit motive weight value 0.341 appears in the interval of K and c 24 , the maximum value 0.736 appears in the interval of c 21 and c 24 The smaller the motive weight value is, the larger the motive function value will be, and the more motive benefits tourists will get in this interval. Table 6 shows the sub-unit motive function I ( ) values, sub-unit motive weight h ( ) values, and generation tree weight L ( ) values relating to the 24 tour routes of generation tree closed-loop structures. We analyze the Table 6 data and Figure 7 curves, and different tour routes of the generation tree closed-loop structures output different sub-unit motive function I ( ) values, sub-unit motive weight h ( ) values and generation tree weight L ( ) values. In one sub-unit, the motive function values increases with tourists’ ferry distance increasing. Starting from the initial function value and increasing to the maximum value at the next tourist site. The sub-unit motive function value and motive weight value are reciprocals with each other, whose curve trends are reversed, that is one curve is monotone increasing in one unit, and the other one will be monotone decreasing in the same unit. The generation tree weight function is always monotone increasing in the whole trip, and it gets to the maximum value at the terminal point K According to Table 6, the minimum heap of weight function values and ascending complete binary tree are output, as shown in Figure 8. The weight function value 2.563 stored in the first and second element of the minimum heap and in the parent node and parent node’s left child node of the complete binary tree relate to the No.(1) and No.(24) generation tree closed-loop structures and tour routes. This illustrates that the two tour routes relate to the maximum motive iteration function values, and tourists can get the best motive benefits by taking the two routes, thus they are the optimal ones. The third and fourth elements of the minimum heap and further child nodes of the parent node relate to sub-optimal routes. The generation tree closed-loop structures and guide maps relating to the optimal and sub-optimal tour routes are shown in Figure 9, according to the finally output minimum heap and complete binary tree.
The generation trees of optimal tour routes K c 24 c 11 c 32 c 21 K and K c 21 c 32 c 11 c 24 K cover the whole circle, start from the clustering center K and end with the same point K in clockwise and anticlockwise directions. The sub-optimal tour routes K c 11 c 24 c 32 c 21 K and K c 21 c 32 c 24 c 11 K do not pass the circle, but are formed by connecting lines in the circle. Different closed-loop structures relate to different tour route guide maps. Figure 9 shows the guide maps relating to the optimal and sub-optimal tour routes. The smart machine provides tourists with certain tour routes of the total 24 tour routes, and then sets the two optimal ones K c 24 c 11 c 32 c 21 K and K c 21 c 32 c 11 c 24 K to the front. Meanwhile, the two tour routes will be especially recommended for tourists, and then the sub-optimal ones K c 11 c 24 c 32 c 21 K and K c 21 c 32 c 24 c 11 K The final recommended tour routes all conform to tourists’ needs and interests. By taking these tour routes, tourists can get the best motive benefits in tourist site classification, quantity, geographic distribution, expenditure, and travel experience.
(5) Comparison with other algorithms and route planning toolkit
From the Table 7 data, a comparison between the proposed algorithm and other search algorithms is concluded. In the process of searching for the same results, four algorithms have different performance and effectiveness. For all of them, the time complexity increases with the increasing of value n , in which the Floyd algorithm has the fastest increasing rate, and then Dijkstra algorithm, the proposed algorithm, and A * algorithm. When the time unit is 1ns (nanosecond), each algorithm’s operation time is in nanosecond time measurement. In the experiment, the total quantity of tourist sites is 25, thus the operation time will be in the feasible range. Set n = 6 , Floyd algorithm’s time complexity is much larger than other algorithms. When compared with other algorithms, the proposed algorithm has moderate time complexity. Compare with A * algorithm, the advantage of the proposed algorithm is that there is no need to preset heuristic function and calculate points distances via distance function, which can avoid the A * algorithm’s redundant data and the situation of not obtaining optimal solution. As compared with Dijkstra algorithm and the Floyd algorithm, the proposed algorithm has relatively higher execution effectiveness and smaller space complexity.
When compared with Google map and ArcGIS auxiliary toolkit, the different features and method innovation of the proposed method are as follows. First, the input and preset contents are different. In the proposed method, tourists simply input age, time schedule, travel budget, etc., smart machine will output optimal tourist sites and routes. Tourists don’t bother to acquire specialized knowledge on geographic information, thus the proposed method is user-friendly. As to Google map and ArcGIS auxiliary toolkit, the input and preset contents are relatively more professional, tourists should more or less acquire specialized knowledge. Second, the mode to ensure that tourist sites are different. The proposed method mines and confirms optimal tourist sites via input basic information and clustering center location, while Google map and ArcGIS auxiliary toolkit require tourists to select tourists by themselves. Thus, the proposed method is more suitable for the tourists who are not familiar with the city and tourist sites and who completely rely on smart system to get tour route. Third, the proposed method provides tour route with multiple tourist sites, Google map mainly provides route between two points. Network Analyst module of ArcGIS can make route time analysis, point to point route analysis, service area definition, subordinate facility searching, starting and terminal points matrix analysis, etc. When planning the route of multiple points, ArcGIS needs to upload multiple function layers, whose internal algorithm could be Dijkstra algorithm, etc. On the aspect of algorithm effectiveness, the proposed method and ArcGIS internal algorithm both have high execution effectiveness.

5. Conclusions and Future Work

A smart tour route planning algorithm based on a Naïve Bayes machine learning interest data mining is proposed, as to the current problems of tour route planning. The aim of the study is to provide tourists with a feasible and practical smart method to visit the tourist sites of interest. A Naïve Bayes machine learning module is set up by learning tourism big data. It can recommend tourist site classification to tourists in accordance with basic information and needs. The feature of the machine learning module is to output tourist site classification vectors with interest tendencies from high to low, and then output a tourist site of interest classification distribution matrix. From this matrix, the smart machine recommends sorts of tourist classification and quantity to tourists, and tourists can choose one sort according to their own needs and schedule. By setting up an optimal tourist site mining algorithm that is based on a membership degree searching generation tree algorithm, the smart machine searches and mines optimal tourist sites according to the selected sort of classification and quantity. The feature of the optimal tourist site mining algorithm is to search for the best geographically distributed tourist sites within the neighbourhood buffer, which satisfies the needs of tourists and costs the least expenditure. When combined with the factors of GIS service, traffic information, and tourist site information that influence motive benefits during the whole trip, the optimal tour route planning algorithm that is based on closed-loop structure is set up. The closed-loop structure iterates the generation tree motive weight function values of different tour routes and output a generation tree weight function minimum heap R and relative complete binary tree, and confirms the optimal tour route and sub-optimal tour route as well as guide maps for tourists. In the research, the developed algorithm and method have complete and integrated functions, the output tour routes cover all tourist sites of interest for tourists and conform to the practical and actual traveling process. All of the visualized optimal tourist sites, tour routes, and guide maps are mutually provided for the tourists. The tourists can select the most appropriate one according to their own needs and interests.
The algorithm that is designed and developed in the study is based on the mining and learning tourism big data. In future research work, there is more work that could be carried out. First, tourists’ feature attributes can be subdivided to more precisely mine tourists’ needs and interests. Secondly, an interest tendency deviation correction method will be designed and developed to accurately predict and output tourists’ interests, the aim of which is to ensure that each tourist can get the best motive benefits and travel experience. Thirdly, on the aspect of tourist site mining and tour route planning, tourist sites accessibility will be studied, and more transportation and ferry ways will be considered to enrich the functions of the smart machine. Finally, real-time controlling and monitoring of tourists’ travelling process could be studied and developed to ensure the tourists’ motive benefits.

Author Contributions

This research was completed jointly by all the authors. Mingzhan Su collected and processed the original data. Zhong Liu and Yu Hu mainly conducted and designed the experiments. Xiao Zhou performed the experiments and wrote the paper. Mingzhan Su and Bin Sun assisted with the experimental design. Bin Sun and Guanghui Feng conducted the data collection and analysis. All authors have read and agree to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Grant 2017YFB0503500), Henan Scientific and Technological Project (Grant 182102210382), Open-end Fund of State Key Laboratory of Geo-Information Engineering (Grant No. SKLGIE2018-ZZ-6) and Training Program for Young Backbone Teachers of Henan Higher Education Institute (Grant No. 2017GGJS196).

Acknowledgments

We thank the great help provided by editors and reviewers.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhan, Q.; Deng, S.; Zheng, Z. An adaptive sweep-circle spatial clustering algorithm based on gestalt. ISPRS Int. J. Geo-Inf. 2017, 6, 272. [Google Scholar] [CrossRef] [Green Version]
  2. Wu, X.; Huang, Z.; Peng, X.; Chen, Y.; Liu, Y. Building a spatially-embedded network of tourism hotspots from geotagged social media data. IEEE Access 2018, 6, 21945–21954. [Google Scholar] [CrossRef]
  3. Kruger, M.; Viljoen, A.; Saayman, M. Who visits the Kruger national park, and why? Identifying target markets. J. Travel Tour. Mark. 2017, 34, 312–340. [Google Scholar] [CrossRef]
  4. Zheng, W.; Huang, X.; Li, Y. Understanding the tourist mobility using gps: Where is the next place? Tour. Manag. 2017, 59, 267–280. [Google Scholar] [CrossRef] [Green Version]
  5. Wang, X.; Choi, T.M.; Liu, H.; Yue, X. Novel ant colony optimization methods for simplifying solution construction in vehicle routing problems. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3132–3141. [Google Scholar] [CrossRef]
  6. Yang, W.; Ai, T.; Lu, W. A method for extracting road boundary information from crowdsourcing vehicle GPS trajectories. Sensors 2018, 18, 1261. [Google Scholar] [CrossRef] [Green Version]
  7. Chehreghan, A.; Abbaspour, R.A. A geometric-based approach for road matching on multi-scale datasets using a genetic algorithm. Cartogr. Geogr. Inf. Sci. 2018, 45, 255–269. [Google Scholar] [CrossRef]
  8. Zhang, Y.; Huang, J.; Deng, M.; Fang, X.; Hu, J. Relaxation labelling matching for multi-scale residential datasets based on neighboring patterns. Geomat. Inf. Sci. Wuhan Univ. 2018, 43, 1098–1105. [Google Scholar]
  9. Jabbarpour, M.R.; Noor, R.M.; Khokhar, R.H. Green vehicle traffic routing system using ant-based algorithm. J. Netw. Comput. Appl. 2015, 58, 294–308. [Google Scholar] [CrossRef]
  10. Kang, S.; Lee, G.; Kim, J.; Park, D. Identifying the spatial structure of tourism attraction system in South Korea using GIS and network analysis: An application of anchor-point theory. J. Destin. Mark. Manag. 2018, 9, 358–370. [Google Scholar]
  11. Rahayuningsih, T.; Muntasib, E.K.S.H.; Prasetyo, L.B. Nature based tourism resources assessment using geographic information system (GIS): Case study in Bogor. Procedia Environ. Sci. 2016, 33, 365–375. [Google Scholar] [CrossRef] [Green Version]
  12. Tracewski, L.; Bastin, L.; Fonte, C.C. Repurposing a deep learning network to filter and classify volunteered photographs for land cover and land use characterization. Geo-Spat. Inf. Sci. 2017, 20, 252–268. [Google Scholar] [CrossRef] [Green Version]
  13. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
  14. Tufekci, Z. Big questions for social media big data: Representativeness, validity and other methodological pitfalls. ICWSM 2014, 14, 505–514. [Google Scholar]
  15. Gwon, G.-P.; Hur, W.-S.; Kim, S.-W.; Seo, S.-W. Generation of a precise and efficient lane-level road map for intelligent vehicle systems. IEEE Trans. Veh. Technol. 2017, 66, 4517–4533. [Google Scholar] [CrossRef]
  16. Hall, C.M.; Le-Klahn, D.T.; Ram, Y. Tourism, Public Transport and Sustainable Mobility; Channel View Publications: Bristol, UK, 2017. [Google Scholar]
  17. Kim, J.; Thapa, B.; Jang, S. GPS-based mobile exercise application: An alternative tool to assess spatio-temporal patterns of visitors’ activities in a National Park. J. Park Recreat. Admin. 2019, 37, 124–134. [Google Scholar] [CrossRef]
  18. Yang, B.; Luan, X.; Zhang, Y. A pattern-based approach for matching nodes in heterogeneous urban road networks. Trans. GIS. 2014, 18, 718–739. [Google Scholar] [CrossRef]
  19. Hwang, R.H.; Hsueh, Y.L.; Chen, Y.T. An effective taxi recommender system based on a spatio-temporal factor analysis model. Inf. Sci. 2015, 314, 28–40. [Google Scholar] [CrossRef]
  20. Kong, X.; Xia, F.; Wang, J.; Rahim, A.; Das, S.K. Time-location-relationship combined service recommendation based on taxi trajectory data. IEEE Trans. Ind. Inf. 2017, 13, 1202–1212. [Google Scholar] [CrossRef]
  21. Sun, D.; Zhang, C.; Zhang, L.; Chen, F.; Peng, Z.-R. Urban travel behavior analyses and route prediction based on flfloating car data. Transport. Lett. 2014, 6, 118–125. [Google Scholar] [CrossRef]
  22. Chen, B.Y.; Yuan, H.; Li, Q.Q.; Lam, W.H.K.; Shaw, S.L.; Yan, K. Map-matching algorithm for large-scale low-frequency floating car data. Int. J. Geogr. Inf. Sci. 2014, 28, 22–38. [Google Scholar] [CrossRef]
  23. Yang, P.; Tang, K.; Lozano, J.A.; Cao, X. Path planning for single unmanned aerial vehicle by separately evolving waypoints. IEEE Trans. Robot. 2015, 31, 1130–1146. [Google Scholar] [CrossRef] [Green Version]
  24. Albinati, J.; Oliveira, S.E.; Otero, F.E.; Pappa, G.L. An ant colony-based semi-supervised approach for learning classifification rules. Swarm Intell. 2015, 9, 315–341. [Google Scholar] [CrossRef] [Green Version]
  25. Farahnakian, F.; Ashraf, A.; Pahikkala, T.; Liljeberg, P.; Plosila, J.; Porres, I.; Tenhunen, H. Using ant colony system to consolidate VMs for green cloud computing. IEEE Trans. Serv. Comput. 2015, 8, 187–198. [Google Scholar] [CrossRef]
  26. Wang, Z.; Xing, H.; Li, T.; Yang, Y.; Qu, R.; Pan, Y. A modifified ant colony optimization algorithm for network coding resource minimization. IEEE Trans. Evol. Comput. 2015, 1, 1–18. [Google Scholar]
  27. Gkiotsalitis, K.; Stathopoulos, A. A Mobile Application for Real-Time Multimodal Routing Under a Set of Users’ Preferences. J. Intell. Transp. Syst. 2014, 19, 149–166. [Google Scholar] [CrossRef]
  28. Dolinayova, A.; Masek, J.; Kendra, M.; Camaj, J.; Grandsart, D.; Marlier, E.; Colzani, P.; Arena, M.; Paragreen, J.; Navaratnam, P.; et al. Research of the Passenger’s Preferences and Requirements for the Travel Companion Application. J. Adv. Transp. 2018, 4, 1–12. [Google Scholar] [CrossRef] [Green Version]
  29. Ciesielski, K.C.; Falcao, A.X.; Miranda, P.A.V. Path-value functions for which Dijkstra’s Algorithm Returns Optimal Mapping. J. Math. Imaging Vis. 2018, 60, 1025–1036. [Google Scholar] [CrossRef]
  30. Li, J.Q.; Zhou, K.; Zhang, L.; Zhang, W.B. A multimodal trip planning system incorporating the park-and-ride mode and real-time traffic and transit information. Proc. Its World Congr. 2010, 25, 65–76. [Google Scholar]
  31. Borrís, J.; Moreno, A.; Valls, A. Intelligent tourism recommender systems: A survey. Expert Syst. Appl. 2014, 41, 7370–7389. [Google Scholar] [CrossRef]
  32. Srinivasan, K.K.; Prakash, A.A.; Seshadri, R. Finding most reliable paths on networks with correlated and shifted log–normal travel times. Transp. Res. Part B Methodol. 2014, 66, 110–128. [Google Scholar] [CrossRef]
  33. Opsahl, T.; Agneessens, F.; Skvoretzc, J. Node centrality in weighted networks: Generalizing degree and shortest paths. Soc. Netw. 2010, 32, 245–251. [Google Scholar] [CrossRef]
  34. Perrine, K.; Khani, A.; Ruiz-Juri, N. Map-Matching algorithm for applications in multimodal Transportation Network Modeling. Transp. Res. Rec. 2015, 2537, 62–70. [Google Scholar] [CrossRef]
  35. Yuan, L.; Yu, Z.; Luo, W.; Zhang, J.; Hu, Y. Clifford algebra method for network expression, computation, and algorithm construction. Math. Methods Appl. Sci. 2014, 37, 1428–1435. [Google Scholar] [CrossRef]
  36. Daina, N.; Sivakumar, A.; Polak, J.W. Electric vehicle charging choices: Modelling and implications for smart charging services. Transp. Res. C Emerg. Technol. 2017, 81, 36–56. [Google Scholar] [CrossRef]
  37. Zhou, X.; Xu, C.; Kimmons, B. Detecting tourism destinations using scalable geospatial analysis based on cloud computing platform. Comput. Environ. Urban Syst. 2015, 54, 144–153. [Google Scholar] [CrossRef]
  38. Jia, Q.; Wang, R. Automatic extraction of road networks from GPS traces. Photogramm. Eng. Remote. Sens. 2016, 82, 593–604. [Google Scholar]
  39. Zheng, Y.; Liu, Y.; Yuan, J.; Xie, X. Urban computing with taxicabs. In Proceedings of the 13th International Conference on Ubiquitous Computing, Beijing, China, 17–21 September 2011; pp. 89–98. [Google Scholar]
  40. Asavasuthirakul, D.; Harfifield, A.; Kesorn, K. A framework of personalized travelling information services for Thailand. Adv. Mater. Res. 2014, 931–932, 1382–1386. [Google Scholar] [CrossRef]
Figure 1. The process of feature attribute vectors and tourist site classification vectors forming training sample data and their storage format. Feature attributes are stored in database A, and tourist site classifications are stored in database B. The formed training sample data are stored in database C. The output form data are used to set up the Naïve Bayes interest mining machine learning model.
Figure 1. The process of feature attribute vectors and tourist site classification vectors forming training sample data and their storage format. Feature attributes are stored in database A, and tourist site classifications are stored in database B. The formed training sample data are stored in database C. The output form data are used to set up the Naïve Bayes interest mining machine learning model.
Ijgi 09 00112 g001
Figure 2. The process of searching and mining subordinate seed tourist sites with the clustering center K or seed tourist site G e as central points. (a) The process of mining seed tourist site G 1 with central point K under the control of the algorithm. (b) The process of mining seed tourist site G 2 with central point G 1 under the control of the algorithm. (c) The process of mining seed tourist site G 3 with central point G 2 under the control of the algorithm. (d) The process of mining the seed tourist site with central point G 3 . The subsequent steps are in the same way.
Figure 2. The process of searching and mining subordinate seed tourist sites with the clustering center K or seed tourist site G e as central points. (a) The process of mining seed tourist site G 1 with central point K under the control of the algorithm. (b) The process of mining seed tourist site G 2 with central point G 1 under the control of the algorithm. (c) The process of mining seed tourist site G 3 with central point G 2 under the control of the algorithm. (d) The process of mining the seed tourist site with central point G 3 . The subsequent steps are in the same way.
Ijgi 09 00112 g002
Figure 3. A generation tree basic structure loop and tour route connecting lines. (a) The generation tree basic structure loop composed by clustering center K and τ seed tourist sites G e , which is ordered by serial numbers of K and G e . In the process of the algorithm, the connecting arc and line can be clipped and connected. (b) The closed-loop path composed by some arcs and lines of the basic structure loop under the control of the algorithm.
Figure 3. A generation tree basic structure loop and tour route connecting lines. (a) The generation tree basic structure loop composed by clustering center K and τ seed tourist sites G e , which is ordered by serial numbers of K and G e . In the process of the algorithm, the connecting arc and line can be clipped and connected. (b) The closed-loop path composed by some arcs and lines of the basic structure loop under the control of the algorithm.
Ijgi 09 00112 g003
Figure 4. Zhengzhou city all tourist sites or tourist attractions geographic distribution, and the selected typical tourist sites for the experiment. (a) All the tourist sites or tourist attractions which are noted as black dots. The red star represents the temporary accommodation confirmed by tourists, that is the clustering center K . The blue area is the main downtown area of Zhengzhou city, the black circle represents the clustering center’s neighbourhood buffer to search and mine optimal tourist sites. The gray lines are the main roads and avenues of Zhengzhou city. (b) The four tourist site classifications with 25 typical tourist sites. Green represents the park and greenland classifications, blue represents the venue classification, red represents the amusement classification, and yellow represents the shopping center classification.
Figure 4. Zhengzhou city all tourist sites or tourist attractions geographic distribution, and the selected typical tourist sites for the experiment. (a) All the tourist sites or tourist attractions which are noted as black dots. The red star represents the temporary accommodation confirmed by tourists, that is the clustering center K . The blue area is the main downtown area of Zhengzhou city, the black circle represents the clustering center’s neighbourhood buffer to search and mine optimal tourist sites. The gray lines are the main roads and avenues of Zhengzhou city. (b) The four tourist site classifications with 25 typical tourist sites. Green represents the park and greenland classifications, blue represents the venue classification, red represents the amusement classification, and yellow represents the shopping center classification.
Ijgi 09 00112 g004
Figure 5. The objective function fluctuating curves when searching seed tourist sites with different starting points. (a) The objective function fluctuating curve which starts searching from K ; (b) The objective function fluctuating curve which starts searching from c 24 ; (c) The objective function fluctuating curve which starts searching from c 11 ; and (d) The objective function fluctuating curve, which starts searching from c 32 .
Figure 5. The objective function fluctuating curves when searching seed tourist sites with different starting points. (a) The objective function fluctuating curve which starts searching from K ; (b) The objective function fluctuating curve which starts searching from c 24 ; (c) The objective function fluctuating curve which starts searching from c 11 ; and (d) The objective function fluctuating curve, which starts searching from c 32 .
Ijgi 09 00112 g005
Figure 6. The process of generating an optimal tourist site structure tree. (a) Starting from the central point K and searching the result of seed tourist site c 24 . (b) Starting from the central point c 24 and searching seed tourist site c 11 . (c) Starting from the central point c 11 and searching seed tourist site c 32 . (d) Starting from the central point c 32 and searching seed tourist site c 21 .
Figure 6. The process of generating an optimal tourist site structure tree. (a) Starting from the central point K and searching the result of seed tourist site c 24 . (b) Starting from the central point c 24 and searching seed tourist site c 11 . (c) Starting from the central point c 11 and searching seed tourist site c 32 . (d) Starting from the central point c 32 and searching seed tourist site c 21 .
Ijgi 09 00112 g006
Figure 7. Sub-unit motive function I ( ) fluctuating curves (green color), sub-unit motive weight value h ( ) fluctuating curves (blue color) and generation tree weight function L ( ) fluctuating curves (brown color).
Figure 7. Sub-unit motive function I ( ) fluctuating curves (green color), sub-unit motive weight value h ( ) fluctuating curves (blue color) and generation tree weight function L ( ) fluctuating curves (brown color).
Ijgi 09 00112 g007aIjgi 09 00112 g007b
Figure 8. The building process of generation tree weight function minimum heap R and the output ascending order complete binary tree. (a) The foundation of initial heap and the minimum heap with ascending order of weight function values. (b) The complete binary tree with weight function values. The visualized tree can provide the smart machine with the pointer for outputting optimal and sub-optimal tour routes.
Figure 8. The building process of generation tree weight function minimum heap R and the output ascending order complete binary tree. (a) The foundation of initial heap and the minimum heap with ascending order of weight function values. (b) The complete binary tree with weight function values. The visualized tree can provide the smart machine with the pointer for outputting optimal and sub-optimal tour routes.
Ijgi 09 00112 g008
Figure 9. The generation tree closed-loop structures and guide maps output by a smart machine. (a) Related to the No.(1) tour route, (b) Related to the No.(24) tour route, (c) Related to the No.(7) tour route, (d) Related to the No.(23) tour route.
Figure 9. The generation tree closed-loop structures and guide maps output by a smart machine. (a) Related to the No.(1) tour route, (b) Related to the No.(24) tour route, (c) Related to the No.(7) tour route, (d) Related to the No.(23) tour route.
Ijgi 09 00112 g009
Table 1. Prior probability of tourist site classification and conditional probability of feature attributes.
Table 1. Prior probability of tourist site classification and conditional probability of feature attributes.
c 1 c 2 c 3 c 4 c 1 c 2 c 3 c 4
0.2670.2460.2510.236 0.2670.2460.2510.236
x 11 0.7030.1920.0110.044 x 31 0.4670.4990.1010.057
x 12 0.1320.3980.4120.578 x 32 0.4140.3980.0980.066
x 13 0.1020.3830.5440.309 x 33 0.2490.3010.4450.423
x 14 0.0910.4130.7030.088 x 34 0.2780.3670.4230.411
x 21 0.5230.4820.0910.032 x 41 0.7230.3340.4580.278
x 22 0.5090.4270.1280.106 x 44 0.1010.6920.1190.573
x 23 0.3270.3380.4260.510 x 42 0.6210.3180.4860.344
x 24 0.2670.1020.4890.623 x 43 0.1260.5450.0960.493
Table 2. Longitude and latitude ( l , B ) and attraction index a for tourist sites.
Table 2. Longitude and latitude ( l , B ) and attraction index a for tourist sites.
c 1 l , B a c 2 l , B a c 3 l , B a c 4 l , B a
c 11 113.663
34.761
0.732 c 21 113.672
34.788
0.639 c 31 113.721
34.731
0.478 c 41 113.666
34.750
0.886
c 12 113.630
34.751
0.698 c 22 113.627
34.745
0.520 c 32 113.639
34.792
0.482 c 42 113.642
34.717
0.712
c 13 113.687
34.762
0.622 c 23 113.627
34.746
0.591 c 33 113.612
34.746
0.503 c 43 113.602
34.746
0.779
c 14 113.630
34.746
0.579 c 24 113.667
34.752
0.811 c 34 113.684
34.793
0.512 c 44 113.613
34.762
0.678
c 15 113.537
34.736
0.524 c 25 113.658
34.819
0.483 c 35 113.562
34.919
0.609 c 45 113.675
34.756
0.690
c 16 113.712
34.805
0.312 c 26 113.729
34.723
0.396 c 46 113.601
34.758
0.689
c 17 113.685
34.789
0.712 c 47 113.680
34.785
0.721
Table 3. Conditional probability for feature attribute vector X under the condition of tourist site classification.
Table 3. Conditional probability for feature attribute vector X under the condition of tourist site classification.
c 1 c 2 c 3 c 4
X 0.0200.0230.0020.001
Table 4. The obtained central seed tourist sights and objective function values.
Table 4. The obtained central seed tourist sights and objective function values.
K c 24   ( G 1 + ) c 11   ( G 2 ) c 32   ( G 3 ) K c 24   ( G 1 + ) c 11   ( G 2 ) c 32   ( G 3 )
c 11 2.3612.3010.0001.484 c 31 1.6851.4461.3281.002
c 12 1.9411.3981.4111.374 c 32 1.5911.5081.5450.000
c 13 2.0961.7671.7131.264 c 33 1.6711.5071.4591.290
c 14 1.7421.4501.4481.220 c 34 1.7051.5111.4781.308
c 15 1.6331.1021.0520.893 c 35 1.3720.9820.8581.006
c 16 1.2971.4941.4481.304 c 41 2.9105.9792.2031.608
c 17 1.9461.3081.3201.484 c 42 1.8861.3811.2231.283
c 21 1.8441.3591.4841.673 ( G 4 ) c 43 1.8651.1031.0931.453
c 22 1.7151.5511.4921.266 c 44 1.7881.3231.2351.490
c 23 1.7861.5531.4231.341 c 45 4.0262.9941.7081.380
c 24 2.9330.0002.2611.556 c 46 1.6541.1661.1291.350
c 25 1.3961.3351.2811.272 c 47 1.9571.3391.4311.636
c 26 1.5241.4041.3241.041
Table 5. Sub-unit motive weight values h ( ) of starting points and tourist sites.
Table 5. Sub-unit motive weight values h ( ) of starting points and tourist sites.
K c 24 c 11 c 32 c 21
K --0.3410.4240.6290.542
c 24 0.341-- 0.6630.736
c 11 0.4240.435--0.6470.674
c 32 0.6290.6630.647--0.598
c 21 0.5420.7360.6740.598--
Table 6. Sub-unit motive weight value h ( ) and generation tree weight function value L ( ) .
Table 6. Sub-unit motive weight value h ( ) and generation tree weight function value L ( ) .
I ( c i j c i j ) h ( c i j c i j ) L ( K , c i j ) L ( K , K )
(1) K c 24 c 11 c 32 c 21 K 2.933,2.299,1.546,1.672,1.8450.341,0.435,0.647,0.598,0.5420.341,0.776,1.423,2.021,2.5632.563
(2) K c 24 c 11 c 21 c 32 K 2.933,2.299,1.484,1.672,1.5900.341,0.435,0.674,0.598,0.6290.341,0.776,1.450,2.048,2.6772.677
(3) K c 24 c 32 c 11 c 21 K 2.933,1.508,1.546,1.484,1.8450.341,0.663,0.647,0.674,0.5420.341,1.004,1.651,2.325,2.8682.867
(4) K c 24 c 32 c 21 c 11 K 2.933,1.508,1.672,1.484,2.3580.341,0.663,0.598,0.674,0.4240.341,1.004,1.602,2.276,2.7002.700
(5) K c 24 c 21 c 11 c 32 K 2.933,1.359,1.484,1.546,1.5900.341,0.736,0.674,0.647,0.6290.341,1.077,1.751,2.398,3.0273.027
(6) K c 24 c 21 c 32 c 11 K 2.933,1.359,1.672,1.546,2.3580.341,0.736,0.598,0.647,0.4240.341,1.077,1.675,2.322,2.7462.746
(7) K c 11 c 24 c 32 c 21 K 2.358,2.299,1.508,1.672,1.8450.424,0.435,0.663,0.598,0.5420.424,0.859,1.522,2.120,2.6622.662
(8) K c 11 c 24 c 21 c 32 K 2.358,2.299,1.359,1.672,1.5900.424,0.435,0.736,0.598,0.6290.424,0.859,1.595,2.193,2.8222.822
(9) K c 11 c 32 c 24 c 21 K 2.358,1.546,1.508,1.359,1.8450.424,0.647,0.663,0.736,0.5420.424,1.071,1.734,2.470,3.0123.012
(10) K c 11 c 32 c 21 c 24 K 2.358,1.546,1.672,1.359,2.9330.424,0.647,0.598,0.736,0.3410.424,1.071,1.669,2.405,2.7462.746
(11) K c 11 c 21 c 32 c 24 K 2.358,1.484,1.672,1.508,2.9330.424,0.674,0.598,0.663,0.3410.424,1.098,1.696,2.359,2.7002.700
(12) K c 11 c 21 c 24 c 32 K 2.358,1.484,1.359,1.508,1.5900.424,0.674,0.736,0.663,0.6290.424,1.098,1.834,2.497,3.1263.126
(13) K c 32 c 11 c 21 c 24 K 1.590,1.546,1.484,1.359,2.9330.629,0.647,0.674,0.736,0.3410.629,1.276,1.950,2.686,3.0273.027
(14) K c 32 c 11 c 24 c 21 K 1.590,1.546,2.299,1.359,1.8450.629,0.647,0.435,0.736,0.5420.629,1.276,1.711,2.447,2.9892.989
(15) K c 32 c 21 c 11 c 24 K 1.590,1.672,1.484,2.299,2.9330.629,0.598,0.674,0.435,0.3410.629,1.227,1.901,2.336,2.6772.677
(16) K c 32 c 21 c 24 c 11 K 1.590,1.672,1.359,2.299,2.3580.629,0.598,0.736,0.435,0.4240.629,1.227,1.963,2.398,2.8222.822
(17) K c 32 c 24 c 21 c 11 K 1.590,1.508,1.359,1.484,2.3580.629,0.663,0.736,0.674,0.4240.629,1.292,2.028,2.702,3.1263.126
(18) K c 32 c 24 c 11 c 21 K 1.590,1.508,2.299,1.484,1.8450.629,0.663,0.435,0.674,0.5420.629,1.292,1.727,2.401,2.9432.943
(19) K c 21 c 11 c 24 c 32 K 1.845,1.484,2.299,1.508,1.5900.542,0.674,0.435,0.663,0.6290.542,1.216,1.651,2.314,2.9432.943
(20) K c 21 c 11 c 32 c 24 K 1.845,1.484,1.546,1.508,2.9330.542,0.674,0.647,0.663,0.3410.542,1.216,1.863,2.526,2.8672.867
(21) K c 21 c 24 c 11 c 32 K 1.845,1.359,2.299,1.546,1.5900.542,0.736,0.435,0.647,0.6290.542,1.278,1.713,2.360,2.9892.989
(22) K c 21 c 24 c 32 c 11 K 1.845,1.359,1.508,1.546,2.3580.542,0.736,0.663,0.647,0.4240.542,1.278,1.941,2.588,3.0123.012
(23) K c 21 c 32 c 24 c 11 K 1.845,1.672,1.508,2.299,2.3580.542,0.598,0.663,0.435,0.4240.542,1.140,1.803,2.238,2.6622.662
(24) K c 21 c 32 c 11 c 24 K 1.845,1.672,1.546,2.299,2.9330.542,0.598,0.647,0.435,0.3410.542,1.140,1.787,2.222,2.5632.563
Table 7. Comparison on the time complexity and space complexity of the four algorithms.
Table 7. Comparison on the time complexity and space complexity of the four algorithms.
Search AlgorithmTC TC   ( n = 6 ) SC SC   ( n = 6 )
Proposed algorithm O ( 2 n log 2 n ) O ( 31.02 ) O ( 1 ) O ( 1 )
A * O ( n log 2 n + n ) O ( 21.51 ) O ( 1 ) O ( 1 )
Dijkstra O ( n log 2 n + n 2 ) O ( 51.51 ) O ( n ) O ( 6 )
Floyd O ( n log 2 n + n 3 ) O ( 231.51 ) O ( n 2 ) O ( 36 )
Table 8. The searching process for optimal tourist sites.
Table 8. The searching process for optimal tourist sites.
Starting PointThe Mined Optimal Tourist SiteNote for the Mined Tourist Site
K c 24 G 1 +
c 24 c 11 G 2
c 11 c 32 G 3
c 32 c 21 G 4

Share and Cite

MDPI and ACS Style

Zhou, X.; Su, M.; Liu, Z.; Hu, Y.; Sun, B.; Feng, G. Smart Tour Route Planning Algorithm Based on Naïve Bayes Interest Data Mining Machine Learning. ISPRS Int. J. Geo-Inf. 2020, 9, 112. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9020112

AMA Style

Zhou X, Su M, Liu Z, Hu Y, Sun B, Feng G. Smart Tour Route Planning Algorithm Based on Naïve Bayes Interest Data Mining Machine Learning. ISPRS International Journal of Geo-Information. 2020; 9(2):112. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9020112

Chicago/Turabian Style

Zhou, Xiao, Mingzhan Su, Zhong Liu, Yu Hu, Bin Sun, and Guanghui Feng. 2020. "Smart Tour Route Planning Algorithm Based on Naïve Bayes Interest Data Mining Machine Learning" ISPRS International Journal of Geo-Information 9, no. 2: 112. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9020112

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop