Trends and Prospects in Data Mining Techniques for Big Graph/Spatial Data

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (15 December 2022) | Viewed by 4021

Special Issue Editors

School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
Interests: data management; data mining; cohesive subgraph searching; graph embedding; graph neural networks; keyword searching; trajectory computing
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Interests: data management; spatial-temporal data mining

Special Issue Information

Dear Colleagues,

With the rapid growth of internet and mobile technologies, big graph/spatial data have become prevalent in various areas, such as social media, knowledge bases, and e-commerce platforms. As a powerful tool, graph data are often used to model the complex connectedness among entities in social networks, communication networks, collaboration networks, biological networks, transportation networks, knowledge networks, etc. Similarly, spatial data can model the locations of entities in location-based services (e.g., Google Maps) and geo-social networks (e.g., Flickr and Foursquare). Moreover, in many real-world applications, such as recommendations of POIs, trip planning, location-based viral marketing, community discovery, group mobility, and behaviour modelling, graph data and spatial data are often used jointly to model the activities of, and interactions among, these entities. Driven by these applications, there is an increasing demand for the development of novel graph/spatial data analytics models and scalable graph/spatial data analytics techniques and systems.

However, the effective and efficient mining of big graph/spatial data has long been an open challenge for the data science community. Apart from having the generic characteristics of big data (e.g., big volume, high velocity, and complex variety), big graph/spatial data have additional challenging characteristics, including but not limited to high variability, low veracity, difficulty of validation, and the need to ensure data security.

The purpose of this Special Issue is, therefore, to disseminate the results of advanced data mining approaches that address the aforementioned challenges of processing big graph/spatial data. Moreover, this Special Issue will be of interest to researchers developing techniques for large-scale graph/spatial data analytics in various application domains. The intended audience includes researchers from both academia and industry who are interested in exploiting the value of large-scale graph/spatial data.

The topics of interest related to this Special Issue include, but are not limited to:

  • Modelling, storage, indexing and query-processing techniques for graph/spatial data;
  • Data management systems for the collection, storage, and access of graph/spatial data;
  • Data mining techniques for graph/spatial data;
  • AI and machine learning techniques for graph/spatial data;
  • Data analytics for dynamic and streaming graph/spatial data;
  • Techniques for distributed graph/spatial data analytics;
  • Visualization techniques and systems for graph/spatial data;
  • Spatio-temporal graph data analytics;
  • Crowdsourcing techniques based on graph/spatial data;
  • Location-based services and location-based social networks;
  • Traffic pattern analysis and intelligent transportation;
  • Individual and group behaviour analysis and social activity discovery;
  • Graph analytics in various application domains such as social networks, multimedia, semantic web, biological data, business processes, transport data, etc.;
  • Vision papers that survey the area of graph/spatial data analytics, as well as papers that experimentally compare existing works.

Dr. Yixiang Fang
Dr. Bolong Zheng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • graph data mining
  • graph neural networks
  • graph embedding
  • knowledge graph mining
  • spatial/spatio-temporal data mining
  • urban data mining
  • user mobility and activity modeling
  • recommendation systems

Published Papers (3 papers)


Research

14 pages, 1916 KiB  
Article
Dynamic Community Detection Based on Evolutionary DeepWalk
by Song Qu, Yuqing Du, Mu Zhu, Guan Yuan, Jining Wang, Yanmei Zhang and Xiangyu Duan
Appl. Sci. 2022, 12(22), 11464; https://0-doi-org.brum.beds.ac.uk/10.3390/app122211464 - 11 Nov 2022
Cited by 1 | Viewed by 1203
Abstract
To fully characterize the evolution of the topological structure of dynamic communities, we propose a dynamic community detection method based on Evolutionary DeepWalk (DEDW) that targets the high dimensionality and dynamic characteristics of the data. First, DEDW solves the problem of data sparseness in the process of dynamic network data representation through graph embedding. Then, DEDW uses the DeepWalk algorithm to generate node embedding feature vectors based on the characteristics of the stable change of the community structure. Finally, DEDW integrates historical network structure information to generate evolutionary graph features and implements dynamic community detection with the K-means algorithm. Experiments show that DEDW can mine the time-smooth change characteristics of dynamic communities, solve the problem of data sparseness in the process of node embedding, fully consider historical structure information, and improve the accuracy and stability of dynamic community detection.
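
As a rough illustration of the DeepWalk step mentioned in this abstract, the sketch below generates the truncated random walks that DeepWalk feeds to a skip-gram model. It covers only walk generation for a single graph snapshot; the embedding training, the integration of historical structure, and the K-means step of DEDW are not shown. The function name `random_walks`, its parameters, and the toy graph are assumptions made for this example and are not taken from the paper.

```python
import random


def random_walks(adj, walks_per_node=10, walk_length=5, seed=0):
    """Generate truncated uniform random walks over one graph snapshot.

    adj maps each node to a list of its neighbours (hypothetical input format).
    Returns a list of walks; each walk is a list of nodes.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy undirected graph, invented for illustration.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(toy_graph)
```

Each walk plays the role of a "sentence" of node identifiers; in a full pipeline these walks would be passed to a skip-gram model to obtain node embeddings, which could then be clustered.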

24 pages, 1035 KiB  
Article
Generalized Sketches for Streaming Sets
by Wenhua Guo, Kaixuan Ye, Yiyan Qi, Peng Jia and Pinghui Wang
Appl. Sci. 2022, 12(15), 7362; https://0-doi-org.brum.beds.ac.uk/10.3390/app12157362 - 22 Jul 2022
Viewed by 1314
Abstract
Many real-world datasets are given as a stream of user–interest pairs, where a user–interest pair represents a link from a user (e.g., a network host) to an interest (e.g., a website), and may appear more than once in the stream. Monitoring and mining statistics, including cardinality, intersection cardinality, and Jaccard similarity of users’ interest sets on high-speed streams, are widely employed by applications such as network anomaly detection. Although estimating set cardinality, set intersection cardinality, and set Jaccard similarity individually is well studied, there is no effective method that provides a one-shot solution for estimating all three of these statistics. To solve the above challenge, we develop a novel framework, SimCar. SimCar builds, online, an order-hashing (OH) sketch for each user occurring in the data stream of interest. At any time of interest, one can query the cardinalities, intersection cardinalities, and Jaccard similarities of users’ interest sets. Specifically, using OH sketches, we develop maximum likelihood estimation (MLE) methods to estimate cardinalities and intersection cardinalities of users’ interest sets. In addition, we use OH sketches to estimate Jaccard similarities of users’ interest sets and build locality-sensitive hashing tables to search for users with similar interests in sub-linear time. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate the superiority of our methods.
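
To make the idea of per-user sketches over a stream of user–interest pairs more concrete, here is a minimal sketch that uses classic MinHash to estimate Jaccard similarity. This is a well-known baseline swapped in for illustration; it is not the order-hashing (OH) sketch or the MLE estimators of SimCar, and it does not estimate cardinalities. The class name `MinHashSketches`, the hash construction, and the sample identifiers are all assumptions.

```python
import hashlib
from collections import defaultdict

_EMPTY = 2 ** 64  # sentinel larger than any 64-bit hash value


def _hash(item, seed):
    """64-bit hash of a string, with one hash function per seed (assumed design)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


class MinHashSketches:
    """Maintains one MinHash signature per user, updated pair by pair."""

    def __init__(self, num_hashes=256):
        self.num_hashes = num_hashes
        self.sketches = defaultdict(lambda: [_EMPTY] * num_hashes)

    def update(self, user, interest):
        # Keep the minimum hash per seed; repeated pairs leave the sketch unchanged.
        sketch = self.sketches[user]
        for seed in range(self.num_hashes):
            h = _hash(interest, seed)
            if h < sketch[seed]:
                sketch[seed] = h

    def jaccard(self, user_a, user_b):
        # The fraction of matching signature positions estimates the Jaccard similarity.
        a, b = self.sketches[user_a], self.sketches[user_b]
        matches = sum(1 for x, y in zip(a, b) if x == y)
        return matches / self.num_hashes


sk = MinHashSketches()
for i in range(100):
    sk.update("u1", f"site{i}")      # u1 visits site0..site99
for i in range(50, 150):
    sk.update("u2", f"site{i}")      # u2 visits site50..site149
estimate = sk.jaccard("u1", "u2")    # true Jaccard similarity is 50/150
```

Because only the minimum hash per seed is stored, duplicates in the stream do not change a sketch, and memory per user is fixed by `num_hashes`. Two users who have not been seen at all would compare as identical here, so a real system would need to handle empty sketches separately.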

20 pages, 683 KiB  
Article
Mosar: Efficiently Characterizing Both Frequent and Rare Motifs in Large Graphs
by Wenhua Guo, Wenqian Feng, Yiyan Qi, Pinghui Wang and Jing Tao
Appl. Sci. 2022, 12(14), 7210; https://0-doi-org.brum.beds.ac.uk/10.3390/app12147210 - 18 Jul 2022
Viewed by 945
Abstract
Exploring motif statistics (such as motif frequencies) of a large graph is useful for understanding complex networks such as social and biological networks, but it can be challenging due to high computational costs. To address this challenge, many methods explore approximate algorithms using edge/path sampling techniques. However, state-of-the-art methods usually over-sample frequent motifs and under-sample rare motifs, and thus they fail in many real applications such as anomaly detection (i.e., finding rare patterns). Furthermore, it is not feasible to apply existing weighted sampling methods such as stratified sampling to solve this problem, because it is difficult to sample subgraphs from a large graph in a direct manner. In this paper, we observe that rare motifs of most real-world networks have “more edges” than frequent motifs, and that motifs with more edges are sampled by random edge sampling with higher probabilities. Based on these two observations, we propose a novel motif sampling method, Mosar, to estimate motif frequencies. In particular, our Mosar method samples frequent and rare motifs with different probabilities, and tends to sample motifs with low frequencies. As a result, the new method greatly reduces the estimation errors of these rare motifs. Finally, we conducted extensive experiments on a variety of real-world datasets with different sizes, and our experimental results show that the Mosar method is two orders of magnitude more accurate than state-of-the-art methods.
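
For readers unfamiliar with edge-sampling estimators, the following minimal sketch estimates the frequency of one motif, the triangle, by uniform random edge sampling. It is a plain baseline of the kind the abstract contrasts with, not the Mosar method, and it does not reproduce Mosar's frequency-dependent sampling probabilities. The function names and the test graph are assumptions; the edge list is assumed to contain each undirected edge once, with no self-loops.

```python
import random
from itertools import combinations


def build_adjacency(edges):
    """Build a node -> set-of-neighbours map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def estimate_triangles(edges, num_samples, seed=0):
    """Estimate the triangle count by sampling edges uniformly with replacement.

    A sampled edge (u, v) lies in |N(u) & N(v)| triangles, and every triangle
    contains exactly 3 edges, so m * mean(common neighbours) / 3 is an
    unbiased estimate of the number of triangles (m = number of edges).
    """
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    total = 0
    for _ in range(num_samples):
        u, v = rng.choice(edges)
        total += len(adj[u] & adj[v])
    return len(edges) * total / (3 * num_samples)


# Complete graph on 6 nodes: every edge lies in exactly 4 triangles.
k6_edges = list(combinations(range(6), 2))
k6_estimate = estimate_triangles(k6_edges, num_samples=100)
```

On the complete graph every sampled edge contributes the same count, so the estimate equals the true value of 20 triangles regardless of which edges are drawn. On skewed real-world graphs the variance of such a uniform estimator can be large for rare motifs, which is the problem the paper targets.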
