Applications of Big Data Analysis and Modeling

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: closed (30 January 2024) | Viewed by 10232

Special Issue Editors


Guest Editor
Computer Science Department, Southwest University, Tiansheng Road #2, Beibei District, Chongqing 400715, China
Interests: data-driven system modeling; network science; community detection; network representation learning; complex social networks analysis

Guest Editor
School of Artificial Intelligence, Optics and Electronics, Northwestern Polytechnical University, Xi’an 710072, China
Interests: intelligent decision-making and cognition; data mining; artificial intelligence; network science; complex systems

Guest Editor
School of Artificial Intelligence, Optics and Electronics, Northwestern Polytechnical University, Xi’an 710072, China
Interests: data-driven modeling; social networks analysis; application of machine learning approaches

Special Issue Information

Dear Colleagues,

With the rapid development of modern internet technology, vast quantities of data, including texts, videos, and images, are generated every day. For instance, the emergence and growth of various social platforms have driven the rapid development of online communication, especially information sharing.

In the era of big data, the core research efforts of both academia and industry center on data-driven applications built on machine learning approaches, such as supervised, semi-supervised, and unsupervised learning, which aim to mine valuable information from big data and bring convenience to our daily lives. In particular, graph analysis has become a hot topic that attracts interest from biologists, economists, chemists, physicists, and others. Collected data can be represented as graphs through different embedding approaches, for which many mathematics-based methods (e.g., matrix factorization) have been developed. Recently, graph neural networks, which originate from spectral graph theory, have generalized neural networks and deep learning to graphs. Thanks to the emergence of various deep learning models, the analysis of data collected from practical systems can be substantially improved in applications such as recommendation, traffic forecasting, medicine development, epidemic spreading, and natural language processing.

The aim of this Special Issue is to publish cutting-edge original research papers on the latest advances in the analysis and application of big data and in the development of machine learning approaches, including theories, models, algorithms, and real-world applications.
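As a concrete illustration of the matrix-factorization family of embedding methods mentioned above, the toy sketch below learns low-dimensional node embeddings by factorizing a graph's adjacency matrix with plain gradient descent. The function name, hyperparameters, and example graph are illustrative, not drawn from any particular paper in this issue:

```python
import random

def embed_by_factorization(adj, dim=2, lr=0.05, epochs=500, seed=0):
    # Learn node embeddings U so that U @ U^T approximates the adjacency
    # matrix `adj` (a list of lists), via elementwise gradient descent.
    rng = random.Random(seed)
    n = len(adj)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(n):
                pred = sum(U[i][k] * U[j][k] for k in range(dim))
                err = adj[i][j] - pred
                for k in range(dim):
                    ui, uj = U[i][k], U[j][k]   # snapshot before updating
                    U[i][k] += lr * err * uj
                    U[j][k] += lr * err * ui
    return U

# Two tightly linked pairs of nodes: embeddings within a pair end up with a
# larger inner product than embeddings across pairs.
adj = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
U = embed_by_factorization(adj)
```

The recovered inner products can then feed any downstream task (clustering, link prediction) in place of the raw adjacency rows.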

Potential topics of interest include, but are not limited to:

  • Machine learning theories;
  • Machine learning models;
  • Machine learning algorithms;
  • Embedding/representation methods;
  • Feature selection and clustering;
  • Graph neural networks/graph convolutional networks;
  • Complex network analysis based on GNNs;
  • Sentiment analysis and text classification;
  • False account/news detection for online social networks;
  • Applications in NLP, emotion analysis, computer vision, intelligent traffic, recommendation systems, finance, new medicine design, epidemiologic modeling, etc.

Prof. Dr. Chao Gao
Prof. Dr. Zhen Wang
Prof. Dr. Peican Zhu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep learning
  • graph convolutional networks
  • statistical analysis of big data
  • applications of big data analysis
  • data-based complex network analysis

Published Papers (9 papers)


Research

28 pages, 4952 KiB  
Article
Pinning Event-Triggered Scheme for Synchronization of Delayed Uncertain Memristive Neural Networks
by Jiejie Fan, Xiaojuan Ban, Manman Yuan and Wenxing Zhang
Mathematics 2024, 12(6), 821; https://doi.org/10.3390/math12060821 - 11 Mar 2024
Viewed by 499
Abstract
To reduce the communication and computation overhead of neural networks, a novel pinning event-triggered scheme (PETS) is developed in this paper, which enables pinning synchronization of uncertain coupled memristive neural networks (CMNNs) under limited resources. Time-varying delays, uncertainties, and mismatched parameters are all considered, which makes the system more general. In addition, from a low-energy-cost point of view, an algorithm for pinned-node selection is designed to further investigate the newly designed event-triggered function under limited communication resources. Meanwhile, based on the PETS and following the Lyapunov functional method, sufficient conditions for the pinning exponential stability of the proposed coupled error system are formulated, and the analysis of the self-triggered method shows that our method can efficiently avoid Zeno behavior under the newly determined triggering conditions, which contributes to better PETS performance. Extensive experiments demonstrate that the PETS significantly outperforms existing schemes in terms of solution quality.
(This article belongs to the Special Issue Applications of Big Data Analysis and Modeling)
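The event-triggered idea at the core of this abstract can be illustrated with a deliberately simplified rule: a node rebroadcasts its state only when the drift since its last broadcast crosses a state-dependent threshold, so communication happens at events rather than at every step. The threshold rule below is a generic sketch, not the paper's PETS condition:

```python
def event_triggered_instants(states, sigma=0.2):
    # Return the time steps at which a node would rebroadcast its state:
    # whenever the drift since the last broadcast exceeds sigma * |state|.
    last = states[0]
    events = [0]                       # the initial state is always sent
    for t, x in enumerate(states[1:], start=1):
        if abs(x - last) > sigma * abs(x):
            last = x
            events.append(t)
    return events

trajectory = [1.0, 1.1, 1.6, 1.65, 3.5]
print(event_triggered_instants(trajectory))  # → [0, 2, 4]
```

Only 3 of the 5 samples are transmitted; avoiding Zeno behavior amounts to proving such triggering instants cannot accumulate infinitely fast.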

21 pages, 3593 KiB  
Article
A Blockchain-Based Fairness Guarantee Approach for Privacy-Preserving Collaborative Training in Computing Force Network
by Zhe Sun, Weiping Li, Junxi Liang, Lihua Yin, Chao Li, Nan Wei, Jie Zhang and Hanyi Wang
Mathematics 2024, 12(5), 718; https://doi.org/10.3390/math12050718 - 28 Feb 2024
Viewed by 458
Abstract
The advent of the big data era has brought unprecedented data demands. The integration of computing resources with network resources in the computing force network makes distributed collaborative training possible. However, unencrypted collaborative training is vulnerable to threats such as gradient inversion attacks and model theft. To address this issue, the data in collaborative training are usually protected by cryptographic methods. However, the semantic meaninglessness of encrypted data makes it difficult to prevent potential data poisoning attacks and free-riding attacks. In this paper, we propose a fairness guarantee approach for privacy-preserving collaborative training, employing blockchain technology to enable participants to share data and to exclude potential violators from normal users. We utilize a cryptography-based secure aggregation method to prevent data leakage during blockchain transactions, and we employ a contribution evaluation method for encrypted data to prevent data poisoning and free-riding attacks. Additionally, Shamir's secret sharing is used for secret key negotiation within the group, and the negotiated key is directly introduced as noise into the model, keeping the encryption computationally lightweight. Decryption is achieved efficiently through the aggregation of encrypted models within the group, without incurring additional computational costs, thereby enhancing the efficiency of the encryption and decryption processes. Finally, experimental results demonstrate the effectiveness and efficiency of our proposed approach.
(This article belongs to the Special Issue Applications of Big Data Analysis and Modeling)
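The property this abstract relies on, masked individual updates whose aggregate is still exact, can be sketched with pairwise cancelling masks. This is a simplification of the Shamir-based key negotiation the paper actually uses; the seed arithmetic and values are illustrative only:

```python
import random

def add_cancelling_masks(updates, seed=7):
    # Each pair (i, j) derives a shared pseudorandom mask; participant i
    # adds it and participant j subtracts it. Every individual update is
    # obscured, yet all masks cancel in the aggregate.
    n = len(updates)
    masked = list(updates)
    for i in range(n):
        for j in range(i + 1, n):
            mask = random.Random(seed * 10007 + i * 101 + j).uniform(-1.0, 1.0)
            masked[i] += mask
            masked[j] -= mask
    return masked

plain = [0.2, -0.1, 0.4]
masked = add_cancelling_masks(plain)
# sum(masked) equals sum(plain), while each masked value hides its original
```

A real deployment would derive the masks from negotiated secret keys rather than a public seed, which is exactly what the secret-sharing step provides.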

18 pages, 811 KiB  
Article
A Dual Fusion Pipeline to Discover Tactical Knowledge Guided by Implicit Graph Representation Learning
by Xiaodong Wang, Pei He, Hongjing Yao, Xiangnan Shi, Jiwei Wang and Yangming Guo
Mathematics 2024, 12(4), 528; https://doi.org/10.3390/math12040528 - 08 Feb 2024
Viewed by 575
Abstract
Discovering tactical knowledge aims to extract tactical data from battlefield signal data, which is vital in information warfare. Learning and reasoning from battlefield signal information can help commanders make effective decisions. However, traditional methods are limited in capturing sequential and global representations due to their reliance on prior knowledge or feature engineering. Current deep learning models focus on extracting implicit behavioral characteristics from combat process data, overlooking the martial knowledge embedded in the recognition of combat intentions. In this work, we address this challenge by proposing a dual fusion pipeline, named TBGCN, that introduces graph representation learning into sequence learning to construct tactical behavior sequence graphs expressing implicit martial knowledge. Specifically, the TBGCN utilizes graph representation learning to represent prior knowledge by building a graph to guide the deep learning paradigm, while sequence learning finds the hidden representation from the target's serialized data. We then employ a fusion module to merge the two representations. The significance of integrating graphs with deep learning lies in using the artificial experience of the implicit graph structure to guide adaptive learning, which can improve representation ability and model generalization. Extensive experimental results demonstrate that the proposed TBGCN can effectively discover tactical knowledge and significantly outperforms traditional and deep learning methods.

15 pages, 13991 KiB  
Article
Community Structure and Resilience of the City Logistics Networks in China
by Jun-Chao Ma, Zhi-Qiang Jiang, Yin-Jie Ma and Yue-Hua Dai
Mathematics 2023, 11(20), 4352; https://doi.org/10.3390/math11204352 - 19 Oct 2023
Viewed by 991
Abstract
Logistics security, as the lifeline of the economy connecting production, distribution, and consumption, holds a pivotal position in the modern economic system, where potential threats such as natural disasters or cyber attacks could have far-reaching impacts on the overall economy. With a unique large-scale logistics data set, logistics networks between cities in China are constructed. We then identify communities of cities that have dense logistics connections in these networks. The cities within each community are found to exhibit strong connections in economy, resources, and industry. The detected communities also align with the urban agglomerations mentioned in the guidelines reported by the National Development and Reform Commission of China. We further extend our analysis to assess the resilience of the city logistics networks, focusing especially on the influence of community structures. Both random and intentional attacks are considered in our resilience analysis. Our results reveal that the city logistics networks are robust to random attacks but vulnerable to intentional attacks on nodes with dense links between and within communities. These results not only deepen our understanding of the community structure and resilience of city logistics networks but also provide insights into how to improve the efficiency and safety of intercity logistics.
(This article belongs to the Special Issue Applications of Big Data Analysis and Modeling)
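The random-versus-intentional attack comparison described above can be reproduced in miniature: on a hub-and-spoke toy graph, removing the highest-degree node shrinks the giant component far more than removing a peripheral node. This is a generic sketch, not the paper's data set or metrics:

```python
def giant_component(adj):
    # Size of the largest connected component of an undirected graph
    # given as {node: set_of_neighbors}.
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def without(adj, node):
    # Copy of the graph with `node` and its incident edges removed.
    return {u: {v for v in nbrs if v != node}
            for u, nbrs in adj.items() if u != node}

# Hub 0 connects everyone; nodes 1 and 2 also link to each other.
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
hub = max(adj, key=lambda u: len(adj[u]))
# Targeted removal of the hub fragments the network (giant component 2),
# while removing leaf 4 barely hurts it (giant component 4).
```

Scaling the same measurement to an intercity logistics network, with hubs chosen by inter-community link density, is essentially the resilience analysis the abstract reports.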

15 pages, 1352 KiB  
Article
Research on a Hotel Collaborative Filtering Recommendation Algorithm Based on the Probabilistic Language Term Set
by Erwei Wang, Yingyin Chen and Yumin Li
Mathematics 2023, 11(19), 4106; https://doi.org/10.3390/math11194106 - 28 Sep 2023
Cited by 2 | Viewed by 864
Abstract
In the face of problems such as information overload and the information cocoon resulting from big data, solving the semantic fuzziness of online reviews and improving the accuracy of personalized recommendation algorithms that use them are key points of current research. Based on the advantage of the probabilistic language term set in handling fuzzy information and on historical online hotel review data, this paper proposes a collaborative filtering recommendation algorithm for hotels. First, the text of online hotel reviews is collected with a web crawler and processed with the jieba and TF-IDF tools. Second, the hotel evaluation attribute set is constructed, and sentiment analysis of the review statements is carried out with the help of the HowNet sentiment dictionary and manual annotation. The probabilistic language term set is used to classify the data and derive statistics, and the maximum deviation method is used to determine the weight of each attribute. Then, the cosine similarity formula is fused with the modified cosine similarity formula to calculate similarity and construct the decision matrix. Finally, combined with the historical data of the user's hotel selections, the hotel recommendation results are generated. We collected review data for 10 hotels in Macau from the official "Ctrip" website, applied the proposed recommendation algorithm to process and analyze the data, and generated a ranked list of hotel recommendations. To validate the accuracy and effectiveness of this research, the recommendation results were compared with those produced by other algorithms.
(This article belongs to the Special Issue Applications of Big Data Analysis and Modeling)
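The similarity-fusion step mentioned above, combining plain cosine similarity with a mean-centered (adjusted) variant, can be sketched as follows. The linear fusion weight `alpha` is a hypothetical parameter for illustration, not the paper's exact combination rule:

```python
import math

def cosine(u, v):
    # Plain cosine similarity between two rating/attribute vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def adjusted_cosine(u, v):
    # Cosine similarity after centering each vector on its own mean,
    # which removes a per-user rating bias.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])

def fused_similarity(u, v, alpha=0.5):
    # Hypothetical linear fusion of the two measures.
    return alpha * cosine(u, v) + (1 - alpha) * adjusted_cosine(u, v)
```

Filling a user-by-user matrix with `fused_similarity` values yields the decision matrix from which neighbors, and hence recommendations, are drawn.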

17 pages, 1221 KiB  
Article
ProMatch: Semi-Supervised Learning with Prototype Consistency
by Ziyu Cheng, Xianmin Wang and Jing Li
Mathematics 2023, 11(16), 3537; https://doi.org/10.3390/math11163537 - 16 Aug 2023
Cited by 1 | Viewed by 1131
Abstract
Recent state-of-the-art semi-supervised learning (SSL) methods have made significant advancements by combining consistency regularization and pseudo-labeling in a joint learning paradigm. The core concept of these methods is to identify consistency targets (pseudo-labels) by selecting predicted distributions with high confidence from weakly augmented unlabeled samples. However, they often suffer from erroneous yet highly confident pseudo-labels, which can lead to noisy training. This issue arises for two main reasons: (1) when the model is poorly calibrated, the prediction for a single sample may be overconfident and incorrect, and (2) propagating pseudo-labels from unlabeled samples can result in error accumulation due to the margin between the pseudo-label and the ground-truth label. To address this problem, we propose a novel consistency criterion called Prototype Consistency (PC) to improve the reliability of pseudo-labeling by leveraging the prototype similarities between labeled and unlabeled samples. First, we instantiate semantic prototypes (centers of embeddings) and prediction prototypes (centers of predictions) for each category using memory buffers that store the features of labeled examples. Second, for a given unlabeled sample, we determine the most similar semantic prototype and prediction prototype by assessing the similarities between the features of the unlabeled sample and the prototypes of the labeled samples. Finally, instead of using the prediction of the unlabeled sample as the pseudo-label, we select the most similar prediction prototype as the consistency target, provided that the predicted category of the most similar prediction prototype, the ground-truth category of the most similar semantic prototype, and the ground-truth category of the most similar prediction prototype are equivalent. By combining the PC approach with the techniques developed by the MixMatch family, our proposed ProMatch framework demonstrates significant performance improvements over previous algorithms on datasets such as CIFAR-10, CIFAR-100, SVHN, and Mini-ImageNet.
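The prototype-agreement check at the heart of PC can be sketched in a few lines: a pseudo-label is accepted only when the nearest semantic prototype and the nearest prediction prototype vote for the same class. This is a simplified two-way stand-in for the paper's full criterion, which compares three categories; all prototype values below are toy numbers:

```python
def prototype_target(feat, pred, sem_protos, pred_protos):
    # Nearest prototype by squared Euclidean distance.
    def nearest(x, protos):
        return min(protos, key=lambda c: sum((a - b) ** 2
                                             for a, b in zip(x, protos[c])))
    c_sem = nearest(feat, sem_protos)     # class of closest semantic prototype
    c_pred = nearest(pred, pred_protos)   # class of closest prediction prototype
    # Accept a consistency target only when both prototype views agree.
    return c_pred if c_sem == c_pred else None

sem_protos = {0: [0.0, 0.0], 1: [1.0, 1.0]}    # embedding centers per class
pred_protos = {0: [0.9, 0.1], 1: [0.1, 0.9]}   # prediction centers per class
```

When the two views disagree (returning `None` here), the sample simply contributes no consistency loss for that step, which is what filters out the unreliable pseudo-labels.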

15 pages, 755 KiB  
Article
An Improved Genetic Algorithm for the Granularity-Based Split Vehicle Routing Problem with Simultaneous Delivery and Pickup
by Yuxin Liu, Zihang Qin and Jin Liu
Mathematics 2023, 11(15), 3328; https://doi.org/10.3390/math11153328 - 28 Jul 2023
Cited by 2 | Viewed by 1003
Abstract
The Split Vehicle Routing Problem with Simultaneous Delivery and Pickup (SVRPSDP) consists of two subproblems, i.e., the Vehicle Routing Problem with Simultaneous Delivery and Pickup (VRPSDP) and the Split Delivery Vehicle Routing Problem (SDVRP). Compared to these subproblems, SVRPSDP is much closer to reality. However, some realistic factors are still ignored in SVRPSDP; for example, shipments are integral and cannot be infinitely subdivided. Hence, this paper investigates the Granularity-based Split Vehicle Routing Problem with Simultaneous Delivery and Pickup (GSVRPSDP), whose characteristics are that the demands of customers are split into individual shipments and that both the volume and the weight of each shipment are considered. To solve the GSVRPSDP efficiently, a genetic-simulated-annealing hybrid algorithm (GA-SA) is proposed, in which Simulated Annealing (SA) is inserted into the Genetic Algorithm (GA) framework to improve the global search ability of individuals. The experimental results indicate that GA-SA achieves total route costs more than 10% lower than traditional metaheuristics such as GA, SA, and Particle Swarm Optimization (PSO). Further analysis shows that the space utilization and capacity utilization of vehicles reach 86.1% and 88.9%, respectively, much higher than those achieved by GA (71.2% and 74.8%) and PSO (60.9% and 65.7%), further confirming the effectiveness of GA-SA. Moreover, the superiority of simultaneous delivery and pickup is demonstrated by comparison with separate delivery and pickup: the costs of separate delivery and pickup are more than 80% higher.
(This article belongs to the Special Issue Applications of Big Data Analysis and Modeling)
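The GA-SA hybridization pattern, applying the simulated-annealing acceptance rule to GA offspring so that occasional worse solutions survive early on, can be sketched on a toy one-dimensional minimization. The operators (averaging crossover, Gaussian mutation) and all hyperparameters are illustrative stand-ins, not the paper's routing-specific design:

```python
import math
import random

def ga_sa_minimize(cost, init_pop, generations=200, temp=1.0, cool=0.98, seed=1):
    # GA loop whose replacement step uses the SA acceptance criterion:
    # improvements are always kept, worse offspring survive with
    # probability exp(-delta / temp), and temp decays each generation.
    rng = random.Random(seed)
    pop = [list(ind) for ind in init_pop]
    for _ in range(generations):
        new_pop = []
        for parent in pop:
            mate = rng.choice(pop)
            child = [(a + b) / 2 for a, b in zip(parent, mate)]  # crossover
            k = rng.randrange(len(child))
            child[k] += rng.gauss(0, 0.3)                        # mutation
            delta = cost(child) - cost(parent)
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                new_pop.append(child)        # SA-style acceptance
            else:
                new_pop.append(parent)
        pop = new_pop
        temp = max(temp * cool, 1e-6)        # cooling schedule
    return min(pop, key=cost)

# Minimize (x - 3)^2 from a scattered initial population.
best = ga_sa_minimize(lambda x: (x[0] - 3.0) ** 2,
                      [[0.0], [10.0], [5.0], [-2.0]])
```

The same skeleton applies to routing once `cost` evaluates a route plan and crossover/mutation act on route encodings.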

12 pages, 1840 KiB  
Article
Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering
by Yuchen Sha, Yujian Feng, Miao He, Shangdong Liu and Yimu Ji
Mathematics 2023, 11(15), 3269; https://doi.org/10.3390/math11153269 - 25 Jul 2023
Cited by 1 | Viewed by 2044
Abstract
Existing knowledge graph (KG) models for commonsense question answering face two challenges: (i) existing methods retrieve entities related to questions from the knowledge graph, which may include noisy and irrelevant nodes, and (ii) there is a lack of interaction representation between questions and graph entities. In this paper, we propose a novel retrieval-augmented knowledge graph (RAKG) model that addresses these issues through two key innovations. First, we leverage the density matrix to make the model reason along the corrected knowledge path and extract an enhanced subgraph of the knowledge graph. Second, we fuse the representations of questions and graph entities through a bidirectional attention strategy, in which the two representations are fused and updated using a graph convolutional network (GCN). To evaluate the performance of our method, we conducted experiments on two widely used benchmark datasets: CommonsenseQA and OpenBookQA. The case study shows that the augmented subgraph provides reasoning along the corrected knowledge path for question answering.

15 pages, 1816 KiB  
Article
AdvSCOD: Bayesian-Based Out-Of-Distribution Detection via Curvature Sketching and Adversarial Sample Enrichment
by Jiacheng Qiao, Chengzhi Zhong, Peican Zhu and Keke Tang
Mathematics 2023, 11(3), 692; https://doi.org/10.3390/math11030692 - 29 Jan 2023
Cited by 2 | Viewed by 1260
Abstract
Detecting out-of-distribution (OOD) samples is critical for the deployment of deep neural networks (DNNs) in real-world scenarios. An appealing direction for OOD detection is to measure the epistemic uncertainty of DNNs using a Bayesian model, since it is much more explainable. SCOD sketches the curvature of DNN classifiers based on Bayesian posterior estimation and decomposes the OOD measurement into the uncertainty of the model parameters and the influence of input samples on the DNN model. However, since many approximations are applied and the influence of input samples on DNN models can hardly be measured stably, as demonstrated by adversarial attacks, the detection is not robust. In this paper, we propose a novel AdvSCOD framework that enriches an input sample with a small set of neighbors generated by adversarial perturbation, which we believe better reflects the influence on model predictions, and then averages their uncertainties as measured by SCOD. Extensive experiments with different settings of in-distribution and OOD datasets validate the effectiveness of AdvSCOD in OOD detection and its superiority over state-of-the-art Bayesian-based methods. We also evaluate the influence of different types of perturbation.
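The enrichment step described above can be sketched generically: score a sample by averaging an uncertainty measure over the sample itself and a few perturbed neighbors. Here `uncertainty` and `perturb` are caller-supplied toy stand-ins for SCOD's posterior-based score and the adversarial perturbation, neither of which this sketch implements:

```python
def enriched_uncertainty(x, uncertainty, perturb, n_neighbors=4, eps=0.1):
    # Average the uncertainty of x and of n_neighbors perturbed copies of x,
    # so a single unstable measurement cannot dominate the OOD score.
    neighbors = [x] + [perturb(x, eps, k) for k in range(n_neighbors)]
    return sum(uncertainty(z) for z in neighbors) / len(neighbors)

# Toy stand-ins: a quadratic "uncertainty" and a deterministic perturbation.
score = enriched_uncertainty(
    1.0,
    uncertainty=lambda z: z * z,
    perturb=lambda x, eps, k: x + eps * (k + 1),
)
```

Swapping in gradient-based adversarial perturbations and the SCOD score recovers the framework the abstract describes.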
