Advances and Challenges in Big Data Analytics and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 January 2024) | Viewed by 9398

Special Issue Editors


Guest Editor
Department of Computer Science and Engineering, University of Alaska Anchorage, Anchorage, AK 99508, USA
Interests: big data; cyber security; machine learning; cloud computing; system architectures

Guest Editor
Department of Intelligent Mechatronics Engineering, Sejong University, Seoul 05006, Republic of Korea
Interests: computer vision; human–computer interaction; biometrics; medical image processing and understanding; artificial intelligence; deep learning

Special Issue Information

Dear Colleagues,

Big data analytics is a blanket term for interdisciplinary approaches to data analysis that use cutting-edge mathematical and statistical methods to examine massive data sets. Although data science is a powerful tool for improving performance, its effectiveness depends on the quality of the data used to create the solutions. Big data analytics applies intelligence to data in order to transform it into useful insights for business and society. Organizations can gain significantly from big data analytics in a variety of ways, enhancing their ability to compete and innovate. However, this requires devising efficient data analytics and processing techniques that keep pace with the growing challenges, so that industry and society can make decisions with confidence. As a result, research in big data-related technologies is attracting increasing attention from practitioners, governments, academia, and industry.

This Special Issue solicits submissions from multidisciplinary applied sciences presenting the advances and challenges in the domain of big data analytics and applications.

Dr. Kamran Siddique
Prof. Dr. Ka Lok Man
Dr. Rizwan Ali Naqvi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • data analysis
  • novel algorithms for big data analysis
  • big data processing techniques
  • big data case studies and applications

Published Papers (6 papers)


Research

17 pages, 1925 KiB  
Article
Revolutionising Financial Portfolio Management: The Non-Stationary Transformer’s Fusion of Macroeconomic Indicators and Sentiment Analysis in a Deep Reinforcement Learning Framework
by Yuchen Liu, Daniil Mikriukov, Owen Christopher Tjahyadi, Gangmin Li, Terry R. Payne, Yong Yue, Kamran Siddique and Ka Lok Man
Appl. Sci. 2024, 14(1), 274; https://0-doi-org.brum.beds.ac.uk/10.3390/app14010274 - 28 Dec 2023
Viewed by 855
Abstract
In the evolving landscape of portfolio management (PM), the fusion of advanced machine learning techniques with traditional financial methodologies has opened new avenues for innovation. Our study introduces a cutting-edge model combining deep reinforcement learning (DRL) with a non-stationary transformer architecture. This model is designed to decode complex patterns in financial time-series data, enhancing portfolio management strategies with deeper insights and robustness. It effectively tackles the challenges of data heterogeneity and market uncertainty, key obstacles in PM. Our approach integrates key macroeconomic indicators and targeted news sentiment analysis into its framework, capturing a comprehensive picture of market dynamics. This amalgamation of varied data types addresses the multifaceted nature of financial markets, enhancing the model’s ability to navigate the complexities of asset management. Rigorous testing demonstrates the model’s efficacy, highlighting the benefits of blending diverse data sources and sophisticated algorithmic approaches in mastering the nuances of PM.
(This article belongs to the Special Issue Advances and Challenges in Big Data Analytics and Applications)

16 pages, 1623 KiB  
Article
Prediction of Gasoline Orders at Gas Stations in South Korea Using VAE-Based Machine Learning Model to Address Data Asymmetry
by Sungyeon Yoon and Minseo Park
Appl. Sci. 2023, 13(20), 11124; https://0-doi-org.brum.beds.ac.uk/10.3390/app132011124 - 10 Oct 2023
Viewed by 920
Abstract
South Korea has developed road-based transportation and uses a lot of gasoline. South Korea imports gasoline since it is not produced domestically. So, fluctuations in gasoline prices have a significant impact on the national economy. Currently, gasoline orders, which are based on gasoline consumption, are analyzed in relation to fluctuations in gasoline prices. However, gasoline orders can also change due to various non-price factors. Therefore, to understand the trend of gasoline orders, it is important to identify additional factors that gas stations consider when determining orders. We collected 180 monthly samples of data on 167 variables. Sudden international issues lead to rapid fluctuations in gasoline orders, which can lead to outliers. A class imbalance occurs because outliers are generally fewer in number than the normal data points. Therefore, to address the class imbalance, we proposed a method that grouped the data samples into 11 clusters using the K-means clustering algorithm and then augmented the data into 85 datasets in each cluster through the Variational Auto-Encoder. We evaluated the augmented datasets through the R-Squared, Root Mean Squared Errors, and accuracy of various regression models. Based on the experimental results, when predicting gasoline orders at gas stations in South Korea using augmented datasets, linear regression showed the best performance.
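For readers unfamiliar with the two regression metrics named in the abstract above, the following is a minimal sketch of how R-squared and root mean squared error are conventionally computed. It is not the authors' code; the function names and the sample values in `observed` and `predicted` are illustrative assumptions only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when every value in y_true is identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical values, chosen only to exercise the two functions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

print("RMSE:", rmse(observed, predicted))
print("R-squared:", r_squared(observed, predicted))
```

A lower RMSE and an R-squared closer to 1 both indicate a closer fit, which is how such metrics are typically used to compare regression models.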

21 pages, 647 KiB  
Article
Production Improvement Rate with Time Series Data on Standard Time at Manufacturing Sites
by Injong Ki, Hasup Song, Jihyeok Ryu and Jongpil Jeong
Appl. Sci. 2023, 13(19), 10937; https://0-doi-org.brum.beds.ac.uk/10.3390/app131910937 - 03 Oct 2023
Viewed by 1078
Abstract
Amid the changes brought about by the 4th Industrial Revolution, numerous studies have been undertaken to develop smart factories, with a strong emphasis on knowledge-based manufacturing through smart factory construction. Advances in manufacturing data collection, fusion, and mining technologies have significantly bolstered the utilization of knowledge-based manufacturing. Data mining technology is widely employed for facility maintenance and failure prediction. Smart factory operations are pursuing automation and autonomization. Automation of production planning is also essential to achieve automation and autonomy in factory operations, from planning to execution. With the advancement of data mining technology, it is possible to automate production planning for the production planning and prediction of future production through information based on current conditions based on the past. The baseline information generated based on the current situation is suitable for automating short-term operational planning. If we generate time series reference information based on data from the past to the present, we can also automate long-term operation planning. By measuring the results of productivity improvements in mass-produced products from the past to the present and extrapolating them to future products, time series baseline information on production time is generated. If the baseline information is used for long-term planning, it can be used to predict future production capacity and facility shortages. This study presents a methodology and utilization method for calculating the rate of change in production time, which can be applied to production plan prediction and equipment investment capacity forecasting in future factory operations, using historical time series production time data.

16 pages, 3705 KiB  
Article
Applied Techniques for Twitter Data Retrieval in an Urban Area: Insight for Trip Production Modeling
by Rempu Sora Rayat, Adenantera Dwicaksono, Heru P. H. Putro and Puspita Dirgahayani
Appl. Sci. 2023, 13(14), 8539; https://0-doi-org.brum.beds.ac.uk/10.3390/app13148539 - 24 Jul 2023
Cited by 1 | Viewed by 1083
Abstract
This paper presents methods of retrieving Twitter data, both streaming and archive data, using Application Programming Interfaces. Twitter data are a kind of Location Based Social Network Data that, nowadays, is emerging in transportation demand modeling. Data regarding the locations of trip makers represent the most crucial step in the modeling. No research article has specifically addressed this topic with an up-to-date method; hence, this paper aims to refresh methods for retrieving Twitter data that can capture relevant data. The method is unique as the data are gathered for trip production modeling in zonal urban areas. Python script programs were built for both data retrieving methods. The programs were run for streaming data from May 2020 to April 2021 and archive data from 2018. The data were collected within Serang City, which is the nearest provincial city to Jakarta, the capital of Indonesia. In order to gather streaming data with no loss, the program has been run with referencing on sub-district office coordinate locations. Retrieving the intended data produces 1,090,623 documents, of which 54,103 are geotagged data from 2495 users. The study concluded that streaming data produce more geolocation data, while historical data capture more Twitter user data with relatively very little geotagged data and greater textual data than the period covered in this research. Thus, both techniques of retrieving Twitter data for urban personal trip modeling are necessary. Obtaining sufficient data collection using data streaming retrieval resulted in the most effective data preprocessing. This research contributes to Location Based Social Network data mining knowledge, both geolocation and text mining, and is useful for insight into developing trip production modeling in passenger transportation demand modeling using Machine Learning. This study also aims to provide useful methods for transportation system researchers and data scientists in utilizing Location Based Social Network data.

30 pages, 9410 KiB  
Article
FSopt_k: Finding the Optimal Anonymization Level for a Social Network Graph
by Maryam Kiabod, Mohammad Naderi Dehkordi, Behrang Barekatain and Kaamran Raahemifar
Appl. Sci. 2023, 13(6), 3770; https://0-doi-org.brum.beds.ac.uk/10.3390/app13063770 - 15 Mar 2023
Viewed by 1050
Abstract
k-degree anonymity is known as one of the best models for anonymizing social network graphs. Although recent works have tried to address the privacy challenges of social network graphs, privacy levels are considered to be independent of the features of the graph degree sequence. In other words, the optimal value of k is not considered for the graph, leading to increasing information loss. Additionally, the graph may not need a high privacy level. In addition, determining the optimal value of k for the graph in advance is a big problem for the data owner. Therefore, in this paper, we present a technique named FSopt_k that is able to find the optimal value of k for each social network graph. This algorithm uses an efficient technique to partition the graph nodes to choose the best k value. It considers the graph structure features to determine the best privacy level. In this way, there will be a balance between privacy and loss in the anonymized graph. Furthermore, information loss will be as low as possible. The evaluation results depict that this algorithm can find the optimal value of k in a short time as well as preserve the graph’s utility.
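As background for the abstract above, a graph is usually called k-degree anonymous when every degree value in its degree sequence is shared by at least k nodes. The sketch below only checks that property for an undirected edge list; it is not the FSopt_k algorithm, and the function names and toy graphs are illustrative assumptions.

```python
from collections import Counter


def node_degrees(edges):
    """Map each node to its degree, given undirected edges as (u, v) pairs.

    Nodes that appear in no edge are not represented in an edge list,
    so isolated nodes are ignored by this sketch.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def max_anonymity_level(edges):
    """Largest k for which the graph is k-degree anonymous.

    This equals the size of the smallest group of nodes sharing a degree.
    """
    degrees = node_degrees(edges)
    if not degrees:
        return 0
    return min(Counter(degrees.values()).values())


def is_k_degree_anonymous(edges, k):
    """True if every degree value is shared by at least k nodes."""
    return max_anonymity_level(edges) >= k


# Toy graphs (hypothetical): a 4-cycle, where all four nodes have degree 2,
# and a 3-node path, where the middle node is the only one with degree 2.
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
path = [("a", "b"), ("b", "c")]

print(max_anonymity_level(cycle), is_k_degree_anonymous(cycle, 3))
print(max_anonymity_level(path), is_k_degree_anonymous(path, 2))
```

Anonymization methods in this family typically modify the graph until a check like this passes for the chosen k, which is why a larger k tends to cost more information loss.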

20 pages, 1945 KiB  
Article
A Study on ML-Based Sleep Score Model Using Lifelog Data
by Jiyong Kim and Minseo Park
Appl. Sci. 2023, 13(2), 1043; https://0-doi-org.brum.beds.ac.uk/10.3390/app13021043 - 12 Jan 2023
Viewed by 3062
Abstract
The rate of people suffering from sleep disorders has been continuously increasing in recent years, such that interest in healthy sleep is also naturally increasing. Although there are many health-care industries and services related to sleep, specific and objective evaluation of sleep habits is still lacking. Most of the sleep scores presented in wearable-based sleep health services are calculated based only on the sleep stage ratio, which is not sufficient for studies considering the sleep dimension. In addition, most score generation techniques use weighted expert evaluation models, which are often selected based on experience instead of objective weights. Therefore, this study proposes an objective daily sleep habit score calculation method that considers various sleep factors based on user sleep data and gait data collected from wearable devices. A credit rating model built as a logistic regression model is adapted to generate sleep habit scores for good and bad sleep. Ensemble machine learning is designed to generate sleep habit scores for the intermediate sleep remainder. The sleep habit score and evaluation model of this study are expected to be in demand not only in health-care and health-service applications but also in the financial and insurance sectors.
