Special Issue "Innovative Applications of Big Data and Cloud Computing"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 February 2022.

Special Issue Editors

Prof. Dr. Chao-Tung Yang
Guest Editor
Department of Computer Science, Tunghai University, Taichung 407705, Taiwan
Interests: cloud computing; big data; machine learning; parallel processing
Dr. Chen-Kun Tsung
Guest Editor
Department of Computer Science and Information Engineering, National Chin-Yi University of Technology, Taichung 41170, Taiwan
Interests: cloud computing; big data; web-based applications; combinatorial optimization
Dr. Neil Yuwen Yen
Guest Editor
Division of Computer Science, The University of Aizu, Aizu-Wakamatsu City, Fukushima-ken 965-8580, Japan
Interests: human modeling; data mining; machine learning; social web; cognitive science; awareness engineering
Dr. Vinod Kumar Verma
Guest Editor
Department of Computer Science & Engineering, Sant Longowal Institute of Engineering & Technology, Punjab 148106, India
Interests: wireless sensor networks; trust and reputation systems; cloud computing; brain computing; internet of things; big data

Special Issue Information

Dear Colleagues,

Humans constantly generate huge amounts of data in many situations, e.g., daily life, manufacturing, and research. In recent years, capturing and processing these data has become easier, with applications designed to assist us in making decisions. For example, the air quality index (AQI) represents the degree of pollution, and scientists collect AQI values to offer up-to-date suggestions for outdoor activities; analysts analyze traffic data to discover transportation demand, and drivers plan their travel around road usage; production managers use manufacturing data to ensure that product quality remains within acceptable tolerance ranges.

Such innovative services require handling massive amounts of data to derive suggestions. Cloud computing has been of great assistance in allowing data to be managed easily and efficiently. On-demand delivery of services is the major advantage of cloud computing: services can be invoked easily, without hardware or software limitations or geographic constraints. Thus, information delivery and data analysis can be separated, and analysts and researchers can focus on the purpose of the system. Not only system designers but also users prefer to access systems via cloud services.

To explore innovative services and practical systems, the Special Issue “Innovative Applications of Big Data and Cloud Computing” focuses on applications of core service design, platform implementation, data visualization, and prediction using big data and cloud computing. We invite researchers to contribute their state-of-the-art experimental or computational results. Topics of particular interest include the following:

  • Cloud System Design and Implementation.
  • Core Service Design and Implementation in Cloud or Web Ecosystems.
  • Front-End Service Design and Implementation in Cloud or Web Ecosystems.
  • Big Data Analysis and Implementation in Cloud or Web Ecosystems.
  • Big Data Visualization in Cloud or Web Ecosystems.

Please feel free to contact us with any questions.

Prof. Chao-Tung Yang
Dr. Chen-Kun Tsung
Dr. Neil Yen
Dr. Vinod Kumar Verma
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • cloud computing
  • innovation service design
  • practical platform implementation

Published Papers (6 papers)


Research

Article
An Event-Driven Serverless ETL Pipeline on AWS
Appl. Sci. 2021, 11(1), 191; https://0-doi-org.brum.beds.ac.uk/10.3390/app11010191 - 28 Dec 2020
Cited by 1 | Viewed by 876
Abstract
This work presents an event-driven Extract, Transform, and Load (ETL) pipeline serverless architecture and evaluates its performance over a range of dataflow tasks of varying frequency, velocity, and payload size. We design an experiment using generated tabular data across varying data volumes, event frequencies, and processing power in order to measure: (i) the consistency of pipeline executions; (ii) the reliability of data delivery; (iii) the maximum payload size per pipeline; and (iv) economic scalability (cost of chargeable tasks). We run 92 parameterised experiments on a simple AWS architecture, avoiding any AWS-enhanced platform features, to allow an unbiased assessment of our model’s performance. Our results indicate that our reference architecture can achieve time-consistent data processing of event payloads of more than 100 MB, with a throughput of 750 KB/s across four event frequencies. We also observe that, although using an SQS queue for data transfer enables easy concurrency control and data slicing, it becomes a bottleneck for large event payloads. Finally, we develop and discuss a candidate pricing model for usage of our reference architecture. Full article
(This article belongs to the Special Issue Innovative Applications of Big Data and Cloud Computing)
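The queue-triggered pattern this abstract describes can be illustrated with a minimal, hypothetical Python sketch. The `transform` logic, payload fields, and the injectable `load` callback are illustrative assumptions, not code from the paper; the event shape follows the standard AWS SQS-to-Lambda message format.

```python
import json

def transform(record: dict) -> dict:
    """Toy transform step: normalise keys and derive one field (illustrative only)."""
    out = {k.lower(): v for k, v in record.items()}
    out["total"] = out.get("price", 0) * out.get("quantity", 0)
    return out

def handler(event, context=None, load=lambda rows: None):
    """Lambda-style entry point for an SQS-triggered ETL step.

    `event` follows the AWS SQS event shape: {"Records": [{"body": "<json>"}, ...]}.
    `load` stands in for the load stage (e.g. an S3 or warehouse writer).
    """
    rows = []
    for msg in event.get("Records", []):
        payload = json.loads(msg["body"])   # extract
        rows.append(transform(payload))     # transform
    load(rows)                              # load
    return {"processed": len(rows)}
```

Keeping the load stage injectable lets the extract/transform logic be tested offline, without any cloud resources.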

Article
Minimizing Resource Waste in Heterogeneous Resource Allocation for Data Stream Processing on Clouds
Appl. Sci. 2021, 11(1), 149; https://0-doi-org.brum.beds.ac.uk/10.3390/app11010149 - 25 Dec 2020
Viewed by 543
Abstract
Resource allocation is vital for improving system performance in big data processing. The resource demand for various applications can be heterogeneous in cloud computing. Therefore, a resource gap occurs while some resource capacities are exhausted and other resource capacities on the same server are still available. This phenomenon is more apparent when the computing resources are more heterogeneous. Previous resource-allocation algorithms paid limited attention to this situation. When such an algorithm is applied to a server with heterogeneous resources, resource allocation may result in considerable resource wastage for the available but unused resources. To reduce resource wastage, a resource-allocation algorithm, called the minimizing resource gap (MRG) algorithm, for heterogeneous resources is proposed in this study. In MRG, the gap between resource usages for each server in cloud computing and the resource demands among various applications are considered. When an application is launched, MRG calculates resource usage and allocates resources to the server with the minimized usage gap to reduce the amount of available but unused resources. To demonstrate MRG performance, the MRG algorithm was implemented in Apache Spark. CPU- and memory-intensive applications were applied as benchmarks with different resource demands. Experimental results proved the superiority of the proposed MRG approach for improving the system utilization to reduce the overall completion time by up to 24.7% for heterogeneous servers in cloud computing. Full article
(This article belongs to the Special Issue Innovative Applications of Big Data and Cloud Computing)
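The core idea of gap-minimizing placement can be sketched as follows. This is a simplified reading of the abstract (the spread between per-resource utilisations as the "gap"), not the paper's exact MRG algorithm; server and resource names are illustrative.

```python
def usage_gap(capacity, used, demand):
    """Spread between per-resource utilisations after placing `demand` here.

    A small spread means resources are consumed evenly, leaving fewer
    stranded ("available but unused") resources on the server.
    Returns None when the demand does not fit.
    """
    utils = []
    for r in capacity:
        if used[r] + demand.get(r, 0) > capacity[r]:
            return None
        utils.append((used[r] + demand.get(r, 0)) / capacity[r])
    return max(utils) - min(utils)

def place(servers, demand):
    """Pick the feasible server with the smallest post-placement usage gap."""
    best, best_gap = None, None
    for name, (capacity, used) in servers.items():
        gap = usage_gap(capacity, used, demand)
        if gap is not None and (best_gap is None or gap < best_gap):
            best, best_gap = name, gap
    return best
```

For example, a memory-heavy job would be steered toward a server whose CPU is already heavily used, so both resources drain at similar rates.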

Article
Semi-Automatic Cloud-Native Video Annotation for Autonomous Driving
Appl. Sci. 2020, 10(12), 4301; https://0-doi-org.brum.beds.ac.uk/10.3390/app10124301 - 23 Jun 2020
Viewed by 617
Abstract
An innovative solution named Annotation as a Service (AaaS) has been specifically designed to integrate heterogeneous video annotation workflows into containers and to take advantage of a cloud-native, highly scalable, and reliable design based on Kubernetes workloads. Using the AaaS as a foundation, the execution of automatic video annotation workflows is addressed in the broader context of a semi-automatic video annotation business logic for ground-truth generation for Autonomous Driving (AD) and Advanced Driver Assistance Systems (ADAS). The document presents design decisions, innovative developments, and tests conducted to provide scalability to this cloud-native ecosystem for semi-automatic annotation. The solution has proven to be efficient and resilient at AD/ADAS scale, specifically in an experiment with 25 TB of input data to annotate, 4000 concurrent annotation jobs, and 32 worker nodes forming a high-performance computing cluster with a total of 512 cores and 2048 GB of RAM. Automatic pre-annotations with the proposed strategy reduce the time of human participation in annotation by up to 80% and by 60% on average. Full article
(This article belongs to the Special Issue Innovative Applications of Big Data and Cloud Computing)

Article
FirepanIF: High Performance Host-Side Flash Cache Warm-Up Method in Cloud Computing
Appl. Sci. 2020, 10(3), 1014; https://0-doi-org.brum.beds.ac.uk/10.3390/app10031014 - 04 Feb 2020
Viewed by 681
Abstract
In cloud computing, a shared storage server, which provides a network-attached storage device, is usually used for centralized data management. However, when multiple virtual machines (VMs) concurrently access the storage server through the network, the performance of each VM may decrease due to limited bandwidth. To address this issue, a flash-based storage device such as a solid state drive (SSD) is often employed as a cache in the host server. This host-side flash cache saves remote data, which are frequently accessed by the VM, locally in the cache. However, frequent VM migration in the data center can weaken the effectiveness of a host-side flash cache as the migrated VM needs to warm up its flash cache again on the destination machine. This study proposes Cachemior, Firepan, and FirepanIF for rapid flash-cache migration in cloud computing. Cachemior warms up the flash cache with a data preloading approach using the shared storage server after VM migration. However, it does not achieve a satisfactory level of performance. Firepan and FirepanIF use the source node’s flash cache as the data source for flash cache warm-up. They can migrate the flash-cache more quickly than conventional methods as they can avoid storage and network congestion on the shared storage server. Firepan incurs downtime of the VM during flash cache migration for data consistency. FirepanIF minimizes the VM downtime with the invalidation filter, which traces the I/O activity of the migrated VM during flash cache migration in order to invalidate inconsistent cache blocks. We implement and evaluate the three flash cache migration techniques in a realistic virtualized environment. FirepanIF demonstrates that it can improve the performance of the I/O workload by up to 21.87% compared to conventional methods. Full article
(This article belongs to the Special Issue Innovative Applications of Big Data and Cloud Computing)
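The invalidation-filter idea behind FirepanIF can be sketched in a few lines. The class and method names below are hypothetical, chosen only to illustrate the consistency mechanism the abstract describes: record blocks the migrated VM writes while the cache is streaming over, and drop any arriving copy of a block that has since been overwritten.

```python
class InvalidationFilter:
    """Toy model of the FirepanIF consistency idea (names are illustrative).

    While cache blocks stream from the source node, the running VM keeps
    issuing writes. A block copied before such a write is stale, so the
    filter records written block IDs and discards matching arrivals.
    """
    def __init__(self):
        self.dirty = set()

    def record_write(self, block_id):
        """Called for every VM write observed during cache migration."""
        self.dirty.add(block_id)

    def admit(self, block_id, data, cache):
        """Install an arriving block unless a newer write made it stale."""
        if block_id in self.dirty:
            return False          # stale copy: invalidate instead of caching
        cache[block_id] = data
        return True
```

This lets the VM keep running during migration: instead of pausing for consistency, stale blocks are simply re-fetched on their next miss.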

Article
Resource Utilization Scheme of Idle Virtual Machines for Multiple Large-Scale Jobs Based on OpenStack
Appl. Sci. 2019, 9(20), 4327; https://0-doi-org.brum.beds.ac.uk/10.3390/app9204327 - 15 Oct 2019
Viewed by 1022
Abstract
Cloud computing services that provide computing resources to users through the Internet deliver those resources in virtual machine form based on virtualization techniques. In general, supercomputing and grid computing have mainly been used to process large-scale jobs occurring in scientific, technical, and engineering application domains. However, services that process large-scale jobs in parallel using idle virtual machines are not currently provided in cloud computing. Because existing cloud computing assigns all use rights of a virtual machine to its user, machines that are no longer used, or that go unused for long periods, lead to low utilization of computing resources. This study proposes a scheme to process large-scale jobs in parallel using idle virtual machines, thereby increasing their resource utilization. Idle virtual machines are identified through specific determination criteria from among virtual machines created using OpenStack, and then used in computing services. This scheme is called idle virtual machine–resource utilization (IVM–ReU). Full article
(This article belongs to the Special Issue Innovative Applications of Big Data and Cloud Computing)
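A hypothetical sketch of the idle-determination step: in IVM–ReU terms, a VM whose recent utilisation stays below a threshold for a sustained window would be treated as harvestable for parallel jobs. The metric, threshold, and window below are illustrative assumptions, not the paper's criteria.

```python
def is_idle(samples, cpu_threshold=5.0, min_samples=30):
    """Treat a VM as idle when every recent CPU sample (percent, e.g. one per
    minute) stays below the threshold across a sufficiently long window."""
    return len(samples) >= min_samples and all(s < cpu_threshold for s in samples)

def harvest_idle(vms):
    """Return the names of VMs whose recent utilisation marks them as reusable.

    `vms` maps a VM name to its list of recent CPU samples, e.g. as collected
    from the cloud platform's telemetry service.
    """
    return [name for name, samples in vms.items() if is_idle(samples)]
```

In a real deployment the samples would come from the platform's monitoring service, and harvested VMs would be returned to their owners as soon as they become active again.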

Article
Diagnosis and Prediction of Large-for-Gestational-Age Fetus Using the Stacked Generalization Method
Appl. Sci. 2019, 9(20), 4317; https://0-doi-org.brum.beds.ac.uk/10.3390/app9204317 - 14 Oct 2019
Cited by 9 | Viewed by 1057
Abstract
An accurate and efficient Large-for-Gestational-Age (LGA) classification system is developed to classify a fetus as LGA or non-LGA, which has the potential to assist paediatricians and experts in establishing a state-of-the-art LGA prognosis process. The performance of the proposed scheme is validated using an LGA dataset collected from the National Pre-Pregnancy and Examination Program of China (2010–2013). A master feature vector is created for primary data pre-processing, which includes a feature-discretization process and the treatment of missing values and data-imbalance issues. A principal feature vector is formed using a GridSearch-based Recursive Feature Elimination with Cross-Validation (RFECV) + Information Gain (IG) feature selection scheme followed by stacking to select, rank, and extract significant features from the LGA dataset. Based on the proposed scheme, different feature subsets are identified and provided to four different machine learning (ML) classifiers. The proposed GridSearch-based RFECV+IG feature selection scheme with stacking using SVM (linear kernel) best suits the said classification process, followed by the SVM (RBF kernel) and LR classifiers. The Decision Tree (DT) classifier is not recommended because of its low performance. The highest prediction precision, recall, accuracy, Area Under the Curve (AUC), specificity, and F1 scores of 0.92, 0.87, 0.92, 0.95, 0.95, and 0.89 are achieved with the SVM (linear kernel) classifier using the top ten principal features, which is higher than the baseline methods. Moreover, almost every classification scheme performed best with the ten-principal-feature subset. Therefore, the proposed scheme has the potential to establish an efficient LGA prognosis process using gestational parameters, which can assist paediatricians and experts in improving newborn health using a computer-aided diagnostic system. Full article
(This article belongs to the Special Issue Innovative Applications of Big Data and Cloud Computing)
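The stacking step in this abstract can be illustrated with a stripped-down sketch of stacked generalization: out-of-fold predictions from base learners become training features for a meta-learner. Everything here is a deliberate toy: the contiguous fold splitter, the single-feature threshold base learner, and the majority-vote meta-learner are illustrative stand-ins for the paper's RFECV+IG pipeline and SVM/LR/DT classifiers.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (toy splitter; no shuffling)."""
    size = (n + k - 1) // k
    return [list(range(i, min(i + size, n))) for i in range(0, n, size)]

def stacked_fit_predict(base_learners, meta_learner, X, y, X_new, k=2):
    """Stacked generalization in miniature: each base learner's out-of-fold
    predictions become the meta-learner's training features, then the base
    learners are refit on all data to score new samples.

    A base learner is a function fit(X, y) -> predict; predict(row) -> label.
    """
    n = len(X)
    meta_X = [[None] * len(base_learners) for _ in range(n)]
    for fold in kfold_indices(n, k):
        train = [i for i in range(n) if i not in fold]
        for j, fit in enumerate(base_learners):
            predict = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                meta_X[i][j] = predict(X[i])   # out-of-fold meta-feature
    meta_predict = meta_learner(meta_X, y)
    refit = [fit(X, y) for fit in base_learners]
    return [meta_predict([p(row) for p in refit]) for row in X_new]

def threshold_learner(feature):
    """Toy base learner: split at the midpoint between the class means of one
    feature (assumes both classes appear in every training split)."""
    def fit(X, y):
        ones = [row[feature] for row, t in zip(X, y) if t == 1]
        zeros = [row[feature] for row, t in zip(X, y) if t == 0]
        cut = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
        return lambda row: 1 if row[feature] > cut else 0
    return fit

def majority_meta(meta_X, y):
    """Toy meta-learner: majority vote over base predictions (ignores training)."""
    return lambda preds: 1 if 2 * sum(preds) > len(preds) else 0
```

Using out-of-fold rather than in-sample base predictions is the key design point: it keeps the meta-learner from simply memorising base learners' training-set fit.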
