
Big Data Cogn. Comput., Volume 4, Issue 4 (December 2020) – 17 articles

Cover Story: Recent advancements in AI and human–machine interaction have paved the way for team collaboration of humans and machines, e.g., in industrial disassembly processes. Nonetheless, systematic and interdisciplinary approaches toward engineering systems that comprise human–machine teams are still rare. In this work, we review and analyze the state of the art and derive and discuss core requirements and concepts through an illustrating scenario. We then focus on how reciprocal trust between humans and intelligent machines is defined, built, measured, and maintained from a systems engineering and planning perspective. Finally, we outline three important areas of future research on engineering and operating human–machine teams for trusted collaboration and describe exemplary research opportunities for each.
Open Access Article
Big Data and Actuarial Science
Big Data Cogn. Comput. 2020, 4(4), 40; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040040 - 19 Dec 2020
Viewed by 701
Abstract
This article investigates the impact of big data on the actuarial sector. The growing range of applications of data analytics and data mining enables insurance companies to price policies more accurately by incorporating a broader variety of data, owing to increased data availability. The areas analyzed in this paper span automobile insurance policy pricing, mortality and healthcare modeling, estimation of harvest, climate, and cyber risk, and assessment of catastrophe risks such as storms, hurricanes, tornadoes, geomagnetic events, earthquakes, floods, and fires. We evaluate the current use of big data in these contexts and how data analytics and data mining contribute to insurers' prediction capabilities and the accuracy of policy premium pricing. We find that big data has penetrated policy pricing in almost all actuarial fields, except the modeling and pricing of cyber security risk, owing to the lack of data in this area and prevailing data asymmetries; here we identify artificial intelligence, in particular machine learning techniques, as a possible way to improve pricing accuracy and results.

Open Access Article
Annotation-Assisted Clustering of Player Profiles in Cultural Games: A Case for Tensor Analytics in Julia
Big Data Cogn. Comput. 2020, 4(4), 39; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040039 - 12 Dec 2020
Viewed by 647
Abstract
Computer games play an increasingly important role in cultural heritage preservation. They keep tradition alive in the digital domain, reflect public perception of historical events, and make history, and even legends, vivid through means such as advanced storytelling and alternative timelines. In this context, understanding the underlying player base is a major success factor, as different game elements elicit different emotional responses across players. To this end, player profiles are often built from a combination of low- and high-level attributes. The former pertain to ordinary activity, such as collecting points or badges, whereas the latter capture the outcomes of strategic decisions, such as participation in in-game events like tournaments and auctions. When available, annotations about in-game items or player activity supplement these profiles. In this article, we describe how such annotations can be integrated into different player profile clustering schemes derived from a template Simon–Ando iterative process. As a concrete example, the proposed methodology was applied to a custom benchmark dataset comprising the player base of a cultural game. The findings are interpreted in the light of the Bartle taxonomy, one of the most prominent player categorizations. Moreover, clustering quality is assessed via intra-cluster distance and cluster compactness. Based on these results, recommendations in an affective context for maximizing engagement are proposed for this particular game's player base composition.
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)

Open Access Article
A Traffic Analysis on Serverless Computing Based on the Example of a File Upload Stream on AWS Lambda
Big Data Cogn. Comput. 2020, 4(4), 38; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040038 - 10 Dec 2020
Viewed by 774
Abstract
The shift towards microservices observed in recent developments of the cloud landscape for applications has led to the emergence of the Function as a Service (FaaS) concept, also called Serverless. This term describes the event-driven, reactive programming paradigm of functional components in container instances, which are scaled, deployed, executed, and billed by the cloud provider on demand. However, increasing reports of issues with Serverless services have revealed significant obscurity regarding their reliability; in particular, developers and especially system administrators struggle with latency compliance. In this paper, following a systematic literature review, the performance indicators influencing traffic and the effective delivery of the provider's underlying infrastructure are determined through empirical measurements based on the example of a file upload stream on Amazon's Web Service Cloud. This popular example was used as an experimental baseline in this study, driven by different incoming request rates, and different parameters were monitored and evaluated through the function's logs. It was found that the so-called cold start, i.e., the time to provision a new instance, can increase the round-trip time by 15% on average. A cold start occurs after an instance has not been called for around 15 min, or after around 2 h have passed, which marks the end of the instance's lifetime. The research shows how these numbers have changed in comparison to earlier related work, as Serverless is a fast-growing field of development. Furthermore, emphasis is placed on future research to improve the technology, algorithms, and support for developers.
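The cold-start overhead the paper measures can be estimated directly from a function's logs: AWS Lambda emits a REPORT line per invocation, and cold starts are the invocations carrying an Init Duration field. A minimal sketch of that analysis (the log lines below are illustrative samples in the standard REPORT format):

```python
import re

# First lookbehinds exclude the "Billed Duration" and "Init Duration" fields.
DUR = re.compile(r"(?<!Billed )(?<!Init )Duration: ([\d.]+) ms")
INIT = re.compile(r"Init Duration: ([\d.]+) ms")

def cold_start_overhead(log_lines):
    """Average percentage increase in total invocation time caused by
    cold starts, computed from AWS Lambda REPORT log lines."""
    cold, warm = [], []
    for line in log_lines:
        m = DUR.search(line)
        if not m:
            continue
        total = float(m.group(1))
        init = INIT.search(line)
        if init:  # cold start: initialization time adds to the round trip
            cold.append(total + float(init.group(1)))
        else:
            warm.append(total)
    if not cold or not warm:
        return None
    avg_cold = sum(cold) / len(cold)
    avg_warm = sum(warm) / len(warm)
    return (avg_cold - avg_warm) / avg_warm * 100.0

logs = [
    "REPORT RequestId: a1 Duration: 100.00 ms Billed Duration: 100 ms",
    "REPORT RequestId: b2 Duration: 100.00 ms Billed Duration: 100 ms "
    "Init Duration: 50.00 ms",
]
overhead = cold_start_overhead(logs)  # 50.0 for this sample
```

On a real deployment, the same function run over CloudWatch log exports yields the kind of averaged cold-start percentage the paper reports.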

Open Access Article
eGAP: An Evolutionary Game Theoretic Approach to Random Forest Pruning
Big Data Cogn. Comput. 2020, 4(4), 37; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040037 - 28 Nov 2020
Viewed by 663
Abstract
To make healthcare available and easily accessible, the Internet of Things (IoT), which paved the way for the construction of smart cities, marked the birth of many smart applications in numerous areas, including healthcare. As a result, smart healthcare applications have been and are being developed to provide, using mobile and electronic technology, higher-quality disease diagnosis, better treatment of patients, and improved quality of life. Since smart healthcare applications that predict healthcare data (such as diseases) rely on predictive healthcare data analytics, it is imperative for such analytics to be as accurate as possible. In this paper, we exploit supervised machine learning methods in classification and regression to improve the performance of the traditional Random Forest on healthcare datasets, both in terms of accuracy and classification/regression speed, in order to produce an effective and efficient smart healthcare application, which we have termed eGAP. eGAP uses replicator dynamics, an evolutionary game theoretic approach, to evolve a Random Forest ensemble. Trees of high resemblance in an initial Random Forest are clustered, and clusters then grow and shrink by adding and removing trees using replicator dynamics, according to the predictive accuracy of each subforest represented by a cluster of trees. All clusters start with a number of trees equal to that of the smallest cluster, and cluster growth is performed using trees that were not initially sampled. The speed and accuracy of the proposed method are demonstrated by an experimental study on 10 classification and 10 regression medical datasets.
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
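The replicator-dynamics update at the heart of eGAP has a compact closed form: a cluster's share of the ensemble grows when its fitness (here, predictive accuracy) beats the population average and shrinks otherwise. A minimal sketch, with made-up accuracies standing in for the measured subforest accuracies:

```python
def replicator_step(shares, fitness):
    """One discrete replicator-dynamics update: each cluster's share is
    rescaled by its fitness relative to the population average."""
    avg = sum(s * f for s, f in zip(shares, fitness))
    return [s * f / avg for s, f in zip(shares, fitness)]

def evolve(shares, fitness, steps=50):
    for _ in range(steps):
        shares = replicator_step(shares, fitness)
    return shares

# Three subforests start with equal shares but differ in accuracy;
# the fittest cluster comes to dominate the ensemble.
shares = evolve([1 / 3, 1 / 3, 1 / 3], [0.90, 0.80, 0.70])
```

In eGAP the shares would drive how many trees each cluster gains or loses per iteration rather than evolve to fixation.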

Open Access Article
Ticket Sales Prediction and Dynamic Pricing Strategies in Public Transport
Big Data Cogn. Comput. 2020, 4(4), 36; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040036 - 27 Nov 2020
Viewed by 680
Abstract
In recent years, the demand for collective mobility services has registered significant growth. In particular, the long-distance coach market underwent an important change in Europe when FlixBus adopted a dynamic pricing strategy, providing low-cost transport services along with an efficient and fast information system. This paper presents a methodology, called DA4PT (Data Analytics for Public Transport), for discovering the factors that influence travelers in booking and purchasing bus tickets. Starting from a set of 3.23 million user-generated event logs of a bus ticketing platform, the methodology derives correlation rules between booking factors and ticket purchases. Such rules are then used to train machine learning models for predicting whether or not a user will buy a ticket, and to define dynamic pricing strategies aimed at increasing ticket sales on the platform and the related revenue. The methodology reaches an accuracy of 95% in forecasting the purchase of a ticket, with low variance in results. Exploiting a dynamic pricing strategy, DA4PT is able to increase the number of purchased tickets by 6% and the total revenue by 9%, showing the effectiveness of the proposed approach.
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
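As an illustration of the kind of rule-driven dynamic pricing DA4PT enables, a base fare can be nudged up or down by the model's predicted purchase probability and by how close departure is. The coefficients, caps, and factor names below are invented for this sketch, not taken from the paper:

```python
def dynamic_price(base_price, purchase_prob, days_to_departure,
                  min_factor=0.7, max_factor=1.3):
    """Illustrative dynamic-pricing rule: discount when predicted purchase
    probability is low or departure is far away, raise the price as demand
    firms up and departure approaches. All coefficients are hypothetical."""
    demand_adj = (purchase_prob - 0.5) * 0.4                      # +/-20%
    urgency_adj = max(0.0, 14 - days_to_departure) / 14 * 0.2     # up to +20%
    factor = 1.0 + demand_adj + urgency_adj
    factor = max(min_factor, min(max_factor, factor))             # clamp
    return round(base_price * factor, 2)

# A likely buyer close to departure pays more than the base fare...
peak = dynamic_price(20.0, purchase_prob=0.8, days_to_departure=3)
# ...while an unlikely buyer far from departure gets a discount.
slack = dynamic_price(20.0, purchase_prob=0.2, days_to_departure=30)
```

In the paper, the purchase probability would come from the trained classifier rather than being supplied by hand.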

Open Access Article
Engineering Human–Machine Teams for Trusted Collaboration
Big Data Cogn. Comput. 2020, 4(4), 35; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040035 - 23 Nov 2020
Viewed by 651
Abstract
The way humans and artificially intelligent machines interact is undergoing a dramatic change. This change becomes particularly apparent in domains where humans and machines collaboratively work on joint tasks or objects in teams, such as in industrial assembly or disassembly processes. While there is intensive research work on human–machine collaboration in different research disciplines, systematic and interdisciplinary approaches towards engineering systems that consist of or comprise human–machine teams are still rare. In this paper, we review and analyze the state of the art, and derive and discuss core requirements and concepts by means of an illustrating scenario. In terms of methods, we focus on how reciprocal trust between humans and intelligent machines is defined, built, measured, and maintained from a systems engineering and planning perspective in the literature. Based on our analysis, we propose and outline three important areas of future research on engineering and operating human–machine teams for trusted collaboration, and describe exemplary research opportunities for each area.

Open Access Article
An Adaptable Big Data Value Chain Framework for End-to-End Big Data Monetization
Big Data Cogn. Comput. 2020, 4(4), 34; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040034 - 23 Nov 2020
Viewed by 767
Abstract
Today, almost all active organizations manage a large amount of data from their business operations with partners, customers, and even competitors. They rely on Data Value Chain (DVC) models to handle data processes and extract hidden value to obtain reliable insights. With the advent of Big Data, operations have become increasingly data-driven, facing new challenges related to volume, variety, and velocity, and giving birth to another type of value chain called the Big Data Value Chain (BDVC). Organizations have become increasingly interested in this kind of value chain to extract confined knowledge and monetize their data assets efficiently. However, few contributions to this field have addressed the BDVC in a synoptic way that considers Big Data monetization. This paper aims to provide an exhaustive and expanded BDVC framework. This end-to-end framework makes it possible to handle Big Data monetization so as to make organizations' processes entirely data-driven, support decision-making, and facilitate value co-creation. To this end, we present a comprehensive review of existing BDVC models, relying on definitions and theoretical foundations of data monetization. Next, we survey research carried out on data monetization strategies and business models. Then, we offer a global and generic BDVC framework that supports most of the phases required to achieve data valorization. Furthermore, we present both a reduced and a full monetization model to support many co-creation contexts along the BDVC.

Open Access Article
A Complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets during the Era of COVID-19
Big Data Cogn. Comput. 2020, 4(4), 33; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040033 - 09 Nov 2020
Viewed by 836
Abstract
During the COVID-19 pandemic, many research studies have been conducted to examine the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media, such as Twitter, plays a significant role as a meaningful indicator in forecasting Bitcoin (BTC) prices. However, there is a research gap in determining the optimal preprocessing strategy for BTC tweets when developing an accurate machine learning prediction model for Bitcoin prices. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and time lengths of data on the correlation results. Out of 13 strategies, we discover that splitting sentences, removing Twitter-specific tags, or their combination generally improves the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices only correlate well with sentiment scores over shorter timespans. Selecting the optimal preprocessing strategy would allow machine learning prediction models to achieve better accuracy with respect to the actual prices.
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
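The two preprocessing moves the study found most helpful, removing Twitter-specific tags and splitting sentences, can be sketched in a few lines. The regexes here are illustrative; the paper's 13 strategies differ in detail:

```python
import re

def strip_twitter_tags(text):
    """Remove Twitter-specific artifacts: retweet markers, @mentions,
    URLs, and the '#' symbol (keeping the hashtag word itself)."""
    text = re.sub(r"\bRT\b", " ", text)          # retweet marker
    text = re.sub(r"@\w+", " ", text)            # mentions
    text = re.sub(r"https?://\S+", " ", text)    # links
    text = text.replace("#", "")                 # keep hashtag words
    return re.sub(r"\s+", " ", text).strip()

def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (the paper's exact
    splitter may differ)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

clean = strip_twitter_tags("RT @alice: #Bitcoin to the moon! https://t.co/xyz")
```

Each cleaned sentence would then be scored with VADER and the scores aggregated per time window before correlating with BTC prices.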

Open Access Article
JAMPI: Efficient Matrix Multiplication in Spark Using Barrier Execution Mode
Big Data Cogn. Comput. 2020, 4(4), 32; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040032 - 05 Nov 2020
Viewed by 873
Abstract
The new barrier mode in Apache Spark allows for embedding distributed deep learning training as a Spark stage to simplify the distributed training workflow. In Spark, a task in a stage does not depend on any other tasks in the same stage and can hence be scheduled independently. However, several algorithms require more sophisticated inter-task communication, similar to the MPI paradigm. By combining distributed message passing (using asynchronous network IO), OpenJDK's new auto-vectorization, and Spark's barrier execution mode, we can add non-map/reduce-based algorithms, such as Cannon's distributed matrix multiplication, to Spark. We document an efficient distributed matrix multiplication using Cannon's algorithm which significantly improves on the performance of the existing MLlib implementation. Used within a barrier task, the algorithm described herein results in an up to 24% performance increase on a 10,000 × 10,000 square matrix with a significantly lower memory footprint. Applications of efficient matrix multiplication include, among others, accelerating the training and implementation of deep convolutional neural network-based workloads, and such efficient algorithms can thus play a ground-breaking role in the faster and more efficient execution of even the most complicated machine learning tasks.
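Cannon's algorithm itself is easy to demonstrate in a single process: skew the block grids of A and B, then repeat p rounds of local block multiply-accumulate followed by a cyclic shift. A pure-Python sketch (in JAMPI each block would live in its own Spark barrier task and the shifts would be asynchronous network messages):

```python
def cannon_multiply(A, B, p):
    """Simulate Cannon's algorithm for C = A @ B on a p x p grid of
    blocks; matrix dimension must be divisible by p."""
    n = len(A)
    bs = n // p  # block size
    blk = lambda M, i, j: [row[j * bs:(j + 1) * bs] for row in M[i * bs:(i + 1) * bs]]

    # Initial skew: block row i of A shifts left by i, column j of B up by j.
    a = [[blk(A, i, (i + j) % p) for j in range(p)] for i in range(p)]
    b = [[blk(B, (i + j) % p, j) for j in range(p)] for i in range(p)]
    c = [[[[0.0] * bs for _ in range(bs)] for _ in range(p)] for _ in range(p)]

    for _ in range(p):
        # Local block multiply-accumulate at every grid position.
        for i in range(p):
            for j in range(p):
                for r in range(bs):
                    for k in range(bs):
                        for s in range(bs):
                            c[i][j][r][s] += a[i][j][r][k] * b[i][j][k][s]
        # Cyclic shift: A blocks move left by one, B blocks move up by one.
        a = [[a[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        b = [[b[(i + 1) % p][j] for j in range(p)] for i in range(p)]

    # Reassemble the full result matrix from the block grid.
    return [[c[i][j][r][s] for j in range(p) for s in range(bs)]
            for i in range(p) for r in range(bs)]

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
C = cannon_multiply(A, B, p=2)
```

Each grid position holds only one block of A and B at a time, which is the source of the memory-footprint advantage the paper reports over MLlib's approach.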

Open Access Article
OTNEL: A Distributed Online Deep Learning Semantic Annotation Methodology
Big Data Cogn. Comput. 2020, 4(4), 31; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040031 - 29 Oct 2020
Viewed by 766
Abstract
Semantic representation of unstructured text is crucial in modern artificial intelligence and information retrieval applications. The process of extracting semantic information from an unstructured text fragment into a corresponding representation in a concept ontology is known as named entity disambiguation. In this work, we introduce a distributed, supervised deep learning methodology employing a long short-term memory-based architecture for entity linking with Wikipedia. In the context of a frequently changing online world, we introduce and study the domain of online-training named entity disambiguation, featuring on-the-fly adaptation to underlying knowledge changes. Our novel methodology evaluates polysemous anchor mentions with sense compatibility based on thematic segmentation of the Wikipedia knowledge graph representation. We aim at both robust performance and high entity-linking accuracy. The introduced modeling process efficiently addresses the conceptualization, formalization, and computational challenges of the online-training entity-linking task. The novel online-training concept lends itself to wider adoption, as it is considerably beneficial for targeted-topic and online global-context consensus entity disambiguation.
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)

Open Access Article
Using Big and Open Data to Generate Content for an Educational Game to Increase Student Performance and Interest
Big Data Cogn. Comput. 2020, 4(4), 30; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040030 - 22 Oct 2020
Viewed by 818
Abstract
The goal of this paper is to utilize available big and open data sets to create content for a board game and a digital game, and to implement an educational environment that improves students' familiarity with concepts and relations in the data and, in the process, their academic performance and engagement. To this end, we used Wikipedia data to generate content for a Monopoly clone called Geopoly and designed a game-based learning experiment. Our research examines whether this game had any impact on students' performance, which is related to identifying implied ranking and grouping mechanisms in the game, whether performance is correlated with interest, and whether performance differs across genders. Student performance and knowledge about the relationships contained in the data improved significantly after playing the game, while the positive correlation between student interest and performance illustrated the relationship between them. This was also verified by a digital version of the game, evaluated by the students during the COVID-19 pandemic; initial results revealed that students found the game more attractive and rewarding than a traditional geography lesson.
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)

Open Access Article
Treatment of Bad Big Data in Research Data Management (RDM) Systems
Big Data Cogn. Comput. 2020, 4(4), 29; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040029 - 18 Oct 2020
Cited by 1 | Viewed by 811
Abstract
Databases such as research data management systems (RDMS) provide the research data in which information is to be searched for. They offer techniques with which even large amounts of data can be evaluated efficiently, including the management of research data and the optimization of access to them, especially when the data cannot be fully loaded into main memory. They also provide methods for grouping and sorting, and they optimize the requests made to them so that these can be processed efficiently even when large amounts of data are accessed. Research data offer one thing above all: the opportunity to generate valuable knowledge, for which the quality of the research data is of primary importance. Only flawless research data can deliver reliable, beneficial results and enable sound decision-making; correct, complete, and up-to-date research data are therefore essential for successful operational processes. Wrong decisions and inefficiencies in day-to-day operations are only the tip of the iceberg, since the problems caused by poor data quality span various areas and weaken entire university processes. This paper therefore addresses the problems of data quality in the context of RDMS, sheds light on solutions for ensuring data quality, and shows a way to fix dirty research data during integration, before they have a negative impact on business success.
(This article belongs to the Special Issue Educational Data Mining and Technology)
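A quality gate of the kind the paper argues for can be as simple as validating each record before it enters the RDMS. The required fields and the year-plausibility rule below are illustrative, not the paper's:

```python
def validate_record(rec, required=("title", "author", "year")):
    """Return a list of quality problems for one record; an empty list
    means the record may enter the RDMS. Field names and rules here are
    hypothetical examples of a data-quality gate."""
    problems = []
    for field in required:
        if not str(rec.get(field, "")).strip():
            problems.append(f"missing {field}")
    year = rec.get("year")
    if isinstance(year, int) and not (1900 <= year <= 2030):
        problems.append("implausible year")
    return problems

records = [{"title": "Survey", "author": "Doe", "year": 2019},
           {"title": "", "author": "Roe", "year": 1850}]
clean = [r for r in records if not validate_record(r)]
dirty = [r for r in records if validate_record(r)]
```

Running such checks at integration time, rather than after the fact, is exactly the point the paper makes about catching dirty data before it propagates.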

Open Access Review
A Review of Blockchain in Internet of Things and AI
Big Data Cogn. Comput. 2020, 4(4), 28; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040028 - 14 Oct 2020
Cited by 1 | Viewed by 1398
Abstract
The Internet of Things (IoT) represents a new technology that enables both virtual and physical objects to be connected, to communicate with each other, and to produce new digitized services that improve our quality of life. The IoT provides several advantages; however, its current centralized architecture introduces numerous issues involving a single point of failure, security, privacy, transparency, and data integrity. These challenges stand in the way of future developments of IoT applications. Moving the IoT onto a distributed ledger technology may be the right choice to resolve these issues, and among the most common and popular distributed ledger technologies is the blockchain. Integrating the IoT with blockchain technology can bring countless benefits. Therefore, this paper provides a comprehensive discussion of integrating the IoT system with blockchain technology. After providing the basics of the IoT system and blockchain technology, a thorough review of their integration is presented, highlighting the benefits of the integration and how the blockchain can resolve the issues of the IoT system. Then, blockchain as a service for the IoT is presented to show how various features of blockchain technology can be implemented as a service for various IoT applications. This is followed by a discussion of the impact of integrating artificial intelligence (AI) on both IoT and blockchain. Finally, future research directions for IoT with blockchain are presented.
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
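The tamper-evidence property that makes blockchain attractive for IoT data comes from each block embedding the previous block's hash. A minimal sketch (timestamps fixed to 0 for reproducibility; a real chain would add consensus, signatures, and networking):

```python
import hashlib
import json

def _digest(block):
    # Hash over a canonical JSON form of the block's payload fields.
    payload = {k: block[k] for k in ("data", "prev_hash", "ts")}
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()).hexdigest()

def make_block(data, prev_hash):
    """A minimal block: payload plus the previous block's hash, which is
    what chains blocks together and makes past data tamper-evident."""
    block = {"data": data, "prev_hash": prev_hash, "ts": 0}
    block["hash"] = _digest(block)
    return block

def valid_chain(chain):
    """Recompute every hash and check each block points at its predecessor."""
    for i, b in enumerate(chain):
        if b["hash"] != _digest(b):
            return False
        if i > 0 and b["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

# Two sensor readings chained together, as an IoT node might record them.
genesis = make_block({"device": "sensor-1", "reading": 21.5}, "0" * 64)
chain = [genesis,
         make_block({"device": "sensor-1", "reading": 21.7}, genesis["hash"])]
```

Altering any stored reading changes its recomputed hash and breaks validation, which is the integrity guarantee the review highlights for IoT data.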

Open Access Article
An Intelligent Automatic Human Detection and Tracking System Based on Weighted Resampling Particle Filtering
Big Data Cogn. Comput. 2020, 4(4), 27; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040027 - 09 Oct 2020
Viewed by 805
Abstract
At present, traditional visual-based surveillance systems are becoming impractical, inefficient, and time-consuming, and automation-based surveillance systems have appeared to overcome these limitations. However, automatic systems face challenges such as occlusion and retaining images smoothly and continuously. This research proposes a weighted resampling particle filter approach for human tracking to handle these challenges. The primary functions of the proposed system are human detection, human monitoring, and camera control. We used the codebook matching algorithm to define the human region as a target and track it, and we used the particle filter algorithm to follow the target and extract its information. The obtained information was then used to configure the camera control. Experiments were conducted in various environments to prove the stability and performance of the proposed system based on the active camera.
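The weighted-resampling cycle the system builds on follows the standard predict-weight-resample pattern. A minimal 1-D sketch (noise levels and the Gaussian measurement model are illustrative; the paper tracks humans in image coordinates):

```python
import math
import random

def particle_filter_step(particles, measurement, motion_std=0.5, meas_std=1.0):
    """One predict-weight-resample cycle of a 1-D particle filter."""
    # Predict: propagate each particle through a noisy motion model.
    particles = [p + random.gauss(0.0, motion_std) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-((p - measurement) ** 2) / (2 * meas_std ** 2))
               for p in particles]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # Weighted resampling: particles that explain the measurement well
    # are drawn many times; unlikely ones die out.
    return random.choices(particles, weights=weights, k=len(particles))

random.seed(7)
particles = [random.uniform(-10.0, 10.0) for _ in range(500)]
for z in [2.0, 2.1, 1.9, 2.0, 2.05]:  # noisy readings of a target near 2
    particles = particle_filter_step(particles, z)
estimate = sum(particles) / len(particles)
```

After a few measurements the particle cloud collapses around the target, and the mean of the particles serves as the state estimate driving the camera control.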

Open Access Case Report
Official Survey Data and Virtual Worlds—Designing an Integrative and Economical Open Source Production Pipeline for xR-Applications in Small and Medium-Sized Enterprises
Big Data Cogn. Comput. 2020, 4(4), 26; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040026 - 25 Sep 2020
Viewed by 1353
Abstract
How can official survey data be prepared for virtual worlds? For small and medium-sized enterprises (SMEs), these steps can still be very time-consuming. This article is vital for such companies, since its aim is to create a practical open source solution for everyday work based on up-to-date research. When developing integrated Virtual Reality applications for geographic information systems (VRGIS) today, we face the following problems: georeferenced data are currently available in many different data formats; there is often a lack of common standards, user-friendliness, and possibilities for process automation; and correct georeferencing in virtual worlds remains an open field. It is shown that open source platforms can offer very interesting, practical, and economical solutions. Following the method of structured and focused comparison according to George and Bennett, fourteen current software solutions are presented as examples. The applications can be classified according to the taxonomy of Anthes et al. with regard to output devices and software development. A comprehensive networking matrix for applied interactive technologies is introduced for SME partner searches in related software developments. The evaluation criteria of integration capability, operability without programming knowledge, and cost-effectiveness allow for a subsequent discussion and evaluation. Finally, this paper presents a simple proprietary and open-source software solution for small and medium-sized enterprises. Map illustrations and methods for georeferencing are explained, and exemplary digital products and data formats from the Landesamt für Digitalisierung, Breitband und Vermessung (LDBV) in Bavaria are presented.

Open Access Article
Apache Spark SVM for Predicting Obstructive Sleep Apnea
Big Data Cogn. Comput. 2020, 4(4), 25; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040025 - 23 Sep 2020
Viewed by 678
Abstract
Obstructive sleep apnea (OSA), a common form of sleep apnea generally caused by a collapse of the upper respiratory airway, is associated with some of the leading causes of death in adults: hypertension and cardiovascular and cerebrovascular disease. In this paper, an algorithm for predicting obstructive sleep apnea episodes based on a Spark-based support vector machine (SVM) is proposed. Wavelet decomposition and wavelet reshaping were used to denoise sleep apnea data, and cubic B-type interpolation wavelet transform was used to locate the QRS complex in OSA data. Twelve features were extracted, and an SVM was used to predict OSA onset. Different configurations of SVM were compared within the regular as well as the Spark Big Data frameworks. The results showed that the Spark-based kernel SVM performs best, with an accuracy of 90.52% and a specificity of 93.4%. Overall, Spark-SVM performed better than regular SVM, and polynomial SVM performed better than linear SVM, for both regular SVM and Spark-SVM.

Open Access Article
Multi-Level Clustering-Based Outlier’s Detection (MCOD) Using Self-Organizing Maps
Big Data Cogn. Comput. 2020, 4(4), 24; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc4040024 - 23 Sep 2020
Viewed by 724
Abstract
Outlier detection is critical in many business applications, as it recognizes unusual behaviours to prevent losses and optimize revenue; for example, illegitimate online transactions can be detected from their patterns with outlier detection. The performance of existing outlier detection methods is limited by the pattern/behaviour of the dataset, and these methods may not perform well without prior knowledge of the dataset. This paper proposes a multi-level clustering-based outlier detection algorithm (MCOD) that uses multi-level unsupervised learning to cluster the data and discover outliers. The proposed detection method is tested on datasets from different fields with different sizes and dimensions. Experimental analysis shows that the proposed MCOD algorithm improves the outlier detection rate compared to traditional anomaly detection methods. Enterprises and organizations can adopt the proposed MCOD algorithm to ensure sustainable and efficient detection of frauds/outliers, to increase profitability and/or enhance business outcomes.
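The multi-level idea can be sketched as: cluster first, then flag points that either sit in tiny clusters or lie unusually far from their own centroid. For brevity, the sketch uses plain k-means for the clustering level, where the paper uses self-organizing maps:

```python
import math
import random

def kmeans(data, k, iters=20):
    """Plain Lloyd's k-means with deterministic init (the paper's first
    level uses a self-organizing map; k-means is a simpler stand-in)."""
    centroids = [list(data[i * len(data) // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: math.dist(centroids[i], x))
            groups[nearest].append(x)
        for i, g in enumerate(groups):
            if g:
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids, groups

def mcod_outliers(data, k=3, min_size=3, z=2.5):
    """Second level: flag members of tiny clusters, plus points whose
    distance to their own centroid is a z-score outlier."""
    centroids, groups = kmeans(data, k)
    flagged = []
    for c, g in zip(centroids, groups):
        if 0 < len(g) < min_size:
            flagged.extend(g)          # a tiny cluster is itself suspicious
            continue
        d = [math.dist(c, x) for x in g]
        mean = sum(d) / len(d)
        std = (sum((v - mean) ** 2 for v in d) / len(d)) ** 0.5 or 1.0
        flagged.extend(x for x, v in zip(g, d) if (v - mean) / std > z)
    return flagged

# Two synthetic clusters plus one far-away point standing in for a fraud.
rng = random.Random(1)
inliers = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)] +
           [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(20)])
flagged = mcod_outliers(inliers + [(50.0, 50.0)])
```

The two criteria complement each other: a lone fraudulent transaction is caught either because it forms its own tiny cluster or because it is an extreme distance outlier within a larger one.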
