Big Data for eHealth Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Applied Biosciences and Bioengineering".

Deadline for manuscript submissions: closed (10 May 2022) | Viewed by 39123

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editors

Institute for High Performance Computing and Networking ICAR, National Research Council of Italy (CNR), 00185 Rome, Italy
Interests: parallel computing; natural language processing; artificial intelligence; deep learning; e-Health; big data analytics
Institute for High Performance Computing and Networking ICAR, National Research Council of Italy (CNR), Rome, Italy
Interests: artificial intelligence; deep learning; natural language processing; big data analytics; quantum computing

Special Issue Information

Dear Colleagues,

In the last few years, the rapid growth of available digitised medical data has opened new challenges for the scientific research community in the healthcare informatics field. The constantly increasing volume of medical data, together with its complexity and heterogeneity, requires innovative Big Data Analytics methods for extracting valuable insights, and these new approaches must also guarantee the required levels of privacy and security. These solutions must provide effective and efficient tools to support the daily routine of physicians, medical professionals, and policy makers, improving the quality of healthcare systems. The recent pandemic emergency has made the need for new Big Data approaches to processing such data more urgent.

This Special Issue will focus on technologies and methodologies in the field of Big Data Analytics for eHealth, including their combined use with Internet of Things (IoT) devices, Artificial Intelligence (AI) methods, and Cyber Security (CS) techniques for the definition of complex systems and architectures for the eHealth domain. Particular attention will be paid to papers devoted to the analysis of COVID-19-related Big Data. Contributions can focus on architectures, algorithms, and methods; survey papers and reviews are also welcome.

Dr. Stefano Silvestri
Dr. Francesco Gargiulo
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • eHealth
  • Medical Informatics
  • Big Data Analytics for eHealth
  • COVID-19 Big Data Analysis
  • Artificial Intelligence in Medicine
  • Biomedical Big Data Mining
  • Health Information Systems
  • Complex Big Data Architectures
  • Data-driven Methods
  • Security and Privacy of Medical Data

Published Papers (11 papers)

Editorial

5 pages, 184 KiB  
Editorial
Special Issue on Big Data for eHealth Applications
by Stefano Silvestri and Francesco Gargiulo
Appl. Sci. 2022, 12(15), 7578; https://0-doi-org.brum.beds.ac.uk/10.3390/app12157578 - 28 Jul 2022
Cited by 1 | Viewed by 1088
Abstract
In the last few years, the rapid growth in available digitised medical data has opened new challenges for the scientific research community in the healthcare informatics field [...]
(This article belongs to the Special Issue Big Data for eHealth Applications)

Research

19 pages, 371 KiB  
Article
Iterative Annotation of Biomedical NER Corpora with Deep Neural Networks and Knowledge Bases
by Stefano Silvestri, Francesco Gargiulo and Mario Ciampi
Appl. Sci. 2022, 12(12), 5775; https://0-doi-org.brum.beds.ac.uk/10.3390/app12125775 - 07 Jun 2022
Cited by 13 | Viewed by 1978
Abstract
The large availability of clinical natural language documents, such as clinical narratives or diagnoses, requires the definition of smart automatic systems for their processing and analysis, but the lack of annotated corpora in the biomedical domain, especially in languages other than English, makes it difficult to exploit state-of-the-art machine-learning systems to extract information from such documents. For these reasons, healthcare professionals miss significant opportunities that can arise from the analysis of these data. In this paper, we propose a methodology to reduce the manual effort needed to annotate a biomedical named entity recognition (B-NER) corpus, exploiting both active learning and distant supervision, respectively based on deep learning models (e.g., Bi-LSTM, word2vec, FastText, ELMo and BERT) and biomedical knowledge bases, in order to speed up the annotation task and limit class imbalance issues. We assessed this approach by creating an Italian-language electronic health record corpus annotated with biomedical domain entities in a small fraction of the time required for a fully manual annotation. The obtained corpus was used to train a B-NER deep neural network whose performance is comparable with the state of the art, with an F1-Score equal to 0.9661 and 0.8875 on two test sets.

22 pages, 1388 KiB  
Article
Cyberattack Path Generation and Prioritisation for Securing Healthcare Systems
by Shareeful Islam, Spyridon Papastergiou, Eleni-Maria Kalogeraki and Kitty Kioskli
Appl. Sci. 2022, 12(9), 4443; https://0-doi-org.brum.beds.ac.uk/10.3390/app12094443 - 27 Apr 2022
Cited by 9 | Viewed by 2176
Abstract
Cyberattacks in the healthcare sector are constantly increasing due to the increased usage of information technology in modern healthcare and the benefits of acquiring a patient healthcare record. Attack path discovery provides useful information to identify the possible paths that potential attackers might follow for a successful attack. By identifying the necessary paths, the mitigation of potential attacks becomes more effective in a proactive manner. Recently, there have been several works that focus on cyberattack path discovery in various sectors, mainly on critical infrastructure. However, there is a lack of focus on the vulnerability, exploitability and target user profile for attack path generation. This is important for healthcare systems, where users commonly lack awareness and knowledge of the overall IT infrastructure. This paper presents a novel methodology for cyberattack path discovery that is used to identify and analyse the possible attack paths and prioritise the ones that require immediate attention to ensure security within the healthcare ecosystem. The proposed methodology follows the existing published vulnerabilities from Common Vulnerabilities and Exposures (CVE). It adopts the Common Vulnerability Scoring System (CVSS) so that base metrics and exploitability features can be used to determine and prioritise the possible attack paths based on the threat actor capability, asset dependency, target user profile and evidence of indicators of compromise. The work includes a real example from a healthcare use case to demonstrate the methodology used for attack path generation. The result from the studied context, which processes big data from healthcare applications, shows that the use of various parameters such as CVSS metrics, threat actor profile, and Indicator of Compromise allows us to generate realistic attack paths. This supports healthcare practitioners in identifying the controls that are required to secure the overall healthcare ecosystem.

27 pages, 9089 KiB  
Article
The Assessment of COVID-19 Vulnerability Risk for Crisis Management
by Marek Wyszyński, Michał Grudziński, Krzysztof Pokonieczny and Marek Kaszubowski
Appl. Sci. 2022, 12(8), 4090; https://0-doi-org.brum.beds.ac.uk/10.3390/app12084090 - 18 Apr 2022
Cited by 7 | Viewed by 1929
Abstract
The subject of this article is to determine COVID-19 vulnerability risk and its change over time in association with the state health care system, turnover, and transport to support the crisis management decision-making process. The aim was to determine the COVID-19 Vulnerability Index (CVI) based on the selected criteria. The risk assessment was carried out with a methodology that includes the application of multicriteria analysis and the spatiotemporal aspect of the available data. In particular, Spatial Multicriteria Analysis (SMCA) compliant with the Analytical Hierarchy Process (AHP), incorporating selected population and environmental criteria, was used to analyse the ongoing pandemic situation. The influence of combining several factors in the pandemic situation analysis was illustrated. Furthermore, the static and dynamic factors of COVID-19 vulnerability risk were determined to prevent and control the spread of COVID-19 at the early stage of the pandemic situation. As a result, areas with a certain level of risk in different periods of time were determined. Furthermore, the number of people exposed to COVID-19 vulnerability risk over time was presented. These results can support the decision-making process by showing the areas where preventive actions should be considered.
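
As a rough illustration of how AHP-style weights can feed a composite vulnerability index, the sketch below derives criterion weights from a pairwise comparison matrix with the row geometric mean approximation and combines normalised criterion scores into one value per area. The criterion names, comparison values, and area scores are invented for illustration and are not taken from the article; the authors' actual SMCA workflow may differ.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def weighted_index(scores, weights):
    """Weighted sum of criterion scores already normalised to [0, 1]."""
    return sum(s * w for s, w in zip(scores, weights))

# Hypothetical criteria and pairwise judgements (entry [i][j] states how much
# more important criterion i is than criterion j).
criteria = ["population_density", "hospital_load", "transport_hubs"]
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical normalised scores per area, in the same order as `criteria`.
area_scores = {"area_a": [0.9, 0.4, 0.7], "area_b": [0.2, 0.8, 0.1]}
index = {name: weighted_index(s, weights) for name, s in area_scores.items()}
```

With these invented judgements the first criterion dominates, so an area scoring high on it receives the higher index.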

19 pages, 1671 KiB  
Article
Survey of BERT-Base Models for Scientific Text Classification: COVID-19 Case Study
by Mayara Khadhraoui, Hatem Bellaaj, Mehdi Ben Ammar, Habib Hamam and Mohamed Jmaiel
Appl. Sci. 2022, 12(6), 2891; https://0-doi-org.brum.beds.ac.uk/10.3390/app12062891 - 11 Mar 2022
Cited by 30 | Viewed by 6344
Abstract
On 30 January 2020, the World Health Organization announced a new coronavirus, which later turned out to be very dangerous. Since that date, COVID-19 has spread to become a pandemic that has now affected practically all regions in the world. Since then, many researchers in medicine have contributed to fighting COVID-19. In this context and given the great growth of scientific publications related to this global pandemic, manual text and data retrieval has become a challenging task. To address this challenge, we propose CovBERT, a pre-trained language model based on the BERT model to automate the literature review process. CovBERT relies on prior training on a large corpus of scientific publications in the biomedical domain and related to COVID-19 to increase its performance on the literature review task. We evaluate CovBERT on the classification of short text based on our scientific dataset of biomedical articles on COVID-19 entitled COV-Dat-20. We demonstrate statistically significant improvements by using BERT.

11 pages, 1924 KiB  
Article
Design of a Wearable Healthcare Emergency Detection Device for Elder Persons
by Flora Amato, Walter Balzano and Giovanni Cozzolino
Appl. Sci. 2022, 12(5), 2345; https://0-doi-org.brum.beds.ac.uk/10.3390/app12052345 - 23 Feb 2022
Cited by 3 | Viewed by 2440
Abstract
Improving quality of life in geriatric patients is related to constant physical activity and fall prevention. In this paper, we propose a wearable system that takes advantage of sensors embedded in a smart device to collect data for movement identification (running, walking, falling and daily activities) of an elderly user in real-time. To provide high efficiency in fall detection, the sensor’s readings are analysed using a neural network. If a fall is detected, an alert is sent through a smartphone connected via Bluetooth. We conducted an experimental session using an Arduino Nano 33 BLE Sense board in indoor and outdoor environments. The results of the experiment have shown that the system is extremely portable and provides high success rates in fall detection in terms of accuracy and loss.

13 pages, 1749 KiB  
Article
Reducing the Heart Failure Burden in Romania by Predicting Congestive Heart Failure Using Artificial Intelligence: Proof of Concept
by Maria-Alexandra Pană, Ștefan-Sebastian Busnatu, Liviu-Ionut Serbanoiu, Electra Vasilescu, Nirvana Popescu, Cătălina Andrei and Crina-Julieta Sinescu
Appl. Sci. 2021, 11(24), 11728; https://0-doi-org.brum.beds.ac.uk/10.3390/app112411728 - 10 Dec 2021
Cited by 9 | Viewed by 2171
Abstract
Due to population aging, we are currently confronted with an increased number of chronic heart failure patients. The primary purpose of this study was to implement a noncontact system that can predict heart failure exacerbation through vocal analysis. We designed the system to evaluate the voice characteristics of every patient, and we used the identified variations as an input for a machine-learning-based approach. We collected data from a total of 16 patients, 9 men and 7 women, aged 65–91 years old, who agreed to take part in the study, with a detailed signed informed consent. We included hospitalized patients admitted with cardiogenic acute pulmonary edema in the study, regardless of the precipitating cause or other known cardiovascular comorbidities. There were no specific exclusion criteria, except age (patients had to be over 18 years old) and speech impairment. We then recorded each patient’s voice twice a day, using the same smartphone, a Lenovo P780, from day one of hospitalization, when their general status was critical, until the day of discharge, when they were clinically stable. We used the New York Heart Association (NYHA) Functional Classification system for heart failure to assign the patients to stages based on their clinical evolution. Each voice recording was matched to the corresponding stage and subsequently fed into the machine-learning algorithm. We used multiple machine-learning techniques for classification in order to detect which one is most appropriate for the given dataset and can be the starting point for future developments. We used algorithms such as Artificial Neural Networks (ANN), Support Vector Machine (SVM) and K-Nearest Neighbors (KNN). After integrating the information from 15 patients, the algorithm correctly classified the 16th patient into the third NYHA stage at hospitalization and second NYHA stage at discharge, based only on his voice recording. The KNN algorithm proved to have the best classification accuracy, with a value of 0.945. Voice is a cheap and easy way to monitor a patient’s health status. The algorithm we have used for analyzing the voice provides highly accurate preliminary results. We aim to obtain larger datasets and compute more complex voice analyzer algorithms to certify the outcomes presented.

11 pages, 5580 KiB  
Article
Nonlinear Random Forest Classification, a Copula-Based Approach
by Radko Mesiar and Ayyub Sheikhi
Appl. Sci. 2021, 11(15), 7140; https://0-doi-org.brum.beds.ac.uk/10.3390/app11157140 - 02 Aug 2021
Cited by 11 | Viewed by 1956
Abstract
In this work, we use a copula-based approach to select the most important features for a random forest classification. Based on associated copulas between these features, we carry out this feature selection. We then feed the selected features into a random forest algorithm to classify a label-valued outcome. Our algorithm enables us to select the most relevant features when the features are not necessarily connected by a linear function; also, we can stop the classification when we reach the desired level of accuracy. We apply this method to a simulation study as well as to a real COVID-19 dataset and a diabetes dataset.

19 pages, 1642 KiB  
Article
A Novel Unsupervised Computational Method for Ventricular and Supraventricular Origin Beats Classification
by Manuel M. Casas, Roberto L. Avitia, Jose Antonio Cardenas-Haro, Jugal Kalita, Francisco J. Torres-Reyes, Marco A. Reyna and Miguel E. Bravo-Zanoguera
Appl. Sci. 2021, 11(15), 6711; https://0-doi-org.brum.beds.ac.uk/10.3390/app11156711 - 22 Jul 2021
Cited by 1 | Viewed by 1627
Abstract
Arrhythmias are the most common events tracked by a physician. The need for continuous monitoring of such events in the ECG has opened the opportunity for automatic detection. Intra- and inter-patient paradigms are the two approaches currently followed by the scientific community. The intra-patient approach seems to resolve the problem with a high classification percentage but requires a physician to label key samples. The inter-patient approach makes use of historical data from different patients to build a general classifier, but the inherent variability in the ECG signal among patients leads to lower classification percentages compared to the intra-patient approach. In this work, we propose a new unsupervised algorithm that adapts to every patient using the heart rate and morphological features of the ECG beats to classify beats as being of supraventricular or ventricular origin. The results of our work in terms of F-score are 0.88, 0.89, and 0.93 for the ventricular origin beats for three popular ECG databases, and around 0.99 for the supraventricular origin for the same databases, comparable to supervised approaches presented in other works. This paper presents a new path to make use of ECG data to classify heartbeats without the assistance of a physician, despite the needed improvements.

9 pages, 963 KiB  
Article
On Combining Feature Selection and Over-Sampling Techniques for Breast Cancer Prediction
by Min-Wei Huang, Chien-Hung Chiu, Chih-Fong Tsai and Wei-Chao Lin
Appl. Sci. 2021, 11(14), 6574; https://0-doi-org.brum.beds.ac.uk/10.3390/app11146574 - 17 Jul 2021
Cited by 11 | Viewed by 2326
Abstract
Breast cancer prediction datasets are usually class imbalanced, where the numbers of data samples in the malignant and benign patient classes are significantly different. Over-sampling techniques can be used to re-balance the datasets to construct more effective prediction models. Moreover, some related studies have considered feature selection to remove irrelevant features from the datasets for further performance improvement. However, since the order of combining feature selection and over-sampling can result in different training sets to construct the prediction model, it is unknown which order performs better. In this paper, the information gain (IG) and genetic algorithm (GA) feature selection methods and the synthetic minority over-sampling technique (SMOTE) are used in different combinations. The experimental results based on two breast cancer datasets show that the combination of feature selection and over-sampling outperforms the use of either feature selection or over-sampling alone for the highly class imbalanced datasets. In particular, performing IG first and SMOTE second is the better choice. For other datasets with a small class imbalance ratio and a smaller number of features, performing SMOTE is enough to construct an effective prediction model.
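
The "IG first, SMOTE second" ordering can be sketched as follows. This is a minimal pure-Python stand-in, assuming discrete features for the information gain step and a simplified SMOTE-style interpolation between minority-class neighbours; it is not the authors' implementation, and the toy data, function names, and parameters are all hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(column, labels):
    """Information gain of one discrete feature column w.r.t. the class labels."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def select_top_k(X, y, k):
    """Indices of the k features with the highest information gain."""
    gains = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
    return sorted(ranked[:k])

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def smote(minority, n_new, k, rng):
    """Simplified SMOTE: interpolate between a minority point and a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Hypothetical toy data: six majority-class (0) and three minority-class (1) samples.
X = [[0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1], [0, 1, 0],
     [1, 1, 0], [1, 1, 1], [1, 0, 1]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

# Step 1: feature selection on the original, imbalanced data.
selected = select_top_k(X, y, k=2)
X_sel = [[row[j] for j in selected] for row in X]

# Step 2: over-sample the minority class in the reduced feature space.
minority = [x for x, label in zip(X_sel, y) if label == 1]
n_new = y.count(0) - y.count(1)
synthetic = smote(minority, n_new, k=2, rng=random.Random(0))
X_bal = X_sel + synthetic
y_bal = y + [1] * n_new
```

Reversing the two steps would compute information gain on data that already contains synthetic samples, which is why the order can change the resulting training set.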

Review

35 pages, 50717 KiB  
Review
A Systematic Review of Federated Learning in the Healthcare Area: From the Perspective of Data Properties and Applications
by Prayitno, Chi-Ren Shyu, Karisma Trinanda Putra, Hsing-Chung Chen, Yuan-Yu Tsai, K. S. M. Tozammel Hossain, Wei Jiang and Zon-Yin Shae
Appl. Sci. 2021, 11(23), 11191; https://0-doi-org.brum.beds.ac.uk/10.3390/app112311191 - 25 Nov 2021
Cited by 64 | Viewed by 11748
Abstract
Recent advances in deep learning have produced many success stories in smart healthcare applications, with data-driven insight into improving clinical institutions’ quality of care. Excellent deep learning models are heavily data-driven. The more data used for training, the more robust and generalizable the performance of the deep learning model. However, pooling medical data into centralized storage to train a robust deep learning model faces privacy, ownership, and strict regulation challenges. Federated learning addresses these challenges with a shared global deep learning model using a central aggregator server. At the same time, patient data remain with the local party, maintaining data anonymity and security. In this study, first, we provide a comprehensive, up-to-date review of research employing federated learning in healthcare applications. Second, we evaluate a set of recent challenges from a data-centric perspective in federated learning, such as data partitioning characteristics, data distributions, data protection mechanisms, and benchmark datasets. Finally, we point out several potential challenges and future research directions in healthcare applications.
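
To make the central-aggregator idea concrete, the sketch below shows one aggregation round in the style of federated averaging (FedAvg), a common aggregation rule that the abstract itself does not name. Each site sends only its locally trained parameters and its sample count, and the server computes a sample-weighted mean. The parameter values and site sizes are invented for illustration.

```python
def federated_average(client_updates):
    """Sample-weighted mean of client parameter vectors.

    client_updates: list of (parameters, n_local_samples) pairs; raw patient
    records never leave the client, only these parameter vectors do.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(params[i] * n for params, n in client_updates) / total
        for i in range(dim)
    ]

# Hypothetical updates from two hospitals after local training.
client_updates = [
    ([0.10, 0.50], 100),  # hospital A: 100 local samples
    ([0.30, 0.10], 300),  # hospital B: 300 local samples
]
global_params = federated_average(client_updates)
```

In a full system the server would send `global_params` back to the sites for the next round of local training; that loop is omitted here.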