Data Science for Medical Informatics

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Applied Biosciences and Bioengineering".

Deadline for manuscript submissions: closed (25 February 2022) | Viewed by 27273

Special Issue Editors


Guest Editor
Department of Computer and Biosciences, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
Interests: data integration in medical and life sciences in general, infrastructures for distributed privacy-preserving data analyses, data science in medical research, FAIR data points

Guest Editor
Institute for Medical Informatics, University of Cologne, 50923 Köln, Germany
Interests: big data and machine learning in medical sciences; personalized medicine; linked data and semantic web applications in medical sciences; FAIR data

Special Issue Information

Dear Colleagues,

Medical data science is a rapidly growing field and will transform healthcare and medical research. Despite the availability of a broad spectrum of analysis methods, including powerful machine learning algorithms, the unavailability and low quality of data are the biggest barriers to developing successful AI-based medical applications.

Most medical data still reside in non-digital form or as unstructured text, or are poorly documented and cannot be semantically interpreted. FAIR (Findable, Accessible, Interoperable, Reusable) data are key to closing the gap between advances in machine learning and the translation of AI into clinics. FAIR data management and analysis lead to reproducible, interpretable, and transferable data science, and also have a positive impact on data quality.

This special issue addresses this gap by collecting best practices for FAIR data in medical data science. We welcome submissions presenting new approaches and methods for improving FAIR data and data quality, reproducible and privacy-preserving analyses, as well as success stories on the translation and impact of AI in medical practice.

Prof. Dr. Toralf Kirsten
Prof. Dr. Oya Beyan
Guest Editors

Keywords

  • FAIR data points
  • data quality
  • reproducible and transferable AI
  • case studies for demonstrating impact of AI in patient care
  • best practices for data science in medical sciences
  • privacy-preserving data analysis

Published Papers (8 papers)

Research

16 pages, 396 KiB  
Article
GAN-Based Approaches for Generating Structured Data in the Medical Domain
by Masoud Abedi, Lars Hempel, Sina Sadeghi and Toralf Kirsten
Appl. Sci. 2022, 12(14), 7075; https://doi.org/10.3390/app12147075 - 13 Jul 2022
Cited by 15 | Viewed by 4993
Abstract
Modern machine and deep learning methods require large datasets to achieve reliable and robust results. This requirement is often difficult to meet in the medical field, due to data sharing limitations imposed by privacy regulations or the presence of a small number of patients (e.g., rare diseases). To address this data scarcity and to improve the situation, novel generative models such as Generative Adversarial Networks (GANs) have been widely used to generate synthetic data that mimic real data by representing features that reflect health-related information without reference to real patients. In this paper, we consider several GAN models to generate synthetic data used for training binary (malignant/benign) classifiers, and compare their performances in terms of classification accuracy with cases where only real data are considered. We aim to investigate how synthetic data can improve classification accuracy, especially when a small amount of data is available. To this end, we have developed and implemented an evaluation framework where binary classifiers are trained on extended datasets containing both real and synthetic data. The results show improved accuracy for classifiers trained with generated data from more advanced GAN models, even when limited amounts of original data are available.
(This article belongs to the Special Issue Data Science for Medical Informatics)
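
The evaluation framework described in this abstract (train the same binary classifier once on real data alone and once on real data extended with synthetic samples, then compare test accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a nearest-centroid classifier stands in for their classifiers, and a per-class Gaussian sampler (`gaussian_stand_in_generator`, a hypothetical name) stands in for a trained GAN generator. Any generator with the same signature could be plugged in.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# A sample is (feature vector, label); label 1 = malignant, 0 = benign (assumed encoding).
Sample = Tuple[List[float], int]
Generator = Callable[[List[Sample], int, random.Random], List[Sample]]


def fit_centroids(data: List[Sample]) -> Dict[int, List[float]]:
    """Nearest-centroid 'training': the mean feature vector of each class."""
    centroids = {}
    for label in {y for _, y in data}:
        rows = [x for x, y in data if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[int, List[float]], x: List[float]) -> int:
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def accuracy(centroids: Dict[int, List[float]], test: List[Sample]) -> float:
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


def gaussian_stand_in_generator(real: List[Sample], n_per_class: int,
                                rng: random.Random) -> List[Sample]:
    """Hypothetical placeholder for a trained GAN: draws each feature independently
    from a normal distribution fitted per class to the real data. Unlike a GAN,
    it ignores correlations between features."""
    synthetic = []
    for label in sorted({y for _, y in real}):
        cols = list(zip(*[x for x, y in real if y == label]))
        params = [(statistics.fmean(c), statistics.pstdev(c)) for c in cols]
        for _ in range(n_per_class):
            synthetic.append(([rng.gauss(mu, sd) for mu, sd in params], label))
    return synthetic


def evaluate(real_train: List[Sample], test: List[Sample], generator: Generator,
             n_per_class: int, seed: int = 0) -> Tuple[float, float]:
    """Return (accuracy trained on real only, accuracy trained on real + synthetic)."""
    synthetic = generator(real_train, n_per_class, random.Random(seed))
    acc_real = accuracy(fit_centroids(real_train), test)
    acc_augmented = accuracy(fit_centroids(real_train + synthetic), test)
    return acc_real, acc_augmented


if __name__ == "__main__":
    rng = random.Random(42)

    def toy(n: int, label: int, center: float) -> List[Sample]:
        return [([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label) for _ in range(n)]

    # Deliberately small "real" training set and a larger held-out test set.
    real_train = toy(5, 0, 0.0) + toy(5, 1, 3.0)
    test_set = toy(50, 0, 0.0) + toy(50, 1, 3.0)
    print(evaluate(real_train, test_set, gaussian_stand_in_generator, n_per_class=20))
```

With a nearest-centroid classifier, the simple stand-in generator mostly adds noise to the class means, so the sketch shows the comparison harness rather than the accuracy gains the paper reports for more advanced GAN models.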

17 pages, 1503 KiB  
Article
Towards an Ontology-Based Phenotypic Query Model
by Christoph Beger, Franz Matthies, Ralph Schäfermeier, Toralf Kirsten, Heinrich Herre and Alexandr Uciteli
Appl. Sci. 2022, 12(10), 5214; https://doi.org/10.3390/app12105214 - 21 May 2022
Cited by 1 | Viewed by 1348
Abstract
Clinical research based on data from patient or study data management systems plays an important role in transferring basic findings into the daily practices of physicians. To support study recruitment, diagnostic processes, and risk factor evaluation, search queries for such management systems can be used. Typically, the query syntax as well as the underlying data structure vary greatly between different data management systems. This makes it difficult for domain experts (e.g., clinicians) to build and execute search queries. In this work, the Core Ontology of Phenotypes is used as a general model for phenotypic knowledge. This knowledge is required to create search queries that determine and classify individuals (e.g., patients or study participants) whose morphology, function, behaviour, or biochemical and physiological properties meet specific phenotype classes. A specific model describing a set of particular phenotype classes is called a Phenotype Specification Ontology. Such an ontology can be automatically converted to search queries on data management systems. The methods described have already been used successfully in several projects. Using ontologies to model phenotypic knowledge on patient or study data management systems is a viable approach. It allows clinicians to model from a domain perspective without knowing the actual data structure or query language.
(This article belongs to the Special Issue Data Science for Medical Informatics)
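
The idea of modelling phenotypes independently of the data structure and converting them automatically into search queries can be illustrated with a small sketch. It does not use the Core Ontology of Phenotypes or the authors' Phenotype Specification Ontology format: the `Restriction` and `Composite` classes, the `column_map`, and the `patients` table are all hypothetical. The phenotype definition names only domain properties, while the binding to concrete columns lives in a separate mapping, so the same definition could target a different schema.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

ALLOWED_OPS = {"<", "<=", "=", ">=", ">"}


@dataclass(frozen=True)
class Restriction:
    """Single-property phenotype class, e.g. 'BMI >= 30' (hypothetical representation)."""
    prop: str
    op: str
    value: float


@dataclass(frozen=True)
class Composite:
    """Boolean combination of phenotype classes."""
    connective: str  # "AND" or "OR"
    parts: Tuple[Union["Restriction", "Composite"], ...]


Phenotype = Union[Restriction, Composite]


def to_sql(p: Phenotype, column_map: Dict[str, str], params: List[float]) -> str:
    """Recursively render a phenotype as a parameterized SQL condition.
    Column names come from trusted configuration (column_map), not user input."""
    if isinstance(p, Restriction):
        if p.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {p.op}")
        params.append(p.value)  # values are bound as parameters, never inlined
        return f"{column_map[p.prop]} {p.op} ?"
    if p.connective not in ("AND", "OR"):
        raise ValueError(f"unsupported connective: {p.connective}")
    return "(" + f" {p.connective} ".join(to_sql(q, column_map, params) for q in p.parts) + ")"


def build_query(p: Phenotype, column_map: Dict[str, str],
                table: str = "patients") -> Tuple[str, List[float]]:
    params: List[float] = []
    where = to_sql(p, column_map, params)
    return f"SELECT patient_id FROM {table} WHERE {where}", params


if __name__ == "__main__":
    # Domain-level definition (illustrative thresholds): obese AND hypertensive.
    risk = Composite("AND", (Restriction("bmi", ">=", 30), Restriction("systolic_bp", ">=", 140)))
    # Schema binding kept separate from the phenotype model.
    sql, params = build_query(risk, {"bmi": "bmi_kg_m2", "systolic_bp": "sbp_mmhg"})
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE patients (patient_id TEXT, bmi_kg_m2 REAL, sbp_mmhg REAL)")
    con.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                    [("a", 32, 150), ("b", 25, 160), ("c", 31, 120)])
    print(sql, params, con.execute(sql, params).fetchall())
```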

21 pages, 762 KiB  
Article
Multi-Institutional Breast Cancer Detection Using a Secure On-Boarding Service for Distributed Analytics
by Sascha Welten, Lars Hempel, Masoud Abedi, Yongli Mou, Mehrshad Jaberansary, Laurenz Neumann, Sven Weber, Kais Tahar, Yeliz Ucer Yediel, Matthias Löbe, Stefan Decker, Oya Beyan and Toralf Kirsten
Appl. Sci. 2022, 12(9), 4336; https://doi.org/10.3390/app12094336 - 25 Apr 2022
Cited by 4 | Viewed by 2235
Abstract
The steady rise of data-driven medicine as a valuable option to enhance daily clinical practice has brought new challenges for data analysts in accessing valuable but sensitive data, owing to privacy considerations. One solution to most of these challenges is provided by Distributed Analytics (DA) infrastructures, technologies that foster collaboration between healthcare institutions by establishing a privacy-preserving network for data sharing. However, participating in such a network requires many technical and administrative prerequisites to be met, which can become bottlenecks and new obstacles for non-technical personnel during deployment. We have identified three major problems in the current state of the art: missing compliance with the FAIR data principles, the automation of processes, and the installation. In this work, we present a seamless on-boarding workflow based on a DA reference architecture for data-sharing institutions to address these problems. The on-boarding service manages all technical configurations and necessities to reduce the deployment time. Our aim is to use well-established and conventional technologies to gain acceptance through enhanced ease of use. We evaluate our development with six institutions across Germany by conducting a DA study with open-source breast cancer data, which represents the second contribution of this work. We find that our on-boarding solution lowers technical barriers, efficiently deploys all necessary components, and is, therefore, indeed an enabler for collaborative data sharing.
(This article belongs to the Special Issue Data Science for Medical Informatics)
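
The principle behind Distributed Analytics, bringing the analysis to the data and returning only aggregates, can be illustrated with a deliberately simple sketch. It is not the authors' on-boarding service or DA reference architecture: `local_summary` stands in for analysis code executed inside each institution, and the pooled mean and variance are an assumed example analysis. Only counts and sums leave each site.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]
Summary = Tuple[int, float, float]  # (count, sum, sum of squares)


def local_summary(rows: Sequence[Row], variable: str) -> Summary:
    """Executed inside one institution; returns aggregates only, never rows."""
    values = [r[variable] for r in rows if r.get(variable) is not None]
    return len(values), sum(values), sum(v * v for v in values)


def pooled_mean_and_variance(summaries: Sequence[Summary]) -> Tuple[float, float]:
    """Combine per-site aggregates into the overall mean and sample variance.
    Uses the textbook sum-of-squares formula, which can lose precision for large
    values; a production system would prefer a numerically stable update."""
    n = sum(s[0] for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations across all sites")
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    # Hypothetical sites holding tumour sizes (mm); the coordinator never sees rows.
    sites: List[List[Row]] = [
        [{"tumor_size_mm": 12.0}, {"tumor_size_mm": 20.0}],
        [{"tumor_size_mm": 15.0}, {"tumor_size_mm": None}, {"tumor_size_mm": 33.0}],
    ]
    summaries = [local_summary(rows, "tumor_size_mm") for rows in sites]
    print(pooled_mean_and_variance(summaries))
```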

15 pages, 10055 KiB  
Article
Mapping Cancer Registry Data to the Episode Domain of the Observational Medical Outcomes Partnership Model (OMOP)
by Jasmin Carus, Sylvia Nürnberg, Frank Ückert, Catarina Schlüter and Stefan Bartels
Appl. Sci. 2022, 12(8), 4010; https://doi.org/10.3390/app12084010 - 15 Apr 2022
Cited by 6 | Viewed by 2207
Abstract
A great challenge in the use of standardized cancer registry data is deriving reliable, evidence-based results from large amounts of data. One solution could be mapping the data to a common data model such as OMOP, which represents knowledge in a unified semantic base, enabling decentralized analysis. The recently released Episode Domain of the OMOP CDM allows episodic modelling of a patient’s disease and treatment phases. In this study, we mapped oncology registry data to the Episode Domain. A total of 184,718 Episodes could be implemented, with the Concept Cancer Drug Treatment occurring most frequently. Additionally, source data were mapped to new terminologies as part of the release. It was possible to map ≈73.8% of the source data to the respective OMOP standard. The best mapping was achieved in the Procedure Domain, with 98.7%. To evaluate the implementation, the survival probabilities of the CDM and source system were calculated (n = 2756/2902, median OAS = 82.2/91.1 months, 95% CI = 77.4–89.5/84.4–100.9). In conclusion, the new release of the CDM increased its applicability, especially in observational cancer research. Regarding the mapping, a higher score could be achieved if terminologies frequently used in Europe were included in the Standardized Vocabulary Metadata Repository.
(This article belongs to the Special Issue Data Science for Medical Informatics)
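
The abstract validates the mapping by comparing survival probabilities and median overall survival between the CDM and the source system. It does not state which estimator was used; assuming the common Kaplan-Meier product-limit estimator, the median can be computed as in the following sketch (confidence intervals omitted). The function names and the input format (follow-up time plus an event flag, with `False` meaning censored) are assumptions, and the median uses the convention of the smallest event time at which S(t) <= 0.5.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct event time.

    times  -- follow-up time per patient (e.g. months)
    events -- True if death was observed, False if the patient was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients leaving the risk set at t (events + censored)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # patients censored at t still count as at risk at t
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest event time with S(t) <= 0.5; None if the curve never drops that low."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    months = [3, 8, 8, 12, 20, 25, 30, 41]
    died = [True, True, False, True, False, True, True, False]
    curve = kaplan_meier(months, died)
    print(curve, median_survival(curve))
```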

8 pages, 1693 KiB  
Article
Use of Process Modelling for Optimization of Molecular Tumor Boards
by Katharina Lauk, Mia-Carlotta Peters, Janna-Lisa Velthaus, Sylvia Nürnberg and Frank Ueckert
Appl. Sci. 2022, 12(7), 3485; https://doi.org/10.3390/app12073485 - 29 Mar 2022
Cited by 1 | Viewed by 1918
Abstract
In Molecular Tumor Boards, a team of experts discusses the individual therapy options for a cancer patient based on the patient’s individual molecular profile. The process, from recommendation request through molecular diagnosis to a personalized therapy recommendation, is complex and time-consuming. Therefore, process optimization is needed to decrease the workload of physicians and to standardize the process. For this purpose, we modeled the current workflow of the Molecular Tumor Board at the University Hospital Hamburg-Eppendorf on a Service-Oriented Architecture using Business Process Model and Notation (BPMN) to highlight areas for improvement. This revealed many manual tasks and an extensive workload for the physician. We then created a novel, simplified, more efficient workflow in which the physician is supported by additional software. In summary, we show that using a Service-Oriented Architecture with BPMN for Molecular Tumor Board processes promotes rapid adaptability, standardization, interoperability, and quality assurance, and facilitates collaboration.
(This article belongs to the Special Issue Data Science for Medical Informatics)

19 pages, 417 KiB  
Article
A Secure CDM-Based Data Analysis Platform (SCAP) in Multi-Centered Distributed Setting
by Seungho Jeon, Chobyeol Shin, Eunnarae Ko and Jongsub Moon
Appl. Sci. 2021, 11(19), 9072; https://doi.org/10.3390/app11199072 - 29 Sep 2021
Viewed by 1793
Abstract
Hospitals have their own database structures and maintain their data in a closed manner. For this reason, it is difficult for researchers outside these institutions to access multi-center data. If the data maintained by all hospitals followed a commonly shared format, researchers could analyze multi-center data using the same method. To analyze data safely using a common data model (CDM) in a distributed multi-center network environment, this study proposes and implements processes for distributing and executing analysis code and returning the results. A secure CDM-based data analysis platform (SCAP) consists of a certificate authority (CA), authentication server (AS), code signer (CS), ticket-granting server (TGS), relaying server (RS), and service server (SS). The AS, CS, TGS, and RS form the central server group of the platform. An SS resides on a hospital server as an agent that communicates with the server group. We designed the functionalities and communication protocols among the servers. To carry out the intended functions safely, the proposed protocol was implemented based on a cryptographic algorithm. The SCAP was developed as a web application running on this protocol, and users access the platform through a web-based interface.
(This article belongs to the Special Issue Data Science for Medical Informatics)
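
Two of the checks that a protocol like this must enforce before a service server runs analysis code (a valid, unexpired ticket for that specific server, and an untampered, signed code bundle) can be sketched with the Python standard library. This is a simplified illustration, not the SCAP protocol: HMAC-SHA256 with shared secret keys stands in for the public-key signatures and certificates a CA-based design would use, and all keys, field names, and function names are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical shared secrets; a CA-based design would use key pairs and certificates instead.
CODE_SIGNER_KEY = b"hypothetical-code-signer-key"
TGS_SERVICE_KEY = b"hypothetical-tgs-to-service-server-key"


def _tag(key: bytes, payload: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the payload."""
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_code(code: str) -> dict:
    """Code signer: bind the analysis code to its SHA-256 digest."""
    digest = hashlib.sha256(code.encode("utf-8")).hexdigest()
    return {"code": code, "digest": digest, "sig": _tag(CODE_SIGNER_KEY, {"digest": digest})}


def issue_service_ticket(user: str, service: str, lifetime_s: float = 300.0,
                         now: Optional[float] = None) -> dict:
    """Ticket-granting server: short-lived ticket for one user and one service server."""
    now = time.time() if now is None else now
    body = {"user": user, "service": service, "expires": now + lifetime_s}
    return {"body": body, "sig": _tag(TGS_SERVICE_KEY, body)}


def service_server_accepts(ticket: dict, bundle: dict, service: str,
                           now: Optional[float] = None) -> bool:
    """Service server: run the code only if ticket and code signature both verify."""
    now = time.time() if now is None else now
    body = ticket["body"]
    if not hmac.compare_digest(_tag(TGS_SERVICE_KEY, body), ticket["sig"]):
        return False  # forged or altered ticket
    if body["service"] != service or body["expires"] < now:
        return False  # ticket issued for another server, or expired
    digest = hashlib.sha256(bundle["code"].encode("utf-8")).hexdigest()
    if not hmac.compare_digest(digest, bundle["digest"]):
        return False  # code changed after signing
    return hmac.compare_digest(_tag(CODE_SIGNER_KEY, {"digest": digest}), bundle["sig"])


if __name__ == "__main__":
    bundle = sign_code("print('aggregate analysis')")
    ticket = issue_service_ticket("researcher-1", "hospital-A")
    print(service_server_accepts(ticket, bundle, "hospital-A"))
```

The AS, TGS, and ticket terminology resembles Kerberos; the sketch omits the authentication-server step that would precede ticket issuance, as well as the relaying server.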

Review

26 pages, 445 KiB  
Review
R Packages for Data Quality Assessments and Data Monitoring: A Software Scoping Review with Recommendations for Future Developments
by Joany Mariño, Elisa Kasbohm, Stephan Struckmann, Lorenz A. Kapsner and Carsten O. Schmidt
Appl. Sci. 2022, 12(9), 4238; https://doi.org/10.3390/app12094238 - 22 Apr 2022
Cited by 2 | Viewed by 5512
Abstract
Data quality assessments (DQA) are necessary to ensure valid research results. Despite the growing availability of tools relevant to DQA in the R language, a systematic comparison of their functionalities is missing. Therefore, we review R packages related to data quality (DQ) and assess their scope against a DQ framework for observational health studies. Based on a systematic search, we screened more than 140 R packages related to DQA in the Comprehensive R Archive Network (CRAN). From these, we selected packages that target at least three of the four DQ dimensions (integrity, completeness, consistency, accuracy) in a reference framework. We evaluated the resulting 27 packages for general features (e.g., usability, metadata handling, output types, descriptive statistics) and the breadth of possible assessments. To facilitate comparisons, we applied all packages to a publicly available dataset from a cohort study. We found that the packages’ scope varies considerably regarding functionalities and usability. Only three packages follow a DQ concept, and some offer an extensive rule-based issue analysis. However, the reference framework does not include a few of the implemented functionalities, and it should be broadened accordingly. Improved use of metadata to empower DQA, and enhanced user-friendliness, such as GUIs and reports that grade the severity of DQ issues, stand out as the main directions for future developments.
(This article belongs to the Special Issue Data Science for Medical Informatics)
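
Metadata-driven checks of the kind the review recommends can be sketched briefly: variable-level completeness (the share of non-missing values) and rule-based checks against admissible ranges supplied as metadata, with a simple severity grade for reporting. This is a minimal Python illustration, not the API of any of the reviewed R packages; the variable names, limits, and the thresholds in `grade` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]


def completeness(records: List[Record], variables: List[str]) -> Dict[str, float]:
    """Share of records with a non-missing value for each variable."""
    return {v: sum(r.get(v) is not None for r in records) / len(records) for v in variables}


def range_violations(records: List[Record],
                     limits: Dict[str, Tuple[float, float]]) -> Dict[str, List[int]]:
    """Indices of records whose observed value lies outside [low, high] (metadata-defined)."""
    return {
        v: [i for i, r in enumerate(records) if r.get(v) is not None and not low <= r[v] <= high]
        for v, (low, high) in limits.items()
    }


def grade(rate: float, warn: float = 0.05, fail: float = 0.20) -> str:
    """Map an issue rate to a severity label (hypothetical thresholds)."""
    return "fail" if rate >= fail else "warn" if rate >= warn else "ok"


if __name__ == "__main__":
    data: List[Record] = [
        {"age": 34, "sbp": 120}, {"age": None, "sbp": 300},
        {"age": 150, "sbp": 110}, {"age": 61, "sbp": None},
    ]
    metadata = {"age": (0, 120), "sbp": (60, 250)}  # admissible ranges
    comp = completeness(data, list(metadata))
    viol = range_violations(data, metadata)
    for var in metadata:
        print(var, comp[var], grade(1 - comp[var]), viol[var], grade(len(viol[var]) / len(data)))
```

Keeping the limits in a metadata dictionary rather than in the check functions mirrors the review's point that richer metadata can drive DQA without changing code.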

17 pages, 317 KiB  
Review
AI Ethics—A Bird’s Eye View
by Maria Christoforaki and Oya Beyan
Appl. Sci. 2022, 12(9), 4130; https://doi.org/10.3390/app12094130 - 20 Apr 2022
Cited by 14 | Viewed by 5416
Abstract
The explosion of data-driven applications using Artificial Intelligence (AI) in recent years has given rise to a variety of ethical issues regarding data collection, annotation, and processing using mostly opaque algorithms, as well as the interpretation and employment of the results of the AI pipeline. The ubiquity of AI applications negatively impacts a variety of sensitive areas, ranging from discrimination against vulnerable populations to privacy invasion and the environmental cost that these algorithms entail, and brings the ever-present domain of AI ethics into focus. In this review article, we present a bird’s eye view of the AI ethics landscape, starting from a historical point of view and examining the moral issues introduced by big datasets and the application of non-symbolic AI algorithms, the normative approaches (principles and guidelines) to these issues and the ensuing criticism, as well as the actualization of these principles within the proposed frameworks. Subsequently, we focus on the concept of responsibility, both as the personal responsibility of AI practitioners and as sustainability, meaning the promotion of beneficence for both society and the domain, and on the role of professional certification and education in averting unethical choices. Finally, we conclude by indicating the multidisciplinary nature of AI ethics and suggesting future challenges.
(This article belongs to the Special Issue Data Science for Medical Informatics)