Health Data Information Retrieval

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Systems".

Deadline for manuscript submissions: 31 May 2024 | Viewed by 27624

Special Issue Editors


Guest Editor
Institute for High Performance Computing and Networking, National Research Council of Italy, 7-00185 Roma, Italy
Interests: e-health; security; electronic health records; interoperability

Guest Editor
Institute for High Performance Computing and Networking, National Research Council of Italy, 7-00185 Roma, Italy
Interests: software architectures; interoperability; information systems; health informatics

Special Issue Information

Dear Colleagues,

The MDPI Information Journal invites submissions to a Special Issue on “Health Data Information Retrieval”.

The increasing proliferation of digital health data coming from a variety of sources, including electronic health records, laboratory results, and personal device data, is bringing new opportunities for improving healthcare and research. Such data are characterized by wide heterogeneity in content, format, clinical domain, and representation, ranging from structured to unstructured forms (such as text, images, and signals). This raises a number of challenges relating to the effective storage, access, and searching of information for several purposes in the domain of healthcare.

Moreover, recent advances in machine learning, natural language processing, and big data analytics, along with the spread of health informatics standards, can facilitate systematic data collection and discovery from information retrieval systems.

Authors are invited to contribute by submitting original papers describing research findings, innovative solutions, and lessons learned in health data information retrieval. The aim of this Special Issue is to provide both a showcase to present the main research results in the area and an engine to introduce and explore new concepts.

Relevant topics of interest include but are not limited to the following:

  • Information extraction in healthcare;
  • Data processing in healthcare;
  • Distributed and large-scale health data access and management;
  • Information and networking security in healthcare;
  • Infrastructure and information services in healthcare;
  • Big data analytics in healthcare;
  • Search engines, big data search, and mining in healthcare;
  • Question answering for health information retrieval;
  • Query expansion, content classification, and clustering in healthcare;
  • NLP, machine learning, and computational linguistics for health information retrieval;
  • Web intelligence applications and search in healthcare.

Dr. Mario Ciampi
Dr. Mario Sicuranza
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Health information systems
  • Health data representation
  • Natural language processing
  • Big data analytics
  • Machine learning
  • Semantic search
  • Interoperability of networked information

Published Papers (8 papers)

Research

27 pages, 25304 KiB  
Article
Multiple Explainable Approaches to Predict the Risk of Stroke Using Artificial Intelligence
by Susmita S, Krishnaraj Chadaga, Niranjana Sampathila, Srikanth Prabhu, Rajagopala Chadaga and Swathi Katta S
Information 2023, 14(8), 435; https://0-doi-org.brum.beds.ac.uk/10.3390/info14080435 - 01 Aug 2023
Viewed by 2687
Abstract
Stroke occurs when a blood vessel in the brain ruptures or the brain’s blood supply is interrupted. Due to rupture or obstruction, the brain’s tissues cannot receive enough blood and oxygen. Stroke is a common cause of mortality among older people. Hence, loss of life and severe brain damage can be avoided if stroke is recognized and diagnosed early. Healthcare professionals can discover solutions more quickly and accurately using artificial intelligence (AI) and machine learning (ML). In this work, we show how to predict stroke in patients using heterogeneous classifiers and explainable artificial intelligence (XAI). The multistack of ML models surpassed all other classifiers, with accuracy, recall, and precision of 96%, 96%, and 96%, respectively. Explainable artificial intelligence is a collection of frameworks and tools that aid in understanding and interpreting predictions provided by machine learning algorithms. Five diverse XAI methods, namely SHapley Additive exPlanations (SHAP), ELI5, QLattice, Local Interpretable Model-agnostic Explanations (LIME), and Anchor, have been used to decipher the model predictions. This research aims to enable healthcare professionals to provide patients with more personalized and efficient care, while also providing a screening architecture with automated tools that can be used to revolutionize stroke prevention and treatment.
(This article belongs to the Special Issue Health Data Information Retrieval)

14 pages, 5861 KiB  
Article
An Efficient Healthcare Data Mining Approach Using Apriori Algorithm: A Case Study of Eye Disorders in Young Adults
by Kanza Gulzar, Muhammad Ayoob Memon, Syed Muhammad Mohsin, Sheraz Aslam, Syed Muhammad Abrar Akber and Muhammad Asghar Nadeem
Information 2023, 14(4), 203; https://0-doi-org.brum.beds.ac.uk/10.3390/info14040203 - 27 Mar 2023
Cited by 6 | Viewed by 2792
Abstract
In the public health sector and the field of medicine, the popularity of data mining and its usage in knowledge discovery in databases (KDD) are rising. The growing use of data mining has uncovered innovative healthcare links to support decision making. For this reason, there is a great possibility to better diagnose patients’ diseases and maintain the quality of healthcare services in hospitals. So, there is an urgent need to make disease diagnosis possible by discovering the hidden patterns from the patients’ history information in developing countries. This work is a step towards using the extracted knowledge to enhance the quality of healthcare facilities. In this paper, we propose a web-centered hospital information management system (HIMS) that identifies frequent patterns in data from eye disorder patients using the association rule-based Apriori data mining technique. The proposed framework has the capability to overcome all the key issues and problems in the current hospital information management system regarding data analysis and reporting services. For this purpose, data were collected from more than 1000 university students (Chinese citizens) both online and manually (printed questionnaire). After applying the Apriori algorithm to the collected data, we found that almost 140 individuals out of 1035 had myopia (near-sightedness), at a current age of 22 years, and that there were no male patients found with myopia. We concluded that their clinical relevance and utility can generate favorable results from prospective clinical studies by mapping out the habits or lifestyles that potentially lead to fatal diseases. In the future, we plan to extend this work to fully automate HIMS to help practitioners diagnose the causes of various diseases by extracting patient lifestyle patterns.
(This article belongs to the Special Issue Health Data Information Retrieval)
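
For readers unfamiliar with the Apriori technique named in this abstract, the following is a minimal sketch of frequent-itemset mining in plain Python. It is not the authors' HIMS implementation; the function name `frequent_itemsets`, the `min_support` threshold, and the toy survey records in the usage note are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Illustrative Apriori sketch: candidates of size k+1 are built only from
    frequent itemsets of size k, because any superset of an infrequent
    itemset must itself be infrequent.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the support of each candidate and keep the frequent ones.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset must already be frequent.
        candidates = {
            cand for cand in joined
            if all(frozenset(sub) in level for sub in combinations(cand, k - 1))
        }
    return frequent
```

Calling `frequent_itemsets(records, 0.5)` on a list of sets such as `{"female", "screen>6h", "myopia"}` would return each attribute combination that appears in at least half of the records, together with its support. Association rules can then be derived by comparing the support of an itemset with that of its subsets.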

17 pages, 3515 KiB  
Article
Smart Machine Health Prediction Based on Machine Learning in Industry Environment
by Sagar Yeruva, Jeshmitha Gunuganti, Sravani Kalva, Surender Reddy Salkuti and Seong-Cheol Kim
Information 2023, 14(3), 181; https://0-doi-org.brum.beds.ac.uk/10.3390/info14030181 - 14 Mar 2023
Viewed by 3705
Abstract
In an industrial setting, consistent production and machine maintenance can help any company become successful. Machine health monitoring is a method of observing the status of a machine to predict mechanical wear and machine failure. The most often utilized traditional approaches are reactive and preventive maintenance. These approaches are unreliable and wasteful in terms of time and resource utilization. The use of system health management in conjunction with a predictive maintenance strategy allows maintenance to be scheduled in such a way that device malfunction, and thus its repercussions, are avoided. The Internet of Things (IoT) can help monitor equipment health and provide the best outcomes, especially in an industrial setting. IoT and machine learning models are quite successful in providing ongoing knowledge and comprehensive study on infrastructure performance. Our suggested technique uses a mobile application that seeks to anticipate the machine’s health status using a classification method utilizing IoT and machine learning technologies, which might benefit the industry environment by alerting the appropriate maintenance team before significant harm is inflicted on the system and normal operations are disrupted. A comparison of decision tree, XGBoost, SVM, and KNN performance has been carried out. According to our findings, XGBoost achieves higher classification accuracy compared to the other algorithms. As a result, this model is selected for creating a user-based application that allows the user to easily check the state of the machine’s health.
(This article belongs to the Special Issue Health Data Information Retrieval)

15 pages, 1615 KiB  
Article
Challenges Encountered and Lessons Learned when Using a Novel Anonymised Linked Dataset of Health and Social Care Records for Public Health Intelligence: The Sussex Integrated Dataset
by Elizabeth Ford, Richard Tyler, Natalie Johnston, Vicki Spencer-Hughes, Graham Evans, Jon Elsom, Anotida Madzvamuse, Jacqueline Clay, Kate Gilchrist and Melanie Rees-Roberts
Information 2023, 14(2), 106; https://0-doi-org.brum.beds.ac.uk/10.3390/info14020106 - 08 Feb 2023
Cited by 1 | Viewed by 2228
Abstract
Background: In the United Kingdom National Health Service (NHS), digital transformation programmes have resulted in the creation of pseudonymised linked datasets of patient-level medical records across all NHS and social care services. In the Southeast England counties of East and West Sussex, public health intelligence analysts based in local authorities (LAs) aimed to use the newly created “Sussex Integrated Dataset” (SID) for identifying cohorts of patients who are at risk of early onset multiple long-term conditions (MLTCs). Analysts from the LAs were among the first to have access to this new dataset. Methods: Data access was assured as the analysts were employed within joint data controller organisations and logged into the data via virtual machines following approval of a data access request. Analysts examined the demographics and medical history of patients against multiple external sources, identifying data quality issues and developing methods to establish true values for cases with multiple conflicting entries. Service use was plotted over timelines for individual patients. Results: Early evaluation of the data revealed multiple conflicting within-patient values for age, sex, ethnicity and date of death. This was partially resolved by creating a “demographic milestones” table, capturing demographic details for each patient for each year of the data available in the SID. Older data (≥5 y) was found to be sparse in events and diagnoses. Open-source code lists for defining long-term conditions were poor at identifying the expected number of patients, and bespoke code lists were developed by hand and validated against other sources of data. At the start, the age and sex distributions of patients submitted by GP practices were substantially different from those published by NHS Digital, and errors in data processing were identified and rectified. Conclusions: While new NHS linked datasets appear a promising resource for tracking multi-service use, MLTCs and health inequalities, substantial investment in data analysis and data architect time is necessary to ensure high enough quality data for meaningful analysis. Our team made conceptual progress in identifying the skills needed for programming analyses and understanding the types of questions which can be asked and answered reliably in these datasets.
(This article belongs to the Special Issue Health Data Information Retrieval)

19 pages, 4179 KiB  
Article
Case Study of Multichannel Interaction in Healthcare Services
by Ailton Moreira, Júlio Duarte and Manuel Filipe Santos
Information 2023, 14(1), 37; https://0-doi-org.brum.beds.ac.uk/10.3390/info14010037 - 07 Jan 2023
Cited by 2 | Viewed by 2510
Abstract
A multichannel interaction service is a practice whereby organizations communicate and interact with their existing customers and potential new customers through different channels. This article presents a brief case study of multichannel interaction in healthcare services, which studies the viability of continuous multichannel interaction for personalized healthcare services to enable health professionals to follow up and monitor patients in home-based care. Furthermore, this study aims to explore the possibility of the continuity and complementarity of the interactions across different communication channels with the patients. The data used for this study was gathered during the first wave of the COVID-19 pandemic. This study showed that despite this type of interaction being relatively new in healthcare services, it has considerable potential for improving the relationship between patients, health professionals, and care providers. Upon completion of the data analysis, several conclusions were drawn. One such conclusion was the ability to maintain continuity of interaction across multiple channels, as well as the synergy between the different channels of interaction available to patients and the impact this has on the way patients and health professionals interact. Additionally, it was determined that the complementarity of different interaction channels is crucial when implementing multichannel interaction services. Furthermore, the implementation of this solution resulted in improved communication between patients and health professionals. It also decreased health professionals’ workload and reduced care providers’ costs for remote patient follow-up.
(This article belongs to the Special Issue Health Data Information Retrieval)

13 pages, 1769 KiB  
Article
Semi-Automatic Systematic Literature Reviews and Information Extraction of COVID-19 Scientific Evidence: Description and Preliminary Results of the COKE Project
by Davide Golinelli, Andrea Giovanni Nuzzolese, Francesco Sanmarchi, Luana Bulla, Misael Mongiovì, Aldo Gangemi and Paola Rucci
Information 2022, 13(3), 117; https://0-doi-org.brum.beds.ac.uk/10.3390/info13030117 - 28 Feb 2022
Cited by 6 | Viewed by 3381
Abstract
The COVID-19 pandemic highlighted the importance of validated and updated scientific information to help policy makers, healthcare professionals, and the public. The speed in disseminating reliable information and the subsequent guidelines and policy implementation are also essential to save as many lives as possible. Trustworthy guidelines should be based on a systematic evidence review which uses reproducible analytical methods to collect secondary data and analyse them. However, the guidelines’ drafting process is time consuming and requires a great deal of resources. This paper aims to highlight the importance of accelerating and streamlining the extraction and synthesis of scientific evidence, specifically within the systematic review process. To do so, this paper describes the COKE (COVID-19 Knowledge Extraction framework for next generation discovery science) Project, which involves the use of machine reading and deep learning to design and implement a semi-automated system that supports and enhances the systematic literature review and guideline drafting processes. Specifically, we propose a framework for aiding in the literature selection and navigation process that employs natural language processing and clustering techniques for selecting and organizing the literature for human consultation, according to PICO (Population/Problem, Intervention, Comparison, and Outcome) elements. We show some preliminary results of the automatic classification of sentences on a dataset of abstracts related to COVID-19.
(This article belongs to the Special Issue Health Data Information Retrieval)

16 pages, 1938 KiB  
Article
A Privacy-Preserving and Standard-Based Architecture for Secondary Use of Clinical Data
by Mario Ciampi, Mario Sicuranza and Stefano Silvestri
Information 2022, 13(2), 87; https://0-doi-org.brum.beds.ac.uk/10.3390/info13020087 - 13 Feb 2022
Cited by 6 | Viewed by 5767
Abstract
The heterogeneity of the formats and standards of clinical data, which include structured, semi-structured, and unstructured data, together with the sensitive information contained in them, requires the definition of specific approaches that can extract the valuable information buried in such data. Although many challenges and issues that have not been fully addressed still exist when this information must be processed and used for further purposes, the most recent techniques based on machine learning and big data analytics can support the information extraction process for the secondary use of clinical data. In particular, these techniques can facilitate the transformation of heterogeneous data into a common standard format. Moreover, they can also be exploited to define anonymization or pseudonymization approaches, respecting the privacy requirements stated in the General Data Protection Regulation, Health Insurance Portability and Accountability Act and other national and regional laws. In fact, compliance with these laws requires that only de-identified clinical and personal data can be processed for secondary analyses, in particular when data is shared or exchanged across different institutions. This work proposes a modular architecture capable of collecting clinical data from heterogeneous sources and transforming them into useful data for secondary uses, such as research, governance, and medical education purposes. The proposed architecture is able to exploit appropriate modules and algorithms, carry out the transformations (pseudonymization and standardization) required to use data for secondary purposes, and provide efficient tools to facilitate the retrieval and analysis processes. Preliminary experimental tests show good accuracy in terms of quantitative evaluations.
(This article belongs to the Special Issue Health Data Information Retrieval)
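
The abstract does not say how the architecture performs pseudonymization. As one possible illustration only, the sketch below uses keyed hashing (HMAC-SHA256 from the Python standard library) to replace a patient identifier with a stable pseudonym and drop direct identifiers. The function names, the `patient_id` field, and the list of dropped fields are assumptions made for this example, not details taken from the paper.

```python
import hashlib
import hmac


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    can still be linked, while the mapping cannot be recomputed without the key.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record, secret_key, id_field="patient_id",
                        drop_fields=("name", "address")):
    """Return a copy of the record without direct identifiers and with a pseudonymized ID.

    The field names are hypothetical; a real deployment would follow its own
    data model and legal requirements.
    """
    cleaned = {key: value for key, value in record.items() if key not in drop_fields}
    cleaned[id_field] = pseudonymize_id(str(record[id_field]), secret_key)
    return cleaned
```

A keyed hash is used here rather than a plain hash because identifiers drawn from a small space could otherwise be recovered by hashing every possible value. Key management and the handling of quasi-identifiers (such as dates of birth or postcodes) are outside the scope of this sketch.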

22 pages, 3001 KiB  
Article
A Text Mining Approach in the Classification of Free-Text Cancer Pathology Reports from the South African National Health Laboratory Services
by Okechinyere J. Achilonu, Victor Olago, Elvira Singh, René M. J. C. Eijkemans, Gideon Nimako and Eustasius Musenge
Information 2021, 12(11), 451; https://0-doi-org.brum.beds.ac.uk/10.3390/info12110451 - 30 Oct 2021
Cited by 6 | Viewed by 2832
Abstract
A cancer pathology report is a valuable medical document that provides information for clinical management of the patient and evaluation of health care. However, there are variations in the quality of reporting in free-text style formats, ranging from comprehensive to incomplete reporting. Moreover, the increasing incidence of cancer has generated a high throughput of pathology reports. Hence, manual extraction and classification of information from these reports can be intrinsically complex and resource-intensive. This study aimed to (i) evaluate the quality of over 80,000 breast, colorectal, and prostate cancer free-text pathology reports and (ii) assess the effectiveness of random forest (RF) and variants of support vector machine (SVM) in the classification of reports into benign and malignant classes. The study approach comprises data preprocessing, visualisation, feature selection, text classification, and evaluation of performance metrics. The performance of the classifiers was evaluated across various feature sizes, which were jointly selected by four filter feature selection methods. The feature selection methods identified established clinical terms, which are synonymous with each of the three cancers. Uni-gram tokenisation using the classifiers showed that the predictive power of the RF model was consistent across various feature sizes, with overall F-scores of 95.2%, 94.0%, and 95.3% for breast, colorectal, and prostate cancer classification, respectively. The radial SVM achieved better classification performance compared with its linear variant for most of the feature sizes. The classifiers also achieved high precision, recall, and accuracy. This study supports a nationally agreed standard in pathology reporting and the use of text mining for encoding, classifying, and production of high-quality information abstractions for cancer prognosis and research.
(This article belongs to the Special Issue Health Data Information Retrieval)
