BIR: Biomedical Information Retrieval System for Cancer Treatment in Electronic Health Record Using Transformers
Abstract
1. Introduction
- We propose novel techniques for biomedical information retrieval that identify related or similar EHRs across medical problems, supporting symptom detection from existing records and the testing and prediction of treatments in hospital clinical procedures.
- We evaluated the proposed approach on a dataset of EHRs, where it outperformed state-of-the-art BIR systems on several tasks, including medical question answering and information extraction. The approach learns the semantic relationships between words in biomedical documents, which is essential for effective BIR.
- Clinical texts are often long, complex, and dense with medical jargon, which makes it difficult for traditional attention mechanisms to focus on the relevant parts of a sentence. We evaluated our attention mechanism on the Informatics for Integrating Biology and the Bedside (i2b2) dataset of clinical texts, where it outperformed state-of-the-art attention mechanisms on several tasks, including medical question answering and information extraction.
2. Related Work
2.1. Artificial Intelligence (AI)–Assisted Tools
2.2. Word-Level Attention Mechanisms
2.3. Pretrained Language Models (PLMs) for Summarization
3. Method
3.1. Electronic Health Records Architecture
3.2. Biomedical Information Retrieval (BIR) Approach
3.3. Recurrent Neural Networks (RNNs)
3.3.1. EHR Feature Extraction Layer
3.3.2. EHR Linear Segment Attention Layer
3.4. Evaluation Metrics
4. Implementation and Results
4.1. Data Preprocessing
4.2. Parameter Tuning
4.2.1. Baseline Discussion
4.2.2. Results and Discussion
4.3. Implications
4.3.1. Theoretical Implications
4.3.2. Practical Implications
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Matson, R.P.; Niesen, M.J.; Levy, E.R.; Opp, D.N.; Lenehan, P.J.; Donadio, G.; O’Horo, J.C.; Venkatakrishnan, A.J.; Badley, A.D.; Soundararajan, V. Paediatric Safety Assessment of BNT162b2 Vaccination in a Multistate Hospital-Based Electronic Health Record System in the USA: A Retrospective Analysis. Lancet Digit. Health 2023, 5, e206–e216.
- Polnaszek, B.; Gilmore-Bykovskyi, A.; Hovanes, M.; Roiland, R.; Ferguson, P.; Brown, R.; Kind, A.J. Overcoming the Challenges of Unstructured Data in Multi-Site, Electronic Medical Record-Based Abstraction. Med. Care 2016, 54, e65.
- Howard, J.; Clark, E.C.; Friedman, A.; Crosson, J.C.; Pellerano, M.; Crabtree, B.F.; Karsh, B.-T.; Jaen, C.R.; Bell, D.S.; Cohen, D.J. Electronic Health Record Impact on Work Burden in Small, Unaffiliated, Community-Based Primary Care Practices. J. Gen. Intern. Med. 2013, 28, 107–113.
- Nadarajah, R.; Wu, J.; Hogg, D.; Raveendra, K.; Nakao, Y.M.; Nakao, K.; Arbel, R.; Haim, M.; Zahger, D.; Parry, J. Prediction of Short-Term Atrial Fibrillation Risk Using Primary Care Electronic Health Records. Heart 2023, 109, 1072–1079.
- Kreimeyer, K.; Foster, M.; Pandey, A.; Arya, N.; Halford, G.; Jones, S.F.; Forshee, R.; Walderhaug, M.; Botsis, T. Natural Language Processing Systems for Capturing and Standardizing Unstructured Clinical Information: A Systematic Review. J. Biomed. Inform. 2017, 73, 14–29.
- Luís, C.; Guerra-Carvalho, B.; Braga, P.C.; Guedes, C.; Patrício, E.; Alves, M.G.; Fernandes, R.; Soares, R. The Influence of Adipocyte Secretome on Selected Metabolic Fingerprints of Breast Cancer Cell Lines Representing the Four Major Breast Cancer Subtypes. Cells 2023, 12, 2123.
- Sharma, D.C. India Still Struggles with Rural Doctor Shortages. Lancet 2015, 386, 2381–2382.
- Savova, G.K.; Danciu, I.; Alamudun, F.; Miller, T.; Lin, C.; Bitterman, D.S.; Tourassi, G.; Warner, J.L. Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records. Cancer Res. 2019, 79, 5463–5470.
- Carrell, D.S.; Schoen, R.E.; Leffler, D.A.; Morris, M.; Rose, S.; Baer, A.; Crockett, S.D.; Gourevitch, R.A.; Dean, K.M.; Mehrotra, A. Challenges in Adapting Existing Clinical Natural Language Processing Systems to Multiple, Diverse Health Care Settings. J. Am. Med. Inform. Assoc. 2017, 24, 986–991.
- Tamang, S.; Humbert-Droz, M.; Gianfrancesco, M.; Izadi, Z.; Schmajuk, G.; Yazdany, J. Practical Considerations for Developing Clinical Natural Language Processing Systems for Population Health Management and Measurement. JMIR Med. Inform. 2023, 11, e37805.
- Anderson, J.E.; Chang, D.C. Using Electronic Health Records for Surgical Quality Improvement in the Era of Big Data. JAMA Surg. 2015, 150, 24–29.
- Chen, X.; Ouyang, C.; Liu, Y.; Bu, Y. Improving the Named Entity Recognition of Chinese Electronic Medical Records by Combining Domain Dictionary and Rules. Int. J. Environ. Res. Public Health 2020, 17, 2687.
- Buthelezi, L.A.; Pillay, S.; Ntuli, N.N.; Gcanga, L.; Guler, R. Antisense Therapy for Infectious Diseases. Cells 2023, 12, 2119.
- Dong, X.; Halevy, A. Indexing Dataspaces. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China, 11–14 June 2007; pp. 43–54.
- Jensen, P.B.; Jensen, L.J.; Brunak, S. Mining Electronic Health Records: Towards Better Research Applications and Clinical Care. Nat. Rev. Genet. 2012, 13, 395–405.
- Rink, B.; Harabagiu, S.; Roberts, K. Automatic Extraction of Relations between Medical Concepts in Clinical Texts. J. Am. Med. Inform. Assoc. 2011, 18, 594–600.
- Mukherjea, S.; Bamba, B.; Kankar, P. Information Retrieval and Knowledge Discovery Utilizing a Biomedical Patent Semantic Web. IEEE Trans. Knowl. Data Eng. 2005, 17, 1099–1110.
- Giglia, E. Quertle and KNALIJ: Searching PubMed Has Never Been so Easy and Effective. Eur. J. Phys. Rehabil. Med. 2011, 47, 687–690.
- Bao, Y.; Deng, Z.; Wang, Y.; Kim, H.; Armengol, V.D.; Acevedo, F.; Ouardaoui, N.; Wang, C.; Parmigiani, G.; Barzilay, R. Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes. JCO Clin. Cancer Inform. 2019, 1, 1–9.
- Kilicoglu, H.; Demner-Fushman, D.; Rindflesch, T.C.; Wilczynski, N.L.; Haynes, R.B. Towards Automatic Recognition of Scientifically Rigorous Clinical Research Evidence. J. Am. Med. Inform. Assoc. 2009, 16, 25–31.
- Kilicoglu, H. Biomedical Text Mining for Research Rigor and Integrity: Tasks, Challenges, Directions. Brief. Bioinform. 2018, 19, 1400–1414.
- Saiz, F.S.; Sanders, C.; Stevens, R.; Nielsen, R.; Britt, M.; Yuravlivker, L.; Preininger, A.M.; Jackson, G.P. Artificial Intelligence Clinical Evidence Engine for Automatic Identification, Prioritization, and Extraction of Relevant Clinical Oncology Research. JCO Clin. Cancer Inform. 2021, 5, 102–111.
- Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, 7–12 August 2016; pp. 207–212.
- Lin, Y.; Shen, S.; Liu, Z.; Luan, H.; Sun, M. Neural Relation Extraction with Selective Attention over Instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 2124–2133.
- Mahdi, S.S.; Battineni, G.; Khawaja, M.; Allana, R.; Siddiqui, M.K.; Agha, D. How Does Artificial Intelligence Impact Digital Healthcare Initiatives? A Review of AI Applications in Dental Healthcare. Int. J. Inf. Manag. Data Insights 2023, 3, 100144.
- Strunga, M.; Urban, R.; Surovková, J.; Thurzo, A. Artificial Intelligence Systems Assisting in the Assessment of the Course and Retention of Orthodontic Treatment. Healthcare 2023, 11, 683.
- Segev, A.; Leshno, M.; Zviran, M. Internet as a Knowledge Base for Medical Diagnostic Assistance. Expert Syst. Appl. 2007, 33, 251–255.
- Tsipouras, M.G.; Exarchos, T.P.; Fotiadis, D.I.; Kotsia, A.P.; Vakalis, K.V.; Naka, K.K.; Michalis, L.K. Automated Diagnosis of Coronary Artery Disease Based on Data Mining and Fuzzy Modeling. IEEE Trans. Inf. Technol. Biomed. 2008, 12, 447–458.
- Liu, Y.; Lapata, M. Text Summarization with Pretrained Encoders. arXiv 2019, arXiv:1908.08345.
- El-Kassas, W.S.; Salama, C.R.; Rafea, A.A.; Mohamed, H.K. Automatic Text Summarization: A Comprehensive Survey. Expert Syst. Appl. 2021, 165, 113679.
- Du, Y.; Li, Q.; Wang, L.; He, Y. Biomedical-Domain Pre-Trained Language Model for Extractive Summarization. Knowl.-Based Syst. 2020, 199, 105964.
- Aaditya, M.D.; Lal, D.M.; Singh, K.P.; Ojha, M. Layer Freezing for Regulating Fine-Tuning in BERT for Extractive Text Summarization. In Proceedings of the PACIS, Dubai, United Arab Emirates, 12 July 2021; p. 182.
- Moradi, M.; Dorffner, G.; Samwald, M. Deep Contextualized Embeddings for Quantifying the Informative Content in Biomedical Text Summarization. Comput. Methods Programs Biomed. 2020, 184, 105117.
- Padmakumar, V.; He, H. Unsupervised Extractive Summarization Using Pointwise Mutual Information. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, 19–23 April 2021; pp. 2505–2512.
- Wang, B.; Xie, Q.; Pei, J.; Chen, Z.; Tiwari, P.; Li, Z.; Fu, J. Pre-Trained Language Models in Biomedical Domain: A Systematic Survey. ACM Comput. Surv. 2023, 56, 1–52.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A. Language Models Are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
- Feng, F.; Yang, Y.; Cer, D.; Arivazhagan, N.; Wang, W. Language-Agnostic Bert Sentence Embedding. arXiv 2020, arXiv:2007.01852.
- Tay, Y.; Dehghani, M.; Bahri, D.; Metzler, D. Efficient Transformers: A Survey. ACM Comput. Surv. CSUR 2020, 55, 109.
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv 2019, arXiv:1910.01108.
- Mutlu, B.; Sezer, E.A. Enhanced Sentence Representation for Extractive Text Summarization: Investigating the Syntactic and Semantic Features and Their Contribution to Sentence Scoring. Expert Syst. Appl. 2023, 227, 120302.
- Qiu, J.; Wang, Q.; Zhou, Y.; Ruan, T.; Gao, J. Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions. In Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 3–6 December 2018; IEEE: New York, NY, USA, 2018; pp. 935–942.
- Demner-Fushman, D.; Antani, S.; Simpson, M.; Thoma, G.R. Design and Development of a Multimodal Biomedical Information Retrieval System. J. Comput. Sci. Eng. 2012, 6, 168–177.
- Mohan, S.; Fiorini, N.; Kim, S.; Lu, Z. A Fast Deep Learning Model for Textual Relevance in Biomedical Information Retrieval. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; International World Wide Web Conferences Steering Committee: Republic and Canton of Geneva, CHE, 2018; pp. 77–86.
- Huang, X.; Hu, Q. A Bayesian Learning Approach to Promoting Diversity in Ranking for Biomedical Information Retrieval. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, MA, USA, 19–23 July 2009; Association for Computing Machinery: New York, NY, USA, 2009; pp. 307–314.
- Trieschnigg, D. Proof of Concept: Concept-Based Biomedical Information Retrieval. SIGIR Forum 2011, 44, 89.
- Xu, B.; Lin, H.; Lin, Y. Learning to Refine Expansion Terms for Biomedical Information Retrieval Using Semantic Resources. IEEE/ACM Trans. Comput. Biol. Bioinform. 2019, 16, 954–966.
- Xu, B.; Lin, H.; Lin, Y.; Ma, Y.; Yang, L.; Wang, J.; Yang, Z. Improve Biomedical Information Retrieval Using Modified Learning to Rank Methods. IEEE/ACM Trans. Comput. Biol. Bioinform. 2018, 15, 1797–1809.
- Hanauer, D.A.; Barnholtz-Sloan, J.S.; Beno, M.F.; Del Fiol, G.; Durbin, E.B.; Gologorskaya, O.; Harris, D.; Harnett, B.; Kawamoto, K.; May, B. Electronic Medical Record Search Engine (EMERSE): An Information Retrieval Tool for Supporting Cancer Research. JCO Clin. Cancer Inform. 2020, 4, 454–463.
- Adler-Milstein, J.; Bates, D.W. Paperless Healthcare: Progress and Challenges of an IT-Enabled Healthcare System. Bus. Horiz. 2010, 53, 119–130.
- Zhu, D.; Wu, S.T.; Masanz, J.J.; Carterette, B.; Liu, H. Using Discharge Summaries to Improve Information Retrieval in Clinical Domain. In Proceedings of the CLEF, Valencia, Spain, 11 September 2013.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
- Nguyen, D.Q.; Verspoor, K. End-to-End Neural Relation Extraction Using Deep Biaffine Attention. In Proceedings of the European Conference on Information Retrieval, Cologne, Germany, 14–18 April 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 729–738.
- Alsentzer, E.; Murphy, J.R.; Boag, W.; Weng, W.-H.; Jin, D.; Naumann, T.; McDermott, M. Publicly Available Clinical BERT Embeddings. arXiv 2019, arXiv:1904.03323.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A Pre-Trained Biomedical Language Representation Model for Biomedical Text Mining. Bioinformatics 2020, 36, 1234–1240.
- Frei, J.; Frei-Stuber, L.; Kramer, F. GERNERMED++: Semantic Annotation in German Medical NLP through Transfer-Learning, Translation and Word Alignment. J. Biomed. Inform. 2023, 147, 104513.
- Jettakul, A.; Wichadakul, D.; Vateekul, P. Relation Extraction between Bacteria and Biotopes from Biomedical Texts with Attention Mechanisms and Domain-Specific Contextual Representations. BMC Bioinform. 2019, 20, 627.
- Li, F.; Jin, Y.; Liu, W.; Rawat, B.P.S.; Cai, P.; Yu, H. Fine-Tuning Bidirectional Encoder Representations from Transformers (BERT)–Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study. JMIR Med. Inform. 2019, 7, e14830.
- Jahanbakhsh, M.; Rabiei, R.; Asadi, F.; Moghaddasi, H. Electronic Health Record Architecture: A Systematic Review. J. Paramed. Sci. 2016, 7, 29–36.
- Ahmad, P.N.; Shah, A.M.; Lee, K. A Review on Electronic Health Record Text-Mining for Biomedical Name Entity Recognition in Healthcare Domain. Healthcare 2023, 11, 1268.
- Pruski, C.; Wisniewski, F. Efficient Medical Information Retrieval in Encrypted Electronic Health Records. In Quality of Life through Quality of Information; IOS Press: Amsterdam, The Netherlands, 2012; pp. 225–229.
- Lerner, I.; Paris, N.; Tannier, X. Terminologies Augmented Recurrent Neural Network Model for Clinical Named Entity Recognition. J. Biomed. Inform. 2020, 102, 103356.
- Li, X.; Wong, K.-C. Evolutionary Multiobjective Clustering and Its Applications to Patient Stratification. IEEE Trans. Cybern. 2019, 49, 1680–1693.
- Li, I.; Pan, J.; Goldwasser, J.; Verma, N.; Wong, W.P.; Nuzumlalı, M.Y.; Rosand, B.; Li, Y.; Zhang, M.; Chang, D. Neural Natural Language Processing for Unstructured Data in Electronic Health Records: A Review. arXiv 2021, arXiv:2107.02975.
- Korn, P.; Sidiropoulos, N.; Faloutsos, C.; Siegel, E.; Protopapas, Z. Fast and Effective Retrieval of Medical Tumor Shapes. IEEE Trans. Knowl. Data Eng. 1998, 10, 889–904.
- Jain, H.; Thao, C.; Zhao, H. Enhancing Electronic Medical Record Retrieval through Semantic Query Expansion. Inf. Syst. e-Bus. Manag. 2012, 10, 165–181.
- Yang, B.; Ye, M.; Tan, Q.; Yuen, P.C. Cross-Domain Missingness-Aware Time-Series Adaptation With Similarity Distillation in Medical Applications. IEEE Trans. Cybern. 2022, 52, 3394–3407.
- Porkodi, V.; Karuppusamy, S.A. Classification of Chronic Obstructive Pulmonary Disease (COPD) Using Gabor Filter With SVM Classifier. Int. J. Eng. Adv. Technol. 2019, 9, 787–790.
- Jagannatha, A.N.; Yu, H. Bidirectional RNN for Medical Event Detection in Electronic Health Records. Proc. Conf. 2016, 2016, 473.
- Luu, T.M.; Phan, R.; Davey, R.; Chetty, G. Clinical Name Entity Recognition Based on Recurrent Neural Networks. In Proceedings of the 2018 18th International Conference on Computational Science and Applications (ICCSA), Melbourne, VIC, Australia, 2–5 July 2018; IEEE: New York, NY, USA, 2018; pp. 1–9.
- Lasko, T.A.; Denny, J.C.; Levy, M.A. Computational Phenotype Discovery Using Unsupervised Feature Learning over Noisy, Sparse, and Irregular Clinical Data. PLoS ONE 2013, 8, e66341.
- Rotsztejn, J.; Hollenstein, N.; Zhang, C. Eth-Ds3lab at Semeval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction. arXiv 2018, arXiv:1804.02042.
- Song, H.; Rajan, D.; Thiagarajan, J.; Spanias, A. Attend and Diagnose: Clinical Time Series Analysis Using Attention Models. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32, pp. 4090–4098.
- Graves, A.; Schmidhuber, J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Netw. 2005, 18, 602–610.
- Tjandra, A.; Sakti, S.; Manurung, R.; Adriani, M.; Nakamura, S. Gated Recurrent Neural Tensor Network. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; IEEE: New York, NY, USA, 2016; pp. 448–455.
- Yuan, M.; Ren, J. Numerical Feature Transformation-Based Sequence Generation Model for Multi-Disease Diagnosis. Int. J. Pattern Recognit. Artif. Intell. 2021, 35, 2159034.
- Liu, Y.; Gou, X. A Text Classification Method Based on Graph Attention Networks. In Proceedings of the 2021 International Conference on Information Technology and Biomedical Engineering (ICITBE), Nanchang, China, 24–26 December 2021; IEEE: New York, NY, USA, 2021; pp. 35–39.
- Patrick, J.D.; Nguyen, D.H.M.; Wang, Y.; Li, M. I2b2 Challenges in Clinical Natural Language Processing 2010. In Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data, i2b2, Boston, MA, USA, 2010.
- Prechelt, L. Automatic Early Stopping Using Cross Validation: Quantifying the Criteria. Neural Netw. 1998, 11, 761–767.
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. HuggingFace’s Transformers: State-of-the-Art Natural Language Processing. arXiv 2019, arXiv:1910.03771.
- Chawla, N.V.; Japkowicz, N.; Kotcz, A. Special Issue on Learning from Imbalanced Data Sets. ACM SIGKDD Explor. Newsl. 2004, 6, 1–6.
- Sahu, S.K.; Anand, A.; Oruganty, K.; Gattu, M. Relation Extraction from Clinical Texts Using Domain Invariant Convolutional Neural Network. arXiv 2016, arXiv:1606.09370.
- Solt, I.; Szidarovszky, F.P.; Tikk, D. Concept, Assertion and Relation Extraction at the 2010 I2b2 Relation Extraction Challenge Using Parsing Information and Dictionaries. In Proceedings of the 4th i2b2/VA Workshop 2010, Washington, DC, USA, 13 November 2010.
- Bhatia, S.; Kumar, A.; Khan, M.M. Role of Genetic Algorithm in Optimization of Hindi Word Sense Disambiguation. IEEE Access 2022, 10, 75693–75707.
- Ji, Z.; Ghiasvand, O.; Wu, S.; Xu, H. A Discrete Joint Model for Entity and Relation Extraction from Clinical Notes. AMIA Summits Transl. Sci. Proc. 2021, 2021, 315.
Clinical Text Problem Samples |
---|
Doctor: “He was given Lasix to prevent him from congestive heart failure.” |
People who are currently diagnosed with cancer, including breast cancer, have a higher risk of severe illness if they get COVID-19. |
Chemotherapy and immunotherapy can weaken the immune system and possibly cause lung problems. |
Pneumonia is an infection that inflames the air sacs in one or both lungs. |
Information Extraction Techniques | Proposed Method | Limitation |
---|---|---|
Biomedical information in EHR [42] | Combination of multimodal techniques and tools | The current user interface and the usefulness of the search features |
Document’s text to a keyword style query [43] | Query-document delta matrix passed through deep feedforward | A relatively small amount of training data |
Bayesian learning approach [44] | Biomedical IR performance through diversity and a reranking algorithm | TREC 2004–2007 Genomics datasets |
Biomedical domain knowledge IR [45] | A cross-linguistic framework for monolingual and concept-based retrieval of biomedical information | Concept-based retrieval and user system communication |
Biomedical query expansion [46] | Pseudo-relevance feedback method based on MeSH, which combines information with a corpus | Extracting biomedical feature resources for optimizing expansion term refinement |
Learning manual information [47] | Optimal ranking strategy and groupwise learning boost the diversity of retrieved relevant documents | Automatic aspect mining when the dataset contains no such annotations |
Tool for Electronic Medical Record Search Engine (EMERSE) [48] | EMERSE is a Web-based application that supports cancer research online (http://www.webmd.com/cancer/ and http://www.cancer.gov/, accessed on 28 March 2023) | Involves securely networking sites for obfuscated counts |
Point of healthcare IE [49] | Clinical care or healthcare IR systems | Manual healthcare IR |
Electronic medical record [50] | Primarily investigated three research questions in medical IR | Inclusion of entity attributes, web text preprocessing, and cross-validation |
Patient Feature | Record 101 | Record 102 | Record 103 | Record 104 | Record 105 | Record 106 |
---|---|---|---|---|---|---|
Syndrome | Breast neoplasm | Cervical neoplasm | Lung cancer | Breast neoplasm | Lung cancer | Breast neoplasm |
Treatment | Hormone therapy; Chemotherapy; SERMs | Teletherapy; Brachytherapy; Radiation therapy | Immunotherapy; Targeted therapy; Chemotherapy | Hormone therapy; Chemotherapy; SERMs | Immunotherapy; Targeted therapy; Chemotherapy | Hormone therapy; Chemotherapy; SERMs |
Doctor | Oncologist | Oncologist | Oncologist | Oncologist | Oncologist | Oncologist |
Dosage | 21–60 mg | 0.40–2.0 Gy/h | 58–73 Gy | 31–51 mg | 46–62 Gy | 21–51 mg |
Mode | nm | - | - | nm | - | nm |
Frequency | q.d | - | - | q.d | - | q.d |
Duration | 6 months | 55 days | 4 months | 2–3 months | 3–6 months | 3 months |
Reason | Healthy | Healthy | Death | Healthy | Healthy | Death |
Gender | F | F | F | F | M | F |
Stage | I | II | I | II | III | I |
Hyperparameters | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 |
---|---|---|---|---|---|
Learning rates | 1 × | 2 × | 3 × | 3 × | 5 × |
Epochs | 30 | 20 | 20 | 10 | 15 |
Batch sizes | 128 | 64 | 32 | 8 | 16 |
n_clusters | 2 | 2 | 2 | 2 | 0 |
Dropout | 0.4 | 0.4 | 0.2 | 0.2 | 0.3 |
Optimizer | Adamax | GD | RMSprop | Adamax | AdamW |
Weight decay | 0.1 | 0.01 | 0.01 | 0.1 | 0.1 |
Output layer | Softmax | - | - | Softmax | Softmax |
Pretrain model | 12 | 24 | 24 | 12 | 12 |
Kernel | 1 | 1 | 1 | 1 | 3 |
Hidden Layers | 768 | 768 | 768 | 768 | 768 |
Test size | 0.6 | 0.5 | 0.4 | 0.3 | 0.2 |
Train size | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
Distribution | Split | Instance | Acc | Macro Prec | Macro Rec | Macro F1 | Weighted Prec | Weighted Rec | Weighted F1 |
---|---|---|---|---|---|---|---|---|---|
NBD | Test | 20% | 78.4% | 0.5533 | 0.5371 | 0.5451 | 0.7655 | 0.7637 | 0.7646 |
NBD | Valid | 80% | 80% | 0.5683 | 0.5448 | 0.5563 | 0.7659 | 0.7641 | 0.7650 |
BDD | Test | 20% | 88.4% | 0.6551 | 0.6377 | 0.6463 | 0.8554 | 0.8380 | 0.8466 |
BDD | Valid | 80% | 89% | 0.6689 | 0.5643 | 0.6547 | 0.8593 | 0.8425 | 0.8508 |
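
The gap between the macro and weighted columns in the table above is what one would expect when some classes are much more frequent than others. The following minimal Python sketch illustrates how the two averages are commonly computed from per-class scores. It is an illustration only and not the authors' evaluation code: the function names, the toy labels, and the toy predictions are all hypothetical, and the class names are merely borrowed from the relation types listed in the comparison table.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each label in y_true."""
    support = Counter(y_true)
    stats = {}
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[label] = (prec, rec, f1, support[label])
    return stats


def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1 averages classes equally; weighted F1 weights each class by its support."""
    stats = per_class_prf(y_true, y_pred)
    total = sum(s[3] for s in stats.values())
    macro = sum(s[2] for s in stats.values()) / len(stats)
    weighted = sum(s[2] * s[3] for s in stats.values()) / total
    return macro, weighted


# Hypothetical, deliberately imbalanced toy data (not taken from the paper).
y_true = ["TrAP"] * 8 + ["PIP"] * 2
y_pred = ["TrAP"] * 8 + ["TrAP", "PIP"]

macro_f1, weighted_f1 = macro_and_weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
```

In this toy case the frequent class is predicted well and the rare class poorly, so the weighted F1 exceeds the macro F1, which is the same direction as the difference reported in the table.
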
Parameters | Metric | Baseline-1 | Baseline-2 | RoBERTa-crf | Bio-BERT-crf | RoBERTa-LSTM | Bio-BERT-LSTM | Ours |
---|---|---|---|---|---|---|---|---|
Case 1 | Acc | 27% | 29% | - | 47% | 49% | 51% | 59% |
Case 1 | F1 | 0.217 | 0.221 | - | 0.429 | 0.457 | 0.452 | 0.543 |
Case 1 | P | 0.203 | 0.236 | - | 0.435 | 0.443 | 0.468 | 0.530 |
Case 1 | R | 0.233 | 0.208 | - | 0.423 | 0.472 | 0.437 | 0.557 |
Case 2 | Acc | - | 41% | 50% | 48% | 54% | 58% | 64% |
Case 2 | F1 | - | 0.337 | 0.445 | 0.425 | 0.498 | 0.519 | 0.614 |
Case 2 | P | - | 0.346 | 0.438 | 0.436 | 0.502 | 0.536 | 0.605 |
Case 2 | R | - | 0.328 | 0.452 | 0.415 | 0.494 | 0.503 | 0.623 |
Case 3 | Acc | 46% | - | 63% | 51% | 68% | 66% | 76% |
Case 3 | F1 | 0.415 | - | 0.587 | 0.465 | 0.647 | 0.611 | 0.724 |
Case 3 | P | 0.405 | - | 0.560 | 0.458 | 0.651 | 0.625 | 0.713 |
Case 3 | R | 0.426 | - | 0.617 | 0.472 | 0.643 | 0.698 | 0.735 |
Case 4 | Acc | 65% | 58% | 76% | 49% | 70% | 74% | 68% |
Case 4 | F1 | 0.586 | 0.519 | 0.703 | 0.431 | 0.648 | 0.713 | 0.642 |
Case 4 | P | 0.600 | 0.537 | 0.715 | 0.445 | 0.652 | 0.704 | 0.655 |
Case 4 | R | 0.573 | 0.502 | 0.691 | 0.418 | 0.644 | 0.722 | 0.630 |
Case 5 | Acc | 54% | 66% | 76% | 57% | 81% | 86% | 89% |
Case 5 | F1 | 0.517 | 0.614 | 0.730 | 0.537 | 0.757 | 0.806 | 0.846 |
Case 5 | P | 0.494 | 0.632 | 0.725 | 0.530 | 0.745 | 0.815 | 0.855 |
Case 5 | R | 0.542 | 0.597 | 0.735 | 0.544 | 0.769 | 0.797 | 0.837 |
Models | TrCP | TeRP | TeCP | PIP | Medic | TrAP | TrWP |
---|---|---|---|---|---|---|---|
i2b2 2010 | | | | | | | |
Sahu et al. [81] | 56.4% | 11% | 50.6% | 64.9% | 55% | 71.6% | 59% |
Rink et al. [16] | 55.4% | 75% | 51% | 69.4% | 76.4% | 75.7% | 64% |
Patrick et al. [77] | 48.7% | 84% | 50% | 65.1% | - | 71.2% | 76% |
Divita et al. [82] | 48.5% | 83.7% | 37.7% | 71% | 55% | 47.46% | 68% |
i2b2 2012 | | | | | | | |
Bhatia et al. [83] | 17% | 26% | 82% | 48% | 56.3% | - | 78.9% |
Ji et al. [84] | 29.45% | 55.95% | 32.79% | 21.67% | - | 47.46% | 48% |
Ours | 66% | 87% | 57% | 70% | 69% | 81% | 89% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ahmad, P.N.; Liu, Y.; Khan, K.; Jiang, T.; Burhan, U. BIR: Biomedical Information Retrieval System for Cancer Treatment in Electronic Health Record Using Transformers. Sensors 2023, 23, 9355. https://0-doi-org.brum.beds.ac.uk/10.3390/s23239355