Publications

2025

Pushing the boundaries of radiotherapy-immunotherapy combinations: highlights from the 7th immunorad conference.
Laurent, Pierre-Antoine, Fabrice André, Alexandre Bobard, Desiree Deandreis, Sandra Demaria, Stephane Depil, Stefan B. Eichmüller et al. (2025)
Over the last decade, the annual Immunorad Conference, held under the joint auspices of Gustave Roussy (Villejuif, France) and the Weill Cornell Medical College (New York, USA), has aimed at exploring the latest advancements in the fields of tumor immunology and radiotherapy-immunotherapy combinations for the treatment of cancer. Gathering medical oncologists, radiation oncologists, physicians and researchers with esteemed expertise in these fields, the Immunorad Conference bridges the gap between preclinical outcomes and clinical opportunities. Thus, it paves a promising way toward optimizing radiotherapy-immunotherapy combinations and, from a broader perspective, improving therapeutic strategies for patients with cancer. Herein, we report on the topics developed by key opinion leaders during the 7th Immunorad Conference held in Paris-Les Cordeliers (France) from September 27th to 29th, 2023, and set the stage for the 8th edition of Immunorad, which will be held at Weill Cornell Medical College (New York, USA) in October 2024.
Laurent, Pierre-Antoine, Fabrice André, Alexandre Bobard, Desiree Deandreis, Sandra Demaria, Stephane Depil, Stefan B. Eichmüller et al. Pushing the boundaries of radiotherapy-immunotherapy combinations: highlights from the 7th immunorad conference. OncoImmunology 14, no. 1 (2025): 2432726. doi: https://doi.org/10.1080/2162402X.2024.2432726
FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare
Lekadir, Karim; Feragen, Aasa; Fofanah, Abdul Joseph; Frangi, Alejandro F; Zuluaga, Maria A.; et al.
Despite major advances in artificial intelligence (AI) research for healthcare, the deployment and adoption of AI technologies remain limited in clinical practice. This paper describes the FUTURE-AI framework, which provides guidance for the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI Consortium was founded in 2021 and comprises 117 interdisciplinary experts from 50 countries representing all continents, including AI scientists, clinical researchers, biomedical ethicists, and social scientists. Over a two-year period, the FUTURE-AI guideline was established through consensus based on six guiding principles: fairness, universality, traceability, usability, robustness, and explainability. To operationalise trustworthy AI in healthcare, a set of 30 best practices was defined, addressing technical, clinical, socioethical, and legal dimensions. The recommendations cover the entire lifecycle of healthcare AI, from design, development, and validation to regulation, deployment, and monitoring.
Summary points
- Despite major advances in medical artificial intelligence (AI) research, clinical adoption of emerging AI solutions remains challenging owing to limited trust and ethical concerns
- The FUTURE-AI Consortium unites 117 experts from 50 countries to define international guidelines for trustworthy healthcare AI
- The FUTURE-AI framework is structured around six guiding principles: fairness, universality, traceability, usability, robustness, and explainability
- The guideline addresses the entire AI lifecycle, from design and development to validation and deployment, ensuring alignment with real-world needs and ethical requirements
- The framework includes 30 detailed recommendations for building trustworthy and deployable AI systems, emphasising multistakeholder collaboration
- Continuous risk assessment and mitigation are fundamental, addressing biases, data variations, and evolving challenges during the AI lifecycle
- FUTURE-AI is designed as a dynamic framework, which will evolve with technological advancements and stakeholder feedback
Lekadir, Karim; Feragen, Aasa; Fofanah, Abdul Joseph; Frangi, Alejandro F; Zuluaga, Maria A.; et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ 2025;388:r340. doi: https://doi.org/10.1136/bmj.r340 (Published 17 February 2025)
Generating Explanations in Medical Question-Answering by Expectation Maximization Inference over Evidence
Sun, Wei; Li, Mingxiao; Sileo, Damien; Davis, Jesse J.; Moens, Marie Francine
Medical Question Answering (medical QA) systems play an essential role in assisting healthcare workers in finding answers to their questions. However, it is not sufficient for medical QA systems to merely provide answers, because users might want explanations, that is, more analytic statements in natural language that describe the elements and context that support the answer. To this end, we propose a novel approach for generating natural language explanations for answers predicted by medical QA systems. As high-quality medical explanations require additional medical knowledge, our system extracts knowledge from medical textbooks to enhance the quality of explanations during the explanation generation process. Concretely, we designed an Expectation-Maximization approach that makes inferences about the evidence found in these texts, offering an efficient way to focus attention on lengthy evidence passages. Experiments conducted on two datasets, MQAE-diag and MQAE, demonstrate the effectiveness of our framework for reasoning with textual evidence. Our approach outperforms state-of-the-art models, achieving significant improvements of 6.13 and 5.47 percentage points on the Rouge-L score and 6.49 and 5.28 percentage points on the Bleu-4 score on the MQAE-diag and MQAE datasets, respectively.
Wei Sun, Mingxiao Li, Damien Sileo, Jesse Davis, and Marie-Francine Moens. 2025. Generating Explanations in Medical Question-Answering by Expectation Maximization Inference over Evidence. ACM Trans. Comput. Healthcare 6, 2, Article 23 (April 2025), 23 pages. https://doi.org/10.1145/3712296
Radiomics in Dermatological Optical Coherence Tomography (OCT): Feature Repeatability, Reproducibility, and Integration into Diagnostic Models in a Prospective Study
Widaatalla, Yousif; Wolswijk, Tom; Khan, Muhammad Danial; Halilaj, Iva; Mosterd, Klara; Woodruff, Henry C.; Lambin, Philippe
Objectives: Radiomics has seen substantial growth in medical imaging; however, its potential in optical coherence tomography (OCT) has not been widely explored. We systematically evaluate the repeatability and reproducibility of handcrafted radiomics features (HRFs) from OCT scans of benign nevi and examine the impact of bin width (BW) selection on HRF stability. The effect of using stable features on a radiomics classification model was also assessed. Methods: In this prospective study, 20 volunteers underwent test–retest OCT imaging of 40 benign nevi, resulting in 80 scans. The repeatability and reproducibility of HRFs extracted from manually delineated regions of interest (ROIs) were assessed using concordance correlation coefficients (CCCs) across BWs ranging from 5 to 50. A unique set of stable HRFs was identified at each BW after removing highly correlated features to eliminate redundancy. These robust features were incorporated into a multiclass radiomics classifier trained to distinguish benign nevi, basal cell carcinoma (BCC), and Bowen’s disease. Results: Six stable HRFs were identified across all BWs, with a BW of 25 emerging as the optimal choice, balancing repeatability and the ability to capture meaningful textural details. Additionally, intermediate BWs (20–25) yielded 53 reproducible features. A classifier trained with six stable features achieved a 90% accuracy and AUCs of 0.96 and 0.94 for BCC and Bowen’s disease, respectively, compared to a 76% accuracy and AUCs of 0.86 and 0.80 for a conventional feature selection approach. Conclusions: This study highlights the critical role of BW selection in enhancing HRF stability and provides a methodological framework for optimizing preprocessing in OCT radiomics. By demonstrating the integration of stable HRFs into diagnostic models, we establish OCT radiomics as a promising tool to aid non-invasive diagnosis in dermatology.
Widaatalla, Y.; Wolswijk, T.; Khan, M.D.; Halilaj, I.; Mosterd, K.; Woodruff, H.C.; Lambin, P. Radiomics in Dermatological Optical Coherence Tomography (OCT): Feature Repeatability, Reproducibility, and Integration into Diagnostic Models in a Prospective Study. Cancers 2025, 17, 768. https://doi.org/10.3390/cancers17050768
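The test-retest assessment in the entry above relies on concordance correlation coefficients (CCCs). As a minimal sketch of how such a coefficient can be computed for one feature, the snippet below uses Lin's standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The function name and the toy feature values are illustrative assumptions and are not taken from the study, whose actual pipeline and stability thresholds are not described here.

```python
import numpy as np


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two measurement series.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    # Population (ddof=0) variance and covariance, as in Lin's definition.
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Toy values standing in for one radiomics feature measured on the same lesions twice.
test_scan = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
retest_identical = test_scan.copy()   # perfect agreement
retest_shifted = test_scan + 1.0      # perfectly correlated, but systematically offset

ccc_identical = concordance_correlation(test_scan, retest_identical)
ccc_shifted = concordance_correlation(test_scan, retest_shifted)
print(ccc_identical, ccc_shifted)
```

Unlike the Pearson correlation, which would be 1 for both pairs, the CCC penalizes the systematic offset in the second pair, which is why it is a common choice for repeatability analyses.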
Harmonizing CT scanner acquisition variability in an anthropomorphic phantom: A comparative study of image-level and feature-level harmonization using GAN, ComBat, and their combination
Mali, Shruti Atul, Nastaran Mohammadian Rad, Henry C. Woodruff, Adrien Depeursinge, Vincent Andrearczyk, and Philippe Lambin
Purpose
Radiomics allows for the quantification of medical images and facilitates precision medicine. Many radiomic features derived from computed tomography (CT) are sensitive to variations across scanners, reconstruction settings, and acquisition protocols. In this phantom study, eight different CT reconstruction parameters were varied to explore image- and feature-level harmonization approaches to improve tissue classification.
Methods
Varying reconstructions of an anthropomorphic radiopaque phantom containing three lesion categories (metastasis, hemangioma, and benign cyst) and normal liver tissue were used for evaluating two harmonization methods and their combination: (i) generative adversarial networks (GANs) at the image level; (ii) ComBat at the feature level; and (iii) a combination of (i) and (ii). A total of 93 texture and intensity features were extracted from each tissue class before and after image-level harmonization and were also harmonized at the feature level. Reproducibility and stability were assessed via the Concordance Correlation Coefficient (CCC) and pairwise comparisons using paired stability tests. The ability of features to discriminate between tissue classes was assessed by measuring the area under the receiver operating characteristic curve (AUC). The global reproducibility and discriminative power were assessed by averaging over the entire dataset and across all tissue types.
Results
ComBat improved reproducibility by 31.58% and stability by 5.24%, whereas GAN increased reproducibility by 8% but reduced stability by 4.33%. Classification analysis revealed that ComBat increased average AUC by 15.19%, whereas GAN decreased AUC by 2.56%.
Conclusion
While GAN qualitatively enhances image harmonization, ComBat provides superior statistical improvements in feature stability and classification performance, highlighting the importance of robust feature-level harmonization in radiomics.
Mali SA, Rad NM, Woodruff HC, Depeursinge A, Andrearczyk V, et al. (2025) Harmonizing CT scanner acquisition variability in an anthropomorphic phantom: A comparative study of image-level and feature-level harmonization using GAN, ComBat, and their combination. PLOS ONE 20(5): e0322365. https://doi.org/10.1371/journal.pone.0322365
Impact of synthetic data on training a deep learning model for lesion detection and classification in contrast-enhanced mammography
Van Camp, Astrid, Henry C. Woodruff, Lesley Cockmartin, Marc Lobbes, Michael Majer, Corinne Balleyguier, Nicholas W. Marshall, Hilde Bosmans, and Philippe Lambin
Purpose: Predictive models for contrast-enhanced mammography often perform better at detecting and classifying enhancing masses than (non-enhancing) microcalcification clusters. We aim to investigate whether incorporating synthetic data with simulated microcalcification clusters during training can enhance model performance.
Approach: Microcalcification clusters were simulated in low-energy images of lesion-free breasts from 782 patients, considering local texture features. Enhancement was simulated in the corresponding recombined images. A deep learning (DL) model for lesion detection and classification was trained with varying ratios of synthetic and real (850 patients) data. In addition, a handcrafted radiomics classifier was trained using delineations and class labels from real data, and predictions from both models were ensembled. Validation was performed on internal (212 patients) and external (279 patients) real datasets.
Results: The DL model trained exclusively with synthetic data detected over 60% of malignant lesions. Adding synthetic data to smaller real training sets improved detection sensitivity for malignant lesions but decreased precision. Performance plateaued at a detection sensitivity of 0.80. The ensembled DL and radiomics models performed worse than the standalone DL model, decreasing the area under the receiver operating characteristic curve from 0.75 to 0.60 on the external validation set, likely due to falsely detected suspicious regions of interest.
Conclusions: Synthetic data can enhance DL model performance, provided model setup and data distribution are optimized. The possibility to detect malignant lesions without real data present in the training set confirms the utility of synthetic data. It can serve as a helpful tool, especially when real data are scarce, and it is most effective when complementing real data.
Van Camp A, Woodruff HC, Cockmartin L, Lobbes M, Majer M, Balleyguier C, Marshall NW, Bosmans H, Lambin P. Impact of synthetic data on training a deep learning model for lesion detection and classification in contrast-enhanced mammography. J Med Imaging (Bellingham). 2025 Nov;12(Suppl 2):S22006. doi: 10.1117/1.JMI.12.S2.S22006. Epub 2025 Apr 28. PMID: 40302983; PMCID: PMC12036226.

2024

An automated toolbox for microcalcification cluster modeling for mammographic imaging
Astrid Van Camp, Eva Punter, Katrien Houbrechts, Lesley Cockmartin, Renate Prevos, Nicholas W. Marshall, Henry C. Woodruff, Philippe Lambin, Hilde Bosmans (2024)
Background
Mammographic imaging is essential for breast cancer detection and diagnosis. In addition to masses, calcifications are of concern and the early detection of breast cancer also heavily relies on the correct interpretation of suspicious microcalcification clusters. Even with advances in imaging and the introduction of novel techniques such as digital breast tomosynthesis and contrast-enhanced mammography, a correct interpretation can still be challenging given the subtle nature and large variety of calcifications.
Purpose
Computer simulated lesion models can serve to develop, optimize, or improve imaging techniques. In addition to their use in comparative (virtual clinical trial) detection experiments, these models have potential application in training deep learning models and in the understanding and interpretation of breast lesions. Existing simulation methods, however, often lack the capacity to model the diversity occurring in breast lesions or to generate models relevant for a specific case. This study focuses on clusters of microcalcifications and introduces an automated, flexible toolbox designed to generate microcalcification cluster models customized to specific tasks.
Methods
The toolbox allows users to control a large number of simulation parameters related to model characteristics such as lesion size, calcification shape, or number of microcalcifications per cluster. This makes it possible to create models that range from regular to complex clusters. Based on the input parameters, which are either tuned manually or pre-set for a specific clinical type, different sets of models can be simulated depending on the use case. Two lesion generation methods are described. The first method generates three-dimensional microcalcification cluster models based on geometrical shapes and transformations. The second method creates two-dimensional (2D) microcalcification cluster models for a specific 2D mammographic image. This novel method employs radiomics analysis to account for local textures, ensuring the simulated microcalcification cluster is appropriately integrated within the existing breast tissue. The toolbox is implemented in the Python language and can be conveniently run through a Jupyter Notebook interface, openly accessible at https://gitlab.kuleuven.be/medphysqa/deploy/breast-calcifications. Validation studies performed by radiologists assessed the level of malignancy and realism of clusters tuned with specific parameters and inserted in mammographic images.
Results
The flexibility of the toolbox with multiple simulation methods is illustrated, as well as the compatibility with different simulation frameworks and image types. The automation allows for the straightforward and fast generation of diverse microcalcification cluster models. The generated models are most likely applicable for various tasks as they can be configured in a variety of ways and inserted in different types of mammographic images of multiple acquisition systems. Validation studies confirmed the capacity to simulate realistic clusters and capture clinical properties when tuned with appropriate parameter settings.
Conclusion
This simulation toolbox offers a flexible means of simulating microcalcification cluster models with potential use in both technical and clinical research in mammography imaging. The 3D generation methods allow for specifying many characteristics regarding the calcification shape and cluster architecture, and the 2D generation method presents a novel manner to create microcalcification clusters tailored to existing breast textures.
Van Camp A, Punter E, Houbrechts K, et al. An automated toolbox for microcalcification cluster modeling for mammographic imaging. Med Phys. 2024;1-15. https://doi.org/10.1002/mp.17521
Artificial intelligence based data curation: enabling a patient-centric European health data space
de Zegher I, Norak K, Steiger D, Müller H, Kalra D, Scheenstra B, Cina I, Schulz S, Uma K, Kalendralis P, Lotman E-M, Benedikt M, Dumontier M and Celebi R
The emerging European Health Data Space (EHDS) Regulation opens new prospects for large-scale sharing and re-use of health data. Yet, the proposed regulation suffers from two important limitations: it is designed to benefit the whole population with limited consideration for individuals, and the generation of secondary datasets from heterogeneous, unlinked patient data will remain burdensome. AIDAVA, a Horizon Europe project that started in September 2022, proposes to address both shortcomings by providing patients with an AI-based virtual assistant that maximises automation in the integration and transformation of their health data into an interoperable, longitudinal health record. This personal record can then be used to inform patient-related decisions at the point of care, whether this is the usual point of care or a possible cross-border point of care. The personal record can also be used to generate population datasets for research and policymaking. The proposed solution will enable a much-needed paradigm shift in health data management, implementing a ‘curate once at patient level, use many times’ approach, primarily for the benefit of patients and their care providers, but also for more efficient generation of high-quality secondary datasets. After 15 months, the project shows promising preliminary results in achieving automation in the integration and transformation of heterogeneous data of each individual patient, once the content of the data sources managed by the data holders has been formally described. Additionally, the conceptualization phase of the project identified a set of recommendations for the development of a patient-centric EHDS, significantly facilitating the generation of data for secondary use.
de Zegher I, Norak K, Steiger D, Müller H, Kalra D, Scheenstra B, Cina I, Schulz S, Uma K, Kalendralis P, Lotman E-M, Benedikt M, Dumontier M and Celebi R (2024) Artificial intelligence based data curation: enabling a patient-centric European health data space. Front. Med. 11:1365501. doi: 10.3389/fmed.2024.1365501 https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2024.1365501/full
Disambiguation of acronyms in clinical narratives with large language models.
Kugic, A., Schulz, S., & Kreuzthaler, M. (2024)
Objective: To assess the performance of large language models (LLMs) for zero-shot disambiguation of acronyms in clinical narratives.
Materials and Methods: Clinical narratives in English, German, and Portuguese were used to test the performance of four LLMs: GPT-3.5, GPT-4, Llama-2-7b-chat, and Llama-2-70b-chat. For English, the anonymized Clinical Abbreviation Sense Inventory (CASI, University of Minnesota) was used. For German and Portuguese, at least 500 text spans were processed. The output of the LLMs, prompted with contextual information, was analyzed to compare their acronym disambiguation capability, grouped by document-level metadata, the source language, and the LLM.
Results: On CASI, GPT-3.5 achieved 0.91 in accuracy. GPT-4 outperformed GPT-3.5 across all datasets, reaching 0.98 in accuracy for CASI, 0.86 and 0.65 for two German datasets, and 0.88 for Portuguese. Llama models only reached 0.73 for CASI and failed severely for German and Portuguese. Across LLMs, performance decreased from English to German and Portuguese processing languages. There was no evidence that additional document-level metadata had a significant effect.
Conclusion: For English clinical narratives, acronym resolution by GPT-4 can be recommended to improve readability of clinical text by patients and professionals. For German and Portuguese, better models are needed. Llama models, which are particularly interesting for processing sensitive content on premise, cannot yet be recommended for acronym resolution.
Kugic, A., Schulz, S., & Kreuzthaler, M. (2024). Disambiguation of acronyms in clinical narratives with large language models. Journal of the American Medical Informatics Association, ocae157. https://doi.org/10.1093/jamia/ocae157
Unraveling Clinical Insights: A Lightweight and Interpretable Approach for Multimodal and Multilingual Knowledge Integration.
Uma, K., & Moens, M. F. (2024).
In recent years, the analysis of clinical texts has evolved significantly, driven by the emergence of BERT-like language models such as PubMedBERT and ClinicalBERT, which have been tailored to the (bio)medical domain and rely on extensive archives of medical documents. While they boast high accuracy, their lack of interpretability and language transfer limitations restrict their clinical utility. To address this, we propose a new, lightweight graph-based embedding method designed specifically for radiology reports. This approach considers the report’s structure and content, connecting medical terms through the multilingual SNOMED Clinical Terms knowledge base. The resulting graph embedding reveals intricate relationships among clinical terms, enhancing both clinician comprehension and clinical accuracy without the need for large pre-training datasets. Demonstrating the versatility of our method, we apply this embedding to two tasks: disease and image classification in X-ray reports. In disease classification, our model competes effectively with BERT-based approaches, yet it is significantly smaller and requires less training data. Additionally, in image classification, we illustrate the efficacy of the graph embedding by leveraging cross-modal knowledge transfer, highlighting its applicability across diverse languages.
Uma, K., & Moens, M. F. (2024). Unraveling Clinical Insights: A Lightweight and Interpretable Approach for Multimodal and Multilingual Knowledge Integration. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024 (pp. 197-203). https://aclanthology.org/2024.cl4health-1.24
Kommunikationsfähigkeit und Interoperabilität von Gesundheitsdaten in einem vernetzten Gesundheitssystem [Communication capability and interoperability of health data in a networked healthcare system].
Daumke, P., Haverkamp, C., Heckmann, S., Kuper, M., Müller, A., Oemig, F., ... & Schulz, S. (2024).
Interoperability is indispensable for a networked healthcare system. Based on terminology standards such as ICD, LOINC, and SNOMED CT, it requires a correct interpretation of patient data in the respective application context. This is supported by syntactic standards such as FHIR, which embed codes in the patient-specific context. To make routine data interoperable, the gap between clinical language and standardized documentation must be bridged. Natural language processing (NLP) is a technology that is currently developing rapidly under the banner of artificial intelligence. Communication with computers in human language will gain considerably in importance. The chapter provides insight into current techniques and resources for supporting interoperability. In addition, perspectives from healthcare delivery, health administration, science, industry, and self-governing bodies are discussed.
Daumke, P., Haverkamp, C., Heckmann, S., Kuper, M., Müller, A., Oemig, F., ... & Schulz, S. (2024). Kommunikationsfähigkeit und Interoperabilität von Gesundheitsdaten in einem vernetzten Gesundheitssystem. In Health Data Management: Schlüsselfaktor für erfolgreiche Krankenhäuser (pp. 457-496). Wiesbaden: Springer Fachmedien Wiesbaden. https://doi.org/10.1007/978-3-658-43236-2_41
Towards Explainability in Automated Medical Code Prediction from Clinical Records.
Uma, K., Francis, S., Sun, W., Moens, MF. (2024).
The International Statistical Classification of Diseases and Related Health Problems (ICD) is a global standard and diagnostic tool that is frequently used for endemic research, health management, and clinical diagnosis, and it plays a crucial role in providing shrewd medical treatment. Comparable statistics on the causes of mortality and morbidity across locations and over time have been based on the ICD. The traditional procedure of assigning codes is expensive, error-prone, and time-consuming, and automated mapping of ICD codes is now a significant area of scholarly research. This paper aims to analyze and document inferences on the evolution of clinical coding automation, covering statistical modeling, rule engines, conventional machine learning, and deep learning techniques such as graph embedding, attention mechanisms, adversarial learning, and pre-trained language models (PLMs). We summarize, with a comparative performance analysis, various approaches to the codification of free-text clinical narratives on the publicly available Medical Information Mart. This study investigates whether clinicians and researchers could benefit from an adequate interpretation of model predictions from an Explainable Artificial Intelligence (XAI) perspective. Finally, the survey illustrates ICD coding and disease classification applications and their challenges, evaluation metrics, datasets, and directions towards automating explanatory medical code predictions.
Uma, K., Francis, S., Sun, W., Moens, MF. (2024). Towards Explainability in Automated Medical Code Prediction from Clinical Records. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 825. Springer, Cham. https://doi.org/10.1007/978-3-031-47718-8_40
Deep Learning of Multimodal Ultrasound: Stratifying the Response to Neoadjuvant Chemotherapy in Breast Cancer Before Treatment
Gu, Jionghui; Zhong, Xian; Fang, Chengyu; Lou, Wenjing; Fu, Peifen; Woodruff, Henry C.; Wang, Baohua; Jiang, Tian’an; Lambin, Philippe
Background: Not only should resistance to neoadjuvant chemotherapy (NAC) be considered in patients with breast cancer but also the possibility of achieving a pathologic complete response (PCR) after NAC. Our study aims to develop 2 multimodal ultrasound deep learning (DL) models to noninvasively predict resistance and PCR to NAC before treatment.
Methods: From January 2017 to July 2022, a total of 170 patients with breast cancer were prospectively enrolled. All patients underwent multimodal ultrasound examination (grayscale 2D ultrasound and ultrasound elastography) before NAC. We combined clinicopathological information to develop 2 DL models, DL_Clinical_resistance and DL_Clinical_PCR, for predicting resistance and PCR to NAC, respectively. In addition, these 2 models were combined to stratify the prediction of response to NAC.
Results: In the test cohort, DL_Clinical_resistance had an AUC of 0.911 (95%CI, 0.814-0.979) with a sensitivity of 0.905 (95%CI, 0.765-1.000) and an NPV of 0.882 (95%CI, 0.708-1.000). Meanwhile, DL_Clinical_PCR achieved an AUC of 0.880 (95%CI, 0.751-0.973) and sensitivity and NPV of 0.875 (95%CI, 0.688-1.000) and 0.895 (95%CI, 0.739-1.000), respectively. By combining DL_Clinical_resistance and DL_Clinical_PCR, 37.1% of patients with resistance and 25.7% of patients with PCR were successfully identified, suggesting that these patients could benefit from an early change of treatment strategy or from an organ preservation strategy after NAC.
Conclusions: The proposed DL_Clinical_resistance and DL_Clinical_PCR models and combined strategy have the potential to predict resistance and PCR to NAC before treatment and allow stratified prediction of NAC response.
Gu J, Zhong X, Fang C, Lou W, Fu P, Woodruff HC, Wang B, Jiang T, Lambin P. Deep Learning of Multimodal Ultrasound: Stratifying the Response to Neoadjuvant Chemotherapy in Breast Cancer Before Treatment. Oncologist. 2024 Feb 2;29(2):e187-e197. doi: 10.1093/oncolo/oyad227. PMID: 37669223; PMCID: PMC10836325.
Simulated image-specific microcalcification clusters and associated mass enhancement to enhance training of a deep learning model for cancer detection in contrast-enhanced mammography
Van Camp, Astrid; Woodruff, Henry C.; Cockmartin, Lesley; Marshall, Nicholas William; Bosmans, Hilde T.C.; Lambin, Philippe
We present an automated method to generate synthetic contrast-enhanced mammography cases with simulated microcalcification clusters. This method accounts for existing textures in the breast, with the simulated clusters inserted in the low-energy image. In parallel, potential mass-like enhancement is modelled from real values in the recombined image. The same deep learning model was trained with different amounts and ratios of real and synthetic data. When the model was trained with real data only, malignant masses were more often correctly detected and classified than malignant microcalcification clusters. The addition of synthetic data with simulated clusters during training could increase detection sensitivity for all types of malignant lesions while maintaining similar levels of AUC for classification. This enhanced performance was consistent on both internal and external test sets. These findings demonstrate the potential applicability of synthetic data to enhance deep learning models, especially when real data are scarce or imbalanced.
Astrid Van Camp, Henry C. Woodruff, Lesley Cockmartin, Nicholas W. Marshall, Hilde Bosmans, and Philippe Lambin "Simulated image-specific microcalcification clusters and associated mass enhancement to enhance training of a deep learning model for cancer detection in contrast-enhanced mammography", Proc. SPIE 13174, 17th International Workshop on Breast Imaging (IWBI 2024), 1317404 (29 May 2024); https://doi.org/10.1117/12.3026879

2023

Towards principles of ontology-based annotation of clinical narratives.
Schulz, S., Del-Pinto, W., Han, L., Kreuzthaler, M., Aghaei, S., & Nenadic, G. (2023).
Despite the increasing availability of ontology-based semantic resources for biomedical content representation, large amounts of clinical data are in narrative form only. Therefore, many clinical information management tasks require information extraction using natural language processing (NLP).
Clinical corpora annotated by humans are crucial resources for this purpose. On the one hand, they are needed to domain-fine-tune language models (LMs) with the purpose of formally representing clinical information extracted from unstructured free text. On the other hand, annotated corpora are indispensable for assessing the results of information extraction using NLP.
The effectiveness of annotations crucially depends on annotation quality. Detailed annotation guidelines, which define the form that extracted information should take, prevent human annotators from taking erratic annotation decisions and guarantee a good inter-annotator agreement. Our hypothesis is that, to this end, annotations should (i) be based on ontological principles and (ii) be consistent with existing clinical documentation standards.
Drawing on the experience of several annotation projects, we highlight the need for sophisticated guidelines. We formulate a set of abstract principles on which such guidelines should be based, followed by examples of how to keep them, on the one hand, user-friendly and consistent, and on the other hand, compatible with the international semantic standards SNOMED CT and FHIR, including their areas of overlap.
We sketch how the resulting representations can be expressed in a knowledge graph, a state-of-the-art semantic representation paradigm, which can be enriched by additional content at the A-Box and T-Box levels and to which symbolic and neural reasoning tasks can be applied.
Schulz, S., Del-Pinto, W., Han, L., Kreuzthaler, M., Aghaei, S., & Nenadic, G. (2023). Towards principles of ontology-based annotation of clinical narratives. In Proceedings of the International Conference on Biomedical Ontologies (Vol. 2023). https://ceur-ws.org/Vol-3603/Paper4.pdf
Semantic Annotation of Tabular Data for Machine-to-Machine Interoperability via Neuro-Symbolic Anchoring
Shervin Mehryar, Remzi Celebi
In this paper we investigate automated annotation of tabular data using semantic technologies in combination with neural network embedding. Specifically, we propose an anchoring model in which property and cell types from the data embedding space are aligned with ontology relation and entity types. We show that by combining the power of symbolic reasoning, neural embeddings, and loss function design, a significant performance improvement as high as 86% for column property, 82% for column type, and 87% for column qualifier annotations can be achieved based on DBpedia and Wikidata table extractions.
Shervin Mehryar, Remzi Celebi. Semantic Annotation of Tabular Data for Machine-to-Machine Interoperability via Neuro-Symbolic Anchoring. SemTab’23: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching 2023, co-located with the 22nd International Semantic Web Conference (ISWC), November 6-10, 2023, Athens, Greece.
AI for life: Trends in artificial intelligence for biotechnology
Andreas Holzinger, Katharina Keiblinger, Petr Holub, Kurt Zatloukal, Heimo Müller
Due to popular successes (e.g., ChatGPT), Artificial Intelligence (AI) is on everyone's lips today. When advances in biotechnology are combined with advances in AI, unprecedented new potential solutions become available. This can help with many global problems and contribute to important Sustainable Development Goals. Current examples include Food Security, Health and Well-being, Clean Water, Clean Energy, Responsible Consumption and Production, Climate Action, Life below Water, and the goal to protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, halt and reverse land degradation, and halt biodiversity loss. AI is ubiquitous in the life sciences today. Topics span a wide range, including machine learning and Big Data analytics, knowledge discovery and data mining, biomedical ontologies, knowledge-based reasoning, natural language processing, decision support and reasoning under uncertainty, temporal and spatial representation and inference, and methodological aspects of explainable AI (XAI) with applications of biotechnology. In this pre-Editorial paper, we provide an overview of open research issues and challenges for each of the topics addressed in this special issue. Potential authors can directly use this as a guideline for developing their paper.
Holzinger A, Keiblinger K, Holub P, Zatloukal K, Müller H. AI for life: Trends in artificial intelligence for biotechnology. N Biotechnol. 2023;74:16-24. doi: 10.1016/j.nbt.2023.02.001. PMID: 36754147
Masking Language Model Mechanism with Event-Driven Knowledge Graphs for Temporal Relations Extraction from Clinical Narratives.
Uma, K., Francis, S., & Moens, M. F. (2023).
For many natural language processing systems, the extraction of temporal links and associations from clinical narratives has been a critical challenge. To understand such processes, we must be aware of the occurrences of events and their time or temporal aspect by constructing a chronology for the sequence of events. The primary objective of temporal relation extraction is to identify relationships and correlations between entities, events, and expressions. We propose a novel architecture leveraging a Transformer-based graph neural network that combines textual data with event graph embeddings for predicting temporal links across events, entities, document creation time, and expressions. We demonstrate our preliminary findings on the i2b2 temporal relations corpus for predicting BEFORE, AFTER and OVERLAP links with an event graph for the correct set of relations. Various Biomedical-BERT embedding types were benchmarked, with PubMed BERT combined with the language model masking (LMM) mechanism yielding the best performance under our methodology. This illustrates the effectiveness of our proposed strategy.
Uma, K., Francis, S., & Moens, M. F. (2023). Masking Language Model Mechanism with Event-Driven Knowledge Graphs for Temporal Relations Extraction from Clinical Narratives. In International Conference on Complex Networks and Their Applications (pp. 162-174). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-53468-3_14
Toward human-level concept learning: Pattern benchmarking for AI algorithms
Holzinger, A., Saranti, A., Angerschmid, A., Finzel, B., Schmid, U., & Mueller, H. (2023).
Artificial intelligence (AI) today is very successful at standard pattern-recognition tasks due to the availability of large amounts of data and advances in statistical data-driven machine learning. However, there is still a large gap between AI pattern recognition and human-level concept learning. Humans can learn amazingly well even under uncertainty from just a few examples and are capable of generalizing these concepts to solve new conceptual problems. The growing interest in explainable machine intelligence requires experimental environments and diagnostic/benchmark datasets to analyze existing approaches and drive progress in pattern analysis and machine intelligence. In this paper, we provide an overview of current AI solutions for benchmarking concept learning, reasoning, and generalization; discuss the state-of-the-art of existing diagnostic/benchmark datasets (such as CLEVR, CLEVRER, CLOSURE, CURI, Bongard-LOGO, V-PROM, RAVEN, Kandinsky Patterns, CLEVR-Humans, CLEVRER-Humans, and their extension containing human language); and provide an outlook of some future research directions in this exciting research domain.
Holzinger A, Saranti A, Angerschmid A, Finzel B, Schmid U, Mueller H. Toward human-level concept learning: Pattern benchmarking for AI algorithms. Patterns (N Y). 2023 Jul 5;4(8):100788. doi: 10.1016/j.patter.2023.100788. PMC: 10435961
Combining Deep Learning and Handcrafted Radiomics for Classification of Suspicious Lesions on Contrast-enhanced Mammograms
Beuque, Manon P.L. ; Lobbes, Marc B.I. ; van Wijk, Yvonka ; Widaatalla, Yousif ; Primakov, Sergey P. ; Majer, Michael ; Balleyguier, Corinne S. ; Woodruff, Henry C. ; Lambin, Philippe
Background
Handcrafted radiomics and deep learning (DL) models individually achieve good performance in lesion classification (benign vs malignant) on contrast-enhanced mammography (CEM) images.
Purpose
To develop a comprehensive machine learning tool able to fully automatically identify, segment, and classify breast lesions on the basis of CEM images in recall patients.
Materials and Methods
CEM images and clinical data were retrospectively collected between 2013 and 2018 for 1601 recall patients at Maastricht UMC+ and 283 patients at Gustave Roussy Institute for external validation. Lesions with a known status (malignant or benign) were delineated by a research assistant overseen by an expert breast radiologist. Preprocessed low-energy and recombined images were used to train a DL model for automatic lesion identification, segmentation, and classification. A handcrafted radiomics model was also trained to classify both human- and DL-segmented lesions. Sensitivity for identification and the area under the receiver operating characteristic curve (AUC) for classification were compared between individual and combined models at the image and patient levels.
Results
After the exclusion of patients without suspicious lesions, the total numbers of patients included in the training, test, and validation data sets were 850 (mean age, 63 years ± 8 [SD]), 212 (62 years ± 8), and 279 (55 years ± 12), respectively. In the external data set, lesion identification sensitivity was 90% and 99% at the image and patient level, respectively, and the mean Dice coefficient was 0.71 and 0.80 at the image and patient level, respectively. Using manual segmentations, the combined DL and handcrafted radiomics classification model achieved the highest AUC (0.88 [95% CI: 0.86, 0.91]) (P < .05 except compared with DL, handcrafted radiomics, and clinical features model, where P = .90). Using DL-generated segmentations, the combined DL and handcrafted radiomics model showed the highest AUC (0.95 [95% CI: 0.94, 0.96]) (P < .05).
Conclusion
The DL model accurately identified and delineated suspicious lesions on CEM images, and the combined output of the DL and handcrafted radiomics models achieved good diagnostic performance.
Beuque, Manon P.L. ; Lobbes, Marc B.I. ; van Wijk, Yvonka ; Widaatalla, Yousif ; Primakov, Sergey P. ; Majer, Michael ; Balleyguier, Corinne S. ; Woodruff, Henry C. ; Lambin, Philippe. Combining Deep Learning and Handcrafted Radiomics for Classification of Suspicious Lesions on Contrast-enhanced Mammograms (2023) Radiology, 307 (5), art. no. e221843. DOI: 10.1148/radiol.221843.