Automate Curation and Publishing of
Personal Health Data Through Artificial Intelligence

Publications

2024

  • An automated toolbox for microcalcification cluster modeling for mammographic imaging

    Astrid Van Camp, Eva Punter, Katrien Houbrechts, Lesley Cockmartin, Renate Prevos, Nicholas W. Marshall, Henry C. Woodruff, Philippe Lambin, Hilde Bosmans (2024)

    Background
    Mammographic imaging is essential for breast cancer detection and diagnosis. In addition to masses, calcifications are of concern and the early detection of breast cancer also heavily relies on the correct interpretation of suspicious microcalcification clusters. Even with advances in imaging and the introduction of novel techniques such as digital breast tomosynthesis and contrast-enhanced mammography, a correct interpretation can still be challenging given the subtle nature and large variety of calcifications.

    Purpose
    Computer simulated lesion models can serve to develop, optimize, or improve imaging techniques. In addition to their use in comparative (virtual clinical trial) detection experiments, these models have potential application in training deep learning models and in the understanding and interpretation of breast lesions. Existing simulation methods, however, often lack the capacity to model the diversity occurring in breast lesions or to generate models relevant for a specific case. This study focuses on clusters of microcalcifications and introduces an automated, flexible toolbox designed to generate microcalcification cluster models customized to specific tasks.

    Methods
    The toolbox allows users to control a large number of simulation parameters related to model characteristics such as lesion size, calcification shape, or number of microcalcifications per cluster. This leads to the capability of creating models that range from regular to complex clusters. Based on the input parameters, which are either tuned manually or pre-set for a specific clinical type, different sets of models can be simulated depending on the use case. Two lesion generation methods are described. The first method generates three-dimensional (3D) microcalcification cluster models based on geometrical shapes and transformations. The second method creates two-dimensional (2D) microcalcification cluster models for a specific 2D mammographic image. This novel method employs radiomics analysis to account for local textures, ensuring the simulated microcalcification cluster is appropriately integrated within the existing breast tissue. The toolbox is implemented in the Python language and can be conveniently run through a Jupyter Notebook interface, openly accessible at https://gitlab.kuleuven.be/medphysqa/deploy/breast-calcifications. Validation studies performed by radiologists assessed the level of malignancy and realism of clusters tuned with specific parameters and inserted in mammographic images.

    Results
    The flexibility of the toolbox with multiple simulation methods is illustrated, as well as the compatibility with different simulation frameworks and image types. The automation allows for the straightforward and fast generation of diverse microcalcification cluster models. The generated models are most likely applicable for various tasks as they can be configured in a variety of ways and inserted in different types of mammographic images of multiple acquisition systems. Validation studies confirmed the capacity to simulate realistic clusters and capture clinical properties when tuned with appropriate parameter settings.

    Conclusion
    This simulation toolbox offers a flexible means of simulating microcalcification cluster models with potential use in both technical and clinical research in mammography imaging. The 3D generation methods allow for specifying many characteristics regarding the calcification shape and cluster architecture, and the 2D generation method presents a novel manner to create microcalcification clusters tailored to existing breast textures.

    Van Camp A, Punter E, Houbrechts K, et al. An automated toolbox for microcalcification cluster modeling for mammographic imaging. Med Phys. 2024;1-15. https://doi.org/10.1002/mp.17521

  • Artificial intelligence based data curation: enabling a patient-centric European health data space

    de Zegher I, Norak K, Steiger D, Müller H, Kalra D, Scheenstra B, Cina I, Schulz S, Uma K, Kalendralis P, Lotman E-M, Benedikt M, Dumontier M and Celebi R

    The emerging European Health Data Space (EHDS) Regulation opens new prospects for large-scale sharing and re-use of health data. Yet, the proposed regulation suffers from two important limitations: it is designed to benefit the whole population with limited consideration for individuals, and the generation of secondary datasets from heterogeneous, unlinked patient data will remain burdensome. AIDAVA, a Horizon Europe project that started in September 2022, proposes to address both shortcomings by providing patients with an AI-based virtual assistant that maximises automation in the integration and transformation of their health data into an interoperable, longitudinal health record. This personal record can then be used to inform patient-related decisions at the point of care, whether this is the usual point of care or a possible cross-border point of care. The personal record can also be used to generate population datasets for research and policymaking. The proposed solution will enable a much-needed paradigm shift in health data management, implementing a ‘curate once at patient level, use many times’ approach, primarily for the benefit of patients and their care providers, but also for more efficient generation of high-quality secondary datasets. After 15 months, the project shows promising preliminary results in achieving automation in the integration and transformation of heterogeneous data of each individual patient, once the content of the data sources managed by the data holders has been formally described. Additionally, the conceptualization phase of the project identified a set of recommendations for the development of a patient-centric EHDS, significantly facilitating the generation of data for secondary use.

    Citation: de Zegher I, Norak K, Steiger D, Müller H, Kalra D, Scheenstra B, Cina I, Schulz S, Uma K, Kalendralis P, Lotman E-M, Benedikt M, Dumontier M and Celebi R (2024) Artificial intelligence based data curation: enabling a patient-centric European health data space. Front. Med. 11:1365501. doi: 10.3389/fmed.2024.1365501

    https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2024.1365501/full

  • Disambiguation of acronyms in clinical narratives with large language models.

    Kugic, A., Schulz, S., & Kreuzthaler, M. (2024)

    Objective: To assess the performance of large language models (LLMs) for zero-shot disambiguation of acronyms in clinical narratives.

    Materials and Methods: Clinical narratives in English, German, and Portuguese were applied for testing the performance of four LLMs: GPT-3.5, GPT-4, Llama-2-7b-chat, and Llama-2-70b-chat. For English, the anonymized Clinical Abbreviation Sense Inventory (CASI, University of Minnesota) was used. For German and Portuguese, at least 500 text spans were processed. The output of LLM models, prompted with contextual information, was analyzed to compare their acronym disambiguation capability, grouped by document-level metadata, the source language, and the LLM.

    Results: On CASI, GPT-3.5 achieved 0.91 in accuracy. GPT-4 outperformed GPT-3.5 across all datasets, reaching 0.98 in accuracy for CASI, 0.86 and 0.65 for two German datasets, and 0.88 for Portuguese. Llama models only reached 0.73 for CASI and failed severely for German and Portuguese. Across LLMs, performance decreased from English to German and Portuguese processing languages. There was no evidence that additional document-level metadata had a significant effect.

    Conclusion: For English clinical narratives, acronym resolution by GPT-4 can be recommended to improve readability of clinical text by patients and professionals. For German and Portuguese, better models are needed. Llama models, which are particularly interesting for processing sensitive content on premise, cannot yet be recommended for acronym resolution.

    Kugic, A., Schulz, S., & Kreuzthaler, M. (2024). Disambiguation of acronyms in clinical narratives with large language models. Journal of the American Medical Informatics Association, ocae157. https://doi.org/10.1093/jamia/ocae157

  • Unraveling Clinical Insights: A Lightweight and Interpretable Approach for Multimodal and Multilingual Knowledge Integration.

    Uma, K., & Moens, M. F. (2024).

    In recent years, the analysis of clinical texts has evolved significantly, driven by the emergence of BERT-style language models such as PubMedBERT and ClinicalBERT, which have been tailored to the (bio)medical domain and rely on extensive archives of medical documents. While they boast high accuracy, their lack of interpretability and language transfer limitations restrict their clinical utility. To address this, we propose a new, lightweight graph-based embedding method designed specifically for radiology reports. This approach considers the report’s structure and content, connecting medical terms through the multilingual SNOMED Clinical Terms knowledge base. The resulting graph embedding reveals intricate relationships among clinical terms, enhancing both clinician comprehension and clinical accuracy without the need for large pre-training datasets. Demonstrating the versatility of our method, we apply this embedding to two tasks: disease and image classification in X-ray reports. In disease classification, our model competes effectively with BERT-based approaches, yet it is significantly smaller and requires less training data. Additionally, in image classification, we illustrate the efficacy of the graph embedding by leveraging cross-modal knowledge transfer, highlighting its applicability across diverse languages.

    Uma, K., & Moens, M. F. (2024). Unraveling Clinical Insights: A Lightweight and Interpretable Approach for Multimodal and Multilingual Knowledge Integration. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health)@ LREC-COLING 2024 (pp. 197-203). https://aclanthology.org/2024.cl4health-1.24/

  • Kommunikationsfähigkeit und Interoperabilität von Gesundheitsdaten in einem vernetzten Gesundheitssystem.

    Daumke, P., Haverkamp, C., Heckmann, S., Kuper, M., Müller, A., Oemig, F., ... & Schulz, S. (2024).

    Interoperability is indispensable for a connected healthcare system. Based on terminology standards such as ICD, LOINC, and SNOMED CT, it requires that patient data be interpreted correctly in the respective application context. This is supported by syntactic standards such as FHIR, which embed codes in the patient-specific context. To make routine data interoperable, the gap between clinical language and standardized documentation must be bridged. Natural language processing (NLP) is a technology for this purpose that is currently developing rapidly, driven by artificial intelligence. Communicating with computers in human language will gain considerably in importance. The chapter provides insight into current techniques and resources that support interoperability. It also presents perspectives from healthcare delivery, health administration, science, industry, and self-governing bodies.

    Daumke, P., Haverkamp, C., Heckmann, S., Kuper, M., Müller, A., Oemig, F., ... & Schulz, S. (2024). Kommunikationsfähigkeit und Interoperabilität von Gesundheitsdaten in einem vernetzten Gesundheitssystem. In Health Data Management: Schlüsselfaktor für erfolgreiche Krankenhäuser (pp. 457-496). Wiesbaden: Springer Fachmedien Wiesbaden. https://doi.org/10.1007/978-3-658-43236-2_41

  • Simulated image-specific microcalcification clusters and associated mass enhancement to enhance training of a deep learning model for cancer detection in contrast-enhanced mammography.

    Van Camp, A., Woodruff, H. C., Cockmartin, L., Marshall, N. W., Bosmans, H., & Lambin, P. (2024).

    We present an automated method to generate synthetic contrast-enhanced mammography cases with simulated microcalcification clusters. This method accounts for existing textures in the breast, with the simulated clusters inserted in the low-energy image. In parallel, potential mass-like enhancement is modelled from real values in the recombined image. The same deep learning model was trained with different amounts and ratios of real and synthetic data. When trained with real data only, malignant masses are more often correctly detected and classified than malignant microcalcification clusters. The addition of synthetic data with simulated clusters during training could increase detection sensitivity for all types of malignant lesions and maintained similar levels of AUC for classification. This enhanced performance was consistent on both internal and external test sets. These findings demonstrate the potential applicability of synthetic data to enhance deep learning models, especially when real data are scarce or imbalanced.

    Van Camp, A., Woodruff, H. C., Cockmartin, L., Marshall, N. W., Bosmans, H., & Lambin, P. (2024). Simulated image-specific microcalcification clusters and associated mass enhancement to enhance training of a deep learning model for cancer detection in contrast-enhanced mammography. In 17th International Workshop on Breast Imaging (IWBI 2024) (Vol. 13174, pp. 13-19). SPIE. https://doi.org/10.1117/12.3026879

  • Deep Learning of Multimodal Ultrasound: Stratifying the Response to Neoadjuvant Chemotherapy in Breast Cancer Before Treatment.

    Gu, J., Zhong, X., Fang, C., Lou, W., Fu, P., Woodruff, H. C., ... & Lambin, P. (2024).

    Background: Not only should resistance to neoadjuvant chemotherapy (NAC) be considered in patients with breast cancer but also the possibility of achieving a pathologic complete response (PCR) after NAC. Our study aims to develop 2 multimodal ultrasound deep learning (DL) models to noninvasively predict resistance and PCR to NAC before treatment.

    Methods: From January 2017 to July 2022, a total of 170 patients with breast cancer were prospectively enrolled. All patients underwent multimodal ultrasound examination (grayscale 2D ultrasound and ultrasound elastography) before NAC. We combined clinicopathological information to develop 2 DL models, DL_Clinical_resistance and DL_Clinical_PCR, for predicting resistance and PCR to NAC, respectively. In addition, these 2 models were combined to stratify the prediction of response to NAC.

    Results: In the test cohort, DL_Clinical_resistance had an AUC of 0.911 (95%CI, 0.814-0.979) with a sensitivity of 0.905 (95%CI, 0.765-1.000) and an NPV of 0.882 (95%CI, 0.708-1.000). Meanwhile, DL_Clinical_PCR achieved an AUC of 0.880 (95%CI, 0.751-0.973) and sensitivity and NPV of 0.875 (95%CI, 0.688-1.000) and 0.895 (95%CI, 0.739-1.000), respectively. By combining DL_Clinical_resistance and DL_Clinical_PCR, 37.1% of patients with resistance and 25.7% of patients with PCR were successfully identified by the combined model, suggesting that these patients could benefit by an early change of treatment strategy or by implementing an organ preservation strategy after NAC.

    Conclusions: The proposed DL_Clinical_resistance and DL_Clinical_PCR models and combined strategy have the potential to predict resistance and PCR to NAC before treatment and allow stratified prediction of NAC response.

    Gu J, Zhong X, Fang C, Lou W, Fu P, Woodruff HC, Wang B, Jiang T, Lambin P. Deep Learning of Multimodal Ultrasound: Stratifying the Response to Neoadjuvant Chemotherapy in Breast Cancer Before Treatment. The Oncologist. 2024 Feb; 29(2): e187–e197. doi: 10.1093/oncolo/oyad227
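
    The sensitivity and negative predictive value (NPV) reported in the abstract above are standard confusion-matrix ratios. The following is a minimal sketch of how such values are computed; the function names and the counts are hypothetical and chosen only for illustration. They are not taken from the study, and the code is not the authors' evaluation pipeline.

```python
def sensitivity(tp, fn):
    """Fraction of truly positive cases that the model flags: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return tp / (tp + fn)


def negative_predictive_value(tn, fn):
    """Fraction of negative predictions that are correct: TN / (TN + FN)."""
    if tn + fn == 0:
        raise ValueError("NPV is undefined without negative predictions")
    return tn / (tn + fn)


# Hypothetical counts for a binary "resistant vs. not resistant" classifier.
# These numbers are illustrative assumptions, not data from the publication.
counts = {"tp": 18, "fn": 2, "tn": 16, "fp": 4}

sens = sensitivity(counts["tp"], counts["fn"])
npv = negative_predictive_value(counts["tn"], counts["fn"])
print(f"sensitivity = {sens:.3f}, NPV = {npv:.3f}")
```

    A high NPV matters for this kind of stratification because a negative prediction is only useful for treatment planning if it is rarely wrong.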

  • Towards Explainability in Automated Medical Code Prediction from Clinical Records.

    Uma, K., Francis, S., Sun, W., Moens, MF. (2024).

    The International Statistical Classification of Diseases and Related Health Problems (ICD) is a global standard, a diagnostic tool that is frequently used for endemic research, health management, and clinical diagnosis, and it plays a crucial role in providing shrewd medical treatment. Comparable statistics on the causes of mortality and morbidity across locations and throughout time have been based on the ICD. The traditional procedure of assigning codes is expensive, error-prone and time-consuming, and automated mapping of ICD codes is now a significant area of scholarly research. With the help of statistical modeling, rule-engines, conventional machine learning, and deep learning techniques like graph embedding, attention mechanisms, adversarial learning, and pre-trained language models (PLMs), this paper aims to analyze and document inferences on the evolution of clinical coding automation. We try to summarize with comparative performance analysis various approaches addressed towards codification of free-text clinical narratives on the publicly available Medical Information Mart. This study investigates whether clinicians and researchers could benefit from an adequate interpretation of model predictions from an Explainable Artificial Intelligence (XAI) perspective. Finally, the survey illustrates ICD coding and disease classification applications and its challenges, evaluation metrics, datasets, and directions towards automating explanatory medical code predictions.

    Uma, K., Francis, S., Sun, W., Moens, MF. (2024). Towards Explainability in Automated Medical Code Prediction from Clinical Records. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 825. Springer, Cham. https://doi.org/10.1007/978-3-031-47718-8_40

2023

  • Towards principles of ontology-based annotation of clinical narratives.

    Schulz, S., Del-Pinto, W., Han, L., Kreuzthaler, M., Aghaei, S., & Nenadic, G. (2023).

    Despite the increasing availability of ontology-based semantic resources for biomedical content representation, large amounts of clinical data are in narrative form only. Therefore, many clinical information management tasks require information extraction using natural language processing (NLP).

    Clinical corpora annotated by humans are crucial resources for this purpose. On the one hand, they are needed to fine-tune language models (LMs) for the clinical domain, so that clinical information extracted from unstructured free text can be represented formally. On the other hand, annotated corpora are indispensable for assessing the results of information extraction using NLP.

    The effectiveness of annotations crucially depends on annotation quality. Detailed annotation guidelines, which define the form that extracted information should take, prevent human annotators from taking erratic annotation decisions and guarantee a good inter-annotator agreement. Our hypothesis is that, to this end, annotations should (i) be based on ontological principles and (ii) be consistent with existing clinical documentation standards.

    With the experience of several annotation projects we highlight the need for sophisticated guidelines. We formulate a set of abstract principles on which such guidelines should be based, followed by examples how to keep them, on the one hand, user-friendly and consistent, and on the other hand compatible with the international semantic standards SNOMED CT and FHIR, including their areas of overlap.

    We sketch how the resulting annotations can be represented in a knowledge graph, a state-of-the-art semantic representation paradigm, which can be enriched with additional content at the A-Box and T-Box level and to which symbolic and neural reasoning tasks can be applied.

    Schulz, S., Del-Pinto, W., Han, L., Kreuzthaler, M., Aghaei, S., & Nenadic, G. (2023). Towards principles of ontology-based annotation of clinical narratives. In Proceedings of the International Conference on Biomedical Ontologies (Vol. 2023). https://ceur-ws.org/Vol-3603/Paper4.pdf

  • Semantic Annotation of Tabular Data for Machine-to-Machine Interoperability via Neuro-Symbolic Anchoring

    Shervin Mehryar, Remzi Celebi

    In this paper we investigate automated annotation of tabular data using semantic technologies in combination with neural network embedding. Specifically, we propose an anchoring model in which property and cell types from the data embedding space are aligned with ontology relation and entity types. We show that by combining the power of symbolic reasoning, neural embeddings, and loss function design, a significant performance improvement as high as 86% for column property, 82% for column type, and 87% for column qualifier annotations can be achieved based on DBpedia and Wikidata table extractions.

    Shervin Mehryar, Remzi Celebi. Semantic Annotation of Tabular Data for Machine-to-Machine Interoperability via Neuro-Symbolic Anchoring. SemTab’23: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching 2023, co-located with the 22nd International Semantic Web Conference (ISWC), November 6-10, 2023, Athens, Greece.

  • AI for life: Trends in artificial intelligence for biotechnology

    Andreas Holzinger, Katharina Keiblinger, Petr Holub, Kurt Zatloukal, Heimo Müller

    Due to popular successes (e.g., ChatGPT), Artificial Intelligence (AI) is on everyone's lips today. When advances in biotechnology are combined with advances in AI, unprecedented new potential solutions become available. This can help with many global problems and contribute to important Sustainable Development Goals. Current examples include Food Security, Health and Well-being, Clean Water, Clean Energy, Responsible Consumption and Production, Climate Action, Life below Water, and the goal to protect, restore, and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, halt and reverse land degradation, and halt biodiversity loss. AI is ubiquitous in the life sciences today. Topics span a wide range, from machine learning and Big Data analytics, knowledge discovery and data mining, biomedical ontologies, knowledge-based reasoning, natural language processing, decision support and reasoning under uncertainty, and temporal and spatial representation and inference, to methodological aspects of explainable AI (XAI) with applications in biotechnology. In this pre-Editorial paper, we provide an overview of open research issues and challenges for each of the topics addressed in this special issue. Potential authors can directly use this as a guideline for developing their paper.

    Holzinger A, Keiblinger K, Holub P, Zatloukal K, Müller H. AI for life: Trends in artificial intelligence for biotechnology. N Biotechnol. 2023;74:16-24. doi: 10.1016/j.nbt.2023.02.001. PMID: 36754147

  • Combining Deep Learning and Handcrafted Radiomics for Classification of Suspicious Lesions on Contrast-enhanced Mammograms

    Manon P. L. Beuque, Marc B. I. Lobbes, Yvonka van Wijk, Yousif Widaatalla, Sergey Primakov, Michael Majer, Corinne Balleyguier, Henry C. Woodruff, Philippe Lambin

    A deep learning algorithm was able to accurately identify and delineate suspicious lesions on contrast-enhanced mammograms, and the combined outputs of this tool and a handcrafted radiomics model achieved good diagnostic performance.

    Background: Handcrafted radiomics and deep learning (DL) models individually achieve good performance in lesion classification (benign vs malignant) on contrast-enhanced mammography (CEM) images.

    Purpose: To develop a comprehensive machine learning tool able to fully automatically identify, segment, and classify breast lesions on the basis of CEM images in recall patients.

    Materials and Methods: CEM images and clinical data were retrospectively collected between 2013 and 2018 for 1601 recall patients at Maastricht UMC+ and 283 patients at Gustave Roussy Institute for external validation. Lesions with a known status (malignant or benign) were delineated by a research assistant overseen by an expert breast radiologist. Preprocessed low-energy and recombined images were used to train a DL model for automatic lesion identification, segmentation, and classification. A handcrafted radiomics model was also trained to classify both human- and DL-segmented lesions. Sensitivity for identification and the area under the receiver operating characteristic curve (AUC) for classification were compared between individual and combined models at the image and patient levels.

    Results: After the exclusion of patients without suspicious lesions, the total number of patients included in the training, test, and validation data sets were 850 (mean age, 63 years ± 8 [SD]), 212 (62 years ± 8), and 279 (55 years ± 12), respectively. In the external data set, lesion identification sensitivity was 90% and 99% at the image and patient level, respectively, and the mean Dice coefficient was 0.71 and 0.80 at the image and patient level, respectively. Using manual segmentations, the combined DL and handcrafted radiomics classification model achieved the highest AUC (0.88 [95% CI: 0.86, 0.91]) (P < .05 except compared with DL, handcrafted radiomics, and clinical features model, where P = .90). Using DL-generated segmentations, the combined DL and handcrafted radiomics model showed the highest AUC (0.95 [95% CI: 0.94, 0.96]) (P < .05).

    Conclusion: The DL model accurately identified and delineated suspicious lesions on CEM images, and the combined output of the DL and handcrafted radiomics models achieved good diagnostic performance.

    Beuque MPL, Lobbes MBI, van Wijk Y, Widaatalla Y, Primakov S, Majer M, Balleyguier C, Woodruff HC, Lambin P. Combining Deep Learning and Handcrafted Radiomics for Classification of Suspicious Lesions on Contrast-enhanced Mammograms. Radiology. 2023 Jun;307(5):e221843. doi: 10.1148/radiol.221843. PMID: 37338353
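
    The abstract above reports segmentation quality as a mean Dice coefficient. As a minimal sketch of what that metric measures, the snippet below computes Dice overlap between two binary masks given as nested lists. The function name and the toy masks are hypothetical, and returning 1.0 for two empty masks is an assumed convention rather than something stated in the publication.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2*|A ∩ B| / (|A| + |B|) for two binary masks (lists of rows)."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(flat_a) + sum(flat_b)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy 4x4 masks (hypothetical): a manual delineation and a slightly larger
# automatic segmentation of the same lesion.
manual = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
automatic = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

dice = dice_coefficient(manual, automatic)
print(f"Dice = {dice:.2f}")
```

    A value of 1 means the two masks coincide exactly, and 0 means they share no pixels.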

  • Masking Language Model Mechanism with Event-Driven Knowledge Graphs for Temporal Relations Extraction from Clinical Narratives.

    Uma, K., Francis, S., & Moens, M. F. (2023).

    For many natural language processing systems, the extraction of temporal links and associations from clinical narratives has been a critical challenge. To understand such processes, we must be aware of the occurrences of events and their time or temporal aspect by constructing a chronology for the sequence of events. The primary objective of temporal relation extraction is to identify relationships and correlations between entities, events, and expressions. We propose a novel architecture leveraging a Transformer-based graph neural network that combines textual data with event graph embeddings for predicting temporal links across events, entities, document creation time, and expressions. We demonstrate our preliminary findings on the i2b2 temporal relations corpus for predicting BEFORE, AFTER, and OVERLAP links with an event graph for the correct set of relations. Various Biomedical-BERT embedding types were benchmarked, with PubMed BERT combined with the language model masking (LMM) mechanism yielding the best performance in our methodology. This illustrates the effectiveness of our proposed strategy.

    Uma, K., Francis, S., & Moens, M. F. (2023). Masking Language Model Mechanism with Event-Driven Knowledge Graphs for Temporal Relations Extraction from Clinical Narratives. In International Conference on Complex Networks and Their Applications (pp. 162-174). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-53468-3_14

  • Toward human-level concept learning: Pattern benchmarking for AI algorithms

    Holzinger, A., Saranti, A., Angerschmid, A., Finzel, B., Schmid, U., & Mueller, H. (2023).

    Artificial intelligence (AI) today is very successful at standard pattern-recognition tasks due to the availability of large amounts of data and advances in statistical data-driven machine learning. However, there is still a large gap between AI pattern recognition and human-level concept learning. Humans can learn amazingly well even under uncertainty from just a few examples and are capable of generalizing these concepts to solve new conceptual problems. The growing interest in explainable machine intelligence requires experimental environments and diagnostic/benchmark datasets to analyze existing approaches and drive progress in pattern analysis and machine intelligence. In this paper, we provide an overview of current AI solutions for benchmarking concept learning, reasoning, and generalization; discuss the state-of-the-art of existing diagnostic/benchmark datasets (such as CLEVR, CLEVRER, CLOSURE, CURI, Bongard-LOGO, V-PROM, RAVEN, Kandinsky Patterns, CLEVR-Humans, CLEVRER-Humans, and their extension containing human language); and provide an outlook of some future research directions in this exciting research domain.

    Holzinger A, Saranti A, Angerschmid A, Finzel B, Schmid U, Mueller H. Toward human-level concept learning: Pattern benchmarking for AI algorithms. Patterns (N Y). 2023 Jul 5;4(8):100788. doi: 10.1016/j.patter.2023.100788. PMC: 10435961