Automate Curation and Publishing of
Personal Health Data Through Artificial Intelligence

Deliverables

Restricted access to deliverables

Some of the deliverables listed below contain sensitive information and are not publicly available. If you are interested in accessing any of these documents, please contact the AIDAVA management team at contact-us@aidava.eu. We would be pleased to discuss the terms under which these deliverables can be made available.

  • D1.1 Description of use cases

    Published Feb 2023, Public

    This deliverable clarifies the problem to be solved in AIDAVA, i.e. (1) develop a "data cleaning" machine with maximum automation in curation and publishing of personal healthcare data and (2) maximise engagement with the patients in the curation process of these data when automation is not possible.
    The deliverable then describes the 2 use cases for testing the prototype: (1) the first use case is hospital-centric and will demonstrate the feasibility of a federated "EU" breast cancer registry composed of interoperable extracts issued from multiple different sources at each of the evaluation sites; (2) the second use case is patient-centric and will demonstrate how patient data curated into an individual longitudinal health record can be reused for visualising the patient record and for computing a cardiac risk score that supports physicians in monitoring risk for their patients. We will also explore the possibility of presenting the risk score to the patient. Finally, the deliverable provides more information on the metrics that will support assessment of the prototype and identifies key elements to be taken into account when deploying the prototype across sites.

    https://zenodo.org/records/10075525

  • D1.2 Report from user survey with personas canvas

    Published May 2023, Public

    The AIDAVA project's Task 1.2 (T1.2) aimed to better understand the different user groups of the AIDAVA "AI-powered data curation and publishing virtual assistant" by involving two patient organisations, hospitals and health data intermediaries (HDIs). Also, T1.2 assessed patients' and citizens' interest and willingness to control and curate their personal health data. To achieve this goal, the project team developed 8 personas based on 39 in-depth interviews, consisting of 2 patient personas, 2 data user personas, 2 data curator personas, and 2 third-party app developer personas. Foundation documents and persona canvases were created for each persona. Additionally, a survey was conducted with 250 participants to determine the general willingness of citizens to use a virtual assistant and what functionalities they would like the AIDAVA virtual assistant to have.

    This information will be used to support the user-centred development of an AI-based data curation and publishing assistant. The personas will help developers empathise with different user groups, leading to more user-centric decisions. It will also serve as the foundation of the explainability and feedback layer for the user interface for patients - based on user profiles gathered when the user starts using the system for the first time. The personas will complement the business requirements specified in deliverable D1.3, and the default user profiles can be based on the main characteristics of the personas.

    https://zenodo.org/records/10593217

  • D1.3 Business requirements for G1

    Published Jun 2023, Public

    This document provides a detailed description of the business requirements – functional and non-functional requirements - of the AI-powered data curation and publishing assistant, aimed at supporting patients (and expert curators) in managing and cleaning their personal health data – in compliance with ethical and regulatory requirements.

    The first objective of the AIDAVA project is to demonstrate that the prototype works in a realistic environment that takes data privacy and regulatory constraints into account, though one that is strictly controlled through an assessment protocol approved by the local ethical committees, as described in Deliverable D1.4. The second objective is to develop a solution that can be transformed into a full-fledged product, including MDR certification; to meet this second objective we decided to keep all requirements that were captured, and to indicate whether they were in scope of the prototype or only in scope of the future product. While the product-related requirements will not be developed during the AIDAVA project, it is expected that the technology architects will take these into account when defining the architecture of the system and ensure the prototype can smoothly evolve toward a marketable product. The requirements were gathered through a structured approach.

    First, the user journey was defined. It includes the following steps: registration and logging, upload and ingestion of patient personal data from different sources, integration & curation of data from these sources, use of the resulting curated data, deletion of account (and data). Second, the requirements were gathered along the steps of this user journey for the different users - and potential customers - identified for the prototype: patient, expert curator, data users but also administrator and third-party app developers. Capture of requirements took place mainly through structured workshops and on-line meetings across the different sites.

    • The requirements across the users were then consolidated and clustered in epics, defined in Deliverable D2.3 Solution Design; duplicate requirements across user groups were deleted.
    • Each consolidated requirement was then assigned a level of severity (blocking/crucial, major, minor, out of scope) indicating the importance of having the requirement successfully developed. While assessing the severity, specific attention was given to data privacy, security, and regulatory process as well as to patient needs and acceptability.
    • Finally, the team identified the need to have the requirement in the prototype or only in the future product.

    A total of 596 requirements were gathered with the different users; after consolidation across users and prioritisation, 277 requirements were considered as needed for the prototype (46 blocking, 178 major, 53 minor) and 99 additional ones were considered in scope for a product. As the need for information and documentation came up regularly while gathering requirements, the content of four different documents was drafted.
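
    As a minimal illustration of the triage described above, the sketch below models a consolidated requirement with a severity level and a scope flag, then tallies the prototype-scope requirements by severity. The class, field and function names are hypothetical and are not taken from the deliverable; only the counts (46 blocking, 178 major, 53 minor, 99 product-only) come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCKING = "blocking"
    MAJOR = "major"
    MINOR = "minor"
    OUT_OF_SCOPE = "out of scope"


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record for one consolidated requirement."""
    identifier: str
    severity: Severity
    in_prototype: bool  # False means "future product only"


def tally_prototype(requirements):
    """Count prototype-scope requirements per severity level."""
    return Counter(r.severity for r in requirements if r.in_prototype)


def build_example():
    """Synthetic list that reproduces only the counts reported in the text."""
    counts = {Severity.BLOCKING: 46, Severity.MAJOR: 178, Severity.MINOR: 53}
    reqs = []
    for sev, n in counts.items():
        reqs += [Requirement(f"{sev.value}-{i}", sev, True) for i in range(n)]
    # The severity of the 99 product-only requirements is not reported;
    # MINOR is used here purely as a placeholder.
    reqs += [Requirement(f"product-{i}", Severity.MINOR, False) for i in range(99)]
    return reqs


requirements = build_example()
tally = tally_prototype(requirements)
print(sum(tally.values()))  # 46 + 178 + 53 = 277
```

    Keeping scope as a separate flag, rather than folding it into the severity level, mirrors the two-step assessment in the list above: severity first, then prototype versus product.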

    The content of the business requirements will be consolidated with the automation requirements from Task 2.1 and the quality management requirements from Task 4.2, and transformed by the development team into feature-centric user stories to be used as the basis of the development.

    The content of the deliverable will be re-assessed – and potentially adapted – after the evaluation of Generation 1 of the prototype to deliver Generation 2.

    https://zenodo.org/records/10075580

  • D1.4 Definition of assessment study including test scenarios & metrics, and study initiation package

    Published Nov 2023, Public

    The AIDAVA prototype will be delivered in 2 generations: Generation 1 in Q3 2024 and Generation 2 in Q2 2026. It will be tested in 4 hospitals and 2 Health Data Intermediaries, with 45 patients per therapeutic area across all sites (90 patients for the 2 therapeutic areas in scope). This deliverable includes the description of the 4 documents developed to support the execution of this assessment study of the two generations of the AIDAVA prototype in an ELSI-compliant way, with a minimum burden for the patients and the sites.

    The first document - and the most important one - is the study protocol (Annexe 1); it starts with a synopsis of the study and includes a description of the objectives of the study, the specification of the primary and secondary endpoints, the study schedule with the different activities to take place during the evaluation of the prototype across the 2 generations (including the washout period between the 2 generations), the study population with eligibility criteria, the data points to be collected with associated data collection forms (in REDCap) and the statistical analysis.

    Another important document, related to the protocol is the English version of the Study Information Package and Informed Consent Form (Annexe 2) to be translated by each site and provided to patients during the recruitment process.

    The third document includes a training plan (Annexe 3) for the patients participating in the evaluation and for the study team. It includes a specification of the different modules and a training program for the participants of the study, based on their role.

    The final document is a template Data Sharing agreement (Annexe 4), to be adapted and finalised by each site, including guidance for technical and legal provisions.

    The deliverable also includes a description of work conducted with the Health Data Intermediaries (HDIs), who helped to identify vendors who would provide a patient app (to collect Quality of Life information) and a blood pressure medical device to be used during the study; the collected data will be managed by the HDI and provided to AIDAVA for integration in the patient record.

    We also provide an overview of the feedback provided by the patient consultants on the different materials mentioned above, and specify the study design with the schedule of activities as well as the Study Information Package and the Informed Consent Form.

    https://zenodo.org/records/12927925

  • D1.5 ELSI templates filled with midterm recruitment report

    Published June 2024, Sensitive

    This deliverable presents a project management approach to tracking the progress across the AIDAVA pilot sites in obtaining the permissions that they each require in order to provide data and patients to contribute to the project research. Section 1 of this report summarises why these project management templates are useful to the overall smooth running and governance oversight of the project. It also summarises how they were developed. Section 2, the results of the work, presents the templates to be used for three important time points in the project. The first is tracking the permissions to obtain extracts of electronic health record information that can be used for NLP and AI training, in order to develop the data extraction and processing tools that AIDAVA will use to enrich the cardiovascular prevention and breast cancer data in the project. This progress (now completed) is summarised in section 2.1. The second time point is obtaining the permissions to recruit patients who are willing to allow their EHR data to be used by AIDAVA to improve its computability and quality and to engage those patients (and/or their caregivers) in data curation. These templates are presented in section 2.2. The third time point is the actual process of patient recruitment, where the templates are quite different and capture the progress in accruing patients who will contribute their records and their own efforts in data curation, presented in section 2.3. This deliverable presents initial content from the sites, not considered to be definitive of the permissions needed or recruitment activity at each site but providing their current in-progress status and illustrative of how they will be maintained. This document will be maintained as a living document by the PMO, progressively updated per site, about their permissions and recruitment. It is a governance instrument to ensure the project is conducting appropriate pilot studies.

  • D1.6 G1 installed for testing across all organisations

    Published August 2024, Sensitive

  • D1.7 Report on G1 Performance

    Submitted January 2025, Sensitive

    This document presents the results of the evaluation of Generation 1 of the AIDAVA prototype, which took place from July to early December 2024. A number of activities took place to prepare the evaluation with site patients, including:

    • Elaboration of a study protocol, including study information package and Informed Consent Form for the patients;
    • Submission and approval of the study protocol to the local ethics committee - including data privacy impact assessment whenever relevant and signature of a data sharing agreement with the health data intermediary related to the site;
    • Dry-run workshop with the patient consultants in May 2024 to prepare the clinical study team for the evaluation.

    The prototype was deployed at the end of June 2024 at each site, with continuous support during the evaluation period. Screening and recruitment started a few months before deployment; it slowed down during the holidays because clinicians lacked time. The First Patient First Visit took place mid-July and the Last Patient Last Visit took place mid-December. Consolidation of qualitative and quantitative findings - through REDCap forms related to the study protocol - started mid-December and was finalised mid-January. At the end of the study, each site finalised the evaluation forms and access to the AIDAVA platform was frozen for all participants.

    The objective for Generation 1 of the AIDAVA prototype was to integrate existing curation tools with the front-end prototype to assess the current situation. However, only existing OCR and German-based NLP tools were considered usable with enough reliability, and the system would not have been acceptable to the patients as such. This forced the development team to include suboptimal open source tools and very early prototypes of new tools being developed for Generation 2. The deployment of the AIDAVA prototype requires initial configuration, with a description of the extracted data in a formal Data Transfer Specification; this was a time-consuming process as information was not readily available. Deployment of the prototype itself, including integration of the Health Data Intermediary, was however smooth, with regular improvements. The overall performance was - as expected - suboptimal and raised concerns from the patients' perspective (e.g., unclear and burdensome questions raised by the system, lack of direct benefits) and from the data users (e.g., lack of usable high-quality data to support any decision).

    Despite the suboptimal performance of the system, user acceptance was slightly above medium, and the patients perceived the importance and value of the prototype while realising that it was only a first step towards what should ultimately be deployed. In addition, the evaluation was extremely useful for the project teams: the site study teams learned how to configure the system and how to run the evaluation process, and the development team identified several technical issues to be solved or improved for Generation 2, to be deployed by end 2025. While Generation 1 is suboptimal, it demonstrates that there is true potential for automation in the curation of health data into a harmonised semantic standard, in the form of a Personal Health Knowledge Graph. The improvements in the curation tools planned for Generation 2 are needed to demonstrate the full value of the approach; they will be the focus of the project in the next 16 months.

  • D1.8 Update of Report (D1.2) from user survey with personas canvas

    Submitted January 2025, Public

    This Deliverable D1.8 provides an update of Deliverable D1.2 and supersedes it. While Deliverable D1.2 described the results of the first phase of Task 1.2 of the AIDAVA project, Deliverable D1.8 completes this by adding the results of the second phase. The AIDAVA project's Task 1.2 (T1.2) aimed to better understand the different user groups of the AIDAVA "AI-powered data curation and publishing virtual assistant" by involving two patient organisations, hospitals and health data intermediaries (HDIs). Also, T1.2 assessed patients' and citizens' interest and willingness to control and curate their personal health data.

    In the first phase of Task 1.2, the project team developed, based on 39 in-depth interviews, 8 personas consisting of 2 patient personas, 2 data user personas, 2 data curator personas, and 2 third-party app developer personas. Foundation documents and persona canvases were created for each persona. Additionally, a survey was conducted with 250 participants to determine the general willingness of citizens to use a virtual assistant and what functionalities they would like the AIDAVA virtual assistant to have.

    In the second phase of Task 1.2, the project team collected and analysed the findings and lessons learnt from the AIDAVA G1 prototype assessment study in the three test sites to identify all aspects of relevance for the update of the AIDAVA personas. Updating the personas according to these insights resulted in nine personas: one additional patient persona was elaborated to represent those who work with AIDAVA (only) on a smartphone. Further major updates of the patient personas concerned the patients' approaches for uploading documents to AIDAVA through the Health Data Intermediary and the patients' strategies for overcoming problems when using AIDAVA. Major updates of the data curator personas were related to their skills, motivation and challenges with respect to the job of an AIDAVA data curator. Only minor adjustments were needed for the data user and third-party app developer personas.

    Additionally, to update the results regarding the general willingness of citizens to use a virtual assistant for automatic data curation, the survey was re-run to obtain the opinion of a broader group of respondents. The survey results of Phase 2 were strikingly consistent with the results of Phase 1, indicating that the trends collected are representative even with a small survey (approximately 250 people).

    This information will be used to support the user-centred development of an AI-based data curation and publishing assistant. The personas will help developers empathise with different user groups, leading to more user-centric decisions. It will also serve as the foundation of the explainability and feedback layer for the user interface for patients - based on user profiles gathered when the user starts using the system for the first time. The personas will complement the business requirements specified in Task 1.3, and the user profiles can be based on the main characteristics of the personas.

    https://zenodo.org/records/14930632

  • D1.9 Update of business requirements (D1.3) for G2

    Submitted March 2025, Public

    This document describes the activities undertaken in Phase 2 of Task 1.3 of the AIDAVA project. It provides an update of Deliverable D1.3 based on the evaluation of achievement (or not) of the requirements, which were defined in Phase 1 of Task 1.3 for the AIDAVA prototype, as well as an updated list of business requirements and related acceptance criteria elaborated for Generation 2 of the AIDAVA prototype (AIDAVA G2). To achieve this, activities in Phase 2 of Task 1.3 included the following steps:

    • Evaluation of requirements met in AIDAVA G1. The acceptance criteria fulfilment assessment revealed that 85% of the ‘blocking’, 52% of the ‘major’ and 38% of the ‘minor’ acceptance criteria, which were elaborated in Phase 1 of Task 1.3 and deemed relevant for an AIDAVA prototype, were already fulfilled by Generation 1 of the AIDAVA prototype (AIDAVA G1).
    • New requirements and related acceptance criteria were derived from the experiences and findings of the AIDAVA G1 prototype assessment study (see Deliverable D1.7. Report on G1 performance).
    • New requirements and related acceptance criteria were formulated based on the results of the analysis of legal provisions in WP4 (see Deliverable 4.8. Regulatory Conformance Analysis of AI Development Pipeline).
    • A consolidated list of requirements and acceptance criteria was compiled based on the results of the previous steps: starting from those acceptance criteria which were defined in the first phase of Task 1.3 for an AIDAVA prototype and not yet fulfilled by AIDAVA G1, new requirements and acceptance criteria, which evolved from practical experience with AIDAVA G1 development and testing as well as from explainability needs, were added.
    • All acceptance criteria in the consolidated list were assigned a level of severity (blocking, major, minor) from the users’ point of view so that the resulting list of business requirements and acceptance criteria can give guidance for the development of the AIDAVA G2 prototype.

    The updated consolidated business requirements list for the AIDAVA G2 prototype comprises 198 requirements (110 defined for G1 and not fulfilled, 88 new as a result of the G1 evaluation) and 355 related acceptance criteria (34 blocking, 209 major, 112 minor). 47% of the acceptance criteria were formulated from the patients’ perspective. This is in line with AIDAVA seeing patients as the main user group for the ‘AI-powered virtual assistant for health data curation and publishing’. This updated list of business requirements and related acceptance criteria will be used as the basis for development of the AIDAVA G2 prototype. GND will integrate these business requirements with the technical requirements identified through the AIDAVA G1 development and testing, discuss them with all AIDAVA development partners, and prioritise them from the developers’ point of view. This process will lead to an improved AIDAVA G2 prototype as per the specified requirements.

    https://zenodo.org/records/15433239

  • D2.1 Global Data Sharing Standard

    Published May 2023, Public

    Ontologies are increasingly used to support harmonisation of population data from heterogeneous data sources in support of clinical research, with a specific research question requiring a well defined dataset. AIDAVA is exploring the possibility of using an ontology to harmonise all patient data, extracted from heterogeneous data sources, into an individual personal health knowledge graph (PHKG) that can then be reused for multiple purposes, in clinical care and clinical research.

    The decision to take an ontology approach in AIDAVA, rather than to follow a structural standard such as an information model, was already made at proposal time, as ontologies are semantically rich and agnostic of structural and syntactical formats, potentially increasing interoperability and reuse in compliance with the FAIR principles. Moreover, new knowledge can be added smoothly by extending the ontology concepts with RDF triples and data quality constraints through SHACL rules.
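
    To make the idea concrete, the sketch below represents a tiny graph as subject-predicate-object triples and checks one cardinality constraint of the kind a SHACL `sh:minCount` rule expresses. It uses plain Python tuples rather than an RDF library or a SHACL engine, and every identifier (`ex:Patient`, `ex:birthDate`, and so on) is hypothetical and not part of the AIDAVA Reference Ontology.

```python
# Plain-Python stand-in for RDF triples; a real system would use an RDF store
# and a SHACL engine. All identifiers below are illustrative assumptions.
RDF_TYPE = "rdf:type"

graph = {
    ("ex:patient1", RDF_TYPE, "ex:Patient"),
    ("ex:patient1", "ex:birthDate", "1970-01-01"),
    ("ex:patient2", RDF_TYPE, "ex:Patient"),
}


def instances_of(triples, cls):
    """Return the sorted subjects declared as instances of `cls`."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


def violations_min_count(triples, cls, prop, min_count=1):
    """Return subjects of class `cls` with fewer than `min_count` values for `prop`."""
    bad = []
    for subject in instances_of(triples, cls):
        values = [o for s, p, o in triples if s == subject and p == prop]
        if len(values) < min_count:
            bad.append(subject)
    return bad


before = violations_min_count(graph, "ex:Patient", "ex:birthDate")

# Extending the graph only means adding triples; the existing data is untouched.
graph.add(("ex:patient2", "ex:birthDate", "1985-06-15"))
after = violations_min_count(graph, "ex:Patient", "ex:birthDate")

print(before, after)
```

    The constraint is kept separate from the data, which is the property the paragraph above relies on: concepts and quality rules can each be extended without restructuring what is already stored.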

    Development of the AIDAVA Reference Ontology followed a structured approach including ideation, requirement analysis, design and development. The requirements took into account the use cases developed in WP1, the requirements extracted from the automation phases described in Task 2.1 and the annotation process described in Task 4.3. The data quality constraints were built in alignment with Task 4.2. We identified 4 Ontology Strategic Requirements and 6 Ontology Requirement Specifications that provided directions for the design and the development of the ontology.

    For an ontology like the AIDAVA Reference Ontology to comply with the FAIR principles as effectively as possible, it is critical to maximise alignment with emerging and existing standards. While reviewing the work on semantic interoperability of related initiatives, including TEHDAS and the European Electronic Health Record exchange format (EEHRxf), we came to the conclusion that SNOMED CT and LOINC were priority standards to be included. However, they need to be complemented by other standards to cover additional relationships and other domains. Several candidates were considered and it was decided to include the semantics subsumed in the HL7 FHIR General Purpose Data Types, and relevant HL7 FHIR profiles through the governance process, as second priority. We expect that other semantic standards will be required to achieve the long-term objective of the AIDAVA Reference Ontology to cover a majority of medical concepts contained in personal health medical records.

    This deliverable also describes the technical specification of the AIDAVA Reference Ontology, which defines the structure, components, and relationships within the scope of the two targeted use cases (Breast cancer registry and Cardiovascular score) and in a broader context (ensuring semantic interoperability across systems). It includes a formal representation of the concepts, entities and their attributes, which are specified in the AIDAVA Dataset.

    While developing the ontology, we realised that additional concepts and relationships, as well as data quality constraints, will need to be added when the data sources to be curated are onboarded across sites, and when more narrative texts are annotated. This requires a governance process to be executed during the project lifetime, as described in Section 3.4. In addition, and assuming the project will be successful, governance will also be needed beyond the project to maximise sustainability and reuse of the results. While this is not in scope of this deliverable, the proposed approach is introduced here; it will be discussed extensively during the planned meetings with the Sustainability Advisory Board.

    https://zenodo.org/records/10117718

  • D2.2 Details on data curation & publishing process

    Published June 2023, Sensitive

    The document provides a comprehensive outline of AIDAVA's approach to the onboarding and (automatic) curation of patient health data sources into a Personal Health Knowledge Graph (PHKG). This document is the outcome of extensive literature reviews, consultation between project partners, and proof of concept build and testing; with a specific focus on automation, data interoperability, data quality and privacy compliance.

    The document details the current limitations and interoperability challenges with respect to health data and health data systems, highlighting the need and potential applications for the AIDAVA approach towards curating such data and publishing an accurate and complete PHKG without compromising data privacy or accuracy.

    Furthermore, the specific limitations and interoperability challenges are classified and detailed alongside the assumptions made and (potential) open-source tools which can be integrated into the AIDAVA platform, thus minimising the manual overhead for patients and data stewards involved in the onboarding, curation and publishing processes. The bulk of the document describes the proposed workflows supporting maximum automation of these processes. During the development phase - as part of Task 5.2 - these workflows will probably require adaptations.

  • D2.3 Solution Design Document for G1

    Published Jun 2023, Public

    Availability of integrated, high-quality personal health data (PHD) remains limited, with impact on quality & cost of care and limiting possibilities for research and analytics. Indeed, PHD is currently distributed, heterogeneous, captured through different modalities, with variable quality. Findability and Accessibility of this data, following the FAIR principles, is addressed in numerous projects; Interoperability and Reuse remain a challenge due to several factors that are addressed by the intelligent virtual assistant being prototyped in the AIDAVA project. Concretely, the objective of the project is to maximise automation in data curation & publishing of heterogeneous PHDs while empowering individuals (patients or their deputies, and data curators) when automation is not possible due to lack of contextual information.

    Through the data curation workflows (Deliverable 2.2. Data curation and publishing process), the AIDAVA virtual assistant prototype is expected to transform each patient's PHD into a Personal Health Knowledge Graph (PHKG). All PHKGs will be generated in compliance with the AIDAVA Reference Ontology (Deliverable 2.1. AIDAVA Reference Ontology as a Global Data Sharing Standard). From the PHKGs and the mapping information contained in the Reference Ontology, the publishing module will generate different target outputs as required by different use cases (see Deliverable 1.1. Description of Use Cases).

    This deliverable focuses on the solution design of the AIDAVA prototype virtual assistant. The solution includes a backend and a frontend. The backend includes foundational components such as the user directory, a master data reference repository, the catalogue of data sources with metadata supporting automation, a library of curation tools used in the automation workflows, the reference ontology which is the standard of reference for each PHKG, the repository of patient data (from raw format to PHKG and published format in the target standard) and the overarching orchestration module that supports automation and interaction with the end users. The frontend is the module that interacts with the end users. In Generation 1 of the AIDAVA prototype, user interaction will be minimal; in Generation 2, the user interface will build on advanced technologies from human-computer interaction, with explainability to facilitate understanding of the questions raised by the virtual assistant during the curation process. Explanations will be tailored to users categorised through different user profiles as identified in Deliverable D1.2. Report from user survey with personas canvas.

    These different components are described in the solution architecture, with the related Epics, in turn consolidated in Initiatives for the development team. In addition, the first 2 levels of a formal description of the system - based on the C4 model - are provided; an in-depth description of the lower levels of the C4 model will be developed in Deliverable D3.1. VA Architecture.

    This deliverable also describes the proposed support model to be implemented when evaluating the prototype across the different sites. Finally, this deliverable introduces potential target customers. A full market analysis, with in-depth analysis of customers, market size and market potential for a product that could be developed on the result of the AIDAVA project will be provided in Deliverable 2.4, as an updated version of this deliverable, after evaluation of the first generation of the prototype.

    https://zenodo.org/records/10245773

  • D2.4 Solution Design AIDAVA (G2)

    Submitted May 2025, Public

    This document aims to be a reference point for organisations interested in improving and potentially productizing AIDAVA-like solutions after the end of the project, and who need to understand the different elements included in the AIDAVA architecture. Availability of integrated, high-quality personal health data (PHD) remains limited, with impact on quality & cost of care and limiting possibilities for research and analytics. Indeed, PHD is currently distributed, heterogeneous, captured through different modalities, with variable quality. Findability and Accessibility of this data, following the FAIR principles, is addressed in numerous projects; Interoperability and Reuse remain a challenge due to several factors (heterogeneity, narrative content, inconsistencies, errors, etc.) that are addressed by the intelligent virtual assistant being prototyped in the AIDAVA project. Concretely, the objective of the project is to maximise automation in data curation & publishing of heterogeneous PHD while empowering individuals (patients or their deputies and data curators) when automation is not possible due to lack of contextual information. Through a structured data curation workflow (Deliverable 2.2. Data curation and publishing process), the AIDAVA virtual assistant prototype aims to transform each patient's PHD into a Personal Health Knowledge Graph (PHKG). All PHKGs are generated in compliance with the AIDAVA Reference Ontology (Deliverable 2.1. AIDAVA Reference Ontology as a Global Data Sharing Standard [1]). From the PHKGs and the mapping information contained in the data source description (Deliverable 3.3. Populated Data Source catalogue for testing), the publishing module is able to generate different target outputs required by different use cases (see Deliverable 1.1. Description of Use Cases [2]). The AIDAVA prototype is being released in two generations: Generation 1 (G1) was released in June 2024 and assessed in 4 clinical sites (see Deliverable 1.7. Report on G1 performances); Generation 2 (G2) will be released in December 2025 and assessed early 2026. This deliverable builds on the information gathered from G1 (see Deliverable D2.3 Solution Design for G1 [3]) to derive the final solution design and Go To Market approach for G2. The high-level architecture is similar between G1 and G2; the main difference between the two generations lies in the capabilities and maturity of the data curation tools supporting automation, and the capabilities of the Human-In-The-Loop (HTIL) module. As a consequence, the first main contribution of this updated deliverable is the description of the integration of newly developed curation tools (see Section 4.3). While the HTIL components are being drastically improved in G2, this does not change the overall architecture and therefore there is no update in this document; more details on HTIL can be found in Deliverable 5.2. Analysis on explainability for different skill levels; delivery of explainability module for G2. Other sections have been updated following evaluation of G1. The second contribution of this deliverable relates to the Go To Market approach, which has been significantly expanded following lessons learned from G1 and discussion with the AIDAVA Sustainability Advisory Board (Deliverable 6.8. Recommendations from SAB).

    https://zenodo.org/records/15583203

  • D3.1 VA architecture (Application and Technical)

    Published Sep 2023, Public

    The objective of the AIDAVA project is to prototype an intelligent virtual assistant that will maximise automation in data curation & publishing of heterogeneous personal health data while empowering individual patients when automation is not possible due to lack of contextual information. The solution includes a backend and a frontend described in the solution design (see Deliverable D2.3. Solution Design). This deliverable focuses on the technical and data architecture of the AIDAVA prototype and on its deployment, integration and testing with the different evaluation sites. As the consortium intends to develop a reusable prototype, we first clarify the difference between product and prototype and confirm the importance of taking into account product constraints in the technical architecture to ensure reuse. We also define a set of architecture principles that guided the elaboration of the deliverable. The technical framework relies on a microservices-based structure, encompassing numerous satellite applications and curation tools. These components will be seamlessly integrated to facilitate the automation process using predefined workflows that incorporate workflow orchestration tools. Additionally, the architecture encompasses connectivity with various medical partners. In this context, the system will acquire input data from file shares or databases and establish connections with health data intermediaries, receiving data either through API endpoints or by utilising SDKs. The data architecture expands on the components identified in the solution design deliverable, from an implementation perspective. As the AIDAVA project aims to test the solution in real life with real patients consenting to manage and curate their data, an important part of this deliverable relates to integration in the different evaluation sites, the needed hardware as well as deployment and testing.

    This technical architecture is the consolidation of 1 year of efforts across different teams. It provides the consortium with a solid description of the solution that needs to be implemented to successfully meet the objectives of the project. While there are challenges ahead, there is confidence that the first generation (G1) of the prototype can be successfully developed and deployed. The technical architecture - and this document - will be updated for Generation 2 (G2) of the prototype, taking into account the results of the evaluation by patients and clinical sites, the need to integrate more powerful NLP curation tools and an improved human computer interaction front end developed in other work packages of the project.

    https://zenodo.org/records/10593465

  • D3.2 Epics and user stories for G1

    Published Nov 2023, Sensitive

    The project has made significant strides in understanding and addressing the needs of the targeted end-user groups, including patients, expert curators, and data users, among others. Through extensive interviews and collaboration, in “Deliverable No. 1.3 – Business requirements for G1”, a comprehensive list of Business Requirements (BR) and corresponding Acceptance Criteria (AC) emerged. This process has allowed us to gain valuable insights into the specific expectations and functional as well as non-functional requirements of each user group.

    To enhance the clarity and manageability of these BRs, we have categorized them into the well-defined action initiatives identified in D2.3 Solution Design for G1, the most important of which are ingestion and curation. This categorization enables the prototype developers to better understand the desired functions and functionalities for each aspect of the application.

    The AC associated with each BR has undergone thorough refinement and validation by the prototype owner group members and the prototype development representatives. This process ensures that the solution aligns precisely with the expectations and standards set by the end users.

    To gain a holistic view of the distribution of BR across different actors and epics, we have transferred the BRs from an Excel spreadsheet to a visualization app, called FigJam. This allows us to identify areas with the most BR and make informed decisions about resource allocation and prioritization.

    In terms of tracking our work progress, we have established links between the BRs and ACs first to the user interface or architecture diagrams, then to user stories (US), and finally to user screens. This ensures that the development efforts are aligned with the overarching project goals and user expectations. The Work Breakdown Structure (WBS) is designed to provide a clear hierarchy of tasks. Epics define high-level requirements, US break down each Epic into specific user screens or technical components, and Tasks outline the specific development tasks required for completion. This structured approach streamlines the development process and project management.

    Throughout this process, the user stories have been fine-tuned based on feedback gained through iterative meetings with end-user representatives. This feedback-driven approach has been instrumental in ensuring that the application not only meets but exceeds user expectations.

    The user stories list in Annex 1 should be treated as a living requirements list: as development of the prototype progresses, potentially missing use cases will be identified. Those new use cases will have to be recorded, defined, estimated in terms of complexity and maturity, and finally scheduled for implementation based on the development roadmap.

  • D3.3 Populated Data Source Catalogue

    Submitted July 2024, Sensitive

    The first objective of AIDAVA is to automate as far as possible the curation process, transforming heterogeneous health data into a single, harmonised Personal Health Knowledge Graph. To maximise automation, the system needs to have detailed information on the available data sources in a machine-understandable format. This information should be stored in a Catalogue of Data Sources. The deliverable explains the importance of the Catalogue of Data Sources in the overall architecture of the AIDAVA prototype and describes the three main components of the catalogue:

    ● technical representation metadata with the description of the schema,

    ● FAIR metadata - compliant with DCAT-AP and supporting the FAIR principles,

    ● mapping of the technical metadata with the AIDAVA Reference ontology and supporting transformation of data sources into the Personal Health Knowledge Graph.

    The catalogue is being populated by each evaluation site, based on the content of the Data Transfer Specification developed by each site as part of the Data Sharing Agreement. Mapping the technical representation metadata with the ontology is a tedious and difficult process if done manually. In addition, the mapping was complicated by the fact that the Swiss Personalized Health Network (SPHN) ontology, used as the basis of the AIDAVA Reference Ontology, required additions. To support the mapping process, several instruments were developed and used.

    ● A governance workflow in GitHub, supporting additions in the Reference Ontology, as described in Deliverable 2.1 [1].

    ● The Data Transfer Specification Validator checks conformance with the specification of the DTS before further processing.

    ● The Data Source Mapping tool supports mapping between the attributes provided in the Data Transfer Specification and the concepts defined in the AIDAVA Reference Ontology.

    The Validator and the Mapping tool are further introduced in the deliverable. The catalogue prototype tool is presented in this deliverable with screen dumps. The evaluation sites inserted their Data Transfer Specification in the prototype with the help of partner GND. They are now in the process of mapping the data source with the AIDAVA ontology and downloading this mapping in the catalogue. The process is expected to be complete by the end of April. The implementation of the Catalogue of Data Sources made very clear that good and detailed documentation of data sources is not a simple process (which may explain at least partially the lack of interoperability of health data), yet it is of crucial importance for any processing of health data and certainly for implementing the EHDS in an effective way.

  • D3.4 Populated library of curation tools (and publishing) for G1

    Published April 2024, Sensitive

    The primary objective of AIDAVA is to enhance the efficiency of the curation process by automating the transformation of diverse health data into a unified and harmonised Personal Health Knowledge Graph (PHKG). To achieve optimal automation, the use of the curation tools identified through the data curation workflow (see Deliverable 2.2. Details on data curation & publishing process) is essential. These tools are expected to undergo evolution, refinement and potentially replacement, making it imperative to establish a repository dedicated to preserving pertinent information about them and supporting smooth change management in the curation workflow process. Given that these tools are developed by different teams, standardisation and centralised storage, with appropriate guidelines for integration within the AIDAVA prototype, become crucial. The establishment of a central library ensures that we can systematically monitor the integration and deployment status and the version of each tool within the system, together with its API specification. Additionally, the library facilitates tracking information such as the current licensing status and allows for seamless modifications to the input and output parameters of the curation tools. The Library of Curation Tools is integrated into the AIDAVA Admin page, with access restricted to system administrators. For Generation 1 of the prototype it will contain seventeen different tools: three to support data source onboarding when setting up the system within a hospital, eleven to support the data curation workflow and three to support publishing of the Personal Health Knowledge Graph in the three target outputs identified in the use cases.

  • D3.5 Imaging data curation tools in library for G1

    Submitted March 2024, Sensitive

    Medical image curation and enhancement hold critical importance in the healthcare sector. They improve diagnostic accuracy by providing clearer images, which is crucial for detecting medical conditions with higher precision. High-quality, curated medical images facilitate better communication among healthcare professionals, leading to more cohesive and efficient treatment strategies. Within the AIDAVA project, Deliverable 3.5 plays a key role by integrating existing tools from ongoing European projects into the library of curation tools, for the purpose of medical image curation. This deliverable establishes vital connections with other crucial components of the project, including Work Package 5 (WP5), illustrating its integral role in the broader context of the AIDAVA initiative. For Generation 1 of the prototype, the objective was to use existing curation tools. Imaging in AIDAVA includes mammography and echocardiography, as specified in the use cases; tools are readily available only for mammography. This deliverable consequently focuses on mammography. For Generation 2, we will focus on echocardiography.

  • D3.6 G1 installed for testing

    Published June 2024, Public

    AIDAVA is a prototype system that supports patients to (semi)automatically integrate and curate their personal health data. The prototype is focusing on hospital data and non-hospital data pooled through a Health Data Intermediary (HDI), across 2 use cases: monitoring of cardiac risk score across patients with myocardial infarction treated across 3 different hospitals and interoperability of queries across breast cancer registries federated across 3 different hospitals. The system first allows patients to integrate all their data (non-hospital data from the HDI and hospital data) and then helps the patient to curate these data into an interoperable and consistent format: the Personal Health Knowledge Graph. AIDAVA uses Artificial Intelligence (AI) technologies to support automation in this curation process; if the AI tools cannot solve an issue, the system initiates a dialogue with the users. When the data are curated, AIDAVA can provide the patient's International Patient Summary (IPS), help cardiologists to compute a cardiac risk score for CVD patients, and help researchers to answer queries across different sites for BC patients. Generation 1 of the prototype has been developed following the user requirements gathered in Work Package 1, and includes multiple curation tools that are integrated through dedicated containers. While a lot of the functionalities are provided in the frontend of the prototype, the curation tools provided in the backend - that were intended to be based on existing tools in G1 and then replaced in G2 by tools newly developed during the project - are suboptimal. We therefore already included some of the newly developed tools in G1; these tools will however require optimization (as planned) for G2. The system has been rigorously tested, including unit tests for individual components, integration tests for combined functionality and, mainly, user tests conducted by GND's internal testers. It is being deployed across the 4 evaluation hospitals to undergo testing with real patients, with patient informed consent and following a strict evaluation protocol agreed by local ethics committees. The results of this evaluation will be the basis of the updates and improvements to be brought in Generation 2 of the prototype.

    https://zenodo.org/records/14035231

  • D4.1 Annotation guidelines, tools & training

    Published Dec 2023, Public

    Manual annotations of clinical narratives are crucial for the adoption and evaluation of NLP tools, which support an overall AI assisted data curation approach within the AIDAVA project. In the preparation phase - in scope of this deliverable - for the Task “T4.3 Manual Annotation of text documents in 3 languages”, and based on the data elements identified for the two use cases (cross-border breast cancer patient registries, and longitudinal individual health records for patients at risk of sudden cardiac arrest), requirements for the manual annotation tool have been formulated. Grounded on the requirement analysis, INCEpTION was chosen to support the manual annotation task. A first manual annotation schema was developed and tested, with a focus on the use of SNOMED CT and FHIR for the normalized form of the entity types of interest. A first version of the annotation guidelines is drafted in this document and will be revised in close cooperation with the manual annotators at the three different clinical sites (Med Uni Graz (MUG), Northern Estonian Medical Center (NEMC), Maastricht Medical University Center (UM)), AVER and ONTO during the piloting phase until Q1 2023.

    https://zenodo.org/records/10593537

  • D4.2 Data Management Plan

    Published Feb 2023, Public

    This Deliverable provides the Data Management Plan (DMP) for AIDAVA. It is based on the European Commission Template for Horizon 2020 projects available at https://ec.europa.eu/research/participants/data/ref/h2020/gm/reporting/h2020-tpl-oa-data-mgt-plan_en.docx.

    AIDAVA has populated this Data Management Plan in line with recommended EC guidelines. It will be updated as the project proceeds.

    A DMP is an important component of any data intensive programme because it imposes a need for balance between protection of data, success of the programme and the potential for reuse of data. AIDAVA is unique as a project because the primary data handling is focused on data ingestion and curation as a tool to assist citizens in managing their own health data.

    The approach to developing the data management plan has included workshop discussions with partners at the October 2022 Kick Off Meeting in Maastricht and a dedicated data flow workshop held in Tallinn in December 2022.

    The details gathered were compared with the proposal and obligations on the partners as described in the consortium agreement. They were also compared with the developing Research Protocols for both the Breast Cancer and Cardiovascular Disease (CVD) use cases developed in Task 1.4.

    The results of the details gathered are presented as the Data Management Plan in Section 3 of this Deliverable. It concludes with the next steps and specification of updates in time for M40’s second version of the Data Management Plan.

    https://zenodo.org/records/10075333

  • D4.3 Update to Annotation guidelines, tools & training

    Published May 2023, Public

    Manual annotations of clinical narratives are crucial for the adoption and evaluation of Natural Language Processing (NLP) tools, which support an overall AI-assisted data curation approach within AIDAVA. For a symbolic representation of clinical entities of interest and how they are related, normalisations that use international standards like SNOMED CT, FHIR or LOINC are crucial. For this deliverable, we updated the first version of the manual annotation guideline (see AIDAVA Deliverable D4.1), where requirements for annotation tooling were formulated with respect to the AIDAVA use cases, together with some initial annotation instructions. Grounded on this requirement analysis, INCEpTION was chosen as an annotation tool after a rigorous investigation of available annotation software. A first manual annotation schema was developed and tested, with a focus on the use of SNOMED CT and FHIR for the normalisation of the types of clinical entities referred to by clinical narratives (i.e. annotating them with terminology codes). Within this preparation phase, INCEpTION was deployed on all three clinical sites (MUG, NEMC, MUMC), with a first version of a consolidated INCEpTION layer definition. A bi-weekly “train the trainers” session was started at the end of 2022, supporting a continuous transition into the piloting phase of the developed guideline, analysing example narratives and how they should be annotated according to the first version of the guideline. Within the piloting phase lasting from January 2023 to May 2023, in communication with the responsible clinicians, relevant attributes were identified, and a selection of them was used for updating, testing and refinement of the annotation guideline. Annotators were recruited at all three sites and their feedback was taken into account for the customization and technical set up of INCEpTION. Alignment with Deliverable D2.1 "Reference Ontology as a Global Data Sharing Standard" defining the AIDAVA Reference Ontology was identified as crucial; therefore this deliverable was postponed for one month, from April to May 2023. Building on the first version of the guideline delivered early January 2023, this updated descriptive guideline provides a comprehensive framework and detailed instructions to ensure accurate annotation of clinical narratives. It covers crucial aspects like data standardisation and best practices in annotation (including annotation tool, general principles, specific instructions, concrete examples, and quality control items), ensuring consistent, interoperable, and high-quality annotations. This is invaluable for effective knowledge graph construction, data analysis, and knowledge extraction as central requirements in AIDAVA.

    The updated annotation instructions form the core of this deliverable, enabling the productive phase of the manual annotation to start. Manual annotation of texts is iterative and dynamic. It is, therefore, crucial to recognise potential updates and improvements that may arise during the productive phase. Factors that can contribute to the modifications and enhancements of the set of annotation instructions include active feedback from the annotation team, new insights into text phenomena that lead to annotator disagreements, updates in data requirements from use cases, and evolution of project objectives as a result of dissemination and communication activities during the project. To keep the annotation work consistent, a structured feedback mechanism is established, involving documenting any challenges or updates in a shared document, and conducting meetings with the annotation team to address any emerging insights or challenges.

    https://zenodo.org/records/10593800

  • D4.4 Information governance framework and instruments

    Published Aug 2023, Public

    Deliverable 4.4 describes the Information Governance Framework for AIDAVA. Information Governance relates to regulatory compliance and risk management for information handling. It will also inform technical design and implementation, including security services such as access controls and encryption.

    Partner i~HD has therefore under Task 4.1 engaged with the Consortium to conduct the requisite information gathering and risk assessments to ensure high assurance around the handling of health information in line with key Information Governance principles. It has used the Data Protection by Design and Default approach provided by GDPR to engage with the Consortium early on to ensure that it defines the data flows to achieve the goals of AIDAVA, assesses the data protection, security and ethical risks of the project and defines the key instruments that will address them.

    The outcome of this is the Data Protection Impact Assessment template for the Consortium, which in turn has assisted with the production of a Data Management Plan published as D4.2. Both deliverables are based on Data Flow Diagrams initiated during a dedicated workshop held in December 2022 between WP1 and WP4, and further refined through joint meetings held throughout the project. The processes have allowed an agreement on the roles of the partners and on the contractual agreements required to govern AIDAVA, with progress made on defining these contracts. The contracts themselves include a set of bilateral Data Sharing Agreements developed on a standard template defined within the consortium, including - whenever applicable - existing legal provisions and specific technical provisions; Data Processing Agreements were assessed as not needed.

    Task 4.1 has also offered advice on submissions to Research Ethics Committees - for accessing patient data for annotation purposes and for assessing the prototype - and on design choices for the project. The drafting of a Code of Practice is also underway and key challenges are being collated for submission to AIDAVA’s independent Ethics Advisory Board, which will meet for the first time in early October.

    https://zenodo.org/records/10650565

  • D4.5 1st Report of the Ethics Advisory Board

    Published April 2024, Public

    This Deliverable D4.5 provides the first report of AIDAVA’s Ethics Advisory Board of experts (EAB). It summarises the main activities of the EAB and provides details of current and next steps in ethics oversight of AIDAVA. D4.5 provides details of the establishment of the EAB, its operation, membership, key points of focus and concludes with next steps. Key points of focus include considerations of bias in the recruitment of study participants and the digital divide, whether study participants and the Patient Consultants are representative of the patient population in terms of socio-demographics, and whether the introduction of an AIDAVA solution places undue burden on patient participants and on Curators within health care providers who are identified by AIDAVA as offering support. Annexes to this report include the Terms of Reference. AIDAVA faces and is addressing challenges posed by new regulation and technology use, unexplored patient involvement paradigms and forthcoming regulations. The need for a representative EAB including as many stakeholders' views and expertise as possible is clear. The EAB has therefore been formed and is keen to commence work on reviewing AIDAVA. The EAB will meet as outlined in this report and will engage with AIDAVA in terms of the ongoing assessments and reviews as the solutions are developed and trialled. Further learning will be gleaned from attendance at the pilot assessments in May in Maastricht, and ongoing consideration and oversight as AIDAVA develops. Further liaison with the project partners and the SAB will enhance and enrich the understanding and the addressing of the challenges that AIDAVA faces.

    https://zenodo.org/records/14035276

  • D4.6 Definition of Data Quality Metrics

    Submitted July 2024, Public

    Reusing poor quality data has limited value. When developing the requirements for the AIDAVA curation virtual assistant, data users repeatedly asked the same question: how reliable the data is. The answer differs depending on the state of the data - i) for data sources, a quality label can be established based on the quality level provided by the data holder — if available — including the credentials of the persons who created and validated the data; ii) for the curated data (i.e. the PHKG), the quality label will be linked to the quality from the source, the level of quality and certification of the curation tools used during transformation, the level of health and literacy of the humans who provided answers when there were semantic gaps, and the number of data quality checks that could not be resolved; iii) for published data, the quality label will be linked to the level of the curated data, the compliance with the target format, the completeness of the content, the absence of bias as well as the quality, reliability and certification of the imputation algorithm, if applicable. This document provides a detailed overview of AIDAVA deliverable 4.6, focusing on data quality and metadata across the health data life cycle. This deliverable serves as a key component in AIDAVA, aimed at developing a comprehensive data quality assessment methodology. This methodology is crucial for ensuring the reliability, transparency, and effective reuse of health data. The document highlights the importance of maintaining high standards of health data quality and incorporates data quality dimensions, methodologies, and tools. 
Furthermore, deliverable 4.6 is linked with other integral parts of the project, namely deliverables 1.3 (Business requirements for R1) [1], 1.4 (Definition of assessment study including test scenarios & metrics, and study initiation package) [2], 2.1 (Global data sharing standard) [3], and 2.2 (Details on data curation & publishing process) (deliverable on request). These deliverables introduce SHACL (Shapes Constraint Language) rules and specific data quality guidelines, contributing to the establishment of data quality practices.

    https://zenodo.org/records/13758846

  • D4.7 Annotated datasets (3 languages/2 TA) with report

    Submitted May 2024, Sensitive

    AIDAVA aims to create a coherent semantic framework for organising an individual's health data. To this end, it constructs a Personal Health Knowledge Graph (PHKG) based on a Reference Ontology, aligned with global standards for electronic health records (EHRs) and adhering to the FAIR principles of data management (Findable, Accessible, Interoperable, Reusable). As up to 80% of patient-based information still resides in unstructured (e.g. clinical narratives) or semi-structured data sources (e.g. a procedure report with a structured quantitative assessment and a narrative qualitative assessment), it is critical to maximise automation - using NLP tools - in the transformation of narrative information into the highly standardised and structured representation schema of a PHKG. Manual annotations of narratives are essential for model-based adaptation, downstreaming and testing of any specific NLP components. This final deliverable aims to report on the annotation datasets developed during the AIDAVA project, with both quantitative and qualitative statistics derived from manual annotations across all three clinical sites (MUG, UM, NEMC). These annotations follow the revised AIDAVA Annotation Guide (D4.3) [1] - centred around the Breast Cancer (BC) and Cardiovascular Disease (CVD) use cases, as outlined in AIDAVA Deliverable 1.1 – Description of Use Cases.

  • D4.8 Regulatory Conformance Analysis of AI Development Pipeline

    Submitted January 2025, Public

    This deliverable describes the regulatory conformance analysis of AIDAVA and its development pipeline to date and provides a review of likely criteria to meet the conformity requirements as the AIDAVA virtual assistant tool is developed. This analysis is timely because the second generation of the tool is about to undergo development and its assessment as part of a research study will be modified to address the updated features and learning from the first generation assessment. The analysis is also timely because of changes to pan-European legislation and regulatory frameworks, most notably with the arrival of the Artificial Intelligence (AI) Act and a renewed focus on preparation for its enforcement across the research and innovation fields. As an AI driven tool, the AIDAVA solution requires assessment in terms of the likely applicability of the AI Act, and how it might develop into a product with Market Authorisation in terms of the AI Act and potentially the Medical Device Regulation (MDR). This deliverable provides a description of the approach taken to analyse the existing and forthcoming regulatory compliance of the development pipeline, the results of the analysis and recommendations that are provided to assist the development processes in achieving AI Act and likely MDR compliance. It summarises secondary legislation and standards that are under development and will likely be of relevance once they are published over the next twenty-four months. The analysis has been conducted by focusing primarily on the AI Act using assessment frameworks and approaches as defined and outlined by the European Commission along with an internal analysis of AI Act compliance requirements that are not fully met by these assessment frameworks. The prime assessment framework used has been the Assessment List for Trustworthy AI (ALTAI), which has provided a foundation for the analysis and helped to underpin the recommendations as provided. 
The additional requirements derived from a comparison of the AI Act and ALTAI have also provided insight into the recommendations. The result has been to provide over thirty recommendations for the development pipeline in terms of achieving regulatory compliance. The majority of these relate to documentation, specifically around the AIDAVA tool’s lifecycle with regard to its AI components, and an update to its management and record keeping. Testing considerations, including the use of Regulatory Sandboxes ahead of additional real world tests, are also highlighted. This deliverable concludes that AIDAVA has achieved a significant milestone in regulatory compliance and is well placed to meet the requirements as they evolve. Next steps will be for WP4 to work closely with the developer teams to define documentation templates and update logging procedures by refining the recommendations into achievable requirements, in line with the other G2 requirements that are under development and due for completion in March 2025.

    https://zenodo.org/records/14938904

  • D5.1 Data quality check services

    Submitted July 2024, sensitive

    Reusing poor quality data has limited value, so understanding the reliability of data is essential. During the development of the AIDAVA curation virtual assistant, data users frequently inquired about the reliability of the data following the curation process. To answer this question, we have built automatic data quality checks to help curators correct data quality issues as part of the data curation process, and to measure the quality of the resulting curated data. This deliverable provides a detailed description of the implementation of data quality checks on curated data (i.e., the Source Health Knowledge Graph - SHKG - and the Personal Health Knowledge Graph integrating all SHKGs) and serves as a key component in AIDAVA, aimed at developing a comprehensive data quality assessment methodology. This methodology is crucial for ensuring the reliability, transparency, and effective reuse of health data. The document also highlights the importance of maintaining high standards of health data quality and incorporates data quality dimensions, methodologies, and tools. Furthermore, this deliverable is linked with other integral parts of the project, namely Deliverable 1.3 Business requirements for R1 and Deliverable 4.6 Definition of Data Quality Metrics, which introduce SHACL (Shapes Constraint Language) rules and specific data quality guidelines, contributing to the establishment of data quality practices.
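
    As a rough illustration of what an automatic data quality check can look like, the sketch below measures completeness, one commonly used data quality dimension. It is not the AIDAVA implementation: the deliverable refers to SHACL rules over knowledge graphs, whereas this example uses plain Python dictionaries, and the field names (`birth_date`, `sex`) and function names are hypothetical.

```python
def is_filled(value):
    """Treat None and empty strings as missing values."""
    return value not in (None, "")


def completeness(records, required_fields):
    """Fraction of required field slots that are filled across all records."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    filled = sum(
        1
        for record in records
        for field in required_fields
        if is_filled(record.get(field))
    )
    return filled / total


def missing_fields(record, required_fields):
    """List the required fields a curator would still need to supply."""
    return [f for f in required_fields if not is_filled(record.get(f))]


# Hypothetical records and required fields, for illustration only.
REQUIRED = ["birth_date", "sex"]
records = [
    {"birth_date": "1970-01-01", "sex": "F"},
    {"birth_date": "", "sex": "M"},
]
score = completeness(records, REQUIRED)
to_fix = missing_fields(records[1], REQUIRED)
```

    A score like this could be reported to data users, while the list of missing fields is the kind of issue a curator might be asked to resolve.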

  • D5.2 Analysis on explainability for different skill levels; delivery of explainability module for G2

    Submitted January 2025, sensitive

    This document describes the work done in Task 5.3 regarding the explainability part of the interaction between users and the artificial intelligence (AI), referred to below as human-AI interaction, in AIDAVA, as well as the adaptation of explainability in AIDAVA to the user’s level of digital literacy. The goal was to design an explainability module for generation G2 of the AIDAVA prototype which is suitable for different digital literacy levels of the users, so that patients, expert curators and data users can build an appropriate level of trust in AIDAVA. In order to adapt the explainability module of AIDAVA to the user’s individual level of digital literacy, information regarding this literacy level must be stored in the user profile. Thus, as a first step an instrument for assessment of an AIDAVA user’s digital literacy was developed. This “Digital Literacy Questionnaire for AIDAVA” was elaborated based on the findings from literature and considering the specific requirements of AIDAVA; it was drafted in English, pre-tested with colleagues and patient consultants, and prepared for the participants in the AIDAVA G1 study in Dutch, Estonian and German. As a preparatory step for the design of the explainability module for AIDAVA, scientific literature was reviewed to explore findings regarding explainability and digital literacy, and the explainability-related legal and regulatory provisions in Europe were identified. Based on these insights an initial list of explainability-related user needs was compiled in the form of prototypical user questions for AIDAVA. A small-scale “Explainability Survey” among people with different levels of digital literacy helped to confirm and complete the list of prototypical user questions and revealed valuable aspects of the explainability-related needs of people with different levels of digital literacy.
Further insights into (future) AIDAVA users’ explainability-related preferences were gained from patients and expert curators, who participated in the AIDAVA G1 prototype assessment study in the three AIDAVA test sites. All these findings with respect to the relevant explainability-related needs and preferences of people with different digital literacy formed the basis for the design of the explainability module for generation G2 of the AIDAVA prototype. The explainability module for generation G2 of the AIDAVA prototype comprises three types of explanations: local explanations aiming to explain why AIDAVA generated a specific output, global technical explanations informing about technical aspects of the AI-based AIDAVA app, and global social explanations providing information about relevant socio-technical aspects of the AIDAVA app. Based on the identified explainability-related user needs and the insights gained from literature and AIDAVA G1 testing, the explainability module for AIDAVA G2 was designed as a modular concept with a layered structure of explanations, which can be displayed to the user on demand according to their needs. The design of the AIDAVA explainability module ensures that it is effective for users with varying digital literacy levels. The first explanation layer includes those explanatory topics which are of high relevance for all users, regardless of their digital literacy level. The links between the explanation layers align with the importance of topics for users with low, medium, and high digital literacy, allowing users to access content according to their preferences. The principle of progressive disclosure is applied, so that explanations start with concise, basic information, with additional details provided upon user request. This approach reduces cognitive load and prevents overwhelming users. 
Furthermore, to make explanations meaningful and understandable for users across all levels of digital literacy, two versions of content will be developed across all layers: basic explanations in simple, plain language for users with low digital literacy, and technical explanations providing more detailed content with technical terms for users with medium and high digital literacy. In the coming months the graphical user interfaces for the explainability module will be developed by GND following user centred design principles to incorporate regular feedback of the patient consultants, and relevant backend functionalities for the AIDAVA explainability module will be implemented in WP5 in close cooperation with the technical AIDAVA partners.

  • D5.3 Novel DL based tools to extract concepts from clinical text and generate multimodal PHKG

    Submitted October 2024, sensitive

    AIDAVA aims to make personal health data interoperable and ready for reuse at scale. Achieving this objective necessitates the integration of various data sources, and for unstructured text sources, advanced Natural Language Processing (NLP) tools are essential. Deliverable D5.3 concentrated on the development of innovative NLP tools designed to extract highly structured health information from free-text clinical documents. Utilizing manually annotated data from three clinical sites in German, Dutch and Estonian, we successfully trained novel NLP models. These models demonstrated high accuracy in extracting clinical concepts with SNOMED CT codes from free-text clinical documents in all three languages. The NLP tools will be integrated into the AIDAVA platform and will undergo further refinement in future deliverables with regard to extracting relations and temporal information.

  • D5.4 Novel ML based FAIRification tools for structured data

    Submitted December 2024, sensitive

    The objective of the AIDAVA prototype is to maximise automation in the curation of heterogeneous individual data sources to generate a personal health knowledge graph, compliant with the AIDAVA Reference Ontology. Automation is based on a set of workflows resolving data interoperability issues that were identified in Deliverable D2.2 Details on data curation & publishing process; each workflow includes one or more (AI-based) transformation tools as well as a Human-In-The-Loop dialogue in case automation is not possible. In addition to the automation requirements related to the workflows, we also took into account the list of business requirements that were specified in Deliverable D1.3 Business Requirements for G1. We identified 3 main types of tools related to data curation and data quality enhancement, which are included in the different workflows responsible for automation:

    1) NLP tools to extract information from narrative text, in different languages,

    2) ML based (and more classical ETL type) FAIRification tools for structured data in heterogeneous formats and semi-structured data,

    3) Data Quality Checks to ensure consistency and completeness of the resulting record, integrated across multiple data sources.

    This deliverable focuses on the FAIRification tools for structured and semi-structured data, and more specifically on multilingual entity linking and matching, entity deduplication across multiple sources, and tabular data mapping. For each of these tools, we describe the state of the art, the approach used to develop the tool taking into account the additional complexity of multilinguality, the specification for integration within the AIDAVA backend, and initial results. We conclude with the work to be continued in the coming months before delivering the final version of these tools to be deployed within Generation 2 of the AIDAVA prototype.

  • D5.5 Knowledge Graph Embedding Algorithms

    Submitted January 2025, sensitive

    This Deliverable D5.5 “Knowledge Graph Embedding algorithms” presents state-of-the-art knowledge graph embedding (KGE) models and discusses their possible modifications and adaptations to the AIDAVA use cases. More specifically, the deliverable presents several scenarios for leveraging these models in AIDAVA workflows, such as link prediction for knowledge graph completion, missing data imputation for categorical and numerical values, and entity alignment as an approach to ontology mapping and to the integration of data from heterogeneous sources into personal health knowledge graphs (PHKG). The activities reported in this deliverable relate to the development of novel artificial intelligence (AI) tools that generate embeddings of the PHKG (which AIDAVA generates as the representation of an individual's longitudinal health record) in order to enable data fusion, to enhance the natural language processing (NLP) algorithms developed in Task 5.1, to enrich the PHKG by imputing missing links and values, and to shed light on complex relations that can help in identifying risk factors and predicting treatment outcomes and adverse events. Some of the developed services will be used as assistant tools for data imputation and entity alignment. The developed tools and services will be integrated into the virtual assistant (VA) Generation 2 of the prototype.
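
    To make the idea of link prediction with knowledge graph embeddings more concrete, the sketch below scores candidate triples in the style of TransE, one well-known KGE model. Whether AIDAVA uses TransE is an assumption made purely for illustration; the abstract does not name specific models. The entity names and the two-dimensional embedding values are hypothetical toy data, not project results.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L2 distance ||h + r - t||.

    Scores closer to zero mean the triple (head, relation, tail) is
    considered more plausible by the embedding.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


def rank_tails(head, relation, candidates):
    """Return candidate tail names ordered from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


# Hypothetical toy embeddings; a real model would learn these from the graph.
patient = [0.0, 0.0]
has_diagnosis = [1.0, 0.0]
candidates = {
    "breast_cancer": [1.0, 0.1],
    "fracture": [-1.0, 2.0],
}
ranking = rank_tails(patient, has_diagnosis, candidates)
```

    In a knowledge graph completion setting, the top-ranked candidates would be proposed as possible missing links, which is also where a human reviewer could confirm or reject the suggestion.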

  • D6.1 Public project website

    Published Feb 2023, Public

    This deliverable briefly describes the initial set-up of the AIDAVA public project website, with screenshots of the main pages attached in the annex.

    https://zenodo.org/records/15433862

  • D6.2 Plan for dissemination & exploitation of results incl. communication (PDEC), updates in official report

    Published Feb 2023, Sensitive

    The Plan for the Dissemination and Exploitation of results including Communication activities (PDEC) is a strategic document, helping the project consortium to establish the basis for an intellectual property strategy, as well as develop and monitor specific communication, dissemination and exploitation activities. The PDEC is part of WP6 “Innovation Management: Communication, Dissemination, Exploitation and Sustainability”. This WP focuses on creating visibility and fostering outreach of AIDAVA through external communication of AIDAVA activities, progress and achievements, as well as dedicated dissemination activities and strategic planning of exploitation routes for potential results.

    This PDEC aims to act as a guide and organising framework for all work relating to dissemination and exploitation of the project’s results, as well as the broader communication activities related to the project and its achievements. It is a central tool for the planning and documentation of communication, dissemination and exploitation activities and will therefore allow close monitoring of the progress of individual activities throughout the project lifetime. It should be noted that the current PDEC in month 6 is an initial version which builds on the plans outlined in the Description of Action (DoA). These plans will be further developed and updated as the project progresses and results become available, with updated versions of the PDEC being due in M18, M36 and M48.

  • D6.3 IP Manual

    Published Feb 2023, Sensitive

    The IP Manual offers a comprehensive guide on intellectual property (IP) within the context of collaborative innovation, detailing common IP terms, challenges, risks, and opportunities. It elaborates on IP management strategies as outlined in the Consortium Agreement, aiming to establish a unified understanding among partners on managing access to background IP, ownership, sharing, protection, and utilization of project outcomes both within and outside the project framework.

  • D6.4 Communication Toolkit

    Published Apr 2023, Public

    This communication toolkit is prepared within the scope of WP6 “Innovation Management: Communication, Dissemination, Exploitation and Sustainability” to create optimal visibility of AIDAVA and a wide project outreach to all relevant stakeholders. As a basis for all outreach activities a project-specific visual identity defining a logo and colour scheme as well as a set of Microsoft office templates has been created. The toolkit comprises the Corporate Identity (CI) and different material prepared on the basis of this CI. Emphasis with all materials to be produced in AIDAVA is on usefulness and eco-friendliness.

    https://zenodo.org/records/15434311

  • D6.5 Audio-visual material

    Published March 2024, Public

    In line with the Description of the Action and to support dissemination to the interested public, audio-visual material has been produced for AIDAVA. The material is designed to effectively communicate the project's goals and significance to a wide-ranging audience. This includes the general public, the scientific community, and various related projects and initiatives. Our aim is to increase awareness and understanding of AIDAVA's objectives and the positive effects it aims to achieve.

    https://zenodo.org/records/15434519

  • D6.6 Update of the PDEC in official report 1

    Submitted July 2024, sensitive

    The Plan for the Dissemination and Exploitation of results including Communication activities (PDEC) is a strategic document, helping the project consortium to establish the basis for an intellectual property strategy, as well as develop and monitor specific communication, dissemination and exploitation activities. The PDEC is part of WP6 “Innovation Management: Communication, Dissemination, Exploitation”. This WP focuses on creating visibility and fostering outreach of AIDAVA through external communication of AIDAVA activities, progress and achievements, as well as dedicated dissemination activities and strategic planning of exploitation routes for potential results. The PDEC aims to act as a guide and organising framework for all work relating to dissemination and exploitation of the project’s results, as well as the broader communication activities related to the project and its achievements. It is a central tool for the planning of communication, dissemination and exploitation activities and will therefore allow close monitoring of the progress of individual activities throughout the project lifetime. For an overview of all completed communication and dissemination activities, please see D6.7 “Report on dissemination and communication activities”. The current document is an update to the initial PDEC (D6.2), which was submitted in month 6. It reports only the updates and hence should be read as a supplement to what has already been reported in D6.2. These plans will be further developed and updated as the project progresses and results become available, with the next updated versions of the PDEC being due in M36 and M48.

  • D6.7 Report on Dissemination and Communication activities

    Published March 2024, Public

    The Report on dissemination and communication activities is part of WP6 “Innovation Management: Communication, Dissemination, Exploitation and Sustainability”. The purpose of this WP is to foster outreach and engage various stakeholders through external communication of AIDAVA activities, progress and achievements, as well as dedicated dissemination activities and exploitation of potential results. The current deliverable presents the dissemination and communication activities undertaken by members of the consortium for the period M01-M18. These activities are performed in accordance with the guidelines and framework described in D6.2 Plan for Dissemination & exploitation of Results incl. Communication Activities (PDEC). An update to the PDEC in the form of D6.6 Update of the PDEC in official report 1 provides more information about the activities planned beyond M18. These further dissemination and communication activities will be reported in D6.10 Update of Report on Dissemination and Communication Activities 1 (due M36) and D6.11 Update of Report on Dissemination and Communication Activities 2 (due M48).

    https://zenodo.org/records/14035361

  • D6.8 Recommendations from the Sustainability Advisory Board

    Submitted January 2025, sensitive

    The Recommendations from the SAB report is part of WP6 “Innovation Management: Communication, Dissemination, Exploitation and Sustainability”. The aim of this WP is to promote outreach and actively involve diverse stakeholders by externally communicating AIDAVA's activities, developments, and achievements, alongside targeted dissemination efforts to ensure sustainability and exploitation of Key Exploitable Results (KERs). The current deliverable presents the sustainability-related activities undertaken by members of the consortium for the period M01-M22, including the composition of the Sustainability Advisory Board (SAB) and the organization of SAB meetings. During those meetings, the project team presented the AIDAVA project objectives, the project KERs, their value proposition, stakeholder groups, and envisioned exploitation plans. A critical component of each SAB meeting was the collection of feedback and recommendations from the SAB members on how to ensure sustainability of the KERs and of a potential product combining multiple KERs. The deliverable describes the activities of the first three SAB meetings and provides a summary of the recommendations. More specifically, two “Go To Market” approaches have been suggested as potential avenues for an “AIDAVA product” combining several KERs: 1) interoperability enablement of healthcare authorities and healthcare organisations as part of implementation of the EHDS and 2) citizen empowerment in managing their health data. These two proposals will be further explored with the AIDAVA project team and SAB members in the months to come, together with all the results of the project. All activities described in this deliverable are performed in accordance with the guidelines and framework described in D6.2 Plan for dissemination & exploitation of results incl. communication activities (PDEC) and the follow-up D6.6 Update of the PDEC in official report.

  • D7.1 Project Management Platform

    Published Nov 2023, Sensitive

    This deliverable describes the basic functionalities of the internal project management platform set up for the AIDAVA project. Those functionalities include, inter alia, the preparation of deliverables and reports, the organization of meetings, and keeping an overview of upcoming dissemination/communication activities and publications. The project management platform is a tool set up to support efficient and effective project management in AIDAVA.

  • D7.2 Management Guide

    Published Dec 2023, Sensitive

    This Management Guide will lay down standard workflows and processes for the most common and recurring management activities on AIDAVA consortium level. In addition, it will give guidelines and recommendations with regard to communication within the project and dissemination of project results. The Guide will be made available to all project partners at the beginning of AIDAVA to foster active collaboration and a smooth implementation from the start.

  • D7.3 Risk Management & Mitigation Plan

    Published Dec 2023, Sensitive

    The AIDAVA project incorporates risk assessment primarily through its Work Package 7, focusing on scientific coordination, quality assurance, and risk management to ensure timely risk identification and effective decision-making. The Management Team, alongside the Steering Committee, conducts regular quality assessments and risk evaluations, facilitating immediate conflict resolution and adjustments to the work plan. Quarterly Steering Committee meetings are critical for maintaining strict risk management practices. The project's Management Team, including key project coordinators and the Project Management Office, is responsible for mitigation planning and suggesting project-level adjustments to address potential risks. This comprehensive approach ensures the project's adaptability through continuous risk monitoring, planned mitigation strategies, and necessary adjustments to the project's contracts and work plans.

  • D7.4 Internal progress report 1

    Published Aug 2023, Sensitive

    The AIDAVA deliverable, Internal Progress Report 1, covers the project's initial nine months (M1-M9), detailing the achievements and progress across various work packages and tasks. It evaluates the project's adherence to the planned timeline, identifies any challenges or delays encountered, and provides an overview of the solutions implemented to address these issues. The report also highlights key milestones achieved, summarizes the outcomes of collaboration efforts among project partners, and outlines the next steps for continued project advancement. Additionally, it includes an assessment of project risks, with a focus on mitigation strategies to ensure the project remains on track. This document serves as a comprehensive update on the AIDAVA project's development, offering insights into its management, scientific coordination, and overall progress.

  • D7.5 Internal progress report 2

    Submitted February 2025, sensitive

    The AIDAVA Internal Progress Report 2 covers the period between M19 and M27 of project implementation, detailing the achievements and progress across various work packages and tasks. It evaluates the project's adherence to the planned timeline, identifies any challenges or delays encountered, and provides an overview of the solutions implemented to address these issues. The report also highlights key milestones achieved, summarizes the outcomes of collaboration efforts among project partners, and outlines the next steps for continued project advancement. Additionally, it includes an assessment of project risks, with a focus on mitigation strategies to ensure the project remains on track. This document serves as a comprehensive update on the AIDAVA project's development, offering insights into its management, scientific coordination, and overall progress.