
IHI heart disease project devises AI-based tool to standardise health data

iCARE4CVD has delivered an innovative new tool to harmonise data in the cardiovascular disease field, a key step towards advancing more personalised care.

26 November 2025
An EKG machine, symbolising the kind of health data that exists on cardiovascular disease patients
© Pitchyfoto, Shutterstock

iCARE4CVD aims to personalise and improve the care of people living with cardiovascular disease (CVD). As a first step, the project team plans to analyse data on more than a million patients to gain new insights into these diseases. The problem is that this data is currently stored in multiple locations and in multiple different formats, depending on the hospital, country, or technology used. For example, one hospital may record ‘heart failure’, another ‘HF’, and a third a numerical code. Test results may also be recorded in different units or structured in different ways. Furthermore, some data may be complex, such as a family history of high blood pressure measured at different times.

The challenge for iCARE4CVD was to find a way of harmonising these diverse data types so that they can be used reliably in research that will enhance the care of the millions of people living with CVD worldwide.

To do this, the team developed a CDE (common data element) mapper which, as its name suggests, maps data elements, including medical terms, measurements and even complex data entries, to existing, internationally recognised medical vocabularies.
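To make the idea of a mapper concrete, here is a minimal Python sketch of the simplest possible version: a hand-written lookup table that sends the different local spellings of ‘heart failure’ to one standard concept, plus a conversion of a test result to a single unit. The concept identifier ‘V001’, the local code ‘1234’ and the function names are hypothetical placeholders, not codes from a real vocabulary or part of the CDE-Mapper; blood glucose (1 mmol/L ≈ 18.016 mg/dL) is used purely as an example of a unit conversion.

```python
from typing import Optional

# Hypothetical lookup table: local labels -> one standard concept ID.
# 'V001' and the local code '1234' are placeholders, not real vocabulary codes.
LOCAL_TO_STANDARD = {
    "heart failure": "V001",
    "hf": "V001",
    "1234": "V001",  # a numerical code one hospital might use
}

# Factors that convert each (measurement, unit) pair to one target unit (mmol/L).
TO_MMOL_PER_L = {
    ("glucose", "mmol/l"): 1.0,
    ("glucose", "mg/dl"): 1 / 18.016,  # glucose molar mass ~180.16 g/mol
}


def map_term(raw: str) -> Optional[str]:
    """Return the standard concept ID for a local label, or None if unknown."""
    return LOCAL_TO_STANDARD.get(raw.strip().lower())


def to_standard_unit(measurement: str, value: float, unit: str) -> float:
    """Convert a test result to the target unit, rounded to two decimals."""
    factor = TO_MMOL_PER_L[(measurement.strip().lower(), unit.strip().lower())]
    return round(value * factor, 2)


if __name__ == "__main__":
    for label in ["Heart failure", "HF", "1234", "angina"]:
        print(label, "->", map_term(label))
    print(to_standard_unit("glucose", 90, "mg/dL"), "mmol/L")
```

A fixed table like this only covers the labels someone thought of in advance, which helps explain the appeal of a more flexible, AI-based approach.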

Harnessing AI to get results

The CDE-Mapper, which is described in a paper in Computers in Biology and Medicine, draws on an artificial intelligence (AI) approach called retrieval-augmented generation (RAG).

‘RAG works in two steps. First, it searches for the most relevant information in trusted sources, such as biomedical dictionaries or the knowledge collected in iCARE4CVD. Then, it uses a large language model (a type of AI that can understand and generate text) to create an answer based on that information,’ explains Komal Gilani of Maastricht University in the Netherlands, first author of the paper. ‘We chose RAG because linking clinical data to standard medical codes requires high accuracy; by checking against reliable medical sources, RAG ensures that the AI’s answers are grounded in real, verified biomedical knowledge.’
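The two RAG steps Ms Gilani describes can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the CDE-Mapper's code: the four-entry vocabulary and its IDs are hypothetical, retrieval uses plain word overlap (Jaccard similarity) where a real system would more likely use text embeddings and a vector index, and the language model is represented by any function that takes a prompt and returns text. The final check, which only accepts an answer that appears among the retrieved candidates, is one simple way to keep the output grounded in the trusted source.

```python
from typing import Callable, List, Tuple

# Hypothetical mini-vocabulary standing in for a trusted biomedical source.
VOCABULARY = {
    "V001": "heart failure",
    "V002": "heart rate",
    "V003": "systolic blood pressure",
    "V004": "family history of hypertension",
}


def words(text: str) -> set:
    return set(text.lower().replace("-", " ").split())


def retrieve(query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Step 1: rank vocabulary entries by word overlap (Jaccard) with the query."""
    q = words(query)
    scored = []
    for cid, label in VOCABULARY.items():
        w = words(label)
        scored.append((len(q & w) / len(q | w), cid, label))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(cid, label) for score, cid, label in scored[:k] if score > 0]


def build_prompt(query: str, candidates: List[Tuple[str, str]]) -> str:
    listing = "\n".join(f"{cid}: {label}" for cid, label in candidates)
    return (
        "Choose the ONE candidate concept that matches the clinical term, "
        "or answer NONE.\n"
        f"Term: {query}\nCandidates:\n{listing}"
    )


def rag_map(query: str, llm: Callable[[str], str], k: int = 2) -> str:
    """Step 2: let the language model pick among the retrieved candidates only."""
    candidates = retrieve(query, k)
    if not candidates:
        return "NONE"
    answer = llm(build_prompt(query, candidates)).strip()
    allowed = {cid for cid, _ in candidates}
    return answer if answer in allowed else "NONE"  # reject ungrounded answers


if __name__ == "__main__":
    # Stand-in for a real LLM call: returns the ID of the first listed candidate.
    def first_candidate(prompt: str) -> str:
        return prompt.split("Candidates:\n")[1].splitlines()[0].split(":")[0]

    print(rag_map("Heart failure", first_candidate))
```

In practice, the stand-in `first_candidate` would be replaced by a call to an actual language model.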

The system is designed to learn and improve over time by involving human experts who review and confirm its choices.
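One simple way to picture that feedback loop, assuming that ‘learning’ here means remembering expert decisions rather than retraining the model, is a mapper that checks a store of confirmed mappings before calling the automatic step and records every expert verdict. The class and method names below are hypothetical; the article does not describe how the CDE-Mapper stores or uses expert feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ReviewedMapper:
    # Automatic mapping step, e.g. a RAG pipeline; any term -> concept ID function.
    propose: Callable[[str], str]
    # Mappings confirmed or corrected by a human expert, keyed by normalised term.
    confirmed: Dict[str, str] = field(default_factory=dict)

    @staticmethod
    def _key(term: str) -> str:
        return term.strip().lower()

    def map(self, term: str) -> Tuple[str, bool]:
        """Return (concept ID, whether an expert has already confirmed it)."""
        key = self._key(term)
        if key in self.confirmed:
            return self.confirmed[key], True
        return self.propose(term), False

    def review(self, term: str, proposed: str, correction: Optional[str] = None) -> None:
        """Store the expert's verdict: accept the proposal or replace it."""
        self.confirmed[self._key(term)] = correction or proposed


if __name__ == "__main__":
    mapper = ReviewedMapper(propose=lambda term: "V002")  # a wrong automatic guess
    concept, confirmed = mapper.map("HF")
    print(concept, confirmed)  # first pass: the machine's suggestion
    mapper.review("HF", concept, correction="V001")
    print(mapper.map("hf"))  # later lookups reuse the expert's answer
```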

A strong performer compared to other options

The iCARE4CVD team tested the CDE-Mapper on different data sources, including medical literature and patient records, and compared its performance against other widely used AI systems. Their goal was to assess how accurately the CDE-Mapper matched different health terms and formats to standard vocabularies.
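One straightforward way to score such a comparison is the share of test terms mapped to the correct standard concept. The sketch below, with made-up example data, computes that accuracy and shows that a gap between two systems can be expressed either in percentage points or as a relative improvement; the article does not say which of these its 11% figure refers to, and the paper's exact metric may differ.

```python
from typing import Dict, Tuple


def accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold-standard terms whose predicted concept ID is correct."""
    if not gold:
        raise ValueError("gold standard is empty")
    correct = sum(1 for term, cid in gold.items() if predicted.get(term) == cid)
    return correct / len(gold)


def improvement(baseline: float, new: float) -> Tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    return (new - baseline) * 100, (new - baseline) / baseline * 100


if __name__ == "__main__":
    # Made-up example data: four terms with their correct concept IDs.
    gold = {"HF": "V001", "pulse": "V002", "SBP": "V003", "FH HTN": "V004"}
    baseline = {"HF": "V001", "pulse": "V002", "SBP": "V001", "FH HTN": "V003"}
    candidate = {"HF": "V001", "pulse": "V002", "SBP": "V003", "FH HTN": "V003"}

    a, b = accuracy(baseline, gold), accuracy(candidate, gold)
    points, relative = improvement(a, b)
    print(f"baseline {a:.2f}, candidate {b:.2f}")
    print(f"+{points:.0f} percentage points, +{relative:.0f}% relative")
```

With these example numbers, going from 0.50 to 0.75 accuracy is a gain of 25 percentage points but a 50% relative improvement, which is why the convention matters when comparing reported figures.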

They found that it outperformed existing methods by 11% at identifying and translating health concepts. It did especially well with complex or multi-part data, such as heart rate measured in a specific body position. It was also consistent at finding correct matches across medical dictionaries, which is important when data comes from real-world health records.
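Multi-part entries such as heart rate measured in a specific body position are hard because the meaning lies in a combination of concepts. A minimal rule-based sketch, assuming an entry can be split into one base measurement plus zero or more qualifiers that are mapped separately, looks like this; the IDs and keyword lists are hypothetical, and the sketch illustrates only the decomposition idea, not the CDE-Mapper's method.

```python
import re
from typing import Dict, List, Optional

# Hypothetical IDs for base measurements and body-position qualifiers.
BASE_CONCEPTS = {"heart rate": "V002", "systolic blood pressure": "V003"}
POSITION_QUALIFIERS = {"sitting": "Q-SIT", "standing": "Q-STAND", "supine": "Q-SUPINE"}


def map_composite(entry: str) -> Dict[str, object]:
    """Split an entry into a base concept plus any body-position qualifiers."""
    text = entry.lower()
    base: Optional[str] = next(
        (cid for label, cid in BASE_CONCEPTS.items() if label in text), None
    )
    qualifiers: List[str] = [
        cid for word, cid in POSITION_QUALIFIERS.items()
        if re.search(rf"\b{word}\b", text)
    ]
    return {"base": base, "qualifiers": qualifiers}


if __name__ == "__main__":
    for entry in ["Heart rate, sitting", "Systolic blood pressure (supine)", "Heart rate"]:
        print(entry, "->", map_composite(entry))
```

Keeping the qualifier as a separate code means a standing and a sitting heart rate stay distinguishable after harmonisation instead of collapsing into one generic concept.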

Clear benefits for patients, doctors and researchers

The team is now refining the tool to make it faster, more flexible, and easier to use in real-world health care settings. Meanwhile, the project has a clear view of how the CDE-Mapper could help researchers studying CVD, by making it easier to spot trends and develop new therapies. On the care side, it could give doctors and patients clearer insights, leading to better diagnoses and more personalised care, especially for people with complex conditions. The CDE-Mapper also contributes to the implementation of the European Health Data Space (EHDS).

‘The EHDS aims to make health data across Europe more standardised and interoperable,’ says Ms Gilani. ‘The CDE-Mapper can contribute to this goal by offering a scalable, semi-automated way (with a human-in-the-loop approach) to align study metadata with international medical vocabularies. This makes it easier to share, integrate, and reuse datasets for research, healthcare innovation, and public health.’