Researchers at EMBL-EBI in Cambridge and the German Cancer Research Center (DKFZ) have introduced Delphi-2M, a generative transformer trained on health records that estimates individual disease risks over time and simulates possible future trajectories.

According to the study published in Nature, the model learns sequences of diagnoses and basic health information from large databases and predicts the probability of more than 1,000 conditions - including when they might occur.

Delphi-2M was first trained on a little over 400,000 participants from the UK Biobank and then evaluated, without fine-tuning, on data from 1.93 million people in Denmark’s national health registers. The team emphasizes that the system delivers probabilities and trends, not medical certainties or causal explanations. For now, it should be seen mainly as proof of concept.

How the model works

Technically, Delphi-2M adapts a GPT-style transformer to medical timelines: instead of words, it processes life events along a timeline. Importantly, it predicts not just what might happen (the next diagnosis), but also when.

Inputs include a person’s medical history as a list of ICD-10 diagnoses with age at first occurrence, plus basic demographic and lifestyle factors such as sex, BMI, smoking, and alcohol use. From these, the model outputs daily hazard rates: for each of more than 1,000 conditions (and for death), an estimate of how likely that event is to occur on a given day.
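To make the idea of a daily hazard rate more concrete, the sketch below converts a daily rate into the chance of at least one event within a longer window. It assumes the hazard stays constant over that window, which is a simplification for illustration and not necessarily how the study aggregates risk; the function name and the example rate are invented.

```python
import math


def risk_within(daily_hazard: float, days: float) -> float:
    """Probability of at least one event within `days`.

    Assumes a constant daily hazard (an illustrative simplification).
    """
    if daily_hazard < 0 or days < 0:
        raise ValueError("daily_hazard and days must be non-negative")
    # 1 - exp(-rate * time), written with expm1 for accuracy at tiny rates.
    return -math.expm1(-daily_hazard * days)


# Invented example rate: 1 event per 10,000 person-days.
one_year = risk_within(1e-4, 365)
ten_years = risk_within(1e-4, 3650)
print(f"1-year risk: {one_year:.3f}, 10-year risk: {ten_years:.3f}")
```

Under this assumption a small daily rate compounds into a noticeably larger risk over a decade, which is why the same model output can be read at different horizons.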

It also estimates the expected time until the next event and can simulate complete possible future trajectories based on that. Predictions are updated whenever new patient information is added.
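One common way to turn per-event hazard rates into simulated trajectories is to sample a waiting time from the combined rate and then pick which event occurs in proportion to its share of that rate. The sketch below illustrates this under stated assumptions: `toy_hazards` is a hypothetical stand-in for the trained model, its rates are invented, and hazards are treated as constant between events. It is not the authors' implementation.

```python
import random


def toy_hazards(history):
    """Hypothetical stand-in for the model: history -> daily hazards.

    The rates are invented for illustration only.
    """
    rates = {"I10": 2e-4, "E11": 1e-4, "Death": 5e-5}
    seen = {code for _, code in history}
    # Diagnoses are recorded at first occurrence, so drop ones already seen.
    return {code: rate for code, rate in rates.items() if code not in seen}


def expected_days_to_next_event(hazards):
    """Mean waiting time when hazards are constant: 1 / total rate."""
    return 1.0 / sum(hazards.values())


def simulate(history, start_age_days, max_age_days, rng):
    """Extend `history` (list of (age_in_days, code)) with sampled events."""
    trajectory = list(history)
    age = start_age_days
    while True:
        hazards = toy_hazards(trajectory)
        total = sum(hazards.values())
        # Waiting time until any event, sampled from the combined rate.
        age += rng.expovariate(total)
        if age > max_age_days:
            break
        codes = list(hazards)
        # Each event is chosen in proportion to its own hazard.
        code = rng.choices(codes, weights=[hazards[c] for c in codes])[0]
        trajectory.append((age, code))
        if code == "Death":
            break
    return trajectory


rng = random.Random(0)
history = [(14600.0, "E11")]  # type 2 diabetes first recorded at age 40
for age, code in simulate(history, 14600.0, 36500.0, rng):
    print(f"{age / 365:.1f} years: {code}")
```

Because the hazards are recomputed after every sampled event, each new diagnosis changes the risks that follow, mirroring how the article describes predictions updating as information is added.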

To handle long gaps in medical records, the model inserts neutral placeholders. Code and documentation are available on GitHub, though the model itself is restricted under UK Biobank data access rules.
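As a rough illustration of the placeholder idea, the sketch below pads a timeline so that no stretch without a recorded event exceeds a chosen length. The article does not say how placeholders are spaced or named, so the fixed five-year maximum gap, the `NO_EVENT` token, and the function name are all assumptions made for this example.

```python
def fill_gaps(timeline, max_gap_days=5 * 365, placeholder="NO_EVENT"):
    """Insert placeholder tokens so no gap exceeds `max_gap_days`.

    `timeline` is a list of (age_in_days, code) pairs. The spacing rule
    and token name are assumptions, not taken from the study.
    """
    if max_gap_days <= 0:
        raise ValueError("max_gap_days must be positive")
    filled = []
    prev_age = None
    for age, code in sorted(timeline):
        if prev_age is not None:
            next_fill = prev_age + max_gap_days
            # Walk forward through the gap, adding placeholders as needed.
            while next_fill < age:
                filled.append((next_fill, placeholder))
                next_fill += max_gap_days
        filled.append((age, code))
        prev_age = age
    return filled


# Two diagnoses roughly 11 years apart (ages given in days).
print(fill_gaps([(14600, "I10"), (18600, "E11")]))
```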

How accurate are the predictions?

In internal testing, Delphi-2M outperformed chance by a wide margin for nearly all diseases, with particularly strong results in predicting short-term mortality. Accuracy decreased somewhat over longer horizons, but underlying trends held up even after 10 years.

External validation on Danish health data showed only a slight performance drop compared to UK Biobank results - suggesting the approach could extend to larger and more diverse populations, particularly with more training data or larger models.

Potential applications and timeline

The researchers see immediate potential in public health planning. Aggregated predictions could help estimate regional or demographic disease burdens more accurately. For use with individual patients, they expect a five-to-ten-year timeline given regulatory hurdles, according to the Financial Times.

The model performed best on conditions with clearer progression patterns such as cardiovascular disease, diabetes, and sepsis. It was less reliable for rare congenital disorders or diagnoses strongly shaped by external factors. The team is exploring ways to integrate other data layers such as genomics and proteomics. Core methods for combining risk and time modeling have been patented.

Limits and open questions

The study also makes clear the system’s limitations. The UK Biobank skews toward healthier, better-educated participants aged 40 to 70. Deaths before enrollment are absent, and very old age groups are underrepresented. Diagnoses also come from mixed sources, including self-reports, primary care, hospitals, and registers.

Such gaps can bias predictions. For instance, if a patient’s records lack hospital data, the model underpredicts conditions that are mostly diagnosed in hospitals. Conversely, for individuals with a history of hospitalization, it predicts such conditions far more often. Sepsis, for which 93 percent of diagnoses are coded in hospitals, was predicted about eight times more frequently in these cases. These patterns partly reflect real care pathways, but they also introduce artifacts.

For this reason, the authors explicitly caution against causal interpretations. Models like Delphi-2M should be seen as a complement to - not a replacement for - clinical judgment.

Summary
  • Researchers at EMBL-EBI and DKFZ developed Delphi-2M, a transformer model trained on health records that predicts individual disease risks and timing for over 1,000 conditions by processing medical timelines instead of text, using ICD-10 diagnoses, demographics, and lifestyle factors to output daily probability rates.
  • The model showed strong accuracy in internal testing on UK Biobank data and largely maintained performance when tested on data from 1.93 million people in Denmark's national health registers, with particularly reliable predictions for cardiovascular disease, diabetes, and short-term mortality, though accuracy decreased over longer time periods.
  • Researchers see immediate potential for public health planning and expect use with individual patients to take five to ten years. The system has significant limitations, including bias toward healthier participants, uneven coverage across healthcare data sources, and prediction artifacts, so it should complement rather than replace clinical judgment.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.