UK NHS trains AI on entire population’s health data

Foresight model aims to predict disease risk for 57 million patients using generative AI trained on real-world medical records.

In what may be a defining test for national-scale artificial intelligence in healthcare, the United Kingdom has trained a generative AI model on the medical histories of nearly its entire population. The model – named Foresight – was developed using deidentified NHS data from 57 million patients and is designed to predict more than a thousand future disease diagnoses, including hospital admissions, complications and major events such as myocardial infarction.

The dataset, derived from GP visits, hospital interactions, vaccinations and the national death registry between 2018 and 2023, represents over 10 billion medical events. Its sheer breadth positions Foresight as a singular experiment in scale and cohesion; no other health system – certainly none as large and complex as the NHS – has attempted to train a generative AI in this way.

The project is led by researchers at University College London and King’s College London in collaboration with NHS England, the British Heart Foundation (BHF) Data Science Centre and Health Data Research UK. The model is built on Meta’s Llama 2 architecture, with training conducted within NHS England’s Secure Data Environment, using cloud infrastructure provided by Amazon and Databricks. While many AI models are tested on curated or synthetic data, Foresight’s distinction lies in its real-world grounding – trained on clinical encounters from an entire nation’s health system.

Longevity.Technology: This marks a moment of genuine significance for data-driven medicine – and a bold step toward population-scale preventative care. Training a generative AI on the deidentified health records of 57 million people is both a logistical feat and a uniquely British moonshot, made possible by the NHS’s centralised structure. Foresight is ambitious in both scope and method: AI at population scale, built to forecast individual risk across thousands of potential conditions, and to do so with a level of granularity and reach that is, quite simply, unprecedented. It hints at a future in which predictive models function like diagnostic weather systems – identifying at-risk patients long before symptoms emerge.

But scale alone is not virtue. With no opt-out, no patient redress and no published metrics, the project walks a fine line between visionary science and ethical overreach. Public trust in health AI will hinge not just on what these models can do, but on how transparently and responsibly they do it.

The researchers are currently evaluating the model’s predictive power – specifically, whether it can retrospectively forecast outcomes in 2023 based on data from 2018 to 2022. If successful, this approach may signal a shift in healthcare strategy: from retrospective analysis to anticipatory intervention. However, it also raises questions about the relationship between consent, control, and utility – particularly when datasets of this scale leave individuals with no opt-out mechanism and no ability to remove records once incorporated into the model.
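The retrospective test described here is, in effect, a temporal holdout: the model sees only events before a cutoff date and is scored against what actually happened afterwards. The Python sketch below illustrates that idea on a handful of invented records. It is not Foresight's pipeline; the helper names (`temporal_split`, `recall`), the event codes and the choice of recall as the score are all assumptions made for illustration, since the project has not published its metrics.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy records: (patient_id, event_date, event_code).
# Patients and codes are invented for illustration only.
EVENTS = [
    ("p1", date(2019, 3, 1), "hypertension"),
    ("p1", date(2023, 6, 9), "myocardial_infarction"),
    ("p2", date(2021, 11, 20), "type_2_diabetes"),
    ("p2", date(2022, 12, 31), "hypertension"),
    ("p3", date(2023, 1, 1), "asthma"),
]


def temporal_split(events, cutoff):
    """Split events into per-patient history (before cutoff)
    and held-out outcomes (on or after cutoff)."""
    history = defaultdict(list)
    outcomes = defaultdict(set)
    for patient, when, code in sorted(events, key=lambda e: e[1]):
        if when < cutoff:
            history[patient].append(code)
        else:
            outcomes[patient].add(code)
    return dict(history), dict(outcomes)


def recall(predicted, outcomes):
    """Fraction of held-out events that appear in the predicted sets."""
    total = sum(len(codes) for codes in outcomes.values())
    hits = sum(
        len(codes & predicted.get(patient, set()))
        for patient, codes in outcomes.items()
    )
    return hits / total if total else 0.0


# Everything before 1 January 2023 is model input; 2023 events are the target.
history, outcomes = temporal_split(EVENTS, cutoff=date(2023, 1, 1))
```

A model under evaluation would be given only `history` and asked to produce a set of predicted codes per patient, which `recall(predicted, outcomes)` then compares with the 2023 events. Keeping the split strictly chronological is what prevents information from the target year leaking into the inputs.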

Dr Chris Tomlinson, lead researcher at the UCL Institute of Health Informatics, emphasized the value of inclusivity in large-scale data use. “AI models are only as good as the data on which they’re trained. So if we want a model that can benefit all patients, with all conditions, then the AI needs to have seen that during training.

“Using national-scale data allows us to represent the kaleidoscopic diversity of England’s population, particularly for minority groups and rare diseases, which are often excluded from research [1].”

For researchers focused on aging biology and its clinical translation, models like Foresight may provide essential scaffolding for risk stratification and early detection. Identifying individuals entering pre-frailty or showing subtle signs of cognitive or metabolic decline – long before conventional diagnostics – could enable timely, targeted interventions that help preserve function and prolong healthspan.

There is, among the research team, a recognition that both technical success and public trust must go hand in hand. While the data has been carefully deidentified, it is not fully anonymous – and GDPR’s application to such data remains uncertain. The project operates under pandemic-era research provisions, which allow for broader data use than usual, but these permissions are not unlimited. In this context, transparency becomes more than best practice – it is a prerequisite for legitimacy.

The technical feat itself was considerable. “Combining the computing resources needed for AI with NHS data has always been challenging,” said Simon Ellershaw, a PhD researcher at UCL. “But thanks to the support of our partners we’ve been able to safely and securely apply state-of-the-art AI methods to NHS data at unprecedented scale [1].”

This is not the first appearance of Foresight; an earlier version of the model was tested using data from two NHS Trusts, where it showed potential in mapping health trajectories. The national-scale pilot represents a natural, albeit ambitious, progression.

“This pilot is building on previous research that demonstrated Foresight’s ability to predict health trajectories from data from two NHS trusts,” said Professor Richard Dobson, based at both UCL and King’s College London. “To be able to use it in a national setting is very exciting as it will potentially demonstrate more powerful predictions that can inform services nationally and locally.

“Currently the data in this pilot is broad but shallow, and ultimately we’d like to harness the expertise and AI platforms behind Foresight by including richer sources of information like clinicians’ notes, or results of investigations such as blood tests and scans if they become available [1].”

That addition – blood markers, imaging, clinical narratives – would place the model in even closer proximity to biological aging metrics. Such data types could support integration of epigenetic age, inflammation profiles or resilience indicators, moving the model beyond disease forecasting into the realm of systems-level aging analysis.

Patient involvement has been woven into the governance process. A BHF Data Science Centre public contributor involved in reviewing the project said: “As a patient, I’m interested in how this research could help identify linked health conditions, reduce the risk of developing new ones, and support those who face challenges accessing healthcare. It’s important that people know how their health data is being used, so it’s encouraging to see a focus on transparency and making sure AI is used in the NHS in a safe, ethical way with public benefit at its heart [2].”

The underlying question – how to align individual rights with public good – is not unique to the NHS, nor to this project. But Foresight may be the most visible bellwether of its kind: a national-scale prototype of what predictive, preventative healthcare might one day become.

Dr Vin Diwakar, National Director of Transformation at NHS England, emphasized the role of infrastructure in making such research feasible. “AI has the potential to transform the way we prevent and treat disease, if trained on large datasets and safely tested. The NHS Secure Data Environment has been fundamental to this pioneering research, shaping a future where earlier treatments and interventions are targeted to those who will benefit, preventing future ill health. This will boost our ability to move quickly towards personalised, preventative care [1].”

As the science of aging advances, the ability to operationalize risk data at scale will become increasingly vital. What Foresight offers is not just a technical precedent, but a glimpse of how whole-population health forecasting might serve as the backbone of 21st-century longevity strategy.