Tracing NHS data back to the source

Professor Matt Sydes explains how we’re semi-automating the data provenance process to ease the admin burden on researchers and speed up clinical trials.

A hard truth is that we need to improve the delivery of clinical trials. They are often large and time consuming for good reason. We need reliable findings and must protect against false positives and false negatives.

If patients are already expected to do well or the treatment is only expected to make a modest improvement or we need to have greater certainty of the results, the trial will need to be larger and longer. And more time inevitably means more money.

Fortunately, we have access to the world’s largest linked health datasets, and NHS data will play a big role in saving time and money and in protecting participants from unnecessary risk.

NHS DigiTrials and the new network of Secure Data Environments (SDEs) for Research are already giving trusted researchers support to conduct their trials and access to NHS data. 

Efficiently connecting and searching across near real-time NHS datasets should significantly improve the diversity and inclusivity of clinical trials. Linking across NHS datasets is not yet simple, but we are making good progress in this area.

For nearly 30 years, I’ve been designing, conducting, analysing, reporting and communicating clinical trials. Most of this time, I’ve been employed as a statistician researching ways to streamline the delivery of trials.

I recently joined NHS England from my previous role as Professor of Clinical Trials and Methodology at MRC Clinical Trials Unit at UCL (University College London).

I am excited to bring my expertise and years of experience in the research community to help NHS England seize the opportunities of data-driven clinical trials and overcome their associated challenges.


The gold standard

Healthcare research aims to find out more about disease trends and risk factors, treatment outcomes, patterns of care, and healthcare costs and use.

Testing through clinical trials is an essential part of the research process. Randomised controlled trials are the most reliable way to compare treatments and are hailed as the gold standard.

Trials operate within a very strict regulatory framework – particularly drug trials (Clinical Trials of Investigational Medicinal Products). Their findings inform crucial decisions by regulators and often directly impact clinical practice and the way healthcare is delivered.

Lightening the load

The NHS is stretched and has been for many years. Anything we can do to reduce the burden on our busy healthcare systems from any individual research project will help enable more research overall.

One area that can be improved is how data is collected for clinical trials. Currently, collecting, recording, transmitting, checking and re-checking trial data can create a great deal of unnecessary work.

Instead, if the relevant data could be directly pulled from the healthcare system's records and promptly shared with the trial team, it would simplify the process. Our SDEs provide researchers access to some of this NHS data, which is a huge stride forwards – but there’s a catch.


Tracing the data

NHS England does not hold a single dataset for everything; we hold over 200 different datasets, each for a specific purpose and collected in a unique way.

Researchers must be able to trace the origin of each piece of data they use. This concept, known as data provenance, is crucial for understanding the context and reliability of the information. In clinical trials it is an important requirement for ensuring data accuracy, quality and integrity. The use of SDEs further complicates the issue, because traditional methods for demonstrating data provenance may no longer be as effective.

The DEDICaTe project

In July 2022, I worked with Macey Murray and our colleagues across NHS England, UCL, HDR UK and the University of Oxford to set out principles for manually addressing data provenance when using healthcare systems data in clinical trials, across 2 important datasets.

Now we have published a new paper on semi-automating this process. This DEDICaTe proof-of-concept study focused on 4 important national datasets: 

  • Civil Registration of Deaths
  • Hospital Episode Statistics 
    • Admissions
    • Outpatients
    • Critical care

The project team mapped how data flows from across the UK into NHS England’s data platforms, including the relevant processing rules, and how, where appropriate, these datasets would pass to approved researchers.

Together, this mapping provides the necessary documentation and clarity for the datasets through a semi-automated approach: the provenance documentation can be updated and managed automatically, giving researchers access to up-to-date provenance information on the relevant datasets. Find out more on the DEDICaTe website.

This is an exciting first step! The documentation needs regular review for updates, perhaps annually, and trial teams need to know how to access and use the documentation for their filing purposes. 

If we expand this work to all national datasets held by NHS England for research, ideally including pathology and blood measurements, this would be a huge benefit to researchers. There would be greater impact still if our colleagues across the UK who hold healthcare systems data mirrored this work. This is achievable – it’s not glamorous, but it is a critically important step that will reduce administrative burden, aid trust and transparency, and improve the overall delivery of clinical trials.


Challenges today, opportunities tomorrow

Most forms of research don’t need to demonstrate the provenance and integrity of the datasets underpinning their findings, but clinical trials are strictly regulated and must do so, both to protect participants and to ensure the integrity of the research. The potential benefits of doing so extend far beyond clinical trials.

The journey ahead may be challenging, but with continued collaboration and innovation, we're taking important steps towards a future where clinical trials are more streamlined, data-driven, and impactful. I'm looking forward to working with my colleagues to push for that future.



Last edited: 1 October 2024 10:50 am