NHS England Secure Data Environment service is part of the Secure Data Environment network which is powering life-saving research and treatments across the UK.

By delivering a safe, sustainable, and trusted health data service, NHS England Secure Data Environment service is supporting vital clinical research across areas such as cancer, cardiovascular disease, clinical guidance and COVID-19. In its first year alone, the environment has facilitated around 100 research projects and supported 260 researchers.
As a service, we are on a mission to further reduce the time it takes for approved research to begin in the environment. Our team is continually testing innovative solutions to speed up processes from onboarding through to exporting results. These efforts have already produced impressive results, with some processes seeing time reductions from days to minutes.
In July 2024, two innovators, Adam Hollings and Humaira Hussein, presented their work at the Health and Care Analytics conference 2024 (HACA2024). Their poster presentations showcased innovations that have dramatically improved data validation speed and enhanced researcher capabilities, inspiring fellow analysts in the process.
Adam Hollings: Data validation reduced from 3 days to 30 minutes
Once a research project has a Data Sharing Agreement, a secure workspace within the Secure Data Environment (SDE) is created for the approved researcher(s) to access the selected datasets. All data provisioned into the SDE platform must be validated. This process was manual and took on average 3 days to complete.
Helen Richardson, Elizabeth Kelly and I, in the Service team, decided to make significant improvements in this area. We focused on:
- improving the efficiency and consistency of the data validation process
- making it a reusable process to save time and follow best practice
- sharing the code so others can benefit.
Using Python, we created reusable programming code that thoroughly checks for problems with the data tables. We also engaged with stakeholders to ensure validation checks were comprehensive.
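To give a flavour of what reusable validation code can look like, here is a minimal sketch. It is not the team's actual code: the function names, the three checks and the sample `patients` table are all hypothetical, and it uses plain Python lists of dictionaries in place of the real SDE tables.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical stand-ins for a data table: a list of row dictionaries.
Row = dict[str, Any]
Table = list[Row]


@dataclass
class CheckResult:
    check: str
    passed: bool
    detail: str


def check_required_columns(table: Table, columns: list[str]) -> CheckResult:
    """Every row must contain all of the expected columns."""
    missing = sorted({c for row in table for c in columns if c not in row})
    return CheckResult("required_columns", not missing, f"missing: {missing}")


def check_no_nulls(table: Table, column: str) -> CheckResult:
    """The column must not contain None or empty strings."""
    bad = [i for i, row in enumerate(table) if row.get(column) in (None, "")]
    return CheckResult(f"no_nulls[{column}]", not bad, f"null rows: {bad}")


def check_unique(table: Table, column: str) -> CheckResult:
    """The column must not contain repeated values."""
    seen, dupes = set(), set()
    for row in table:
        value = row.get(column)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return CheckResult(
        f"unique[{column}]", not dupes, f"duplicates: {sorted(dupes, key=str)}"
    )


def validate(table: Table, checks: list[Callable[[Table], CheckResult]]) -> list[CheckResult]:
    """Run every check against the table and collect the results."""
    return [check(table) for check in checks]


# Illustrative data only: one null age and one repeated identifier.
patients = [
    {"patient_id": "A1", "age": 34},
    {"patient_id": "A2", "age": None},
    {"patient_id": "A1", "age": 51},
]

# The same list of checks can be reused for any table with these columns.
checks = [
    lambda t: check_required_columns(t, ["patient_id", "age"]),
    lambda t: check_no_nulls(t, "age"),
    lambda t: check_unique(t, "patient_id"),
]

results = validate(patients, checks)
for result in results:
    status = "PASS" if result.passed else "FAIL"
    print(f"{status} {result.check}: {result.detail}")
```

Because each check is a small function that returns the same kind of result, new checks can be added to the list without changing the code that runs them, which is one way to keep validation consistent from one dataset to the next.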
Through these innovations, we reduced the time to validate data from 3 days to an astonishing 30 minutes. This dramatic reduction in validation time allows researchers to begin their work much more quickly, significantly accelerating the pace of healthcare research.
The validation code is now reusable across different datasets, promoting efficiency and best practices. By enabling earlier identification and resolution of potential issues, this new approach not only saves time but also enhances the quality and reliability of healthcare data available for research.
One of the themes at HACA2024 was that good analysis saves lives; and good analysis is built on good quality data. By ensuring our data is validated in a consistent, reliable and transparent way, we lay the groundwork for future analysis. Saving time as well is the icing on the cake. Some other attendees mentioned that they had been making similar efforts in their own trusts and organisations; by sharing this work we can support those efforts and help everyone improve.
Humaira Hussein: Empowering researchers with SparkR
NHS England Secure Data Environment offers a rich choice of programming languages to run research queries. To support those who are unfamiliar with SparkR, we have developed a suite of tools to enhance data analysis capabilities.
Shoaib Ali Ajaib, Nickie Wareing, Liam Beckingham and I, in the SDE service team, created the Example Analysis notebooks that showcase the power of SparkR in Databricks for healthcare researchers. The notebooks present a range of analytical scenarios that can be adapted and applied by researchers across various settings.
These notebooks demonstrate:
- efficient big data handling: SparkR's distributed computing capabilities allow for faster processing of large datasets.
- versatile analysis tools: The notebooks cover a range of analytical scenarios, from data manipulation to complex statistical models like generalised linear models.
- data visualisation: Integration with ggplot2 for creating insightful visualisations of healthcare data.
The benefits for researchers are substantial. Many SDE users, unfamiliar with distributed computing and interactive notebook environments, now have a valuable resource to guide their analysis. They can adapt the provided SparkR code to their specific research needs, leading to more efficient analysis and maximised data use. To ensure continuous improvement, we host regular drop-in sessions for customers, allowing us to gather user feedback and refine the notebooks.
The SDE Example Analysis notebooks are a great source of information, particularly for researchers new to the SDE. By creating a ‘cheat sheet’ of commands and functions available to use in SparkR, we have showcased the vast capabilities of the SDE, which other researchers and health analysts enjoyed learning about and will be able to use in their own projects.
Looking ahead
Both innovations in data validation and analysis represent a significant leap forward in healthcare data management within NHS England Secure Data Environment. The team is now exploring ways to further integrate these tools and expand their capabilities based on researcher feedback.
By dramatically improving both the speed of data validation and the depth of data analysis, NHS England is speeding up access and analysis in the Secure Data Environment. These advancements promise not only to accelerate research but also to improve the quality and reliability of insights drawn from healthcare data, ultimately contributing to better patient outcomes.
Find out how NHS England Secure Data Environment can support your research today.
Last edited: 26 July 2024 4:23 pm