
Exploring how to create mock patient data (synthetic data) from real patient data



The NHS AI Lab Skunkworks team has been releasing open-source code from their artificial intelligence (AI) projects since 2021. One challenge of releasing code is that, without suitable test data, the AI tools cannot be properly demonstrated, so users who do not have access to the underlying data cannot see them in action.

One way to address this is to provide “synthetic data”: new “fake” data generated from real data by a specifically designed model, so that it preserves key characteristics of the original data. In particular, synthetic data aims to achieve:

  • Utility - the synthetic data must be fit for its defined use
  • Quality - it must be a sufficient representation of the real data
  • Privacy - it mustn’t ‘leak’ or expose any sensitive information from the real data
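To make these three aims more concrete, here is a minimal sketch. It is not the Skunkworks team's method. It assumes a hypothetical two-column table (age, sex) of made-up records, and it uses a deliberately simple model that samples each column independently: a Gaussian for age and observed frequencies for sex. It then applies two crude checks, one for quality (the gap in mean age between real and synthetic records) and one for privacy (how many synthetic records exactly copy a real one). All names here (`REAL`, `fit_marginals`, `sample`, `quality_gap`, `exact_copies`) are illustrative.

```python
import random
import statistics

# Hypothetical toy "real" dataset of (age, sex) records; NOT real patient data.
REAL = [(34, "F"), (51, "M"), (47, "F"), (62, "M"),
        (29, "F"), (58, "F"), (45, "M"), (70, "M")]


def fit_marginals(rows):
    """Fit a simple model: a Gaussian for age and category frequencies for sex.

    Columns are modelled independently, so any relationship between
    age and sex in the real data is NOT preserved (an assumption of this sketch).
    """
    ages = [age for age, _ in rows]
    sexes = [sex for _, sex in rows]
    levels = sorted(set(sexes))
    return {
        "age_mean": statistics.mean(ages),
        "age_sd": statistics.stdev(ages),
        "sex_levels": levels,
        "sex_weights": [sexes.count(level) for level in levels],
    }


def sample(model, n, rng):
    """Draw n synthetic (age, sex) records from the fitted model."""
    rows = []
    for _ in range(n):
        # Round to whole years and clamp so ages are never negative.
        age = max(0, round(rng.gauss(model["age_mean"], model["age_sd"])))
        sex = rng.choices(model["sex_levels"], weights=model["sex_weights"])[0]
        rows.append((age, sex))
    return rows


def quality_gap(real, synth):
    """Crude quality check: absolute difference in mean age."""
    return abs(statistics.mean(a for a, _ in real)
               - statistics.mean(a for a, _ in synth))


def exact_copies(real, synth):
    """Crude privacy check: count synthetic rows identical to a real row."""
    real_set = set(real)
    return sum(1 for row in synth if row in real_set)


# Usage: fit on the toy data, generate 200 records, and run both checks.
rng = random.Random(0)  # fixed seed so runs are repeatable
model = fit_marginals(REAL)
synth = sample(model, 200, rng)
print(len(synth), quality_gap(REAL, synth), exact_copies(REAL, synth))
```

With only two low-cardinality columns, some exact matches will occur by chance. An exact-copy count is therefore only a first sanity check, not evidence of privacy. Likewise, the independent-column model gives up utility wherever the columns are related, which is why real synthetic data generators use more carefully designed models.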

Last edited: 28 April 2025 8:37 am