Automated data cleaning process from SUS into HES
At this stage in the processing, a number of automated cleaning rules are applied to the data. New data items are then derived and the information is made available to HES users.
Provider organisation code mapping
Reference data based on the information held by Organisation Data Services (ODS) is used to automatically update old or invalid provider organisation codes to the correct provider code. Where no appropriate provider code can be found, the data is deleted.
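The mapping step described above could be sketched as follows. This is a minimal illustration only, not the actual HES implementation: the lookup table, the provider_code field name and the example codes are all hypothetical, and the real reference data is far richer than a simple dictionary.

```python
# Hypothetical reference table: submitted code -> current provider code.
# Valid current codes map to themselves; old codes map to their successor.
PROVIDER_CODE_MAP = {
    "OLD01": "NEW01",
    "NEW01": "NEW01",
}


def map_provider_codes(records, code_map=PROVIDER_CODE_MAP):
    """Update provider codes and separate out records that cannot be mapped.

    Returns a (kept, deleted) pair of lists. The input records are not modified.
    """
    kept, deleted = [], []
    for record in records:
        current_code = code_map.get(record.get("provider_code"))
        if current_code is None:
            # No appropriate provider code found: the record is dropped.
            deleted.append(record)
        else:
            # Copy the record with the corrected provider code.
            kept.append({**record, "provider_code": current_code})
    return kept, deleted
```

Returning the deleted records alongside the kept ones is a design choice for this sketch, so that dropped data can be inspected rather than silently lost.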
Duplicate detection and removal methodology
Procedures are run each month to automatically identify records that have been submitted multiple times by a provider. Duplicates are identified by looking for matching data in a number of key fields and, once identified, duplicate records are automatically deleted from the monthly provisional data. Details of all deleted records are shared with trusts so that, if necessary, they can request further data deletions from SUS to help ensure that SUS remains accurate. Once the duplicated records have been removed from SUS, they will also be removed from HES when the next data extract is received.
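As an illustration of matching on key fields, the sketch below keeps the first occurrence of each record and sets later matches aside. It is a simplified example under stated assumptions: the key field names are hypothetical, and the actual fields and matching rules used for HES are not given here.

```python
# Hypothetical key fields; the real set used for HES is not specified here.
KEY_FIELDS = ("provider_code", "patient_ref", "admission_date")


def split_duplicates(records, key_fields=KEY_FIELDS):
    """Split records into (unique, duplicates) by comparing the key fields.

    The first record seen for each key is kept; later matches are duplicates.
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates
```

The duplicates list corresponds to the details of deleted records that could be shared back with the submitting trust.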
Derivation and cleaning rules
Data cleaning rules for HES have been developed over time and continue to evolve to enhance the dataset. These rules have three main purposes:
1. To clean common and obvious data quality errors.
2. To derive additional data items to populate the HES dataset.
3. To remove activity data outside the relevant date range for the current HES extract.
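The third purpose, date-range filtering, might look like the sketch below. The activity_date field name and the inclusive treatment of both boundaries are assumptions made for illustration; the actual rules are set out in the processing rules for HES.

```python
from datetime import date


def within_extract_range(records, start, end, date_field="activity_date"):
    """Keep only records whose activity date lies between start and end.

    Both boundaries are treated as inclusive, which is an assumption here.
    """
    return [r for r in records if start <= r[date_field] <= end]


# Example: keep activity for a hypothetical extract period.
sample = [
    {"id": 1, "activity_date": date(2024, 3, 31)},
    {"id": 2, "activity_date": date(2024, 4, 1)},
]
in_range = within_extract_range(sample, date(2024, 4, 1), date(2025, 3, 31))
```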
You may find it useful to also refer to the 'processing rules' tab of the HES Technical Output Specification (TOS).
Master Patient Service (MPS) Person ID
The MPS ID is based on an enhanced person-matching algorithm and replaces the previous HESID field. A person identifier is needed because many people attend hospital more than once. The MPS ID improves matching of patients across all national patient-level data, not just across HES. It exploits new analytical and data science capabilities and increases efficiency through automation.
This is the final derivation applied to HES, following all other derivations and cleaning rules.
Last edited: 31 March 2025 11:41 am