
To help HES users understand how MPS is applied in practice, this chapter presents a series of real examples from HES data sets (re-enacted with fictitious data).


Aggregated findings from real data

We used the HES data set for the financial year 2021/2022. This is provisional data; the final published HES data for the 2021/2022 year may differ.

Table 7. Counts of the different combinations of the MPS output fields for HES APC (FY=2021/2022)

Matched algorithm indicator Matched confidence percentage Postcode score percentage Date of birth score percentage Gender score percentage Count
0 0 0 0 0 233,033
1 0 0 0 0 668
1 100 null null null 20,666,061
1 100 0 0 0 72,512
4 100 100 100 100 19,516
4 0 0 0 0 25,615

 

Table 7 displays every combination of values of the MPS output fields found in the HES APC data set.

With the technical details from the data linkage algorithm chapter in mind, we can now explain the different counts.

There are only 3 distinct values of MatchedAlgorithmIndicator (0, 1 and 4), which confirms that no alphanumeric trace step is run for HES, due to the absence of name fields.

The third row shows that most matches happen at cross-check trace in DPS. This is expected, because HES is a well-curated data set in which most records have a correct NHS number and a DOB that match PDS.
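
The word "most" can be quantified directly from Table 7. The short Python sketch below divides each row count by the total. The row labels are shorthand written for this example, not MPS field values.

```python
# Row counts copied from Table 7 (HES APC, FY 2021/2022, provisional data).
# Labels are descriptive shorthand added for this sketch only.
table_7 = {
    "not processed (0)": 233_033,
    "no match, exited at cross-check (1, 0%)": 668,
    "DPS cross-check match (1, 100%, null scores)": 20_666_061,
    "Spine cross-check match (1, 100%, zero scores)": 72_512,
    "algorithmic trace match (4, 100%)": 19_516,
    "no match after all trace steps (4, 0%)": 25_615,
}

total = sum(table_7.values())
shares = {label: count / total for label, count in table_7.items()}

# Print each row with its count and its share of all records.
for label, count in table_7.items():
    print(f"{label:<48} {count:>12,} {shares[label]:7.2%}")
print(f"{'total':<48} {total:>12,}")
```

With these counts the total is 21,017,405 records, and the DPS cross-check row accounts for roughly 98.3% of them.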

72,512 records are also matched by cross-check trace, but in Spine instead of DPS. This can be seen from the zero (rather than null) values in the score percentage columns. This can happen for several reasons:

  • the match could not be found in the cached copy of PDS, but could be found in live PDS
  • the DOB was only partially correct, so the match was picked up by cross-check trace in Spine, which performs some additional checks compared with the step in DPS
  • the NHS number in the HES record had been superseded by another NHS number

668 records have a MatchedAlgorithmIndicator value of 1 but a MatchedConfidencePercentage value of 0. This means that the algorithm could not find a match and exited at cross-check trace (in Spine), because the records did not meet the eligibility criteria for proceeding to algorithmic trace, namely valid values in the DOB, gender and postcode fields. These records may either be matched at the MPS_ID matching step or not matched at all.

25,615 records have a MatchedAlgorithmIndicator value of 4 but a MatchedConfidencePercentage value of 0, meaning that MPS attempted all tracing steps without finding a match. These records may either be matched at the MPS_ID matching step or not matched at all.

19,516 records were matched with algorithmic trace, and these have MatchedAlgorithmIndicator value of 4 and MatchedConfidencePercentage value of 100.

The 233,033 records in row 1 were not processed against PDS at all, because they did not meet the minimum requirements (due to an invalid DOB field). These records may either be matched at the MPS_ID matching step or not matched at all.
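
The reading of Table 7 above can be written down as a small lookup. This is a minimal Python sketch of our interpretation for HES data (no name fields), not the MPS implementation. The function name `classify_mps_outcome` is hypothetical, and `None` stands for a null score percentage.

```python
from typing import Optional


def classify_mps_outcome(
    indicator: int,
    confidence: int,
    dob_score: Optional[int],
    gender_score: Optional[int],
    postcode_score: Optional[int],
) -> str:
    """Describe which MPS step produced a response row, following the
    interpretation of Table 7 given in this chapter (HES only)."""
    scores = (dob_score, gender_score, postcode_score)
    if indicator == 0:
        return "not processed against PDS (e.g. invalid DOB)"
    if indicator == 1 and confidence == 100:
        # Null scores point to DPS; zero scores point to Spine.
        if all(s is None for s in scores):
            return "matched by cross-check trace in DPS"
        if all(s == 0 for s in scores):
            return "matched by cross-check trace in Spine"
    if indicator == 1 and confidence == 0:
        return "no PDS match: exited at cross-check trace in Spine"
    if indicator == 4 and confidence == 100:
        return "matched by algorithmic trace"
    if indicator == 4 and confidence == 0:
        return "no PDS match after all trace steps"
    return "combination not seen in Table 7"


# Response rows from Case Study 8 (same NHS number, different year of birth).
print(classify_mps_outcome(1, 100, None, None, None))  # matched by cross-check trace in DPS
print(classify_mps_outcome(1, 100, 0, 0, 0))           # matched by cross-check trace in Spine
```

Keyword-style score arguments are used deliberately, because Table 7 and the case-study tables list the score columns in different orders.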


Empirical examples

The examples in this section are based on real results observed when processing HES records through MPS, with all personally identifiable information replaced by consistent fictitious values, making it impossible to identify real individuals.

Please note that as the HES data set does not include patient names, these examples do not include any instances of alphanumeric trace, or the use of names in algorithmic trace.

These examples use the field names of the MPS request and response files, as listed in Table 9 and Table 10 respectively. For readers familiar with HES data, the equivalent HES field names are listed in Table 8 below.

Table 8. MPS request file fields

MPS field name in the request file HES field name they are mapped to
NHS number NEWNHSNO
Gender SEX
Date of birth DOB
Postcode HOMEADD
Local_Patient_ID Combination of PROCODET/PROCODE5/PROCODE3 and LOPATID

 

Group A: happy path scenarios

The first 4 case studies are common happy path scenarios where a Person_ID was found.

Case Study 1: Valid NHS number, matched by DPS cross-check trace

Given a HES record with the following values:

HES record NHS number Gender Date of birth Postcode
1 3333333333 2 2000-02-22 LS1 4AP

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 3333333333 1 100 null null null

Fictitious data 

The MatchedAlgorithmIndicator value of 1 indicates that a match was found at the cross-check trace step. The null score percentages indicate that the match was made by the DPS cross-check trace. This corresponds to row 3 in Table 7, which is the most common scenario.

If the score percentages were zero, it would indicate that the match was found at the cross-check trace stage in Spine.

Case Study 2: Wrong or null NHS number, matched at algorithmic trace

Given a HES record with the following values:

HES record NHS number Gender Date of birth Postcode
1 4444444444 2 2000-02-22 LS1 4AP

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 3333333333 4 100 100 100 100

Fictitious data 

The MatchedAlgorithmIndicator value of 4 indicates that a match was found at the algorithmic trace step. If the NHS number and DOB had been correct, it would have been matched at cross-check trace instead.

The DateOfBirthScorePercentage is 100, indicating that it must have been a wrong NHS number that prevented it from being matched at cross-check trace. The other score percentages are all 100 which indicates that the returned Person_ID is the NHS number of a record which matches on DOB, gender, and postcode.

Case Study 3: No local patient ID, matched to an existing MPS record

Given a HES record with the following values:

HES record NHS number Local patient ID Gender Date of birth Postcode
1 null null 2 2000-02-22 LS1 4AP

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 A123456789 4 0 0 0 0

Fictitious data 

The MatchedAlgorithmIndicator value of 4 and MatchedConfidencePercentage value of 0 indicate that algorithmic trace was performed but was not successful, so the record was sent on for MPS matching.

The Person_ID begins with ‘A’, which indicates a successful match at MPS matching (a Person_ID from this step can begin with either ‘A’ or ‘B’).

Because the input query had no local patient identifier and no given name or family name, it must have matched on DOB, gender, and postcode.

Case Study 4: With local patient ID, matched to an existing MPS record

Given a HES record with the following values:

HES record NHS number Local patient ID Gender Date of birth Postcode
1 null 98A21B 2 2000-02-22 LS1 4AP

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 A123456789 4 0 0 0 0

Fictitious data 

The MatchedAlgorithmIndicator value of 4 and MatchedConfidencePercentage value of 0 indicate that algorithmic trace was performed but was not successful, so the record was sent on for MPS matching.

The Person_ID begins with ‘A’, which indicates either a successful match at MPS matching or the creation of a new MPS_ID in the MPS record bucket (a Person_ID from this step can begin with either ‘A’ or ‘B’).

Because the input query has a local patient identifier but no given name or family name, if it was matched, it must have matched on local patient identifier and DOB. We do not know whether it also matched on gender or postcode.

Group B: different identifiers linked to the same Person_ID

The next 6 examples have been chosen to show how records with different identifiers can be assigned the same Person_ID (and therefore the same Token_Person_ID).

Case Study 5: Two records with the same DOB, one without NHS number, return the same Person_ID

Given two HES records with the following values:

HES record NHS number Gender Date of birth Postcode
1 3333333333 2 2000-02-22 null
2 null 2 2000-02-22 LS1 4AP

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 3333333333 1 100 null null null 
2 3333333333 4 100 100 100 100

 

Fictitious data 

For HES record 1, a match was made in the cross-check trace step, as indicated by the MatchedAlgorithmIndicator value of 1. We can infer that an exact match was possible with the NHS number and DOB provided (the same DOB scored 100 for HES record 2). Because HES record 1 was matched in the cross-check trace step, the scores for DOB, gender and postcode were not calculated by algorithmic trace and can be either null or zero. As they are null in this case, the match was found by cross-check trace in DPS.

In contrast, for HES record 2 which was missing an NHS number, the record was matched in the algorithmic trace step, as shown by the MatchedAlgorithmIndicator value of 4. In this instance, the patient was matched to the highest scoring PDS record with a score of 100 due to an exact match on DOB, gender and postcode.

As matches to PDS were found for both records, the Person_ID field assumes the value of the NHS number.

Case Study 6: Two records with different postcodes return the same Person_ID

Given two HES records with the following values:

HES record NHS Number Gender Date of birth Postcode
1 3333333333 1 1994-02-24 SW1A 2AA
2 3333333333 1 1994-02-24 SW1A 2AH

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 3333333333 1 100 null  null  null 
2 3333333333 1 100 null  null  null 

Fictitious data 

For these records, the different postcodes make no difference to the matching process, as both records contain an accurate DOB and NHS number. This allows the match to take place at the cross-check trace step.

Case Study 7: Two records with slightly different DOB, one without NHS number, return the same Person_ID

Given two HES records with the following values:

HES record NHS number Gender Date of birth  Postcode
1 3333333333 1 1982-03-04 SW1A 2AA
2 null 1 1982-03-09 SW1A 2AH

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage  Gender score percentage Postcode score percentage
1 3333333333 1 100 0 0 0
2 3333333333 4 100 100 100 100

Fictitious data

For HES record 1, the patient was successfully traced by the cross-check trace step. The zero score percentages show that this happened in Spine rather than in DPS.

For HES record 2, the patient was successfully traced by the algorithmic trace step, with the highest scoring PDS record scoring 100 on DOB, gender and postcode.

Because record 2 was an exact match on DOB (score = 100), the PDS record's DOB must be 1982-03-09. Record 1 therefore cannot have matched on full DOB, and must instead have matched at cross-check trace in Spine on partial DOB (1982-03) and outcode (SW1A).
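
The two comparisons used in this reasoning can be sketched in Python. The `outcode` helper relies on the fact that the inward code of a UK postcode (the part after the space) is always three characters. The `same_year_and_month` helper reproduces only the partial-DOB comparison seen in this case study; both function names are hypothetical, and the actual Spine rules are broader (Case Study 8 shows a Spine match despite a different year of birth).

```python
from datetime import date


def outcode(postcode: str) -> str:
    """Return the outcode: everything before the final three characters
    (the inward code) of a UK postcode, ignoring spaces and case."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]


def same_year_and_month(dob_a: str, dob_b: str) -> bool:
    """True if two ISO dates (YYYY-MM-DD) share the same year and month.
    Illustrative only: this is not the full Spine partial-DOB rule."""
    a, b = date.fromisoformat(dob_a), date.fromisoformat(dob_b)
    return (a.year, a.month) == (b.year, b.month)


# HES record 1 versus the PDS record implied by HES record 2's exact match.
print(same_year_and_month("1982-03-04", "1982-03-09"))  # True
print(outcode("SW1A 2AA") == outcode("SW1A 2AH"))       # True
```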

Case Study 8: Two records with slightly different DOB, same NHS number, return the same Person_ID

Given two HES records with the following values:

HES record NHS number Gender  Date of birth Postcode
1 3333333333 1 1976-10-05 LS1 4AP
2 3333333333 1 1971-10-05 ZZ99 3WZ

Fictitious data

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 3333333333 1 100 null null null
2 3333333333 1 100 0 0 0

Fictitious data

The MatchedAlgorithmIndicator value of 1 and MatchedConfidencePercentage value of 100 indicate that a match was found at the cross-check trace step. The null score percentages for the first record indicate that it was in the DPS cross-check trace. The second record has scores of zero, hence the match was found at the cross-check trace stage in Spine.

For record 2, DPS cross-check trace could not find a match because the year of birth is incorrect. However, Spine cross-check trace allows a partial DOB match where the outcode (the left part of the postcode) matches. In this example, the patient's postcode history included a ‘ZZ99’ postcode, which matched the outcode.

Case Study 9: Two records with different gender return the same Person_ID

Given two HES records with the following values:

HES record NHS number Gender Date of birth Postcode
1 3333333333 1 1994-02-24 LS1 4AP
2 3333333333 9 1994-02-24 LS1 4AP

Fictitious data

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 3333333333 1 100 null null null
2 3333333333 1 100 null null null

Fictitious data

For these records, the different gender makes no difference to the Person_ID assigned, as both contain an accurate DOB and NHS number. This allows the match to take place at the cross-check trace step in DPS.

Case Study 10: Two records with different gender and postcode return the same Person_ID

Given two HES records with the following values:

HES record NHS number Gender Date of birth Postcode
1 3333333333 9 1970-07-01 ZZ99 3WZ
2 3333333333 2 1970-07-01 LS1 4AP

Fictitious data

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 3333333333 1 100 null null null
2 3333333333 1 100 null null null

Fictitious data

The MatchedAlgorithmIndicator value of 1 and MatchedConfidencePercentage value of 100 indicate that a match was found at the cross-check trace step. The null score percentages for both records indicate that the matches were made by the DPS cross-check trace. Postcode and gender differ between the two records; however, this does not affect cross-check trace, because it only looks at NHS number and date of birth.
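
Why gender and postcode cannot influence this step can be seen in a toy version of the DPS cross-check, sketched in Python below. The `CACHED_PDS` dictionary and the `dps_cross_check` function are hypothetical stand-ins: the real cached PDS data and matching code are far richer. The point is only that the check takes NHS number and DOB as its sole inputs.

```python
from typing import Dict, Optional

# Hypothetical, fictitious stand-in for PDS records cached in DPS:
# NHS number -> date of birth.
CACHED_PDS: Dict[str, str] = {"3333333333": "1970-07-01"}


def dps_cross_check(nhs_number: Optional[str], dob: str) -> Optional[str]:
    """Return the NHS number (used as Person_ID) if both the NHS number and
    the DOB match the cached record exactly; otherwise return None.
    Gender and postcode are not inputs, so they cannot affect the result."""
    if nhs_number is not None and CACHED_PDS.get(nhs_number) == dob:
        return nhs_number
    return None


# The two records from Case Study 10: gender and postcode differ.
records = [
    {"nhs_number": "3333333333", "gender": "9", "dob": "1970-07-01", "postcode": "ZZ99 3WZ"},
    {"nhs_number": "3333333333", "gender": "2", "dob": "1970-07-01", "postcode": "LS1 4AP"},
]
for record in records:
    print(dps_cross_check(record["nhs_number"], record["dob"]))
```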

Group C: edge cases

The final 5 examples have been chosen to demonstrate edge cases: results where a match could not be found, or which may appear surprising.

Case Study 11: Two records with superseded versus current NHS numbers

Given two HES records with the following values:

HES record NHS number Gender  Date of birth  Postcode
1 4444444444 2 2003-03-03 null
2 5555555555 2 2003-03-03 LS1 4AP

Fictitious data 

We would receive the following response fields following processing in MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 4444444444 1 100 null null null
2 4444444444 1 100 0 0 0

Fictitious data 

For HES record 1, a match was made in the cross-check trace step within DPS, as indicated by the MatchedAlgorithmIndicator value of 1 and the null score percentages. This means that an exact match was possible between the NHS number and DOB provided and the PDS records cached within DPS.

HES record 2 has a different NHS number but is matched to the same Person_ID as HES record 1, with the same MatchedAlgorithmIndicator value of 1. Its DOB, gender and postcode score percentages are 0, which means that record 2 was cross-check traced in Spine. Because it was traced to a Person_ID associated with a different NHS number, we can infer that 5555555555 is no longer a valid NHS number, having been superseded by 4444444444. Cross-check trace in DPS does not return matches where NHS numbers have been superseded, whereas cross-check trace in Spine recognised the match because it also checks for superseded NHS numbers.
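
The one behavioural difference described here (DPS ignores supersessions, Spine follows them) can be illustrated with a toy Python sketch. The supersession table, the PDS lookup and both function names are hypothetical and use the fictitious numbers from this case study; the sketch does not reproduce any other Spine checks.

```python
from typing import Dict, Optional

# Fictitious supersession table: old NHS number -> the number replacing it.
SUPERSEDED_BY: Dict[str, str] = {"5555555555": "4444444444"}
# Fictitious PDS content keyed by current NHS number -> date of birth.
PDS_DOB: Dict[str, str] = {"4444444444": "2003-03-03"}


def current_nhs_number(nhs_number: str) -> str:
    """Follow the supersession chain until reaching a number that has not
    itself been superseded."""
    seen = set()
    while nhs_number in SUPERSEDED_BY:
        if nhs_number in seen:
            raise ValueError("cycle in supersession table")
        seen.add(nhs_number)
        nhs_number = SUPERSEDED_BY[nhs_number]
    return nhs_number


def cross_check(nhs_number: str, dob: str, follow_supersessions: bool) -> Optional[str]:
    """Exact NHS number + DOB check. With follow_supersessions=True it mimics
    the Spine behaviour described above; with False, the DPS behaviour."""
    if follow_supersessions:
        nhs_number = current_nhs_number(nhs_number)
    return nhs_number if PDS_DOB.get(nhs_number) == dob else None


print(cross_check("5555555555", "2003-03-03", follow_supersessions=False))  # None
print(cross_check("5555555555", "2003-03-03", follow_supersessions=True))   # 4444444444
```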

Case Study 12: Two records with different gender, postcode, and NHS number return the same Person_ID

Given two HES records with the following values:

HES record NHS number  Gender Date of birth  Postcode
1 4444444444 1 1994-02-24 SW1A 2AA
2 3333333333 2 1994-02-24 SW1A 2AA

Fictitious data 

We could receive the following response fields following processing through MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 3333333333 1 100 0 0 0
2 3333333333 1 100 null null null 

Fictitious data 

This is similar to Case Study 11, except that the gender and postcode also differ between the two records, which might superficially lead a HES user to believe that these are two different patients sharing the same date of birth. However, MPS assigns both to the same NHS number via cross-check trace. This is because cross-check trace does not use gender or postcode: if NHS number 4444444444 was superseded by 3333333333, Spine cross-check trace would be able to identify both records as the same Person_ID.

Case Study 13: One-time-use ID generated as no matches at any stage

Given a HES record with the following values:

HES record NHS number Gender Date of birth Postcode
1 null 2003-03-03 LS1 4AP

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 U123KE3ABC4 4 0 0 0 0

Fictitious data 

The MatchedAlgorithmIndicator value of 4 with MatchedConfidencePercentage of 0 indicates that none of the trace steps returned a match against the PDS records or the MPS record bucket. This could indicate that there are no individuals which match the identifying characteristics provided, or that multiple matches were returned and MPS was unable to determine a single match.

This record has valid DOB and postcode, so it contains sufficient information to create a new MPS_ID. However, the Person_ID begins with ‘U’ indicating that a one-time-use ID was generated for this record, hence we conclude that multiple NHS numbers were matched, and algorithmic trace could not resolve the match.
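
The case studies in this chapter show three kinds of Person_ID: an NHS number (ten digits) after a PDS match, an MPS_ID beginning with ‘A’ or ‘B’, and a one-time-use ID beginning with ‘U’. A minimal Python sketch of that reading follows; the function name `person_id_source` is hypothetical, and it checks only the form of the ID, not whether the NHS number is valid.

```python
def person_id_source(person_id: str) -> str:
    """Describe where a Person_ID came from, based only on its form as
    described in this chapter's case studies."""
    if len(person_id) == 10 and person_id.isdigit():
        return "NHS number (traced to a PDS record)"
    prefix = person_id[:1].upper()
    if prefix in ("A", "B"):
        return "MPS_ID (matched in, or added to, the MPS record bucket)"
    if prefix == "U":
        return "one-time-use ID (no usable match)"
    return "unrecognised format"


for pid in ("3333333333", "A123456789", "U123KE3ABC4"):
    print(pid, "->", person_id_source(pid))
```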

Case Study 14: Invalid DOB results in no matches at any stage

Given a HES record with the following values:

HES record NHS number Gender Date of birth Postcode
1 3333333333 2 1800-01-01 LS1 4AP

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of  birth score percentage Gender score percentage Postcode score percentage
1 U123KE3ABC 0 0 0 0 0

Fictitious data 

The MatchedAlgorithmIndicator value of 0 indicates that all steps were skipped. This is because every trace step requires DOB, but in this case the DOB was recognised as invalid.

The Person_ID begins with ‘U’, indicating that a one-time-use ID was generated for this record; there was not even sufficient information to generate a new MPS_ID.

Case Study 15: Same Person_ID for records corresponding to different people

Given three HES records with the following values:

HES record NHS number Local patient ID Gender Date of birth Postcode 
1 3333333333 D012347 1 1880-01-01 ZZ99 3WZ
2 4444444444 F123458 1 1880-01-01 ZZ99 3WZ
3 5555555555 H234569 1 1880-01-01 ZZ99 3WZ

Fictitious data 

The following response fields could be returned by MPS:

HES record Person_ID Matched algorithm indicator Matched confidence percentage Date of birth score percentage Gender score percentage Postcode score percentage
1 B123456789 4 0 0 0 0
2 B123456789 4 0 0 0 0
3 B123456789 4 0 0 0 0

Fictitious data 

The MatchedAlgorithmIndicator value of 4 with a MatchedConfidencePercentage of 0 indicates that the records went through both cross-check trace steps and algorithmic trace without finding a match. The 3 records were then matched to the same MPS_ID during the MPS_ID matching phase.

It is unlikely that these 3 records, with different NHS numbers and local patient IDs, refer to the same person. However, because pseudo postcodes are used in conjunction with default values for DOB, MPS_ID matching is not able to distinguish between them.

Notably, the DOB for this example is not recognized as invalid by MPS. If it were, then the MatchedAlgorithmIndicator would have been 0, and the three records would have each been assigned a one-time-use ID.
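
Analysts who want to review cases like this one could look for MPS_IDs shared by records carrying different NHS numbers. The Python sketch below is a hypothetical heuristic, not an MPS feature. It deliberately ignores Person_IDs that are NHS numbers, because Case Studies 11 and 12 show that different NHS numbers can legitimately share such a Person_ID through supersession. A flagged ID is a prompt for review, not proof of a false match.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def shared_mps_ids(rows: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Return MPS_IDs (Person_IDs starting with 'A' or 'B') that were
    assigned to records carrying more than one distinct NHS number."""
    nhs_by_person: Dict[str, Set[str]] = defaultdict(set)
    for person_id, nhs_number in rows:
        if person_id[:1] in ("A", "B") and nhs_number:
            nhs_by_person[person_id].add(nhs_number)
    return sorted(pid for pid, numbers in nhs_by_person.items() if len(numbers) > 1)


# (Person_ID, NHS number) pairs; the first three follow Case Study 15.
rows = [
    ("B123456789", "3333333333"),
    ("B123456789", "4444444444"),
    ("B123456789", "5555555555"),
    ("A123456789", None),
]
print(shared_mps_ids(rows))  # ['B123456789']
```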


How mps_diagnostics can help with the case studies

The situations laid out in the case studies may confuse analysts when encountered in real data, since Person_IDs come with little or no information about the matching process. Some relevant contextual information is held in the MPS response record returned by MPS. However, this record is not usually shared with users; on HES, for example, only the MatchedAlgorithmIndicator and the confidence scores are available. The mps_diagnostics data set was created to address this shortcoming.

MPS Diagnostics is the pipeline that produces mps_diagnostics. It uses the contextual information from the MPS response file, together with some additional data from PDS, to create 10 columns of metadata that explain in user-friendly terms how each Person_ID was derived.

An accompanying document explains more about mps_diagnostics, revisiting some of the case studies in this chapter and demonstrating how mps_diagnostics helps to explain them.

mps_diagnostics is available on request to internal NHS England analysts via CDAs (clear data agreements), or to external NHS England users via DSAs (data sharing agreements).


Last edited: 27 February 2024 3:54 pm