Constructing submission files
Provides guidance for when you are constructing a submission file.
We have created Intermediate Database (IDB) guidance to help users to prepare data in the IDB. This guidance can be used for most data sets developed by the NHS Digital Data Set Development service.
Creating the person_ID
The Person_ID is derived using the MPS service, which returns the traced NHS number, where available, for all person demographic data sets that are processed through it.
Data Processing Services (DPS) will use this service in combination with their own processing to generate a universal Person_ID (previously referred to as the Common Based Linkage Attribute (CBLA)). This Person_ID will be populated with the NHS Number where the input data can be matched to a single record selected from PDS that holds a valid NHS number. Otherwise, it will be populated with one of the two identifiers described below.
UPRI 1 (Unmatched Person Record Identifier 1)
Used where the input data cannot be matched to a single person record holding a valid NHS number within PDS, but the data for the record is sufficient to identify a distinct person. In this scenario, the MPS service will generate an ID that will be consistently attributed to that distinct person record, making it suitable for linkage. UPRI 1 will be prefixed with an A or B (or, in the future, could theoretically be any alpha character excluding ‘U’).
UPRI 2 (Unmatched Person Record Identifier 2)
Used where the input data cannot be matched to a single person record holding a valid NHS number within PDS and the data for the record is insufficient to identify a distinct person (for example John Smith and postcode).
In this scenario, DPS will generate a one-off ID (UPRI 2) that will not be attributed to any distinct person record and thus cannot be used for linkage.
UPRI 2 will always be prefixed with a U.
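The prefix rules above can be sketched as a small classifier. This is only an illustration, assuming an un-pseudonymised Person_ID held as a string; the function name, the returned labels and the sample IDs are hypothetical, and the real length and layout of a UPRI are not described here.

```python
import re


def classify_person_id(person_id: str) -> str:
    """Classify an un-pseudonymised Person_ID by its leading character."""
    # Ten digits: treated here as an NHS number.
    if re.fullmatch(r"[0-9]{10}", person_id):
        return "NHS number"
    # A leading U marks a one-off ID that cannot be used for linkage.
    if person_id.startswith("U"):
        return "UPRI 2"
    # Any other upper-case letter (currently A or B) marks a linkable ID.
    if re.match(r"[A-TV-Z]", person_id):
        return "UPRI 1"
    return "unrecognised"
```

Once the ID has been pseudonymised, a check like this no longer applies, because the value is no longer in the clear.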
Once the Person_ID has been generated by the MPS service, it is passed through a tool called Privitar, which uses an algorithm to pseudonymise the ID, because we do not provide the Person_ID in the clear.
Data formats
We have provided examples of the data format types used within the data sets and provided descriptions and notes about how these formats are handled throughout data submission.
Dates
A record could be rejected or a warning generated if the start date or time in a group is after the end date or time.
Dates and times need to be in chronological order. For example, PERSON BIRTH DATE in the Master Patient Index group must be before the end of the reporting period, as a person must be born before the end of the reporting period to appear in the submission.
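A minimal sketch of these chronology checks is shown below, which providers might run locally before submitting. The function names are hypothetical, and the sketch assumes the reporting period end is given as the last calendar day of the period.

```python
from datetime import date, datetime
from typing import Optional


def start_is_after_end(start: datetime, end: Optional[datetime]) -> bool:
    """Return True when a start date/time falls after its end date/time."""
    # A missing end date/time cannot be out of order.
    return end is not None and start > end


def born_within_period(birth_date: date, period_end: date) -> bool:
    """Return True when the birth date is on or before the last day of the period."""
    return birth_date <= period_end
```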
The NHS Data Model and Dictionary specification for date formats is a 10 character field in the format CCYY-MM-DD. The submission IDB and/or XML schema holds dates in a generic 'date' format. As long as the supplied IDB or XML Schema is not modified, and a valid format is submitted, date data will flow effectively.
The IDB or XML Schema will accept various date formats and translate them into the required format. However, data providers should check that the day and month parts of the date have been correctly interpreted. For example, the 5th April 2013 entered as DD-MM-YYYY will be correctly interpreted. The same date entered as MM-DD-YYYY (American) would not, as the IDB or XML Schema would have no way of knowing that the intention was not to enter 4th May 2013.
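The day and month ambiguity can be demonstrated with a short sketch. It uses Python's standard `datetime.strptime` and is not how the IDB or XML Schema interprets dates internally; the helper names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_or_none(text: str, fmt: str) -> Optional[date]:
    """Parse text with the given format, returning None if it does not fit."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def is_ambiguous(text: str) -> bool:
    """Return True when day-first and month-first readings are both valid but differ."""
    day_first = parse_or_none(text, "%d-%m-%Y")
    month_first = parse_or_none(text, "%m-%d-%Y")
    return (
        day_first is not None
        and month_first is not None
        and day_first != month_first
    )
```

Here `is_ambiguous("04-05-2013")` is true, because the text can be read as either 4th May or 5th April, whereas `is_ambiguous("21-10-2019")` is false, because 21 cannot be a month. Values flagged in this way are worth checking against the local system before submission.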
Dates and times, where specified, should fall within the reporting period unless otherwise indicated.
Dates and times when clocks change to BST
The date and time should be submitted as it is recorded in local systems.
Invalid dates
Invalid dates (such as 47/15/2015 which does not exist in a standard calendar) will cause the record to be rejected and an error message returned. An invalid person birth date will result in a file-level rejection.
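As an illustration, a calendar check of this kind can be sketched with the standard library. The function name and the default format string are assumptions made for the example, and the check covers calendar validity only, not the required field length.

```python
from datetime import datetime


def is_calendar_date(text: str, fmt: str = "%Y-%m-%d") -> bool:
    """Return True when text is a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        # Raised for impossible values such as day 47 or month 15.
        return False
    return True
```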
NHS Data Model and Dictionary formats
Records can be rejected if data is submitted in an incorrect format. The ‘Format’ column in each data table within the TOS specifies which format is required for each data item and this will also be specified for each data item (for example REFERRAL REQUEST RECEIVED DATE).
Each of these formats is explained in the table below, together with an explanation of the characters used.
Characters
n = numeric value 0-9
a = alphabetic text a-z, A-Z plus any characters present in the UTF-8 Basic Latin character set.
Characters that are acceptable to flow, but are not 0-9, a-z or A-Z are referred to as special characters. There is more information on special characters on the TOS technical glossary page.
TOS format | Explanation | Examples of valid data |
n6 | Numeric value, exactly 6 numbers in length | 123456, 874523, 000123 (note 123 is invalid, leading zeros must be included) |
a7 | Alphabetic Text Character value, exactly 7 letters in length | abcdefg, meofhbh |
an6 | Alpha-Numeric value, a combination of exactly 6 numbers/letters | a1b2c3, aaa123, abcdef, 123456 |
max an6 | Alpha-Numeric value, a combination of up to 6 numbers/letters | a1b2c3, 123456, as34, baw21, a1 |
min an3 max an6 | Alpha-Numeric value, a combination of at least 3 and up to 6 numbers/letters | a1s, ae53s, 123456, abcdef, ad73 |
Date format
an10 (CCYY-MM-DD) | Alpha-Numeric exactly 10-character date value, including hyphens, two numbers for the Century, Year, Month and Day. Don’t worry if the IDB presents dates back in a slightly different format, DD-MM-YYCC, this is fine. | 2019-10-21, 2019-01-03 |
Date and time format
an19 CCYY-MM-DDTHH:MM:SS | Alpha-Numeric exactly 19-character date and time value, including hyphens, “T” and colons, two numbers for the Century, Year, Month, Day, Hours, Minutes and Seconds. T is a delimiter; it must be present and must be in eleventh position. Similarly, the colons must be present and in positions fourteen and seventeen. | 2019-10-21T10:38:32, 2019-01-03T17:26:01 |
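The format codes in the table can be sketched as a simple checker. This is a simplified illustration, not the validation applied on submission: it treats "a" as the letters a-z and A-Z only and ignores special characters, it assumes a "max" format needs at least one character, and all function names are hypothetical.

```python
import re
from datetime import datetime

CHARSETS = {"n": "0-9", "a": "A-Za-z", "an": "A-Za-z0-9"}
SPEC = re.compile(r"(?:min (an|a|n)([0-9]+) )?(max )?(an|a|n)([0-9]+)")


def matches_format(value: str, spec: str) -> bool:
    """Check a value against a code such as 'n6', 'max an6' or 'min an3 max an6'."""
    parsed = SPEC.fullmatch(spec)
    if parsed is None:
        raise ValueError(f"unsupported format: {spec!r}")
    _, min_len, has_max, kind, max_len = parsed.groups()
    upper = int(max_len)
    if min_len is not None:
        lower = int(min_len)
    elif has_max:
        lower = 1  # assumption: 'max' formats need at least one character
    else:
        lower = upper  # exact length
    pattern = "[" + CHARSETS[kind] + "]{" + str(lower) + "," + str(upper) + "}"
    return re.fullmatch(pattern, value) is not None


def is_date(value: str) -> bool:
    """Check the an10 CCYY-MM-DD layout and that the date exists."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_date_time(value: str) -> bool:
    """Check the an19 CCYY-MM-DDTHH:MM:SS layout, including the T and colons."""
    layout = r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}"
    if re.fullmatch(layout, value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True
```

For example, `matches_format("000123", "n6")` is true while `matches_format("123", "n6")` is false, matching the note about leading zeros in the table.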
National codes
Many data items recorded across the data sets are associated with NHS Data Model and Dictionary national codes. The code lists for these items will be listed in the TOS for their corresponding data items and also in the NHS Data Model and Dictionary (for example: ATTENDED OR DID NOT ATTEND CODE).
A record could be rejected, or a warning generated if a code is submitted that isn’t on the list, is left blank or is in the wrong format.
White space and leading zeros
Leading and trailing spaces count as characters in a piece of data being submitted. Take care to avoid extra white space being added in error, as this can lengthen data items and break the format rules.
White spaces and NULLs submitted as values are different and can affect Data Quality. A white space has a length and a value; NULL has neither. Null values must be allowed to flow provided validation rules for the individual item have been applied. Otherwise, records should not flow.
Leading zeros, if applicable in a national code list, need to be included. For example, if a national code value is 01 and the format is n2 then ‘1’ will not be accepted because it is too short. ‘Space 1’ is also invalid as a space is not numeric. In some cases, a space does need to flow, for example for POSTCODE OF USUAL ADDRESS.
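The n2 example above can be sketched as follows. The helper names are hypothetical, and padding with zeros is only appropriate where leading zeros are known to have been lost, for example after a code has passed through a spreadsheet.

```python
import re
from typing import Optional


def is_n2(value: Optional[str]) -> bool:
    """Return True when value is exactly two digits, with no spaces."""
    return value is not None and re.fullmatch(r"[0-9]{2}", value) is not None


def restore_leading_zeros(code: str, width: int) -> str:
    """Strip stray spaces and pad a numeric code back to its required width."""
    return code.strip().zfill(width)


def is_null(value: Optional[str]) -> bool:
    """Return True only for a true NULL; a single space is a value with length 1."""
    return value is None
```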
NHS numbers
NHS numbers, if provided, must pass Modulus-11 checks. For all NHS number fields, it is recommended that default NHS numbers such as ‘1111111111’, ‘9999999999’ and ‘1234567890’ are not submitted. Where the NHS number is not known, it should be submitted as Null. Note that if default numbers are submitted, they will be accepted by the system.
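The Modulus-11 algorithm is not spelled out in this guidance. The sketch below follows the commonly documented NHS number check digit calculation, in which the first nine digits are weighted 10 down to 2. The function names and the `EXAMPLE_DEFAULTS` set are hypothetical, and the set contains only the examples quoted above.

```python
EXAMPLE_DEFAULTS = {"1111111111", "9999999999", "1234567890"}


def passes_modulus_11(nhs_number: str) -> bool:
    """Return True when a 10-digit NHS number has a correct check digit."""
    if len(nhs_number) != 10 or not (nhs_number.isascii() and nhs_number.isdigit()):
        return False
    digits = [int(char) for char in nhs_number]
    # Weight the first nine digits 10, 9, ..., 2 and sum the products.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        # No valid check digit exists for this combination.
        return False
    return check == digits[9]


def is_example_default(nhs_number: str) -> bool:
    """Return True for the placeholder values quoted in this guidance."""
    return nhs_number in EXAMPLE_DEFAULTS
```

A separate screen for placeholder values is useful because some of them, such as 1111111111, happen to satisfy the check digit calculation.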
Postcodes
All Postcodes are formatted as "max an8" in the Data Dictionary. The data item must be submitted in one of the following formats:
A1_1AA
A11_1AA
AA1_1AA
AA11_1AA
A1A_1AA
AA1A_1AA
The fourth character from the right is always a space and separates the outward and inward parts of the postcode.
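The six layouts can be expressed as one pattern, as in the sketch below. It reads the underscore in the list above as a single space and assumes upper-case letters. It checks the layout only, so a postcode that matches may still not exist in the reference files. The names are hypothetical.

```python
import re

# Outward part: one or two letters, a digit, then an optional digit or letter.
# Inward part: a digit followed by two letters.
POSTCODE_LAYOUT = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]? [0-9][A-Z]{2}")


def has_valid_postcode_layout(postcode: str) -> bool:
    """Return True when the postcode fits one of the six listed layouts."""
    return POSTCODE_LAYOUT.fullmatch(postcode) is not None
```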
All postcodes are validated against the Gridall file produced by the Office for National Statistics (ONS).
New postcodes come into use, and redundant postcodes are retired on a continual basis. This necessitates updates to the ODS files that hold a record of all postcodes, and subsequent updates to the reference files that are used during Mental Health Services Data Set (MHSDS) and Improving Access to Psychological Therapies (IAPT) processing. The time lag involved means that on some occasions submissions contain valid postcodes that are then reported as invalid in the Portal warnings. If providers are aware that this is, or might be, the case, they can contact the NHS National Service Desk team. They will record details of the new valid postcodes, which are incorrectly generating the warnings, and ensure the reference files are updated accordingly.
Further detail about postcodes can be found in the NHS postcode directory.
Mandation and validation
The requirements for each data item are shown as they are described in the Technical Output Specification.
This table shows how data is validated according to its mandation.
Mandated data items (M)
Mandatory data items must be reported and represent the minimum items for a record to be accepted in this group. Failure to submit these items will result in the rejection of the record.
Mandatory data items must contain valid data. Any groups being submitted must have all mandatory data items completed and records will be rejected if mandatory data items are left blank or have been submitted in an invalid format.
Validation rules
The rejections relate to all the data for that patient's record within the specific table.
- rejected if blank
- rejected if format error
- warning if national code error (where national codes are present or a look-up table exists)
Required data items (R)
Required data items should be reported where they apply. Failure to submit these items will not result in the rejection of the record but may affect the derivation of national indicators or national analysis. The purpose of the data set is not to change clinical practice.
Validation rules
The rejections relate to all the data for that patient's record within the specific table.
Certain key required data items used for MPS Person Index Logic (e.g. NHS Number) output a warning if blank. Discharge Date or other end dates must be completed if known, but can be left blank where not yet known.
- not applicable if blank
- rejected if format error
- warning if national code error (where national codes are present or a look-up table exists)
Optional data items (O)
Optional data items may be submitted on an optional basis at the submitter's discretion.
Validation rules
The rejections relate to all the data for that patient's record within the specific table.
- not applicable if blank
- rejected if format error
- warning if national code error (where national codes are present or a look-up table exists)
Pilot data items (P) (where applicable)
Pilot data items have been included within the specification for piloting purposes only to support future implementation. These data items have not been approved and/or mandated and should not be submitted unless specifically requested by NHS Digital.
Validation rules
The rejections relate to all the data for that patient's record within the specific table.
- not applicable if blank
- rejected if format error
- not applicable if national code error
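The validation rules listed for M, R, O and P items can be summarised in one function. This is a sketch of the rules as written above, not the Submission Portal's implementation: the function name and the "accepted" label are hypothetical, and the extra warning raised for certain key required items when blank is not modelled.

```python
def validate_item(mandation: str, is_blank: bool,
                  format_ok: bool = True, code_ok: bool = True) -> str:
    """Return the outcome for one data item given its mandation code."""
    if mandation not in {"M", "R", "O", "P"}:
        raise ValueError(f"unknown mandation: {mandation!r}")
    if is_blank:
        # Only mandatory items are rejected when blank.
        return "rejected" if mandation == "M" else "not applicable"
    if not format_ok:
        # A format error is rejected for every mandation type.
        return "rejected"
    if not code_ok:
        # National code errors are not checked for pilot items.
        return "not applicable" if mandation == "P" else "warning"
    return "accepted"
```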
Derived (D)
These items are derived during pre and/or post deadline processing for inclusion in the extracts made available for download. Please note: these are not for submission to the Submission Portal and are not included in the IDB or XML Schema.
Validation of records
Once the data has been submitted to the central data warehouse the data is validated in the following ways.
File level
File-level validation covers the entire submission for the reporting period and can lead to rejection of the file or the issuing of warning messages. Any issues identified must be rectified and a resubmission made. Warning messages should be addressed and the required actions undertaken.
These can be found through the file-level rejects tab.
Example validation
The IDB file you have uploaded contains no entries in the MHS002GP table. No data included in this table suggests vital information is missing from this submission.
Error or warning message
MHSREJ003 - Failed content check. MPI table is empty.
Table level
These compare records within or across multiple tables, leading to rejection of multiple records or a warning message displayed. For example, they could be to check referential integrity between tables or for duplicated records within a table. Rejected records would not progress to post-deadline processing. Records with warnings would progress, but data quality would not be as required.
These can be found through the individual table tab.
Example validation
This group will be rejected if there is no valid MSD101 group transmitted for this PREGNANCY IDENTIFIER.
Error or warning message
MSD2011 - Group rejected - No valid MSD101 group transmitted for this.
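A referential integrity check of this kind can be sketched as below. Rows are represented as plain dictionaries, and the `PregnancyID` key and function name are hypothetical stand-ins rather than the real column names.

```python
from typing import Dict, List


def find_orphans(parent_rows: List[Dict], child_rows: List[Dict], key: str) -> List[Dict]:
    """Return child rows whose key value has no matching parent row."""
    parent_keys = {row[key] for row in parent_rows if row.get(key) is not None}
    # A child row with a missing key is also treated as an orphan.
    return [row for row in child_rows if row.get(key) not in parent_keys]
```

Running a check like this locally before submission may help identify groups that would be rejected because their parent group is missing.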
Record level
These can be against a single data item or across multiple data items within a single record, leading to either the rejection of the record or a warning displayed. Rejected records would not progress to post deadline processing. Records with warnings would progress, but data quality would not be as required.
These can be found at individual table tabs.
Example validation
If General Medical Practice Code (Patient Registration) is not in national organisation tables as an "open" organisation, a warning will be reported.
Error or warning message
IDS00206 - Warning - General medical practice code (patient registration) is not for a current live organisation in national tables.
Error reporting
Rejections
A rejection means that, due to a data quality error, NHS Digital are unable to take this data forward for further processing. The data has therefore been rejected and must be corrected and resubmitted in order for it to be accepted. There are three levels that a rejection can occur at:
File-level rejects
The whole file will only be rejected if there is nothing at all in a mandatory group (as by default, the rest of each record would be rejected). Some data items, such as NHS NUMBER in the Master Patient Index failing the Modulus-11 check, will also result in the whole file being rejected and no records flowing. More details on file level rejects can be found in the File-level Rejects tab in the TOS.
Group-level rejects
A group (or table) will be rejected when data submitted haven’t passed the respective Group-level validation checks. For example, many groups within most data sets will be rejected if they don’t contain a Local Patient Identifier.
Field-level rejects
Some data items will cause the whole record to be rejected if they contain errors, for example: if ORGANISATION IDENTIFIER (LOCAL PATIENT IDENTIFIER) in the Master Patient Index table flows in the wrong format, this will result in the whole record being rejected. Other records, i.e. for different patients, will still flow.
Warnings
A warning is triggered when a possible data quality issue is encountered, but the data would be accepted for further processing without intervention. For example, where a code is submitted that is not in the expected national code list or where a field is blank where we would expect it to be populated. If a warning is triggered, we would advise that the data is checked to establish why this is.
Warnings that appear in data quality reports are there to help improve the data being submitted. Warnings do not stop the flow of data; they are indicators.
Warnings or rejections on required data items
Some required data items will generate a warning if they are not completed, depending on whether other data items are populated. For Mental Health, for example, NHS NUMBER STATUS INDICATOR CODE in MHS001MPI must be completed if NHS NUMBER is blank. Another example is in the MHS202CareActivity group: if CODED FINDING (CODED CLINICAL ENTRY) has a value, then FINDING SCHEME IN USE must contain a valid value, and a warning will be generated if it does not.
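The two conditional rules in this section can be sketched as follows. The dictionary keys are hypothetical stand-ins for the real column names, and the sketch only checks that the dependent item is populated, not that its value is valid.

```python
from typing import Dict, List, Optional


def is_blank(value: Optional[str]) -> bool:
    """Treat None and the empty string as blank."""
    return value is None or value == ""


def conditional_warnings(record: Dict[str, Optional[str]]) -> List[str]:
    """Return warning messages for the two conditional rules described above."""
    warnings = []
    if is_blank(record.get("NHSNumber")) and is_blank(record.get("NHSNumberStatus")):
        warnings.append("NHS NUMBER STATUS INDICATOR CODE is required when NHS NUMBER is blank")
    if not is_blank(record.get("CodedFinding")) and is_blank(record.get("FindingSchemeInUse")):
        warnings.append("FINDING SCHEME IN USE is required when CODED FINDING is populated")
    return warnings
```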
Extracts
Extract formats
The extracts produced for providers and commissioners closely resemble the input data format. The data is input by the providers using a single IDB containing the data tables. The output extracts will consist of a single XML file containing a segment for each of the tables. Each segment will include the data for that table taken forward after validations have been applied.
The XML file is easily imported into programs such as Access, Excel and SQL.
Provider extracts
Providers receive two types of extract, a Pre-Deadline extract and a Post-Deadline extract:
The Pre-Deadline extract is produced for every file submitted by the provider that passes file-level validation. Pre-Deadline extracts are an output of pre-deadline processing. These extracts can help providers to review and improve their submissions, before the deadline.
The Post-Deadline extract is only produced for the 'last good' file submitted by deadline. The Post-Deadline extract is an output of post-deadline processing, which will validate the data in exactly the same way as pre-deadline but with the addition of Submission Portal derivations to the extract.
Commissioner extracts
Commissioners with an Organisation Data Service (ODS) Code can register to download record level MHSDS post-deadline extracts from the Submission Portal. The MHSDS records in these extracts are filtered by Organisation Code (Code of Commissioner) as entered into the MHSDS submissions for each patient submitted by a provider.
Those records where your commissioner code is present in the Organisation Code (Code of Commissioner) field of a submitted IDB will appear in your extract subject to the inclusion criteria applied during processing. Your final extract will be an amalgamation of records from provider files which quote your commissioner code in this field in their IDB. If data from providers you commission relevant services from do not appear in your extract you should contact the provider to ensure they are recording your organisation code correctly in future submissions.
Provider pre-deadline, provider post-deadline and DSCRO extracts
The National Extract is a pseudonymised post-deadline extract containing all MHSDS data items except patient identifiers, for the use of NHS Digital.
Submission portal summary reports
Each provider can access their own summary reports during pre-deadline processing.
Summary reports are produced for each uploaded file as part of pre-deadline processing. For each report, a summary can be viewed online and a detailed text file downloaded.
The reports have been designed to help organisations assess the quality of the submitted file so they can consider whether to make a further submission.
The Summary report provides information in the following categories:
1. validation failures
2. diagnostics
3. aggregate counts
Commissioners can now view a post-deadline clusters report designed to help monitor the implementation of currencies and payment.
If a file fails validation, then only the validation failures will be produced.
Further details can be found in the file-level rejects, submission portal diagnostics and submission portal counts tabs within each TOS.
Last edited: 5 August 2022 1:30 pm