Guidance for submitting participant data (cohort file)
How to submit a cohort file by uploading it to NHS England’s Secure Electronic File Transfer (SEFT).
Before you start
We recommend reading this guidance page in full, including the pre-submission checklist, before you submit your cohort file.
Introduction
NHS England’s Participant Validation Engine enables users to upload their cohort of participants to allow for validation, tracing and retention of those participants’ details for future linkage to NHS England datasets listed within a particular Data Sharing Agreement (DSA).
You submit a cohort file by uploading it to NHS England’s Secure Electronic File Transfer (SEFT).
Once a file has been uploaded, the participants are initially validated to ensure that the data items provided are in the correct format and listed within the DSA.
If the file is successfully validated the participants’ details will be matched to the Master Person Service (MPS) held by NHS England. Once matched, the participants’ details will be stored in a cohort table for linkage to NHS England datasets.
Users receive email notifications when their file passes or fails validation, and receive a report on what validation failures occurred and which participants failed matching.
Pre-submission checklist
Check this list of reminders before you submit your cohort file.
- Is your Data Sharing Agreement (DSA) active?
- Have you received and read the SEFT guidance?
- Is your cohort file in .CSV format?
- Have you included the column headers in your file?
- Have you included all mandatory data items in your file?
- Have you checked all data items you are supplying are in the correct format?
- Are all of the data items you are supplying listed in your DSA?
- Does your file contain no more than 1 million records?
- Have you included the first part of the DSA’s NIC number in your filename?
- Do you know the correct SEFT folder to upload your file to?
Preliminary checks
Note the following points before you create your cohort file:
1. Once your organisation’s Data Sharing Agreement (DSA) is active, the Secure Electronic File Transfer (SEFT) team will email the individual within your team identified as the ‘Cohort Submitter’ or ‘Participant Data Provider’ to inform them that the cohort associated with your DSA can be uploaded to SEFT. Guidance for using SEFT will be provided within the email and can also be found at the end of this guidance.
2. Cohort files must be submitted as ‘Comma Separated Value’ (.CSV) only. Ensure you do not include any commas in the values provided in the rows. We cannot support any other file formats, such as files submitted as .XLSX.
3. The cohort filename must contain the first part of your DSA number, followed by free text of your choosing, such as NIC-123456[anyothertext].csv.
As an example: ‘NIC-123456_October Upload.csv’.
Where you have multiple agreements, ensure you have selected the correct NIC number.
4. Ensure you have read through the SEFT guidance and are aware of which folders to use for uploading your cohort submission and for retrieving your data.
5. Check that all data items that your Data Sharing Agreement allows you to submit for your participants have been collated.
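The filename rule in point 3 can be checked before you upload. The following is a minimal sketch in Python, not an official tool. It assumes that the first part of a DSA number always takes the form `NIC-` followed by digits (as in the example NIC-123456) and that it appears at the start of the filename. The function name `is_valid_cohort_filename` is hypothetical.

```python
import re


def is_valid_cohort_filename(filename: str, nic_prefix: str) -> bool:
    """Return True if filename starts with nic_prefix and has a .csv extension.

    Assumption: nic_prefix looks like "NIC-123456" (NIC- followed by digits).
    """
    if not re.fullmatch(r"NIC-[0-9]+", nic_prefix):
        return False
    if not filename.lower().endswith(".csv"):
        return False
    if not filename.startswith(nic_prefix):
        return False
    # Reject a longer NIC number that merely begins with the same digits,
    # which matters when you hold more than one agreement.
    next_char = filename[len(nic_prefix):len(nic_prefix) + 1]
    return not next_char.isdigit()
```

For example, `is_valid_cohort_filename("NIC-123456_October Upload.csv", "NIC-123456")` returns `True`, while the same name with an `.xlsx` extension returns `False`.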
The cohort submission template
1. The cohort submission template is provided as an accompanying file to this guidance. It serves as an example of what a correctly formatted cohort file should look like.
2. Remove the example data in the template and add your participants’ details. Alternatively create your own CSV cohort file using the template as a guide. Include the headers as per the template.
3. The template contains column headers and within those columns are examples of the data that can be entered.
4. Although the template has most of its fields completed, you may not be able to submit all of these fields. To determine which fields can be submitted in line with your Data Sharing Agreement see ‘Checking against the Data Sharing Agreement’.
5. Save the CSV cohort file with a filename as per point 3 in ‘Preliminary checks’. Make sure you include the first part of your DSA number.
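If you create your own CSV file rather than editing the template, the steps above can be sketched with Python's standard `csv` module. This is an illustration only. The column headers are taken from the file formatting table in this guidance; the function name `write_cohort_file`, the UTF-8 encoding and the choice to keep every template header while leaving unused columns empty are assumptions, so check them against the template and your DSA.

```python
import csv

# Column headers in the order given in the file formatting table.
COLUMNS = [
    "UNIQUE_REFERENCE", "NHS_NO", "FAMILY_NAME", "GIVEN_NAME",
    "OTHER_GIVEN_NAME", "GENDER", "DATE_OF_BIRTH", "POSTCODE",
    "ADDRESS_LINE1", "ADDRESS_LINE2", "ADDRESS_LINE3", "ADDRESS_LINE4",
    "ADDRESS_LINE5", "STATUS", "PERSON_MIN_DATE", "PERSON_MAX_DATE",
]


def write_cohort_file(path, rows):
    """Write rows (a list of dicts keyed by column header) to a CSV file."""
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            if column not in COLUMNS:
                raise ValueError(f"Row {row_number}: unknown column {column!r}")
            # The guidance asks that values do not contain commas.
            if "," in str(value):
                raise ValueError(f"Row {row_number}: comma found in {column}")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # restval="" leaves columns you do not supply empty.
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `write_cohort_file("NIC-123456_October Upload.csv", rows)` writes the header row followed by one line per participant.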
Populating your file
The file formatting table below provides guidance on the requirements of the cohort file(s). These must be adhered to, as this ensures the cohort can progress successfully through the cohort validation process.
1. We strongly advise that you submit as many identifiers as your DSA allows. One of these must be DOB.
The best quality tracing is provided when NHS number and DOB are available.
Other combinations of identifiers (such as DOB, NHS number and Family name) will also generate a trace but the quality will depend on the data (see ‘Tracing/matching process’ for more details). Too little identifier information could result in your file submission being accepted, but your cohort may not be traced.
2. The purpose of the ‘Status’ field is to confirm whether the participant should be added to or deleted from the cohort. A file must contain ADD or DELETE in the 'Status' column for every row of data. You must not submit the same participant with a DELETE and ADD status in the same file.
3. The file must contain a ‘UNIQUE_REFERENCE’ for each row of data (so if you DELETE and ADD the same record in a file it will fail because it would have the same ‘UNIQUE_REFERENCE’).
4. There is a maximum cohort size of 1 million participants/rows in the cohort file. If you submit more than 1 million your file will be rejected.
5. PERSON_MIN_DATE specifies the earliest date from which you would like data about the participant and PERSON_MAX_DATE specifies the latest date from which you would like data about the participant.
You must complete the ‘Cohort Submission Template’ according to one of the options below.
If date minimisation required:
If your data sharing agreement requires your extracts to be minimised by date, each participant needs at least one date filled in (either Person_Min_Date or Person_Max_Date) to be included in the extract. You can either:
- fill in just Person_Min_Date
- fill in both Person_Min_Date and Person_Max_Date (the date in Person_Min_Date cannot be later than the date in Person_Max_Date)
If, for any reason, there are participants without any dates, these will automatically be excluded from the extract.
Date minimisation not required:
In the case that you do not require date minimisation you should not include dates for any participant in the ‘Cohort Submission Template’.
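The rules in this section can be checked before you submit. The sketch below is one possible approach, not the validation that NHS England runs. It assumes the rows have already been read into dictionaries keyed by column header (for example with `csv.DictReader`) and that any dates are already in YYYYMMDD format. The function name `check_cohort_rows` is hypothetical.

```python
MAX_ROWS = 1_000_000  # maximum cohort size stated in point 4


def check_cohort_rows(rows, date_minimisation_required):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    if len(rows) > MAX_ROWS:
        problems.append(f"File has {len(rows)} rows; the maximum is {MAX_ROWS}")
    seen = set()
    for number, row in enumerate(rows, start=1):
        reference = row.get("UNIQUE_REFERENCE", "")
        if not reference:
            problems.append(f"Row {number}: UNIQUE_REFERENCE is missing")
        elif reference in seen:
            problems.append(f"Row {number}: duplicate UNIQUE_REFERENCE {reference}")
        seen.add(reference)
        if row.get("STATUS") not in ("ADD", "DELETE"):
            problems.append(f"Row {number}: STATUS must be ADD or DELETE")
        min_date = row.get("PERSON_MIN_DATE", "")
        max_date = row.get("PERSON_MAX_DATE", "")
        # YYYYMMDD strings sort chronologically, so they can be compared as text.
        if date_minimisation_required:
            if not min_date and not max_date:
                problems.append(f"Row {number}: no dates, so excluded from the extract")
            elif min_date and max_date and min_date > max_date:
                problems.append(f"Row {number}: PERSON_MIN_DATE is after PERSON_MAX_DATE")
        elif min_date or max_date:
            problems.append(f"Row {number}: dates supplied but minimisation is not required")
    return problems
```

An empty list means none of these particular rules were broken. It does not guarantee that the file will pass validation.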
File formatting table
If mandatory data items are not provided then the file will either be rejected or it will not be possible to match any participants.
Column | Field heading | Mandatory/optional | Description | Format |
---|---|---|---|---|
A | UNIQUE_REFERENCE | Mandatory | This is the UNIQUE REFERENCE (StudyID) that you use to identify a particular record in your extracts. You must supply a value for each row of data. The file must not contain the same UNIQUE REFERENCE more than once. | This value must be unique within this field. Any combination of letters or numbers is allowed but should not include identifiable data. 100 character limit. Special characters are not allowed, with the exception of a hyphen ( - ). |
B | NHS_NO | Optional | NHS number | 10 numeric digits |
C | FAMILY_NAME | Optional | Surname, or family name | Text – 40 character limit |
D | GIVEN_NAME | Optional | Forename, or given name | Text – 40 character limit |
E | OTHER_GIVEN_NAME | Optional | Other given names or middle names | Text – 100 character limit |
F | GENDER | Optional | Gender of the participant | Use one of the following numerical values: 1 for Male, 2 for Female, 0 for Unknown, 9 for Not specified |
G | DATE_OF_BIRTH | Mandatory | Participant’s date of birth. The DOB is required in every minimum set of details used in each of the tracing/matching steps and should be provided. | YYYYMMDD or YYYY/MM/DD or YYYY.MM.DD |
H | POSTCODE | Optional | Standard UK postcode of participant’s address | Alphanumeric – with or without a space in between – 8 character limit |
I | ADDRESS_LINE1 | Optional | First line of participant’s address | Text |
J | ADDRESS_LINE2 | Optional | Second line of participant’s address | Text |
K | ADDRESS_LINE3 | Optional | Third line of participant’s address | Text |
L | ADDRESS_LINE4 | Optional | Fourth line of participant’s address | Text |
M | ADDRESS_LINE5 | Optional | Fifth line of participant’s address | Text |
N | STATUS | Mandatory | A status of ADD or DELETE must be included for each row of data. See point 2 in ‘Populating your file’ on the purpose of this field. | Use one of the following text values: ADD, DELETE |
O | PERSON_MIN_DATE | Optional | The earliest date from which you would like data about the participant. See point 5 in ‘Populating your file’ on the purpose of this field. | YYYYMMDD |
P | PERSON_MAX_DATE | Optional | The latest date from which you would like data about the participant. See point 5 in ‘Populating your file’ on the purpose of this field. | YYYYMMDD |
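The formats in the table above can be expressed as simple per-field checks. The sketch below is an interpretation of the table, not the Participant Validation Engine's own logic. In particular, it assumes that ‘letters or numbers’ means unaccented A to Z and 0 to 9, it checks only the shape of the NHS number (10 digits) and not whether the number is genuine, and the name `invalid_fields` is hypothetical.

```python
import re
from datetime import datetime

# YYYYMMDD, YYYY/MM/DD or YYYY.MM.DD; \2 forces the same separator twice.
DOB_PATTERN = re.compile(r"([0-9]{4})([/.]?)([0-9]{2})\2([0-9]{2})")


def is_valid_dob(value):
    """Return True if value has an accepted shape and is a real calendar date."""
    match = DOB_PATTERN.fullmatch(value)
    if not match:
        return False
    year, _, month, day = match.groups()
    try:
        datetime(int(year), int(month), int(day))
    except ValueError:
        return False
    return True


def is_yyyymmdd(value):
    return re.fullmatch(r"[0-9]{8}", value) is not None


FIELD_CHECKS = {
    "UNIQUE_REFERENCE": lambda v: re.fullmatch(r"[A-Za-z0-9-]{1,100}", v) is not None,
    "NHS_NO": lambda v: re.fullmatch(r"[0-9]{10}", v) is not None,
    "FAMILY_NAME": lambda v: len(v) <= 40,
    "GIVEN_NAME": lambda v: len(v) <= 40,
    "OTHER_GIVEN_NAME": lambda v: len(v) <= 100,
    "GENDER": lambda v: v in {"0", "1", "2", "9"},
    "DATE_OF_BIRTH": is_valid_dob,
    "POSTCODE": lambda v: v.replace(" ", "").isalnum() and len(v) <= 8,
    "STATUS": lambda v: v in {"ADD", "DELETE"},
    "PERSON_MIN_DATE": is_yyyymmdd,
    "PERSON_MAX_DATE": is_yyyymmdd,
}
MANDATORY = {"UNIQUE_REFERENCE", "DATE_OF_BIRTH", "STATUS"}


def invalid_fields(row):
    """Return the names of fields in row that are missing or badly formatted."""
    bad = []
    for field, check in FIELD_CHECKS.items():
        value = row.get(field, "")
        if value == "":
            if field in MANDATORY:
                bad.append(field)
        elif not check(value):
            bad.append(field)
    return bad
```

Calling `invalid_fields(row)` on each row before upload gives an early indication of which data items need correcting.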
Checking against the Data Sharing Agreement
When uploading your cohort, please refer to Annex B ‘Additional Technical Information’ in your Data Sharing Agreement (DSA) to ensure the identifiers (such as Family name, DOB, NHS number) you will upload in your file are within the scope of your DSA.
For example, suppose only 3 data items can be provided in line with your DSA. If you include something else, such as Postcode, this is not allowed under the DSA. The system would remove the Postcode data and it would not be used in the matching.
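You can compare the populated columns in your file with the identifiers your DSA permits before uploading. The following is a minimal sketch under stated assumptions: the rows are dictionaries keyed by column header, the permitted identifiers are given as column headers, and the columns treated here as structural rather than as identifiers are a guess that you should confirm against your own DSA. The name `columns_outside_dsa` is hypothetical.

```python
# Assumption: these columns are structural rather than identifiers, so they
# are not compared with the DSA here. Confirm this against your own DSA.
NON_IDENTIFIER_COLUMNS = {
    "UNIQUE_REFERENCE", "STATUS", "PERSON_MIN_DATE", "PERSON_MAX_DATE",
}


def columns_outside_dsa(rows, permitted_identifiers):
    """Return populated identifier columns that the DSA does not permit."""
    populated = {
        column
        for row in rows
        for column, value in row.items()
        if value
    }
    return sorted(populated - set(permitted_identifiers) - NON_IDENTIFIER_COLUMNS)
```

If the DSA permits only `NHS_NO`, `DATE_OF_BIRTH` and `FAMILY_NAME` and a row also contains a postcode, the function returns `["POSTCODE"]`.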
File validation checks
1. To upload your cohort, you must access your Cohort Submitter SEFT Account. This account contains your initials at the end of your User ID - such as NIC-12345-ABCDE_XX. Our system will only accept files dropped into this location.
2. The system will validate your submitted file. If the file fails any checks you will receive an email advising of the failure and directing you to the submitter’s SEFT account to view the associated error file named ‘Participant-Data-Validation-Errors'.
Note that the file will continue to the matching stage as long as fewer than 2% of the participants fail validation checks. If, however, 2% or more of the participants fail validation then the whole file will be rejected. You will need to rectify the errors and submit a new file. On average it will take 1 hour for validation/matching to take place, but this can take up to 12 hours, depending on the size of the cohort and data items supplied for matching.
3. After your file has been successfully validated and participants matched, you will receive a link directing you to download a ‘Participant-Data-Summary-File’. This file will provide you with key information about the submission. It will be placed in the download folder of the Cohort Submitter’s SEFT account as an Excel file with multiple worksheet tabs.
4. Check for outstanding validation errors in the ‘Errors’ tab of the summary file. You can correct those records and resubmit another file.
5. To add new participants to an existing cohort at a later date, repeat the process above and submit a new file with new participants marked with a status of ADD.
6. To amend the details of a participant in your cohort that has already been uploaded, submit a new file that includes the updated participant’s details and mark their status as ADD. The existing record will then be updated with the new values supplied. You can do this as a standalone file or include these amended details in a file alongside new participants to be added to the cohort.
7. If you are the Data Recipient of the extracts linked to your cohort, the SEFT account that you will need to access to download your data is the one without your initials at the end of your User ID - such as NIC-12345-ABCDE.
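The rejection threshold described in point 2 is simple arithmetic. As a sketch, assuming the rule is that the whole file is rejected when 2% or more of participants fail validation, the check could look like this (the function name `file_continues_to_matching` is hypothetical):

```python
def file_continues_to_matching(failed, total):
    """Return True if fewer than 2% of participants failed validation."""
    if total <= 0:
        raise ValueError("total must be a positive number of participants")
    # Integer arithmetic avoids floating-point rounding at the 2% boundary.
    return failed * 100 < total * 2
```

For a file of 1,000 participants, 19 failures would let the file continue, while 20 failures (exactly 2%) would cause the whole file to be rejected.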
Tracing/matching process
After successful validation we will send the list of participants to be ‘Traced’ or matched against our Master Person Service (MPS). Submit as much identifier information as your DSA allows to give the maximum opportunity for trace success.
MPS methods to trace participants are listed in this table:
MPS trace | NHS number | DOB | Given name | Family name | Postcode | Gender |
---|---|---|---|---|---|---|
Cross check | Mandatory | Mandatory | Optional | Optional | Optional | Not used |
Alphanumeric | Not used | Mandatory | Optional | Mandatory | Optional | Mandatory |
Algorithmic | Not used | Mandatory | Optional | Optional | Optional | Mandatory |
Cross check
Only DOB and NHS number are mandatory for this trace step, but if there is not an exact match then Postcode and name information can be used.
Alphanumeric
If there is no match at ‘Cross check’ trace, then we will run an ‘Alphanumeric’ trace.
The minimum required fields for an Alphanumeric trace are DOB, Family name and Gender, but the fields Given name and Postcode can also be used.
Algorithmic
Algorithmic trace uses the supplied identifiers to perform further traces. In this type of tracing method a match needs to be above a set threshold to be returned as a match.
For more information please refer to the Master Person Service (MPS) user guidance.
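To see which trace steps a participant's details could support, the mandatory fields in the table above can be encoded as a lookup. This sketch only reports whether the minimum fields for each step are populated. It does not reproduce how MPS matches records, and mapping the table's column names to cohort file headers (for example DOB to `DATE_OF_BIRTH`) is an assumption. The name `eligible_traces` is hypothetical.

```python
# Mandatory fields per trace step, taken from the MPS trace table.
TRACE_MANDATORY_FIELDS = {
    "Cross check": ("NHS_NO", "DATE_OF_BIRTH"),
    "Alphanumeric": ("DATE_OF_BIRTH", "FAMILY_NAME", "GENDER"),
    "Algorithmic": ("DATE_OF_BIRTH", "GENDER"),
}


def eligible_traces(row):
    """Return the trace steps whose mandatory fields are all populated in row."""
    return [
        step
        for step, fields in TRACE_MANDATORY_FIELDS.items()
        if all(row.get(field) for field in fields)
    ]
```

A row with only NHS number and DOB supports the Cross check step alone, which illustrates why supplying more identifiers gives more opportunities for a successful trace.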
Contact us
If you have any queries or require further clarification, email [email protected] and include the words ‘Participant Validation Engine’ in the subject line.
Last edited: 25 March 2025 12:01 pm