Import reference data

Find out how to import your own reference data into the NHS England Secure Data Environment (SDE). 

This service allows researchers to import reference data files into the SDE environment for use alongside data sets already available to the user under their Data Sharing Agreement (DSA).

This guide provides information on

  • what type of data you can bring in
  • how to prepare the file
  • the process for uploading and providing contextual information
  • how to access the file in the environment
  • how to update a file in the environment

Types of data included in this service

Reference data is data which may be used to classify, sort or better interpret data records. Typically, reference data is static or slowly changing over time. 

Examples of reference data include
  • code lists, for example specific subsets of SNOMED codes, such as those that deal with cardiovascular incidents
  • lookup data, that is a data set which maps codes to meaningful descriptions, for example a data set that maps SNOMED codes to SNOMED terms, such as 414292006 - “Fracture of lower leg”

This service does not allow
  • data that contains any personal and/or confidential information
  • data files that are larger than 1MB

This means, for the time being, this service does not allow participant (cohort) data to be imported into the SDE.

If you are unsure if your data can be imported with this service, please contact us at [email protected].


Preparing the file

When importing your file, it is important that your file is named and formatted correctly so that it can be uploaded without any delays.

The following rules apply when preparing your file.

Format
  • the file must be in CSV format. The values must be comma-separated, not pipe-separated or tab-separated
  • if working in MS Excel, save as the CSV (Comma delimited) (*.csv) file type. Saving as CSV UTF-8 (Comma delimited) (*.csv) in MS Excel may corrupt the file. Further instructions are provided below*
  • the file cannot be larger than 1MB
  • each reference data set or code list must be in its own CSV file. If you want to upload several tables, you must split them into separate CSV files and submit each one separately
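
As a rough pre-upload check, the format rules above could be tested with a short Python sketch like the one below. The function name check_format is hypothetical, and treating 1MB as 1,000,000 bytes is an assumption, because this guide does not say how the limit is measured. The automated SDE checks may work differently.

```python
from pathlib import Path

# Assumption: the guide says "1MB" without defining it, so this sketch
# uses the stricter reading of 1,000,000 bytes.
MAX_BYTES = 1_000_000


def check_format(path):
    """Return a list of format problems found in the file at `path`."""
    file = Path(path)
    problems = []

    if file.suffix != ".csv":
        problems.append("file extension is not .csv")

    if file.stat().st_size > MAX_BYTES:
        problems.append("file is larger than 1MB")

    # Look only at the first line to guess which separator is in use.
    with open(file, encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline()
    if "," not in first_line and ("|" in first_line or "\t" in first_line):
        problems.append("values look pipe-separated or tab-separated")

    return problems
```

An empty list means none of these particular problems were found. It does not mean the file will pass every check.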

*How to ensure your CSV file has the right encoding using Notepad++ or Notepad

Before following these steps, ensure you have saved your file as a CSV (comma delimited) (*.csv) file type in MS Excel.

Follow these steps if using Notepad++:

  1. Open the file up using Notepad++
  2. Select Encoding from the menu at the top
  3. Select Convert to UTF-8
  4. Save the file

Follow these steps if using Notepad:

  1. Open the file up using Notepad
  2. Select File from the menu at the top
  3. Select Save As
  4. Leave the filename as is, set Save as type to be All files (*.*) and set Encoding to be UTF-8
  5. Save the file
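
If you prefer a script to a text editor, the same re-encoding could be sketched in Python as below. The function name convert_to_utf8 is hypothetical. The default source encoding of Windows-1252 (cp1252) is an assumption about how MS Excel saved the file on your machine, so check it before relying on the result.

```python
def convert_to_utf8(source_path, target_path, source_encoding="cp1252"):
    """Re-save a text file as UTF-8 without changing its line endings.

    Assumption: the source file was saved in `source_encoding`. If that
    is wrong, non-ASCII characters will be converted incorrectly.
    """
    # newline="" stops Python translating line endings on read or write.
    with open(source_path, "r", encoding=source_encoding, newline="") as source:
        text = source.read()
    with open(target_path, "w", encoding="utf-8", newline="") as target:
        target.write(text)
```

Writing to a separate target path keeps the original file intact, so you can compare the two in a text editor before uploading.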

File name
  • use a file name that accurately describes the data. The name you give this file will become the name of the table created in Databricks
  • file names should contain only alpha-numeric characters or underscores. Use underscores instead of hyphens or spaces
  • alpha-numeric characters must be lowercase

If you have previously uploaded a file with the same name, and that data is already in a Databricks table, the data from the new file will replace the existing table.

It is important that you check the Collaboration Database in your Data Sharing Agreement to avoid overwriting an existing reference data table.

However, if you try to upload a file while another file with the same name is still being uploaded or checked, you will get an error.
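
One way to check for a clash before uploading is to compare your file name with the tables already in your collaboration database. The sketch below is illustrative only. The function name would_overwrite is hypothetical, and it assumes the table name is the file name without the .csv extension, which this guide implies but does not state exactly.

```python
from pathlib import Path


def would_overwrite(file_name, existing_tables):
    """Return True if uploading `file_name` would replace an existing table.

    Assumption: the table name is the file name without ".csv".
    """
    table_name = Path(file_name).stem.lower()
    return table_name in {name.lower() for name in existing_tables}


# In a Databricks notebook the existing names could be listed with, for example:
#   existing = [t.name for t in spark.catalog.listTables("<your collab database>")]
```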

File name examples
Acceptable           | Not acceptable
ref_data_snomed.csv  | ref data SNOMED.csv
ckd1_data_snomed.csv | CKD1 (data) SNOMED.csv
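
The file name rules can be expressed as a single pattern. The Python sketch below is a minimal illustration, not the check the SDE itself runs. Both function names are hypothetical.

```python
import re
from pathlib import Path

# Lowercase letters, digits and underscores only, followed by ".csv".
FILE_NAME_PATTERN = re.compile(r"[a-z0-9_]+\.csv")


def is_valid_file_name(file_name):
    """Return True if the name follows the naming rules in this guide."""
    return FILE_NAME_PATTERN.fullmatch(file_name) is not None


def suggest_file_name(file_name):
    """Suggest a compliant name by lowercasing and replacing other characters."""
    stem = Path(file_name).stem.lower()
    # Replace each run of disallowed characters with one underscore.
    stem = re.sub(r"[^a-z0-9]+", "_", stem).strip("_")
    return f"{stem}.csv"
```

For example, suggest_file_name("CKD1 (data) SNOMED.csv") returns "ckd1_data_snomed.csv", which matches the acceptable name in the examples above.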

Headers
  • the first row in the CSV file must contain your headers. There should be no rows above it and no other header rows
  • headers should not include spaces or hyphens. Use underscores instead
  • no special characters are allowed in the header (apart from an underscore)
  • column names should be unambiguous, clear, and describe what the column contains (this helps our input checking team when reviewing the file)
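
The header rules can be checked in a similar way. The Python sketch below is illustrative. The function name invalid_headers is hypothetical, and allowing uppercase letters is an assumption based on the acceptable header examples in this guide.

```python
import csv
import io
import re

# Letters, digits and underscores only. Uppercase is allowed here because
# the guide's acceptable examples include it (an assumption, not a stated rule).
HEADER_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def invalid_headers(csv_text):
    """Return the headers in the first row that break the naming rules."""
    first_row = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in first_row if HEADER_PATTERN.fullmatch(name) is None]
```

This only checks the characters used. It cannot tell whether a header is clear and unambiguous, which still needs your own review.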

Header naming examples
Acceptable     | Not acceptable
"Header1"      | "Header 1"
"Generic_Name" | "Generic-Name"

Data
  • every column will be loaded as a string data type. For example, even if your values are numeric, they will be loaded as strings and not as a numeric data type. Please verify that the number format is correct in a text editor before uploading
  • you cannot have information in a column where there's no corresponding header (for example, you cannot have 5 headers, but one row extends into a 6th column)
  • for row values, special characters (generally characters that are not letters or numbers) are allowed (with the exception of £), but the entire string should be enclosed in speech marks to ensure successful loading. Note that it will not be necessary to add in speech marks if you are preparing your file in MS Excel
  • spaces can be included in a string of text values and no speech marks are required
  • a file will be rejected as being ‘too small’ if it’s totally empty or it only has a header row
  • the data should not contain any line breaks 
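
Example of a line break in Excel view and text editor views

[Table and images: a sample file with headers Header1, Header2 and Header3 and values Hi, Hello, World and Hey, where a value is split by a line break, shown in Excel and in a text editor]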
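
Several of the data rules above can be checked before upload with Python's built-in csv module. The sketch below is illustrative and does not reproduce the SDE's own checks. The function name find_data_problems and the wording of its messages are hypothetical.

```python
import csv
import io


def find_data_problems(csv_text):
    """Return a list of data problems found in the CSV text."""
    # newline="" keeps line breaks inside quoted values exactly as written.
    records = list(csv.reader(io.StringIO(csv_text, newline="")))
    if len(records) < 2:
        return ["file is empty or has only a header row"]

    problems = []
    width = len(records[0])
    # Record 1 is the header row, so data records are numbered from 2.
    for number, record in enumerate(records[1:], start=2):
        if len(record) > width:
            problems.append(f"record {number}: more values than headers")
        for value in record:
            if "\n" in value or "\r" in value:
                problems.append(f"record {number}: line break inside a value")
            if "£" in value:
                problems.append(f"record {number}: contains the £ character")
    return problems
```

Record numbers count parsed records, not lines in a text editor, because a value containing a line break spans more than one line.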

Examples

Example of a file that will fail checks:

  • extra header row
  • space in the header
  • special characters in a string that is not enclosed in speech marks

Example of a file that will pass checks:

  • extraneous header row removed
  • no spaces in header
  • row value with special character has quotation marks around it
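
If you build your file with a script instead of MS Excel, a file of this kind could be written with Python's csv module, as in the sketch below. The function name write_reference_csv is hypothetical. Putting every value in speech marks is an assumption made for simplicity. This guide only requires speech marks around values that contain special characters.

```python
import csv


def write_reference_csv(path, headers, rows):
    """Write one header row and the data rows as a UTF-8, comma-separated file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        # Assumption: quoting every value is acceptable to the SDE loader.
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
        writer.writerow(headers)
        for row in rows:
            # Every column is loaded as a string, so write values as text.
            writer.writerow([str(value) for value in row])
```

For example, write_reference_csv("ref_data_snomed.csv", ["code", "term"], [[414292006, "Fracture of lower leg"]]) writes a single header row followed by one data row.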


Remove information about individuals

Ensure that no personal and/or confidential information is found anywhere in the file. Our compliance team checks all uploads and will ask you to re-upload the file if any personal and/or confidential information is found.

To avoid your request being rejected, we strongly advise that you do not bring in values sourced from unstructured/free text fields because of the risk of personal and/or confidential information inadvertently being introduced (for example a field with ‘notes/comments’).


The process

This is a high-level process map describing the steps involved, from preparing your file to being able to access the table in your environment.

High level process map

What the image shows
  1. Review guidance
  2. Prepare csv file
  3. Upload file and send contextual information
  4. Does it pass technical format checks?
  5. No (user is informed by email)
  6. Yes, file is reviewed against guidelines
  7. Pass checklist?
  8. No (user informed by email) 
  9. Yes, file added to environment

Uploading the file

You can submit one file at a time.

To upload and submit the file

  1. Log in to the SDE portal.
  2. Go to upload reference data.
  3. Select choose file.
  4. Click upload.
  5. Check the file is correct and click submit file.
  6. Send the supporting contextual information.


Providing contextual information

It is important that we understand, and document, why you want to upload this data.

Send this contextual information by email to [email protected] as soon as you have uploaded the data file.

We cannot approve the data import until we have this contextual information.

The email should

  • state the name of the file it refers to
  • explain what the data contains. If your labelling, headers and so on are not self-explanatory, ensure that your contextual information explains them
  • explain how this data will complement other data to help with your research

If you are uploading several files for the same project, you can explain the context for all of them in one email.

Here is an example of appropriate contextual information

I’ve just uploaded a file called diabetes_codelist_project54a.csv

The file is a code list with ICD10 codes related to the condition I am researching.

There are 2 columns

  • ICD10 – this is the ICD10 code
  • ICD10_Description – this is the description of the condition

We want to bring in this data so that we can establish a specific cohort for analysis.


How we check the file

After you upload the file, it will be checked for

  1. File and format requirements – these are automated checks that ensure the file can be ingested into the SDE.
  2. Personal and/or confidential information – our input checking team manually checks that none is in the data.
  3. Whether the file presents any risk for re-identification of data records within the SDE. For example, where a file contains measures (such as pollution levels) recorded at a particular time and geographical location.

If either requirement 1 or requirement 2 is not met, you will be informed of the reason and asked to resubmit the data.

If the input checker deems that the file may represent a risk for re-identification, the file review will pass to a Data Wrangler for further analysis. The Data Wrangler will contact you for more information.

Upon receiving the file and contextual information, the input checking team will process your request within 5 working days. This means either your data will be added to Databricks within that time, or you will be contacted about any issues.


Accessing the data in the environment

Once the file has been checked and approved, the data will become available as a table in Databricks, in the dsa_<agreement name>_collab database.

You will receive an email when the table is ready to use.
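
Once the table is available, it could be read in a Databricks notebook with a sketch like the one below. The function names are hypothetical, and the example assumes the table name is the file name without the .csv extension. The spark argument is the session object that Databricks notebooks provide. Because every column is loaded as a string, the sketch casts any columns you name to a 64-bit integer type.

```python
from pathlib import Path


def table_identifier(database, file_name):
    """Build "database.table", assuming the table is named after the file."""
    return f"{database}.{Path(file_name).stem}"


def load_reference_table(spark, database, file_name, integer_columns=()):
    """Read the reference table and cast the named columns from string to long."""
    df = spark.table(table_identifier(database, file_name))
    for column in integer_columns:
        # Casting drops leading zeros, so only cast columns that are true numbers.
        df = df.withColumn(column, df[column].cast("long"))
    return df


# Hypothetical usage in a notebook, replacing the database name with your own:
#   df = load_reference_table(spark, "dsa_<agreement name>_collab", "ref_data_snomed.csv")
```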


Updating a file

If you want to update a table in the database, for instance if you have a more up-to-date file, you should upload a new file using the same name as the existing one.

Once approved, the existing Databricks table will be automatically overwritten with the new data from the updated file.


I want to bring in another type of data set

If you want to bring in data that does not fit the criteria for this service, that may require a different process. Contact [email protected] to discuss your requirements.


Summary

Here are some quick reminders before you start preparing your data file.

Do
  • ensure the file is the right format and size
  • ensure that the file name follows the correct naming conventions
  • ensure that the headers and data follow the correct format
  • send contextual information as soon as you have uploaded the data
Don't
  • include any personal and/or confidential information in your data

Contact us

For data import issues, or for support with preparing your file and context emails, contact the SDE input checker team at [email protected].

For technical issues, such as account access, contact the National Service Desk on [email protected] or by phoning 0300 303 5035.


Drop-in sessions

SDE users are invited to attend drop-in sessions run by our data specialist team. These open meetings are for questions relating to the SDE environment and its tools, including data sets and importing data.

Drop-in sessions occur every Monday, Wednesday and Friday from 10am to 11am GMT (excluding public holidays) via Microsoft Teams. Invites are sent with your SDE account activation email. Alternatively, contact [email protected] for more information.

Last edited: 6 May 2025 7:57 am