
Part of Introduction to the Secure Data Environment

Import reference data



Overview

This chapter will introduce you to importing reference data into NHS England’s Secure Data Environment (SDE).

This is an optional chapter. If you know that you will not need to import reference data, you can read this overview section and skip the rest of the chapter.

For example, you might use the Import Reference Data function if you already have a condensed list of relevant, acceptable reference codes, such as SNOMED codes, outside the SDE. To avoid duplicating work, you can import your own codes rather than using the reference data provided within the environment's database.

This page has been designed to be read alongside the guidance on import reference data.


File restrictions

There are restrictions on the file types and sizes you can bring into the SDE. Currently, the service only accepts csv files no larger than 1MB.

You can read the Import reference data guidance, which is linked below, for detailed instructions on how to prepare your file to be imported into the SDE. 
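
The file type and size restrictions above can be checked locally before you upload. The following is a minimal sketch, not the checks the service itself runs. It assumes that "1MB" means 1,000,000 bytes (the stricter reading), and the function name `check_reference_file` and the file name `my_codes.csv` are hypothetical.

```python
from pathlib import Path

# Assumption: "1MB" is read here as 1,000,000 bytes, the stricter interpretation.
MAX_SIZE_BYTES = 1_000_000


def check_reference_file(path):
    """Return a list of problems found; an empty list means these basic checks passed."""
    file = Path(path)
    problems = []
    # The service only accepts csv files, so check the extension first.
    if file.suffix.lower() != ".csv":
        problems.append(f"{file.name}: file extension is not .csv")
    if not file.is_file():
        problems.append(f"{file.name}: file does not exist")
    elif file.stat().st_size > MAX_SIZE_BYTES:
        problems.append(f"{file.name}: file is larger than {MAX_SIZE_BYTES} bytes")
    return problems


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    for problem in check_reference_file("my_codes.csv"):
        print(problem)
```

This sketch only covers file type and size. It does not check for personally identifiable information or any other requirement in the guidance.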


Giving context

When importing data into the SDE, you must provide the context, or reason, for the import. To do this, send a context email to [email protected] as soon as you have uploaded the data file. The service team will then document this reason as part of the import process.

Context emails must contain:

  • what the data is showing (such as SNOMED codes related to smoking)
  • what the field names mean 
  • what the values represent

How to import reference data into the SDE


Transcript for how to import reference data into the SDE video

Slide 1

Welcome. This section will walk you through the process of importing reference data into NHS England's Secure Data Environment.

Slide 2

Here is a high-level process map that you can find on our guidance page, linked below this video. The file you import will undergo technical format checks on the file name, format, and size. It is then manually reviewed by the input checkers against our guidance and, if successful, added to the environment.

Slide 3

The first step is to prepare your data for upload.

Slide 4

The best way to prepare is by reading our import reference data guidance page, which provides clear instructions on the file restrictions that apply. This guidance page can be found in the links below this video.

The main restrictions are:

  • files must be of csv format
  • files may not be larger than one megabyte
  • files must not contain any personally identifiable information

Slide 5

As an example, I have created a dummy csv file called 'valid_demo' with six rows and three columns.
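
A dummy file like the one described can be generated with a short script. This is a minimal sketch: the column names and values are hypothetical placeholders (the video does not state them), and it assumes "six rows" means six data rows below a header row.

```python
import csv

# Hypothetical column names and placeholder values, for illustration only.
HEADER = ["code", "description", "category"]
ROWS = [
    ["X0001", "Example term one", "group_a"],
    ["X0002", "Example term two", "group_a"],
    ["X0003", "Example term three", "group_b"],
    ["X0004", "Example term four", "group_b"],
    ["X0005", "Example term five", "group_c"],
    ["X0006", "Example term six", "group_c"],
]


def write_demo_csv(path):
    """Write a header row plus six data rows of three columns each."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(ROWS)


if __name__ == "__main__":
    write_demo_csv("valid_demo.csv")
```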

Slide 6

Slide 7

After selecting your agreement at the SDE login page, you'll see the option to upload reference data. Here you can view some of our guidance and choose a file to upload.

You can then submit the file for automated preliminary checks. In this step, the file size, type and name will be checked against our guidelines.

After submission, you'll receive an automated email. If successful, the file will then be reviewed by the input checkers. You can either decide to upload another file or finish the process.

Slide 8

Reference data file submissions must be accompanied by a contextual email sent to the input checking mailbox. Contextual emails must state what the data is showing (for example, SNOMED codes related to smoking), what the titles mean, and what the values represent.

Slide 9

If the file passes technical formatting checks, you'll be notified with an automated email.

Input checkers will then check the file for personally identifiable information. Requests will be processed within five working days.

Slide 10

Slide 11

If your reference file has been approved by the input checkers, you'll receive an automated email informing you that your file is now available in the SDE.

Slide 12

You can access your file through the collab database 'dsa_collab'. The last time I ran this command in the Databricks notebook, the table had not yet been uploaded. After the upload, rerunning the command shows that the table now exists, and I can use SQL, PySpark or SparkR in Databricks to query it.

The field types are all string as expected. I can also query the table using native R in RStudio.
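
The checks described for this slide can be sketched as follows. The transcript does not show the command used or say how the imported table is named, so the table name `valid_demo` and both helper functions are hypothetical. The helpers only build Spark SQL statements, and the comments show how they might be run in a Databricks notebook, where a `spark` session is already defined.

```python
def show_tables_sql(database="dsa_collab"):
    """Build a Spark SQL statement that lists the tables in a database."""
    if not database.isidentifier():
        raise ValueError(f"unexpected database name: {database!r}")
    return f"SHOW TABLES IN {database}"


def preview_sql(table, database="dsa_collab", limit=10):
    """Build a Spark SQL statement that returns the first few rows of a table."""
    for name in (table, database):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"SELECT * FROM {database}.{table} LIMIT {int(limit)}"


# In a Databricks notebook (assumed table name 'valid_demo'):
#   spark.sql(show_tables_sql()).show()        # confirm the table now exists
#   df = spark.sql(preview_sql("valid_demo"))
#   df.printSchema()                           # fields are expected to be strings
#   df.show()
```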

Slide 13

When uploading reference data to the Secure Data Environment, watch out for these common mistakes:

  • uploading files that are not csv
  • not sending a contextual email to the input checkers
  • attempting to upload files larger than one megabyte in size

Slide 14

When generating a csv file using Microsoft Excel, make sure to select the file format which says '.csv' rather than '.csv UTF-8'.
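
The video does not say why the '.csv UTF-8' option should be avoided. One known difference is that Excel's 'CSV UTF-8' option writes a byte order mark (BOM) at the start of the file. Assuming that is the relevant difference, the hypothetical helper below is a minimal sketch for spotting a file saved that way.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as handle:
        # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
        return handle.read(3) == codecs.BOM_UTF8


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    try:
        print(has_utf8_bom("valid_demo.csv"))
    except FileNotFoundError:
        print("valid_demo.csv not found")
```

If this returns True, re-saving the file from Excel with the plain '.csv' format should produce a file without the mark.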

Slide 15

That brings us to the end of this video.

Necessary reading


Last edited: 15 November 2024 2:18 pm