Part of Using Databricks in the NHS England Secure Data Environment
Best practice and further information
Best practice
The following best practice will help you work efficiently in Databricks while remaining considerate of other SDE users.
Consider others when making changes to tables and notebooks
- do not ‘drop’, ‘truncate’ or ‘delete’ other users’ tables without first consulting them
- do not delete or edit other users’ notebooks without first consulting them
Handle unrecognised folders
If you notice a new folder that you and your colleagues do not recognise, raise a service request on 0300 303 5035 or via email at [email protected].
Use lowercase for table and database names
Avoid uppercase characters in table and database names, as queries referencing them will fail.
Use meaningful identifiers
Prefix your tables and notebooks with a meaningful identifier so that you can easily recognise them. We recommend using your initials or a project identifier code.
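The naming advice above (lowercase names plus a recognisable prefix) can be sketched as a small helper. The function name and the example prefix are illustrative, not part of the SDE guidance:

```python
import re

def make_table_name(prefix: str, description: str) -> str:
    """Build a lowercase, prefixed table name, e.g. 'abc_admissions_2024'.

    'prefix' would typically be your initials or a project identifier code.
    """
    name = f"{prefix}_{description}".lower()
    # Collapse any run of characters that are not lowercase letters or
    # digits into a single underscore, and trim stray underscores.
    return re.sub(r"[^0-9a-z]+", "_", name).strip("_")

print(make_table_name("ABC", "Admissions 2024"))  # abc_admissions_2024
```

Applying the same helper everywhere in a project keeps names consistent and avoids the uppercase characters warned about above.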
Test your code before running
Test your code to make sure it works before running it on Databricks. Use a testing framework such as ‘pytest’, ‘unittest’ or ‘doctest’ so that the code can easily be re-tested later.
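One way to follow this advice is to keep logic in plain functions so a framework such as pytest can check them before the notebook runs. The function and the rounding rule below are a hypothetical example:

```python
def percentage(numerator: int, denominator: int) -> float:
    """Return numerator as a percentage of denominator, to one decimal place."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return round(100 * numerator / denominator, 1)

# pytest discovers functions named test_*; run with: pytest this_file.py
def test_percentage():
    assert percentage(1, 3) == 33.3
    assert percentage(2, 4) == 50.0
```

Because the tests live alongside the function, they can be re-run cheaply whenever the code changes.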
Allow code to finish running when creating a table
Always allow code to finish running when a table is being created or altered, and do not cancel it part way through. If the run is cancelled, an error will occur and you will be prevented from creating a table with the same name.
Store code centrally for re-use
If you wish to re-use code, add it to a central function or to a notebook of its own. Use the dbutils.notebook.run function or the %run command to re-use the code in other analytical pipelines.
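As a sketch of this pattern, the shared function below would live in a notebook of its own (the notebook path and the function itself are hypothetical), and other notebooks would pull it in with one of the two commands mentioned above:

```python
# Assume this code lives in a shared notebook, e.g. /shared/abc_common_functions.
# Other notebooks can re-use it with:
#   %run /shared/abc_common_functions
# or run it as a pipeline step with:
#   dbutils.notebook.run("/shared/abc_common_functions", 600)

def standardise_nhs_number(raw: str) -> str:
    """Example shared function: strip spaces from a 10-digit NHS number."""
    cleaned = raw.replace(" ", "")
    if len(cleaned) != 10 or not cleaned.isdigit():
        raise ValueError(f"not a valid NHS number: {raw!r}")
    return cleaned
```

Keeping a function like this in one notebook means a fix made there is picked up by every pipeline that re-uses it.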
Delete temporary tables after use
Delete any temporary tables created as intermediate steps when a notebook is executed. Deleting them saves storage, especially if the notebook is scheduled to run daily.
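A clean-up step at the end of a notebook can handle this automatically. The table names below are illustrative; on Databricks each generated statement would be executed with spark.sql(...):

```python
# Hypothetical intermediate tables created earlier in the notebook.
TEMP_TABLES = ["abc_stage1_raw", "abc_stage2_dedup"]

def drop_statements(tables):
    """Build DROP TABLE statements; IF EXISTS keeps the clean-up re-runnable."""
    return [f"DROP TABLE IF EXISTS {t}" for t in tables]

for statement in drop_statements(TEMP_TABLES):
    print(statement)  # on Databricks: spark.sql(statement)
```

Using IF EXISTS means the clean-up cell does not fail if the notebook is re-run after the tables have already been removed.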
Last edited: 11 January 2024 1:57 pm