Part of Using Databricks in the NHS England Secure Data Environment
Using notebooks and clusters
Databricks uses notebooks and clusters to analyse and visualise data, and to run automated jobs.
Notebooks
A notebook allows you to write and execute code, a section at a time, using cells. A notebook can have one or more cells.
Each cell can contain a different type of code. One could contain Python, the next could contain Structured Query Language (SQL) and the one after that could contain Markdown, for example.
Using mixed languages
Cells use the default language of the notebook they are in, but you can set an individual cell to a different language using magic commands. Magic commands allow you to switch between Python, Scala, SQL, R and Markdown.
The percent sign (%) indicates that you are using a magic command. For example:
Magic command | Description |
---|---|
%sql | SQL magic – makes the cell a SQL cell |
%python | Python magic – makes the cell a Python cell |
%scala | Scala magic – makes the cell a Scala cell |
%r | R magic – makes the cell an R cell |
%md | Markdown magic – makes the cell a Markdown cell |
In notebooks, you can use SQL, Python, Scala and R code cells sequentially. This allows you to use your preferred language for each task.
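The rule described above (a cell runs in the notebook's default language unless it starts with a magic command) can be sketched in a few lines of Python. This is an illustrative model only, not how Databricks implements it; the function name `cell_language` and the example cells are hypothetical.

```python
# Illustrative model only: this is NOT Databricks' own implementation.
# It maps the magic commands from the table above to the language they select.
MAGIC_LANGUAGES = {
    "%sql": "SQL",
    "%python": "Python",
    "%scala": "Scala",
    "%r": "R",
    "%md": "Markdown",
}


def cell_language(cell_source: str, notebook_default: str) -> str:
    """Return the language a cell would use under the rule described above."""
    stripped = cell_source.strip()
    # The magic command, if present, is the first token in the cell.
    first_token = stripped.split(maxsplit=1)[0] if stripped else ""
    return MAGIC_LANGUAGES.get(first_token, notebook_default)


# Three hypothetical cells in a notebook whose default language is Python.
cells = [
    "print('hello')",
    "%sql\nSELECT 1 AS example_value",
    "%md\n# Notes on this analysis",
]
for source in cells:
    print(cell_language(source, notebook_default="Python"))
```

Run as written, this prints Python, SQL and Markdown in turn: the first cell has no magic command, so it falls back to the notebook default.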
Data can be passed from cell to cell, and notebook to notebook, to create workflows.
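One common way to pass data from a Python cell to a SQL cell is to register a DataFrame as a temporary view. The sketch below assumes the `spark` session object that Databricks provides in Python cells; the helper name `publish_as_view`, the view name and the example rows are hypothetical and used only for illustration.

```python
def publish_as_view(spark, rows, columns, view_name):
    """Build a DataFrame from rows and register it as a temporary view.

    `spark` is expected to be a SparkSession, such as the one Databricks
    makes available in Python cells.
    """
    df = spark.createDataFrame(rows, schema=columns)
    # Once registered, the view can be queried by name from later cells.
    df.createOrReplaceTempView(view_name)
    return df


# In a Databricks Python cell (made-up illustrative rows, not real data):
#   publish_as_view(spark, [(1, "a"), (2, "b")], ["id", "label"], "example_view")
#
# A later SQL cell in the same notebook could then read the view:
#   %sql
#   SELECT * FROM example_view
```

The temporary view exists only for the current session, so it is a way of handing data between cells rather than a way of storing it.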
For more information on managing notebooks, refer to the Databricks notebooks management page.
For more information on using notebooks, refer to the Databricks notebooks page.
Clusters
A cluster is a set of computation resources and configurations on which you run data engineering, data science and data analytics workloads. It is effectively the engine that drives the notebook and allows you to perform your analysis.
Each Data Sharing Agreement (DSA) has its own default cluster. All users logged in under the same DSA will share the same cluster. A cluster must be running to view data in the Data tab.
You can view the DSA you are working under next to the Run all option on the top right-hand side of your notebook screen.
For more information on DSAs, visit the Data Access Environment (DAE).
You should only restart a cluster as a last resort. If it is necessary to do so (for example, if the cluster freezes), ask your colleagues before you restart, because everyone working under the same DSA shares the cluster.
Creating folders for notebooks
You can create folders to store and organise your notebooks.
1. From the sidebar, click the Workspace icon.
2. In the workspace folder, select Users, click on the 3 dots to get to Create and then click on Folder.
3. Enter the name for the new folder.
4. Click Create Folder.
Refer to the Working collaboratively section for further information on using shared folders.
Creating and using notebooks
You can create notebooks to write and execute code. Notebooks can be created in a shared folder or within your own personal folder.
Creating a notebook
To create a notebook:
1. From the sidebar, click the Workspace icon.
Under the workspace folder, you will be able to see all the folders you have permission for.
2. Select the folder you want to add the notebook to.
You will have access to shared folders and personal folders.
If you add the notebook to a shared folder, all users with permissions for that folder will be able to access the notebook.
If you add the notebook to your own personal folder, the notebook will only be accessible to you, but you can still choose to share a notebook in your personal folder with a designated user or users. Refer to the Working collaboratively section for further information.
3. Right click on the folder name and select Create, then Notebook.
4. This will create a new untitled notebook. Click on the Untitled Notebook heading and rename the notebook. Prefix your notebook with a meaningful identifier such as your initials or project code so you can easily recognise it.
5. Select your language from the drop-down menu. This will determine the default language of the notebook.
The notebook will open with an empty cell at the top.
You can now start to write code into the cell in your notebook.
When you close the notebook, the code will be automatically saved.
Running code
To run the code in a cell in your notebook:
1. Click the Run icon to the right of the cell.
2. Select Run Cell from the drop-down menu.
Alternatively, you can press Shift + Enter within the cell.
The output will be displayed below the cell.
If your code is divided into several cells in your notebook, you can run all the cells by clicking Run > Run All from the notebook toolbar.
Adding cells
A notebook can contain one or more cells. If your code is large, you can divide it into several cells. You can add more cells to your notebook as required.
To add a new cell to your notebook:
1. Hover your mouse over the bottom of the cell in your notebook. This cell must be directly above where you want to add a new cell.
A + icon will be displayed.
2. Click on the + icon.
A new empty cell will be displayed in your notebook.
Cloning notebooks
Sometimes it may be helpful to create a ‘clone’ of a notebook. For instance, you may want to:
- keep an original of a notebook
- make changes to a notebook
- share a copy of a notebook with your colleagues using a shared folder
To clone a notebook:
1. Click Run.
2. Select Clear Cell Outputs from the drop-down menu to clear any query outputs.
Clearing outputs is important so that sensitive information is not shared with a user who may not have the same access permissions.
3. Click Confirm.
4. Click File.
5. Select Clone from the drop-down menu.
6. Enter a new name for the notebook you are cloning.
7. Select the folder to which you would like to save your clone notebook.
8. Click Clone.
Deleting notebooks
To delete a notebook that you no longer require:
1. From the sidebar, click the Workspace icon.
2. Right click on the notebook to be deleted.
3. Select Move to Trash from the drop-down menu.
4. Select Confirm and move to Trash.
The notebook will be moved to the Trash and will be permanently deleted after 30 days.
‘Master’ notebooks
A ‘master’ notebook can run one or more other notebooks as part of its code. This can help you develop effective workflows, allowing preparatory work to be checked and signed off, and pieces of analysis to be packaged up and run separately.
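A minimal sketch of a ‘master’ notebook is shown below. It assumes the standard Databricks utility `dbutils.notebook.run(path, timeout_seconds, arguments)` is available in your environment, which this guidance does not confirm; the helper name `run_notebooks_in_order` and the notebook paths are hypothetical. The runner is passed in as a parameter so the same code can be tried with a stand-in function outside Databricks.

```python
def run_notebooks_in_order(run_notebook, notebook_paths, timeout_seconds=3600):
    """Run each notebook in turn and collect the value each one returns.

    `run_notebook` is any callable taking (path, timeout_seconds, arguments).
    In Databricks this could be `dbutils.notebook.run` (an assumption here).
    """
    results = {}
    for path in notebook_paths:
        # Each notebook finishes before the next one starts.
        results[path] = run_notebook(path, timeout_seconds, {})
    return results


def fake_run(path, timeout_seconds, arguments):
    """Stand-in runner so the sketch can be tried without a cluster."""
    return f"finished {path}"


# Hypothetical notebook paths, used only for illustration.
steps = ["./01_prepare_data", "./02_run_analysis"]
print(run_notebooks_in_order(fake_run, steps))

# In a Databricks notebook, the call might instead look like:
#   run_notebooks_in_order(dbutils.notebook.run, steps)
```

Keeping each stage in its own notebook means the preparatory notebook can be reviewed once and then reused unchanged by several analyses.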
Last edited: 11 January 2024 1:29 pm