Using notebooks and clusters


Databricks uses notebooks and clusters to analyse and visualise data, and to run automated jobs.


Notebooks

A notebook allows you to write and execute code, a section at a time, using cells. A notebook can have one or more cells.

Notebook screen displaying a cell

Each cell can contain a different type of code. One could contain Python, the next could contain Structured Query Language (SQL) and the one after that could contain Markdown, for example.

Using mixed languages

Cells use the default language of the notebook they are in, but you can set an individual cell to a different language using magic commands. Magic commands allow you to switch between Python, Scala, SQL, R and Markdown.

The percent sign (%) at the start of a cell indicates that you are using a magic command. For example:

Magic command Description 
%sql SQL magic – makes the cell a SQL cell
%python Python magic – makes the cell a Python cell
%scala Scala magic – makes the cell a Scala cell
%r R magic – makes the cell an R cell
%md Markdown magic – makes the cell a Markdown cell

In notebooks, you can use SQL, Python, Scala and R code cells sequentially. This allows you to use the preferred code for the task.
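The rule described above (a magic command at the top of a cell overrides the notebook's default language) can be sketched as a small lookup. This is a toy model for illustration only, not how Databricks parses cells; the function name `cell_language` and the sample cells are hypothetical.

```python
# Toy model only: illustrates the rule, not Databricks' own implementation.
# The mapping mirrors the magic commands listed in the table above.
MAGIC_LANGUAGES = {
    "%sql": "sql",
    "%python": "python",
    "%scala": "scala",
    "%r": "r",
    "%md": "markdown",
}


def cell_language(cell_source: str, default_language: str = "python") -> str:
    """Return the language a cell would run as, based on its first token."""
    stripped = cell_source.lstrip()
    first_token = stripped.split(None, 1)[0] if stripped else ""
    # With no recognised magic command, the notebook default applies.
    return MAGIC_LANGUAGES.get(first_token, default_language)


# Hypothetical cells from a notebook whose default language is Python.
cells = [
    "%sql\nSELECT 1 AS example_value",
    "print('no magic command, so the notebook default applies')",
    "%md\n# Section heading",
]
languages = [cell_language(cell) for cell in cells]
print(languages)
```

In this sketch the first cell is treated as SQL, the second falls back to the notebook default, and the third is treated as Markdown.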

Data can be passed from cell to cell, and notebook to notebook, to create workflows.

Notebook cell screen

For more information on managing notebooks, refer to the Databricks notebooks management page.

For more information on using notebooks, refer to the Databricks notebooks page.


Clusters

A cluster is a set of computation resources and configurations on which you run data engineering, data science and data analytics workloads. It is effectively the engine that drives the notebook and allows you to perform your analysis.

Each Data Sharing Agreement (DSA) has its own default cluster. All users logged in under the same DSA will share the same cluster. A cluster must be running to view data in the Data tab.

You can view the DSA you are working under next to the Run all option on the top right-hand side of your notebook screen.

Notebook screen with example data sharing agreement number highlighted

For more information on DSAs, visit the Data Access Environment (DAE).

Restarting shared clusters

Restarting a cluster will clear any information stored in its memory, such as temp views, dataframes and variables. If anyone is using a cluster when you restart it, their queries will fail.

You should only restart a cluster as a last resort. If it is necessary to do so, for example if the cluster freezes, you should ask your colleagues before you restart.
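To see why a restart is disruptive, it can help to picture the cluster as holding one shared in-memory namespace that every cell reads from and writes to. The sketch below is a toy model in plain Python, not the Databricks implementation; the class `ToyNotebookSession` and the variable names are hypothetical.

```python
class ToyNotebookSession:
    """Toy stand-in for the in-memory state a running cluster holds."""

    def __init__(self):
        self.namespace = {}

    def run_cell(self, source: str) -> None:
        # Every cell executes against the same shared namespace,
        # so later cells can use values defined by earlier ones.
        exec(source, self.namespace)

    def restart(self) -> None:
        # A restart discards everything that was held in memory.
        self.namespace = {}


session = ToyNotebookSession()
session.run_cell("row_count = 42")
session.run_cell("doubled = row_count * 2")
doubled_before_restart = session.namespace["doubled"]

session.restart()
try:
    # The variable defined before the restart no longer exists.
    session.run_cell("print(row_count)")
    error_after_restart = None
except NameError as exc:
    error_after_restart = str(exc)

print(doubled_before_restart)
print(error_after_restart)
```

After the restart, the cell that relies on `row_count` fails until the earlier cells are run again, which is what colleagues sharing the cluster would experience.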


Creating folders for notebooks

You can create folders to store and organise your notebooks.

1. From the sidebar, click the Workspace icon.

2. In the workspace folder, select Users, click on the 3 dots to get to Create and then click on Folder.

folders for notebooks with created section highlighted

3. Enter the name for the new folder.

4. Click Create Folder.

create new folder

Refer to the Working collaboratively section for further information on using shared folders.


Creating and using notebooks

You can create notebooks to write and execute code. Notebooks can be created in a shared folder or within your own personal folder. 

Creating a notebook

To create a notebook:

1. From the sidebar, click the Workspace icon.

Under the workspace folder, you will be able to see all the folders you have permission for. 

workspace folder with permissions

 

2. Select the folder you want to add the notebook to.

You will have access to shared folders and personal folders.

If you add the notebook to a shared folder, all users with permissions for that folder will be able to access the notebook.

If you add the notebook to your own personal folder, it will be accessible only to you, but you can still choose to share it with a designated user or users. Refer to the Working collaboratively section for further information.

3. Right-click on the folder name and select Create, then Notebook.

4. This will create a new untitled notebook. Click on the Untitled Notebook heading and rename the notebook. Prefix your notebook with a meaningful identifier such as your initials or project code so you can easily recognise it.  

5. Select your language from the drop-down menu. This will determine the default language of the notebook.

untitled notebook in workspace

The notebook will open with an empty cell at the top.  

 

untitled notebook

You can now start to write code into the cell in your notebook.

When you close the notebook, the code will be automatically saved.


Running code

To run the code in a cell in your notebook:

1. Click the Run icon to the right of the cell.

2. Select Run Cell from the drop-down menu.

Alternatively, you can press SHIFT + Enter within the cell.

untitled notebook run code screen

The output will be displayed below the cell.

Untitled notebook output

If your code is divided into several cells in your notebook, you can run all the cells by clicking Run > Run All from the notebook toolbar.


Adding cells

A notebook can contain one or more cells. If your code is large, you can divide it into several cells. You can add more cells to your notebook as required.

To add a new cell to your notebook:

1. Hover your mouse over the bottom of the cell in your notebook. This cell must be directly above where you want to add a new cell.

A + icon will be displayed.

add cell

2. Click on the + icon.

A new empty cell will be displayed in your notebook.

a new empty cell is displayed


Cloning notebooks

Sometimes it may be helpful to create a ‘clone’ of a notebook. For instance, you may want to:

  • keep an original of a notebook
  • make changes to a notebook
  • share a copy of a notebook with your colleagues using a shared folder

To clone a notebook:

1. Click Run.

2. Select Clear Cell Outputs from the drop-down menu to clear any query outputs.

clone notebook

Clearing outputs is important so that sensitive information is not shared with a user who may not have the same access permissions.

3. Click Confirm.

are you sure you want to clear all cell output message

4. Click File.

5. Select Clone from the drop-down menu.

clone notebook drop down screen

6. Enter a new name for the notebook you are cloning.

7. Select the folder to which you would like to save your clone notebook.

8. Click Clone.

clone test
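The effect of clearing outputs can be illustrated outside the notebook interface. The sketch below assumes a notebook that has been exported in Jupyter .ipynb (JSON) format, which is not part of the cloning steps above; the function name `clear_outputs` and the sample notebook content are hypothetical.

```python
import copy


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of an .ipynb-style notebook with code cell outputs removed."""
    cleaned = copy.deepcopy(notebook)  # leave the original untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop any displayed query results
            cell["execution_count"] = None  # reset the run counter
    return cleaned


# Hypothetical exported notebook with one code cell that still shows a result.
example_notebook = {
    "cells": [
        {
            "cell_type": "code",
            "source": ["print(patient_count)"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1234\n"]}],
            "execution_count": 3,
            "metadata": {},
        },
        {"cell_type": "markdown", "source": ["# Notes"], "metadata": {}},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned_notebook = clear_outputs(example_notebook)
print(cleaned_notebook["cells"][0]["outputs"])
```

The code in each cell is kept, but the stored results are gone, so a colleague opening the copy sees only what their own permissions allow them to run.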


Deleting notebooks

To delete a notebook that you no longer require:

1. From the sidebar, click the Workspace icon.

2. Right-click on the notebook to be deleted.

3. Select Move to Trash from the drop-down menu.

move to trash screen

4. Select Confirm and move to Trash.

The notebook will be moved to the Trash and will be permanently deleted after 30 days.
 


‘Master’ notebooks

A ‘master’ notebook can run one or more other notebooks as part of its code. This can help you develop effective workflows: preparatory work can be checked and signed off, and pieces of analysis can be packaged up and run separately.
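A minimal sketch of the idea, assuming the master notebook runs its child notebooks one after another and collects what each returns. The function `run_child_notebooks`, the notebook paths and the stand-in runner are hypothetical. In a Databricks notebook, `dbutils.notebook.run` (which takes a path, a timeout in seconds and a dictionary of arguments) could be passed as the runner; here a stand-in is used so the example runs anywhere.

```python
from typing import Callable, Dict, List


def run_child_notebooks(
    paths: List[str],
    run_notebook: Callable[[str, int, dict], str],
    timeout_seconds: int = 600,
) -> Dict[str, str]:
    """Run each child notebook in order and collect the value it returns."""
    results = {}
    for path in paths:
        # Each child runs to completion before the next one starts.
        results[path] = run_notebook(path, timeout_seconds, {})
    return results


def fake_run(path: str, timeout_seconds: int, arguments: dict) -> str:
    """Stand-in runner used outside Databricks; it only reports the path."""
    return f"finished {path}"


# Hypothetical child notebooks stored alongside the master notebook.
child_paths = ["./prepare_data", "./run_analysis"]
results = run_child_notebooks(child_paths, fake_run)
print(results)
```

Keeping the list of child notebooks in one place makes the order of the workflow explicit, and each child can still be opened and run on its own.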


Last edited: 11 January 2024 1:29 pm