Slide 1
In this video, I'm going to give you a quick demo of using RStudio in NHS England’s Secure Data Environment, or SDE.
Slide 2
Slide 3
The first time you use RStudio, you'll need to set up a connection with Databricks.
First go to Databricks, then select your username in the top right corner. In user settings select developer, then access tokens, and generate a new token. Remove the text in the lifetime field and enter a token name in the comment field. Select generate and copy the token that appears. Then go to Notepad++ and paste the copied token into the document. Save the document in your home folder so that it persists through to future sessions.
Once you have done this, go to the start menu in the bottom left corner of the desktop and search for ODBC. Select the 64-bit ODBC Data Sources connection to open the set-up window, and select configure. Then paste the token that you copied earlier into the password field. You can test the connection by selecting 'Test' or continue by selecting 'OK'.
Slide 4
A successfully tested connection will generate a success message in a new test results window.
Slide 5
To set up the connection between RStudio and Databricks, select RStudio on the desktop, then select the connections tab and select 'New Connection'.
Select Databricks, then select 'Test'. If the connection works, a success message will appear. Select 'OK' and then 'OK' again. In the connections tab, you can view which tables are available to query by selecting 'hive_metastore'.
For more information on querying Databricks using RStudio, view the 'supporting resources for using RStudio in the SDE' links below this video.
Slide 6
If you want to clone code to RStudio using its git interface, you will need to connect RStudio and GitLab.
Slide 7
Slide 8
You'll need your GitLab username to do this. Your GitLab username can be found in GitLab. Go to your username in the top menu, then select preferences, then select Account under the user settings menu.
In the change username section, you can view your username on the right-hand side of the path section. Save this in your home folder.
Slide 9
Slide 10
To connect RStudio and GitLab, go to a project within GitLab and select clone. Copy the 'Clone with HTTPS' URL. Then go to RStudio and select 'New Project', 'Version Control', 'Git'. Paste the copied clone URL in the repository URL field, and make sure the directory is within your home folder.
Slide 11
Once it has successfully cloned, in the GitLab sign-in window, enter your GitLab username and the GitLab token that you saved earlier. Then select sign-in. Your cloned project should now be displayed and available to work with in RStudio.
Slide 12
Slide 13
Sometimes the git pane does not show, in which case you can run the git status command from the terminal in RStudio. Copy and paste the last two lines of code into the terminal and run them, and the git pane should reappear.
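As a minimal sketch of the git status check only, and not the two lines shown on the slide, the following assumes that the RStudio terminal is a bash-style shell such as Git Bash. The helper name show_git_status is hypothetical and is there only to give a clearer message when the terminal is not in your cloned project folder.

```shell
# Hypothetical helper: report git status only when the current
# folder is inside a git repository.
show_git_status() {
  if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    # Lists the current branch plus any changed or untracked files.
    git status
  else
    echo "Not a git repository: change to your cloned project folder first."
  fi
}

show_git_status
```

If the message about not being a git repository appears, change to the folder you cloned the project into and run the check again.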
Slide 14
When making a commit from RStudio to GitLab, it may ask for your information. Just enter your SDE email address and name, and it should allow you to continue.
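If the request is git's usual prompt for an identity, which is an assumption here, the same details could also be set from the terminal. This is a sketch only: the name and email address below are placeholders, and the --global flag stores the values for your user rather than for a single project.

```shell
# Placeholder values: replace with your own name and SDE email address.
git config --global user.name "Your Name"
git config --global user.email "your.name@example.org"

# Print the stored values to confirm what git will attach to commits.
git config --global user.name
git config --global user.email
```

Leaving out --global while inside a cloned project sets the values for that project only.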
Slide 15
There are some things to note about RStudio. It can both read tables from Databricks and write tables to Databricks, although it can only write to the 'dsa_collab' database. It reads files from the virtual desktop, so make sure you save them in a persistent location such as the Home Folder, Collab Storage, or GitLab, so they persist to future sessions. The memory available is limited by the virtual machine. The compute and memory are separate from those used by Databricks.
Slide 16
This brings us to the end of the video.