
Part of Using Databricks in the NHS England Secure Data Environment

Pulling tables into Python using PySpark



PySpark is the Python Application Programming Interface (API) for Apache Spark. Apache Spark is a distributed computing framework used for big data analysis.

To pull a table from Databricks into Python, you need to go via Spark using PySpark: spark.table('table_name') returns the table as a Spark DataFrame.

DataFrames provide a domain-specific language for structured data manipulation, with methods for operations such as selecting, filtering, grouping and aggregating.
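As a rough illustration of that domain-specific language, the sketch below chains a few DataFrame methods. The function name adults_per_region and the column names age and region are hypothetical, chosen only for illustration; filter, groupBy and count are standard PySpark DataFrame methods.

```python
def adults_per_region(df, min_age=18):
    """Count rows per region for people aged at least `min_age`.

    `df` is assumed to be a Spark DataFrame with hypothetical
    columns named `age` and `region`.
    """
    return (
        df.filter(f"age >= {int(min_age)}")  # keep rows matching a SQL-style condition
        .groupBy("region")                   # group the remaining rows by region
        .count()                             # one row per region with a `count` column
    )
```

Each method returns a new object rather than changing df in place, which is why the calls can be chained. Spark evaluates the chain lazily, so nothing is computed until a result is requested, for example with show().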

The following example shows how to pull a table into Python using PySpark. 

[Image: example showing how to pull a table into Python using PySpark]
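A minimal text sketch of pulling a table is given here, assuming it runs in a Databricks notebook where a Spark session named spark is already available. The helper pull_table and the table name my_database.my_table are hypothetical and not taken from the original guidance; spark.table and DataFrame.select are standard PySpark calls.

```python
from typing import List, Optional


def pull_table(spark, table_name: str, columns: Optional[List[str]] = None):
    """Return `table_name` as a Spark DataFrame, optionally keeping only `columns`.

    `spark` is assumed to be an active SparkSession, such as the one
    Databricks notebooks provide by default.
    """
    if not table_name:
        raise ValueError("table_name must be a non-empty string")
    df = spark.table(table_name)  # look the table up and return it as a DataFrame
    if columns:
        df = df.select(*columns)  # restrict to the requested columns
    return df


# Hypothetical usage in a Databricks notebook:
# df = pull_table(spark, "my_database.my_table")
# df.show(5)  # display the first five rows
```

The helper is optional: calling spark.table('table_name') directly gives the same DataFrame. Wrapping it simply makes the column selection and the empty-name check explicit.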


Last edited: 11 January 2024 1:30 pm