Part of Using Databricks in the NHS England Secure Data Environment
Pulling tables into Python using PySpark
PySpark is the Python Application Programming Interface (API) for Apache Spark, the distributed computing framework used to handle big data analysis.
When pulling tables from Databricks into Python, you need to go through Spark using PySpark, for example spark.table('table_name'). This returns a Spark DataFrame, not a pandas DataFrame.
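Spark DataFrames provide a domain-specific language for structured data manipulation, so you can select, filter and aggregate data using Python methods instead of writing SQL strings.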
The following example shows how to pull a table into Python using PySpark.
Last edited: 11 January 2024 1:30 pm