Part of Using Databricks in the NHS England Secure Data Environment

Pulling tables into Python using PySpark

PySpark is the Python Application Programming Interface (API) for Apache Spark. Apache Spark is the distributed computing framework used for big data analysis.

To pull a table from Databricks into Python, you need to go via Spark using PySpark: spark.table('table_name') returns the table as a Spark DataFrame.

DataFrames provide a domain-specific language for structured data manipulation.
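As a rough illustration of what that domain-specific language looks like, the sketch below chains a few standard DataFrame methods. The function name and the column names "region" and "age" are hypothetical and are not taken from any real table in the environment.

```python
def count_adults_by_region(df):
    """Count rows per region for records with age 18 or over.

    `df` is expected to be a PySpark DataFrame, for example the result of
    spark.table('table_name'). The columns "region" and "age" are
    hypothetical and used only for illustration.
    """
    return (
        df.select("region", "age")  # keep only the columns needed
        .filter("age >= 18")        # condition written as a SQL expression string
        .groupBy("region")          # group the remaining rows by region
        .count()                    # one row per region with a "count" column
    )
```

In a Databricks notebook this could be called as count_adults_by_region(spark.table('table_name')), followed by .show() to print the result. Each method returns a new DataFrame, which is why the calls can be chained.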

The following example shows how to pull a table into Python using PySpark.

Figure: example showing how to pull a table into Python using PySpark
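A minimal sketch of the same step in code is given here. It assumes it is used in a Databricks notebook, where a SparkSession is already available as the variable spark. The helper name pull_table and its max_rows parameter are hypothetical conveniences, and 'table_name' is a placeholder for a table you have access to.

```python
def pull_table(spark, table_name, max_rows=None):
    """Return the named table as a PySpark DataFrame.

    `spark` is a SparkSession (Databricks notebooks provide one as `spark`).
    `max_rows` is an optional, hypothetical convenience for limiting the
    number of rows while exploring a large table.
    """
    df = spark.table(table_name)   # look the table up through Spark
    if max_rows is not None:
        df = df.limit(max_rows)    # keep at most `max_rows` rows
    return df


# Example use in a notebook cell (not run here, as it needs a live session):
# df = pull_table(spark, "table_name", max_rows=100)
# df.printSchema()   # list column names and types
# df.show(5)         # print the first five rows
```

Passing the session in as an argument, rather than relying on the global variable, keeps the helper usable outside a notebook as well.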

Last edited: 11 January 2024 1:30 pm
