Part of Using Databricks in the NHS England Secure Data Environment
Apache Spark and Spark SQL
Spark is a data processing engine for cluster computing that sits between the data source and the analysis tool. It supports a range of data sources and query languages, including an SQL variant called Spark SQL. Spark SQL lets you combine SQL commands in a single query to perform complex analytics.
Spark SQL syntax
Spark SQL differs from standard SQL and has its own syntax. For a detailed description of the Spark SQL syntax, along with examples of when you would use it, refer to the Apache Spark SQL syntax pages of the Apache Spark website. These provide a list of Data Definition and Data Manipulation Statements, as well as Data Retrieval and Auxiliary Statements.
Last edited: 11 January 2024 1:30 pm