This article provides the setup instructions for the recorded demo. The presentation and demo cover:
- Overview of Azure Synapse Analytics
- What Serverless SQL and Serverless Spark are, and the benefits of using them
- The importance of data analysis: why accurate and relevant data matter
- Demo: using Serverless SQL and Serverless Spark to review, validate, and visualize the sample data

After this session, you will be able to use Azure Synapse Analytics to conduct your own data analysis to get better results.

Link: YouTube - Conduct Data Analysis using Azure Synapse Analytics | Serverless SQL | Serverless Spark

For this demo, we require:
- An Azure Synapse Analytics workspace with a 'Small' Spark pool.
- An Azure Storage account with hierarchical namespace enabled (Data Lake Storage Gen2).
- A dedicated 'demo' container in the Azure Storage account. This container stores all the data files.
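
Both demos reach the files in the 'demo' container by URI. The following is a minimal sketch of the two URI forms commonly used with Data Lake Storage Gen2: the `abfss://` form that Spark typically reads, and the `https://` form against the `dfs` endpoint that Serverless SQL accepts in `OPENROWSET`. The storage account name `mystorageacct` and the folder `nytaxi` are hypothetical placeholders, not names taken from the demo; substitute your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakeLocation:
    """Builds URIs for a Data Lake Storage Gen2 container."""

    account: str              # storage account name (hypothetical in this sketch)
    container: str = "demo"   # the dedicated container required above

    def abfss_uri(self, path: str) -> str:
        # Form typically used from Spark: abfss://<container>@<account>.dfs.core.windows.net/<path>
        return (
            f"abfss://{self.container}@{self.account}"
            f".dfs.core.windows.net/{path.lstrip('/')}"
        )

    def https_url(self, path: str) -> str:
        # Form accepted by Serverless SQL: https://<account>.dfs.core.windows.net/<container>/<path>
        return (
            f"https://{self.account}.dfs.core.windows.net/"
            f"{self.container}/{path.lstrip('/')}"
        )


# 'mystorageacct' and 'nytaxi' are assumptions for illustration only.
loc = LakeLocation(account="mystorageacct")
print(loc.abfss_uri("nytaxi/"))
print(loc.https_url("nytaxi/*.parquet"))
```

Keeping the account and container in one place makes it easier to point the SQL and PySpark scripts at the same files.
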
Folder/file | Description |
---|---|
NY Taxi - Sample data | Download all 4 NY Taxi sample files. Dataset Reference: Kaggle |
Demo 1 - Create Dataset - SQL | This SQL script will create the Schema and External Table from the parquet file. |
Demo 1 - Analyze Dataset - SQL | This SQL script illustrates the steps to conduct a data analysis using Serverless SQL. |
Demo 2 - Create & Analyze Dataset - PySpark | This PySpark script examines and cleans the data, then creates a table in the Lake database. |
Demo 2 - Serverless SQL to access Lake DB | This SQL script shows how Serverless SQL can query the Lake database. |
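
Before running the SQL scripts, it can help to confirm that Serverless SQL can read the uploaded parquet files at all. The sketch below only renders an ad hoc `OPENROWSET` query string to paste into a Synapse SQL script; it is not the demo's own script. The helper name `openrowset_parquet_query`, the storage account `mystorageacct`, and the folder `nytaxi` are assumptions for illustration.

```python
def openrowset_parquet_query(url: str, top: int = 10) -> str:
    """Render a Serverless SQL query that previews parquet files at `url`."""
    if top <= 0:
        raise ValueError("top must be a positive integer")
    # Double any single quotes so the URL is a valid T-SQL string literal.
    safe_url = url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Hypothetical location inside the 'demo' container; replace with your own.
sample_url = "https://mystorageacct.dfs.core.windows.net/demo/nytaxi/*.parquet"
print(openrowset_parquet_query(sample_url, top=5))
```

If the rendered query returns rows in the Serverless SQL pool, the container, folder, and permissions are set up well enough for the scripts in the table above.
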