
Azure Synapse Analytics - Data Analysis

This article provides the setup instructions for the recorded demo. The presentation and demo cover:

An overview of Azure Synapse Analytics.
What Serverless SQL and Serverless Spark are, and the benefits of using them.
The importance of data analysis: why accurate and relevant data matter.
Demo: using Serverless SQL and Serverless Spark to review, validate, and visualize the sample data.

After this session, you will be able to use Azure Synapse Analytics to conduct your own data analysis.

For this demo, we require:

An Azure Synapse Analytics workspace with a 'Small' Spark pool.
An Azure Storage account with hierarchical namespace enabled (Azure Data Lake Storage Gen2).
A dedicated 'demo' container in the Azure Storage account. This container will store all the data files.
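
Once the data files are uploaded to the 'demo' container, the scripts need to reference them by address. The sketch below shows the two address forms commonly used with Data Lake Gen2 storage. The storage account name `mydemoaccount` and the file path are hypothetical placeholders, not values from the demo; substitute your own.

```python
# Minimal sketch: build Data Lake Gen2 addresses for a file in the 'demo' container.
# ASSUMPTION: "mydemoaccount" and the file path are placeholders, not demo values.

def abfss_uri(account: str, container: str, path: str) -> str:
    """Return the abfss:// form of a file address (container@account)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def https_url(account: str, container: str, path: str) -> str:
    """Return the https:// form of the same file address (account/container)."""
    return f"https://{account}.dfs.core.windows.net/{container}/{path.lstrip('/')}"


# Hypothetical example values
print(abfss_uri("mydemoaccount", "demo", "nytaxi/sample.parquet"))
print(https_url("mydemoaccount", "demo", "nytaxi/sample.parquet"))
```

Spark notebooks typically read files through the `abfss://` form, while Serverless SQL queries usually take the `https://` form.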

Folder/file	Description
NY Taxi - Sample data	Download all four NY Taxi sample files. Dataset reference: Kaggle.
Demo 1 - Create Dataset - SQL	This SQL script creates the schema and external table from the Parquet file.
Demo 1 - Analyze Dataset - SQL	This SQL script illustrates the steps of a data analysis using Serverless SQL.
Demo 2 - Create & Analyze Dataset - PySpark	This PySpark script examines and cleans the data, then creates a table in the Lake database.
Demo 2 - Serverless SQL to access Lake DB	This SQL script illustrates how Serverless SQL can query the Lake database.
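
Before running the demo scripts, it can help to confirm that Serverless SQL can read an uploaded Parquet file. The sketch below is not one of the demo scripts; it only renders a preview query that uses the `OPENROWSET` function, which you could paste into a Serverless SQL script window. The file URL and the function name `openrowset_preview_query` are hypothetical, chosen for illustration.

```python
# Minimal sketch: render a Serverless SQL preview query for a Parquet file.
# ASSUMPTION: the URL below is a placeholder; use the address of your own file.

def openrowset_preview_query(file_url: str, top: int = 10) -> str:
    """Return T-SQL text that previews the first `top` rows of a Parquet file."""
    if top <= 0:
        raise ValueError("top must be a positive integer")
    # Double any single quotes so the URL is a valid T-SQL string literal.
    escaped = file_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{escaped}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Hypothetical example URL
print(openrowset_preview_query(
    "https://mydemoaccount.dfs.core.windows.net/demo/nytaxi/sample.parquet"
))
```

If the query returns rows, the storage account, container, and permissions are likely set up well enough for the Demo 1 scripts.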