Azure Synapse Analytics - Data Analysis

This article provides the setup instructions for the recorded demo. The presentation and demo cover:

  • Overview of Azure Synapse Analytics
  • What Serverless SQL and Serverless Spark are, and the benefits of using them
  • The importance of data analysis: why accurate and relevant data matter
  • Demo: using Serverless SQL and Serverless Spark to review, validate, and visualize the sample data

After this session, you will be able to use Azure Synapse Analytics to conduct your own data analysis.

Video

Link: YouTube - Conduct Data Analysis using Azure Synapse Analytics | Serverless SQL | Serverless Spark

Setup

Prerequisites

This demo requires:

  1. An Azure Synapse Analytics workspace with a 'Small' Spark pool.
  2. An Azure Storage account with hierarchical namespace enabled (Data Lake Storage Gen2).
  3. A dedicated 'demo' container in the Azure Storage account. This container will store all the data files.
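
As a rough illustration of how files in the 'demo' container are usually addressed, the sketch below builds the two URI forms commonly used with Data Lake Storage Gen2: the `abfss://` form typically used from Spark, and the `https://` form typically used from Serverless SQL. The storage account name `mystorageacct` and the file path `nytaxi/sample.parquet` are placeholders chosen for illustration, not values from the demo.

```python
# Minimal sketch: build Data Lake Storage Gen2 URIs for a file in a container.
# The account name and file path used below are hypothetical placeholders.

def abfss_uri(account: str, container: str, path: str) -> str:
    """Return the abfss:// URI form, commonly used from Spark."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def https_uri(account: str, container: str, path: str) -> str:
    """Return the https:// URI form, commonly used from Serverless SQL."""
    return f"https://{account}.dfs.core.windows.net/{container}/{path.lstrip('/')}"


if __name__ == "__main__":
    # 'demo' is the container named in the prerequisites; the rest is assumed.
    print(abfss_uri("mystorageacct", "demo", "nytaxi/sample.parquet"))
    print(https_uri("mystorageacct", "demo", "nytaxi/sample.parquet"))
```

Replace the placeholders with your own storage account name and the path where you uploaded the sample files.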

Code and scripts

Folder/file | Description
NY Taxi - Sample data | Download all 4 NY Taxi sample files. Dataset reference: Kaggle.
Demo 1 - Create Dataset - SQL | This SQL script creates the schema and external table from the Parquet file.
Demo 1 - Analyze Dataset - SQL | This SQL script illustrates the steps to conduct a data analysis using Serverless SQL.
Demo 2 - Create & Analyze Dataset - PySpark | This PySpark script examines and cleans the data, then creates a table in the Lake DB.
Demo 2 - Serverless SQL to access Lake DB | This SQL script illustrates how Serverless SQL can query the Lake Database.
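
Before running the scripts, it can help to confirm that an uploaded Parquet file is readable from Serverless SQL. The sketch below is not taken from the demo scripts; it only renders a small `OPENROWSET` preview query as text, which you could paste into a Synapse SQL script. The helper name `openrowset_preview_sql`, the alias `taxi_rows`, and the example URL are assumptions made for illustration.

```python
# Minimal sketch: render a Serverless SQL preview query for a Parquet file.
# This is an illustrative helper, not part of the demo scripts.

def openrowset_preview_sql(parquet_url: str, top: int = 10) -> str:
    """Return a T-SQL string that previews the first `top` rows of a Parquet file."""
    if top <= 0:
        raise ValueError("top must be a positive integer")
    # Double any single quotes so the URL stays a valid T-SQL string literal.
    escaped_url = parquet_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{escaped_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS taxi_rows;"
    )


if __name__ == "__main__":
    # Hypothetical URL; substitute your own storage account and file path.
    url = "https://mystorageacct.dfs.core.windows.net/demo/nytaxi/sample.parquet"
    print(openrowset_preview_sql(url, top=5))
```

If the query returns rows, the file location and the workspace's access to the storage account are likely set up correctly for the Demo 1 scripts.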