Hesham942/ETL-for-nested-JSON

Nested JSON ETL

This project provides an ETL (Extract, Transform, Load) framework for data stored in nested JSON structures, converting it into a flat, tabular form that can be loaded into a database.
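To make "flattening" concrete, here is a minimal stdlib-only sketch. The `flatten` helper and the sample record are hypothetical and not part of this repository. It collapses nested objects into a single level by joining their keys with an underscore:

```python
import json
from typing import Any, Dict


def flatten(obj: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested dicts into a single level (hypothetical helper).

    Nested keys are joined with `sep`, e.g. {"a": {"b": 1}} -> {"a_b": 1}.
    Lists are left as-is; expanding them into rows ("exploding") is a
    separate step.
    """
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into the nested object and merge its flattened keys.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Hypothetical sample record with two levels of nesting.
record = json.loads('{"id": 1, "user": {"name": "Ana", "address": {"city": "Cairo"}}}')
print(flatten(record))
# {'id': 1, 'user_name': 'Ana', 'user_address_city': 'Cairo'}
```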

Installation

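This project requires Python 3.x and the following libraries:

  • pandas
  • pyspark
  • psycopg2 (used to load data into PostgreSQL)

The json module is part of the Python standard library and does not need to be installed. You can install the other libraries using pip (the psycopg2-binary package provides the psycopg2 module):

pip install pandas pyspark psycopg2-binary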

Usage

  • The dataset is in nested JSON format, which cannot be used directly as flat, tabular data, so it must first be converted into a structured format.

    1. Data Source: nested JSON datasets are common and easy to find online; Kaggle is one of the best sources for them.
    2. Transformation: explode and pivot columns and flatten nested structures to turn the raw data into a usable table.
    3. Loading: the transformed data is loaded into PostgreSQL using the psycopg2 library.
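The transformation step can be illustrated on a small example. PySpark is listed as a dependency and offers equivalent operations, but this sketch uses pandas (also listed) to stay self-contained; the `orders` records and the `transform` helper are hypothetical. `DataFrame.explode` turns each element of a list column into its own row, and `pd.json_normalize` flattens nested objects into prefixed columns:

```python
import pandas as pd

# Hypothetical nested records: each order has a nested customer object
# and a list of line items.
orders = [
    {"order_id": 1, "customer": {"name": "Ana", "city": "Cairo"},
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
    {"order_id": 2, "customer": {"name": "Omar", "city": "Giza"},
     "items": [{"sku": "A1", "qty": 5}]},
]


def transform(records: list) -> pd.DataFrame:
    """Explode the list column, then flatten the nested dict columns.

    Assumes every record has a non-empty "items" list (a sketch, not a
    general-purpose implementation).
    """
    df = pd.DataFrame(records)
    # Explode: one row per element of the "items" list.
    df = df.explode("items", ignore_index=True)
    # Flatten: expand each nested dict column into prefixed scalar columns.
    customer = pd.json_normalize(df["customer"].tolist()).add_prefix("customer_")
    items = pd.json_normalize(df["items"].tolist()).add_prefix("item_")
    # All three frames share the same 0..n-1 index, so they align row by row.
    return pd.concat([df[["order_id"]], customer, items], axis=1)


flat = transform(orders)
print(flat)
```

A pivot could follow the same way, for example `flat.pivot_table(index="order_id", columns="item_sku", values="item_qty", aggfunc="sum")` for one quantity column per SKU. In PySpark, the analogous building blocks are `pyspark.sql.functions.explode` for arrays and selecting nested fields such as `col("customer.name")` for structs.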
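For the loading step, a minimal psycopg2 sketch might look like the following. It assumes the target table already exists with columns named like the DataFrame's columns; the helper names `to_rows` and `load`, and any `dsn` or table name you pass in, are hypothetical. One practical detail: psycopg2 cannot adapt NumPy scalar types such as `numpy.int64` by default, so values are converted to native Python objects first, with missing values mapped to SQL NULL.

```python
import pandas as pd


def to_rows(df: pd.DataFrame) -> list:
    """Convert a flat DataFrame into plain Python tuples for psycopg2.

    NumPy scalars are converted with .item() and NaN/None becomes None
    (SQL NULL). Assumes every cell is a scalar, i.e. the data is already flat.
    """
    rows = []
    for row in df.itertuples(index=False, name=None):
        rows.append(tuple(
            None if pd.isna(v) else (v.item() if hasattr(v, "item") else v)
            for v in row
        ))
    return rows


def load(df: pd.DataFrame, dsn: str, table: str) -> None:
    """Bulk-insert `df` into an existing PostgreSQL table (hypothetical helper)."""
    # Imported here so to_rows() can be used without psycopg2 installed.
    import psycopg2
    from psycopg2 import sql
    from psycopg2.extras import execute_values

    # Quote the table and column names safely; execute_values fills in "%s".
    query = sql.SQL("INSERT INTO {} ({}) VALUES %s").format(
        sql.Identifier(table),
        sql.SQL(", ").join(map(sql.Identifier, df.columns)),
    )
    conn = psycopg2.connect(dsn)
    try:
        with conn:  # commits on success, rolls back on an exception
            with conn.cursor() as cur:
                execute_values(cur, query.as_string(cur), to_rows(df))
    finally:
        conn.close()
```

A call would look like `load(flat_df, "dbname=etl user=postgres", "orders_flat")`, where the connection string and table name are placeholders for your own setup. `execute_values` sends rows in batches, which is usually much faster than one `INSERT` per row.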
