Managing_Big_Data

This is the code repository for my project assignment in the master's course "Managing Big Data" at the University of Twente. Spark was used to analyse the Kaggle dataset "Newyork city Taxi Trip Records Dataset" and derive business insights about the taxi industry in New York City. RQ stands for "Research Question", so RQ1 refers to the code for the first research question. The project addresses three research questions:

RQ1: How does the usage of Green Taxi and HVFHV differ between weekdays and weekends from 2019 to 2022?
RQ2: Between Green Taxi and HVFHV, which type of taxi is preferred in different time slots of the day?
RQ3: Given the difference in travel distance between Green Taxi and HVFHV, do commuters prefer a certain type of taxi for long-distance or short-distance trips?
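
The three questions above all come down to counting trips per taxi type under a different grouping key: day type, time slot, or distance band. The snippet below is a minimal plain-Python sketch of that bucketing logic, not the Spark code in this repository. The record layout, the time-slot boundaries, the 5-mile cut-off, and all function names are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime

# Hypothetical cut-off between "short" and "long" trips (assumption).
LONG_TRIP_MILES = 5.0


def day_type(pickup):
    """Return 'weekend' for Saturday/Sunday, otherwise 'weekday'."""
    return "weekend" if pickup.weekday() >= 5 else "weekday"


def time_slot(pickup):
    """Map the pickup hour to one of four illustrative time slots."""
    hour = pickup.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def distance_band(miles):
    """Label a trip 'long' or 'short' using the assumed cut-off."""
    return "long" if miles >= LONG_TRIP_MILES else "short"


def count_trips(trips, key_fn):
    """Count trips per (taxi_type, key), where key_fn derives the key."""
    counts = Counter()
    for taxi_type, pickup, miles in trips:
        counts[(taxi_type, key_fn(pickup, miles))] += 1
    return counts


# Made-up sample records: (taxi_type, pickup_datetime, trip_miles).
SAMPLE_TRIPS = [
    ("green", datetime(2022, 1, 1, 9, 30), 2.1),  # Saturday
    ("hvfhv", datetime(2022, 1, 1, 23, 5), 8.4),  # Saturday
    ("hvfhv", datetime(2022, 1, 3, 8, 15), 6.0),  # Monday
    ("green", datetime(2022, 1, 3, 2, 0), 1.2),   # Monday
]

rq1 = count_trips(SAMPLE_TRIPS, lambda pickup, miles: day_type(pickup))
rq2 = count_trips(SAMPLE_TRIPS, lambda pickup, miles: time_slot(pickup))
rq3 = count_trips(SAMPLE_TRIPS, lambda pickup, miles: distance_band(miles))
```

In a Spark job the same keys would typically be derived as columns and aggregated with a group-by, but the column names in the actual dataset may differ from the tuple fields assumed here.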
