Managing_Big_Data

This is the code repository for my project assignment in the master's course "Managing Big Data" at the University of Twente. Spark was used to analyse the Kaggle dataset "Newyork city Taxi Trip Records Dataset" and derive business insights about the taxi industry in New York City. RQ stands for "Research Question", so RQ1 refers to the code for the first research question. The project addresses three research questions:

  1. How does usage of Green Taxi and HVFHV differ between weekdays and weekends from 2019 to 2022?
  2. Which of the two, Green Taxi or HVFHV, is preferred in each time slot of the day?
  3. Given the difference in travel distance between Green Taxi and HVFHV, do commuters prefer one type of taxi for long-distance or short-distance trips?
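
The three questions each come down to grouping trips by a derived key and counting. The following is a minimal plain-Python sketch of that grouping logic, not the Spark code in this repository. The sample records, the time-slot boundaries, the 5-mile long/short threshold, and all function names are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records: (service, pickup time, trip distance in miles).
# These are made-up rows, not values from the Kaggle dataset.
TRIPS = [
    ("green", "2019-03-04 08:15:00", 1.2),  # a Monday
    ("hvfhv", "2019-03-04 18:40:00", 6.5),  # a Monday
    ("hvfhv", "2019-03-09 23:05:00", 3.1),  # a Saturday
    ("green", "2019-03-10 11:30:00", 0.8),  # a Sunday
]


def day_type(ts):
    """RQ1 key: Monday-Friday is 'weekday', Saturday-Sunday is 'weekend'."""
    return "weekend" if ts.weekday() >= 5 else "weekday"


def time_slot(ts):
    """RQ2 key: assumed six-hour slots; the project may use other boundaries."""
    if 6 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 18:
        return "afternoon"
    if 18 <= ts.hour < 24:
        return "evening"
    return "night"


def distance_band(miles, threshold=5.0):
    """RQ3 key: the 5-mile threshold is an assumption for this sketch."""
    return "long" if miles >= threshold else "short"


def summarise(trips):
    """Count trips per service for each of the three grouping keys."""
    by_day, by_slot, by_distance = Counter(), Counter(), Counter()
    for service, pickup, miles in trips:
        ts = datetime.strptime(pickup, "%Y-%m-%d %H:%M:%S")
        by_day[(service, ts.year, day_type(ts))] += 1
        by_slot[(service, time_slot(ts))] += 1
        by_distance[(service, distance_band(miles))] += 1
    return by_day, by_slot, by_distance


if __name__ == "__main__":
    for counter in summarise(TRIPS):
        for key, count in sorted(counter.items()):
            print(key, count)
```

On the full dataset the same three keys would be computed as derived columns and aggregated per group, which is the kind of workload Spark distributes across a cluster.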