In this homework, we'll prepare the environment and practice with Terraform and SQL.
Install the Google Cloud SDK. Which version do you have?
To get the version, run gcloud --version
Create an account in Google Cloud and create a project.
Now install Terraform and go to the terraform directory (week_1_basics_n_setup/1_terraform_gcp/terraform).
After that, run:
terraform init
terraform plan
terraform apply
Apply the plan and copy the output (after running apply) to the form.
It should be the entire output, from the moment you typed terraform init to the very end.
Run Postgres and load the data as shown in the videos.
We'll use the yellow taxi trips from January 2021:
wget https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.csv
You will also need the dataset with zones:
wget https://s3.amazonaws.com/nyc-tlc/misc/taxi+_zone_lookup.csv
Download this data and load it into Postgres.
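As a rough illustration of the loading step, here is a minimal sketch that inserts CSV rows in batches. It is not the method from the videos. It uses an in-memory SQLite database as a stand-in for Postgres so that it runs without a server. The table name yellow_taxi_trips, the two-column toy CSV, and the batch size are assumptions made for the example.

```python
import csv
import io
import itertools
import sqlite3

# Tiny inline CSV standing in for the downloaded file (toy values, two columns only).
toy_csv = io.StringIO(
    "tpep_pickup_datetime,total_amount\n"
    "2021-01-01 00:30:10,11.8\n"
    "2021-01-01 00:51:20,4.3\n"
    "2021-01-01 00:43:30,51.95\n"
)

# SQLite stand-in for Postgres; the table name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yellow_taxi_trips (tpep_pickup_datetime TEXT, total_amount REAL)"
)

reader = csv.reader(toy_csv)
next(reader)  # skip the header row

BATCH_SIZE = 2  # small on purpose; a real load would use a much larger batch
while True:
    # Pull the next batch of rows without reading the whole file into memory.
    batch = list(itertools.islice(reader, BATCH_SIZE))
    if not batch:
        break
    conn.executemany("INSERT INTO yellow_taxi_trips VALUES (?, ?)", batch)
    conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM yellow_taxi_trips").fetchone()[0]
print(row_count)
```

Reading in batches keeps memory use flat, which matters because the real trips file is far larger than this toy input.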
How many taxi trips were there on January 15?
Consider only trips that started on January 15.
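The "started on" condition means filtering on the pickup timestamp only. A minimal sketch of that filter on toy data follows. It uses SQLite as a stand-in for Postgres, where date(...) would typically become CAST(... AS DATE). The table name yellow_taxi_trips is hypothetical, and the column names are assumed to match the CSV headers.

```python
import sqlite3

# Toy stand-in for the Postgres table; the table name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yellow_taxi_trips ("
    "tpep_pickup_datetime TEXT, tpep_dropoff_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO yellow_taxi_trips VALUES (?, ?)",
    [
        ("2021-01-14 23:50:00", "2021-01-15 00:10:00"),  # started on the 14th: excluded
        ("2021-01-15 00:05:00", "2021-01-15 00:20:00"),  # started on the 15th: counted
        ("2021-01-15 23:59:00", "2021-01-16 00:15:00"),  # started on the 15th: counted
    ],
)

# Filter on the pickup timestamp only; SQLite's date() drops the time part.
trips_on_day = conn.execute(
    "SELECT COUNT(*) FROM yellow_taxi_trips WHERE date(tpep_pickup_datetime) = ?",
    ("2021-01-15",),
).fetchone()[0]
print(trips_on_day)
```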
Find the largest tip for each day. On which day in January was the largest tip given?
Use the pickup time for your calculations.
(note: it's not a typo, it's "tip", not "trip")
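One way to approach this is to group by pickup date, take the maximum tip per group, and sort. The sketch below shows that shape on toy data, using SQLite as a stand-in for Postgres. The table name yellow_taxi_trips is hypothetical, the column names are assumed to match the CSV headers, and the tip values are made up.

```python
import sqlite3

# Toy stand-in for the Postgres table; the table name and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yellow_taxi_trips (tpep_pickup_datetime TEXT, tip_amount REAL)"
)
conn.executemany(
    "INSERT INTO yellow_taxi_trips VALUES (?, ?)",
    [
        ("2021-01-01 08:00:00", 2.5),
        ("2021-01-01 19:30:00", 7.0),
        ("2021-01-02 10:15:00", 12.0),
        ("2021-01-02 22:45:00", 3.0),
    ],
)

# One row per pickup day with that day's largest tip, largest first.
daily_max = conn.execute(
    """
    SELECT date(tpep_pickup_datetime) AS pickup_day,
           MAX(tip_amount) AS max_tip
    FROM yellow_taxi_trips
    GROUP BY pickup_day
    ORDER BY max_tip DESC
    """
).fetchall()

best_day, best_tip = daily_max[0]
print(best_day, best_tip)
```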
What was the most popular destination for passengers picked up in Central Park on January 14?
Use the pickup time for your calculations.
Enter the zone name (not the ID). If the zone name is unknown (missing), write "Unknown".
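A query of this kind needs the zones table twice: once to filter by pickup zone name and once to resolve the drop-off name. The sketch below shows one possible shape on toy data, with SQLite standing in for Postgres. The table names, toy IDs, and zone names other than Central Park are hypothetical. The column names are assumed to match the CSV headers. In Postgres, mixed-case column names may need double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names; toy IDs and zone names.
conn.execute("CREATE TABLE zones (LocationID INTEGER, Zone TEXT)")
conn.execute(
    "CREATE TABLE yellow_taxi_trips ("
    "tpep_pickup_datetime TEXT, PULocationID INTEGER, DOLocationID INTEGER)"
)
conn.executemany(
    "INSERT INTO zones VALUES (?, ?)",
    [(1, "Central Park"), (2, "Zone B"), (3, None)],  # zone 3 has a missing name
)
conn.executemany(
    "INSERT INTO yellow_taxi_trips VALUES (?, ?, ?)",
    [
        ("2021-01-14 09:00:00", 1, 2),
        ("2021-01-14 10:00:00", 1, 3),   # drop-off zone name is NULL
        ("2021-01-14 11:00:00", 1, 99),  # drop-off ID absent from zones
        ("2021-01-13 12:00:00", 1, 2),   # wrong day: excluded
        ("2021-01-14 13:00:00", 2, 2),   # wrong pickup zone: excluded
    ],
)

# LEFT JOIN keeps trips whose drop-off zone cannot be resolved;
# COALESCE turns the resulting NULL name into 'Unknown'.
top_destination = conn.execute(
    """
    SELECT COALESCE(zdo.Zone, 'Unknown') AS dropoff_zone,
           COUNT(*) AS n_trips
    FROM yellow_taxi_trips AS t
    JOIN zones AS zpu ON t.PULocationID = zpu.LocationID
    LEFT JOIN zones AS zdo ON t.DOLocationID = zdo.LocationID
    WHERE zpu.Zone = 'Central Park'
      AND date(t.tpep_pickup_datetime) = '2021-01-14'
    GROUP BY dropoff_zone
    ORDER BY n_trips DESC
    LIMIT 1
    """
).fetchone()
print(top_destination)
```

In this toy data the two unresolved drop-offs are grouped together under "Unknown", which is why that label wins here.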
What's the pickup-dropoff pair with the largest average price for a ride (calculated based on total_amount)?
Enter two zone names separated by a slash. For example:
"Jamaica Bay / Clinton East"
If any of the zone names are unknown (missing), write "Unknown". For example, "Unknown / Clinton East".
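A possible shape for this query is to build the "pickup / dropoff" label in SQL, group by it, and average total_amount. The sketch below does that on toy data, with SQLite standing in for Postgres. The table names, toy IDs, zone names, and amounts are hypothetical. Column names other than total_amount are assumed to match the CSV headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names; toy IDs, zone names, and amounts.
conn.execute("CREATE TABLE zones (LocationID INTEGER, Zone TEXT)")
conn.execute(
    "CREATE TABLE yellow_taxi_trips ("
    "PULocationID INTEGER, DOLocationID INTEGER, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO zones VALUES (?, ?)",
    [(1, "Zone A"), (2, "Zone B"), (3, None)],  # zone 3 has a missing name
)
conn.executemany(
    "INSERT INTO yellow_taxi_trips VALUES (?, ?, ?)",
    [
        (1, 2, 10.0),
        (1, 2, 20.0),  # Zone A / Zone B averages 15.0
        (2, 1, 30.0),  # Zone B / Zone A averages 30.0
        (3, 2, 12.0),  # Unknown / Zone B averages 12.0
    ],
)

# LEFT JOINs keep trips with unresolved zones; COALESCE labels them 'Unknown'.
top_pair = conn.execute(
    """
    SELECT COALESCE(zpu.Zone, 'Unknown') || ' / ' || COALESCE(zdo.Zone, 'Unknown') AS pair,
           AVG(t.total_amount) AS avg_total
    FROM yellow_taxi_trips AS t
    LEFT JOIN zones AS zpu ON t.PULocationID = zpu.LocationID
    LEFT JOIN zones AS zdo ON t.DOLocationID = zdo.LocationID
    GROUP BY pair
    ORDER BY avg_total DESC
    LIMIT 1
    """
).fetchone()
print(top_pair)
```

Grouping by the name label rather than by ID merges every unresolved zone into one "Unknown" bucket. That matches the answer format requested, but it is a design choice worth keeping in mind.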
- Form for submitting: https://forms.gle/yGQrkgRdVbiFs8Vd7
- You can submit your homework multiple times. In this case, only the last submission will be used.
Deadline: 26 January (Wednesday), 22:00 CET
Here is the solution to questions 3-6: video