Blockchain.com Data Scientist TakeHome (February 2022)

Solution by Kotenkov Igor

Description

This task is about forecasting how many bikes are rented from the TFL (Transport for London) Cycle Hire scheme.

Specifically, a candidate should attempt to answer the question “Can national electrical power generation help estimate how many bikes are hired?”

The idea is that both datasets may be correlated with a factor we have no direct data on (e.g., the weather), so power generation could act as a proxy for it.

Included Data Sources (in the data folder):

  1. tfl-daily-cycle-hires.xlsx: the daily number of hired bikes. Downloaded from London Datastore;
  2. electrical_power_data.csv: the daily amounts of produced energy (by source). Downloaded from REF using the following URL pattern: https://www.ref.org.uk/fuel/index.php?valdate=2009&tab=dp&share=N (replace “2009” in the URL to get data for later years);
  3. A candidate may also use other data sources (e.g., the attached Bank Holidays ukbankholidays.csv).
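The URL pattern in item 2 can be sketched as a small helper that substitutes the year. This is only an illustration: the function names and the example year range are assumptions, and the code builds the URLs without downloading or parsing anything.

```python
from typing import List
from urllib.parse import urlencode

# Base address taken from the URL pattern quoted in the data sources list.
REF_BASE_URL = "https://www.ref.org.uk/fuel/index.php"


def ref_year_url(year: int) -> str:
    """Build the REF page URL for one year (hypothetical helper)."""
    query = urlencode({"valdate": year, "tab": "dp", "share": "N"})
    return f"{REF_BASE_URL}?{query}"


def ref_year_urls(first_year: int, last_year: int) -> List[str]:
    """Build one URL per year in the inclusive range (hypothetical helper)."""
    return [ref_year_url(year) for year in range(first_year, last_year + 1)]


# The end year below is an arbitrary example, not a statement about the dataset.
for url in ref_year_urls(2009, 2011):
    print(url)
```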

Deliverables

Note: A clear methodology supported by reasonable justifications is more important than an extremely accurate model.

The solution should include the following:

  1. Some preliminary data exploration;
  2. A model which predicts TFL Cycle Hire numbers using ONLY the TFL dataset;
  3. A model which predicts TFL Cycle Hire numbers using the TFL and electrical power generation dataset.
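For the TFL-only model in item 2, one possible reference point is a seasonal-naive baseline that predicts each day's hires from the same weekday one week earlier. The sketch below is an assumption about what a simple baseline could look like, not the method used in the attached notebook; the function name and the sample numbers are made up for illustration.

```python
from typing import List, Optional


def seasonal_naive(series: List[float], season: int = 7) -> List[Optional[float]]:
    """Predict each value as the one observed `season` steps earlier.

    The first `season` positions have no history, so they are None.
    """
    return [
        None if i < season else series[i - season]
        for i in range(len(series))
    ]


# Made-up daily hire counts, for illustration only.
hires = [10.0, 12.0, 11.0, 13.0, 14.0, 20.0, 22.0, 11.0, 12.5, 11.5]
preds = seasonal_naive(hires)
print(preds)
```

Any model built on the TFL data alone, or on the TFL and power generation data together, can then be compared against this baseline on the same held-out period.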

A candidate can use any model(s). However, they should:

  1. Give reasons for their choices;
  2. Outline how/why they selected the features which were used as inputs;
  3. Evaluate their model(s) through multiple metrics.
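The task does not say which metrics to use in item 3. As a minimal sketch, assuming MAE, RMSE, and MAPE are acceptable choices, they can be computed as follows (function names and sample values are illustrative only):

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalises large misses more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Requires non-zero targets."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values, for illustration only.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Reporting several metrics side by side helps because each reacts differently to outliers and to the scale of the target.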

A candidate should produce a Jupyter Notebook with the solution. This repository also contains an HTML version of the notebook, which is easier to open in a browser.

Blockchain.com generally expects candidates to spend around 3 hours on the task, depending on their availability.

Please note that I, the author of this solution (not of the take-home task itself), spent about 8 hours on the attached solution. Most of that time went into the notes and descriptions and into translating them into English as well as I could.