Solution by Kotenkov Igor
This task is about forecasting how many bikes are rented from the TFL (Transport for London) Cycle Hire scheme.
Specifically, a candidate should attempt to answer the question “Can national electrical power generation help estimate how many bikes are hired?”
The idea is that both datasets may be correlated with factors we have no data on (e.g., the weather), so power generation could act as a proxy for them.
Included Data Sources (in the data folder):
- tfl-daily-cycle-hires.xlsx: the daily number of hired bikes. Downloaded from London Datastore;
- electrical_power_data.csv: the daily amount of energy produced, by source. Downloaded from REF using the following URL pattern: https://www.ref.org.uk/fuel/index.php?valdate=2009&tab=dp&share=N (replace “2009” in the URL to get data for later years);
- A candidate may also use other data sources (e.g., the attached Bank Holidays ukbankholidays.csv).
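The URL pattern for the REF data can be sketched in Python as below. This is a minimal illustration, not the download code used in the solution: the helper name `ref_url` and the year range are assumptions, and only the query parameters shown in the URL above are used.

```python
from urllib.parse import urlencode

# Base address taken from the URL pattern quoted in the data source list.
BASE_URL = "https://www.ref.org.uk/fuel/index.php"


def ref_url(year: int) -> str:
    """Build the REF daily-production URL for one year (hypothetical helper)."""
    query = urlencode({"valdate": year, "tab": "dp", "share": "N"})
    return f"{BASE_URL}?{query}"


# The year range here is an assumption for illustration only.
urls = [ref_url(year) for year in range(2009, 2012)]
for url in urls:
    print(url)
```

Each generated URL differs from the original only in the `valdate` value.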
Note: A clear methodology supported by reasonable justifications is more important than an extremely accurate model.
Expected deliverables:
- Some preliminary data exploration;
- A model which predicts TFL Cycle Hire numbers using ONLY the TFL dataset;
- A model which predicts TFL Cycle Hire numbers using both the TFL and electrical power generation datasets.
A candidate should also:
- Give reasons for their choices;
- Outline how/why they selected the features which were used as inputs;
- Evaluate their model(s) through multiple metrics.
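As an illustration of evaluating a model through multiple metrics, here is a minimal sketch using three common regression metrics (MAE, RMSE, MAPE). The choice of metrics is an assumption and not necessarily the set used in the notebook; the numbers are made up for the example and are not taken from the TFL data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in bikes."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up daily hire counts, for illustration only.
actual = [100, 200, 400]
predicted = [110, 190, 380]

print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"MAPE: {mape(actual, predicted):.2f}%")
```

Reporting several metrics side by side helps because they weight errors differently: MAE and RMSE are in the units of the target, while MAPE is relative to the size of each day's count.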
A candidate should produce a Jupyter Notebook with the solution. This repository also contains an HTML version of the notebook, which is easier to open in a browser.
Blockchain.com generally expects candidates to spend around 3 hours on the task, depending on their availability.
Please note that I, the author of this solution (but not of the take-home task itself), spent about 8 hours on it. Most of that time went into the notes and descriptions and into translating them into English as accurately as possible.