Task

Predict the top 3 most relevant (interesting) films for each user in the dataset.
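
To make the target concrete, here is a minimal sketch of the final selection step, assuming predicted scores for one user are already available as a dictionary. The function name `top_n` and the `{film_id: score}` layout are hypothetical and are not taken from `src/`.

```python
import heapq


def top_n(scores, n=3):
    """Return the n film ids with the highest predicted score.

    `scores` maps film id -> predicted rating for a single user
    (an assumed layout, not necessarily the one used in src/).
    Ties are broken by the smaller film id so the output is deterministic.
    """
    ranked = heapq.nsmallest(n, scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [film for film, _ in ranked]


# Example: four candidate films for one user.
print(top_n({10: 4.5, 20: 3.0, 30: 5.0, 40: 4.5}))
```

If a user has fewer than three candidate films, the function simply returns all of them.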

Solution

src/ contains a full implementation of collaborative filtering that uses only MapReduce via Hadoop streaming and, in some places, numpy. There are 7 mappers and 7 reducers. To execute them, run each run*.sh script in run/. Don't forget to change the home directory to your own.
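
For readers new to Hadoop streaming, the sketch below shows the general shape of one mapper/reducer pair. It is not one of the 7 stages in src/: the stage itself (mean rating per user), the function names, and the `film:rating` value encoding are assumptions made for illustration. The only thing taken from the dataset is the `userId,movieId,rating,timestamp` column order of the MovieLens ratings file.

```python
from itertools import groupby


def mapper(lines):
    """Turn 'userId,movieId,rating,timestamp' rows into 'user<TAB>film:rating'."""
    for line in lines:
        parts = line.strip().split(",")
        # Skip the CSV header and malformed rows.
        if len(parts) < 3 or parts[0] == "userId":
            continue
        user, film, rating = parts[0], parts[1], parts[2]
        yield f"{user}\t{film}:{rating}"


def reducer(lines):
    """Compute the mean rating per user.

    Relies on the input being sorted by key, which Hadoop streaming
    guarantees between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for user, group in groupby(pairs, key=lambda kv: kv[0]):
        ratings = [float(value.split(":")[1]) for _, value in group]
        yield f"{user}\t{sum(ratings) / len(ratings):.3f}"


# Local dry run that imitates the shuffle/sort step with sorted().
rows = ["userId,movieId,rating,timestamp", "1,10,4.0,0", "1,20,2.0,0", "2,10,5.0,0"]
for out in reducer(sorted(mapper(rows))):
    print(out)
```

In an actual streaming job, each function would live in its own script, read from `sys.stdin`, and print its results to standard output. Keeping them as generators over lines makes it easy to test a stage locally before submitting it to the cluster.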

Google Cloud services were used for this job. A cluster was created under the free trial: 1 NameNode with 2 CPUs and 3 workers with 2 CPUs each.

Data

The dataset was downloaded from https://grouplens.org/datasets/movielens/latest/.
The predictions are stored in "answer.txt".

P.S. Unfortunately, only the prediction results for the small dataset are included here.