Predict top-3 most relevant (interesting) films for each user in dataset.
In src/ you can find the full realization of collaborative filtering using only MapReduce Hadoop streaming and sometimes numpy. There are 7 mappers and 7 reducers. To execute them run each run*.sh in run/. Don't forget to change your home directory.
For this job google cloude services were used. In the free trial was created a cluster: 1 NameNode with 2 CPU and 3 Workers with 2 CPU each.
The dataset was loaded from here https://grouplens.org/datasets/movielens/latest/.
The answer is placed in "answer.txt".
P.S. unfortunatly here is only the result of prediction for small dataset.