Task

Predict top-3 most relevant (interesting) films for each user in dataset.

Solution

In src/ you can find the full realization of collaborative filtering using only MapReduce Hadoop streaming and sometimes numpy. There are 7 mappers and 7 reducers. To execute them run each run*.sh in run/. Don't forget to change your home directory.

For this job google cloude services were used. In the free trial was created a cluster: 1 NameNode with 2 CPU and 3 Workers with 2 CPU each.

Data

The dataset was loaded from here https://grouplens.org/datasets/movielens/latest/.
The answer is placed in "answer.txt".

P.S. unfortunatly here is only the result of prediction for small dataset.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Task

Solution

Data

Files

README.md

Latest commit

History

README.md

File metadata and controls

Task

Solution

Data