Bachelor of Engineering thesis by Paweł Rzepiński and Ryszard Szymański, supervised by Agnieszka Jastrzębska, Ph.D. Eng. The objective was to develop a book recommender system based on a novel dataset. Both collaborative-filtering and content-based approaches were considered. The implemented recommendation models are accessible through a web application that lets users explore the dataset and compare the results of both approaches: a "Similar books to X" panel presenting items similar to the selected book, and a "You may also like X, Y, Z" panel containing recommendations based on the books rated by the selected user.
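For intuition, here is a minimal sketch of the collaborative-filtering side ("You may also like ..."), using the Surprise library credited below. It assumes the goodbooks-10k `ratings.csv` layout (`user_id`, `book_id`, `rating` columns) and is an illustration, not the thesis implementation:

```python
import pandas as pd
from surprise import SVD, Dataset, Reader

# goodbooks-10k ratings: one row per (user_id, book_id, rating) triple
ratings = pd.read_csv("data/raw/ratings.csv")

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[["user_id", "book_id", "rating"]], reader)

# Train a matrix-factorization model on all available ratings
algo = SVD()
algo.fit(data.build_full_trainset())

# Recommend: score every book the user has not rated, keep the top 3
user_id = 1
seen = set(ratings.loc[ratings.user_id == user_id, "book_id"])
candidates = [b for b in ratings.book_id.unique() if b not in seen]
top3 = sorted(candidates, key=lambda b: algo.predict(user_id, b).est, reverse=True)[:3]
print("You may also like:", top3)
```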
A full showcase video is available on Google Drive.
The `thesis` folder contains both the thesis and its abstract.
Documentation of the recommendation module can be found in the `docs` folder. The main page is located at `docs/_build/html/index.html`.
├── Makefile             <- Makefile with commands like `make data`, `make models`, `make scores`.
├── README.md            <- The top-level README for developers using this project.
├── data
│   ├── external         <- Data from third-party sources.
│   ├── interim          <- Intermediate data that has been transformed.
│   ├── processed        <- The final, canonical data sets for modeling.
│   └── raw              <- The original, immutable data dump.
│
├── docs                 <- Codebase documentation.
│
├── models               <- Trained and serialized models, model predictions.
│
├── notebooks            <- Jupyter notebooks. Naming convention is a number (for ordering),
│                           the creator's initials, and a short `-`-delimited description, e.g.
│                           `1-rzepinskip-initial-data-exploration`.
│
├── references           <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports              <- Generated analysis as PDF files.
│   └── figures          <- Generated graphics and figures used in reporting.
│
├── requirements.txt     <- The requirements file for the web application.
├── requirements-dev.txt <- The requirements file for reproducing the analysis environment.
│
├── setup.py             <- Makes the project installable with pip.
└── booksuggest          <- Source code of the recommendation module.
    ├── data             <- Scripts to download or generate data.
    │
    ├── features         <- Scripts to turn raw data into features for modeling.
    │
    ├── models           <- Scripts to train models and then use trained models to make
    │                       predictions.
    │
    └── evaluation       <- Scripts to evaluate scores and validate results against ground-truth data.
All commands mentioned below should be run from the project's root folder. Run `make help` to display information about the available commands.

Prerequisites:

- a UNIX-based system
- GNU Make
- Python 3.7
- pip
To run the web application:

- Create a virtual environment: `make create_environment`
- Activate the virtual environment: `source rs-venv/bin/activate`
- Install the packages required by the web application: `make app_requirements`
- Run the app: `make app`
- Open the address displayed in the console; the web application should be accessible at http://127.0.0.1:8050/.
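The default address (port 8050) is characteristic of Dash. Purely as an illustration of what such a "Similar books to X" panel can look like, here is a hypothetical minimal Dash sketch with hard-coded data, not the project's actual code:

```python
from dash import Dash, Input, Output, dcc, html

# Hypothetical precomputed results: selected title -> similar titles
SIMILAR = {"Dune": ["Hyperion", "Foundation"], "Emma": ["Persuasion"]}

app = Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(id="book",
                 options=[{"label": t, "value": t} for t in sorted(SIMILAR)]),
    html.H3("Similar books"),
    html.Ul(id="similar"),  # filled in by the callback below
])

@app.callback(Output("similar", "children"), Input("book", "value"))
def show_similar(title):
    return [html.Li(t) for t in SIMILAR.get(title, [])]

if __name__ == "__main__":
    app.run_server(debug=True)  # serves on http://127.0.0.1:8050/ by default
```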
To reproduce the analysis:

- Create a virtual environment: `make create_environment`
- Activate the virtual environment: `source rs-venv/bin/activate`
- Install the packages required for development: `make requirements`
- Download the raw data: `make data`
- Train the models: `make models`
- Evaluate the models: `make scores`
Comments:

- When using the whole dataset, the `make models` command takes about 20 minutes and `make scores` takes more than 12 hours.
- To check the pipeline on a small subset of the data, add the `TEST_RUN=1` parameter when running make commands; the whole process should then take about 5 minutes. Example: `make scores TEST_RUN=1`
- To utilize make's parallelization, use the `-j <n_jobs>` parameter, where `<n_jobs>` specifies the number of parallel jobs to run. Most often, `n_jobs` should equal the number of processor cores, although there are also some RAM requirements when using the whole dataset. Example: `make scores -j 2`
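For a feel of what the content-based side ("Similar books to X") computes, here is a minimal sketch using TF-IDF and cosine similarity over book descriptions; the sample data is invented, and the exact features used in the thesis may differ:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical input: one row per book with a free-text description
books = pd.DataFrame({
    "title": ["Dune", "Hyperion", "Pride and Prejudice"],
    "description": [
        "Desert planet, spice, political intrigue, messianic hero.",
        "Pilgrims travel to the Time Tombs on a far-future world.",
        "Manners, marriage and misunderstandings in Regency England.",
    ],
})

# Turn each description into a TF-IDF vector
tfidf = TfidfVectorizer(stop_words="english")
vectors = tfidf.fit_transform(books["description"])

# Cosine similarity between every pair of book descriptions
sim = cosine_similarity(vectors)

# "Similar books to X": rank the other books by similarity to row 0 (Dune)
order = sim[0].argsort()[::-1]
print([books["title"].iloc[i] for i in order if i != 0])
```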
- Dataset used in the project: goodbooks-10k.
- Recommendation methods come mostly from the Surprise library.
- Project structure based on the Cookiecutter Data Science project template.
- Thesis based on the latex-mimosis template by Bastian Rieck.