This repository implements a recommendation system using FunkSVD (Funk Singular Value Decomposition). It predicts user-item ratings based on collaborative filtering, leveraging techniques like gradient descent optimization and bias adjustment.
- FunkSVD Algorithm: A matrix factorization-based method for collaborative filtering.
- Hyperparameter Tuning: Includes options for tuning learning rate, regularization, and other key parameters.
- Mini-batch Gradient Descent: Efficient training with support for Adam optimizer.
- Train-Test Split: Custom utility to split data into training and testing sets.
- Prediction: Generates predictions for user-item pairs, including support for missing data.
- The entry point for the system.
- Automates hyperparameter tuning using grid search.
- Splits data into train and test sets, trains the model, and evaluates performance using RMSE.
- Generates predictions for user-item pairs in `targets.csv` and saves the results to `output3.csv`.
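The entry point evaluates performance with RMSE, as noted above. The snippet below is a minimal sketch of how that metric is typically computed with NumPy; the helper in this repository may be named or structured differently.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between held-out ratings and model predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Example: three held-out ratings vs. their predictions.
print(rmse([4.0, 3.5, 5.0], [3.8, 3.9, 4.6]))  # ≈ 0.35
```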
- Contains the implementation of the FunkSVD algorithm:
  - Handles matrix factorization (the `P` and `Q` matrices).
  - Supports bias adjustments for users and items.
  - Includes mini-batch gradient descent and the Adam optimizer for parameter updates.
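To make the factorization concrete: a biased FunkSVD model predicts a rating as the global mean plus a user bias, an item bias, and the dot product of the user and item latent vectors. The sketch below shows that prediction and one plain (non-Adam) regularized gradient step; the names `P`, `Q`, `bu`, `bi`, `mu`, `lr`, and `lamda` mirror the parameters described in this README, but the repository's implementation may differ in detail.

```python
import numpy as np

def predict(mu, bu, bi, P, Q, u, i):
    # Global mean + user bias + item bias + latent-factor dot product.
    return mu + bu[u] + bi[i] + P[u] @ Q[i]

def sgd_step(r, u, i, mu, bu, bi, P, Q, lr=0.005, lamda=0.02):
    # Regularized gradient-descent update for one observed rating r.
    err = r - predict(mu, bu, bi, P, Q, u, i)
    bu[u] += lr * (err - lamda * bu[u])
    bi[i] += lr * (err - lamda * bi[i])
    P[u], Q[i] = (P[u] + lr * (err * Q[i] - lamda * P[u]),
                  Q[i] + lr * (err * P[u] - lamda * Q[i]))
```

In the mini-batch setting, the same gradients are accumulated over a batch of ratings before the parameters (or the Adam moment estimates) are updated.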
- Handles preprocessing and splitting of the dataset into training and testing sets.
- Reads user-item rating data from `ratings.csv`.
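The repository's exact splitting utility is not reproduced here; the following is a minimal sketch of reading `ratings.csv` (assuming a header row with `UserId:ItemId` and `Rating` columns, per the format described later in this README) and holding out a random fraction for testing.

```python
import numpy as np
import pandas as pd

def train_test_split(df, test_size=0.1, seed=42):
    # Shuffle row indices and hold out a fraction for testing.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    n_test = int(len(df) * test_size)
    return df.iloc[idx[n_test:]], df.iloc[idx[:n_test]]

ratings = pd.read_csv("ratings.csv")  # assumed columns: "UserId:ItemId", "Rating"
ratings[["UserId", "ItemId"]] = ratings["UserId:ItemId"].str.split(":", expand=True)
train_df, test_df = train_test_split(ratings, test_size=0.1)
```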
- Implements the Adam Optimizer, a popular optimization algorithm used in training machine learning models.
- Provides efficient updates for parameters during mini-batch gradient descent.
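For reference, Adam keeps exponentially decaying estimates of the first and second moments of the gradient and uses their bias-corrected values to scale each update. The class below is an illustrative, self-contained sketch and not necessarily the interface used in this repository.

```python
import numpy as np

class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0   # first-moment (mean) estimate
        self.v = 0.0   # second-moment (uncentered variance) estimate
        self.t = 0     # time step

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```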
Ensure you have the following dependencies installed:
```bash
pip install numpy pandas
```
Place a `ratings.csv` file and a `targets.csv` file in the project directory. The `ratings.csv` file should have the format:

```
UserId:ItemId,Rating
```

and the `targets.csv` file should have the format:

```
UserId:ItemId
```
Execute `main.py` to:
- Tune hyperparameters.
- Train the FunkSVD model.
- Generate predictions.
```bash
python3 main.py
```
- During training, the current epoch and loss are also printed to stdout in the format:

  ```
  Current_Epoch/Total_Epochs: Loss
  ```
- Predictions are printed to stdout in the format:

  ```
  UserId:ItemId,Rating
  ```
Key hyperparameters for tuning the model:
- `epochs`: Number of training iterations.
- `lr`: Learning rate for gradient descent.
- `k`: Number of latent factors.
- `batch_size`: Size of mini-batches.
- `lamda`: Regularization strength.
- `test_size`: Fraction of data used for testing.
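The grid-search values used by the tuning step are not listed in this README; the sketch below only illustrates how such a search could iterate over these hyperparameters, with placeholder values chosen for the example.

```python
from itertools import product

# Hypothetical search space; the actual grid in main.py may differ.
param_grid = {
    "epochs": [20, 50],
    "lr": [0.001, 0.005],
    "k": [10, 20],            # number of latent factors
    "batch_size": [256, 512],
    "lamda": [0.02, 0.1],     # regularization strength
}

for combo in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), combo))
    # Train a FunkSVD model with `params`, evaluate RMSE on the held-out split,
    # and keep the combination with the lowest error.
    print(params)
```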