A research paper recommendation engine that uses text embeddings and Gaussian Process Regression (GPR) to suggest relevant papers based on your preferences.
Paper Recommender helps researchers discover relevant papers by:
- Fetching recent papers from arXiv
- Using text embeddings to represent paper content
- Learning your preferences through an onboarding process
- Recommending papers using a variance-based Gaussian Process Regression model
- Balancing exploration and exploitation to improve recommendations over time
Rather than predicting ratings directly, the GPR model learns the expected variance between ratings as a function of similarity. This yields more robust recommendations with better-calibrated uncertainty estimates.
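As a rough illustration of the idea (not the project's actual implementation), inverse-variance weighting of neighbor ratings might look like this; `predict_rating` and `variance_fn` are hypothetical names, with `variance_fn` standing in for the function the GP learns:

```python
import numpy as np

def predict_rating(similarities, ratings, variance_fn):
    # variance_fn maps a similarity score to the expected variance
    # between ratings at that similarity (what the GP learns).
    variances = np.array([variance_fn(s) for s in similarities])
    weights = 1.0 / np.maximum(variances, 1e-8)  # trust low-variance neighbors more
    weights = weights / weights.sum()
    mean = float(weights @ np.asarray(ratings))
    # Uncertainty of the weighted estimate, assuming independent noise
    std = float(np.sqrt(weights**2 @ variances))
    return mean, std

# Toy variance model: more similar papers disagree less in rating
toy_variance = lambda s: 1.0 - 0.9 * s
mean, std = predict_rating([0.9, 0.5], [5.0, 2.0], toy_variance)
```

Here the highly similar neighbor (similarity 0.9) dominates the prediction because its expected rating variance is low.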
- Python 3.8 or higher
- Ollama for text embeddings
- Clone the repository:

  ```
  git clone https://github.com/yourusername/paper-recommender.git
  cd paper-recommender
  ```

- Install the package:

  ```
  pip install -e .
  ```
The paper recommender uses Ollama with the nomic-embed-text model for generating text embeddings.
- Install Ollama by following the instructions at [ollama.ai](https://ollama.ai)

- Pull the nomic-embed-text model:

  ```
  ollama pull nomic-embed-text
  ```

- Verify the installation (nomic-embed-text should appear in the list):

  ```
  ollama list
  ```

Ensure the Ollama server is running when using Paper Recommender:

```
ollama serve
```
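To sanity-check the setup from Python, a minimal sketch against Ollama's local embeddings endpoint might look like the following; the URL and payload shape follow Ollama's default `/api/embeddings` route, and the helper names are our own:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default port

def build_payload(text, model="nomic-embed-text"):
    # The embeddings endpoint expects a JSON body with "model" and "prompt"
    return json.dumps({"model": model, "prompt": text}).encode("utf-8")

def embed(text, model="nomic-embed-text"):
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(text, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response carries the vector under the "embedding" key
        return json.load(resp)["embedding"]

# embed("Hello, world!") returns a list of floats once `ollama serve` is up
```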
Paper Recommender uses a configuration file located at `~/.paper_recommender/config.json`. The default configuration is created on first run, but you can customize it:

```json
{
  "chroma_db_path": "~/.paper_recommender/chroma_db",
  "model_path": "~/.paper_recommender/gp_model.pkl",
  "embedding_cache_path": "~/.paper_recommender/embedding_cache.pkl",
  "exploration_weight": 1.0,
  "max_samples": 1000,
  "period_hours": 48,
  "random_sample_size": 5,
  "diverse_sample_size": 5,
  "num_recommendations": 5,
  "gp_num_samples": 100,
  "n_nearest_embeddings": 10
}
```
Key configuration parameters:

- `exploration_weight`: Controls the balance between exploration and exploitation (higher values favor exploration)
- `period_hours`: Time window for fetching recent papers from arXiv
- `random_sample_size` and `diverse_sample_size`: Number of papers to show during onboarding
- `gp_num_samples`: Number of GP samples for uncertainty estimation
- `n_nearest_embeddings`: Number of nearest embeddings to use for prediction
On first run, the system will automatically start the onboarding process:
```
paper-recommender
```
The onboarding process helps the system learn your preferences by asking you to rate papers:
```
paper-recommender --onboard
```
During onboarding:
- You'll be presented with papers selected using different strategies (random and diverse)
- You'll rate each paper on a scale of 1-5
- The system will use these ratings to build a recommendation model
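The README doesn't specify how the diverse papers are chosen; one common approach is greedy farthest-point sampling over the embeddings, sketched here under that assumption (`diverse_sample` is a hypothetical name):

```python
import numpy as np

def diverse_sample(embeddings, k):
    # Greedy farthest-point selection: start from paper 0, then repeatedly
    # add the paper farthest from everything selected so far.
    embeddings = np.asarray(embeddings, dtype=float)
    chosen = [0]
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dists))
        chosen.append(nxt)
        # Each paper's distance to the selected set shrinks as the set grows
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen
```

This spreads the sample across the embedding space, so your ratings cover dissimilar topics instead of near-duplicates.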
After onboarding, you can get paper recommendations:
```
paper-recommender --recommend
```
The recommendation process:
- Fetches recent papers from arXiv
- Uses the trained model to predict your ratings
- Presents papers with the highest predicted ratings
- Allows you to rate the recommended papers to improve future recommendations
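A simplified view of how an exploration weight trades predicted rating against uncertainty is an upper-confidence-bound style score; the real model's scoring may differ, and `rank_papers` is a name invented for this sketch:

```python
def rank_papers(predictions, exploration_weight=1.0, top_k=5):
    # predictions: list of (title, predicted_mean, predicted_std) tuples.
    # Score = mean + exploration_weight * std, so uncertain papers get a
    # boost when exploration_weight is high.
    scored = sorted(
        ((mean + exploration_weight * std, title) for title, mean, std in predictions),
        reverse=True,
    )
    return [title for _, title in scored[:top_k]]

papers = [("A", 4.0, 0.1), ("B", 3.5, 1.0), ("C", 4.2, 0.0)]
```

With `exploration_weight=1.0`, the uncertain paper B outranks the safer bets; with `0.0`, the ranking follows predicted means alone.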
If you want to retrain the recommendation model with your latest ratings:
```
paper-recommender --bootstrap
```
```
usage: paper-recommender [-h] [--onboard] [--recommend] [--bootstrap]
                         [--config CONFIG] [--chroma-db-path CHROMA_DB_PATH]
                         [--model-path MODEL_PATH]
                         [--embedding-cache-path EMBEDDING_CACHE_PATH]
                         [--exploration-weight EXPLORATION_WEIGHT]
                         [--max-samples MAX_SAMPLES]
                         [--period-hours PERIOD_HOURS]
                         [--random-sample-size RANDOM_SAMPLE_SIZE]
                         [--diverse-sample-size DIVERSE_SAMPLE_SIZE]
                         [--num-recommendations NUM_RECOMMENDATIONS]

Paper Recommender

optional arguments:
  -h, --help            show this help message and exit
  --onboard             Run onboarding process
  --recommend           Run recommendation process
  --bootstrap           Bootstrap the recommendation model
  --config CONFIG       Path to custom config file
  --chroma-db-path CHROMA_DB_PATH
                        Path to ChromaDB directory
  --model-path MODEL_PATH
                        Path to model pickle file
  --embedding-cache-path EMBEDDING_CACHE_PATH
                        Path to embedding cache file
  --exploration-weight EXPLORATION_WEIGHT
                        Exploration weight for recommendations
  --max-samples MAX_SAMPLES
                        Maximum number of samples for similarity search
  --period-hours PERIOD_HOURS
                        Time period in hours for paper retrieval
  --random-sample-size RANDOM_SAMPLE_SIZE
                        Number of random papers to select during onboarding
  --diverse-sample-size DIVERSE_SAMPLE_SIZE
                        Number of diverse papers to select during onboarding
  --num-recommendations NUM_RECOMMENDATIONS
                        Number of recommendations to show
```
Paper Recommender fetches recent papers from arXiv based on the configured time period. It uses the arXiv API to retrieve paper titles, abstracts, and links.
The system uses the nomic-embed-text model through Ollama to generate vector embeddings for paper content. These embeddings capture the semantic meaning of papers, allowing the system to find similar papers.
Embeddings and ratings are stored in a ChromaDB vector database, enabling efficient similarity search.
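Conceptually, the similarity search boils down to cosine similarity between the query embedding and the stored ones; ChromaDB handles the indexing and storage, so this sketch (with a made-up `nearest` helper) shows only the underlying math:

```python
import numpy as np

def nearest(query, stored, k=3):
    # Normalize so the dot product equals cosine similarity
    q = np.asarray(query, dtype=float)
    m = np.asarray(stored, dtype=float)
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = m @ q
    # Indices of the k most similar stored embeddings, best first
    return np.argsort(-sims)[:k].tolist()
```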
The recommendation system uses a Gaussian Process Regression (GPR) model that:
- Learns to predict the variance between ratings as a function of similarity
- Uses this variance to weight similar papers when making predictions
- Samples the GP model to estimate uncertainty
- Uses N-sigma confidence intervals to balance exploration and exploitation
This approach provides more robust recommendations with better uncertainty estimates compared to directly predicting ratings.
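"Sampling the GP model" to estimate uncertainty can be illustrated with a toy posterior: draw many samples from a multivariate normal and read each paper's uncertainty off the empirical standard deviation. The numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy posterior over two papers' ratings: means and a diagonal covariance
mean = np.array([4.2, 3.1])
cov = np.diag([0.04, 0.25])  # true stds of 0.2 and 0.5

# Draw many samples and estimate each paper's uncertainty empirically
samples = rng.multivariate_normal(mean, cov, size=5000)
empirical_std = samples.std(axis=0)

# An N-sigma confidence interval for paper 0 (here N = 2):
interval = (mean[0] - 2 * empirical_std[0], mean[0] + 2 * empirical_std[0])
```

The exploration weight then determines how far up such an interval the recommender reaches when scoring candidates.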
- Ollama Connection Error: Ensure the Ollama server is running with `ollama serve`
- Missing Model: If you get an error about the nomic-embed-text model, run `ollama pull nomic-embed-text`
- No Recommendations: You may need to rate more papers during onboarding before recommendations can be generated
If you encounter any bugs or have feature requests, please open an issue on the GitHub repository.
This project is licensed under the terms of the LICENSE file included in the repository.
Contributions are welcome! Please feel free to submit a Pull Request.