GitHub Repository Recommender System 📊🚀

Welcome to the GitHub Repository Recommender System! This project fetches data from GitHub repositories, preprocesses it, and recommends repositories to users based on their preferences, using keyword extraction and similarity matching. The sections below explain how to set up, run, and understand the project.

Introduction

This project aims to provide a robust recommender system for GitHub repositories. It involves fetching repository data, preprocessing the data, extracting relevant keywords, and generating recommendations based on similarity metrics.

Features

Data Fetching: Retrieve repository data, README content, and issues/labels from GitHub.
Data Preprocessing: Clean and preprocess the fetched data.
Keyword Extraction: Extract keywords using TF-IDF, LDA, and BERT.
Similarity Calculation: Compute similarity between user preferences and repository features.
Recommendations: Generate and display repository recommendations for users.

Getting Started

Prerequisites

Ensure you have the following installed:

Python 3.7+
Git
A virtual environment (optional but recommended)

Installation

Clone the repository:

```
git clone https://github.com/your-username/github-recommender-system.git
cd github-recommender-system
```

Install the required Python packages:
```
pip install -r requirements.txt
```

Add your GitHub token to the environment:

```
export GITHUB_TOKEN='your_github_token'
```
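The fetch scripts need this token at runtime. Below is a minimal sketch of how a script might load it. It assumes the scripts read the `GITHUB_TOKEN` variable set above and call the GitHub REST API. The helper names `get_github_token` and `auth_headers` are hypothetical and do not come from this repository.

```python
import os


def get_github_token(var_name: str = "GITHUB_TOKEN") -> str:
    """Return the GitHub token from the environment, failing loudly if unset.

    Hypothetical helper: the project's scripts may read the token differently.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; run: export {var_name}='your_github_token'"
        )
    return token


def auth_headers(token: str) -> dict:
    """Build request headers for the GitHub REST API (token sent as a Bearer credential)."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

Checking for the token up front makes a missing token fail with a clear message. Without the check, the script would get confusing 401 responses or run into the much lower unauthenticated rate limit.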

Pipeline Overview

Data Fetching

Fetch Repository Data: Use fetch_repo_data.py to gather repository metadata, README content, languages, and topics.
Fetch Issue Labels: Use fetch_issue_labels.py to scrape issue labels from repository pages.
Fetch Trending Repositories: Use fetch_trending_repos.py to get trending repositories based on language and spoken language.
Fetch Trending Metadata: Use fetch_trending_repos_metadata.py to gather metadata for trending repositories.
Fetch Trending Issue Labels: Use fetch_trending_issues_labels.py to scrape issue labels for trending repositories.
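The repository-metadata step can be sketched with the standard library alone. This sketch makes the following assumptions:

- The scripts query the GitHub REST API endpoints `/repos/{owner}/{repo}`, `/repos/{owner}/{repo}/readme` and `/repos/{owner}/{repo}/languages`.
- The query parameters `since` and `spoken_language_code` match the filters on the github.com Trending page.

All function names are hypothetical. The sketch does not cover HTML scraping of the Trending page or of issue labels.

```python
import base64
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Create an authenticated GET request for a GitHub REST API path."""
    return urllib.request.Request(
        API_ROOT + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def repo_endpoints(owner: str, repo: str) -> dict:
    """REST paths for one repository's metadata, README and languages."""
    base = f"/repos/{owner}/{repo}"
    return {
        "metadata": base,                  # description, stars, "topics", ...
        "readme": base + "/readme",        # README body, base64-encoded
        "languages": base + "/languages",  # bytes of code per language
    }


def fetch_json(req: urllib.request.Request) -> dict:
    """Perform the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def decode_readme(payload: dict) -> str:
    """Decode the base64 'content' field returned by the README endpoint."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # b64decode discards the newlines GitHub embeds in the encoded content.
    return base64.b64decode(payload["content"]).decode("utf-8", errors="replace")


def fetch_repo(owner: str, repo: str, token: str) -> dict:
    """Hypothetical aggregate: metadata, topics, languages and README text."""
    eps = repo_endpoints(owner, repo)
    meta = fetch_json(build_request(eps["metadata"], token))
    readme = decode_readme(fetch_json(build_request(eps["readme"], token)))
    languages = fetch_json(build_request(eps["languages"], token))
    return {
        "full_name": meta.get("full_name"),
        "description": meta.get("description"),
        "topics": meta.get("topics", []),
        "languages": list(languages),
        "readme": readme,
    }


def trending_url(language: str = "", spoken_language: str = "", since: str = "daily") -> str:
    """URL of the GitHub Trending page filtered by language and spoken language."""
    url = "https://github.com/trending"
    if language:
        url += "/" + urllib.parse.quote(language.lower())
    params = {"since": since}
    if spoken_language:
        params["spoken_language_code"] = spoken_language
    return url + "?" + urllib.parse.urlencode(params)
```

Only `fetch_json` and `fetch_repo` touch the network. The request-building and decoding helpers can therefore be tested offline.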

Data Preprocessing and Keyword Extraction

Preprocess Data: Clean and preprocess the README content and issues.
Extract Keywords: Use TF-IDF, LDA, and BERT to extract relevant keywords from the README and issues.
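Preprocessing and the TF-IDF part of keyword extraction can be sketched in a few lines. This is a from-scratch illustration, not the project's code. The stop-word list, the tokenizer regex and the function names are assumptions. The IDF uses the smoothed form `ln((1 + n) / (1 + df)) + 1`, the same formula as scikit-learn's `TfidfVectorizer` default. LDA and BERT-based extraction are not shown.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "for", "in", "is",
              "with", "this", "it", "on", "you", "your"}


def preprocess(text: str) -> list:
    """Lowercase, drop fenced code and URLs, tokenize, and remove stop words."""
    text = text.lower()
    text = re.sub(r"`{3}.*?`{3}", " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"https?://\S+", " ", text)                   # URLs
    tokens = re.findall(r"[a-z][a-z0-9+#-]*", text)             # keeps c++, c#, etc.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def tfidf_keywords(docs: list, top_k: int = 5) -> list:
    """Return the top_k terms by TF-IDF score for each tokenized document."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    results = []
    for tokens in docs:
        tf = Counter(tokens)
        total = sum(tf.values()) or 1
        scores = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results
```

Terms that appear in many READMEs, such as "framework", get lower IDF weights. As a result, each repository's keywords favour terms that are distinctive to it.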

Similarity Index Matching

Vectorize Data: Transform the preprocessed data into vectors using TF-IDF.
Compute Similarity: Calculate cosine similarity between user preferences and repository vectors.
Generate Recommendations: Recommend repositories to users based on the highest similarity scores.
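The three steps above (vectorize, compute similarity, rank) can be combined in one small function. This is a minimal sketch under the following assumptions:

- User preferences are available as a list of tokens, for example keywords drawn from repositories the user has starred.
- Each repository is represented by its preprocessed tokens.

`recommend` and the other names are hypothetical. The vectors are sparse dicts weighted with smoothed IDF. Ties are broken by repository name so the ranking is deterministic.

```python
import math
from collections import Counter


def to_vector(tokens: list, idf: dict) -> dict:
    """Weight raw term counts by IDF; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(user_tokens: list, repos: dict, top_n: int = 3) -> list:
    """Rank repositories (name -> tokens) by similarity to the user's tokens."""
    n = len(repos)
    df = Counter()
    for tokens in repos.values():
        df.update(set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    user_vec = to_vector(user_tokens, idf)
    scored = [(name, cosine(user_vec, to_vector(tokens, idf)))
              for name, tokens in repos.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # highest score first
    return scored[:top_n]
```

For example, a user whose preference tokens are `["python", "web"]` gets the repositories sharing both terms ranked above one sharing neither, which scores 0.0. In practice, scikit-learn's `TfidfVectorizer` and `cosine_similarity` would replace the hand-rolled vectors.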
