Repo2JSON is a lightweight API service based on Flask, designed to convert GitHub repository source code into JSON format suitable for Large Language Models (LLM). With configurable constraints such as depth, file count, and file size limits, this tool efficiently extracts specific parts of the repository without cloning the entire repository. Repo2JSON is perfect for developers who need to quickly extract and process repository content.
- Depth Limit: Controls the depth of directory traversal up to a specified maximum depth.
- File Count Limit: Skips directories that exceed a predefined number of files.
- File Size Limit: Ignores files that exceed a specified number of characters.
- License Exclusion: Optionally excludes LICENSE files from the download.
- Rate Limit Handling: Automatically handles GitHub API rate limits and provides feedback on when to retry.
- Binary File Exclusion: Ensures only text files are downloaded, excluding all binary files.
- Hardcoded Token Support: Uses a hardcoded GitHub API token if no token is provided in the request.
- Backend Framework: Flask
- HTTP Requests: Requests
- Proxy: Uses HTTP/HTTPS proxy if configured via environment variable.
- Python 3.x
- Flask
- Requests
- If using HTTP/HTTPS proxy, set the
HTTP_PROXY
environment variable.
-
Clone the repository:
git clone https://github.com/leezhuuuuu/Repo2JSON.git cd Repo2JSON
-
Install required packages:
pip install -r requirements.txt
-
If using a proxy, set the environment variable:
export HTTP_PROXY="http://your-proxy-url:port"
-
Run the application:
python app.py
The application uses multiple configurable parameters to control its behavior:
EXCLUDE_LICENSE
: Set toTrue
to skip LICENSE files.ENABLE_DEPTH_LIMIT
: Enable or disable depth limit.MAX_DEPTH
: Maximum depth of directory traversal.ENABLE_FILE_COUNT_LIMIT
: Enable or disable file count limit per directory.MAX_FILES_PER_DIR
: Maximum number of files allowed before skipping a directory.ENABLE_FILE_SIZE_LIMIT
: Enable or disable file size limit.MAX_FILE_SIZE
: Maximum number of characters allowed before skipping a file.HARDCODED_TOKEN
: Hardcoded GitHub API token to use if no token is provided in the request.
To get a GitHub Token, visit GitHub Tokens Settings and generate a new personal access token. Ensure you select repo
permissions for the token to access private repositories and perform repository-related operations.
Downloads the content of a specified GitHub repository and converts it into JSON format suitable for LLM understanding. Requires repo_url
in the request body and a Bearer token in the Authorization header.
{
"repo_url": "https://github.com/owner/repo"
}
Authorization: Bearer <your_github_token>
Returns a JSON object containing the file tree structure and file contents that meet the specified conditions.
{
"file_tree": {
"dir1": {
"file1.txt": null
}
},
"file_contents": {
"dir1/file1.txt": "File content here..."
}
}
The API returns appropriate HTTP status codes and error messages in case of failures such as invalid requests, rate limit exceeded, or internal server errors.
Contributions are welcome! Please feel free to submit pull requests or open issues to discuss any improvements or new features.
This project is licensed under the GNU License. See the LICENSE file for details.