
chunkr

A fast, lightweight chunking library for Rust 🦀


chunkr helps Rust developers build text- and language-based applications that work with documents. It splits large documents into smaller chunks without requiring heavy resources.

Use chunkr to split large PDF documents into smaller chunks for LLM training and for RAG (Retrieval-Augmented Generation) application development.
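As a rough illustration of what chunking means here (this is not chunkr's API), the sketch below groups whitespace-separated words into chunks of at most `max_words` words using only the standard library. The function name `chunk_words` and the word-based splitting strategy are assumptions made for illustration.

```rust
/// Hypothetical helper (not part of chunkr): groups whitespace-separated
/// words into chunks of at most `max_words` words each.
fn chunk_words(text: &str, max_words: usize) -> Vec<String> {
    assert!(max_words > 0, "max_words must be positive");
    // Collect word slices; no copies of the text are made at this stage.
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_words) // consecutive, non-overlapping groups of words
        .map(|group| group.join(" "))
        .collect()
}

fn main() {
    let text = "Retrieval augmented generation needs small, focused chunks of text.";
    for (i, chunk) in chunk_words(text, 4).iter().enumerate() {
        println!("chunk {i}: {chunk}");
    }
}
```

Because it only borrows slices of the input and joins each group once, this kind of splitting stays cheap even for long documents.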

🚀 Getting Started

To add chunkr to your project and start chunking, use the Cargo CLI:

cargo add chunkr

The examples directory contains several examples; check them out to get started.

To check out the code and build it yourself:

Clone the repository and run one of the examples from the examples directory.

git clone https://github.com/d1pankarmedhi/chunkr.git
cd chunkr

🏗️ Examples

Check out these examples to quickly get started:

Chunking

These are some chunking strategy examples:

Run them with cargo, for example:

# cargo run --example <example-name> <chunk-size> <overlap> <file-path>
cargo run --example chunk_document 1000 20 /home/home/Downloads/clean_code.pdf
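To make the chunk-size and overlap arguments more concrete, here is a minimal, standard-library-only sketch of fixed-size chunking with overlap. It assumes both values are counted in characters and that each chunk repeats the last `overlap` characters of the previous one; chunkr's actual unit and implementation may differ, and `chunk_with_overlap` is a hypothetical name, not part of chunkr's API.

```rust
/// Hypothetical sketch (not chunkr's implementation): splits `text` into
/// chunks of at most `chunk_size` characters, where each chunk repeats the
/// last `overlap` characters of the previous one.
fn chunk_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    // Work on chars rather than bytes so multi-byte UTF-8 characters are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap; // how far the window advances each time
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}

fn main() {
    // Same idea as `chunk_document 1000 20`, scaled down: chunk size 10, overlap 3.
    let text = "abcdefghijklmnopqrstuvwxyz";
    for chunk in chunk_with_overlap(text, 10, 3) {
        println!("{chunk}");
    }
}
```

With a chunk size of 10 and an overlap of 3, the window advances 7 characters at a time, so consecutive chunks share 3 characters of context. Requiring the overlap to be smaller than the chunk size guarantees the window always moves forward.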

💡 Contributing

As an open-source project, chunkr welcomes all kinds of contributions, whether code, documentation, issues, bug reports, or feature suggestions.

Feel free to check out the Contribution guide for more details.

📝 License

This project is licensed under the MIT License. See the LICENSE.md file for details.