chunkr

A fast chunking library for Rust 🦀

chunkr helps Rust developers build text and language-based applications that work with documents or other text. It splits large documents into smaller chunks without heavy resource usage.

Use chunkr to split large PDF documents into smaller chunks for LLM training and RAG (Retrieval-Augmented Generation) application development.

🚀 Getting Started

To add chunkr to your project and start chunking, use the cargo CLI:

cargo add chunkr

The examples directory contains some examples. Check them out to get started.

To check out the code and build it yourself

Clone the repository and run one of the examples from the examples directory.

git clone https://github.com/d1pankarmedhi/chunkr.git
cd chunkr

🏗️ Examples

Check out these examples to quickly get started:

Chunking

These are some chunking strategy examples:

Chunking by words - Chunk your documents/text by number of words.
Chunking by characters - Chunk your documents/text by number of characters.
Chunk PDF document - Chunk your PDF documents by words/characters.

Run them with cargo, for example:

# cargo run --example example-name chunk-size overlap file-path
cargo run --example chunk_document 1000 20 /home/home/Downloads/clean_code.pdf
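
To make the chunk-size and overlap arguments concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Rust. It is not chunkr's API: the function names `chunk_by_words` and `chunk_by_chars` are hypothetical, and the sketch assumes that overlap means the number of words or characters shared between consecutive chunks, which may differ from how the bundled examples interpret it.

```rust
/// Split `text` into chunks of at most `chunk_size` words.
/// Consecutive chunks share `overlap` words.
/// Hypothetical helper for illustration; not part of chunkr.
fn chunk_by_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let words: Vec<&str> = text.split_whitespace().collect();
    // Each new chunk starts `step` words after the previous one.
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Same idea, counted in characters (Unicode scalar values, not bytes).
/// Hypothetical helper for illustration; not part of chunkr.
fn chunk_by_chars(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

Under these assumptions, `chunk_by_words("one two three four five six seven", 3, 1)` yields three chunks: "one two three", "three four five", and "five six seven". Collecting into a `Vec<char>` keeps the character version from slicing in the middle of a multi-byte UTF-8 sequence, at the cost of an extra allocation.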

💡 Contributing

As an open-source project, we are open to all kinds of contributions, be it through code, documentation, issues, bugs, or even feature suggestions.

Feel free to check out the Contribution guide for more details.

📝 License

This project is licensed under the MIT License. See the LICENSE.md file for details.
