# RULER dataset

RULER generates synthetic examples to evaluate long-context language models with configurable sequence length (from 4k to 128k tokens) and task complexity. It contains 13 tasks grouped into 4 categories: needle-in-a-haystack, question answering, multi-hop tracing, and aggregation.
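
Once the dataset is published on the Hugging Face Hub (see the next section), it can be loaded per task with the `datasets` library. The sketch below is only illustrative: the repository id is a placeholder and the one-configuration-per-task layout is an assumption, so check the dataset card for the actual names.

```python
from datasets import load_dataset

# Placeholder repository id and configuration name; the actual values are
# listed on the dataset card. "niah_single_1" is one of RULER's
# needle-in-a-haystack tasks, assuming one configuration per task.
ruler = load_dataset("your-org/ruler", "niah_single_1")
print(ruler)  # shows the available splits and their sizes
```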

## Hugging Face dataset

The Hugging Face dataset for RULER can be found here. To reproduce this dataset:

1. Install the RULER repository and download the necessary data files (see *1. Download data* in the RULER README).
2. Copy `generate.sh` from this repository to `$RULER/scripts`, set the `DATA_DIR` variable to the desired location of the RULER data files, and run the script.
3. Run `create_huggingface_dataset.py` with the correct `data_dir` and `repo_id` variables (see the sketch after this list).
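
The actual logic lives in `create_huggingface_dataset.py`; the sketch below only illustrates the general idea of step 3 (reading the JSONL files produced by `generate.sh` and pushing them to the Hub). The directory layout and file names are assumptions, so it is not a drop-in replacement.

```python
from pathlib import Path
from datasets import load_dataset

# Assumed layout: one validation.jsonl per task somewhere under DATA_DIR;
# adjust the glob pattern to the layout generate.sh produced on your machine.
data_dir = Path("/path/to/ruler/data")  # same value as DATA_DIR in generate.sh
repo_id = "your-org/ruler"              # hypothetical Hub repository id

for task_file in sorted(data_dir.glob("**/validation.jsonl")):
    task_name = task_file.parent.name
    ds = load_dataset("json", data_files=str(task_file), split="train")
    # One configuration per task; requires `huggingface-cli login` beforehand.
    ds.push_to_hub(repo_id, config_name=task_name)
```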

Note: by default we use `meta-llama/Meta-Llama-3.1-8B` as the tokenizer, whereas in the original RULER paper the tokenizer depends on the model being evaluated. Results may therefore not be directly comparable to the original RULER benchmark, but since our focus is evaluating a given model at different compression ratios, we believe this simplification is acceptable.
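
Sequence lengths in this dataset are thus counted with the Llama 3.1 tokenizer. A minimal sketch of how one could verify the token count of an example, assuming access to the gated `meta-llama/Meta-Llama-3.1-8B` checkpoint on the Hub:

```python
from transformers import AutoTokenizer

# Loading this tokenizer requires accepting the Llama 3.1 license on the Hub.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

def token_length(text: str) -> int:
    """Number of tokens the text occupies under the Llama 3.1 tokenizer."""
    return len(tokenizer(text, add_special_tokens=False)["input_ids"])

example_context = "..."  # a RULER context string
print(token_length(example_context))
```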