Commit

Update README.md
LostRuins authored Aug 24, 2024
1 parent 4595a27 commit e68350b
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion README.md
@@ -11,7 +11,8 @@ Easily view and modify JSON and JSONL datasets for training large language model
 - Supports **searching and filtering** with regex search or simple substrings search
 - Filter multiple samples by **contents, length, and number of turns**. Allows combining multiple queries for composite results.
 - Includes an **N-gram viewer** to inspect selected examples for word frequency and repetition (word cloud)
-- Allows **splitting datasets** by selecting desired subsets with different criteria.
+- Allows **splitting and merging datasets** by selecting desired subsets with different criteria.
+- Allows easy **dataset deduplication**
 - Includes a simple inline editor to modify individual samples or correct typos.
 - Pick individual samples or bulk-combine groups of them to curate your dataset, and **save the results as a new JSON dataset**
 - Fast and efficient, comfortably handles small to medium datasets of a few hundred MB without issue