Curating Datasets for Parameter-Efficient Fine-Tuning

This tutorial demonstrates the usage of NeMo Curator's Python API to curate a dataset for parameter-efficient fine-tuning (PEFT).

In this tutorial, we use the Enron Emails dataset, in which every email carries a classification label. Each record has a subject, a body, and a category (the class label). We demonstrate various filtering and processing operations that can be applied to each record.
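To make the idea of per-record filtering and processing concrete, here is a minimal sketch in plain Python. It does not use NeMo Curator's API and is not the tutorial's actual pipeline. The field names `subject`, `body`, and `category` follow the dataset description above; the helper names (`is_usable`, `normalize`, `curate`), the 5-word threshold, and the sample records are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Assumption: each record is a flat dict with "subject", "body", and "category" keys.
Record = Dict[str, str]


def is_usable(record: Record, min_body_words: int = 5) -> bool:
    """Keep records with a subject, a category, and a body of at least min_body_words words."""
    subject = record.get("subject", "").strip()
    body = record.get("body", "").strip()
    category = record.get("category", "").strip()
    return bool(subject) and bool(category) and len(body.split()) >= min_body_words


def normalize(record: Record) -> Record:
    """Return a copy of the record with whitespace runs collapsed in the text fields."""
    cleaned = dict(record)
    for field in ("subject", "body"):
        cleaned[field] = " ".join(record.get(field, "").split())
    return cleaned


def curate(records: List[Record]) -> List[Record]:
    """Filter first, then normalize the surviving records."""
    return [normalize(r) for r in records if is_usable(r)]


# Hypothetical sample data, not taken from the Enron Emails dataset.
sample = [
    {"subject": "Meeting  notes", "body": "Please review the   attached notes before Friday.", "category": "work"},
    {"subject": "", "body": "No subject line on this one at all.", "category": "work"},
    {"subject": "Hi", "body": "ok", "category": "personal"},
]

curated = curate(sample)
print(curated)
```

Only the first sample record survives: the second has an empty subject and the third has a body that is too short. In the tutorial itself, operations of this kind are expressed through NeMo Curator rather than hand-written helpers.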

Walkthrough

For a detailed walkthrough of this tutorial, please see the following blog post:

Usage

After installing the NeMo Curator package, you can simply run the following command:

python tutorials/peft-curation/main.py

By default, this tutorial uses at most 8 workers to run the curation pipeline. If you run into out-of-memory issues, reduce the number of workers by passing the --n-workers=N argument, where N is the number of workers to spawn.
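As an illustration of how a flag such as --n-workers is typically read in a Python script, here is a minimal sketch using the standard library's `argparse`. This is an assumption about the general pattern, not a copy of the tutorial's `main.py`; only the flag name and the default of 8 come from the text above, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the real script may define its arguments differently."""
    parser = argparse.ArgumentParser(description="Illustrative worker-count parsing.")
    parser.add_argument(
        "--n-workers",
        type=int,
        default=8,  # Matches the documented default of at most 8 workers.
        help="Number of workers to spawn.",
    )
    return parser


# Simulate passing --n-workers=4 on the command line.
args = build_parser().parse_args(["--n-workers=4"])
print(args.n_workers)
```

`argparse` converts the dashes in the flag name to underscores, so the value is read as `args.n_workers`. Omitting the flag falls back to the default.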