[Tutorials] Update the SDG tutorial and expose the inference endpoint (NVIDIA#301)

This PR ensures that users can run the PEFT SDG tutorial using arbitrary API endpoints by exposing the URL that is used for synthetic data generation.

Signed-off-by: Mehran Maghoumi <[email protected]>
sarahyurick · Oct 21, 2024 · 4ad1a4d
1 parent 94d41ee
commit 4ad1a4d
Showing 2 changed files with 40 additions and 19 deletions.
diff --git a/tutorials/peft-curation-with-sdg/README.md b/tutorials/peft-curation-with-sdg/README.md
@@ -48,45 +48,49 @@ showcased in this code:
 
 * In order to run the data curation pipeline with semantic deduplication enabled, you would need an
 NVIDIA GPU.
-* To generate synthetic data, you would need a synthetic data generation model compatible with the OpenAI API. Out of the box, this tutorial supports the following model through the [build.nvidia.com](https://build.nvidia.com) API gateway:
+* To generate synthetic data, you would need a synthetic data generation model compatible with the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). Out of the box, this tutorial supports the following model through the [build.nvidia.com](https://build.nvidia.com) API gateway:
   * [Nemotron-4 340B Instruct](https://build.nvidia.com/nvidia/nemotron-4-340b-instruct)
   * [LLaMa 3.1 405B Instruct](https://build.nvidia.com/meta/llama-3_1-405b-instruct)
-* For assigning qualitative metrics to the generated records, you would need a reward model compatible with the OpenAI API (such as the [Nemotron-4 340B Reward](https://build.nvidia.com/nvidia/nemotron-4-340b-reward) model).
+* For assigning qualitative metrics to the generated records, you would need a reward model compatible with the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction) (such as the [Nemotron-4 340B Reward](https://build.nvidia.com/nvidia/nemotron-4-340b-reward) model).
 
-> **Note:** A valid [build.nvidia.com](https://build.nvidia.com) API key is required to use any of the above models.
+> **Note:** A valid [build.nvidia.com](https://build.nvidia.com) API key is required to use any of the above models. You can obtain a free API key by visiting [build.nvidia.com](https://build.nvidia.com) and creating an account with your email address.
 
 ## Usage
 After installing the NeMo Curator package, you can simply run the following commands:
 ```bash
 # Running the basic pipeline (no GPUs or external LLMs needed)
 python tutorials/peft-curation-with-sdg/main.py
 
-# Run with synthetic data generation and semantic deduplication
+# Running with synthetic data generation and semantic deduplication using
+# an external LLM inference endpoint located at "https://api.example.com/v1/chat/completions"
+# and the model called "my-llm-model" that is served at that endpoint:
 python tutorials/peft-curation-with-sdg/main.py \
-    --api-key YOUR_BUILD.NVIDIA.COM_API_KEY \
+    --synth-gen-endpoint https://api.example.com/v1/chat/completions \
+    --synth-gen-model my-llm-model \
+    --api-key API_KEY_FOR_LLM_ENDPOINT \
     --device gpu
 
 # Here are some examples that:
-# - Use the GPU and enable semantic deduplication
+# - Use the specified model from build.nvidia.com for synthetic data generation
 # - Do 1 round of synthetic data generation
 # - Generate synthetic data using 0.1% of the real data
-# - Use the specified model from build.nvidia.com for synthetic data generation
+# - Use the GPU and enable semantic deduplication
 
 # Using LLaMa 3.1 405B:
 python tutorials/peft-curation-with-sdg/main.py \
     --api-key YOUR_BUILD.NVIDIA.COM_API_KEY \
-    --device gpu \
+    --synth-gen-model "meta/llama-3.1-405b-instruct" \
     --synth-gen-rounds 1 \
     --synth-gen-ratio 0.001 \
-    --synth-gen-model "meta/llama-3.1-405b-instruct"
+    --device gpu
 
 # Using Nemotron-4 340B:
 python tutorials/peft-curation-with-sdg/main.py \
     --api-key YOUR_BUILD.NVIDIA.COM_API_KEY \
-    --device gpu \
+    --synth-gen-model "nvidia/nemotron-4-340b-instruct" \
     --synth-gen-rounds 1 \
     --synth-gen-ratio 0.001 \
-    --synth-gen-model "nvidia/nemotron-4-340b-instruct"
+    --device gpu
 ```
 
 By default, this tutorial will use at most 8 workers to run the curation pipeline. If you face any
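
The commands in the README hunk above rely on three inputs being present: an endpoint, a model name, and an API key. As a rough illustration only, the following self-contained sketch shows one way such a gate could be written with `argparse`. The helper names `build_parser` and `effective_rounds` are hypothetical and are not part of the tutorial; the endpoint and model defaults are copied from the `main.py` change in this commit, while the defaults for `--api-key` and `--synth-gen-rounds` are assumptions, since the diff does not show them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with only the flags used in the commands above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--synth-gen-endpoint",
        type=str,
        default="https://integrate.api.nvidia.com/v1",
    )
    parser.add_argument(
        "--synth-gen-model",
        type=str,
        default="nvidia/nemotron-4-340b-instruct",
    )
    # Assumed defaults: the tutorial's real defaults for these two flags
    # are not shown in this diff.
    parser.add_argument("--api-key", type=str, default=None)
    parser.add_argument("--synth-gen-rounds", type=int, default=1)
    return parser


def effective_rounds(args: argparse.Namespace) -> int:
    """Return 0 rounds if the endpoint, model, or API key is missing or empty."""
    required = {
        "endpoint": args.synth_gen_endpoint,
        "model": args.synth_gen_model,
        "API key": args.api_key,
    }
    missing = [name for name, value in required.items() if not value]
    for name in missing:
        print(
            f"No synthetic data generation {name} provided. "
            "Skipping synthetic data generation."
        )
    return 0 if missing else args.synth_gen_rounds


# Demo with an explicit argument list instead of sys.argv.
demo_args = build_parser().parse_args(
    ["--api-key", "EXAMPLE_KEY", "--synth-gen-rounds", "2"]
)
print(f"Synthetic data generation rounds: {effective_rounds(demo_args)}")
```

Collecting the three required values in one mapping keeps the skip messages uniform and makes it harder to mistype the attribute that disables generation, which is the kind of slip the removed `args.synth_gen_round = 0` line in the `main.py` hunk appears to be.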

diff --git a/tutorials/peft-curation-with-sdg/main.py b/tutorials/peft-curation-with-sdg/main.py
@@ -242,16 +242,28 @@ def run_pipeline(args, jsonl_fp):
     Returns:
         The file path to the final curated JSONL file.
     """
-    # Disable synthetic data generation if no model specified, or no API key is provided.
-    if args.synth_gen_model is None or args.synth_gen_model == "":
+    # Disable synthetic data generation if the necessary arguments are not provided.
+    if not args.synth_gen_endpoint:
+        print(
+            "No synthetic data generation endpoint provided. Skipping synthetic data generation."
+        )
+        args.synth_gen_rounds = 0
+    if not args.synth_gen_model:
         print(
             "No synthetic data generation model provided. Skipping synthetic data generation."
         )
-        args.synth_gen_round = 0
-    if args.api_key is None:
-        print("No API key provided. Skipping synthetic data generation.")
+        args.synth_gen_rounds = 0
+    if not args.api_key:
+        print(
+            "No synthetic data generation API key provided. Skipping synthetic data generation."
+        )
         args.synth_gen_rounds = 0
 
+    if args.synth_gen_rounds:
+        print(
+            f"Using {args.synth_gen_endpoint}/{args.synth_gen_model} for synthetic data generation."
+        )
+
     synth_gen_ratio = args.synth_gen_ratio
     synth_gen_rounds = args.synth_gen_rounds
     synth_n_variants = args.synth_n_variants
@@ -277,7 +289,7 @@ def run_pipeline(args, jsonl_fp):
     # Create the synthetic data generator.
     llm_client = AsyncOpenAIClient(
         AsyncOpenAI(
-            base_url="https://integrate.api.nvidia.com/v1",
+            base_url=args.synth_gen_endpoint,
             api_key=args.api_key or "",
             timeout=args.api_timeout,
         )
@@ -348,12 +360,17 @@ def run_pipeline(args, jsonl_fp):
 def main():
     parser = argparse.ArgumentParser()
     parser = ArgumentHelper(parser).add_distributed_args()
+    parser.add_argument(
+        "--synth-gen-endpoint",
+        type=str,
+        default="https://integrate.api.nvidia.com/v1",
+        help="The API endpoint to use for synthetic data generation. Any endpoint compatible with the OpenAI API can be used.",
+    )
     parser.add_argument(
         "--synth-gen-model",
         type=str,
         default="nvidia/nemotron-4-340b-instruct",
-        choices=["nvidia/nemotron-4-340b-instruct", "meta/llama-3.1-405b-instruct", ""],
-        help="The model from build.nvidia.com to use for synthetic data generation. Leave blank to skip synthetic data generation.",
+        help="The model from the provided API endpoint to use for synthetic data generation. Leave blank to skip synthetic data generation.",
     )
     parser.add_argument(
         "--synth-gen-ratio",