Add tutorials (NVIDIA#164)
Signed-off-by: Ryan Wolf <[email protected]>
ryantwolf authored Jul 23, 2024
1 parent 9452189 commit 2681d48
Showing 13 changed files with 5,606 additions and 0 deletions.
9 changes: 9 additions & 0 deletions tutorials/synthetic-preference-data/README.md
@@ -0,0 +1,9 @@
# Synthetic Preference Data Generation Using Nemotron-4 340B

The provided notebook demonstrates how to use [Llama 3.1 405B Instruct](https://build.nvidia.com/meta/llama3.1-405b-instruct) and [Nemotron-4 340B Reward](https://build.nvidia.com/nvidia/nemotron-4-340b-reward) through [build.nvidia.com](https://build.nvidia.com/explore/discover).

The notebook walks through the following pipeline:

![image](./SDG%20Pipeline.png)

The pipeline is designed to create a preference dataset suitable for training a custom reward model using the [SteerLM method](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html). Because consecutive responses (e.g. sample 1 with 2, 3 with 4, etc.) share the same prompt, the dataset can also be turned into preference pairs, ranked by the helpfulness score, for training an RLHF reward model or for DPO.
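
As a rough illustration of that pairing step, the sketch below groups consecutive records and uses the helpfulness score to decide which response is preferred. The record keys (`prompt`, `response`, `helpfulness`), the sample values, and the function name `build_preference_pairs` are assumptions made for this example; the notebook's actual output schema may differ.

```python
from typing import Dict, List


def build_preference_pairs(
    samples: List[dict], score_key: str = "helpfulness"
) -> List[Dict[str, str]]:
    """Pair consecutive samples (0 with 1, 2 with 3, ...) by score.

    Assumes each record has hypothetical keys "prompt", "response",
    and a numeric score under `score_key`.
    """
    pairs = []
    # Walk the dataset two records at a time; a trailing odd record is ignored.
    for i in range(0, len(samples) - 1, 2):
        first, second = samples[i], samples[i + 1]
        if first["prompt"] != second["prompt"]:
            raise ValueError(f"Samples {i} and {i + 1} do not share a prompt")
        if first[score_key] == second[score_key]:
            continue  # A tie carries no preference signal, so skip it.
        if first[score_key] > second[score_key]:
            chosen, rejected = first, second
        else:
            chosen, rejected = second, first
        pairs.append(
            {
                "prompt": first["prompt"],
                "chosen": chosen["response"],
                "rejected": rejected["response"],
            }
        )
    return pairs


# Made-up records for illustration only.
samples = [
    {"prompt": "What is DPO?", "response": "A1", "helpfulness": 3.1},
    {"prompt": "What is DPO?", "response": "A2", "helpfulness": 1.2},
    {"prompt": "Define RLHF.", "response": "B1", "helpfulness": 0.5},
    {"prompt": "Define RLHF.", "response": "B2", "helpfulness": 2.0},
]
pairs = build_preference_pairs(samples)
```

The resulting `prompt` / `chosen` / `rejected` layout is a common shape for preference data, but the exact field names expected by a given RLHF or DPO trainer should be checked against that trainer's documentation.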