Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add Nemotron CC SDG Pipelines and Pre-processing/Post-Processing Stages (#527)

* Add document splitter and joiner
  Signed-off-by: Ryan Wolf <[email protected]>
* Add support for id field in joiner
  Signed-off-by: Ryan Wolf <[email protected]>
* Fix splitter and joiner
  Signed-off-by: Ryan Wolf <[email protected]>
* Add token count filter
  Signed-off-by: Ryan Wolf <[email protected]>
* Add postprocessing steps for nemotron cc sdg
  Signed-off-by: Ryan Wolf <[email protected]>
* Make left and right bounds optional
  Signed-off-by: Ryan Wolf <[email protected]>
* Add wikipedia rephrasing pipeline
  Signed-off-by: Ryan Wolf <[email protected]>
* Add diverse QA stages
  Signed-off-by: Ryan Wolf <[email protected]>
* Add distillation
  Signed-off-by: Ryan Wolf <[email protected]>
* Add extract knowledge prompt
  Signed-off-by: Ryan Wolf <[email protected]>
* Add knowledge list prompt template
  Signed-off-by: Ryan Wolf <[email protected]>
* Add metadata to knowledge list postprocessor
  Signed-off-by: Ryan Wolf <[email protected]>
* Remove tokenizer from nemotron cc and add docstrings
  Signed-off-by: Ryan Wolf <[email protected]>
* Add API docs and make modules use base class
  Signed-off-by: Ryan Wolf <[email protected]>
* Add tests for new modules
  Signed-off-by: Ryan Wolf <[email protected]>
* Add async nemotron cc and rename classes
  Signed-off-by: Ryan Wolf <[email protected]>
* Add rst section and API docs
  Signed-off-by: Ryan Wolf <[email protected]>
* Address Vibhu and Praateek's reviews
  Signed-off-by: Ryan Wolf <[email protected]>
* Fix splitter and joiner call method
  Signed-off-by: Ryan Wolf <[email protected]>
* Add type hint for cudf
  Signed-off-by: Ryan Wolf <[email protected]>
* Fix typing for cudf
  Signed-off-by: Ryan Wolf <[email protected]>
* Address Lawrence's review
  Signed-off-by: Ryan Wolf <[email protected]>

---------

Signed-off-by: Ryan Wolf <[email protected]>
Showing 25 changed files with 2,611 additions and 3 deletions.