Add ToxicLanguage validator #422

Merged · 39 commits · Dec 7, 2023

Changes from 37 commits

Commits (39)
afd808e
Add ToxicLanguage validator
thekaranacharya Nov 2, 2023
84a9b5e
Add walkthrough notebook
thekaranacharya Nov 2, 2023
e0d2e8f
Update model name and truncation/padding logic
thekaranacharya Nov 2, 2023
dbed25a
Add unit tests and integration tests
thekaranacharya Nov 2, 2023
3cafdb2
Modify dev requirements for unit test
thekaranacharya Nov 2, 2023
80c9dd3
Update default threshold based on new experiments
thekaranacharya Nov 3, 2023
beb080a
Update docstring with link to W&B project
thekaranacharya Nov 3, 2023
7cd32df
Merge branch 'main' into karan/sensitive-language
thekaranacharya Nov 3, 2023
d22da73
Update setup.py
thekaranacharya Nov 3, 2023
b3a2717
Bugfix
thekaranacharya Nov 3, 2023
664aa75
Strong type value to str and handle empty value
thekaranacharya Nov 3, 2023
5b86bbb
Add check for nltk in validate_each_sentence and else condition in va…
thekaranacharya Nov 3, 2023
ae36c1d
Add check for non-empty value in get_toxicity
thekaranacharya Nov 3, 2023
933f347
Add check for empty results from pipeline
thekaranacharya Nov 3, 2023
e93e628
Type cast to list
thekaranacharya Nov 3, 2023
95a555f
Convert results to list
thekaranacharya Nov 3, 2023
5e85ffd
Remove list type casting
thekaranacharya Nov 3, 2023
d489ed1
Remove extra unnecessary check
thekaranacharya Nov 3, 2023
fa79e8f
Merge branch 'main' into karan/sensitive-language
thekaranacharya Nov 13, 2023
f8828bc
Delete setup.py
thekaranacharya Nov 13, 2023
4d91e59
Update pyproject
thekaranacharya Nov 13, 2023
28b3f5b
Update both poetry files
thekaranacharya Nov 13, 2023
ebde652
Revert "Update both poetry files"
thekaranacharya Nov 13, 2023
12e497f
Only update pyproject without poetry.lock
thekaranacharya Nov 13, 2023
9c415f1
Run poetry lock --no-update
thekaranacharya Nov 13, 2023
e6e4b13
Remove torch dependency
thekaranacharya Nov 13, 2023
cc20815
Probable fix for torch dependency
thekaranacharya Nov 13, 2023
ea0929f
Remove spacing
thekaranacharya Nov 13, 2023
e1ac6fd
Add strong type casting for results from transformers model
thekaranacharya Nov 13, 2023
815dd07
Fix merge conflicts
thekaranacharya Nov 13, 2023
92e99cb
Fix linting
thekaranacharya Nov 13, 2023
8aae6fb
Change casting
thekaranacharya Nov 13, 2023
f71d2dd
fix merge conflicts
Dec 4, 2023
8ee9d61
lint fix
Dec 4, 2023
70af18d
update from bad merge
Dec 4, 2023
cfed130
add validator to init file
Dec 4, 2023
87ba4c7
add pipeline
Dec 4, 2023
89b6d53
Merge branch '0.3.0' into karan/sensitive-language
zsimjee Dec 6, 2023
3500799
ref validated outputs in toxic lang tests
zsimjee Dec 7, 2023
206 changes: 206 additions & 0 deletions docs/examples/toxic_language.ipynb
@@ -0,0 +1,206 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check whether an LLM-generated response contains toxic language\n",
"\n",
"### Using the `ToxicLanguage` validator\n",
"\n",
"This is a simple walkthrough of the `ToxicLanguage` validator. This validator checks whether an LLM-generated response contains toxic language. It uses the pre-trained multi-label model from HuggingFace -`unitary/unbiased-toxic-roberta` to check whether the generated text is toxic. It supports both full-text-level and sentence-level validation.\n"
]
},
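{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only -- not the validator's actual implementation.\n",
"# This shows how a toxicity check against `unitary/unbiased-toxic-roberta`\n",
"# might look with the HuggingFace `transformers` pipeline; the 0.25 threshold\n",
"# mirrors the default noted below, and the exact call shape is an assumption.\n",
"from transformers import pipeline\n",
"\n",
"detector = pipeline(\"text-classification\", model=\"unitary/unbiased-toxic-roberta\")\n",
"\n",
"\n",
"def is_toxic(text: str, threshold: float = 0.25) -> bool:\n",
"    # The multi-label model scores several toxicity labels; use sigmoid\n",
"    # activations and flag the text if any label clears the threshold.\n",
"    scores = detector([text], top_k=None, function_to_apply=\"sigmoid\")[0]\n",
"    return any(item[\"score\"] > threshold for item in scores)\n",
"\n",
"\n",
"is_toxic(\"What a lovely day it has been to roam around the city.\")"
]
},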
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Import the guardrails package\n",
"# and the ToxicLanguage validator\n",
"import guardrails as gd\n",
"from guardrails.validators import ToxicLanguage\n",
"from rich import print"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Create a Guard object with this validator\n",
"# Here, we'll use the default validation method of \"sentence\"\n",
"# and the default threshold of 0.25\n",
"\n",
"guard = gd.Guard.from_string(\n",
" validators=[ToxicLanguage(on_fail=\"fix\")],\n",
" description=\"testmeout\",\n",
")"
]
},
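{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Equivalent explicit form of the defaults noted above: `validation_method`\n",
"# also appears later in this notebook, while the `threshold` keyword name is\n",
"# an assumption based on the default value this walkthrough describes.\n",
"explicit_guard = gd.Guard.from_string(\n",
"    validators=[\n",
"        ToxicLanguage(validation_method=\"sentence\", threshold=0.25, on_fail=\"fix\")\n",
"    ],\n",
"    description=\"testmeout\",\n",
")"
]
},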
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"What a lovely day it has been to roam around the city.\n",
"I love it here!\n",
"</pre>\n"
],
"text/plain": [
"\n",
"What a lovely day it has been to roam around the city.\n",
"I love it here!\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Parse the raw response\n",
"raw_response = \"\"\"\n",
"What a lovely day it has been to roam around the city. I love it here!\n",
"I hate how pathetic and needy you can be sometimes.\n",
"\"\"\"\n",
"output = guard.parse(\n",
" llm_output=raw_response,\n",
")\n",
"\n",
"# Print the output\n",
"print(output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see here, the sentence `I hate how pathetic and needy you can be sometimes.` has been detected as toxic, and hence removed from the validated output.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"I am a Generative AI model that is trained on a large corpus of text.\n",
"This is a very powerful tool for generating new text, but it can also be used to generate text that is offensive or\n",
"hateful.\n",
"</pre>\n"
],
"text/plain": [
"\n",
"I am a Generative AI model that is trained on a large corpus of text.\n",
"This is a very powerful tool for generating new text, but it can also be used to generate text that is offensive or\n",
"hateful.\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Parse the raw response\n",
"raw_response = \"\"\"\n",
"I am a Generative AI model that is trained on a large corpus of text.\n",
"I am shocked by how disgusting and vile you are.\n",
"This is a very powerful tool for generating new text, but it can also be used to generate text that is offensive or hateful.\n",
"\"\"\"\n",
"output = guard.parse(\n",
" llm_output=raw_response,\n",
")\n",
"\n",
"# Print the output\n",
"print(output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, here the sentence `I am shocked by how disgusting and vile you are.` has been detected as toxic, and hence removed from the validated output.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Test with validation method 'full'\n",
"full_guard = gd.Guard.from_string(\n",
" validators=[ToxicLanguage(validation_method=\"full\", on_fail=\"fix\")],\n",
" description=\"testmeout\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"</pre>\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Parse the raw response\n",
"raw_response = \"Stop being such a dumb piece of shit. Why can't you comprehend this?\"\n",
"output = full_guard.parse(\n",
" llm_output=raw_response,\n",
")\n",
"\n",
"# Print the output\n",
"print(output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we're doing validation on the entire text, and toxic language was detected here - hence, the nothing is returned here.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "lang",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
3 changes: 3 additions & 0 deletions guardrails/validators/__init__.py
@@ -36,6 +36,7 @@
from guardrails.validators.similar_to_document import SimilarToDocument
from guardrails.validators.similar_to_list import SimilarToList
from guardrails.validators.sql_column_presence import SqlColumnPresence
from guardrails.validators.toxic_language import ToxicLanguage, pipeline
from guardrails.validators.two_words import TwoWords
from guardrails.validators.upper_case import UpperCase
from guardrails.validators.valid_choices import ValidChoices
@@ -75,10 +76,12 @@
"PIIFilter",
"SimilarToList",
"DetectSecrets",
"ToxicLanguage",
# Validator helpers
"detect_secrets",
"AnalyzerEngine",
"AnonymizerEngine",
"pipeline",
# Base classes
"Validator",
"register_validator",