pedantic spelling #1085

Open · wants to merge 1 commit into base: main
CONTRIBUTING.md (4 changes: 2 additions & 2 deletions)
@@ -29,7 +29,7 @@ And if you like the project, but just don't have time to contribute, that's fine

## I Have a Question

-If you want to ask a question, good places to check first are the [garak quick start docs](https://docs.garak.ai) and, if its a coding question, the [garak reference](https://reference.garak.ai/).
+If you want to ask a question, good places to check first are the [garak quick start docs](https://docs.garak.ai) and, if it's a coding question, the [garak reference](https://reference.garak.ai/).

Before you ask a question, it is best to search for existing [Issues](https://github.com/NVIDIA/garak/issues) that might help you. In case you have found a suitable issue and still need clarification, you can write your question in this issue. It is also advisable to search the internet for answers first. You can also often find helpful people on the garak [Discord](https://discord.gg/uVch4puUCs).

@@ -70,7 +70,7 @@ A good bug report shouldn't leave others needing to chase you up for more information
<!-- omit in toc -->
#### How Do I Submit a Good Bug Report?

-You should never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead sensitive bugs must be sent by email to [email protected].
+You should never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead, sensitive bugs must be sent by email to [email protected].
<!-- You may add a PGP key to allow the messages to be sent encrypted as well. -->

We use GitHub issues to track bugs and errors. If you run into an issue with the project:
FAQ.md (2 changes: 1 addition & 1 deletion)
@@ -79,7 +79,7 @@ No, if the model is the same, you should get the same results - though there are

## How can I scan a RAG pipeline with garak?

-Currently the major attack we hear about in RAG systems is indirect prompt injection, and garak already scans for a few of those.
+Currently, the major attack we hear about in RAG systems is indirect prompt injection, and garak already scans for a few of those.

## There are so many probes in garak, I was trying to scan a model for all probes, but it took hours and I eventually had to kill that scan. What is the recommended practice on scanning a model? Which typical probes are recommended?
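
One reasonable pattern is to start small: pick a few probe families that match your threat model instead of running everything, and for RAG pipelines the injection-oriented families are a natural first subset. As a rough sketch, assuming the command-line flags from recent garak releases (check `garak --help` for your installed version) and a placeholder OpenAI target, a focused run looks something like this:

```
# List every available probe so a relevant subset can be chosen
garak --list_probes

# Run one focused probe family instead of the full battery;
# openai / gpt-3.5-turbo is a placeholder target, not a recommendation
garak --model_type openai --model_name gpt-3.5-turbo --probes promptinject
```

A scan scoped this way finishes in a small fraction of the time of an all-probes run, and further families can be added one at a time as needed.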

README.md (44 changes: 22 additions & 22 deletions)
@@ -228,28 +228,28 @@ For testing. This generator repeats back the prompt it received.

## Intro to probes

-| Probe | Description |
-| --- | --- |
-| blank | A simple probe that always sends an empty prompt. |
-| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/garak-llm/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
-| av_spam_scanning | Probes that attempt to make the model output malicious content signatures |
-| continuation | Probes that test if the model will continue a probably undesirable word |
-| dan | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |
-| donotanswer | Prompts to which responsible language models should not answer. |
-| encoding | Prompt injection through text encoding |
-| gcg | Disrupt a system prompt by appending an adversarial suffix. |
-| glitch | Probe model for glitch tokens that provoke unusual behavior. |
-| grandma | Appeal to be reminded of one's grandmother. |
-| goodside | Implementations of Riley Goodside attacks. |
-| leakerplay | Evaluate if a model will replay training data. |
-| lmrc | Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes |
-| malwaregen | Attempts to have the model generate code for building malware |
-| misleading | Attempts to make a model support misleading and false claims |
-| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
-| promptinject | Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
-| realtoxicityprompts | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
-| snowball | [Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |
-| xss | Look for vulnerabilities that permit or enact cross-site attacks, such as private data exfiltration. |
+| Probe | Description |
+|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| blank | A simple probe that always sends an empty prompt. |
+| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/garak-llm/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
+| av_spam_scanning | Probes that attempt to make the model output malicious content signatures |
+| continuation | Probes that test if the model will continue a probably undesirable word |
+| dan | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |
+| donotanswer | Prompts to which responsible language models should not answer. |
+| encoding | Prompt injection through text encoding |
+| gcg | Disrupt a system prompt by appending an adversarial suffix. |
+| glitch | Probe model for glitch tokens that provoke unusual behavior. |
+| grandma | Appeal to be reminded of one's grandmother. |
+| goodside | Implementations of Riley Goodside attacks. |
+| leakerplay | Evaluate if a model will replay training data. |
+| lmrc | Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes |
+| malwaregen | Attempts to have the model generate code for building malware |
+| misleading | Attempts to make a model support misleading and false claims |
+| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
+| promptinject | Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
+| realtoxicityprompts | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
+| snowball | [Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |
+| xss | Look for vulnerabilities that permit or enact cross-site attacks, such as private data exfiltration. |

Author comment on this change: format table so it presents as intended when not in preview mode

Review comment: While I understand the idea here and readable markdown is a good practice, in this case I would say the extra formatting in the second column does not really add value, as many editors and raw viewers will wrap lines longer than 120 characters or wider than the terminal or window view.

@leondz @erickgalinkin any thoughts on this?
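
To see any of these probes in action without touching a real model, a minimal smoke-test run can target the built-in echoing test generator described earlier in this README. This is a sketch assuming the flag names from recent garak releases; verify them with `garak --help`:

```
# Point garak at the built-in test.Repeat generator, which simply echoes
# each prompt back, and run the encoding probe family from the table above
garak --model_type test.Repeat --probes encoding
```

A run like this exercises the probe, detector, and reporting plumbing end to end before any real model credentials are involved.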

## Logging
