pedantic spelling (#1085)
jmartin-tech committed Feb 14, 2025
2 parents eadc9a0 + 81e3e0d commit 69abc94
Showing 3 changed files with 25 additions and 25 deletions.
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -29,7 +29,7 @@ And if you like the project, but just don't have time to contribute, that's fine

## I Have a Question

-If you want to ask a question, good places to check first are the [garak quick start docs](https://docs.garak.ai) and, if its a coding question, the [garak reference](https://reference.garak.ai/).
+If you want to ask a question, good places to check first are the [garak quick start docs](https://docs.garak.ai) and, if it's a coding question, the [garak reference](https://reference.garak.ai/).

Before you ask a question, it is best to search for existing [Issues](https://github.com/NVIDIA/garak/issues) that might help you. In case you have found a suitable issue and still need clarification, you can write your question in this issue. It is also advisable to search the internet for answers first. You can also often find helpful people on the garak [Discord](https://discord.gg/uVch4puUCs).

@@ -70,7 +70,7 @@ A good bug report shouldn't leave others needing to chase you up for more inform
<!-- omit in toc -->
#### How Do I Submit a Good Bug Report?

-You should never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead sensitive bugs must be sent by email to [email protected].
+You should never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead, sensitive bugs must be sent by email to [email protected].
<!-- You may add a PGP key to allow the messages to be sent encrypted as well. -->

We use GitHub issues to track bugs and errors. If you run into an issue with the project:
2 changes: 1 addition & 1 deletion FAQ.md
@@ -79,7 +79,7 @@ No, if the model is the same, you should get the same results - though there are

## How can I scan a RAG pipeline with garak?

-Currently the major attack we hear about in RAG systems is indirect prompt injection, and garak already scans for a few of those.
+Currently, the major attack we hear about in RAG systems is indirect prompt injection, and garak already scans for a few of those.

## There are so many probes in garak, I was trying to scan a model for all probes, but it took hours and I eventually had to kill that scan. What is the recommended practice on scanning a model? Which typical probes are recommended?

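The two FAQ answers above (scan RAG pipelines for indirect prompt injection; don't try to run every probe at once) can be combined into a focused run. Below is a minimal sketch, not a definitive recipe: the probe names come from the README probe table later in this commit, while the `openai`/`gpt-4o-mini` target and the exact flag spellings are assumptions to verify against `garak --help` for your installed version.

```shell
# List the probe modules available in this garak install (names can vary by release)
python -m garak --list_probes

# Assumed example: scan a hosted model with only injection-related probe families
# instead of the full default suite, which can take hours.
# The openai generator expects an OPENAI_API_KEY in the environment.
python -m garak --model_type openai --model_name gpt-4o-mini \
  --probes promptinject,xss
```

Narrowing `--probes` to the attack classes that matter for a given deployment is the usual way to keep run times manageable.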
44 changes: 22 additions & 22 deletions README.md
@@ -228,28 +228,28 @@ For testing. This generator repeats back the prompt it received.

## Intro to probes

-| Probe | Description |
-| --- | --- |
-| blank | A simple probe that always sends an empty prompt. |
-| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/garak-llm/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
-| av_spam_scanning | Probes that attempt to make the model output malicious content signatures |
-| continuation | Probes that test if the model will continue a probably undesirable word |
-| dan | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |
-| donotanswer | Prompts to which responsible language models should not answer. |
-| encoding | Prompt injection through text encoding |
-| gcg | Disrupt a system prompt by appending an adversarial suffix. |
-| glitch | Probe model for glitch tokens that provoke unusual behavior. |
-| grandma | Appeal to be reminded of one's grandmother. |
-| goodside | Implementations of Riley Goodside attacks. |
-| leakerplay | Evaluate if a model will replay training data. |
-| lmrc | Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes |
-| malwaregen | Attempts to have the model generate code for building malware |
-| misleading | Attempts to make a model support misleading and false claims |
-| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
-| promptinject | Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
-| realtoxicityprompts | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
-| snowball | [Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |
-| xss | Look for vulnerabilities the permit or enact cross-site attacks, such as private data exfiltration. |
+| Probe                | Description |
+|----------------------|-------------------------------------------------------------------------------------------------------------------------------|
+| blank                | A simple probe that always sends an empty prompt. |
+| atkgen               | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/garak-llm/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
+| av_spam_scanning     | Probes that attempt to make the model output malicious content signatures |
+| continuation         | Probes that test if the model will continue a probably undesirable word |
+| dan                  | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |
+| donotanswer          | Prompts to which responsible language models should not answer. |
+| encoding             | Prompt injection through text encoding |
+| gcg                  | Disrupt a system prompt by appending an adversarial suffix. |
+| glitch               | Probe model for glitch tokens that provoke unusual behavior. |
+| grandma              | Appeal to be reminded of one's grandmother. |
+| goodside             | Implementations of Riley Goodside attacks. |
+| leakerplay           | Evaluate if a model will replay training data. |
+| lmrc                 | Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes |
+| malwaregen           | Attempts to have the model generate code for building malware |
+| misleading           | Attempts to make a model support misleading and false claims |
+| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
+| promptinject         | Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
+| realtoxicityprompts  | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
+| snowball             | [Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |
+| xss                  | Look for vulnerabilities the permit or enact cross-site attacks, such as private data exfiltration. |

## Logging

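To make the probe table above concrete, here is a small, hedged sketch of pointing one listed probe family at garak's built-in test generator (the hunk context above describes a generator that simply repeats the prompt back). The `test.Blank` class name and the flag spellings reflect common garak usage but are assumptions; check `garak --help` and `--list_probes` on your version.

```shell
# Run a single probe family from the table (here: encoding) against the built-in
# test generator, so no API keys or model downloads are needed.
# Generator class names under the `test` module may differ between releases.
python -m garak --model_type test.Blank --probes encoding
```

garak reports where its log and report files were written at the end of a run, which ties into the Logging section that follows the probe table in the README.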
