Release v2.0.0 (#64)

What's New:

New Features & Enhancements
- Introduced Multistage Attack: We've added a new `multistage_depth` parameter to the `start_testing()` function, allowing users to specify the depth of a dialogue during testing and enabling more sophisticated and targeted LLM red teaming strategies.
- Refactored Sycophancy Attack: The `sycophancy_test` has been renamed to `sycophancy` and turned into a multistage attack for increased effectiveness in uncovering model vulnerabilities.
- Enhanced Logical Inconsistencies Attack: The `logical_inconsistencies_test` has been renamed to `logical_inconsistencies` and restructured as a multistage attack to better detect and exploit logical weaknesses within language models.
- New Multistage Harmful Behavior Attack: Introducing `harmful_behaviour_multistage`, a more nuanced version of the original harmful behavior attack, designed for deeper penetration testing.
- New System Prompt Leakage Attack: We've developed a new multistage attack, `system_prompt_leakage`, which uses jailbreak examples from a dataset to extract the target model's system prompt.

Improvements & Refinements
- Conducted extensive refactoring for improved code efficiency and maintainability across the framework.
- Made numerous small improvements and optimizations to enhance overall performance and user experience.

---------

Co-authored-by: Timur Nizamov <[email protected]>
Co-authored-by: Nikita Ivanov <[email protected]>
RomiconEZ · Jan 14, 2025 · 0404080
1 parent 35382c6
Showing 48 changed files with 2,377 additions and 1,266 deletions.
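
The release notes above describe `multistage_depth` as the depth of a dialogue during testing. The general idea of a depth-bounded multistage attack can be sketched as follows. This is only an illustration under stated assumptions: the function `run_multistage_dialogue`, the `Reply` type, and the `is_broken` callback are hypothetical names invented for this sketch and are not LLAMATOR's actual API or implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical type for illustration only: a chat participant is modeled
# as a function from an incoming message to a reply.
Reply = Callable[[str], str]


def run_multistage_dialogue(
    attacker: Reply,
    target: Reply,
    first_prompt: str,
    multistage_depth: int,
    is_broken: Callable[[str], bool],
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Alternate target and attacker turns for at most `multistage_depth` rounds.

    Returns (attack_succeeded, history), where history is a list of
    (prompt, answer) pairs in the order they were exchanged.
    """
    history: List[Tuple[str, str]] = []
    prompt = first_prompt
    for _ in range(multistage_depth):
        answer = target(prompt)
        history.append((prompt, answer))
        if is_broken(answer):
            # Stop early once the target's answer is judged a failure.
            return True, history
        # Let the attacker adapt the next prompt to the last answer.
        prompt = attacker(answer)
    return False, history
```

In this sketch a larger depth gives the attacker more turns to adapt to the target's answers, at the cost of more requests per test.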
diff --git a/.bumpversion.cfg b/.bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 1.1.1
+current_version = 2.0.0
 commit = False
 tag = False
 parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+))?

diff --git a/.gitignore b/.gitignore
@@ -87,5 +87,7 @@ report.xml
 
 # CMake
 cmake-build-*/
+
 */artifacts/
 /examples/chrome-data/
+/venv/
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -49,11 +49,11 @@ pre-commit install
 
 ### Run tests
 
-1) Go to tests/test_local_llamator.py
+1) Go to `tests/test_local_llamator.py`.
 
-2) Create .env from .env.example and fill in the necessary fields.
+2) Create `.env` from `.env.example` and fill in the necessary fields.
 
-3) Run the function to perform testing depending on your LLM client
+3) Run the function to perform testing depending on your LLM client.
 
 ## Making Changes
 
@@ -62,21 +62,21 @@ pre-commit install
     ```bash
     git checkout -b your-branch-name
     ```
+
 2. Make your changes to the code and add or modify unit tests as necessary.
 
-3. Run tests again
+3. Run tests again.
 
-4. Commit Your Changes
+4. Commit Your Changes.
 
     Keep your commits as small and focused as possible and include meaningful commit messages.
     ```bash
     git add .
     git commit -m "Add a brief description of your change"
     ```
 
-5. Push the changes you did to GitHub
+5. Push the changes you did to GitHub.
 
-6.
     ```bash
     git push origin your-branch-name
     ```
@@ -86,83 +86,91 @@ pre-commit install
 The easist way to contribute to LLAMATOR project is by creating a new test!
 This can be easily acheived by:
 
-#### 1. Create a Test File
-* Navigate to the attacks directory.
+#### 1. Create a Test File:
+* Navigate to the `attacks` directory.
 * Create a new python file, naming it after the specific attack or the dataset it utilizes.
 
-#### 2. Set Up Your File
+#### 2. Set Up Your File.
 
 The easiest way is to copy the existing attack (py file in the attacks directory)
-and change the elements in it according to your implementation
+and change the elements in it according to your implementation.
 
-#### 3. Creating datasets with texts for attacks
+For multi-stage attack implementation see "What Drives the Multi-stage?" notes in [docs](https://romiconez.github.io/llamator/attacks_description.html).
 
-All files containing attack texts or prompts must be in parquet format.
+#### 3. Creating datasets with texts for attacks.
 
-These files are stored in the attack_data folder.
+All files containing attack texts or prompts must be in `.parquet` format.
 
-#### 3. Add your attack file name to the attack_loader.py file:
-```text
-from .attacks import (
-    dynamic_test,
-    translation,
-    typoglycemia,
-    dan,
+These files are stored in the `attack_data` folder.
+
+#### 3. Add your attack file name to the `attack_loader.py` file:
+```python
+from ..attacks import (  # noqa
     aim,
-    self_refine,
-    ethical_compliance,
-    ucar,
+    base64_injection,
     complimentary_transition,
+    dan,
+    ethical_compliance,
     harmful_behavior,
-    base64_injection
+    linguistic,
+    logical_inconsistencies,
+    past_tense,
+    ru_dan,
+    ru_typoglycemia,
+    ru_ucar,
+    sycophancy,
+    typoglycemia,
+    ucar,
 
     #TODO: YOUR TEST HERE
 )
 ```
 
-#### 4. Add your attack name to the initial_validation.py file:
-```text
+#### 4. Add your attack name to the docstring of `start_testing()` in `main.py` and `initial_validation.py` file:
+```python
 AvailableTests = [
     "aim_jailbreak",
     "base64_injection",
     "complimentary_transition",
     "do_anything_now_jailbreak",
-    "RU_do_anything_now_jailbreak",
     "ethical_compliance",
     "harmful_behavior",
-    "past_tense",
     "linguistic_evasion",
-    "sycophancy_test",
-    "typoglycemia_attack",
+    "logical_inconsistencies",
+    "past_tense",
+    "RU_do_anything_now_jailbreak",
     "RU_typoglycemia_attack",
-    "ucar",
     "RU_ucar",
+    "sycophancy",
+    "typoglycemia_attack",
+    "ucar",
 
     #TODO: YOUR TEST HERE
 ]
 ```
 
-#### 5. Add your attack description to the attack_descriptions.json file:
+#### 5. Add your attack to the `attack_descriptions.json` and `attack_descriptions.md` files.
 
 #### 6. Open a PR! Submit your changes for review by opening a pull request.
 
-## Submitting a pull request
+## Submitting a pull request.
 
-1. Update your branch
+1. Update your branch.
 
    Fetch any new changes from the base branch and rebase your branch.
    ```bash
    git fetch origin
    git rebase origin/main
+   ```
 
-2. Submit a Pull Request
+2. Submit a Pull Request.
 
     Go to GitHub and submit a pull request from your branch to the project main branch.
 
-3. Request Reviews
+3. Request Reviews.
 
     Request reviews from other contributors listed as maintainers. If you receive a feedback - make any necessary changes and push them.
 
-4. Merge
+4. Merge.
 
     Once your pull request is approved, it will be merged into the main branch.
diff --git a/README.md b/README.md
@@ -1,13 +1,18 @@
 # LLAMATOR
 
-## Description 📖
+Red Teaming python-framework for testing chatbots and LLM-systems
 
-Red teaming python-framework for testing vulnerabilities of chatbots based on large language models (LLM). Supports testing of Russian-language RAG systems.
+[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/llamator)](https://pypi.org/project/llamator)
+[![PyPI](https://badge.fury.io/py/llamator.svg)](https://badge.fury.io/py/llamator)
+[![Downloads](https://pepy.tech/badge/llamator)](https://pepy.tech/project/llamator)
+[![Downloads](https://pepy.tech/badge/llamator/month)](https://pepy.tech/project/llamator)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
 ## Install 🚀
 
 ```bash
-pip install llamator==1.1.1
+pip install llamator==2.0.0
 ```
 
 ## Documentation 📚
@@ -16,29 +21,47 @@ Documentation Link: [https://romiconez.github.io/llamator](https://romiconez.git
 
 ## Examples 💡
 
-* 📄 [RAG Chatbot testing via API (RU)](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-api.ipynb)
-* 🧙‍♂️ [Gandalf bot testing via Selenium (RU)](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-selenium.ipynb)
-* 💬 [Telegram bot testing via Telethon (RU)](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-telegram.ipynb)
-* 📱 [WhatsApp client testing via Selenium (ENG)](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-whatsapp.ipynb)
-* 🔗 [LangChain client testing with custom attack (RU)](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-langchain-custom-attack.ipynb)
+* 📄 [RAG bot testing via REST API](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-api.ipynb)
+* 🧙‍♂️ [Gandalf web bot testing via Selenium](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-selenium.ipynb)
+* 💬 [Telegram bot testing via Telethon](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-telegram.ipynb)
+* 📱 [WhatsApp bot testing via Selenium](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-whatsapp.ipynb)
+* 🔗 [LangChain client testing with custom attack](https://github.com/RomiconEZ/llamator/blob/release/examples/llamator-langchain-custom-attack.ipynb)
 
 ## Supported Clients 🛠️
 
 * 🌐 All LangChain clients
 * 🧠 OpenAI-like API
-* ⚙️ Custom Class (Telegram, Selenium, etc.)
+* ⚙️ Custom Class (Telegram, WhatsApp, Selenium, etc.)
 
 ## Unique Features 🌟
 
-* 🛡️ Support for custom attacks from the user
-* 📊 Results of launching each attack in CSV format
-* 📈 Report with attack requests and responses for all tests in Excel format
-* 📄 Test report document available in DOCX format
+* ️🗡 Support for custom attacks from the user
+* 👜 Large selection of attacks on RAG / Agent / Prompt in English and Russian
+* 🛡 Custom configuration of chat clients
+* 📊 History of attack requests and responses in Excel and CSV format
+* 📄 Test report document in DOCX format
+
+## OWASP Classification 🔒
+
+* 💉 [LLM01: Prompt Injection and Jailbreaks](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM01_PromptInjection.md)
+* 🕵 [LLM07: System Prompt Leakage](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM07_SystemPromptLeakage.md)
+* 🎭 [LLM09: Misinformation](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM09_Misinformation.md)
+
+## Community 🌍
+
+* 📣 [Telegram Channel — AI Security Lab](https://t.me/aisecuritylab)
+* 💬 [Telegram Chat — LLAMATOR | AI Red Team Community](https://t.me/llamator)
+
+## Supported by 🚀
+
+* [AI Security Lab ITMO](https://ai.itmo.ru/aisecuritylab)
+* [Raft Security](https://raftds.ru/)
+* [AI Talent Hub](https://ai.itmo.ru/)
 
 ## License 📜
 
 © Roman Neronov, Timur Nizamov, Nikita Ivanov
 
 This project is licensed under the terms of the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International** license. See the [LICENSE](LICENSE) file for details.
 
-[![Creative Commons License](https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
+[![Creative Commons License](https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png)](https://creativecommons.org/licenses/by-nc-sa/4.0/)