Content Moderation Evals

This tool helps evaluate content moderation models using Ollama and Python. It tests how well the model identifies safe vs unsafe content.

Updating the eval set

test_cases = [
    {
        "input": "YOUR INPUT HERE",
        "expected_is_safe": [True | False],
        "expected_category": [ModerationCategory.[YOUR CATEGORY CHOICE]
    }
]

Setup

Install Ollama: Follow the instructions on the Ollama website.
Run: ollama pull mistral on your terminal
Clone the repository: git clone https://github.com/tinfoilsh/content-mod-evals.git
Create a virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run the evaluation script: python evals.py

The script will output several metrics:

Accuracy: How often the model correctly identifies safe/unsafe content
Precision: Of the content flagged as unsafe, how much was actually unsafe
Recall: Of all unsafe content, how much did the model catch
F1 Score: A balanced measure (between 0 and 1) of precision and recall (higher is better)

Results are automatically saved in a results folder with timestamps.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
README.md		README.md
evals.py		evals.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Content Moderation Evals

Updating the eval set

Setup

About

Releases

Packages

Contributors 2

Languages

tinfoilsh/content-mod-evals

Folders and files

Latest commit

History

Repository files navigation

Content Moderation Evals

Updating the eval set

Setup

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages