
align prompt count per probe #1032

Open
leondz opened this issue Dec 2, 2024 · 0 comments
Labels
architecture (Architectural upgrades) · probes (Content & activity of LLM probes)


leondz commented Dec 2, 2024

Summary

  • mini / default probes should do roughly the same amount of work

Cap probes at a fixed number of requests for the standard version; the full version can be present but inactive.

Let's say that a probe has to issue no more than ~200 prompts in order to be enabled by default. If it issues more, the probe should be disabled by default, and a sub-200-prompt version should be created.

Let's also move from suffixing probes with "Mini" to dropping those suffixes and adding "Full" or "Extended" to the larger versions: "Full" if we've cut activities down to create the small version, "Extended" if the standard version fits in budget and we're just trying more things.
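A rough sketch of that naming convention (the helper function and its names are hypothetical, not garak code):

```python
def sibling_name(base: str, small_was_cut_down: bool) -> str:
    """Name the over-budget sibling of a default probe.

    Hypothetical helper illustrating the proposed convention:
    'Full' when activities were cut to create the small default
    version, 'Extended' when the standard version already fits
    in budget and the big one just tries more things.
    """
    return base + ("Full" if small_was_cut_down else "Extended")

# e.g. a trimmed-down DanInTheWild keeps the bare name, and the
# original big probe becomes DanInTheWildFull
print(sibling_name("DanInTheWild", True))
print(sibling_name("Repeat", False))
```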

Basic example

Here, DanInTheWildMini probably offers more range/value than the encoding probes, but the prompt count is heavily unbalanced in the opposite direction:

dan.DanInTheWildMini                                                     mitigation.MitigationBypass: FAIL  ok on  282/ 500   (failure rate:  43.60%)                           
divergence.Repeat                                                          divergence.RepeatDiverges: FAIL  ok on  158/ 180   (failure rate:  12.22%)                           
divergence.Repeat                                                        mitigation.MitigationBypass: FAIL  ok on   64/ 180   (failure rate:  64.44%)
encoding.InjectAscii85                                                          encoding.DecodeMatch: FAIL  ok on 5698/5700   (failure rate:   0.04%)                           
encoding.InjectAscii85                                                         encoding.DecodeApprox: FAIL  ok on 5682/5700   (failure rate:   0.32%)
encoding.InjectBase16                                                           encoding.DecodeMatch: FAIL  ok on  869/2850   (failure rate:  69.51%)                           
encoding.InjectBase16                                                          encoding.DecodeApprox: FAIL  ok on  823/2850   (failure rate:  71.12%)

5700 >> 500

Let's define how many prompts a default probe should issue. This gives us:

  • a max to keep things efficient on default runs
  • a threshold for when to create a mini probe
  • a min to make sure a new probe is valuable
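The budget rules above could be sketched as follows; the constant names are illustrative, the ~200 cap comes from this proposal, and the minimum is an assumed placeholder since the issue leaves that number open:

```python
# Illustrative sketch only -- names and the floor value are
# assumptions, not actual garak configuration.
DEFAULT_PROMPT_CAP = 200  # max prompts for a probe enabled by default
MIN_PROMPT_COUNT = 10     # assumed floor; the issue doesn't fix one

def enabled_by_default(prompt_count: int) -> bool:
    """True if a probe's prompt count fits the default-run budget."""
    return MIN_PROMPT_COUNT <= prompt_count <= DEFAULT_PROMPT_CAP

# encoding.InjectAscii85 issues 5700 prompts, so it would be
# disabled by default and a sub-200-prompt variant created instead
print(enabled_by_default(180))
print(enabled_by_default(5700))
```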

Once this is addressed, only default/mini probes should appear in default configs.

Motivation

To make inference loads roughly even across probes, so we're not spending a lot of time gathering a little intel.

@leondz leondz added architecture Architectural upgrades probes Content & activity of LLM probes labels Dec 2, 2024
@leondz leondz added this to the 25.02 Efficiency milestone Jan 6, 2025