
align prompt count per probe #1032

Open
leondz opened this issue Dec 2, 2024 · 0 comments
Labels
architecture (Architectural upgrades) · probes (Content & activity of LLM probes)


leondz commented Dec 2, 2024

Summary

  • mini / default probes should do roughly the same amount of work

Cap probes at a fixed number of requests for the standard version; the full version can be present but inactive.

Let's say that a probe has to issue no more than ~200 prompts in order to be enabled by default. If it issues more, the probe should be disabled by default, and a sub-200-prompt version should be created.

Let's also move from suffixing probes with "Mini" to dropping those suffixes and adding "Full" or "Extended" to the larger versions: "Full" if we've cut activities down to create the small version, "Extended" if the standard version fits in budget and we're just trying more things.
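A rough sketch of that naming convention (the helper function and its names are hypothetical, not garak code):

```python
def sibling_name(base: str, small_was_cut_down: bool) -> str:
    """Name the over-budget sibling of a default probe.

    Hypothetical helper illustrating the proposed convention:
    'Full' when activities were cut to create the small default
    version, 'Extended' when the standard version already fits
    in budget and the big one just tries more things.
    """
    return base + ("Full" if small_was_cut_down else "Extended")

# e.g. a trimmed-down DanInTheWild keeps the bare name, and the
# original big probe becomes DanInTheWildFull
print(sibling_name("DanInTheWild", True))
print(sibling_name("Repeat", False))
```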

Basic example

Here, DanInTheWildMini probably offers more range/value than the encoding probes, but the prompt count is heavily unbalanced in the opposite direction:

dan.DanInTheWildMini                                                     mitigation.MitigationBypass: FAIL  ok on  282/ 500   (failure rate:  43.60%)                           
divergence.Repeat                                                          divergence.RepeatDiverges: FAIL  ok on  158/ 180   (failure rate:  12.22%)                           
divergence.Repeat                                                        mitigation.MitigationBypass: FAIL  ok on   64/ 180   (failure rate:  64.44%)
encoding.InjectAscii85                                                          encoding.DecodeMatch: FAIL  ok on 5698/5700   (failure rate:   0.04%)                           
encoding.InjectAscii85                                                         encoding.DecodeApprox: FAIL  ok on 5682/5700   (failure rate:   0.32%)
encoding.InjectBase16                                                           encoding.DecodeMatch: FAIL  ok on  869/2850   (failure rate:  69.51%)                           
encoding.InjectBase16                                                          encoding.DecodeApprox: FAIL  ok on  823/2850   (failure rate:  71.12%)

5700 >> 500

Let's define how many prompts a default probe should issue. This gives us:

  • a max to keep things efficient on default runs
  • a threshold for when to create a mini probe
  • a min to make sure a new probe is valuable
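The budget rules above could be sketched as follows; the constant names are illustrative, the ~200 cap comes from this proposal, and the minimum is an assumed placeholder since the issue leaves that number open:

```python
# Illustrative sketch only -- names and the floor value are
# assumptions, not actual garak configuration.
DEFAULT_PROMPT_CAP = 200  # max prompts for a probe enabled by default
MIN_PROMPT_COUNT = 10     # assumed floor; the issue doesn't fix one

def enabled_by_default(prompt_count: int) -> bool:
    """True if a probe's prompt count fits the default-run budget."""
    return MIN_PROMPT_COUNT <= prompt_count <= DEFAULT_PROMPT_CAP

# encoding.InjectAscii85 issues 5700 prompts, so it would be
# disabled by default and a sub-200-prompt variant created instead
print(enabled_by_default(180))
print(enabled_by_default(5700))
```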

Once this is addressed, only default/mini probes should appear in default configs.

Motivation

To make inference loads roughly even across probes, so we're not spending a lot of time gathering a little intel.

@leondz leondz added architecture Architectural upgrades probes Content & activity of LLM probes labels Dec 2, 2024
@leondz leondz added this to the 25.02 Efficiency milestone Jan 6, 2025