Examples of or recommendations for good adaptive evaluations #11
Comments
Great comment. I agree that trying to see what might be common to adaptive attacks could be useful. I guess one thing mentioned in the paper already is to use approximations for non-differentiable operations (i.e., the attacker is adapting to the non-differentiable part of the net). Maybe it makes sense to pull this, and other ideas, out into a more well-defined section on common strategies for adaptive attacks.
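The approximation idea can be sketched with a toy example. Everything here is hypothetical for illustration: a linear "classifier", a bit-depth-reduction preprocessing step standing in for the non-differentiable part of the net, and made-up step sizes. The attacker runs the real quantizer in the forward pass but backpropagates through an identity approximation of it, in the spirit of BPDA-style attacks:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)              # hypothetical linear "classifier": logit(x) = w @ quantize(x)
x = rng.uniform(0.2, 0.8, size=16)   # toy input in [0, 1]

def quantize(z, levels=8):
    # Non-differentiable preprocessing defense (bit-depth reduction);
    # its exact gradient is zero almost everywhere, so a naive
    # gradient attack through it stalls.
    return np.round(z * (levels - 1)) / (levels - 1)

def logit(z):
    return float(w @ quantize(z))

# Adaptive step: forward pass uses the real quantizer, backward pass
# pretends quantize is the identity, so the gradient of the logit
# w.r.t. x is approximated by w itself.
x_adv = x.copy()
for _ in range(50):
    grad = w                                   # identity approximation of d(logit)/dx
    x_adv = np.clip(x_adv - 0.02 * np.sign(grad), 0.0, 1.0)

# The adapted attack drives the logit down even though the defense
# blocks exact gradients.
```

The same pattern applies to any shattered-gradient defense: replace the non-differentiable component with a smooth (or identity) surrogate for the backward pass only.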
Including how to do a good, thorough adaptive analysis would definitely be wonderful. I personally don't know if I have anything concrete to say on the topic, unfortunately. I worry that if we give some advice on it, then it will cause some people to just follow that approach without thinking more carefully. Which is why we focus mainly on giving concrete advice on the things not to do, and just give rather broad discussions of how to do good adaptive evaluations. If you can think of anything generally helpful here, that would be great. Including pointers to papers with good evaluations sounds like an excellent idea, as case studies for how people have done good evaluations in the past.
The problem I could see with pointing to "good" papers is a form of gatekeeping (e.g., most of the papers I can think of are co-written by some of the authors of this paper).
Hm. That's true, I don't want it to seem like we are pushing our own work preferentially. It also might set the stage for a future debate where people ask for their papers to be included in the "defense papers with good evaluations" list. |
Maybe the point worth expanding on is the recommendation "The loss function is changed as appropriate to cause misclassification". I think the specific point to make here is that you should think hard about why you believe your defense works, and then build a new loss function that explicitly attacks that assumption. This suggestion may seem obvious (maybe it is), but at least I think there's relatively little that could go wrong with someone doing this.
Another idea might be to say: if you use another "sentry" ML model as a "detector" or "robustifier", then an attacker might try to adapt and attack that sentry as well, simultaneously fooling the original model and the sentry model.
Agreed. What's nice about the above generic formulation is that it can cover your example as a special case: the property P that distinguishes adversarial and natural examples is simply the output of your detector or robustifier. |
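A minimal sketch of that special case, with toy linear models standing in for both the classifier and the sentry detector. The weights `w` and `v`, the weighting `lam`, and all the numbers are made up for illustration; the point is only the shape of the adaptive loss, where the detector's output plays the role of the property P:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=16)              # hypothetical classifier: logit(x) = w @ x
v = rng.normal(size=16)              # hypothetical sentry detector: score(x) = v @ x
x = rng.uniform(0.2, 0.8, size=16)   # toy input in [0, 1]

def adaptive_loss(z, lam=1.0):
    # The adaptive attacker minimises both terms at once: drive the
    # classifier's logit down (cause misclassification) AND drive the
    # detector's score down (evade detection). The detector output is
    # the property P separating adversarial from natural inputs.
    return float(w @ z + lam * (v @ z))

x_adv = x.copy()
for _ in range(50):
    grad = w + 1.0 * v                         # gradient of the combined loss
    x_adv = np.clip(x_adv - 0.02 * np.sign(grad), 0.0, 1.0)

# Attacking only w would be the non-adaptive mistake: the detector term
# is what makes this an attack on the full defended system.
```

In practice the two terms usually fight each other, so the weight `lam` has to be tuned (or a Lagrangian/binary-search formulation used) per defense.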
First, thanks for the great work in setting up this document!
The checklist and detailed explanations in Sections 3-5 seem to mostly cover recommendations for how to evaluate defenses using currently known (presumably non-adaptive) attacks.
These are of course extremely valuable, as they are rarely followed rigorously in papers today.
Yet even if they were followed, I think many defenses (especially ones that merely detect adversarial examples) could still pass such a stringent evaluation if the attack is not properly adapted to the defense. The current paper does touch on adaptive attacks, but doesn't give much advice beyond "use adaptive attacks".
I wonder whether we could start a discussion on some general principles that make up a good adaptive attack. In my personal experience, creating adaptive attacks has often been quite an ad-hoc process, and I've encountered a few papers that claim an adaptive, yet unconvincing evaluation. So if anyone has some principles or guidelines to share that they've found useful in the past for creating good adaptive attacks, it would be great to hear about them.
At the very least, I think it would be worthwhile for the paper to explicitly point to and discuss works that are believed to have performed a good adaptive analysis (there's of course a bunch of attack papers in this list, but identifying some recent defense papers that seem to do the right thing would be very useful for readers).