
Examples of or recommendations for good adaptive evaluations #11

Open
ftramer opened this issue Feb 20, 2019 · 7 comments


ftramer commented Feb 20, 2019

First, thanks for the great work in setting up this document!

The checklist and detailed explanations in Sections 3-5 seem to mostly cover recommendations for how to evaluate defenses using currently known (presumably non-adaptive) attacks.
These are of course extremely valuable, as they are rarely followed rigorously in papers today.

Yet even if they were followed, I think many defenses (especially ones that merely detect adversarial examples) could pass such a stringent evaluation if the attack is not properly adapted to the defense. The current paper does touch on adaptive attacks, but doesn't give much more advice beyond "use adaptive attacks".

I wonder whether we could start a discussion on some general principles that make up a good adaptive attack. In my personal experience, creating adaptive attacks has often been quite an ad-hoc process, and I've encountered a few papers whose evaluations claim to be adaptive yet remain unconvincing. So if anyone has principles or guidelines they've found useful in the past for creating good adaptive attacks, it would be great to hear about them.

At the very least, I think it would be worthwhile for the paper to explicitly point to and discuss works that are believed to have performed a good adaptive analysis (there's of course a bunch of attack papers in this list, but identifying some recent defense papers that seem to do the right thing would be very useful for readers).

@earlenceferns

Great comment. I agree that trying to see what might be common to adaptive attacks could be useful. I guess one thing already mentioned in the paper is to use approximations for non-differentiable operations (i.e., the attacker is adapting to the non-differentiable part of the net). Maybe it makes sense to pull this and other ideas out into a better-defined section on common strategies for adaptive attacks.
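
For the non-differentiable case, here's a rough sketch of the kind of thing I have in mind, in the spirit of BPDA. This isn't anyone's actual code; the quantization step and `model` are made-up placeholders standing in for a non-differentiable defense component:

```python
# Rough sketch only (PyTorch). The quantization step and `model` are
# hypothetical placeholders, not taken from any particular defense.
import torch

class DifferentiableQuantize(torch.autograd.Function):
    """Non-differentiable op on the forward pass, identity gradient on the backward pass."""

    @staticmethod
    def forward(ctx, x):
        # The real defense component (e.g., bit-depth reduction): exactly what
        # the defended model sees at inference time.
        return torch.round(x * 255.0) / 255.0

    @staticmethod
    def backward(ctx, grad_output):
        # Approximate the (zero almost everywhere) gradient with the identity,
        # so gradient-based attacks can still back-propagate through the defense.
        return grad_output

def defended_logits(model, x):
    # Forward pass runs the real, non-differentiable pipeline; the backward
    # pass uses the smooth approximation defined above.
    return model(DifferentiableQuantize.apply(x))
```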

carlini commented Feb 21, 2019

Including how to do a good, thorough adaptive analysis would definitely be wonderful. I personally don't know if I have anything concrete to say on the topic, unfortunately. I worry that if we give specific advice, some people will just follow that recipe without thinking more carefully. That's why we focus mainly on giving concrete advice about what not to do, and keep the discussion of how to do good adaptive evaluations fairly broad.

If you can think of anything generally helpful here, that would be great. Including pointers to papers with good evaluations sounds like an excellent idea; they could serve as case studies of how people have done this well in the past.

ftramer commented Feb 21, 2019

The problem I could see with pointing to "good" papers is that it becomes a form of gatekeeping (e.g., most of the papers I can think of are co-written by some of the authors of this paper).

carlini commented Feb 21, 2019

Hm. That's true; I don't want it to seem like we are pushing our own work preferentially. It also might set the stage for a future debate where people ask for their papers to be included in the "defense papers with good evaluations" list.

ftramer commented Feb 21, 2019

Maybe the point worth expanding on is the recommendation "The loss function is changed as appropriate to cause misclassification".

I think the specific point to make here is that you should think hard about why you think your defense works, and then build a new loss function that explicitly attacks this assumption.
I think it would be worth giving a generic example for adversarial example detection, where this problem seems most prevalent. E.g., if your defense is based on the assumption that adversarial examples satisfy some property P (and regular examples don't), then you have to build a continuous, differentiable loss L such that minimizing L is a proxy for changing P (and prove, or at least argue, that this is the case). Then you should apply existing attacks (following all the other guidelines) using L, even if you never used L to train your network.

This suggestion may seem obvious (maybe it is), but at least I think there's relatively little that could go wrong with someone doing this.
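
Very roughly, and with every name here made up purely for illustration (no claim that this is the canonical way to do it), the recipe could look something like this:

```python
# Hypothetical sketch (PyTorch): `model` is the defended classifier and
# `p_proxy` is a continuous, differentiable proxy for the property P that the
# defense relies on (large values = "looks adversarial" to the defense).
# Inputs x are assumed to be in [0, 1] and not require gradients.
import torch
import torch.nn.functional as F

def adaptive_pgd(model, p_proxy, x, y, eps=8/255, alpha=2/255, steps=40, lam=1.0):
    """L-infinity PGD that jointly causes misclassification and suppresses P."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Standard untargeted loss: push the model towards misclassifying x_adv.
        cls_loss = F.cross_entropy(model(x_adv), y)
        # Proxy for P: drive it down so the example looks "natural" to the defense.
        proxy_loss = p_proxy(x_adv).mean()
        loss = cls_loss - lam * proxy_loss  # maximized by the ascent step below
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back to the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0).detach()
    return x_adv
```

The interesting part is the argument that minimizing `p_proxy` really is a proxy for changing P; the loop itself is just standard PGD with one extra term.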

@earlenceferns

Another idea might be to say: if you use another "sentry" ML model as a "detector" or "robustifier", then an attacker might try to adapt and attack that sentry as well, simultaneously fooling both the original model and the sentry model.

ftramer commented Feb 21, 2019

Agreed. What's nice about the above generic formulation is that it can cover your example as a special case: the property P that distinguishes adversarial and natural examples is simply the output of your detector or robustifier.
