Expand scope of whitelisting to all domain lookups #3006

Closed
tripleee opened this issue May 21, 2019 · 11 comments
Labels
area: spamchecks Detections or the process of testing posts. (No space in the label, is because of Hacktoberfest) status: confirmed Confirmed as something that needs working on. type: feature request Shinies.

Comments

@tripleee
Member

Is your feature request related to a problem? Please describe.

There are a number of domains which routinely trigger FPs because some of the watches are very broad. We want to be able to exclude well-known good sites from these broad watches in order to improve precision and reduce noise.

Describe the solution you'd like

bertieb implemented whitelisting for ASN checks in #2664 and I was thinking already at the time that this should be refactored to govern all domain name checks.

Describe alternatives you've considered

Perhaps this should be coupled with a broader review of FPs so we can disable entire reasons (e.g. individual ASNs which produce too many FPs?) but let's keep this focused on the technical implementation.

Additional context

This has been raised in chat repeatedly over the last couple of weeks. I don't think it should be hard to do.

@tripleee
Member Author

Can't assign to @bertiebaggio explicitly it seems, but he was volunteering to look into this. https://chat.stackexchange.com/transcript/message/50359451#50359451

@tripleee
Member Author

Tangentially related perhaps: #1630

@bertiebaggio
Contributor

Thanks for the ping 😄

Discussion of a more general whitelist came up when considering the ASN whitelist, @makyen's thoughts seem relevant here:

I agree that a full implementation of whitelisting would be beneficial, but then we're talking about affecting lots of different detection reasons. There are also times when we want different whitelists for different detections, and to not share the list, or at least not share some entries between detections. A full implementation gets complex.

Do we have a few representative examples of things we'd like to exclude? I've been away from the Smokey coalface due to a job application recently so have missed some of the chat around this.
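
The concern quoted above (some whitelist entries shared across detections, others specific to one) could be sketched roughly as follows. This is only an illustration, not SmokeDetector code: the names `GLOBAL_WHITELIST`, `PER_REASON_WHITELIST`, `whitelist_for`, and `is_exempt`, the reason strings, and the domains are all hypothetical.

```python
# Hypothetical data: entries shared by every detection reason.
GLOBAL_WHITELIST = {"example.com"}

# Hypothetical data: entries that apply to a single detection reason only.
PER_REASON_WHITELIST = {
    "bad ASN": {"example.net"},
    "watched keyword": {"example.org"},
}


def whitelist_for(reason):
    """Return the shared entries plus the entries specific to this reason."""
    return GLOBAL_WHITELIST | PER_REASON_WHITELIST.get(reason, set())


def is_exempt(domain, reason):
    """True if the domain should be skipped for the given detection reason."""
    return domain.lower() in whitelist_for(reason)
```

Keeping the shared and per-reason sets separate means an entry can be exempted from one broad watch without silencing every other detection for that domain.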

@tripleee
Member Author

Mithrandir pointed out a few in chat last week; searching for where I mentioned "bertieb" should be a quick shortcut, or I can try to provide links tomorrow. Glorfindel mentioned one today, I think xda-develop.com or similar. A search in the FPs would probably be more methodologically sound, similar to what I did for reviewing ASNs today (I think #3007).

@ArtOfCode-
Member

@bertiebaggio check your inbox for an org invite - that should make it possible to actually assign you here.

@angussidney
Member

Related (possibly duplicate?): #490

@bertiebaggio
Contributor

tripleee: Thanks, I'll have a look through chat history

Art: done, thanks!

@tripleee
Member Author

tripleee commented Jun 4, 2019

Pling, any progress?

@angussidney angussidney added area: spamchecks Detections or the process of testing posts. (No space in the label, is because of Hacktoberfest) type: feature request Shinies. labels Jul 9, 2019
@tripleee
Member Author

@stale stale bot added the status: stale label Oct 25, 2019
@stale

stale bot commented Oct 30, 2019

This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.

@stale stale bot closed this as completed Oct 30, 2019
@tripleee tripleee reopened this Oct 30, 2019
@stale stale bot removed the status: stale label Oct 30, 2019
@ArtOfCode- ArtOfCode- added the status: confirmed Confirmed as something that needs working on. label Oct 30, 2019
@ArtOfCode-
Member

ArtOfCode- commented Oct 30, 2019

As of ce83f31, I've added an is_website_whitelisted helper method, and used it in a few checks in findspam.py (often through the is_whitelisted_website method that was already in there - though that only checked a small number of regexes).

The new helper method feeds from the metasmoke API: any domain that's tagged with whitelisted will be excluded from Smokey's domain checks.

We can also add the helper to more findspam checks if we think it's necessary.
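
For readers unfamiliar with the idea, here is a minimal sketch of how a domain check might consult a whitelist before flagging a post. It is not the code from ce83f31: the function names, the `WHITELISTED_DOMAINS` set (a stand-in for whatever metasmoke returns for domains tagged as whitelisted), and the subdomain-matching rule are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the domains tagged as whitelisted in metasmoke.
WHITELISTED_DOMAINS = {"example.com", "example.org"}


def extract_domain(url):
    """Return the lowercased hostname of a URL, without a leading 'www.'."""
    # urlparse only fills in the hostname when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_domain_whitelisted(domain, whitelist=WHITELISTED_DOMAINS):
    """True if the domain or any of its parent domains is in the whitelist."""
    labels = domain.lower().split(".")
    # Check 'a.b.example.com', then 'b.example.com', then 'example.com', ...
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def filter_domains_for_check(urls, whitelist=WHITELISTED_DOMAINS):
    """Return the domains from urls that a broad check should still examine."""
    domains = (extract_domain(u) for u in urls)
    return [d for d in domains if d and not is_domain_whitelisted(d, whitelist)]
```

A check would then run only on the output of `filter_domains_for_check`, so whitelisted sites never reach the broad watch. Whether subdomains should inherit a parent's whitelisting is a design choice assumed here, not something stated in the thread.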


4 participants