Don't report posts from Answer Bot #13922

Closed
Andrew5057 opened this issue Dec 19, 2024 · 4 comments

Comments

@Andrew5057

Is your feature request related to a problem? Please describe.

Currently, SmokeDetector reports posts from Answer Bot if they trip a watchlist or other spam check. That's not optimal; we already know that answers from Answer Bot aren't spam, so we should probably just ignore them.

Describe the solution you'd like

Answer Bot accounts seem to be guaranteed to have an ID of -2, so adding a check for that somewhere is probably the easiest way to go about this. I'm not particularly familiar with the codebase, but maybe at the top of FindSpam.test_post?
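
As a rough illustration of the kind of check meant here, a minimal sketch follows. It assumes the post is available as a dict shaped like a Stack Exchange API answer object (with an `owner.user_id` field) and that Answer Bot always has a user ID of -2, as described above. The names `ANSWER_BOT_USER_ID` and `is_answer_bot_post` are hypothetical and are not part of SmokeDetector; how such a check would actually be wired into `FindSpam.test_post` is not shown.

```python
# Assumption from the issue text: Answer Bot accounts have a user ID of -2.
ANSWER_BOT_USER_ID = -2


def is_answer_bot_post(post: dict) -> bool:
    """Return True if the post's owner looks like Answer Bot.

    `post` is assumed to be a dict shaped like a Stack Exchange API
    answer object; a missing or null owner is treated as "not Answer Bot".
    """
    owner = post.get("owner") or {}
    return owner.get("user_id") == ANSWER_BOT_USER_ID
```

A caller could skip scanning when this returns True. The check deliberately ignores the display name, so a regular user named "Answer Bot" would not be exempted.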

Describe alternatives you've considered

  • We definitely shouldn't whitelist based on the username.
  • The user-whitelist doesn't help; Answer Bot is cross-site, and the posts themselves are tripping Smokey's spam checks (not the username).
@makyen
Contributor

makyen commented Dec 19, 2024

I am resistant to doing this, because the premise upon which it rests is false. We do not "know that answers from Answer Bot aren't spam". We believe that SE doesn't intend them as spam, but there's nothing that actually prevents those answers from, effectively, being spam. An LLM could produce anything. There's nothing that restricts the answer from, effectively, promoting something inappropriately.

@CoconutMacaroon
Contributor

CoconutMacaroon commented Dec 19, 2024

@makyen If we're not going to exclude them from SmokeDetector scanning, consensus should be established on whether or not these posts are tp (for not meeting the referencing requirements) or fp as it's SE-approved. A metasmoke search for posts with a username matching ^Answer Bot$ indicates a lack of consensus (disclaimer: I am responsible for some, but not all, of those TPs).

After re-reading the Feedback Guidance page, I think it should be considered tp for failing to meet the referencing requirements (I don't consider the profile sufficient disclosure). It does have an AI badge, but it still isn't in a blockquote. That said, I wouldn't be opposed to changing policy to state that Answer Bot's posts should be considered FP if they're only genAI and not spam, with rationale that it's SE-approved.

@CoconutMacaroon
Contributor

Answers from the Answer Bot and Answer Assistant experiment, whether private or public, will not be present in the data dumps and will not be available via the API.

Berthold on MSE

As SD will no longer get these, I don't really think there's much for Charcoal to do here.

@jeremyBanks

jeremyBanks commented Feb 5, 2025

Just for some background: on December 19th I mentioned Smoke Detector to staff in a (private) chat about the places that Answer Bot posts were appearing unintentionally:

  1. Smoke Detector also sees these posts, with the markdown content, but that isn't available through any route I'm aware of. From a quick skim at its source code, I think it's getting the post bodies through one of the WebSocket streams, maybe the global Stack Exchange network firehose, but I haven't confirmed whether that actually contains the markdown of the posts, or if it's using some other mechanism I've missed.

re: Smoke Detector: I see now that it's just using the /questions/POST_ID route to get the bodies. Answer Bot's posts seem to be filtered out of the top-level .items in API responses to /questions and /posts and /answers, but they're not filtered out when they appear in the answers field of a question, when the top-level items in the response are questions.
(I'm not sure if it's becoming aware of them by polling /questions, or a websocket, like wss://qa.sockets.stackexchange.com/ the stackexchange.com real-time feed. But the websockets don't contain the full post bodies, only a summary.)
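
To make the nesting concrete, here is a small sketch of how Answer Bot answers could be picked out of an already-parsed `/questions` response. It assumes the usual API wrapper shape (a top-level `items` list of questions, each optionally carrying an `answers` list whose entries have an `owner.user_id`) and the -2 user ID mentioned earlier in this thread. The function name and the sample data are made up for illustration; this is not SmokeDetector's code.

```python
# Assumption: Answer Bot's user ID is -2, as reported in this issue.
ANSWER_BOT_USER_ID = -2


def nested_answer_bot_answers(response: dict) -> list[dict]:
    """Collect answers owned by Answer Bot from the `answers` field of
    each question in a parsed /questions API response."""
    found = []
    for question in response.get("items", []):
        # `answers` is only present when the API filter includes it.
        for answer in question.get("answers", []):
            owner = answer.get("owner") or {}
            if owner.get("user_id") == ANSWER_BOT_USER_ID:
                found.append(answer)
    return found


# Made-up sample response: one question with two nested answers.
sample = {
    "items": [
        {
            "question_id": 1,
            "answers": [
                {"answer_id": 10, "owner": {"user_id": -2}},
                {"answer_id": 11, "owner": {"user_id": 42}},
            ],
        }
    ]
}
bot_answers = nested_answer_bot_answers(sample)
```

With the sample above, `bot_answers` contains only the answer with `answer_id` 10.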

I also directed them where they could search to verify whether future posts were showing up in Smoke Detector or not:

Sounds like you're good but FYI I was checking what it had flagged by searching "Answer Bot" on the pages at https://metasmoke.erwaysoftware.com/posts, where a few have appeared. (It's supposed to group them by user, but I think the negative user ID confuses their code so they aren't grouped.)

So indeed, this is something that Stack Exchange intends to prevent on their end and it shouldn't be necessary to make any changes to SmokeDetector. If they show up again you can probably report it as a bug.

However, if you decide you do want to (somehow) handle public Answer Bot posts in the future, I'd just point out that Smokey doesn't seem to be grouping the posts as being from a single user, presumably because the negative user ID, or absence of an account ID, confuses the relevant logic. So that might be a special case to consider.
