delete Kafka consumer group when job is dropped #18416

Closed
xxchan opened this issue Sep 5, 2024 · 2 comments · Fixed by #20065

@xxchan
Member

xxchan commented Sep 5, 2024

A user complained that RisingWave created too many consumer groups and exceeded the limit in Kafka. They had to clean up the dead groups manually.

Although we now have only one consumer group per MV (a long time ago it was consumer-{current_time}, then rw-{fragment_id}-{actor_id}, and now {prefix:rw}-{fragment_id}), the number of groups can still grow quickly:
They have ~50 sources, but multiple MVs on each source. So every time they recreate MVs, the number of consumer groups grows considerably.
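To make the growth concrete, here is a toy model (not RisingWave code) of the current {prefix:rw}-{fragment_id} scheme. It assumes that fragment ids are never reused and that nothing deletes a group when its MV is dropped. The class and function names are hypothetical, and the 3 MVs per source and 5 rounds of recreation are made-up numbers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model: fragment ids only grow, and nothing deletes groups."""
    next_fragment_id: int = 1
    kafka_groups: set[str] = field(default_factory=set)

    def create_mv(self, prefix: str = "rw") -> str:
        # Current scheme: one group per MV, named "{prefix}-{fragment_id}".
        group_id = f"{prefix}-{self.next_fragment_id}"
        self.next_fragment_id += 1
        # The group appears implicitly once the MV's Kafka consumer joins it.
        self.kafka_groups.add(group_id)
        return group_id


def recreate_mvs(cluster: Cluster, num_sources: int, mvs_per_source: int, rounds: int) -> None:
    """Create every MV `rounds` times; each DROP leaves its old group behind."""
    for _ in range(rounds):
        for _ in range(num_sources * mvs_per_source):
            cluster.create_mv()


cluster = Cluster()
# 50 sources as in the report; 3 MVs per source and 5 rounds are made-up numbers.
recreate_mvs(cluster, num_sources=50, mvs_per_source=3, rounds=5)
print(len(cluster.kafka_groups))  # 750 groups, only 150 of them still in use
```

Since the group id embeds a fresh fragment id, a recreated MV can never pick up its predecessor's group, so every round of recreation leaves a full set of dead groups behind.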

There are several possible solutions:

  1. Use TABLE instead of SOURCE. (This is what I suggested they do for now.) Recreating an MV then only backfills from the table and doesn't consume from Kafka. Drawback: extra storage.

  2. Allow specifying a custom group id. (We currently support a custom group id prefix, but still generate a suffix.) Then, when users drop and recreate a job, they can reuse the same group.

    Flink supports this, but it may be error-prone. If there are multiple MVs on the same source, the Kafka consumers of different MVs will all report offsets to the same consumer group, which makes a mess. (Note: this only affects offset monitoring, not correctness.) Since multiple MVs on the same source is a natural use case in RisingWave, I don't recommend it. (If it's really necessary, we might add a config like unsafe_exact_group_id=xxx.)

  3. Delete the consumer group on drop. This is reasonable and is probably good for most users. Some possible problems:

    Might users want to retain the consumer group after the drop? I think this is unlikely, since the groups are dedicated to RisingWave.

    More importantly, deleting a consumer group requires a higher authorization permission (Kafka's DeleteGroups needs DELETE on the group, whereas consuming only needs READ). Note that currently groups are created implicitly by the Kafka consumers, not explicitly via the admin API.

    We might introduce a config option delete_on_drop, and make sure a deletion failure won't block DROP.
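Option 3 can be sketched as follows. This is a minimal illustration, not RisingWave's implementation: GroupAdmin is a hypothetical wrapper over whatever admin client issues Kafka's DeleteGroups request, delete_on_drop is only the option name proposed above, and the fake admins stand in for a permitted and a denied broker. The two properties it shows are that deletion is opt-in and that any error is logged and returned as a warning instead of failing the DROP.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import Protocol

log = logging.getLogger("drop_job")


class GroupAdmin(Protocol):
    """Hypothetical wrapper over a Kafka admin client's DeleteGroups request."""

    def delete_group(self, group_id: str) -> None: ...


class GroupAuthorizationError(Exception):
    """Stand-in for the broker rejecting DeleteGroups (no DELETE ACL on the group)."""


@dataclass
class KafkaSourceProps:
    group_id: str
    delete_on_drop: bool = False  # hypothetical option name from the proposal


def drop_job(name: str, props: KafkaSourceProps, jobs: set[str], admin: GroupAdmin) -> list[str]:
    """Drop a job, then delete its consumer group best-effort. Returns warnings."""
    # 1. The DROP itself always completes.
    jobs.discard(name)
    warnings: list[str] = []
    # 2. Assumes the job's consumers have stopped by now: Kafka refuses to
    #    delete a group that still has active members.
    if props.delete_on_drop:
        try:
            admin.delete_group(props.group_id)
        except Exception as exc:  # cleanup failure must never fail the DROP
            msg = f"failed to delete consumer group {props.group_id}: {exc}"
            log.warning(msg)
            warnings.append(msg)
    return warnings


class RecordingAdmin:
    def __init__(self) -> None:
        self.deleted: list[str] = []

    def delete_group(self, group_id: str) -> None:
        self.deleted.append(group_id)


class DenyingAdmin:
    def delete_group(self, group_id: str) -> None:
        raise GroupAuthorizationError("DELETE on Group denied")


jobs = {"mv_a", "mv_b"}
ok = RecordingAdmin()
print(drop_job("mv_a", KafkaSourceProps("rw-7", delete_on_drop=True), jobs, ok))  # []
print(drop_job("mv_b", KafkaSourceProps("rw-8", delete_on_drop=True), jobs, DenyingAdmin()))
print(jobs, ok.deleted)  # set() ['rw-7']
```

Returning the failure as a warning (rather than raising) is what lets a user without the DELETE permission still drop jobs; the leftover group can then be cleaned up manually, as today.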

Note: Shared source (#16003) may look like option 1 (i.e., a materialized source), but it performs source backfill, which also creates consumer groups.

@github-actions github-actions bot added this to the release-2.1 milestone Sep 5, 2024
@pjpringlenom

Same story for Pulsar. But with Pulsar, inactive consumers will eventually exhaust the backlog quota, preventing new data from being written.

@xxchan xxchan self-assigned this Oct 16, 2024
@fuyufjh
Member

fuyufjh commented Dec 26, 2024

Delete consumer group on drop

This is the most intuitive one to me. Strong +1 for it.

Additionally, for shared sources, I think the consumer group used in the backfill stage should be deleted when backfilling finishes, rather than when the job is dropped.
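A toy sketch of that idea, under stated assumptions: each MV's backfill on a shared source uses its own group, backfill is treated as finished once its offset catches up with the shared source's offset, and the backfill consumer has been closed by then. The names FakeKafka and SourceBackfill and the group ids are hypothetical; a real version would make the deletion best-effort, like the drop path.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeKafka:
    """In-memory stand-in for the broker's set of consumer groups."""
    groups: set[str] = field(default_factory=set)

    def delete_group(self, group_id: str) -> None:
        self.groups.discard(group_id)


@dataclass
class SourceBackfill:
    """Hypothetical backfill for one MV on a shared source, using its own group."""
    group_id: str
    kafka: FakeKafka
    offset: int = -1
    finished: bool = False

    def start(self) -> None:
        # The backfill consumer joining its group creates the group implicitly.
        self.kafka.groups.add(self.group_id)

    def on_progress(self, backfill_offset: int, upstream_offset: int) -> None:
        """Assumed rule: backfill ends once it catches up with the shared source."""
        if self.finished:
            return
        self.offset = backfill_offset
        if self.offset >= upstream_offset:
            self.finished = True
            # From here on the MV reads from the shared source; assuming the
            # backfill consumer is closed, delete its group now, not at DROP.
            self.kafka.delete_group(self.group_id)


kafka = FakeKafka(groups={"rw-shared-1"})
backfill = SourceBackfill("rw-backfill-42", kafka)
backfill.start()
backfill.on_progress(backfill_offset=900, upstream_offset=1000)
print(sorted(kafka.groups))  # ['rw-backfill-42', 'rw-shared-1']
backfill.on_progress(backfill_offset=1000, upstream_offset=1000)
print(sorted(kafka.groups))  # ['rw-shared-1']
```

With this, only the shared source's own group outlives the backfill, so the number of groups scales with the number of sources rather than with the number of MVs ever created.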