
Add connection pause and weighted routing functionality for gradual rollout for core NATS #6563

Open
ajax-surovskyi-y opened this issue Feb 21, 2025 · 4 comments
Labels
proposal Enhancement idea or proposal

Comments

@ajax-surovskyi-y
Contributor

Proposed change

Description:
In Kubernetes deployment scenarios (for example, with Argo Rollouts), it would be highly beneficial for an application to establish all of its core NATS subscriptions on the NATS server while keeping the connection in a "paused" state, i.e., not receiving any messages until the service is fully ready. This would allow a service to start up, initialize its subscriptions, and begin processing messages only after an explicit resume command is issued.

In addition, the ability to assign weights to connections or subscriptions would enable a gradual rollout of a new version of a service. For example, during an Argo Rollout, operators could gradually increase the share of traffic directed to the new version without any client-side changes (clients continue to publish requests to the same subject).

Proposed Functionality:

Connection Pause/Resume (like "Support pausing and resuming consumers", but for core NATS):

  - Enable a connection mode in which an application can create all of its subscriptions at startup but "pause" message delivery from the NATS server until explicitly resumed.
  - This would ensure that a service does not start processing traffic before it is fully initialized and ready.
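To make the intended semantics concrete, here is a minimal Go sketch of how a server might treat a paused member of a queue group: the subscription stays registered, but the member is skipped when a recipient is chosen until it is explicitly resumed. The `member` and `queueGroup` types and the round-robin choice are hypothetical illustrations, not part of nats-server or any NATS client, and the sketch only covers the queue-group case (what a paused plain subscription should do with incoming messages is left open by the proposal).

```go
package main

import "fmt"

// member is a hypothetical queue-group subscriber. A paused member keeps its
// subscription registered but is skipped when a recipient is chosen.
type member struct {
	name   string
	paused bool
}

// queueGroup is a hypothetical queue group using round-robin selection.
type queueGroup struct {
	members []*member
	next    int // round-robin cursor
}

// pick returns the next unpaused member in round-robin order, or nil when
// every member is paused (the message would have no recipient in this group).
func (q *queueGroup) pick() *member {
	for i := 0; i < len(q.members); i++ {
		m := q.members[(q.next+i)%len(q.members)]
		if !m.paused {
			q.next = (q.next + i + 1) % len(q.members)
			return m
		}
	}
	return nil
}

func main() {
	stable := &member{name: "v1"}
	canary := &member{name: "v2", paused: true} // subscribed at startup, not ready yet
	q := &queueGroup{members: []*member{stable, canary}}

	for i := 0; i < 3; i++ {
		fmt.Println(q.pick().name) // v1 every time while v2 is paused
	}

	canary.paused = false // readiness check passed: explicit resume
	for i := 0; i < 4; i++ {
		fmt.Println(q.pick().name) // v2, v1, v2, v1
	}
}
```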

Weighted Routing/Connections:

  - Allow weight values to be configured for individual connections or subscriptions within the same NATS cluster.
  - This would enable a gradual rollout (canary deployment) in which, for instance, the new version of a service initially receives a small percentage (e.g., 10%) of the traffic while the stable version receives the rest.
  - Over time, these weights could be adjusted to shift more traffic to the new version.

Use case

When deploying an update with Argo Rollouts in Kubernetes, a new pod can start up and create all necessary subscriptions on the shared NATS cluster while keeping its connection paused. Once readiness checks pass, an operator (or an automated process) gradually resumes message processing by assigning a low weight (e.g., 10%) to the new version. As confidence in the new version increases, the weight can be increased step by step until it eventually reaches 100%, and the old version is phased out.
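The stepwise schedule described above can be expressed as a list of weight pairs that always sum to 100 and end with the old version fully phased out. This Go sketch is illustrative only: `planRollout`, `rolloutStep`, and the 10/25/50 steps are hypothetical and are not an Argo Rollouts or NATS API.

```go
package main

import "fmt"

// rolloutStep is a hypothetical stage of a canary rollout: the percentage of
// traffic sent to the new version, with the remainder going to the old one.
type rolloutStep struct {
	NewPct, OldPct int
}

// planRollout turns increasing canary percentages into weight pairs that sum
// to 100, appending a final 100/0 step if the input does not end at 100.
func planRollout(canaryPcts []int) ([]rolloutStep, error) {
	steps := make([]rolloutStep, 0, len(canaryPcts)+1)
	prev := 0
	for _, p := range canaryPcts {
		if p <= prev || p > 100 {
			return nil, fmt.Errorf("percentages must strictly increase within (0, 100], got %d after %d", p, prev)
		}
		steps = append(steps, rolloutStep{NewPct: p, OldPct: 100 - p})
		prev = p
	}
	if prev != 100 {
		steps = append(steps, rolloutStep{NewPct: 100, OldPct: 0}) // old version phased out
	}
	return steps, nil
}

func main() {
	steps, err := planRollout([]int{10, 25, 50})
	if err != nil {
		panic(err)
	}
	for _, s := range steps {
		fmt.Printf("new=%d%% old=%d%%\n", s.NewPct, s.OldPct)
	}
}
```

Each pair would then be applied by whatever mechanism ends up carrying the weights, whether per-subscription weights as proposed here or server-side configuration.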

Contribution

No response

@ajax-surovskyi-y ajax-surovskyi-y added the proposal Enhancement idea or proposal label Feb 21, 2025
@hpvd

hpvd commented Feb 22, 2025

+1

@hpvd

hpvd commented Feb 26, 2025

Looking at the number of positive reactions, this looks like a pretty strong candidate for 2.12 :-)

@derekcollison
Member

This is possible today with subject mapping and weighting.
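A rough illustration of the subject-mapping approach: clients keep publishing to one subject, and the server rewrites each message to one of several versioned subjects according to percentage weights (in nats-server these are declared in a `mappings` block whose destinations carry `destination` and `weight` entries). The Go code below only simulates that rewrite; the subjects `svc.requests`, `svc.requests.v1`, and `svc.requests.v2` and the `mapSubject` helper are hypothetical, and it does not model what the server does when weights add up to less than 100%.

```go
package main

import "fmt"

// weightedDest mirrors, in spirit, one weighted destination of a subject
// mapping: a target subject and the percentage of messages it receives.
type weightedDest struct {
	Subject string
	Weight  int // percent
}

// mapSubject rewrites a published subject using weighted mappings. roll is a
// value in [0, 100) that the caller draws at random per message. Unmapped
// subjects pass through unchanged. ok is false when roll falls outside the
// configured weights (weights summing to less than 100); this sketch does not
// model what nats-server does in that case.
func mapSubject(mappings map[string][]weightedDest, subject string, roll int) (dest string, ok bool) {
	dests, found := mappings[subject]
	if !found {
		return subject, true
	}
	for _, d := range dests {
		if roll < d.Weight {
			return d.Subject, true
		}
		roll -= d.Weight
	}
	return "", false
}

func main() {
	mappings := map[string][]weightedDest{
		"svc.requests": {
			{Subject: "svc.requests.v1", Weight: 90}, // stable version
			{Subject: "svc.requests.v2", Weight: 10}, // canary
		},
	}
	for _, roll := range []int{5, 50, 95} {
		s, _ := mapSubject(mappings, "svc.requests", roll)
		fmt.Printf("roll %d -> %s\n", roll, s) // roll 95 lands in the 10% canary slice
	}
}
```

With this approach the v1 pods would subscribe to `svc.requests.v1` and the v2 pods to `svc.requests.v2`, and shifting traffic means changing the weights in the server configuration rather than in the clients.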

@ajax-surovskyi-y
Contributor Author

ajax-surovskyi-y commented Mar 2, 2025

Thank you for the positive feedback and discussion on this idea.

I have prepared a small PoC that implements a basic scenario: creating subscriptions in the same queue group with different weights. In my experiment, two subscriptions with weights 10 and 100 resulted in message distribution at roughly a 1:10 ratio, which demonstrates the potential value of this approach.

Please note that this PR does not implement the ability to change the weights of existing subscriptions. I see two possible ways to handle this:

  1. Delegate the responsibility to clients, allowing them to re-subscribe with the desired weight.
  2. Add functionality on the server side to change subscription weights dynamically.

I would very much like to see this evolve into a production-ready solution and would be more than happy to contribute to the project. At this stage, the PR is offered as a basic idea, and I would greatly appreciate your thoughts on the approach. Am I on the right track, or are there alternative solutions or improvements you would suggest?

Thank you for your time and input!

Screencast.from.03-01-2025.11_29_36.PM.mp4
