This document says that a retry budget will help stop retry storms, but I don't think it does. Let me explain why.
Imagine we have three services, A -> B -> C, and A is receiving a steady 100 rps. We set the retry budget from A to B at 20%, so A can send up to 20 retries per second on top of its original requests. That is fine from A to B, but now B can have up to 120 rps of active requests. Say B also configures its retry budget for B to C at 20%. That budget is computed from B's larger request volume, so B can send up to 24 retries per second and C can see up to 144 rps. As we move down the path, the retry budget keeps increasing at every hop, which creates a retry storm.
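To make the arithmetic concrete, here is a minimal sketch of the worst case I'm describing. It assumes each hop's budget is a fixed percentage of the requests that hop is actively sending, and that every hop spends its whole budget. `worst_case_rps` is a hypothetical helper I made up for illustration, not part of any mesh API.

```python
def worst_case_rps(ingress_rps, budget_percents):
    """Worst-case request rate seen by each service in a linear chain.

    ingress_rps: steady rate arriving at the first service.
    budget_percents: retry budget (in percent) for each hop, in order.
    Assumption: every hop uses its full retry budget.
    """
    rates = [ingress_rps]
    for percent in budget_percents:
        upstream = rates[-1]
        # Retries allowed on this hop, as a share of the upstream volume.
        retries = upstream * percent // 100
        rates.append(upstream + retries)
    return rates


# A -> B -> C, 100 rps at A, 20% budget on both hops.
for name, rps in zip("ABC", worst_case_rps(100, [20, 20])):
    print(f"{name}: {rps} rps")
# A: 100 rps
# B: 120 rps
# C: 144 rps
```

Under these assumptions the load grows by roughly 1.2x per hop, so it compounds with the depth of the call chain instead of staying capped at what A allows.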
Please feel free to correct me if my assumption is wrong, and please suggest any ideas you have for stopping retry storms in a mesh. If you have faced similar issues in production, I'm interested to know how you solved them. Thank you.