Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: locking in metrics #14144

Merged
merged 1 commit into from
Feb 4, 2025
Merged

Conversation

Joibel
Copy link
Member

@Joibel Joibel commented Feb 3, 2025

Fixes #14076

Also fixes my own bad test case (deprecated schedule test)

Motivation

The golang map in the userdata for custom metrics was accessed under some circumstances from multiple go routines without an appropriate mutex - as can be seen from the stack trace and panic in #14076

Modifications

Make the three maps which can be accessed have their own semaphore, instead of attempting to use the single top level one.

Make accessor functions for accessing instruments from util/telemetry/metrics so that locks there can be easily reasoned about. (Renamed from AllInstruments).

Make a separate lock for realtimeWorkflows from util/metrics/metrics and manually implement that.

Make a lock for custom_metric UserData, because that is needed in limited scenarios. Don't lock when this is accessed from test framework for cleanness in the framework. I don't believe the lock would be useful as the tests are not free running.

Fixed the deprecations metric test which only passed due to a different CronWorkflow using schedule not schedules.

Verification

Ensure existing tests pass - all changes are just for code compatibility, not changes to the tests.

Signed-off-by: Alan Clucas <[email protected]>
@Joibel Joibel marked this pull request as ready for review February 4, 2025 06:59
@terrytangyuan
Copy link
Member

Great fix.

@terrytangyuan terrytangyuan merged commit e79d1c5 into argoproj:main Feb 4, 2025
31 checks passed
Joibel added a commit that referenced this pull request Feb 6, 2025
Signed-off-by: Alan Clucas <[email protected]>
(cherry picked from commit e79d1c5)
Joibel added a commit that referenced this pull request Feb 6, 2025
Signed-off-by: Alan Clucas <[email protected]>
(cherry picked from commit e79d1c5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression: Fatal Error Due to Concurrent Map Access in Custom Gauge Metrics
2 participants