Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: handle aggregation queries sequentially follow up #8450

Conversation

gastonfournier
Copy link
Contributor

@gastonfournier gastonfournier commented Oct 15, 2024

This is a follow up on #8446

  • migrating some metrics (check the commits)
  • moving event listener registration outside metric calculation (every 2 hs we are adding new listeners)
  • adding jitter to metric refresh
  • manually executing some of the tasks immediately to avoid changing the test logic (with jitter now we'd have to force or wait until the first execution of the scheduled task that collect the metrics). We might decide to change this later after the migration
  • try catch the execution of the task
  • register metrics now "just registers" and returns functions for refreshing metrics. In the future, we'll return an object with the ability of independently updating specific metrics so we can tweak the frequency of updates

Copy link
Contributor

github-actions bot commented Oct 15, 2024

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

PackageVersionScoreDetails

Scanned Manifest Files

Copy link

vercel bot commented Oct 15, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
unleash-monorepo-frontend ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 17, 2024 9:40pm
1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
unleash-docs ⬜️ Ignored (Inspect) Visit Preview Oct 17, 2024 9:40pm

@gastonfournier gastonfournier changed the title chore: Proposal/handle aggregation queries sequentially follow up chore: handle aggregation queries sequentially follow up Oct 15, 2024
count: number;
labels: RestrictedRecord<GaugeDefinition<R>['labelNames']>;
};
type MapResult<R> = (result: R) => MetricValue<R> | MetricValue<R>[];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single task can output different counts for different labels. This is useful for cases like client_apps_total at line 40

Comment on lines 121 to 122
// schedule and execute immediately
await dbMetrics.registerGaugeDbMetric({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tasks need to execute immediately because the tests expect that. I'd argue that some of these can be delayed, but maybe they are important because of other components depending on these. So I'm delaying this decision until we migrated all gauges to dbMetrics

try {
dbMetrics.refreshDbMetrics();

const stats = await instanceStatsService.getStats();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getStats was triggering many queries in parallel, some of which were not used in prometheus.

@@ -512,16 +858,6 @@ export default class MetricsMonitor {
legacyTokensActive.reset();
legacyTokensActive.set(deprecatedTokens.activeLegacyTokens);

if (maxEnvironmentStrategies) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got migrated to: https://github.com/Unleash/unleash/pull/8450/files#diff-0610ce1595d1dcf150ca2efb2f0c5806d082f5bc6b8429a437970c7d60171cbdR181-R194 where both the gauge definition and refresh query are together

Comment on lines -463 to -485
eventBus.on(
events.STAGE_ENTERED,
(entered: { stage: string; feature: string }) => {
if (flagResolver.isEnabled('trackLifecycleMetrics')) {
logger.info(
`STAGE_ENTERED listened ${JSON.stringify(entered)}`,
);
}
featureLifecycleStageEnteredCounter.increment({
stage: entered.stage,
});
},
);

eventBus.on(
events.EXCEEDS_LIMIT,
({
resource,
limit,
}: { resource: string; limit: number }) => {
exceedsLimitErrorCounter.increment({ resource, limit });
},
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved up to line 485 on the right side. Now lives inside registerPrometheusMetrics instead of startMonitoring

Comment on lines +1086 to +1093
const { collectStaticCounters, collectDbMetrics } =
registerPrometheusMetrics(
config,
stores,
version,
eventBus,
instanceStatsService,
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start monitoring now calls helper functions that register prometheus metrics and then adds schedules to frequently update these metrics

@@ -338,11 +369,12 @@ export class InstanceStatsService {
this.clientInstanceStore.getDistinctApplicationsCount(30),
this.clientInstanceStore.getDistinctApplicationsCount(),
]);
return {
this.appCount = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Store the result, so it can be used internally (line 261)

Copy link
Member

@nunogois nunogois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed 1on1, LGTM

@gastonfournier gastonfournier merged commit 7a4c855 into proposal/handle-aggregation-queries-sequentially Oct 18, 2024
9 checks passed
@gastonfournier gastonfournier deleted the proposal/handle-aggregation-queries-sequentially-follow-up branch October 18, 2024 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants