Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

truncate resource journal at configurable size #6633

Merged
merged 6 commits into from
Feb 14, 2025

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Feb 13, 2025

This is the simplified version of #6631. The in-memory resource eventlog is truncated at the configured limit, but the truncate event does not contain any context, it just marks the fact that events have been dropped.

Fixes #6610

@@ -36,7 +36,7 @@ struct reslog {
struct resource_ctx *ctx;
zlist_t *pending; // list of pending futures
zlist_t *watchers;
json_t *eventlog;
zlistx_t *eventlog;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message typo "Swtich" and "reosurce"

@@ -217,8 +217,7 @@ int reslog_post_pack (struct reslog *reslog,
va_end (ap);
if (!event)
return -1;
if (json_array_append (reslog->eventlog, event) < 0) {
json_decref (event);
if (!zlistx_add_end (reslog->eventlog, json_incref (event))) {
errno = ENOMEM;
return -1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if zlistx_add_end() fails, I think you gotta call the json_decref() b/c you json_incref()ed above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be slightly simpler to just define an entry_duplicator().

Comment on lines 435 to 497
if (!zlistx_add_end (reslog->eventlog, json_incref (entry)))
goto nomem;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotta decref entry if zlistx_add_end() failed?

@@ -197,6 +197,43 @@ int reslog_sync (struct reslog *reslog)
return 0;
}

/* Truncate resource journal if needed to reslog->journal_max
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message "replace the most recent event" ... perhaps you mean "replace the oldest event"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant to be "replace the most recently dropped event" - not the oldest event in case more than one event is dropped at a time.

Comment on lines +174 to +187
test_expect_success '1st event is a truncate event' '
head -1 eventlog.trunc > eventlog.trunc.1 &&
jq -e ".name == \"truncate\"" eventlog.trunc.1
'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: also check number of entries == limit, and likewise with 2 tests below this one

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I poked this a bit and LGTM.

@grondo grondo force-pushed the reslog-truncate-simple branch 2 times, most recently from bc88096 to 05a34be Compare February 13, 2025 23:10
@grondo
Copy link
Contributor Author

grondo commented Feb 13, 2025

I think I've addressed the comments above, rebased, and forced a push.

There's also a slight change here that ensures the eventlog doesn't have limit + 1 entries after truncation. This means there's actually max - 1 actual eventlog entries in a truncated list.

Problem: The resource journal will need to truncated by dropping
entries from the start, but a jansson json array is not conducive
to this operation.

Switch to a zlistx_t for the resource journal. This data structure
allows the removal and addition of items to the front of the list
(without moving the memory for the rest of the items on the list)
Problem: The in-memory resource journal grows without bound and there
is no way to limit its size.

In preparation for truncating the resource journal, add a new config
key in the [resource] table, `journal_max`, which can be used to set
a limit on the journal size. Default this limit to 100000 entries.

Modify the reslog constructor to accept this journal_max limit, but
don't do anything with it just yet.
Problem: The resource journal limit cannot be configured at runtime
because configuration updates are ignored by the resource module.

Capture any new resource.journal_max entry on config reload and send
it to the reslog implementation.
Problem: The resource journal can grow without bound.

Truncate the in-memory resource eventlog if configured and replace
the most recently dropped event with an RFC 44 truncate event.
Problem: There's no testing of resource journal truncation in the
testsuite.

Add a set of very basic tests to t2355-resource-journal.t that ensure
basic eventlog truncation works.
Problem: The resource.journal-max configuration key is not documented.

Add it to flux-config-resource(5).
@grondo grondo force-pushed the reslog-truncate-simple branch from 05a34be to 6f79323 Compare February 14, 2025 14:52
@grondo
Copy link
Contributor Author

grondo commented Feb 14, 2025

Uh, I guess I'll set MWP here now.

@mergify mergify bot merged commit f6bef37 into flux-framework:master Feb 14, 2025
35 checks passed
Copy link

codecov bot commented Feb 14, 2025

Codecov Report

Attention: Patch coverage is 88.46154% with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.88%. Comparing base (04c71be) to head (6f79323).
Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
src/modules/resource/reslog.c 86.95% 6 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6633   +/-   ##
=======================================
  Coverage   83.87%   83.88%           
=======================================
  Files         533      533           
  Lines       88331    88372   +41     
=======================================
+ Hits        74087    74127   +40     
- Misses      14244    14245    +1     
Files with missing lines Coverage Δ
src/modules/resource/resource.c 86.43% <100.00%> (-0.24%) ⬇️
src/modules/resource/reslog.c 74.27% <86.95%> (+1.89%) ⬆️

... and 7 files with indirect coverage changes

@grondo grondo deleted the reslog-truncate-simple branch February 14, 2025 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

resource eventlog grows without bound in memory
3 participants