flux-top: command that gives insight into the current utilization of job resources
#3791
Unanswered
SteVwonder asked this question in Ideas
Replies: 1 comment
-
I know initial brainstorming idea for the short term. Could Then on the client side look at info via |
-
As the title says, it would be cool to be able to do something like
    flux top $JOBID
and see the aggregate cpu/gpu/mem usage of all the nodes in a job. Maybe flux-exec could be leveraged here to spawn a "daemon" on each node in a user's job. Those daemons could then collect utilization statistics and forward them up the overlay, where they are reduced and fed to the flux-top front-end. Alternatively, maybe we have an always-loaded module that collects these stats, and the flux-top front-end would filter down to just the stats that apply to the user's particular job.

Opening because I just had a workflow user request this while debugging node OOMs. The traditional method of debugging an OOM (attach a parallel debugger) is hard since there are tons of apps/components running in this composite workflow. So just getting a feel for which of the components is using the most memory would be helpful.
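To make the "collect, reduce, filter" idea a bit more concrete, here is a rough Python sketch of just the data flow. It does not use any Flux API: NodeSample, Aggregate, lift, merge, and job_usage are hypothetical names invented for illustration, and the samples are assumed to have already been gathered by whatever per-node daemon or module ends up doing the collection. The point is that merge is associative, so it could be applied at each level of the overlay on the way up, and the front-end only has to filter by the ranks that belong to the job.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class NodeSample:
    """One utilization sample from one node (all fields are hypothetical)."""
    rank: int           # broker rank that produced the sample
    cpu_percent: float  # CPU utilization on that node, 0-100
    mem_used: int       # memory in use, in bytes
    mem_total: int      # memory installed, in bytes


@dataclass(frozen=True)
class Aggregate:
    """Reduced view of any number of samples; the default is the empty reduction."""
    nodes: int = 0
    cpu_percent_sum: float = 0.0
    mem_used: int = 0
    mem_total: int = 0
    max_mem_rank: int = -1  # rank with the highest memory use seen so far
    max_mem_used: int = 0

    @property
    def cpu_percent_avg(self) -> float:
        return self.cpu_percent_sum / self.nodes if self.nodes else 0.0


def lift(sample: NodeSample) -> Aggregate:
    """Turn a single sample into an aggregate covering one node."""
    return Aggregate(1, sample.cpu_percent, sample.mem_used,
                     sample.mem_total, sample.rank, sample.mem_used)


def merge(a: Aggregate, b: Aggregate) -> Aggregate:
    """Combine two aggregates; associative, so it can run at any tree level."""
    top = b if b.max_mem_used > a.max_mem_used else a
    return Aggregate(a.nodes + b.nodes,
                     a.cpu_percent_sum + b.cpu_percent_sum,
                     a.mem_used + b.mem_used,
                     a.mem_total + b.mem_total,
                     top.max_mem_rank, top.max_mem_used)


def job_usage(samples, job_ranks) -> Aggregate:
    """Keep only samples from the job's ranks, then reduce them."""
    selected = (lift(s) for s in samples if s.rank in job_ranks)
    return reduce(merge, selected, Aggregate())
```

For the OOM case, max_mem_rank is the interesting field: it says which node in the job is using the most memory, which would at least narrow down where to look. Breaking usage down per component on that node would need per-process data that this sketch does not model.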