
evt.cputime #112

Open
brendangregg opened this issue Apr 13, 2014 · 8 comments

Comments

@brendangregg
Contributor

I'd like a field showing nanoseconds of CPU time for the current task, so that CPU time during calls can be inspected. This evt.cputime would only increment when that task (thread) was on-CPU. The kernel already tracks this, so it's a matter of exposing it.

@ldegio
Contributor

ldegio commented Apr 17, 2014

Just to understand: what you want is the number that topprocs_cpu shows, but on a per-event basis rather than aggregated by second? Or something different?

@brendangregg
Contributor Author

Ah, thanks, %thread.exectime (which topprocs_cpu uses) looks like it should do what I want. Is it possible for it to be exported on more than just switch events?

What I'd like to do is time an event, eg, a read() syscall, and determine if the latency is due to time spent on-CPU or off-CPU. This determination directs further investigation to different tools.

%evt.latency or %evt.rawtime deltas show me the elapsed time for the read() syscall. A %thread.exectime delta would then be used to divide that elapsed time into two states: on-CPU and off-CPU.
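The arithmetic behind this split is just subtraction. Here is a minimal sketch (not sysdig code, with made-up numbers): given a syscall's total latency and the thread CPU time accrued over the same window, the remainder is off-CPU time.

```python
def split_latency(latency_ns, exectime_delta_ns):
    """Divide a syscall's elapsed time into on-CPU and off-CPU components.

    latency_ns:          total elapsed time of the call (e.g. %evt.latency)
    exectime_delta_ns:   thread CPU time accrued during the call
                         (e.g. a %thread.exectime delta)
    """
    on_cpu = min(exectime_delta_ns, latency_ns)  # clamp against rounding
    off_cpu = latency_ns - on_cpu
    return on_cpu, off_cpu

# Example: a 250 us read() that spent 40 us executing on-CPU,
# so 210 us were spent off-CPU (blocked, queued, etc.).
on_cpu, off_cpu = split_latency(250_000, 40_000)
print(on_cpu, off_cpu)  # 40000 210000
```

A large off-CPU share points at blocking (I/O, locks, scheduling), while a large on-CPU share points at code execution, which is the triage decision the comment describes.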

@ldegio
Contributor

ldegio commented Apr 28, 2014

Hey Brendan, is this 807e50a what you would expect?

@brendangregg
Contributor Author

I commented on the commit (by mistake); anyway, the interface looks OK, but I was getting zero.

@ldegio
Contributor

ldegio commented Apr 29, 2014

First of all, can you do a pull? b13df5a fixed an issue that caused the field not to be evaluated.

Then, the filter you're using rejects switch events, which are the ones incrementing the thread CPU time. You should be able to include them by telling sysdig that the filter is a display one:

sysdig -d -p '%proc.name read "%fd.name", %evt.latency ns, %thread.totexectime CPU ns' 'evt.type=read and proc.name=dd'

By the way: the way this works is that the CPU time is updated by looking at scheduler switch events. This means that it's bumped up in discrete intervals, every time there's a switch. I'm not sure if it works for this application. A continuous CPU time would involve adding the CPU counter to every event, which is not a trivial change.
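The switch-driven accounting described above can be illustrated with a toy model (this is an assumption-laden sketch, not sysdig's implementation): the thread's exectime counter only advances when a scheduler switch event is observed, so any read of the counter between switches sees a stale value — which is why the filter above must let switch events through.

```python
# Toy model of switch-driven CPU accounting. The counter is bumped only
# at switch-out, in discrete steps; reads in between see the old value.

class ThreadInfo:
    def __init__(self):
        self.exectime_ns = 0        # accumulated on-CPU time
        self.last_on_cpu_ns = None  # timestamp of the last switch-in

def on_switch_in(t, now_ns):
    t.last_on_cpu_ns = now_ns

def on_switch_out(t, now_ns):
    # Only here does the counter advance.
    t.exectime_ns += now_ns - t.last_on_cpu_ns
    t.last_on_cpu_ns = None

t = ThreadInfo()
on_switch_in(t, 1_000)
print(t.exectime_ns)    # 0: thread is on-CPU, but no switch yet
on_switch_out(t, 5_000)
print(t.exectime_ns)    # 4000: bumped in one discrete step
```

This is also why a per-event continuous counter would require attaching the CPU time to every event rather than piggybacking on switches.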

@brendangregg
Contributor Author

I'm running:

dd if=/dev/zero of=/dev/null bs=1000000k count=5

And the one-liner reports the read()s with what looks like the right evt.latency, but zero thread.totexectime. The CPU time should match the latency, since these syscalls are just moving bytes in system time.

OK, I see: if the kernel only updates the counter on a switch (and, it looks like, on a /proc read), that makes it a bit tricky. If the last schedule timestamp is kept somewhere (task_struct->ftrace_timestamp? or something in task_struct->se?), then the current time could be read (CPU TSC) and a delta calculated. Or maybe some of the same functions /proc uses could be called, eg, task_cputime_adjusted().
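For reference, the /proc accounting mentioned here can be read directly on Linux (this sketch assumes a Linux /proc filesystem; it is the same per-task accounting that task_cputime_adjusted() feeds, not sysdig code): fields 14 and 15 of /proc/[pid]/stat are utime and stime in clock ticks.

```python
import os

def proc_cputime_seconds(pid="self"):
    """Return total (user + system) CPU time of a task, in seconds,
    as exposed by /proc/[pid]/stat on Linux."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (comm) may contain spaces or parens, so split after the
    # last ')'; fields[0] is then field 3 (state), so utime (field 14)
    # is at index 11 and stime (field 15) at index 12.
    fields = data.rsplit(")", 1)[1].split()
    utime_ticks, stime_ticks = int(fields[11]), int(fields[12])
    hz = os.sysconf("SC_CLK_TCK")  # ticks per second, typically 100
    return (utime_ticks + stime_ticks) / hz
```

Because these counters are updated by the tick/scheduler accounting rather than continuously, two quick successive reads can return the same value, which matches the "discrete intervals" behavior discussed above.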

@ldegio
Contributor

ldegio commented May 25, 2014

Extracting the schedule time is definitely feasible, and that means that adding this feature for live analysis can be done (relatively) easily.

Remember, however, that one of the core philosophies behind sysdig is that observing the live system or taking a capture should give you the exact same result. So I see 3 choices:

  1. Attach the CPU counter to specific events like switch. This is what we do now and, as you point out, it's not ideal because it doesn't offer the precision required by some use cases.
  2. Attach the CPU counter to every event. This should solve the problem, but creates major overhead in terms of capture buffer occupation and trace file size.
  3. Accept that, for some metrics, the symmetry between live and offline analysis cannot be achieved, and export the functionality for live analysis only.

A possible compromise is implementing #2, but keeping it disabled by default. In other words, there would be a command line switch (and a chisel API call) to turn on per-event CPU capture when needed. This is feasible but non trivial to implement and therefore we need to understand how to prioritize it based on the importance of the use cases.

Thoughts?

By the way, it might be worth moving this discussion to the mailing list...

@github-actions

github-actions bot commented Mar 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
