better CPU metric support #18

Open
dpocock opened this issue Apr 24, 2014 · 7 comments

Comments

@dpocock
Member

dpocock commented Apr 24, 2014

JMX provides the ProcessCpuTime metric. It is the number of nanoseconds of CPU time used by the JVM.

Graphing the raw value in Ganglia is unhelpful.

gmond itself sends cpu_* metrics (e.g. cpu_system) and these are percentages derived from the raw tick values of the host. jmxetric could derive percentages from the JVM values and send those instead. This would be more consistent with the Ganglia design and reduce network load when values are not changing.

We should also consider how this should work for multi-threaded Java apps on multi-CPU systems.

Other projects like jmxtrans have examples of working with these values.

@dpocock
Member Author

dpocock commented Jul 1, 2014

@ngzhian Zhi An, could you please have an initial look at this issue and let me know your observations about it?

@ngzhian
Contributor

ngzhian commented Jul 4, 2014

@dpocock by percentage do you mean CPU time used by JVM over total CPU time?

@dpocock
Member Author

dpocock commented Jul 4, 2014

Actually, we need to work out the correct definition for this.

Have you seen the CPU graphs in JConsole? I believe it takes the same raw values from JMX. You could try making some simple code that starts 2 threads, each using 100% of a CPU core and see how that looks in JConsole for a CPU with 4 or more cores.
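A minimal sketch of the kind of test program suggested here, assuming plain busy-loop threads are enough to saturate a core. The class and method names (`BusyThreads`, `startBusyThreads`) are made up for illustration and are not part of jmxetric.

```java
// Hypothetical load generator: starts N threads that each spin for a fixed time.
final class BusyThreads {

    // Starts `count` threads, each busy-looping for `durationMillis` of wall-clock time.
    static Thread[] startBusyThreads(int count, long durationMillis) {
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                long end = System.nanoTime() + durationMillis * 1_000_000L;
                long spins = 0;
                // Spin without sleeping so the thread keeps one core busy.
                while (System.nanoTime() < end) {
                    spins++;
                }
            }, "busy-" + i);
            t.start();
            threads[i] = t;
        }
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads for 60 seconds; attach JConsole to this JVM while it runs.
        Thread[] threads = startBusyThreads(2, 60_000L);
        for (Thread t : threads) {
            t.join();
        }
    }
}
```

On a host with 4 or more cores, the JConsole CPU graph for this process can then be compared against the raw ProcessCpuTime and Uptime values.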

@ngzhian
Contributor

ngzhian commented Jul 6, 2014

Going back to gmond's monitor-core cpu_user_func, we observe that the value is calculated as follows:

diff = user_jiffies - last_user_jiffies;
val.f = ((double)diff/(double)(total_jiffies - last_total_jiffies)) * 100.0;

which translates to the portion of total CPU time spent on user activity over a particular time period.

So a simple way is to take (ProcessCpuTime / Uptime * 100), where ProcessCpuTime is the "Total amount of CPU time that the Java VM has consumed since it was started" and Uptime is the "total amount of time since the JVM was started". This will result in a percentage value in the range [0, 100] that will fluctuate as the JVM is running. Alternatively, we could store a lastProcessCpuTime and lastUptime and divide the differences, in which case the calculation would be similar to the one in monitor-core.

I wrote a simple program that loops and uses CPU cycles so that I could verify the numbers in jconsole, and took some screenshots of the values shown there. These will be posted in a blog post soon.

Also I noted some points, such as:

  • Uptime is given in milliseconds
  • ProcessCpuTime in nanoseconds
  • we can't distinguish between system, user CPU time, unlike the monitor-core implementation.
  • we can only poll all running threads for their ThreadCpuTime and ThreadUserTime and do our own calculation if we wish to provide the different CPU time.
  • at certain times, ProcessCpuTime is larger than Uptime, I will need to investigate this further
  • each Thread can have its own cpu usage as well
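The delta-based calculation and the unit mismatch noted above could be sketched roughly as follows. This assumes a HotSpot-style JVM that exposes `ProcessCpuTime` on the `java.lang:type=OperatingSystem` MBean and `Uptime` on `java.lang:type=Runtime`; the class name `JvmCpuSampler` and its methods are hypothetical, not existing jmxetric code.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sampler: keeps the previous readings and reports CPU use between polls.
final class JvmCpuSampler {
    private final MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    private long lastCpuNanos = -1;
    private long lastUptimeMillis = -1;

    // CPU time delta (nanoseconds) over wall-clock delta (milliseconds),
    // expressed as a percentage of a single core.
    static double percent(long cpuDeltaNanos, long uptimeDeltaMillis) {
        if (uptimeDeltaMillis <= 0) {
            return 0.0;
        }
        double uptimeDeltaNanos = uptimeDeltaMillis * 1_000_000.0; // ms -> ns
        return cpuDeltaNanos / uptimeDeltaNanos * 100.0;
    }

    // Feeds one pair of raw readings; returns 0.0 on the first call (no baseline yet).
    double update(long cpuNanos, long uptimeMillis) {
        double result = (lastUptimeMillis < 0)
                ? 0.0
                : percent(cpuNanos - lastCpuNanos, uptimeMillis - lastUptimeMillis);
        lastCpuNanos = cpuNanos;
        lastUptimeMillis = uptimeMillis;
        return result;
    }

    // Reads both attributes over JMX (assumed attribute names, see lead-in).
    double sample() throws Exception {
        long cpuNanos = (Long) mbs.getAttribute(
                new ObjectName("java.lang:type=OperatingSystem"), "ProcessCpuTime");
        long uptimeMillis = (Long) mbs.getAttribute(
                new ObjectName("java.lang:type=Runtime"), "Uptime");
        return update(cpuNanos, uptimeMillis);
    }
}
```

Because CPU time accumulates across all cores, this single-core percentage can exceed 100 when several threads run in parallel, which is one way ProcessCpuTime can grow faster than Uptime.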

@ngzhian
Contributor

ngzhian commented Jul 11, 2014

Updated everything in this blog post.
As of now every JMX metric value is retrieved using this code

Object o = mbs.getAttribute(objectName, attributeName);

This makes it difficult to modify the raw value of the MBeans, as in this case of sending percentage values.
I'm still thinking of a good way to incorporate this requirement into the current system.
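One possible shape for this, purely as a sketch and not jmxetric's actual design, is to pass an optional transformation along with each attribute read. The class `TransformingReader` and its `read` method are hypothetical; only `MBeanServer.getAttribute` is the real JMX call.

```java
import java.util.function.UnaryOperator;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical wrapper: reads a raw JMX attribute, then applies a per-metric transform.
final class TransformingReader {
    private final MBeanServer mbs;

    TransformingReader(MBeanServer mbs) {
        this.mbs = mbs;
    }

    // Use UnaryOperator.identity() for metrics that should be sent unchanged.
    Object read(ObjectName objectName, String attributeName,
                UnaryOperator<Object> transform) throws Exception {
        Object raw = mbs.getAttribute(objectName, attributeName);
        return transform.apply(raw);
    }
}
```

A stateful transform (one that remembers the previous reading) would be needed for a percentage derived from deltas; the wrapper itself stays unaware of that detail.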

@dpocock
Member Author

dpocock commented Aug 6, 2014

Can you just clarify one other thing about multi-CPU: let's say that you have the following:

  • server has 4 CPU cores
  • JVM runs for 60 seconds
  • 2 threads are running, each using 100% of a CPU core for the whole 60 seconds

What would the actual calculations and results be in this case?

E.g. Uptime = 60 seconds
available CPU time = 4*60 seconds (because we have 4 cores)?
ProcessCpuTime = 2*60 seconds (because we have 2 threads)?
result = ProcessCpuTime / 60 = 200%?
or ProcessCpuTime / (4*60) = 50%?

What does JConsole show in this case?

You don't need to worry about the other things from mod_multicpu, e.g. the system or user jiffies. Just getting Ganglia to show something comparable to the JConsole CPU graph is sufficient.

@ngzhian
Contributor

ngzhian commented Aug 16, 2014

The way jconsole does it is:
ProcessCpuTime / Uptime / #processors,
hence it's 120 / 60 / 4 = 0.5, i.e. 50%.
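The arithmetic above can be written out as a small sketch, assuming ProcessCpuTime is in nanoseconds and Uptime in milliseconds as noted earlier in the thread. `HostCpuPercent` and `percentOfHost` are hypothetical names used only for illustration.

```java
// Hypothetical helper: CPU use as a share of all cores on the host.
final class HostCpuPercent {

    // cpuDeltaNanos / (uptimeDelta * processors), expressed as a percentage.
    static double percentOfHost(long cpuDeltaNanos, long uptimeDeltaMillis, int processors) {
        if (uptimeDeltaMillis <= 0 || processors <= 0) {
            return 0.0;
        }
        double availableNanos = uptimeDeltaMillis * 1_000_000.0 * processors; // ms -> ns, all cores
        return cpuDeltaNanos / availableNanos * 100.0;
    }

    public static void main(String[] args) {
        // Scenario from the thread: 2 threads at 100% for 60 s on a 4-core host.
        long cpuNanos = 2L * 60 * 1_000_000_000L;
        System.out.println(percentOfHost(cpuNanos, 60_000L, 4)); // prints 50.0
    }
}
```

In a live JVM the processor count could be taken from `Runtime.getRuntime().availableProcessors()` instead of being hard-coded.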
