Performance monitoring and debug
There are two distinct kinds of use cases for collecting performance data:
- Performance monitoring (which may happen in a production environment)
- Performance debugging (which happens in a development/debug environment)
Tool | OS | Monitoring | Debug | uAPI | Notes |
---|---|---|---|---|---|
gputop | Linux | Possible (w/ care) | Yes | No | Performance monitoring with low system impact is possible for some counters |
IGT trace.pl | Linux | No | Yes | No | |
Linux perf | Linux | Yes | Yes | Yes | |
metrics monitor | Linux | Yes | High level only | Yes | Sample for Linux perf |
UMDPerfProfiler | Linux | No | Yes | No | |
VTune | Linux/Windows | No | Yes | No | |
Linux perf
Quick start:
- For more information, see: Linux perf.
- To install on Linux:
  - Either install through a package manager, e.g.: yum install perf
  - Or build from the sources in the kernel tree: cd tools && make perf
- API documentation is available via: man 2 perf_event_open
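perf_event_open(2) identifies an event by a numeric PMU type plus a config value. For dynamic PMUs such as i915, the kernel publishes both in sysfs under /sys/bus/event_source/devices/<pmu>/ (a `type` file and an `events/` directory, with optional `<event>.unit` files), which is how perf resolves a name like i915/rcs0-busy/. The sketch below reads that layout. The helper names `read_pmu` and `parse_config` are hypothetical, and it assumes each event file holds a simple comma-separated term list such as `config=0x0`.

```python
from pathlib import Path

# Standard sysfs directory where the kernel registers perf PMUs.
PMU_ROOT = Path("/sys/bus/event_source/devices")


def read_pmu(pmu: str, root: Path = PMU_ROOT) -> tuple[int, dict[str, dict[str, str]]]:
    """Return (pmu_type, events) for a dynamic PMU such as "i915".

    `events` maps an event name (e.g. "rcs0-busy") to its encoding string
    (e.g. "config=0x0") and, when the kernel provides one, its unit.
    """
    base = root / pmu
    pmu_type = int((base / "type").read_text().strip())
    events_dir = base / "events"
    events: dict[str, dict[str, str]] = {}
    for entry in sorted(events_dir.iterdir()):
        # Files such as "rcs0-busy.unit" or "rcs0-busy.scale" describe another
        # event rather than being events themselves.
        if "." in entry.name:
            continue
        info = {"encoding": entry.read_text().strip()}
        unit_file = events_dir / f"{entry.name}.unit"
        if unit_file.exists():
            info["unit"] = unit_file.read_text().strip()
        events[entry.name] = info
    return pmu_type, events


def parse_config(encoding: str) -> int:
    """Extract the numeric config value from an encoding like "config=0x1"."""
    for term in encoding.split(","):
        key, _, value = term.partition("=")
        if key.strip() == "config":
            return int(value.strip(), 0)
    raise ValueError(f"no config term in {encoding!r}")
```

On a machine with the i915 driver loaded, `read_pmu("i915")` should return the PMU type and entries such as rcs0-busy with unit ns; the pair (type, `parse_config(encoding)`) is what would go into the `type` and `config` fields of `perf_event_attr`.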
Tool | Description |
---|---|
perf stat | Obtain event counts |
perf record | Record events for later reporting |
perf report | Break down events by process, function, etc. |
perf annotate | Annotate assembly or source code with event counts |
perf top | See live event counts |
perf bench | Run various kernel microbenchmarks |
Access to perf events by non-privileged users is controlled in two ways:
1. The /proc/sys/kernel/perf_event_paranoid system file sets the access level globally for all users:
Value | Meaning | Notes |
---|---|---|
-1 | Allow use of (almost) all events by all users | |
>= 0 | Disallow raw tracepoint access by users without CAP_SYS_ADMIN | |
>= 1 | Disallow CPU event access by users without CAP_SYS_ADMIN | Starting from this level, non-privileged users cannot query global (system-wide) statistics |
>= 2 | Disallow kernel profiling by users without CAP_SYS_ADMIN | |
>= 3 | Disallow all event access by non-privileged users | Distribution-specific: not all distributions support this setting. Known to be supported by: Debian, Android. |
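A tool can check the current level before trying to open events. A minimal sketch, assuming the levels in the table above; the restriction strings and the helper names `read_paranoid`, `restrictions` and `can_query_global_stats` are illustrative, not part of any existing API:

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")

# (minimum level, restriction for non-privileged users), following the table above.
RESTRICTIONS = [
    (0, "no raw tracepoint access"),
    (1, "no CPU events, so no global (system-wide) statistics"),
    (2, "no kernel profiling"),
    (3, "no event access at all (distribution-specific)"),
]


def read_paranoid(path: Path = PARANOID_PATH) -> int:
    """Return the current perf_event_paranoid level."""
    return int(path.read_text().strip())


def restrictions(level: int) -> list[str]:
    """List the restrictions that apply to non-privileged users at `level`."""
    return [text for min_level, text in RESTRICTIONS if level >= min_level]


def can_query_global_stats(level: int) -> bool:
    """Non-privileged users can collect system-wide events only below level 1."""
    return level < 1
```

For instance, at level 2 (the upstream kernel default) `restrictions(2)` reports three restrictions, so a non-privileged user cannot run system-wide collection such as `perf stat -a` without extra privileges.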
2. Alternatively, a privileged user can grant capabilities to the application (see the previous table for which capabilities are needed). The application can then request event statistics when run by any user who has permission to execute it. For example, to grant CAP_SYS_ADMIN to the metrics_monitor application:
$ sudo setcap cap_sys_admin+ep metrics_monitor
$ sudo getcap metrics_monitor
metrics_monitor = cap_sys_admin+ep
$ sudo setcap -r metrics_monitor # removes all capabilities from metrics_monitor
Note that once capabilities are set on an application, the shared libraries it depends on are searched for only in the system paths, i.e. LD_LIBRARY_PATH adjustments no longer take effect. Make sure that all dependencies of applications with capabilities are properly installed.
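Because `setcap cap_sys_admin+ep` places CAP_SYS_ADMIN in the program's effective set, the program can confirm at startup that the grant took effect. The sketch below reads the CapEff bitmask from /proc/self/status; CAP_SYS_ADMIN is capability number 21 in <linux/capability.h>. The helper names `effective_caps`, `has_cap` and `current_process_has` are hypothetical.

```python
from pathlib import Path

# Capability number from <linux/capability.h>.
CAP_SYS_ADMIN = 21


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    """True if capability number `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


def current_process_has(cap: int) -> bool:
    """Check the running process, e.g. metrics_monitor after setcap."""
    status = Path("/proc/self/status").read_text()
    return has_cap(effective_caps(status), cap)
```

A program started from a binary prepared with the setcap command above would be expected to see `current_process_has(CAP_SYS_ADMIN)` return True even when run by an ordinary user.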
Quick start commands:
Command | Description |
---|---|
perf list | Get a list of available perf events |
perf stat -e cpu-cycles,power/energy-cores/ ls /sys | Collect per-task events |
perf stat -e cpu-cycles,power/energy-cores/ -a ls /sys | Collect global events (note the -a option) |
perf stat -e i915/rcs0-busy/,i915/vcs0-busy/ -a <workload.sh> | Collect busy times for RCS0 and VCS0 (i915 events are global) |
perf stat -e i915/rcs0-busy/,i915/vcs0-busy/ -a -I 100 <workload.sh> | Sample busy metrics every 100 ms |
perf record -g <workload.sh> | Record CPU samples with call graphs (-g) |
perf report -G | View data collected by perf record |
perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/actual-frequency/ -a -I 100 <workload.sh> | Sample busy metrics for all engines plus the actual GPU frequency every 100 ms |
How to access metrics from the application:
- See metrics monitor, which is a sample application for Linux perf.
Example perf stat output:
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/ -a <workload.sh>
Performance counter stats for 'system wide':
7,946,514,387 ns i915/rcs0-busy/
3,206,574,844 ns i915/vcs0-busy/
3,206,763,484 ns i915/vcs1-busy/
6,842,922,734 ns i915/vecs0-busy/
8.436751418 seconds time elapsed
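The i915 busy counters are reported in nanoseconds, so dividing by the elapsed wall-clock time gives engine utilization. A minimal sketch that drives perf stat in CSV mode (`-x,`) and performs this conversion; it assumes the usual CSV field order (counter value, unit, event name, ...), and `run_perf_stat`, `parse_perf_csv` and `utilization` are hypothetical helper names:

```python
import subprocess
import time

BUSY_EVENTS = ["i915/rcs0-busy/", "i915/vcs0-busy/", "i915/vcs1-busy/", "i915/vecs0-busy/"]


def run_perf_stat(events: list[str], command: list[str]) -> tuple[str, float]:
    """Run `perf stat -a` in CSV mode around `command`.

    Returns perf's report (written to stderr) and the wall-clock seconds
    measured around the whole perf invocation.
    """
    argv = ["perf", "stat", "-x", ",", "-a", "-e", ",".join(events), "--", *command]
    start = time.monotonic()
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stderr, time.monotonic() - start


def parse_perf_csv(report: str) -> dict[str, float]:
    """Map event name -> counter value from `perf stat -x,` output."""
    counts: dict[str, float] = {}
    for line in report.splitlines():
        fields = line.split(",")
        if line.startswith("#") or len(fields) < 3:
            continue  # comment, blank or unrelated line
        try:
            counts[fields[2]] = float(fields[0])
        except ValueError:
            continue  # e.g. "<not counted>" or "<not supported>"
    return counts


def utilization(busy_ns: float, elapsed_s: float) -> float:
    """Engine busy time as a percentage of wall-clock time."""
    return 100.0 * busy_ns / (elapsed_s * 1e9)
```

With the numbers from the example output, `utilization(7946514387, 8.436751418)` gives about 94.2 % for RCS0.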
metrics monitor
Quick start:
- For more information, see: metrics monitor readme
- Sources: metrics monitor
- To run:
sudo LD_LIBRARY_PATH=$MFX_INSTALL/share/mfx/samples $MFX_INSTALL/share/mfx/samples/metrics_monitor
- Run the workload in a parallel shell
Metric | Corresponding i915 event | Meaning |
---|---|---|
RENDER usage | i915/rcs0-busy/ | RCS (Render Engine, GPGPU) utilization, [0-100%] |
VIDEO usage | i915/vcs0-busy/ | VCS0 (VDBOX0) utilization, [0-100%] |
VIDEO_E usage | i915/vecs0-busy/ | VECS (VEBOX) utilization, [0-100%] |
VIDEO2 usage | i915/vcs1-busy/ | VCS1 (VDBOX1) utilization, [0-100%] |
GT Freq | i915/actual-frequency/ | Actual (granted) average GPU frequency, MHz |
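With interval sampling (`perf stat -I 100`), each busy counter covers one interval, so utilization is busy time divided by the interval length. A sketch that turns `perf stat -I <ms> -x,` output into lines shaped like the metrics monitor output; it assumes each interval line starts with a timestamp followed by value, unit and event name, and it uses the requested interval rather than the exact measured one. GT Freq is left out because i915/actual-frequency/ is not a busy-time counter. The names `parse_interval_csv` and `format_sample` are hypothetical.

```python
from collections import defaultdict

# Metric name -> i915 event, from the metrics monitor table above.
METRICS = {
    "RENDER usage": "i915/rcs0-busy/",
    "VIDEO usage": "i915/vcs0-busy/",
    "VIDEO_E usage": "i915/vecs0-busy/",
    "VIDEO2 usage": "i915/vcs1-busy/",
}


def parse_interval_csv(report: str) -> dict[float, dict[str, float]]:
    """Group `perf stat -I <ms> -x,` lines as {timestamp: {event: value}}."""
    samples: dict[float, dict[str, float]] = defaultdict(dict)
    for line in report.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if line.lstrip().startswith("#") or len(fields) < 4:
            continue
        try:
            timestamp, value = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # header text or "<not counted>"
        samples[timestamp][fields[3]] = value
    return dict(samples)


def format_sample(counts: dict[str, float], interval_ms: float) -> str:
    """Express each engine's busy time as a percentage of the interval."""
    interval_ns = interval_ms * 1e6
    return ", ".join(
        f"{name}: {100.0 * counts.get(event, 0.0) / interval_ns:.2f}"
        for name, event in METRICS.items()
    )
```

For example, an interval in which rcs0-busy accumulated 99,000,000 ns over a 100 ms interval is rendered as `RENDER usage: 99.00`.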
Example:
perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/ -a <workload.sh>
RENDER usage: 0.00, VIDEO usage: 0.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 349.95
RENDER usage: 0.00, VIDEO usage: 0.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 349.95
RENDER usage: 1.85, VIDEO usage: 4.09, VIDEO_E usage: 7.13 VIDEO2 usage: 4.09 GT Freq: 453.94
RENDER usage: 99.01, VIDEO usage: 36.88, VIDEO_E usage: 77.94 VIDEO2 usage: 36.90 GT Freq: 949.88
RENDER usage: 100.00, VIDEO usage: 37.34, VIDEO_E usage: 77.80 VIDEO2 usage: 37.39 GT Freq: 949.84
RENDER usage: 100.00, VIDEO usage: 37.86, VIDEO_E usage: 78.84 VIDEO2 usage: 37.91 GT Freq: 949.88
gputop
Quick start:
- For more details, see: gputop
- Provides access to Intel GPU hardware counters
- Runs as a server application that collects data; to start the server:
sudo gputop
To collect data:
gputop-wrapper -m RenderBasic -c GpuCoreClocks,EuActive,L3Misses,GtiL3Throughput,EuFpuBothActive
Server: localhost:7890
Sampling period: 1 s
Monitoring system wide
Connected
System info:
Kernel release: 4.15.0-rc4+
Kernel build: #49 SMP Tue Dec 19 12:17:49 GMT 2017
CPU info:
CPU model: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
CPU cores: 4
GPU info:
GT name: Kabylake GT2 (Gen 9, PCI 0x5916)
Topology: 168 threads, 24 EUs, 1 slices, 3 subslices
GT frequency range: 0.0MHz / 0.0MHz
CS timestamp frequency: 12000000 Hz / 83.33 ns
OA info:
OA Hardware Sampling Exponent: 22
OA Hardware Period: 699050666 ns / 699.1 ms
Timestamp GpuCoreClocks EuActive L3Misses GtiL3Throughput EuFpuBothActive
(ns) (cycles/s) (%) (messages/s) (B) (%)
285961912416,770.9 M cycles, 0.919 %, 1473133.00, 89.91 MiB, 0.256 %
286992496416,900.1 M cycles, 1.04 %, 2036968.00, 124.3 MiB, 0.316 %
288190601500,521.4 M cycles, 1.81 %, 2030997.00, 124 MiB, 0.537 %
289519269500,1.028 G cycles, 11.8 %, 33181879.00, 1.978 GiB, 3.82 %
290562176250,1.007 G cycles, 11.1 %, 30115582.00, 1.795 GiB, 3.66 %
291569408333,905.9 M cycles, 10 %, 24534419.00, 1.462 GiB, 3.18 %
292590314500,762.4 M cycles, 6.89 %, 10934947.00, 667.4 MiB, 2.31 %
293954678166,538.5 M cycles, 1.72 %, 2034698.00, 124.2 MiB, 0.543 %
295323480416,751.6 M cycles, 1.28 %, 2034477.00, 124.2 MiB, 0.356 %
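To post-process gputop-wrapper rows such as those above, each field needs its unit suffix removed and scaled. A sketch, assuming SI multipliers for M/G and binary ones for MiB/GiB (inferred from the sample output, not from gputop documentation); `parse_value`, `parse_row` and `mean_column` are hypothetical names:

```python
# Suffix multipliers assumed from the sample output: SI prefixes for
# rates/counts, binary prefixes for byte amounts. Unknown suffixes
# (such as "%" or a bare unit name) leave the number unscaled.
SCALE = {"K": 1e3, "M": 1e6, "G": 1e9, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}

COLUMNS = ["GpuCoreClocks", "EuActive", "L3Misses", "GtiL3Throughput", "EuFpuBothActive"]


def parse_value(text: str) -> float:
    """Convert fields like "1.028 G cycles", "124 MiB" or "0.919 %" to a float."""
    tokens = text.split()
    number = float(tokens[0])
    if len(tokens) > 1:
        number *= SCALE.get(tokens[1], 1)
    return number


def parse_row(line: str) -> dict[str, float]:
    """Split one gputop-wrapper data row into named numeric columns."""
    fields = line.split(",")
    row = {"Timestamp": float(fields[0])}
    for name, field in zip(COLUMNS, fields[1:]):
        row[name] = parse_value(field)
    return row


def mean_column(rows: list[dict[str, float]], column: str) -> float:
    """Average one column over the parsed rows."""
    return sum(row[column] for row in rows) / len(rows)
```

For example, `mean_column(rows, "EuActive")` over the parsed rows gives the average EU activity for the captured window.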
IGT trace.pl
Quick start:
- Source: IGT trace.pl
- Requires:
  - Linux perf, to collect data and dump it in raw text format: yum install perf
  - VIS, to render the data in HTML: apt-get install npm && npm install vis
  - A kernel built with CONFIG_EXPERT=y
  - A kernel built with CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS=y
- Lets you view i915 perf events on a timeline
To collect a trace:
$INSTALL/igt-gpu-tools/scripts/trace.pl --trace <workload.sh> # produces a perf.data file
perf script > workload.data
To render the data in HTML:
mkdir ~/workdir && cd ~/workdir && npm install vis
$INSTALL/igt-gpu-tools/scripts/trace.pl --html < workload.data > node_modules/workload.html # here < and > are shell redirections
firefox node_modules/workload.html
Note that the location of the output workload.html matters: it must be in the node_modules directory.
UMDPerfProfiler
Quick start:
- Source: UMDPerfProfiler
- Collects and profiles the timing of media tasks
VTune
A comprehensive performance analysis tool. It collects various CPU and GPU hardware counters, profiles applications, and displays task timelines. For details, see VTune.