Profiler: Add Tracy backend #4300
base: main
This differs from the existing GPUVis backend in a number of ways:
* Tracy is optimized for minimal overhead and nanosecond-resolution profiling
* Tracy supports live tracing (in addition to capture-based operation)
* Tracy has a richer feature set and a more polished UI (notably, statistics and histograms are generated out-of-the-box)
* GPUVis supports tracing multiple processes, whereas Tracy is single-process only

To use this backend, one of the environment variables FEX_PROFILE_TARGET_NAME or FEX_PROFILE_TARGET_PATH must be defined to select the application under profile by name or by path suffix.
…FOR_FORK for games that fork on startup
Do you know how long it takes each Tracy event to execute?
The author advertises a cost in the single-digit nanosecond range (< 3 ns per zone) in a reference benchmark. The details are briefly outlined in section 1.7 of the Tracy manual :) (For FEX it will be marginally larger since each profiling event must be wrapped in a …
It would be nice if this could be measured on X13s/X1E or similar, since Snapdragon's cycle counter doesn't give nanosecond-level accuracy: at 19.2 MHz you only get ~52 ns granularity, so we'll need to measure averages. Spinning a loop on the markers would be good, just to see what it gets down to. I don't see what their measurement methodology is, other than the claim that it's fast.
Is there a reason why we are adding Tracy as an external here instead of using CMake to check that the user has the appropriate dependencies installed?
One of the things I noticed is that the Tracy UI is really small on my display for some reason, certainly smaller than the remaining GUI elements. This is just to document the existence of …
Did you check whether linking against the system libtracy produces a functional setup? I think the …
Seems like the lack of automatic DPI detection is a typical Dear ImGui quirk, but good to know!
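I ran this synthetic benchmark with the …:

```cpp
// Assumes a Tracy build with TRACY_ENABLE and TRACY_MANUAL_LIFETIME defined;
// the latter provides tracy::StartupProfiler() / tracy::ShutdownProfiler().
#include <tracy/Tracy.hpp>

#include <chrono>
#include <iostream>

int main() {
  tracy::StartupProfiler();

  auto begin = std::chrono::high_resolution_clock::now();
  auto count = 100'000'000ull;
  for (decltype(count) i = 0; i < count; ++i) {
    // ZoneScopedN("Hello");  // alternative: measure a named zone instead
    TracyMessageL("Hello");   // emit one message event with a literal string
  }
  auto end = std::chrono::high_resolution_clock::now();

  // Average wall-clock time per event, in nanoseconds.
  std::cout << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / (double)count) << '\n';

  tracy::ShutdownProfiler();
}
```

The results indicate a per-event runtime of 43-44 ns (X13s), 31-48 ns (M1, seems to vary by power state), …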
Alright cool, that's roughly what I'd expect from dumping memory into a ring buffer that another thread consumes. So roughly 5-8x faster than a write into ftrace, given that my average was ~260 ns on X1E.
Overview
The existing GPUVis backend isn't suitable for live profiling and carries heavy overhead per traced event (a memory allocation and a system call). This makes gathering accurate data for quantitative analysis impossible.
This PR adds a new backend for FEX's profiler interface based on Tracy, a nanosecond-resolution profiler with support for live tracing and a rich feature set. Notably, statistics and histograms are generated from profiled zones out-of-the-box.
To use this backend, one of the environment variables FEX_PROFILE_TARGET_NAME or FEX_PROFILE_TARGET_PATH must be defined to select the application under profile by name or by path suffix.
Here's an example screenshot of the profiler view while running God of War (a very JIT-time heavy title). I'll post more screenshots as comments below.
Usage
* Configure FEX with ENABLE_FEXCORE_PROFILER=ON and FEXCORE_PROFILER_BACKEND=tracy
* Build the Tracy profiler in External/tracy/profiler using CMake and run it using tracy-profiler -a ::
* Launch the application with FEX_PROFILE_TARGET_NAME=Celeste.bin.x86_64 (matches the app name) or FEX_PROFILE_TARGET_PATH=amd64/SuperMeatBoy (matches a path suffix)

Future work
Plotting data
Tracy supports plotting data and creating pretty graphs out of it, which is easy to implement but will need new Profiler interfaces.
Frame markers
Integration with GL/Vulkan library forwarding allows us to detect where frames end, which is useful to recognize stuttering in a recorded profile.