Analyze is a command-line tool for analyzing profiles produced by apitrace. It is used to analyze a single profile or pairs of profiles produced by running apitrace on the same trace on different platforms.
Profiles are generated by running the apitrace command glretrace on a trace file. Analyze can parse profile info with GPU and CPU timing information. To produce such files, you must run glretrace with options --pcpu and --pgpu. Most of the time, you'll also want to use option --min-cpu-time=0 to ensure that the profile includes all calls. Without it, only calls taking 1 microsecond or more will be captured.
Example:
glretrace --pcpu --pgpu --min-cpu-time=0 traces_linux_10181_payday2_release.trace > profile_data.txt
Beware! When generating profiles by running traces in a Virgl environment, generating GPU timing can significantly skew the CPU timing. In such cases, it is better to capture CPU and GPU timings in separate profiles. (For the curious mind, this is because glretrace inserts query operations around each call. Round-tripping these query ops through Virgl adds to the CPU time.)
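As a rough illustration of capturing the two timings separately, the sketch below builds one CPU-only and one GPU-only glretrace invocation and redirects each to its own file. It uses only the options named above (--pcpu, --pgpu, --min-cpu-time=0). The helper names and output file names are hypothetical, and passing --min-cpu-time=0 to the GPU-only run is an assumption rather than a documented requirement.

```python
import subprocess


def glretrace_cmd(trace_path, cpu=False, gpu=False):
    """Build a glretrace argument list from the options described above."""
    cmd = ["glretrace"]
    if cpu:
        cmd.append("--pcpu")
    if gpu:
        cmd.append("--pgpu")
    # Assumption: keep every call in the profile for both kinds of run.
    cmd.append("--min-cpu-time=0")
    cmd.append(trace_path)
    return cmd


def write_profile(cmd, out_path):
    """Run the command and send its stdout to out_path (like `> out_path`)."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


def capture_separately(trace_path):
    """Produce two profiles so GPU queries cannot skew the CPU timings."""
    # Hypothetical output names; pick whatever suits your workflow.
    write_profile(glretrace_cmd(trace_path, cpu=True), "profile_cpu.txt")
    write_profile(glretrace_cmd(trace_path, gpu=True), "profile_gpu.txt")
```

Calling capture_separately with the path to a trace file would run glretrace twice, once per timing mode.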
To build analyze, change to the directory .../src/platform/dev/contrib/gfx/perf-tools/profile-analysis and run make. The binary for analyze is dropped in .../src/platform/dev/contrib/gfx/perf-tools/profile-analysis/bin.
Once you have a profile and you've successfully built analyze, you're ready to analyze your first profile. Run analyze from within the bin directory as follows:
./bin/analyze <path to profile_data.txt>
Profiles can be very large and it might take a few minutes to load the profile. Once that is done, analyze will enter console mode:
Reading profile_data.txt....
6222 frames read from profile_data.txt
->
To get started, let's type a command to show the 5 most expensive frames by average CPU time:
-> show-frames n=5 s=bycpuavg
Profile: trime_trace_virgl.txt
frame    num calls    GPU total    CPU total
---------------------------------------------------------------------
    3        64629     353.7 mS        3.6 S
 1119        18811     331.3 mS     513.4 mS
  912        35751     754.5 uS     348.5 mS
 5986         9316       4.7 mS     301.8 mS
 5861         7192       3.6 mS     243.9 mS
->
A few more useful tidbits about the console:
Type help to get basic help and help <command> to get more detailed help for that command.
Type quit to exit analyze. (Command history is preserved across runs.)
Analyze is even more useful when processing two related profiles together. Two profiles are "related" if they were generated from the same trace, albeit on different platforms. To do so, run analyze with the two profiles as follows:
./bin/analyze <profile_data1.txt> <profile_data2.txt>
After loading the two profiles, analyze will check that they are compatible. (It does so by ensuring that matching calls in the two profiles call into the same GL function.)
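The compatibility check described above can be pictured with a small sketch. This is not analyze's actual implementation: it assumes, purely for illustration, that each profile has been reduced to a mapping from call number to GL function name, and that "matching calls" means calls with the same call number. The function names are hypothetical.

```python
def find_mismatches(calls_1, calls_2):
    """Return call numbers found in both profiles whose GL functions differ.

    calls_1 and calls_2 are hypothetical dicts: call number -> GL function name.
    """
    shared = calls_1.keys() & calls_2.keys()
    return sorted(n for n in shared if calls_1[n] != calls_2[n])


def are_compatible(calls_1, calls_2):
    """Two profiles are treated as compatible when no shared call disagrees."""
    return not find_mismatches(calls_1, calls_2)
```

Under these assumptions, two profiles made from the same trace would report no mismatches, while profiles from different traces would usually disagree on at least one shared call number.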
Here's a sample command that works on two profiles simultaneously:
-> show-frame-details 1646 gt=0 ct=500000
Frame details for frames #1646 to 1646
                                            GPU       CPU  GPU 1  CPU 1       GPU       CPU  GPU 2  CPU 2
frame call name            call #        Prof 1    Prof 1      %      %    Prof 2    Prof 2      %      %
----------------------------------------------------------------------------------------------------------------------------
1646  glCompileShader      10442713      0.0 nS    1.8 mS   0.0%   3.8%    0.0 nS    1.2 mS   0.0%   0.9%
1646  glLinkProgram        10442733      0.0 nS    7.5 mS   0.0%  15.8%    0.0 nS   20.9 mS   0.0%  14.9%
1646  glDrawRangeElements  10442826     30.8 uS  733.6 uS   0.8%   1.5%   62.2 uS   38.1 uS   0.4%   0.0%
1646  glCompileShader      10442841      0.0 nS    1.7 mS   0.0%   3.6%    0.0 nS    1.1 mS   0.0%   0.8%
1646  glCompileShader      10442845      0.0 nS    4.2 mS   0.0%   8.9%    0.0 nS   24.6 uS   0.0%   0.0%
1646  glLinkProgram        10442866      0.0 nS    7.4 mS   0.0%  15.5%    0.0 nS   25.8 mS   0.0%  18.4%
1646  glDrawRangeElements  10442953     29.4 uS  772.0 uS   0.8%   1.6%   64.6 uS   31.5 uS   0.4%   0.0%
1646  glFlush              10443734      0.0 nS   21.2 mS   0.0%  44.6%    0.0 nS  125.2 uS   0.0%   0.1%
1 frames out of 1 shown, or 100.0%
When analyzing two related profiles, it is even more important to use option --min-cpu-time=0 when generating the profiles with glretrace. That is because, without it, glretrace will filter out different calls on each platform. You will likely end up with non-matching subsets of GL calls in each profile. They will still load in analyze successfully, but they will be harder to compare accurately.
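To see why the default cutoff leads to non-matching subsets, here is a toy model of the filtering. The call numbers and CPU times are made up for illustration, and the captured helper is hypothetical. It only mimics the rule stated earlier, that calls taking less than 1 microsecond are dropped.

```python
DEFAULT_CUTOFF_NS = 1_000  # 1 microsecond, the default cutoff described earlier


def captured(cpu_times_ns, cutoff_ns=DEFAULT_CUTOFF_NS):
    """Return the call numbers that survive the CPU-time cutoff."""
    return {call for call, t in cpu_times_ns.items() if t >= cutoff_ns}


# Made-up CPU times (in nanoseconds) for the same three calls on two platforms.
platform_1 = {101: 400, 102: 2_500, 103: 1_200}
platform_2 = {101: 1_800, 102: 900, 103: 1_500}

# With the default cutoff, each profile keeps a different subset of calls.
only_in_1 = captured(platform_1) - captured(platform_2)
only_in_2 = captured(platform_2) - captured(platform_1)

# With a cutoff of 0, both profiles contain the same calls.
same_calls = captured(platform_1, 0) == captured(platform_2, 0)
```

In this toy data, call 102 appears only in the first profile and call 101 only in the second, so just one of the three calls can be compared directly.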