Analyze is a command-line tool for analyzing profiles produced by apitrace. It is used to analyze a single profile or pairs of profiles produced by running apitrace on the same trace on different platforms.
Profiles are generated by running the apitrace command glretrace
on a trace file. Analyze can parse profile info with GPU and CPU timing information. To produce such files, you must run glretrace
with options --pcpu
and --pgpu
. Most of the time, you'll also want to use option --min-cpu-time=0
to ensure that the profile includes all calls. Without it, only calls taking 1 microsecond or more will be captured.
Example:
glretrace --pcpu --pgpu --min-cpu-time=0 traces_linux_10181_payday2_release.trace > profile_data.txt
Beware! When generating profiles by running traces in a Virgl environment, generating GPU timing can significantly skew the CPU timing. In such cases, it is better to capture CPU and GPU timings in separate profiles. (For the curious mind, this is because glretrace insert query operations around each call. Round-tripping this query ops through Virgl adds to the CPU time.)
.../src/platform/dev/contrib/gfx/perf-tools/profile-analysis
run make
.The binary for analyze is dropped in .../src/platform/dev/contrib/gfx/perf-tools/profile-analysis/bin
Once you have a profile and you‘ve successfully build analyze, you’re ready to analyze your first profile. Run analyze from within the bin directory as follows:
./bin/analyze <path to profile_data.txt>
Profiles can be very large and it might take a few minutes to load the profile. Once that is done, analyze will enter console mode:
Reading profile_data.txt.... 6222 frames read from profile_data.txt ->
To get started, let's type a command to show the 5 most expensive frame by average CPU time:
-> show-frames n=5 s=bycpuavg Profile: trime_trace_virgl.txt frame num calls GPU total CPU total --------------------------------------------------------------------- 3 64629 353.7 mS 3.6 S 1119 18811 331.3 mS 513.4 mS 912 35751 754.5 uS 348.5 mS 5986 9316 4.7 mS 301.8 mS 5861 7192 3.6 mS 243.9 mS ->
A few more useful tidbits about the console:
help
to get basic help and help <command>
to get more detailed help for that command.quit
to exit analyze. (Command history is preserved across runs.)Analyze is even more useful when processing two related profiles together. Two profiles are “related” if they were generated from the same trace, albeit on different platforms. To do so, run analyze with the two profiles as follows:
./bin/analyze <profile_data1.txt> <profile_data2.txt>
After loading the two profiles, analyze will check that they are compatible. (It does so by ensuring that matching calls in the two profiles call into the same GL function.)
Here's a sample command that works on two profiles simultaneously:
-> show-frame-details 1646 gt=0 ct=500000 Frame details for frames #1646 to 1646 GPU CPU GPU 1 CPU 1 GPU CPU GPU 2 CPU 2 frame call name call # Prof 1 Prof 1 % % Prof 2 Prof 2 % % ---------------------------------------------------------------------------------------------------------------------------- 1646 glCompileShader 10442713 0.0 nS 1.8 mS 0.0% 3.8% 0.0 nS 1.2 mS 0.0% 0.9% 1646 glLinkProgram 10442733 0.0 nS 7.5 mS 0.0% 15.8% 0.0 nS 20.9 mS 0.0% 14.9% 1646 glDrawRangeElements 10442826 30.8 uS 733.6 uS 0.8% 1.5% 62.2 uS 38.1 uS 0.4% 0.0% 1646 glCompileShader 10442841 0.0 nS 1.7 mS 0.0% 3.6% 0.0 nS 1.1 mS 0.0% 0.8% 1646 glCompileShader 10442845 0.0 nS 4.2 mS 0.0% 8.9% 0.0 nS 24.6 uS 0.0% 0.0% 1646 glLinkProgram 10442866 0.0 nS 7.4 mS 0.0% 15.5% 0.0 nS 25.8 mS 0.0% 18.4% 1646 glDrawRangeElements 10442953 29.4 uS 772.0 uS 0.8% 1.6% 64.6 uS 31.5 uS 0.4% 0.0% 1646 glFlush 10443734 0.0 nS 21.2 mS 0.0% 44.6% 0.0 nS 125.2 uS 0.0% 0.1% 1 frames out of 1 shown, or 100.0%
When analyzing two related profiles, it is even more important to use option --min-cpu-time=0
when generating the profiles with glretrace
. That is because, without it, glretrace
will filter out different calls on each platform. You will likely end up with non-matching subsets of GL calls in each profile. They will still load in analyze successfully, but they will be harder to compare accurately.
Show the 10 most expensive frames:
-> show-frames n=10 Profile: 040117_KOTOR2.trace.crostini.prof frame num calls GPU total CPU total --------------------------------------------------------------------- 405 14576 31.9 mS 242.1 mS 404 14216 33.1 mS 209.9 mS 435 14691 32.3 mS 160.4 mS 1774 7613 26.6 mS 147.8 mS 665 15730 30.5 mS 132.3 mS 467 14691 34.3 mS 131.0 mS 601 13677 28.7 mS 130.7 mS 1583 7685 24.4 mS 130.5 mS 1481 7278 23.5 mS 129.8 mS 534 15213 34.2 mS 128.5 mS
Show the 10 most expensive frames for function glMapBufferARB:
-> show-frames n=10 f=glMapBufferARB Profile: 040117_KOTOR2.trace.crostini.prof frame num calls GPU total CPU total --------------------------------------------------------------------- 405 14576 0.0 nS 223.1 mS 404 14216 0.0 nS 193.1 mS 435 14691 0.0 nS 156.7 mS 1774 7613 0.0 nS 144.2 mS 665 15730 0.0 nS 128.2 mS 1583 7685 0.0 nS 126.6 mS 467 14691 0.0 nS 121.5 mS 1387 10275 0.0 nS 120.3 mS 1481 7278 0.0 nS 119.1 mS 534 15213 0.0 nS 118.5 mS
Show details for frame #1774, showing only the calls that spend at least 10mS in the CPU:
-> show-frame-details 1774 ct=10000000 gt=0 Frame details for frames #1774 to 1774 GPU CPU GPU 1 CPU 1 GPU CPU GPU 2 CPU 2 frame call name call # Prof 1 Prof 1 % % Prof 2 Prof 2 % % ---------------------------------------------------------------------------------------------------------------------------- 1774 glMapBufferARB 17262871 0.0 nS 20.3 mS 0.0% 13.7% 0.0 nS 5.7 uS 0.0% 0.0% 1774 glMapBufferARB 17263521 0.0 nS 10.3 mS 0.0% 7.0% 0.0 nS 152.9 uS 0.0% 0.7% 1774 glMapBufferARB 17263569 0.0 nS 29.0 mS 0.0% 19.6% 0.0 nS 135.4 uS 0.0% 0.6% 1 frames out of 1 shown, or 100.0%
Show statistics for call glMapBufferARB:
-> call-stats f=glMapBufferARB GPU timing statistics for 040117_KOTOR2.trace.crostini.prof: Sample count: 202492 Average call time: 0.0 nS Max call time: 0.0 nS Min call time: 0.0 nS Standard dev: 0.0 nS Total time in call: 0.0 nS % of time in call: 0.0 CPU timing statistics for 040117_KOTOR2.trace.crostini.prof: Sample count: 202492 Average call time: 599.5 uS Max call time: 142.1 mS Min call time: 87.0 nS Standard dev: 1.9 mS Total time in call: 121.4 S % of time in call: 86.4
Show the 10 most expensive calls, by average CPU time, in profile 1 and 2:
-> show-calls n=10 Profile: 040117_KOTOR2.trace.crostini.prof call count GPU|CPU avg GPU|CPU max GPU|CPU min GPU|CPU % total ------------------------------------------------------------------------------------------------------------------------- glTexSubImage2D 92 0.0 nS | 7.4 mS 0.0 nS | 37.4 mS 0.0 nS | 3.0 uS 0.0% | 0.5% glBlitFramebuffer 2438 2.4 mS | 2.4 mS 2.7 mS | 103.4 mS 1.0 mS | 13.9 uS 9.8% | 4.2% glCompileShader 6 0.0 nS | 1.6 mS 0.0 nS | 4.4 mS 0.0 nS | 384.5 uS 0.0% | 0.0% glCopyTexImage2D 6 0.0 nS | 1.1 mS 0.0 nS | 1.7 mS 0.0 nS | 759.6 uS 0.0% | 0.0% glLinkProgram 3 0.0 nS | 1.1 mS 0.0 nS | 1.4 mS 0.0 nS | 510.4 uS 0.0% | 0.0% glBufferData 96 0.0 nS | 747.8 uS 0.0 nS | 3.2 mS 0.0 nS | 9.3 uS 0.0% | 0.1% glMapBufferARB 202492 0.0 nS | 599.5 uS 0.0 nS | 142.1 mS 0.0 nS | 87.0 nS 0.0% | 86.4% glDeleteBuffers 6 0.0 nS | 148.0 uS 0.0 nS | 356.4 uS 0.0 nS | 3.1 uS 0.0% | 0.0% glTexImage2D 731 0.0 nS | 126.7 uS 0.0 nS | 17.2 mS 0.0 nS | 622.0 nS 0.0% | 0.1% glDeleteTextures 597 0.0 nS | 108.9 uS 0.0 nS | 11.3 mS 0.0 nS | 62.0 nS 0.0% | 0.0%
-> show-calls n=10 p2 Profile: 040117_KOTOR2.trace.crouton.prof call count GPU|CPU avg GPU|CPU max GPU|CPU min GPU|CPU % total ------------------------------------------------------------------------------------------------------------------------- glProgramStringARB 179 0.0 nS | 3.3 mS 0.0 nS | 26.8 mS 0.0 nS | 270.3 uS 0.0% | 2.0% glCopyTexImage2D 6 0.0 nS | 3.1 mS 0.0 nS | 5.4 mS 0.0 nS | 1.1 mS 0.0% | 0.1% glTexImage3D 6 0.0 nS | 178.7 uS 0.0 nS | 319.4 uS 0.0 nS | 77.9 uS 0.0% | 0.0% glLinkProgram 3 0.0 nS | 94.5 uS 0.0 nS | 106.8 uS 0.0 nS | 79.7 uS 0.0% | 0.0% glTexImage2D 731 0.0 nS | 94.3 uS 0.0 nS | 8.0 mS 0.0 nS | 849.0 nS 0.0% | 0.2% glMapBufferARB 202492 0.0 nS | 90.9 uS 0.0 nS | 56.0 mS 0.0 nS | 655.0 nS 0.0% | 63.1% glFlush 3 0.0 nS | 78.4 uS 0.0 nS | 161.7 uS 0.0 nS | 31.3 uS 0.0% | 0.0% glPushAttrib 8347 0.0 nS | 45.8 uS 0.0 nS | 4.5 mS 0.0 nS | 372.0 nS 0.0% | 1.3% glDeleteProgramsARB 269 0.0 nS | 39.6 uS 0.0 nS | 1.3 mS 0.0 nS | 27.0 nS 0.0% | 0.0% glTexSubImage2D 92 0.0 nS | 34.0 uS 0.0 nS | 134.9 uS 0.0 nS | 8.6 uS 0.0% | 0.0%
Compare the 10 most expensive calls, by average CPU time, in two profiles:
-> compare-profiles n=10 s=bycpuavg Compare statistics: 040117_KOTOR2.trace.crostini.prof / 040117_KOTOR2.trace.crouton.prof Prof 1 Prof 2 Ratio p1/p2 Diff p1-p2 call count GPU|CPU GPU|CPU GPU|CPU GPU|CPU ---------------------------------------------------------------------------------------------------------------------------- glTexSubImage2D 92 0.0 nS | 7.4 mS 0.0 nS | 34.0 uS INF! | 218.06 0.0 nS | 7.4 mS glBlitFramebuffer 2438 2.4 mS | 2.4 mS 2.2 mS | 9.7 uS 1.07 | 248.80 155.8 uS | 2.4 mS glCompileShader 6 0.0 nS | 1.6 mS 0.0 nS | 3.7 uS INF! | 417.54 0.0 nS | 1.6 mS glCopyTexImage2D 6 0.0 nS | 1.1 mS 0.0 nS | 3.1 mS INF! | 0.36137 0.0 nS | -1.9 mS glLinkProgram 3 0.0 nS | 1.1 mS 0.0 nS | 94.5 uS INF! | 11.17 0.0 nS | 960.5 uS glBufferData 96 0.0 nS | 747.8 uS 0.0 nS | 14.4 uS INF! | 51.77 0.0 nS | 733.4 uS glMapBufferARB 202492 0.0 nS | 599.5 uS 0.0 nS | 90.9 uS INF! | 6.59 0.0 nS | 508.6 uS glDeleteBuffers 6 0.0 nS | 148.0 uS 0.0 nS | 7.3 uS INF! | 20.28 0.0 nS | 140.7 uS glTexImage2D 731 0.0 nS | 126.7 uS 0.0 nS | 94.3 uS INF! | 1.34 0.0 nS | 32.3 uS glDeleteTextures 597 0.0 nS | 108.9 uS 0.0 nS | 5.4 uS INF! | 20.09 0.0 nS | 103.5 uS