tree: cc121a5b5af5d5df5ea06a59f5e9241ed179e2f2 [path history] [tgz]
  1. analyze/
  2. main.go
  3. Makefile
  4. README.md
contrib/gfx/perf-tools/profile-analysis/README.md

Analyze

Analyze is a command-line tool for analyzing profiles produced by apitrace. It is used to analyze a single profile or pairs of profiles produced by running apitrace on the same trace on different platforms.

Producing profiles

Profiles are generated by running the apitrace command glretrace on a trace file. Analyze can parse profile info with GPU and CPU timing information. To produce such files, you must run glretrace with options --pcpu and --pgpu. Most of the time, you'll also want to use option --min-cpu-time=0 to ensure that the profile includes all calls. Without it, only calls taking 1 microsecond or more will be captured.

Example:

glretrace --pcpu --pgpu --min-cpu-time=0 traces_linux_10181_payday2_release.trace > profile_data.txt

Beware! When generating profiles by running traces in a Virgl environment, generating GPU timing can significantly skew the CPU timing. In such cases, it is better to capture CPU and GPU timings in separate profiles. (For the curious mind, this is because glretrace insert query operations around each call. Round-tripping this query ops through Virgl adds to the CPU time.)

Building Analyze

  1. Make sure the go tools are installed on your machine.
  2. Within directory .../src/platform/dev/contrib/gfx/perf-tools/profile-analysis run make.

The binary for analyze is dropped in .../src/platform/dev/contrib/gfx/perf-tools/profile-analysis/bin

Running Analyze

Once you have a profile and you‘ve successfully build analyze, you’re ready to analyze your first profile. Run analyze from within the bin directory as follows:

./bin/analyze <path to profile_data.txt>

Profiles can be very large and it might take a few minutes to load the profile. Once that is done, analyze will enter console mode:

Reading profile_data.txt....
6222 frames read from profile_data.txt
->

To get started, let's type a command to show the 5 most expensive frame by average CPU time:

-> show-frames n=5 s=bycpuavg
Profile: trime_trace_virgl.txt
  frame num   calls           GPU total            CPU total
---------------------------------------------------------------------
          3   64629             353.7 mS                3.6 S
       1119   18811             331.3 mS             513.4 mS
        912   35751             754.5 uS             348.5 mS
       5986    9316               4.7 mS             301.8 mS
       5861    7192               3.6 mS             243.9 mS
->

A few more useful tidbits about the console:

  1. Type help to get basic help and help <command> to get more detailed help for that command.
  2. The console supports history. Use the up/down arrows to move back and forth between older commands.
  3. The console has limited support for tab completion.
  4. Type quit to exit analyze. (Command history is preserved across runs.)

Analyzing dual profiles

Analyze is even more useful when processing two related profiles together. Two profiles are “related” if they were generated from the same trace, albeit on different platforms. To do so, run analyze with the two profiles as follows:

./bin/analyze <profile_data1.txt> <profile_data2.txt>

After loading the two profiles, analyze will check that they are compatible. (It does so by ensuring that matching calls in the two profiles call into the same GL function.)

Here's a sample command that works on two profiles simultaneously:

-> show-frame-details 1646 gt=0 ct=500000
Frame details for frames #1646 to 1646
                                                   GPU       CPU     GPU 1  CPU 1         GPU       CPU      GPU 2   CPU 2
 frame                      call name    call #    Prof 1    Prof 1    %      %           Prof 2    Prof 2     %       %
----------------------------------------------------------------------------------------------------------------------------
  1646                glCompileShader  10442713    0.0 nS    1.8 mS   0.0%   3.8%         0.0 nS    1.2 mS   0.0%   0.9%
  1646                  glLinkProgram  10442733    0.0 nS    7.5 mS   0.0%  15.8%         0.0 nS   20.9 mS   0.0%  14.9%
  1646            glDrawRangeElements  10442826   30.8 uS  733.6 uS   0.8%   1.5%        62.2 uS   38.1 uS   0.4%   0.0%
  1646                glCompileShader  10442841    0.0 nS    1.7 mS   0.0%   3.6%         0.0 nS    1.1 mS   0.0%   0.8%
  1646                glCompileShader  10442845    0.0 nS    4.2 mS   0.0%   8.9%         0.0 nS   24.6 uS   0.0%   0.0%
  1646                  glLinkProgram  10442866    0.0 nS    7.4 mS   0.0%  15.5%         0.0 nS   25.8 mS   0.0%  18.4%
  1646            glDrawRangeElements  10442953   29.4 uS  772.0 uS   0.8%   1.6%        64.6 uS   31.5 uS   0.4%   0.0%
  1646                        glFlush  10443734    0.0 nS   21.2 mS   0.0%  44.6%         0.0 nS  125.2 uS   0.0%   0.1%
1 frames out of 1 shown, or 100.0%

When analyzing two related profiles, it is even more important to use option --min-cpu-time=0 when generating the profiles with glretrace. That is because, without it, glretrace will filter out different calls on each platform. You will likely end up with non-matching subsets of GL calls in each profile. They will still load in analyze successfully, but they will be harder to compare accurately.

Examples

Show the 10 most expensive frames:

-> show-frames n=10
Profile: 040117_KOTOR2.trace.crostini.prof
  frame num   calls           GPU total            CPU total
---------------------------------------------------------------------
        405   14576              31.9 mS             242.1 mS
        404   14216              33.1 mS             209.9 mS
        435   14691              32.3 mS             160.4 mS
       1774    7613              26.6 mS             147.8 mS
        665   15730              30.5 mS             132.3 mS
        467   14691              34.3 mS             131.0 mS
        601   13677              28.7 mS             130.7 mS
       1583    7685              24.4 mS             130.5 mS
       1481    7278              23.5 mS             129.8 mS
        534   15213              34.2 mS             128.5 mS

Show the 10 most expensive frames for function glMapBufferARB:

-> show-frames n=10 f=glMapBufferARB
Profile: 040117_KOTOR2.trace.crostini.prof
  frame num   calls           GPU total            CPU total
---------------------------------------------------------------------
        405   14576               0.0 nS             223.1 mS
        404   14216               0.0 nS             193.1 mS
        435   14691               0.0 nS             156.7 mS
       1774    7613               0.0 nS             144.2 mS
        665   15730               0.0 nS             128.2 mS
       1583    7685               0.0 nS             126.6 mS
        467   14691               0.0 nS             121.5 mS
       1387   10275               0.0 nS             120.3 mS
       1481    7278               0.0 nS             119.1 mS
        534   15213               0.0 nS             118.5 mS

Show details for frame #1774, showing only the calls that spend at least 10mS in the CPU:

-> show-frame-details 1774 ct=10000000 gt=0
Frame details for frames #1774 to 1774
                                                   GPU       CPU     GPU 1  CPU 1         GPU       CPU      GPU 2   CPU 2
 frame                      call name    call #    Prof 1    Prof 1    %      %           Prof 2    Prof 2     %       %
----------------------------------------------------------------------------------------------------------------------------
  1774                 glMapBufferARB  17262871    0.0 nS   20.3 mS   0.0%  13.7%         0.0 nS    5.7 uS   0.0%   0.0%
  1774                 glMapBufferARB  17263521    0.0 nS   10.3 mS   0.0%   7.0%         0.0 nS  152.9 uS   0.0%   0.7%
  1774                 glMapBufferARB  17263569    0.0 nS   29.0 mS   0.0%  19.6%         0.0 nS  135.4 uS   0.0%   0.6%
1 frames out of 1 shown, or 100.0%

Show statistics for call glMapBufferARB:

-> call-stats f=glMapBufferARB
GPU timing statistics for 040117_KOTOR2.trace.crostini.prof:
  Sample count:       202492
  Average call time:  0.0 nS
  Max call time:      0.0 nS
  Min call time:      0.0 nS
  Standard dev:       0.0 nS
  Total time in call: 0.0 nS
  % of time in call: 0.0
CPU timing statistics for 040117_KOTOR2.trace.crostini.prof:
  Sample count:       202492
  Average call time:  599.5 uS
  Max call time:      142.1 mS
  Min call time:      87.0 nS
  Standard dev:       1.9 mS
  Total time in call: 121.4 S
  % of time in call: 86.4

Show the 10 most expensive calls, by average CPU time, in profile 1 and 2:

-> show-calls n=10
Profile: 040117_KOTOR2.trace.crostini.prof
                          call   count       GPU|CPU avg          GPU|CPU max          GPU|CPU min         GPU|CPU % total
-------------------------------------------------------------------------------------------------------------------------
               glTexSubImage2D      92    0.0 nS |   7.4 mS    0.0 nS |  37.4 mS    0.0 nS |   3.0 uS      0.0% |    0.5%
             glBlitFramebuffer    2438    2.4 mS |   2.4 mS    2.7 mS | 103.4 mS    1.0 mS |  13.9 uS      9.8% |    4.2%
               glCompileShader       6    0.0 nS |   1.6 mS    0.0 nS |   4.4 mS    0.0 nS | 384.5 uS      0.0% |    0.0%
              glCopyTexImage2D       6    0.0 nS |   1.1 mS    0.0 nS |   1.7 mS    0.0 nS | 759.6 uS      0.0% |    0.0%
                 glLinkProgram       3    0.0 nS |   1.1 mS    0.0 nS |   1.4 mS    0.0 nS | 510.4 uS      0.0% |    0.0%
                  glBufferData      96    0.0 nS | 747.8 uS    0.0 nS |   3.2 mS    0.0 nS |   9.3 uS      0.0% |    0.1%
                glMapBufferARB  202492    0.0 nS | 599.5 uS    0.0 nS | 142.1 mS    0.0 nS |  87.0 nS      0.0% |   86.4%
               glDeleteBuffers       6    0.0 nS | 148.0 uS    0.0 nS | 356.4 uS    0.0 nS |   3.1 uS      0.0% |    0.0%
                  glTexImage2D     731    0.0 nS | 126.7 uS    0.0 nS |  17.2 mS    0.0 nS | 622.0 nS      0.0% |    0.1%
              glDeleteTextures     597    0.0 nS | 108.9 uS    0.0 nS |  11.3 mS    0.0 nS |  62.0 nS      0.0% |    0.0%
-> show-calls n=10 p2
Profile: 040117_KOTOR2.trace.crouton.prof
                          call   count       GPU|CPU avg          GPU|CPU max          GPU|CPU min         GPU|CPU % total
-------------------------------------------------------------------------------------------------------------------------
            glProgramStringARB     179    0.0 nS |   3.3 mS    0.0 nS |  26.8 mS    0.0 nS | 270.3 uS      0.0% |    2.0%
              glCopyTexImage2D       6    0.0 nS |   3.1 mS    0.0 nS |   5.4 mS    0.0 nS |   1.1 mS      0.0% |    0.1%
                  glTexImage3D       6    0.0 nS | 178.7 uS    0.0 nS | 319.4 uS    0.0 nS |  77.9 uS      0.0% |    0.0%
                 glLinkProgram       3    0.0 nS |  94.5 uS    0.0 nS | 106.8 uS    0.0 nS |  79.7 uS      0.0% |    0.0%
                  glTexImage2D     731    0.0 nS |  94.3 uS    0.0 nS |   8.0 mS    0.0 nS | 849.0 nS      0.0% |    0.2%
                glMapBufferARB  202492    0.0 nS |  90.9 uS    0.0 nS |  56.0 mS    0.0 nS | 655.0 nS      0.0% |   63.1%
                       glFlush       3    0.0 nS |  78.4 uS    0.0 nS | 161.7 uS    0.0 nS |  31.3 uS      0.0% |    0.0%
                  glPushAttrib    8347    0.0 nS |  45.8 uS    0.0 nS |   4.5 mS    0.0 nS | 372.0 nS      0.0% |    1.3%
           glDeleteProgramsARB     269    0.0 nS |  39.6 uS    0.0 nS |   1.3 mS    0.0 nS |  27.0 nS      0.0% |    0.0%
               glTexSubImage2D      92    0.0 nS |  34.0 uS    0.0 nS | 134.9 uS    0.0 nS |   8.6 uS      0.0% |    0.0%

Compare the 10 most expensive calls, by average CPU time, in two profiles:

-> compare-profiles n=10 s=bycpuavg
Compare statistics: 040117_KOTOR2.trace.crostini.prof / 040117_KOTOR2.trace.crouton.prof
                                                Prof 1               Prof 2             Ratio p1/p2           Diff p1-p2
                          call   count        GPU|CPU              GPU|CPU              GPU|CPU              GPU|CPU
----------------------------------------------------------------------------------------------------------------------------
               glTexSubImage2D      92    0.0 nS |   7.4 mS    0.0 nS |  34.0 uS      INF! |   218.06    0.0 nS |   7.4 mS
             glBlitFramebuffer    2438    2.4 mS |   2.4 mS    2.2 mS |   9.7 uS      1.07 |   248.80  155.8 uS |   2.4 mS
               glCompileShader       6    0.0 nS |   1.6 mS    0.0 nS |   3.7 uS      INF! |   417.54    0.0 nS |   1.6 mS
              glCopyTexImage2D       6    0.0 nS |   1.1 mS    0.0 nS |   3.1 mS      INF! |  0.36137    0.0 nS |  -1.9 mS
                 glLinkProgram       3    0.0 nS |   1.1 mS    0.0 nS |  94.5 uS      INF! |    11.17    0.0 nS | 960.5 uS
                  glBufferData      96    0.0 nS | 747.8 uS    0.0 nS |  14.4 uS      INF! |    51.77    0.0 nS | 733.4 uS
                glMapBufferARB  202492    0.0 nS | 599.5 uS    0.0 nS |  90.9 uS      INF! |     6.59    0.0 nS | 508.6 uS
               glDeleteBuffers       6    0.0 nS | 148.0 uS    0.0 nS |   7.3 uS      INF! |    20.28    0.0 nS | 140.7 uS
                  glTexImage2D     731    0.0 nS | 126.7 uS    0.0 nS |  94.3 uS      INF! |     1.34    0.0 nS |  32.3 uS
              glDeleteTextures     597    0.0 nS | 108.9 uS    0.0 nS |   5.4 uS      INF! |    20.09    0.0 nS | 103.5 uS