=====================================
AArch64 Optimization and Flags Status
=====================================
Overview
--------
This page summarizes default-off BOLT optimization flags that users may
explicitly enable when optimizing AArch64 binaries.
BOLT should be used with binaries linked with relocations (``--emit-relocs``
or ``-Wl,-q``) and with representative profile data.

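
As a rough sketch of that workflow, the script below links a program with
relocations, collects a profile, converts it, and runs BOLT. The file names
(``app``, ``app.c``, ``perf.data``, ``app.fdata``) and the use of ``clang`` and
``perf`` are assumptions for illustration, as is the ``-nl`` option, which
tells ``perf2bolt`` that the samples carry no branch records. The hypothetical
``run`` helper only prints each command unless ``DRY_RUN=0`` is set.

.. code-block:: bash

   #!/bin/sh
   # Hypothetical names: app, app.c, perf.data, app.fdata.
   # By default each command is only printed; set DRY_RUN=0 to execute it too.
   run() {
     echo "+ $*"
     if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
   }

   # 1. Link with relocations kept in the output (-Wl,-q is the short form).
   run clang -O2 -Wl,--emit-relocs -o app app.c
   # 2. Sample a representative workload.
   run perf record -e cycles:u -o perf.data -- ./app
   # 3. Convert the samples to BOLT's profile format (-nl: no branch records).
   run perf2bolt -p perf.data -o app.fdata -nl ./app
   # 4. Optimize the binary using the collected profile.
   run llvm-bolt app -o app.bolt --data=app.fdata

Optimization flags from the sections below would be appended to the final
``llvm-bolt`` command.
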
Main Code-Layout Optimizations
------------------------------
The following code-layout optimizations are usually the first options to
consider when optimizing AArch64 binaries with representative profile data,
and they tend to provide the largest performance gains among BOLT
optimizations.

.. list-table::
   :header-rows: 1
   :widths: 34 42
   :align: left

   * - Flag
     - Optimization
   * - | ``--reorder-functions=exec-count|hfsort|cdsort|pettis-hansen|random|user``
       | ``--function-order=<file>``
     - Reorder functions
   * - ``--reorder-blocks=normal|ext-tsp|cache|branch-predictor|reverse|cluster-shuffle``
     - Reorder basic blocks
   * - | ``--split-functions``
       | ``--split-strategy=profile2|random2|randomN|all``
       | ``--split-all-cold``
       | ``--split-eh``
     - Split hot and cold code

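
A possible starting point, sketched below, combines the three layout
optimizations in one invocation. The file names (``app``, ``app.bolt``,
``app.fdata``) are placeholders, and the algorithm choices (``ext-tsp``,
``cdsort``) are one plausible combination rather than a recommendation made by
this page. The command is printed instead of executed so it can be reviewed
first.

.. code-block:: bash

   # Placeholder file names; substitute your own binary and profile.
   LAYOUT_FLAGS="--reorder-blocks=ext-tsp --reorder-functions=cdsort --split-functions --split-all-cold --split-eh"

   # Printed rather than executed; remove "echo" to run the command.
   # LAYOUT_FLAGS is intentionally unquoted so it expands to separate options.
   echo llvm-bolt app -o app.bolt --data=app.fdata $LAYOUT_FLAGS
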
Other Supported Optimizations
-----------------------------
The following optimizations are also supported for AArch64.

.. list-table::
   :header-rows: 1
   :widths: 34 42
   :align: left

   * - Flag
     - Optimization
   * - | ``--align-blocks``
       | ``--block-alignment=<uint>``
     - Align basic blocks
   * - ``--tail-duplication=aggressive|moderate|cache``
     - Duplicate branch tails
   * - ``--peepholes=double-jumps|tailcall-traps|useless-branches|all``
     - Run peephole optimizations
   * - | ``--inline-all``
       | ``--inline-small-functions``
       | Related options:
       | ``--inline-ap``
       | ``--inline-limit=<uint>``
       | ``--inline-small-functions-bytes=<uint>``
     - Inline functions
   * - ``--icf=safe|all``
     - Fold identical functions

Supported Flags With Limitations
--------------------------------
The following flags are implemented for AArch64, but require specific runtime
or option conditions. Enabling them without the required conditions may report
an error or perform no transformation.

.. list-table::
   :header-rows: 1
   :widths: 30 28 44
   :align: left

   * - Flag
     - Optimization
     - Notes
   * - ``--inline-memcpy``
     - Inline fixed-size ``memcpy`` calls
     - Only applies when the copy size is a known constant; AArch64 skips sizes over 64 bytes.
   * - ``--plt=hot|all``
     - Optimize PLT calls
     - Requires immediate binding. If BOLT cannot update the binary, relink with ``-znow``.
   * - ``--hugify``
     - Place hot code on huge pages
     - Applies to binaries with a recognized entry point; skipped when ``--instrument`` is used.
   * - | ``--reorder-data=<section1,section2,...>``
       | ``--reorder-data-algo=count|funcs``
     - Reorder data sections
     - Disabled when ``--jump-tables`` is set to ``move``, ``split`` or ``aggressive``.
   * - ``--split-strategy=cdsplit``
     - Split functions using cache-directed splitting
     - Requires ``--compact-code-model`` on AArch64.

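
Two of the conditions in the table can be sketched as command lines. The file
names are placeholders, ``clang`` is an assumed compiler driver, and
``-Wl,-z,now`` is the driver spelling of the linker's ``-z now`` option. The
commands are printed rather than executed.

.. code-block:: bash

   # Placeholder names: app.c, app, app.bolt, app.fdata.

   # --plt: link with immediate binding (-z now) in addition to relocations.
   echo clang -O2 -Wl,--emit-relocs -Wl,-z,now -o app app.c
   echo llvm-bolt app -o app.bolt --data=app.fdata --plt=hot

   # cdsplit: pair the split strategy with --compact-code-model on AArch64.
   CDSPLIT_FLAGS="--split-functions --split-strategy=cdsplit --compact-code-model"
   echo llvm-bolt app -o app.bolt --data=app.fdata $CDSPLIT_FLAGS
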
Unsupported Flags
-----------------
The following flags are not available for AArch64. ``Not applicable to
AArch64`` means the optimization targets architectural features or mechanisms
that do not apply to AArch64. ``Not implemented for AArch64`` means the
optimization could be relevant, but is not currently implemented for this
target.

.. list-table::
   :header-rows: 1
   :widths: 30 28 42
   :align: left

   * - Flag
     - Optimization
     - Notes
   * - ``--jt-footprint-reduction``
     - Reduce jump-table footprint
     - Not implemented for AArch64.
   * - ``--three-way-branch``
     - Reorder three-way branches
     - Not implemented for AArch64.
   * - ``--simplify-rodata-loads``
     - Replace read-only data loads with constants
     - Not implemented for AArch64.
   * - ``--frame-opt=hot|all``
     - Optimize stack-frame accesses
     - Not implemented for AArch64.
   * - ``--indirect-call-promotion=calls|jump-tables|all``
     - Promote indirect calls
     - Not implemented for AArch64.
   * - ``--memcpy1-spec=<func1,func2:cs1:cs2,...>``
     - Specialize one-byte ``memcpy`` calls
     - Not implemented for AArch64.
   * - ``--reg-reassign``
     - Reassign registers to reduce encoding size
     - Not applicable to AArch64.
   * - ``--cmov-conversion``
     - Convert branches to conditional moves
     - Not applicable to AArch64.
   * - | ``--stoke``
       | ``--stoke-out``
     - Emit STOKE optimization data
     - Not applicable to AArch64.
   * - ``--insert-retpolines``
     - Insert retpolines
     - Not applicable to AArch64.