 # Optimizing binaries with pac-ret hardening

 This is a design document about processing the `DW_CFA_AARCH64_negate_ra_state`
 DWARF instruction in BOLT. As it describes internal design decisions, the
 intended audience is BOLT developers. The document is an updated version of the
 [RFC posted on the LLVM Discourse](https://discourse.llvm.org/t/rfc-bolt-aarch64-handle-opnegaterastate-to-enable-optimizing-binaries-with-pac-ret-hardening/86594).


 `DW_CFA_AARCH64_negate_ra_state` is also referred to as  `.cfi_negate_ra_state`
 in assembly, or `OpNegateRAState` in BOLT sources. In this document, I will use
 **negate-ra-state** as a shorthand.

 Note: the abbreviation CFI has two expansions:
 - Call Frame Instruction: individual DWARF instruction, e.g. negate-ra-state.
 - Control Flow Integrity: a security mechanism, e.g. pointer authentication.

 ## Introduction

 ### Pointer Authentication

 For more information, see the [pac-ret section of the BOLT-binary-analysis document](BinaryAnalysis.md#pac-ret-analysis).

 ### DW_CFA_AARCH64_negate_ra_state

 The negate-ra-state CFI is a vendor-specific Call Frame Instruction defined in
 the [Arm ABI](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1).

 ```
 The DW_CFA_AARCH64_negate_ra_state operation negates bit[0] of the RA_SIGN_STATE pseudo-register.
 ```

 This bit indicates to the unwinder whether the current return address is signed
 or not (hence the name). The unwinder uses this information to authenticate the
 pointer, and remove the Pointer Authentication Code (PAC) bits.
 Incorrect placement of negate-ra-state CFIs causes the unwinder to either attempt
 to authenticate an unsigned pointer (resulting in a segmentation fault), or skip
 authentication on a signed pointer, which can also cause a fault.

 Note: some unwinders use the `xpac` instruction to strip the PAC bits without
 authenticating the pointer. This is an incorrect (incomplete) implementation,
 as a modified return address then goes undetected during unwinding.

 There are no DWARF instructions to directly set or clear the RA state. However,
 two other CFIs can also affect the RA state:
 - `DW_CFA_remember_state`: this CFI stores register rules onto an implicit stack.
 - `DW_CFA_restore_state`: this CFI pops rules from this stack.

 Example:

 | CFI                            | Effect on RA state             |
 | ------------------------------ | ------------------------------ |
 | (default)                      | 0                              |
 | DW_CFA_AARCH64_negate_ra_state | 0 -> 1                         |
 | DW_CFA_remember_state          | 1 pushed to the stack          |
 | DW_CFA_AARCH64_negate_ra_state | 1 -> 0                         |
 | DW_CFA_restore_state           | 0 -> 1 (popped from the stack) |
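
The table above can be replayed with a small model. This is only a sketch: the function `apply_cfis` is hypothetical, and it tracks just the RA state, whereas a real unwinder saves and restores all register rules on remember/restore.

```python
# Minimal model of how an unwinder tracks bit[0] of RA_SIGN_STATE.
# The CFI names are the DWARF mnemonics; apply_cfis itself is hypothetical.

def apply_cfis(cfis, initial_state=0):
    """Return the RA state in effect after each CFI in `cfis`."""
    state = initial_state
    remembered = []  # implicit stack used by remember_state / restore_state
    trace = []
    for cfi in cfis:
        if cfi == "DW_CFA_AARCH64_negate_ra_state":
            state ^= 1  # flip bit[0]
        elif cfi == "DW_CFA_remember_state":
            remembered.append(state)
        elif cfi == "DW_CFA_restore_state":
            state = remembered.pop()
        else:
            raise ValueError(f"unhandled CFI: {cfi}")
        trace.append(state)
    return trace


trace = apply_cfis([
    "DW_CFA_AARCH64_negate_ra_state",  # 0 -> 1
    "DW_CFA_remember_state",           # 1 pushed to the stack
    "DW_CFA_AARCH64_negate_ra_state",  # 1 -> 0
    "DW_CFA_restore_state",            # 0 -> 1 (popped from the stack)
])
```

Here `trace` holds the state after each of the four CFIs, matching the rows of the table.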

 The Arm ABI also defines the `DW_CFA_AARCH64_negate_ra_state_with_pc` CFI, but it
 is not widely used, and is [likely to be deprecated](https://github.com/ARM-software/abi-aa/issues/327).

 ### Where are these CFIs needed?

 Whenever two consecutive instructions have different RA states, the unwinder must
 be informed of the change. This typically occurs during pointer signing or
 authentication. If adjacent instructions differ in RA state but neither signs
 nor authenticates the return address, they must belong to different control flow
 paths. One is part of an execution path with signed RA, the other is part of a
 path with an unsigned RA.

 In the example below, the first BasicBlock ends in a conditional branch, and
 jumps to two different BasicBlocks, each with their own authentication, and
 return. The instructions on the border of the second and third BasicBlock have
 different RA states. The `ret` at the end of the second BasicBlock is in unsigned
 state. The start of the third BasicBlock is after the `paciasp` in the control
 flow, but before the authentication. In this case, a negate-ra-state is needed
 at the end of the second BasicBlock.

 ```
         +----------------+
         |     paciasp    |
         |                |
         |      b.cc      |
         +--------+-------+
                  |
 +----------------+
 |                |
 |       +--------v-------+
 |       |                |
 |       |    autiasp     |
 |       |      ret       |   // RA: unsigned
 |       +----------------+
 +----------------+
                  |
         +--------v-------+  // RA: signed
         |                |
         |     autiasp    |
         |      ret       |
         +----------------+
 ```

 > [!important]
 > The unwinder does not follow the control flow graph. It reads unwind
 > information in the layout order.

 Because these locations depend on the function layout, the negate-ra-state CFIs
 of the input binary become invalid when BasicBlocks are reordered.
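
To illustrate the layout dependence, the sketch below lists the positions where consecutive instructions differ in RA state for two layouts of the same blocks. The block contents and the helper `state_changes` are made up for illustration; they are not taken from BOLT.

```python
# Hypothetical model: a basic block is a list of (mnemonic, ra_state) pairs,
# where ra_state is the state in effect at that instruction.
entry = [("paciasp", "unsigned"), ("b.cc", "signed")]
hot_exit = [("autiasp", "signed"), ("ret", "unsigned")]
cold = [("bl", "signed"), ("b", "signed")]


def state_changes(layout):
    """Return indices i where instructions i and i+1 differ in RA state."""
    flat = [state for block in layout for _, state in block]
    return [i for i in range(len(flat) - 1) if flat[i] != flat[i + 1]]


cold_in_middle = state_changes([entry, cold, hot_exit])
cold_at_end = state_changes([entry, hot_exit, cold])
```

Moving the `cold` block behind the `ret` adds a state change between `ret` (unsigned) and the first instruction of `cold` (signed), so the set of required negate-ra-state CFIs differs between the two layouts.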

 ## Solution design

 The implementation introduces two new passes:
 1. `PointerAuthCFIAnalyzer`: assigns the RA state to each instruction based on
     the CFIs in the input binary
 2. `PointerAuthCFIFixup`: reads those assigned instruction RA states after
     optimizations, and emits `DW_CFA_AARCH64_negate_ra_state` CFIs at the correct
     places: wherever there is a state change between two consecutive instructions
     in the layout order.

 To track metadata on individual instructions, the `MCAnnotation` class was
 extended. The new annotations also have helper functions in `MCPlusBuilder`.

 ### Saving annotations at CFI reading

 CFIs are read and added to BinaryFunctions in `CFIReaderWriter::FillCFIInfoFor`.
 At this point, we add MCAnnotations about negate-ra-state, remember-state and
 restore-state CFIs to the instructions they refer to. Using annotations avoids
 interfering with the CFI processing that already happens in BOLT (e.g. remember-state
 and restore-state CFIs are removed in `normalizeCFIState` for reasons unrelated to PAC).

 As we add the MCAnnotations *to instructions*, we have to account for the case
 where the function starts with a CFI altering the RA state. An annotation marks
 a state change that takes effect after the annotated instruction, so a CFI at
 the very start of the function has no instruction it could be attached to.
 This special case is handled by adding an `initialRAState` bool to each BinaryFunction.
 If the `Offset` the CFI refers to is zero, we don't store an annotation, but set
 the `initialRAState` in `FillCFIInfoFor`. This information is then used in
 `PointerAuthCFIAnalyzer`.
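
A rough sketch of this step is shown below. It is a model, not BOLT code: the function `annotate`, its data layout, and the choice to attach each annotation to the last instruction starting before the CFI offset are assumptions made for illustration. Only negate-ra-state is considered at offset zero.

```python
from bisect import bisect_left


def annotate(instr_offsets, cfis):
    """Attach CFI kinds to instructions (hypothetical model).

    instr_offsets: sorted list of instruction start offsets, beginning at 0.
    cfis: list of (offset, kind) pairs in the order they were read.
    Returns (annotations, initial_ra_state_signed).
    """
    annotations = {off: [] for off in instr_offsets}
    initial_signed = False
    for offset, kind in cfis:
        if offset == 0:
            # No earlier instruction exists: record the state on the function.
            if kind == "negate-ra-state":
                initial_signed = not initial_signed
            continue
        # Assumption: annotate the last instruction that starts before `offset`.
        idx = bisect_left(instr_offsets, offset) - 1
        annotations[instr_offsets[idx]].append(kind)
    return annotations, initial_signed


offsets = [0, 4, 8, 12]
normal, normal_initial = annotate(
    offsets, [(4, "negate-ra-state"), (12, "negate-ra-state")]
)
at_start, at_start_initial = annotate(offsets, [(0, "negate-ra-state")])
```

In the second call the CFI sits at offset zero, so no instruction is annotated and only the function-level flag changes.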

 ### Binaries without DWARF info

 In some cases, the DWARF tables are stripped from the binary. These programs
 usually have some other unwind mechanism.
 The new passes only run on functions that include at least one negate-ra-state CFI.
 This avoids processing functions that do not use Pointer Authentication, as well
 as functions that use Pointer Authentication but do not have DWARF info.

 In summary:
 - pointer auth is not used: no change, the new passes do not run.
 - pointer auth is used, but DWARF info is stripped: no change, the new passes
   do not run.
 - pointer auth is used, and we have DWARF CFIs: passes run, and rewrite the
   negate-ra-state CFI.

 ### PointerAuthCFIAnalyzer pass

 This pass runs before optimizations reorder anything.

 It processes MCAnnotations generated during the CFI reading stage to check if
 instructions have any of the three CFIs that can modify RA state:
 - negate-ra-state,
 - remember-state,
 - restore-state.

 Then it adds a new MCAnnotation to each instruction, indicating its RA state.
 The possible annotations are:
 - Signed,
 - Unsigned.

 Below is a simple example that shows the two different types of annotations:
 what we have before the pass, and after it.

 | Instruction                   | Before          | After    |
 | ----------------------------- | --------------- | -------- |
 | paciasp                       | negate-ra-state | unsigned |
 | stp x29, x30, [sp, #-0x10]!   |                 | signed   |
 | mov x29, sp                   |                 | signed   |
 | ldp x29, x30, [sp], #0x10     |                 | signed   |
 | autiasp                       | negate-ra-state | signed   |
 | ret                           |                 | unsigned |
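
The table can be reproduced with a short model of the pass. This is a sketch under assumptions: `assign_ra_states` and its input format are hypothetical, and an annotation is taken to change the state of the instructions that follow the annotated one.

```python
def assign_ra_states(instructions, initial_signed=False):
    """Give every instruction a 'signed' or 'unsigned' label.

    instructions: list of (mnemonic, [CFI kinds annotated on it]) pairs
    in the original layout order.
    """
    signed = initial_signed
    remembered = []  # models the remember-state / restore-state stack
    result = []
    for mnemonic, cfis in instructions:
        # The instruction itself still executes in the current state.
        result.append((mnemonic, "signed" if signed else "unsigned"))
        # Its annotations change the state seen by the next instruction.
        for kind in cfis:
            if kind == "negate-ra-state":
                signed = not signed
            elif kind == "remember-state":
                remembered.append(signed)
            elif kind == "restore-state":
                signed = remembered.pop()
    return result


states = assign_ra_states([
    ("paciasp", ["negate-ra-state"]),
    ("stp", []),
    ("mov", []),
    ("ldp", []),
    ("autiasp", ["negate-ra-state"]),
    ("ret", []),
])
```

The resulting labels match the After column: `paciasp` and `ret` are unsigned, everything in between is signed.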

 ##### Error handling in PointerAuthCFIAnalyzer pass:

 Whenever the PointerAuthCFIAnalyzer pass finds inconsistencies in the current
 BinaryFunction, it marks the function as ignored using `BF.setIgnored()`. BOLT
 will not optimize this function but will emit it unchanged in the original section
 (`.bolt.org.text`).

 The inconsistencies are as follows:
 - finding a `pac*` instruction when already in signed state,
 - finding an `aut*` instruction when already in unsigned state,
 - finding `pac*` and `aut*` instructions without `.cfi_negate_ra_state`.

 Users are informed about the number of functions ignored by the pass, which
 functions were ignored, and the inconsistency found in each.
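
The three checks could look roughly like the sketch below. It is a simplified model: `find_inconsistency` is a hypothetical name, instructions are classified by mnemonic prefix only, and remember-state and restore-state are left out.

```python
def find_inconsistency(instructions, initial_signed=False):
    """Return a message for the first inconsistency found, or None.

    instructions: list of (mnemonic, has_negate_ra_state) pairs in the
    original layout order.
    """
    signed = initial_signed
    for mnemonic, has_negate in instructions:
        is_sign = mnemonic.startswith("pac")  # simplification: prefix match
        is_auth = mnemonic.startswith("aut")
        if is_sign and signed:
            return f"{mnemonic}: pac* while already in signed state"
        if is_auth and not signed:
            return f"{mnemonic}: aut* while already in unsigned state"
        if (is_sign or is_auth) and not has_negate:
            return f"{mnemonic}: missing .cfi_negate_ra_state"
        if has_negate:
            signed = not signed
    return None


consistent = [
    ("paciasp", True), ("stp", False), ("ldp", False),
    ("autiasp", True), ("ret", False),
]
double_sign = [("paciasp", True), ("paciasp", True)]
missing_cfi = [("paciasp", False), ("ret", False)]
```

A caller modelling the pass would mark the function as ignored whenever the result is not `None`.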

 ### PointerAuthCFIFixup

 This pass runs after optimizations. It performs the _inverse_ of the
 `PointerAuthCFIAnalyzer` pass:
 1. it reads the RA state annotations attached to the instructions, and
 2. whenever the state changes, it adds a PseudoInstruction that holds an
    OpNegateRAState CFI.

 ##### Covering newly generated instructions:

 Some BOLT passes can add new Instructions. In PointerAuthCFIFixup, we have
 to know what RA state these have.

 > [!important]
 > As issue #160989 explains, unwind info is missing from stubs.
 > For this same reason, we cannot generate correct pac-specific unwind info: the
 > signedness of the _incorrect_ return address is meaningless.

 Assignment of RAStates to newly generated instructions is done in `inferUnknownStates`.
 We have two different cases to cover:

 1. If a BasicBlock has some instructions with known RA state, and some without, we
    can copy the RA state of known instructions to the unknown ones. As the control
    flow only changes between BasicBlocks, instructions in the same BasicBlock have
    the same return address. (The exception is noreturn calls, but these would only
    cause problems if the newly inserted instruction is right after the call.)

 2. If a BasicBlock has no instructions with known RA state, we have to copy the
    RA state of the previous BasicBlock in layout order.
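
The two cases can be sketched as follows. This is a model with assumptions, not the actual `inferUnknownStates`: within a block, an unknown instruction takes the nearest preceding known state (or the first known state if none precedes it), the "state of the previous BasicBlock" is taken to be the state of its last instruction, and `default` is a made-up fallback for a leading block with no known states.

```python
def infer_unknown_states(blocks, default="unsigned"):
    """Fill in unknown RA states (hypothetical model).

    blocks: basic blocks in layout order; each block is a list containing
    'signed', 'unsigned', or None for a newly inserted instruction.
    Returns a new list of blocks without any None entries.
    """
    result = []
    prev_block_state = default
    for block in blocks:
        known = [s for s in block if s is not None]
        if known:
            # Case 1: copy from known instructions in the same block.
            current = known[0]
            filled = []
            for s in block:
                if s is not None:
                    current = s
                filled.append(current)
        else:
            # Case 2: copy from the previous block in layout order.
            filled = [prev_block_state] * len(block)
        result.append(filled)
        if filled:
            prev_block_state = filled[-1]
    return result


inferred = infer_unknown_states([
    ["unsigned", "signed", None],
    [None, None],
    [None, "signed", "unsigned"],
])
```

The middle block has no known state at all, so it inherits `signed` from the end of the first block.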

 ### Optimizations requiring special attention

 Marking states before optimizations ensures that instructions can be moved around
 freely. The only special case is function splitting. When a function is split,
 the split part becomes a new function in the emitted binary. For unwinding to
 work, it needs to "replay" all CFIs that lead up to the split point. BOLT does
 this for other CFIs. As negate-ra-state is not read (only stored as an Annotation),
 we have to do this manually in PointerAuthCFIFixup. Here, if the split part
 starts with an instruction that has Signed RA state, we add a negate-ra-state CFI
 to indicate this.
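
A minimal sketch of the emission step, including the split case, is given below. The function `emit_fragment` and its data layout are hypothetical. Each fragment is treated as its own function that starts in the default (unsigned) state, so a fragment whose first instruction is signed gets a CFI before that instruction.

```python
NEGATE = ".cfi_negate_ra_state"


def emit_fragment(instructions):
    """Insert negate-ra-state markers into one emitted fragment.

    instructions: list of (mnemonic, ra_state) pairs in final layout order.
    Returns the mnemonics with NEGATE inserted at every state change.
    """
    out = []
    current = "unsigned"  # default RA state at the start of a function
    for mnemonic, state in instructions:
        if state != current:
            out.append(NEGATE)
            current = state
        out.append(mnemonic)
    return out


main_part = emit_fragment([
    ("paciasp", "unsigned"), ("cbz", "signed"),
    ("autiasp", "signed"), ("ret", "unsigned"),
])
split_part = emit_fragment([("bl", "signed"), ("b", "signed")])
```

The split part begins with a signed instruction, so its output starts with a negate-ra-state marker, which plays the role of the replayed CFI.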

 ## Option to disallow the feature

 The feature can be guarded with the `--update-branch-protection` flag, which is
 on by default. If the flag is set to false and `containedNegateRAState()` is true
 for any function after `FillCFIInfoFor()`, BOLT exits with an error.