blob: bb93ab5d115b89665b1f3940cc27f0eb33cdf290 [file] [log] [blame] [edit]
========================================
LLVM IR Generation for EH and Cleanups
========================================
.. contents::
:local:
Overview
========
This document describes how Clang's LLVM IR generation represents exception
handling (EH) and C++ cleanups. It focuses on the data structures and control
flow patterns used to model normal and exceptional exits, and it outlines how
the generated IR differs across common ABI models.
For details on the LLVM IR representation of exception handling, see
`LLVM Exception Handling <https://llvm.org/docs/ExceptionHandling.html>`_.
Core Model
==========
EH and cleanup handling is centered around an ``EHScopeStack`` that records
nested scopes for:
- **Cleanups**, which run on normal control flow, exceptional control flow, or
both. These are used for destructors, full-expression cleanups, and other
scope-exit actions.
- **Catch scopes**, which represent ``try``/``catch`` handlers.
- **Filter scopes**, used to model dynamic exception specifications and some
platform-specific filters.
- **Terminate scopes**, used for ``noexcept`` and similar termination paths.
Each cleanup is a small object with an ``Emit`` method. When a cleanup scope is
popped, the IR generator decides whether it must materialize a normal cleanup
block (for fallthrough, branch-through, or unresolved ``goto`` fixups) and/or an
EH cleanup entry (when exceptional control flow can reach the cleanup). This
results in a flattened CFG where cleanup lifetime is represented by the blocks
and edges that flow into those blocks.
Key Components
==============
The LLVM IR generation for EH and cleanups is spread across several core
components:
- ``CodeGenModule`` owns module-wide state such as the LLVM module, target
information, and the selected EH personality function. It provides access to
ABI helpers via ``CGCXXABI`` and target-specific hooks.
- ``CodeGenFunction`` manages per-function state and IR building. It owns the
``EHScopeStack``, tracks the current insertion point, and emits blocks, calls,
and branches. Most cleanup and EH control flow is built here.
- ``EHScopeStack`` is the central stack of scopes used to model EH and cleanup
semantics. It stores ``EHCleanupScope`` entries for cleanups, along with
``EHCatchScope``, ``EHFilterScope``, and ``EHTerminateScope`` for handlers and
termination logic.
- ``EHCleanupScope`` stores the cleanup object plus state data (active flags,
fixup depth, and enclosing scope links). When a cleanup scope is popped,
``CodeGenFunction`` decides whether to emit a normal cleanup block, an EH
cleanup entry, or both.
- Cleanup emission helpers implement the mechanics of branching through
cleanups, threading fixups, and emitting cleanup blocks.
- Exception emission helpers implement landing pads, dispatch blocks,
personality selection, and helper routines for try/catch, filters, and
terminate handling.
- ``CGCXXABI`` (and its ABI-specific implementations such as
``ItaniumCXXABI`` and ``MicrosoftCXXABI``) provide ABI-specific lowering for
throws, catch handling, and destructor emission details.
- The cleanup and exception handling code generation is driven by the flow of
``CodeGenFunction`` and its helper classes traversing the AST to emit IR for
C++ expressions, classes, and statements.
AST traversal in ``CodeGenFunction`` emits code and pushes cleanups or EH scopes,
``EHScopeStack`` records scope nesting, cleanup and exception helpers materialize
the CFG as scopes are popped, and ``CGCXXABI`` supplies ABI-specific details for
landing pads or funclets.
Cleanup Destination Routing
===========================
When multiple control flow exits (``return``, ``break``, ``continue``,
fallthrough) pass through the same cleanup, the generated IR shares a single
cleanup block among them. Before entering the cleanup, each exit path stores a
unique index into a "cleanup destination" slot. After the cleanup code runs, a
``switch`` instruction loads this index and dispatches to the appropriate final
destination. This avoids duplicating cleanup code for each exit while preserving
correct control flow.
For example, if a function has both a ``return`` and a ``break`` that exit
through the same destructor cleanup, both paths branch to the shared cleanup
block after storing their respective destination indices. The cleanup epilogue
then switches on the stored index to reach either the return block or the
loop-exit block.
When only a single exit passes through a cleanup (the common case), the switch
is unnecessary and the cleanup block branches directly to its sole destination.
Branch Fixups for Forward Gotos
-------------------------------
A ``goto`` statement that jumps forward to a label not yet seen poses a special
problem. The destination's enclosing cleanup scope is unknown at the point the
``goto`` is emitted. This is handled by emitting an optimistic branch and
recording a "fixup." When the cleanup scope is later popped, any recorded fixups
are resolved by rewriting the branch to thread through the cleanup block and
adding the destination to the cleanup's switch.
Exceptional Cleanups and EH Dispatch
====================================
Exceptional exits (``throw``, ``invoke`` unwinds) are routed through EH cleanup
entries, which are reached via a landing pad or a funclet dispatch block,
depending on the target ABI.
For Itanium-style EH (such as is used on x86-64 Linux), the IR uses ``invoke``
to call potentially-throwing operations and a ``landingpad`` instruction to
capture the exception and selector values. The landing pad aggregates any
catch and cleanup clauses for the current scope, and branches to a dispatch
block that compares the selector to type IDs and jumps to the appropriate
handler.
For Windows, LLVM IR uses funclet-style EH: ``catchswitch`` and ``catchpad`` for
handlers, and ``cleanuppad`` for cleanups, with ``catchret`` and ``cleanupret``
edges to resume normal flow. The personality function determines how these pads
are interpreted by the backend.
Personality and ABI Selection
=============================
Each function with exception handling constructs is associated with a
personality function (e.g. __gxx_personality_v0 for C++ on Linux). The
personality function determines the ABI-specifc EH behavior of the
function. The IR generation selects a personality function based on language
options and the target ABI (e.g., Itanium, MSVC SEH, SJLJ, Wasm EH). This
decision affects:
- Whether the IR uses landing pads or funclet pads.
- The shape of dispatch logic for catch and filter scopes.
- How termination or rethrow paths are modeled.
- Whether certain helper functions such as exception filters must be outlined.
Because the personality choice is made during IR generation, the CFG shape
directly reflects ABI-specific details.
Example: Array of Objects with Throwing Constructor
===================================================
Consider:
.. code-block:: c++
class MyClass {
public:
MyClass(); // may throw
~MyClass();
};
void doSomething(); // may throw
void f() {
MyClass arr[4];
doSomething();
}
High-level behavior
-------------------
- Construction of ``arr`` proceeds element-by-element. If an element constructor
throws, destructors must run for any elements that were successfully
constructed before the throw in reverse order of construction.
- After full construction, the call to ``doSomething`` may throw, in which case
the destructors for all constructed elements must run, in reverse order.
- On normal exit, destructors for all elements run in reverse order.
Codegen flow and key components
-------------------------------
- The surrounding compound statement enters a ``CodeGenFunction::LexicalScope``,
which is a ``RunCleanupsScope`` and is responsible for popping local cleanups
at the end of the block.
- ``CodeGenFunction::EmitDecl`` routes the local variable to
``CodeGenFunction::EmitVarDecl`` and then ``CodeGenFunction::EmitAutoVarDecl``,
which in turn calls ``EmitAutoVarAlloca``, ``EmitAutoVarInit``, and
``EmitAutoVarCleanups``.
- ``CodeGenFunction::EmitCXXAggrConstructorCall`` emits the array constructor
loop. While emitting the loop body, it enters a ``RunCleanupsScope`` and uses
``CodeGenFunction::pushRegularPartialArrayCleanup`` to register a
cleanup before calling ``CodeGenFunction::EmitCXXConstructorCall`` for one
element in the loop iteration. If this constructor were to throw an exception,
the cleanup handler would destroy the previously constructed elements in
reverse order.
- ``CodeGenFunction::EmitAutoVarCleanups`` calls ``emitAutoVarTypeCleanup``,
which ultimately registers a ``DestroyObject`` cleanup via
``CodeGenFunction::pushDestroy`` / ``pushFullExprCleanup`` for the full-array
destructor path.
- ``DestroyObject`` uses ``CodeGenFunction::destroyCXXObject``, which emits the
actual destructor call via ``CodeGenFunction::EmitCXXDestructorCall``.
- Cleanup emission helpers (e.g., ``CodeGenFunction::PopCleanupBlock`` and
``CodeGenFunction::EmitBranchThroughCleanup``) thread both normal and EH exits
through the cleanup blocks as scopes are popped.
- The cleanup is represented as an ``EHCleanupScope`` on ``EHScopeStack``, and
its ``Emit`` method generates a loop that calls the destructor on the
initialized range in reverse order.
The above function names and flow are accurate as of LLVM 22.0, but this is
subject to change as the code evolves, and this document might not be updated to
reflect the exact functions used.
Example: Temporary object materialization
=========================================
Consider:
.. code-block:: c++
class MyClass {
public:
MyClass();
~MyClass();
};
void useMyClass(MyClass &);
void f() {
useMyClass(MyClass());
}
High-level behavior
-------------------
- The temporary ``MyClass`` is materialized for the call argument.
- The temporary must be destroyed at the end of the full-expression, both on
the normal path and on the exceptional path if ``useMyClass`` throws.
- If the constructor throws, the temporary is not considered constructed and no
destructor runs.
Codegen flow and key functions
------------------------------
- ``CodeGenFunction::EmitExprWithCleanups`` wraps the full-expression in a
``RunCleanupsScope`` so that full-expression cleanups are run after the call.
- ``CodeGenFunction::EmitMaterializeTemporaryExpr`` creates storage for the
temporary via ``createReferenceTemporary`` and initializes it. For record
temporaries this flows through ``EmitAnyExprToMem`` and
``CodeGenFunction::EmitCXXConstructExpr``, which calls
``CodeGenFunction::EmitCXXConstructorCall``.
- ``pushTemporaryCleanup`` registers the destructor as a full-expression
cleanup by calling ``CodeGenFunction::pushDestroy`` for
``SD_FullExpression`` temporaries.
- The cleanup ultimately uses ``DestroyObject`` and
``CodeGenFunction::destroyCXXObject``, which emits
``CodeGenFunction::EmitCXXDestructorCall``.
The above function names and flow are accurate as of LLVM 22.0, but this is
subject to change as the code evolves, and this document might not be updated to
reflect the exact functions used.