.. SPDX-License-Identifier: GPL-2.0

===================================
Running BPF programs from userspace
===================================

This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.

.. contents::
    :local:
    :depth: 2


Overview
--------

The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects, and
as a way to explicitly execute programs in the kernel for their side effects.
The command was previously named ``BPF_PROG_TEST_RUN``, and both constants
continue to be defined in the UAPI header, aliased to the same value.

The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:

- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``

When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel will then
execute the program and return the results to userspace. Note that programs will
not have any side effects while being run in this mode; in particular, packets
will not actually be redirected or dropped, and the program return code is
simply returned to userspace. A separate mode for live execution of XDP programs
is provided and documented below.
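
As a rough illustration of the flow described above, the following sketch fills
in the ``test`` member of ``union bpf_attr`` and issues the command through the
raw ``bpf()`` syscall. The helper names ``fill_test_run_attr()`` and
``prog_test_run()`` are made up for this example, and ``prog_fd`` is assumed to
refer to a program that has already been loaded. The sketch uses the older
``BPF_PROG_TEST_RUN`` spelling, which is aliased to the same value, so that it
also builds against older UAPI headers.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: prepare a single, regular (non-live) test run.
 * prog_fd is assumed to be the fd of an already loaded BPF program. */
void fill_test_run_attr(union bpf_attr *attr, int prog_fd,
			const void *pkt_in, uint32_t pkt_in_len,
			void *pkt_out, uint32_t pkt_out_cap)
{
	memset(attr, 0, sizeof(*attr));
	attr->test.prog_fd = prog_fd;
	/* Packet data the program will operate on. */
	attr->test.data_in = (uint64_t)(uintptr_t)pkt_in;
	attr->test.data_size_in = pkt_in_len;
	/* Buffer that receives the (possibly modified) packet data. */
	attr->test.data_out = (uint64_t)(uintptr_t)pkt_out;
	attr->test.data_size_out = pkt_out_cap;
	attr->test.repeat = 1;
}

/* Hypothetical helper: issue the command through the raw syscall.
 * Returns 0 on success, or -1 with errno set on failure. */
int prog_test_run(union bpf_attr *attr)
{
	return (int)syscall(SYS_bpf, BPF_PROG_TEST_RUN, attr, sizeof(*attr));
}
```

After a successful call, ``attr.test.retval`` holds the program's return code
and ``attr.test.data_size_out`` the size of the returned packet data.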

Running XDP programs in "live frame mode"
-----------------------------------------

The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs.
In this mode, packets are actually processed by the kernel after the XDP program
has executed, as if they had arrived on a physical interface. The mode is
activated by setting the ``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an
XDP program to ``BPF_PROG_RUN``.
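
A minimal sketch of how the flag might be set is shown here. The helper name
``fill_live_frames_attr()`` is made up for this example, ``prog_fd`` is assumed
to refer to an already loaded XDP program, and the fallback definition of the
flag is an assumption intended only for building against older UAPI headers.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>

/* Fallback for older UAPI headers. The value is an assumption that should
 * be checked against the linux/bpf.h of the target kernel. */
#ifndef BPF_F_TEST_XDP_LIVE_FRAMES
#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1)
#endif

/* Hypothetical helper: prepare a live frame mode run that executes the
 * XDP program `repeat` times on copies of the supplied packet. */
void fill_live_frames_attr(union bpf_attr *attr, int prog_fd,
			   const void *pkt, uint32_t pkt_len, uint32_t repeat)
{
	/* Zeroing the attribute leaves all output fields unset. */
	memset(attr, 0, sizeof(*attr));
	attr->test.prog_fd = prog_fd;
	attr->test.data_in = (uint64_t)(uintptr_t)pkt;
	attr->test.data_size_in = pkt_len;
	attr->test.repeat = repeat;
	/* Select live frame mode. */
	attr->test.flags = BPF_F_TEST_XDP_LIVE_FRAMES;
}
```

The attribute prepared this way would then be passed to the ``bpf()`` syscall
in the same way as for a regular test run.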

The live frame mode is optimised for high performance execution of the supplied
XDP program many times (suitable for, e.g., running as a traffic generator),
which means the semantics are not quite as straightforward as the regular test
run mode. Specifically:

- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
  in the syscall parameters when running in this mode will be rejected. In
  addition, not all failures will be reported back to userspace directly;
  specifically, only fatal errors in setup or during execution (like memory
  allocation errors) will halt execution and return an error. If an error occurs
  in packet processing, like a failure to redirect to a given interface,
  execution will continue with the next repetition; these errors can be detected
  via the same tracepoints as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op).

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.

- When setting up the test run, the kernel will initialise a pool of memory
  pages of the same size as the batch size. Each memory page will be initialised
  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
  invocation. When possible, the pages will be recycled on future program
  invocations, to improve performance. Pages will generally be recycled a full
  batch at a time, except when a packet is dropped (by return code or because
  of, say, a redirection error), in which case that page will be recycled
  immediately. If a packet ends up being passed to the regular networking stack
  (because the XDP program returns ``XDP_PASS``, or because it ends up being
  redirected to an interface that injects it into the stack), the page will be
  released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations.
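
The batching and recycling behaviour described above can be illustrated with a
small userspace model. This is a sketch of the documented semantics only, not
the kernel's implementation: all names (``batch_count()``, ``struct
model_page``, ``model_prog()`` and so on) are hypothetical, offsets stand in for
the ``data`` and ``data_end`` pointers, and treating a ``batch_size`` of zero as
"use the default" is an assumption of the model.

```c
#include <stdint.h>
#include <string.h>

#define DEFAULT_BATCH 64u  /* default batch size given in the text */
#define MAX_BATCH     256u /* maximum batch size given in the text */
#define PKT_LEN       4u   /* tiny hypothetical packet for the model */

/* Number of batches needed for `repeat` runs. Returns 0 when batch_size
 * exceeds the maximum; batch_size == 0 selects the default (assumption). */
unsigned int batch_count(unsigned int repeat, unsigned int batch_size)
{
	if (batch_size == 0)
		batch_size = DEFAULT_BATCH;
	if (batch_size > MAX_BATCH)
		return 0;
	return (repeat + batch_size - 1) / batch_size; /* round up */
}

/* Model of one page in the pool. */
struct model_page {
	uint8_t buf[PKT_LEN];  /* packet bytes, written once at setup */
	unsigned int data_off; /* stands in for the data pointer */
	unsigned int data_len; /* stands in for data_end - data */
};

/* Setup: copy the packet supplied by userspace into the page. */
void page_init(struct model_page *p, const uint8_t *pkt)
{
	memcpy(p->buf, pkt, PKT_LEN);
	p->data_off = 0;
	p->data_len = PKT_LEN;
}

/* Recycle: reset only the packet boundaries, leave the bytes untouched. */
void page_recycle(struct model_page *p)
{
	p->data_off = 0;
	p->data_len = PKT_LEN;
}

/* Hypothetical program: bumps byte 0 and trims one byte off the front. */
void model_prog(struct model_page *p)
{
	p->buf[0]++;
	p->data_off += 1;
	p->data_len -= 1;
}
```

In this model, running ``model_prog()`` twice on the same page with a
``page_recycle()`` in between leaves ``buf[0]`` incremented twice, while the
boundaries start from their original values on each run. A real XDP program
that rewrites packet contents faces the same situation when its page is
recycled.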