L1TF - L1 Terminal Fault
========================

L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data which is available in the Level 1 Data Cache
when the page table entry controlling the virtual address used for the
access has the Present bit cleared or other reserved bits set.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

   - Processors from AMD, Centaur and other non-Intel vendors

   - Older processor models, where the CPU family is < 6

   - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
     Penwell, Pineview, Silvermont, Airmont, Merrifield)

   - The Intel XEON PHI family

   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
     by the Meltdown vulnerability either. These CPUs should become
     available by end of 2018.

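
As an illustration, the MSR bit can be queried from user space with a small
script. The sketch below is not part of the kernel; it assumes root
privileges and that the 'msr' kernel module is loaded, and the MSR number
(0x10a) and the RDCL_NO bit position are taken from the Intel SDM::

   #!/usr/bin/env python3
   # Minimal sketch: check the RDCL_NO bit in IA32_ARCH_CAPABILITIES.
   import struct

   IA32_ARCH_CAPABILITIES = 0x10A   # MSR number per the Intel SDM
   RDCL_NO = 1 << 0                 # bit 0: CPU reports itself as not affected

   def read_msr(msr, cpu=0):
       # /dev/cpu/<n>/msr is provided by the 'msr' kernel module (needs root).
       with open(f"/dev/cpu/{cpu}/msr", "rb") as f:
           f.seek(msr)
           return struct.unpack("<Q", f.read(8))[0]

   try:
       caps = read_msr(IA32_ARCH_CAPABILITIES)
       if caps & RDCL_NO:
           print("RDCL_NO set: CPU reports itself as not affected")
       else:
           print("RDCL_NO clear: CPU may be affected")
   except OSError as exc:
       # The read fails with EIO if the MSR does not exist on this CPU.
       print(f"could not read IA32_ARCH_CAPABILITIES: {exc}")
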
Whether a processor is affected or not can be read out from the L1TF
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.

Related CVEs
------------

The following CVE entries are related to the L1TF vulnerability:

   =============  =================  ==============================
   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
   =============  =================  ==============================

Problem
-------

If an instruction accesses a virtual address for which the relevant page
table entry (PTE) has the Present bit cleared or other reserved bits set,
then speculative execution ignores the invalid PTE and loads the referenced
data if it is present in the Level 1 Data Cache, as if the page referenced
by the address bits in the PTE was still present and accessible.

While this is a purely speculative mechanism and the instruction will raise
a page fault when it is retired eventually, the mere act of loading the
data and making it available to other speculative instructions opens up the
opportunity for unprivileged malicious code to mount side channel attacks,
similar to the Meltdown attack.

While Meltdown breaks the user space to kernel space protection, L1TF
allows attacking any physical memory address in the system and the attack
works across all protection domains. It allows an attack on SGX and also
works from inside virtual machines because the speculation bypasses the
extended page table (EPT) protection mechanism.


Attack scenarios
----------------

1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^

   Operating Systems store arbitrary information in the address bits of a
   PTE which is marked not present. This allows a malicious user space
   application to attack the physical memory to which these PTEs resolve.
   In some cases user-space can maliciously influence the information
   encoded in the address bits of the PTE, thus making attacks more
   deterministic and more practical.

   The Linux kernel contains a mitigation for this attack vector, PTE
   inversion, which is permanently enabled and has no performance
   impact. The kernel ensures that the address bits of PTEs, which are not
   marked present, never point to cacheable physical memory space.

   A system with an up-to-date kernel is protected against attacks from
   malicious user space applications.

2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The fact that L1TF breaks all domain protections allows malicious guest
   OSes, which can control the PTEs directly, and malicious guest user
   space applications, which run on an unprotected guest kernel lacking the
   PTE inversion mitigation for L1TF, to attack physical host memory.

   A special aspect of L1TF in the context of virtualization is
   simultaneous multi-threading (SMT). The Intel implementation of SMT is
   called HyperThreading. The fact that Hyperthreads on the affected
   processors share the L1 Data Cache (L1D) is important for this. As the
   flaw only allows attacking data which is present in the L1D, a malicious
   guest running on one Hyperthread can attack the data which is brought
   into the L1D by the context which runs on the sibling Hyperthread of the
   same physical core. This context can be host OS, host user space or a
   different guest.

   If the processor does not support Extended Page Tables, the attack is
   only possible when the hypervisor does not sanitize the content of the
   effective (shadow) page tables.

   While solutions exist to mitigate these attack vectors fully, these
   mitigations are not enabled by default in the Linux kernel because they
   can affect performance significantly. The kernel provides several
   mechanisms which can be utilized to address the problem depending on the
   deployment scenario. The mitigations, their protection scope and impact
   are described in the next sections.

   The default mitigations and the rationale for choosing them are explained
   at the end of this document. See :ref:`default_mitigations`.

.. _l1tf_sys_info:

L1TF system information
-----------------------

The Linux kernel provides a sysfs interface to enumerate the current L1TF
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

/sys/devices/system/cpu/vulnerabilities/l1tf

The possible values in this file are:

  ===========================   ===============================
  'Not affected'                The processor is not vulnerable
  'Mitigation: PTE Inversion'   The host protection is active
  ===========================   ===============================

If KVM/VMX is enabled and the processor is vulnerable then the following
information is appended to the 'Mitigation: PTE Inversion' part:

  - SMT status:

    =====================  ================
    'VMX: SMT vulnerable'  SMT is enabled
    'VMX: SMT disabled'    SMT is disabled
    =====================  ================

  - L1D Flush mode:

    ================================  ====================================
    'L1D vulnerable'                  L1D flushing is disabled

    'L1D conditional cache flushes'   L1D flush is conditionally enabled

    'L1D cache flushes'               L1D flush is unconditionally enabled
    ================================  ====================================

The resulting grade of protection is discussed in the following sections.
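
For scripting, the status string can simply be read from sysfs and matched
against the values listed above. A minimal sketch, which only looks for the
substrings shown in the tables (the exact wording may vary between kernel
versions)::

   #!/usr/bin/env python3
   # Minimal sketch: read and loosely interpret the L1TF status string.
   PATH = "/sys/devices/system/cpu/vulnerabilities/l1tf"

   with open(PATH) as f:
       status = f.read().strip()

   print(f"l1tf status: {status}")

   if status == "Not affected":
       print("The processor is not vulnerable.")
   elif "Mitigation: PTE Inversion" in status:
       print("Host protection (PTE inversion) is active.")
       if "SMT vulnerable" in status:
           print("SMT is enabled; sibling threads remain attackable.")
       if "conditional cache flushes" in status:
           print("KVM flushes the L1D conditionally on VMENTER.")
   else:
       print("Unrecognized status string; see the tables above.")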


Host mitigation mechanism
-------------------------

The kernel is unconditionally protected against L1TF attacks from malicious
user space running on the host.


Guest mitigation mechanisms
---------------------------

.. _l1d_flush:

1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^

   To make sure that a guest cannot attack data which is present in the L1D
   the hypervisor flushes the L1D before entering the guest.

   Flushing the L1D evicts not only the data which should not be accessed
   by a potentially malicious guest, but also the guest's own data.
   Flushing the L1D has a performance impact as the processor has to bring
   the flushed guest data back into the L1D. Depending on the frequency of
   VMEXIT/VMENTER and the type of computations in the guest, performance
   degradation in the range of 1% to 50% has been observed. For scenarios
   where guest VMEXIT/VMENTER are rare the performance impact is minimal.
   Virtio and mechanisms like posted interrupts are designed to confine the
   VMEXITs to a bare minimum, but specific configurations and application
   scenarios might still suffer from a high VMEXIT rate.

   The kernel provides two L1D flush modes:
    - conditional ('cond')
    - unconditional ('always')

   The conditional mode avoids L1D flushing after VMEXITs which execute
   only audited code paths before the corresponding VMENTER. These code
   paths have been verified not to expose secrets or other interesting
   data to an attacker, but they can leak information about the address
   space layout of the hypervisor.

   Unconditional mode flushes L1D on all VMENTER invocations and provides
   maximum protection. It has a higher overhead than the conditional
   mode. The overhead cannot be quantified precisely as it depends on the
   workload scenario and the resulting number of VMEXITs.

   The general recommendation is to enable L1D flush on VMENTER. The kernel
   defaults to conditional mode on affected processors.

   **Note** that L1D flush does not prevent the SMT problem because the
   sibling thread will also bring its data back into the L1D, which makes
   it attackable again.

   L1D flush can be controlled by the administrator via the kernel command
   line and sysfs control files. See :ref:`mitigation_control_command_line`
   and :ref:`mitigation_control_kvm`.

.. _guest_confinement:

2. Guest VCPU confinement to dedicated physical cores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   To address the SMT problem, it is possible to make a guest or a group of
   guests affine to one or more physical cores. The proper mechanism for
   that is to utilize exclusive cpusets to ensure that no other guest or
   host tasks can run on these cores.

   If only a single guest or related guests run on sibling SMT threads on
   the same physical core then they can only attack their own memory and
   restricted parts of the host memory.

   Host memory is attackable when one of the sibling SMT threads runs in
   host OS (hypervisor) context and the other in guest context. The amount
   of valuable information from the host OS context depends on the context
   which the host OS executes, i.e. interrupts, soft interrupts and kernel
   threads. The amount of valuable data from these contexts cannot be
   declared as non-interesting for an attacker without deep inspection of
   the code.

   **Note** that assigning guests to a fixed set of physical cores affects
   the ability of the scheduler to do load balancing and might have
   negative effects on CPU utilization depending on the hosting
   scenario. Disabling SMT might be a viable alternative for particular
   scenarios.

   For further information about confining guests to a single or to a group
   of cores consult the cpusets documentation:

   https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst
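
   As an illustration, an exclusive cpuset can be set up through the cgroup
   v1 cpuset filesystem. The sketch below is not a complete recipe: the
   mount point /sys/fs/cgroup/cpuset, the group name 'vm-guests', the CPU
   range and the PID are assumptions which have to be adapted to the real
   core topology and VM setup::

      #!/usr/bin/env python3
      # Minimal sketch: confine guest VCPU tasks to an exclusive cpuset.
      import os

      CPUSET_ROOT = "/sys/fs/cgroup/cpuset"          # assumes cgroup v1 cpuset
      GROUP = os.path.join(CPUSET_ROOT, "vm-guests") # hypothetical group name

      def write(path, value):
          with open(path, "w") as f:
              f.write(value)

      os.makedirs(GROUP, exist_ok=True)
      write(os.path.join(GROUP, "cpuset.cpus"), "4-7")        # dedicated cores
      write(os.path.join(GROUP, "cpuset.mems"), "0")          # memory node(s)
      write(os.path.join(GROUP, "cpuset.cpu_exclusive"), "1") # no other tasks

      # Move a QEMU/VCPU task into the cpuset; 4711 is an example PID.
      write(os.path.join(GROUP, "tasks"), "4711")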

.. _interrupt_isolation:

3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^

   Interrupts can be made affine to logical CPUs. This is not universally
   true because there are types of interrupts which are truly per-CPU
   interrupts, e.g. the local timer interrupt. Aside from that, multi-queue
   devices affine their interrupts to single CPUs or groups of CPUs per
   queue without allowing the administrator to control the affinities.

   Moving the interrupts which can be affinity controlled away from CPUs
   which run untrusted guests reduces the attack vector space.

   Whether the interrupts which are affine to CPUs running untrusted
   guests provide interesting data for an attacker depends on the system
   configuration and the scenarios which run on the system. While for some
   of the interrupts it can be assumed that they won't expose interesting
   information beyond exposing hints about the host OS memory layout, there
   is no way to make general assumptions.

   Interrupt affinity can be controlled by the administrator via the
   /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
   available at:

   https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
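
   As an illustration, the interrupts which can be re-affined can be moved
   to a set of host-reserved CPUs with a few writes to procfs. The sketch
   below assumes that CPUs 0-3 are reserved for the host and must be run as
   root; truly per-CPU or otherwise unmovable interrupts simply reject the
   write and are skipped::

      #!/usr/bin/env python3
      # Minimal sketch: move maskable interrupts away from guest CPUs.
      import glob

      HOST_CPUS = "0-3"   # example value: CPUs which do not run guests

      for path in glob.glob("/proc/irq/*/smp_affinity_list"):
          try:
              with open(path, "w") as f:
                  f.write(HOST_CPUS)
          except OSError as exc:
              # Per-CPU and managed interrupts cannot be re-affined.
              print(f"skipping {path}: {exc}")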

.. _smt_control:

4. SMT control
^^^^^^^^^^^^^^

   To prevent the SMT issues of L1TF it might be necessary to disable SMT
   completely. Disabling SMT can have a significant performance impact, but
   the impact depends on the hosting scenario and the type of workloads.
   The impact of disabling SMT also needs to be weighed against the impact
   of other mitigation solutions like confining guests to dedicated cores.

   The kernel provides a sysfs interface to retrieve the status of SMT and
   to control it. It also provides a kernel command line interface to
   control SMT.

   The kernel command line interface consists of the following options:

     =========== ==========================================================
     nosmt       Affects the bring-up of the secondary CPUs during boot. The
                 kernel tries to bring all present CPUs online during the
                 boot process. "nosmt" makes sure that from each physical
                 core only one - the so-called primary (hyper) thread - is
                 activated. Due to a design flaw of Intel processors related
                 to Machine Check Exceptions the non-primary siblings have
                 to be brought up at least partially and are then shut down
                 again.  "nosmt" can be undone via the sysfs interface.

     nosmt=force Has the same effect as "nosmt" but it does not allow the
                 SMT disable to be undone via the sysfs interface.
     =========== ==========================================================

   The sysfs interface provides two files:

   - /sys/devices/system/cpu/smt/control
   - /sys/devices/system/cpu/smt/active

   /sys/devices/system/cpu/smt/control:

     This file shows the SMT control state and provides the ability to
     disable or (re)enable SMT. The possible states are:

        ==============  ===================================================
        on              SMT is supported by the CPU and enabled. All
                        logical CPUs can be onlined and offlined without
                        restrictions.

        off             SMT is supported by the CPU and disabled. Only
                        the so-called primary SMT threads can be onlined
                        and offlined without restrictions. An attempt to
                        online a non-primary sibling is rejected.

        forceoff        Same as 'off' but the state cannot be controlled.
                        Attempts to write to the control file are rejected.

        notsupported    The processor does not support SMT. It's therefore
                        not affected by the SMT implications of L1TF.
                        Attempts to write to the control file are rejected.
        ==============  ===================================================

     The possible states which can be written into this file to control SMT
     state are:

     - on
     - off
     - forceoff

   /sys/devices/system/cpu/smt/active:

     This file reports whether SMT is enabled and active, i.e. if on any
     physical core two or more sibling threads are online.

   SMT control is also possible at boot time via the l1tf kernel command
   line parameter in combination with L1D flush control. See
   :ref:`mitigation_control_command_line`.
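
   The sysfs files can be used directly from scripts as well. A minimal
   sketch which reads the current state and, if runtime control is
   permitted, disables SMT (root privileges required)::

      #!/usr/bin/env python3
      # Minimal sketch: query the SMT state and disable SMT at runtime.
      CONTROL = "/sys/devices/system/cpu/smt/control"
      ACTIVE = "/sys/devices/system/cpu/smt/active"

      with open(CONTROL) as f:
          state = f.read().strip()
      with open(ACTIVE) as f:
          active = f.read().strip() == "1"

      print(f"SMT control state: {state}, active: {active}")

      if state == "on":
          try:
              with open(CONTROL, "w") as f:
                  f.write("off")
              print("SMT disabled; only primary threads remain online.")
          except OSError as exc:
              # Writes are rejected for 'forceoff' and 'notsupported'.
              print(f"could not disable SMT: {exc}")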

5. Disabling EPT
^^^^^^^^^^^^^^^^

  Disabling EPT for virtual machines provides full mitigation for L1TF even
  with SMT enabled, because the effective page tables for guests are
  managed and sanitized by the hypervisor. Disabling EPT has a significant
  performance impact, however, especially when the Meltdown mitigation KPTI
  is enabled.

  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
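
  Whether EPT is currently in use can be read from the corresponding module
  parameter file, assuming the kvm_intel module is loaded. A minimal
  sketch::

     #!/usr/bin/env python3
     # Minimal sketch: check whether KVM uses EPT or shadow page tables.
     try:
         with open("/sys/module/kvm_intel/parameters/ept") as f:
             ept = f.read().strip()
         if ept in ("Y", "1"):
             print("EPT enabled: L1D flushing and/or SMT control needed")
         else:
             print("EPT disabled: shadow page tables, guests fully mitigated")
     except FileNotFoundError:
         print("kvm_intel module not loaded")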

There is ongoing research and development for new mitigation mechanisms to
address the performance impact of disabling SMT or EPT.

.. _mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows the L1TF mitigations to be controlled at
boot time with the option "l1tf=". The valid arguments for this option are:

  ============  =============================================================
  full          Provides all available mitigations for the L1TF
                vulnerability. Disables SMT and enables all mitigations in
                the hypervisors, i.e. unconditional L1D flushing

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot.  Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  full,force    Same as 'full', but disables SMT and L1D flush runtime
                control. Implies the 'nosmt=force' command line option.
                (i.e. sysfs control of SMT is disabled.)

  flush         Leaves SMT enabled and enables the default hypervisor
                mitigation, i.e. conditional L1D flushing

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot.  Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nosmt   Disables SMT and enables the default hypervisor mitigation,
                i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot.  Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nowarn  Same as 'flush', but hypervisors will not warn when a VM is
                started in a potentially insecure configuration.

  off           Disables hypervisor mitigations and doesn't emit any
                warnings.
                It also drops the swap size and available RAM limit
                restrictions on both hypervisor and bare metal.

  ============  =============================================================

The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
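
Which 'l1tf=' option was requested at boot can be checked by parsing
/proc/cmdline; if the option is absent, the default 'flush' is in effect.
A minimal sketch (the authoritative runtime state remains the sysfs file
described in :ref:`l1tf_sys_info`)::

   #!/usr/bin/env python3
   # Minimal sketch: show the l1tf= boot option, if any.
   with open("/proc/cmdline") as f:
       options = f.read().split()

   l1tf = next((o.split("=", 1)[1] for o in options if o.startswith("l1tf=")),
               None)
   print(f"l1tf boot option: {l1tf or 'not set (default: flush)'}")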


.. _mitigation_control_kvm:

Mitigation control for KVM - module parameter
---------------------------------------------

The KVM hypervisor mitigation mechanism, flushing the L1D cache when
entering a guest, can be controlled with a module parameter.

The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
following arguments:

  ============  ==============================================================
  always        L1D cache flush on every VMENTER.

  cond          Flush L1D on VMENTER only when the code between VMEXIT and
                VMENTER can leak host memory which is considered
                interesting for an attacker. This still can leak host memory
                which allows e.g. the host's address space layout to be
                determined.

  never         Disables the mitigation
  ============  ==============================================================

The parameter can be provided on the kernel command line, as a module
parameter when loading the modules and at runtime modified via the sysfs
file:

/sys/module/kvm_intel/parameters/vmentry_l1d_flush

The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
module parameter is ignored and writes to the sysfs file are rejected.
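
A minimal sketch for inspecting and switching the flush mode at runtime,
assuming the kvm_intel module is loaded and runtime control has not been
disabled by 'l1tf=full,force'::

   #!/usr/bin/env python3
   # Minimal sketch: read and change kvm-intel's vmentry_l1d_flush mode.
   PARAM = "/sys/module/kvm_intel/parameters/vmentry_l1d_flush"

   with open(PARAM) as f:
       print(f"current mode: {f.read().strip()}")   # always, cond or never

   try:
       with open(PARAM, "w") as f:
           f.write("always")                        # unconditional flushing
       print("switched to unconditional L1D flushing")
   except OSError as exc:
       print(f"runtime control not permitted: {exc}")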

.. _mitigation_selection:

Mitigation selection guide
--------------------------

1. No virtualization in use
^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The system is protected by the kernel unconditionally and no further
   action is required.

2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   If the guest comes from a trusted source and the guest OS kernel is
   guaranteed to have the L1TF mitigations in place, the system is fully
   protected against L1TF and no further action is required.

   To avoid the overhead of the default L1D flushing on VMENTER the
   administrator can disable the flushing via the kernel command line and
   sysfs control files. See :ref:`mitigation_control_command_line` and
   :ref:`mitigation_control_kvm`.


3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

3.1. SMT not supported or disabled
""""""""""""""""""""""""""""""""""

  If SMT is not supported by the processor or disabled in the BIOS or by
  the kernel, the only requirement is to enforce L1D flushing on VMENTER.

  Conditional L1D flushing is the default behaviour and can be tuned. See
  :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

3.2. EPT not supported or disabled
""""""""""""""""""""""""""""""""""

  If EPT is not supported by the processor or disabled in the hypervisor,
  the system is fully protected. SMT can stay enabled and L1D flushing on
  VMENTER is not required.

  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

3.3. SMT and EPT supported and active
"""""""""""""""""""""""""""""""""""""

  If SMT and EPT are supported and active then various degrees of
  mitigations can be employed:

  - L1D flushing on VMENTER:

    L1D flushing on VMENTER is the minimal protection requirement, but it
    is only potent in combination with other mitigation methods.

    Conditional L1D flushing is the default behaviour and can be tuned. See
    :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

  - Guest confinement:

    Confinement of guests to a single or a group of physical cores which
    are not running any other processes can reduce the attack surface
    significantly, but interrupts, soft interrupts and kernel threads can
    still expose valuable data to a potential attacker. See
    :ref:`guest_confinement`.

  - Interrupt isolation:

    Isolating the guest CPUs from interrupts can reduce the attack surface
    further, but still allows a malicious guest to explore a limited amount
    of host physical memory. This can at least be used to gain knowledge
    about the host address space layout. The interrupts which have a fixed
    affinity to the CPUs which run the untrusted guests can, depending on
    the scenario, still trigger soft interrupts and schedule kernel threads
    which might expose valuable information. See
    :ref:`interrupt_isolation`.

The above three mitigation methods combined can provide protection to a
certain degree, but the risk of the remaining attack surface has to be
carefully analyzed. For full protection the following methods are
available:

  - Disabling SMT:

    Disabling SMT and enforcing the L1D flushing provides the maximum
    amount of protection. This mitigation does not depend on any of the
    above mitigation methods.

    SMT control and L1D flushing can be tuned by the command line
    parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
    time with the matching sysfs control files. See :ref:`smt_control`,
    :ref:`mitigation_control_command_line` and
    :ref:`mitigation_control_kvm`.

  - Disabling EPT:

    Disabling EPT provides the maximum amount of protection as well. It
    does not depend on any of the above mitigation methods. SMT can stay
    enabled and L1D flushing is not required, but the performance impact is
    significant.

    EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
    parameter.

3.4. Nested virtual machines
""""""""""""""""""""""""""""

When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine.  VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor it will:

 - Flush the L1D cache on every switch from the nested hypervisor to the
   nested virtual machine, so that the nested hypervisor's secrets are not
   exposed to the nested virtual machine;

 - Flush the L1D cache on every switch from the nested virtual machine to
   the nested hypervisor; this is a complex operation, and flushing the L1D
   cache prevents the bare metal hypervisor's secrets from being exposed to
   the nested virtual machine;

 - Instruct the nested hypervisor to not perform any L1D cache flush. This
   is an optimization to avoid double L1D flushing.


.. _default_mitigations:

Default mitigations
-------------------

  The kernel default mitigations for vulnerable processors are:

  - PTE inversion to protect against malicious user space. This is done
    unconditionally and cannot be controlled. The swap storage is limited
    to ~16TB.

  - L1D conditional flushing on VMENTER when EPT is enabled for
    a guest.

  The kernel does not by default enforce the disabling of SMT, which leaves
  SMT systems vulnerable when running untrusted guests with EPT enabled.

  The rationale for this choice is:

  - Force disabling SMT can break existing setups, especially with
    unattended updates.

  - If regular users run untrusted guests on their machine, then L1TF is
    just an add-on to other malware which might be embedded in an untrusted
    guest, e.g. spam-bots or attacks on the local network.

    There is no technical way to prevent a user from running untrusted code
    on their machines blindly.

  - It's technically extremely unlikely and from today's knowledge even
    impossible that L1TF can be exploited via the most popular attack
    mechanisms like JavaScript because these mechanisms have no way to
    control PTEs. If this were possible and no other mitigation were
    available, then the default might be different.

  - The administrators of cloud and hosting setups have to carefully
    analyze the risk for their scenarios and make the appropriate
    mitigation choices, which might even vary across their deployed
    machines and also result in other changes to their overall setup.
    There is no way for the kernel to provide a sensible default for this
    kind of scenario.