| .. SPDX-License-Identifier: GPL-2.0 | 
 | .. include:: <isonum.txt> | 
 |  | 
 | ========================= | 
 | System Suspend Code Flows | 
 | ========================= | 
 |  | 
 | :Copyright: |copy| 2020 Intel Corporation | 
 |  | 
 | :Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com> | 
 |  | 
 | At least one global system-wide transition needs to be carried out for the | 
 | system to get from the working state into one of the supported | 
 | :doc:`sleep states <sleep-states>`.  Hibernation requires more than one | 
 | transition to occur for this purpose, but the other sleep states, commonly | 
 | referred to as *system-wide suspend* (or simply *system suspend*) states, need | 
 | only one. | 
 |  | 
 | For those sleep states, the transition from the working state of the system into | 
 | the target sleep state is referred to as *system suspend* too (in the majority | 
 | of cases, whether this means a transition or a sleep state of the system should | 
 | be clear from the context) and the transition back from the sleep state into the | 
 | working state is referred to as *system resume*. | 
 |  | 
 | The kernel code flows associated with the suspend and resume transitions for | 
 | different sleep states of the system are quite similar, but there are some | 
 | significant differences between the :ref:`suspend-to-idle <s2idle>` code flows | 
 | and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and | 
 | :ref:`standby <standby>` sleep states. | 
 |  | 
 | The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states | 
 | cannot be implemented without platform support and the difference between them | 
 | boils down to the platform-specific actions carried out by the suspend and | 
 | resume hooks that need to be provided by the platform driver to make them | 
 | available.  Apart from that, the suspend and resume code flows for these sleep | 
 | states are mostly identical, so both of them are referred to collectively as | 
 | *platform-dependent suspend* states in what follows. | 
 |  | 
 |  | 
 | .. _s2idle_suspend: | 
 |  | 
 | Suspend-to-idle Suspend Code Flow | 
 | ================================= | 
 |  | 
 | The following steps are taken in order to transition the system from the working | 
 | state to the :ref:`suspend-to-idle <s2idle>` sleep state: | 
 |  | 
 |  1. Invoking system-wide suspend notifiers. | 
 |  | 
 |     Kernel subsystems can register callbacks to be invoked when the suspend | 
 |     transition is about to occur and when the resume transition has finished. | 
 |  | 
 |     That allows them to prepare for the change of the system state and to clean | 
 |     up after getting back to the working state. | 
 |  | 
 |  2. Freezing tasks. | 
 |  | 
 |     Tasks are frozen primarily in order to avoid unchecked hardware accesses | 
 |     from user space through MMIO regions or I/O registers exposed directly to | 
 |     it and to prevent user space from entering the kernel while the next step | 
 |     of the transition is in progress (which could be problematic for various | 
 |     reasons). | 
 |  | 
 |     All user space tasks are intercepted as though they were sent a signal and | 
 |     put into uninterruptible sleep until the end of the subsequent system resume | 
 |     transition. | 
 |  | 
 |     The kernel threads that choose to be frozen during system suspend for | 
 |     specific reasons are frozen subsequently, but they are not intercepted. | 
 |     Instead, they are expected to periodically check whether or not they need | 
 |     to be frozen and to put themselves into uninterruptible sleep if so.  [Note, | 
 |     however, that kernel threads can use locking and other concurrency controls | 
 |     available in kernel space to synchronize themselves with system suspend and | 
 |     resume, which can be much more precise than the freezing, so the latter is | 
 |     not a recommended option for kernel threads.] | 
 |  | 
 |  3. Suspending devices and reconfiguring IRQs. | 
 |  | 
 |     Devices are suspended in four phases called *prepare*, *suspend*, | 
 |     *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more | 
 |     information on what exactly happens in each phase). | 
 |  | 
 |     Every device is visited in each phase, but typically it is not physically | 
 |     accessed in more than two of them. | 
 |  | 
 |     The runtime PM API is disabled for every device during the *late* suspend | 
 |     phase and high-level ("action") interrupt handlers are prevented from being | 
 |     invoked before the *noirq* suspend phase. | 
 |  | 
 |     Interrupts are still handled after that, but they are only acknowledged to | 
 |     interrupt controllers without performing any device-specific actions that | 
 |     would be triggered in the working state of the system (those actions are | 
 |     deferred till the subsequent system resume transition as described | 
 |     `below <s2idle_resume_>`_). | 
 |  | 
 |     IRQs associated with system wakeup devices are "armed" so that the resume | 
 |     transition of the system is started when one of them signals an event. | 
 |  | 
 |  4. Freezing the scheduler tick and suspending timekeeping. | 
 |  | 
 |     When all devices have been suspended, CPUs enter the idle loop and are put | 
 |     into the deepest available idle state.  While doing that, each of them | 
 |     "freezes" its own scheduler tick so that the timer events associated with | 
 |     the tick do not occur until the CPU is woken up by another interrupt source. | 
 |  | 
 |     The last CPU to enter the idle state also stops the timekeeping, which | 
 |     (among other things) prevents high-resolution timers from triggering until | 
 |     the first CPU that is woken up restarts the timekeeping. | 
 |     That allows the CPUs to stay in the deep idle state relatively long in one | 
 |     go. | 
 |  | 
 |     From this point on, the CPUs can only be woken up by non-timer hardware | 
 |     interrupts.  If that happens, they go back to the idle state unless the | 
 |     interrupt that woke up one of them comes from an IRQ that has been armed for | 
 |     system wakeup, in which case the system resume transition is started. | 
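
The wakeup filtering described in step 4 can be sketched as a small user-space
model.  This is not kernel code: ``model_irq_event``, ``model_irq_is_armed()``
and ``model_s2idle_wait()`` are hypothetical names introduced only for
illustration, and the model assumes that non-timer interrupts arrive one at a
time in a known order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one entry per non-timer hardware interrupt, in order. */
struct model_irq_event {
	unsigned int irq;	/* IRQ number that fired */
};

/* Return true if @irq belongs to the set of IRQs armed for system wakeup. */
static bool model_irq_is_armed(const unsigned int *armed, size_t n_armed,
			       unsigned int irq)
{
	for (size_t i = 0; i < n_armed; i++)
		if (armed[i] == irq)
			return true;
	return false;
}

/*
 * Walk the interrupts that wake a CPU up from idle.  An interrupt that is
 * not armed only sends the CPU back to the idle state; the first armed one
 * starts the system resume transition.
 *
 * Returns the index of the event that starts resume, or -1 if none does
 * (the modeled system stays suspended).
 */
static int model_s2idle_wait(const struct model_irq_event *events,
			     size_t n_events, const unsigned int *armed,
			     size_t n_armed)
{
	for (size_t i = 0; i < n_events; i++) {
		if (model_irq_is_armed(armed, n_armed, events[i].irq))
			return (int)i;	/* system resume begins here */
		/* Not armed: go back to the idle state and keep waiting. */
	}
	return -1;
}
```

In this model, with IRQs 9 and 42 armed and the interrupt sequence 17, 3, 42,
the first two interrupts send the CPU back to idle and the third one starts
the resume transition.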
 |  | 
 |  | 
 | .. _s2idle_resume: | 
 |  | 
 | Suspend-to-idle Resume Code Flow | 
 | ================================ | 
 |  | 
 | The following steps are taken in order to transition the system from the | 
 | :ref:`suspend-to-idle <s2idle>` sleep state into the working state: | 
 |  | 
 |  1. Resuming timekeeping and unfreezing the scheduler tick. | 
 |  | 
 |     When one of the CPUs is woken up (by a non-timer hardware interrupt), it | 
 |     leaves the idle state entered in the last step of the preceding suspend | 
 |     transition, restarts the timekeeping (unless it has been restarted already | 
 |     by another CPU that woke up earlier) and the scheduler tick on that CPU is | 
 |     unfrozen. | 
 |  | 
 |     If the interrupt that has woken up the CPU was armed for system wakeup, | 
 |     the system resume transition begins. | 
 |  | 
 |  2. Resuming devices and restoring the working-state configuration of IRQs. | 
 |  | 
 |     Devices are resumed in four phases called *noirq resume*, *early resume*, | 
 |     *resume* and *complete* (see :ref:`driverapi_pm_devices` for more | 
 |     information on what exactly happens in each phase). | 
 |  | 
 |     Every device is visited in each phase, but typically it is not physically | 
 |     accessed in more than two of them. | 
 |  | 
 |     The working-state configuration of IRQs is restored after the *noirq* resume | 
 |     phase and the runtime PM API is re-enabled for every device whose driver | 
 |     supports it during the *early* resume phase. | 
 |  | 
 |  3. Thawing tasks. | 
 |  | 
 |     Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_ | 
 |     transition are "thawed", which means that they are woken up from the | 
 |     uninterruptible sleep that they went into at that time and user space tasks | 
 |     are allowed to exit the kernel. | 
 |  | 
 |  4. Invoking system-wide resume notifiers. | 
 |  | 
 |     This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition | 
 |     and the same set of callbacks is invoked at this point, but a different | 
 |     "notification type" parameter value is passed to them. | 
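
The relationship between the suspend and resume notifiers can be illustrated
with a user-space model in which one callback is invoked twice with different
"notification type" values.  In the kernel, such callbacks are registered with
``register_pm_notifier()`` and receive ``PM_SUSPEND_PREPARE`` before the
suspend transition and ``PM_POST_SUSPEND`` after the resume transition.  All
of the ``model_*`` and ``MODEL_*`` names below are hypothetical stand-ins and
are not part of any kernel interface.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the "notification type" values. */
enum model_pm_event {
	MODEL_SUSPEND_PREPARE,	/* step 1 of the suspend transition */
	MODEL_POST_SUSPEND,	/* step 4 of the resume transition */
};

typedef void (*model_pm_notifier_fn)(enum model_pm_event event, void *data);

struct model_pm_notifier {
	model_pm_notifier_fn fn;	/* callback to invoke */
	void *data;			/* private data passed to the callback */
};

/* Invoke every registered callback with the given notification type. */
static void model_pm_notify(const struct model_pm_notifier *chain, size_t n,
			    enum model_pm_event event)
{
	for (size_t i = 0; i < n; i++)
		chain[i].fn(event, chain[i].data);
}

/* Example subsystem state: quiesced while suspended, cleaned up on resume. */
struct model_subsys {
	int quiesced;
	int cleanups;
};

static void model_subsys_notify(enum model_pm_event event, void *data)
{
	struct model_subsys *s = data;

	switch (event) {
	case MODEL_SUSPEND_PREPARE:
		s->quiesced = 1;	/* prepare for the state change */
		break;
	case MODEL_POST_SUSPEND:
		s->quiesced = 0;	/* back in the working state */
		s->cleanups++;
		break;
	}
}
```

The same callback handles both transitions here, which mirrors the point made
in step 4: only the notification type differs between the two invocations.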
 |  | 
 |  | 
 | Platform-dependent Suspend Code Flow | 
 | ==================================== | 
 |  | 
 | The following steps are taken in order to transition the system from the working | 
 | state to a platform-dependent suspend state: | 
 |  | 
 |  1. Invoking system-wide suspend notifiers. | 
 |  | 
 |     This step is the same as step 1 of the suspend-to-idle suspend transition | 
 |     described `above <s2idle_suspend_>`_. | 
 |  | 
 |  2. Freezing tasks. | 
 |  | 
 |     This step is the same as step 2 of the suspend-to-idle suspend transition | 
 |     described `above <s2idle_suspend_>`_. | 
 |  | 
 |  3. Suspending devices and reconfiguring IRQs. | 
 |  | 
 |     This step is analogous to step 3 of the suspend-to-idle suspend transition | 
 |     described `above <s2idle_suspend_>`_, but the arming of IRQs for system | 
 |     wakeup generally does not have any effect on the platform. | 
 |  | 
 |     There are platforms that can go into a very deep low-power state internally | 
 |     when all CPUs in them are in sufficiently deep idle states and all I/O | 
 |     devices have been put into low-power states.  On those platforms, | 
 |     suspend-to-idle can reduce system power very effectively. | 
 |  | 
 |     On the other platforms, however, low-level components (like interrupt | 
 |     controllers) need to be turned off in a platform-specific way (implemented | 
 |     in the hooks provided by the platform driver) to achieve comparable power | 
 |     reduction. | 
 |  | 
 |     That usually prevents in-band hardware interrupts from waking up the system, | 
 |     so system wakeup must be arranged in a special platform-dependent way.  In | 
 |     that case, the configuration of system wakeup sources usually starts when | 
 |     system wakeup devices are suspended and is finalized by the platform suspend | 
 |     hooks later on. | 
 |  | 
 |  4. Disabling non-boot CPUs. | 
 |  | 
 |     On some platforms the suspend hooks mentioned above must run in a one-CPU | 
 |     configuration of the system (in particular, the hardware cannot be accessed | 
 |     by any code running in parallel with the platform suspend hooks that may, | 
 |     and often do, trap into the platform firmware in order to finalize the | 
 |     suspend transition). | 
 |  | 
 |     For this reason, the CPU offline/online (CPU hotplug) framework is used | 
 |     to take all of the CPUs in the system, except for one (the boot CPU), | 
 |     offline (typically, the CPUs that have been taken offline go into deep idle | 
 |     states). | 
 |  | 
 |     This means that all tasks are migrated away from those CPUs and all IRQs are | 
 |     rerouted to the only CPU that remains online. | 
 |  | 
 |  5. Suspending core system components. | 
 |  | 
 |     This prepares the core system components for (possibly) losing power going | 
 |     forward and suspends the timekeeping. | 
 |  | 
 |  6. Platform-specific power removal. | 
 |  | 
 |     This is expected to remove power from all of the system components except | 
 |     for the memory controller and RAM (in order to preserve the contents of the | 
 |     latter) and some devices designated for system wakeup. | 
 |  | 
 |     In many cases control is passed to the platform firmware which is expected | 
 |     to finalize the suspend transition as needed. | 
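
The effect of step 4 (disabling non-boot CPUs) can be sketched with a
user-space model that tracks which CPUs are online and where each IRQ is
routed.  This is an illustration under simplifying assumptions, not the CPU
hotplug implementation: ``model_system`` and ``model_disable_nonboot_cpus()``
are hypothetical names, the model is limited to 64 CPUs, it assumes that the
boot CPU is online, and task migration is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the system: one bit per online CPU, up to 64 CPUs. */
struct model_system {
	uint64_t online_mask;	/* bit N set: CPU N is online */
	unsigned int boot_cpu;	/* the CPU that stays online (must be < 64) */
	unsigned int *irq_cpu;	/* irq_cpu[i]: CPU that IRQ i is routed to */
	size_t n_irqs;
};

/*
 * Take every online CPU except the boot CPU offline and reroute all IRQs to
 * the boot CPU.  Returns the number of CPUs taken offline.
 */
static unsigned int model_disable_nonboot_cpus(struct model_system *sys)
{
	uint64_t boot_bit = UINT64_C(1) << sys->boot_cpu;
	unsigned int offlined = 0;

	/* Count the online CPUs other than the boot CPU. */
	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if ((sys->online_mask & bit) && bit != boot_bit)
			offlined++;
	}

	/* Only the boot CPU remains online. */
	sys->online_mask &= boot_bit;

	/* All IRQs are now delivered to the only CPU left online. */
	for (size_t i = 0; i < sys->n_irqs; i++)
		sys->irq_cpu[i] = sys->boot_cpu;

	return offlined;
}
```

After the call, the modeled system is in the one-CPU configuration that the
platform suspend hooks may require.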
 |  | 
 |  | 
 | Platform-dependent Resume Code Flow | 
 | =================================== | 
 |  | 
 | The following steps are taken in order to transition the system from a | 
 | platform-dependent suspend state into the working state: | 
 |  | 
 |  1. Platform-specific system wakeup. | 
 |  | 
 |     The platform is woken up by a signal from one of the designated system | 
 |     wakeup devices (which need not be an in-band hardware interrupt) and | 
 |     control is passed back to the kernel (the working configuration of the | 
 |     platform may need to be restored by the platform firmware before the | 
 |     kernel gets control again). | 
 |  | 
 |  2. Resuming core system components. | 
 |  | 
 |     The suspend-time configuration of the core system components is restored and | 
 |     the timekeeping is resumed. | 
 |  | 
 |  3. Re-enabling non-boot CPUs. | 
 |  | 
 |     The CPUs disabled in step 4 of the preceding suspend transition are taken | 
 |     back online and their suspend-time configuration is restored. | 
 |  | 
 |  4. Resuming devices and restoring the working-state configuration of IRQs. | 
 |  | 
 |     This step is the same as step 2 of the suspend-to-idle resume transition | 
 |     described `above <s2idle_resume_>`_. | 
 |  | 
 |  5. Thawing tasks. | 
 |  | 
 |     This step is the same as step 3 of the suspend-to-idle resume transition | 
 |     described `above <s2idle_resume_>`_. | 
 |  | 
 |  6. Invoking system-wide resume notifiers. | 
 |  | 
 |     This step is the same as step 4 of the suspend-to-idle resume transition | 
 |     described `above <s2idle_resume_>`_. | 
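
The six resume steps above undo the six platform-dependent suspend steps in
reverse order.  The following user-space sketch makes that symmetry explicit.
The ``model_stage`` enumeration and the two helper functions are hypothetical
and exist only to illustrate the ordering; they do not correspond to kernel
symbols.

```c
/* Hypothetical labels for the stages shared by the two transitions. */
enum model_stage {
	MODEL_NOTIFIERS,	/* system-wide suspend/resume notifiers */
	MODEL_TASKS,		/* freezing/thawing tasks */
	MODEL_DEVICES,		/* devices and IRQ configuration */
	MODEL_NONBOOT_CPUS,	/* disabling/re-enabling non-boot CPUs */
	MODEL_CORE,		/* core system components and timekeeping */
	MODEL_PLATFORM,		/* platform-specific power removal/wakeup */
	MODEL_NR_STAGES,
};

/* Fill @order with the stages in platform-dependent suspend order. */
static void model_suspend_order(enum model_stage order[MODEL_NR_STAGES])
{
	for (int i = 0; i < MODEL_NR_STAGES; i++)
		order[i] = (enum model_stage)i;
}

/* Fill @order with the stages in resume order: suspend order reversed. */
static void model_resume_order(enum model_stage order[MODEL_NR_STAGES])
{
	for (int i = 0; i < MODEL_NR_STAGES; i++)
		order[i] = (enum model_stage)(MODEL_NR_STAGES - 1 - i);
}
```

In this model the platform stage is last on suspend and first on resume, while
the notifiers are first on suspend and last on resume.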