| .. SPDX-License-Identifier: GPL-2.0+ | 
 |  | 
 | ====================================================== | 
 | IBM Virtual Management Channel Kernel Driver (IBMVMC) | 
 | ====================================================== | 
 |  | 
 | :Authors: | 
 | 	Dave Engebretsen <engebret@us.ibm.com>, | 
 | 	Adam Reznechek <adreznec@linux.vnet.ibm.com>, | 
 | 	Steven Royer <seroyer@linux.vnet.ibm.com>, | 
 | 	Bryant G. Ly <bryantly@linux.vnet.ibm.com>, | 
 |  | 
 | Introduction | 
 | ============ | 
 |  | 
 | This document assumes familiarity with virtualization technology. A good | 
 | reference is the Linux on Power Architecture Platform Reference (LoPAPR): | 
 |  | 
 | https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf | 
 |  | 
 | The Virtual Management Channel (VMC) is a logical device which provides a | 
 | message-passing interface between the hypervisor and a management | 
 | partition. The management partition is intended as an alternative to | 
 | Hardware Management Console (HMC) based system management. | 
 |  | 
 | The primary hardware management solution that is developed by IBM relies | 
 | on an appliance server named the Hardware Management Console (HMC), | 
 | packaged as an external tower or rack-mounted personal computer. In a | 
 | Power Systems environment, a single HMC can manage multiple POWER | 
 | processor-based systems. | 
 |  | 
 | Management Application | 
 | ---------------------- | 
 |  | 
 | In the management partition, a management application enables a system | 
 | administrator to configure the system’s partitioning characteristics via | 
 | a command line interface (CLI) or Representational State Transfer (REST) | 
 | APIs. | 
 |  | 
 | The management application runs on a Linux logical partition on a | 
 | POWER8 or newer processor-based server that is virtualized by PowerVM. | 
 | System configuration, maintenance, and control functions which | 
 | traditionally require an HMC can be implemented in the management | 
 | application using a combination of HMC-to-hypervisor interfaces and | 
 | existing operating system methods. The management application provides a | 
 | subset of the functions implemented by the HMC and enables basic partition | 
 | configuration. The set of HMC-to-hypervisor messages supported by the | 
 | management application is passed to the hypervisor over a VMC interface, | 
 | which is defined below. | 
 |  | 
 | The VMC enables the management partition to provide basic partitioning | 
 | functions: | 
 |  | 
 | - Logical Partitioning Configuration | 
 | - Start and stop actions for individual partitions | 
 | - Display of partition status | 
 | - Management of virtual Ethernet | 
 | - Management of virtual storage | 
 | - Basic system management | 
 |  | 
 | Virtual Management Channel (VMC) | 
 | -------------------------------- | 
 |  | 
 | A logical device, called the Virtual Management Channel (VMC), is defined | 
 | for communicating between the management application and the hypervisor. It | 
 | provides the communication channel on which virtualization management | 
 | software is built. This device is presented to a designated management | 
 | partition as a virtual device. | 
 |  | 
 | This communication device uses the Command/Response Queue (CRQ) and | 
 | Remote Direct Memory Access (RDMA) interfaces. A three-way handshake must | 
 | take place to establish that both the hypervisor and management partition | 
 | sides of the channel are running before any protocol messages are sent or | 
 | received. | 
 |  | 
 | This driver also utilizes Transport Event CRQs. These CRQ messages are sent | 
 | when the hypervisor detects that one of the peer partitions has terminated | 
 | abnormally, or when one side has called H_FREE_CRQ to close its CRQ. | 
 |  | 
 | Two new classes of CRQ messages are introduced for the VMC device. VMC | 
 | Administrative messages are used by each partition using the VMC to | 
 | communicate capabilities to its partner. HMC Interface messages are used | 
 | for the actual flow of HMC messages between the management partition and | 
 | the hypervisor. As most HMC messages far exceed the size of a CRQ buffer, | 
 | a virtual DMA (RDMA) of the HMC message data is done prior to each HMC | 
 | Interface CRQ message. Only the management partition drives RDMA | 
 | operations; the hypervisor never directly causes the movement of message | 
 | data. | 
 |  | 
 |  | 
 | Terminology | 
 | ----------- | 
 | RDMA | 
 |         Remote Direct Memory Access is a DMA transfer from the server to | 
 |         its client or from the server to its partner partition. DMA refers | 
 |         both to physical I/O to and from memory and to memory-to-memory | 
 |         move operations. | 
 | CRQ | 
 |         Command/Response Queue is a facility used to communicate between | 
 |         partner partitions. Transport events which are signaled from the | 
 |         hypervisor to the partition are also reported in this queue. | 
 |  | 
 | Example Management Partition VMC Driver Interface | 
 | ================================================= | 
 |  | 
 | This section provides an example of a management application | 
 | implementation in which a device driver is used to interface to the VMC | 
 | device. The driver exposes a new device, for example /dev/ibmvmc, which | 
 | supports open, close, read, write, and ioctl operations against the VMC | 
 | device. | 
 |  | 
 | VMC Interface Initialization | 
 | ---------------------------- | 
 |  | 
 | The device driver is responsible for initializing the VMC when the driver | 
 | is loaded. It first creates and initializes the CRQ. Next, an exchange of | 
 | VMC capabilities is performed to indicate the code version and number of | 
 | resources available in both the management partition and the hypervisor. | 
 | Finally, the hypervisor requests that the management partition create an | 
 | initial pool of VMC buffers, one buffer for each possible HMC connection, | 
 | which will be used for management application session initialization. | 
 | Until this initialization sequence completes, the device returns EBUSY to | 
 | open() calls. EIO is returned for all other open() failures. | 
 |  | 
 | :: | 
 |  | 
 |         Management Partition		Hypervisor | 
 |                         CRQ INIT | 
 |         ----------------------------------------> | 
 |         	   CRQ INIT COMPLETE | 
 |         <---------------------------------------- | 
 |         	      CAPABILITIES | 
 |         ----------------------------------------> | 
 |         	 CAPABILITIES RESPONSE | 
 |         <---------------------------------------- | 
 |               ADD BUFFER (HMC IDX=0,1,..)         _ | 
 |         <----------------------------------------  | | 
 |         	  ADD BUFFER RESPONSE              | - Perform # HMCs Iterations | 
 |         ----------------------------------------> - | 
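
The message ordering in the diagram above can be modelled as a small state
machine. The following Python sketch is illustrative only: the class, the
state names, and the `max_hmcs` parameter are hypothetical and are not taken
from the driver. It assumes that the device counts as initialized once one
buffer has been added for each possible HMC index.

```python
import errno
from enum import Enum, auto


class VmcState(Enum):
    """Hypothetical states, named after the messages in the diagram."""
    CRQ_INIT_SENT = auto()
    CAPABILITIES_SENT = auto()
    AWAITING_BUFFERS = auto()
    READY = auto()


class VmcInitModel:
    """Toy model of the initialization handshake; not the driver's code."""

    def __init__(self, max_hmcs):
        self.max_hmcs = max_hmcs      # one initial buffer per possible HMC
        self.buffers = set()          # HMC indices that already have a buffer
        self.state = VmcState.CRQ_INIT_SENT   # CRQ INIT goes out at load time

    def on_message(self, name, hmc_index=None):
        """Advance the model for one hypervisor message; return the reply."""
        if self.state is VmcState.CRQ_INIT_SENT and name == "CRQ INIT COMPLETE":
            self.state = VmcState.CAPABILITIES_SENT
            return "CAPABILITIES"
        if (self.state is VmcState.CAPABILITIES_SENT
                and name == "CAPABILITIES RESPONSE"):
            self.state = VmcState.AWAITING_BUFFERS
            return None
        if (self.state in (VmcState.AWAITING_BUFFERS, VmcState.READY)
                and name == "ADD BUFFER"):
            self.buffers.add(hmc_index)
            if len(self.buffers) >= self.max_hmcs:
                self.state = VmcState.READY
            return "ADD BUFFER RESPONSE"
        raise ValueError(f"unexpected {name!r} in state {self.state.name}")

    def open(self):
        """Mimic open(): EBUSY until the initialization sequence is done."""
        if self.state is not VmcState.READY:
            raise OSError(errno.EBUSY, "VMC initialization not complete")
        return True
```

Feeding the model the hypervisor messages in diagram order moves it to
`READY`; calling `open()` any earlier raises `OSError` with `errno.EBUSY`.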
 |  | 
 | VMC Interface Open | 
 | ------------------ | 
 |  | 
 | After the basic VMC channel has been initialized, an HMC session-level | 
 | connection can be established. The application layer performs an open() to | 
 | the VMC device and executes an ioctl() against it, indicating the HMC ID | 
 | (32 bytes of data) for this session. If the VMC device is in an invalid | 
 | state, EIO will be returned for the ioctl(). The device driver creates a | 
 | new HMC session value (ranging from 1 to 255) and HMC index value (starting | 
 | at index 0 and ranging to 254) for this HMC ID. The driver then does an | 
 | RDMA of the HMC ID to the hypervisor, and then sends an Interface Open | 
 | message to the hypervisor to establish the session over the VMC. After the | 
 | hypervisor receives this information, it sends Add Buffer messages to the | 
 | management partition to seed an initial pool of buffers for the new HMC | 
 | connection. Finally, the hypervisor sends an Interface Open Response | 
 | message to indicate that it is ready for normal runtime messaging. The | 
 | following illustrates this VMC flow: | 
 |  | 
 | :: | 
 |  | 
 |         Management Partition             Hypervisor | 
 |         	      RDMA HMC ID | 
 |         ----------------------------------------> | 
 |         	    Interface Open | 
 |         ----------------------------------------> | 
 |         	      Add Buffer                  _ | 
 |         <----------------------------------------  | | 
 |         	  Add Buffer Response              | - Perform N Iterations | 
 |         ----------------------------------------> - | 
 |         	Interface Open Response | 
 |         <---------------------------------------- | 
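
From the application side, the open flow amounts to an open() followed by one
ioctl() carrying the 32-byte HMC ID. The Python sketch below shows one way
this might look. The helper names are hypothetical, NUL-padding of short IDs
is an assumption, and the ioctl request number is deliberately left as a
parameter because it has to be taken from the driver's header rather than
guessed.

```python
import fcntl
import os

HMC_ID_LEN = 32  # the text above gives the HMC ID as 32 bytes of data


def pack_hmc_id(hmc_id):
    """Return hmc_id as exactly HMC_ID_LEN bytes.

    Padding short IDs with NUL bytes is an assumption of this sketch.
    """
    raw = hmc_id.encode("ascii") if isinstance(hmc_id, str) else bytes(hmc_id)
    if not raw or len(raw) > HMC_ID_LEN:
        raise ValueError(f"HMC ID must be 1..{HMC_ID_LEN} bytes")
    return raw.ljust(HMC_ID_LEN, b"\0")


def open_session(hmc_id, set_hmc_id_request, path="/dev/ibmvmc"):
    """Open the VMC device and register hmc_id; return the file descriptor.

    set_hmc_id_request is the ioctl request number for setting the HMC ID.
    It must come from the driver's header; no value is assumed here.
    """
    fd = os.open(path, os.O_RDWR)   # fails with EBUSY before init completes
    try:
        fcntl.ioctl(fd, set_hmc_id_request, pack_hmc_id(hmc_id))
    except OSError:
        os.close(fd)                # e.g. EIO for an invalid device state
        raise
    return fd
```

Closing the descriptor when the ioctl() fails keeps a half-opened session
from lingering in the driver.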
 |  | 
 | VMC Interface Runtime | 
 | --------------------- | 
 |  | 
 | During normal runtime, the management application and the hypervisor | 
 | exchange HMC messages via the Signal VMC message and RDMA operations. When | 
 | sending data to the hypervisor, the management application performs a | 
 | write() to the VMC device, and the driver transfers the data to the | 
 | hypervisor via RDMA and then sends a Signal Message. If a write() is | 
 | attempted before VMC device buffers have been made available by the | 
 | hypervisor, or no buffers are currently available, EBUSY is returned in | 
 | response to the write(). A write() will return EIO for all other errors, | 
 | such as an invalid device state. When the hypervisor sends a message to the | 
 | management partition, the data is put into a VMC buffer and a Signal | 
 | Message is sent to the VMC driver in the management partition. The driver | 
 | transfers the buffer into the partition via RDMA and passes the data up to | 
 | the appropriate management application via a read() to the VMC device. The | 
 | read() request blocks if there is no buffer available to read. The | 
 | management application may use select() to wait for the VMC device to | 
 | become ready with data to read. | 
 |  | 
 | :: | 
 |  | 
 |         Management Partition             Hypervisor | 
 |         		MSG RDMA | 
 |         ----------------------------------------> | 
 |         		SIGNAL MSG | 
 |         ----------------------------------------> | 
 |         		SIGNAL MSG | 
 |         <---------------------------------------- | 
 |         		MSG RDMA | 
 |         <---------------------------------------- | 
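
The error handling described above suggests a simple pattern for the
management application: retry a write() that fails with EBUSY, and use
select() before read(). The following Python sketch illustrates that pattern.
The function names, the retry count, the delay, and the `max_len` argument
are hypothetical choices made for this example.

```python
import errno
import os
import select
import time


def send_message(fd, payload, retries=5, delay=0.1):
    """write() one HMC message, retrying while the device reports EBUSY."""
    for attempt in range(retries + 1):
        try:
            return os.write(fd, payload)
        except OSError as exc:
            # EBUSY: no VMC buffer is available yet, so waiting may help.
            # Any other error (EIO, for instance) is passed to the caller.
            if exc.errno != errno.EBUSY or attempt == retries:
                raise
            time.sleep(delay)


def receive_message(fd, max_len, timeout=None):
    """Wait with select() until fd is readable, then read() one message.

    Returns None if the timeout (in seconds) expires first.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    return os.read(fd, max_len)
```

Both helpers work on any file descriptor, so they can be exercised with a
pipe on a machine that has no VMC device.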
 |  | 
 | VMC Interface Close | 
 | ------------------- | 
 |  | 
 | HMC session-level connections are closed by the management partition when | 
 | the application layer performs a close() against the device. This action | 
 | results in an Interface Close message flowing to the hypervisor, which | 
 | causes the session to be terminated. The device driver must free any | 
 | storage allocated for buffers for this HMC connection. | 
 |  | 
 | :: | 
 |  | 
 |         Management Partition             Hypervisor | 
 |         	     INTERFACE CLOSE | 
 |         ----------------------------------------> | 
 |                 INTERFACE CLOSE RESPONSE | 
 |         <---------------------------------------- | 
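
Because the session is only torn down when close() is called, an application
may want to guarantee the close() on every exit path. A minimal Python sketch
of that idea follows; `vmc_session` is a hypothetical helper and takes a file
descriptor that has already been opened.

```python
import contextlib
import os


@contextlib.contextmanager
def vmc_session(fd):
    """Yield fd and always close() it afterwards, even if an error occurs.

    For a VMC device, the close() is what triggers the Interface Close
    message described above.
    """
    try:
        yield fd
    finally:
        os.close(fd)
```

Code running inside the `with vmc_session(fd):` block can raise freely; the
descriptor is still closed when the block exits.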
 |  | 
 | Additional Information | 
 | ====================== | 
 |  | 
 | For more information on CRQ messages, VMC messages, HMC interface buffers, | 
 | and signal messages, refer to Section F of the Linux on Power Architecture | 
 | Platform Reference. | 