| <!--===- docs/OpenMP-descriptor-management.md |
| |
| Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. |
| See https://llvm.org/LICENSE.txt for license information. |
| SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| |
| --> |
| |
| # Fortran descriptor type mapping for OpenMP offload |
| ```{contents} |
| --- |
| local: |
| --- |
| ``` |
| |
| |
| ## Details |
| The initial method for mapping Fortran types tied to descriptors for OpenMP offloading is to treat these types |
| as a special case of OpenMP record type (C/C++ structure/class, Fortran derived type etc.) mapping as far as the |
| runtime is concerned. Where the box (descriptor information) is the holding container and the underlying |
| data pointer is contained within the container, and we must generate explicit maps for both the pointer member and |
| the container. As an example, a small C++ program that is equivalent to the concept described, with the |
| `mock_descriptor` class being representative of the class utilised for descriptors in Clang: |
| |
| ```C++ |
| struct mock_descriptor { |
| long int x; |
| std::byte x1, x2, x3, x4; |
| void *pointer; |
| long int lx[1][3]; |
| }; |
| |
| int main() { |
| mock_descriptor data; |
| #pragma omp target map(tofrom: data, data.pointer[:upper_bound]) |
| { |
| do something... |
| } |
| |
| return 0; |
| } |
| ``` |
| |
| In the above, we have to map both the containing structure, with its non-pointer members and the |
| data pointed to by the pointer contained within the structure to appropriately access the data. This |
| is effectively what is done with descriptor types for the time being. Other pointers that are part |
| of the descriptor container such as the addendum should also be treated as the data pointer is |
| treated. |
| |
| Currently, Flang will lower these descriptor types in the OpenMP lowering (lower/OpenMP.cpp) similarly |
| to all other map types, generating an omp.MapInfoOp containing relevant information required for lowering |
| the OpenMP dialect to LLVM-IR during the final stages of the MLIR lowering. However, after |
| the lowering to FIR/HLFIR has been performed an OpenMP dialect specific pass for Fortran, |
| `MapInfoFinalizationPass` (Optimizer/OpenMP/MapInfoFinalization.cpp) will expand the |
| `omp.MapInfoOp`'s containing descriptors (which currently will be a `BoxType` or `BoxAddrOp`) into multiple |
| mappings, with one extra per pointer member in the descriptor that is supported on top of the original |
| descriptor map operation. These pointers members are linked to the parent descriptor by adding them to |
| the member field of the original descriptor map operation, they are then inserted into the relevant map |
| owning operation's (`omp.TargetOp`, `omp.TargetDataOp` etc.) map operand list and in cases where the owning |
| operation is `IsolatedFromAbove`, it also inserts them as `BlockArgs` to canonicalize the mappings and |
| simplify lowering. |
| |
| An example transformation by the `MapInfoFinalizationPass`: |
| |
| ``` |
| |
| ... |
| %12 = omp.map.info var_ptr(%1#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.box<!fir.ptr<!fir.array<?xi32>>>) map_clauses(tofrom) capture(ByRef) bounds(%11) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> {name = "arg_alloc"} |
| ... |
| omp.target map_entries(%12 -> %arg1, %13 -> %arg2 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<i32>) { |
| ^bb0(%arg1: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, %arg2: !fir.ref<i32>): |
| ... |
| |
| ====> |
| |
| ... |
| %12 = fir.box_offset %1#1 base_addr : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> |
| %13 = omp.map.info var_ptr(%1#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.array<?xi32>) map_clauses(tofrom) capture(ByRef) var_ptr_ptr(%12 : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) bounds(%11) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""} |
| %14 = omp.map.info var_ptr(%1#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.box<!fir.ptr<!fir.array<?xi32>>>) map_clauses(tofrom) capture(ByRef) members(%13 : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> {name = "arg_alloc"} |
| ... |
| omp.target map_entries(%13 -> %arg1, %14 -> %arg2, %15 -> %arg3 : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<i32>) { |
| ^bb0(%arg1: !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>, %arg2: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, %arg3: !fir.ref<i32>): |
| ... |
| |
| ``` |
| |
| In later stages of the compilation flow when the OpenMP dialect is being lowered to LLVM-IR these descriptor |
| mappings are treated as if they were structure mappings with explicit member maps on the same directive as |
| their parent was mapped. |
| |
| This implementation utilises the member field of the `map_info` operation to indicate that the pointer |
| descriptor elements which are contained in their own `map_info` operation are part of their respective |
| parent descriptor. This allows the descriptor containing the descriptor pointer member to be mapped |
| as a composite entity during lowering, with the correct mappings being generated to tie them together, |
| allowing the OpenMP runtime to map them correctly, attaching the pointer member to the parent |
| structure so it can be accessed during execution. If we opt to not treat the descriptor as a single |
| entity we have issues with the member being correctly attached to the parent and being accessible, |
| this can cause runtime segfaults on the device when we try to access the data through the parent. It |
| may be possible to avoid this member mapping, treating them as individual entities, but treating a |
| composite mapping as an individual mapping could lead to problems such as the runtime taking |
| liberties with the mapping it usually wouldn't if it knew they were linked, we would also have to |
| be careful to maintian the correct order of mappings as we lower, if we misorder the maps, it'd be |
| possible to overwrite already written data, e.g. if we write the descriptor data pointer first, and |
| then the containing descriptor, we would overwrite the descriptor data pointer with the incorrect |
| address. |
| |
| This method is generic in the sense that the OpenMP dialect doesn't need to understand that it is mapping a |
| Fortran type containing a descriptor, it just thinks it's a record type from either Fortran or C++. However, |
| it is a little rigid in how the descriptor mappings are handled as there is no specialisation or possibility |
| to specialise the mappings for possible edge cases without polluting the dialect or lowering with further |
| knowledge of Fortran and the FIR dialect. |
| |
| ## Differences from OpenACC |
| |
| The descriptor mapping for OpenMP currently works differently to the planned direction for OpenACC, however, |
| it is possible and would likely be ideal to align the method with OpenACC in the future. |
| |
| Currently the OpenMP specification is less descriptive and has less stringent rules around descriptor based |
| types so does not require as complex a set of descriptor management rules as OpenACC (although, in certain |
| cases for the interim adopting OpenACC's rules where it makes sense could be useful). To handle the more |
| complex descriptor mapping rules OpenACC has opted to utilise a more runtime oriented approach, where |
| specialized runtime functions for handling descriptor mapping for OpenACC are created and these runtime |
| function handles are attatched to a special OpenACC dialect operation. When this operation is lowered it |
| will lower to the attatched OpenACC descriptor mapping runtime function. This sounds like it will work |
| (no implementation yet) similarly to some of the existing HLFIR operations which optionally lower to |
| Fortran runtime calls. |
| |
| This methodology described by OpenACC which utilises runtime functions to handle specialised mappings allows |
| more flexibility as a significant amount of the mapping logic can be moved into the runtime from the compiler. |
| It also allows specialisation of the mapping for fortran specific types. This may be a desireable approach |
| to take for OpenMP in the future, in particular if we find need to specialise mapping further for |
| descriptors or other Fortran types. However, for the moment the currently chosen implementation for OpenMP |
| appears sufficient as far as the OpenMP specification and current testing can show. |