# COS GPU Driver Compilation Tool
This tool allows you to cross-compile a specific NVIDIA GPU driver runfile for a specific Container-Optimized OS (COS) version and board. It outputs the compiled kernel modules, the input runfile, and an updated `gpu_driver_versions.bin` file, which can then be uploaded to a custom GCS bucket and installed on COS instances.
The compilation logic is implemented in Go so that the build is reproducible and matches the official COS build process.
## Prerequisites
- **Go**: A working Go environment (Go 1.26+ recommended).
- **GCS Bucket**: A Google Cloud Storage bucket where you will upload the compiled artifacts (e.g., `my-custom-cos-artifacts`).
- **gcloud CLI**: Installed and authenticated to access GCS. The upload commands in Step 2 use `gsutil`, which ships with the Google Cloud CLI.
- **NVIDIA Driver Runfile**: The local `.run` file of the NVIDIA driver version you want to compile (e.g., `NVIDIA-Linux-x86_64-580.126.20.run`).
## Step 1: Compile the GPU Driver
1. Build the compilation tool from the repository root:
```bash
go build -o compile_gpu_driver src/cmd/compile_gpu_driver/main.go
```
2. Run the tool, specifying the local runfile, target COS version (build number), target board, and output directory:
```bash
./compile_gpu_driver \
-runfile /path/to/NVIDIA-Linux-x86_64-580.126.20.run \
-cos-version 19506.120.64 \
-cos-board lakitu \
-out-dir ./output
```
The tool will:
* Parse the runfile to determine its version and architecture.
* Download the matching toolchain and kernel headers from the official `cos-tools` GCS bucket.
* Decompress the toolchain/headers and cross-compile the driver modules.
* Download and update the `gpu_driver_versions.bin` config for that COS version.
* Save the following outputs in the `./output` directory:
  * `nvidia-drivers-580.126.20.tgz` (the compiled modules package)
  * `gpu_driver_versions.bin` (updated configuration)
  * `NVIDIA-Linux-x86_64-580.126.20.run` (copy of the input runfile)
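Before uploading, it can help to confirm that the three expected files are present. The following is a minimal sketch, not part of the tool: `check_outputs` is a hypothetical helper, and it assumes the runfile follows the `NVIDIA-Linux-<arch>-<version>.run` naming convention so that the version can be read from the file name.

```bash
# Sketch: confirm the three files listed above exist in the output directory.
# check_outputs is a hypothetical helper, not part of compile_gpu_driver.
check_outputs() {
  local runfile="$1" out_dir="$2"
  local base version f missing=0
  base="$(basename "$runfile")"
  # Assumption: the file name is NVIDIA-Linux-<arch>-<version>.run,
  # so the version is the text after the last "-" and before ".run".
  version="${base%.run}"
  version="${version##*-}"
  for f in "nvidia-drivers-${version}.tgz" "gpu_driver_versions.bin" "$base"; do
    if [ -f "${out_dir}/${f}" ]; then
      echo "ok: ${f}"
    else
      echo "missing: ${f}"
      missing=1
    fi
  done
  # Non-zero exit status if any expected file is absent.
  return "$missing"
}

# Example: check_outputs /path/to/NVIDIA-Linux-x86_64-580.126.20.run ./output
```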
## Step 2: Upload Artifacts to Your GCS Bucket
Upload the generated files to your GCS bucket. A structured GCS path (prefix) is not required, but organizing artifacts by COS version and board is recommended:
```
gs://<your-bucket>/<gcs-prefix>/
```
For example, you can use the standard `19506.120.64/lakitu` structure or any custom path:
```bash
export BUCKET="my-custom-cos-artifacts"
export PREFIX="19506.120.64/lakitu" # Or any custom prefix of your choice (e.g., "custom-drivers/580")
export DRIVER_VER="580.126.20"
# Upload the compiled driver package
gsutil cp ./output/nvidia-drivers-${DRIVER_VER}.tgz gs://${BUCKET}/${PREFIX}/
# Upload the updated versions configuration
gsutil cp ./output/gpu_driver_versions.bin gs://${BUCKET}/${PREFIX}/
# Upload the input runfile
gsutil cp ./output/NVIDIA-Linux-x86_64-${DRIVER_VER}.run gs://${BUCKET}/${PREFIX}/
```
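To double-check the upload, you can list the three objects at their expected locations. This is a sketch under assumptions: `artifact_uris` is a hypothetical helper, and it hard-codes the `x86_64` runfile name used in the example above.

```bash
# Sketch: print the GCS object paths that the upload step should have created.
# artifact_uris is a hypothetical helper; x86_64 is assumed for the runfile name.
artifact_uris() {
  local bucket="$1" prefix="$2" ver="$3"
  # Strip one trailing slash from the prefix to avoid "//" in the path.
  local dest="gs://${bucket}/${prefix%/}"
  printf '%s\n' \
    "${dest}/nvidia-drivers-${ver}.tgz" \
    "${dest}/gpu_driver_versions.bin" \
    "${dest}/NVIDIA-Linux-x86_64-${ver}.run"
}

# Example: list each object; gsutil ls should report an error if one is absent.
# artifact_uris "$BUCKET" "$PREFIX" "$DRIVER_VER" | xargs gsutil ls
```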
## Step 3: Configure COS VM Security (Required)
Your custom-compiled GPU driver is not signed with Google's private key. By default, the standard COS kernel therefore refuses to load its kernel modules, because of strict module signature enforcement and Integrity Measurement Architecture (IMA) policies.
To successfully install and load your custom driver, you **must** configure your COS VM with the following security changes:
1. **Disable Secure Boot**: When creating your Compute Engine VM instance, ensure that **Secure Boot is disabled** in the Shielded VM settings.
2. **Disable Module Signature Enforcement & IMA**: You must modify the kernel command line to set `module.sig_enforce=0` and `ima_appraise=off`.
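Once the VM has booted, you can confirm that both parameters reached the kernel by inspecting `/proc/cmdline`. The sketch below is illustrative only: `check_cmdline_params` is a hypothetical helper, and it only checks that each parameter is present as a whole word, not whether a conflicting value also appears on the command line.

```bash
# Sketch: check a kernel command line for the two parameters named above.
# check_cmdline_params is a hypothetical helper.
check_cmdline_params() {
  local cmdline="$1" param
  for param in module.sig_enforce=0 ima_appraise=off; do
    # Pad with spaces so only whole, space-separated parameters match.
    case " ${cmdline} " in
      *" ${param} "*) ;;
      *) echo "not set: ${param}"; return 1 ;;
    esac
  done
  echo "ok"
}

# Example, on the COS VM: check_cmdline_params "$(cat /proc/cmdline)"
```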
## Step 4: Install on COS Instance
To install your custom-compiled driver on a COS instance, use the standard `cos-extensions install gpu` command. Any arguments passed after `--` are forwarded verbatim to the underlying `cos-gpu-installer` container.
This lets you direct the installer to download artifacts from your custom GCS bucket and prefix instead of the official public bucket.
1. Create your COS VM instance (with Secure Boot disabled and kernel parameters applied as described in Step 3) with the desired GPU attached.
2. SSH into the COS instance.
3. Run `cos-extensions` with your GCS bucket and the corresponding prefix flags passed after `--` (make sure the prefix matches the GCS path you uploaded files to in Step 2):
```bash
export BUCKET="my-custom-cos-artifacts"
export PREFIX="19506.120.64/lakitu" # Must match the prefix used in Step 2
export DRIVER_VER="580.126.20"
sudo cos-extensions install gpu -- \
-version="${DRIVER_VER}" \
-gcs-download-bucket="${BUCKET}" \
-gcs-download-prefix="${PREFIX}" \
-gcs-download-bucket-nvidia="${BUCKET}" \
-gcs-download-prefix-nvidia="${PREFIX}"
```
This command will:
* Invoke `cos-extensions` to manage the GPU installation.
* Forward the custom GCS bucket and path arguments to the underlying `cos-gpu-installer`.
* Download the custom driver package (`nvidia-drivers-*.tgz`) and the installer runfile from your GCS bucket and prefix, then decompress and install them.
* Configure and load the custom GPU drivers.
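After the command finishes, one way to confirm that the driver was loaded is to look for the `nvidia` module in `/proc/modules`. This is a minimal sketch rather than an official verification step: `nvidia_module_loaded` is a hypothetical helper, and its optional file argument exists only so the check can be exercised against a sample file.

```bash
# Sketch: report whether the nvidia kernel module appears in the module list.
# nvidia_module_loaded is a hypothetical helper; the optional argument
# overrides the file to read (defaults to /proc/modules).
nvidia_module_loaded() {
  local modules_file="${1:-/proc/modules}"
  # Each /proc/modules line starts with the module name followed by a space,
  # so this matches "nvidia" but not "nvidia_uvm" or "nvidia_drm".
  grep -q '^nvidia ' "$modules_file"
}

# Example, on the COS VM:
# nvidia_module_loaded && echo "nvidia module loaded" || echo "nvidia module not loaded"
```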