
COS GPU Driver Compilation Tool

This tool cross-compiles an NVIDIA GPU driver runfile for a given Container-Optimized OS (COS) version and board. It outputs the compiled kernel modules, a copy of the input runfile, and an updated gpu_driver_versions.bin file, which you can then upload to a custom GCS bucket and install on COS instances.

Consolidating the compilation logic into Go is intended to give a reproducible build environment that matches the official COS build process.

Prerequisites

  • Go: A working Go environment (Go 1.26+ recommended).
  • GCS Bucket: A Google Cloud Storage bucket where you will upload the compiled artifacts (e.g., my-custom-cos-artifacts).
  • gcloud CLI: Installed and authenticated to access GCS. The upload commands in Step 2 use gsutil, which ships with the Google Cloud CLI.
  • NVIDIA Driver Runfile: The local .run file of the NVIDIA driver version you want to compile (e.g., NVIDIA-Linux-x86_64-580.126.20.run).

Step 1: Compile the GPU Driver

  1. Build the compilation tool from the repository root:

    go build -o compile_gpu_driver src/cmd/compile_gpu_driver/main.go
    
  2. Run the tool, specifying the local runfile, target COS version (build number), target board, and output directory:

    ./compile_gpu_driver \
      -runfile /path/to/NVIDIA-Linux-x86_64-580.126.20.run \
      -cos-version 19506.120.64 \
      -cos-board lakitu \
      -out-dir ./output
    

    The tool will:

    • Parse the runfile to determine its version and architecture.
    • Download the matching toolchain and kernel headers from the official cos-tools GCS bucket.
    • Decompress the toolchain/headers and cross-compile the driver modules.
    • Download and update the gpu_driver_versions.bin config for that COS version.
    • Save the following outputs in the ./output directory:
      • nvidia-drivers-580.126.20.tgz (the compiled modules package)
      • gpu_driver_versions.bin (updated configuration)
      • NVIDIA-Linux-x86_64-580.126.20.run (copy of the input runfile)
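
A quick sanity check after the tool finishes can be sketched in shell. The helper names below (runfile_version, check_outputs) are hypothetical and not part of the tool. The sketch assumes the runfile follows the NVIDIA-Linux-<arch>-<version>.run naming used in the example above and that the output file names match the list above.

```shell
# Print the driver version embedded in a runfile name,
# e.g. NVIDIA-Linux-x86_64-580.126.20.run -> 580.126.20.
# Assumption: the version is the last dash-separated field before ".run".
runfile_version() {
  name="$(basename "$1" .run)"
  printf '%s\n' "${name##*-}"
}

# Return 0 only if all three expected artifacts exist and are non-empty.
# $1: output directory, $2: path of the runfile that was passed to -runfile
check_outputs() {
  out_dir="$1"
  runfile_name="$(basename "$2")"
  driver_ver="$(runfile_version "$2")"
  rc=0
  for f in \
    "nvidia-drivers-${driver_ver}.tgz" \
    "gpu_driver_versions.bin" \
    "${runfile_name}"
  do
    if [ ! -s "${out_dir}/${f}" ]; then
      echo "missing or empty: ${out_dir}/${f}" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_outputs ./output /path/to/NVIDIA-Linux-x86_64-580.126.20.run` returns non-zero and names each file that is missing or empty.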

Step 2: Upload Artifacts to Your GCS Bucket

Upload the generated files to your GCS bucket. A structured GCS path (prefix), for example one organized by COS version and board, is not required but is strongly recommended:

gs://<your-bucket>/<gcs-prefix>/

For example, you can use the standard <cos-version>/<board> structure (here 19506.120.64/lakitu), or any other path of your choice:

export BUCKET="my-custom-cos-artifacts"
export PREFIX="19506.120.64/lakitu" # Or any custom prefix of your choice (e.g., "custom-drivers/580")
export DRIVER_VER="580.126.20"

# Upload the compiled driver package
gsutil cp ./output/nvidia-drivers-${DRIVER_VER}.tgz gs://${BUCKET}/${PREFIX}/

# Upload the updated versions configuration
gsutil cp ./output/gpu_driver_versions.bin gs://${BUCKET}/${PREFIX}/

# Upload the input runfile
gsutil cp ./output/NVIDIA-Linux-x86_64-${DRIVER_VER}.run gs://${BUCKET}/${PREFIX}/
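
To confirm that all three objects landed under the same prefix, the destination URIs can be generated in one place and then listed. This is a minimal sketch: artifact_uris and verify_upload are hypothetical helper names, and the runfile name assumes an x86_64 driver as in the example above. The only gsutil call used is `gsutil ls`.

```shell
# Print the three gs:// URIs that Step 2 uploads, one per line.
# $1: bucket, $2: prefix (a trailing slash is tolerated), $3: driver version
artifact_uris() {
  base="gs://$1/${2%/}"
  printf '%s\n' \
    "${base}/nvidia-drivers-$3.tgz" \
    "${base}/gpu_driver_versions.bin" \
    "${base}/NVIDIA-Linux-x86_64-$3.run"
}

# List each URI with gsutil and stop at the first one that is missing.
# Takes the same arguments as artifact_uris.
verify_upload() {
  for uri in $(artifact_uris "$@"); do
    if ! gsutil ls "$uri" >/dev/null; then
      echo "not found: $uri" >&2
      return 1
    fi
  done
}
```

With the variables exported above, `verify_upload "$BUCKET" "$PREFIX" "$DRIVER_VER"` exits non-zero if any of the three objects is absent.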

Step 3: Configure COS VM Security (Required)

Because your custom-compiled GPU driver is not signed with Google's private key, the standard COS kernel refuses to load its kernel modules by default: COS enforces module signatures and Integrity Measurement Architecture (IMA) policies.

To successfully install and load your custom driver, you must configure your COS VM with the following security changes:

  1. Disable Secure Boot: When creating your Compute Engine VM instance, ensure that Secure Boot is disabled in the Shielded VM settings.
  2. Disable module signature enforcement and IMA appraisal: Modify the kernel command line to set module.sig_enforce=0 and ima_appraise=off.
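
Once the VM is up, the running kernel's command line can be checked for both parameters before attempting the install. The following is a sketch only: cmdline_has and check_security_params are hypothetical helpers, and the check does a simple whole-word match without accounting for a parameter being repeated with different values.

```shell
# Return 0 if the kernel command line in $1 contains the exact
# space-delimited parameter $2 (e.g. module.sig_enforce=0).
cmdline_has() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report every required parameter that is absent from the command line in $1.
check_security_params() {
  rc=0
  for p in module.sig_enforce=0 ima_appraise=off; do
    if ! cmdline_has "$1" "$p"; then
      echo "kernel command line is missing: $p" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

On the COS instance this would be run as `check_security_params "$(cat /proc/cmdline)"`.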

Step 4: Install on COS Instance

To install your custom compiled driver on a COS instance, you can use the standard cos-extensions install gpu command. Any arguments passed after -- are forwarded verbatim to the underlying cos-gpu-installer container.

This allows you to instruct the installer to download artifacts from your custom GCS bucket and prefix instead of the official public bucket.

  1. Create your COS VM instance (with Secure Boot disabled and kernel parameters applied as described in Step 3) with the desired GPU attached.
  2. SSH into the COS instance.
  3. Run cos-extensions with your GCS bucket and the corresponding prefix flags passed after -- (make sure the prefix matches the GCS path you uploaded files to in Step 2):

export BUCKET="my-custom-cos-artifacts"
export PREFIX="19506.120.64/lakitu" # Must match the prefix used in Step 2
export DRIVER_VER="580.126.20"

sudo cos-extensions install gpu -- \
  -version="${DRIVER_VER}" \
  -gcs-download-bucket="${BUCKET}" \
  -gcs-download-prefix="${PREFIX}" \
  -gcs-download-bucket-nvidia="${BUCKET}" \
  -gcs-download-prefix-nvidia="${PREFIX}"

This command will:

  • Invoke cos-extensions to manage the GPU installation.
  • Forward the custom GCS bucket and path arguments to the underlying cos-gpu-installer.
  • Download the custom driver package (nvidia-drivers-*.tgz) and the installer runfile from your GCS bucket and prefix, then decompress and install them.
  • Configure and load the custom GPU drivers.
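
After the installer finishes, one way to confirm that the driver actually loaded is to look for the nvidia kernel module in /proc/modules. This is a minimal sketch: module_loaded is a hypothetical helper, not part of cos-extensions or cos-gpu-installer, and it only checks that a module with the given name is present.

```shell
# Return 0 if a module named $1 appears in the modules file.
# $2 defaults to /proc/modules, where the first field of each line
# is the module name.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

On the COS instance, `module_loaded nvidia` returns zero once the driver module is loaded. A non-zero result after a successful install may indicate that the security settings from Step 3 were not applied.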