Proxmox 9.x / Strix Halo / GPU Passthrough

dlasher

Having spent the last few days fighting all the subtle parts of getting this working, I put together a quick guide on how to get Proxmox running on a Strix Halo machine with working GPU passthrough into LXC containers. To be clear, this is a "works right now" recipe, subject to kernel changes, ROCm changes, etc.


---------------------------------

Guide: The Strix Halo AI Powerhouse (Proxmox 9.1 + ROCm 7.2)

Target Hardware: AMD Strix Halo box (Minisforum S1-MAX, etc.)
Goal: A local AI server with full hardware acceleration on a 128GB machine, split into 64GB of RAM for CPU/applications and 64GB of VRAM


Introduction

The AMD Strix Halo (RDNA 3.5 / gfx1151) is a game-changer for local AI. By leveraging a high-speed unified memory architecture, this APU can address massive amounts of system RAM as video memory. This guide details how to configure Proxmox 9.1 to carve out a 64GB VRAM pool and pass it through to a high-performance LXC container.


Phase 1: Host BIOS & Kernel Tuning

Unlocking the memory gates to allow the GPU to access 64GB of RAM.

1. BIOS Settings

  • IOMMU: Enabled.
  • UMA Framebuffer: Auto. (The kernel parameters below will override and expand this).
  • Resizable BAR: Enabled.

2. Host Kernel Parameters

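Edit /etc/default/grub on your Proxmox Host to enable IOMMU pass-through and define the Graphics Translation Table (GTT) size. If the host boots via systemd-boot rather than GRUB, put the same parameters in /etc/kernel/cmdline and apply them with proxmox-boot-tool refresh instead of update-grub.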

Code:
# Edit this line in /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt amdgpu.gttsize=65536 ttm.pages_limit=16777216 video=1024x768@60"

  • amdgpu.gttsize=65536: Maps 64GB of system RAM for GPU use (the value is in MiB: 65536 MiB = 64GB).
  • ttm.pages_limit=16777216: Sets the page limit to exactly 64GB (16777216 pages × 4096 bytes per page).
  • video=1024x768@60: Because console graphics mode autosense is always wrong.
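
Both values encode the same 64GB target in different units, so for a different pool size they can be derived together. A minimal sketch: gtt_params is a hypothetical helper name (not part of any tool), and the 4096-byte default page size is an assumption that holds on typical x86-64 hosts.

```shell
# gtt_params GIB [PAGE_BYTES]
# Prints the two kernel parameters for a GTT pool of GIB gibibytes.
# PAGE_BYTES defaults to 4096 (assumed page size).
gtt_params() {
    local gib="$1" page_bytes="${2:-4096}"
    local mib=$(( gib * 1024 ))                               # amdgpu.gttsize is in MiB
    local pages=$(( gib * 1024 * 1024 * 1024 / page_bytes ))  # ttm.pages_limit is in pages
    echo "amdgpu.gttsize=${mib} ttm.pages_limit=${pages}"
}

gtt_params 64   # prints: amdgpu.gttsize=65536 ttm.pages_limit=16777216
```

The output can be pasted into the kernel command line in place of the two hand-computed values.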
Apply and Reboot:
Code:
update-grub && reboot
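
After the reboot, it may be worth confirming that the parameters actually reached the running kernel, since editing the wrong bootloader file fails silently. The sketch below reads /proc/cmdline and the GTT size that amdgpu reports through sysfs; cmdline_value and bytes_to_gib are hypothetical helper names introduced here for illustration.

```shell
# Print the value of KEY from a kernel command line string; fail if absent.
cmdline_value() {
    local key="$1" cmdline="$2" word
    for word in $cmdline; do
        case $word in
            "$key"=*) echo "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Convert a byte count to whole GiB.
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Report what the running kernel was booted with.
for key in amdgpu.gttsize ttm.pages_limit; do
    value=$(cmdline_value "$key" "$(cat /proc/cmdline 2>/dev/null)") || value="<not set>"
    echo "$key = $value"
done

# amdgpu exposes the GTT pool size (in bytes) through sysfs.
for f in /sys/class/drm/card*/device/mem_info_gtt_total; do
    [ -r "$f" ] || continue
    echo "$f: $(bytes_to_gib "$(cat "$f")") GiB"
done
```

If either parameter shows as not set, the edit most likely went into the file for the bootloader the host is not using.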


Phase 2: Host Driver & Firmware Installation

Strix Halo requires the latest firmware blobs and the ROCm 7.2 userspace stack.

1. Update Firmware (Critical for gfx1151)

Run this on the Proxmox Host:

Code:
apt update && apt install -y git
git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
cp linux-firmware/amdgpu/gc_11_5_1* /lib/firmware/amdgpu/
cp linux-firmware/amdgpu/sdma_6_1_1* /lib/firmware/amdgpu/
update-initramfs -u

2. Install ROCm 7.2 Userspace (Host)

Code:
apt update && apt install -y wget gpg curl
wget -q https://repo.radeon.com/amdgpu-install/latest/ubuntu/noble/amdgpu-install_7.2.70200-1_all.deb -O /tmp/amdgpu.deb
apt install -y /tmp/amdgpu.deb
amdgpu-install -y --usecase=graphics,rocm
usermod -aG render,video root

3. Host Verification

Confirm the host sees the hardware:

Code:
/usr/bin/rocminfo | grep "gfx1151"


Phase 3: LXC Creation & Hardware Mapping

1. Create the Container

  • Privileged: Yes (Required for the KFD driver handshake).
  • Template: Ubuntu 24.04 LTS
  • RAM: 16GB (The GPU will pull from the 64GB GTT pool independently).
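
For anyone creating the container from the shell instead of the web UI, the settings above could translate into a pct create call along these lines. This is only a sketch: the template file name, the local-lvm storage, the 32GB root disk, the hostname, and the vmbr0/DHCP network are all assumptions to adapt (pveam list local shows the templates actually available). build_pct_create is a hypothetical helper that prints the command for review rather than running it.

```shell
# Prints the pct create command for review instead of executing it.
# Arguments: container ID, template volume, storage for the root disk.
build_pct_create() {
    local ctid="$1" template="$2" storage="$3"
    echo "pct create $ctid $template" \
        "--hostname ollama" \
        "--unprivileged 0" \
        "--memory 16384" \
        "--rootfs $storage:32" \
        "--net0 name=eth0,bridge=vmbr0,ip=dhcp"
}

# Assumed template name and storage; replace with your own.
build_pct_create 1201 "local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst" local-lvm
```

Here --unprivileged 0 corresponds to the privileged setting and --memory 16384 to the 16GB of RAM listed above.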

2. Map Devices (On Host)

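Run these commands on the Proxmox Host to find your GIDs and map the hardware to your LXC (replace 1201 with your actual Container ID). Note that Proxmox applies the gid= value to the device node as seen inside the container, so if the container's render or video GID differs from the host's, use the container's value: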

Code:
# Find GIDs
RENDER_GID=$(getent group render | cut -d: -f3)
VIDEO_GID=$(getent group video | cut -d: -f3)

# Native Proxmox device passthrough
pct set 1201 -dev0 /dev/kfd,gid=$RENDER_GID
pct set 1201 -dev1 /dev/dri/renderD128,gid=$RENDER_GID
pct set 1201 -dev2 /dev/dri/card0,gid=$VIDEO_GID
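
Device numbering is not guaranteed: depending on the kernel and what else is in the machine, the GPU may show up as card1 or renderD129 instead. Before running the pct set commands, a quick existence check on the host can save a confusing container start failure. A minimal sketch; check_nodes is a hypothetical helper name.

```shell
# Report whether each given path exists as a character device.
# Returns non-zero if any node is missing.
check_nodes() {
    local missing=0 node
    for node in "$@"; do
        if [ -c "$node" ]; then
            echo "ok: $node"
        else
            echo "MISSING: $node"
            missing=1
        fi
    done
    return "$missing"
}

check_nodes /dev/kfd /dev/dri/renderD128 /dev/dri/card0 \
    || echo "Adjust the pct set paths to match the output of: ls -l /dev/dri"
```

The same function can be run inside the container afterwards to confirm the nodes were passed through.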


Phase 4: Container Internal Setup

Inside the Ubuntu 24.04 LTS LXC, install the ROCm stack without kernel modules (--no-dkms) and configure Ollama.

1. Install ROCm 7.2 (LXC)

Code:
apt update && apt install -y wget gpg curl zstd
wget -q 'https://repo.radeon.com/amdgpu-install/latest/ubuntu/noble/amdgpu-install_7.2.70200-1_all.deb' -O /tmp/amdgpu.deb
apt install -y /tmp/amdgpu.deb
amdgpu-install -y --usecase=rocm --no-dkms
usermod -aG render,video root



2. Install & Reconfigure Ollama

Code:
curl -fsSL https://ollama.com/install.sh | sh
systemctl edit ollama.service

Paste the following into the override file:

Code:
[Service]
# Force RDNA 3.5 recognition
Environment="HSA_OVERRIDE_GFX_VERSION=11.5.0"

# Stability Fix: Disable bugged SDMA for unified memory
Environment="HSA_ENABLE_SDMA=0"

# Enable Ollama's Vulkan backend
Environment="OLLAMA_VULKAN=1"

# Connectivity (listens on all interfaces and accepts any origin)
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"

# Performance
Environment="OLLAMA_NUM_PARALLEL=2"
# Per the Ollama FAQ, KV cache quantization applies when flash attention is enabled
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_KEEP_ALIVE=24h"
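
For scripted or repeatable setups, the same override can be written without the interactive editor, since systemctl edit simply stores the text in /etc/systemd/system/ollama.service.d/override.conf. The sketch below uses a hypothetical helper, write_env_dropin, and demonstrates it against a scratch directory so that nothing on the system is touched.

```shell
# write_env_dropin DIR KEY=VALUE...
# Writes DIR/override.conf containing one Environment= line per pair.
write_env_dropin() {
    local dropin_dir="$1" pair
    shift
    mkdir -p "$dropin_dir"
    {
        echo "[Service]"
        for pair in "$@"; do
            printf 'Environment="%s"\n' "$pair"
        done
    } > "$dropin_dir/override.conf"
}

# Demonstration against a scratch directory. On a real container the target
# would be /etc/systemd/system/ollama.service.d, followed by
# `systemctl daemon-reload && systemctl restart ollama`.
scratch=$(mktemp -d)
write_env_dropin "$scratch" HSA_ENABLE_SDMA=0 OLLAMA_KEEP_ALIVE=24h
cat "$scratch/override.conf"
rm -rf "$scratch"
```

Passing the full list of variables from the override as arguments reproduces the same file content, minus the comments.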



Phase 5: Functional Validation

Perform these checks inside the LXC to ensure the stack is operational.

1. Driver Check

Code:
/usr/bin/rocminfo | grep "gfx1151"


Expected Output: Name: gfx1151 and Name: amdgcn-amd-amdhsa--gfx1151.

2. Functional LLM Test

Code:
systemctl daemon-reload
systemctl restart ollama
ollama pull qwen2.5:0.5b
ollama run qwen2.5:0.5b "Why is the sky blue?"

If the reply is instant and doesn't crash, your 64G/64G Strix Halo workstation is live.
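
A fast reply from a 0.5b model does not by itself prove the GPU is in use, because a model that small also runs quickly on the CPU. ollama ps lists each loaded model with a PROCESSOR column, and with OLLAMA_KEEP_ALIVE=24h the test model should still be loaded right after the run. The sketch below checks that column; all_on_gpu is a hypothetical helper, and it assumes the "100% GPU" wording shown in the Ollama FAQ.

```shell
# Reads `ollama ps` output on stdin. Succeeds only if at least one model is
# loaded and every listed model reports "100% GPU".
all_on_gpu() {
    awk 'NR > 1 { rows++; if ($0 !~ /100% GPU/) bad++ }
         END { exit (rows == 0 || bad > 0) }'
}

if command -v ollama >/dev/null 2>&1; then
    if ollama ps | all_on_gpu; then
        echo "All loaded models report 100% GPU."
    else
        echo "No model loaded, or part of a model is running on the CPU."
    fi
fi
```

A mixed CPU/GPU split or a pure CPU entry usually means the container cannot reach the device nodes or the backend failed to initialize, which journalctl -u ollama should show.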

OPTIONAL:

If you don't want to blindly copy the firmware over the top, or you want to check whether you're ALREADY on the latest firmware, use this script instead.

Code:
#!/bin/bash
# pve-firmware-sync.sh
# Copies the Strix Halo (gfx1151) firmware blobs from upstream linux-firmware,
# but only when they differ from what is already installed.

CLONE_DIR="/tmp/linux-firmware"
DEST_DIR="/lib/firmware/amdgpu"

# 1. Pull latest firmware blobs from upstream
apt update && apt install -y git
rm -rf "$CLONE_DIR"   # a leftover clone from an earlier run would make git clone fail
git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git "$CLONE_DIR" || exit 1

# Track if ANY file across all patterns was updated
ANY_UPDATED=0

sync_firmware() {
    local src_pattern=$1
    local src_file filename dest_file

    # Expand the pattern into a list of files (intentionally unquoted)
    for src_file in $src_pattern; do
        # If the glob matched nothing, the loop sees the literal pattern; skip it
        [[ -f "$src_file" ]] || continue
        filename=$(basename "$src_file")
        dest_file="${DEST_DIR}/${filename}"

        # If dest doesn't exist OR contents differ, copy it
        if [[ ! -f "$dest_file" ]] || ! cmp -s "$src_file" "$dest_file"; then
            echo "Updating: $filename"
            cp "$src_file" "$dest_file"
            ANY_UPDATED=1
        else
            echo "Skipping: $filename (already up to date)"
        fi
    done
}

# 2. Perform the sync
sync_firmware "$CLONE_DIR/amdgpu/gc_11_5_1*"
sync_firmware "$CLONE_DIR/amdgpu/sdma_6_1_1*"

# 3. Only rebuild if ANY_UPDATED was set to 1
echo "------------------------------------------------"
if [ "$ANY_UPDATED" -eq 1 ]; then
    echo "Changes detected. Rebuilding initramfs..."
    update-initramfs -u
    echo "Reboot recommended to apply new firmware."
else
    echo "Firmware is already synchronized. No rebuild required."
fi

# Cleanup
rm -rf "$CLONE_DIR"


Example run:
Code:
# bash ./sync.firmware.sh
Hit:1 http://ftp.us.debian.org/debian trixie InRelease
Hit:2 http://security.debian.org trixie-security InRelease
Hit:3 http://ftp.us.debian.org/debian trixie-updates InRelease
Hit:4 https://repo.radeon.com/amdgpu/30.30/ubuntu noble InRelease
Hit:5 http://download.proxmox.com/debian/pve trixie InRelease
Hit:6 https://repo.radeon.com/rocm/apt/7.2 noble InRelease
Hit:7 https://repo.radeon.com/graphics/7.2/ubuntu noble InRelease
All packages are up to date.
git is already the newest version (1:2.47.3-0+deb13u1).
Summary:
Upgrading: 0, Installing: 0, Removing: 0, Not Upgrading: 0
Cloning into '/tmp/linux-firmware'...
remote: Enumerating objects: 4286, done.
remote: Counting objects: 100% (4286/4286), done.
remote: Compressing objects: 100% (2969/2969), done.
remote: Total 4286 (delta 1672), reused 3272 (delta 1190), pack-reused 0 (from 0)
Receiving objects: 100% (4286/4286), 721.90 MiB | 24.74 MiB/s, done.
Resolving deltas: 100% (1672/1672), done.
Updating files: 100% (4462/4462), done.
Skipping: gc_11_5_1_imu.bin (already up to date)
Skipping: gc_11_5_1_me.bin (already up to date)
Skipping: gc_11_5_1_mec.bin (already up to date)
Skipping: gc_11_5_1_mes1.bin (already up to date)
Skipping: gc_11_5_1_mes_2.bin (already up to date)
Skipping: gc_11_5_1_pfp.bin (already up to date)
Skipping: gc_11_5_1_rlc.bin (already up to date)
Skipping: sdma_6_1_1.bin (already up to date)
------------------------------------------------
Firmware is already synchronized. No rebuild required.