Tutorial: Run LLMs using AMD GPU and ROCm in unprivileged LXC container

spiralsugarcane
Jun 6, 2023
Create an Ubuntu 24.04 LXC container.
I used the excellent tteck script, but you can use any other method you are comfortable with.
Give it plenty of storage, RAM, and CPU (according to Ollama's recommendations).
I chose 32 GB of RAM and all available cores. Give it plenty of storage to accommodate your models (it can be expanded later using the GUI).
I chose 60 GB to start. ROCm itself takes up ~30 GB.

In the LXC container:
install ROCm according to the official instructions, but do not use DKMS, as the drivers are already included in Proxmox by default.
That is the purpose of the --no-dkms flag.
Code:
wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb
sudo apt install ./amdgpu-install_6.2.60204-1_all.deb
amdgpu-install --usecase=rocm --no-dkms

Set the environment variable as discussed in this thread:
  • for GCN 5th gen based GPUs and APUs: HSA_OVERRIDE_GFX_VERSION=9.0.0
  • for RDNA 1 based GPUs and APUs: HSA_OVERRIDE_GFX_VERSION=10.1.0
  • for RDNA 2 based GPUs and APUs: HSA_OVERRIDE_GFX_VERSION=10.3.0
  • for RDNA 3 based GPUs and APUs: HSA_OVERRIDE_GFX_VERSION=11.0.0
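As a quick sanity check, the override can be exported in the current shell before you make it permanent (a sketch; 10.3.0 is just the RDNA 2 value from the list above, so substitute the entry matching your GPU):

```shell
# Set the override for this shell session only; add the same line to
# /etc/environment or the ollama systemd unit to make it persistent.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
echo "$HSA_OVERRIDE_GFX_VERSION"
```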

Check the group IDs of the render and video groups inside the container:
Code:
grep -wE 'render|video' /etc/group

On the Proxmox host:
Locate your AMD GPU's render device path to use in the next step:
Code:
ls -l /sys/class/drm/renderD*/device/driver

In the Proxmox GUI, go to the container's Options and add a Device Passthrough entry.
Code:
/dev/kfd
Use the GID of the render group inside the container and leave uid at 0.

Code:
/dev/dri/renderD***
Use the GID of the video group and leave uid at 0.
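The two GUI passthrough entries above end up as devN lines in the container's config under /etc/pve/lxc (a sketch with example values only; your renderD number and GIDs will differ):

```
# /etc/pve/lxc/<vmid>.conf -- example values; the gid values must match
# the render and video groups found inside the container
dev0: /dev/kfd,gid=993,uid=0
dev1: /dev/dri/renderD128,gid=44,uid=0
```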

Restart the container and install Ollama according to the instructions in its GitHub repo.
You can check on the host with radeontop whether your GPU is detected and being used:
Code:
apt update && apt install radeontop
 
Thanks for the notes

I had to do some additional steps as well on Proxmox v8.3
When installing ROCm, change the default instructions from
Code:
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
to
Code:
sudo apt install "proxmox-default-headers" "proxmox-headers-6.5"

I also had to add these environment variables
Code:
Environment="ROCR_VISIBLE_DEVICES=0"
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
to
Code:
/etc/systemd/system/ollama.service
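For what it's worth, the same variables can also live in a systemd drop-in instead of the unit file itself, so they survive package updates (a sketch; the drop-in file name here is arbitrary):

```ini
# /etc/systemd/system/ollama.service.d/rocm.conf (hypothetical file name)
[Service]
Environment="ROCR_VISIBLE_DEVICES=0"
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
```

Either way, run systemctl daemon-reload followed by systemctl restart ollama for the change to take effect.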
 
I have an RX 7800XT.

root@docker:~# amdgpu-install --usecase=rocm --no-dkms
Hit:1 http://deb.debian.org/debian bookworm InRelease
Hit:2 http://security.debian.org bookworm-security InRelease
Hit:3 http://deb.debian.org/debian bookworm-updates InRelease
Hit:4 https://download.docker.com/linux/debian bookworm InRelease
Hit:5 https://repo.radeon.com/amdgpu/6.2.4/ubuntu noble InRelease
Hit:6 https://repo.radeon.com/rocm/apt/6.2.4 noble InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
hipsolver : Depends: libcholmod5 but it is not installable
Depends: libsuitesparseconfig7 but it is not installable
mesa-amdgpu-va-drivers : Depends: libc6 (>= 2.38) but 2.36-9+deb12u9 is to be installed
Depends: libdrm-amdgpu-amdgpu1 but it is not installable
Depends: libdrm-amdgpu-radeon1 but it is not going to be installed
Depends: libdrm2-amdgpu but it is not installable
Depends: libelf1t64 (>= 0.142) but it is not installable
Depends: libllvm18.1-amdgpu but it is not going to be installed
Depends: libx11-xcb1 (>= 2:1.8.7) but 2:1.8.4-2+deb12u2 is to be installed
Depends: libzstd1 (>= 1.5.5) but 1.5.4+dfsg2-5 is to be installed
rccl : Depends: libc6 (>= 2.38) but 2.36-9+deb12u9 is to be installed
Depends: libstdc++6 (>= 13.1) but 12.2.0-14 is to be installed
rocm-gdb : Depends: libc6 (>= 2.38) but 2.36-9+deb12u9 is to be installed
Depends: libgmp10 (>= 2:6.3.0+dfsg) but 2:6.2.1+dfsg1-1.1 is to be installed
Depends: libpython3.12t64 (>= 3.12.1) but it is not installable
Depends: libzstd1 (>= 1.5.5) but 1.5.4+dfsg2-5 is to be installed
rocprofiler-register : Depends: libc6 (>= 2.38) but 2.36-9+deb12u9 is to be installed
Depends: libstdc++6 (>= 13.1) but 12.2.0-14 is to be installed
E: Unable to correct problems, you have held broken packages.


Should I install some dependencies, or does my GPU not support ROCm?
 
Quoted: "I had to do some additional steps as well on Proxmox v8.3 […]"
Did you install ROCm on the host?
Quoted: "I have RX 7800XT […] Should I install any dependencies or my gpu not support rocm?"
Strange. I did this on a newly created Ubuntu 24.04 container and did not get those messages. It may be caused by software you installed earlier. You are using the "noble" repo link — are you on 24.04 and not 22.04? I did not have to do anything beyond the instructions I posted here.
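A quick way to confirm which release the container is actually running, since the codename in the repo.radeon.com URL has to match it:

```shell
# Prints the Ubuntu codename ("noble" for 24.04, "jammy" for 22.04);
# the amdgpu-install .deb you download must be built for this codename.
. /etc/os-release
echo "$VERSION_CODENAME"
```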
 
I've used this script

Then run this inside the container:

Code:
wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb
sudo apt install ./amdgpu-install_6.2.60204-1_all.deb
amdgpu-install --usecase=rocm --no-dkms


Then I get this error during install:

Code:
The following packages have unmet dependencies:
 hipsolver : Depends: libcholmod5 but it is not installable
             Depends: libsuitesparseconfig7 but it is not installable
 mesa-amdgpu-va-drivers : Depends: libva2 (>= 2.16.0) but 2.14.0-1 is to be installed or
                                   libva2-amdgpu but it is not installable
                          Depends: libva-drm2 (>= 2.16.0) but 2.14.0-1 is to be installed or
                                   libva-amdgpu-drm2 but it is not installable
                          Depends: libva-wayland2 (>= 2.16.0) but 2.14.0-1 is to be installed or
                                   libva-amdgpu-wayland2 but it is not installable
                          Depends: libva-x11-2 (>= 2.16.0) but 2.14.0-1 is to be installed or
                                   libva-amdgpu-x11-2 but it is not installable
                          Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
                          Depends: libdrm-amdgpu-amdgpu1 but it is not installable
                          Depends: libdrm-amdgpu-radeon1 but it is not going to be installed
                          Depends: libdrm2-amdgpu but it is not installable
                          Depends: libelf1t64 (>= 0.142) but it is not installable
                          Depends: libllvm18.1-amdgpu but it is not going to be installed
                          Depends: libx11-xcb1 (>= 2:1.8.7) but 2:1.7.5-1ubuntu0.3 is to be installed
                          Depends: libzstd1 (>= 1.5.5) but 1.4.8+dfsg-3build1 is to be installed
 rccl : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
        Depends: libstdc++6 (>= 13.1) but 12.3.0-1ubuntu1~22.04 is to be installed
 rocm-gdb : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
            Depends: libgmp10 (>= 2:6.3.0+dfsg) but 2:6.2.1+dfsg-3ubuntu1 is to be installed
            Depends: libpython3.12t64 (>= 3.12.1) but it is not installable
            Depends: libzstd1 (>= 1.5.5) but 1.4.8+dfsg-3build1 is to be installed
 rocprofiler-register : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
                        Depends: libstdc++6 (>= 13.1) but 12.3.0-1ubuntu1~22.04 is to be installed
E: Unable to correct problems, you have held broken packages.
 
Hi spiralsugarcane, thank you for showing that ROCm can be used inside a container! Following your post, I installed ROCm 6.3 successfully in an Ubuntu 24.04 container on a host running PVE 8.3.

However, when I ran rocminfo in the container, I got a HSA_STATUS_ERROR_OUT_OF_RESOURCES error. Also, PyTorch could not use the GPU, as torch.cuda.is_available() returned False even with the environment variable HSA_OVERRIDE_GFX_VERSION set to 11.0.0. I am unsure if this has anything to do with the RX 7800XT in my system, since it is not officially supported in ROCm. Do you mind sharing what GPU you are using?

Edit 1: I just noticed that you mentioned "install ROCm according to official instructions but do not use DKMS as the drivers are already installed on Proxmox by default." Does this mean I have to install amdgpu-dkms on the host?

Edit 2: I managed to get PyTorch working with ROCm inside the container. The host needs to have the DKMS module installed.
 
Quoted: "[…] I managed to get pytorch working with rocm inside the container. The host needs to have the dkms module installed."
Interesting, thanks for the update. The user rocketpants above also had the same GPU; maybe the same solution applies there. I guess the preloaded drivers in Proxmox are not compatible with your card.

I use the somewhat outdated Radeon VII 16GB, and the drivers included in Proxmox worked just fine. In fact, both ROCm and Ollama work flawlessly, though somewhat slowly, as can be expected with such an old card.
 
Hi @spiralsugarcane, thanks for sharing your experience. Can you please provide more detailed instructions for the ROCm installation? Did you install anything specific on the host? What was installed in the container? What repositories were added? Step-by-step instructions would be very helpful for many people. Thanks in advance!
 
I played around with the container, but I see this error when trying to run Ollama:
Code:
amdgpu devices detected but permission problems block access: kfd driver not loaded.  If running in a container, remember to include '--device /dev/kfd --device /dev/dri'
 
Quoted: "Can you please provide more detailed instructions on rocm installation? […]"
Hi
I did not install anything on the host, as the amdgpu driver and kernel module were already loaded and included with Proxmox.
If you have a GPU that for some reason needs a specific driver, you may have to use DKMS on the host. Since it is a container, the host's kernel is shared with the container, and the container (if unprivileged) does not have the privileges to install any kernel modules.

In the container I installed ROCm and then Docker. I then ran Ollama and Open WebUI in Docker.

Regarding permission issues: you first need to pass /dev/dri/renderD*** and /dev/kfd to the LXC container, setting the correct uid and gid in Proxmox as described above.

You also need to pass the same device paths to the Ollama Docker container. If you run Ollama without Docker, then you don't need to do anything extra.

Here is my lxc config file located in /etc/pve/lxc
Code:
arch: amd64
cores: 8
dev0: /dev/kfd,gid=993,uid=0
dev1: /dev/dri/renderD129,gid=44
features: keyctl=1,nesting=1
hostname: ollama
memory: 32768
net0: name=eth0,bridge=vmbr0,hwaddr=*******,ip=dhcp,type=veth
onboot: 1
ostype: ubuntu
parent: start1
rootfs: local-zfs:vm-310-disk-0,size=100G
swap: 512
unprivileged: 1

Here is my Docker Compose file (side note: use "docker compose", not the legacy "docker-compose"):
Code:
services:
  openWebUI:
    image: ghcr.io/open-webui/open-webui:main
    restart: always
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - /opt/docker/open-webui:/app/backend/data

  ollama:
    image: ollama/ollama:rocm
    restart: always
    devices:
      - /dev/kfd
      - /dev/dri
    ports:
      - "11434:11434"
    volumes:
      - /opt/docker/ollama:/root/.ollama
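If your card needs the HSA override (see the earlier posts in this thread), the containerized Ollama must receive it too; one way is an environment entry on the same service (a sketch using the RDNA 2 example value — substitute the entry for your GPU family, or drop it if your card does not need the override):

```yaml
  ollama:
    image: ollama/ollama:rocm
    restart: always
    devices:
      - /dev/kfd
      - /dev/dri
    environment:
      # example value for RDNA 2; use the entry matching your GPU family
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
    ports:
      - "11434:11434"
    volumes:
      - /opt/docker/ollama:/root/.ollama
```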
 
