Kernel pin and amd driver issue

teknowill

I need ROCm, so I need the AMD proprietary driver, and I have to stick with kernels that it works with.

This was working great. I installed the driver on the hypervisor, rocminfo came back good, and I passed the GPU through to an LXC:

/dev/kfd
/dev/dri/render
/dev/dri/card0
/dev/dri/by-path/pci-0000:c3:00.0-card
/dev/dri/by-path/pci-0000:c3:00.0-render

That worked well for LLMs in a container (ollama and Open WebUI).

After running
apt dist-upgrade

6.8.12-1-pve breaks DKMS. I did not realize it was the kernel until I tried to remove and reinstall the drivers.

You cannot install the DKMS module from amdgpu-install_6.2.60200-1_all.deb on 6.8.12-1-pve.

I'm happy to stay on an older kernel until they release a newer version.

I wish I knew of a better way to test whether an update might break it.
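One way to check ahead of time, sketched under the assumption that the breakage arrives with a new kernel package: simulate the upgrade and list the kernel packages it would install. `apt-get -s` only simulates; `pending_kernels` is a made-up helper name.

```shell
# Hypothetical helper: list kernel packages that a simulated upgrade would install.
# Reads `apt-get -s dist-upgrade` output on stdin; "Inst" lines name packages
# that would be installed or upgraded.
pending_kernels() {
    awk '$1 == "Inst" && $2 ~ /^proxmox-kernel-/ { print $2 }'
}

# Usage on the hypervisor (simulation only, nothing is installed):
#   apt-get -s dist-upgrade | pending_kernels
```

If this prints anything, the upgrade brings a new kernel, and that is the moment to check whether the driver supports it.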

At this point, it looks like I have to wipe the Proxmox boot/install drive, reinstall, restore, and not upgrade beyond 6.8.8-4-pve.

I've tried going back to an older working kernel with the pin feature:
pve-efiboot-tool kernel list
pve-efiboot-tool kernel add 6.8.8-4-pve
pve-efiboot-tool kernel pin 6.8.8-4-pve

This gets Proxmox to autoboot with the older kernel.
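After a reboot it is worth confirming the pin took effect before touching the driver. A minimal sketch; `running_kernel_is` is a made-up helper name, and `uname -r` prints the running kernel release.

```shell
# Hypothetical helper: succeed only if the running kernel matches the expected release.
running_kernel_is() {
    [ "$(uname -r)" = "$1" ]
}

# Usage after rebooting:
#   running_kernel_is 6.8.8-4-pve && echo "pinned kernel is running"
```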

While running 6.8.8-4-pve:
wget https://repo.radeon.com/amdgpu-install/6.2/ubuntu/jammy/amdgpu-install_6.2.60200-1_all.deb
sudo apt install ./amdgpu-install_6.2.60200-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm

This would work, however...

My problem is that their DKMS installer sees both the 6.8.12-1-pve and 6.8.8-4-pve kernels, even when booted into 6.8.8-4-pve.

I do not see a configuration option I can set to make it try only one kernel, so it will not finish installing because it cannot build against the newer kernel.


Loading new amdgpu-6.8.5-2009582.22.04 DKMS files...
Building for 6.8.8-4-pve 6.8.12-1-pve
Building for architecture amd64
Building initial module for 6.8.8-4-pve
Done.
Forcing installation of amdgpu
...
Building initial module for 6.8.12-1-pve
Error! Bad return status for module build on kernel: 6.8.12-1-pve (amd64)
Consult /var/lib/dkms/amdgpu/6.8.5-2009582.22.04/build/make.log for more information.
dpkg: error processing package amdgpu-dkms (--configure):
installed amdgpu-dkms package post-installation script subprocess returned error exit status 10
Errors were encountered while processing:
amdgpu-dkms
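To pull out which kernel the build failed for from a long installer log, something like the following could be used. It is only a sketch that matches the exact error wording shown in the log above; `failed_dkms_kernels` and the file name `install.log` are made up for illustration.

```shell
# Hypothetical helper: print the kernel versions DKMS reported a failed build for.
# Reads installer/DKMS output on stdin and matches the line
# "Error! Bad return status for module build on kernel: <version> (<arch>)".
failed_dkms_kernels() {
    sed -n 's/^Error! Bad return status for module build on kernel: \([^ ]*\) .*/\1/p'
}

# Usage, assuming the installer output was saved to a file named install.log:
#   failed_dkms_kernels < install.log
```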

I can run DKMS by hand, but then I do not know what amdgpu-install does after that.
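For reference, the by-hand DKMS step for a single kernel looks roughly like this. The sketch only prints the commands so they can be reviewed first; `dkms_single_kernel_cmds` is a made-up helper name. It covers the module build only and does not reproduce whatever the package's post-installation script does afterwards, so dpkg may still report amdgpu-dkms as unconfigured.

```shell
# Hypothetical helper: print the DKMS commands to build and install one module
# version for one kernel only. It prints instead of executing, so nothing is
# changed on the system.
dkms_single_kernel_cmds() {
    module=$1 version=$2 kernel=$3
    printf 'dkms build -m %s -v %s -k %s\n' "$module" "$version" "$kernel"
    printf 'dkms install -m %s -v %s -k %s\n' "$module" "$version" "$kernel"
}

# Usage with the module version and working kernel from this thread:
#   dkms_single_kernel_cmds amdgpu 6.8.5-2009582.22.04 6.8.8-4-pve
```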

Is there a way to get rid of 6.8.12-1-pve without uninstalling proxmox-ve?
Or, if proxmox-ve does get removed, what do I do to get it back?

sudo apt purge proxmox-kernel-6.8.12-1-pve-signed
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
proxmox-kernel-6.8.12-1-pve
The following packages will be REMOVED:
proxmox-default-kernel* proxmox-kernel-6.8* proxmox-kernel-6.8.12-1-pve-signed* proxmox-ve*
The following NEW packages will be installed:
proxmox-kernel-6.8.12-1-pve
0 upgraded, 1 newly installed, 4 to remove and 0 not upgraded.
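Since the purge above would take proxmox-ve with it, a guard like the following can catch that before anything is actually removed. It is a sketch that assumes the usual `apt-get -s` simulation output, where removals appear as lines starting with `Remv` or `Purg`; `would_remove` is a made-up helper name.

```shell
# Hypothetical helper: succeed if simulated apt output would remove or purge
# the named package. Reads `apt-get -s ...` output on stdin.
would_remove() {
    awk -v pkg="$1" '($1 == "Remv" || $1 == "Purg") && $2 == pkg { found = 1 } END { exit !found }'
}

# Usage (simulation only):
#   if apt-get -s purge proxmox-kernel-6.8.12-1-pve-signed | would_remove proxmox-ve; then
#       echo "refusing: this would remove proxmox-ve"
#   fi
```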

Any ideas other than "do not buy AMD" would be appreciated.
 
Thanks. You've never needed DKMS to get ROCm to work?

I am able to add the amdgpu module and install it with DKMS on just the older kernel, and you'll get a good dkms status.

However, there's no way to finish the last steps of the amdgpu install.

There is supposed to be a --no-dkms flag, but it is ignored; maybe they removed the flag.

It always tries to build with DKMS on both kernels.
 
Wiped and installed PVE 8.2 with kernel 6.8.8.

apt update

apt-mark hold proxmox-kernel-6.8.12-1-pve-signed
This has no effect on blocking the kernel update.
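The purge output earlier in the thread shows proxmox-kernel-6.8 being removed together with the newest kernel package, which suggests new kernels are pulled in through that metapackage. Holding the metapackage instead might block the update. This is an untested assumption, not something confirmed in this thread, and `kernel_series_pkg` is a made-up helper name.

```shell
# Hypothetical helper: derive the kernel series metapackage name
# (e.g. proxmox-kernel-6.8) from a kernel release such as 6.8.8-4-pve.
kernel_series_pkg() {
    printf 'proxmox-kernel-%s\n' "$(printf '%s\n' "$1" | cut -d. -f1,2)"
}

# Untested assumption: hold the metapackage rather than one kernel package.
#   apt-mark hold "$(kernel_series_pkg "$(uname -r)")"
#   apt-mark showhold
```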

usermod -a -G render,video $LOGNAME

Had trouble installing because of dependency problems, so I stopped trying.

Confirmed that the hypervisor, with the built-in kernel drivers, presents /dev/dri/renderD128.

Restored the LXC from Proxmox Backup Server.

The LXC runs Ubuntu 24 (a supported release) and is unprivileged.

In the LXC, install ROCm from the Ubuntu 24 repo:

apt install rocm

As long as you can see
/dev/dri/renderD128
/dev/kfd

in the LXC and rocminfo gives output there, ollama/ROCm Docker containers will run fine, provided they can see the same /dev/dri/render and /dev/kfd.
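A quick way to run that visibility check inside the LXC or a container, as a minimal sketch: `gpu_nodes_visible` is a made-up helper name, and it only tests that the nodes exist, not that the container has permission to use them.

```shell
# Hypothetical helper: check that the GPU device nodes are visible.
# An optional first argument is a root prefix (defaults to the real /).
gpu_nodes_visible() {
    root=${1:-}
    [ -e "$root/dev/kfd" ] || { echo "missing $root/dev/kfd"; return 1; }
    # Accept any render node (renderD128, renderD129, ...).
    for node in "$root"/dev/dri/renderD*; do
        [ -e "$node" ] && return 0
    done
    echo "no render node under $root/dev/dri"
    return 1
}

# Usage inside the LXC or a Docker container:
#   gpu_nodes_visible && rocminfo
```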

The hypervisor is on proxmox-kernel-6.8.12-1-pve-signed.

As long as you do not need the bleeding-edge ROCm or AMD driver, doing it this way is a lot easier, and you don't have to worry about hypervisor dist-upgrades.
 
