[SOLVED] AMDGPU HWMON no longer exists.

eBell

Jun 11, 2017
I was experimenting with the AMDGPU 20.45 drivers to see whether they would let me pass the GPU through to a container for hardware acceleration in Jellyfin, but I couldn't get them to install on the Debian host.
After I uninstalled the partially installed AMD driver packages, I could no longer control the fan on my MI25, and the GPU wasn't detected by rocm-smi. After some investigation, it seems that the hwmon directory is no longer present in '/sys/class/drm/card0/device/'.
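A quick way to confirm the symptom is to check whether the card's sysfs device directory still has a hwmon entry and which kernel driver, if any, is bound to the card. Below is a minimal Python sketch. It assumes the MI25 is `card0` (the index can differ on other systems) and relies on the `driver` symlink that sysfs exposes for bound devices. `find_hwmon` and `bound_driver` are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional

# Assumption: the MI25 is card0; adjust if the GPU has a different index.
DEVICE_DIR = "/sys/class/drm/card0/device"


def find_hwmon(device_dir: str = DEVICE_DIR) -> list:
    """List hwmon entries (e.g. 'hwmon3') under a DRM device directory.

    An empty list means no hwmon interface is registered for the card,
    which is expected when no GPU driver is bound to it.
    """
    hwmon_dir = Path(device_dir) / "hwmon"
    if not hwmon_dir.is_dir():
        return []
    return sorted(p.name for p in hwmon_dir.iterdir() if p.name.startswith("hwmon"))


def bound_driver(device_dir: str = DEVICE_DIR) -> Optional[str]:
    """Return the name of the driver bound to the device, or None if unbound."""
    link = Path(device_dir) / "driver"
    if not link.exists():
        return None
    # The sysfs 'driver' entry is a symlink to the driver's directory,
    # so its resolved basename is the driver name (e.g. 'amdgpu').
    return link.resolve().name


if __name__ == "__main__":
    print("driver:", bound_driver() or "none bound")
    entries = find_hwmon()
    print("hwmon:", ", ".join(entries) if entries else "missing")
```

If the driver line reports nothing bound, a missing hwmon directory is the expected consequence rather than a separate fault.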

Could this be caused by the AMD driver installing older firmware, or firmware of a different version from the files provided by the pve-firmware package?
 
Likely. Did you reboot to load the old module?
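One way to check this after a reboot is to see whether `amdgpu` appears in `/proc/modules`, which is the same data `lsmod` prints. Here is a minimal sketch. `loaded_modules` is a hypothetical helper, and only the module name comes from this thread.

```python
from pathlib import Path


def loaded_modules(modules_file: str = "/proc/modules") -> set:
    """Return the names of currently loaded kernel modules.

    Each line of /proc/modules begins with the module name, followed by
    its size, use count, dependents, state and load address.
    """
    names = set()
    for line in Path(modules_file).read_text().splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


if __name__ == "__main__":
    try:
        mods = loaded_modules()
    except OSError:
        print("/proc/modules is not readable (not a Linux host?)")
    else:
        print("amdgpu is loaded" if "amdgpu" in mods else "amdgpu is NOT loaded")
```

Drivers compiled directly into the kernel do not appear in `/proc/modules`, so this check only applies when amdgpu is built as a loadable module.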

In general, it's better to pass the card through to a VM. The encapsulation also helps you avoid trashing the host. ;)
 
I've sorted it.
It was my own fault for not checking the module blacklists.
The AMD installer script blacklists the amdgpu module and doesn't remove the blacklist when the driver is uninstalled.
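To track down a leftover blacklist like this, the sketch below scans modprobe configuration files for a `blacklist amdgpu` directive. The thread doesn't say which file the AMD installer wrote, so the sketch checks every `.conf` file in `/etc/modprobe.d` and `/lib/modprobe.d`. Treat that directory list as an assumption. `find_blacklist_entries` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: these are the modprobe.d directories worth checking; the thread
# does not say which file the AMD installer created.
CONF_DIRS = ("/etc/modprobe.d", "/lib/modprobe.d")


def _norm(name: str) -> str:
    # modprobe treats '-' and '_' in module names as equivalent.
    return name.replace("-", "_")


def find_blacklist_entries(module: str, conf_dirs=CONF_DIRS) -> list:
    """Return (file, line number, text) for each 'blacklist <module>' line."""
    hits = []
    for conf_dir in conf_dirs:
        directory = Path(conf_dir)
        if not directory.is_dir():
            continue
        # Only files ending in .conf are read from these directories.
        for conf in sorted(directory.glob("*.conf")):
            text = conf.read_text(errors="replace")
            for lineno, raw in enumerate(text.splitlines(), start=1):
                line = raw.strip()
                if not line or line.startswith("#"):
                    continue  # blank line or comment
                tokens = line.split()
                if (len(tokens) >= 2 and tokens[0] == "blacklist"
                        and _norm(tokens[1]) == _norm(module)):
                    hits.append((str(conf), lineno, line))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_blacklist_entries("amdgpu"):
        print(f"{path}:{lineno}: {line}")
```

After you delete or comment out the offending line, a Debian-based host may also need its initramfs regenerated before rebooting (for example with `update-initramfs -u`), in case the blacklist was copied into it.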

I don't run as many VMs these days; I've been trying to move what I can into containers for ease of use and management.
Trashing the host is part of the fun. :p