LXC - An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel

Sep 9, 2021
Hello,
After updating the Proxmox kernel, my NVIDIA drivers stopped working in my LXC running Plex.
I uninstalled the driver from both the hypervisor and the LXC.
I installed the latest driver (NVIDIA-Linux-x86_64-510.54.run) on the hypervisor.
However, when I try to install the same driver in the LXC I get this error:
"An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel"

I tried modprobe -r nvidia-drm, but it did not help.
Any hints on how to fix this?

PS: Is there any way to avoid this happening on every kernel update?

Thank you!
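For reference, the unload has to happen on the host rather than inside the container; something like this on the Proxmox host is the usual pattern (a sketch only; service names and the exact set of loaded modules vary by setup):
Code:
# On the Proxmox host, not in the LXC: stop anything holding the GPU first
systemctl stop nvidia-persistenced 2>/dev/null
# Unload in dependency order; lsmod | grep nvidia shows what is actually loaded
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia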
 
The LXC cannot load kernel modules, so that message seems normal (the module is already loaded on the host). Maybe there is a way to install the driver without loading the module?
 
Do you use the --no-kernel-module option for the nvidia-driver installation inside the LXC?
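For example, something along these lines inside the container (a sketch; the runfile version must exactly match the driver installed on the host):
Code:
# Inside the LXC: install the userspace components only, skip the kernel module build
sh NVIDIA-Linux-x86_64-510.54.run --no-kernel-module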
 
Something appears to have changed between the 550 and 560 NVIDIA drivers. I had 550 working, upgraded my Proxmox kernel, and mistakenly installed 560.
Now when I attempt to install only the drivers in the LXC container I get this message:
Code:
root@frigate:/media/frigate# cat /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 10.2.1 20210110 (Debian 10.2.1-6)

[INFO]: Initializing menu
[INFO]: nvidia-fs.setKOVersion(2.22.3)
[INFO]: Setup complete
[INFO]: Installing: Driver
[INFO]: Installing: 560.35.05
[INFO]: Executing NVIDIA-Linux-x86_64-560.35.05.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd  2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 560.35.05 failed, quitting
root@frigate:/media/frigate# ls
clips  cuda_12.6.3_560.35.05_linux.run  exports  person-bicycle-car-detection.mp4  recordings  storage
root@frigate:/media/frigate# cat /var/log/nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Tue Dec 31 06:56:33 2024
installer version: 560.35.05

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

nvidia-installer command line:
    ./nvidia-installer
    --ui=none
    --no-questions
    --accept-license
    --disable-nouveau
    --no-cc-version-check
    --install-libglvnd

Using built-in stream user interface
-> Unable to locate any tools for listing initramfs contents.
-> Unable to scan initramfs: no tool found
-> Detected 2 CPUs online; setting concurrency level to 2.
WARNING: An NVIDIA kernel module 'nvidia-uvm' appears to be already loaded in your kernel.  This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading.  Some of the sanity checks that nvidia-installer performs to detect potential installation problems are not possible while an NVIDIA kernel module is running.
-> Would you like to continue installation and skip the sanity checks? If not, please abort the installation, then close any programs which may be using the NVIDIA GPU(s), and attempt installation again. (Answer: Abort installation)
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
root@frigate:/media/frigate#


The installer no longer recognises the `--no-kernel-module` flag. I cannot find any way in the installer to override the default "abort" answer to the sanity checks, and the installation folder is cleared when the install fails, so I cannot run the command manually.

Has anyone else worked around this?
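One possible angle (untested here, and assuming the CUDA bundle still honours its documented --extract flag) is to pull the embedded driver runfile out of the wrapper and run it directly, so the wrapper cannot clear the folder on failure:
Code:
# Extract the bundled installers instead of running the CUDA wrapper
sh cuda_12.6.3_560.35.05_linux.run --extract=/tmp/cuda-extract
# Run the bare driver installer, skipping kernel modules
# (--no-kernel-modules is the newer spelling; if rejected, try --no-kernel-module)
sh /tmp/cuda-extract/NVIDIA-Linux-x86_64-560.35.05.run --no-kernel-modules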
 
Um, you can't install modules from inside a container.

ETA: The reason is that containers use the host's kernel. It would be a big security problem to allow containers to modify the host kernel. You must install any drivers/modules from the host side, or use a VM, which has its own kernel.
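For completeness, the usual Proxmox pattern is: driver and kernel modules on the host only, then the device nodes passed through in the container config (a sketch; the container ID is illustrative and the nvidia-uvm major number varies, so check ls -l /dev/nvidia* on your host):
Code:
# /etc/pve/lxc/101.conf -- adjust the ID and device majors to your host
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file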
 
I fully understand that you cannot install modules in a container.

With CUDA 550 you could install the required drivers using the `--no-kernel-modules` option, which worked. In 560, even when you uncheck the kernel modules in the installer and select the driver only, it fails with the above message. There is no command-line option to specify `--no-kernel-modules` anymore.

I realise this is an NVIDIA issue... but I'm hoping that someone might have a solution here for Proxmox LXC containers.
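Before writing the flag off entirely, it may be worth asking the extracted 560 installer what it actually accepts (a sketch, assuming you can reach the bare runfile as described above):
Code:
# List the installer's advanced options and look for kernel-module switches
sh NVIDIA-Linux-x86_64-560.35.05.run --advanced-options | grep -i kernel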
 
For what it's worth, I reinstalled 550 on the host and it's all working fine again. It seems NVIDIA's packaging changed between the 550 and 560 versions, and there is an added wrapper layer that means you can't force-install the driver alone.
 
Is there an update on this? I always install the NVIDIA drivers on both the host and the container; I've been on 550.x and am thinking about moving to later versions.
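For anyone upgrading both sides: the userspace driver in the container has to match the host's kernel module version exactly, which is quick to verify before and after a move (a sketch):
Code:
# On the host: version of the loaded kernel module
cat /proc/driver/nvidia/version
# Inside the container: version of the userspace driver
nvidia-smi --query-gpu=driver_version --format=csv,noheader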