Getting Nvidia drivers to install on host for use in LXC with CUDA

pundip

Proxmox version: 7.4-18
Kernel version: pve-manager/7.4-18/b1f94095 (running kernel: 5.15.158-1-pve)

I have been trying to get my Nvidia RTX 670 working inside an LXC container so that I can run CUDA. My understanding, based on this blog post:
https://yomis.blog/nvidia-gpu-in-proxmox-lxc/
is that I first need to install the driver on the host with DKMS, i.e.:
./NVIDIA-Linux-x86_64-304.137.run --dkms

And then install it in the container without the kernel module:
./NVIDIA-Linux-x86_64-304.137.run --no-kernel-module

I think I have installed the kernel headers correctly, as the command apt list --installed | grep headers gives me:


Code:
pve-headers-5.15.158-1-pve/stable,now 5.15.158-1 amd64 [installed,automatic]
pve-headers-5.15.30-2-pve/stable,now 5.15.30-3 amd64 [installed]
pve-headers-5.15/stable,now 7.4-14 all [installed,automatic]
pve-headers/stable,now 7.4-1 all [installed]
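For anyone repeating this check, here is a minimal sketch that tests whether a header package for one specific kernel release appears in that kind of listing. The function name has_headers_for is made up for illustration, and the sample text is just two lines from the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a pve-headers package for the given
# kernel release appears in `apt list --installed`-style text on stdin.
has_headers_for() {
    kernel="$1"
    # -F matches the package name literally, up to the "/" separator.
    grep -qF -- "pve-headers-${kernel}/"
}

# Two lines copied from the listing above.
sample='pve-headers-5.15.158-1-pve/stable,now 5.15.158-1 amd64 [installed,automatic]
pve-headers-5.15.30-2-pve/stable,now 5.15.30-3 amd64 [installed]'

if printf '%s\n' "$sample" | has_headers_for "5.15.158-1-pve"; then
    echo "headers present for 5.15.158-1-pve"
else
    echo "headers missing for 5.15.158-1-pve"
fi
```

On the host itself, the same function could be fed live data with apt list --installed 2>/dev/null | has_headers_for "$(uname -r)", which compares against the running kernel rather than a hard-coded release.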

DKMS seems to be installed, as running apt install dkms now reports "dkms is already the newest version (2.8.4-3)".

So when I try to run:


./NVIDIA-Linux-x86_64-304.137.run --dkms

I get asked:
“Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later.” to which I answer “Yes”.

When installing I get the following error:

Code:
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 304.137 -k 5.15.158-1-pve`:
         Kernel preparation unnecessary for this kernel.  Skipping...

         Building module:
         cleaning build area...
         make -j32 KERNELRELEASE=5.15.158-1-pve module SYSSRC=/lib/modules/5.15.158-1-pve/build...(bad exit status: 2)
         Error! Bad return status for module build on kernel: 5.15.158-1-pve (x86_64)
         Consult /var/lib/dkms/nvidia/304.137/build/make.log for more information.


/var/lib/dkms/nvidia/304.137/build/make.log contains the following:

Code:
*** Unable to determine the target kernel version. ***
make: *** [makefile:53: select_makefile] Error 1
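One thing that may be worth ruling out with this error is an incomplete build tree behind the SYSSRC path. The sketch below assumes the usual kernel header layout, where include/generated/utsrelease.h defines UTS_RELEASE. The helper name build_tree_release is hypothetical, and a passing check says nothing about whether driver 304.137 can actually build against this kernel.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel release recorded in a build tree,
# or fail if the tree does not look like prepared kernel headers.
build_tree_release() {
    header="$1/include/generated/utsrelease.h"
    [ -r "$header" ] || return 1
    # The header holds a line like: #define UTS_RELEASE "5.15.158-1-pve"
    rel=$(sed -n 's/^#define UTS_RELEASE "\(.*\)"$/\1/p' "$header")
    [ -n "$rel" ] && printf '%s\n' "$rel"
}

# Same path DKMS passes as SYSSRC, but for whichever kernel is running.
tree="/lib/modules/$(uname -r)/build"
if release=$(build_tree_release "$tree"); then
    echo "build tree $tree reports release: $release"
else
    echo "no usable kernel headers found at $tree"
fi
```

If the reported release matches uname -r, the headers themselves are probably in place and the problem lies elsewhere in the driver's makefile.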


The /var/log/nvidia-installer.log contents are as follows:

Code:
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Mon Jul  1 15:48:47 2024
installer version: 304.137


PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin


nvidia-installer command line:
    ./nvidia-installer
    --dkms


Using: nvidia-installer ncurses v6 user interface
-> License accepted.
-> Installing NVIDIA driver version 304.137.
-> There appears to already be a driver installed on your system (version: 304.137).  As part of installing this driver (version: 304.137), the existing driver wil>
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel late>
-> Installing both new and classic TLS OpenGL libraries.
-> Installing classic TLS 32bit OpenGL libraries.
-> Install NVIDIA's 32-bit compatibility OpenGL libraries? (Answer: Yes)
-> Uninstalling the previous installation with /usr/bin/nvidia-uninstall.
-> Searching for conflicting X files:
-> done.
-> Searching for conflicting OpenGL files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (304.137):
   executing: '/usr/sbin/ldconfig'...
   executing: '/usr/sbin/depmod -aq'...
   depmod: WARNING: Ignored deprecated option -q
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 304.137 -k 5.15.158-1-pve`:
Kernel preparation unnecessary for this kernel.  Skipping...


Building module:
cleaning build area...
make -j32 KERNELRELEASE=5.15.158-1-pve module SYSSRC=/lib/modules/5.15.158-1-pve/build...(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.15.158-1-pve (x86_64)
Consult /var/lib/dkms/nvidia/304.137/build/make.log for more information.
-> error.
ERROR: Failed to install the kernel module through DKMS. No kernel module was installed; please try installing again without DKMS, or check the DKMS logs for more >
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the >


Any advice on how I can get the drivers for the RTX 670 to work?

The server is a PowerEdge 720XD, and lspci shows the card.
 
Did you ever get this sorted out? I'm having somewhat different issues using GPU passthrough to an LXC container, and I also used that blog post as my starting point.

Obligatory XKCD comic: [attached image: 1739891199933.png]