Tesla P4 | Cannot get drivers installed at all!!!

Jamie888
Apr 9, 2025
Hi guys,

I've been trying everything for over a month. Whenever I go to install the NVIDIA drivers, I keep getting kernel module build errors. Yes, I've read every guide and tried everything; this is a last resort for me.

Looking for some serious help please.


My environment:

root@R730Node01:~/# gcc --version
gcc (Debian 12.2.0-14) 12.2.0

root@R730Node01:~# pveversion
pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.8.12-9-pve)

Console Outputs:
root@R730Node01:~# nvidia-smi
-bash: nvidia-smi: command not found


root@R730Node01:~# lspci -v | grep Tesla
04:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
Subsystem: NVIDIA Corporation GP104GL [Tesla P4]
Kernel modules: nouveau <-- Yes, I know this is an error; it seems to ignore the blacklist.

root@R730Node01:~# cat /etc/modprobe.d/blacklist.conf
blacklist nouveau
blacklist nouveau
blacklist amdgpu
blacklist radeon
blacklist nouveau

blacklist i915
blacklist nouvea
blacklist rivafb
blacklist rivatv
blacklist nouveau
options nouveau modeset=0
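
For anyone hitting the same "ignores the blacklist" symptom: modprobe blacklists generally only take effect on the next boot after the initramfs is regenerated, the same point raised further down the thread. A minimal sketch:

update-initramfs -u -k all
reboot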

______________

I'm trying to install the following version: NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm
root@R730Node01:~/NVIDIA-GRID-Linux-KVM-570.124.03-570.124.06-572.60/Host_Drivers# sudo ./NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm.run -dkms

The error is:
ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.


Some of the log files are full of errors like these after the install fails:
ERROR: Kernel configuration is invalid.
include/generated/autoconf.h or include/config/auto.conf are missing.
Run 'make oldconfig && make prepare' on kernel src to fix it.



CC [M] /tmp/selfgz91607/NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm/kernel/nvidia/os-usermap.o
make[3]: *** [scripts/Makefile.build:243: /tmp/selfgz91607/NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm/kernel/nvidia/nv-vtophys.o] Error 1
In file included from <command-line>:
././include/linux/kconfig.h:5:10: fatal error: generated/autoconf.h: No such file or directory
5 | #include <generated/autoconf.h>
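
Based on the missing autoconf.h errors, the headers for the running kernel are likely missing or stale. A quick sanity check, assuming the standard Proxmox 8 package naming:

uname -r
ls -l /lib/modules/$(uname -r)/build
apt install --reinstall proxmox-headers-$(uname -r)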



EDIT 01:
root@R730Node01:~/NVIDIA-GRID-Linux-KVM-570.124.03-570.124.06-572.60/Host_Drivers# apt install proxmox-default-headers
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
proxmox-default-headers is already the newest version (1.1.0).
proxmox-default-headers set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@R730Node01:~/NVIDIA-GRID-Linux-KVM-570.124.03-570.124.06-572.60/Host_Drivers#

EDIT 02:
root@R730Node01:~# apt-cache policy pve-headers-6.8.12-9-pve
pve-headers-6.8.12-9-pve:
Installed: (none)
Candidate: (none)
Version table:
root@R730Node01:~#

and

root@R730Node01:~# pveversion -v | grep kernel
proxmox-ve: 8.4.0 (running kernel: 6.8.12-9-pve)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8: 6.8.12-9
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-helper: 8.1.1
root@R730Node01:~#
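
Side note: on PVE 8 the per-version header packages were renamed from pve-headers-* to proxmox-headers-*, which would explain the empty candidate above. A check along these lines (package name inferred from the running kernel):

apt-cache policy proxmox-headers-6.8.12-9-pve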

EDIT 03:
Hmm, not sure what I did; the outputs are still the same, but after reinstalling the headers a few times I ran the NVIDIA installer again and it completed fine. Not sure exactly what fixed it.

04:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
Subsystem: NVIDIA Corporation GP104GL [Tesla P4]
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_vgpu_vfio, nvidia

root@R730Node01:~# nvidia-smi
Sun Apr 13 15:55:40 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03 Driver Version: 570.124.03 CUDA Version: N/A |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 Off | 00000000:04:00.0 Off | 0 |
| N/A 55C P0 25W / 75W | 33MiB / 7680MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

EDIT 04:

root@R730Node01:~# systemctl enable --now pve-nvidia-sriov@ALL.service
root@R730Node01:~# lspci -d 10de:
04:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
root@R730Node01:~#
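
Worth noting: Pascal cards like the P4 expose vGPU through mediated devices (mdev), not SR-IOV, so the SR-IOV service likely does nothing here. A check that may be more telling, assuming the vgpu-kvm driver is loaded:

ls /sys/bus/pci/devices/0000:04:00.0/mdev_supported_types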


I think I'm getting closer, but let's see...

Happy to provide any further tests or outputs.
 
root@R730Node01:~# lspci -v -s 04:00.0
04:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
Subsystem: NVIDIA Corporation GP104GL [Tesla P4]
Flags: bus master, fast devsel, latency 0, IRQ 172, NUMA node 0, IOMMU group 31
Memory at 91000000 (32-bit, non-prefetchable) [size=16M]
Memory at 3bfe0000000 (64-bit, prefetchable) [size=256M]
Memory at 3bff0000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_vgpu_vfio, nvidia
 
according to this:

https://docs.nvidia.com/vgpu/gpus-supported-by-vgpu.html

the Tesla P4 is only supported up to vGPU v16, but you are using 570.124.03, which corresponds to the v18 branch...

I guess you would have to use the v16 branch (note that this is not supported by nvidia/proxmox but can still work)

EDIT: or am I misunderstanding, and you don't want to use the vGPU software at all?
I'm not quite sure what the goal of your post is, since your latest edits indicate that the driver installation went through.
 
what is it you want to accomplish in the end? do you want multiple VMs with vGPUs? do you want a regular GPU driver installed on the host?
depending on what you want to do, there are different drivers you'd have to use...
 
I want to share my Tesla P4 with a Windows VM for BlueIRIS and an LXC for Plex. Plex (for transcoding) is the first and most important task.
 
that won't be possible. you can either have a 'regular' driver on the host and share it with containers, OR pass the whole card through to a VM, OR use the vGPU driver to share it across multiple VMs; but then you can't use it on the host for transcoding (and thus also not for containers).

this is a restriction of the nvidia drivers, since they allow use either on the host or with VMs, but not both simultaneously (at least as far as I am aware...)
 
This is from WITHIN the LXC

root@r730plex:~# ls -l /dev/nv*
crw-rw-rw- 1 root root 195, 0 Apr 14 16:57 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr 14 16:57 /dev/nvidiactl
---------- 1 root root 0 Apr 14 18:38 /dev/nvidia-modeset
---------- 1 root root 0 Apr 14 18:38 /dev/nvidia-uvm
---------- 1 root root 0 Apr 14 18:38 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 509, 0 Apr 14 18:38 /dev/nvidia-vgpuctl
crw------- 1 root root 10, 144 Apr 14 16:56 /dev/nvram

/dev/nvidia-caps:
total 0
cr-------- 1 root root 235, 1 Apr 14 18:39 nvidia-cap1
cr--r--r-- 1 root root 235, 2 Apr 14 18:39 nvidia-cap2
root@r730plex:~#
 
which driver did you use? the vGPU one? if yes, this is the one where you cannot transcode on the host/LXCs... you'd have to use the 'regular' NVIDIA driver (e.g. directly from the Debian repositories); a rough install sketch is below.
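
for reference, installing the regular driver from the Debian repos would look roughly like this (a sketch; it assumes the non-free component is enabled in /etc/apt/sources.list and that matching headers are installed):

apt update
apt install proxmox-default-headers nvidia-driver nvidia-smi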
 
i have a Tesla P4 too; it has been a real pain to make it work with LXCs at all. i just went with vGPU only and mostly gave up on it. but i believe you will need the standard driver (as dcsapak said) for both the host and the LXC. if you're using the kvm/grid drivers (which appears to be the case?), then as dcsapak said, it won't work. and the command systemctl enable --now pve-nvidia-sriov@ALL.service will also not work on the P4; it lacks SR-IOV functionality, so no need to worry about that or about not having it working.

don't forget to run update-initramfs -u -k all after adding blacklist options, or the OS won't see the changes, i believe.

then make sure IOMMU is enabled: edit GRUB with nano /etc/default/grub, add intel_iommu=on for Intel (or iommu=pt for AMD) to the GRUB_CMDLINE_LINUX_DEFAULT= line, then run update-grub. then install the apt version from the Debian repos like dcsapak said; you will want the 535 version. a sketch of the sequence is below.
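
something like this, for example (Intel host assumed; double-check the parameters against your hardware):

nano /etc/default/grub
# e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
update-grub
update-initramfs -u -k all
reboot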
 
There are issues here: kernels 6.8 and newer are not supported by the NVIDIA-provided drivers. You have to patch the drivers to get them to build correctly. That being said, I don't think the patches exist for the v18 drivers, just v16 and v17.

I'm not sure I'm allowed to link to it, but search for "nvidia pollo loco" and find the GitLab site. It has instructions on patching. Currently the 6.14 kernel is supported, but if you look at the merge requests, there is a patch waiting to be merged that will add support as well.

Edit: I believe this is just for the GRID drivers and wouldn't work for LXC, since you'd need the merged drivers (host and GRID drivers merged into one). I've never done that; I just use a VM.

Edit 2: https://forum.proxmox.com/threads/pve-8-22-kernel-6-8-and-nvidia-vgpu.147039/
 
i believe that is only for the KVM drivers and does not apply to the open drivers in the repo? and of course it doesn't apply at all if the goal is primarily LXC on standard, non-kvm/grid drivers.

the 6.14 patch for the kvm driver is working great, and GreenDam has also made a nice patch to change the P4 to an A5500 so it is supported on newer drivers; you can actually use 17.5 with the 553 grid driver. (otherwise the vGPU works with a file replacement, but then the drivers will not install, since the card still shows up as a P4 or P40.) it will also just work fine if you use 16.9 / 535 with the 6.14 patch and no extra steps to change files, etc.

i am also pretty sure the 6.8 kernel works fine with 16.9 (i think 16.5 needed the patch for 6.8); only 6.12+ begins to fail to build, because of changes in vfio and other files.

i have tried merging them and had absolutely no luck getting it to actually work with LXCs at all. all it ended up doing was losing 512 MB off the total the vGPU could use, and while the card would show up and be seen properly by the LXC, every time it tried to use CUDA or anything else it would error out, despite everything it needed being present; i never could find any settings that would actually work. i still go back and try again here and there to see if i can eventually make it work, but yeah, so far it does not work.
 
I believe you're correct. I never tried merging, only saw it in passing. I've only ever used my P4 passed through to a VM, not with any LXC containers.

Thanks for the note about the P4-to-A5500 patch. I need to check that out.
 
yeah, so far in my testing it isn't worth it; it just causes some problems with vGPU and that's about it. but maybe i'm just not getting the permissions right in my LXC settings. it is really unclear what exactly needs to be done, and it doesn't seem like many have actually got it working; all the guides i have found do not work at all, lol.

in my opinion, if someone really wants GPU transcoding, it seems like it would almost be easier to just set up SSHFS or something in a VM with vGPU and skip the LXC altogether, because LXC just does not seem like it is developed for this at all; it is more of a hacky afterthought some people managed to make work.

you're welcome. here is the post where GreenDam mentioned it, in reply when i asked how they got their Pascal device working with 17.5:
https://forum.proxmox.com/threads/o...le-on-test-no-subscription.164497/post-761859
i highly recommend it. it works great, and i have been using it in combination with the 6.14 patch since GreenDam sent the link.
 
These are my current outputs; on the surface, they look OK to me? These commands were run WITHIN the Plex LXC.

However, if the general consensus is that I need to roll back to the 16.x drivers, then I will do it.

The Tesla P4 supports vGPU, so that's why I used the KVM drivers.




Plex LXC Container
Provided by: community-scripts ORG | GitHub: https://github.com/community-scripts/ProxmoxVE

OS: Ubuntu - Version: 22.04
Hostname: r730plex
IP Address: 192.168.1.105
root@r730plex:~# nvidia-smi
Tue Apr 15 09:58:07 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03 Driver Version: 570.124.03 CUDA Version: N/A |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 On | 00000000:04:00.0 Off | 0 |
| N/A 58C P8 11W / 75W | 33MiB / 7680MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
root@r730plex:~#
root@r730plex:~# ls -l /dev/nv*
crw-rw-rw- 1 root root 195, 0 Apr 14 19:22 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr 14 19:22 /dev/nvidiactl
---------- 1 root root 0 Apr 14 19:22 /dev/nvidia-modeset
---------- 1 root root 0 Apr 14 19:22 /dev/nvidia-uvm
---------- 1 root root 0 Apr 14 19:22 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 509, 0 Apr 14 19:22 /dev/nvidia-vgpuctl
crw------- 1 root root 10, 144 Apr 14 19:21 /dev/nvram

/dev/nvidia-caps:
total 0
cr-------- 1 root root 234, 1 Apr 15 09:58 nvidia-cap1
cr--r--r-- 1 root root 234, 2 Apr 15 09:58 nvidia-cap2
root@r730plex:~#
 
Have you attempted to run transcoding within the Plex LXC? Usually the output looks fine, but when you go to run anything it will error out if it is actually trying to use the GPU and not the CPU (a quick check is sketched below).

You only want the KVM driver for vGPU, and it lacks CUDA and other functions present in the standard driver, which is why it is ~50 MB and not 300+ MB.
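
a quick way to check from inside the container (input.mp4 stands in for any sample file; ffmpeg must be installed):

# the encode/decode libraries simply won't be there with the vgpu-kvm host driver
ls /usr/lib/x86_64-linux-gnu/ | grep -E 'nvcuvid|nvidia-encode'
# or force a GPU transcode and see whether it errors out
ffmpeg -hwaccel cuda -i input.mp4 -c:v h264_nvenc -f null -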
 
Well, I went to roll back the host drivers, and now I can't install anything again:

"You appear to be running an X server; please exit X before installing."
 
I believe you need to type init 3 (or sudo init 3) into the console to stop the X server, if you have one installed.
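
on a systemd host, the equivalent would be something like:

systemctl isolate multi-user.target    # drop out of the graphical session
# ...run the installer, then return with:
systemctl isolate graphical.target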