Hi guys,
I've been trying everything for over a month. Whenever I try to install the NVIDIA drivers, I keep getting kernel build errors. Yes, I've read every guide and tried everything. This is a last resort for me.
Looking for some serious help please.
My Environment is:
root@R730Node01:~/# gcc --version
gcc (Debian 12.2.0-14) 12.2.0
root@R730Node01:~# pveversion
pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.8.12-9-pve)
Console Outputs:
root@R730Node01:~# nvidia-smi
-bash: nvidia-smi: command not found
root@R730Node01:~# lspci -v | grep Tesla
04:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
Subsystem: NVIDIA Corporation GP104GL [Tesla P4]
Kernel modules: nouveau <-- Yes, I know this is an error; it seems to ignore the blacklist.
root@R730Node01:~# cat /etc/modprobe.d/blacklist.conf
blacklist nouveau
blacklist nouveau
blacklist amdgpu
blacklist radeon
blacklist nouveau
blacklist i915
blacklist nouvea
blacklist rivafb
blacklist rivatv
blacklist nouveau
options nouveau modeset=0
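As far as I understand, the "Kernel modules:" line in lspci only lists modules that could drive the card, so on its own it does not show that the blacklist is being ignored. A minimal sketch of a check for whether nouveau is actually loaded is below. The function name module_loaded is hypothetical, and it assumes the usual /proc/modules layout where each line starts with the module name.

```shell
# Hypothetical helper: succeeds if the named module is currently loaded.
# The second argument defaults to /proc/modules and exists only so the
# check can be pointed at a saved copy of that file.
module_loaded() {
    name="$1"
    modules_file="${2:-/proc/modules}"
    # Each line of /proc/modules starts with the module name and a space.
    grep -q "^${name} " "$modules_file"
}

# Example: module_loaded nouveau && echo "nouveau is loaded"
```

If this reports nothing for nouveau, the blacklist is probably doing its job even though lspci still lists the module.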
______________
I'm trying to install the following version: NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm
root@R730Node01:~/NVIDIA-GRID-Linux-KVM-570.124.03-570.124.06-572.60/Host_Drivers# sudo ./NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm.run -dkms
The error is:
ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.
The log files are full of failures like these:
ERROR: Kernel configuration is invalid.
include/generated/autoconf.h or include/config/auto.conf are missing.
Run 'make oldconfig && make prepare' on kernel src to fix it.
CC [M] /tmp/selfgz91607/NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm/kernel/nvidia/os-usermap.o
make[3]: *** [scripts/Makefile.build:243: /tmp/selfgz91607/NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm/kernel/nvidia/nv-vtophys.o] Error 1
In file included from <command-line>:
././include/linux/kconfig.h:5:10: fatal error: generated/autoconf.h: No such file or directory
5 | #include <generated/autoconf.h>
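The two files named in that error normally come with the kernel headers for the running kernel, so a quick way to see whether the header tree is in place is to test for them directly. This is only a sketch: check_kernel_headers is a hypothetical name, and it assumes the headers are reachable through /lib/modules/<kernel version>/build, which is the usual layout.

```shell
# Hypothetical helper: checks that a kernel build directory contains the
# generated config files the module build complained about.
check_kernel_headers() {
    build_dir="$1"
    for f in include/generated/autoconf.h include/config/auto.conf; do
        if [ ! -f "$build_dir/$f" ]; then
            echo "missing: $build_dir/$f"
            return 1
        fi
    done
    echo "headers look complete in $build_dir"
}

# Example: check_kernel_headers "/lib/modules/$(uname -r)/build"
```

A "missing" result would suggest the headers for this exact kernel version are not installed, or that the build symlink points somewhere else.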
EDIT 01:
root@R730Node01:~/NVIDIA-GRID-Linux-KVM-570.124.03-570.124.06-572.60/Host_Drivers# apt install proxmox-default-headers
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
proxmox-default-headers is already the newest version (1.1.0).
proxmox-default-headers set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@R730Node01:~/NVIDIA-GRID-Linux-KVM-570.124.03-570.124.06-572.60/Host_Drivers#
EDIT 02:
root@R730Node01:~# apt-cache policy pve-headers-6.8.12-9-pve
pve-headers-6.8.12-9-pve:
Installed: (none)
Candidate: (none)
Version table:
root@R730Node01:~#
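One possible reason for the empty result above: my understanding is that on Proxmox VE 8 the per-kernel header packages use a proxmox-headers- prefix rather than pve-headers-, so the name I queried may simply not exist. Treat that naming as an assumption and confirm it with apt-cache search. The helper name headers_pkg_for below is hypothetical.

```shell
# Hypothetical helper: builds the header package name for a kernel release,
# assuming the proxmox-headers-<release> naming (an assumption, verify it
# against the configured repositories).
headers_pkg_for() {
    printf 'proxmox-headers-%s\n' "$1"
}

# Example: apt-cache policy "$(headers_pkg_for "$(uname -r)")"
```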
and
root@R730Node01:~# pveversion -v | grep kernel
proxmox-ve: 8.4.0 (running kernel: 6.8.12-9-pve)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8: 6.8.12-9
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-helper: 8.1.1
root@R730Node01:~#
EDIT 03:
Hmm, I'm not sure what I did. The outputs above are still the same, but after reinstalling the headers a few times I tried the NVIDIA installer again and it worked fine.
04:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
Subsystem: NVIDIA Corporation GP104GL [Tesla P4]
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_vgpu_vfio, nvidia
root@R730Node01:~# nvidia-smi
Sun Apr 13 15:55:40 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03 Driver Version: 570.124.03 CUDA Version: N/A |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 Off | 00000000:04:00.0 Off | 0 |
| N/A 55C P0 25W / 75W | 33MiB / 7680MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found
EDIT 04:
root@R730Node01:~# systemctl enable --now pve-nvidia-sriov@ALL.service
root@R730Node01:~# lspci -d 10de:
04:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
root@R730Node01:~#
I think I'm getting closer, but let's see.
Happy to provide any further tests or outputs.