I recently upgraded to Proxmox 9.2.2 and my NVIDIA drivers completely stopped working.
While fixing this after the upgrade, I found two issues:
- DKMS failed to rebuild the driver for the new kernel.
  - I think this was my fault. I had previously installed headers for a specific kernel version using apt install pve-headers-$(uname -r). I should probably have used apt install pve-headers, which as far as I understand pulls in matching headers when a new kernel is installed.
- My GPU was hijacked by NovaCore.
  - For context, Nova is the new open-source, Rust-based kernel driver meant to replace Nouveau for modern (Turing/RTX 2000+) GPUs that use GSP firmware. It seems to essentially be the new nouveau.
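If you want to confirm which kernel driver is actually holding the GPU before changing anything, a rough sketch like the one below may help. It only parses the text that lspci -k prints; gpu_driver_in_use is a helper name I made up for illustration, and it assumes the card shows up as a "VGA compatible controller" or "3D controller" with NVIDIA in the device line.

```shell
# Print the kernel driver bound to each NVIDIA GPU function.
# Reads `lspci -k` style text on stdin. gpu_driver_in_use is a hypothetical helper name.
gpu_driver_in_use() {
    awk '
        # Device header lines start with the PCI address (a hex digit).
        /^[0-9a-fA-F]/ {
            is_gpu = ($0 ~ /NVIDIA/ && $0 ~ /VGA compatible controller|3D controller/)
        }
        # Indented detail line naming the bound driver.
        is_gpu && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
        }
    '
}

# Usage on the host:
#   lspci -k | gpu_driver_in_use
```

If this prints anything other than nvidia (or prints nothing at all), the proprietary driver is not the one bound to the card.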
Here is the fix that worked for me!
1. Validate that your NVIDIA DKMS module was built for your new kernel version
Check your installed driver and kernel version:
Bash:
dkms status
uname -r
**If your DKMS status output shows the driver as installed for your active kernel, skip to Step 3.**
Code:
# example 1: DKMS Module built for current active kernel
root@box:~# dkms status && uname -r
nvidia/595.71.05, 7.0.2-6-pve, x86_64: installed
7.0.2-6-pve
# example 2: DKMS Module NOT built for current active kernel
root@box:~# dkms status && uname -r
nvidia/595.71.05, 6.17.13-7-pve, x86_64: installed
7.0.2-6-pve
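The comparison in the two examples above can also be scripted. This is only a sketch: dkms_built_for is a hypothetical helper name, and it assumes dkms status prints lines in the "nvidia/VERSION, KERNEL, ARCH: installed" format shown above (older DKMS releases format this line differently).

```shell
# Exit 0 if the dkms status text on stdin lists an nvidia module as
# installed for the kernel release given as $1, otherwise exit 1.
# dkms_built_for is a hypothetical helper name.
dkms_built_for() {
    awk -F', *' -v k="$1" '
        $1 ~ /^nvidia\// && $2 == k && /: installed/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on the host:
#   if dkms status | dkms_built_for "$(uname -r)"; then
#       echo "module matches running kernel, skip to Step 3"
#   else
#       echo "module missing for running kernel, continue with Step 2"
#   fi
```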
2. Rebuild the NVIDIA Module (DKMS)
Install the headers for the new kernel so DKMS can compile the driver.
Bash:
apt update
apt install pve-headers-$(uname -r)
Force DKMS to rebuild and install the module for the new kernel:
Bash:
# example for a card with nvidia driver 595.71.05 ('dkms status' output) on kernel 7.0.2-6-pve ('uname -r' output)
dkms install -m nvidia -v 595.71.05 -k 7.0.2-6-pve
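If you would rather not type the version by hand, something like this sketch can pull it out of the dkms status output. nvidia_dkms_version is a hypothetical helper name, and it again assumes the "nvidia/VERSION, KERNEL, ..." line format from Step 1.

```shell
# Print the distinct NVIDIA driver version(s) found in dkms status text on stdin.
# nvidia_dkms_version is a hypothetical helper name.
nvidia_dkms_version() {
    # Split on "/" or ",": field 1 is the module name, field 2 the version.
    awk -F'[/,]' '$1 == "nvidia" { print $2 }' | sort -u
}

# Usage on the host (assumes exactly one nvidia version is registered):
#   ver="$(dkms status | nvidia_dkms_version)"
#   dkms install -m nvidia -v "$ver" -k "$(uname -r)"
```

If more than one version is printed, pick the one you want explicitly instead of passing the whole output to dkms install.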
3. Blacklist the NovaCore Driver
To prevent NovaCore from taking control of the GPU during early boot, you need to blacklist it.
If you already have a blacklist entry in one of your /etc/modprobe.d/ files for nouveau, just add the lines specific to nova/nova_core.
Bash:
echo "blacklist nova" >> /etc/modprobe.d/nvidia.conf
echo "blacklist nova_core" >> /etc/modprobe.d/nvidia.conf
echo "options nova modeset=0" >> /etc/modprobe.d/nvidia.conf
echo "blacklist nouveau" >> /etc/modprobe.d/nvidia.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/nvidia.conf
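One caveat with the echo >> approach is that running it twice leaves duplicate lines in the file. A possible way around that is a small helper that only appends a line when it is not already there. This is a sketch, and add_line_once is a name I made up for illustration.

```shell
# Append a line ($2) to a file ($1) only if that exact line is not already present.
# add_line_once is a hypothetical helper name.
add_line_once() {
    touch "$1"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage on the host:
#   conf=/etc/modprobe.d/nvidia.conf
#   add_line_once "$conf" "blacklist nova"
#   add_line_once "$conf" "blacklist nova_core"
#   add_line_once "$conf" "blacklist nouveau"
```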
4. Update Initramfs and Reboot
Finally, rebuild your initial ramdisk so the blacklist applies during the boot sequence, and restart the host.
Bash:
update-initramfs -u
reboot
5. Validate fix
Once the host comes back online, run nvidia-smi. It should be back to normal!
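If nvidia-smi still fails, it can help to check which modules actually got loaded. The sketch below parses lsmod output; module_loaded is a hypothetical helper name, and nova_core is the module name taken from the blacklist lines in Step 3.

```shell
# Exit 0 if the module named in $1 appears in lsmod-style text on stdin.
# module_loaded is a hypothetical helper name.
module_loaded() {
    # NR > 1 skips the "Module  Size  Used by" header line.
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage on the host:
#   lsmod | module_loaded nvidia    && echo "nvidia is loaded"
#   lsmod | module_loaded nova_core && echo "nova_core is still loaded, the blacklist did not apply"
```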