Unable to load the kernel module 'nvidia.ko'

Mattias Hedman

Well-Known Member
Jan 19, 2019
I followed the most excellent guide from Artofserver on how to install a GPU with Proxmox, and it worked just fine.
But that was a couple of weeks back, when my GPU was too old... or the drivers were too old.
So I bought a GPU that is supported by the newest driver at the moment. Before swapping cards I ran the nvidia-uninstaller script.
Then I swapped the GPU and tried to install the latest driver.
The building of the modules starts and gets interrupted with an error:
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly
configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is
present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA GPU(s), or no NVIDIA GPU installed in this system is supported by
this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more
information.

The log entry that looks interesting is:

[ 6025.889895] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting

So the question is: how do I unload the conflicting kernel module?
 
Well, the first question is which driver is conflicting; check that with lspci -k. Once you've found it, create a file /etc/modprobe.d/nvidia.conf and add the line blacklist <driver you found>. Then rebuild the initramfs with update-initramfs -u and reboot. The driver should then no longer be loaded.

If the driver is nouveau, make sure the proprietary driver is installed correctly, as otherwise you might not get a display output after the reboot (network access should still work).
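
The blacklist step above can be sketched roughly as follows. This is only an illustration, not an official Proxmox tool: the function name blacklist_module and the MODPROBE_DIR override are made up here so the function can be tried against a scratch directory first. It appends the blacklist line only if it is not already present.

```shell
#!/bin/sh
# Sketch: add "blacklist <module>" to a modprobe.d file, but only once.
# MODPROBE_DIR is a hypothetical override for dry runs; by default the
# real /etc/modprobe.d is used (which requires root).
blacklist_module() {
    module=$1
    conf="${MODPROBE_DIR:-/etc/modprobe.d}/nvidia.conf"
    line="blacklist $module"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if [ -f "$conf" ] && grep -qxF -- "$line" "$conf"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$conf"
}

# Typical use on the host, as root (module name taken from lspci -k):
#   blacklist_module nouveau
#   update-initramfs -u
#   reboot
```

Keep in mind that a blacklist entry only stops a module from being auto-loaded through its aliases; a module that is loaded explicitly by name elsewhere in the configuration can still end up bound to the device.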
 
With lspci -k how do I see what driver is in conflict? I get a crazy long list...
The part with the GPU looks like this:
42:00.0 VGA compatible controller: NVIDIA Corporation GK107GL [Quadro 410] (rev a1)
Subsystem: Hewlett-Packard Company GK107GL [Quadro 410]
Kernel driver in use: vfio-pci

In the folder you mentioned there is a pve-blacklist.conf that contains:
blacklist nvidiafb
 
Well, there you go, "vfio-pci" is what's conflicting. Now, vfio-pci usually doesn't bind itself to a GPU like that on its own... did you try to set up GPU passthrough with this card as well at some point?

Anyway, if you don't need PCIe passthrough you can probably just blacklist the entire "vfio-pci" driver as described above.

In the folder you mentioned there is a pve-blacklist.conf that contains:
blacklist nvidiafb
Ah yes, I figured you'd have already removed that as part of the installation process. And yes, you need to remove that, as it obviously blacklists the NVIDIA driver. We include that because sometimes the proprietary driver causes all sorts of virtualization issues, but if you're willing to try you can just remove it (also needs an initramfs rebuild) :)
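
For anyone else who finds the full lspci -k list too long to read: a small filter along these lines can pull out just the NVIDIA devices and the driver currently bound to each. It is a sketch that assumes the usual lspci -k layout (a device line starting with the slot, followed by "Kernel driver in use:" lines); the function name nvidia_drivers_in_use is made up for this example.

```shell
#!/bin/sh
# Sketch: read `lspci -k` output on stdin and print "<slot> <driver>" for
# every NVIDIA device. Devices with no bound driver print "<slot> (none)".
nvidia_drivers_in_use() {
    awk '
        # A device header starts in column 0 with a slot such as 42:00.0
        /^[0-9a-f]+:[0-9a-f]/ {
            if (slot != "") print slot, (drv == "" ? "(none)" : drv)
            slot = ""; drv = ""
            if ($0 ~ /NVIDIA/) slot = $1
            next
        }
        # Detail line belonging to the NVIDIA device we are tracking
        slot != "" && /Kernel driver in use:/ {
            sub(/^.*Kernel driver in use:[ \t]*/, "")
            drv = $0
        }
        END { if (slot != "") print slot, (drv == "" ? "(none)" : drv) }
    '
}

# Usage on the host:
#   lspci -k | nvidia_drivers_in_use
```

Alternatively, lspci -k -s 42:00.0 limits the output to a single slot, and lspci -k -d 10de: limits it to devices with the NVIDIA vendor ID.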
 
I wasn't totally clear in my first post: I did an install with the old card according to the guide I linked, and it worked perfectly!
Then I bought a new GPU that didn't cost me an arm and a leg but is still covered by the latest driver package. I ran nvidia-uninstall,
swapped the cards, tried to install, and failed as described above.

So I will block vfio-pci and remove the nvidia block, rebuild and reboot. Will report the result.
 
So I added this to /etc/modprobe.d/nvidia-installer-disable.conf:
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
blacklist vfio-pci

Oh, and I removed the pve-blacklist.conf file that I talked about earlier.

I ran update-initramfs -u and rebooted the server.

I tried to run the NVIDIA installer and got the same error as before.
This is what lspci -k shows now:

42:00.0 VGA compatible controller: NVIDIA Corporation GK107GL [Quadro 410] (rev a1)
Subsystem: Hewlett-Packard Company GK107GL [Quadro 410]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau

So that is what happened. What should I do now?
 
You can try to unbind the 'vfio-pci' driver manually:
Code:
echo "0000:42:00.0" > /sys/bus/pci/devices/0000:42:00.0/driver/unbind
It's rather unusual for this to be necessary though...
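
The same unbind step, wrapped in a small function with a sanity check, might look like the sketch below. The function name unbind_pci_device and the SYSFS_ROOT override are made up for illustration (the override only exists so the function can be tried against a fake directory tree); the sysfs path itself is the one from the command above.

```shell
#!/bin/sh
# Sketch: detach whatever driver currently owns a PCI device.
# SYSFS_ROOT is a hypothetical override for dry runs; the default is /sys.
unbind_pci_device() {
    addr=$1   # full address including the domain, e.g. 0000:42:00.0
    dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr"
    if [ ! -e "$dev/driver/unbind" ]; then
        echo "$addr: no driver bound (or no such device)" >&2
        return 1
    fi
    # Writing the address to the driver's unbind file releases the device
    printf '%s\n' "$addr" > "$dev/driver/unbind"
}

# Usage on the host, as root:
#   unbind_pci_device 0000:42:00.0
```

Note that this only changes the running system. If something in the configuration binds vfio-pci to the card at boot, the device will most likely be claimed again after the next reboot.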
 
That did the trick! Now I was able to run the nvidia installer without any issues. Thank you!
 
I'm having the same problem when trying to install the latest official driver. lspci -k shows the following:

Code:
08:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] (rev a1)
    Subsystem: Micro-Star International Co., Ltd. [MSI] GP104 [GeForce GTX 1070 Ti]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau

I also have this

Code:
# cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE

# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb

So my question is: should I add blacklist nouveau to /etc/modprobe.d/pve-blacklist.conf and then run update-initramfs? Or should I add vfio-pci to it as well? I'm not sure why the GPU still lists the nvidiafb module after it was blacklisted in /etc/modprobe.d/pve-blacklist.conf.
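
A side note on the lspci -k output: the "Kernel modules:" line lists modules that are capable of handling the device, not modules that are actually loaded, so seeing nvidiafb there does not mean the blacklist is being ignored. Whether a module is really loaded can be checked against /proc/modules, roughly like this. The function name module_loaded and the PROC_MODULES override are made up for this sketch.

```shell
#!/bin/sh
# Sketch: succeed if the named module appears in the loaded-module list.
# PROC_MODULES is a hypothetical override for dry runs; the default is
# the real /proc/modules.
module_loaded() {
    # The kernel lists modules with underscores (vfio_pci, not vfio-pci)
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' \
        "${PROC_MODULES:-/proc/modules}"
}

# Usage on the host:
#   for m in nvidiafb nouveau vfio-pci; do
#       if module_loaded "$m"; then echo "$m: loaded"; else echo "$m: not loaded"; fi
#   done
```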
 
What problem? This thread is solved - not marked as such, but the OP reported back that it's working - so I hope you tried everything that was written before.
This problem.

Code:
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics drive release.
Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.

I want to try what was written before, thus my previous question.
 
Okay, yes. How did you compile the modules? Could you please post the steps you have already tried? Have you tried the version from backports?
 
Okay, thank you. So blacklist nouveau and reboot. Afterwards you should get new error messages.

Why did you go the vanilla driver route instead of using the one from bullseye-backports? That one works flawlessly for me.

I was not aware of that option. What is the package name? Is it nvidia-driver/bullseye-backports?
 
The behind-the-scenes-summary is in this thread, so you need to add the repository

Code:
deb http://deb.debian.org/debian bullseye-backports main contrib non-free

and install from backports like this

Code:
apt install -t bullseye-backports nvidia-driver

(The commands should work; I typed them from memory.)
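
Put together, the two steps above could be scripted along these lines. This is a sketch only: the function name add_backports_source, the file name bullseye-backports.list, and the APT_SOURCES_DIR override are assumptions made for this example. The repository line and the apt command are the ones quoted above.

```shell
#!/bin/sh
# Sketch: add the bullseye-backports repository once, in its own list file.
# APT_SOURCES_DIR is a hypothetical override for dry runs; the file name
# bullseye-backports.list is likewise just a choice made for this example.
add_backports_source() {
    list="${APT_SOURCES_DIR:-/etc/apt/sources.list.d}/bullseye-backports.list"
    entry='deb http://deb.debian.org/debian bullseye-backports main contrib non-free'
    # Skip the append if the exact line is already there
    if [ -f "$list" ] && grep -qxF -- "$entry" "$list"; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$list"
}

# Usage on the host, as root:
#   add_backports_source
#   apt update
#   apt install -t bullseye-backports nvidia-driver
```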
 
I just installed the drivers and rebooted. The machine is unreachable :(. I plugged in a screen and rebooted, and it gets stuck here: https://ibb.co/vDhWgbv. How can I troubleshoot this? Is it an issue with the graphics driver not loading properly?
 
