[SOLVED] Kernel Sources for Driver Module Compilation?

mattlach

Well-Known Member
Mar 23, 2016
169
17
58
Boston, MA
Hey all,

So I am looking to compile some driver modules for my Proxmox box.

I've been able to install GCC and make via apt, but I can't seem to figure out the package name for the Proxmox kernel sources. Does anyone know what I need to install in order to get this up and running, so I can compile kernel modules?

Much obliged,
Matt
 
The package name is pve-headers-<KVER>, so the headers for our latest kernel are here:

ftp://download1.proxmox.com/debian/dists/jessie/pve-no-subscription/binary-amd64/pve-headers-4.4.6-1-pve_4.4.6-48_amd64.deb

You can install that with

# apt-get install pve-headers-4.4.6-1-pve
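As a general pattern (a sketch, not verbatim from this post), the header package matching whatever kernel is currently running can be targeted with `uname -r` instead of a hard-coded version:

```shell
# Install the header package matching the running kernel.
# Assumes the corresponding pve-headers package exists in the
# configured repositories.
KVER="$(uname -r)"            # e.g. 4.4.6-1-pve
apt-get install "pve-headers-${KVER}"
```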
 
One short extra question in this context - is there a reason why the pve-headers aren't upgraded automatically when an apt-get upgrade brings in a new kernel release? I always have to upgrade the pve-headers manually after a kernel upgrade and then redo the recompilation of my Mellanox OFED packages. It would be very nice if you could change this behaviour, or give me a hint as to why not. Normally one installs the headers on purpose, and having installed them once, you are going to need them for later kernel upgrades too.

Thank you very much!

Best regards, Johannes
 
I assume 99% of users do not need the header packages, therefore we think it's better NOT to install them by default.
 
Thank you very much. Follow up question:

My plan is to have an LXC container display output directly to a physical Nvidia GPU as part of a video server displaying content.

My plan was originally to just use apt-get to install the Nvidia binary blob from the Debian packages, but it doesn't seem to be present in the repository for some reason. This is why I went down the path of compiling my own.

I'd much rather use the debian package if this is possible though. Is it?
 
there are no pre-compiled NVIDIA binary blobs in Debian (anymore). you need to use the -dkms packages and take care to always recompile when updating the kernel.
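A rough sketch of the DKMS route (the `nvidia-kernel-dkms` package name is taken from Debian's non-free section; verify it against your release):

```shell
# Headers must be present before DKMS can build the module.
apt-get install "pve-headers-$(uname -r)"
# The -dkms package rebuilds the module on each kernel upgrade,
# provided matching headers are installed at that time.
apt-get install nvidia-kernel-dkms
# Verify the module was built for the current kernel:
dkms status
```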
 
there are no pre-compiled NVIDIA binary blobs in Debian (anymore). you need to use the -dkms packages and take care to always recompile when updating the kernel.

Ahh, thank you for that. While I use Debian based distributions all day every day, it's been a while since I used Debian proper, so I didn't realize they had dropped the binary blob from their repositories. Their argument seems to be that nouveau is good enough now, but I strongly disagree, especially if you need vdpau support :p

It looks like the nvidia-driver package is still in jessie-backports though, at least according to this wiki. Maybe I will try that, though I'm not sure if it will work or complain about broken dependencies based on how Proxmox handles the kernel headers.
 
I assume 99% of users do not need the header packages, therefore we think it's better NOT to install them by default.

You are absolutely right, and that was not what I meant. I meant that IF you have already installed the headers once (and on purpose!), it would be nice to receive the corresponding header updates automatically when upgrading the kernel to a new version. I didn't mean that they should be deployed by default on all installations out there.
 
You are absolutely right, and that was not what I meant. I meant that IF you have already installed the headers once (and on purpose!), it would be nice to receive the corresponding header updates automatically when upgrading the kernel to a new version. I didn't mean that they should be deployed by default on all installations out there.

I agree this would be a good idea. It would definitely simplify things for those of us who need to use driver modules not already included with the distribution.
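Until that changes, a hedged sketch of a post-upgrade check (the package name pattern is assumed from this thread) can at least catch a missing header package before a DKMS rebuild silently fails:

```shell
# Warn if the headers matching the running kernel are absent,
# e.g. after a kernel upgrade pulled in a new version.
KVER="$(uname -r)"
if dpkg -s "pve-headers-${KVER}" >/dev/null 2>&1; then
    echo "pve-headers-${KVER} installed"
else
    echo "pve-headers-${KVER} missing - install it and re-run dkms" >&2
fi
```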
 
You can install that with

# apt-get install pve-headers-4.4.6-1-pve

Hmm...

So, I installed the kernel headers as suggested above, making sure they matched the version of the installed kernel:
Code:
# uname -a
Linux proxmox 4.4.6-1-pve #1 SMP Thu Apr 21 11:25:40 CEST 2016 x86_64 GNU/Linux

So, both are 4.4.6-1-pve.

I ran the Nvidia binary blob driver installer, and it resulted in the following error:
Code:
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to
         build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system
         is supported by this NVIDIA Linux graphics driver release.                                                                                                                                                                        
       
         Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.

Based on this, I feel it is likely one of two problems:

1.) The kernel was compiled with a different version of gcc than the one currently in the repositories. The installed version appears to be 4.9.2:

Code:
# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.9/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.9.2-10' --with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.9 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.9.2 (Debian 4.9.2-10)

Or,

2.) The headers in the repository differ somehow from the binary kernel in the repository.

Does anyone have any suggestions?

Much obliged,
Matt
 
The kernel and header packages are created in one go, so that should not be an issue.

Did you check that this is not the cause: "a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s)"
 
The kernel and header packages are created in one go, so that should not be an issue.

Did you check that this is not the cause: "a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s)"

Thank you for your response. I blacklisted nouveau back over a week ago when I first posted this thread. The others I have never seen:

Code:
root@proxmox:~# lsmod |grep -i riva
root@proxmox:~# lsmod |grep -i nvidia
root@proxmox:~# lsmod |grep -i nouv
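
For reference, the nouveau blacklist described above is typically a modprobe.d fragment like this (the file name and the modeset option are the conventional ones, not taken from this thread); the initramfs needs to be regenerated afterwards with `update-initramfs -u`:

```
# /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
```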

Judging by the content of /proc/version, the gcc version used for the running kernel is the same as the currently installed gcc. Both are 4.9.2 (Debian 4.9.2-10):

Code:
root@proxmox:~# cat /proc/version
Linux version 4.4.6-1-pve (root@elsa) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Thu Apr 21 11:25:40 CEST 2016

Could it be that the headers in the pve-headers-4.4.6-1-pve_4.4.6-48_amd64.deb package noted above are not a perfect match for the running binary package kernel?

Maybe compiler flags need to be the same too? Are you familiar with the build environment that the Proxmox binary kernel package is compiled in?
 
Ugh.

Never mind. I'm an idiot.

I had originally planned to do a PCI passthrough of the adapter, but decided that a container would be better. When I did that I added the nvidia adapter to stub, and forgot all about it. I'm guessing this error is happening because the device is claimed by pci-stub, not because there is an actual conflicting module or compiler/kernel/header issue.
 
Hmm. I've removed the devices from /etc/initramfs-tools/modules, and updated initramfs, so on next reboot they should be freed from pci-stub, but I really don't want to reboot right now.

I wonder if there is a way to release the devices from pci-stub in the running environment.
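For what it's worth, sysfs does allow unbinding a device from pci-stub at runtime; a hedged sketch (the PCI address 0000:06:00.0 is only an example, check `lspci` for the real one):

```shell
DEV="0000:06:00.0"            # example address, adjust to your GPU
# Detach the device from pci-stub...
echo "$DEV" > /sys/bus/pci/drivers/pci-stub/unbind
# ...then ask the kernel to re-probe it with the native driver:
echo "$DEV" > /sys/bus/pci/drivers_probe
```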
 
Ugh.

Never mind. I'm an idiot.

I had originally planned to do a PCI passthrough of the adapter, but decided that a container would be better. When I did that I added the nvidia adapter to stub, and forgot all about it. I'm guessing this error is happening because the device is claimed by pci-stub, not because there is an actual conflicting module or compiler/kernel/header issue.

Yep, that did it. Driver module now installed.

Thanks for your help!
 
So, while I got the driver to install, I ran into problems getting it to work in an LXC container.

I bind mounted /dev/nvidia0 and /dev/nvidiactl into /dev inside my container (I didn't have a /dev/nvidia-uvm), and then installed the same Nvidia driver binary inside the container with the --no-kernel-module option, so I could get all the Nvidia command-line tools as recommended in this guide (which, granted, is for CUDA, but if it works for CUDA, it should work for video output as well, right?).
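
For reference, bind mounts like those described above would look roughly like this in the container's LXC config (LXC 1.x/2.x syntax; major number 195 is the one conventionally used by the NVIDIA character devices - treat the whole fragment as an assumption to verify against your setup):

```
lxc.cgroup.devices.allow: c 195:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
```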

Unfortunately, while nvidia-smi finds the video card on the host, inside the container it does not. Predictably, the X server also errors out, failing to start, but not with the error I expected. It looks like it finds the Nvidia device, but then fails due to not being able to open virtual console 7.

Code:
# cat Xorg.0.log
[  8041.629]
X.Org X Server 1.18.3
Release Date: 2016-04-04
[  8041.629] X Protocol Version 11, Revision 0
[  8041.629] Build Operating System: Linux 3.13.0-85-generic x86_64 Ubuntu
[  8041.629] Current Operating System: Linux htpc1 4.4.6-1-pve #1 SMP Thu Apr 21 11:25:40 CEST 2016 x86_64
[  8041.629] Kernel command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.4.6-1-pve root=ZFS=rpool/ROOT/pve-1 ro boot=zfs root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on
[  8041.629] Build Date: 07 April 2016  09:18:50AM
[  8041.629] xorg-server 2:1.18.3-1ubuntu2 (For technical support please see http://www.ubuntu.com/support)
[  8041.630] Current version of pixman: 0.33.6
[  8041.630]    Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
[  8041.630] Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  8041.630] (==) Log file: "/var/log/Xorg.0.log", Time: Tue May  3 01:24:55 2016
[  8041.630] (==) Using config file: "/etc/X11/xorg.conf"
[  8041.630] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  8041.631] (==) ServerLayout "Layout0"
[  8041.631] (**) |-->Screen "Screen0" (0)
[  8041.631] (**) |   |-->Monitor "Monitor0"
[  8041.631] (**) |   |-->Device "Device0"
[  8041.631] (**) |-->Input Device "Keyboard0"
[  8041.631] (**) |-->Input Device "Mouse0"
[  8041.631] (==) Automatically adding devices
[  8041.631] (==) Automatically enabling devices
[  8041.631] (==) Automatically adding GPU devices
[  8041.631] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  8041.631] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (==) FontPath set to:
        /usr/share/fonts/X11/misc,
        /usr/share/fonts/X11/Type1,
        built-ins
[  8041.631] (==) ModulePath set to "/usr/lib/x86_64-linux-gnu/xorg/extra-modules,/usr/lib/xorg/extra-modules,/usr/lib/xorg/modules"
[  8041.631] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[  8041.631] (WW) Disabling Keyboard0
[  8041.632] (WW) Disabling Mouse0
[  8041.632] (II) Loader magic: 0x55ce00e91da0
[  8041.632] (II) Module ABI versions:
[  8041.632]    X.Org ANSI C Emulation: 0.4
[  8041.632]    X.Org Video Driver: 20.0
[  8041.632]    X.Org XInput driver : 22.1
[  8041.632]    X.Org Server Extension : 9.0
[  8041.632] (++) using VT number 7

[  8041.632] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[  8041.633] (II) xfree86: Adding drm device (/dev/dri/card0)
[  8041.633] (II) config/udev: Ignoring already known drm device (/dev/dri/card0)
[  8041.638] (--) PCI: (0:1:3:0) 102b:0532:15d9:0404 rev 10, Mem @ 0xf6800000/8388608, 0xf7ffc000/16384, 0xf8000000/8388608
[  8041.638] (--) PCI:*(0:6:0:0) 10de:1288:196e:1130 rev 161, Mem @ 0xf9000000/16777216, 0xd8000000/134217728, 0xd6000000/33554432, I/O @ 0x0000cc00/128, BIOS @ 0x????????/524288
[  8041.638] (II) LoadModule: "glx"
[  8041.639] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[  8041.644] (II) Module glx: vendor="NVIDIA Corporation"
[  8041.644]    compiled for 4.0.2, module version = 1.0.0
[  8041.644]    Module class: X.Org Server Extension
[  8041.644] (II) NVIDIA GLX Module  361.42  Tue Mar 22 17:25:45 PDT 2016
[  8041.644] (II) LoadModule: "nvidia"
[  8041.644] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[  8041.645] (II) Module nvidia: vendor="NVIDIA Corporation"
[  8041.645]    compiled for 4.0.2, module version = 1.0.0
[  8041.645]    Module class: X.Org Video Driver
[  8041.645] (II) NVIDIA dlloader X Driver  361.42  Tue Mar 22 17:04:20 PDT 2016
[  8041.645] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  8041.645] (EE)
Fatal server error:
[  8041.645] (EE) xf86OpenConsole: Cannot open virtual console 7 (No such file or directory)
[  8041.645] (EE)
[  8041.645] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[  8041.645] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[  8041.645] (EE)

Oh well, it was worth a try, as LXC containers are so much more efficient than full VMs, but I always suspected I'd need to go with a full VM for this either way. Next try: a full VM install of Ubuntu 16.04 with PCIe GPU passthrough.
 
Hey guys,

sorry for bringing this up again, but I'm still not quite happy with the fact that the kernel headers (when installed deliberately) are not upgraded automatically together with the kernel. In fact, two days ago I stumbled upon the following error message during a regular system upgrade and would like to hear your opinion on it.
I have installed the Mellanox OFED drivers for my InfiniBand interface as the DKMS version, and as far as I understand, I always need headers matching the actual kernel version for the DKMS drivers to work at all. From googling the message I gather that this is not a Mellanox-specific message but rather a generic one which can also occur with various proprietary drivers, VirtualBox, etc. So why isn't there an automatic upgrade path for the headers package, as there is for all the other software packages installed on a system? Does this relate to the fact that the names of the kernel(-header) packages differ for each version, or is there another reason?

Cheers,
Johannes
 

Attachments

  • IMG_20160509_205206.jpg (362.5 KB)
Hi. I am installing the Nvidia drivers for a passthrough in Proxmox 6.0. The problem I am having is this ERROR message from the Nvidia driver installation:


"ERROR: The kernel source path 'usr/lib/modules/5.0.15-1-pve' does not exist. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM
installed...."

But this path exists: usr/lib/modules/5.0.15-1-pve
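One thing worth checking (a guess, not a confirmed diagnosis): the installer resolves the kernel source path via the `build` symlink under the modules directory, so the directory merely existing is not enough - the link must point at installed headers:

```shell
KVER="$(uname -r)"
# The installer follows this symlink to find the kernel sources;
# a dangling link produces the "source path does not exist" error.
ls -l "/usr/lib/modules/${KVER}/build"
# If it is missing or dangling, installing the matching headers
# should recreate it:
apt-get install "pve-headers-${KVER}"
```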
 
