[SOLVED] Kernel Sources for Driver Module Compilation?

mattlach

Well-Known Member
Mar 23, 2016
169
17
58
Boston, MA
Hey all,

So I am looking to compile some driver modules for my Proxmox box.

I've been able to install GCC and make via apt, but I can't seem to figure out the package name for the Proxmox kernel sources. Does anyone know what I need to install in order to get this up and running, so I can compile kernel modules?

Much obliged,
Matt
 
The package name is pve-headers-<KVER>, so the headers for our latest kernel are here:

ftp://download1.proxmox.com/debian/dists/jessie/pve-no-subscription/binary-amd64/pve-headers-4.4.6-1-pve_4.4.6-48_amd64.deb

You can install that with

# apt-get install pve-headers-4.4.6-1-pve
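As a general pattern (a sketch, not verbatim from this post), the header package matching whatever kernel is currently running can be targeted with `uname -r` instead of a hard-coded version:

```shell
# Install the header package matching the running kernel.
# Assumes the corresponding pve-headers package exists in the
# configured repositories.
KVER="$(uname -r)"            # e.g. 4.4.6-1-pve
apt-get install "pve-headers-${KVER}"
```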
 
One short extra question in this context - is there a reason why the pve-headers aren't upgraded automatically when an apt-get upgrade brings in a new kernel release? I always have to upgrade the pve-headers manually after a kernel upgrade and then redo the recompilation of my Mellanox OFED packages. It would be very nice if you could change this behaviour, or give me a hint as to why not. Normally one installs the headers on purpose, and having installed them once, you are going to need them for later kernel upgrades too.

Thank you very much!

Best regards, Johannes
 
I assume 99% of users do not need the header packages, therefore we think it's better NOT to install them by default.
 
Thank you very much. Follow up question:

My plan is to have an LXC container display output directly to a physical Nvidia GPU as part of a video server displaying content.

My plan was originally to just use apt-get to install the Nvidia binary blob from the Debian packages, but it doesn't seem to be present in the repository for some reason. This is why I went down the path of compiling my own.

I'd much rather use the debian package if this is possible though. Is it?
 
there are no pre-compiled NVIDIA binary blobs in Debian (anymore). you need to use the -dkms packages and take care to always recompile when updating the kernel.
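A rough sketch of the DKMS route (the `nvidia-kernel-dkms` package name is taken from Debian's non-free section; verify it against your release):

```shell
# Headers must be present before DKMS can build the module.
apt-get install "pve-headers-$(uname -r)"
# The -dkms package rebuilds the module on each kernel upgrade,
# provided matching headers are installed at that time.
apt-get install nvidia-kernel-dkms
# Verify the module was built for the current kernel:
dkms status
```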
 
there are no pre-compiled NVIDIA binary blobs in Debian (anymore). you need to use the -dkms packages and take care to always recompile when updating the kernel.

Ahh, thank you for that. While I use Debian based distributions all day every day, it's been a while since I used Debian proper, so I didn't realize they had dropped the binary blob from their repositories. Their argument seems to be that nouveau is good enough now, but I strongly disagree, especially if you need vdpau support :p

It looks like the nvidia-driver package is still in jessie-backports though, at least according to this wiki. Maybe I will try that, though I'm not sure if it will work or complain about broken dependencies based on how Proxmox handles the kernel headers.
 
I assume 99% of users do not need the header packages, therefore we think it's better NOT to install them by default.

You are absolutely right, and that was not what I meant. I meant that IF you have already installed the headers once (and on purpose!), it would be nice to receive the corresponding header updates automatically when upgrading the kernel to a new version. I didn't mean that they should be deployed by default on all installations out there.
 
You are absolutely right, and that was not what I meant. I meant that IF you have already installed the headers once (and on purpose!), it would be nice to receive the corresponding header updates automatically when upgrading the kernel to a new version. I didn't mean that they should be deployed by default on all installations out there.

I agree this would be a good idea. It would definitely simplify things for those of us who need to use driver modules not already included with the distribution.
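Until that changes, a hedged sketch of a post-upgrade check (the package name pattern is assumed from this thread) can at least catch a missing header package before a DKMS rebuild silently fails:

```shell
# Warn if the headers matching the running kernel are absent,
# e.g. after a kernel upgrade pulled in a new version.
KVER="$(uname -r)"
if dpkg -s "pve-headers-${KVER}" >/dev/null 2>&1; then
    echo "pve-headers-${KVER} installed"
else
    echo "pve-headers-${KVER} missing - install it and re-run dkms" >&2
fi
```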
 
You can install that with

# apt-get install pve-headers-4.4.6-1-pve

Hmm...

So, I installed the kernel headers as suggested above, making sure they matched the version of the installed kernel:
Code:
# uname -a
Linux proxmox 4.4.6-1-pve #1 SMP Thu Apr 21 11:25:40 CEST 2016 x86_64 GNU/Linux

So, both are 4.4.6-1-pve.

I ran the Nvidia binary blob driver installer, and it resulted in the following error:
Code:
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to
         build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system
         is supported by this NVIDIA Linux graphics driver release.                                                                                                                                                                        
       
         Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.

Based on this, I feel it is likely one of two problems:

1.) The kernel was compiled with a different version of gcc than the one currently in the repositories. The installed version appears to be 4.9.2:

Code:
# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.9/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.9.2-10' --with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.9 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.9.2 (Debian 4.9.2-10)

Or,

2.) The headers in the repository differ somehow from the binary kernel in the repository.

Does anyone have any suggestions?

Much obliged,
Matt
 
The kernel and header packages are created in one go, so that should not be an issue.

Did you check that this is not the cause: "a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s)"
 
The kernel and header packages are created in one go, so that should not be an issue.

Did you check that this is not the cause: "a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s)"

Thank you for your response. I blacklisted nouveau back over a week ago when I first posted this thread. The others I have never seen:

Code:
root@proxmox:~# lsmod |grep -i riva
root@proxmox:~# lsmod |grep -i nvidia
root@proxmox:~# lsmod |grep -i nouv
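
For reference, the nouveau blacklist described above is typically a modprobe.d fragment like this (the file name and the modeset option are the conventional ones, not taken from this thread); the initramfs needs to be regenerated afterwards with `update-initramfs -u`:

```
# /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
```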

Judging by the content of /proc/version, the gcc version used for the running kernel is the same as the currently installed gcc. Both are 4.9.2 (Debian 4.9.2-10):

Code:
root@proxmox:~# cat /proc/version
Linux version 4.4.6-1-pve (root@elsa) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Thu Apr 21 11:25:40 CEST 2016

Could it be that the headers in the pve-headers-4.4.6-1-pve_4.4.6-48_amd64.deb package noted above are not a perfect match for the running binary package kernel?

Maybe compiler flags need to be the same too? Are you familiar with the build environment that the Proxmox binary kernel package is compiled in?
 
Ugh.

Never mind. I'm an idiot.

I had originally planned to do a PCI passthrough of the adapter, but decided that a container would be better. When I did that I added the nvidia adapter to stub, and forgot all about it. I'm guessing this error is happening because the device is claimed by pci-stub, not because there is an actual conflicting module or compiler/kernel/header issue.
 
Hmm. I've removed the devices from /etc/initramfs-tools/modules, and updated initramfs, so on next reboot they should be freed from pci-stub, but I really don't want to reboot right now.

I wonder if there is a way to release the devices from pci-stub in the running environment.
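For what it's worth, sysfs does allow unbinding a device from pci-stub at runtime; a hedged sketch (the PCI address 0000:06:00.0 is only an example, check `lspci` for the real one):

```shell
DEV="0000:06:00.0"            # example address, adjust to your GPU
# Detach the device from pci-stub...
echo "$DEV" > /sys/bus/pci/drivers/pci-stub/unbind
# ...then ask the kernel to re-probe it with the native driver:
echo "$DEV" > /sys/bus/pci/drivers_probe
```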
 
Ugh.

Never mind. I'm an idiot.

I had originally planned to do a PCI passthrough of the adapter, but decided that a container would be better. When I did that I added the nvidia adapter to stub, and forgot all about it. I'm guessing this error is happening because the device is claimed by pci-stub, not because there is an actual conflicting module or compiler/kernel/header issue.

Yep, that did it. Driver module now installed.

Thanks for your help!
 
So, while I got the driver to install, I ran into problems getting it to work in an LXC container.

I bind mounted /dev/nvidia0 and /dev/nvidiactl into /dev inside my container (I didn't have a /dev/nvidia-uvm), and then installed the same Nvidia driver binary inside the container with the --no-kernel-module option, so I could get all the Nvidia command-line tools as recommended in this guide (which, granted, is for CUDA, but if it works for CUDA, it should work for video output as well, right?).
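
For reference, bind mounts like those described above would look roughly like this in the container's LXC config (LXC 1.x/2.x syntax; major number 195 is the one conventionally used by the NVIDIA character devices - treat the whole fragment as an assumption to verify against your setup):

```
lxc.cgroup.devices.allow: c 195:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
```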

Unfortunately, while nvidia-smi finds the video card on the host, inside the container it does not. Predictably, the X server also errors out, failing to start, but not with the error I expected. It looks like it finds the Nvidia device, but then fails due to not being able to open virtual console 7.

Code:
# cat Xorg.0.log
[  8041.629]
X.Org X Server 1.18.3
Release Date: 2016-04-04
[  8041.629] X Protocol Version 11, Revision 0
[  8041.629] Build Operating System: Linux 3.13.0-85-generic x86_64 Ubuntu
[  8041.629] Current Operating System: Linux htpc1 4.4.6-1-pve #1 SMP Thu Apr 21 11:25:40 CEST 2016 x86_64
[  8041.629] Kernel command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.4.6-1-pve root=ZFS=rpool/ROOT/pve-1 ro boot=zfs root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on
[  8041.629] Build Date: 07 April 2016  09:18:50AM
[  8041.629] xorg-server 2:1.18.3-1ubuntu2 (For technical support please see http://www.ubuntu.com/support)
[  8041.630] Current version of pixman: 0.33.6
[  8041.630]    Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
[  8041.630] Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  8041.630] (==) Log file: "/var/log/Xorg.0.log", Time: Tue May  3 01:24:55 2016
[  8041.630] (==) Using config file: "/etc/X11/xorg.conf"
[  8041.630] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  8041.631] (==) ServerLayout "Layout0"
[  8041.631] (**) |-->Screen "Screen0" (0)
[  8041.631] (**) |   |-->Monitor "Monitor0"
[  8041.631] (**) |   |-->Device "Device0"
[  8041.631] (**) |-->Input Device "Keyboard0"
[  8041.631] (**) |-->Input Device "Mouse0"
[  8041.631] (==) Automatically adding devices
[  8041.631] (==) Automatically enabling devices
[  8041.631] (==) Automatically adding GPU devices
[  8041.631] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  8041.631] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[  8041.631]    Entry deleted from font path.
[  8041.631] (==) FontPath set to:
        /usr/share/fonts/X11/misc,
        /usr/share/fonts/X11/Type1,
        built-ins
[  8041.631] (==) ModulePath set to "/usr/lib/x86_64-linux-gnu/xorg/extra-modules,/usr/lib/xorg/extra-modules,/usr/lib/xorg/modules"
[  8041.631] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[  8041.631] (WW) Disabling Keyboard0
[  8041.632] (WW) Disabling Mouse0
[  8041.632] (II) Loader magic: 0x55ce00e91da0
[  8041.632] (II) Module ABI versions:
[  8041.632]    X.Org ANSI C Emulation: 0.4
[  8041.632]    X.Org Video Driver: 20.0
[  8041.632]    X.Org XInput driver : 22.1
[  8041.632]    X.Org Server Extension : 9.0
[  8041.632] (++) using VT number 7

[  8041.632] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[  8041.633] (II) xfree86: Adding drm device (/dev/dri/card0)
[  8041.633] (II) config/udev: Ignoring already known drm device (/dev/dri/card0)
[  8041.638] (--) PCI: (0:1:3:0) 102b:0532:15d9:0404 rev 10, Mem @ 0xf6800000/8388608, 0xf7ffc000/16384, 0xf8000000/8388608
[  8041.638] (--) PCI:*(0:6:0:0) 10de:1288:196e:1130 rev 161, Mem @ 0xf9000000/16777216, 0xd8000000/134217728, 0xd6000000/33554432, I/O @ 0x0000cc00/128, BIOS @ 0x????????/524288
[  8041.638] (II) LoadModule: "glx"
[  8041.639] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[  8041.644] (II) Module glx: vendor="NVIDIA Corporation"
[  8041.644]    compiled for 4.0.2, module version = 1.0.0
[  8041.644]    Module class: X.Org Server Extension
[  8041.644] (II) NVIDIA GLX Module  361.42  Tue Mar 22 17:25:45 PDT 2016
[  8041.644] (II) LoadModule: "nvidia"
[  8041.644] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[  8041.645] (II) Module nvidia: vendor="NVIDIA Corporation"
[  8041.645]    compiled for 4.0.2, module version = 1.0.0
[  8041.645]    Module class: X.Org Video Driver
[  8041.645] (II) NVIDIA dlloader X Driver  361.42  Tue Mar 22 17:04:20 PDT 2016
[  8041.645] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  8041.645] (EE)
Fatal server error:
[  8041.645] (EE) xf86OpenConsole: Cannot open virtual console 7 (No such file or directory)
[  8041.645] (EE)
[  8041.645] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[  8041.645] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[  8041.645] (EE)

Oh well, it was worth a try, as LXC containers are so much more efficient than full VMs, but I always suspected I'd need to go with a full VM for this either way. Next try: a full VM install of Ubuntu 16.04 with PCIe GPU passthrough.
 
Hey guys,

sorry for bringing this up again, but I'm still not quite happy with the fact that the kernel headers (when installed deliberately) are not upgraded automatically together with the kernel. In fact, two days ago I stumbled upon the following error message during a regular system upgrade and would like to hear your opinion on it.
I have installed the Mellanox OFED drivers for my InfiniBand interface as the DKMS version, and as far as I understand, I always need headers matching the actual kernel version for the DKMS drivers to work at all. From googling the message I gather that this is not a Mellanox-specific message but rather a generic one which can also occur with various proprietary drivers, VirtualBox, etc. So why isn't there an automatic upgrade path for the headers package, as there is for all the other software packages installed on a system? Does this relate to the fact that the names of the kernel(-header) packages differ for each version, or is there another reason?

Cheers,
Johannes
 

Attachments

  • IMG_20160509_205206.jpg (362.5 KB)
Hi. I am installing the Nvidia drivers for a passthrough in Proxmox 6.0. The problem I am having is this ERROR message from the Nvidia driver installation:


"ERROR: The kernel source path 'usr/lib/modules/5.0.15-1-pve' does not exist. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM
installed...."

But this path exists: usr/lib/modules/5.0.15-1-pve
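One thing worth checking (a guess, not a confirmed diagnosis): the installer resolves the kernel source path via the `build` symlink under the modules directory, so the directory merely existing is not enough - the link must point at installed headers:

```shell
KVER="$(uname -r)"
# The installer follows this symlink to find the kernel sources;
# a dangling link produces the "source path does not exist" error.
ls -l "/usr/lib/modules/${KVER}/build"
# If it is missing or dangling, installing the matching headers
# should recreate it:
apt-get install "pve-headers-${KVER}"
```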
 
