Building custom Kernel (6.11) but unable to edit config

bugmenot

Hey guys,

I'm trying to build a custom kernel, but no matter what I do I'm unable to edit the kernel config; it always reverts to the default.

Bash:
git clone git://git.proxmox.com/git/pve-kernel.git
cd pve-kernel

nano Makefile
# I edit this line: EXTRAVERSION=-$(KREL)-pve-custom

make build-dir-fresh

# Copy my custom config over the generated one
cp /path/to/my/custom.config proxmox-kernel-6.11.11-1/ubuntu-kernel/.config

make deb

After this I extract the config from the built .deb, and it has reverted to the original config.

Bash:
dpkg-deb -x proxmox-kernel-6.11.11-1-pve-custom_6.11.11-1_amd64.deb extracted_kernel
find extracted_kernel -name "config-*"

Any idea how to get a custom config in? I've also tried editing the config-6.11.11.org file before running the "make deb" command, but it always seems to revert to the original config file.
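
For reference, this is how I'm verifying the reversion (using the files from the extraction commands above):

Bash:
# compare the config I copied in with the one that ended up in the .deb;
# a non-empty diff means my changes were dropped
diff /path/to/my/custom.config "$(find extracted_kernel -name 'config-*')"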
 
So I'm setting these options in debian/rules; they are reflected in proxmox-kernel-6.11.11/debian/rules after running make build-dir-fresh.
[screenshot: the edited debian/rules with the added config options]

Note that I added CONFIG_MLX5_VFIO_PCI as well, since I need that too.
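As far as I understand, debian/rules applies these options via the kernel's scripts/config helper, so my edit boils down to roughly the following (a sketch; the exact wrapper and option list in debian/rules may differ):

Bash:
# rough equivalent of the debian/rules additions, run inside the unpacked
# kernel source; -e sets an option to builtin (y), -m to module (m)
./scripts/config --file .config -e CONFIG_VFIO_PCI -e CONFIG_MLX5_VFIO_PCI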
After building the .deb files with make deb, the option still seems to be turned off, even though I see the settings reflected in the make output:

[screenshot: make output showing the config options being applied]

I unpack the kernel .deb with these commands and inspect the config:
Bash:
dpkg-deb -x proxmox-kernel-6.11.11-1-pve-custom_6.11.11-1_amd64.deb extracted_kernel
vi extracted_kernel/boot/config-6.11.11-1-pve-custom

Afterwards, in the extracted config these settings are still at their defaults:

[screenshot: the extracted config with the options still at their defaults]

Any thoughts?
 
I think that for some reason the build takes my currently running kernel's config and uses that as the configuration. Any ideas on how to work around this?
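
To confirm that, I can diff the config that ended up in the .deb against the running kernel's config (filenames from my commands above):

Bash:
# if these two barely differ, the build really picked up the running
# kernel's config instead of my custom one
diff extracted_kernel/boot/config-6.11.11-1-pve-custom /boot/config-$(uname -r)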
 
Kconfig is complex. In this case, you are trying to enable settings that are automatically configured based on other settings. Do you actually need those to be built in? Usually, having things built as a module (m) is okay ;)
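
E.g. a quick check whether the kernel you're running already provides it as a module, which is usually enough:

Code:
# is the option already =m in the running kernel's config?
grep MLX5_VFIO_PCI /boot/config-$(uname -r)
# and is the module actually shipped?
modinfo mlx5_vfio_pci | head -n 3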
 
What I'm trying to do is a live migration with PCIe passthrough. I've loaded all the relevant modules:

Bash:
lsmod | grep vfio
mlx5_vfio_pci          49152  0
vfio_pci               16384  2
vfio_pci_core          86016  2 mlx5_vfio_pci,vfio_pci
vfio_iommu_type1       49152  2
vfio                   65536  13 mlx5_vfio_pci,vfio_pci_core,vfio_iommu_type1,vfio_pci
iommufd               102400  1 vfio
mlx5_core            2347008  3 mlx5_vfio_pci,mlx5_vdpa,mlx5_ib

Once I initiate the migration I get this error:

Bash:
qm monitor 309
Entering QEMU Monitor for VM 309 - type 'help' for help
qm> migrate -d tcp:10.0.123.112:5901
Error: 0000:81:00.5: VFIO migration is not supported in kernel

The error says VFIO migration is not supported in the kernel, so I'm trying to compile a kernel with the VFIO options built in instead of as modules.
 
Whether the VFIO driver is compiled as a module or built in doesn't make a difference for that. As far as I can tell, that error means that your driver (or device) doesn't support tracking its state in a fashion that would allow live migration.
 
What is the device you're trying to live migrate? How does your QEMU config (qm config ID) look?

There are a few more things needed for live migration than just the kernel module. The device has to be marked as migratable (normally), but we don't do that for PCI devices.
Also, sometimes virtual functions need to retain the correct driver, but by default we rebind the device to the generic 'vfio-pci' driver (which might not work for migration).

E.g. I'm currently working on NVIDIA vGPU live migration integration: https://lore.proxmox.com/pve-devel/4fc9a8ef-f263-4906-bf39-3c7561c2a653@proxmox.com/T/#t
There I add 'enable-migration=on' to the hostpci device and leave the driver that NVIDIA loads onto the virtual functions in place.
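
For reference, the relevant part of the QEMU device argument then looks roughly like this (the host address is just an example):

Code:
# sketch: hostpci device with migration enabled; the VF stays bound to its
# vendor variant driver instead of being rebound to plain vfio-pci
-device 'vfio-pci,host=0000:81:00.5,id=hostpci0,enable-migration=on'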
 
I'm trying to migrate a Mellanox NIC (one that supports live migration).

I'm a bit further now. Even though the modules were loaded, I still needed to bind the mlx5_vfio_pci driver to the card. This lets me successfully run the migration command on the source machine.

Bash:
echo '0000:81:00.5' > /sys/bus/pci/drivers/vfio-pci/unbind
echo 'mlx5_vfio_pci' > /sys/bus/pci/devices/0000:81:00.5/driver_override
echo '0000:81:00.5' > /sys/bus/pci/drivers/mlx5_vfio_pci/bind

This allows me to start the migration:

Bash:
root@sourcemachine:~# qm monitor 304
Entering QEMU Monitor for VM 304 - type 'help' for help
qm> migrate_set_capability return-path on
qm> migrate_set_capability switchover-ack on
qm> migrate -d tcp:10.0.123.123:5900
qm> info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: active
total time: 294917 ms
expected downtime: 0 ms
setup: 2 ms
transferred ram: 4203947 kbytes
throughput: 70.78 mbps
remaining ram: 0 kbytes
total ram: 1065544 kbytes
duplicate: 63375 pages
skipped: 0 pages
normal: 1023765 pages
normal bytes: 4095060 kbytes
dirty sync count: 5705415
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 2070
dirty pages rate: 2280 pages
precopy ram: 4148229 kbytes

But even after binding the driver on the destination machine, the incoming migration seems to stall there:

Bash:
root@destinationmachine:~# qemu-system-x86_64 -machine pc-q35-9.0 -device vfio-pci,host=0000:81:00.5,id=mlx5_1 -incoming tcp:10.0.123.123:5900 -m 1G -monitor stdio

QEMU 9.0.2 monitor - type 'help' for more information
VNC server running on 127.0.0.1:5900

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Outgoing migration blocked:
  0000:81:00.5: VFIO migration is not supported in kernel
Migration status: setup
total time: 0 ms
socket address: [
        tcp:10.0.123.123:5900
]

It seems I'm still getting this error after binding the mlx5_vfio_pci driver:
Bash:
Outgoing migration blocked:
  0000:81:00.5: VFIO migration is not supported in kernel

So for some reason the source machine is now fine: it doesn't throw any errors and even reports that data is being sent to the destination. The destination machine, however, is still blocked by this error, even though the VFs use the same drivers on both sides.

Edit: Both machines are running the exact same kernel, with the same VFIO modules loaded and the same NIC firmware. The NICs are the exact same type, in the same model of machine and the same PCIe slot on the motherboard.

If I swap the roles of the servers and migrate in the other direction, the same error pops up, always on the receiving side.
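
For completeness, this is what I run on both hosts to double-check the binding before starting or receiving (VF address from my setup):

Bash:
# should report mlx5_vfio_pci on both the source and the destination
lspci -nnk -s 81:00.5 | grep -i 'in use'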
 
How do you start the VM on the source? If you do it with our stack, we rebind PCI devices to vfio-pci (or at least try to), so the binding to mlx5_vfio_pci is probably not surviving.
AFAIK the preconditions to get this working would be:

* have the devices bound to the correct driver (mlx5_vfio_pci)
* start the VM with the vfio-pci device set to 'enable-migration=on' and don't rebind to plain vfio-pci (like we do automatically)
* the same on the target machine

Is there any documentation for the card covering live migration with QEMU/KVM?
(Btw, which NIC are you using? It would be interesting for us to have a similar one in our test lab for testing this.)
 
Hi Dominik,

I am using the BF3 DPU; the documentation from NVIDIA is here:

https://docs.nvidia.com/doca/sdk/sr-iov+live+migration/index.html

What we are trying to migrate are Virtual Functions bound to a mapped device.

My understanding is that to use the enable-migration=on flag in our config I would need to patch some Proxmox files? How would I go about that?

Edit: I've just successfully done a migration with a raw device, but it seems Proxmox used the vfio-pci driver when migrating, so the VM lost the card after the migration even though the machine stayed up. This was without applying any patches.
 
I've used the following command to get the command line Proxmox uses to start the VM:

Bash:
qm showcmd <vmid> --pretty

This returns a nice overview of the QEMU command line for this VM:

Bash:
/usr/bin/kvm \
  -id 304 \
  -name 'Test-Migration,debug-threads=on' \
  -no-shutdown \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/304.qmp,server=on,wait=off' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/304.pid \
  -daemonize \
  -smbios 'type=1,uuid=2a5f3bd1-7e50-467a-a637-9888f1059a80' \
  -smp '60,sockets=1,cores=60,maxcpus=60' \
  -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vnc 'unix:/var/run/qemu-server/304.vnc,password=on' \
  -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt \
  -m 1024 \
  -object 'iothread,id=iothread-virtioscsi0' \
  -device 'intel-iommu,intremap=on,caching-mode=on' \
  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
  -device 'vmgenid,guid=bd41a715-5f67-4161-9fb1-ea486dda0c9f' \
  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
  -device 'vfio-pci,host=0000:81:00.0,id=hostpci0,bus=pci.0,addr=0x10,enable-migration=on' \
  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:26fe41f2f63d' \
  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' \
  -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0' \
  -drive 'file=/dev/zvol/zfs-pool/vm-304-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
  -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
  -machine 'type=q35+pve0,kernel-irqchip=split'

The problem is that with the correct driver loaded (or any of them; I've tried vfio-pci, mlx5_vfio_pci and mlx5_core) it still throws an error while trying to start the machine:

Bash:
kvm: -device vfio-pci,host=0000:81:01.0,id=hostpci0,bus=pci.0,addr=0x10,enable-migration=on: vfio 0000:81:01.0: 0000:81:01.0: VFIO migration is not supported in kernel.
 
The problem is, with the correct driver (or any of them, I've tried vfio-pci, mlx5_vfio_pci or mlx5_core) loaded it will still throw an error while trying to start the machine:
Just to be perfectly sure: do you still start the VM with our stack, or via this command line?

Can you maybe post the output of
Code:
lspci -nnkv
before you start it?

And the output of 'dmesg' afterwards?
 
Hi Dominik,

After binding the mlx5_vfio_pci driver to this Virtual Function:

Bash:
81:00.4 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e] (rev 01)
        Subsystem: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:0009]
        Flags: fast devsel, NUMA node 0, IOMMU group 68
        Memory at 44440400000 (64-bit, prefetchable) [virtual] [size=2M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [9c] MSI-X: Enable- Count=12 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Kernel driver in use: mlx5_vfio_pci
        Kernel modules: mlx5_core

When trying to start it, I get this error:

Bash:
kvm: -device vfio-pci,host=0000:81:00.4,id=hostpci0,bus=pci.0,addr=0x10,enable-migration=on: vfio 0000:81:00.4: 0000:81:00.4: VFIO migration is not supported in kernel

If I bind it back to vfio-pci I get the same error:
Bash:
81:00.4 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e] (rev 01)
        Subsystem: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:0009]
        Flags: fast devsel, NUMA node 0, IOMMU group 68
        Memory at 44440400000 (64-bit, prefetchable) [virtual] [size=2M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [9c] MSI-X: Enable- Count=12 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Kernel driver in use: vfio-pci
        Kernel modules: mlx5_core

Bash:
kvm: -device vfio-pci,host=0000:81:00.4,id=hostpci0,bus=pci.0,addr=0x10,enable-migration=on: vfio 0000:81:00.4: 0000:81:00.4: VFIO migration is not supported in kernel

I'm using this command to try and get the machine up:

Bash:
/usr/bin/kvm \
  -id 304 \
  -name 'Test-Migration,debug-threads=on' \
  -no-shutdown \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/304.qmp,server=on,wait=off' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/304.pid \
  -daemonize \
  -smbios 'type=1,uuid=2a5f3bd1-7e50-467a-a637-9888f1059a80' \
  -smp '60,sockets=1,cores=60,maxcpus=60' \
  -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vnc 'unix:/var/run/qemu-server/304.vnc,password=on' \
  -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt \
  -m 1024 \
  -object 'iothread,id=iothread-virtioscsi0' \
  -device 'intel-iommu,intremap=on,caching-mode=on' \
  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
  -device 'vmgenid,guid=bd41a715-5f67-4161-9fb1-ea486dda0c9f' \
  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
  -device 'vfio-pci,host=0000:81:00.4,id=hostpci0,bus=pci.0,addr=0x10,enable-migration=on' \
  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:26fe41f2f63d' \
  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' \
  -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0' \
  -drive 'file=/dev/zvol/zfs-pool/vm-304-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
  -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
  -machine 'type=q35+pve0,kernel-irqchip=split'

dmesg-wise, I found this regarding the VF:
Bash:
root@sourcemachine:~# dmesg | grep 0000:81:00.4
[  622.571704] pci 0000:81:00.4: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[  622.574976] pci 0000:81:00.4: enabling Extended Tags
[  622.579561] pci 0000:81:00.4: Adding to iommu group 68
[  622.589345] mlx5_core 0000:81:00.4: enabling device (0000 -> 0002)
[  622.591513] mlx5_core 0000:81:00.4: firmware version: 32.43.2026
[  622.851121] mlx5_core 0000:81:00.4: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
[  622.898365] mlx5_core 0000:81:00.4: Assigned random MAC address 5e:91:80:62:ec:54
[  623.089185] mlx5_core 0000:81:00.4: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[  623.140523] mlx5_core 0000:81:00.4 enp129s0f0v0: renamed from eth0
[ 1545.513055] pci 0000:81:00.4: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[ 1545.517672] pci 0000:81:00.4: enabling Extended Tags
[ 1545.529435] pci 0000:81:00.4: Adding to iommu group 68
[ 1545.570088] mlx5_core 0000:81:00.4: enabling device (0000 -> 0002)
[ 1545.573814] mlx5_core 0000:81:00.4: firmware version: 32.43.2026
[ 1545.834510] mlx5_core 0000:81:00.4: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
[ 1545.875214] mlx5_core 0000:81:00.4: Assigned random MAC address f2:ba:b8:c2:df:71
[ 1546.067070] mlx5_core 0000:81:00.4: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 1546.099708] mlx5_core 0000:81:00.4 enp129s0f0v0: renamed from eth0
[ 1546.781669] mlx5_core 0000:81:00.4 enp129s0f0v0: Link up
[ 1655.789902] mlx5_vfio_pci 0000:81:00.4: enabling device (0000 -> 0002)
[ 1935.131136] vfio-pci 0000:81:00.4: enabling device (0000 -> 0002)

You can see in the last two lines that I bound it to mlx5_vfio_pci, tried to get the machine up, failed, and went back to vfio-pci. It seems enable-migration=on gets blocked no matter which driver is used.

QEMU version:
Bash:
QEMU emulator version 9.0.2 (pve-qemu-kvm_9.0.2-5)
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
 
Hmm, this looks like live migration still has to be enabled somewhere else (sorry, I can't test it here since we don't have that hardware in our lab).

Did you enable live migration via the kernel module parameter and with 'devlink'?

see
https://docs.nvidia.com/doca/sdk/sr...597609_id-.SRIOVLiveMigrationv2.10.0-NVCONFIG
and
https://docs.nvidia.com/doca/sdk/sr...d-.SRIOVLiveMigrationv2.10.0-DirectlyoverQEMU
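
From memory, the steps look roughly like the following sketch; please double-check the exact parameter and port names against the docs above:

Code:
# sketch only: names from memory, verify against the NVIDIA documentation
# 1) firmware side (NVCONFIG), e.g. via mlxconfig:
#    mlxconfig -d <mst device> set VF_MIGRATION_MODE=2
# 2) mark the VF's devlink port function as migratable (on the PF),
#    before binding the VF to mlx5_vfio_pci:
devlink port show                                  # find the VF's port index
devlink port function set pci/0000:81:00.0/1 migratable enable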

EDIT: Also, it would be great to know what your use case for the DPU + live migration is (just my curiosity).
 