Unable to boot VM with NVMe drive passed through

Istalri-Dragon

Hello,

So I am trying to set up a TrueNAS VM on Proxmox and want to pass through an NVMe drive to it to use as ARC cache. The VM boots fine without the NVMe drive passed to it. I have even managed to get it booted up once or twice with the NVMe drive, so I know it's the right device being passed through. Below is the error I am getting; it's not very helpful, as you can see. I am just wondering what anyone thinks the cause is and how to get it to work consistently. Like I said, I have managed to get it to boot once or twice after rebooting the host, but currently that isn't working, and I can't be expected to reboot the host 10 times in hopes of getting the VM to work.

If any more information is needed, I will gladly provide it. I am just at a loss as to what to do here, as nothing I have tried has worked short of removing the NVMe from the VM, but that would prevent me from using it as an ARC cache.




Code:
TASK ERROR: start failed: command '/usr/bin/kvm -id 104 -name 'TruenasCore,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/104.pid -daemonize -smbios 'type=1,uuid=b1e3faf5-03b8-4076-ae96-b8c3053b9e76' -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/104.vnc,password=on' -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 8256 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=4b70a649-9253-445c-8c61-468ada80d9c8' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'vfio-pci,host=0000:02:00.0,id=hostpci0,bus=pci.0,addr=0x10,rombar=0' -device 'vfio-pci,host=0000:06:00.0,id=hostpci1,bus=pci.0,addr=0x11,rombar=0' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:bf5b80a839c6' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/rpool/data/vm-104-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap104i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=5A:58:69:DF:D8:AA,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=101' -machine 'type=pc+pve0'' failed: got timeout
 
If there are no error messages (in journalctl) and just a time-out, then start the VM with less memory.

PCI(e) devices can do DMA to and from any part of the memory at any time. Therefore all VM memory must be pinned into actual RAM on the Proxmox host. Sometimes there is not enough contiguous memory available, and this typically results in a time-out. For the same reason, ballooning does not work for a VM with passthrough.
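For example, from the host shell (a minimal sketch, assuming VM ID 104 as in the error above):

Code:
# lower the VM's memory and disable ballooning (ballooning does not work with passthrough anyway)
qm set 104 --memory 4096 --balloon 0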
 
So I checked journalctl and got the errors below. I also tried what you said about starting with less memory, but that did not change the errors I am getting. I also disabled ballooning for this VM, as I'd rather have it fixed anyway. I was giving it 8 GB of 16 and just tried 4 GB as well, but got the same result.


Code:
Oct 04 07:43:41 PVE kernel: vfio-pci 0000:06:00.0: can't change power state from D3cold to D0 (config space inaccessible)
Oct 04 07:43:42 PVE kernel: vfio-pci 0000:06:00.0: timed out waiting for pending transaction; performing function level reset anyway
Oct 04 07:43:43 PVE kernel: vfio-pci 0000:06:00.0: not ready 1023ms after FLR; waiting
Oct 04 07:43:44 PVE kernel: vfio-pci 0000:06:00.0: not ready 2047ms after FLR; waiting
Oct 04 07:43:47 PVE kernel: vfio-pci 0000:06:00.0: not ready 4095ms after FLR; waiting
Oct 04 07:43:51 PVE kernel: vfio-pci 0000:06:00.0: not ready 8191ms after FLR; waiting
Oct 04 07:43:59 PVE kernel: vfio-pci 0000:06:00.0: not ready 16383ms after FLR; waiting
 
Code:
Oct 04 07:43:41 PVE kernel: vfio-pci 0000:06:00.0: can't change power state from D3cold to D0 (config space inaccessible)
Oct 04 07:43:42 PVE kernel: vfio-pci 0000:06:00.0: timed out waiting for pending transaction; performing function level reset anyway
Oct 04 07:43:43 PVE kernel: vfio-pci 0000:06:00.0: not ready 1023ms after FLR; waiting
Oct 04 07:43:44 PVE kernel: vfio-pci 0000:06:00.0: not ready 2047ms after FLR; waiting
Oct 04 07:43:47 PVE kernel: vfio-pci 0000:06:00.0: not ready 4095ms after FLR; waiting
Oct 04 07:43:51 PVE kernel: vfio-pci 0000:06:00.0: not ready 8191ms after FLR; waiting
Oct 04 07:43:59 PVE kernel: vfio-pci 0000:06:00.0: not ready 16383ms after FLR; waiting
With such an error message, memory is not the problem. Device 06:00.0 does not reset properly with Function Level Reset (FLR). Is this the NVMe device?
Are you using kernel 5.15 or higher? What is the output of cat '/sys/bus/pci/devices/0000:06:00.0/reset_method'? Maybe you can try another reset mechanism.

EDIT: Maybe you can attach the VM configuration file and show the IOMMU groups?
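For example, to gather that information from the host shell (a sketch; the IOMMU loop is a common snippet, adjust the VM ID if yours differs):

Code:
# kernel version
uname -r
# VM configuration
qm config 104
# list all IOMMU groups and the devices in them
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}"
  for d in "$g"/devices/*; do
    echo -e "\t$(lspci -nns "${d##*/}")"
  done
done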
 
Below is the output from the command. Yes, this is the NVMe device in question.
I believe this is the kernel version you are referring to; I got it by running pveversion in the shell: pve-manager/7.2-11/b76d3178 (running kernel: 5.15.60-1-pve)

If not, let me know how to get it and I can check for you.

I do not know how to try a different reset method, but if you can link a guide or let me know, I will gladly try that.


Code:
root@PVE:~# cat '/sys/bus/pci/devices/0000:06:00.0/reset_method'
flr bus
 
Code:
root@PVE:~# cat '/sys/bus/pci/devices/0000:06:00.0/reset_method'
flr bus
Try whether running this command (after a reboot of the host and) before starting the VM helps: echo bus >'/sys/bus/pci/devices/0000:06:00.0/reset_method'.
Also, early-binding the device to vfio-pci might help to prevent anything from touching it before starting the VM. And try enabling the ROM-Bar and PCI-Express options.
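Put together, that would look something like this (a sketch; note that the reset_method setting does not survive a reboot):

Code:
cat /sys/bus/pci/devices/0000:06:00.0/reset_method    # currently: flr bus
echo bus > /sys/bus/pci/devices/0000:06:00.0/reset_method
cat /sys/bus/pci/devices/0000:06:00.0/reset_method    # should now show: bus
# ROM-Bar can also be enabled from the CLI; PCI-Express (pcie=1) additionally requires the q35 machine type
qm set 104 --hostpci1 0000:06:00.0,rombar=1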
 
Running the command yielded no output, but I am not sure if that is correct or not. I tried running it before starting the VM and got:

Code:
kvm: ../hw/pci/pci.c:1541: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1

I cannot enable PCI-Express on this VM, as it says that is for machine type q35 only, and this one is not.

Looking into early-binding the device to vfio-pci now.
 
Running the command yielded no output, but I am not sure if that is correct or not.
That is normal. You can check if it was applied with cat '/sys/bus/pci/devices/0000:06:00.0/reset_method'.
I tried running it before starting the VM and got:

Code:
kvm: ../hw/pci/pci.c:1541: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1
Strange... The reset method is not persistent, so a reboot might fix this.
I cannot enable PCI-Express on this VM, as it says that is for machine type q35 only, and this one is not.
Linux usually does not mind switching between q35 and i440fx. Did ROM-Bar change anything?
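Switching the machine type can be done in the GUI or with something like this (a sketch, assuming VM ID 104):

Code:
qm set 104 --machine q35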
Looking into early-binding the device to vfio-pci now.
Make sure that lspci -nnks 06:00 shows Kernel driver in use: vfio-pci (after a reboot of the host and) before starting the VM.
 
Strange... The reset method is not persistent, so a reboot might fix this.
A reboot did fix it. I am still getting the timeout error, but not that assertion error anymore after another reboot.

Linux usually does not mind switching between q35 and i440fx. Did ROM-Bar change anything?
Adding ROM-Bar did not change anything. I will see if I can change the machine type and add PCI-Express.

Make sure that lspci -nnks 06:00 shows Kernel driver in use: vfio-pci (after a reboot of the host and) before starting the VM.
I added options vfio-pci ids=1234:5678,4321:8765 to pve-blacklist.conf in /etc/modprobe.d based on the linked guide and am still getting the "got timeout" error when starting the VM. Below is what I got for the lspci -nn:

06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:501a] (rev ff)

Running lspci -nnks 06:00 yielded:

Code:
lspci -nnks 06:00
06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:501a] (rev ff)
        Kernel driver in use: vfio-pci
        Kernel modules: nvme

So I believe that is correct.

Trying to switch the machine type now.
 
I added options vfio-pci ids=1234:5678,4321:8765 to pve-blacklist.conf in /etc/modprobe.d based on the linked guide and am still getting the "got timeout" error when starting the VM. Below is what I got for the lspci -nn:

06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:501a] (rev ff)
It does not matter to which file (or new file that ends in .conf) you add it, but 1234:5678,4321:8765 is wrong (as it is only an example). It needs to be options vfio-pci ids=15b7:501a so it claims all devices with that ID. And you need to run update-initramfs -u and reboot to make the changes in the /etc/modprobe.d/ directory active.
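In other words, something like this (a sketch; any .conf file in /etc/modprobe.d/ works):

Code:
echo 'options vfio-pci ids=15b7:501a' >> /etc/modprobe.d/vfio.conf
update-initramfs -u
reboot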
Running lspci -nnks 06:00 yielded:

Code:
lspci -nnks 06:00
06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:501a] (rev ff)
        Kernel driver in use: vfio-pci
        Kernel modules: nvme

So I believe that is correct.

Trying to switch the machine type now.
But the device was automatically claimed? Are you sure you ran the command after a host reboot and before (trying to) start the VM?
 
Still getting the same error with the machine type set to q35 and pcie=1.
It's quite possible that the device just doesn't work with passthrough or that you need to upgrade your motherboard BIOS. What is the make and model of your motherboard? Some Ryzen AGESA versions break passthrough with power-state errors.
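The board and BIOS version can be read from the host shell, for example (a sketch using dmidecode, which is normally available on Proxmox):

Code:
dmidecode -s baseboard-manufacturer
dmidecode -s baseboard-product-name
dmidecode -s bios-version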
 
It does not matter to which file (or new file that ends in .conf) you add it, but 1234:5678,4321:8765 is wrong (as it is only an example). It needs to be options vfio-pci ids=15b7:501a so it claims all devices with that ID. And you need to run update-initramfs -u and reboot to make the changes in the /etc/modprobe.d/ directory active.
Sorry, my bad; this is the content of pve-blacklist.conf:
Code:
# This file contains a list of modules which are not supported by Proxmox VE

# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
options vfio-pci ids=15b7:501a

I just copied the wrong thing in my above post. I did also run update-initramfs -u as per the guide and then rebooted; I just forgot to mention it.


It's quite possible that the device just doesn't work with passthrough or that you need to upgrade your motherboard BIOS. What is the make and model of your motherboard? Some Ryzen AGESA versions break passthrough with power-state errors.
It is an ASUS motherboard for Intel; the processor is an i7-6700K. https://www.asus.com/us/Motherboards-Components/Motherboards/All-series/MAXIMUS-VIII-HERO/
is the motherboard. I will check for a BIOS update.
But the device was automatically claimed? Are you sure you ran the command after a host reboot and before (trying to) start the VM?
After another run of update-initramfs -u and then another reboot, I am getting:

Code:
06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:501a]
        Subsystem: Sandisk Corp Device [15b7:501a]
        Kernel driver in use: nvme
        Kernel modules: nvme
 
Even though it's strange that you get different results when doing the same thing multiple times: the file appears to be correct, but the wrong driver gets loaded.
Does your system use GRUB or systemd-boot? proxmox-boot-tool refresh runs automatically during an update-initramfs -u, but maybe update-grub does not?
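One way to check which bootloader is in use (a sketch):

Code:
proxmox-boot-tool status
# or inspect the active UEFI boot entries:
efibootmgr -v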
After another run of update-initramfs -u and then another reboot, I am getting:

Code:
06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:501a]
        Subsystem: Sandisk Corp Device [15b7:501a]
        Kernel driver in use: nvme
        Kernel modules: nvme
That means that the device is still claimed by nvme. Try adding softdep nvme pre: vfio-pci to your file in the /etc/modprobe.d/ directory, which should make sure that vfio-pci is loaded first. And another update-initramfs -u (and maybe update-grub?) and reboot.
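In your case that would be something like this (a sketch, appending to the file you posted earlier):

Code:
echo 'softdep nvme pre: vfio-pci' >> /etc/modprobe.d/pve-blacklist.conf
update-initramfs -u
reboot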
 
Does your system use GRUB or systemd-boot? proxmox-boot-tool refresh runs automatically during an update-initramfs -u, but maybe update-grub does not?
It looks like systemd-boot, although I thought it was using GRUB; I am not sure.

Code:
Boot0001* Linux Boot Manager    HD(2,GPT,8ed8d5ff-90ef-4ba5-88a0-dc56273db56b,0x800,0x100000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)

That means that the device is still claimed by nvme. Try adding softdep nvme pre: vfio-pci to your file in the /etc/modprobe.d/ directory, which should make sure that vfio-pci is loaded first. And another update-initramfs -u (and maybe update-grub?) and reboot.


Just did this; now I am getting:

Code:
06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:501a] (rev ff)
        Kernel driver in use: vfio-pci
        Kernel modules: nvme

This time it shows vfio-pci. Trying to run the VM, I am still getting the same timeout error.

I checked journalctl and found this now:


Code:
Oct 04 09:13:53 PVE pvedaemon[2849]: start failed: command '/usr/bin/kvm -id 104 -name 'TruenasCore,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event>
Oct 04 09:13:53 PVE pvedaemon[1996]: <root@pam> end task UPID:PVE:00000B21:00001FD5:633C312B:qmstart:104:root@pam: start failed: command '/usr/bin/kvm -id 104 -name 'TruenasCore,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'sock>
Oct 04 09:13:59 PVE kernel: vfio-pci 0000:06:00.0: not ready 32767ms after FLR; waiting
Oct 04 09:14:03 PVE pvestatd[1966]: VM 104 qmp command failed - VM 104 qmp command 'query-proxmox-support' failed - unable to connect to VM 104 qmp socket - timeout after 31 retries
Oct 04 09:14:03 PVE pvestatd[1966]: status update time (6.153 seconds)
Oct 04 09:14:08 PVE pvedaemon[1995]: VM 104 qmp command failed - VM 104 qmp command 'query-proxmox-support' failed - unable to connect to VM 104 qmp socket - timeout after 31 retries
Oct 04 09:14:13 PVE pvestatd[1966]: VM 104 qmp command failed - VM 104 qmp command 'query-proxmox-support' failed - unable to connect to VM 104 qmp socket - timeout after 31 retries
 
It looks like systemd-boot, although I thought it was using GRUB; I am not sure.

Code:
Boot0001* Linux Boot Manager    HD(2,GPT,8ed8d5ff-90ef-4ba5-88a0-dc56273db56b,0x800,0x100000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)
You're booting in UEFI mode but are you booting from the ZFS rpool? What is the output of cat /proc/cmdline?
Just did this; now I am getting:

Code:
06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:501a] (rev ff)
        Kernel driver in use: vfio-pci
        Kernel modules: nvme
This is what we want for the cleanest PCIe passthrough.
Code:
Oct 04 09:13:53 PVE pvedaemon[2849]: start failed: command '/usr/bin/kvm -id 104 -name 'TruenasCore,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event>
Oct 04 09:13:53 PVE pvedaemon[1996]: <root@pam> end task UPID:PVE:00000B21:00001FD5:633C312B:qmstart:104:root@pam: start failed: command '/usr/bin/kvm -id 104 -name 'TruenasCore,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'sock>
Oct 04 09:13:59 PVE kernel: vfio-pci 0000:06:00.0: not ready 32767ms after FLR; waiting
Still the FLR problem, unfortunately. Try the bus reset method again (after a host reboot!); maybe that will work now.
Otherwise, this device just doesn't work with passthrough. You could consider undoing all the PCIe passthrough stuff and passing the device through as a disk.
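Disk passthrough would look something like this (a sketch; the by-id name below is hypothetical, use the one for your drive):

Code:
# find the stable device name of the NVMe drive
ls -l /dev/disk/by-id/ | grep nvme
# attach the whole drive to VM 104 as a virtual disk (example name, adjust to yours)
qm set 104 --scsi1 /dev/disk/by-id/nvme-EXAMPLE_MODEL_SERIAL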

EDIT: Or try another M.2 slot? Maybe it's different depending on whether the PCIe lanes come from the CPU or from the chipset.
 
You're booting in UEFI mode but are you booting from the ZFS rpool? What is the output of cat /proc/cmdline?
Apparently. More or less, outside of selecting which drive to install to, I left it all default during install. If you think I should, I can reinstall Proxmox again. This is more or less the first thing I am setting up, and I can throw backups of the VMs onto a different server for now and restore them.

This is the output.
Code:
initrd=\EFI\proxmox\5.15.60-1-pve\initrd.img-5.15.60-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on

Still the FLR problem, unfortunately. Try the bus reset method again (after a host reboot!); maybe that will work now.
Otherwise, this device just doesn't work with passthrough. You could consider undoing all the PCIe passthrough stuff and passing the device through as a disk.

I did try that again and am still getting this:

Code:
kvm: ../hw/pci/pci.c:1541: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1

Does passing it through as a disk yield the same performance as passing it through raw?
 
Apparently. More or less, outside of selecting which drive to install to, I left it all default during install. If you think I should, I can reinstall Proxmox again. This is more or less the first thing I am setting up, and I can throw backups of the VMs onto a different server for now and restore them.

This is the output.
Code:
initrd=\EFI\proxmox\5.15.60-1-pve\initrd.img-5.15.60-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on
You are booting in UEFI mode and from a ZFS pool, so your Proxmox does not use GRUB.
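As a side note: with systemd-boot the kernel command line is not managed by GRUB but comes from /etc/kernel/cmdline and is applied with proxmox-boot-tool (a sketch):

Code:
cat /etc/kernel/cmdline
# after editing that file, apply the change:
proxmox-boot-tool refresh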
I did try that again and am still getting this:

Code:
kvm: ../hw/pci/pci.c:1541: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1
There are other people running into this issue. See here and here.
Does passing it through as a disk yield the same performance as passing it through raw?
I'm not sure you'll notice, but it has none of the drawbacks of PCIe passthrough. EDIT (twice): See this thread and this thread for example.
 
You are booting in UEFI mode and from a ZFS pool, so your Proxmox does not use GRUB.
Do you think I would have better luck with PCI passthrough by reinstalling onto the drive again and trying to use GRUB instead? I still have the USB and can back up all the VMs to external storage.

There are other people running into this issue. See here and here.
Interesting. Although those look like GPUs, not an NVMe, and I don't see a solution found yet.

I'm not sure you'll notice, but it has none of the drawbacks of PCIe passthrough.
I will consider that; I would rather do PCI passthrough, but at this point I have been working on this for 3 days and only just now posted here.
 
Do you think I would have better luck with PCI passthrough by reinstalling onto the drive again and trying to use GRUB instead? I still have the USB and can back up all the VMs to external storage.
No, that will make absolutely no difference.
Interesting. Although those look like GPUs, not an NVMe, and I don't see a solution found yet.
It's all just PCIe devices. Indeed, no solutions yet, but the common theme is hardware that does not play nice with passthrough.
I will consider that; I would rather do PCI passthrough, but at this point I have been working on this for 3 days and only just now posted here.
Unless I can find reports on the internet that people have had success with passthrough of a device, I assume that it will not work (and buy something else). But I wish you the best of luck in this endeavor.
Here are some posts about disk passthrough: this thread and this thread. (I'm not sure if you are seeing my edits of earlier posts.)
 
