The virtual machine cannot start normally

cfsxy

Jan 31, 2025
I added a PCI wireless network card as a passthrough device to a virtual machine running on PVE. Since then, the virtual machine no longer starts, and the error message is:
VM 100 qmp command 'set_password' failed - unable to connect to VM 100 qmp socket - timeout after 51 retries
TASK ERROR: Failed to run vncproxy.
 
Hi,

Could you please share the VM config and the syslog from when you start the VM? You can get the VM config with `qm config <VMID>`. To capture the syslog, run `journalctl -f` and then start the VM.
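
If the full log is too noisy, it can help to keep only the lines for the passed-through device. Here is a minimal sketch, assuming VM ID 100 and the PCI address 0000:07:00.0 from this thread; `filter_pci_log` is a hypothetical helper name, not a Proxmox tool.

```shell
#!/bin/sh
# Hypothetical helper: keep only the log lines that mention a given PCI address.
# Reads log text on stdin; the address is matched as a fixed string, not a regex.
filter_pci_log() {
    grep -F -- "$1"
}

# Usage on the PVE host (run as root); shown as comments, not executed here:
#   qm config 100                                              # print the VM config
#   journalctl -k -b --no-pager | filter_pci_log 0000:07:00.0  # kernel log, this boot
```

`journalctl -k -b` limits the output to kernel messages from the current boot, which is where the vfio-pci and pcieport messages appear.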
 
VM config:
boot: order=sata0
cores: 2
cpu: host
hostpci0: 0000:03:00
hostpci1: 0000:07:00
machine: q35
memory: 2048
meta: creation-qemu=9.0.2,ctime=1738360121
name: immortalwrt
net0: e1000=BC:24:11:84:E2:CF,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
sata0: local-lvm:vm-100-disk-0,size=336M
scsihw: virtio-scsi-single
smbios1: uuid=3ff109ab-bc04-468e-a7b2-7ea977a9c637
sockets: 1
vmgenid: 5d4bf96b-b4d3-447b-8441-336115188653
syslog:
Feb 01 06:42:12 pve kernel: vfio-pci 0000:07:00.0: not ready 32767ms after bus reset; waiting
Feb 01 06:42:12 pve kernel: vfio-pci 0000:07:00.0: not ready 65535ms after bus reset; giving up
Feb 01 06:42:12 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 01 06:40:59 pve pvestatd[988]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 51 retries
Feb 01 06:41:00 pve pvestatd[988]: status update time (8.944 seconds)
Feb 01 06:42:13 pve kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Feb 01 06:42:13 pve kernel: tap100i0 (unregistering): left allmulticast mode
Feb 01 06:42:13 pve kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Feb 01 06:42:13 pve systemd[1]: 100.scope: Deactivated successfully.
Feb 01 06:42:13 pve systemd[1]: 100.scope: Consumed 1.066s CPU time.
Feb 01 06:44:35 pve pvedaemon[8820]: start VM 100: UPID:pve:00002274:0000DB49:679D5253:qmstart:100:root@pam:
Feb 01 06:44:35 pve pvedaemon[1001]: <root@pam> starting task UPID:pve:00002274:0000DB49:679D5253:qmstart:100:root@pam:
Feb 01 06:44:35 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: timed out waiting for pending transaction; performing function level reset anyway
Feb 01 06:46:59 pve kernel: pcieport 0000:00:1d.3: broken device, retraining non-functional downstream link at 2.5GT/s
Feb 01 06:46:59 pve kernel: pcieport 0000:00:1d.3: retraining failed
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 1023ms after FLR; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 2047ms after FLR; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 4095ms after FLR; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 8191ms after FLR; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 16383ms after FLR; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 32767ms after FLR; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
Feb 01 06:46:59 pve kernel: pcieport 0000:00:1d.3: broken device, retraining non-functional downstream link at 2.5GT/s
Feb 01 06:46:59 pve kernel: pcieport 0000:00:1d.3: retraining failed
Feb 01 06:46:59 pve kernel: pcieport 0000:00:1d.3: broken device, retraining non-functional downstream link at 2.5GT/s
Feb 01 06:46:59 pve kernel: pcieport 0000:00:1d.3: retraining failed
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 1023ms after bus reset; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 2047ms after bus reset; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 4095ms after bus reset; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 8191ms after bus reset; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 16383ms after bus reset; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 32767ms after bus reset; waiting
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: not ready 65535ms after bus reset; giving up
Feb 01 06:44:42 pve pvedaemon[1000]: <root@pam> starting task UPID:pve:000022DD:0000DDE9:679D525A:qmstart:100:root@pam:
Feb 01 06:44:42 pve pvedaemon[8925]: start VM 100: UPID:pve:000022DD:0000DDE9:679D525A:qmstart:100:root@pam:
Feb 01 06:44:52 pve pvedaemon[8925]: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Feb 01 06:44:52 pve pvedaemon[1000]: <root@pam> end task UPID:pve:000022DD:0000DDE9:679D525A:qmstart:100:root@pam: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Feb 01 06:46:59 pve pvedaemon[8820]: error writing '1' to '/sys/bus/pci/devices/0000:07:00.0/reset': Inappropriate ioctl for device
Feb 01 06:46:59 pve pvedaemon[8820]: failed to reset PCI device '0000:07:00.0', but trying to continue as not all devices need a reset
Feb 01 06:46:59 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 01 06:47:00 pve systemd[1]: Started 100.scope.
Feb 01 06:47:01 pve kernel: tap100i0: entered promiscuous mode
Feb 01 06:47:01 pve kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Feb 01 06:47:01 pve kernel: vmbr0: port 3(fwpr100p0) entered disabled state
Feb 01 06:47:01 pve kernel: fwln100i0 (unregistering): left allmulticast mode
Feb 01 06:47:01 pve kernel: fwln100i0 (unregistering): left promiscuous mode
Feb 01 06:47:01 pve kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Feb 01 06:47:01 pve kernel: fwpr100p0 (unregistering): left allmulticast mode
Feb 01 06:47:01 pve kernel: fwpr100p0 (unregistering): left promiscuous mode
Feb 01 06:47:01 pve kernel: vmbr0: port 3(fwpr100p0) entered disabled state
Feb 01 06:47:01 pve kernel: vmbr0: port 3(fwpr100p0) entered blocking state
Feb 01 06:47:01 pve kernel: vmbr0: port 3(fwpr100p0) entered disabled state
Feb 01 06:47:01 pve kernel: fwpr100p0: entered allmulticast mode
Feb 01 06:47:01 pve kernel: fwpr100p0: entered promiscuous mode
Feb 01 06:47:01 pve kernel: vmbr0: port 3(fwpr100p0) entered blocking state
Feb 01 06:47:01 pve kernel: vmbr0: port 3(fwpr100p0) entered forwarding state
Feb 01 06:47:01 pve kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Feb 01 06:47:01 pve kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Feb 01 06:47:01 pve kernel: fwln100i0: entered allmulticast mode
Feb 01 06:47:01 pve kernel: fwln100i0: entered promiscuous mode
Feb 01 06:47:01 pve kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Feb 01 06:47:01 pve kernel: fwbr100i0: port 1(fwln100i0) entered forwarding state
Feb 01 06:47:01 pve kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Feb 01 06:47:01 pve kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Feb 01 06:47:01 pve kernel: tap100i0: entered allmulticast mode
Feb 01 06:47:01 pve kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Feb 01 06:47:01 pve kernel: fwbr100i0: port 2(tap100i0) entered forwarding state
Feb 01 06:47:01 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: timed out waiting for pending transaction; performing function level reset anyway
Feb 01 06:49:27 pve kernel: pcieport 0000:00:1d.3: broken device, retraining non-functional downstream link at 2.5GT/s
Feb 01 06:49:27 pve kernel: pcieport 0000:00:1d.3: retraining failed
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 1023ms after FLR; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 2047ms after FLR; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 4095ms after FLR; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 8191ms after FLR; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 16383ms after FLR; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 32767ms after FLR; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
Feb 01 06:49:27 pve kernel: pcieport 0000:00:1d.3: broken device, retraining non-functional downstream link at 2.5GT/s
Feb 01 06:49:27 pve kernel: pcieport 0000:00:1d.3: retraining failed
Feb 01 06:49:27 pve kernel: pcieport 0000:00:1d.3: broken device, retraining non-functional downstream link at 2.5GT/s
Feb 01 06:49:27 pve kernel: pcieport 0000:00:1d.3: retraining failed
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 1023ms after bus reset; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 2047ms after bus reset; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 4095ms after bus reset; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 8191ms after bus reset; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 16383ms after bus reset; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 32767ms after bus reset; waiting
Feb 01 06:49:27 pve kernel: vfio-pci 0000:07:00.0: not ready 65535ms after bus reset; giving up
Feb 01 06:47:08 pve pvedaemon[999]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 51 retries
 
I added pcie=on to hostpci1 (0000:07:00), but the errors persist. The log still shows:
Feb 02 16:13:07 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 02 16:13:07 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 02 16:13:08 pve kernel: pcieport 0000:00:1d.3: Data Link Layer Link Active not set in 1000 msec
Feb 02 16:13:08 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Feb 02 16:13:08 pve kernel: vfio-pci 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
 
Thank you for the syslog and the VM config!

It looks like the PCI device fails to pass through. Could you please check the IOMMU groups to see whether all related devices are passed to the VM?

Code:
find /sys/kernel/iommu_groups/ -type l
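
The raw `find` output is unordered, so devices that share a group are easy to miss. This is a rough sketch that prints one line per device, sorted by group number. `list_iommu_groups` is a hypothetical helper name, and the optional directory argument is an assumption added only so the function can be tried on a copy of the tree.

```shell
#!/bin/sh
# Hypothetical helper: print "group <N>: <PCI address>" for every device,
# sorted by group number. The root directory defaults to the real sysfs path.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded pattern when no group exists (IOMMU disabled).
        [ -e "$dev" ] || [ -L "$dev" ] || continue
        group=${dev%/devices/*}   # strip "/devices/<address>"
        group=${group##*/}        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done | sort -t' ' -k2,2n -k3,3
}

# Usage on the PVE host, shown as a comment:
#   list_iommu_groups
```

Any device that shares a group number with the passthrough device normally has to be passed to the VM together with it.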

I would also try disabling PCIe power saving by adding `pcie_aspm=off` to the kernel command line in your GRUB configuration.
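
A kernel parameter only takes effect after the boot configuration is regenerated and the host is rebooted. The sketch below shows the usual sequence for a GRUB-booted host and a small check that the parameter is active. `cmdline_has` is a hypothetical helper name. Hosts that boot through systemd-boot instead keep the command line in /etc/kernel/cmdline and apply it with `proxmox-boot-tool refresh`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line contains a parameter
# as a whole word. The command line defaults to that of the running kernel.
cmdline_has() {
    param=$1
    line=${2-$(cat /proc/cmdline)}
    case " $line " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical sequence on a GRUB-booted host (run as root); not executed here:
#   1. add pcie_aspm=off to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
#   2. update-grub
#   3. reboot
#   4. cmdline_has pcie_aspm=off && echo "parameter active"
```

The whole-word match means that `intel_iommu=on` is not mistaken for `iommu=on`.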
 
Thank you very much for your help. The IOMMU groups show that the device has been passed through.
IOMMU groups:
/sys/kernel/iommu_groups/17/devices/0000:02:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.0
/sys/kernel/iommu_groups/25/devices/0000:00:02.3
/sys/kernel/iommu_groups/15/devices/0000:00:1f.0
/sys/kernel/iommu_groups/15/devices/0000:00:1f.5
/sys/kernel/iommu_groups/15/devices/0000:00:1f.3
/sys/kernel/iommu_groups/15/devices/0000:00:1f.4
/sys/kernel/iommu_groups/5/devices/0000:00:16.0
/sys/kernel/iommu_groups/23/devices/0000:00:02.1
/sys/kernel/iommu_groups/13/devices/0000:00:1d.3
/sys/kernel/iommu_groups/3/devices/0000:00:14.2
/sys/kernel/iommu_groups/3/devices/0000:00:14.0
/sys/kernel/iommu_groups/21/devices/0000:06:00.0
/sys/kernel/iommu_groups/11/devices/0000:00:1d.1
/sys/kernel/iommu_groups/1/devices/0000:00:00.0
/sys/kernel/iommu_groups/18/devices/0000:03:00.0
/sys/kernel/iommu_groups/8/devices/0000:00:1c.2
/sys/kernel/iommu_groups/26/devices/0000:00:02.4
/sys/kernel/iommu_groups/16/devices/0000:01:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:1a.0
/sys/kernel/iommu_groups/24/devices/0000:00:02.2
/sys/kernel/iommu_groups/14/devices/0000:00:1e.0
/sys/kernel/iommu_groups/14/devices/0000:00:1e.3
/sys/kernel/iommu_groups/4/devices/0000:00:15.1
/sys/kernel/iommu_groups/4/devices/0000:00:15.0
/sys/kernel/iommu_groups/22/devices/0000:07:00.0
/sys/kernel/iommu_groups/12/devices/0000:00:1d.2
/sys/kernel/iommu_groups/2/devices/0000:00:0d.0
/sys/kernel/iommu_groups/20/devices/0000:05:00.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/0/devices/0000:00:02.0
/sys/kernel/iommu_groups/19/devices/0000:04:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.6
Following your suggestion, I edited my GRUB configuration as follows. Is that okay?
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream pcie_aspm=off"
GRUB_CMDLINE_LINUX=""