Proxmox host hangs/crashes after upgrading from 8.4 to 9.0 due to PCIe passthrough

Humility7999

Aug 5, 2025
Hello,

First, thanks for Proxmox; it's great software and I've been using it for a year or two now in my very simple 3-node homelab cluster. After a couple of hours of fiddling about, I've decided to write this post as I'm going insane.

I decided to upgrade from Proxmox 8.4 to 9.0 today and have upgraded 2 out of my 3 nodes so far. I stopped after seeing one of my nodes behaving strangely (not responsive in the Proxmox GUI, very slow in the terminal, slow responses to local ICMP pings of ~30-200 ms, 100% CPU load due to KVM/corosync, and the host complaining in journalctl about no active links).

I've narrowed it down to the ASM1166 SATA card passed through to the unRAID VM on that node (I also tried passing it to a Windows 11 VM, with the same behavior). Before the upgrade, everything had been working flawlessly for at least a couple of months. Starting the unRAID VM without the card assigned works fine. Below is some output that is hopefully relevant. No other changes were made before or after the upgrade, as I was in the middle of upgrading all three nodes.

Do you have any ideas? Thanks for all your help. Below is some info. I do not know what the IOMMU grouping looked like before the upgrade.
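For reference, the current IOMMU grouping can also be dumped directly from a shell on the host; a quick sketch that should work on a standard install (read-only, nothing is changed):

Code:
# list every IOMMU group and the devices it contains
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done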

1. pvesh get /nodes/pve2/hardware/pci
Code:
┌──────────┬────────┬──────────────┬────────────┬────────┬────────────────────────────────────────────────┬──────┬──────────────────┬
│ class    │ device │ id           │ iommugroup │ vendor │ device_name                                    │ mdev │ subsystem_device │
╞══════════╪════════╪══════════════╪════════════╪════════╪════════════════════════════════════════════════╪══════╪══════════════════╪
│ 0x010601 │ 0xa382 │ 0000:00:17.0 │          4 │ 0x8086 │ 400 Series Chipset Family SATA AHCI Controller │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x010601 │ 0x1166 │ 0000:03:00.0 │          6 │ 0x1b21 │ ASM1166 Serial ATA Controller                  │      │ 0x2116           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x020000 │ 0x0d55 │ 0000:00:1f.6 │          8 │ 0x8086 │ Ethernet Connection (12) I219-V                │      │ 0x8672           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x030000 │ 0x9bc5 │ 0000:00:02.0 │          0 │ 0x8086 │ CometLake-S GT2 [UHD Graphics 630]             │ 1    │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x058000 │ 0xa3a1 │ 0000:00:1f.2 │          8 │ 0x8086 │ Cannon Lake PCH Power Management Controller    │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x060000 │ 0x9b53 │ 0000:00:00.0 │          1 │ 0x8086 │ Comet Lake-S 6c Host Bridge/DRAM Controller    │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x060100 │ 0xa3c8 │ 0000:00:1f.0 │          8 │ 0x8086 │ B460 Chipset LPC/eSPI Controller               │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x060400 │ 0xa3e9 │ 0000:00:1b.0 │          5 │ 0x8086 │                                                │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x060400 │ 0xa392 │ 0000:00:1c.0 │          6 │ 0x8086 │                                                │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x060400 │ 0xa394 │ 0000:00:1c.4 │          6 │ 0x8086 │ Comet Lake PCI Express Root Port #05           │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x060400 │ 0xa398 │ 0000:00:1d.0 │          7 │ 0x8086 │ Comet Lake PCI Express Root Port 9             │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x078000 │ 0xa3ba │ 0000:00:16.0 │          3 │ 0x8086 │ Comet Lake PCH-V HECI Controller               │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x0c0330 │ 0xa3af │ 0000:00:14.0 │          2 │ 0x8086 │ Comet Lake PCH-V USB Controller                │      │ 0x8694           │                  
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┼
│ 0x0c0500 │ 0xa3a3 │ 0000:00:1f.4 │          8 │ 0x8086 │ Comet Lake PCH-V SMBus Host Controller         │      │ 0x8694           │                  
└──────────┴────────┴──────────────┴────────────┴────────┴────────────────────────────────────────────────┴──────┴──────────────────┴

2. lspci -nn
Code:
root@pve2:~# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Comet Lake-S 6c Host Bridge/DRAM Controller [8086:9b53] (rev 05)
00:02.0 VGA compatible controller [0300]: Intel Corporation CometLake-S GT2 [UHD Graphics 630] [8086:9bc5] (rev 05)
00:14.0 USB controller [0c03]: Intel Corporation Comet Lake PCH-V USB Controller [8086:a3af]
00:16.0 Communication controller [0780]: Intel Corporation Comet Lake PCH-V HECI Controller [8086:a3ba]
00:17.0 SATA controller [0106]: Intel Corporation 400 Series Chipset Family SATA AHCI Controller [8086:a382]
00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:a3e9] (rev f0)
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:a392] (rev f0)
00:1c.4 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #05 [8086:a394] (rev f0)
00:1d.0 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port 9 [8086:a398] (rev f0)
00:1f.0 ISA bridge [0601]: Intel Corporation B460 Chipset LPC/eSPI Controller [8086:a3c8]
00:1f.2 Memory controller [0580]: Intel Corporation Cannon Lake PCH Power Management Controller [8086:a3a1]
00:1f.4 SMBus [0c05]: Intel Corporation Comet Lake PCH-V SMBus Host Controller [8086:a3a3]
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (12) I219-V [8086:0d55]
03:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1166 Serial ATA Controller [1b21:1166] (rev 02)

3. unRAID VM config screenshot
 
Here are some more logs from journalctl while attempting to start the VM with the PCIe card passed through. Below, I'm also passing through the iGPU, but that works just fine without the ASM1166. Note that the ASM1166 got assigned the 0000:04:00.0 ID below. I'm trying to mess around with blacklisting, but can't get it to work since I can't blacklist the whole ahci driver, and I suspect this is not the issue anyway (binding just the card to vfio-pci by ID might be the cleaner route; see the sketch after the log).

Code:
Aug 05 23:05:48 pve2 pvestatd[1735]: status update time (5.233 seconds)
Aug 05 23:05:54 pve2 pvedaemon[1796]: <root@pam> starting task UPID:pve2:0000157E:0000A6EE:68927232:qmstart:102:root@pam:
Aug 05 23:05:54 pve2 pvedaemon[5502]: start VM 102: UPID:pve2:0000157E:0000A6EE:68927232:qmstart:102:root@pam:
Aug 05 23:05:54 pve2 kernel: intel_vgpu_mdev 00000000-0000-0000-0000-000000000102: Adding to iommu group 10
Aug 05 23:05:54 pve2 kernel: sd 8:0:0:0: [sdc] Synchronizing SCSI cache
Aug 05 23:05:54 pve2 kernel: ata9.00: Entering standby power mode
Aug 05 23:05:54 pve2 kernel: sd 9:0:0:0: [sdd] Synchronizing SCSI cache
Aug 05 23:05:54 pve2 kernel: ata10.00: Entering standby power mode
Aug 05 23:05:54 pve2 kernel: sd 10:0:0:0: [sdf] Synchronizing SCSI cache
Aug 05 23:05:54 pve2 kernel: ata11.00: Entering standby power mode
Aug 05 23:05:55 pve2 kernel: sd 11:0:0:0: [sdg] Synchronizing SCSI cache
Aug 05 23:05:55 pve2 kernel: ata12.00: Entering standby power mode
Aug 05 23:05:56 pve2 kernel: vfio-pci 0000:04:00.0: resetting
Aug 05 23:05:56 pve2 kernel: vfio-pci 0000:04:00.0: reset done
Aug 05 23:05:56 pve2 systemd[1]: Started 102.scope.
Aug 05 23:05:56 pve2 kernel: tap102i0: entered promiscuous mode
Aug 05 23:05:56 pve2 kernel: vmbr0v20: port 2(fwpr102p0) entered blocking state
Aug 05 23:05:56 pve2 kernel: vmbr0v20: port 2(fwpr102p0) entered disabled state
Aug 05 23:05:56 pve2 kernel: fwpr102p0: entered allmulticast mode
Aug 05 23:05:56 pve2 kernel: fwpr102p0: entered promiscuous mode
Aug 05 23:05:56 pve2 kernel: vmbr0v20: port 2(fwpr102p0) entered blocking state
Aug 05 23:05:56 pve2 kernel: vmbr0v20: port 2(fwpr102p0) entered forwarding state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 1(fwln102i0) entered disabled state
Aug 05 23:05:56 pve2 kernel: fwln102i0: entered allmulticast mode
Aug 05 23:05:56 pve2 kernel: fwln102i0: entered promiscuous mode
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 1(fwln102i0) entered forwarding state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 2(tap102i0) entered blocking state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 2(tap102i0) entered disabled state
Aug 05 23:05:56 pve2 kernel: tap102i0: entered allmulticast mode
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 2(tap102i0) entered blocking state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 2(tap102i0) entered forwarding state
Aug 05 23:05:57 pve2 kernel: vfio-pci 0000:04:00.0: resetting
Aug 05 23:05:57 pve2 kernel: vfio-pci 0000:04:00.0: reset done
Aug 05 23:05:57 pve2 kernel: vfio-pci 0000:04:00.0: resetting
Aug 05 23:05:57 pve2 kernel: vfio-pci 0000:04:00.0: reset done
Aug 05 23:05:58 pve2 pvedaemon[5502]: VM 102 started with PID 5540.
Aug 05 23:05:58 pve2 pvedaemon[1796]: <root@pam> end task UPID:pve2:0000157E:0000A6EE:68927232:qmstart:102:root@pam: OK
Aug 05 23:05:59 pve2 pvestatd[1735]: storage 'unraid-proxmox-data' is not online
Aug 05 23:06:00 pve2 kernel: hrtimer: interrupt took 100297929 ns
Aug 05 23:06:03 pve2 corosync[1593]:   [KNET  ] link: host: 2 link: 0 is down
Aug 05 23:06:03 pve2 corosync[1593]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 05 23:06:03 pve2 corosync[1593]:   [KNET  ] host: host: 2 has no active links
Aug 05 23:06:05 pve2 corosync[1593]:   [KNET  ] rx: host: 2 link: 0 is up
Aug 05 23:06:05 pve2 corosync[1593]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Aug 05 23:06:07 pve2 corosync[1593]:   [TOTEM ] Token has not been received in 2858 ms
Aug 05 23:06:07 pve2 corosync[1593]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 05 23:06:07 pve2 corosync[1593]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 05 23:06:07 pve2 corosync[1593]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 05 23:06:09 pve2 corosync[1593]:   [QUORUM] Sync members[3]: 1 2 3
Aug 05 23:06:09 pve2 corosync[1593]:   [TOTEM ] A new membership (1.66b) was formed. Members
Aug 05 23:06:11 pve2 corosync[1593]:   [QUORUM] Members[3]: 1 2 3
Aug 05 23:06:11 pve2 corosync[1593]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 05 23:06:12 pve2 pve-firewall[1731]: firewall update time (8.924 seconds)
Aug 05 23:06:12 pve2 pvedaemon[1796]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - got timeout
Aug 05 23:06:15 pve2 pvestatd[1735]: got timeout
Aug 05 23:06:20 pve2 pvestatd[1735]: status update time (26.782 seconds)
Aug 05 23:06:22 pve2 pve-firewall[1731]: firewall update time (9.484 seconds)
Aug 05 23:06:31 pve2 kernel: INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 50.124 msecs
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] link: host: 3 link: 0 is down
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] link: host: 2 link: 0 is down
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] host: host: 3 has no active links
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] host: host: 2 has no active links
Aug 05 23:06:33 pve2 corosync[1593]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Aug 05 23:06:34 pve2 corosync[1593]:   [TOTEM ] Token has not been received in 3059 ms
Aug 05 23:06:34 pve2 pvestatd[1735]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 32 retries
Aug 05 23:06:35 pve2 corosync[1593]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 05 23:06:36 pve2 corosync[1593]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
Aug 05 23:06:41 pve2 corosync[1593]:   [TOTEM ] Process pause detected for 2106 ms, flushing membership messages.
Aug 05 23:06:41 pve2 corosync[1593]:   [MAIN  ] Corosync main process was not scheduled (@1754428000973) for 4963.6758 ms (threshold is 2920.0000 ms). Consider token timeout increase.
Aug 05 23:06:42 pve2 corosync[1593]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 05 23:06:42 pve2 pve-firewall[1731]: firewall update time (9.160 seconds)
Aug 05 23:06:43 pve2 corosync[1593]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 05 23:06:43 pve2 corosync[1593]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 05 23:06:43 pve2 corosync[1593]:   [MAIN  ] Corosync main process was not scheduled (@1754428003935) for 2958.0959 ms (threshold is 2920.0000 ms). Consider token timeout increase.
Aug 05 23:06:46 pve2 ceph-mon[1591]: 2025-08-05T23:06:46.843+0200 754f55a036c0 -1 mon.pve2@0(leader) e4 get_health_metrics reporting 2 slow ops, oldest is log(1 entries from seq 129 at 2025-08-05T23:06:09.581682+0200)
Aug 05 23:06:47 pve2 kernel: INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 50.138 msecs
Aug 05 23:06:47 pve2 pvedaemon[1794]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 20 retries
Aug 05 23:06:50 pve2 corosync[1593]:   [QUORUM] Sync members[1]: 1
Aug 05 23:06:50 pve2 corosync[1593]:   [QUORUM] Sync left[2]: 2 3
Aug 05 23:06:50 pve2 corosync[1593]:   [TOTEM ] A new membership (1.66f) was formed. Members left: 2 3
Aug 05 23:06:50 pve2 corosync[1593]:   [TOTEM ] Failed to receive the leave message. failed: 2 3
Aug 05 23:06:50 pve2 corosync[1593]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 05 23:06:50 pve2 corosync[1593]:   [QUORUM] Members[1]: 1
Aug 05 23:06:50 pve2 corosync[1593]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 05 23:06:51 pve2 pmxcfs[1440]: [status] notice: node lost quorum
Aug 05 23:06:51 pve2 pmxcfs[1440]: [dcdb] notice: members: 1/1440
Aug 05 23:06:51 pve2 pmxcfs[1440]: [status] notice: members: 1/1440
Aug 05 23:06:51 pve2 pmxcfs[1440]: [dcdb] crit: received write while not quorate - trigger resync
Aug 05 23:06:51 pve2 pmxcfs[1440]: [dcdb] crit: leaving CPG group
Aug 05 23:06:52 pve2 pve-ha-lrm[1980]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve2/lrm_status.tmp.1980' - Permission denied
Aug 05 23:06:53 pve2 pmxcfs[1440]: [dcdb] notice: start cluster connection
Aug 05 23:06:53 pve2 pmxcfs[1440]: [dcdb] crit: cpg_join failed: CS_ERR_EXIST
Aug 05 23:06:54 pve2 pmxcfs[1440]: [dcdb] crit: can't initialize service
Aug 05 23:06:57 pve2 pve-firewall[1731]: firewall update time (14.297 seconds)
Aug 05 23:06:59 pve2 kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 50.136 msecs
Aug 05 23:06:59 pve2 kernel: perf: interrupt took too long (391816 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
Aug 05 23:07:02 pve2 pvestatd[1735]: got timeout
Aug 05 23:07:04 pve2 pve-firewall[1731]: firewall update time (7.358 seconds)
Aug 05 23:07:07 pve2 corosync[1593]:   [TOTEM ] Process pause detected for 3660 ms, flushing membership messages.
Aug 05 23:07:07 pve2 corosync[1593]:   [MAIN  ] Corosync main process was not scheduled (@1754428026704) for 3660.2207 ms (threshold is 2920.0000 ms). Consider token timeout increase.
Aug 05 23:07:08 pve2 corosync[1593]:   [TOTEM ] Process pause detected for 2206 ms, flushing membership messages.
Aug 05 23:07:10 pve2 pvescheduler[5845]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 05 23:07:10 pve2 pmxcfs[1440]: [dcdb] notice: members: 1/1440
Aug 05 23:07:10 pve2 pmxcfs[1440]: [dcdb] notice: all data is up to date
Aug 05 23:07:10 pve2 corosync[1593]:   [QUORUM] Sync members[1]: 1
Aug 05 23:07:10 pve2 corosync[1593]:   [TOTEM ] A new membership (1.67b) was formed. Members
Aug 05 23:07:10 pve2 corosync[1593]:   [QUORUM] Members[1]: 1
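In case it helps anyone fiddling with the same thing: rather than blacklisting the whole ahci driver (which would also take down the Intel controller the boot disks hang off), it should be possible to bind only the ASM1166 to vfio-pci by its vendor:device ID (1b21:1166 per the lspci output above). A rough sketch, assuming the standard modprobe.d layout; the filename is arbitrary, and this only changes which driver claims the card at boot, so it may not help with the kernel 6.14 hang itself:

Code:
# /etc/modprobe.d/vfio-asm1166.conf  (example filename)
# claim only the ASM1166 for vfio-pci; the Intel AHCI controller is untouched
options vfio-pci ids=1b21:1166
# make sure vfio-pci is considered before ahci can grab the card
softdep ahci pre: vfio-pci

# afterwards, rebuild the initramfs and reboot:
#   update-initramfs -u -k all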
 
I seem to have a similar problem. Passthrough of the ASMedia ASM1064 SATA controller on my Odroid H4 no longer works with the default kernel 6.14 on PVE 9.0, nor with the opt-in kernel 6.14 on PVE 8.4. (Details also in this thread: https://forum.proxmox.com/threads/p...el-6-14-vm-hangs-on-start.169324/#post-789570)

Other PCIe passthrough still works, so it seems to be a problem specifically with the (ASMedia) SATA controllers.
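When comparing the working and non-working kernels, it might also be worth checking which driver actually claims the controller and whether anything shows up around VM start; a quick sketch (the PCI address is a placeholder, substitute your controller's address from lspci):

Code:
uname -r                          # confirm which kernel is actually booted
lspci -nnk -s 01:00.0             # "Kernel driver in use:" should show vfio-pci or ahci
dmesg | grep -iE 'vfio|asm|ahci'  # look for reset/FLR errors around VM start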
 
Same issue on my end.

Also using an ASM1166 and PCIe passthrough.

Issue has also been discussed in this thread: https://forum.proxmox.com/threads/o...le-on-test-no-subscription.164497/post-788269

I had to revert to PVE 8.4 and kernel 6.8 for now.

I hope the Proxmox devs acknowledge this issue and put it in their backlog.

I can wait, as PVE 8.4 + kernel 6.8 works perfectly fine, but it would be nice if this gets investigated at some point.
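If anyone wants to stay on PVE 9 but avoid kernel 6.14 rather than rolling the whole host back, pinning an older installed kernel with proxmox-boot-tool should do it; a sketch (the version string is just an example, use whatever `kernel list` shows on your system):

Code:
proxmox-boot-tool kernel list               # show installed kernels
proxmox-boot-tool kernel pin 6.8.12-13-pve  # example version, pick one from the list above
reboot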
 
I'm currently using PVE 9.0.5 with kernel 6.11 and an ASM1166, and everything works fine. The problem only occurred with kernel 6.14.



Code:
Aug 26 23:50:02 Z790 pvedaemon[1680]: VM 200 started with PID 1726.
Aug 26 23:50:02 Z790 pvedaemon[1535]: <root@pam> end task UPID:Z790:00000690:00000C3E:68ADD7A7:qmstart:200:root@pam: OK
Aug 26 23:50:02 Z790 pvedaemon[1833]: starting vnc proxy UPID:Z790:00000729:00000D4D:68ADD7AA:vncproxy:200:root@pam:
Aug 26 23:50:02 Z790 pvedaemon[1535]: <root@pam> starting task UPID:Z790:00000729:00000D4D:68ADD7AA:vncproxy:200:root@pam:
Aug 26 23:50:07 Z790 qm[1837]: VM 200 qmp command failed - VM 200 qmp command 'set_password' failed - got timeout
Aug 26 23:50:07 Z790 pvedaemon[1833]: Failed to run vncproxy.
Aug 26 23:50:07 Z790 pvedaemon[1535]: <root@pam> end task UPID:Z790:00000729:00000D4D:68ADD7AA:vncproxy:200:root@pam: Failed to run vncproxy.
Aug 26 23:50:11 Z790 pvedaemon[1534]: VM 200 qmp command failed - VM 200 qmp command 'query-proxmox-support' failed - unable to connect to VM 200 qmp socket - timeout after 51 retries
Aug 26 23:50:20 Z790 pvestatd[1507]: VM 200 qmp command failed - VM 200 qmp command 'query-proxmox-support' failed - unable to connect to VM 200 qmp socket - timeout after 51 retries
Aug 26 23:50:21 Z790 pvestatd[1507]: status update time (9.005 seconds)
Aug 26 23:50:30 Z790 pvestatd[1507]: VM 200 qmp command failed - VM 200 qmp command 'query-proxmox-support' failed - unable to connect to VM 200 qmp socket - timeout after 50 retries
Aug 26 23:50:31 Z790 pvestatd[1507]: status update time (9.130 seconds)
Aug 26 23:50:36 Z790 pvedaemon[1535]: VM 200 qmp command failed - VM 200 qmp command 'query-proxmox-support' failed - unable to connect to VM 200 qmp socket - timeout after 51 retries
 
I'm currently using PVE 9.0.5 with kernel 6.11 and an ASM1166, and everything works fine. The problem only occurred with kernel 6.14.
Yes, sadly exactly the same situation for me with kernel 6.14.
 
Hello, I recently upgraded to PVE 9 and the server started freezing randomly. I'm having the same issue with kernels 6.14 and 6.8 while doing GPU passthrough to an Ubuntu VM running an LLM. Even with the VM stopped, I still had a PVE crash. I've spent hours with ChatGPT trying to trace the problem and implement fixes, but no luck.

@MajorP93 mentioned reverting to PVE 8. Did that help? How do you revert?
 
@MajorP93 mentioned reverting to PVE 8. Did that help? How do you revert?
Hello,
you can revert by restoring a backup of Proxmox.


But yeah I am running Proxmox 8 for now with kernel 6.8.

I have no issues but it's a bit frustrating not being able to upgrade.

Kernel 6.14 breaks PCIe passthrough for me.
No matter PVE 8 or PVE9.
 
@MajorP93 What type of backup of PVE were you doing? I have the /etc/pve folder and some other configs. Did you do an entire PVE backup? I'm curious how that would be done.
 
I'm glad I found this thread, because I believe I'm having the exact same issue after upgrading from version 8 to 9. I will try using an older kernel when I get home later. The system hangs are so annoying because sometimes it will crash after 5 minutes, and sometimes it will be fine for like 12 hours.
I have a hard drive passed through to an OpenMediaVault VM, and crashes seem more likely to happen when accessing those shares.
 
After disabling ROM-Bar in PVE 9.0.9 with kernel 6.14, the ASM1166 expansion card works normally. I've only tested it for two days, and there has been no problem so far.
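For anyone wanting to try the same from the CLI, unticking ROM-Bar in the GUI corresponds to rombar=0 on the hostpci entry; roughly (the VM ID and PCI address are just examples taken from earlier in the thread):

Code:
# disable the option ROM for the passed-through ASM1166 on VM 102
# pcie=1 only applies if the VM uses the q35 machine type
qm set 102 -hostpci0 0000:03:00.0,pcie=1,rombar=0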
 
I'm currently using PVE 9.0.5 with kernel 6.11 and an ASM1166, and everything works fine. The problem only occurred with kernel 6.14.
Using kernel 6.11.11, I only started one virtual machine with the ASM1166 passed through, to run qBittorrent. After running for a few days, the host system completely crashed and froze, and the terminal could neither display nor accept any input, leaving me unable to obtain any error information. I have now switched to 6.14.