Proxmox host hangs/crashes after upgrading from 8.4 to 9.0 due to PCIe passthrough

Humility7999

Aug 5, 2025
Hello,

First, thanks for Proxmox, it's great software and I've been using it for a year or two now in my very simple three-node homelab cluster. After a couple of hours of fiddling about, I've decided to write this post as I'm going insane.

I decided to upgrade from Proxmox 8.4 to 9.0 today and have upgraded 2 of my 3 nodes so far. I stopped after seeing one of the nodes behaving strangely: unresponsive in the Proxmox GUI, very slow in the terminal, slow responses to local ICMP pings (~30-200 ms), 100% CPU load from KVM/corosync, and the host complaining in journalctl about having no active links.

I've narrowed it down to the passthrough of an ASM1166 SATA card to the unRAID VM on that node (passing it to a Windows 11 VM produces the same behavior). Before the upgrade, everything had been working flawlessly for at least a couple of months. Starting the unRAID VM without the card assigned works fine (see the snippet below for how I detach it). Below is some output that is hopefully relevant. No other changes were made before or after the upgrade, as I was in the middle of upgrading all three nodes.
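(For the "without the card" test I simply detach the device before starting the VM; hostpci0 is the slot I use for the card, adjust to your own config:)

Code:
qm set 102 --delete hostpci0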

Do you have any ideas? Thanks for all your help. I do not know what the IOMMU grouping looked like before the upgrade.
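(For what it's worth, the current grouping can be enumerated straight from sysfs; this should match the iommugroup column in the pvesh output below.)

Code:
# List all IOMMU groups and the devices in each (run on the host)
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group $(basename "$g"):"
  for d in "$g"/devices/*; do
    lspci -nns "$(basename "$d")"
  done
done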

1. pvesh get /nodes/pve2/hardware/pci
Code:
┌──────────┬────────┬──────────────┬────────────┬────────┬────────────────────────────────────────────────┬──────┬──────────────────┐
│ class    │ device │ id           │ iommugroup │ vendor │ device_name                                    │ mdev │ subsystem_device │
╞══════════╪════════╪══════════════╪════════════╪════════╪════════════════════════════════════════════════╪══════╪══════════════════╡
│ 0x010601 │ 0xa382 │ 0000:00:17.0 │          4 │ 0x8086 │ 400 Series Chipset Family SATA AHCI Controller │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x010601 │ 0x1166 │ 0000:03:00.0 │          6 │ 0x1b21 │ ASM1166 Serial ATA Controller                  │      │ 0x2116           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x020000 │ 0x0d55 │ 0000:00:1f.6 │          8 │ 0x8086 │ Ethernet Connection (12) I219-V                │      │ 0x8672           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x030000 │ 0x9bc5 │ 0000:00:02.0 │          0 │ 0x8086 │ CometLake-S GT2 [UHD Graphics 630]             │ 1    │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x058000 │ 0xa3a1 │ 0000:00:1f.2 │          8 │ 0x8086 │ Cannon Lake PCH Power Management Controller    │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x060000 │ 0x9b53 │ 0000:00:00.0 │          1 │ 0x8086 │ Comet Lake-S 6c Host Bridge/DRAM Controller    │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x060100 │ 0xa3c8 │ 0000:00:1f.0 │          8 │ 0x8086 │ B460 Chipset LPC/eSPI Controller               │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x060400 │ 0xa3e9 │ 0000:00:1b.0 │          5 │ 0x8086 │                                                │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x060400 │ 0xa392 │ 0000:00:1c.0 │          6 │ 0x8086 │                                                │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x060400 │ 0xa394 │ 0000:00:1c.4 │          6 │ 0x8086 │ Comet Lake PCI Express Root Port #05           │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x060400 │ 0xa398 │ 0000:00:1d.0 │          7 │ 0x8086 │ Comet Lake PCI Express Root Port 9             │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x078000 │ 0xa3ba │ 0000:00:16.0 │          3 │ 0x8086 │ Comet Lake PCH-V HECI Controller               │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x0c0330 │ 0xa3af │ 0000:00:14.0 │          2 │ 0x8086 │ Comet Lake PCH-V USB Controller                │      │ 0x8694           │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────────────┼──────┼──────────────────┤
│ 0x0c0500 │ 0xa3a3 │ 0000:00:1f.4 │          8 │ 0x8086 │ Comet Lake PCH-V SMBus Host Controller         │      │ 0x8694           │
└──────────┴────────┴──────────────┴────────────┴────────┴────────────────────────────────────────────────┴──────┴──────────────────┘

2. lspci -nn
Code:
root@pve2:~# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Comet Lake-S 6c Host Bridge/DRAM Controller [8086:9b53] (rev 05)
00:02.0 VGA compatible controller [0300]: Intel Corporation CometLake-S GT2 [UHD Graphics 630] [8086:9bc5] (rev 05)
00:14.0 USB controller [0c03]: Intel Corporation Comet Lake PCH-V USB Controller [8086:a3af]
00:16.0 Communication controller [0780]: Intel Corporation Comet Lake PCH-V HECI Controller [8086:a3ba]
00:17.0 SATA controller [0106]: Intel Corporation 400 Series Chipset Family SATA AHCI Controller [8086:a382]
00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:a3e9] (rev f0)
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:a392] (rev f0)
00:1c.4 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #05 [8086:a394] (rev f0)
00:1d.0 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port 9 [8086:a398] (rev f0)
00:1f.0 ISA bridge [0601]: Intel Corporation B460 Chipset LPC/eSPI Controller [8086:a3c8]
00:1f.2 Memory controller [0580]: Intel Corporation Cannon Lake PCH Power Management Controller [8086:a3a1]
00:1f.4 SMBus [0c05]: Intel Corporation Comet Lake PCH-V SMBus Host Controller [8086:a3a3]
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (12) I219-V [8086:0d55]
03:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1166 Serial ATA Controller [1b21:1166] (rev 02)
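To double-check which kernel driver is bound to the card at any given moment (ahci normally, vfio-pci once it has been claimed for passthrough), this can be used:

Code:
root@pve2:~# lspci -nnk -s 03:00.0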

3. unRAID VM config screenshot
(screenshot attached)
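In case the attachment does not load: the relevant passthrough line in the VM config looks roughly like this (a sketch rather than a verbatim copy of my config; the exact flags may differ):

Code:
# /etc/pve/qemu-server/102.conf (excerpt, illustrative)
hostpci0: 0000:03:00.0,pcie=1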
 
Here are some more logs from journalctl, captured while attempting to start the VM with the PCIe card passed through. I'm also passing through the iGPU below, but that works just fine without the ASM1166. Note that in these logs the ASM1166 was assigned the ID 0000:04:00.0. I've been trying to mess around with blacklisting, but can't get it to work, since I can't blacklist the whole ahci driver (the host's own Intel SATA controller uses it too), and I suspect this is not the issue anyway (see the vfio-pci sketch after the log).

Code:
Aug 05 23:05:48 pve2 pvestatd[1735]: status update time (5.233 seconds)
Aug 05 23:05:54 pve2 pvedaemon[1796]: <root@pam> starting task UPID:pve2:0000157E:0000A6EE:68927232:qmstart:102:root@pam:
Aug 05 23:05:54 pve2 pvedaemon[5502]: start VM 102: UPID:pve2:0000157E:0000A6EE:68927232:qmstart:102:root@pam:
Aug 05 23:05:54 pve2 kernel: intel_vgpu_mdev 00000000-0000-0000-0000-000000000102: Adding to iommu group 10
Aug 05 23:05:54 pve2 kernel: sd 8:0:0:0: [sdc] Synchronizing SCSI cache
Aug 05 23:05:54 pve2 kernel: ata9.00: Entering standby power mode
Aug 05 23:05:54 pve2 kernel: sd 9:0:0:0: [sdd] Synchronizing SCSI cache
Aug 05 23:05:54 pve2 kernel: ata10.00: Entering standby power mode
Aug 05 23:05:54 pve2 kernel: sd 10:0:0:0: [sdf] Synchronizing SCSI cache
Aug 05 23:05:54 pve2 kernel: ata11.00: Entering standby power mode
Aug 05 23:05:55 pve2 kernel: sd 11:0:0:0: [sdg] Synchronizing SCSI cache
Aug 05 23:05:55 pve2 kernel: ata12.00: Entering standby power mode
Aug 05 23:05:56 pve2 kernel: vfio-pci 0000:04:00.0: resetting
Aug 05 23:05:56 pve2 kernel: vfio-pci 0000:04:00.0: reset done
Aug 05 23:05:56 pve2 systemd[1]: Started 102.scope.
Aug 05 23:05:56 pve2 kernel: tap102i0: entered promiscuous mode
Aug 05 23:05:56 pve2 kernel: vmbr0v20: port 2(fwpr102p0) entered blocking state
Aug 05 23:05:56 pve2 kernel: vmbr0v20: port 2(fwpr102p0) entered disabled state
Aug 05 23:05:56 pve2 kernel: fwpr102p0: entered allmulticast mode
Aug 05 23:05:56 pve2 kernel: fwpr102p0: entered promiscuous mode
Aug 05 23:05:56 pve2 kernel: vmbr0v20: port 2(fwpr102p0) entered blocking state
Aug 05 23:05:56 pve2 kernel: vmbr0v20: port 2(fwpr102p0) entered forwarding state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 1(fwln102i0) entered disabled state
Aug 05 23:05:56 pve2 kernel: fwln102i0: entered allmulticast mode
Aug 05 23:05:56 pve2 kernel: fwln102i0: entered promiscuous mode
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 1(fwln102i0) entered forwarding state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 2(tap102i0) entered blocking state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 2(tap102i0) entered disabled state
Aug 05 23:05:56 pve2 kernel: tap102i0: entered allmulticast mode
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 2(tap102i0) entered blocking state
Aug 05 23:05:56 pve2 kernel: fwbr102i0: port 2(tap102i0) entered forwarding state
Aug 05 23:05:57 pve2 kernel: vfio-pci 0000:04:00.0: resetting
Aug 05 23:05:57 pve2 kernel: vfio-pci 0000:04:00.0: reset done
Aug 05 23:05:57 pve2 kernel: vfio-pci 0000:04:00.0: resetting
Aug 05 23:05:57 pve2 kernel: vfio-pci 0000:04:00.0: reset done
Aug 05 23:05:58 pve2 pvedaemon[5502]: VM 102 started with PID 5540.
Aug 05 23:05:58 pve2 pvedaemon[1796]: <root@pam> end task UPID:pve2:0000157E:0000A6EE:68927232:qmstart:102:root@pam: OK
Aug 05 23:05:59 pve2 pvestatd[1735]: storage 'unraid-proxmox-data' is not online
Aug 05 23:06:00 pve2 kernel: hrtimer: interrupt took 100297929 ns
Aug 05 23:06:03 pve2 corosync[1593]:   [KNET  ] link: host: 2 link: 0 is down
Aug 05 23:06:03 pve2 corosync[1593]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 05 23:06:03 pve2 corosync[1593]:   [KNET  ] host: host: 2 has no active links
Aug 05 23:06:05 pve2 corosync[1593]:   [KNET  ] rx: host: 2 link: 0 is up
Aug 05 23:06:05 pve2 corosync[1593]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Aug 05 23:06:07 pve2 corosync[1593]:   [TOTEM ] Token has not been received in 2858 ms
Aug 05 23:06:07 pve2 corosync[1593]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 05 23:06:07 pve2 corosync[1593]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 05 23:06:07 pve2 corosync[1593]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 05 23:06:09 pve2 corosync[1593]:   [QUORUM] Sync members[3]: 1 2 3
Aug 05 23:06:09 pve2 corosync[1593]:   [TOTEM ] A new membership (1.66b) was formed. Members
Aug 05 23:06:11 pve2 corosync[1593]:   [QUORUM] Members[3]: 1 2 3
Aug 05 23:06:11 pve2 corosync[1593]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 05 23:06:12 pve2 pve-firewall[1731]: firewall update time (8.924 seconds)
Aug 05 23:06:12 pve2 pvedaemon[1796]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - got timeout
Aug 05 23:06:15 pve2 pvestatd[1735]: got timeout
Aug 05 23:06:20 pve2 pvestatd[1735]: status update time (26.782 seconds)
Aug 05 23:06:22 pve2 pve-firewall[1731]: firewall update time (9.484 seconds)
Aug 05 23:06:31 pve2 kernel: INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 50.124 msecs
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] link: host: 3 link: 0 is down
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] link: host: 2 link: 0 is down
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] host: host: 3 has no active links
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 05 23:06:31 pve2 corosync[1593]:   [KNET  ] host: host: 2 has no active links
Aug 05 23:06:33 pve2 corosync[1593]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Aug 05 23:06:34 pve2 corosync[1593]:   [TOTEM ] Token has not been received in 3059 ms
Aug 05 23:06:34 pve2 pvestatd[1735]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 32 retries
Aug 05 23:06:35 pve2 corosync[1593]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 05 23:06:36 pve2 corosync[1593]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
Aug 05 23:06:41 pve2 corosync[1593]:   [TOTEM ] Process pause detected for 2106 ms, flushing membership messages.
Aug 05 23:06:41 pve2 corosync[1593]:   [MAIN  ] Corosync main process was not scheduled (@1754428000973) for 4963.6758 ms (threshold is 2920.0000 ms). Consider token timeout increase.
Aug 05 23:06:42 pve2 corosync[1593]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 05 23:06:42 pve2 pve-firewall[1731]: firewall update time (9.160 seconds)
Aug 05 23:06:43 pve2 corosync[1593]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 05 23:06:43 pve2 corosync[1593]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 05 23:06:43 pve2 corosync[1593]:   [MAIN  ] Corosync main process was not scheduled (@1754428003935) for 2958.0959 ms (threshold is 2920.0000 ms). Consider token timeout increase.
Aug 05 23:06:46 pve2 ceph-mon[1591]: 2025-08-05T23:06:46.843+0200 754f55a036c0 -1 mon.pve2@0(leader) e4 get_health_metrics reporting 2 slow ops, oldest is log(1 entries from seq 129 at 2025-08-05T23:06:09.581682+0200)
Aug 05 23:06:47 pve2 kernel: INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 50.138 msecs
Aug 05 23:06:47 pve2 pvedaemon[1794]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 20 retries
Aug 05 23:06:50 pve2 corosync[1593]:   [QUORUM] Sync members[1]: 1
Aug 05 23:06:50 pve2 corosync[1593]:   [QUORUM] Sync left[2]: 2 3
Aug 05 23:06:50 pve2 corosync[1593]:   [TOTEM ] A new membership (1.66f) was formed. Members left: 2 3
Aug 05 23:06:50 pve2 corosync[1593]:   [TOTEM ] Failed to receive the leave message. failed: 2 3
Aug 05 23:06:50 pve2 corosync[1593]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 05 23:06:50 pve2 corosync[1593]:   [QUORUM] Members[1]: 1
Aug 05 23:06:50 pve2 corosync[1593]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 05 23:06:51 pve2 pmxcfs[1440]: [status] notice: node lost quorum
Aug 05 23:06:51 pve2 pmxcfs[1440]: [dcdb] notice: members: 1/1440
Aug 05 23:06:51 pve2 pmxcfs[1440]: [status] notice: members: 1/1440
Aug 05 23:06:51 pve2 pmxcfs[1440]: [dcdb] crit: received write while not quorate - trigger resync
Aug 05 23:06:51 pve2 pmxcfs[1440]: [dcdb] crit: leaving CPG group
Aug 05 23:06:52 pve2 pve-ha-lrm[1980]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve2/lrm_status.tmp.1980' - Permission denied
Aug 05 23:06:53 pve2 pmxcfs[1440]: [dcdb] notice: start cluster connection
Aug 05 23:06:53 pve2 pmxcfs[1440]: [dcdb] crit: cpg_join failed: CS_ERR_EXIST
Aug 05 23:06:54 pve2 pmxcfs[1440]: [dcdb] crit: can't initialize service
Aug 05 23:06:57 pve2 pve-firewall[1731]: firewall update time (14.297 seconds)
Aug 05 23:06:59 pve2 kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 50.136 msecs
Aug 05 23:06:59 pve2 kernel: perf: interrupt took too long (391816 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
Aug 05 23:07:02 pve2 pvestatd[1735]: got timeout
Aug 05 23:07:04 pve2 pve-firewall[1731]: firewall update time (7.358 seconds)
Aug 05 23:07:07 pve2 corosync[1593]:   [TOTEM ] Process pause detected for 3660 ms, flushing membership messages.
Aug 05 23:07:07 pve2 corosync[1593]:   [MAIN  ] Corosync main process was not scheduled (@1754428026704) for 3660.2207 ms (threshold is 2920.0000 ms). Consider token timeout increase.
Aug 05 23:07:08 pve2 corosync[1593]:   [TOTEM ] Process pause detected for 2206 ms, flushing membership messages.
Aug 05 23:07:10 pve2 pvescheduler[5845]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 05 23:07:10 pve2 pmxcfs[1440]: [dcdb] notice: members: 1/1440
Aug 05 23:07:10 pve2 pmxcfs[1440]: [dcdb] notice: all data is up to date
Aug 05 23:07:10 pve2 corosync[1593]:   [QUORUM] Sync members[1]: 1
Aug 05 23:07:10 pve2 corosync[1593]:   [TOTEM ] A new membership (1.67b) was formed. Members
Aug 05 23:07:10 pve2 corosync[1593]:   [QUORUM] Members[1]: 1
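As an alternative to blacklisting ahci outright, what I've been looking at (a sketch along the lines of the PVE passthrough docs; 1b21:1166 is the vendor:device pair of the ASM1166 from the lspci output above) is handing just that one device to vfio-pci at boot:

Code:
# /etc/modprobe.d/vfio.conf
# Claim only the ASM1166 for vfio-pci instead of blacklisting all of ahci
options vfio-pci ids=1b21:1166
# Make sure vfio-pci loads before ahci so it binds the card first
softdep ahci pre: vfio-pci

followed by update-initramfs -u -k all and a reboot. Whether that helps with the hang itself is another question, since the logs above show vfio-pci resetting the card successfully before things go sideways.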
 