Report: ASM1166 (PH516 VER:1.5, firmware 241224-0000-00) + Intel C236 root port (Dell Precision Tower 3620) + Proxmox 9 (kernel 6.17.x) warm-reboot enumeration failure
Hardware
- Adapter: ASM1166 M.2-to-6xSATA (PH516 VER:1.5, all 6 SATA ports populated with drives, legacy boot enabled in BIOS)
- Root port: 00:1b.0 (Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17, rev f1)
- Firmware tested: original 220419-0000-00 and upgraded 241224-0000-00 (identical behavior)
Symptoms
- Cold boot / hard power cycle: ASM1166 enumerates correctly under downstream bus 02 at 8.0 GT/s x2, Power state D0, full BARs and memory window populated, ahci driver loads cleanly, all 6 drives visible.
- Any soft reboot (reboot, shutdown -r now, or the Proxmox web UI Reboot button): 00:1b.0 shows the link speed downgraded to 2.5 GT/s, Power state D3hot and downstream bus 02 empty. lspci -vvv shows a zeroed memory window at offset 0x20, downgraded LNKSTA/LNKCTL bits, PMCSR in D3hot and AER error flags set. dmesg reports “broken device, retraining non-functional downstream link at 2.5GT/s” followed by “retraining failed”.
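The good and bad states described above can also be read from sysfs without parsing lspci output. A minimal sketch, assuming the kernel exposes the standard power_state, current_link_speed and current_link_width attributes for the root port; the function name port_health and its expected-speed argument are illustrative, not part of the original workaround.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original workaround): summarise a
# PCIe port from sysfs. Assumes power_state, current_link_speed and
# current_link_width exist for the port (they do on recent kernels).

# port_health DEV_DIR EXPECTED_SPEED
#   DEV_DIR         sysfs directory of the root port,
#                   e.g. /sys/bus/pci/devices/0000:00:1b.0
#   EXPECTED_SPEED  link speed number seen after a cold boot, e.g. 8.0
# Prints a one-line summary; returns 0 only for the cold-boot pattern:
# D0, expected speed, at least one device enumerated behind the port.
port_health() {
    local dev=$1 expected=$2 ps speed width children d
    ps=$(cat "$dev/power_state" 2>/dev/null || echo unknown)
    # current_link_speed reads like "8.0 GT/s PCIe"; keep only the number.
    speed=$(awk '{print $1}' "$dev/current_link_speed" 2>/dev/null)
    width=$(cat "$dev/current_link_width" 2>/dev/null)
    # Devices behind the port appear as child directories named
    # DOMAIN:BUS:DEV.FN (e.g. 0000:02:00.0); port service entries such as
    # 0000:00:1b.0:pcie002 do not match this pattern.
    children=0
    for d in "$dev"/[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7]; do
        [ -d "$d" ] && children=$((children + 1))
    done
    echo "power=$ps speed=${speed:-?} width=x${width:-?} downstream=$children"
    [ "$ps" = "D0" ] && [ "$speed" = "$expected" ] && [ "$children" -gt 0 ]
}

# Usage on the author's layout (no root needed, the attributes are world-readable):
#   port_health /sys/bus/pci/devices/0000:00:1b.0 8.0
```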
Suspected root cause
The failure is suspected to be a timing race between the ASM1166 silicon and the Intel C236 root port that occurs only on warm reset.
- A direct CLI reboot runs the kernel’s standard PCIe shutdown notifiers in the expected order, leaving the root port in a clean state.
- A Proxmox GUI reboot first stops pve-ha-lrm.service under its “conditional” shutdown policy (which stops/freezes HA services). This appears to shift the exact moment the PCIe notifiers run and leaves the root port locked in D3hot with broken retrain flags.
The following were tested: BIOS settings (C-States disabled, Deep Sleep disabled) and the kernel parameters pcie_aspm=off, ahci.mobile_lpm_policy=0 and libata.force=nolpm. Only pcie_aspm=off made CLI reboots reliable, and it never fixed the GUI path.
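To confirm which of the kernel parameters above are actually in effect on a given boot, the command line can be checked directly. A small hypothetical helper (the name tested_params_active is illustrative); it only reports exact matches of the three parameter strings listed above.

```shell
#!/bin/sh
# Hypothetical helper: report which of the tested kernel parameters are
# present on the kernel command line. Pass a command line as $1 to check
# a string instead of the running kernel's /proc/cmdline.
tested_params_active() {
    local cmdline p
    cmdline=${1:-$(cat /proc/cmdline)}
    for p in pcie_aspm=off ahci.mobile_lpm_policy=0 libata.force=nolpm; do
        # Pad with spaces so only whole parameters match.
        case " $cmdline " in
            *" $p "*) echo "$p: set" ;;
            *)        echo "$p: not set" ;;
        esac
    done
}

# Usage: tested_params_active
```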
Diagnostics performed:
- Exhaustive lspci -vvv -xxx and dmesg comparisons of good (cold) vs bad (warm) states.
- Manual sysfs power/control D0/D3 cycles, setpci writes to Bridge Control (secondary bus reset), PMCSR (50.w), rescan sequences.
- Journal analysis of pve-ha-lrm.service vs systemd-shutdown paths (GUI vs CLI).
- Creation and iterative refinement of a systemd oneshot service that runs Before=pve-ha-lrm.service on shutdown/reboot targets.
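When cycling the root port by hand with setpci, the raw values are easier to compare between the cold and warm states once decoded. A hypothetical sketch that decodes the PowerState field of PMCSR (bits 1:0, where 11b is D3hot) and the Secondary Bus Reset bit (bit 6) of Bridge Control; it assumes the hex strings come from setpci, e.g. setpci -s 00:1b.0 50.w as used on this system.

```shell
#!/bin/sh
# Hypothetical decoders for register values read with setpci (as root), e.g.
#   setpci -s 00:1b.0 50.w              PMCSR at the offset used on this system
#   setpci -s 00:1b.0 BRIDGE_CONTROL    16-bit register at 0x3e

# pmcsr_state HEX: name of the PowerState field (bits 1:0) of PMCSR.
pmcsr_state() {
    case $(( 0x$1 & 0x3 )) in
        0) echo D0 ;;
        1) echo D1 ;;
        2) echo D2 ;;
        3) echo D3hot ;;
    esac
}

# sbr_asserted HEX: succeed if Secondary Bus Reset (bit 6) is set in
# the Bridge Control value.
sbr_asserted() {
    [ $(( 0x$1 & 0x40 )) -ne 0 ]
}

# Usage: pmcsr_state "$(setpci -s 00:1b.0 50.w)"
```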
Workaround
A minimal systemd oneshot service forces a clean root-port state before the HA conditional freeze occurs.
The service file is created at: /etc/systemd/system/pve-pre-ha-pcie-clean.service
Installation steps (run as root; adapt the root-port address 00:1b.0 to your own adapter, no pun intended)
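The address 00:1b.0 is specific to this Dell 3620. One way to find the port directly above your own adapter is to resolve the adapter's sysfs path after a cold boot, while it is still enumerated. A hypothetical sketch, assuming the adapter is an ASM1166 with PCI ID 1b21:1166 (check lspci -nn for yours); the function names are illustrative.

```shell
#!/bin/sh
# Hypothetical helpers: print the PCIe port directly above a device.

# parent_of_path SYSFS_PATH: parent device name of a resolved sysfs path,
# e.g. /sys/devices/pci0000:00/0000:00:1b.0/0000:02:00.0 -> 0000:00:1b.0
parent_of_path() {
    local p=${1%/}
    p=${p%/*}
    echo "${p##*/}"
}

# parent_port BDF: resolve /sys/bus/pci/devices/BDF and print its parent.
parent_port() {
    parent_of_path "$(readlink -f "/sys/bus/pci/devices/$1")"
}

# Usage after a cold boot (assumes the ASM1166 ID 1b21:1166):
#   bdf=$(lspci -D -d 1b21:1166 | awk 'NR==1 {print $1}')
#   parent_port "$bdf"
```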
1. Create the service file (CLI command):
Code:
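cat > /etc/systemd/system/pve-pre-ha-pcie-clean.service << 'EOF'
[Unit]
Description=Minimal ASM1166 pre-HA hook: force clean C236 root port state before pve-ha-lrm conditional shutdown
# No implicit dependencies (in particular no Conflicts=/Before=shutdown.target),
# so the unit can still be started while the system is going down.
DefaultDependencies=no
Before=pve-ha-lrm.service shutdown.target reboot.target
Wants=shutdown.target
[Service]
Type=oneshot
# Stay "active" after the script has exited.
RemainAfterExit=yes
# 00:1b.0 is the root port above the ASM1166 on this system; adapt it to yours.
# Write 0x40 (Secondary Bus Reset, bit 6) to the low byte of Bridge Control (0x3e),
# hold 1 s, write 0x00, wait 2 s, rescan the PCI bus, log a marker to the kernel log.
ExecStart=/bin/sh -c 'setpci -s 00:1b.0 3e.b=40; sleep 1; setpci -s 00:1b.0 3e.b=00; sleep 2; echo 1 > /sys/bus/pci/rescan; echo "Pre-HA PCIe SBR+rescan done (ASM1166 fixed)" > /dev/kmsg'
ExecStop=/bin/true
[Install]
# "systemctl enable" links the unit into these targets' .wants directories.
WantedBy=reboot.target shutdown.target
EOF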
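2. Reload systemd and enable the service. Enabling creates two symlinks, and --now also starts the unit immediately, so the bus reset and rescan run once on the live system as well: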
Code:
systemctl daemon-reload && systemctl enable --now pve-pre-ha-pcie-clean.service
The symlinks:
- /etc/systemd/system/reboot.target.wants/pve-pre-ha-pcie-clean.service
- /etc/systemd/system/shutdown.target.wants/pve-pre-ha-pcie-clean.service
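The ExecStart line writes the full low byte of Bridge Control (40, then 00), which also clears any other bits the firmware had set in that byte, for example SERR# Enable or Parity Error Response. The following is an untested, hypothetical read-modify-write variant that toggles only bit 6 while keeping the same 1 s / 2 s timing and rescan; it was not part of the configuration reported here, and the function names are illustrative.

```shell
#!/bin/sh
# Hypothetical variant of the ExecStart sequence (not what was tested):
# pulse Secondary Bus Reset while preserving the other Bridge Control bits.

# sbr_on HEX / sbr_off HEX: Bridge Control value with bit 6 set / cleared.
sbr_on()  { printf '%04x\n' $(( 0x$1 | 0x40 )); }
sbr_off() { printf '%04x\n' $(( 0x$1 & ~0x40 & 0xffff )); }

# pulse_sbr BDF: assert SBR for 1 s, release it, wait 2 s, then rescan.
# Must run as root; BDF is the root port, e.g. 00:1b.0 on the author's system.
pulse_sbr() {
    local bdf=$1 orig
    orig=$(setpci -s "$bdf" BRIDGE_CONTROL) || return 1
    setpci -s "$bdf" BRIDGE_CONTROL="$(sbr_on "$orig")"
    sleep 1
    setpci -s "$bdf" BRIDGE_CONTROL="$(sbr_off "$orig")"
    sleep 2
    echo 1 > /sys/bus/pci/rescan
}
```

To use it from the unit, the functions would have to live in a script file called by ExecStart, since systemd does not see shell functions.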
Result after first GUI reboot with the service active
- 00:1b.0: LnkSta Speed 8GT/s, Width x2, Power state D0
- 02:00.0: ASM1166 fully enumerated
- ahci driver loaded without controller-reset failures
- No retrain errors in dmesg
The GUI reboot now behaves identically to a cold boot.
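Because the service writes a marker line to /dev/kmsg, its execution and the absence of retrain errors can be confirmed from the journal. A hypothetical sketch (function names illustrative) that reads kernel log text on stdin; checking the previous boot with journalctl -b -1 assumes a persistent journal.

```shell
#!/bin/sh
# Hypothetical helpers for confirming the result from the kernel log.

# marker_seen: succeed if the marker written by the service is present
# in the log text read from stdin.
marker_seen() {
    grep -qF 'Pre-HA PCIe SBR+rescan done'
}

# retrain_errors: print how many lines from stdin mention link retraining
# (both failure messages quoted earlier contain "retrain").
retrain_errors() {
    grep -ci 'retrain' || true
}

# Usage:
#   journalctl -k -b -1 --no-pager | marker_seen && echo "hook ran at last shutdown"
#   journalctl -k -b 0 --no-pager | retrain_errors
```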
To remove the workaround:
Code:
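# Stop the unit and remove the .wants symlinks created by "enable"
systemctl disable --now pve-pre-ha-pcie-clean.service
# Delete the unit file
rm -f /etc/systemd/system/pve-pre-ha-pcie-clean.service
# Safety net in case the symlinks were created by hand (no-op otherwise)
rm -f /etc/systemd/system/reboot.target.wants/pve-pre-ha-pcie-clean.service
rm -f /etc/systemd/system/shutdown.target.wants/pve-pre-ha-pcie-clean.service
# Reload so systemd drops the deleted unit
systemctl daemon-reload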
Although this workaround was developed on my Intel C236 + ASM1166 combination, it may apply to other root port and downstream device combinations that fail warm-reset timing under Proxmox HA-managed shutdowns. No kernel patches, BIOS changes or firmware updates were required.
Disclaimer: I am new to Proxmox and this is my first post. I registered specifically to share this feedback and my experience, in the hope that it helps others the way this forum helped me fix the problem.