I have a 2-node Proxmox cluster running version 8.4.14, with a Raspberry Pi acting as a qdevice for quorum. Kernel 6.8.12-15-pve.
I've taken pve-2 out of the datacentre to do some hardware work on it; pve-1 had all the VMs running on it prior to pve-2 being powered off. I then noticed my monitoring was having issues monitoring pve-1, but all the VMs on pve-1 were reporting just fine.
After a closer look, I couldn't SSH into the PVE host or log in via the WebUI, but I can log in using the console just fine. Whenever I try to do anything where I believe DNS is involved, the process just hangs, and I can't even kill it using `kill -s 9`. pvestatd is usually the first process reported as hung in dmesg.
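Since `kill -9` does nothing, my assumption is these processes are stuck in uninterruptible sleep (D state) waiting on something in the kernel; a quick way to confirm is something like this (the second part needs sysrq enabled):
Code:
# list processes in uninterruptible sleep (D state) - SIGKILL can't touch these
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'

# ask the kernel to dump the stacks of all blocked tasks into dmesg (needs sysrq enabled)
echo w > /proc/sysrq-trigger
dmesg | tail -n 60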
I've done this before without issues, and I had pve-1 out of action for a few weeks last month because the motherboard had a fault; that has since been fixed, though pve-1 hasn't been reinstalled.
The following commands hang:
`ip address`
`ping google.com`
`lspci` hangs when it gets to `openat(AT_FDCWD, "/sys/bus/pci/devices/0000:03:00.0/config", O_RDONLY) = 3`, which is enp3s0, the migration network NIC. That link is down because it's directly connected to pve-2, which isn't there.
The following commands do work as expected:
`ping 1.1.1.1`
`dig google.com`
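Since `dig` queries the nameserver directly while `ping google.com` resolves through glibc (getaddrinfo and nsswitch), my suspicion is the hang is in that lookup path rather than in DNS itself. `getent` goes through the same glibc path, so it should hang the same way if that's true:
Code:
# resolve via the glibc/NSS path ("hosts: files dns") that ping uses,
# unlike dig, which talks to the nameserver in resolv.conf directly
getent hosts google.com
getent ahosts google.com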
It feels like corosync or ha-manager is doing something really strange to make part of the network stack hang?
While trying to debug this myself, I've restarted pve-1, but after a few minutes it just goes back to this broken state. The HA VMs start and things work, and then suddenly the problem comes back. I can't find anything in dmesg or `journalctl --boot` that points to the cause. pve-1 had been working great for weeks until I took pve-2 out, and I rebooted pve-1 a few times during that two-week period too.
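For reference, these are the kinds of queries I've been using to look for it (standard PVE unit names):
Code:
# kernel hung-task warnings, if any were logged
dmesg | grep -iE "blocked for more than|hung_task"

# warnings and errors from this boot for the services that seem involved
journalctl -b -p warning -u pvestatd -u pve-cluster -u corosync -u pveproxy -u pvedaemon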
`pvecm status`
Code:
Cluster information
-------------------
Name: pve-coventry
Config Version: 3
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Fri Oct 17 23:16:17 2025
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1.395
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 A,V,NMW 172.30.1.10 (local)
0x00000000 1 Qdevice
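For completeness, the qdevice link can be double-checked on both ends (corosync-qdevice runs on the node, corosync-qnetd on the Raspberry Pi):
Code:
# on pve-1: qdevice daemon state and the votequorum view it feeds into
systemctl status corosync-qdevice
corosync-quorumtool -s

# on the Raspberry Pi: the qnetd daemon the qdevice connects to
systemctl status corosync-qnetd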
`ha-manager status`
Code:
quorum OK
master pve-1 (active, Fri Oct 17 23:16:44 2025)
lrm pve-1 (active, Fri Oct 17 23:16:45 2025)
lrm pve-2 (old timestamp - dead?, Fri Oct 17 17:24:41 2025)
service vm:102 (pve-1, started)
service vm:103 (pve-1, started)
service vm:105 (pve-1, started)
service vm:106 (pve-1, started)
service vm:107 (pve-1, started)
service vm:108 (pve-1, started)
service vm:109 (pve-1, started)
service vm:110 (pve-1, started)
service vm:111 (pve-1, started)
service vm:112 (pve-1, started)
service vm:114 (pve-1, started)
`/etc/nsswitch.conf`
Code:
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: files systemd
group: files systemd
shadow: files systemd
gshadow: files systemd
hosts: files dns
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
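Since `hosts: files dns` means /etc/hosts is consulted before DNS, it's also worth ruling out the cluster node names ever falling through to DNS; a quick check (using the node names from this cluster) is:
Code:
# are both node names pinned in /etc/hosts?
grep -E "pve-1|pve-2" /etc/hosts

# resolve them via the "files" source only, bypassing DNS entirely
getent -s files hosts pve-1 pve-2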
`/etc/resolv.conf`
Code:
nameserver 172.30.0.1
This also causes the VMs in the web UI to go to an unknown state, yet HA still says everything is OK.
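My guess, and it's only a guess, is that the "unknown" state in the web UI is pvestatd being wedged rather than anything wrong with the VMs; something like this should show whether it's stuck, and where:
Code:
# is pvestatd alive according to systemd?
systemctl status pvestatd

# attach to its main PID and see which syscall it is blocked in
# (if it's stuck in D state this may just sit there, which is telling in itself)
strace -p "$(systemctl show -p MainPID --value pvestatd)" -e trace=openat,connect,read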

I'd love to get some help on this, as it was all working before, and I'm wondering if an update has broken something here...