I have a 4 host cluster running PVE 9.1.1.
I wanted to test HA, so I disconnected the network port of one of the nodes with VMs on it. The VMs were all restarted on other hosts. I waited 10 minutes, then plugged the host back in, but the VMs never fell back to the node. In the datacenter HA status, the host showed "idle".
Looking at the logs, I see this:
Code:
# sudo journalctl -u pve-cluster -n 30 --no-pager
May 07 16:31:22 proxmox-01.home.internal pmxcfs[1037]: [status] notice: members: 1/1037
May 07 16:31:22 proxmox-01.home.internal pmxcfs[1037]: [status] notice: all data is up to date
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: members: 1/1037, 2/1015, 3/1015, 4/1015
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: starting data syncronisation
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: cpg_send_message retried 1 times
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [status] notice: node has quorum
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [status] notice: members: 1/1037, 2/1015, 3/1015, 4/1015
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [status] notice: starting data syncronisation
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: received sync request (epoch 1/1037/00000002)
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [status] notice: received sync request (epoch 1/1037/00000002)
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: received all states
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: leader is 2/1015
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: synced members: 2/1015, 3/1015, 4/1015
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: waiting for updates from leader
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: dfsm_deliver_queue: queue length 4
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [status] notice: received all states
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [status] notice: all data is up to date
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [status] notice: dfsm_deliver_queue: queue length 16
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: update complete - trying to commit (got 14 inode updates)
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: all data is up to date
May 07 16:35:39 proxmox-01.home.internal pmxcfs[1037]: [dcdb] notice: dfsm_deliver_sync_queue: queue length 4
May 07 16:45:51 proxmox-01.home.internal pmxcfs[1037]: [ipcs] crit: connection from bad user 1000! - rejected
May 07 16:45:51 proxmox-01.home.internal pmxcfs[1037]: [libqb] error: Error in connection setup (/dev/shm/qb-1037-4149-33-qad5On/qb): Unknown error -1 (-1)
May 07 16:45:51 proxmox-01.home.internal pmxcfs[1037]: [ipcs] crit: connection from bad user 1000! - rejected
May 07 16:45:51 proxmox-01.home.internal pmxcfs[1037]: [libqb] error: Error in connection setup (/dev/shm/qb-1037-4149-33-6oCuYD/qb): Unknown error -1 (-1)
May 07 16:45:51 proxmox-01.home.internal pmxcfs[1037]: [ipcs] crit: connection from bad user 1000! - rejected
May 07 16:45:51 proxmox-01.home.internal pmxcfs[1037]: [libqb] error: Error in connection setup (/dev/shm/qb-1037-4149-33-4brPiC/qb): Unknown error -1 (-1)
May 07 16:45:51 proxmox-01.home.internal pmxcfs[1037]: [ipcs] crit: connection from bad user 1000! - rejected
May 07 16:45:51 proxmox-01.home.internal pmxcfs[1037]: [libqb] error: Error in connection setup (/dev/shm/qb-1037-4149-33-DcC7Fn/qb): Unknown error -1 (-1)
May 07 16:47:43 proxmox-01.home.internal pmxcfs[1037]: [status] notice: received log
Code:
# systemctl status pve-ha-lrm
● pve-ha-lrm.service - PVE Local HA Resource Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/pve-ha-lrm.service; enabled; preset: enabled)
Active: active (running) since Thu 2026-05-07 16:32:41 EDT; 21min ago
Invocation: dc0b44bfaf094f719e6707a66cbd8de5
Main PID: 1509 (pve-ha-lrm)
Tasks: 1 (limit: 17735)
Memory: 114.6M (peak: 133.5M)
CPU: 1.059s
CGroup: /system.slice/pve-ha-lrm.service
└─1509 pve-ha-lrm
May 07 16:32:40 proxmox-01.home.internal systemd[1]: Starting pve-ha-lrm.service - PVE Local HA Resource Manager Daemon...
May 07 16:32:41 proxmox-01.home.internal pve-ha-lrm[1509]: starting server
May 07 16:32:41 proxmox-01.home.internal pve-ha-lrm[1509]: status change startup => wait_for_agent_lock
May 07 16:32:41 proxmox-01.home.internal systemd[1]: Started pve-ha-lrm.service - PVE Local HA Resource Manager Daemon.
The HA status looks fine:
Code:
root@proxmox-01:~# pvecm status
ha-manager status
Cluster information
-------------------
Name: home-01
Config Version: 5
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Wed May 6 19:47:54 2026
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000001
Ring ID: 1.15f
Quorate: Yes
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.50.51 (local)
0x00000002 1 192.168.50.52
0x00000003 1 192.168.50.53
0x00000004 1 192.168.50.54
quorum OK
master proxmox-02 (active, Wed May 6 19:47:47 2026)
lrm proxmox-01 (active, Wed May 6 19:47:50 2026)
lrm proxmox-02 (idle, Wed May 6 19:47:53 2026)
lrm proxmox-03 (idle, Wed May 6 19:47:53 2026)
lrm proxmox-04 (idle, Wed May 6 19:47:51 2026)
service vm:100 (proxmox-01, stopped)
root@proxmox-01:~# qm list
VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
100 template-ubuntu-24-04 stopped 8192 50.00 0
101 kube-01 running 8192 50.00 1678
200 ucs-01 running 4096 0.00 1903
root@proxmox-01:~# ha-manager status
quorum OK
master proxmox-02 (active, Wed May 6 19:50:57 2026)
lrm proxmox-01 (active, Wed May 6 19:51:00 2026)
lrm proxmox-02 (active, Wed May 6 19:50:58 2026)
lrm proxmox-03 (active, Wed May 6 19:50:58 2026)
lrm proxmox-04 (active, Wed May 6 19:51:02 2026)
service vm:100 (proxmox-01, stopped)
service vm:101 (proxmox-01, started)
service vm:102 (proxmox-02, started)
service vm:103 (proxmox-03, started)
service vm:104 (proxmox-04, started)
service vm:200 (proxmox-01, started)
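To pick the idle nodes out of status output like the above, something along these lines might work. This is only a sketch: the function name `idle_lrms` is made up for illustration, and the parsing assumes the `lrm <node> (<state>, <timestamp>)` line layout shown in this post.

```shell
# Read `ha-manager status` output on stdin and print the names of
# nodes whose LRM is reported as idle.
# Assumed line layout (taken from the output in this post):
#   lrm proxmox-02 (idle, Wed May 6 19:47:53 2026)
idle_lrms() {
    awk '$1 == "lrm" && $3 == "(idle," { print $2 }'
}

# Usage on a cluster node:
#   ha-manager status | idle_lrms
```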
1. Why is the host stuck in idle?
2. How can I get the host back to active?
UPDATE: So this is by design. I thought that VMs with failback turned on would migrate back to the stale host when it came back online. Apparently this is not the case; I have to migrate the VMs back manually. I think I can create HA groups and set priorities, and that is where failback is used, but without HA groups it is never used. Once at least one VM was migrated back, the host became active again.
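For anyone who lands here later, the manual step can be sketched as a small shell helper around `ha-manager migrate <sid> <node>`. The function name `migrate_back` and the `DRY_RUN` variable are my own placeholders, not part of Proxmox, and the service IDs in the usage comment are just the ones from my cluster. By default it only prints the commands, so they can be checked before anything is moved.

```shell
# Migrate the given HA services back to a target node.
# Arguments: target node name, then one or more service IDs (e.g. vm:101).
# DRY_RUN is a hypothetical switch for this sketch: with DRY_RUN=1 (the
# default) the commands are only printed; with DRY_RUN=0 they are executed.
migrate_back() {
    target="$1"
    shift
    for sid in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'ha-manager migrate %s %s\n' "$sid" "$target"
        else
            ha-manager migrate "$sid" "$target"
        fi
    done
}

# Usage on a cluster node (prints the commands without running them):
#   migrate_back proxmox-01 vm:101 vm:200
# To actually migrate:
#   DRY_RUN=0 migrate_back proxmox-01 vm:101 vm:200
```

Afterwards, `ha-manager status` should show whether the LRM on the target node has switched from idle to active.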