I have a three-node cluster with Ceph and DRBD installed as shared storage (Ceph is used entirely for the LINSTOR controller; that's a long story and not the topic of this thread).
I created two VMs: "vm:100" on Ceph (as mentioned, hosting the LINSTOR controller) and "vm:110" on DRBD (a Debian-based appliance).
Everything works perfectly: manual migration, Ceph and DRBD synchronization, and manual VM boot-up and shutdown.
I set up HA with the following customized configuration:
Under GUI Datacenter > Options
Code:
Migration Setting: network=172.16.0.1/24, type=secure
HA Settings: shutdown_policy=migrate
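For reference, these two GUI options should map to entries like the following in /etc/pve/datacenter.cfg; this is a sketch based on my settings above, not a verbatim dump of the file:
Code:
ha: shutdown_policy=migrate
migration: type=secure,network=172.16.0.1/24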
Under GUI Datacenter > HA > Groups
Code:
name: primary, restricted: yes, nofailback: no, Nodes: pve1, pve2 (without pve3)
I put the two VMs under HA with "started" state:
Code:
ID: vm:100, State: started, Max. Restart: 3, Max. Relocate: 3, Group: primary
ID: vm:110, State: started, Max. Restart: 3, Max. Relocate: 3, Group: primary
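I configured all of this through the GUI; for completeness, the equivalent ha-manager commands should look roughly like this (a sketch, not what I actually ran):
Code:
ha-manager groupadd primary --nodes pve1,pve2 --restricted 1 --nofailback 0
ha-manager add vm:100 --state started --max_restart 3 --max_relocate 3 --group primary
ha-manager add vm:110 --state started --max_restart 3 --max_relocate 3 --group primary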
Problem encountered:
All VMs resided on pve1. I initiated a shutdown of pve1, and all VMs migrated to pve2 without any problem.
Then I shut down pve2, but it could not reboot and waited forever.
The following message appeared on the console of pve2:
Code:
A stop job is running for PVE Local HA Resource Manager Daemon
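While the stop job hangs, the HA state can be inspected from another node with the standard tools, for example:
Code:
ha-manager status
systemctl status pve-ha-lrm
journalctl -u pve-ha-lrm -b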
From the GUI, I didn't see any shutdown triggered for the VMs; they were still running. I tried a manual shutdown from inside the guest OS, but the HA manager then started the VMs again.
If I click the shutdown button for the VMs in the Proxmox web GUI, the VMs go down and the node can reboot, but the VMs don't start after the reboot because their HA state has changed to "stopped".
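They can be brought back afterwards by setting the HA state to "started" again, e.g.:
Code:
ha-manager set vm:100 --state started
ha-manager set vm:110 --state started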
Initially, I suspected DRBD caused this issue, so I removed vm:110 from HA, but the problem still exists.
I noticed the ifupdown2 issue described on this forum, but I checked and my nodes don't use ifupdown2.
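I verified this with something like the following; only the classic ifupdown package shows up (see the version list below as well):
Code:
dpkg -l | grep -i ifupdown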
Below are my PVE package versions (the same on all 3 nodes):
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.9-pve1
ceph-fuse: 14.2.9-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1