I have a three-node cluster with Ceph and DRBD installed as shared storage (Ceph is used entirely for the LINSTOR controller; that's a long story and not the topic of this thread).
I created two VMs: "vm:100" on Ceph (as mentioned, hosting the LINSTOR controller) and "vm:110" on DRBD (a Debian-based appliance).
Everything works perfectly: manual migration, Ceph and DRBD synchronization, and manual VM boot-up and shutdown.
I set up HA with the following customized configuration:
Under GUI Datacenter > Options
Code:
Migration Setting: network=172.16.0.1/24, type=secure
HA Settings: shutdown_policy=migrate
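For reference, these two GUI options should map to entries like the following in /etc/pve/datacenter.cfg; this is a sketch based on my settings above, not a verbatim dump of the file:
Code:
ha: shutdown_policy=migrate
migration: type=secure,network=172.16.0.1/24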
Under GUI Datacenter > HA > Groups
Code:
name: primary, restricted: yes, nofailback: no, Nodes: pve1, pve2 (without pve3)
I put the two VMs under HA with "started" state:
Code:
ID: vm:100, State: started, Max. Restart: 3, Max. Relocate: 3, Group: primary
ID: vm:110, State: started, Max. Restart: 3, Max. Relocate: 3, Group: primary
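I configured all of this through the GUI; for completeness, the equivalent ha-manager commands should look roughly like this (a sketch, not what I actually ran):
Code:
ha-manager groupadd primary --nodes pve1,pve2 --restricted 1 --nofailback 0
ha-manager add vm:100 --state started --max_restart 3 --max_relocate 3 --group primary
ha-manager add vm:110 --state started --max_restart 3 --max_relocate 3 --group primary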
Problem encountered:
All VMs resided on pve1. I initiated a shutdown of pve1, and all VMs migrated to pve2 without any problem.
Then I shut down pve2, but it could not reboot and waited forever.
The following message appeared on the console of pve2:
Code:
A stop job is running for PVE Local HA Resource Manager Daemon
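While the stop job hangs, the HA state can be inspected from another node with the standard tools, for example:
Code:
ha-manager status
systemctl status pve-ha-lrm
journalctl -u pve-ha-lrm -b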
From the GUI, I didn't see any shutdown triggered for the VMs; they were still running. I tried a manual shutdown from inside the guest OS, but the HA manager then started the VMs again.
If I click the shutdown button for the VMs in the Proxmox web GUI, the VMs go down and the node can reboot, but the VMs don't start after the reboot because their HA state has changed to "stopped".
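They can be brought back afterwards by setting the HA state to "started" again, e.g.:
Code:
ha-manager set vm:100 --state started
ha-manager set vm:110 --state started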
Initially, I suspected DRBD caused this issue, so I removed vm:110 from HA, but the problem still exists.
I noticed the ifupdown2 issue described on this forum, but I checked and my nodes don't use ifupdown2.
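I verified this with something like the following; only the classic ifupdown package shows up (see the version list below as well):
Code:
dpkg -l | grep -i ifupdown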
Below are my PVE package versions (the same on all 3 nodes):
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.9-pve1
ceph-fuse: 14.2.9-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1