CT Proxmox 6.1 ... HA (Job for pve-container@100.service failed)

POL CRIOLLO

Feb 4, 2020
Greetings.
I am setting up a three-node cluster with high availability (HA). The cluster itself works very well, and I created a CT on each node for testing.
I created an HA group containing the three nodes, added one of the CTs as an HA resource, and then simulated a failure of the node hosting that CT.
The CT is moved to another node and is initially shown as started, but then the service stops and the CT is turned off.
Here is the log:
################
task started by HA resource agent
Job for pve-container@100.service failed because the control process exited with error code.
See "systemctl status pve-container@100.service" and "journalctl -xe" for details.
TASK ERROR: command 'systemctl start pve-container@100' failed: exit code 1

log
//////////

task started by HA resource agent
2020-02-17 18:25:10 starting migration of CT 100 to node 'node2' (192.168.1.38)
2020-02-17 18:25:10 found local volume 'local-zfs:subvol-100-disk-1' (in current VM config)
cannot open 'rpool/data/subvol-100-disk-1': dataset does not exist
usage:
snapshot [-r] [-o property=value] ... <filesystem|volume>@<snap> ...
For the property list, run: zfs set|get
2020-02-17 18:25:10 ERROR: zfs error: For the delegated permission list, run: zfs allow|unallow
2020-02-17 18:25:10 aborting phase 1 - cleanup resources
2020-02-17 18:25:10 ERROR: found stale volume copy 'local-zfs:subvol-100-disk-1' on node 'node2'
2020-02-17 18:25:10 start final cleanup
2020-02-17 18:25:10 ERROR: migration aborted (duration 00:00:01): zfs error: For the delegated permission list, run: zfs allow|unallow
TASK ERROR: migration aborted

Paul Criollo
 
Hi,

The problem is that you are using local disks, so at failover time the other node has no root filesystem (rootfs) to start the CT from.
For HA you need either shared storage or storage replication (pvesr) enabled for the guest.
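To make the reply's reasoning concrete, here is a minimal sketch, under stated assumptions, of how you might check a CT for volumes that only exist on one node and build the matching replication commands. It assumes the CT config text looks like `/etc/pve/lxc/100.conf` (a `rootfs:` line such as the `local-zfs:subvol-100-disk-1` volume from the log, plus optional `mpN:` lines). `SHARED_STORAGES`, `local_volumes` and `replication_commands` are hypothetical names chosen for illustration, and the set of shared storage IDs is an assumption; on a real cluster it comes from your storage configuration. The commands use `pvesr create-local-job <vmid>-<jobnum> <target> --schedule ...`. The node name `node3` is assumed, since only `node2` appears in the log.

```python
"""Hypothetical helper: find CT volumes that would block HA failover.

Assumptions (not taken from the thread):
- the CT config is the text of /etc/pve/lxc/<vmid>.conf;
- SHARED_STORAGES lists the storage IDs you know are shared by all nodes.
"""
import re

# Storage IDs assumed to be shared between all nodes (hypothetical example).
SHARED_STORAGES = {"ceph-rbd", "nfs-share"}

# Matches "rootfs: <storage>:<volume>,..." and "mpN: <storage>:<volume>,...".
# Bind mounts such as "mp0: /mnt/host,mp=/data" do not match.
VOLUME_LINE = re.compile(r"^(rootfs|mp\d+):\s*([^:,\s]+):([^,\s]+)")


def local_volumes(ct_config: str, shared: set[str]) -> list[str]:
    """Return the volume IDs stored on storage that is not listed as shared."""
    result = []
    for line in ct_config.splitlines():
        match = VOLUME_LINE.match(line.strip())
        if match and match.group(2) not in shared:
            result.append(f"{match.group(2)}:{match.group(3)}")
    return result


def replication_commands(vmid: int, targets: list[str],
                         schedule: str = "*/15") -> list[list[str]]:
    """Build one 'pvesr create-local-job' command per target node.

    Job IDs follow the <vmid>-<jobnum> pattern, e.g. 100-0, 100-1.
    """
    return [
        ["pvesr", "create-local-job", f"{vmid}-{n}", target,
         "--schedule", schedule]
        for n, target in enumerate(targets)
    ]


if __name__ == "__main__":
    config = (
        "arch: amd64\n"
        "hostname: test-ct\n"
        "rootfs: local-zfs:subvol-100-disk-1,size=8G\n"
    )
    print(local_volumes(config, SHARED_STORAGES))
    # ['local-zfs:subvol-100-disk-1']
    for cmd in replication_commands(100, ["node2", "node3"]):
        print(" ".join(cmd))
    # pvesr create-local-job 100-0 node2 --schedule */15
    # pvesr create-local-job 100-1 node3 --schedule */15
```

If `local_volumes` returns anything, the CT cannot fail over on its own. The commands would then be run on the node that currently holds the CT. ZFS replication also needs a ZFS storage with the same ID on the target nodes. Keep in mind that after a failover the CT starts from the last replicated state, so changes made since the last sync are lost.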