Relocation error - ZFS HA

yena

Hello,
I'm testing HA on a 4-node cluster running the latest Proxmox 5, with local ZFS storage + replication.

When one node goes down, the VPS is migrated correctly to the second node, but when the failed node comes back online I get this error on failback:

task started by HA resource agent
2017-11-14 18:06:40 starting migration of CT 100 to node 'nodo1' (192.168.100.11)
2017-11-14 18:06:40 found local volume 'SSDstorage:subvol-100-disk-1' (in current VM config)
full send of SSDstorage/subvol-100-disk-1@rep_TestBackup_2017-11-14_17:27:52 estimated size is 547M
send from @rep_TestBackup_2017-11-14_17:27:52 to SSDstorage/subvol-100-disk-1@rep_TestBackup_2017-11-14_17:28:01 estimated size is 66.6K
send from @rep_TestBackup_2017-11-14_17:28:01 to SSDstorage/subvol-100-disk-1@__replicate_100-0_1510678800__ estimated size is 1.19M
send from @__replicate_100-0_1510678800__ to SSDstorage/subvol-100-disk-1@__migration__ estimated size is 1.19M
total estimated size is 549M
TIME SENT SNAPSHOT
SSDstorage/subvol-100-disk-1 name SSDstorage/subvol-100-disk-1 -
volume 'SSDstorage/subvol-100-disk-1' already exists
command 'zfs send -Rpv -- SSDstorage/subvol-100-disk-1@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2017-11-14 18:06:40 ERROR: command 'set -o pipefail && pvesm export SSDstorage:subvol-100-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=nodo1' root@192.168.100.11 -- pvesm import SSDstorage:subvol-100-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
2017-11-14 18:06:40 aborting phase 1 - cleanup resources
2017-11-14 18:06:40 ERROR: found stale volume copy 'SSDstorage:subvol-100-disk-1' on node 'nodo1'
2017-11-14 18:06:40 start final cleanup
2017-11-14 18:06:40 ERROR: migration aborted (duration 00:00:00): command 'set -o pipefail && pvesm export SSDstorage:subvol-100-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=nodo1' root@192.168.100.11 -- pvesm import SSDstorage:subvol-100-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
TASK ERROR: migration aborted

----------------------------------------------

- Workaround: delete the snapshot and the VM image on the failed node, then migrate the VM back manually.

e.g.:

root@nodo2:~# zfs list -t all
NAME USED AVAIL REFER MOUNTPOINT
SSDstorage 599M 3.59T 30.6K /SSDstorage
SSDstorage/subvol-101-disk-1 598M 7.42G 597M /SSDstorage/subvol-101-disk-1
SSDstorage/subvol-101-disk-1@__replicate_101-0_1510683082__ 1.31M - 597M -
rpool 9.85G 221G 96K /rpool
rpool/ROOT 1.34G 221G 96K /rpool/ROOT
rpool/ROOT/pve-1 1.34G 221G 1.34G /
rpool/data 96K 221G 96K /rpool/data
rpool/swap 8.50G 229G 64K -
root@nodo2:~# zfs destroy -r SSDstorage/subvol-101-disk-1
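After cleaning up, the guest can be moved back by hand. A rough sketch (the CT ID and node name are just the ones from my test, adjust for your setup):

# request the migration back through the HA stack
ha-manager migrate ct:101 nodo2

# or, for a guest that is not HA-managed
pct migrate 101 nodo2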



- Live migration works well after deleting the old volume on the failed server.
- Live migration works fine in the normal state, without a node failure.


Any ideas?
Thanks!


pveversion -V
proxmox-ve: 5.1-26 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.10.15-1-pve: 4.10.15-15
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
 

Sadly, for some reason this is not implemented yet. I hope it will be sorted out in Proxmox 5.2, but I have seen no activity on this topic.
 
I have noticed another "bug":
when the master node goes offline and the VPS is moved to the slave node, I can see the snapshot in the web interface, but there is no ZFS snapshot on the filesystem:

After relocation:

root@storage:~# zfs list -t all
NAME USED AVAIL REFER MOUNTPOINT
SSDstorage 1.33G 15.3T 186K /SSDstorage
SSDstorage/BACKUP 140K 15.3T 140K /SSDstorage/BACKUP
SSDstorage/VM 140K 15.3T 140K /SSDstorage/VM
SSDstorage/subvol-100-disk-1 680M 7.34G 678M /SSDstorage/subvol-100-disk-1
SSDstorage/subvol-100-disk-1@__replicate_100-0_1510730101__ 2.40M - 678M -
SSDstorage/subvol-101-disk-1 678M 7.34G 678M /SSDstorage/subvol-101-disk-1
SSDstorage/subvol-101-disk-1@__replicate_101-0_1510731000__ 0B - 678M -
rpool 10.9G 220G 96K /rpool
rpool/ROOT 2.39G 220G 96K /rpool/ROOT
rpool/ROOT/pve-1 2.39G 220G 2.39G /
rpool/data 96K 220G 96K /rpool/data
rpool/swap 8.50G 228G 64K -
-------------------------------------------------------------------------------------------------

If I try to delete the snapshot:

Task viewer: CT 100 - Delete Snapshot

TASK ERROR: zfs error: could not find any snapshots to destroy; check snapshot names.


Task viewer: CT 100 - Shutdown

task started by HA resource agent
TASK ERROR: CT is locked (snapshot-delete)



root@storage:~# pct unlock 100
root@storage:~# pct stop 100
Requesting HA stop for CT 100
service 'ct:100' in error state, must be disabled and fixed first
command 'ha-manager set ct:100 --state stopped' failed: exit code 255
---------------------------------------------------------------------
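Based on the hint in that last message, something like this should clear the error state once the CT is sorted out (untested sketch, the ID is from my setup):

# move the service to 'disabled' to clear the error state
ha-manager set ct:100 --state disabled

# once the underlying problem is fixed, let HA manage it again
ha-manager set ct:100 --state started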

 

This is by design of the storage replication: it syncs once per interval, and if you take a snapshot right after a sync, only the next iteration will pick it up. Depending on the time window this can take quite some time, hence you need to trigger the sync manually. The vmid.conf is updated on every change (e.g. a snapshot) and distributed to all PVE hosts in the cluster, which is why you can see the snapshot in the GUI even though it is not there yet.
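You can also kick off a replication run by hand on the CLI, for example (the job ID 100-0 is the one visible in the replication snapshot names above):

# show all replication jobs and their last sync
pvesr status

# schedule job 100-0 to run as soon as possible instead of waiting for the interval
pvesr schedule-now 100-0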
 
So I have to set 'nofailback' on the HA group and, when a node fails, use this procedure of deleting all volumes on the failed node.
Have you thought about an automatic rollback that handles this?

Thanks
 
'nofailback' and 'restricted' should be set.
https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_groups
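For example (group and node names are only placeholders):

# restricted group without automatic failback
ha-manager groupadd mygroup --nodes nodo1,nodo2 --restricted 1 --nofailback 1

# put the existing HA resource into that group
ha-manager set ct:100 --group mygroup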

There are implications for the replication on recovery, as there is always some data loss involved due to the sync time window.

Take for example a replication that runs on a daily basis: due to a system failure the node reboots before the daily replication takes place. HA starts the VM on another node, and two minutes later the original node is back in business. With an automatic approach, a day's worth of data would be deleted without any possibility of intervention. And replication can be done to multiple nodes, which makes it even more complex.

In general, automating this should not be taken lightly and might cause more trouble than good, as this is not shared storage. https://pve.proxmox.com/pve-docs/chapter-pvesr.html
 
You are right, I have set both options, but I think an option in the web panel to resync the recovered node from the last valid replicated snapshot would be very useful.
This would avoid a full resync of all VMs.

What do you think about it?

Thanks!
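What I have in mind is roughly the manual equivalent of this on the recovered node, instead of a full zfs destroy (the snapshot name is the last replication snapshot from the first log; I have not verified that the replication then really continues incrementally):

# throw away everything newer than the last replication snapshot both nodes still share
zfs rollback -r SSDstorage/subvol-100-disk-1@__replicate_100-0_1510678800__

An incremental send from that snapshot should then be enough, instead of the full ~549M stream.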
 
The snapshots in the GUI are the ones triggered by the user; the snapshots created by the replication (zfs send/receive, as with pve-zsync) are not shown in the GUI. The replication direction changes on migration, but not on an HA failure.

A complete resync is always good to avoid; as this is a relatively new feature, there is surely more to come.
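To see which snapshots really exist on a node, check directly on the CLI, for example:

# user snapshots keep their own name, replication snapshots are named __replicate_<job>_<timestamp>__
zfs list -r -t snapshot -o name,creation SSDstorage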
 
