Hi all,
I'm after a bit of help with a new cluster. I've managed to solve every issue I've hit so far by Googling, but this one has me a little stumped. I have two devices in the cluster:
vhost-01
- root filesystem installed to a 32 GB USB drive
- 4x 3 TB drives in RAID-Z1: vhost01-pool-01
- 120 GB SSD used for log/cache
vhost-02
- root filesystem installed to a 32 GB USB drive
- 3x 2 TB drives in RAID-Z1: vhost02-pool-01
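Each pool was created locally on its node before the cluster existed. The commands I used were roughly the following (the /dev/sdX names and SSD partition layout are placeholders, not the actual device IDs):

# vhost-01: four 3 TB disks in RAID-Z1, SSD partitions for log and cache
zpool create vhost01-pool-01 raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd
zpool add vhost01-pool-01 log /dev/sde1
zpool add vhost01-pool-01 cache /dev/sde2

# vhost-02: three 2 TB disks in RAID-Z1
zpool create vhost02-pool-01 raidz1 /dev/sda /dev/sdb /dev/sdc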
Both devices have a single connection to my main network, plus a direct device-to-device link for sync/cluster traffic. I initially had a bit of trouble getting the cluster up and running - I believe I entered the commands incorrectly and it tried to establish the cluster over my main network, through a switch that seems to have issues with multicast. I edited corosync.conf with the correct details and the cluster started working.
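For reference, after that fix the nodelist section of /etc/pve/corosync.conf looks roughly like this (the 10.10.10.x addresses stand in for the direct-link IPs, so treat them as placeholders):

nodelist {
  node {
    name: vhost-01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
  node {
    name: vhost-02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
}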
My issue now is that the cluster is unable to live- or offline-migrate VMs between the hosts. The error received is below:
cannot open 'vhost01-pool-01/vm-disks/vm-201-disk-1': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
cannot open 'vhost01-pool-01/vm-disks/vm-201-disk-1': dataset does not exist
command 'zfs recv -F -- vhost01-pool-01/vm-disks/vm-201-disk-1' failed: exit code 1
command 'zfs send -Rpv -- vhost01-pool-01/vm-disks/vm-201-disk-1@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2018-04-11 10:07:24 ERROR: Failed to sync data - command 'set -o pipefail && pvesm export vm-disks:vm-201-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=vhost-02' root@192.168.1.13 -- pvesm import vm-disks:vm-201-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
2018-04-11 10:07:24 aborting phase 1 - cleanup resources
2018-04-11 10:07:24 ERROR: found stale volume copy 'vm-disks:vm-201-disk-1' on node 'vhost-02'
2018-04-11 10:07:24 ERROR: migration aborted (duration 00:00:04): Failed to sync data - command 'set -o pipefail && pvesm export vm-disks:vm-201-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=vhost-02' root@192.168.1.13 -- pvesm import vm-disks:vm-201-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
TASK ERROR: migration aborted
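For completeness, the ZFS storage entry in /etc/pve/storage.cfg looks roughly like this (typed from memory, so the exact options may differ slightly):

zfspool: vm-disks
        pool vhost01-pool-01/vm-disks
        content images,rootdir
        sparse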
Is this in some way connected to the naming of the ZFS pools on the two devices? I created each device with its own pool before joining them into a cluster. There are currently only a couple of VMs hosted on the first host, so I'd rather tear this down and rebuild it properly before I continue, if that's the best thing to do.