Assistance with New Cluster

UnstableAlpha

New Member
Apr 11, 2018
Hi all,

I'm after a bit of help with a new cluster. I've managed to solve all the issues I've had so far by Googling, but this one has me a little stumped. I have 2x devices in a cluster:

vhost-01
root filesystem installed to 32GB USB drive
4x 3TB drives in RAID-Z1: vhost01-pool-01
120GB SSD used for log/cache

vhost-02
root filesystem installed to 32GB USB drive
3x 2TB drives in RAID-Z1: vhost02-pool-01

Both devices have a single connection to my main network, plus a direct device-to-device link for sync/cluster traffic. I initially had a bit of an issue getting the cluster up and running: I believe I entered the commands incorrectly and it tried to establish the cluster over my main network, via a switch that seems to have issues with multicast. I edited corosync.conf with the correct details and the cluster began to work.
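For reference, the nodelist section of corosync.conf now points ring0 at the direct link rather than the main network. It ended up looking roughly like the sketch below - the 10.10.10.x addresses are just placeholders for my direct-link subnet, not my real ones:

nodelist {
  node {
    # addresses below are placeholders for my direct-link subnet
    name: vhost-01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
  node {
    name: vhost-02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
}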

My issue now is that the cluster is unable to live- or offline-migrate VMs between the hosts. The error received is below:
cannot open 'vhost01-pool-01/vm-disks/vm-201-disk-1': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
cannot open 'vhost01-pool-01/vm-disks/vm-201-disk-1': dataset does not exist
command 'zfs recv -F -- vhost01-pool-01/vm-disks/vm-201-disk-1' failed: exit code 1
command 'zfs send -Rpv -- vhost01-pool-01/vm-disks/vm-201-disk-1@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2018-04-11 10:07:24 ERROR: Failed to sync data - command 'set -o pipefail && pvesm export vm-disks:vm-201-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=vhost-02' root@192.168.1.13 -- pvesm import vm-disks:vm-201-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
2018-04-11 10:07:24 aborting phase 1 - cleanup resources
2018-04-11 10:07:24 ERROR: found stale volume copy 'vm-disks:vm-201-disk-1' on node 'vhost-02'
2018-04-11 10:07:24 ERROR: migration aborted (duration 00:00:04): Failed to sync data - command 'set -o pipefail && pvesm export vm-disks:vm-201-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=vhost-02' root@192.168.1.13 -- pvesm import vm-disks:vm-201-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
TASK ERROR: migration aborted
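
From the error, my guess is that the relevant bit is /etc/pve/storage.cfg. I don't have it in front of me right now, but the 'vm-disks' entry should be something along these lines - a ZFS storage pointing at the dataset on vhost-01, with no node restriction (treat the exact options as approximate):

zfspool: vm-disks
        pool vhost01-pool-01/vm-disks
        content images,rootdir
        sparse 1

If that's right, then the same pool/dataset path is expected to exist on every node, and vhost-02 only has vhost02-pool-01, which would explain the 'dataset does not exist' on the receiving side.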

Is this in some way connected to the naming of the ZFS pools between the devices? I created each device with its own pool before joining them in a cluster. There are currently only a couple of VMs hosted on the first host, so I'd rather tear this down and rebuild it properly before continuing, if that's the best thing to do.
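
If it helps, these are the checks I've been running on vhost-02 to confirm the mismatch (names taken from the setup above, nothing else assumed):

# list everything under the pool that actually exists on vhost-02
zfs list -r vhost02-pool-01

# try the exact dataset the migration is asking for
zfs list vhost01-pool-01/vm-disks/vm-201-disk-1

The second one fails with 'dataset does not exist', the same message as in the migration log.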
 
