I've got a 3-node PVE 4.1 cluster running with UDPU at OVH (on their SoYouStart brand, experience has been OK so far).
The cluster seems (well, seemed) to operate quite happily once I got corosync to be 100% happy with UDPU.
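For context, the UDPU-relevant bits of my corosync.conf look roughly like this (the names and addresses below are placeholders, not my real ones):

totem {
  version: 2
  cluster_name: mycluster
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: 192.0.2.0
  }
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.0.2.11
  }
  # ...same again for pve2 and pve3, each with its own nodeid and ring0_addr
}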
I'm now seeing a strange behaviour: when I try to migrate a VM from one node to another, *or* clone a VM or template onto another node, the PVE GUI reports that the task completed successfully, but all it actually did was move the .conf file under /etc/pve/nodes/*; it completely skipped migrating the disk image!
If I attempt to, say, migrate the VM back to the source node, PVE won't let me because it detects that the disk image is missing (no kidding).
I can manually copy the files back and forth with no issues using NFS, rsync, SSH, netcat, etc., and after manually migrating the offline VM that way, it works fine at its new home.
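For the curious, the manual workaround I've been using looks roughly like this (VM stopped, default "local" directory storage assumed; VMID 103 and the node names are just examples):

root@pve1:~# rsync -avP /var/lib/vz/images/103/ root@pve2:/var/lib/vz/images/103/
# only needed when PVE hasn't already moved the config for me:
root@pve1:~# mv /etc/pve/nodes/pve1/qemu-server/103.conf /etc/pve/nodes/pve2/qemu-server/103.conf
root@pve1:~# rm -rf /var/lib/vz/images/103

I only remove the source copy once the VM has booted cleanly on the target.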
On the two nodes involved, in /var/log/pve/tasks/index, the only errors I see relate to pveproxy (i.e. console connections).
Where do I start looking for more details? The task logs only show this:
Apr 01 09:48:44 starting migration of VM 103 to node 'pve2' (167.114.208.176)
Apr 01 09:48:44 copying disk images
Apr 01 09:48:45 migration finished successfully (duration 00:00:01)
TASK OK
...which really doesn't give me much to go on.
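For reference, this is how I've been hunting for the full task logs on each node (the "qmigrate" pattern is my guess at the migration task type string, and 103 is just the example VMID):

root@pve1:~# grep qmigrate /var/log/pve/tasks/index
root@pve1:~# find /var/log/pve/tasks/ -name '*qmigrate:103*' -exec cat {} \;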
pveversion --verbose output (identical on all nodes):
root@pve2:~# pveversion --verbose
proxmox-ve: 4.1-41 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-22 (running version: 4.1-22/aca130cf)
pve-kernel-4.2.8-1-pve: 4.2.8-41
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-36
qemu-server: 4.0-64
pve-firmware: 1.1-7
libpve-common-perl: 4.0-54
libpve-access-control: 4.0-13
libpve-storage-perl: 4.0-45
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-52
pve-firewall: 2.0-22
pve-ha-manager: 1.0-25
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie
The manual-migration workaround is a pain in the butt, but it works for now, so I'm not completely dead in the water. Still, this is disturbing.
Two final notes:
1. I have not enabled unsecure migration in datacenter.cfg, since the interconnect is in fact public. I'm going to try that next (see the snippet after this list for what I understand the change to be).
2. All three nodes have a community subscription, which is why I'm asking here instead of opening a ticket.
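Re: point 1 — my understanding is that enabling it just means adding this line to /etc/pve/datacenter.cfg (option name as I read it in the 4.x docs, so please correct me if it's spelled differently):

migration_unsecure: 1

As I understand it, that makes the migration traffic go over plain TCP instead of being tunnelled through SSH, which is exactly why I've been hesitant to turn it on over a public interconnect.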