[SOLVED] Migration migrates .conf file but skips disk images

athompso

Member
Sep 13, 2013
I've got a 3-node PVE 4.1 cluster running with UDPU at OVH (on their SoYouStart brand, experience has been OK so far).
The cluster seems (well, seemed) to operate quite happily once I got corosync to be 100% happy with UDPU.

I'm now seeing strange behaviour: when I try to migrate a VM from one node to another, *or* clone a VM or template from one node to another, the PVE GUI reports that the task completed successfully, but all it did was move the .conf file under /etc/pve/nodes/*. It completely skipped migrating the disk image!
If I attempt to, say, migrate the VM back to the source node, PVE won't let me because it detects that the disk image is missing (no kidding).
I can manually copy files back and forth with no issues, using NFS, rsync, SSH, netcat, etc. and after manually migrating the offline VM, the VM works fine at its new home.

On the two nodes involved, in /var/log/pve/tasks/index, the only errors I see relate to pveproxy (i.e. console connections).

Where do I start looking for more details? The task logs only show this:
Apr 01 09:48:44 starting migration of VM 103 to node 'pve2' (167.114.208.176)
Apr 01 09:48:44 copying disk images
Apr 01 09:48:45 migration finished successfully (duration 00:00:01)
TASK OK

...which really doesn't give me much to go on.

pveversion --verbose output (identical on all nodes):
root@pve2:~# pveversion --verbose
proxmox-ve: 4.1-41 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-22 (running version: 4.1-22/aca130cf)
pve-kernel-4.2.8-1-pve: 4.2.8-41
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-36
qemu-server: 4.0-64
pve-firmware: 1.1-7
libpve-common-perl: 4.0-54
libpve-access-control: 4.0-13
libpve-storage-perl: 4.0-45
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-52
pve-firewall: 2.0-22
pve-ha-manager: 1.0-25
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie

The manual-migration workaround is a pain in the butt, but it works for now, so I'm not completely dead in the water. Still, this is disturbing.

Two final notes:
1. I have not enabled migration_unsecure in datacenter.cfg, since the interconnect is in fact public. I'm going to try that next.
2. All three nodes have a community subscription, which is why I'm asking here instead of opening a ticket.
 
Hi,
it looks like the disk images aren't copied because the storage is marked as shared. So PVE thinks both nodes have access to the same storage and sees nothing to copy?

Please post the VM config and the storage config (/etc/pve/storage.cfg).

Udo
 
<smacks forehead> You got it in one! I had flagged "local" as shared for some reason.

Oh, I do remember why - I'm also exporting each node's /var/lib/vz directory to the others via NFS so they can back up to each other and restore each other's backups if/when necessary - OVH/SyS doesn't provide a common NFS-accessible backup fileshare, so this is one of the workarounds needed in that environment.
I had some sort of vague belief that the "shared" flag affected locking, so that PVE wouldn't just assume it had exclusive access to the directory... premature optimization, I guess, was the root cause.

THANK YOU for the quick, accurate response!
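
For anyone who lands here with the same symptom, here is a rough sketch of a check for directory-type storages flagged as shared in /etc/pve/storage.cfg. It assumes the usual layout of a "type: id" header line followed by indented "key value" lines. The function names, the sample NFS entry and its address are made up for illustration, and a shared "dir" storage can be perfectly legitimate (e.g. a cluster filesystem mounted on every node), so treat a hit as something to double-check rather than as an error.

```python
# Hypothetical helper, not part of PVE: flags "dir" storages marked shared.

SAMPLE_CFG = """\
dir: local
    path /var/lib/vz
    content images,iso,vztmpl,rootdir
    shared 1

nfs: pve1-backup
    path /mnt/pve/pve1-backup
    server 192.0.2.10
    export /var/lib/vz
    content backup
"""


def parse_storage_cfg(text):
    """Parse storage.cfg-style text into {storage_id: {option: value}}."""
    storages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not raw[0].isspace() and ":" in line:
            # Unindented "type: id" line starts a new storage section.
            stype, _, sid = line.partition(":")
            current = {"type": stype.strip()}
            storages[sid.strip()] = current
        elif current is not None:
            # Indented "key value" line belongs to the current section.
            parts = line.split(None, 1)
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return storages


def shared_local_storages(storages, local_types=("dir",)):
    """Return IDs of storages of a normally node-local type with shared=1."""
    return sorted(
        sid
        for sid, opts in storages.items()
        if opts["type"] in local_types and opts.get("shared") == "1"
    )


if __name__ == "__main__":
    for sid in shared_local_storages(parse_storage_cfg(SAMPLE_CFG)):
        print(f"storage '{sid}' is a directory marked shared; "
              "migration will not copy its disk images")
```

To run it against a real node, read /etc/pve/storage.cfg into a string and pass that in place of SAMPLE_CFG. In my case it would have flagged "local", which is exactly the storage the migration refused to copy from.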
 
