Disk migration fails

MC4U.de

Member
Sep 22, 2017
Hi,
I am trying to live-migrate a VM (disk size: 200 GB) inside a cluster with three nodes. The VM image is currently on local storage, so I want to move it to an NFS share first. When I try that, I get this error:

create full clone of drive scsi0 (local:7161/vm-7161-disk-0.qcow2)
TASK ERROR: storage migration failed: error with cfs lock 'storage-storage1': unable to create image: got lock timeout - aborting command

The file for the VM is created on the target storage. I can see the file size growing fast, but it doesn't reach 200 GB before the error occurs. The NFS storage is backed by ext4 and connected over 1 Gbit/s; I don't know if that's important.
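As a rough back-of-the-envelope check (a sketch, not a diagnosis), the numbers above can be put side by side: how long a 200 GB image would take at 1 Gbit/s line rate, and how much data fits into a fixed time window. The 60-second window is an assumption borrowed from a later post in this thread that reports failures "after 60 seconds or so"; whether the whole image really has to be written while the lock is held is not confirmed here.

```python
def transfer_seconds(size_bytes, link_bits_per_s):
    """Best-case time to push size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_s


def bytes_in_window(window_s, link_bits_per_s):
    """Upper bound on the data that fits through the link in window_s seconds."""
    return window_s * link_bits_per_s / 8


DISK_BYTES = 200 * 10**9   # 200 GB from the post (decimal units assumed)
LINK_BPS = 10**9           # 1 Gbit/s from the post
ASSUMED_WINDOW_S = 60      # assumption: taken from "60 seconds or so" later in the thread

full_copy_s = transfer_seconds(DISK_BYTES, LINK_BPS)
window_bytes = bytes_in_window(ASSUMED_WINDOW_S, LINK_BPS)

print(f"full image at line rate: {full_copy_s:.0f} s (~{full_copy_s / 60:.0f} min)")
print(f"data possible in {ASSUMED_WINDOW_S} s: {window_bytes / 10**9:.1f} GB")
```

Under these assumptions, anything that needs to write a large fraction of the image inside such a window cannot finish over this link, which would be consistent with small disks working and large ones failing.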

How can I fix that?
 
Does the journal say anything before, during, or after the error occurs (run `journalctl -f` while letting the command run)?
Is your cluster quorate while you are moving the disk (`pvecm status`)?
 
All nodes are online and running fine. I can migrate a VM if it was created with its disk on the NFS storage.
The storage server and all nodes are fine; there are no errors in the journal.
 
When the disk is already on shared NFS storage, migration transfers significantly less data (only the machine's state, not its disk data), so that working does not rule out a temporary problem in your cluster network.

Is the NFS-traffic on a physically different network from corosync?
 
The latency is around 0.2-0.3 ms, so it should be fine. It stays under 1 ms even when I fully saturate the connection to the storage server. The cluster is also working completely fine; the only issue is the migration of big VM disks. The problem does not occur if I migrate a small disk from local to NFS.
 
Hm, the latency values sound OK. Maybe it is an issue with the NFS server. Any chance you could try it either via CIFS or with another storage?

Please post your storage.cfg and the VM config.
 
Same here, and it is not just migration to NFS storage that fails: any operation that creates a disk on NFS storage fails. We have an NFS storage shared between two standalone PVE nodes (not in a cluster). One runs pve-manager/5.1-36/131401db (running kernel: 4.13.8-3-pve) and works fine; the other runs pve-manager/5.2-11/13c2da63 (running kernel: 4.15.18-7-pve). On the newer node, creating a VM on NFS, migrating a VM to NFS, or restoring a VM with a disk on NFS storage fails after 60 seconds or so. It looks like "qemu-img create" does something different now compared to previous versions, as it is too slow on NFS. I even tried exporting an NFS share and mounting it on the same machine: local disk creation is OK, but through NFS it is too slow and fails with a timeout. The older machine still works fine with this new share.
Code:
Nov 23 13:44:31 srv22 pvedaemon[4653]: VM 201811235 creating disks failed
Nov 23 13:44:31 srv22 pvedaemon[10036]: <veenee@pve> end task UPID:srv22:0000122D:0B6A8F4A:5BF7F5F3:qmcreate:201811235:veenee@pve: unable to create VM 201811235 - error with cfs lock 'storage-backup-srv1': unable to create image: got lock timeout - aborting command
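One way to check the "too slow on NFS" observation independently of the PVE tooling is to time image creation by hand on a local path and on the NFS mount. The following is a minimal sketch under stated assumptions: the 60-second limit is taken from the "60 seconds or so" reported above rather than from any documented value, and `preallocation=metadata` is only an assumed guess at what PVE passes to `qemu-img create` for qcow2 images. Any paths passed to it are placeholders.

```python
import subprocess
import sys
import time

ASSUMED_LIMIT_S = 60.0  # assumption: from the "60 seconds or so" in the post


def qemu_img_create_cmd(path, size, preallocation="metadata"):
    """Build a qemu-img command line; the preallocation mode is an assumption."""
    return ["qemu-img", "create", "-f", "qcow2",
            "-o", f"preallocation={preallocation}", path, size]


def time_command(cmd):
    """Run cmd, discard its output, return (elapsed_seconds, exit_code)."""
    start = time.monotonic()
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return time.monotonic() - start, result.returncode


def report(paths, size):
    """Time image creation on each path and flag runs over the assumed limit."""
    for path in paths:
        elapsed, code = time_command(qemu_img_create_cmd(path, size))
        flag = "OVER LIMIT" if elapsed > ASSUMED_LIMIT_S else "ok"
        print(f"{path}: {elapsed:.1f} s, exit {code}, {flag}")
```

Calling `report()` with one test file on local storage and one on the NFS mount, using the same size as the failing disk, would show whether creation alone already exceeds the assumed limit. Remove the test images afterwards.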
 
Just tested it with CIFS (running on the same storage server as NFS); it runs completely fine.
 
Maybe different mount options for NFS could help? (I would search for the storage's brand and model together with "NFS options".)
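Before changing mount options, it can help to see which ones are currently in effect, for example to compare a node that works against one that fails. The sketch below parses text in the `/proc/mounts` format and lists the options of NFS mounts. The sample data, including the address and mount point, is made up for illustration.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to its list of mount options.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in ("nfs", "nfs4"):
            result[mountpoint] = options.split(",")
    return result


def diff_options(a, b):
    """Options present on one side only, sorted for stable output."""
    return sorted(set(a) ^ set(b))


# Hypothetical sample; on a real node, read /proc/mounts instead.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
192.0.2.10:/export/vm /mnt/pve/storage1 nfs rw,relatime,vers=3,hard,proto=tcp 0 0
"""

for mountpoint, opts in nfs_mount_options(SAMPLE).items():
    print(mountpoint, ",".join(opts))
```

On a real node, the input would come from `open("/proc/mounts").read()`, and `diff_options()` can then be fed the option lists collected from the two machines.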
 
