Storage migration from ZFS to NFS fails for some containers

mammer

Member
Sep 24, 2019
I'm migrating containers from local storage to NFS. I power down each container before starting the storage migration. Some containers (about 30%, I guess) fail. I see the .raw file being created on NFS (and in the Proxmox progress dialog). Next, rsync starts and seems to finish (I see a speedup of 1.0, so rsync appears to be done). After this the migration just stops: there is no further progress and the container stays locked, as if the migration had not finished.
I checked whether the PID mentioned in the Proxmox status dialog is still there, and I do see it in the ps output.
The only way forward at that point is to stop the migration: the .raw file on NFS is deleted and the container is unlocked.

The NFS server seems OK and other containers migrate without problems, so it's a weird problem. There is no logging output and nothing in the logs.

Does anyone have an idea how to debug this?
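
One way to look at a task that is still listed in ps but makes no progress is to read its entry under /proc. The following is only a rough sketch, assuming a Linux host; the helper names `parse_status` and `describe_task` are made up for illustration and are not part of Proxmox. A state of "D (disk sleep)" usually means the process is blocked inside the kernel, often on I/O, which points towards storage rather than the migration code itself.

```python
from pathlib import Path


def parse_status(text: str) -> dict[str, str]:
    """Parse the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_task(pid: int, proc_root: str = "/proc") -> str:
    """Summarise the scheduler state of a process and, if readable,
    the kernel function it is currently waiting in (wchan)."""
    base = Path(proc_root) / str(pid)
    status = parse_status((base / "status").read_text())
    state = status.get("State", "unknown")
    try:
        wchan = (base / "wchan").read_text().strip() or "-"
    except OSError:
        # wchan may be missing or unreadable depending on kernel and privileges
        wchan = "unavailable"
    return f"pid {pid}: state={state}, wchan={wchan}"
```

Calling `print(describe_task(12345))` with the PID shown in the task dialog (12345 is a placeholder) prints one summary line. If the state stays at "D" across several calls, attaching strace to that PID is a reasonable next step.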
 
To investigate this, we would need to take a look at the task logs.
 
I found the reason after some debugging with strace: one of my backup storages is on NFS. Its host is on a Wake-on-LAN schedule, so it was powered off during the migration. When I move a VM to another NFS server while the backup storage is down, the task hangs. Turning the backup host on solves the problem. So basically, storage migration does not seem to work when any NFS server is unavailable, even if that server is not involved in the migration.

I have some strace logs showing this and can easily reproduce it. I've created https://bugzilla.proxmox.com/show_bug.cgi?id=2407 to track it.

To clarify why it seemed to happen to about 30% of the VMs: the backup server only turns on in the evening. When I started moving, that host was down and the migrations failed. About an hour later the host woke up for the scheduled backups, and the migrations suddenly started working.
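
For anyone who wants to check for this situation before starting a migration, here is a minimal sketch that probes every NFS mount listed in /proc/mounts and reports the ones that do not answer within a few seconds. It is an illustration under assumptions, not something taken from Proxmox: the function names `nfs_mountpoints` and `responds` and the 5 second timeout are my own choices. The probe runs in a child process so that a dead NFS server cannot hang the checking script itself.

```python
import subprocess
import sys


def nfs_mountpoints(mounts_text: str) -> list[str]:
    """Return the mount points of NFS filesystems from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] in ("nfs", "nfs4"):
            # /proc/mounts escapes spaces in paths as \040
            points.append(parts[1].replace("\\040", " "))
    return points


def responds(path: str, timeout: float = 5.0) -> bool:
    """Run statvfs on `path` in a child process.

    Returns False if the call fails or does not return within `timeout` seconds.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: do not wait again, the child may be stuck in the kernel.
        child.kill()
        return False
```

A quick check could then be `for mp in nfs_mountpoints(open("/proc/mounts").read()): print(mp, responds(mp))`. Any mount reported as False is a candidate for the kind of hang described above.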
 
