Error "online storage migration not possible if snapshot exists" on zfspool (bug in QemuMigrate.pm?)

Jonathan Hankins

Mar 17, 2018
Live migration with snapshots should be possible for zfspool storage, but I am getting the error "online storage migration not possible if snapshot exists".

The block at line 520 in QemuMigrate.pm is never reached because of the die statement at line 519. If I comment out line 519 and restart, live migration appears to work. EDIT: but the snapshots are lost; see the reply in this thread.

qemu-server 7.1-4

Code:
513     if (defined($snaprefs)) {
514         $local_volumes->{$volid}->{snapshots} = 1;
515
516         # we cannot migrate shapshots on local storage
517         # exceptions: 'zfspool' or 'qcow2' files (on directory storage)
518
519         die "online storage migration not possible if snapshot exists\n" if $self->{running};
520         if (!($scfg->{type} eq 'zfspool'
521             || ($scfg->{type} eq 'btrfs' && $local_volumes->{$volid}->{format} eq 'raw')
522             || $local_volumes->{$volid}->{format} eq 'qcow2'
523         )) {
524             die "non-migratable snapshot exists\n";
525         }
526     }
 
Hi,
you will lose your snapshots like that! Since the VM is currently using the disk, we cannot use zfs send/recv to migrate the disk together with the snapshots. We need to use QEMU's drive mirror to keep the VM's view of the disk consistent, but the drive mirror doesn't know anything about the snapshots.
 
We could extend this to work as long as the disks/the VM are replicated, but that requires some additional logic to verify that everything is set up as expected, and that logic isn't there yet.
 
Right, this is actually the enhancement request in bug 2792, which I haven't come around to yet.
 
Got it, thank you. I edited my original post based on your warning, and I'm also following the ticket you mentioned on Bugzilla.
 
Hello,
could you please look at this enhancement again? We would really, really need this functionality.
We just dumped our central Ceph storage (not Proxmox Ceph) because of performance and stability issues and switched back to DAS (direct attached storage) with NVMe SSDs and ZFS. With NVMe SSDs at 100 Euro/TB, a central networked storage does not really make sense anymore.
BUT now we are unable to live migrate replicated VMs with snapshots, which is a severe limitation for us.
As far as I understand, the underlying ZFS send/recv should also be able to replicate the snapshots; the problem may just be the structure of the existing code.
We've just extended all our Proxmox licenses until the end of 2025, and we're very satisfied with Proxmox PVE. If we could live migrate VMs with snapshots, it would be perfect.

Thanks,
Gerald
 
Just tried a node reboot with automigration, and watched it fail into a loop because of snapshots :)
It would be nice if a migration that fails for snapshot reasons (or because of a locally attached CD-ROM) offered an option to migrate the VM with stop+start, or to simply eject the CD-ROM (yes, true, that one was a big self-inflicted error)...
 
This issue should be fixed with qemu-server >= 7.3-4 which is currently available in the pvetest repository. If you like, you can help test it (on non-production setups) or wait until it reaches the other repositories.
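As an aside, if you want to script the "is my node new enough?" check, Debian-style version strings can be compared with GNU sort's version ordering. A minimal sketch, using the version numbers from this thread (on a real node you would get the installed version with something like dpkg-query -W -f '${Version}' qemu-server instead of hard-coding it):

```shell
# Check whether an installed qemu-server version is at least 7.3-4
# (the release said above to contain the fix), using `sort -V`.
installed="7.4-3"   # example value; query the real one on the node
required="7.3-4"
lowest=$(printf '%s\n%s\n' "$installed" "$required" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
    echo "fix included"
else
    echo "upgrade needed"
fi
```

If the required version sorts lowest (or the two are equal), the installed package already contains the fix.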
 
I have a "pve-no-subscription" test cluster. When I enable the "test" repo via the GUI, a full-upgrade would install 12 packages, including the kernel, "pve-kernel-helper" and some others. So I have some questions:
  • for this snapshot handling enhancement: is it sufficient to install qemu-server?
  • do I have to reboot the complete cluster? Or will new VMs be able to use this feature?
  • can I downgrade to 7.3-3 later on?
Thank you
 
I have a "pve-no-subscription" test cluster. When I enable the "test" repo via the GUI, a full-upgrade would install 12 packages, including the kernel, "pve-kernel-helper" and some others.
You can enable the test repository, run apt update && apt install qemu-server, and then disable the test repository again. Afterwards, I'd run another apt update so the other updates you weren't interested in no longer show up.
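If you prefer editing the sources by hand instead of using the GUI, the pvetest repository is a single deb line. A sketch, assuming PVE 7 on Debian Bullseye (the filename is just an example):

```
# /etc/apt/sources.list.d/pvetest.list  (example filename; assumes PVE 7 / Bullseye)
deb http://download.proxmox.com/debian/pve bullseye pvetest
```

Removing or commenting out that line and running apt update again disables the repository.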

So I have some questions:
  • for this snapshot handling enhancement: is it sufficient to install qemu-server?
Yes

  • do I have to reboot the complete cluster? Or will new VMs be able to use this feature?
No. You don't even need to restart the VMs in this case; already running VMs will also be able to use the feature.

  • can I downgrade to 7.3-3 later on?
Yes, with apt install qemu-server=7.3-3.

Thank you
 
Tested on my test cluster with some Dell Xeon E5-2600 machines; kernel 6.1.10-1; local ZFS on rpool; with some snapshots, of course; live, with a single Debian VM; with replication configured beforehand:

Works like a charm! Thank you very much :)


Now I am keen to see it in pve-enterprise. Is the ETA a matter of weeks or months?

Best regards
 
Now I am keen to see it in pve-enterprise. Is the ETA a matter of weeks or months?
I don't think the qemu-server release contains any really important bug fixes (only nice-to-have ones such as this), so I'd say there's no great hurry. But I'm not on the release team, so I can't tell you anything concrete.
 
Do I need replication or does it work with local storage as well?
I have a bunch of VMs I'd like to migrate to a different host however all of them have snapshots.
 
Hi,
Do I need replication or does it work with local storage as well?
I have a bunch of VMs I'd like to migrate to a different host however all of them have snapshots.
no, you don't need replication. But online migration with local snapshots only works for ZFS storages.

EDIT: replication is required at the moment.
 
Both systems are using ZFS and have version 7.4.3 installed.
However, I still receive the error message when trying to migrate a VM:

2023-06-29 13:59:31 can't migrate local disk 'local-zfs:vm-139-disk-0': online storage migration not possible if non-replicated snapshot exists
 
Please post the output of pveversion -v | grep qemu-server
 
Code:
root@pve1:~# pveversion -v | grep qemu-server
qemu-server: 7.4-3
root@pve2:~# pveversion -v | grep qemu-server
qemu-server: 7.4-3
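If you want to collect this across several nodes in a script, the relevant field can be pulled out of pveversion -v output. A small sketch; the sample text below mirrors the output shown above, whereas on a real node you would pipe the command itself:

```shell
# Extract the qemu-server version from (sample) `pveversion -v` output.
sample="qemu-server: 7.4-3"
version=$(printf '%s\n' "$sample" | awk '/^qemu-server:/ {print $2}')
echo "$version"
```
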
 
Oh, sorry! You do need replication. I guess in principle we could also handle the non-replicated case by doing a one-time "replication" first and then migrating, but that is not implemented right now.
 
Replication by default doesn't make any sense for our use case at the moment.
But I'd highly appreciate a "one-time" replication when running the migration.
Any chance you could add that to your roadmap?

I'm sure I'm not the only one who would benefit from that.


We have a bunch of nodes in a cluster and want to migrate VMs from one node to another in case we have to reboot or shut it down.
Replication by default would require a ton of additional disk space on all hosts and doesn't scale well.
 
Replication by default doesn't make any sense for our use case at the moment.
But I'd highly appreciate a "one-time" replication when running the migration.
Any chance you could add that to your roadmap?
Feel free to open a feature request on our bugtracker: https://bugzilla.proxmox.com/
 
