Thanks for implementing this feature (snapshots on shared thick LVM), Proxmox devs! In the past I've had to explain to my colleagues why, if we wanted to experiment with Proxmox as a replacement for our VMware infrastructure, we'd have to either also experiment with Ceph at the same time (to get live migration and snapshots) or run it on shared LUNs on the SAN we already own (and not be able to snapshot VMs).
I tried this today in a small, brand-new 9-BETA cluster and got it to work: I can see the multiple generations of qcow2-named LVs as I take one snap after another. However, when I tried to roll back to a much earlier snap, it refused. It seems that it will only roll back to the last snapshot that was created, or at least the last one that still exists. Is this true, and is it a permanent restriction? Does the entire process only work on a linear chain of snapshots, where the only rollback allowed is one step backward, to the final link in the chain? And if I want to roll back several links, must I delete all the later snaps until the one I want to restore is the final one?
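To make the question concrete, here is a minimal Python sketch of the behavior as I understand it. It is a toy model, not PVE code: `LinearSnapshotChain` and `rollback_by_deleting_newer` are names invented for illustration, and the rule it encodes (only the newest remaining snapshot is a valid rollback target) is my guess from what I observed, not something I have confirmed.

```python
class LinearSnapshotChain:
    """Toy model (assumption): snapshots form one linear chain and only
    the newest remaining snapshot can be rolled back to."""

    def __init__(self):
        self.snaps = []  # oldest first, newest last

    def take(self, name):
        if name in self.snaps:
            raise ValueError(f"snapshot already exists: {name}")
        self.snaps.append(name)

    def delete(self, name):
        self.snaps.remove(name)

    def rollback(self, name):
        if name not in self.snaps:
            raise KeyError(f"no such snapshot: {name}")
        if name != self.snaps[-1]:
            newer = self.snaps[self.snaps.index(name) + 1:]
            raise RuntimeError(
                f"cannot roll back to {name}; newer snapshots exist: {newer}"
            )
        return name


def rollback_by_deleting_newer(chain, name):
    """Hypothetical workaround: delete every snapshot newer than `name`,
    then roll back to it. The newer snapshots are lost."""
    if name not in chain.snaps:
        raise KeyError(f"no such snapshot: {name}")
    while chain.snaps[-1] != name:
        chain.delete(chain.snaps[-1])
    return chain.rollback(name)
```

If this model is right, rolling back from a chain of s1, s2, s3 to s1 costs me s2 and s3, which is exactly what I would like to avoid.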
Obviously one very common use of VM snaps is to provide just-in-case reverting during a patch window, like
1. take a snap
2. try to patch item A
3. if it goes badly, restore to "1" and try to do A again or differently
4. take another snap
5. try to patch item B
6. again, if it blows up, restore to "4"
7. take another snap
8. try to patch C
9. if it blows up, restore to "7"
10. success, patching all items is done! so now delete all the snaps
So PVE's new capability will support this workflow. But sometimes the snapshots are not all in one linear flow. Sometimes I'm troubleshooting a very complex set of interactions, and I want to be able to take snapshots 1, 2, 3, 4, then temporarily roll back to "1" to try something else while keeping 2, 3, 4 around, then forget that little distraction and roll back (roll forward? restore?) to "4" and keep going. Or roll back to "2" and start another branch from there, making 5, 6, 7 that are descended from "2" while 3, 4 continue to be descended from "2".
This sort of branched workflow is definitely supported in the VMware world, but I've never really tried to do it in the PVE world, so I wonder if it's not supported at all? Or currently only supported on the file-based storages (like QCOW2-stored images on NFS)? Or supported in shared Ceph (either Ceph RBD or CephFS)?
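For clarity, the branched layout I have in mind can be sketched as a simple parent-pointer tree. This is only an illustration of the workflow I am asking about; `SnapshotTree` and its methods are hypothetical names, and it says nothing about how PVE, VMware, or any storage backend actually implements snapshots.

```python
class SnapshotTree:
    """Toy model of branched snapshots: each snapshot records its parent,
    and rolling back to any snapshot keeps all the others."""

    def __init__(self):
        self.parent = {}     # snapshot name -> parent name (None for the root)
        self.current = None  # snapshot the running state descends from

    def take(self, name):
        if name in self.parent:
            raise ValueError(f"snapshot already exists: {name}")
        self.parent[name] = self.current
        self.current = name

    def rollback(self, name):
        if name not in self.parent:
            raise KeyError(f"no such snapshot: {name}")
        self.current = name  # nothing is deleted

    def lineage(self, name):
        """Return the chain of snapshots from the root down to `name`."""
        chain = []
        while name is not None:
            chain.append(name)
            name = self.parent[name]
        return chain[::-1]


tree = SnapshotTree()
for n in ["1", "2", "3", "4"]:
    tree.take(n)
tree.rollback("2")           # start a second branch from "2"
for n in ["5", "6", "7"]:
    tree.take(n)
```

Here `tree.lineage("7")` runs through 1, 2, 5, 6, 7 while `tree.lineage("4")` still runs through 1, 2, 3, 4, so both branches stay available after the rollback.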
Thanks very much for your time. I know end users are never satisfied, so no matter what brilliant new things you create, there will be someone complaining about it. Just know that I think y'all are amazing and you've made an incredible product. I have been trying for years to convince my colleagues we should move off VMware, and I'm still trying.