[SOLVED] Migration of LXC on ZFS loses underlying ZFS snapshots

jinjer

Oct 4, 2010
There's a bug (or misfeature) when migrating LXC containers hosted on ZFS: migration loses the snapshots.

Longer explanation: I routinely snapshot all LXC containers for backup and replication. This is "a good thing" and has saved my azz a few times over the years.

I discovered that Proxmox migrates the container but loses all previous snapshots. Worse, it then deletes the ZFS filesystem, including all its snapshots, on the originating system.

So... weeks and months (and years) of previous snapshots (i.e. backups) are lost on the first migration.

Can we fix this please?

EDIT: By snapshots I mean the snapshots that I (the administrator) take on the host server, not through the GUI. They exist to protect the "hoster" and are not visible in the Proxmox GUI or to end users.

EDIT2: The fix is simple (from an admin point of view). You must send the complete ZFS stream (including snapshots), not just the last snapshot. It's a single switch (-R) on the zfs send command line.
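A minimal dry-run sketch of the two variants, assuming a hypothetical dataset `rpool/data/subvol-100-disk-1` and a hypothetical target host `node2` (neither comes from this thread). A plain `zfs send ds@snap` transfers only the state at that one snapshot, while `zfs send -R` produces a replication stream that also carries the earlier snapshots (plus descendant datasets and properties). The helper name `build_send_cmd` is made up for illustration, and the function only prints the pipeline; it does not run it.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the migration pipeline.
# Dataset and host names used below are hypothetical placeholders.

# build_send_cmd DATASET TARGET_HOST FULL_HISTORY
#   FULL_HISTORY=no  -> send only the state at @__migration__ (current behaviour)
#   FULL_HISTORY=yes -> add -R so all earlier snapshots travel with the stream
build_send_cmd() {
    dataset=$1
    target=$2
    full=$3
    flags="-pv"
    if [ "$full" = "yes" ]; then
        flags="-pvR"
    fi
    printf 'zfs send %s %s@__migration__ | ssh root@%s zfs recv %s\n' \
        "$flags" "$dataset" "$target" "$dataset"
}

build_send_cmd rpool/data/subvol-100-disk-1 node2 no
build_send_cmd rpool/data/subvol-100-disk-1 node2 yes
```

The only difference between the two printed pipelines is the `-R` flag, which is why the change is so small on the command line.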
 
Hi,

This is by design: there are many more storage backends out there that do not support snapshots,
so it is not possible to do this in a generic way.
 
You're saying it's a design decision; however, the evidence shows this cannot be correct.

In Storage.pm there's an explicit check for the ZFS backend, and a branch that takes advantage of ZFS features in a way that is not compatible with other storage types.

Specifically, Proxmox uses the command "zfs send" to send data from one server and "zfs recv" to receive the data on the other server. These are ZFS-specific commands. It does not just "rsync" the contents in a generic way.

By the same token, it should use these commands properly, preserving as many of the ZFS backend's attributes as possible across servers, without losing data or other features.

Again, since there is a dedicated copy command for the ZFS backend (zfs send, which Proxmox already takes advantage of), it is trivial to use it properly and preserve the complete stream.

This is not a design decision, just a simple oversight, and it's easy to correct. Do you want me to send you a patch?
 

Containers can have more than one mount point, using different storage types. I would simply reject migration if containers have snapshots on local storage.
 
I don't follow you. If you move a container from ZFS to ZFS, you use "zfs send". This is hard-coded in Storage.pm.

What I'm asking is to use "zfs send -R" instead.

Please give an example of a situation that will not work.

Thank you.

PS: Just to clarify, we're not talking about snapshots taken through the Proxmox GUI, but about snapshots on the local ZFS filesystem (automatic, periodic, taken every few minutes and handled externally by dedicated scripts). Proxmox never sees these; they are simply part of the ZFS filesystem that holds the container's files. There is one filesystem per container, segregated from the other containers. It's an atomic entity that can simply be moved to the local storage of another server without affecting the rest of Proxmox. I know, because I was doing this for years before ZFS became an official Proxmox backend.
 
OK, I get you. We're talking apples and pears.

In the case of ZFS to LVM, we're converting between storage backends. It is to be expected that not all features can be "converted" or supported on every backend. This is normal and can be documented. In that case Storage.pm would not use "zfs send"; it would probably just rsync the files.

However, the case I'm talking about is migrating data from one server to another, identical server with the same backend (as is normal in a cluster environment). In this specific zfs-to-zfs case, Storage.pm is already "smart" enough to use the proper tools. See the following code from the storage_migrate function in /usr/share/perl5/PVE/Storage.pm:

} elsif ($scfg->{type} eq 'zfspool') {
    if ($tcfg->{type} eq 'zfspool') {
        ...
        ...
        my $send = "zfs send -pv $zfspath\@__migration__ \| ssh root\@$target_host zfs recv $zfspath";
        ...
        ...
    }
}

In other words: if SOURCE is ZFS and DEST is ZFS, use zfs send.
All other cases are conversions, and I don't know how they're currently handled.

The problem I'm asking you to fix, and offering a patch for, is to replicate the whole ZFS stream and not just the last ZFS snapshot.

I do not mean to sound harsh... but I'm probably not getting my message through.

Jinjer.
 
The problem is not migrating from ZFS to LVM, but migrating a container that has, for example, its rootfs on ZFS and a mount point on LVM(-thin). Since there is no LVM equivalent of zfs send that can transfer LVM snapshots, we would end up in an inconsistent state after migrating: snapshots would be migrated for some volumes but not for others.

However, we are currently working on improving this situation by detecting when a consistent migration including snapshots is possible.
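The kind of check described here could look roughly like the sketch below. It assumes the storage type of every volume of the container (rootfs and each mount point) is already known and passed in as plain strings such as `zfspool` or `lvmthin`; the function name and its "full-stream"/"latest-only" results are hypothetical, not Proxmox's actual API. Falling back with a warning, rather than silently, is a design choice of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a container's volumes can be migrated
# together with their snapshot history.
# Arguments: the storage type of each volume (rootfs and every mount point).
# Prints "full-stream" if all volumes are on zfspool, otherwise "latest-only".
choose_send_mode() {
    # No volumes given: nothing sensible to decide.
    [ "$#" -gt 0 ] || return 1
    for type in "$@"; do
        if [ "$type" != "zfspool" ]; then
            # Mixed storage: snapshots could only be kept for some volumes,
            # so fall back to the latest state and say so instead of staying silent.
            echo "warning: $type volume present, snapshot history will not be migrated" >&2
            echo "latest-only"
            return 0
        fi
    done
    echo "full-stream"
}

choose_send_mode zfspool zfspool
choose_send_mode zfspool lvmthin
```

Only the all-ZFS case takes the `-R` path, which matches the consistency concern: either every volume keeps its history or the administrator is told that none does.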
 
Fabian, are you talking about the snapshots seen in Proxmox or the filesystem snapshots?

And how do you create such a situation, with multiple mount points on the same LXC? Currently the GUI will not let you do this.

Also, why would one want to support all X * Y combinations? I see no reason to want ZFS for the root and LVM for some other mount point on the same server (i.e. not shared storage). I could envision LVM on shared iSCSI, but then there's no need to migrate anything, as it's shared.

My idea is: you have a ZFS filesystem assigned to an LXC. When migrating, migrate it as a whole, with its integrity intact, not just the last piece, and don't hold back just because some other arbitrary XYZ filesystem cannot do the same. Allow it, and then say "it's by design" that ZFS can do this while LVM cannot; that way you stay consistent with the capabilities of the underlying filesystem.

Currently, you have *data loss*: the ZFS filesystem is migrated and all previous snapshots are *silently* dropped, with no warning and nothing to indicate it. Is this a "good thing"? If it is, it's bad design, and undocumented too.
 

And how do you create such a situation, with multiple mount points on the same LXC? Currently the GUI will not let you do this.

This feature was recently added to the GUI in pve-manager, and was available via pct set for a while before that.

I see your point, which is why I said above that we are working on detecting situations where a migration including snapshots is possible (for example, because only ZFS is used as storage and we are migrating to ZFS as well), to support this use case.

It is easy enough for your setup, but unfortunately it is not as easy in general. As I said, we are working on improving this.
 
There is no need to fix an issue in general when the issue is only relevant to a specific case.

According to you, it is not possible to fix it in general because some backends do not support snapshots, and not all backends that do support them can transfer them "easily".

This is why there are multiple backends: so that one can choose, and it's up to the administrator to choose the appropriate backend.

It would be wise to fix the problem where it can be fixed, and to state that it cannot be fixed where that is not feasible (difficult and/or impossible).

Currently there is data loss during migration of ZFS. Is this acceptable?
 
Containers can have more than one mount point, using different storage types. I would simply reject migration if containers have snapshots on local storage.
Dietmar... it does not reject. It loses snapshots silently. This is silent data loss. If this is by design, the design needs to be reconsidered.
 
OK, I see that there's no hope of this getting fixed. I've patched my install and run some regression tests. It all works, so I'll add it to the set of patches I maintain locally.

If at any stage you change your mind, feel free to contact me.

cheers.
 
For anyone interested, here's the patch:

Code:
# diff -u Storage.pm.org Storage.pm
--- Storage.pm.org      2016-03-16 18:13:01.086242490 +0100
+++ Storage.pm  2016-03-16 18:13:12.753853338 +0100
@@ -512,7 +512,7 @@

            my $snap = "zfs snapshot $zfspath\@__migration__";

-           my $send = "zfs send -pv $zfspath\@__migration__ \| ssh root\@$target_host zfs recv $zfspath";
+           my $send = "zfs send -pvR $zfspath\@__migration__ \| ssh root\@$target_host zfs recv $zfspath";

            my $destroy_target = "ssh root\@$target_host zfs destroy $zfspath\@__migration__";
            run_command($snap);
Restart pvedaemon service to reload the library:

# /etc/init.d/pvedaemon restart
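To check that a patched migration really carried the history across, one option is to compare snapshot names captured on both nodes. The sketch below assumes two text files with one snapshot name per line, for example produced by `zfs list -H -o name -t snapshot -r <dataset>` on the source before migrating and on the target afterwards; the helper name `missing_snapshots` and the example snapshot names are made up for illustration.

```shell
#!/bin/sh
# Sketch: print snapshots that existed before migration but are missing after.
# $1 = file listing snapshots on the source node before migrating
# $2 = file listing snapshots on the target node after migrating
missing_snapshots() {
    # -v: invert match, -x: whole-line match, -F: fixed strings,
    # -f: read the names to exclude from the "after" file.
    # grep exits with status 1 when nothing is missing.
    grep -vxF -f "$2" "$1"
}

# Example with made-up snapshot names and an empty "after" list,
# as with the unpatched migration described in this thread.
before=$(mktemp)
after=$(mktemp)
printf '%s\n' tank/ct@auto-1 tank/ct@auto-2 >"$before"
: >"$after"
missing_snapshots "$before" "$after"
rm -f "$before" "$after"
```

An empty result means every snapshot from the source list is present on the target.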
 