A big THANK YOU for this comprehensive guide. I have implemented it and can confirm that the approach works.
Results:
ONLINE MIGRATION of VMs¹ only works if you set up a replication job beforehand, and
OFFLINE MIGRATION works out of the box for VMs and CTs.
¹ As CTs are always restarted during migration, a CT online migration is effectively an offline migration, so a replication job is not strictly necessary for CTs.
And one small detail that might be good for others to know:
1. Applying your patch file (the code you posted) using
patch /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm /root/ZFSPoolPlugin.pm.patch
resulted in an error because of a missing bracket:
Code:
2024-11-23 23:28:29 [pve] Missing right curly or square bracket at /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm line 870, at end of line
2024-11-23 23:28:29 [pve] syntax error at /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm line 870, at EOF
2024-11-23 23:28:29 [pve] Compilation failed in require at /usr/share/perl5/PVE/Storage.pm line 38, <DATA> line 960.
2024-11-23 23:28:29 [pve] BEGIN failed--compilation aborted at /usr/share/perl5/PVE/Storage.pm line 38, <DATA> line 960.
2024-11-23 23:28:29 [pve] Compilation failed in require at /usr/share/perl5/PVE/CLI/pvesm.pm line 19, <DATA> line 960.
2024-11-23 23:28:29 [pve] BEGIN failed--compilation aborted at /usr/share/perl5/PVE/CLI/pvesm.pm line 19, <DATA> line 960.
2024-11-23 23:28:29 [pve] Compilation failed in require at /usr/sbin/pvesm line 6, <DATA> line 960.
2024-11-23 23:28:29 [pve] BEGIN failed--compilation aborted at /usr/sbin/pvesm line 6, <DATA> line 960.
I resolved this by adding a closing curly bracket before your comment ### set key location and load key. So my patch file now looks like this:
ZFSPoolPlugin.pm.patch
Code:
755,761c755
< my $cmd = ['zfs', 'send'];
< my $encrypted = $class->zfs_get_properties($scfg, 'encryption', "$scfg->{pool}/$dataset");
< if ($encrypted !~ m/^off$/) {
< push @$cmd, '-Rpvw';
< } else {
< push @$cmd, '-Rpv';
< }
---
> my $cmd = ['zfs', 'send', '-Rpv'];
817,829d810
< }
< ### set key location and load key
< my $encrypted = $class->zfs_get_properties($scfg, 'encryption', $zfspath);
< if ($encrypted !~ m/^off$/) {
< my $keystatus = $class->zfs_get_properties($scfg, 'keystatus', $zfspath);
< if ($keystatus eq "unavailable") {
< my ($parent) = $zfspath =~ /(.*)\/.*$/;
< my $keylocation = $class->zfs_get_properties($scfg, 'keylocation', $parent);
< my $keyformat = $class->zfs_get_properties($scfg, 'keyformat', $parent);
< eval { run_command(['zfs', 'set', "keylocation=$keylocation", $zfspath]) };
< eval { run_command(['zfs', 'set', "keyformat=$keyformat", $zfspath]) };
< eval { run_command(['zfs', 'load-key', $zfspath]) };
< }
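A quick sanity check I would add after applying the patch on each node (my own suggestion, not part of the original guide; which services actually need a restart is an assumption on my side):
Code:
perl -c /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm   # should print "syntax OK"
systemctl restart pvedaemon pveproxy pvestatd           # let the running daemons pick up the patched module
If something goes completely wrong, the unpatched file can be restored with apt install --reinstall libpve-storage-perl.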
What I did and my results
1. Made sure I am using the same encryption key for the ZFS datasets on my nodes. As I have two nodes, I simply took the keyfile from node #1 and copied it over to node #2, to the exact same place. Since the ZFS dataset tank/encrypted on node #2 had already been unlocked during startup using the old key, I issued the following command to change its encryption key to the new keyfile (the one from node #1):
zfs change-key -l -o keylocation=file:///root/tank_key -o keyformat=raw tank/encrypted
This is optional, but in my opinion it makes sense so you do not mess with already created ZFS datasets.
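For completeness, copying and verifying the keyfile could look like this (just a sketch; the hostname pve2 is taken from my migration logs further down, adjust it to your setup):
Code:
scp /root/tank_key root@pve2:/root/tank_key   # copy the keyfile from node #1 to node #2
sha256sum /root/tank_key                      # compare the checksum on node #1 ...
ssh root@pve2 sha256sum /root/tank_key        # ... with the one on node #2, they must match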
2. Created a new ZFS dataset below the already encrypted root (e.g. tank/encrypted/vm-data-migrate) on both nodes:
zfs create -o mountpoint=/storage/tank-encrypted/vm-data-migrate tank/encrypted/vm-data-migrate
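To double-check that the new dataset really inherited the encryption settings from tank/encrypted, something like this can be used (my own addition):
Code:
zfs get -r encryption,encryptionroot,keystatus tank/encrypted/vm-data-migrate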
3. Added the new ZFS dataset as a storage (storage ID tank-encrypted-vm-data-migrate) to my cluster using the Proxmox GUI.
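For reference, the same thing should also be possible on the command line; a sketch (the storage ID matches my logs below, the content types and sparse setting are just what I would pick):
Code:
pvesm add zfspool tank-encrypted-vm-data-migrate --pool tank/encrypted/vm-data-migrate --content images,rootdir --sparse 1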
4. Moved the storage of one (or more) of my VMs into this new ZFS dataset using the Proxmox GUI (select the VM -> Hardware -> select the hard disk -> Disk Action -> Move Storage).
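The CLI equivalent would be something like this (a sketch; --delete 1 removes the source disk after a successful move, leave it out to keep the old copy):
Code:
qm disk move 999 scsi0 tank-encrypted-vm-data-migrate --delete 1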
5. Now the storage of e.g. VM 999 was located at tank/encrypted/vm-data-migrate/vm-999-disk-0, but it was still not its own encryption root:
zfs get name,keylocation,keyformat,encryption,keystatus,encryptionroot tank/encrypted/vm-data-migrate/vm-999-disk-0
Code:
NAME PROPERTY VALUE SOURCE
tank/encrypted/vm-data-migrate/vm-999-disk-0 name tank/encrypted/vm-data-migrate/vm-999-disk-0 -
tank/encrypted/vm-data-migrate/vm-999-disk-0 keylocation none default
tank/encrypted/vm-data-migrate/vm-999-disk-0 keyformat raw -
tank/encrypted/vm-data-migrate/vm-999-disk-0 encryption aes-256-gcm -
tank/encrypted/vm-data-migrate/vm-999-disk-0 keystatus available -
tank/encrypted/vm-data-migrate/vm-999-disk-0 encryptionroot tank/encrypted -
6. Encrypted the ZFS dataset for VM 999 (made the ZFS dataset its own encryption root, without inheritance) using the same keyfile as for the already encrypted root:
zfs change-key -l -o keylocation=file:///root/tank_key -o keyformat=raw tank/encrypted/vm-data-migrate/vm-999-disk-0
Note: During my tests it was not possible to take a ZFS dataset of VM 999 that sits below an unencrypted root and add encryption to it afterwards using the change-key command. As far as I can tell, ZFS does not allow enabling encryption on an already existing dataset. Another method would be to create the encrypted dataset beforehand and sync the unencrypted one into it; since zfs send operates on snapshots, a snapshot has to be created first: zfs send tank/vm-data-migrate/vm-999-disk-0@<snapshot> | zfs receive -x encryption tank/encrypted/vm-data-migrate/vm-999-disk-0
7. Seems like it has worked out:
zfs get name,keylocation,keyformat tank/encrypted/vm-data-migrate/vm-999-disk-0
Code:
NAME PROPERTY VALUE SOURCE
tank/encrypted/vm-data-migrate/vm-999-disk-0 name tank/encrypted/vm-data-migrate/vm-999-disk-0 -
tank/encrypted/vm-data-migrate/vm-999-disk-0 keylocation file:///root/tank_key local
tank/encrypted/vm-data-migrate/vm-999-disk-0 keyformat raw -
tank/encrypted/vm-data-migrate/vm-999-disk-0 encryption aes-256-gcm -
tank/encrypted/vm-data-migrate/vm-999-disk-0 keystatus available -
tank/encrypted/vm-data-migrate/vm-999-disk-0 encryptionroot tank/encrypted/vm-data-migrate/vm-999-disk-0 -
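Before migrating I would also verify the whole encrypted subtree in one go (my own addition, not something the guide requires):
Code:
zfs get -r -t filesystem,volume encryptionroot,keystatus,keylocation tank/encrypted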
8. Now I patched ZFSPoolPlugin.pm on both nodes using the patch from above and started my first migration of VM 999.
9. I started with an OFFLINE MIGRATION from node #1 to node #2. During the process the following error messages appeared:
Migration log:
Code:
copying local disk images
full send of tank/encrypted/vm-data-migrate/vm-999-disk-0@__migration__ estimated size is 13.8G
total estimated size is 13.8G
TIME SENT SNAPSHOT tank/encrypted/vm-data-migrate/vm-999-disk-0@__migration__
[pve] cannot set property for 'tank/encrypted/vm-data-migrate/vm-999-disk-0': keylocation must not be 'none' for encrypted datasets
[pve] cannot set property for 'tank/encrypted/vm-data-migrate/vm-999-disk-0': 'keyformat' is readonly
[pve] successfully imported 'tank-encrypted-vm-data-migrate:vm-999-disk-0'
volume 'tank-encrypted-vm-data-migrate:vm-999-disk-0' is 'tank-encrypted-vm-data-migrate:vm-999-disk-0' on the target
migration finished successfully (duration 00:00:34)
But the migration seems to have worked out (command executed on node #2):
zfs get name,keylocation,keyformat tank/encrypted/vm-data-migrate/vm-999-disk-0
Code:
NAME PROPERTY VALUE SOURCE
tank/encrypted/vm-data-migrate/vm-999-disk-0 name tank/encrypted/vm-data-migrate/vm-999-disk-0 -
tank/encrypted/vm-data-migrate/vm-999-disk-0 keylocation file:///root/tank_key local
tank/encrypted/vm-data-migrate/vm-999-disk-0 keyformat raw -
tank/encrypted/vm-data-migrate/vm-999-disk-0 encryption aes-256-gcm -
tank/encrypted/vm-data-migrate/vm-999-disk-0 keystatus available -
tank/encrypted/vm-data-migrate/vm-999-disk-0 encryptionroot tank/encrypted/vm-data-migrate/vm-999-disk-0 -
Migrating back also worked flawlessly.
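For those who prefer the CLI over the GUI, the migrations can also be started like this (a sketch based on the standard qm syntax, nothing specific to this guide):
Code:
qm migrate 999 pve2                              # offline migration (VM stopped)
qm migrate 999 pve2 --online --with-local-disks  # online migration with local disks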
10. Now it was time to try the ONLINE MIGRATION too. And I can say: initially it did not work for me! The migration itself did not throw any error:
Migration log:
Code:
starting VM 999 on remote node 'pve2'
volume 'tank-encrypted-vm-data-migrate:vm-999-disk-0' is 'tank-encrypted-vm-data-migrate:vm-999-disk-0' on the target
start remote tunnel
ssh tunnel ver 1
starting storage migration
scsi0: start migration to nbd:192.168.9.1:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 0.0 B of 16.0 GiB (0.00%) in 0s
[...]
drive-scsi0: transferred 16.0 GiB of 16.0 GiB (100.00%) in 28s, ready
all 'mirror' jobs are ready
switching mirror jobs to actively synced mode
drive-scsi0: switching to actively synced mode
drive-scsi0: successfully switched to actively synced mode
starting online/live migration on tcp:192.168.9.1:60000
set migration capabilities
migration downtime limit: 100 ms
migration cachesize: 256.0 MiB
set migration parameters
spice client_migrate_info
start migrate command to tcp:192.168.9.1:60000
migration active, transferred 959.4 MiB of 2.2 GiB VM-state, 781.6 MiB/s
migration active, transferred 1.5 GiB of 2.2 GiB VM-state, 793.3 MiB/s
average migration speed: 746.9 MiB/s - downtime 21 ms
migration status: completed
all 'mirror' jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0: mirror-job finished
stopping NBD storage migration server on target.
Waiting for spice server migration
migration finished successfully (duration 00:00:45)
But after checking the ZFS dataset on node #2, I saw that the encryption parameters had not been preserved:
zfs get name,keylocation,keyformat,encryption,keystatus,encryptionroot tank/encrypted/vm-data-migrate/vm-999-disk-0
Code:
NAME PROPERTY VALUE SOURCE
tank/encrypted/vm-data-migrate/vm-999-disk-0 name tank/encrypted/vm-data-migrate/vm-999-disk-0 -
tank/encrypted/vm-data-migrate/vm-999-disk-0 keylocation none default
tank/encrypted/vm-data-migrate/vm-999-disk-0 keyformat raw -
tank/encrypted/vm-data-migrate/vm-999-disk-0 encryption aes-256-gcm -
tank/encrypted/vm-data-migrate/vm-999-disk-0 keystatus available -
tank/encrypted/vm-data-migrate/vm-999-disk-0 encryptionroot tank/encrypted -
After I set up a replication job from node #1 to node #2, the online migration worked and the encryption parameters were preserved and stayed in sync.
Migration log:
Code:
use dedicated network address for sending migration traffic (192.168.9.2)
starting migration of VM 999 to node 'pve2' (192.168.9.2)
found local, replicated disk 'tank-encrypted-vm-data-migrate:vm-999-disk-0' (attached)
scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
replicating disk images
start replication job
guest => VM 999, running => 3573163
volumes => tank-encrypted-vm-data-migrate:vm-999-disk-0
create snapshot '__replicate_999-0_1732410934__' on tank-encrypted-vm-data-migrate:vm-999-disk-0
using insecure transmission, rate limit: none
incremental sync 'tank-encrypted-vm-data-migrate:vm-999-disk-0' (__replicate_999-0_1732410900__ => __replicate_999-0_1732410934__)
send from @__replicate_999-0_1732410900__ to tank/encrypted/vm-data-migrate/vm-999-disk-0@__replicate_999-0_1732410934__ estimated size is 1.50M
total estimated size is 1.50M
TIME SENT SNAPSHOT tank/encrypted/vm-data-migrate/vm-999-disk-0@__replicate_999-0_1732410934__
[pve2] successfully imported 'tank-encrypted-vm-data-migrate:vm-999-disk-0'
delete previous replication snapshot '__replicate_999-0_1732410900__' on tank-encrypted-vm-data-migrate:vm-999-disk-0
(remote_finalize_local_job) delete stale replication snapshot '__replicate_999-0_1732410900__' on tank-encrypted-vm-data-migrate:vm-999-disk-0
end replication job
starting VM 999 on remote node 'pve2'
volume 'tank-encrypted-vm-data-migrate:vm-999-disk-0' is 'tank-encrypted-vm-data-migrate:vm-999-disk-0' on the target
start remote tunnel
ssh tunnel ver 1
starting storage migration
scsi0: start migration to nbd:192.168.9.2:60002:exportname=drive-scsi0
drive mirror re-using dirty bitmap 'repl_scsi0'
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 1.4 MiB of 1.4 MiB (100.00%) in 0s
drive-scsi0: transferred 1.4 MiB of 1.4 MiB (100.00%) in 1s, ready
all 'mirror' jobs are ready
switching mirror jobs to actively synced mode
drive-scsi0: switching to actively synced mode
drive-scsi0: successfully switched to actively synced mode
starting online/live migration on tcp:192.168.9.2:60001
set migration capabilities
migration downtime limit: 100 ms
migration cachesize: 256.0 MiB
set migration parameters
spice client_migrate_info
start migrate command to tcp:192.168.9.2:60001
migration active, transferred 1.0 GiB of 2.2 GiB VM-state, 1.2 GiB/s
average migration speed: 1.1 GiB/s - downtime 57 ms
migration status: completed
all 'mirror' jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0: mirror-job finished
# /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve2' -o 'UserKnownHostsFile=/etc/pve/nodes/pve2/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.9.2 pvesr set-state 999 \''{"local/pve1":{"last_iteration":1732410934,"last_node":"pve1","duration":3.846521,"last_try":1732410934,"last_sync":1732410934,"storeid_list":["tank-encrypted-vm-data-migrate"],"fail_count":0}}'\'
stopping NBD storage migration server on target.
Waiting for spice server migration
migration finished successfully (duration 00:00:20)
TASK OK
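In case someone wants to set up the replication job from the CLI instead of the GUI, it could look roughly like this (a sketch; the job ID 999-0 matches the snapshot names in the log above, the schedule is just an example):
Code:
pvesr create-local-job 999-0 pve2 --schedule '*/15'
pvesr status                                        # verify that the job ran at least once before migrating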
I used PVE 8.2.7 for these tests. Hints, thoughts and comments are welcome!