Replication not possible on encrypted data pool?

Elleni

I have set up a 2-node cluster, and migration of a VM from one node to the other works fine, but when trying to enable replication, I get the following error:
Code:
2020-11-03 22:44:01 106-0: start replication job
2020-11-03 22:44:01 106-0: guest => VM 106, running => 19766
2020-11-03 22:44:01 106-0: volumes => name:vm-106-disk-0
2020-11-03 22:44:01 106-0: freeze guest filesystem
2020-11-03 22:44:01 106-0: create snapshot '__replicate_106-0_1604439841__' on name:vm-106-disk-0
2020-11-03 22:44:01 106-0: thaw guest filesystem
2020-11-03 22:44:01 106-0: using secure transmission, rate limit: none
2020-11-03 22:44:01 106-0: full sync 'name:vm-106-disk-0' (__replicate_106-0_1604439841__)
2020-11-03 22:44:02 106-0: cannot send pool/name/vm-106-disk-0@__replicate_106-0_1604439841__: encrypted dataset pool/name/vm-106-disk-0 may not be sent with properties without the raw flag
2020-11-03 22:44:02 106-0: command 'zfs send -Rpv -- pool/name/vm-106-disk-0@__replicate_106-0_1604439841__' failed: exit code 1
2020-11-03 22:44:02 106-0: cannot receive: failed to read from stream
2020-11-03 22:44:02 106-0: cannot open 'pool/name/vm-106-disk-0': dataset does not exist
2020-11-03 22:44:02 106-0: command 'zfs recv -F -- pool/name/vm-106-disk-0' failed: exit code 1
2020-11-03 22:44:02 106-0: delete previous replication snapshot '__replicate_106-0_1604439841__' on name:vm-106-disk-0
2020-11-03 22:44:02 106-0: end replication job with error: command 'set -o pipefail && pvesm export name:vm-106-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_106-0_1604439841__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=nodename' root@ip.ad.dr.ess -- pvesm import name:vm-106-disk-0 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 1
How can the raw flag be set? Or is this currently not supported?
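(For reference: the "raw flag" mentioned in the error is the -w option of zfs send, which transmits an encrypted dataset as it is stored on disk, without decrypting it. A rough sketch of such a raw send done by hand, using the dataset and snapshot names from the log above and the target address as a placeholder:)
Code:
# sketch only: raw (-w) send of the encrypted volume, received as-is on the target
zfs send -w pool/name/vm-106-disk-0@__replicate_106-0_1604439841__ \
  | ssh root@ip.ad.dr.ess zfs recv -F pool/name/vm-106-disk-0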

I have all my VMs on node1 on an encrypted data pool. I tried to remove the second node from the cluster, create an unencrypted pool on it, and re-join it to the cluster, but that does not work either. So what are my options if I'd like to enable replication of the VMs to a second cluster node while keeping the encrypted data pool?

While migrating some VMs from one node to another, I realized that some of the VMs migrate without problems while others fail for the same reason. Below are the logs of a failed one. These VMs are on the same data pool, so I don't understand what the difference could be and why some can be migrated to the other node while others fail. On the other hand, this makes me hope that it is not a matter of replication being impossible on encrypted datasets. Thanks in advance for any assistance.
Code:
2020-11-04 00:14:04 starting migration of VM ID to node 'nodename' (192.168.x.y)
2020-11-04 00:14:05 found local, replicated disk 'name:vm-106-disk-0' (in current VM config)
2020-11-04 00:14:05 scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
2020-11-04 00:14:05 replicating disk images
2020-11-04 00:14:05 start replication job
2020-11-04 00:14:05 guest => VM 106, running => 19766
2020-11-04 00:14:05 volumes => name:vm-106-disk-0
2020-11-04 00:14:06 freeze guest filesystem
2020-11-04 00:14:06 create snapshot '__replicate_106-0_1604445245__' on name:vm-106-disk-0
2020-11-04 00:14:06 thaw guest filesystem
2020-11-04 00:14:06 using secure transmission, rate limit: none
2020-11-04 00:14:06 full sync 'name:vm-106-disk-0' (__replicate_106-0_1604445245__)
2020-11-04 00:14:07 cannot send pool/name/vm-106-disk-0@__replicate_106-0_1604445245__: encrypted dataset pool/name/vm-106-disk-0 may not be sent with properties without the raw flag
2020-11-04 00:14:07 command 'zfs send -Rpv -- pool/name/vm-106-disk-0@__replicate_106-0_1604445245__' failed: exit code 1
2020-11-04 00:14:07 cannot receive: failed to read from stream
2020-11-04 00:14:07 cannot open 'pool/name/vm-106-disk-0': dataset does not exist
2020-11-04 00:14:07 command 'zfs recv -F -- pool/name/vm-106-disk-0' failed: exit code 1
send/receive failed, cleaning up snapshot(s)..
2020-11-04 00:14:07 delete previous replication snapshot '__replicate_106-0_1604445245__' on name:vm-106-disk-0
2020-11-04 00:14:07 end replication job with error: command 'set -o pipefail && pvesm export name:vm-106-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_106-0_1604445245__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=nodename' root@192.168.x.y -- pvesm import name:vm-106-disk-0 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 1
2020-11-04 00:14:07 ERROR: Failed to sync data - command 'set -o pipefail && pvesm export name:vm-106-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_106-0_1604445245__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=nodename' root@192.168.x.y -- pvesm import name:vm-106-disk-0 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 1
2020-11-04 00:14:07 aborting phase 1 - cleanup resources
2020-11-04 00:14:07 scsi0: removing block-dirty-bitmap 'repl_scsi0'
2020-11-04 00:14:07 ERROR: migration aborted (duration 00:00:03): Failed to sync data - command 'set -o pipefail && pvesm export name:vm-106-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_106-0_1604445245__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=nodename' root@192.168.x.y -- pvesm import name:vm-106-disk-0 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 1
TASK ERROR: migration aborted
 
Hi,
this is currently not possible. See this bug report. Online migration without replication should work, as it uses a different mechanism (Qemu NBD migration) instead of ZFS send/receive.
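(A minimal sketch of such an online migration from the CLI, assuming VM 106 and a target node called 'nodename' as in the logs; without a replication job the local disk is copied through QEMU's NBD block migration rather than zfs send/receive:)
Code:
# online migration of a running VM; local disks go over QEMU NBD, so ZFS encryption is not an issue here
qm migrate 106 nodename --online --with-local-disks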
 
Thanks for the confirmation. Strangely, it does work with some of the VMs, while with others I get the above error, which I do not really understand, as the VMs have similar configurations and are on the same pool. Since it's not possible, I will probably change to an unencrypted pool for now. Can you say whether this is being worked on and give an approximate time estimate for this feature? As I need to enable replication, I will remove the cluster nodes, recreate unencrypted pools for now, and rebuild the cluster. But my boss made it a requirement that the data reside on an encrypted drive to be safe, so I would like to know whether replication on encrypted pools will be implemented in the future.
 
Yes, we do plan to implement it (see comment 10 in the bug report), but I honestly cannot give any time estimate. Since we plan to get the needed changes into upstream ZFS, it also depends on them, and at the moment we are quite busy with the development of Proxmox Backup Server.
 
Which is a great thing; I am currently testing it :)

Thanks for the information.
 
As Fabian Grünbichler stated:
Code:
it seems like this issue was lost when upstream moved their github repository.
AFAIK there hasn't been any progress on this front yet.

Thus I would like to kindly ask if this issue could be recreated upstream? Otherwise there probably won't be any progress there, right? It would be really useful if replication on ZFS-encrypted pools became possible in the near future. And if the issue has already been recreated, it would be nice to have a link to it so we can track its progress, as we are eagerly waiting for this functionality to become available.
 
One could recreate the issue, but the discussion was rather stale anyway (IIRC, half a year without any messages before we bumped it, and not much after that). I think upstream would be much happier with a pull request, which is what @Stoiko Ivanov is working on. But again, we cannot give any time estimate.
 
I understand, thank you for the information. Is there any way to track the status of @Stoiko Ivanov's work? I would like to follow it in order to know when there is any progress, and maybe even help with testing if I can assist.
 
You can subscribe to the bug report (you need an account in the Bugzilla for that) and keep an eye on the developer mailing list (the initial version is likely going to be posted there).
 
In comment 13 of the mentioned bug report I read:

...using -w just for encrypted datasets in volume_export seems to be no problem in practice...

Can someone explain to me what that means, and whether this could be an intermediate solution until the pull request is done? Could somebody please describe this solution? I would also appreciate any comments on whether this is a good solution or whether there are any drawbacks in a production environment. And is this modification update-proof?

We want the data to be encrypted in case of stolen hardware, so I am considering trying this.
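(For illustration: the quoted comment refers to the raw flag of zfs send. Compared with the command from the replication log above, the change would roughly look like this; a sketch, not the actual patch:)
Code:
# current send used by replication, fails on encrypted datasets:
zfs send -Rpv -- pool/name/vm-106-disk-0@__replicate_106-0_1604439841__
# with the raw flag (-w) added, the encrypted dataset is sent as stored on disk:
zfs send -Rpvw -- pool/name/vm-106-disk-0@__replicate_106-0_1604439841__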
 
The problem with just using -w is that the encryption root of the volume will be the volume itself on the target. It doesn't share the same key as the dataset/pool it was migrated into. So you'd have to unlock every migrated volume on its own after every reboot (or unmount).
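(You can see this on the target after such a raw migration; a sketch, assuming the dataset names from the logs:)
Code:
# after a raw (-w) receive, the volume is its own encryption root on the target
zfs get encryptionroot,keystatus,keylocation pool/name/vm-106-disk-0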
 
We have the following setup: the unencrypted root/boot pool rpool is on a separate disk, and we want the data pool, which is set up on two separate mirrored disks, to be encrypted. So the workflow when rebooting a node is to unlock the data pool with zfs load-key anyway.

What and where would this -w have to be put, and would this work in the above scenario? We need both migration and replication of VMs to work.
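(A sketch of that post-reboot unlock step, assuming a passphrase-encrypted data pool simply called 'pool'; names are placeholders:)
Code:
# load the encryption key(s) for the data pool and its children after boot
zfs load-key -r pool
# mount filesystem datasets; zvols used as VM disks only need the key loaded
zfs mount -a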
 
To be more specific, after you migrate a volume with -w, you need to load the key manually before you can start using the volume on the target (and it will be the key the volume (or the containing dataset) had on the source).

Afterwards, you can correct the encryptionroot by using
Code:
zfs change-key -i <pool/volume>
to avoid needing to load the key for every migrated volume manually after each reboot.
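(Putting both steps together for a single migrated volume, using the names from the logs above as placeholders:)
Code:
# 1) on the target: load the key the volume had on the source
zfs load-key pool/name/vm-106-disk-0
# 2) let it inherit the parent's key, so unlocking the pool covers it again
zfs change-key -i pool/name/vm-106-disk-0
# 3) verify: encryptionroot should now point at the parent dataset/pool
zfs get encryptionroot pool/name/vm-106-disk-0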
 
Hi, I still don't understand exactly how to use the -w switch. It probably has to be done on the console, and it's presumably not something that can be put in a file somewhere so that I can keep using the Proxmox web interface to

a) configure replication of VMs to both nodes and
b) migrate VMs from one node to another, right?

Maybe I'd better wait for the official solution that @Stoiko Ivanov might be working on and postpone the encryption of the data pool where our VMs reside.
 
Yes, you'd need to patch the volume_export/volume_import functions in /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm as described in the bug report.
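(The actual change is Perl code inside that storage plugin and is described in the bug report; conceptually it boils down to a per-dataset check like the following shell sketch, shown here only to illustrate the idea:)
Code:
# only encrypted datasets would get the raw (-w) flag added to their zfs send
if [ "$(zfs get -H -o value encryption pool/name/vm-106-disk-0)" != "off" ]; then
    echo "encrypted: send with -w (raw)"
else
    echo "unencrypted: send as before"
fi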
 
What is the status of this issue? Is it possible in the meantime to have an encrypted data pool where the VM disks reside, with replication enabled, on the current stable Proxmox VE version?
 
Unfortunately not yet. Please CC yourself to the bug report to get updates on the issue.
 
