Replication job started failing

Kyryl

New Member
Nov 29, 2020
Hi!

I have two nodes configured, with a container on one of them that has an additional volume mounted (on a dataset configured with 1M record size). Replication used to run fine between node1 and node2, as did migration to node2, but now the incremental replication task has started failing with:

... cannot receive incremental stream: incremental send stream requires -L (--large-block), to match previous receive. ...

I have another container with the 1M record size setup (initially running on node1) exhibiting the same issue. I managed to work around it by completely deleting the volume on the 'destination' node (node1) and doing a full replication once - after that I was able to migrate the container to node1, but now replication node1 -> node2 for that container has started failing with the same error. I guess removing the volume on node2 and doing a full replication once would solve it as well (but I'm keeping it in the current state to be able to investigate).

The only change that comes to mind is a recent upgrade to the latest PVE (apparently pulling in the ZFS 2.0 libs).

PVE version:
# pveversion
pve-manager/6.3-4/0a38c56f (running kernel: 5.4.98-1-pve)

Previously both nodes were running on the "pve-kernel-5.4.78-2-pve".

Thanks.
 
Quickly skimming through the ZFS commit logs, it seems that the error message and handling were introduced rather recently:
https://github.com/openzfs/zfs/commit/7bcb7f0840d1857370dd1f9ee0ad48f9b7939dfd
(meaning they are in ZFS >= 2.0.0 but not in the 0.8.x series, which was shipped with pve-kernel < 5.4.98).

As described in the commit message, this exit is there to prevent a bug when receiving snapshots without '-L' after a previous send+recv that used '-L'.
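Just to illustrate, this is roughly the kind of sequence the check guards against - a minimal sketch with made-up pool/dataset names (pool/ds, backup/ds), not what PVE itself runs:

Code:
# dataset with large records plus some data, so -L actually matters
zfs create -o recordsize=1M pool/ds
dd if=/dev/urandom of=/pool/ds/testfile bs=1M count=4
zfs snapshot pool/ds@snap1

# full send WITH large blocks
zfs send -L pool/ds@snap1 | zfs recv backup/ds

# a later incremental WITHOUT -L is rejected by a ZFS >= 2.0 receiver
zfs snapshot pool/ds@snap2
zfs send -i @snap1 pool/ds@snap2 | zfs recv backup/ds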

Did you ever send+recv this dataset manually (using -L)? (AFAICT PVE's code does not use the '-L' option for zfs send+recv.)

As far as I understand the commit message, removing the destination and running a complete send+recv w/o -L should be OK (also for the future).
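A manual equivalent would look roughly like this (dataset, snapshot and node names are just placeholders - with PVE it should be enough to remove the replicated volume on the target and let the next replication run do a full sync):

Code:
# on the destination node: drop the previously received dataset
zfs destroy -r rpool/data/subvol-100-disk-0

# on the source node: one complete send+recv without -L
zfs send -Rpv rpool/data/subvol-100-disk-0@snap \
    | ssh root@node2 zfs recv -F rpool/data/subvol-100-disk-0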

I hope this helps!
 
Thank you for looking into it, Stoiko! To answer your question - I was not using any custom send+recv, only the PVE replication (via the web UI).

Removing the destination and doing a full send+recv does not seem to help reliably: the next time I attempt a send+recv in the opposite direction, the issue happens again.

I've managed to reproduce it on a freshly created container with the following steps:

1. Create a container on node1, specifying the root volume to be on the storage with 1M record size.

2. Set up replication to node2 in the container's "Replication" submenu and trigger replication manually a couple of times (it succeeds every time - both the first full run and the subsequent incremental ones).

3. Migrate the container to node2 (the incremental send+recv succeeds, and the migration succeeds as well).

4. Trigger replication back to node1 in the container's "Replication" submenu: incremental replication starts failing immediately, with the same message (the job state can also be inspected from the CLI, as sketched below).
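For completeness, a quick way to look at the job from the shell (the dataset name is just an example - PVE names container volumes subvol-<vmid>-disk-N):

Code:
# replication jobs and their last run / error state
pvesr list
pvesr status

# the __replicate_* snapshots PVE created on the container's dataset
zfs list -t snapshot -r rpool/data/subvol-100-disk-0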
 
This issue still seems to be around, and apparently it comes down to PVE's replication not supporting large blocks. I can reproduce it using Kyryl's steps.

Is there any reason -L can't be specified in the send every time? From the manpage:

This flag has no effect if the large_blocks pool feature is disabled, or if the recordsize property of this filesystem has never been set above 128KB.
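To check whether -L would even make a difference for a given setup, something like this should do (pool and dataset names are placeholders):

Code:
# is the large_blocks pool feature enabled/active?
zpool get feature@large_blocks rpool

# has the recordsize ever been raised above the 128K default?
zfs get recordsize rpool/data/subvol-100-disk-0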

Edit: Modifying /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm to add the -L flag to sub volume_export gets replication/migration working again. I'm not seeing any potential side effects, either, given that the flag does nothing unless large_blocks are in use.
 
Any update on this issue?

I just upgraded from v7 to v8 and I'm facing this issue when trying to migrate my VM back to the server.
 
May I ask what is the status on this?

I just discovered the exact same problem.

I can confirm that I could fix it by modifying ZFSPoolPlugin.pm as follows:

Code:
sub volume_export {
    my ($class, $scfg, $storeid, $fh, $volname, $format, $snapshot, $base_snapshot, $with_snapshots) = @_;

    die "unsupported export stream format for $class: $format\n"
        if $format ne 'zfs';

    die "$class storage can only export snapshots\n"
        if !defined($snapshot);

    my $dataset = ($class->parse_volname($volname))[1];

    my $fd = fileno($fh);
    die "internal error: invalid file handle for volume_export\n"
        if !defined($fd);
    $fd = ">&$fd";

    # For zfs we always create a replication stream (-R) which means the remote
    # side will always delete non-existing source snapshots. This should work
    # for all our use cases.
    my $cmd = ['zfs', 'send', '-RpvL'];
...


Note that I changed the $cmd line from

Code:
my $cmd = ['zfs', 'send', '-Rpv'];

to

Code:
my $cmd = ['zfs', 'send', '-RpvL'];

This now allows me to move my LXC container to whatever node I want. The problem does indeed seem to be related to the 1M blocksize: in a lot of tests and comparisons I found that, for my specific use case, ZFS gives better performance and compression ratios with a 1M record size on the pool, but replication then does not work properly without the -L flag.
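For reference, setting that up looks roughly like this (the dataset name is just an example; container subvols created below it inherit the value):

Code:
# raise the record size on the dataset backing the PVE storage
zfs set recordsize=1M rpool/data

# verify, including the inherited values on existing child datasets
zfs get -r recordsize rpool/data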
 
*bump*

When is this bug going to be fixed?
After each Proxmox update I need to manually fix the file

/usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm

and change the "cmd" from "-Rpv" to "-RpvL". This is because my datasets use the 1M blocksize.
I see that this file has been edited recently; it already looks a bit different from the last time I fixed it, but the "L" flag is still missing.
This means that after each Proxmox update the CT replication is broken until I fix it manually :-(
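Until this lands upstream, one way to re-apply the edit after an update is a quick sed over the plugin - a sketch only, assuming the $cmd line still contains the '-Rpv' string quoted earlier in this thread (check the file first, and note that which services need a restart may depend on the PVE version):

Code:
# re-add the -L flag to the zfs send command in the ZFS pool plugin
sed -i "s/'zfs', 'send', '-Rpv'/'zfs', 'send', '-RpvL'/" \
    /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm

# restart the PVE services that load the storage plugin
systemctl restart pvedaemon pveproxy pvescheduler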
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!