Replication error on version 6.0-5

ricardoj · Aug 3, 2019

Hi,

I'm testing some DR scenarios and ran into replication errors.

There is no HA configuration for the single VM I'm using for this test.

The scenario is:

- 3 Proxmox 6.0-5 nodes configured as a cluster

- Nodes are named pve-t01 / pve-t02 / pve-t03

- Just one VM, which stays stopped during all tests

I can migrate this VM as many times as I want and let replication run for hours without any error.

To simulate a node crash, I shut down the node where this (stopped) VM is.

Once that node was no longer responding, I moved the VM to one of the other 2 nodes:

mv /etc/pve/nodes/pve-t01/qemu-server/111.conf /etc/pve/nodes/pve-t03/qemu-server/111.conf
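The `mv` above is the whole manual recovery step. As a hypothetical sketch (the function name and the `PVE_ROOT` override are made up for illustration and are not part of Proxmox), it could be wrapped with a couple of sanity checks. It only moves the config, exactly like the command above; it does not transfer any replication state, and it should only be run once the failed node is really down.

```sh
#!/bin/sh
# Hypothetical helper: move a guest config from a failed node's directory
# to a surviving node's directory, i.e. the same mv as above plus checks.
# PVE_ROOT is an illustrative override (default /etc/pve) so the function
# can be tried on a scratch directory; it is not a Proxmox setting.
# The body runs in a subshell, so its variables do not leak to the caller.
recover_guest_config() (
    vmid=$1 from=$2 to=$3
    root=${PVE_ROOT:-/etc/pve}
    src="$root/nodes/$from/qemu-server/$vmid.conf"
    dst="$root/nodes/$to/qemu-server/$vmid.conf"

    # The config must still be in the failed node's directory.
    if [ ! -f "$src" ]; then
        echo "no config found at $src" >&2
        exit 1
    fi
    # Never overwrite an existing config on the target node.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        exit 1
    fi
    # Only the config moves; the disk must already exist on the target
    # (here: the replicated ZFS volume). Replication state is untouched.
    mv -- "$src" "$dst"
)
```

For example, `recover_guest_config 111 pve-t01 pve-t03` does the same as the command above.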

I can see the VM on the target node, and the replication configuration changes nodes as expected.

But most times I do this, replication to one of the nodes stops working, and I must delete the VM from that node so the replication process can continue.

I found this old thread, but in my case there is only one small, stopped VM.

Here are some replication logs.

Test #1 - VM was on node T01 and I shut down that node

- I moved the VM from node T01 to node T03

Replication to node T02 fails with the following log:

======================
2019-08-03 16:28:03 111-0: start replication job
2019-08-03 16:28:03 111-0: guest => VM 111, running => 0
2019-08-03 16:28:03 111-0: volumes => L-Stor:vm-111-disk-1
2019-08-03 16:28:03 111-0: delete stale replication snapshot '__replicate_111-0_1564860301__' on L-Stor:vm-111-disk-1
2019-08-03 16:28:04 111-0: (remote_prepare_local_job) delete stale replication snapshot '__replicate_111-0_1564860301__' on L-Stor:vm-111-disk-1
2019-08-03 16:28:04 111-0: create snapshot '__replicate_111-0_1564860483__' on L-Stor:vm-111-disk-1
2019-08-03 16:28:04 111-0: full sync 'L-Stor:vm-111-disk-1' (__replicate_111-0_1564860483__)
2019-08-03 16:28:04 111-0: full send of L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__ estimated size is 1.46G
2019-08-03 16:28:04 111-0: send from @__replicate_111-1_1564860304__ to L-Stor/vm-111-disk-1@__replicate_111-0_1564860483__ estimated size is 624B
2019-08-03 16:28:04 111-0: total estimated size is 1.46G
2019-08-03 16:28:04 111-0: TIME SENT SNAPSHOT L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:28:04 111-0: L-Stor/vm-111-disk-1 name L-Stor/vm-111-disk-1 -
2019-08-03 16:28:04 111-0: volume 'L-Stor/vm-111-disk-1' already exists
2019-08-03 16:28:04 111-0: 140160 B 136.9 KB 0.77 s 182784 B/s 178.50 KB/s
2019-08-03 16:28:04 111-0: write: Broken pipe
2019-08-03 16:28:04 111-0: warning: cannot send 'L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__': signal received
2019-08-03 16:28:04 111-0: warning: cannot send 'L-Stor/vm-111-disk-1@__replicate_111-0_1564860483__': Broken pipe
2019-08-03 16:28:05 111-0: cannot send 'L-Stor/vm-111-disk-1': I/O error
2019-08-03 16:28:05 111-0: command 'zfs send -Rpv -- L-Stor/vm-111-disk-1@__replicate_111-0_1564860483__' failed: exit code 1
2019-08-03 16:28:05 111-0: delete previous replication snapshot '__replicate_111-0_1564860483__' on L-Stor:vm-111-disk-1
2019-08-03 16:28:05 111-0: end replication job with error: command 'set -o pipefail && pvesm export L-Stor:vm-111-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_111-0_1564860483__ | /usr/bin/cstream -t 50000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-t02' root@192.168.0.228 -- pvesm import L-Stor:vm-111-disk-1 zfs - -with-snapshots 1' failed: exit code 255
======================

After I deleted the VM on node T02, replication continued to work:

======================
2019-08-03 16:39:00 111-0: start replication job
2019-08-03 16:39:00 111-0: guest => VM 111, running => 12213
2019-08-03 16:39:00 111-0: volumes => L-Stor:vm-111-disk-1
2019-08-03 16:39:01 111-0: freeze guest filesystem
2019-08-03 16:39:02 111-0: create snapshot '__replicate_111-0_1564861140__' on L-Stor:vm-111-disk-1
2019-08-03 16:39:02 111-0: thaw guest filesystem
2019-08-03 16:39:02 111-0: full sync 'L-Stor:vm-111-disk-1' (__replicate_111-0_1564861140__)
2019-08-03 16:39:02 111-0: full send of L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__ estimated size is 1.46G
2019-08-03 16:39:02 111-0: send from @__replicate_111-1_1564860304__ to L-Stor/vm-111-disk-1@__replicate_111-1_1564860900__ estimated size is 1.34M
2019-08-03 16:39:02 111-0: send from @__replicate_111-1_1564860900__ to L-Stor/vm-111-disk-1@__replicate_111-0_1564861140__ estimated size is 361K
2019-08-03 16:39:02 111-0: total estimated size is 1.46G
2019-08-03 16:39:03 111-0: TIME SENT SNAPSHOT L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:04 111-0: 16:39:04 40.4M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:05 111-0: 16:39:05 83.7M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:06 111-0: 16:39:06 126M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:07 111-0: 16:39:07 167M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:08 111-0: 16:39:08 205M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:09 111-0: 16:39:09 250M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:10 111-0: 16:39:10 285M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:11 111-0: 16:39:11 323M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:12 111-0: 16:39:12 355M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:13 111-0: 16:39:13 395M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:14 111-0: 16:39:14 436M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:15 111-0: 16:39:15 467M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:16 111-0: 16:39:16 512M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:17 111-0: 16:39:17 549M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:18 111-0: 16:39:18 588M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:19 111-0: 16:39:19 635M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:20 111-0: 16:39:20 667M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:21 111-0: 16:39:21 705M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:22 111-0: 16:39:22 744M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:23 111-0: 16:39:23 777M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:24 111-0: 16:39:24 825M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:25 111-0: 16:39:25 860M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:26 111-0: 16:39:26 892M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:27 111-0: 16:39:27 924M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:28 111-0: 16:39:28 973M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:29 111-0: 16:39:29 1005M L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:30 111-0: 16:39:30 1.03G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:31 111-0: 16:39:31 1.06G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:32 111-0: 16:39:32 1.09G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:33 111-0: 16:39:33 1.13G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:34 111-0: 16:39:34 1.16G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:35 111-0: 16:39:35 1.21G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:36 111-0: 16:39:36 1.23G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:37 111-0: 16:39:37 1.27G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:38 111-0: 16:39:38 1.31G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:39 111-0: 16:39:39 1.34G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:40 111-0: 16:39:40 1.38G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:41 111-0: 16:39:41 1.41G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:42 111-0: 16:39:42 1.45G L-Stor/vm-111-disk-1@__replicate_111-1_1564860304__
2019-08-03 16:39:42 111-0: TIME SENT SNAPSHOT L-Stor/vm-111-disk-1@__replicate_111-1_1564860900__
2019-08-03 16:39:42 111-0: TIME SENT SNAPSHOT L-Stor/vm-111-disk-1@__replicate_111-0_1564861140__
2019-08-03 16:39:45 111-0: end replication job
======================

Can anyone confirm this behaviour?

Regards,

Ricardo Jorge

fabian · Aug 5, 2019

if you just move the VM config, but don't modify the replication config and transfer the replication state (like a migration does), then you are bound to run into issues.

ricardoj · Aug 5, 2019

Hi,

The simulation was to evaluate DR options.

The VM was off, so I turned off the node it was on.

No HA was configured for this VM.

How can I use the already replicated content to start this VM on another node?

The test was based on this Proxmox documentation:

"move both guest configuration files from the origin node A to node B:"

In fact, when I moved the VM to another node, the replication configuration was automatically updated to reflect the new replication scenario.

Thank you for your time and attention.

Regards,

Ricardo Jorge

fabian · Aug 5, 2019

you can move the configuration file and start the guest for immediate recovery. but the replication state is (potentially) wrong at this point (T3 might not have the last snapshot that T1 and T2 had in common), which is why it will trigger a full resync, which will fail since you have a partially synced disk on T2. the state on T2 is potentially newer than that on T3, so removing that disk is not a choice the software can make automatically; the admin has to decide which node to pick for recovery (hint: the replication snapshot names contain a timestamp).
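Following that hint, the timestamp can be read straight out of a snapshot name. A minimal POSIX sh sketch, assuming the `__replicate_<vmid>-<job>_<epoch>__` naming seen in the logs above and GNU `date` for the conversion; both function names are hypothetical:

```sh
#!/bin/sh
# Sketch: decode the Unix timestamp embedded in a replication snapshot name.
# Assumed naming (as seen in the logs in this thread):
#   [pool/dataset@]__replicate_<vmid>-<job>_<epoch>__
# Prints nothing if the name does not match that pattern.
replication_epoch() {
    printf '%s\n' "${1##*@}" |
        sed -n 's/^__replicate_[0-9][0-9]*-[0-9][0-9]*_\([0-9][0-9]*\)__$/\1/p'
}

# Print the snapshot time as a UTC date (relies on GNU date's -d @EPOCH).
replication_date() {
    epoch=$(replication_epoch "$1")
    [ -n "$epoch" ] || return 1
    date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC'
}
```

For instance, `replication_date 'L-Stor/vm-111-disk-1@__replicate_111-0_1564860483__'` should print `2019-08-03 19:28:03 UTC`. The replication job logs use the node's local time, so the hour shown there can differ.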

ricardoj · Aug 5, 2019

Hi,

Thank you for your reply.

I understand that the "software" cannot always decide which option is the correct (best) one for each environment, and that's why we have "Admins".

In my test scenario, the VM is always powered off, so the last (previous) snapshot is the same on all nodes.

In a real scenario, the failed node may or may not be the most up to date!

So the Admin must decide the best option for that particular case: recover the VM from a backup, or continue with the latest available replication.

From my tests, what I see is:

a) - One can recover from a node failure by starting the needed VMs on other node(s)

b) - The replication status will be inconsistent, and the Admin must fix this by hand

c) - The replication issue I saw during my tests occurs from time to time, but not always

c.1) - There are times when replication continues without errors. Please note that the VM is always powered off.

c.2) - I cannot find a way to always trigger the replication "error" or to always avoid it. Maybe I need to test more deeply!

c.3) - Even so, I can recover from a node failure, and that was the main purpose of this test.

d) - With local storage replication, the current setup is a good option for environments that cannot invest in high-end shared storage, whether "external" or "internal" like Ceph.

Regards,

Ricardo Jorge

fabian · Aug 6, 2019

the content might be the same because the VM is powered off, but you still have different snapshots for replicating to node B and node C, and one of them will be newer. if you have this sequence:

1. Node A replicates to Node B
2. Node A replicates to Node C
3. Node A replicates to Node B, state from 1 is discarded
4. Node A replicates to Node C, state from 2 is discarded
5. Node A goes down

now, Node C has the state from point in time 4, which includes the state from point in time 3. Node B only has the state from point in time 3, which includes the already discarded state 2 for the sync from A -> C.

if you now move the VM to Node B, it will attempt to replicate to Node C using the state 2 as base, which is not valid on Node C since it was discarded at point in time 4.
if you instead move the VM to Node C, it can replicate to Node B with state 3 as base, since that is valid on both nodes.

as long as you choose the node with the latest replication across all jobs defined for that guest, you should not need to manually clean up anything else.
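That rule can be sketched in the same way. This is a hypothetical helper, not a Proxmox tool: it assumes you have already collected the snapshot names from each surviving node (for example with `zfs list -H -t snapshot -o name` run on each of them) and prefixed each line with the node name. It then picks the node holding the newest replication snapshot for the given VMID, across all of that guest's jobs.

```sh
#!/bin/sh
# Sketch: print the surviving node that holds the most recent replication
# snapshot for a guest. Usage: pick_recovery_node <vmid> < listing
# Assumed input on stdin, one per line:  <node> <snapshot name>
pick_recovery_node() {
    vmid=$1 best_node= best_epoch=-1
    while read -r node snap; do
        # Extract <epoch> from __replicate_<vmid>-<job>_<epoch>__;
        # snapshots of other guests or non-replication snapshots are skipped.
        epoch=$(printf '%s\n' "${snap##*@}" |
            sed -n 's/^__replicate_'"$vmid"'-[0-9][0-9]*_\([0-9][0-9]*\)__$/\1/p')
        [ -n "$epoch" ] || continue
        # Keep the node with the highest timestamp seen so far.
        if [ "$epoch" -gt "$best_epoch" ]; then
            best_epoch=$epoch
            best_node=$node
        fi
    done
    [ -n "$best_node" ] || return 1
    printf '%s\n' "$best_node"
}
```

If two nodes hold the same newest timestamp, the first one listed wins; either way, the result is only a hint and the final choice stays with the admin.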

ricardoj · Aug 6, 2019

Hi,

@fabian

Yes, the example is correct.

In the tests I did, I realized that whether the "error" occurs or not depends on this timing condition.

In a real case, it is not possible to choose which node will fail, so this "failure" condition may occur and the Admin will need to fix it manually.

Also in a real case, the VM's content will change continuously, unlike my test where the VM is powered off all the time.

Storage replication is an option that should be used very carefully, because there is little guarantee that it will behave the way you want when a node fails.

Nothing like good shared storage !

Thank you for your time and attention.

Regards,

Ricardo Jorge
