Storage Replication - Starts fast, then slows to a crawl; second dataset slow from the start. Sending the snapshots another way goes at full speed.

effgee

Update: I believe this has to do with the WireGuard VPN, possibly an MTU issue, but the high speed dropping to a trickle is very odd. If anyone has any insight, I would appreciate it.
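
If it really is MTU related, I figure something like this should show it (a rough sketch; the tunnel address and interface name are just examples):

Code:
# Check whether full-size packets make it through the tunnel without
# fragmenting (-M do sets the don't-fragment bit; 1392 + 28 bytes of
# IP/ICMP headers = 1420, the usual WireGuard MTU). 10.0.0.2 is just an
# example tunnel address.
ping -M do -s 1392 -c 3 10.0.0.2

# If large packets get dropped, try lowering the MTU on the WireGuard
# interface (wg0 here) and re-test:
ip link set dev wg0 mtu 1380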

I'm fairly new to Proxmox's built-in replication, but I'm very familiar with zfs send/recv.

I'm testing out simple storage replication and running into an issue; I'm not sure what is going on.

The first dataset's transfer started out fast at 50-130 MB/s, which is normal for my network, then slowed to a crawl near the end.
CPU, memory, and network were all barely utilized; the transfer just dropped to a slow trickle.

The first dataset finished and the second dataset started, but the poor performance was there from the start, around 2 MB/s.

Eventually I cancelled the replication of the remaining datasets and used syncoid instead, which sent them at full speed.

I have included the log file, although I don't know how useful it will be.


Both nodes are completely up to date (free/no-subscription repository):

Code:
pveversion --verbose
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-10 (running version: 6.2-10/a20769ed)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-4.15: 5.4-12
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-4.15.18-24-pve: 4.15.18-52
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 8.0-2
ifupdown: 0.8.35+pve1
ksmtuned: 4.20150325+b1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-9
pve-cluster: 6.1-8
pve-container: 3.1-12
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-11
pve-xtermjs: 4.3.0-1
pve-zsync: 2.0-3
qemu-server: 6.2-11
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
 

Attachments

  • replicate-log-clipped.txt
    825.2 KB
I'm fairly new to Proxmox's built-in replication, but I'm very familiar with zfs send/recv.
The storage replication is zfs send/receive under the hood.
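Per replication run it boils down to roughly this (simplified sketch; the dataset and snapshot names are only placeholders, the real ones use the job ID and a timestamp):

Code:
# incremental send from the previous replication snapshot to the new one,
# piped over SSH into zfs recv on the target node (placeholder names)
zfs send -I rpool/data/vm-100-disk-0@prev-replication-snap \
    rpool/data/vm-100-disk-0@new-replication-snap \
    | ssh root@target-node zfs recv -F rpool/data/vm-100-disk-0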

Update: I believe this has to do with the WireGuard VPN, possibly an MTU issue, but the high speed dropping to a trickle is very odd. If anyone has any insight, I would appreciate it.
Did you test the replication without a VPN (to verify)?
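
You can also trigger a job by hand and watch its speed; something like the following (from memory, check man pvesr on your version for the exact options):

Code:
# show configured replication jobs with their last sync time and duration
pvesr status

# run a specific job (e.g. job 100-0) immediately instead of waiting
# for its schedule
pvesr schedule-now 100-0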
 
The storage replication is zfs send/receive under the hood.


Did you test the replication without a VPN (to verify)?
I never really figured out exactly what was causing it; it showed the same behavior over SSH as well. I ended up going with Sanoid/syncoid for replication, which has an adjustable mbuffer switch and is, no offense intended, generally a more clever and forgiving replication mechanism.

Adjusting the mbuffer (I gave it a 750 MB buffer) seemed to help, although there were still times when small amounts of data took far longer than they should. Is there a way to adjust buffers on the Proxmoxian replication?
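
For reference, the syncoid call that behaved well for me looked roughly like this (host and dataset names are placeholders; the --mbuffer-size option is from the syncoid docs, so double-check it against your version):

Code:
# push the dataset with a 750 MB mbuffer between the send and recv sides
syncoid --mbuffer-size=750M rpool/data/vm-100-disk-0 \
    root@target-node:rpool/data/vm-100-disk-0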
 
