ZFS - Slow replication

Jota V.

Hi, I have a cluster with five nodes. The nodes are running CTs.

If we enable replication between two nodes, it is far too slow. A few weeks ago it finished almost instantly. Now, when replication starts, it says:

Code:
2019-03-11 08:21:00 501-1: start replication job
2019-03-11 08:21:00 501-1: guest => CT 501, running => 1
2019-03-11 08:21:00 501-1: volumes => local-zfs:subvol-501-disk-0
2019-03-11 08:21:01 501-1: freeze guest filesystem
2019-03-11 08:21:01 501-1: create snapshot '__replicate_501-1_1552288860__' on local-zfs:subvol-501-disk-0
2019-03-11 08:21:01 501-1: thaw guest filesystem
2019-03-11 08:21:01 501-1: full sync 'local-zfs:subvol-501-disk-0' (__replicate_501-1_1552288860__)

and the syslog says:

Code:
Mar 11 08:26:31 vcloud01 pvesr[17347]: 08:26:31   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:32 vcloud01 pvesr[17347]: 08:26:32   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:33 vcloud01 pvesr[17347]: 08:26:33   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:34 vcloud01 pvesr[17347]: 08:26:34   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:35 vcloud01 pvesr[17347]: 08:26:35   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:36 vcloud01 pvesr[17347]: 08:26:36   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:37 vcloud01 pvesr[17347]: 08:26:37   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:38 vcloud01 pvesr[17347]: 08:26:38   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:39 vcloud01 pvesr[17347]: 08:26:39   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:40 vcloud01 pvesr[17347]: 08:26:40   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:41 vcloud01 pvesr[17347]: 08:26:41   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:42 vcloud01 pvesr[17347]: 08:26:42   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:43 vcloud01 pvesr[17347]: 08:26:43   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:44 vcloud01 pvesr[17347]: 08:26:44   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:45 vcloud01 pvesr[17347]: 08:26:45   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:46 vcloud01 pvesr[17347]: 08:26:46   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:47 vcloud01 pvesr[17347]: 08:26:47   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:48 vcloud01 pvesr[17347]: 08:26:48   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:49 vcloud01 pvesr[17347]: 08:26:49   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:50 vcloud01 pvesr[17347]: 08:26:50   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:51 vcloud01 pvesr[17347]: 08:26:51   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:52 vcloud01 pvesr[17347]: 08:26:52   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:53 vcloud01 pvesr[17347]: 08:26:53   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:54 vcloud01 pvesr[17347]: 08:26:54   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:55 vcloud01 pvesr[17347]: 08:26:55   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:56 vcloud01 pvesr[17347]: 08:26:56   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:57 vcloud01 pvesr[17347]: 08:26:57   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:58 vcloud01 pvesr[17347]: 08:26:58   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:59 vcloud01 pvesr[17347]: 08:26:59   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:00 vcloud01 pvesr[17347]: 08:27:00   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:00 vcloud01 pmxcfs[21973]: [status] notice: received log
Mar 11 08:27:01 vcloud01 pvesr[17347]: 08:27:01   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:02 vcloud01 pvesr[17347]: 08:27:02   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:03 vcloud01 pvesr[17347]: 08:27:03   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:04 vcloud01 pvesr[17347]: 08:27:04   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:05 vcloud01 pvesr[17347]: 08:27:05   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:06 vcloud01 pmxcfs[21973]: [status] notice: received log
Mar 11 08:27:06 vcloud01 pvesr[17347]: 08:27:06   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:07 vcloud01 pvesr[17347]: 08:27:07   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:08 vcloud01 pvesr[17347]: 08:27:08   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:09 vcloud01 pvesr[17347]: 08:27:09   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:10 vcloud01 pvesr[17347]: 08:27:10   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:11 vcloud01 pvesr[17347]: 08:27:11   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:12 vcloud01 pvesr[17347]: 08:27:12   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:13 vcloud01 pvesr[17347]: 08:27:13   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:14 vcloud01 pvedaemon[26221]: worker exit
Mar 11 08:27:14 vcloud01 pvedaemon[1770]: worker 26221 finished
Mar 11 08:27:14 vcloud01 pvedaemon[1770]: starting 1 worker(s)
Mar 11 08:27:14 vcloud01 pvedaemon[1770]: worker 25964 started
Mar 11 08:27:14 vcloud01 pvesr[17347]: 08:27:14   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:15 vcloud01 pvesr[17347]: 08:27:15   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:16 vcloud01 pvesr[17347]: 08:27:16   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:17 vcloud01 pvesr[17347]: 08:27:17   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:18 vcloud01 pvesr[17347]: 08:27:18   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:19 vcloud01 pvesr[17347]: 08:27:19   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
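
The task log shows a "full sync" and the counter in the syslog stays at 6.79M, so I guess I should check whether the last __replicate_ snapshot still exists on both sides (if the common snapshot is missing, pvesr falls back to a full send). A minimal check I could run, with the target node name as a placeholder:

Code:
# current state of the replication jobs on this node
pvesr status

# snapshots present for this subvolume on the source...
zfs list -t snapshot -o name,used,creation -r rpool/data/subvol-501-disk-0

# ...and on the target node (replace <target-node> with the real hostname)
ssh root@<target-node> zfs list -t snapshot -r rpool/data/subvol-501-disk-0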

I've removed all snapshots (they were used as secondary backups) and upgraded and rebooted the nodes, but replication is still too slow.

The network is gigabit and iperf can reach 100% of the bandwidth.
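
Since pvesr pushes the zfs send stream over SSH, iperf alone may not show the real bottleneck. A rough test I could run to compare raw send speed with send-over-SSH speed (pv may need to be installed; the snapshot name and target node below are just examples and have to be replaced with ones that exist):

Code:
# raw send throughput on the source node, no network involved
zfs send rpool/data/subvol-501-disk-0@__replicate_501-1_1552288860__ | pv > /dev/null

# same stream pushed through SSH like pvesr does; if this is much slower
# than iperf, SSH or the receiving side is the limit
zfs send rpool/data/subvol-501-disk-0@__replicate_501-1_1552288860__ | pv | ssh root@<target-node> 'cat > /dev/null'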

My pveversion output:

Code:
root@vcloud01:~/bin# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
pve-manager: 5.3-11 (running version: 5.3-11/d4907f84)
pve-kernel-4.15: 5.3-2
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-47
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-12
libpve-storage-perl: 5.0-38
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-23
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-3
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-18
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-2
pve-xtermjs: 3.10.1-2
qemu-server: 5.0-47
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
I've removed the CT and restored it from a backup, and now replication for this CT is fast.

How can I check / optimize the ZFS pools of the other CTs?
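
The only things I know to look at so far are the pool overview and the per-subvolume properties, e.g.:

Code:
# pool-level overview (SIZE / ALLOC / FREE / FRAG / CAP / HEALTH) and errors
zpool list
zpool status -v

# per-subvolume properties that can affect send speed
zfs get used,usedbysnapshots,compression,compressratio,recordsize rpool/data/subvol-501-disk-0

# leftover snapshots on a subvolume
zfs list -t snapshot -r rpool/data/subvol-501-disk-0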
 
Also, while restoring the backup, I converted the CT from privileged to unprivileged.

The conversion gives me these errors:

Code:
tar: ./lib/udev/devices/ploop63531p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop29757p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop44060p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop36534p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop60768p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop34737p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop44010p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop46981p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop45950p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop31285p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop64185p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop59173p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop35387p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop16652p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop20899p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop48607p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop61807p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop44539p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop11696p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop54212p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/simfs: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop26232p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop61584p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop39288p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop23411p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop31588p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop34567p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop12934p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop27966p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop29886p1: Cannot mknod: Operation not permitted
tar: ./var/spool/postfix/dev/urandom: Cannot mknod: Operation not permitted
tar: ./var/spool/postfix/dev/random: Cannot mknod: Operation not permitted


These CTs were migrated from SolusVM to Proxmox. Could these files be what hangs replication?
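
If it helps, I could check whether those leftover device nodes are still sitting inside a subvolume with something like this (assuming the default mountpoint under /rpool/data):

Code:
# list any block/character device nodes left over from the old OpenVZ/ploop template
find /rpool/data/subvol-501-disk-0 \( -type b -o -type c \) -ls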
 
