ZFS - Slow replication

Jota V.

Well-Known Member
Jan 29, 2018
Hi, I have a cluster with five nodes. The nodes are running CTs.

If we enable replication between two nodes, it is very slow. A few weeks ago it was nearly instant. Now, when replication starts, it says:

Code:
2019-03-11 08:21:00 501-1: start replication job
2019-03-11 08:21:00 501-1: guest => CT 501, running => 1
2019-03-11 08:21:00 501-1: volumes => local-zfs:subvol-501-disk-0
2019-03-11 08:21:01 501-1: freeze guest filesystem
2019-03-11 08:21:01 501-1: create snapshot '__replicate_501-1_1552288860__' on local-zfs:subvol-501-disk-0
2019-03-11 08:21:01 501-1: thaw guest filesystem
2019-03-11 08:21:01 501-1: full sync 'local-zfs:subvol-501-disk-0' (__replicate_501-1_1552288860__)

and the logs say:

Code:
Mar 11 08:26:31 vcloud01 pvesr[17347]: 08:26:31   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:32 vcloud01 pvesr[17347]: 08:26:32   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:33 vcloud01 pvesr[17347]: 08:26:33   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:34 vcloud01 pvesr[17347]: 08:26:34   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:35 vcloud01 pvesr[17347]: 08:26:35   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:36 vcloud01 pvesr[17347]: 08:26:36   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:37 vcloud01 pvesr[17347]: 08:26:37   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:38 vcloud01 pvesr[17347]: 08:26:38   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:39 vcloud01 pvesr[17347]: 08:26:39   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:40 vcloud01 pvesr[17347]: 08:26:40   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:41 vcloud01 pvesr[17347]: 08:26:41   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:42 vcloud01 pvesr[17347]: 08:26:42   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:43 vcloud01 pvesr[17347]: 08:26:43   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:44 vcloud01 pvesr[17347]: 08:26:44   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:45 vcloud01 pvesr[17347]: 08:26:45   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:46 vcloud01 pvesr[17347]: 08:26:46   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:47 vcloud01 pvesr[17347]: 08:26:47   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:48 vcloud01 pvesr[17347]: 08:26:48   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:49 vcloud01 pvesr[17347]: 08:26:49   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:50 vcloud01 pvesr[17347]: 08:26:50   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:51 vcloud01 pvesr[17347]: 08:26:51   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:52 vcloud01 pvesr[17347]: 08:26:52   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:53 vcloud01 pvesr[17347]: 08:26:53   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:54 vcloud01 pvesr[17347]: 08:26:54   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:55 vcloud01 pvesr[17347]: 08:26:55   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:56 vcloud01 pvesr[17347]: 08:26:56   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:57 vcloud01 pvesr[17347]: 08:26:57   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:58 vcloud01 pvesr[17347]: 08:26:58   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:26:59 vcloud01 pvesr[17347]: 08:26:59   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:00 vcloud01 pvesr[17347]: 08:27:00   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:00 vcloud01 pmxcfs[21973]: [status] notice: received log
Mar 11 08:27:01 vcloud01 pvesr[17347]: 08:27:01   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:02 vcloud01 pvesr[17347]: 08:27:02   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:03 vcloud01 pvesr[17347]: 08:27:03   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:04 vcloud01 pvesr[17347]: 08:27:04   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:05 vcloud01 pvesr[17347]: 08:27:05   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:06 vcloud01 pmxcfs[21973]: [status] notice: received log
Mar 11 08:27:06 vcloud01 pvesr[17347]: 08:27:06   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:07 vcloud01 pvesr[17347]: 08:27:07   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:08 vcloud01 pvesr[17347]: 08:27:08   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:09 vcloud01 pvesr[17347]: 08:27:09   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:10 vcloud01 pvesr[17347]: 08:27:10   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:11 vcloud01 pvesr[17347]: 08:27:11   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:12 vcloud01 pvesr[17347]: 08:27:12   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:13 vcloud01 pvesr[17347]: 08:27:13   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:14 vcloud01 pvedaemon[26221]: worker exit
Mar 11 08:27:14 vcloud01 pvedaemon[1770]: worker 26221 finished
Mar 11 08:27:14 vcloud01 pvedaemon[1770]: starting 1 worker(s)
Mar 11 08:27:14 vcloud01 pvedaemon[1770]: worker 25964 started
Mar 11 08:27:14 vcloud01 pvesr[17347]: 08:27:14   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:15 vcloud01 pvesr[17347]: 08:27:15   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:16 vcloud01 pvesr[17347]: 08:27:16   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:17 vcloud01 pvesr[17347]: 08:27:17   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:18 vcloud01 pvesr[17347]: 08:27:18   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
Mar 11 08:27:19 vcloud01 pvesr[17347]: 08:27:19   6.79M   rpool/data/subvol-501-disk-0@backup_2019_03_06_00_40_30
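
While the counter sits at 6.79M, I watch disk activity on both nodes with something like this (just my own checks, not part of the replication log above):

Code:
# watch pool I/O once per second on the source and on the target node
zpool iostat -v rpool 1
# and check whether a zfs send/receive process is still running
ps aux | grep -E 'zfs (send|recv)'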

I've removed all snapshots (used as secondary backups) and upgraded and rebooted the nodes, but replication is still too slow.
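
To confirm the old snapshots are really gone, I list them on both nodes with something like this (dataset name taken from the log above):

Code:
# list every snapshot of the replicated subvolume with size and creation time
zfs list -t snapshot -o name,used,creation -r rpool/data/subvol-501-disk-0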

The network is gigabit and with iperf I can reach 100% of the bandwidth.
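
The bandwidth test was roughly this (plain TCP test; "vcloud02" just stands for the target node):

Code:
# on the target node
iperf -s
# on the source node
iperf -c vcloud02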

My pveversion output:

Code:
root@vcloud01:~/bin# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
pve-manager: 5.3-11 (running version: 5.3-11/d4907f84)
pve-kernel-4.15: 5.3-2
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-47
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-12
libpve-storage-perl: 5.0-38
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-23
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-3
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-18
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-2
pve-xtermjs: 3.10.1-2
qemu-server: 5.0-47
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
I've removed the CT and restored it from a backup, and now replication for this CT is fast.

How can I check / optimize the ZFS datasets of the other CTs?
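
So far the only thing that comes to my mind is checking space and snapshot usage per dataset, something like:

Code:
# data vs. snapshot space usage for every CT subvolume
zfs list -o space -r rpool/data
# properties of a single subvolume (501 only as an example)
zfs get compression,recordsize,usedbysnapshots rpool/data/subvol-501-disk-0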
 
Also, while restoring the backup, I converted the CT from privileged to unprivileged.
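
Roughly, the conversion was done like this (VMID 501 and the backup file name are only placeholders):

Code:
# back up the privileged CT ...
vzdump 501 --storage local --mode snapshot
# ... and restore the archive as an unprivileged CT
pct restore 501 /var/lib/vz/dump/vzdump-lxc-501-2019_03_11-08_00_00.tar.lzo --unprivileged 1 --storage local-zfs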

Converting gives me these errors:

Code:
tar: ./lib/udev/devices/ploop63531p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop29757p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop44060p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop36534p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop60768p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop34737p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop44010p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop46981p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop45950p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop31285p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop64185p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop59173p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop35387p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop16652p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop20899p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop48607p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop61807p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop44539p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop11696p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop54212p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/simfs: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop26232p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop61584p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop39288p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop23411p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop31588p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop34567p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop12934p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop27966p1: Cannot mknod: Operation not permitted
tar: ./lib/udev/devices/ploop29886p1: Cannot mknod: Operation not permitted
tar: ./var/spool/postfix/dev/urandom: Cannot mknod: Operation not permitted
tar: ./var/spool/postfix/dev/random: Cannot mknod: Operation not permitted


These CTs were migrated from SolusVM to Proxmox. Could these files be what hangs replication?
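
In case it matters, the leftover device nodes can be found inside the container's dataset with something like:

Code:
# look for block/character device nodes left over from the SolusVM/OpenVZ template
find /rpool/data/subvol-501-disk-0/lib/udev/devices \( -type b -o -type c \)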