All Replication stalled

yena

Hello,
I have a cluster node (ZFS / replication / HA, 3-node cluster)
with 3 VPS (2 LXC, 1 KVM).
All replication jobs on node2 are stalled, but no errors are logged.

Before rebooting the node, I tried a pve-zsync run, but it returns this error:

/usr/sbin/pve-zsync sync --source 104 --dest 10.10.10.3:STORAGE/PVEZSYNC--verbose --maxsnap 3 --name Srv01
COMMAND:
zfs send -- STORAGE/KVM/vm-104-disk-0@rep_Srv01_2019-01-02_10:41:28 | ssh -o 'BatchMode=yes' root@10.10.10.3 -- zfs recv -F -- STORAGE/PVEZSYNC--verbose/vm-104-disk-0
GET ERROR:
cannot open 'STORAGE/PVEZSYNC--verbose/vm-104-disk-0': dataset does not exist
cannot receive new filesystem stream: dataset does not exist


It seems to be trying to sync an LXC image, but this is a KVM VPS:

zfs list -t all
NAME USED AVAIL REFER MOUNTPOINT
STORAGE 80.5G 818G 104K /STORAGE
STORAGE/BACKUP 96K 818G 96K /STORAGE/BACKUP
STORAGE/KVM 39.4G 818G 96K /STORAGE/KVM
STORAGE/KVM/vm-104-disk-0 39.4G 818G 38.5G -
STORAGE/KVM/vm-104-disk-0@autodaily190102031537 897M - 38.1G -
STORAGE/LXC 41.1G 818G 112K /STORAGE/LXC
STORAGE/LXC/subvol-101-disk-0 35.0G 175G 24.6G /STORAGE/LXC/subvol-101-disk-0
STORAGE/LXC/subvol-101-disk-0@autodaily181217164532 303M - 1.37G -
STORAGE/LXC/subvol-101-disk-0@__replicate_101-0_1546210801__ 10.1G - 32.7G -
STORAGE/LXC/subvol-102-disk-0 6.03G 94.2G 5.79G /STORAGE/LXC/subvol-102-disk-0
STORAGE/LXC/subvol-102-disk-0@__replicate_102-0_1546210806__ 145M - 5.79G -
STORAGE/LXC/subvol-102-disk-0@autodaily190102031410 105M - 5.78G -
rpool 10.9G 214G 104K /rpool
rpool/ROOT 2.40G 214G 96K /rpool/ROOT
rpool/ROOT/pve-1 2.40G 214G 2.40G /
rpool/data 96K 214G 96K /rpool/data
rpool/swap 8.50G 215G 7.52G -
root@iwlab2:~#

-------------------------------------------------------------------------------------------------------------------------------------------

root@iwlab2:~# pct list
VMID Status Lock Name
101 stopped srv01.iweblab.it
102 running srv010.iweblab.it
root@iwlab2:~# qm list
VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
104 srv01.iweblab.it running 81920 1945.00 21144


-------------------------------------------------------------------------------------------------------------------------------------------

I can log in to the storage node using ssh over the LAN, so there is no networking issue.

pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
pve-zsync: 1.7-2
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
----------------------------------------------------------------------------------------------------

Thanks!
 
GET ERROR:
cannot open 'STORAGE/PVEZSYNC--verbose/vm-104-disk-0': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
Did you see this Error and check it?
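
The dataset name in the error ends in "--verbose", which suggests the space between the destination and the --verbose option is missing in the command, so pve-zsync took "STORAGE/PVEZSYNC--verbose" as the destination. Below is a minimal sketch of a pre-flight check, assuming that is the cause. The helper check_dest is hypothetical (it is not part of pve-zsync) and the "--" test is only a heuristic. The final command is echoed rather than executed and reuses only the options from the original post.

```shell
#!/bin/sh
# Hypothetical helper (not part of pve-zsync): reject a destination
# that looks like it swallowed a command-line option such as --verbose.
check_dest() {
    case "$1" in
        *--*) echo "suspicious destination: $1" >&2; return 1 ;;
        *)    return 0 ;;
    esac
}

# Destination as it was probably intended (assumption based on the error text).
DEST="10.10.10.3:STORAGE/PVEZSYNC"

if check_dest "$DEST"; then
    # Same options as the original command, with the missing space restored.
    # Printed only; remove the leading "echo" to actually run it.
    echo /usr/sbin/pve-zsync sync --source 104 --dest "$DEST" --verbose --maxsnap 3 --name Srv01
fi
```

If the check passes but the sync still fails, it may be worth confirming on the destination host that the parent dataset exists, for example with "zfs list STORAGE/PVEZSYNC".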
 
Did you see this Error and check it?

Yes, it is the first job; on the destination server I have no images, so I don't understand this error...
And all other replications (Proxmox 5) fail without logging anything...
Does anyone know which service manages replication?
I would like to restart it.
Thanks
 
Thanks man, using:
ps aux | grep -v grep | grep pvesr
root 25812 0.0 0.0 500516 75124 ? Ss 2018 0:00 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
root 25855 0.0 0.0 500516 68072 ? D 2018 0:02 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
root@iwlab2:~# kill -15 25812

The "stalled" replication job was killed and the other replications are now in progress...
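
For anyone hitting the same thing, here is a small sketch for spotting the stuck process. The helper pvesr_states is hypothetical; it just filters "ps aux"-style lines and prints the PID and the STAT column of each pvesr process. A state of D means uninterruptible sleep (usually waiting on I/O), which matches the second process in the listing above. The sample input is the two lines from this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux`-style lines on stdin and print
# "PID STAT" for every pvesr process (column 2 = PID, column 8 = STAT).
# The !/awk/ guard skips the awk process itself when fed live ps output.
pvesr_states() {
    awk '/pvesr run/ && !/awk/ { print $2, $8 }'
}

# Sample input copied from the ps output in this thread.
pvesr_states <<'EOF'
root 25812 0.0 0.0 500516 75124 ? Ss 2018 0:00 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
root 25855 0.0 0.0 500516 68072 ? D 2018 0:02 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
EOF
# prints:
# 25812 Ss
# 25855 D
```

On a live node the same function can be fed with "ps aux | pvesr_states".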

Now I have to discover why this VPS replication stalled.
I have enabled NFS on LXC; maybe that is the cause...
 