All Replication stalled

yena

Active Member
Nov 18, 2011
Hello,
I have a cluster node (ZFS / replication / HA, 3-node cluster)
with 3 VPS (2 LXC, 1 KVM).
All replication jobs on node2 are stalled, but no errors are logged.

Before rebooting the node, I tried a pve-zsync, but it returned this error:

/usr/sbin/pve-zsync sync --source 104 --dest 10.10.10.3:STORAGE/PVEZSYNC--verbose --maxsnap 3 --name Srv01
COMMAND:
zfs send -- STORAGE/KVM/vm-104-disk-0@rep_Srv01_2019-01-02_10:41:28 | ssh -o 'BatchMode=yes' root@10.10.10.3 -- zfs recv -F -- STORAGE/PVEZSYNC--verbose/vm-104-disk-0
GET ERROR:
cannot open 'STORAGE/PVEZSYNC--verbose/vm-104-disk-0': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
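
Editor's note: the dataset name in the error ends in `PVEZSYNC--verbose`, which suggests the space between the destination and the `--verbose` flag was missing, so the flag became part of the dataset name. The sketch below is only an illustration under that assumption: `check_dest` is a hypothetical helper (not part of pve-zsync), and the final line prints the corrected command rather than running it. Whether `STORAGE/PVEZSYNC` already exists on 10.10.10.3 is not shown in the thread.

```shell
# check_dest is a hypothetical helper, not part of pve-zsync.
# It flags a destination that has an option glued onto it.
check_dest() {
    case "$1" in
        *--*) echo "suspicious: $1"; return 1 ;;
        *)    echo "ok: $1"; return 0 ;;
    esac
}

# The destination as it was actually parsed, then the intended one.
check_dest "10.10.10.3:STORAGE/PVEZSYNC--verbose" || true
check_dest "10.10.10.3:STORAGE/PVEZSYNC"

# Dry run: print the command with the space restored instead of executing it.
echo /usr/sbin/pve-zsync sync --source 104 --dest 10.10.10.3:STORAGE/PVEZSYNC --verbose --maxsnap 3 --name Srv01
```

If the parent dataset is missing on the target, it would presumably have to be created there first (for example with `zfs create`) before the receive can succeed.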


It seems to be trying to sync an LXC image, but this is a KVM VPS:

zfs list -t all
NAME USED AVAIL REFER MOUNTPOINT
STORAGE 80.5G 818G 104K /STORAGE
STORAGE/BACKUP 96K 818G 96K /STORAGE/BACKUP
STORAGE/KVM 39.4G 818G 96K /STORAGE/KVM
STORAGE/KVM/vm-104-disk-0 39.4G 818G 38.5G -
STORAGE/KVM/vm-104-disk-0@autodaily190102031537 897M - 38.1G -
STORAGE/LXC 41.1G 818G 112K /STORAGE/LXC
STORAGE/LXC/subvol-101-disk-0 35.0G 175G 24.6G /STORAGE/LXC/subvol-101-disk-0
STORAGE/LXC/subvol-101-disk-0@autodaily181217164532 303M - 1.37G -
STORAGE/LXC/subvol-101-disk-0@__replicate_101-0_1546210801__ 10.1G - 32.7G -
STORAGE/LXC/subvol-102-disk-0 6.03G 94.2G 5.79G /STORAGE/LXC/subvol-102-disk-0
STORAGE/LXC/subvol-102-disk-0@__replicate_102-0_1546210806__ 145M - 5.79G -
STORAGE/LXC/subvol-102-disk-0@autodaily190102031410 105M - 5.78G -
rpool 10.9G 214G 104K /rpool
rpool/ROOT 2.40G 214G 96K /rpool/ROOT
rpool/ROOT/pve-1 2.40G 214G 2.40G /
rpool/data 96K 214G 96K /rpool/data
rpool/swap 8.50G 215G 7.52G -
root@iwlab2:~#
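
Editor's note: the `__replicate_<id>-<job>_<timestamp>__` snapshots in the listing above appear to carry the Unix time of the last completed replication run, which gives a rough idea of when the jobs stopped. A minimal sketch, assuming that naming scheme; `replica_epochs` is a hypothetical helper name and the input lines are copied from the listing.

```shell
# Print "<guest-id> <epoch>" for every replication snapshot read on stdin.
# Assumes names of the form @__replicate_<id>-<job>_<epoch>__.
replica_epochs() {
    sed -n 's/.*@__replicate_\([0-9]*\)-[0-9]*_\([0-9]*\)__.*/\1 \2/p'
}

replica_epochs <<'EOF'
STORAGE/LXC/subvol-101-disk-0@autodaily181217164532 303M - 1.37G -
STORAGE/LXC/subvol-101-disk-0@__replicate_101-0_1546210801__ 10.1G - 32.7G -
STORAGE/LXC/subvol-102-disk-0@__replicate_102-0_1546210806__ 145M - 5.79G -
EOF
```

On a live node the input could come from `zfs list -t snapshot -o name`. With GNU date, `date -u -d @1546210801` converts the first value to 2018-12-30 23:00:01 UTC, a few days before the pve-zsync attempt dated 2019-01-02 in the error above.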

-------------------------------------------------------------------------------------------------------------------------------------------

root@iwlab2:~# pct list
VMID Status Lock Name
101 stopped srv01.iweblab.it
102 running srv010.iweblab.it
root@iwlab2:~# qm list
VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
104 srv01.iweblab.it running 81920 1945.00 21144


-------------------------------------------------------------------------------------------------------------------------------------------

I can log in to the storage node using SSH over the LAN, so it is not a networking issue.

pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
pve-zsync: 1.7-2
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
----------------------------------------------------------------------------------------------------

Thanks!
 

sb-jw

Active Member
Jan 23, 2018
GET ERROR:
cannot open 'STORAGE/PVEZSYNC--verbose/vm-104-disk-0': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
Did you see this error and check it?
 

yena

Did you see this Error and check it?

Yes. It is the first job, and the destination server has no images yet, so I don’t understand this error.
All the other replication jobs (Proxmox 5) also fail without logging anything.
Does anyone know which service manages replication?
I would like to restart it.
Thanks
 

yena

Thanks man. Using:
ps aux | grep -v grep | grep pvesr
root 25812 0.0 0.0 500516 75124 ? Ss 2018 0:00 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
root 25855 0.0 0.0 500516 68072 ? D 2018 0:02 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
root@iwlab2:~# kill -15 25812

The "stalled" replication job was killed, and the other replications are now in progress.

Now I have to discover why this VPS's replication stalled.
I have enabled NFS on the LXC container; maybe that is the cause.
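
Editor's note: in the `ps aux` output above, the second `pvesr` process is in state `D` (uninterruptible sleep), which usually means it is waiting on I/O; a hung NFS mount is one possible cause, though the thread does not confirm it. The sketch below classifies `pvesr` processes by state from `ps aux`-style input. `pvesr_states` is a hypothetical helper name, and the sample lines are the two from this post.

```shell
# Read `ps aux`-style lines on stdin and print "PID STAT verdict" for each
# pvesr process. Column 2 is the PID and column 8 the process state.
# The [p] in the pattern keeps the awk process itself from matching.
pvesr_states() {
    awk '/[p]vesr run/ { print $2, $8, ($8 ~ /^D/ ? "blocked" : "normal") }'
}

pvesr_states <<'EOF'
root 25812 0.0 0.0 500516 75124 ? Ss 2018 0:00 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
root 25855 0.0 0.0 500516 68072 ? D 2018 0:02 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
EOF
```

On a live node the same function can be fed with `ps aux | pvesr_states`. A process reported as blocked generally does not react to signals until the pending I/O returns, so finding what it is waiting on is likely more useful than repeating `kill`.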
 