Replication Issues

gadreel

Member
Apr 22, 2023
Hi.
I have a Proxmox 8.2 cluster with 3 nodes. When I shut down one of the nodes, for example for maintenance, I receive all these replication emails regarding the containers of that node.

The emails contain the following content:
Code:
Replication job '112-0' with target 'franky' and schedule '21:00' failed!

Last successful sync: 1970-01-01 02:00:00
Next sync try: 1970-01-01 21:05:00
Failure count: 1

Error:
command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=franky' -o 'UserKnownHostsFile=/etc/pve/nodes/franky/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.81.96 -- pvesr prepare-local-job 112-0 local-zfs:subvol-112-disk-0 --last_sync 0' failed: exit code 255

I do not understand why this happens, nor why the dates are from 1970. Any ideas? Thanks.
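
The "1970" timestamps most likely just come from the Unix epoch: the failing command is called with --last_sync 0, and timestamp 0 rendered in a UTC+2 timezone is 1970-01-01 02:00:00. A quick way to check this locally, as a sketch assuming GNU date and an EET/UTC+2 timezone:
Bash:
# Render Unix timestamp 0 in the local timezone; with UTC+2 this prints 1970-01-01 02:00:00
date -d @0 '+%Y-%m-%d %H:%M:%S %Z'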

[Attachment: issue.png]
 
Posting again in case someone has any clue why the above is happening. Thanks.
 
Hi,
did any such replication job succeed at least once? It sounds like the node franky was fenced (network issues?) and thus was not part of the cluster anymore, so replication cannot succeed after that. What is the output of pveversion -v and of cat /var/lib/pve-manager/pve-replication-state.json on the source node of the replication?
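
For reference, a minimal way to collect that information on the source node, assuming root shell access (the pvesr status call is an optional extra that shows the current job state and schedule):
Bash:
# Package/version overview of the node
pveversion -v
# Replication state kept by pve-manager (JSON, one entry per guest)
cat /var/lib/pve-manager/pve-replication-state.json
# Optional: overview of the configured replication jobs and their last/next sync
pvesr status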
 
Hi @fiona,
Thanks for replying.

If "succeed at least once" you mean in general the answer is yes. Most of my replication jobs run at night.
The issue is that this happens when I shutdown a server and the replication jobs are not scheduled to run at the time the server shut down.

See below again the same thing on another server .

[Attachment: 1729848095631.png]

Find below the pveversion output and the replication state you requested, taken on the server zoro.

Bash:
Linux zoro 6.8.12-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-2 (2024-09-05T10:03Z) x86_64

root@zoro:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.12-2-pve)
pve-manager: 8.2.7 (running version: 8.2.7/3e0176e6bb2ade3b)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-2
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
amd64-microcode: 3.20240820.1
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.8
libpve-cluster-perl: 8.0.8
libpve-common-perl: 8.2.5
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.10
libpve-storage-perl: 8.2.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-4
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.2.4
pve-cluster: 8.0.8
pve-container: 5.2.0
pve-docs: 8.2.3
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.0.7
pve-firmware: 3.13-2
pve-ha-manager: 4.0.5
pve-i18n: 3.2.4
pve-qemu-kvm: 9.0.2-3
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1

JSON:
{"117":{"local/luffy":{"last_sync":1729792834,"last_iteration":1729792802,"duration":17.605772,"fail_count":0,"storeid_list":["local-zfs"],"last_node":"zoro","last_try":1729792834}},"128":{"local/luffy":{"last_iteration":1729847701,"last_sync":1729847706,"last_try":1729847706,"last_node":"zoro","storeid_list":["local-zfs"],"fail_count":0,"duration":2.722637}},"111":{"local/luffy":{"duration":3.322885,"fail_count":0,"storeid_list":["local-zfs"],"last_node":"zoro","last_try":1729846801,"last_sync":1729846801,"last_iteration":1729846801}},"119":{"local/luffy":{"duration":5.747837,"fail_count":0,"last_node":"zoro","storeid_list":["local-zfs"],"last_try":1729847701,"last_sync":1729847701,"last_iteration":1729847701}},"102":{"local/luffy":{"last_iteration":1729792802,"last_sync":1729792802,"storeid_list":["local-zfs"],"last_node":"zoro","last_try":1729792802,"duration":32.684891,"fail_count":0}}}
 
Are all these guests HA-managed? Can you also share an excerpt of the system logs/journal from around the time of the issue? The current replication state looks good at a glance; could you share it again once the failure happens?
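
One way such an excerpt could be collected, as a sketch assuming the shutdown happened around 23:40–23:50 on 2024-10-25 (adjust the timestamps and filenames to the actual event):
Bash:
# Export the journal around the node shutdown into a file that can be attached
journalctl --since "2024-10-25 23:40" --until "2024-10-25 23:55" > journal-shutdown.txt
# And the journal of the current boot, for the power-up after maintenance
journalctl -b > journal-powerup.txt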
 
@fiona

Yes, all these guests are HA-managed. I am attaching the journal from 23:40 to 23:50 during the shutdown procedure, and the journal from when the server powered up.

Let me know if you need something else.
 

Attachments

@fiona

Find attached the journal of the "Luffy" node. Around 23:46 in the logs you will see that the Luffy server sends the emails, but I think those are the "fencing" emails, which are OK to receive.
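
To double-check that assumption, one way to narrow the journal down to the HA stack is to filter on the HA services; a sketch, assuming the same 2024-10-25 time window as above:
Bash:
# Show only the HA manager services (CRM and LRM) around the event and highlight fence-related lines
journalctl -u pve-ha-crm -u pve-ha-lrm --since "2024-10-25 23:40" --until "2024-10-26 00:00" | grep -i fence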
 

Attachments