New Server works but replication is not

liszca

I tried to replicate one of my big LXC containers to a new Proxmox host and it refuses to do so with an error message:

Code:
2025-07-13 00:39:01 110-0: end replication job with error: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node3' -o 'UserKnownHostsFile=/etc/pve/nodes/node3/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10.0.0.3 -- pvesr prepare-local-job 110-0 local-zfs-data1:subvol-110-disk-0 local-zfs-data1:subvol-110-disk-1 local-zfs-data1:subvol-110-disk-4 --last_sync 0' failed: malformed number (leading zero must not be followed by another digit), at character offset 2 (before "0:39:01 up 37 min,  ...") at /usr/share/perl5/PVE/Replication.pm line 128.
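
The command in the error can be re-run by hand to see what the remote side actually returns - this is just the command copied out of the error message above, with the ssh options trimmed for readability:

Code:
~# ssh -e none -o BatchMode=yes root@10.0.0.3 -- pvesr prepare-local-job 110-0 local-zfs-data1:subvol-110-disk-0 local-zfs-data1:subvol-110-disk-1 local-zfs-data1:subvol-110-disk-4 --last_sync 0

pvesr apparently parses the output of that command on the calling side (Replication.pm line 128), so extra text printed in front of its normal output could explain the "malformed number" complaint - just a guess.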

What I did so far:
- Is replication working between the old hosts? - yes
- chronyc sources - the router is the time server and it looks fine
- Replication schedules can't be deleted in the GUI - the console can help out (see the sketch below)
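
For the last point, a minimal sketch of the console route, using the job ID 110-0 from the error above (check pvesr help for the exact options):

Code:
~# pvesr list
~# pvesr delete 110-0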

I am confused about "leading zero must not be followed by another digit".

Are the SSH keys not distributed correctly? How can I check? - the new host's key is present in .ssh/authorized_keys

So I am out of ideas where to look for my mistake.
 
Are the SSH keys not distributed correctly? How can I check?
Maybe. I've been there when I changed the members of an existing cluster.

Your nodes have names. You did not tell us anything about your cluster, so let's assume there are three nodes named pveh / pvei / pvej. Now this must run without any prompt or error message:
Code:
~# for HOST in pveh pvei pvej ; do ssh root@$HOST whoami; done
root
root
root
Run the above on all three nodes - running on only one node is NOT sufficient.

Check /etc/hosts on all nodes. That file must contain correct and identical information about those hosts. (Assuming there is no full-blown local DNS server in the background.)
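
As a sketch with made-up addresses and the example node names from above, every node's /etc/hosts would contain the same three lines:

Code:
192.0.2.11 pveh.example.com pveh
192.0.2.12 pvei.example.com pvei
192.0.2.13 pvej.example.com pvej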
 
I think the problem isn't coming from SSH. I logged into the server without my .ssh/config (ssh -F /dev/null); that's how I logged into PVE.

From there I checked SSH accessibility and found no issue.

I looked at this message several times, and the "character offset" can also be 3:
Code:
at character offset 2 (before "0:39:01 up 37 min,  ...") at

Another example:
Code:
failed: garbage after JSON object, at character offset 3 (before ":36:02 up 4 days, 18...") at
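
Both fragments look like a time of day followed by "up ...", i.e. like uptime output rather than JSON. So my guess - nothing more - is that the new host prints something like that on non-interactive SSH sessions. A simple way to check:

Code:
~# ssh -o BatchMode=yes root@10.0.0.3 -- echo MARKER
MARKER

If an uptime-like line shows up before MARKER, that would be what the JSON parser on the calling node trips over.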

Looking at the locale, it looks like this:
Code:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

So the date and time format is the same on all nodes.

Checked that the ZFS pools have the expected names - looks good
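
One way to compare, run from the old node (10.0.0.3 being the new host, as in the error above):

Code:
~# zpool list -o name
~# ssh root@10.0.0.3 -- zpool list -o name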