rsync failing between two servers - have I tried to be too clever?

guff666

Member
Nov 6, 2021
I used to have the following:
  • iMac 21" with Time Machine backups
  • SmartOS server with huge ZFS disks for backups
A zone on the SmartOS server was installed with rsnapshot. Under manual control, I could instruct this zone to use rsnapshot to take a full copy of my home directory on the iMac. I used this as a long-term archive that was never purged, and I only ran it every couple of months.

More recently the iMac died and I used a Dell Workstation to run Proxmox 7 and created a Hackintosh clone of the iMac. Plenty of memory and CPU. All good.

Having seen the benefit of Proxmox, I migrated the SmartOS server to Proxmox and created an "archive" container with rsnapshot. It seemed to work but is now showing problems. (BTW, there have been no configuration changes.)

When I run the rsnapshot script I am getting the following error:
Code:
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [receiver=3.1.3]

I isolated this down to the rsync command, which is:
Code:
/usr/bin/rsync -azF --delete --numeric-ids --delete-excluded --progress --partial -e "/usr/bin/ssh -p 22" garethhowell@<client>:/Users/garethhowell/ /mnt/archive1/alpha.0/<client>
I see that the cause is the following ssh error.
Code:
ssh_dispatch_run_fatal: Connection to <client> port 22: message authentication code incorrect

Now, this occurs some time into the session, not at the start, so I think the actual error code may be a red herring. I've increased the log level to the maximum and the output seems to confirm this. See:
Code:
ssh_dispatch_run_fatal: Connection to <client> port 22: message authentication code incorrect

rsync: connection unexpectedly closed (123144802 bytes received so far) [receiver]

I've seen messages on other forums where this has been associated with a dodgy network card or network performance, but for all of these, the error occurs right at the outset, not some time in.

Has anybody else seen anything like this? Am I exposing a network performance problem?
 
Yes, there’s no immediate problem. The transfer starts OK but fails part-way through. The failure point varies.

That’s why I suspect the reported error is a red herring.
 
On a (rather far-fetched) hunch: check the MAC addresses and IP addresses in your network for any duplicates (that could explain why it happens only part-way through).
Also, ssh in the same direction that rsync/rsnapshot does and leave the session open for a few minutes or hours. It should remain active.

I hope this helps!
 
Hi Stoiko
It was worth a try, but it reveals nothing. I can ssh easily in both directions. A simple bash while loop ran over ssh for more than 5 hours with no problem.
I also ran iperf3 with 10 parallel sessions for 5 minutes. Average throughput was 942 Mbit/s on a gigabit network, so no problems there either.

I tried an rsync session in the opposite direction and all was well. So I tried pulling the same folder hierarchy to the archive server. It fails randomly: it transfers some files and then fails. If I repeat, it gets a bit further and then fails again. It is always the same error code, and always after a window size adjust:
Code:
Documents/MyBrains/Gareth Howell_brain/Files/85486DE1-7176-4B6B-4D46-AA7B6C173314/TheBrainIcon.png
          2,527 100%    3.23kB/s    0:00:00 (xfr#695, to-chk=3410/4784)
Documents/MyBrains/Gareth Howell_brain/Files/856FEC3E-02DB-C8EB-10AD-EB546BD83745/ERM and the requirements of ISO 31000.pdf
              0   0%    0.00kB/s    0:00:00  debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1990656 sent adjust 106496
debug2: channel 0: window 1990656 sent adjust 106496
debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1982464 sent adjust 114688
        562,791 100%  697.46kB/s    0:00:00 (xfr#696, to-chk=3408/4784)
Documents/MyBrains/Gareth Howell_brain/Files/859783AC-E0E7-603A-B0FF-467C97D5A4AA/.Crisis Communication Guidelines_SM261113.docx.icloud
            197 100%    0.24kB/s    0:00:00 (xfr#697, to-chk=3406/4784)
Documents/MyBrains/Gareth Howell_brain/Files/863592AD-0961-8CC0-D4ED-4E18596B0DC6/TheBrainIcon.png
          4,548 100%    5.63kB/s    0:00:00 (xfr#698, to-chk=3404/4784)
Documents/MyBrains/Gareth Howell_brain/Files/864B5D72-3DEA-1051-AE50-03163758FB55/ERM - Whats Different in the Corporate World - Kinsey.pdf
              0   0%    0.00kB/s    0:00:00  debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1982464 sent adjust 114688
ssh_dispatch_run_fatal: Connection to 172.29.12.200 port 22: message authentication code incorrect

rsync: connection unexpectedly closed (47697599 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [receiver=3.1.3]
rsync: connection unexpectedly closed (222770 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [generator=3.1.3]
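Since each repeat gets a bit further, one stopgap (not a fix for the underlying fault) is to rerun the same command until it exits cleanly. This is a minimal sketch; `retry_until_ok` and the `RETRY_DELAY` variable are hypothetical names, and the attempt limit is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or the attempt limit
# is reached. Usage: retry_until_ok MAX_ATTEMPTS command [args...]
retry_until_ok() {
  local max="$1"
  shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Pause between attempts; RETRY_DELAY (seconds) is an assumed knob.
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage (not run automatically), reusing the rsync command from this thread:
#   retry_until_ok 20 /usr/bin/rsync -azF --delete --numeric-ids \
#     --delete-excluded --partial -e "/usr/bin/ssh -p 22" \
#     "garethhowell@<client>:/Users/garethhowell/" "/mnt/archive1/alpha.0/<client>"
```

With `--partial` already on the command line, a file that was interrupted mid-transfer is kept and can be resumed on the next attempt instead of starting from zero.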
 
I'd suggest creating a destination disk of identical size and using dd to copy the data over the network directly onto it, just as a test. If that works, create a new disk at your intended destination size and use Clonezilla to clone the data onto that.

ssh_dispatch_run_fatal: Connection to 172.29.12.200 port 22: message authentication code incorrect
I remember having such an error roughly 20 years ago, and it turned out to be a bad CPU that was unable to calculate correctly when it got warm. Can you check for anything temperature-related?
 
I had some issues in the past where the network stack became overloaded at some point. That triggered a TCP zero window and everything went nuts. So I'd recommend using "bwlimit" on the rsync command to see if that helps.
Also check the VM configuration and make sure there are no disk-related issues. These can bubble up into other areas. I sometimes see this on SMB connections when, for instance, a ZFS scrub is running.
HTH and good luck!
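As a sketch of the bwlimit suggestion above: rsync's `--bwlimit=RATE` option caps the transfer rate, and without a suffix the value is read in units of 1024 bytes per second. The wrapper name `throttled_pull`, the `RSYNC_BIN` override and the 20000 figure in the usage line are illustrative assumptions, not values from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the rsync command from this thread with a
# bandwidth cap added. Usage: throttled_pull LIMIT SOURCE DEST
# RSYNC_BIN can be overridden, e.g. to dry-run the argument list.
throttled_pull() {
  local limit="$1" src="$2" dest="$3"
  "${RSYNC_BIN:-/usr/bin/rsync}" -azF --delete --numeric-ids --delete-excluded \
    --progress --partial --bwlimit="$limit" \
    -e "/usr/bin/ssh -p 22" "$src" "$dest"
}

# Usage (not run automatically); 20000 is an arbitrary starting point:
#   throttled_pull 20000 "garethhowell@<client>:/Users/garethhowell/" \
#     "/mnt/archive1/alpha.0/<client>"
```

If the transfer survives at a low limit, raising the value step by step may show roughly where the failures begin.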
 
OK, it's a network problem. I opened the Proxmox shell on the node and ran a simple while loop printing the date. It stalled when the rsync session froze.
I'm surprised. It's not a highly stressed box, though it is only a Dell Optiplex. I think I have an E1000 somewhere which I can stick in it.
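A sketch of the kind of loop described above, with one addition: the timestamps are written to a file on the node so they can be checked afterwards. If the file shows no gaps while the console appeared frozen, the node itself probably kept running and only the network path stalled. The names `heartbeat` and `report_gaps` and the log path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one epoch timestamp per second, forever.
heartbeat() {
  while true; do
    date +%s
    sleep 1
  done
}

# Hypothetical helper: read epoch timestamps on stdin and report every gap
# larger than the given number of seconds (default 3).
report_gaps() {
  local threshold="${1:-3}"
  awk -v max="$threshold" '
    NR > 1 && ($1 - prev) > max {
      printf "stall of %d s between %d and %d\n", $1 - prev, prev, $1
    }
    { prev = $1 }
  '
}

# Usage (not run automatically):
#   heartbeat > /root/heartbeat.log &     # start before the rsync run
#   report_gaps 3 < /root/heartbeat.log   # inspect after the failure
```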
 
So, I still can't get to the bottom of this problem.
I've added an additional Ethernet card, created a second bridge and instructed the CT to use this bridge. rsync still fails with the same error.

I'll explore a different solution.
 
I know this is old but putting it here in case it helps someone later on.

The symptoms described seem to be exactly what I was experiencing. While running a watch command in my LXC terminal, I could see the rsync process starting and then immediately dying. At the source end of the connection, it would appear as though a couple of dozen files would copy and then the network connection would drop. After spinning my wheels for a bit, I remembered that my LXC containers have their firewalls enabled by default. I disabled the firewall and rebooted the container and voila, the rsync has now been running uninterrupted for the past 15 minutes. At some point, I will dig into the firewall rules and figure out exactly why the connection is allowed to establish and then gets killed afterwards. For now, I just need to copy my files. My network is not exposed outside of my LAN, so taking down my firewall is no problem. I'll put it back up later on. I hope this helps!
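To see which container interfaces have the firewall flag set, the container configuration can be filtered. This is a sketch that assumes `pct config <vmid>` prints network lines of the form `net0: name=eth0,bridge=vmbr0,firewall=1,...`; the function name `nics_with_firewall` is made up, and the container ID in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `pct config`-style output on stdin and print the
# name (net0, net1, ...) of each interface whose options include firewall=1.
nics_with_firewall() {
  awk -F': ' '
    /^net[0-9]+: / {
      n = split($2, opts, ",")
      for (i = 1; i <= n; i++)
        if (opts[i] == "firewall=1") print $1
    }
  '
}

# Usage on the Proxmox node (not run automatically):
#   pct config <vmid> | nics_with_firewall
```

An interface listed here only has the flag set. Whether any rules are actually enforced also depends on the firewall being enabled at the other levels of the Proxmox configuration, so this is a starting point for the investigation, not a verdict.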