rsync failing between two servers - have I tried to be too clever?

guff666

Member
Nov 6, 2021
I used to have the following:
  • iMac 21" with Time Machine backups
  • SMARTOS server with huge ZFS disks for backups
A zone on the SMARTOS server was installed with rsnapshot. Under manual control I could instruct this zone to use rsnapshot to take a full copy of my home directory on the iMac. I used this as a long term archive which was never purged. I only did this every couple of months.

More recently the iMac died and I used a Dell Workstation to run Proxmox 7 and created a Hackintosh clone of the iMac. Plenty of memory and CPU. All good.

Having seen the benefit of Proxmox I migrated the SMARTOS server to Proxmox and created an "archive" container with rsnapshot. It seemed to work but is now displaying problems. (BTW, there have been no configuration changes)

When I run the rsnapshot script I am getting the following error:
Code:
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [receiver=3.1.3]

Isolating this down to the rsync command, which is:
Code:
/usr/bin/rsync -azF --delete --numeric-ids --delete-excluded --progress --partial -e "/usr/bin/ssh -p 22" garethhowell@<client>:/Users/garethhowell/ /mnt/archive1/alpha.0/<client>
I see that the cause is the following ssh error.
Code:
ssh_dispatch_run_fatal: Connection to <client> port 22: message authentication code incorrect

Now, this occurs some time into the session, not at the start, so I think the actual error code may be a red herring. I've increased the log level to the max and this seems to confirm it. See:
Code:
ssh_dispatch_run_fatal: Connection to <client> port 22: message authentication code incorrect

rsync: connection unexpectedly closed (123144802 bytes received so far) [receiver]

I've seen messages on other forums where this has been associated with a dodgy network card or network performance, but for all of these, the error occurs right at the outset, not some time in.

Has anybody else seen anything like this? Am I exposing a network performance problem?
 
Yes, there’s no immediate problem. The transfer starts OK but fails part-way through. The failure point varies.

That’s why I suspect the reported error is a red herring.
 
On a (rather far-fetched) hunch: check the MAC addresses and IP addresses in your network for any duplicates (that could explain why it happens only part-way through).
Also, ssh in the same direction that rsync/rsnapshot does and leave the session open for a few minutes/hours. It should remain active.
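
One rough way to look for the duplicates mentioned above is to scan the neighbour (ARP) table on the Proxmox node for a MAC address that shows up against more than one IP. This is only a sketch: the function name `find_shared_macs` is made up for illustration, it assumes input in the format printed by `ip neigh show`, and a host that legitimately holds several IPs will also be listed, so treat any hit as something to investigate rather than proof of a clash.

```shell
# Hypothetical helper: reads `ip neigh show` style lines on stdin and prints
# every MAC address that appears against more than one IP address.
find_shared_macs() {
  awk '
    {
      # The MAC address is the field that follows the keyword "lladdr".
      for (i = 1; i < NF; i++) {
        if ($i == "lladdr") {
          mac = tolower($(i + 1))
          if (count[mac]++) ips[mac] = ips[mac] " " $1
          else ips[mac] = $1
        }
      }
    }
    END {
      for (mac in count)
        if (count[mac] > 1) print mac ": " ips[mac]
    }
  ' | sort
}

# Usage on the node (not run here):
#   ip neigh show | find_shared_macs
```

Entries without a resolved MAC (for example those marked FAILED) are skipped, since they carry no `lladdr` field.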

I hope this helps!
 
Hi Stoiko
It was worth a try, but it reveals nothing. I can ssh easily in both directions. A simple bash while loop ran for over 5 hours with no problem.
I also ran iperf3 with 10 parallel sessions for 5 mins. Average throughput was 942 Mbit/s on a gigabit network, so no problems there either.

I tried an rsync session in the opposite direction and all was well. So, I tried pulling the same folder hierarchy to the archive server. It fails randomly: it transfers some files and then fails. If I repeat, it gets a bit further and then fails again. Always the same error code, and always after a window size adjust:
Code:
Documents/MyBrains/Gareth Howell_brain/Files/85486DE1-7176-4B6B-4D46-AA7B6C173314/TheBrainIcon.png
          2,527 100%    3.23kB/s    0:00:00 (xfr#695, to-chk=3410/4784)
Documents/MyBrains/Gareth Howell_brain/Files/856FEC3E-02DB-C8EB-10AD-EB546BD83745/ERM and the requirements of ISO 31000.pdf
              0   0%    0.00kB/s    0:00:00  debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1990656 sent adjust 106496
debug2: channel 0: window 1990656 sent adjust 106496
debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1982464 sent adjust 114688
        562,791 100%  697.46kB/s    0:00:00 (xfr#696, to-chk=3408/4784)
Documents/MyBrains/Gareth Howell_brain/Files/859783AC-E0E7-603A-B0FF-467C97D5A4AA/.Crisis Communication Guidelines_SM261113.docx.icloud
            197 100%    0.24kB/s    0:00:00 (xfr#697, to-chk=3406/4784)
Documents/MyBrains/Gareth Howell_brain/Files/863592AD-0961-8CC0-D4ED-4E18596B0DC6/TheBrainIcon.png
          4,548 100%    5.63kB/s    0:00:00 (xfr#698, to-chk=3404/4784)
Documents/MyBrains/Gareth Howell_brain/Files/864B5D72-3DEA-1051-AE50-03163758FB55/ERM - Whats Different in the Corporate World - Kinsey.pdf
              0   0%    0.00kB/s    0:00:00  debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1982464 sent adjust 114688
ssh_dispatch_run_fatal: Connection to 172.29.12.200 port 22: message authentication code incorrect

rsync: connection unexpectedly closed (47697599 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [receiver=3.1.3]
rsync: connection unexpectedly closed (222770 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [generator=3.1.3]
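
To compare where successive runs die, the byte count that rsync prints on the receiver side can be pulled out of saved logs. A minimal sketch, assuming each run's output has been captured to a file; the function name `failure_offsets` and the log file names are made up for illustration.

```shell
# Hypothetical helper: reads rsync output on stdin and prints the receiver's
# "bytes received so far" value for each unexpected disconnect.
failure_offsets() {
  sed -n 's/^rsync: connection unexpectedly closed (\([0-9][0-9]*\) bytes received so far) \[receiver\]$/\1/p'
}

# Usage (log file names are placeholders, not run here):
#   cat run1.log run2.log | failure_offsets
```

Fed the two receiver lines quoted in this thread, it prints 123144802 and 47697599, which fits the observation that the failure point moves from run to run.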
 
I'd suggest creating a destination disk with the identical number of bytes and using dd to copy the file over the network directly onto it, just as a test. If that works, you need to create a new disk with your destination size and Clonezilla the thing onto that.

ssh_dispatch_run_fatal: Connection to 172.29.12.200 port 22: message authentication code incorrect
I remember I had such an error roughly 20 years ago, and it was a bad CPU that was unable to calculate correctly when warm. Can you check for temperature-related issues?
 
I had some issues in the past where the network stack became overloaded at some point. That triggered a TCP zero window and everything went nuts. So I'd recommend using "--bwlimit" on the rsync command to see if that helps.
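
As a sketch of the bandwidth-limit suggestion, the wrapper below reuses the flags from the rsync command quoted earlier in the thread and adds `--bwlimit`. The wrapper name `run_limited_rsync`, the `DRY_RUN` switch and the 5000 figure are assumptions for illustration. Without a unit suffix, rsync reads the rate in units of 1024 bytes per second.

```shell
# Hypothetical wrapper: same rsync flags as the original command, plus a
# bandwidth cap. With DRY_RUN=1 it only prints the command line.
run_limited_rsync() {
  limit="$1"; src="$2"; dest="$3"
  set -- rsync -azF --delete --numeric-ids --delete-excluded --progress \
    --partial "--bwlimit=$limit" -e "ssh -p 22" "$src" "$dest"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage (not run here; <client> is left as in the original post):
#   run_limited_rsync 5000 "garethhowell@<client>:/Users/garethhowell/" "/mnt/archive1/alpha.0/<client>"
```

If the transfer completes with a low cap but fails without one, that would point towards load on the network path rather than the data itself.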
Also check the VM configuration and make sure there are no disk-related issues. These can also bubble up into other areas. I do experience this sometimes on SMB connections when, for instance, a ZFS scrub is running.
HTH and good luck!
 
OK, it's a network problem. I opened the Proxmox shell on the node and ran a simple while loop printing the date. It stalled when the rsync session froze.
I'm surprised. It's not a highly stressed box, though it is only a Dell Optiplex. I think I have an E1000 somewhere which I can stick in it.
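
For anyone wanting to repeat the stall test described above, something along these lines would do. It is a sketch, not the exact loop that was run: `heartbeat` and `find_gaps` are names made up here, and the 3-second default threshold is arbitrary.

```shell
# Print one epoch timestamp per second until interrupted (Ctrl-C).
heartbeat() {
  while true; do
    date +%s
    sleep 1
  done
}

# Read epoch timestamps on stdin and report any gap longer than $1 seconds
# (default 3). A gap means the loop was not being serviced for that long.
find_gaps() {
  awk -v max="${1:-3}" '
    NR > 1 && $1 - prev > max { print "gap of " ($1 - prev) "s after " prev }
    { prev = $1 }
  '
}

# Usage (not run here): start the loop in the node shell, run rsync
# elsewhere, then inspect the log afterwards.
#   heartbeat > /tmp/heartbeat.log
#   find_gaps 3 < /tmp/heartbeat.log
```

Logging to a file and checking it afterwards separates the two cases: if only the shell display froze, the log has no gaps, whereas a loop that really stopped running leaves a jump in the timestamps.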
 
So, I still can't get to the bottom of this problem.
I've added an additional Ethernet card, created a second bridge and instructed the CT to use this bridge. rsync still fails with the same error.

I'll explore a different solution.
 
