Found a strange issue after upgrading my systems to PVE 9.
Proxmox backups crawl down to some silly KB/s speeds and then stall completely, unable to finish.
I have two servers, each with PVE and PBS installed side by side. Each PVE host just backs up to the other's PBS. The servers are identical.
Hardware: Dell R6515, Broadcom BCM57414 “Broadcom Adv. Dual 25Gb Ethernet”
Firmware: 233.0.195.0/pkg 23.31.18.10
Kernel: 6.17.2-1-pve
Driver: bnxt_en only (RoCE disabled in firmware, bnxt_re not loaded)
MTU: 9000 on both endpoints and the switch
Symptom:
With default offloads: proxmox-backup-client benchmark: TLS -> 200 MB/s - but the actual backup cannot complete and quickly drags to a stall
With ethtool -K <iface> rx-gro-hw off gro off gso off tso off: proxmox-backup-client benchmark: TLS -> 770 MB/s - and backups work fine
Which suggests there's some problem with the driver or the kernel (or maybe even the new network card firmware, because I updated it in the same go when I upgraded PVE from 8 to 9).
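For anyone who wants to try the same workaround, here is a minimal sketch of the ethtool call wrapped in two small shell functions. The function names and the APPLY switch are made up for illustration; only the ethtool -K command and the four feature names come from the post above. By default it only prints the command, so nothing changes on the NIC until you ask for it.

```shell
# Sketch only: helper names and the APPLY switch are hypothetical.
# The feature list is the one used in the workaround from the post.
OFFLOADS="rx-gro-hw gro gso tso"

# Print the ethtool command that switches those offloads off on interface $1.
offload_off_cmd() {
    local iface="$1" args="" feat
    for feat in $OFFLOADS; do
        args="$args $feat off"
    done
    echo "ethtool -K $iface$args"
}

# Show the command; execute it only when APPLY=1 is set in the environment.
apply_offload_workaround() {
    local cmd
    cmd="$(offload_off_cmd "$1")"
    echo "+ $cmd"
    if [ "${APPLY:-0}" = "1" ]; then
        $cmd
    fi
}
```

Calling apply_offload_workaround <iface> prints the command, and APPLY=1 apply_offload_workaround <iface> (as root) runs it. ethtool -k <iface> (lowercase k) lists the current feature state afterwards. As far as I know, a change made this way is a runtime setting and does not survive a reboot on its own.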
What's interesting is that both iperf3 and ICMP tests with jumbo packets look fine in both cases. It took me a while (a bit of me and a lot more of ChatGPT, really) to figure out it might be the hardware offloads. Not sure why the problem only surfaces when doing backups and not when doing iperf tests. Perhaps iperf is sending zeroes or something (I didn't go that deep into the research), but it was surprising.
As a note: backups also work fine if I reduce the MTU to 1500 and leave the offloads at their defaults. But I used MTU 9000 for years on these servers and never experienced a single issue on PVE 8. The problem started after the upgrade to PVE 9.
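For reference, a jumbo ICMP test of the kind mentioned above is usually run with fragmentation forbidden and a payload sized to fill the MTU exactly. The sketch below assumes IPv4 without options (20 byte IP header plus 8 byte ICMP header) and Linux iputils ping. The helper names are hypothetical and the host is whatever address the other server has. Keep in mind that a ping like this only proves that single full-size packets get through; TSO, GSO and GRO deal with segmenting and merging TCP streams, so ICMP would not be expected to exercise them.

```shell
# Sketch only: helper names are hypothetical; assumes IPv4 without options.

# Largest ICMP echo payload that fits into one packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Print a ping command that sets the don't-fragment bit (-M do) and sends
# four echo requests sized to fill the MTU (default 9000) to host $1.
jumbo_ping_cmd() {
    local host="$1" mtu="${2:-9000}"
    echo "ping -M do -c 4 -s $(icmp_payload_for_mtu "$mtu") $host"
}
```

For MTU 9000 the payload works out to 8972 bytes, so jumbo_ping_cmd <other-host> prints a ping -M do -c 4 -s 8972 command that can be pasted as-is. If any hop has a smaller MTU, ping should report an error instead of silently fragmenting.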