Super slow backups, timeouts, and stuck VMs after updating to PVE 9.1.1 and PBS 4.0.20

There is a new Linux kernel with version 6.17.11-4-test-pve available for testing. You can get the Debian packages (including sha256 checksums for integrity verification) from http://download.proxmox.com/temp/kernel-6.17.11-tcp-stall-4/ and install them using apt install ./<package-name>.deb. As always, double check that you booted into the correct version by running uname -a after a reboot.
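The checksum verification step can be sketched as follows. The file names here are placeholders (use the actual package and checksum file names from the download page), and a locally created dummy file stands in for the download so the commands can be run anywhere:

```shell
# Hypothetical file name; substitute the real .deb from the download page.
f=pve-kernel-demo.deb
printf 'dummy package contents\n' > "$f"

# The download page publishes sha256 checksums; they are normally verified
# with sha256sum -c against a checksum file in "<hash>  <name>" format.
sha256sum "$f" > SHA256SUMS
result=$(sha256sum -c SHA256SUMS)
echo "$result"                     # "<file>: OK" when the hash matches

# On a real download, install only after the check passes:
# apt install ./"$f"
```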

This kernel version reverts commits which were considered to be problematic.

In addition, in order to provide up to date data to the Linux kernel developers, it would be great if we could get tcpdump, ss -tim and nstat outputs also from v6.18.1 [0] or even v6.19-rc1 [1]. @LKo do you think you might be able to provide these?

Thanks in advance!

[0] https://kernel.ubuntu.com/mainline/v6.18.1/
[1] https://kernel.ubuntu.com/mainline/v6.19-rc1/

Edit: Also as requested by the kernel devs [2], please provide the output of cat /proc/sys/net/ipv4/tcp_rmem on the PBS as well as perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script

[2] https://lore.kernel.org/netdev/CANn89iL=MTgYygnFaCeaMpSzjooDgnzwUd_ueSnJFxasXwyMwg@mail.gmail.com/
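For reference, the requested diagnostics could be collected in one pass along these lines. This is only a sketch: the output file names are made up, the tcpdump filter assumes PBS's default port 8007, and the privileged capture/tracing commands are left commented:

```shell
# Collect the requested diagnostics on the PBS host while a backup runs.
# Output file names are illustrative.
cat /proc/sys/net/ipv4/tcp_rmem > tcp_rmem.txt
ss -tim > ss-tim.txt 2>/dev/null || true    # per-socket TCP state and memory info
nstat > nstat.txt 2>/dev/null || true       # kernel SNMP/netstat counters

# The capture and tracing steps need root; the filter assumes port 8007:
# tcpdump -i any -w stall.pcap port 8007
# perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script
```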
 
They definitely know how to communicate efficiently over at netdev :D

Trying 6.17.11-4-test-pve now and will build 6.18 + the patches Eric mentioned once finished with that (the Ubuntu 6.19-rc1 build is not available for amd64 atm).

I wasn't able to reproduce the stall using pbc-benchmark anymore, so back to doing the real backups for now.

edit: my git-fu is a bit rusty. Out of the three patches Eric mentioned, only "416dd649f3aa tcp: add net.ipv4.tcp_comp_sack_rtt_percent" is not already part of 6.18, right?
 
We already cherry-picked aa251c84636c in our test build 6.17.11-2-test-pve (since it was considered a possible fix), and the other two patches should not lead to much different outcomes, as they only expose additional parameters (apart from growing the socket buffer more slowly if the RTT is below the threshold and changing the number of SACKs being sent).

But of course we are already building a kernel with these patches applied, for confirmation.

Could you provide the output of cat /proc/sys/net/ipv4/tcp_rmem for the time being? Although I do not expect to see any unexpected values ;)

Missed that the 6.19-rc1 build is not available for amd64, sorry about that. Please do test https://kernel.ubuntu.com/mainline/v6.18.1/ though.
 
At the moment it is
Code:
root@prx-backup:~# cat /proc/sys/net/ipv4/tcp_rmem
4096    131072  6291456
root@prx-backup:~# uname -a
Linux prx-backup 6.17.11-4-test-pve #1 SMP PREEMPT_DYNAMIC PMX 6.17.11-4 (2025-12-09T09:02Z) x86_64 GNU/Linux
But the backups are nearly finished and haven't stalled yet. I wish it was simpler to reproduce!
 
This is not a dynamic value, so should not change.
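For anyone reading along: the three numbers are the per-socket receive-buffer minimum, default, and maximum, in bytes. A quick way to label them (sample values copied from the output above; on a live system, read them from /proc/sys/net/ipv4/tcp_rmem instead):

```shell
# tcp_rmem holds three byte counts: the minimum, default (initial), and
# maximum receive-buffer size per TCP socket. Sample values from the post.
rmem="4096 131072 6291456"
set -- $rmem
min=$1; def=$2; max=$3
echo "min=$min default=$def max=$max"
```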

Edit: Also we do not expect the backups to stall with this version. All the problematic commits we were able to identify have been reverted for this build. So @all affected users please do test this particular version if possible!
 
Ok, so 6.17.11-4-test-pve completed a round of normal backups without hiccups. That has happened in the past as well, though, so I can't yet be sure it's really fixed. Testing 6.18+416dd649f3aa now.
If both go through the backups without issue, I'll test 6.17.11-4-test-pve for a night or two of "normal" backup schedule as well as 6.18+416dd649f3aa.

This is tcp_rmem on 6.18:
Code:
root@prx-backup:~# uname -a
Linux prx-backup 6.18.0+ #1 SMP PREEMPT_DYNAMIC Thu Dec 18 12:32:46 CET 2025 x86_64 GNU/Linux
root@prx-backup:~# cat /proc/sys/net/ipv4/tcp_rmem
4096    131072  33554432
 
Great, thx! There is also a test build with the mentioned commits cherry picked available at http://download.proxmox.com/temp/kernel-6.17.11-tcp-stall-5/

Yes, the default max size was bumped from 6 MiB to 32 MiB in commit 572be9bf ("tcp: increase tcp_rmem[2] to 32 MB"), so this is expected.
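The arithmetic checks out against both outputs above:

```shell
# 6291456 B is the old tcp_rmem[2] maximum, 33554432 B the new one.
old_mib=$((6291456 / 1024 / 1024))
new_mib=$((33554432 / 1024 / 1024))
echo "$old_mib MiB -> $new_mib MiB"    # 6 MiB -> 32 MiB
```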
 
So 6.18+416dd649f3aa did stall again. I'll DM you the tcpdump and other outputs; the perf output was unfortunately empty.
Thanks! The perf output being empty is a result as well, so we know this is not even being called.
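An empty perf output can also happen when the tracepoint simply isn't present on the running kernel (or perf isn't installed), so it may be worth ruling that out before concluding the code path was never hit. A sketch:

```shell
# Check whether the tcp:tcp_rcvbuf_grow tracepoint exists at all; an
# absent tracepoint also yields an empty "perf script" output.
if perf list 2>/dev/null | grep -q 'tcp:tcp_rcvbuf_grow'; then
  msg="tracepoint available: empty output means it was never hit"
else
  msg="tracepoint not found on this kernel (or perf not installed)"
fi
echo "$msg"
```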