Super slow, timeout, and VM stuck while backing up, after updated to PVE 9.1.1 and PBS 4.0.20

... Does everyone who has the problem have LACP aggregation?
Me too. All my PVE nodes run with an 802.3ad (LACP) bond, layer 3+4 hashing, MTU 1500, across two 10G cards.
The PBS is a VM that runs on one of the nodes, with its virtual disks as qcow2 files on a NAS over NFS.
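For reference, an 802.3ad layer 3+4 bond like the one described would look roughly like this in /etc/network/interfaces on PVE (a sketch only; the NIC names enp129s0f0/enp129s0f1, addresses, and bridge name are placeholders, not taken from this thread):

```
auto bond0
iface bond0 inet manual
    bond-slaves enp129s0f0 enp129s0f1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-miimon 100
    mtu 1500

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.10/24
    gateway 192.0.2.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
```

The layer3+4 hash policy spreads flows across the bond members by IP and port, so a single backup stream still maxes out at one member link's speed.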
 
You could try to downgrade the kernel on the PVE host as well and report back; it might help, but it's not tied to a specific manufacturer's driver.
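Downgrading and pinning the older kernel on a PVE host can be sketched like this (assuming the proxmox-kernel-6.14 series is still available in your configured repositories; the exact version string 6.14.11-4-pve may differ on your system):

```shell
# Install the older kernel series alongside the current one
apt update
apt install proxmox-kernel-6.14

# Pin it so the host keeps booting 6.14 instead of the newest installed kernel;
# use the exact version shown by: proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.14.11-4-pve

# Reboot, then verify the running kernel
uname -r
```

Remove the pin later with `proxmox-boot-tool kernel unpin` once a fixed 6.17 build is out.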

Actually I'm running an Intel Corporation Ethernet Controller X710 for 10GbE SFP+ on the 8.4.14 hosts
and
BCM5719 on 9.1.1 test host

but no issues restoring VMs in either scenario.

On the other hand, I'm on an Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) on PBS 4.1, and every backup was a nightmare before I reverted to the good old 6.14 kernel.
So, with a 9.1.1 node and the 6.14.11-4-pve kernel, I can restore without any problem.
PS: I had no problems with mass live migration, and I always use the same bond of 4x10Gbit with VLANs and MTU 9000.
 
I've seen several mention the Intel 82599ES NIC, which is also what I'm running in my dev PBS server (and PVE hosts), with 20Gb LAGGs as well.

Given the drop in disk I/O, it certainly points towards a networking issue on the surface.
 
So, with a 9.1.1 node and the 6.14.11-4-pve kernel, I can restore without any problem.
PS: I had no problems with mass live migration, and I always use the same bond of 4x10Gbit with VLANs and MTU 9000.

I have downgraded the kernel on two clusters and on PBS, so they are on the latest software available (no-subscription) but with kernel 6.14.11-4-pve.
One backup was normal, another one is still running, very slowly; a Linux VM with the guest agent:
INFO: 32% (19.2 GiB of 60.0 GiB) in 23m 32s, read: 16.0 MiB/s, write: 15.8 MiB/s
Incremental, with a new dirty bitmap. Really far too slow...
So downgrading the kernel doesn't resolve the issue for me.
 
Can confirm downgrading the kernel worked for us: no more hanging backups or broken VMs.
Which PVE version are you using? I have the latest no-subscription with the downgraded kernel too, but some VMs are still very, very slow... (I am testing only a few, less important VMs.)
INFO: 67% (40.2 GiB of 60.0 GiB) in 49m 34s, read: 9.2 MiB/s, write: 0 B/s
 
Which PVE version are you using? I have the latest no-subscription with the downgraded kernel too, but some VMs are still very, very slow... (I am testing only a few, less important VMs.)
INFO: 67% (40.2 GiB of 60.0 GiB) in 49m 34s, read: 9.2 MiB/s, write: 0 B/s
It is in my earlier message, but on our main production clusters we have both version 9 and version 8, both fully updated as of last weekend. All PBS instances are on 4, though (4.0 with the 6.14 kernel).
 
It is in my earlier message, but on our main production clusters we have both version 9 and version 8, both fully updated as of last weekend. All PBS instances are on 4, though (4.0 with the 6.14 kernel).
I started to have problems after upgrading PBS from 4.0 to 4.1; I found no problems at all with PBS 4.0.
 
We were unfortunately not able to reproduce the issue yet. The ZFS kernel module is the same version, 2.3.4+pve1, in both kernel 6.14 and 6.17, so the likely cause lies in the rest of the kernel code. Unfortunately, the difference between 6.14 and 6.17 is very big. If anybody is not using ZFS and is still affected by the issue at hand, you could test mainline builds to help narrow it down:
https://kernel.ubuntu.com/mainline/v6.15/
https://kernel.ubuntu.com/mainline/v6.16/
(the amd64/linux-image... and amd64/linux-modules... packages need to be installed).
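Installing one of those mainline builds boils down to downloading the two .deb files and installing them with dpkg (a sketch; pick the actual file names from the amd64 directory of the build you want, as they include the full version string):

```shell
# Download the linux-image... and linux-modules... .deb files for the
# build under test (file names elided here; copy them from the page)
wget https://kernel.ubuntu.com/mainline/v6.15/amd64/<linux-image...deb>
wget https://kernel.ubuntu.com/mainline/v6.15/amd64/<linux-modules...deb>

# Install both packages together, then reboot into the new kernel
dpkg -i linux-image-*.deb linux-modules-*.deb
reboot
```

Note these mainline builds have no ZFS module, which is why this test only helps on non-ZFS setups.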
 
Did you disable it on the PBS host only, or also on the PVE hosts? Have you noticed performance issues with 10Gbit NICs, like decreasing transfer speeds?
No, we've had no issues, but it's hard to compare since we have a bandwidth limit on our backup jobs: with 802.3ad we hit the limits on our core switch. We have roughly 22 backup servers running at the same time, all of them with 10Gbit NICs, and the backbone is MLAG with 2x100Gbit.
 
Is the issue reproducible by stressing the network, e.g. with iperf? Run iperf -s on the PBS host and iperf -c <PBS-host-IP> -t 600 -i 10 on the backup source.

Edit: Also, do you see high memory pressure on the PBS while the issue appears?
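One quick way to check memory pressure on the PBS host is the kernel's pressure stall information (PSI), available since kernel 4.20:

```shell
# Show memory pressure; a rising "full" avg10 means tasks are fully
# stalled waiting on memory right now
cat /proc/pressure/memory

# Watch memory and I/O pressure together while a backup is running
watch -n 2 'cat /proc/pressure/memory /proc/pressure/io'
```

If memory pressure stays near zero while the backup crawls, that points back towards the network or block layer rather than memory.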
 