Backup job failed

Loic92

Member
Oct 23, 2022
Paris
magic-radio.net
PVE 9.1.1, Linux 6.17.2-2-pve
PBS 4.1.0, Linux 6.17.2-2-pve
Community subscription

Hello,

I have a daily backup job at 4 a.m. for all the VMs in my 3-node cluster, and it failed on one node with this error:
[Screenshot of the error: 1764913856147.png]

Could you please help me investigate?
Thanks.

Note: when I start the backup manually for each VM on the affected node, it works fine.
 
Hi,

Do you have MTU 9000 configured on the interface the backup traffic is routed over? If so, you are most likely affected by the 6.17.2 kernel bug discussed here: https://forum.proxmox.com/threads/s...r-updated-to-pve-9-1-1-and-pbs-4-0-20.176444/

If so, please install the latest available kernel, version 6.17.4-1, from the pbs-test repository and check whether that fixes the issue.
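To answer the MTU question, a minimal sketch like the following can report which interface the kernel would use to reach the backup server and the MTU configured on it. The helper names and the address argument are placeholders, not something from this thread; the script only relies on `ip route get` and the sysfs file `/sys/class/net/<iface>/mtu`.

```shell
#!/bin/sh
# Sketch: show which interface carries traffic to the backup server and its MTU.
# Assumption: the PBS address is passed as the first argument (placeholder).

# Print the interface name that follows "dev" in `ip route get` output read from stdin.
route_dev() {
  sed -n 's/.* dev \([^ ]*\).*/\1/p' | head -n 1
}

# Print the configured MTU of interface $1 from sysfs.
iface_mtu() {
  cat "/sys/class/net/$1/mtu"
}

# Only query the live routing table when an address was given.
if [ -n "${1:-}" ]; then
  dev=$(ip route get "$1" | route_dev)
  echo "Traffic to $1 leaves via $dev with MTU $(iface_mtu "$dev")"
fi
```

For example, saving it as a hypothetical `check-mtu.sh` and running `sh check-mtu.sh 192.0.2.10` (replace the address with your PBS IP) prints the outgoing interface and its MTU; if that shows 9000, the kernel bug linked above may apply.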
I don't use MTU 9000, but I'm going to upgrade the kernel anyway and will see whether the issue comes back. Thanks.
 
Then it is most likely a different issue. Please post the backup task log for the failed backup job.
 
Apparently there has been an issue with pvescheduler since yesterday:

Code:
root@nab91:~# journalctl -k | grep pvescheduler | more
Dec 04 14:15:00 nab91 kernel: pvescheduler[1055747]: segfault at 231 ip 0000000000000231 sp 00007ffebaa4f5c8 error 14 likely on CPU 17 (core 29, socket 0)
Dec 04 14:16:00 nab91 kernel: pvescheduler[1057044]: segfault at 231 ip 0000000000000231 sp 00007ffebaa4f5c8 error 14 likely on CPU 2 (core 4, socket 0)
Dec 04 14:17:00 nab91 kernel: pvescheduler[1057407]: segfault at 231 ip 0000000000000231 sp 00007ffebaa4f5c8 error 14 likely on CPU 10 (core 20, socket 0)
Dec 04 14:18:00 nab91 kernel: pvescheduler[1057784]: segfault at 231 ip 0000000000000231 sp 00007ffebaa4f5c8 error 14 likely on CPU 6 (core 12, socket 0)
Dec 04 14:19:00 nab91 kernel: pvescheduler[1058146]: segfault at 231 ip 0000000000000231 sp 00007ffebaa4f5c8 error 14 likely on CPU 4 (core 8, socket 0)
Dec 04 14:20:00 nab91 kernel: pvescheduler[1058505]: segfault at 231 ip 0000000000000231 sp 00007ffebaa4f5c8 error 14 likely on CPU 10 (core 20, socket 0)
etc., up to now; it is still ongoing.
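To read these kernel lines: the `error` field is the x86 page-fault error code, which the kernel prints in hexadecimal, so `error 14` means 0x14. The sketch below decodes that field bit by bit and counts how often a given process segfaulted in a log excerpt; the function names are hypothetical helpers, not Proxmox tools.

```shell
#!/bin/sh
# Sketch: decode the x86 page-fault error code from a kernel "segfault" line.
# The argument is the hex value as printed by the kernel (without "0x").
decode_pf_error() {
  code=$((0x$1))
  # Bit 0: set = protection violation, clear = page not present.
  if [ $((code & 1)) -ne 0 ]; then echo "protection violation"; else echo "page not present"; fi
  # Bit 1: set = write, clear = read.
  if [ $((code & 2)) -ne 0 ]; then echo "write access"; else echo "read access"; fi
  # Bit 2: set = fault happened in user mode.
  if [ $((code & 4)) -ne 0 ]; then echo "user mode"; else echo "kernel mode"; fi
  # Bit 3: a reserved page-table bit was set.
  if [ $((code & 8)) -ne 0 ]; then echo "reserved bit set"; fi
  # Bit 4: the fault was an instruction fetch.
  if [ $((code & 16)) -ne 0 ]; then echo "instruction fetch"; fi
  # Bit 5: protection-key violation.
  if [ $((code & 32)) -ne 0 ]; then echo "protection-key violation"; fi
}

# Count kernel log lines (read from stdin) reporting a segfault of process $1.
count_segfaults() {
  grep -c "$1\[[0-9]*]: segfault at"
}
```

For example, `decode_pf_error 14` prints `page not present`, `read access`, `user mode` and `instruction fetch`. Together with `ip` being equal to the fault address (0x231), this suggests pvescheduler tried to execute code at an invalid address. That can come from a software bug or from unstable hardware, so it is not conclusive on its own. To count today's occurrences: `journalctl -k --since today | count_segfaults pvescheduler`.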
 
So it is an issue on the PVE side. Please post the output of pveversion -v and make sure your system has the latest firmware and microcode installed; see also https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_firmware_cpu
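Having the newest intel-microcode package installed does not by itself show which revision is active: early microcode updates are typically applied at boot, so a reboot is needed after upgrading the package. A minimal sketch (the function name is hypothetical) that prints the revision the running kernel reports for the first CPU:

```shell
#!/bin/sh
# Sketch: print the microcode revision listed for the first CPU.
# Reads /proc/cpuinfo by default; another cpuinfo-style file can be passed as $1.
microcode_rev() {
  # Lines look like "microcode<TAB>: 0x...", so split on ": " and take the value.
  awk -F': ' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}
```

Running `microcode_rev` on each node and comparing the results is a quick way to see whether all three machines run the same revision. The kernel log of the current boot (`journalctl -k -b | grep -i microcode`) usually also shows whether an early update was applied.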
Code:
root@nab91:~# pveversion -v
proxmox-ve: 9.1.0 (running kernel: 6.17.2-2-pve)
pve-manager: 9.1.1 (running version: 9.1.1/42db4a6cf33dac83)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.2-2-pve-signed: 6.17.2-2
proxmox-kernel-6.17: 6.17.2-2
proxmox-kernel-6.17.2-1-pve-signed: 6.17.2-1
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx11
intel-microcode: 3.20250812.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.4
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.0.15
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.3
libpve-rs-perl: 0.11.3
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.20-1
proxmox-backup-file-restore: 4.0.20-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.2
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.1
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.0.8
pve-i18n: 3.6.2
pve-qemu-kvm: 10.1.2-4
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.0
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

Code:
root@nab91:~# apt install intel-microcode
intel-microcode is already the newest version (3.20250812.1~deb13u1).
Summary:
  Upgrading: 0, Installing: 0, Removing: 0, Not Upgrading: 1
 
We can close this topic; I'm going to ask for a hardware replacement. I have three of these machines (Minisforum NAB9), and only this one is misbehaving, so the problem lies with that unit. The RAM and disk can't be faulty: I ran intensive load tests on them for many hours.