Severe system freeze on Proxmox 9 running kernel 6.14.8-2-pve when mounting NFS shares

Krailler

Aug 10, 2025
Hello everyone,

I have recently updated to Proxmox VE 9 with kernel 6.14.8-2-pve on an Intel N100 system. Now, when an NFSv4.2 share from a NAS server is mounted, write operations cause a massive increase in IO delay, and eventually the entire system freezes, requiring a manual reboot.

Details:
  • The NFS server is working fine and the network is not saturated.
  • Other servers mounting the same NFS share don’t experience issues.
  • The mount uses default NFSv4.2 options with large rsize and wsize.
  • Tried different mount options (including NFSv3 and smaller rsize/wsize values) with no success; a sketch of such variations follows this list.
  • The problem only occurs on Proxmox 9 running kernel 6.14.8.
  • SMB mounts work fine without freezes.
  • Reinstalled nfs-common package without changes.
  • No ECC memory and no visible hardware issues.
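For illustration, a hedged sketch of the kind of mount variations that were tried; the server address, export path, mount point, and the exact rsize/wsize values are placeholders, not the real ones from this setup:

Code:
# NFSv4.2 with reduced transfer sizes
mount -t nfs -o vers=4.2,rsize=131072,wsize=131072 192.168.1.50:/export/share /mnt/test
# Falling back to NFSv3 with even smaller transfer sizes
mount -t nfs -o vers=3,rsize=65536,wsize=65536 192.168.1.50:/export/share /mnt/test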
Has anyone else experienced similar NFS freezes on Proxmox 9 with this kernel? Are there any known workarounds or fixes?

Thanks in advance.
 
We were able to reproduce a problem when mounting NFS shares with kernel "6.14.8-2-pve": mounting is not possible at all.

We tried to mount manually using the following command:
Code:
mount -t nfs -o vers=4.2 X.X.X.X:/pve/VMSTORE /mnt/pve/VMSTORE -vvvvvv
The following error occurs:

Code:
mount.nfs: mount(2): Input/output error
mount.nfs: mount system call failed for /mnt/pve2/VMSTORE

After rebooting into kernel "6.8.12-13", everything works fine.
 
I am experiencing the above not only with NFS, but also with CIFS. Regular mounts work fine, but during any large data transfer the IO delay spikes and everything becomes unresponsive. The NFS/CIFS shares get disconnected and I have to reboot.

I'm running Proxmox 9 (kernel 6.14.8-2-pve) on a Minisforum MS-A2 with 96 GB RAM. I never had issues like this on Proxmox 8. I tried looking for kernel 6.8.12-13, but it is not available in the Proxmox 9 free repositories. I haven't found any mention of these issues anywhere but here. I guess I either have to wait for the next kernel release or reinstall Proxmox 8 on my cluster.
 
I see the same issue on my Proxmox setup as described by the original poster when copying large amounts of data to NFS: high load average, lost connection to the NFS VM, and the Proxmox web UI gets no data from the VMs / becomes unresponsive.

Not 100% sure, but it looks like I have found a way to 'unfreeze' the system in this case without rebooting the whole server: run `umount -f /mnt/pve/suspicious_nfs_share`, possibly several times. It won't unmount (the device is busy), but somehow it breaks the deadlock and lets the system continue to work. Maybe this is a coincidence, but this trick has saved me a couple of times.
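A minimal sketch of that workaround; the mount path is a placeholder, and the lazy-unmount follow-up is an assumption of mine rather than something confirmed in this thread:

Code:
# Force-unmount the stuck share; may need to be repeated while it still reports "device is busy"
umount -f /mnt/pve/suspicious_nfs_share
# If it stays wedged, a lazy unmount detaches it from the filesystem namespace
umount -l /mnt/pve/suspicious_nfs_share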
 
That does seem to unlock the mount. Strange that it's happening though. I didn't have an issue before upgrading to version 9, but when I reloaded one node with 8.4, it was doing the same thing. I can't figure it out.

Edit: That trick doesn't always work.
 
I was checking every configuration change I had made, since my backups to the NFS share kept crashing. Looks like the update broke it, huh :)
 
Mine is a TrueNAS Scale server running as a VM on the same host. I didn't notice the freezing during normal NFS share usage, but after the update to Proxmox 9 my scheduled backups were failing because the NFS share kept freezing. I've had the same setup for a long time and never noticed this issue before. SMB mounts work fine though, as stated by @Krailler.
 
Does `dmesg` or `journalctl -f` show anything?
 
This is happening on both NFS and CIFS for me. The shares are on a QNAP TVS-h1288X running QuTS hero h5.2.6.3195 (latest).

Edit: I also have a QNAP TS-1635AX running QTS 5.2.6.3195. I never had issues before upgrading to Proxmox 9.
 
Hi,

Same issue here. Backup jobs hang when copying data to the NFS shared drive.

I switched to SMB and the issue was resolved.
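For anyone wanting to try the same, a rough example of adding an SMB/CIFS storage from the CLI; the storage name, server address, share name, credentials, and content type are placeholders:

Code:
# Add a CIFS storage entry and check that it comes up
pvesm add cifs nas-smb --server 192.168.1.50 --share backups --username backupuser --password 'changeme' --content backup
pvesm status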
 
Trying to move a 128 GiB VM disk from a QNAP CIFS share to a local SSD. The system load rose to about 10 and the IO pressure stall was at 90-100%.

Code:
create full clone of drive scsi0 (kradianos-nfs:400/vm-400-disk-0.qcow2)
Logical volume "vm-400-disk-2" created.
Logical volume pve/vm-400-disk-2 changed.
transferred 0.0 B of 128.0 GiB (0.00%)
transferred 1.3 GiB of 128.0 GiB (1.00%)
transferred 2.6 GiB of 128.0 GiB (2.00%)
transferred 3.8 GiB of 128.0 GiB (3.00%)
transferred 5.1 GiB of 128.0 GiB (4.00%)
transferred 6.4 GiB of 128.0 GiB (5.00%)
transferred 7.7 GiB of 128.0 GiB (6.00%)
transferred 9.0 GiB of 128.0 GiB (7.01%)
transferred 10.3 GiB of 128.0 GiB (8.01%)
transferred 11.5 GiB of 128.0 GiB (9.01%)
transferred 12.8 GiB of 128.0 GiB (10.01%)
transferred 14.1 GiB of 128.0 GiB (11.01%)
transferred 15.4 GiB of 128.0 GiB (12.01%)
transferred 16.7 GiB of 128.0 GiB (13.01%)
transferred 17.9 GiB of 128.0 GiB (14.01%)
transferred 19.2 GiB of 128.0 GiB (15.01%)
transferred 20.5 GiB of 128.0 GiB (16.01%)
transferred 21.8 GiB of 128.0 GiB (17.02%)
qemu-img: error while reading at byte 24178065408: Resource temporarily unavailable
qemu-img: error while reading at byte 24175968256: Resource temporarily unavailable
qemu-img: error while reading at byte 24173871104: Resource temporarily unavailable
qemu-img: error while reading at byte 24171773952: Resource temporarily unavailable
qemu-img: error while reading at byte 24169676800: Resource temporarily unavailable
qemu-img: error while reading at byte 24167579648: Resource temporarily unavailable
Logical volume "vm-400-disk-2" successfully removed.
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f qcow2 -O raw /mnt/pve/kradianos-cifs/images/400/vm-400-disk-0.qcow2 zeroinit:/dev/pve/vm-400-disk-2' failed: exit code 1
 
Interesting. In my networking setup, the main bridge is on a 1 Gb line, and I also have a 10 Gb bridge. I moved the gateway from the 1 Gb bridge to the 10 Gb one (see the sketch below) and then transferred over CIFS without any errors, though perhaps not at full speed. I need to make some networking changes; I'll keep monitoring this.
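Roughly what that looks like in /etc/network/interfaces; interface names and addresses are placeholders:

Code:
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.5/24
        bridge-ports eno1          # 1 Gb uplink, no longer carries the default gateway
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet static
        address 192.168.10.5/24
        gateway 192.168.10.1       # default gateway moved to the 10 Gb bridge
        bridge-ports enp2s0f0      # 10 Gb uplink
        bridge-stp off
        bridge-fd 0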
 
I just wanted to chime in here and confirm I have the same issue. Using any NFS4 mount crashes the system at a certain point. No issues with 6.8 though. I'm using TrueNAS Scale as well on a separate machine.
 
Hello,

Whenever the host has "difficulties" talking to the NFS/CIFS storage, do you see any suspicious entries in the system journal? You can view the journal for the current boot via `journalctl -b`.

Does downgrading to kernel 6.8 help in this situation? The Proxmox VE 9 repositories do not carry version 6.8 of the kernel; however, for debugging purposes it is fine to download the current stable 6.8 kernel from the Proxmox VE 8 repositories and install it in the meantime. To boot into this older kernel version you may need to manually pin the kernel for the next boot, as explained in [1].

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot_kernel_pin
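A rough sketch of that procedure; the repository line, the meta-package name, and the exact kernel version are assumptions and should be checked against the current Proxmox VE 8 repositories:

Code:
# Temporarily add the Proxmox VE 8 (Bookworm) no-subscription repository to fetch the 6.8 kernel
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve8-kernel.list
apt update
apt install proxmox-kernel-6.8
# Pin the older kernel for the next boot only, then reboot into it
proxmox-boot-tool kernel pin 6.8.12-13-pve --next-boot
reboot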
 
Hello everyone,

I would like to add my observations as I am experiencing the exact same problem.

Since last night (August 28, 2025, starting around 3:00 AM), my Proxmox host has become completely unresponsive during the scheduled backup jobs that write to an NFS share. The web UI is mostly down, but SSH connections are still possible.

Interestingly, the host seems to "freeze" only partially. The backup jobs themselves continue to run and complete successfully. As soon as the last backup job is finished, the host immediately becomes fully accessible again, as if nothing happened. I can reproduce this behavior reliably by manually starting a backup to the NFS share.

While the system is in this "frozen" state, I managed to keep an SSH session open. My observations are:

  • Simple commands like uptime or df still work and return output instantly.
  • However, any command that tries to read process states, such as top, htop, or ps -ef, hangs indefinitely and produces no output even after several minutes (see the command sketch after this list).
  • The system load is extremely high during this period. uptime shows a load average of around 21, 20, 15.
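For reference, a small sketch of commands that should still respond in this state because they do not need to walk every task's /proc entry; this is only an illustration, not something verified on this host:

Code:
# Hung-task warnings end up in the kernel log
dmesg | grep -i "blocked for more than"
# Load average is readable without touching other tasks
cat /proc/loadavg
# Once ps responds again, list tasks stuck in uninterruptible sleep (D state)
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'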

What might be particularly relevant for troubleshooting:
The issue first occurred last night while the host was still running Proxmox VE 8. I had performed the preparatory steps from the pve8to9 guide the evening before (August 27), but the actual distribution upgrade to PVE 9 was only done this morning at 10:00 AM, after the problem had already appeared for the first time.

Here is the history from the evening before the first freeze. The night from the 26th to the 27th was completely fine.

Code:
Start-Date: 2025-08-27  20:23:16
Commandline: apt upgrade
Upgrade: pve-manager:amd64 (8.4.11, 8.4.12)
End-Date: 2025-08-27  20:23:22

Start-Date: 2025-08-27  20:32:56
Commandline: apt remove systemd-boot
Remove: systemd-boot:amd64 (252.38-1~deb12u1)
End-Date: 2025-08-27  20:32:57

Start-Date: 2025-08-27  20:34:12
Commandline: apt install amd64-microcode
Install: amd64-microcode:amd64 (3.20240820.1~deb12u1)
End-Date: 2025-08-27  20:34:34

Start-Date: 2025-08-27  20:35:07
Commandline: apt install --reinstall grub-efi-amd64
Reinstall: grub-efi-amd64:amd64 (2.06-13+pmx7)
End-Date: 2025-08-27  20:35:13

Start-Date: 2025-08-27  21:04:34
Commandline: apt full-upgrade
Upgrade: librados2:amd64 (17.2.8-pve2, 18.2.7-pve1), ceph-fuse:amd64 (17.2.8-pve2, 18.2.7-pve1), python3-ceph-common:amd64 (17.2.8-pve2, 18.2.7-pve1), librbd1:amd64 (17.2.8-pve2, 18.2.7-pve1), librgw2:amd64 (17.2.8-pve2, 18.2.7-pve1), ceph-common:amd64 (17.2.8-pve2, 18.2.7-pve1), python3-cephfs:amd64 (17.2.8-pve2, 18.2.7-pve1), libcephfs2:amd64 (17.2.8-pve2, 18.2.7-pve1), libradosstriper1:amd64 (17.2.8-pve2, 18.2.7-pve1), python3-rbd:amd64 (17.2.8-pve2, 18.2.7-pve1), python3-rgw:amd64 (17.2.8-pve2, 18.2.7-pve1), python3-ceph-argparse:amd64 (17.2.8-pve2, 18.2.7-pve1), python3-rados:amd64 (17.2.8-pve2, 18.2.7-pve1)
End-Date: 2025-08-27  21:04:40

Start-Date: 2025-08-27  21:05:47
Commandline: apt full-upgrade
Upgrade: librados2:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), ceph-fuse:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), python3-ceph-common:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), librbd1:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), librgw2:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), ceph-common:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), python3-cephfs:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), libcephfs2:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), libradosstriper1:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), python3-rbd:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), python3-rgw:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), python3-ceph-argparse:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1), python3-rados:amd64 (18.2.7-pve1, 19.2.2-pve1~bpo12+1)
End-Date: 2025-08-27  21:05:53

Although no kernel update is explicitly listed in this log, one of the package updates from that evening must have introduced this behavior. My symptoms align perfectly with a kernel or storage subsystem issue related to NFS.

I hope this information helps narrow down the cause.

@Maximiliano

UPDATE 2025-08-29 16:16

Code:
Aug 28 21:47:51 proxmox pvedaemon[588804]: INFO: Starting Backup of VM 111 (qemu)
Aug 28 21:47:51 proxmox kernel:  sdc: sdc1
Aug 28 21:47:51 proxmox kernel: vfio-pci 0000:06:00.0: resetting
Aug 28 21:47:51 proxmox kernel:  sdb: sdb1
Aug 28 21:47:51 proxmox kernel: vfio-pci 0000:06:00.0: reset done
Aug 28 21:47:51 proxmox kernel:  sda: sda1
Aug 28 21:47:51 proxmox kernel: vfio-pci 0000:07:00.0: resetting
Aug 28 21:47:51 proxmox kernel: vfio-pci 0000:07:00.0: reset done
Aug 28 21:47:51 proxmox systemd[1]: Started 111.scope.
Aug 28 21:47:52 proxmox kernel: tap111i0: entered promiscuous mode
Aug 28 21:47:52 proxmox kernel: vmbr0: port 2(fwpr111p0) entered blocking state
Aug 28 21:47:52 proxmox kernel: vmbr0: port 2(fwpr111p0) entered disabled state
Aug 28 21:47:52 proxmox kernel: fwpr111p0: entered allmulticast mode
Aug 28 21:47:52 proxmox kernel: fwpr111p0: entered promiscuous mode
Aug 28 21:47:52 proxmox kernel: vmbr0: port 2(fwpr111p0) entered blocking state
Aug 28 21:47:52 proxmox kernel: vmbr0: port 2(fwpr111p0) entered forwarding state
Aug 28 21:47:52 proxmox kernel: fwbr111i0: port 1(fwln111i0) entered blocking state
Aug 28 21:47:52 proxmox kernel: fwbr111i0: port 1(fwln111i0) entered disabled state
Aug 28 21:47:52 proxmox kernel: fwln111i0: entered allmulticast mode
Aug 28 21:47:52 proxmox kernel: fwln111i0: entered promiscuous mode
Aug 28 21:47:52 proxmox kernel: fwbr111i0: port 1(fwln111i0) entered blocking state
Aug 28 21:47:52 proxmox kernel: fwbr111i0: port 1(fwln111i0) entered forwarding state
Aug 28 21:47:52 proxmox kernel: fwbr111i0: port 2(tap111i0) entered blocking state
Aug 28 21:47:52 proxmox kernel: fwbr111i0: port 2(tap111i0) entered disabled state
Aug 28 21:47:52 proxmox kernel: tap111i0: entered allmulticast mode
Aug 28 21:47:52 proxmox kernel: fwbr111i0: port 2(tap111i0) entered blocking state
Aug 28 21:47:52 proxmox kernel: fwbr111i0: port 2(tap111i0) entered forwarding state
Aug 28 21:47:52 proxmox kernel: vfio-pci 0000:06:00.0: resetting
Aug 28 21:47:52 proxmox kernel: vfio-pci 0000:06:00.0: reset done
Aug 28 21:47:52 proxmox kernel: vfio-pci 0000:07:00.0: resetting
Aug 28 21:47:52 proxmox kernel: vfio-pci 0000:07:00.0: reset done
Aug 28 21:47:52 proxmox kernel: vfio-pci 0000:07:00.0: resetting
Aug 28 21:47:52 proxmox kernel: vfio-pci 0000:07:00.0: reset done
Aug 28 21:47:52 proxmox kernel: vfio-pci 0000:06:00.0: resetting
Aug 28 21:47:52 proxmox kernel: vfio-pci 0000:06:00.0: reset done
Aug 28 21:47:52 proxmox pvedaemon[588804]: VM 111 started with PID 588840.
Aug 28 21:47:53 proxmox systemd[1]: Started check-mk-agent@895-1235-999.service - Checkmk agent (PID 1235/UID 999).
Aug 28 21:47:54 proxmox pveproxy[557805]: worker exit
Aug 28 21:47:54 proxmox pveproxy[1516]: worker 557805 finished
Aug 28 21:47:54 proxmox pveproxy[1516]: starting 1 worker(s)
Aug 28 21:47:54 proxmox pveproxy[1516]: worker 588999 started
Aug 28 21:47:55 proxmox systemd[1]: check-mk-agent@895-1235-999.service: Deactivated successfully.
Aug 28 21:47:55 proxmox systemd[1]: check-mk-agent@895-1235-999.service: Consumed 1.513s CPU time, 48.4M memory peak.
Aug 28 21:47:56 proxmox pvedaemon[514612]: <root@pam> successful auth for user 'checkmk@pve'
Aug 28 21:48:53 proxmox pvedaemon[504642]: <root@pam> successful auth for user 'root@pam'
Aug 28 21:49:43 proxmox pvestatd[1476]: status update time (72.651 seconds)
Aug 28 21:49:57 proxmox pvedaemon[504642]: <root@pam> successful auth for user 'root@pam'
Aug 28 21:50:27 proxmox pveproxy[573689]: proxy detected vanished client connection
Aug 28 21:50:59 proxmox pveproxy[583729]: proxy detected vanished client connection
Aug 28 21:51:00 proxmox pveproxy[583729]: proxy detected vanished client connection
Aug 28 21:51:07 proxmox pveproxy[573689]: proxy detected vanished client connection
Aug 28 21:51:17 proxmox pveproxy[573689]: proxy detected vanished client connection
Aug 28 21:51:25 proxmox pveproxy[588999]: proxy detected vanished client connection
Aug 28 21:51:47 proxmox pveproxy[583729]: proxy detected vanished client connection
Aug 28 21:51:59 proxmox pveproxy[583729]: proxy detected vanished client connection
Aug 28 21:51:59 proxmox pveproxy[588999]: proxy detected vanished client connection
Aug 28 21:52:29 proxmox pveproxy[573689]: proxy detected vanished client connection
Aug 28 21:52:39 proxmox sshd-session[590216]: Accepted publickey for root from 192.168.137.150 port 63955 ssh2: RSA SHA256:sOb7KQbZ2UbuUUSlRmMOqxThh+9/VfoiSoK8ki6xnxY
Aug 28 21:52:39 proxmox sshd-session[590216]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Aug 28 21:52:39 proxmox systemd-logind[1102]: New session 29 of user root.
Aug 28 21:52:39 proxmox systemd[1]: Started session-29.scope - Session 29 of User root.
Aug 28 21:53:00 proxmox pveproxy[588999]: proxy detected vanished client connection
Aug 28 21:53:57 proxmox kernel: INFO: task ksmd:99 blocked for more than 122 seconds.
Aug 28 21:53:57 proxmox kernel:       Tainted: P           O       6.14.8-2-pve #1
Aug 28 21:53:57 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 28 21:53:57 proxmox kernel: task:ksmd            state:D stack:0     pid:99    tgid:99    ppid:2      task_flags:0x200040 flags:0x00004000
Aug 28 21:53:57 proxmox kernel: Call Trace:
Aug 28 21:53:57 proxmox kernel:  <TASK>
Aug 28 21:53:57 proxmox kernel:  __schedule+0x466/0x13f0
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? finish_task_switch.isra.0+0x9c/0x340
Aug 28 21:53:57 proxmox kernel:  schedule+0x29/0x130
Aug 28 21:53:57 proxmox kernel:  schedule_preempt_disabled+0x15/0x30
Aug 28 21:53:57 proxmox kernel:  rwsem_down_read_slowpath+0x230/0x460
Aug 28 21:53:57 proxmox kernel:  ? schedule_timeout+0x92/0x110
Aug 28 21:53:57 proxmox kernel:  down_read+0x48/0xc0
Aug 28 21:53:57 proxmox kernel:  ksm_scan_thread+0x16e/0x26a0
Aug 28 21:53:57 proxmox kernel:  ? __pfx_ksm_scan_thread+0x10/0x10
Aug 28 21:53:57 proxmox kernel:  kthread+0xfc/0x230
Aug 28 21:53:57 proxmox kernel:  ? __pfx_kthread+0x10/0x10
Aug 28 21:53:57 proxmox kernel:  ret_from_fork+0x47/0x70
Aug 28 21:53:57 proxmox kernel:  ? __pfx_kthread+0x10/0x10
Aug 28 21:53:57 proxmox kernel:  ret_from_fork_asm+0x1a/0x30
Aug 28 21:53:57 proxmox kernel:  </TASK>
Aug 28 21:53:57 proxmox kernel: INFO: task worker:589731 blocked for more than 122 seconds.
Aug 28 21:53:57 proxmox kernel:       Tainted: P           O       6.14.8-2-pve #1
Aug 28 21:53:57 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 28 21:53:57 proxmox kernel: task:worker          state:D stack:0     pid:589731 tgid:2931  ppid:1      task_flags:0x84000c0 flags:0x00000002
Aug 28 21:53:57 proxmox kernel: Call Trace:
Aug 28 21:53:57 proxmox kernel:  <TASK>
Aug 28 21:53:57 proxmox kernel:  __schedule+0x466/0x13f0
Aug 28 21:53:57 proxmox kernel:  ? __set_task_blocked+0x29/0x80
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? __x64_sys_rt_sigprocmask+0xd9/0x160
Aug 28 21:53:57 proxmox kernel:  schedule+0x29/0x130
Aug 28 21:53:57 proxmox kernel:  schedule_preempt_disabled+0x15/0x30
Aug 28 21:53:57 proxmox kernel:  rwsem_down_read_slowpath+0x230/0x460
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  down_read+0x48/0xc0
Aug 28 21:53:57 proxmox kernel:  do_madvise+0x11f/0x480
Aug 28 21:53:57 proxmox kernel:  ? switch_fpu_return+0x4f/0xe0
Aug 28 21:53:57 proxmox kernel:  __x64_sys_madvise+0x2b/0x40
Aug 28 21:53:57 proxmox kernel:  x64_sys_call+0x21a9/0x2310
Aug 28 21:53:57 proxmox kernel:  do_syscall_64+0x7e/0x170
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? futex_wake+0x8a/0x1a0
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? do_futex+0x18e/0x260
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? __x64_sys_futex+0x128/0x200
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? syscall_exit_to_user_mode+0x38/0x1d0
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? do_syscall_64+0x8a/0x170
Aug 28 21:53:57 proxmox kernel:  ? irqentry_exit_to_user_mode+0x2d/0x1d0
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? irqentry_exit+0x43/0x50
Aug 28 21:53:57 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 28 21:53:57 proxmox kernel:  ? common_interrupt+0x64/0xe0
Aug 28 21:53:57 proxmox kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 28 21:53:57 proxmox kernel: RIP: 0033:0x786b7871ebb7
Aug 28 21:53:57 proxmox kernel: RSP: 002b:00007869c1ff6dd8 EFLAGS: 00000206 ORIG_RAX: 000000000000001c
Aug 28 21:53:57 proxmox kernel: RAX: ffffffffffffffda RBX: 00007869c1ffbcdc RCX: 0000786b7871ebb7
Aug 28 21:53:57 proxmox kernel: RDX: 0000000000000004 RSI: 00000000007f7000 RDI: 00007869c17fb000
Aug 28 21:53:57 proxmox kernel: RBP: 00007869c17fb000 R08: 00007869c1ffb6c0 R09: 0000000000000000
Aug 28 21:53:57 proxmox kernel: R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000801000
Aug 28 21:53:57 proxmox kernel: R13: 000000000000000b R14: 0000786b749d5880 R15: 00007869c17fb000
Aug 28 21:53:57 proxmox kernel:  </TASK>
Aug 28 21:53:57 proxmox kernel: INFO: task worker:589734 blocked for more than 122 seconds.
Aug 28 21:53:57 proxmox kernel:       Tainted: P           O       6.14.8-2-pve #1
Aug 28 21:53:57 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 28 21:53:57 proxmox kernel: task:worker          state:D stack:0     pid:589734 tgid:2931  ppid:1      task_flags:0x84000c0 flags:0x00000002
Aug 28 21:53:57 proxmox kernel: Call Trace:
...
Aug 28 21:56:05 proxmox pveproxy[583729]: proxy detected vanished client connection
Aug 28 21:56:16 proxmox pveproxy[583729]: proxy detected vanished client connection
Aug 28 21:56:16 proxmox pveproxy[573689]: proxy detected vanished client connection
Aug 28 21:56:20 proxmox pveproxy[588999]: proxy detected vanished client connection
Aug 28 21:56:37 proxmox pveproxy[573689]: proxy detected vanished client connection
Aug 28 21:56:50 proxmox pveproxy[588999]: proxy detected vanished client connection
Aug 28 22:05:23 proxmox pveproxy[573689]: worker exit
Aug 28 22:05:23 proxmox pveproxy[1516]: worker 573689 finished
Aug 28 22:05:23 proxmox pveproxy[1516]: starting 1 worker(s)
Aug 28 22:05:23 proxmox pveproxy[1516]: worker 592104 started
Aug 28 22:10:49 proxmox pveproxy[583729]: proxy detected vanished client connection
Aug 28 22:12:58 proxmox pveproxy[583729]: worker exit
Aug 28 22:12:58 proxmox pveproxy[1516]: worker 583729 finished
Aug 28 22:12:58 proxmox pveproxy[1516]: starting 1 worker(s)
Aug 28 22:12:58 proxmox pveproxy[1516]: worker 593233 started
Aug 28 22:17:01 proxmox CRON[593860]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Aug 28 22:17:01 proxmox CRON[593862]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Aug 28 22:17:01 proxmox CRON[593860]: pam_unix(cron:session): session closed for user root
Aug 28 22:18:18 proxmox pvestatd[1476]: status update time (1705.407 seconds)
Aug 28 22:18:25 proxmox pvedaemon[516904]: <root@pam> end task UPID:proxmox:0008FC04:004C5139:68B0B267:vzdump:111:root@pam: unexpected status
...
Aug 28 22:18:36 proxmox systemd[1]: Stopping user@0.service - User Manager for UID 0...

UPDATE 2025-09-26

Once the backup is stuck like this, it can no longer be canceled via the GUI; the compressor process has to be killed manually:

Code:
ps -ef | grep -ie vzdump -ie zstd
kill <zstd-pid>
 
I just had some CIFS issues. Here are the `journalctl -b` errors. I have not tried the 6.8 kernel yet, but I have tried the latest 6.14.11-1 kernel. Of course it locked up the LXC container and I couldn't force-stop it; I had to reboot the whole server.

Code:
Aug 29 15:41:03 pve pvestatd[1365]: storage 'kradianos-nfs' is not online
Aug 29 15:41:03 pve pvestatd[1365]: status update time (20.823 seconds)
Aug 29 15:41:06 pve kernel: CIFS: VFS: \\10.0.0.29 sends on sock 00000000859fa48d stuck for 15 seconds
Aug 29 15:41:07 pve kernel: ------------[ cut here ]------------
Aug 29 15:41:07 pve kernel: UBSAN: shift-out-of-bounds in ./include/linux/folio_queue.h:311:19
Aug 29 15:41:07 pve kernel: shift exponent 242 is too large for 64-bit type 'long unsigned int'
Aug 29 15:41:07 pve kernel: CPU: 13 UID: 0 PID: 2788797 Comm: kworker/u130:7 Tainted: P           O       6.14.11-1-pve #1
Aug 29 15:41:07 pve kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
Aug 29 15:41:07 pve kernel: Hardware name: Micro Computer (HK) Tech Limited MS-A2/F1WSA, BIOS 1.02 06/16/2025
Aug 29 15:41:07 pve kernel: Workqueue: events_unbound netfs_write_collection_worker [netfs]
Aug 29 15:41:07 pve kernel: Call Trace:
Aug 29 15:41:07 pve kernel:  <TASK>
Aug 29 15:41:07 pve kernel:  dump_stack_lvl+0x5f/0x90
Aug 29 15:41:07 pve kernel:  dump_stack+0x10/0x18
Aug 29 15:41:07 pve kernel:  ubsan_epilogue+0x9/0x40
Aug 29 15:41:07 pve kernel:  __ubsan_handle_shift_out_of_bounds.cold+0x61/0xe6
Aug 29 15:41:07 pve kernel:  netfs_limit_iter.cold+0x20/0x8b [netfs]
Aug 29 15:41:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 29 15:41:07 pve kernel:  ? cifs_prepare_write+0xc1/0x2b0 [cifs]
Aug 29 15:41:07 pve kernel:  netfs_retry_writes+0x66a/0x830 [netfs]
Aug 29 15:41:07 pve kernel:  netfs_write_collection+0x671/0xda0 [netfs]
Aug 29 15:41:07 pve kernel:  netfs_write_collection_worker+0x89/0x110 [netfs]
Aug 29 15:41:07 pve kernel:  process_one_work+0x172/0x350
Aug 29 15:41:07 pve kernel:  worker_thread+0x34a/0x480
Aug 29 15:41:07 pve kernel:  ? __pfx_worker_thread+0x10/0x10
Aug 29 15:41:07 pve kernel:  kthread+0xf9/0x230
Aug 29 15:41:07 pve kernel:  ? __pfx_kthread+0x10/0x10
Aug 29 15:41:07 pve kernel:  ret_from_fork+0x44/0x70
Aug 29 15:41:07 pve kernel:  ? __pfx_kthread+0x10/0x10
Aug 29 15:41:07 pve kernel:  ret_from_fork_asm+0x1a/0x30
Aug 29 15:41:07 pve kernel:  </TASK>
Aug 29 15:41:07 pve kernel: ---[ end trace ]---
Aug 29 15:41:07 pve kernel: CIFS: VFS: \\10.0.0.29 Error -14 sending data on socket to server
Aug 29 15:41:07 pve kernel: ------------[ cut here ]------------
Aug 29 15:41:07 pve kernel: UBSAN: shift-out-of-bounds in ./include/linux/folio_queue.h:311:19
Aug 29 15:41:07 pve kernel: shift exponent 242 is too large for 64-bit type 'long unsigned int'
Aug 29 15:41:07 pve kernel: CPU: 18 UID: 0 PID: 2788797 Comm: kworker/u130:7 Tainted: P           O       6.14.11-1-pve #1
Aug 29 15:41:07 pve kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
Aug 29 15:41:07 pve kernel: Hardware name: Micro Computer (HK) Tech Limited MS-A2/F1WSA, BIOS 1.02 06/16/2025
Aug 29 15:41:07 pve kernel: Workqueue: events_unbound netfs_write_collection_worker [netfs]
Aug 29 15:41:07 pve kernel: Call Trace:
Aug 29 15:41:07 pve kernel:  <TASK>
Aug 29 15:41:07 pve kernel:  dump_stack_lvl+0x5f/0x90
Aug 29 15:41:07 pve kernel:  dump_stack+0x10/0x18
Aug 29 15:41:07 pve kernel:  ubsan_epilogue+0x9/0x40
Aug 29 15:41:07 pve kernel:  __ubsan_handle_shift_out_of_bounds.cold+0x61/0xe6
Aug 29 15:41:07 pve kernel:  iov_iter_advance.cold+0x14/0x19
Aug 29 15:41:07 pve kernel:  netfs_reissue_write+0x4e/0xa0 [netfs]
Aug 29 15:41:07 pve kernel:  netfs_retry_writes+0x64e/0x830 [netfs]
Aug 29 15:41:07 pve kernel:  netfs_write_collection+0x671/0xda0 [netfs]
Aug 29 15:41:07 pve kernel:  netfs_write_collection_worker+0x89/0x110 [netfs]
Aug 29 15:41:07 pve kernel:  process_one_work+0x172/0x350
Aug 29 15:41:07 pve kernel:  worker_thread+0x34a/0x480
Aug 29 15:41:07 pve kernel:  ? __pfx_worker_thread+0x10/0x10
Aug 29 15:41:07 pve kernel:  kthread+0xf9/0x230
Aug 29 15:41:07 pve kernel:  ? __pfx_kthread+0x10/0x10
Aug 29 15:41:07 pve kernel:  ret_from_fork+0x44/0x70
Aug 29 15:41:07 pve kernel:  ? __pfx_kthread+0x10/0x10
Aug 29 15:41:07 pve kernel:  ret_from_fork_asm+0x1a/0x30
Aug 29 15:41:07 pve kernel:  </TASK>
Aug 29 15:41:07 pve kernel: ---[ end trace ]---
Aug 29 15:41:07 pve kernel: CIFS: trying to dequeue a deleted mid
Aug 29 15:41:17 pve kernel: overlayfs: fs on '/var/lib/docker/overlay2/l/G5R66IT7NQT4DWMHNA4COZ6CLS' does not support file handles, falling back to xino=off.
Aug 29 15:41:18 pve kernel: docker0: port 4(vetha3375ea) entered blocking state
 
I have similar issues with PVE 9.

My setup has a storage VM (three HDDs passed in as virtio block devices; NFS/CIFS served back out) and multiple VMs and LXC containers that mount the shares.
This worked rock solid for years (at least since PVE 6 or 7), but since upgrading to v9 and kernel 6.14 I have had multiple deadlocks, especially when the storage VM takes some time to serve requests because it is undergoing a RAID rebuild.

They definitely seem kernel related; it's either NFS or the block device passthrough.
I have now passed the HBA to the storage VM entirely (via PCIe passthrough), so if the system-wide crash happens again it can only be NFS.
The reboot command won't work; the only thing that helps is rebooting via sysrq-trigger or a physical keyboard (the "REISUB" method), as sketched below.
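A sketch of that SysRq fallback, assuming magic SysRq is enabled (or can be enabled via /proc/sys/kernel/sysrq):

Code:
echo 1 > /proc/sys/kernel/sysrq   # enable magic SysRq if it is not already
echo s > /proc/sysrq-trigger      # sync filesystems
echo u > /proc/sysrq-trigger      # remount filesystems read-only
echo b > /proc/sysrq-trigger      # immediate reboot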
 
