Hi. I've been using Proxmox for a few years and it worked fine for a while. A few months ago it started becoming unresponsive; I'd reboot and it would work for a time. Long story short, I guessed it must be hardware and upgraded my home server's CPU, motherboard, and RAM. I finally got everything installed and Proxmox configured with the new hardware (networking, etc.), and I'm still having the same issues, so I'm thinking it's probably not hardware related now.
New hardware:
AsRock Rack B650D4U
64GB ECC Ram
AMD EPYC 4005 (5th Gen) 4245P Hexa-core CPU
On Proxmox 8.4.1. I have four 14TB drives in a ZFS RAID 10 (striped mirrors) configuration. I've checked the ZFS status and scrubbed the pool, and everything looks good.
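In case it helps, this is roughly how I checked the pool's health (my pool appears as zfs01 in the mount points further down; adjust the name for your setup):

```shell
# Show pool state plus any read/write/checksum error counters per vdev
zpool status -v zfs01

# Start a scrub, then check its progress under the "scan:" line
zpool scrub zfs01
zpool status zfs01 | grep -A 2 'scan:'
```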
And then two SSDs: one for Proxmox, and one for an Ubuntu server.
I have a Turnkey file server container that shares the ZFS directories over NFS and Samba to my Windows computer and my other Ubuntu server VM.
My Ubuntu server connects to the Turnkey file server shares and runs Docker. The mounts are set up in /etc/fstab, using Samba at the moment to test; previously they used NFS. Both had the same issue.
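The fstab entries look roughly like this (the share name, credentials file, and uid/gid are illustrative; 192.168.88.19 is the container's IP from the config below):

```text
# Current test setup: CIFS mount of the Turnkey share
//192.168.88.19/zfsdocker  /mnt/zfsdocker  cifs  credentials=/etc/samba/creds,uid=1000,gid=1000,_netdev  0  0

# Earlier NFS variant of the same mount
192.168.88.19:/mnt/zfsdocker  /mnt/zfsdocker  nfs  defaults,_netdev  0  0
```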
Things become unstable rather quickly. I'm seeing error logs such as:
Sep 21 15:46:39 proxmox kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 547s! [smbd:4845]
Sep 21 15:46:39 proxmox kernel: Modules linked in: tcp_diag inet_diag rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl lockd grace veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables softdog sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink amdgpu zfs(PO) ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac spl(O) edac_mce_amd vhost_net kvm_amd vhost vhost_iotlb tap amdxcp drm_exec kvm gpu_sched vfio_pci drm_buddy drm_suballoc_helper vfio_pci_core drm_ttm_helper crct10dif_pclmul ttm polyval_clmulni irqbypass polyval_generic ghash_clmulni_intel drm_display_helper vfio_iommu_type1 acpi_ipmi sha256_ssse3 cec vfio sha1_ssse3 ipmi_si ses aesni_intel crypto_simd enclosure rc_core joydev ast input_leds ccp k10temp cryptd ipmi_devintf rapl pcspkr ipmi_msghandler wmi_bmof mac_hid iommufd efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
Sep 21 15:46:39 proxmox kernel: libcrc32c cdc_ether usbnet usbhid uas mii hid usb_storage xhci_pci xhci_pci_renesas crc32_pclmul igb ahci mpt3sas xhci_hcd i2c_piix4 libahci i2c_algo_bit raid_class dca scsi_transport_sas video wmi gpio_amdpt
Sep 21 15:46:39 proxmox kernel: CPU: 1 PID: 4845 Comm: smbd Tainted: P O L 6.8.12-15-pve #1
Sep 21 15:46:39 proxmox kernel: Hardware name: AsrockRack To be filled by O.E.M. /B650D4U, BIOS 22.01 08/12/2025
Sep 21 15:46:39 proxmox kernel: RIP: 0010:zap_leaf_lookup_closest+0xc2/0x1c0 [zfs]
Sep 21 15:46:39 proxmox kernel: Code: 00 00 00 48 8b 40 18 8b 8f d0 00 00 00 45 89 cf 83 e9 05 41 d3 e7 0f b7 ca 48 8d 0c 49 49 8d 0c 8f 48 8d 44 48 30 48 8b 48 10 <49> 39 c8 73 ba 4c 39 d1 73 5b 44 8b 60 0c 44 0f b7 50 0a 44 89 db
Sep 21 15:46:39 proxmox kernel: RSP: 0018:ffffb99d22ff3960 EFLAGS: 00000216
Sep 21 15:46:39 proxmox kernel: RAX: ffff9b40c2846548 RBX: 0000000000000159 RCX: 0000000000000000
Sep 21 15:46:39 proxmox kernel: RDX: 0000000000006161 RSI: ffffb99d22ff39a0 RDI: ffff9b40e081f400
Sep 21 15:46:39 proxmox kernel: RBP: ffffb99d22ff3988 R08: a97e277000000000 R09: 0000000000000001
Sep 21 15:46:39 proxmox kernel: R10: dead000000000100 R11: 0000000000000159 R12: 0000000000000000
Sep 21 15:46:39 proxmox kernel: R13: ffffb99d22ff39b6 R14: 0000000000000001 R15: 0000000000000200
Sep 21 15:46:39 proxmox kernel: FS: 00007aaa52bc4a40(0000) GS:ffff9b4b9fa80000(0000) knlGS:0000000000000000
Sep 21 15:46:39 proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 21 15:46:39 proxmox kernel: CR2: 00007aaa51d22fd8 CR3: 00000004c4428000 CR4: 0000000000f50ef0
Sep 21 15:46:39 proxmox kernel: PKRU: 55555554
Sep 21 15:46:39 proxmox kernel: Call Trace:
Sep 21 15:46:39 proxmox kernel: <IRQ>
Sep 21 15:46:39 proxmox kernel: ? show_regs+0x6d/0x80
Sep 21 15:46:39 proxmox kernel: ? watchdog_timer_fn+0x206/0x290
Sep 21 15:46:39 proxmox kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
Sep 21 15:46:39 proxmox kernel: ? __hrtimer_run_queues+0x105/0x280
Sep 21 15:46:39 proxmox kernel: ? clockevents_program_event+0xb3/0x140
Sep 21 15:46:39 proxmox kernel: ? hrtimer_interrupt+0xf6/0x250
Sep 21 15:46:39 proxmox kernel: ? __sysvec_apic_timer_interrupt+0x4e/0x120
Sep 21 15:46:39 proxmox kernel: ? sysvec_apic_timer_interrupt+0x8d/0xd0
Sep 21 15:46:39 proxmox kernel: </IRQ>
Sep 21 15:46:39 proxmox kernel: <TASK>
Sep 21 15:46:39 proxmox kernel: ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
Sep 21 15:46:39 proxmox kernel: ? zap_leaf_lookup_closest+0xc2/0x1c0 [zfs]
Sep 21 15:46:39 proxmox kernel: fzap_cursor_retrieve+0x10f/0x390 [zfs]
Sep 21 15:46:39 proxmox kernel: zap_cursor_retrieve+0x222/0x390 [zfs]
Sep 21 15:46:39 proxmox kernel: ? dbuf_prefetch+0x13/0x30 [zfs]
Sep 21 15:46:39 proxmox kernel: zfs_readdir+0x219/0x560 [zfs]
Sep 21 15:46:39 proxmox kernel: zpl_iterate+0x54/0x90 [zfs]
Sep 21 15:46:39 proxmox kernel: iterate_dir+0x114/0x210
Sep 21 15:46:39 proxmox kernel: __x64_sys_getdents64+0x84/0x130
Sep 21 15:46:39 proxmox kernel: ? __pfx_filldir64+0x10/0x10
Sep 21 15:46:39 proxmox kernel: x64_sys_call+0x1548/0x2480
Sep 21 15:46:39 proxmox kernel: do_syscall_64+0x81/0x170
Sep 21 15:46:39 proxmox kernel: ? mntput+0x24/0x50
Sep 21 15:46:39 proxmox kernel: ? path_put+0x1e/0x30
Sep 21 15:46:39 proxmox kernel: ? path_getxattr+0x88/0xe0
Sep 21 15:46:39 proxmox kernel: ? arch_exit_to_user_mode_prepare.constprop.0+0x1a/0xe0
Sep 21 15:46:39 proxmox kernel: ? syscall_exit_to_user_mode+0x43/0x1e0
Sep 21 15:46:39 proxmox kernel: ? do_syscall_64+0x8d/0x170
Sep 21 15:46:39 proxmox kernel: ? arch_exit_to_user_mode_prepare.constprop.0+0x1a/0xe0
Sep 21 15:46:39 proxmox kernel: ? syscall_exit_to_user_mode+0x43/0x1e0
Sep 21 15:46:39 proxmox kernel: ? do_syscall_64+0x8d/0x170
Sep 21 15:46:39 proxmox kernel: ? arch_exit_to_user_mode_prepare.constprop.0+0x1a/0xe0
Sep 21 15:46:39 proxmox kernel: ? syscall_exit_to_user_mode+0x43/0x1e0
Sep 21 15:46:39 proxmox kernel: ? do_syscall_64+0x8d/0x170
Sep 21 15:46:39 proxmox kernel: ? syscall_exit_to_user_mode+0x43/0x1e0
Sep 21 15:46:39 proxmox kernel: ? do_syscall_64+0x8d/0x170
Sep 21 15:46:39 proxmox kernel: ? syscall_exit_to_user_mode+0x43/0x1e0
Sep 21 15:46:39 proxmox kernel: ? do_syscall_64+0x8d/0x170
Sep 21 15:46:39 proxmox kernel: ? do_syscall_64+0x8d/0x170
Sep 21 15:46:39 proxmox kernel: ? do_syscall_64+0x8d/0x170
Sep 21 15:46:39 proxmox kernel: entry_SYSCALL_64_after_hwframe+0x78/0x80
Sep 21 15:46:39 proxmox kernel: RIP: 0033:0x7aaa56cadfb7
Sep 21 15:46:39 proxmox kernel: Code: 0f 1f 00 48 8b 47 20 c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 81 fa ff ff ff 7f b8 ff ff ff 7f 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 a9 ae 10 00 f7 d8 64 89 02 48
Sep 21 15:46:39 proxmox kernel: RSP: 002b:00007ffee43d76a8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Sep 21 15:46:39 proxmox kernel: RAX: ffffffffffffffda RBX: 00005fbe40ee7ee0 RCX: 00007aaa56cadfb7
Sep 21 15:46:39 proxmox kernel: RDX: 0000000000008000 RSI: 00005fbe40ee7f10 RDI: 000000000000004a
Sep 21 15:46:39 proxmox kernel: RBP: 00005fbe40ee7f10 R08: 0000000000000030 R09: 0000000000000001
Sep 21 15:46:39 proxmox kernel: R10: 00005fbe40ead180 R11: 0000000000000293 R12: ffffffffffffff58
Sep 21 15:46:39 proxmox kernel: R13: 00005fbe40ee7ee4 R14: 000000000000005f R15: 00005fbe40dab610
Sep 21 15:46:39 proxmox kernel: </TASK>
Digging in more: while this was happening, I loaded up top on the Turnkey file server and noticed smbd pegged at 100% CPU, and it wouldn't come down. I had previously set up the share on Ubuntu using NFS and saw similar behavior there, so I'm not sure whether it's actually Samba or NFS. If I ask ChatGPT, it thinks it's some sort of ZFS bug, but doesn't seem really sure.
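To see where the spinning process is stuck in the kernel, something like this works from the host (PID 4845 is from the trace above; substitute the current one, and the sysrq trick only works if sysrq is enabled):

```shell
# Dump the kernel stack of the spinning smbd process (requires root)
cat /proc/4845/stack

# Alternatively, dump backtraces of all CPUs into the kernel log
echo l > /proc/sysrq-trigger
dmesg | tail -50
```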
I upgraded the kernel to the newer 6.14.8-3-bpo12-pve, and also tried older kernels. No luck on three or four different ones.
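For anyone wanting to try the same thing, this is roughly how kernels can be switched on a Proxmox host managed by proxmox-boot-tool (GRUB-only setups pick the kernel from the boot menu instead):

```shell
# List the kernels currently installed on the boot partition(s)
proxmox-boot-tool kernel list

# Pin a specific version so it is used on subsequent boots
proxmox-boot-tool kernel pin 6.8.12-15-pve
```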
I tried setting these ZFS properties after reading that they might help:
zfs set xattr=sa zfspool
zfs set acltype=off zfspool
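One caveat I ran into: xattr=sa only affects extended attributes written after the change, so existing files keep their old on-disk format. Verifying the properties actually took (they inherit to child datasets unless overridden locally):

```shell
# Show current value and source (local/inherited/default) for both properties
zfs get xattr,acltype zfspool
```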
I also started looking at the file permissions my Docker containers were using. I run as a regular user, not root, but not all of the permissions seemed right. Running chown against a certain folder for my Kopia backup cache triggered the soft lockup immediately. Something like this:

chown -R myaccount:users /mnt/zfsdocker/
Message from syslogd@ct-fileserver at Sep 30 01:39:00 ...
kernel:[ 404.030072] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [chown:6251]
Message from syslogd@ct-fileserver at Sep 30 01:39:28 ...
kernel:[ 432.030052] watchdog: BUG: soft lockup - CPU#0 stuck for 52s! [chown:6251]
Message from syslogd@ct-fileserver at Sep 30 01:39:56 ...
kernel:[ 460.029971] watchdog: BUG: soft lockup - CPU#0 stuck for 78s! [chown:6251]
Message from syslogd@ct-fileserver at Sep 30 01:40:24 ...
kernel:[ 488.029975] watchdog: BUG: soft lockup - CPU#0 stuck for 104s! [chown:6251]
Message from syslogd@ct-fileserver at Sep 30 01:40:52 ...
kernel:[ 516.030273] watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [chown:6251]
Even doing a

ls /mnt/zfsdocker/kopia/path

would trigger the soft lockup immediately and hang the server permanently until I did a hard reboot.

I've been working at this for weeks without much luck. I'll get things almost working, and then I access the wrong file or try to update permissions on the wrong folder, and everything freezes up. I've tried to rule out hardware, the kernel, ZFS, and permissions as best I could, but these CPU soft lockups keep happening and locking everything up, and I have to do a hard reboot. It's pretty much unusable, and I'm not sure what to try next. Any help is greatly appreciated!
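One thing I noticed: the RIP in the trace is in zap_leaf_lookup_closest, i.e. the readdir is spinning inside a directory ZAP. A way to inspect the suspect directory's on-disk ZAP read-only with zdb (a sketch; this assumes zfsdocker is its own dataset under zfs01, and the object number must be substituted):

```shell
# Get the object (inode) number of the suspect directory
# (stat may hang too if the directory itself is the problem)
stat -c %i /mnt/zfsdocker/kopia

# Dump that object's contents, including its ZAP entries, from the host
zdb -dddd zfs01/zfsdocker <object-number>
```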
Ubuntu VM
agent: 1
boot: order=scsi0;ide2;net0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 12288
meta: creation-qemu=8.0.2,ctime=1691893089
name: Ubuntu22.0.4
net0: virtio=FA:A9:A3:08:A2:07,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: Containers:vm-101-disk-0,aio=threads,iothread=1,size=75G
scsihw: virtio-scsi-single
smbios1: uuid=f243033f-ece4-4e1e-b873-b1da0e3a03f4
sockets: 1
tablet: 0
tags: ubuntu
vga: std
vmgenid: 5982d62c-e4d9-4ab4-aad6-72aeb08349cb
Turnkey file server container
arch: amd64
cores: 2
features: mount=nfs;cifs,nesting=1
hostname: ct-fileserver
memory: 500
mp0: /zfs01/zfsdocker,mp=/mnt/zfsdocker
mp1: /zfs01/zfsmedia,mp=/mnt/zfsmedia
mp2: /zfs01/zfsbackup,mp=/mnt/zfsbackup
mp3: /zfs01/zfsdata,mp=/mnt/zfsdata
mp4: /mnt/usb-drive,mp=/mnt/usb-drive
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.88.1,hwaddr=E2:C4:86:CF:3A:A7,ip=192.168.88.19/24,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-102-disk-0,size=8G
searchdomain: 192.168.88.1
swap: 256