I have a PVE setup with a TrueNAS (FreeBSD-based) file server as a KVM guest. It is configured with a separate network interface (virtio) used for an NFS share between it and the Proxmox host, as well as one for the local network. This provides storage for some other containers and serves as the destination for a daily backup job I have set up. (Boot orders are set correctly, and no, it's not backing up TrueNAS to itself...)
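For reference, the NFS storage is defined on the host along these lines (the storage name and IP match my setup; the export path and options shown here are illustrative, not my exact ones):

Code:
nfs: TrueNAS
        server 192.168.101.2
        export /mnt/tank/proxmox
        path /mnt/pve/TrueNAS
        content backup
        options vers=3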
After doing an in-place upgrade from PVE 7.4 to 8.1.4 a couple of days ago using apt, I have been noticing that the server blows up whenever my daily backup job starts. A couple of backups complete successfully, but then the I/O delay shown in the web UI climbs to 70-80% and various parts of the UI go blank or show question marks. During this time other guests seem unaffected, including containers and another FreeBSD guest (pfSense). TrueNAS, however, becomes essentially unresponsive, and the SMB shares it provides on the local network become unavailable.

If I leave it for long enough, TrueNAS kernel panics and reboots, and the rest of the backup does eventually finish, albeit quite slowly and with a couple of jobs failing again. During this time I was trying to kill the vzdump backup processes, but they were unresponsive even to kill -9 for a while.
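I assume the vzdump workers were stuck in uninterruptible sleep ("D" state) on the dead NFS mount, which would explain why kill -9 had no effect; next time I'll confirm with something like this:

Code:
# Processes blocked on NFS I/O sit in uninterruptible sleep ("D" state)
# and ignore signals, including SIGKILL, until the I/O returns.
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'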
TrueNAS /var/log/messages
Code:
Feb 20 02:34:43 truenas spin lock 0xffffffff81f85e08 (sleepq chain) held by 0xfffffe00cac9dac0 (tid 101989) too long
Feb 20 02:34:43 truenas spin lock 0xffffffff81f58300 (callout) held by 0xfffffe00dd8c6ac0 (tid 101990) too long
Feb 20 02:34:43 truenas spin lock 0xffffffff81f38fa0 (cnputs_mtx) held by 0xfffffe00107f5c80 (tid 100065) too long
Feb 20 02:34:43 truenas panic: spin lock held too long
Feb 20 02:34:43 truenas cpuid = 6
Feb 20 02:34:43 truenas time = 1708405941
Feb 20 02:34:43 truenas KDB: stack backtrace:
Feb 20 02:34:43 truenas db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008ec48350
Feb 20 02:34:43 truenas vpanic() at vpanic+0x17f/frame 0xfffffe008ec483a0
Feb 20 02:34:43 truenas panic() at panic+0x43/frame 0xfffffe008ec48400
Feb 20 02:34:43 truenas _mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x68/frame 0xfffffe008ec48410
Feb 20 02:34:43 truenas _mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe008ec48480
Feb 20 02:34:43 truenas cnputsn() at cnputsn+0xd8/frame 0xfffffe008ec484c0
Feb 20 02:34:43 truenas putchar() at putchar+0x14a/frame 0xfffffe008ec48550
Feb 20 02:34:43 truenas kvprintf() at kvprintf+0xf5/frame 0xfffffe008ec48670
Feb 20 02:34:43 truenas vprintf() at vprintf+0x82/frame 0xfffffe008ec48750
Feb 20 02:34:43 truenas printf() at printf+0x43/frame 0xfffffe008ec487b0
Feb 20 02:34:43 truenas _mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x5a/frame 0xfffffe008ec487c0
Feb 20 02:34:43 truenas _mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe008ec48830
Feb 20 02:34:43 truenas callout_lock() at callout_lock+0xb0/frame 0xfffffe008ec48860
Feb 20 02:34:43 truenas _callout_stop_safe() at _callout_stop_safe+0xc9/frame 0xfffffe008ec488c0
Feb 20 02:34:43 truenas sleepq_remove_thread() at sleepq_remove_thread+0xb7/frame 0xfffffe008ec488e0
Feb 20 02:34:43 truenas sleepq_resume_thread() at sleepq_resume_thread+0x49/frame 0xfffffe008ec48920
Feb 20 02:34:43 truenas sleepq_broadcast() at sleepq_broadcast+0x74/frame 0xfffffe008ec48950
Feb 20 02:34:43 truenas cv_broadcastpri() at cv_broadcastpri+0x41/frame 0xfffffe008ec48970
Feb 20 02:34:43 truenas doselwakeup() at doselwakeup+0xa9/frame 0xfffffe008ec489c0
Feb 20 02:34:43 truenas sowakeup() at sowakeup+0x1e/frame 0xfffffe008ec489f0
Feb 20 02:34:43 truenas udp_append() at udp_append+0x254/frame 0xfffffe008ec48a70
Feb 20 02:34:43 truenas udp_input() at udp_input+0x733/frame 0xfffffe008ec48b50
Feb 20 02:34:43 truenas ip_input() at ip_input+0x11f/frame 0xfffffe008ec48be0
Feb 20 02:34:43 truenas netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe008ec48c30
Feb 20 02:34:43 truenas ether_demux() at ether_demux+0x138/frame 0xfffffe008ec48c60
Feb 20 02:34:43 truenas ether_nh_input() at ether_nh_input+0x355/frame 0xfffffe008ec48cc0
Feb 20 02:34:43 truenas netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe008ec48d10
Feb 20 02:34:43 truenas ether_input() at ether_input+0x69/frame 0xfffffe008ec48d70
Feb 20 02:34:43 truenas vtnet_rxq_eof() at vtnet_rxq_eof+0x73e/frame 0xfffffe008ec48e30
Feb 20 02:34:43 truenas vtnet_rx_vq_process() at vtnet_rx_vq_process+0x67/frame 0xfffffe008ec48e60
Feb 20 02:34:43 truenas ithread_loop() at ithread_loop+0x25a/frame 0xfffffe008ec48ef0
Feb 20 02:34:43 truenas fork_exit() at fork_exit+0x7e/frame 0xfffffe008ec48f30
Feb 20 02:34:43 truenas fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe008ec48f30
Feb 20 02:34:43 truenas --- trap 0x80b7b3a0, rip = 0xffffffff80aa32ef, rsp = 0, rbp = 0x32ec000 ---
Feb 20 02:34:43 truenas mi_startup() at mi_startup+0xdf/frame 0x32ec000
Feb 20 02:34:43 truenas KDB: enter: panic
Feb 20 02:34:43 truenas ---<<BOOT>>---
Feb 20 02:34:43 truenas Copyright (c) 1992-2021 The FreeBSD Project.
Feb 20 02:34:43 truenas Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Feb 20 02:34:43 truenas The Regents of the University of California. All rights reserved.
Feb 20 02:34:43 truenas FreeBSD is a registered trademark of The FreeBSD Foundation.
Feb 20 02:34:43 truenas FreeBSD 13.1-RELEASE-p9 n245429-296d095698e TRUENAS amd64
normal boot from here...
Proxmox /var/log/syslog at the same time (192.168.101.2 is the TrueNAS server's NFS interface IP address). Attached is a more complete version that runs from when the backup job started until NFS failed.
Code:
2024-02-20T02:33:35.940721-08:00 kernel: [82911.936038] I/O error, dev loop1, sector 12648880 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 2
2024-02-20T02:33:35.940737-08:00 kernel: [82911.936076] nfs: server 192.168.101.2 not responding, timed out
2024-02-20T02:33:35.940739-08:00 kernel: [82911.936089] I/O error, dev loop0, sector 16592952 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
2024-02-20T02:33:35.950991-08:00 kernel: [82911.940751] I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
2024-02-20T02:33:35.951006-08:00 kernel: [82911.946997] EXT4-fs warning (device loop1): htree_dirblock_to_tree:1082: inode #393240: lblock 0: comm qmgr: error -5 reading directory block
2024-02-20T02:33:35.960610-08:00 systemd[1]: session-141.scope: Deactivated successfully.
2024-02-20T02:33:35.960691-08:00 systemd[1]: session-141.scope: Consumed 2.542s CPU time.
2024-02-20T02:33:39.586794-08:00 pvestatd[1511]: storage 'TrueNAS' is not online
2024-02-20T02:33:39.637345-08:00 pvestatd[1511]: status update time (19331.242 seconds)
2024-02-20T02:33:45.729615-08:00 pvestatd[1511]: storage 'TrueNAS' is not online
2024-02-20T02:33:45.913988-08:00 pvestatd[1511]: status update time (6.276 seconds)
2024-02-20T02:33:54.945730-08:00 pvestatd[1511]: storage 'TrueNAS' is not online
2024-02-20T02:33:54.987696-08:00 pvestatd[1511]: status update time (5.073 seconds)
2024-02-20T02:34:04.161739-08:00 pvestatd[1511]: storage 'TrueNAS' is not online
2024-02-20T02:34:13.377724-08:00 pvestatd[1511]: storage 'TrueNAS' is not online
2024-02-20T02:34:25.665816-08:00 pvestatd[1511]: storage 'TrueNAS' is not online
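Next time it's in this state, I plan to check from the host whether the mount is dead or merely slow (the storage name matches the pvestatd messages above):

Code:
# Ask PVE whether it still considers the NFS storage reachable
pvesm status --storage TrueNAS
# Ask the TrueNAS guest directly whether the export is still visible
showmount -e 192.168.101.2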
Proxmox version
Code:
# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-8-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-10
proxmox-kernel-6.5: 6.5.11-8
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
pve-kernel-5.15.136-1-pve: 5.15.136-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.1
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-3
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.0
pve-qemu-kvm: 8.1.5-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1
My interpretation is that once there is significant traffic over NFS, something occasionally breaks in the network stack on the Proxmox/Linux/KVM side, and FreeBSD then panics somewhere in its networking code (the backtrace ends in vtnet, the virtio network driver). I don't really know where to start troubleshooting this, since there is no indication in the Linux syslog of what might be going wrong, other than the NFS errors, which I assume are a symptom of the original problem rather than its cause.
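Next time it happens I'll try capturing the NFS traffic on the host side of the storage bridge, to see whether packets stop flowing before or after the guest wedges (the bridge name here is a placeholder for whichever bridge carries the 192.168.101.0/24 link):

Code:
# Capture host<->guest NFS traffic during a backup run
tcpdump -i vmbr1 -w /root/nfs-during-backup.pcap host 192.168.101.2 and port 2049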
I've seen some people talking about downgrading the kernel version for various reasons. Might that fix this?
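If that's worth trying, my understanding is that the still-installed 5.15 kernel can be pinned with proxmox-boot-tool (going by the PVE docs; I haven't used the pin subcommand myself):

Code:
# List kernels known to the bootloader, then pin the 5.15 kernel
# that is still installed according to pveversion
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 5.15.136-1-pve
# 'proxmox-boot-tool kernel unpin' reverts to the default afterwards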