The NAS that my Proxmox host uses as backup storage had a problem, which left the NAS very unresponsive (hanging). It looks like this behaviour also affected pvestatd in such a way that it has become an unkillable, zombie-like process. dmesg shows the errors below.
I can say for sure that the CIFS NAS had a problem and that it (most likely) really did not answer for 120 seconds or even longer. Since there are no VMs stored on it and no backup was running at that time, I would expect such a failure to have little to no impact on Proxmox. Unfortunately, it somehow made pvestatd hang in such a way that I cannot restart it or even kill it. ps shows it in state "D" (uninterruptible sleep, so it does not react to signals), and the parent process is PID 1, so only a reboot will help me here.
How can this happen? Is there a way to mitigate such a problem should it happen again in the future?
Code:
[23225.274454] CIFS VFS: Server xxx.xxx.xxx.xxx has not responded in 120 seconds. Reconnecting...
[23323.060336] INFO: task kworker/12:0:14135 blocked for more than 120 seconds.
[23323.061232] Tainted: P O 5.0.21-1-pve #1
[23323.061948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23323.062588] kworker/12:0 D 0 14135 2 0x80000000
[23323.062646] Workqueue: cifsiod smb2_reconnect_server [cifs]
[23323.062648] Call Trace:
[23323.062656] __schedule+0x2d4/0x870
[23323.062659] schedule+0x2c/0x70
[23323.062661] schedule_preempt_disabled+0xe/0x10
[23323.062662] __mutex_lock.isra.10+0x2e4/0x4c0
[23323.062666] __mutex_lock_slowpath+0x13/0x20
[23323.062666] mutex_lock+0x2c/0x30
[23323.062682] smb2_reconnect+0x102/0x7d0 [cifs]
[23323.062688] ? lock_timer_base+0x6b/0x90
[23323.062692] ? wait_woken+0x80/0x80
[23323.062707] smb2_reconnect_server+0x18c/0x2d0 [cifs]
[23323.062710] process_one_work+0x20f/0x410
[23323.062712] worker_thread+0x34/0x400
[23323.062714] kthread+0x120/0x140
[23323.062715] ? process_one_work+0x410/0x410
[23323.062716] ? __kthread_parkme+0x70/0x70
[23323.062718] ret_from_fork+0x35/0x40
[23323.062733] INFO: task pvestatd:8008 blocked for more than 120 seconds.
[23323.063336] Tainted: P O 5.0.21-1-pve #1
[23323.064099] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23323.064837] pvestatd D 0 8008 2633 0x80000004
[23323.064839] Call Trace:
[23323.064844] __schedule+0x2d4/0x870
[23323.064848] schedule+0x2c/0x70
[23323.064852] schedule_preempt_disabled+0xe/0x10
[23323.064854] __mutex_lock.isra.10+0x2e4/0x4c0
[23323.064864] __mutex_lock_slowpath+0x13/0x20
[23323.064865] mutex_lock+0x2c/0x30
[23323.064887] cifs_mark_open_files_invalid+0x5b/0xa0 [cifs]
[23323.064908] smb2_reconnect+0x149/0x7d0 [cifs]
[23323.064929] smb2_plain_req_init+0x34/0x260 [cifs]
[23323.064946] SMB2_open_init+0x69/0x760 [cifs]
[23323.064963] SMB2_open+0x148/0x510 [cifs]
[23323.064980] open_shroot+0x170/0x210 [cifs]
[23323.064997] ? open_shroot+0x170/0x210 [cifs]
[23323.065014] smb2_query_path_info+0x137/0x1c0 [cifs]
[23323.065016] ? _cond_resched+0x19/0x30
[23323.065018] ? _cond_resched+0x19/0x30
[23323.065022] ? kmem_cache_alloc_trace+0x153/0x1d0
[23323.065047] cifs_get_inode_info+0x283/0xb40 [cifs]
[23323.065067] ? build_path_from_dentry_optional_prefix+0xc4/0x410 [cifs]
[23323.065090] cifs_revalidate_dentry_attr+0xdd/0x3a0 [cifs]
[23323.065113] cifs_getattr+0x5a/0x1a0 [cifs]
[23323.065120] vfs_getattr_nosec+0x73/0x90
[23323.065123] vfs_getattr+0x36/0x40
[23323.065124] vfs_statx+0x8d/0xe0
[23323.065126] __do_sys_newstat+0x3d/0x70
[23323.065128] __x64_sys_newstat+0x16/0x20
[23323.065131] do_syscall_64+0x5a/0x110
[23323.065133] entry_SYSCALL_64_after_hwframe+0x44/0xa9
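
For anyone who wants to check for the same symptom without waiting for the hung-task message, here is a minimal sketch that lists tasks currently in state "D" by reading /proc/<pid>/stat directly. It is not a Proxmox tool. The function names (parse_stat_line, find_d_state_tasks) are made up for this example, and it only relies on the documented layout of the stat file (pid, command name in parentheses, state, parent PID). It reports stuck tasks; it cannot unblock them.

```python
import os


def parse_stat_line(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    fields = line[close_paren + 1:].split()
    state, ppid = fields[0], int(fields[1])
    return pid, comm, state, ppid


def find_d_state_tasks(proc_root="/proc"):
    """Return (pid, comm, state, ppid) for tasks in uninterruptible sleep."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style procfs, nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                task = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or was unreadable while scanning
        if task[2] == "D":
            found.append(task)
    return found


if __name__ == "__main__":
    # Columns: PID, parent PID, state, command name
    for pid, comm, state, ppid in find_d_state_tasks():
        print(f"{pid}\t{ppid}\t{state}\t{comm}")
```

Run on the affected host, this should list pvestatd and the cifsiod kworker for as long as they are blocked. A task that stays in the list across several runs is most likely waiting on the CIFS mount.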