Dear Community,
I've run into many issues with PVE 9.1 and the new 6.17 kernel.
I can't list them all precisely, but they included I/O hangs, kernel stack traces and so on.
6.17.2-1-pve was the worst; it got a bit better with 6.17.2-2-pve.
Yesterday I had I/O timeouts while a PBS backup was running.
Today RBD started to have issues during a normal migration:
Code:
[34409.788606] INFO: task kworker/u128:3:723688 blocked for more than 245 seconds.
[34409.788932] Tainted: P O 6.17.2-2-pve #1
[34409.789223] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[34409.789525] task:kworker/u128:3 state:D stack:0 pid:723688 tgid:723688 ppid:2 task_flags:0x4208060 flags:0x00004000
[34409.789825] Workqueue: rbd28-tasks rbd_release_lock_work [rbd]
[34409.790128] Call Trace:
[34409.790406] <TASK>
[34409.790707] __schedule+0x468/0x1310
[34409.790986] ? srso_alias_return_thunk+0x5/0xfbef5
[34409.791258] ? sched_clock_noinstr+0x9/0x10
[34409.791578] ? srso_alias_return_thunk+0x5/0xfbef5
[34409.791850] ? sched_clock+0x10/0x30
[34409.792119] ? srso_alias_return_thunk+0x5/0xfbef5
[34409.792392] schedule+0x27/0xf0
[34409.792671] schedule_timeout+0xcf/0x110
[34409.792941] __wait_for_common+0x98/0x1b0
[34409.793211] ? __pfx_schedule_timeout+0x10/0x10
[34409.793499] wait_for_completion+0x24/0x40
[34409.793771] rbd_quiesce_lock+0xa6/0xf0 [rbd]
[34409.794050] rbd_release_lock_work+0x2f/0xc0 [rbd]
[34409.794327] process_one_work+0x18b/0x370
[34409.794606] worker_thread+0x33a/0x480
[34409.794872] ? __pfx_worker_thread+0x10/0x10
[34409.795134] kthread+0x10b/0x220
[34409.795391] ? __pfx_kthread+0x10/0x10
[34409.795652] ret_from_fork+0x208/0x240
[34409.795935] ? __pfx_kthread+0x10/0x10
[34409.796246] ret_from_fork_asm+0x1a/0x30
[34409.796522] </TASK>
My current workaround is to pin the kernel back to 6.14.11-4-pve, which was working fine in my environment.
I wonder whether others are seeing similar issues in their environments.
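In case it helps with comparing logs, here is a minimal Python sketch that pulls hung-task reports like the one above out of saved kernel log text. It only assumes the message format visible in my trace ("INFO: task ... blocked for more than N seconds." followed by a "Workqueue:" line); the function and field names are my own and not part of any PVE or Ceph tooling.

```python
import re

# Header line of a hung-task report, e.g.
# "[34409.788606] INFO: task kworker/u128:3:723688 blocked for more than 245 seconds."
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)
# "Workqueue:" line that follows the header for kworker tasks, e.g.
# "Workqueue: rbd28-tasks rbd_release_lock_work [rbd]"
WQ_RE = re.compile(r"Workqueue:\s+(?P<wq>\S+)\s+(?P<func>\S+)")


def find_hung_tasks(log_text):
    """Return one dict per hung-task report found in the given log text."""
    reports = []
    current = None
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            # Start a new report at every hung-task header line.
            current = {
                "timestamp": float(m.group("ts")),
                "task": m.group("comm"),
                "pid": int(m.group("pid")),
                "blocked_secs": int(m.group("secs")),
                "workqueue": None,
                "work_func": None,
            }
            reports.append(current)
            continue
        w = WQ_RE.search(line)
        # Attach only the first Workqueue line seen after a header.
        if w and current is not None and current["workqueue"] is None:
            current["workqueue"] = w.group("wq")
            current["work_func"] = w.group("func")
    return reports
```

Feeding it the saved output of `dmesg` (for example read from a file of your choosing) gives a list you can filter for workqueues starting with "rbd", which should make it easier to tell whether your hangs look like mine.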
Could it be related to me having KRBD enabled? Is it advisable to disable this option?
Please excuse the crudity of the post; I'm still trying to collect information that might lead to the root cause of the issue.
Thanks!
Best regards,
Bernhard