I'm not sure how to start this topic, but I noticed strange behavior on one of my systems. Here is the kernel output:
Code:
[ 242.670592] INFO: task pvescheduler:2292 blocked for more than 120 seconds.
[ 242.670600] Tainted: P O 5.13.19-2-pve #1
[ 242.670602] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.670604] task:pvescheduler state:D stack: 0 pid: 2292 ppid: 2290 flags:0x00000004
[ 242.670608] Call Trace:
[ 242.670612] __schedule+0x2fa/0x910
[ 242.670618] schedule+0x4f/0xc0
[ 242.670622] rwsem_down_write_slowpath+0x212/0x590
[ 242.670626] down_write+0x43/0x50
[ 242.670629] filename_create+0x7e/0x160
[ 242.670634] do_mkdirat+0x58/0x150
[ 242.670637] __x64_sys_mkdir+0x1b/0x20
[ 242.670640] do_syscall_64+0x61/0xb0
[ 242.670644] ? syscall_exit_to_user_mode+0x27/0x50
[ 242.670646] ? do_syscall_64+0x6e/0xb0
[ 242.670649] ? __x64_sys_alarm+0x4a/0x90
[ 242.670654] ? exit_to_user_mode_prepare+0x37/0x1b0
[ 242.670658] ? syscall_exit_to_user_mode+0x27/0x50
[ 242.670661] ? do_syscall_64+0x6e/0xb0
[ 242.670663] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 242.670667] RIP: 0033:0x7fecb3ac6b07
[ 242.670669] RSP: 002b:00007ffee4775b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[ 242.670673] RAX: ffffffffffffffda RBX: 00005609a93de2a0 RCX: 00007fecb3ac6b07
[ 242.670675] RDX: 0000000000000027 RSI: 00000000000001ff RDI: 00005609aec86500
[ 242.670677] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000001f28
[ 242.670679] R10: 00007ffee4775b00 R11: 0000000000000246 R12: 00005609aec86500
[ 242.670681] R13: 00005609a93e3ca8 R14: 00005609aad08688 R15: 00000000000001ff
[ 242.670686] INFO: task pvesr:2734 blocked for more than 120 seconds.
[ 242.670689] Tainted: P O 5.13.19-2-pve #1
[ 242.670690] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.670692] task:pvesr state:D stack: 0 pid: 2734 ppid: 2732 flags:0x00000000
[ 242.670695] Call Trace:
[ 242.670697] __schedule+0x2fa/0x910
[ 242.670700] schedule+0x4f/0xc0
[ 242.670703] rwsem_down_write_slowpath+0x212/0x590
[ 242.670706] down_write+0x43/0x50
[ 242.670709] filename_create+0x7e/0x160
[ 242.670712] do_mkdirat+0x58/0x150
[ 242.670716] __x64_sys_mkdir+0x1b/0x20
[ 242.670719] do_syscall_64+0x61/0xb0
[ 242.670721] ? syscall_exit_to_user_mode+0x27/0x50
[ 242.670724] ? __x64_sys_read+0x1a/0x20
[ 242.670727] ? do_syscall_64+0x6e/0xb0
[ 242.670730] ? irqentry_exit_to_user_mode+0x9/0x20
[ 242.670732] ? irqentry_exit+0x19/0x30
[ 242.670734] ? exc_page_fault+0x8f/0x170
[ 242.670736] ? asm_exc_page_fault+0x8/0x30
[ 242.670739] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 242.670741] RIP: 0033:0x7fcdda383b07
[ 242.670743] RSP: 002b:00007ffed62878f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[ 242.670746] RAX: ffffffffffffffda RBX: 00005647dee7a2a0 RCX: 00007fcdda383b07
[ 242.670747] RDX: 00005647de8a9ae5 RSI: 00000000000001ff RDI: 00005647e2faa490
[ 242.670749] RBP: 0000000000000000 R08: 00005647e2faa4f8 R09: 0000000000000000
[ 242.670751] R10: 000000000000000f R11: 0000000000000246 R12: 00005647e2faa490
[ 242.670753] R13: 00005647e02431f8 R14: 00005647e321e798 R15: 00000000000001ff
root@proxmox-node-1.home.lan:~#
I also see that the corosync daemon is constantly trying to send cluster beacons to the other node.
Code:
Dec 17 16:28:44 proxmox-node-1.home.lan corosync[2077]: [KNET ] link: host: 2 link: 0 is down
Dec 17 16:28:44 proxmox-node-1.home.lan corosync[2077]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Dec 17 16:28:44 proxmox-node-1.home.lan corosync[2077]: [KNET ] host: host: 2 has no active links
Dec 17 16:28:45 proxmox-node-1.home.lan corosync[2077]: [KNET ] rx: host: 2 link: 0 is up
Dec 17 16:28:45 proxmox-node-1.home.lan corosync[2077]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
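To put a number on how often the link goes down and comes back, here is a rough Python sketch that counts the KNET "is up" / "is down" lines per host and link. The function name `count_link_flaps` and the regular expression are my own and are based only on the message format in the excerpt above, so other corosync versions may log these lines differently.

```python
import re
from collections import Counter

# Matches KNET link state lines in the format seen in the excerpt above, e.g.
#   "[KNET ] link: host: 2 link: 0 is down"
#   "[KNET ] rx: host: 2 link: 0 is up"
# Assumption: the format is taken from this log only, not from corosync docs.
LINK_RE = re.compile(
    r"\[KNET\s*\]\s+(?:link|rx): host: (\d+) link: (\d+) is (up|down)"
)


def count_link_flaps(lines):
    """Count link state events, keyed by (host id, link id, state)."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            host, link, state = match.groups()
            counts[(int(host), int(link), state)] += 1
    return counts


# The five log lines from the post, used as sample input.
sample = [
    "Dec 17 16:28:44 proxmox-node-1.home.lan corosync[2077]: [KNET ] link: host: 2 link: 0 is down",
    "Dec 17 16:28:44 proxmox-node-1.home.lan corosync[2077]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)",
    "Dec 17 16:28:44 proxmox-node-1.home.lan corosync[2077]: [KNET ] host: host: 2 has no active links",
    "Dec 17 16:28:45 proxmox-node-1.home.lan corosync[2077]: [KNET ] rx: host: 2 link: 0 is up",
    "Dec 17 16:28:45 proxmox-node-1.home.lan corosync[2077]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)",
]

for (host, link, state), n in sorted(count_link_flaps(sample).items()):
    print(f"host {host} link {link}: {n} x {state}")
```

Running it over a longer stretch of the corosync log should show whether the link to host 2 flaps continuously or only occasionally.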