Hi,
I have been running a 5-node Proxmox 5.1 cluster for months. Yesterday all nodes suddenly stopped "seeing" each other, and I found the following errors in `dmesg -T`, repeated several times on all servers at around the same time.
Code:
[Sun Apr 29 15:22:56 2018] INFO: task pvesr:19470 blocked for more than 120 seconds.
[Sun Apr 29 15:22:56 2018] Tainted: P O 4.13.13-5-pve #1
[Sun Apr 29 15:22:56 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Apr 29 15:22:56 2018] pvesr D 0 19470 1 0x00000000
[Sun Apr 29 15:22:56 2018] Call Trace:
[Sun Apr 29 15:22:56 2018] __schedule+0x3cc/0x850
[Sun Apr 29 15:22:56 2018] ? path_parentat+0x3e/0x80
[Sun Apr 29 15:22:56 2018] schedule+0x36/0x80
[Sun Apr 29 15:22:56 2018] rwsem_down_write_failed+0x230/0x3a0
[Sun Apr 29 15:22:56 2018] call_rwsem_down_write_failed+0x17/0x30
[Sun Apr 29 15:22:56 2018] ? call_rwsem_down_write_failed+0x17/0x30
[Sun Apr 29 15:22:56 2018] down_write+0x2d/0x40
[Sun Apr 29 15:22:56 2018] filename_create+0x7e/0x160
[Sun Apr 29 15:22:56 2018] SyS_mkdir+0x51/0x100
[Sun Apr 29 15:22:56 2018] ? exit_to_usermode_loop+0x9b/0xd0
[Sun Apr 29 15:22:56 2018] entry_SYSCALL_64_fastpath+0x33/0xa3
[Sun Apr 29 15:22:56 2018] RIP: 0033:0x7fd6f89da477
[Sun Apr 29 15:22:56 2018] RSP: 002b:00007fff33652338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[Sun Apr 29 15:22:56 2018] RAX: ffffffffffffffda RBX: 000055a6710ee010 RCX: 00007fd6f89da477
[Sun Apr 29 15:22:56 2018] RDX: 000055a66f403484 RSI: 00000000000001ff RDI: 000055a6745ca2d0
[Sun Apr 29 15:22:56 2018] RBP: 0000000000000000 R08: 0000000000000200 R09: 000055a6710ee028
[Sun Apr 29 15:22:56 2018] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a6731e2158
[Sun Apr 29 15:22:56 2018] R13: 000055a6745659f0 R14: 000055a6745ca2d0 R15: 00000000000001ff
Any idea what could be causing this and how to get the cluster back in sync?