pvesr issue

schinzelh

Apr 30, 2018
Hi,

I have been running a 5-node cluster on Proxmox 5.1 for months. Yesterday all nodes suddenly stopped "seeing" each other, and I found the following errors in `dmesg -T`, repeated several times on all servers at around the same time.

Code:
[Sun Apr 29 15:22:56 2018] INFO: task pvesr:19470 blocked for more than 120 seconds.

[Sun Apr 29 15:22:56 2018]       Tainted: P           O    4.13.13-5-pve #1
[Sun Apr 29 15:22:56 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Apr 29 15:22:56 2018] pvesr           D    0 19470      1 0x00000000
[Sun Apr 29 15:22:56 2018] Call Trace:
[Sun Apr 29 15:22:56 2018]  __schedule+0x3cc/0x850
[Sun Apr 29 15:22:56 2018]  ? path_parentat+0x3e/0x80
[Sun Apr 29 15:22:56 2018]  schedule+0x36/0x80
[Sun Apr 29 15:22:56 2018]  rwsem_down_write_failed+0x230/0x3a0
[Sun Apr 29 15:22:56 2018]  call_rwsem_down_write_failed+0x17/0x30
[Sun Apr 29 15:22:56 2018]  ? call_rwsem_down_write_failed+0x17/0x30
[Sun Apr 29 15:22:56 2018]  down_write+0x2d/0x40
[Sun Apr 29 15:22:56 2018]  filename_create+0x7e/0x160
[Sun Apr 29 15:22:56 2018]  SyS_mkdir+0x51/0x100
[Sun Apr 29 15:22:56 2018]  ? exit_to_usermode_loop+0x9b/0xd0
[Sun Apr 29 15:22:56 2018]  entry_SYSCALL_64_fastpath+0x33/0xa3
[Sun Apr 29 15:22:56 2018] RIP: 0033:0x7fd6f89da477
[Sun Apr 29 15:22:56 2018] RSP: 002b:00007fff33652338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[Sun Apr 29 15:22:56 2018] RAX: ffffffffffffffda RBX: 000055a6710ee010 RCX: 00007fd6f89da477
[Sun Apr 29 15:22:56 2018] RDX: 000055a66f403484 RSI: 00000000000001ff RDI: 000055a6745ca2d0
[Sun Apr 29 15:22:56 2018] RBP: 0000000000000000 R08: 0000000000000200 R09: 000055a6710ee028
[Sun Apr 29 15:22:56 2018] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a6731e2158
[Sun Apr 29 15:22:56 2018] R13: 000055a6745659f0 R14: 000055a6745ca2d0 R15: 00000000000001ff

Any idea what could be causing this and how to get the cluster back in sync?
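
In case it helps with comparing the timing across nodes, here is a rough sketch for pulling the hung-task reports out of saved `dmesg -T` output. The function name `find_hung_tasks` and the regular expression are my own and only assume the line format shown in the log above; this is not a Proxmox tool.

```python
import re

# Matches kernel hung-task reports in `dmesg -T` output, for example:
# [Sun Apr 29 15:22:56 2018] INFO: task pvesr:19470 blocked for more than 120 seconds.
# The pattern is an assumption based on the excerpt in this post.
HUNG_TASK = re.compile(
    r"^\[(?P<when>[^\]]+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(dmesg_text):
    """Return (timestamp, task name, pid, seconds) for each hung-task report."""
    hits = []
    for line in dmesg_text.splitlines():
        m = HUNG_TASK.match(line.strip())
        if m:
            hits.append((m["when"], m["name"], int(m["pid"]), int(m["secs"])))
    return hits


sample = """\
[Sun Apr 29 15:22:56 2018] INFO: task pvesr:19470 blocked for more than 120 seconds.
[Sun Apr 29 15:22:56 2018]       Tainted: P           O    4.13.13-5-pve #1
"""

for when, name, pid, secs in find_hung_tasks(sample):
    print(f"{when}: {name} (pid {pid}) blocked for more than {secs}s")
```

Running this on the output saved from each node would give one line per report, which makes it easier to see whether the hangs started at the same moment everywhere.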
 
Hi Fabian,

That sounds exactly like what I am seeing: corosync causing high load and pve-ha-lrm being stuck. I will try the new packages from pvetest and see if they fix it for me. Any further gotchas or tips for the update?

Holger
 
Restarting as described in the linked thread and then upgrading should be enough.
 
