All Proxmox processes in the 'D' state and a very high LA

Frakir

Jan 20, 2016
Hi All!

I have a six-node Proxmox cluster.
All of the nodes run pve-manager/5.3-5/97ae681d (running kernel: 4.15.18-9-pve).

On one of them (the node running the NFS server, used mostly for backups) I sometimes see a very high load average (LA): 13, 14, 20... and it keeps growing.
Yet CPU usage is very low (top reports 99.6% idle), iowait is around zero, and there is very little network traffic.
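An idle CPU combined with a climbing LA fits the 'D' state: the Linux load average counts tasks in uninterruptible sleep ('D') as well as runnable ones ('R'), so stuck tasks push it up without using any CPU. A minimal POSIX shell sketch, assuming the standard /proc layout, that puts the two numbers side by side; the function names stat_state and count_state are hypothetical:

```shell
#!/bin/sh
# Sketch: compare the 1-minute load average with the number of threads
# currently in R (runnable) and D (uninterruptible sleep) state.
# The kernel counts both R and D tasks in the load average, so many
# D-state tasks raise LA even while the CPU is idle.

# stat_state: print the state letter from one /proc/.../stat line.
# Field 2 (comm) is in parentheses and may itself contain spaces or ')',
# so strip everything up to the LAST ") " before taking the next field.
stat_state() {
    rest=${1##*') '}
    printf '%s\n' "${rest%% *}"
}

# count_state: count threads (not only processes) in the given state,
# because the kernel counts individual threads in the load average.
count_state() {
    want=$1
    n=0
    for f in /proc/[0-9]*/task/[0-9]*/stat; do
        # Threads can exit while we loop; skip files that disappeared.
        { read -r line < "$f"; } 2>/dev/null || continue
        if [ "$(stat_state "$line")" = "$want" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

read -r load1 _rest < /proc/loadavg
echo "load(1m)=$load1 R=$(count_state R) D=$(count_state D)"
```

If the D count roughly tracks the LA while R stays near zero, the load comes from blocked tasks rather than from CPU work.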
When I run pct list, it hangs.
When I try systemctl restart on pvedaemon, pvestatd and so on, I get a timeout.
There are a lot of vzdump processes in the 'D' state (they are meant to run on the other nodes, but they wait for an answer from Proxmox forever), and a lot of other Proxmox processes are in the same 'D' state.
root@backup-node:~# ps ax | grep ' D'
770 ? D 0:00 /usr/bin/perl -T /usr/bin/vzdump 601 602 --node p6 --compress gzip --mailnotification always --mode snapshot --mailto proxmox@my-domain.com --storage nfs-storage --quiet 1
1026 ? D 0:00 /usr/bin/perl -T /usr/sbin/pct list
1796 ? D 0:00 /usr/bin/perl -T /usr/bin/vzdump 301 302 --quiet 1 --storage nfs-storage --mailto proxmox@my-domain.com --mailnotification always --node p3 --compress gzip --mode snapshot
2430 ? Ds 0:00 /usr/bin/perl /usr/sbin/pve-firewall stop
4063 pts/1 D+ 0:00 /usr/bin/perl -T /usr/sbin/pct list
25551 ? Ds 0:00 /usr/bin/perl -T /usr/bin/pvedaemon stop
28963 ? Ds 0:00 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
30877 ? Ds 0:00 /usr/bin/perl -T /usr/bin/pvedaemon start
and so on.
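The filter ps ax | grep ' D' matches any line that contains " D" anywhere, including inside a command line, and it shows only the state letter. A sketch that checks just the state field and also prints wchan, the kernel function each process is sleeping in. The optional root argument (default /proc) is a hypothetical addition so the function can be tried on a fake directory tree, and list_dstate is a hypothetical name:

```shell
#!/bin/sh
# Sketch: list processes in D state with their wchan (the kernel function
# they sleep in) and full command line. Only the state field is checked,
# so a command line that merely contains " D" is not matched.
# $1: optional proc root (defaults to /proc); hypothetical, for testing.

list_dstate() {
    root=${1:-/proc}
    for d in "$root"/[0-9]*; do
        # Processes can exit while we loop; skip any that disappeared.
        { read -r line < "$d/stat"; } 2>/dev/null || continue
        rest=${line##*') '}
        [ "${rest%% *}" = "D" ] || continue
        pid=${d##*/}
        # wchan has no trailing newline, so cat is simpler than read here.
        wchan=$(cat "$d/wchan" 2>/dev/null)
        # cmdline is NUL-separated; turn the NULs into spaces.
        cmd=$({ tr '\0' ' ' < "$d/cmdline"; } 2>/dev/null)
        printf '%s\t%s\t%s\n' "$pid" "${wchan:-?}" "$cmd"
    done
}

printf 'PID\tWCHAN\tCOMMAND\n'
list_dstate
```

Many tasks sharing the same wchan is a first hint that they are all queued on the same lock.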

A reboot helps, but it happens more and more often, and rebooting every day is not a permanent solution.

Where should I look for the source of the problem? What can be done?
 
When processes end up in the 'D' state, they are waiting for a syscall to return, usually I/O (strace might help here to determine which syscall the process is waiting on). What does dmesg tell you? How about avoiding running all backups at the same time?
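strace may show little here, because a task in uninterruptible sleep cannot stop for the tracer until the syscall it is blocked in returns. Two files that can be read without attaching are /proc/PID/syscall (the number of the syscall the task is blocked in) and /proc/PID/wchan. dmesg only reports such tasks if the kernel's hung-task detector is enabled and has warnings left. A sketch assuming a kernel that provides those /proc files and sysctls (reading another user's /proc/PID/syscall needs root); first_field and show_pid are hypothetical names:

```shell
#!/bin/sh
# Sketch: show which syscall a D-state task is blocked in, without
# attaching a tracer, and whether dmesg would report hung tasks at all.
# Usage (as root): sh <script> PID

# first_field: print the first whitespace-separated field of a file.
# For /proc/PID/syscall that is the syscall number, "-1" (blocked
# outside a syscall) or "running".
first_field() {
    { read -r f1 _rest < "$1"; } 2>/dev/null || return 1
    printf '%s\n' "$f1"
}

show_pid() {
    echo "pid $1 syscall: $(first_field "/proc/$1/syscall" || echo unreadable)"
    echo "pid $1 wchan: $(cat "/proc/$1/wchan" 2>/dev/null)"
}

[ -n "$1" ] && show_pid "$1"

# hung_task_timeout_secs=0 disables the hung-task detector; each report
# decrements hung_task_warnings until it reaches 0 (-1 means unlimited).
for k in hung_task_timeout_secs hung_task_warnings; do
    printf '%s=%s\n' "$k" "$(cat "/proc/sys/kernel/$k" 2>/dev/null || echo n/a)"
done
dmesg 2>/dev/null | grep 'blocked for more than' | tail -n 5
```

The syscall number is architecture specific; on x86_64, for instance, 83 is mkdir and 2 is open.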
 
Thanks, I will try strace the next time it hangs. dmesg didn't say anything about it.
I hope strace will show the point where it hangs; thanks for pointing me to it.

All backups run at night, more or less one after another, not at the same time. The problem persists (or even starts) when no backup is running anywhere.
Everything works fine until, at some point, the LA goes up every time any Proxmox process starts (e.g. a vzdump for another node enters the 'D' state and the LA rises further). Eventually pvecm reports that the cluster is OK, but the web interface shows the node as offline.
 
I don't think it's exactly the same issue: I don't have Ceph, only ZFS and ext4. All VMs and LXC containers but one are stopped. Everything works except the Proxmox utilities, which hang (in the 'D' state) when run.
 
Well, read the file /proc/[PID]/stack the next time a process gets stuck in the D state. It shows the kernel call stack, including the syscall the process is stuck in, and will hopefully give a hint about where to go from there. You might also want to try Magic SysRq key "d" to show all held locks. NFS is known to lock up from time to time on I/O errors and leave processes stuck in the D state.
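A sketch that applies the /proc/[PID]/stack suggestion to every thread in the D state at once, to be run as root. Note that SysRq "d" only prints held locks on kernels built with lock debugging (CONFIG_LOCKDEP); SysRq "w" dumps all blocked tasks to the kernel log. The function name, its optional root argument (only there for testing) and the script's "w" argument are assumptions for illustration:

```shell
#!/bin/sh
# Sketch (run as root): print the kernel stack of every thread in D state.
# $1 of dump_dstate_stacks: optional proc root (default /proc), only
# there so the function can be tried on a fake tree.
# Pass "w" to the script to also trigger SysRq "w" (dump blocked tasks
# to the kernel log; read it afterwards with dmesg).

dump_dstate_stacks() {
    root=${1:-/proc}
    for t in "$root"/[0-9]*/task/[0-9]*; do
        { read -r line < "$t/stat"; } 2>/dev/null || continue
        rest=${line##*') '}
        [ "${rest%% *}" = "D" ] || continue
        # comm is the text between the first "(" and the last ")".
        comm=${line#*'('}
        comm=${comm%')'*}
        echo "=== tid ${t##*/} ($comm)"
        cat "$t/stack" 2>/dev/null || echo "(stack not readable, run as root)"
    done
}

dump_dstate_stacks
if [ "$1" = "w" ]; then
    echo w > /proc/sysrq-trigger
fi
```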
 
This time I got the stack.
For vzdump, /usr/bin/perl -T /usr/bin/pvesr run --mail 1, /usr/bin/perl -T /usr/bin/pveproxy restart and /usr/bin/perl -T /usr/sbin/pct list,
it looks like this:
[<0>] call_rwsem_down_write_failed+0x17/0x30
[<0>] filename_create+0x7e/0x160
[<0>] SyS_mkdir+0x51/0x100
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

For /usr/bin/perl /usr/bin/pveupdate:
[<0>] call_rwsem_down_read_failed+0x18/0x30
[<0>] path_openat+0x897/0x14a0
[<0>] do_filp_open+0x99/0x110
[<0>] do_sys_open+0x135/0x280
[<0>] SyS_open+0x1e/0x20
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

pmxcfs is stuck.
But I don't know what to do about it: how to prevent it, or at least how to get it working again without a reboot.
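In both stacks above the tasks wait for an rw-semaphore: the mkdir ones for write access in filename_create, the open one for read access in path_openat. In 4.15 kernels these are most likely directory inode locks, so the stuck tasks are probably queued behind one directory lock whose holder never finishes. Counting identical stack signatures across all D-state threads shows whether they really share one path. A sketch; stack_signature, count_dstate_signatures and the choice of three frames are hypothetical:

```shell
#!/bin/sh
# Sketch: reduce a /proc/PID/stack dump to a one-line signature of its
# top frames and count how many D-state threads share each signature.
# Keeping 3 frames is an arbitrary choice for illustration.

# stack_signature: read a stack dump on stdin, drop the "[<...>] " prefix,
# the "+0xOFF/0xLEN" suffix and the 0xffffffffffffffff end marker, then
# join the top 3 frames with " < ".
stack_signature() {
    sed -e 's/^\[<[^>]*>] //' -e 's/+0x.*$//' |
        grep -v '^0xffffffffffffffff$' |
        head -n 3 |
        awk 'NR > 1 { printf " < " } { printf "%s", $0 } END { print "" }'
}

# count_dstate_signatures (run as root): one line per distinct signature,
# most common first.
count_dstate_signatures() {
    for t in /proc/[0-9]*/task/[0-9]*; do
        { read -r line < "$t/stat"; } 2>/dev/null || continue
        rest=${line##*') '}
        [ "${rest%% *}" = "D" ] || continue
        { stack_signature < "$t/stack"; } 2>/dev/null
    done | sort | uniq -c | sort -rn
}

count_dstate_signatures
```

For the mkdir stack quoted above, stack_signature prints "call_rwsem_down_write_failed < filename_create < SyS_mkdir", so all tasks on that path collapse into one counted line.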