Hi All!
I have a Proxmox cluster with six nodes, all of them running
pve-manager/5.3-5/97ae681d (running kernel: 4.15.18-9-pve).
On one of them (it runs the NFS server and is mostly used for backups) I sometimes see a very high load average: 13, 14, 20, and it keeps growing.
There is very little CPU usage (top says 99.6% idle), very little iowait (around zero), and very little network traffic.
When I run pct list, it hangs.
When I try to systemctl restart pvedaemon, pvestatd and so on, I get a timeout.
There are a lot of vzdump processes in 'D' state (they are meant to run on the other nodes, but here they wait forever for an answer from Proxmox), and a lot of other Proxmox processes in the same 'D' state.
root@backup-node:~# ps ax | grep ' D'
770 ? D 0:00 /usr/bin/perl -T /usr/bin/vzdump 601 602 --node p6 --compress gzip --mailnotification always --mode snapshot --mailto proxmox@my-domain.com --storage nfs-storage --quiet 1
1026 ? D 0:00 /usr/bin/perl -T /usr/sbin/pct list
1796 ? D 0:00 /usr/bin/perl -T /usr/bin/vzdump 301 302 --quiet 1 --storage nfs-storage --mailto proxmox@my-doimain.com --mailnotification always --node p3 --compress gzip --mode snapshot
2430 ? Ds 0:00 /usr/bin/perl /usr/sbin/pve-firewall stop
4063 pts/1 D+ 0:00 /usr/bin/perl -T /usr/sbin/pct list
25551 ? Ds 0:00 /usr/bin/perl -T /usr/bin/pvedaemon stop
28963 ? Ds 0:00 /usr/bin/perl -T /usr/bin/pvesr run --mail 1
30877 ? Ds 0:00 /usr/bin/perl -T /usr/bin/pvedaemon start
and so on.
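If more detail would help, I can also list the same processes together with the kernel function each one is blocked in. This is only a minimal sketch, assuming the procps version of ps that ships with Debian; the function names d_state_filter and list_d_state are my own and not part of Proxmox.

```shell
#!/bin/sh
# Show processes in uninterruptible sleep ('D' state) together with the
# kernel wait channel (wchan) they are blocked in.
# Matching on the STAT column avoids the false hits that grep ' D' gives
# when " D" happens to appear somewhere in the command line.

d_state_filter() {
    # Input: "PID STAT WCHAN ARGS..." lines, header first.
    # Keep the header and every row whose STAT starts with D.
    awk 'NR == 1 || $2 ~ /^D/'
}

list_d_state() {
    # pid, state, wait channel (up to 32 chars), full command line
    ps -eo pid,stat,wchan:32,args | d_state_filter
}
```

Calling list_d_state as root on the affected node should print one row per stuck process. For a single PID, cat /proc/PID/stack (root only, and only if the kernel exposes it) may show the full kernel stack.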
A reboot helps, but the problem happens more and more often, and rebooting the node every day is not a permanent solution.
Where should I look for the source of the problem? What can be done?