SOLVED - Stats/Graphs empty, node slow

Karsten

Member
Sep 6, 2016
Berlin
Hi.

We have a cluster of 5 nodes running Ceph. Two of the hosts also provide a shared Gluster volume.

One node is... strange.

1. If I inject args into the Ceph OSDs, this node is very slow.
2. The node is showing no graphs in the WebGUI.
3. The pvestatd has problems:
daemon.warning: Oct 23 18:47:24 predator pvestatd[3716]: got timeout
4. When checking the systemd state of pvestatd on this node, it is the only node showing a "pverados" process:
CGroup: /system.slice/pvestatd.service
├─36821 pvestatd
├─37937 /usr/sbin/glusterfs --process-name fuse --volfile-server=172.18.18.2 --volfile-id=glfsProxmox /mnt/pve/glfsProxmox
└─43313 pverados
5. When calling "pvesm status", the result takes minutes to appear.


"pvecm status" and "pvecm nodes" look good. I cannot find ANYTHING in the logs. I have no idea how to debug this issue.


Regarding the missing graphs, I already deleted the contents of /var/lib/rrdcached/db and restarted pvestatd afterwards on all nodes. I also rebooted the affected node multiple times.

All nodes are up to date. We hold a license, so we are running the licensed Proxmox versions. Before I open an official ticket, I just wanted to check whether you have an idea.

Any help is appreciated. I attached an strace of the "hanging" pvesm command.

Best
Karsten
 

Hi.

The cause was a missing jumbo frame setting on a link aggregation on the switch side. Because of the different MTU settings, communication did not work correctly and suffered from packet loss and retransmissions.
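
For anyone who wants to check for the same thing, here is a minimal sketch (not the exact steps taken here) of how a jumbo frame path could be tested from a node. It assumes the Linux iputils ping, where "-M do" forbids fragmentation, and IPv4 with a 20 byte header plus an 8 byte ICMP header. The MTU of 9000 used in the usage note is an assumption, since the real value is not stated above, and the function names are made up for illustration.

```python
import subprocess

IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits into one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo")
    return payload


def build_ping_command(host: str, mtu: int, count: int = 3) -> list[str]:
    """iputils ping with 'don't fragment' set and a payload sized for mtu."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(icmp_payload_for_mtu(mtu)), host]


def path_carries_mtu(host: str, mtu: int) -> bool:
    """True if full-size, unfragmented pings to host are answered."""
    result = subprocess.run(build_ping_command(host, mtu),
                            capture_output=True, text=True)
    return result.returncode == 0
```

Calling path_carries_mtu("172.18.18.2", 1500) and then the same with 9000 against the storage peer gives a quick comparison. If the small size passes and the jumbo size fails, something on the path (a switch port, bond, or LAG) is probably not passing jumbo frames.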

So keep an eye on this in case you run into the same kind of problem.
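
One way to keep an eye on it is to watch the TCP retransmit counters the Linux kernel exposes in /proc/net/snmp. The following is only a rough sketch with made-up helper names. The counters are cumulative since boot, so a single reading says little; the idea is to compare two readings taken some time apart, or to compare the suspicious node against a healthy one.

```python
def parse_tcp_counters(snmp_text: str) -> dict[str, int]:
    """Parse the two 'Tcp:' lines (names, then values) of /proc/net/snmp."""
    tcp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) != 2:
        raise ValueError("expected one header and one value line for Tcp")
    names, values = tcp_lines
    return dict(zip(names, (int(v) for v in values)))


def retransmit_ratio(counters: dict[str, int]) -> float:
    """Retransmitted segments as a fraction of all segments sent."""
    sent = counters["OutSegs"]
    return counters["RetransSegs"] / sent if sent else 0.0


def read_tcp_counters(path: str = "/proc/net/snmp") -> dict[str, int]:
    """Read and parse the kernel's TCP counters (Linux only)."""
    with open(path) as fh:
        return parse_tcp_counters(fh.read())
```

For example, retransmit_ratio(read_tcp_counters()) on each node gives a number that can be compared across the cluster. A node that sits far above the others is worth a closer look at its network path.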

Best
Karsten