Hi.
We have a five-node cluster running Ceph. Two of the hosts additionally provide a shared GlusterFS volume.
One node is... strange.
1. When I inject args into the Ceph OSDs, this node is very slow to respond.
2. The node shows no graphs in the web GUI.
3. pvestatd reports timeouts:
daemon.warning: Oct 23 18:47:24 predator pvestatd[3716]: got timeout
4. When I check the systemd status of pvestatd, this is the only node where a pverados process shows up in its cgroup:
CGroup: /system.slice/pvestatd.service
├─36821 pvestatd
├─37937 /usr/sbin/glusterfs --process-name fuse --volfile-server=172.18.18.2 --volfile-id=glfsProxmox /mnt/pve/glfsProxmox
└─43313 pverados
5. Running "pvesm status" takes minutes to return a result.
"pvecm status" and "pvecm nodes" look fine. I cannot find ANYTHING in the logs, and I have no idea how to debug this further.
Regarding the missing graphs, I have already deleted the contents of /var/lib/rrdcached/db and then restarted pvestatd on all nodes. I have also rebooted the affected node several times.
All nodes are up to date, and we are license holders, so we are running the licensed Proxmox versions. Before I open an official ticket, I just wanted to check whether anyone here has an idea.
Any help is appreciated. I have attached an strace of the hanging pvesm command.
Best
Karsten