One Proxmox showing all question marks on machines

Oct 8, 2019
Hi there,

We have a cluster of 4 Proxmox nodes running 5.4-13, and one of the nodes keeps showing question marks for all of the machines. When this happens, the graphs for the machines just show as loading.

It looks like this:
[Attachment: Capture.PNG]

This also happened when we were on version 4. Running sudo service pvestatd restart fixes the issue, but only very briefly; a couple of minutes later it shows up like this again. This keeps happening until the node is fully restarted. I can SSH onto the node as normal, and pvesm returns and shows the storages as active.

It hasn't happened to other nodes thus far. Could you offer any advice on how to find out what is causing this?

Thank you
 
I typically hit this due to a storage issue from nfs. Are you positive your storage isn't having issues?
 
Hey, the storage is configured in the same way for each node, but only this one has the issue. Is there a quick way to check if NFS is causing it?

pvesm shows the storages as active.
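
One quick check, offered as a minimal sketch rather than an official Proxmox procedure: probe each mount point with a time limit, since a hung NFS mount tends to block any command that touches it. The function names check_mount and check_all_mounts are hypothetical, and /mnt/pve is only assumed to be where the NFS storages are mounted; adjust the path if yours differ.

```shell
#!/bin/sh
# Probe one directory with a time limit (default 5 seconds).
# SIGKILL is sent on timeout so that a blocked stat is more likely to be terminated.
check_mount() {
    dir="$1"
    limit="${2:-5}"
    if timeout -s KILL "$limit" stat "$dir" >/dev/null 2>&1; then
        echo "OK $dir"
    else
        echo "SLOW_OR_FAILED $dir"
        return 1
    fi
}

# Probe every entry under the mount root (assumed: /mnt/pve).
# Returns non-zero if any entry was slow or failed.
check_all_mounts() {
    root="${1:-/mnt/pve}"
    status=0
    for dir in "$root"/*; do
        # Skip the unexpanded pattern when the root is empty.
        [ "$dir" = "$root/*" ] && continue
        check_mount "$dir" || status=1
    done
    return "$status"
}
```

Running check_all_mounts on the affected node while the question marks are showing should print one line per storage; any SLOW_OR_FAILED line points at a mount worth a closer look.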
 
It's also worth noting that, for the brief time I can get it showing after restarting pvestatd, the containers still show up as question marks and pct list just hangs. qm list works fine and so does qm migrate. For now I'm migrating them off and then rebooting.
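
Since pct list hangs while qm list works, it may help to check whether any processes are stuck in uninterruptible sleep (state D), which commonly indicates blocked I/O. A small sketch, where list_blocked is a hypothetical helper that filters ps output:

```shell
#!/bin/sh
# Keep the header line plus every row whose STAT column starts with "D".
# Expects input in the layout produced by: ps -eo pid,stat,args
list_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example (run on the affected node while pct list is hanging):
#   ps -eo pid,stat,args | list_blocked
```

If container-related processes show up in that list, it would be consistent with the hang being tied to the containers rather than to the VMs.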
 
If you click on the storage from the GUI, can you actually see what's inside?
I'm not too sure what you mean. If I go to Storage in the pane it lists my storages and if I double click it just brings up their settings.

If I log onto the node via SSH and go to the MNT folder, I can browse into the storages without any problems at all.
 
From the GUI you should be able to highlight your storage and, on the right side, see tabs called Summary and Content. Within Content, can you see your disks/backups?
 
It seems you have some type of storage issue on that specific node.
I think it is something to do with the containers we are running, since I can't do anything with them even from the CLI. I've rebooted the node and migrated the containers to see if it makes the issue happen on another node. Thank you for your assistance.
 
Please check the syslog on the node with the issue, are there any errors around the time this happens?
 
Hello. While the issue was ongoing, there seem to be many entries in syslog saying pve-firewall[2473]: status update error: command '/sbin/iptables-save' failed: exit code 1
They seem to have occurred every 10 seconds.

Towards the start of the issue there is also systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected

That one is in a block of 17 that all happened on the same second.
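
To see when the errors started and how often they repeat, the matching syslog lines can be grouped by minute. This is a sketch under assumptions: errors_per_minute is a hypothetical helper, the log is assumed to be at /var/log/syslog, and lines are assumed to start with the traditional "Mon DD HH:MM:SS" timestamp.

```shell
#!/bin/sh
# Count pve-firewall "status update error" lines per minute.
# Prints "<count> <month> <day> <HH:MM>" in first-seen order.
errors_per_minute() {
    logfile="${1:-/var/log/syslog}"   # assumed log location
    awk '/pve-firewall.*status update error/ {
        key = $1 " " $2 " " substr($3, 1, 5)   # month, day, HH:MM
        if (!(key in count)) order[++n] = key  # remember first-seen order
        count[key]++
    }
    END {
        for (i = 1; i <= n; i++) print count[order[i]], order[i]
    }' "$logfile"
}
```

With an error roughly every 10 seconds, each affected minute should show a count of about 6; the first minute listed gives a rough start time to compare against the systemd messages.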
 
Maybe a firewall misconfiguration? Try to run pve-firewall compile on this host and check for issues.
 
Well, this excludes the firewall as a possible issue since it is not enabled. What is the output of pvecm status when this happens?
 
Hi there, I was waiting for the issue to happen again.

I have moved the containers we had to another node, and now it is that node having the problem. So it seems to be linked to the containers. The firewall is also disabled on this node.

Here is the output of pvecm status:
[Attachment: proxmox.PNG]
 
