All nodes in cluster have grey question marks except one

Adithya

New Member
Feb 18, 2020
All nodes in the cluster show grey question marks except the node on which the cluster was set up. I can access the shell on all nodes, but their status is 'Unknown'. I went through existing threads and tried the following:
  1. Restarted the nodes: this didn't fix the issue.
  2. Tried to restart the Proxmox services: the shell just hung and none of the following commands completed:
    • systemctl restart pvedaemon
    • systemctl restart pveproxy
    • systemctl restart pvestatd
If I try to run other commands such as 'qm help' or 'qm list', the shell hangs as well. Here is some information about the cluster:

Corosync is active and running.
Node List: Of the 13 nodes in this cluster, only 7 are actively used by our team. The other nodes are not in use and are powered off. Of these 7 nodes, only one is online and the other 6 have an unknown status. This cluster uses CIFS storage ('online'). This CIFS storage is also not accessible on the other 6 nodes through the web interface.
1599575706246.png

pveversion -v
1599575722313.png

pvecm status
1599575732382.png

pvecm updatecerts: The shell hung after printing these 2 lines
1599575756298.png

pve-cluster was active and running before I restarted pve-cluster.service
systemctl status pve-cluster.service
1599575834391.png

But after I ran the command 'service pve-cluster restart', it failed with the following message:
1599575917537.png

After this when I checked the status of the cluster again, I got this:
1599575970483.png

In this state no commands could be executed. After a restart, the cluster status was back to active and running. However, the other 6 nodes still show an unknown status, so I'm back to the same problem.

Kindly help me fix this problem.
 
Hi,

The status of your cluster is not 100% clear to me.
My guess is that the CIFS storage is blocking pvestatd.

Try disabling the CIFS storage and then restarting pvestatd.service.

Your Proxmox VE installation is quite outdated, and we can only support versions that are in the active lifecycle [1].
So please consider updating your cluster soon.

[1] https://forum.proxmox.com/threads/proxmox-ve-support-lifecycle.35755/
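
For reference, the suggestion above (disable the CIFS storage, then restart pvestatd) can also be carried out from a shell when the web UI is not reachable. This is only a sketch: `pvesm set <storage> --disable 1` and `systemctl restart pvestatd` are standard commands, but the helper names `run` and `disable_storage_and_restart`, the `DRY_RUN` switch, and the storage ID are assumptions made for illustration. Keep in mind that `pvesm set` writes to the cluster configuration, so it may fail or hang as well if /etc/pve is not writable on that node.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not part of Proxmox VE.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Disable one storage by ID, then restart the status daemon.
disable_storage_and_restart() {
    storage_id=$1
    run pvesm set "$storage_id" --disable 1 &&
    run systemctl restart pvestatd
}

# Example (STORAGE_ID is a placeholder for the ID used in storage.cfg):
#   DRY_RUN=1 disable_storage_and_restart STORAGE_ID
```

Running it with DRY_RUN=1 first shows exactly which commands would be executed before anything is changed.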
 
In the above proxmox setup, CIFS storage is mapped to the local storage of the node "ss-storage" by adding the following under /etc/samba/smb.conf of ss-storage:

[shared]
comment = ISO sharing
path = /var/lib/vz
browseable = yes
guest ok = yes
create mask = 0755
writable = yes


I tried disabling CIFS storage from the web UI under Datacenter settings. I got a connection error:
1599640638840.png

I am worried as the ISOs of all the hosted VMs reside in the shared storage and the cluster is hosted on ss-storage.

Please let me know how to proceed. Should I upgrade proxmox at this stage?
 
Please let me know how to proceed. Should I upgrade proxmox at this stage?
No, first fix your cluster.


I am worried as the ISOs of all the hosted VMs reside in the shared storage and the cluster is hosted on ss-storage.
If you disable the storage, the share stays mounted and all running VMs are fine.

Can you send the output of the following command?

Code:
ps faxl
 
@wolfgang
I copied the output of ps faxl to a text file and attached it.
 

Attachments

  • ps faxl output.txt
Sorry for the late answer.

As the process list shows, you have to restart these services:
* pvedaemon
* pveproxy

Code:
systemctl restart pvedaemon pveproxy
 
Hi,
Thanks for your reply.
As mentioned before, when I try to restart any of the following, the shell is just stuck and the commands aren't getting executed.
  • systemctl restart pvedaemon
  • systemctl restart pveproxy
  • systemctl restart pvestatd
I tried restarting pveproxy again. There is nothing displayed and the shell just hangs here:
1600061223426.png
I have no clue as to why this is happening.
 
The ps output that you sent looks incomplete and cut off, so it is hard to say what the problem is here.

Can you check whether the pmxcfs process is running?
 
Use this command to see if pmxcfs runs.

Code:
ps aux | grep pmxcfs
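
One caveat with the command above: `grep pmxcfs` can match its own entry in the process list. Here is a small sketch that avoids this by comparing only the executable name in the COMMAND column of `ps aux`. The function name `proc_in_ps` is made up for this example.

```shell
#!/bin/sh
# Sketch only: proc_in_ps is a hypothetical helper name.
# Reads `ps aux` output on stdin and exits 0 if any process has an
# executable whose basename equals the given name, 1 otherwise.
proc_in_ps() {
    awk -v name="$1" '
        NR > 1 {
            n = split($11, parts, "/")   # field 11 is the start of COMMAND
            if (parts[n] == name) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Example:
#   ps aux | proc_in_ps pmxcfs && echo "pmxcfs is running"
```

Because only the executable name is compared, a line such as "grep pmxcfs" is not counted as a match.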
 
I'm sorry, but this needs a deep analysis of this cluster and this goes too far for forum support.

The only hint I have is that something is blocking on disk I/O or network I/O.
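
Processes that are blocked on disk or network I/O usually show up in state D (uninterruptible sleep) in the process list. Here is a minimal sketch for listing them, assuming the column layout of `ps -eo pid,stat,wchan:32,args`. `blocked_procs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: blocked_procs is a hypothetical helper name.
# Reads `ps` output on stdin (PID in column 1, STAT in column 2) and keeps
# the header line plus every process whose state starts with "D".
blocked_procs() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example:
#   ps -eo pid,stat,wchan:32,args | blocked_procs
```

The WCHAN column of the matching lines hints at what each process is waiting on.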
 
Hey man, I had the same problem as you. This is how I fixed it. I have a cluster with 39 nodes that was never stable: nodes kept showing question marks or rebooting.

1. Shut down all of the nodes.
2. Start 3 nodes first, then after a few minutes start the rest one by one.
 
Restarting the pvestatd daemon and disabling remote storages that were offline have helped me a few times with this. The issue often arises when one node believes that NFS is active but then "panics" when an administrator queries the remote storage that is in fact unreachable.

Thanks!

Tmanok
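
To find out which remote storage is unreachable before disabling it, one option is to probe the mount point with a time limit so the shell does not hang indefinitely. This is a sketch under assumptions: `mount_responds` is a hypothetical helper, and the path in the example assumes the usual /mnt/pve/<storage ID> mount location. On a hard-hung mount the probe itself may not be interruptible, so treat the result as a hint only.

```shell
#!/bin/sh
# Sketch only: mount_responds is a hypothetical helper name.
# Returns 0 if `stat` on the path succeeds within the time limit,
# non-zero if it fails or the limit is reached.
mount_responds() {
    path=$1
    limit=${2:-5}          # seconds to wait before giving up
    timeout "$limit" stat "$path" >/dev/null 2>&1
}

# Example (STORAGE_ID is a placeholder):
#   mount_responds /mnt/pve/STORAGE_ID 5 || echo "storage not responding"
```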
 
