All nodes in cluster have grey question marks except one

Adithya · Sep 8, 2020

All nodes in the cluster have grey question marks except the node where the cluster was set up. I am able to access the shell of all the nodes, but their status is 'Unknown'. I referred to existing threads and tried the following:

Restarted the nodes: This didn't fix the issue.
Tried to restart the Proxmox services: The shell just hung and was not able to run the following commands:
- systemctl restart pvedaemon
- systemctl restart pveproxy
- systemctl restart pvestatd

If I try to run other commands like 'qm help' or 'qm list', the shell hangs as well and they never complete. Here is some information regarding the cluster:

Corosync is active and running.
Node List: Out of the 13 nodes present in this cluster, only 7 nodes are actively used by our team. The other nodes are not being used and are powered off. Out of these 7 nodes, only one node is online and the other 6 have unknown status. This cluster is using CIFS storage ('online'). This CIFS storage is also not accessible on the other 6 nodes through the web interface.

pveversion -v

pvecm status

pvecm updatecerts: The shell hung after printing these 2 lines

pve-cluster was active and running before I restarted pve-cluster.service:
systemctl status pve-cluster.service

But after I ran the command service pve-cluster restart, it failed with the following message:

After this when I checked the status of the cluster again, I got this:

In this state, no commands could be executed. After a restart, the cluster status was back to active and running. However, the status of the other 6 nodes is still unknown, and I'm back to the same problem.

Kindly help me fix this problem.

wolfgang · Sep 9, 2020

Hi,

the status of your cluster is not 100% clear to me.
I guess the problem is that the CIFS was blocking the pvestatd.

Try to disable the CIFS storage and restart the pvestatd.service.

Your Proxmox VE installation is quite outdated, and we can only support versions that are in the active lifecycle [1].
So please consider updating your cluster soon.

1.) https://forum.proxmox.com/threads/proxmox-ve-support-lifecycle.35755/
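
For readers who cannot reach the web UI, the suggestion above (disable the CIFS storage, then restart pvestatd) could be sketched from the shell roughly as follows. This is only a sketch under assumptions: the storage ID `my-cifs` is a hypothetical placeholder, the `run` helper and the `DRY_RUN` variable are illustrative names and not part of Proxmox VE, and `pvesm` reads its configuration from /etc/pve, so it may hang as well if the cluster filesystem is blocked.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default in this sketch) commands are only printed,
# so the steps can be reviewed before anything is changed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 is the storage ID as listed in /etc/pve/storage.cfg.
disable_storage_and_restart_statd() {
    # Mark the storage as disabled so the status daemon stops polling it.
    run pvesm set "$1" --disable 1
    # Restart the status daemon so it picks up the change.
    run systemctl restart pvestatd.service
}

# "my-cifs" is a hypothetical storage ID used for illustration only.
disable_storage_and_restart_statd "my-cifs"
```

Setting `DRY_RUN=0` before the call would execute the two commands instead of printing them.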

Adithya · Sep 9, 2020

In the above proxmox setup, CIFS storage is mapped to the local storage of the node "ss-storage" by adding the following under /etc/samba/smb.conf of ss-storage:

[shared]
comment = ISO sharing
path = /var/lib/vz
browseable = yes
guest ok = yes
create mask = 0755
writable = yes

I tried disabling CIFS storage from the web UI under Datacenter settings. I got a connection error:

I am worried as the ISOs of all the hosted VMs reside in the shared storage and the cluster is hosted on ss-storage.

Please let me know how to proceed. Should I upgrade proxmox at this stage?

wolfgang · Sep 9, 2020

Adithya said:
Please let me know how to proceed. Should I upgrade proxmox at this stage?

No, first fix your cluster.

Adithya said:
I am worried as the ISOs of all the hosted VMs reside in the shared storage and the cluster is hosted on ss-storage.

If you disable the storage, the share stays mounted and all running VMs will be fine.

Can you send the output of the following command?

Code:

ps faxl

Adithya · Sep 9, 2020

@wolfgang
I copied the output of ps faxl to a text file and attached it.

Adithya · Sep 14, 2020

@wolfgang
Kindly let me know if there is a fix for this.

wolfgang · Sep 14, 2020

Sorry for the late answer.

As the process list shows, you have to restart these services:
* pvedaemon
* pveproxy

Code:

systemctl restart pvedaemon pveproxy

Adithya · Sep 14, 2020

Hi,
Thanks for your reply.
As mentioned before, when I try to restart any of the following, the shell is just stuck and the commands aren't getting executed.

systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pvestatd

I tried restarting pveproxy again. There is nothing displayed and the shell just hangs here:

I have no clue as to why this is happening.

wolfgang · Sep 14, 2020

The ps output that you sent looks incomplete and cut off, so it is hard to say what the problem is.

Can you check whether the pmxcfs process is running?

Adithya · Sep 14, 2020

The command pmxcfs returned this:

wolfgang · Sep 14, 2020

Use this command to see if pmxcfs runs.

Code:

ps aux | grep pmxcfs

Adithya · Sep 14, 2020

I got this as the output of ps aux | grep pmxcfs:

Adithya · Sep 15, 2020

@wolfgang
Kindly let me know if there is a fix for this. Do you think I can recover the cluster from this state?

wolfgang · Sep 17, 2020

I'm sorry, but this needs a deep analysis of the cluster, and that goes too far for forum support.

The only hint I have is that something is blocking on disk I/O or network I/O.
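
One way to follow up on that hint is to look for processes stuck in uninterruptible sleep (state `D` in `ps`), which usually means they are waiting on disk or network-filesystem I/O. The sketch below is not from this thread; the function names `filter_blocked` and `list_blocked` are illustrative, and it assumes the procps version of `ps` found on Debian-based systems.

```shell
#!/bin/sh
# Keep the header line plus every process whose STAT column starts with "D"
# (uninterruptible sleep). Reads `ps` output on stdin.
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Show PID, state, the kernel function the process is waiting in, and the command.
list_blocked() {
    ps -eo pid,stat,wchan:32,args | filter_blocked
}

list_blocked
```

If daemons such as pvestatd or pvedaemon show up in this list, the WCHAN column may indicate whether they are waiting on a CIFS or NFS call.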

lazypaul · Sep 17, 2020

Hey man, I had the same problem as you. This is how I fixed it. I have a cluster with 39 nodes that was never stable: nodes kept showing question marks or rebooting.

1. Shut down all of the nodes.
2. Start 3 nodes first, then after a few minutes start the rest one by one.

Tmanok · Apr 27, 2022

wolfgang said:
Hi,

the status of your cluster is not 100% clear to me.
I guess the problem is that the CIFS was blocking the pvestatd.

Try to disable the CIFS storage and restart the pvestatd.service.

Your Proxmox VE installation is quite outdated, and we can only support versions that are in the active lifecycle [1].
So please consider updating your cluster soon.

1.) https://forum.proxmox.com/threads/proxmox-ve-support-lifecycle.35755/

Restarting the pvestatd daemon and disabling remote storages that were offline have helped me a few times with this. The issue often arises when one node believes that NFS is active but then "panics" when an administrator queries the remote storage that is in fact unreachable.

Thanks!

Tmanok
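
To spot an unreachable remote storage before disabling it, one option is to probe its mount point with a time limit, since a plain `ls` or `stat` on a dead CIFS/NFS mount can block indefinitely. The following is a minimal sketch, assuming GNU coreutils `timeout` is available; `check_mount` is an illustrative name and `my-cifs` is a hypothetical storage ID. Proxmox VE mounts CIFS and NFS storages under /mnt/pve/<storage-id>. A process in a non-killable wait may not react to the signal, so the probe itself can still hang in the worst case.

```shell
#!/bin/sh
# Report whether a path answers a stat call within a time limit (default 5 s).
check_mount() {
    path="$1"
    limit="${2:-5}"
    if timeout "$limit" stat "$path" >/dev/null 2>&1; then
        echo "ok: $path"
    else
        echo "unresponsive or missing: $path"
    fi
}

# "my-cifs" is a hypothetical storage ID used for illustration only.
check_mount "/mnt/pve/my-cifs"
```

A storage reported as unresponsive here is a candidate for disabling before restarting pvestatd.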
