One server is out because of the aforementioned error, and another will be out for the duration of the reboot. I will temporarily add 2 votes to one node to make sure quorum is maintained during this time.
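Concretely, what I have in mind (a sketch; node name and address are placeholders) is raising that node's vote count in /etc/pve/corosync.conf, together with an increment of config_version so the change propagates:

node {
    name: pve-node1
    nodeid: 1
    quorum_votes: 3
    ring0_addr: 10.0.0.1
}

An alternative would be to temporarily lower the expected vote count with pvecm expected while the two servers are out, and revert it once they are back.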
And related to that: the phantom process that we see in the GUI, which belonged to the server that is in the error state.
What can be done about it? It is not running on any physical server; can it be removed somehow?
Luckily we have the resources to move the running VMs to other servers in the cluster. I think it still needs some quorum trickery to avoid losing quorum, as it is a 4-node cluster.
The hosting server threw a CPU error according to the iLO IML logs, it cannot be powered on from iLO in this state, and nobody is on site until Sunday to try a hard reset.
Same issue here. At backup time one of the servers died. The thing is, this server was the one that shared its storage via NFS, and the backups were taken on that NFS share (mounted with the "hard" option).
Now we have 2 backup processes in the GUI log; one was running on the server that died, the...
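For context, the share is defined roughly like this in /etc/pve/storage.cfg (storage name, export, path and address are placeholders), with the hard mount option, which is why processes block instead of failing when the NFS server disappears:

nfs: backup-nfs
    export /export/backups
    path /mnt/pve/backup-nfs
    server 192.168.1.50
    content backup
    options vers=3,hard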
Hello,
We are running Proxmox 5.3 on this cluster (did not have time to upgrade it yet) and last weekend we had a planned downtime (switches were changed). After starting up, nothing was working: the web UI came up, login failed with "No Proxmox VE services running", no VMs were started, cluster...
Hello.
We have a 5-member cluster that originally was 3 servers running PVE 6.1. We then added 2 more servers that were installed with the latest version, 7.3 at that time. The idea is to upgrade gradually, and the new servers were meant to test compatibility between the versions.
Unfortunately there were...
Hello,
I am not sure which field is used for user matching in the AD connector; I suppose it is sAMAccountName. Is there an option to change it to the UPN somewhere?
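If the ad realm type has no such knob, I suppose the realm could also be defined as a plain ldap realm, where the matching attribute is configurable via user_attr; something like this in /etc/pve/domains.cfg (server and base DN are made up, just to illustrate what I am after):

ldap: company-ad
    server1 dc01.company.example
    base_dn DC=company,DC=example
    user_attr userPrincipalName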
Thanks.
Hmm. I did that, and now it seems to be working. I had restarted the pve-ha-crm and pve-ha-lrm services before, but I had reservations about corosync.
So basically all corosync instances have to be restarted in these cases, not just the non-working ones.
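For reference, a rough sketch of the restarts in question, run on every node rather than only the broken one (whether pve-cluster also needs a restart I am not sure):

systemctl restart corosync
systemctl restart pve-ha-crm pve-ha-lrm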
I suppose the pve-ha-crm and pve-ha-lrm...
This is the syslog from a working node. It has the "[TOTEM ] Failed to receive the leave message. failed: 4" message in it; I don't know if it's relevant.
tail -f /var/log/syslog | grep -i corosync
Feb 4 15:53:59 ndi-srv-021 corosync[13960]: [QUORUM] Members[3]: 1 2 3
Feb 4 15:53:59 ndi-srv-021...
systemctl status pve-cluster
systemctl status corosync.service
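Quorum and membership on that node can be checked with:

pvecm status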
This is the non-working node's pveversion -v:
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.3: 6.1-6...
Yes, but for now not many VMs are marked as HA. This specific node has nothing on it, since it was stopped and everything on it was moved off beforehand.