I have a total of four Proxmox 4.4 servers running in a production environment.
They are of differing ages; the fourth, vhost4, was added recently, and when it was, the other three were rebuilt with Proxmox 4.4 in a way that minimized service interruptions.
vhost4 was installed with clustering in mind and, before any VMs were added to it, was made the first node of a new cluster. The cluster was set up to use a separate network, per https://pve.proxmox.com/wiki/Separate_Cluster_Network, and no problems arose at that stage.
I am now ready to add the other nodes to the cluster, and in preparation I moved all VMs from vhost2 to another vhost for temporary hosting. On vhost2 I then ran 'pvecm add vhost4 -ring0_addr network2IP'. The command stalled at the "waiting for quorum..." stage and would not continue. Having started at the end of the day, I left it overnight and found it in the same state the next morning, so I hit CTRL-C to stop it and started looking into possible fixes.
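For reference, the exact invocation looked like this (the address is only a placeholder for vhost2's IP on the separate cluster network, not the real value):

# run on vhost2, the node being joined to the cluster
pvecm add vhost4 -ring0_addr 10.10.10.2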
After removing vhost2 from the cluster, trying a fix, and adding it back in a few times, I finally found a partial fix.
First, I enabled the multicast querier on the bridge interface on both nodes by adding post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier ) to the bridge stanza in /etc/network/interfaces.
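The bridge stanza on each node now looks roughly like this (bridge name, port, and addresses here are placeholders rather than the real values):

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.2
        netmask 255.255.255.0
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )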
I also made certain that the SSH public key of the opposite node was present in /etc/pve/priv/authorized_keys on each node before running the pvecm add command. I'm not certain this was necessary, but it seemed to help at the time.
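In case it matters, the keys were put in place along these lines (hostname and key path are illustrative, not necessarily what your setup uses):

# on vhost2, pull root's public key from vhost4 and append it to the cluster key file
ssh root@vhost4 cat /root/.ssh/id_rsa.pub >> /etc/pve/priv/authorized_keys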
I now have both nodes showing up in the pvecm status output; so far so good.
Now I am at another impasse for which my Google searches turn up no direct references.
When accessing the web interface on vhost2 I am asked to log in and can do so with root and the Linux PAM authentication realm (yes, I know that's not good practice). I then see both nodes listed with the green checkmark.
Within a few moments I get the message "Connection error 401: 403 Permission check failed (permission denied - invalid PVE ticket)" and am asked to log in again, in a never-ending loop. This does not occur on vhost4.
Another possibly related issue shows up when accessing the web interface on vhost4: any screen involving vhost2 gives the message "ssl3_get_server_certificate: certificate verify failed (596)". Note that the SSL certificates are self-created and self-signed per machine, so this may be part of the issue; however, these machines are not Internet-facing, so no SSL certificate will ever be purchased for them.
I have tried restarting vhost2, the newly added cluster node, to no effect. I have also tried running pvecm updatecerts, again to no effect.
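Concretely, the only extra command attempted so far was this, run on vhost2 after it rejoined:

pvecm updatecerts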
What I cannot do at the moment is modify vhost4, as it is in production with a number of VMs running on it. This means I cannot restart the host, any VM, or the network, or run an update. I know this narrows my options, but there is no way around it.
Any ideas would be helpful.