Hi,
As some of you have already noticed in other threads, I have serious problems with my cluster, specifically with adding a node and a QDevice for quorum.
Without writing too much, I'll describe my issues.
I migrated from VMware to Proxmox, and that part was successful. I have two separate nodes which I wanted to join into one cluster, as I did in VMware with vCenter. It's not my first time creating a Proxmox cluster (I've used Proxmox since about 2018), so I expected it to be quick and straightforward.
My setup:
I have two nodes. One is in production now and hosts many VMs. I created the cluster on it without any issues. The second one was non-production, but now I would like to add it to the cluster and make it a full member. I also have network storage which I would like to use as a QDevice for quorum.
Components:
- 2x PVE 8.1.3
- 1x OMV as a SAN for VMs' storage, planned as qdevice quorum.
Network configuration:
- Each one of the planned cluster members has 2 separate network cards: one for VMs' LAN and a second one for SAN (AKA storage network) which was planned for corosync communication and VM storage migration (in VMware it is a storage vmotion).
My problems started with adding a node. I couldn't add it because of hostname issues. I was following the section "5.4.3. Adding Nodes with Separated Cluster Network" from the Proxmox documentation. The hosts files were correct on all nodes, I could ping the hosts by hostname, and I could connect between them via SSH (passwordless, using SSH keys). I always got the message below:
500 Can't connect to ip.ip.ip.ip:8006 (hostname verification failed)
which told me literally nothing, given that both name resolution and SSH by hostname worked without any issues.
I tried to modify my hosts file in many different ways without any result. After hours of trying, I finally figured out the cause of this behavior.
The solution was to use the FQDN in the command (for both the cluster and the node) instead of the short hostname or IP when adding the node to the cluster. I don't know why it worked like that, don't ask me, but it finally worked for me.
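For anyone hitting the same error, this is roughly the shape of the command that worked for me, wrapped in a small dry-run helper. It is only a sketch: `join_cmd` is a made-up helper, and `pve1.example.lan` and `10.10.10.2` are placeholders, not my real values. It only prints the `pvecm add` line and refuses a short hostname or a bare IPv4 address, since those were the forms that failed for me.

```shell
#!/bin/sh
# Dry-run helper: prints the join command instead of running it.
# join_cmd is hypothetical; the FQDN and link0 address are placeholders.
join_cmd() {
    peer_fqdn="$1"
    link0_ip="$2"
    # Require at least one dot, so a short hostname is rejected.
    case "$peer_fqdn" in
        *.*) ;;
        *) echo "refusing short hostname: $peer_fqdn" >&2; return 1 ;;
    esac
    # Require at least one character that is not a digit or a dot,
    # so a bare IPv4 address is rejected as well.
    case "$peer_fqdn" in
        *[!0-9.]*) ;;
        *) echo "refusing IP address: $peer_fqdn" >&2; return 1 ;;
    esac
    printf 'pvecm add %s --link0 %s\n' "$peer_fqdn" "$link0_ip"
}

# Print the command for review; run it by hand on the joining node.
join_cmd pve1.example.lan 10.10.10.2
```

My unverified guess is that the name in the command is compared against the names in the API certificate on port 8006, which would explain why only the FQDN passed.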
As I wrote above, I have been doing this for years and never had such issues.
OK, let's move on. Once the command finally worked, the next problem appeared.
I am stuck on the "waiting for quorum..." message, and all files in the "/etc/pve" folder on the joining node were deleted, which completely ruined my PVE2 configuration, including the SSH keys and SSL certs. And here comes my question: why the hell don't you create a backup of the "/etc/pve" folder during the node-adding process? I know that you probably expect admins to do it, but I think it should be part of the process in case of failure.
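This is the kind of backup I should have taken myself before the join. A minimal sketch, assuming plain `tar` is enough to read the files: `backup_dir` is a made-up helper and `/root/pre-join-backup` is just an example destination.

```shell
#!/bin/sh
# Archive a directory into a timestamped tarball before a risky change.
# backup_dir is a hypothetical helper; the destination path is an assumption.
backup_dir() {
    src="$1"
    dest_dir="$2"
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/$(basename "$src")-$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C makes the paths inside the archive relative to the parent of src.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Print the archive path so the caller can record it.
    printf '%s\n' "$archive"
}

# Intended use on the joining node, before running the join:
#   backup_dir /etc/pve /root/pre-join-backup
#   backup_dir /etc/corosync /root/pre-join-backup
```

The archive keeps the top-level folder name, so it can be unpacked somewhere else and the files copied back one by one.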
I also have one smaller problem. While trying to make everything work, I changed the expected vote count to 1. Now, when I try to change it back to 2 votes, I get this message:
Code:
pvecm expected 2
Unable to set expected votes: CS_ERR_INVALID_PARAM
Code:
Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate
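To make the numbers above easier to reason about, here is the majority arithmetic as I understand it. A sketch only: `quorum_for` and `expected_ok` are made-up helper names, and the check in `expected_ok` is my assumption about why the command is rejected (the new quorum may not exceed the votes currently present), not something I have confirmed.

```shell
#!/bin/sh
# quorum_for: majority needed for a given number of expected votes.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# expected_ok: mimics the range check I believe votequorum applies.
# Assumption: the new quorum must not exceed the votes currently present.
expected_ok() {
    new_expected="$1"
    total_votes="$2"
    q=$(quorum_for "$new_expected")
    [ "$q" -le "$total_votes" ]
}

quorum_for 2   # two nodes, one vote each
quorum_for 3   # two nodes plus a QDevice vote
if expected_ok 2 1; then
    echo "expected=2 accepted"
else
    echo "expected=2 rejected"
fi
```

With only one vote present, as in my output above, a quorum of 2 can never be reached, which would fit the CS_ERR_INVALID_PARAM I get.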
And the last one -- Qdevice setup.
I prepared everything according to the documentation and ran the commands, one on the PVE node which acts as the cluster "master" and the other on OMV, which should provide the QDevice vote. Everything appeared to work (the device was added successfully), but my QDevice has no vote:
0x00000000 0 Qdevice (votes 0)
What's wrong with that?
Finally, I would like to achieve one of the scenarios below:
1. Creating the working production cluster with a separate network for corosync and storage migration*.
2. Just returning to two single nodes by rolling back all of my changes.
* Here comes the question: is this possible at all? In VMware I had to pay for the "storage vmotion" functionality, but in the end it worked. On PVE 6 I tried to move my VMs between storages, but I'm not sure whether the data was copied via the storage/corosync network. But OK, it's not critical and I won't die without it.
My last and most important observation: I think your documentation is lacking in technical information. There are no steps for disaster recovery or for handling failures. There are no pointers to the logs where issues can be found. There are no instructions on how to roll back changes, or on which folders should be backed up so that the configuration can be restored. This matters especially when you have made a lot of radical changes in your engine.
Please take the documentation of VMware, Oracle, or Red Hat as a model here.