I've been poking around the various threads on the forum and have tried some of the suggestions, but nothing fits my situation exactly.
I inherited support of a Proxmox 4.3 cluster (4 nodes), plus a fifth node that my predecessor tried and failed to add. He suspected the join failed because that node was installed with 4.4. Since it generated lots of errors, and he was retiring, he just shut it off.
One of the four active nodes ran into a problem with its ZFS volume -- pretty much any zpool command caused zpool to deadlock. I eventually migrated everything off of it to the other three nodes, reinstalled it, and then upgraded to 4.4 using apt-get dist-upgrade.
Then I decided to tackle the fifth node. I followed the steps to remove the node from the cluster, powered it off, reinstalled it from a 4.4 ISO, and tried to join it using pvecm add <IPaddress>. The join looks like it works, then fails, telling me to run systemctl status pve-cluster.service and journalctl -xn to see why. From those, I see "Transport endpoint is not connected" and, yeah, /etc/pve just isn't there.
I rebooted the node. It briefly appeared in the GUI and in the output of various pvecm commands run from the other nodes, and then turned red. When I check corosync.service, I see corosync [TOTEM ] FAILED TO RECEIVE.
Then I left it alone for several days. I got a chance to get back to it today, and noticed that proxa5 isn't listed when I run pvecm nodes from the cluster. So I tried pvecm add <IPaddress> again, and it complained "authentication key already exists". I also tried pvecm add <IPaddress> -force -- which got me back to "Transport endpoint is not connected".
I'm trying to decide what my next steps should be -- I would appreciate any and all advice. Thanks!!