Proxmox VE 8 Cluster - New nodes unhealthy

chedda7

Member
I've been running Proxmox stably on a single machine for nearly a year. I decided to add two more machines to create a cluster and eventually set up HA.

Steps taken on original machine (metal-01):
  1. Datacenter -> Cluster -> Create Cluster
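
As a rough CLI equivalent of that step (run on metal-01; "homelab" is just a placeholder cluster name):

  # Create the cluster from the shell instead of the GUI
  pvecm create homelab
  # Confirm quorum and membership afterwards
  pvecm status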

Steps taken on metal-02 and metal-03:
  1. Installed PVE 8.0.3 with RAID-Z1 on 2x Micron 5400 960GB SSDs
  2. Minor NIC configurations for static IP and SAN networking
  3. Edited /etc/apt/sources.list to add the pve-no-subscription (free) repo
  4. Commented out the enterprise repos from /etc/apt/sources.list.d/*
  5. Ran apt update + upgrade
  6. Datacenter -> Cluster -> Join Cluster
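
For anyone retracing these steps, this is roughly what the repo edits and the join look like from the shell on the new nodes (PVE 8 is based on Debian bookworm; the IP below is a placeholder for metal-01):

  # /etc/apt/sources.list - the no-subscription (free) repo line
  deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

  # Comment out the enterprise repo
  sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list

  # Update and upgrade (full-upgrade is what the Proxmox docs recommend)
  apt update && apt full-upgrade

  # Join the existing cluster (CLI equivalent of the GUI join)
  pvecm add 192.0.2.10
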
Steps taken at Cluster level post creation:
  1. Restricted a few Storage entries to metal-01, since they only exist on that node but had defaulted to All Nodes
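
Restricting a storage entry to one node can also be done with pvesm or by editing /etc/pve/storage.cfg directly; the storage ID below is made up:

  # Limit the storage to metal-01 only
  pvesm set local-zfs-data --nodes metal-01

  # Corresponding stanza in /etc/pve/storage.cfg:
  # zfspool: local-zfs-data
  #         pool rpool/data
  #         content images,rootdir
  #         nodes metal-01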

Behavior that is repeatedly observed:
The new nodes become unhealthy overnight, and by the next day only metal-01 is still green. metal-02 and metal-03 are unresponsive via SSH as well; they are completely locked up. A physical restart brings each node back into the cluster.

I've done some googling and checked logs with commands such as `journalctl -b -u pve-cluster -u corosync`, but nothing is jumping out.
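
Since the nodes lock up completely, the previous boot's journal is probably more telling than the current one. A few things worth dumping, purely as a sketch:

  # Warnings and errors from the boot during which the node locked up
  journalctl -b -1 -p warning

  # Kernel messages only, same boot
  journalctl -k -b -1

  # Corosync link state while the cluster is still healthy, for comparison
  corosync-cfgtool -s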
 
Hello, what's the state of `pvecm status`?

Do you know the timestamp at which nodes 2 and 3 disconnected from the cluster? It would be interesting to check the system logs (not just pve-cluster and corosync) around that timestamp on all nodes.
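
One way to pull a window of all system logs around the suspected drop-out time on each node (the timestamps below are placeholders):

  # Everything in the journal around the time the node went unhealthy
  journalctl --since "2023-09-10 01:00" --until "2023-09-10 05:00"

  # Same window, errors only, if that is too noisy
  journalctl --since "2023-09-10 01:00" --until "2023-09-10 05:00" -p err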
 
We opened a support ticket; it's looking like this is related to the ASMedia controller the drives were attached to. We moved the drives to the board's native SATA ports and things have been more stable since.
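
For anyone landing here with similar symptoms, a flaky SATA controller usually leaves ATA reset/timeout traces in the kernel log before the box locks up; a rough way to check (the grep pattern is only an example):

  # Identify the SATA controllers present
  lspci | grep -i sata

  # Look for ATA/AHCI resets or timeouts in the previous boot's kernel log
  journalctl -k -b -1 | grep -iE 'ata[0-9]+|ahci|timeout|reset'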