Proxmox VE 8 Cluster - New nodes unhealthy

chedda7

Member
I've been running Proxmox stably on a single machine for nearly a year. I decided to add two more machines to create a cluster and eventually set up HA.

Steps taken on original machine (metal-01):
  1. Datacenter -> Cluster -> Create Cluster
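
As a rough CLI equivalent of that step (run on metal-01; "homelab" is just a placeholder cluster name):

  # Create the cluster from the shell instead of the GUI
  pvecm create homelab
  # Confirm quorum and membership afterwards
  pvecm status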

Steps taken on metal-02 and metal-03:
  1. Installed PVE 8.0.3 with RAID-Z1 on 2x Micron 5400 960GB SSDs
  2. Minor NIC configurations for static IP and SAN networking
  3. Edited /etc/apt/sources.list to add the pve-no-subscription (free) repo
  4. Commented out the enterprise repos from /etc/apt/sources.list.d/*
  5. Ran apt update + upgrade
  6. Datacenter -> Cluster -> Join Cluster
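
For anyone retracing these steps, this is roughly what the repo edits and the join look like from the shell on the new nodes (PVE 8 is based on Debian bookworm; the IP below is a placeholder for metal-01):

  # /etc/apt/sources.list - the no-subscription (free) repo line
  deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

  # Comment out the enterprise repo
  sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list

  # Update and upgrade (full-upgrade is what the Proxmox docs recommend)
  apt update && apt full-upgrade

  # Join the existing cluster (CLI equivalent of the GUI join)
  pvecm add 192.0.2.10
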
Steps taken at Cluster level post creation:
  1. Restricted a few Storage entries to metal-01, since they only exist on that node but had defaulted to All Nodes
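
Restricting a storage entry to one node can also be done with pvesm or by editing /etc/pve/storage.cfg directly; the storage ID below is made up:

  # Limit the storage to metal-01 only
  pvesm set local-zfs-data --nodes metal-01

  # Corresponding stanza in /etc/pve/storage.cfg:
  # zfspool: local-zfs-data
  #         pool rpool/data
  #         content images,rootdir
  #         nodes metal-01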

Behavior that is repeatedly observed:
The new nodes become unhealthy overnight, and by the next day only metal-01 is still green. metal-02 and metal-03 are unresponsive via SSH as well; they are completely locked up. A physical restart brings each node back into the cluster.

I've done some googling and checked logs with commands such as `journalctl -b -u pve-cluster -u corosync`, but nothing is jumping out.
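
Since the nodes lock up completely, the previous boot's journal is probably more telling than the current one. A few things worth dumping, purely as a sketch:

  # Warnings and errors from the boot during which the node locked up
  journalctl -b -1 -p warning

  # Kernel messages only, same boot
  journalctl -k -b -1

  # Corosync link state while the cluster is still healthy, for comparison
  corosync-cfgtool -s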
 
Hello, what's the state of `pvecm status`?

Do you know the timestamp at which nodes 2 and 3 disconnected from the cluster? It would be interesting to check the system logs (not just pve-cluster and corosync) around that timestamp on all nodes.
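
One way to pull a window of all system logs around the suspected drop-out time on each node (the timestamps below are placeholders):

  # Everything in the journal around the time the node went unhealthy
  journalctl --since "2023-09-10 01:00" --until "2023-09-10 05:00"

  # Same window, errors only, if that is too noisy
  journalctl --since "2023-09-10 01:00" --until "2023-09-10 05:00" -p err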
 
We opened a support ticket; it's looking like this is related to the ASMedia controller the drives were attached to. We moved the drives to the board's native SATA ports and things have been more stable since.
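
For anyone landing here with similar symptoms, a flaky SATA controller usually leaves ATA reset/timeout traces in the kernel log before the box locks up; a rough way to check (the grep pattern is only an example):

  # Identify the SATA controllers present
  lspci | grep -i sata

  # Look for ATA/AHCI resets or timeouts in the previous boot's kernel log
  journalctl -k -b -1 | grep -iE 'ata[0-9]+|ahci|timeout|reset'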