I've been running Proxmox on a single machine for nearly a year, and it has been stable. I decided to add two more machines to form a cluster, with HA as the eventual goal.
Steps taken on original machine (metal-01):
- Datacenter -> Cluster -> Create Cluster
Steps taken on metal-02 and metal-03:
- Installed PVE 8.03 -> RAID-Z1 on 2x Micron 5400 960GB
- Minor NIC configurations for static IP and SAN networking
- Edited /etc/apt/sources.list to contain the pve free repo
- Commented out the enterprise repos from /etc/apt/sources.list.d/*
- Ran apt update + upgrade
- Datacenter -> Cluster -> Join Cluster
- Edited a few Storage items as they only exist on metal-01 but defaulted to All Nodes
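
For reference, the two repo steps above could be sketched roughly like this. It is only an illustration, not the exact commands that were run: the function names are made up, and the repo line assumes PVE 8 on Debian bookworm with the standard no-subscription repository. On a real node the paths would be /etc/apt/sources.list.d and /etc/apt/sources.list, and the script would need root.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the repo edits; names are not from Proxmox.

# Comment out every active "deb" line in the *.list files under a directory.
# Assumption: the directory only holds the enterprise repo files.
comment_out_repos() {
    dir="$1"
    for f in "$dir"/*.list; do
        [ -f "$f" ] || continue
        # Prefix lines starting with "deb" with "# "; a .bak copy is kept.
        sed -i.bak 's/^deb/# deb/' "$f"
    done
}

# Append the no-subscription repo line to a sources.list file if it is missing.
# Assumption: PVE 8, which is based on Debian bookworm.
add_no_subscription_repo() {
    list="$1"
    line="deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription"
    grep -qxF "$line" "$list" 2>/dev/null || printf '%s\n' "$line" >> "$list"
}
```

Usage would be something like comment_out_repos /etc/apt/sources.list.d followed by add_no_subscription_repo /etc/apt/sources.list, then apt update. Both helpers are safe to run more than once: already commented lines no longer start with "deb", and the repo line is only appended when it is absent.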
Behavior that is repeatedly observed:
New nodes become unhealthy overnight, and by the next day only metal-01 is still green. The nodes metal-02 and metal-03 are unresponsive via SSH as well; they are completely locked up. A physical restart brings the node back into the cluster.

I've done some googling and checked the logs with commands such as
journalctl -b -u pve-cluster -u corosync
but nothing is jumping out here.
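
One caveat on that command: journalctl -b only covers the current boot, so after a physical restart the messages leading up to the lockup would be in the previous boot (-b -1). A rough sketch of how the same check could be pointed at the previous boot is below. It assumes the journal is persistent on these nodes (journalctl --list-boots should show more than one entry if so), and the function names are made up for illustration.

```shell
#!/bin/sh
# Units named in the post; adjust as needed.
UNITS="pve-cluster corosync"

# Print logs for the given boot offset (0 = current boot, -1 = previous boot).
# After a hard reset, the hang itself would be at the end of boot -1.
cluster_logs() {
    boot="${1:-0}"
    args=""
    for u in $UNITS; do
        args="$args -u $u"
    done
    # $args is intentionally unquoted so each "-u unit" pair splits into words.
    journalctl -b "$boot" $args --no-pager
}

# Show the last N lines (default 200) of the whole journal from the previous
# boot, not limited to the two units above.
last_lines_before_hang() {
    journalctl -b -1 -n "${1:-200}" --no-pager
}
```

Running cluster_logs -1 on metal-02 or metal-03 after a restart would show the cluster services up to the lockup, and last_lines_before_hang would show whatever the kernel and other services logged last, which may matter if the hang is not caused by corosync or pve-cluster at all.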