[SOLVED] New Cluster Reboot Problems: NTP, Adding a new Node

LBX_Blackjack

Member
Jul 24, 2020
Hello all,
[Background] We recently set up our first Proxmox cluster with four new HP ProLiant DL360 Gen9 servers and an HP 2530-48G (J9775A) switch. The servers' NICs are configured as LACP bonds to the switch, and we have set up CEPH and HA in Proxmox.
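For reference, the bonding on each node is set up in /etc/network/interfaces roughly along these lines (the NIC names and addresses below are placeholders, not our exact values):

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-mode 802.3ad
        bond-miimon 100

auto vmbr0
iface vmbr0 inet static
        address 192.0.2.11/24
        gateway 192.0.2.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0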

[Issue 1] We were getting random reboots of nodes or of the whole cluster. We put a band-aid on it by creating a job that syncs the time/date from a public NTP server every minute. Clearly this isn't ideal, so we were hoping for some advice on how to properly fix this issue.
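The band-aid is nothing fancy -- just a root cron entry along the lines of the following (the pool hostname is an example, not necessarily the server we actually query):

# /etc/cron.d/force-time-sync -- stopgap: step the clock every minute
* * * * *   root   /usr/sbin/ntpdate -u pool.ntp.org >/dev/null 2>&1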

[Issue 2] The second issue is that we're trying to add a fifth server (on a second switch) to the cluster and CEPH pool, and doing so causes nodes to spontaneously reboot.

Any leads would be greatly appreciated.
 
These problems are most likely due to congestion/saturation of your network. LACP can only distribute traffic per session, based on the hashing policy configured on the nodes AND the switch; any mismatch between the two produces out-of-order packets and therefore retransmissions and/or latency.

I have built a similar cluster on similar hardware for testing/proof of concept, but you MUST separate the networks by function -- VLANs alone are not sufficient (a rough config sketch follows the list and links below):

1. Front-end Network -- vmbr0; node mgmt GUI/SSH; VM networking (with VLANs, etc.) -- can be an active/backup or LACP bond
2. PVE Cluster (Corosync/Kronosnet) -- dedicated -- nothing else runs on this network -- 1G is sufficient [0]
3. Ceph Public/Cluster -- separate network for CEPH -- can be LACP with layer3+4 hashing on BOTH switch and nodes. [1], [2]

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_cluster_network
[1] https://docs.ceph.com/docs/master/rados/configuration/network-config-ref/
[2] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_precondition
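To make the split concrete, on each node it ends up looking roughly like this -- the NIC names, subnets, and the choice to put Ceph public and cluster traffic on one dedicated subnet are illustrative assumptions, not a drop-in config:

# /etc/network/interfaces (excerpt) -- front-end bond/vmbr0 stays as you have it

# 2. Corosync/Kronosnet: dedicated 1G port, nothing else on this subnet
auto eno3
iface eno3 inet static
        address 10.10.10.11/24

# 3. CEPH public/cluster: separate LACP bond, layer3+4 hashing
#    (the switch ports in this trunk must hash on layer 3+4 as well)
auto bond1
iface bond1 inet static
        address 10.10.20.11/24
        bond-slaves eno4 eno5
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4

# /etc/pve/ceph.conf (excerpt) -- point CEPH at the dedicated subnet
[global]
        public_network = 10.10.20.0/24
        cluster_network = 10.10.20.0/24

Corosync then gets pointed at its dedicated subnet when you create/join the cluster (the link0 address in pvecm or the GUI).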

With some tweaking of scrubbing, logging, etc., CEPH will perform (minimally) over 1G links/bonds, but keep in mind it takes roughly 3 HOURS to replicate 1 TB of data over 1 Gbps. Also, systemd-timesyncd never performed well enough for me -- I replaced it with Chrony and set the same upstream NTP server on all nodes, with the nodes as fallback for each other (so loss of the upstream NTP causes the nodes to drift TOGETHER rather than apart).
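The Chrony side is nothing exotic either; /etc/chrony/chrony.conf on each node looks roughly like this (the upstream pool name and the peer hostnames are placeholders for whatever you actually use):

# /etc/chrony/chrony.conf (excerpt)
# Same upstream source on every node
server 0.pool.ntp.org iburst

# The other cluster nodes as fallback peers, so if the upstream
# disappears the nodes keep each other in sync and drift together
peer pve2 iburst
peer pve3 iburst
peer pve4 iburst

driftfile /var/lib/chrony/chrony.drift
makestep 1 3
rtcsync

The scrub tweaking I mentioned is mostly a matter of pushing deep scrubs into off-hours (osd_scrub_begin_hour / osd_scrub_end_hour in ceph.conf) so they don't compete with client I/O on the 1G links.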

As I said, this will work for a PoC -- but you absolutely MUST upgrade the CEPH NICs and switch to 10G before going to production, or performance will degrade badly as you add load/VMs/etc. to the cluster!
 
Well that makes a lot of sense. Our network unfortunately can't support those requirements, so we will have to do without CEPH and HA. Thank you very much!
 
