We've had Proxmox with Ceph for over 5 years now and have deployed a production cluster to move off of VMware and NetApp. We've got about 60T of NVMe in a dedicated pool, 15T of SSD, and 20T of HDD fronted by SSD, all configured in Ceph.
Overview:
5 dedicated storage nodes and 4 compute nodes with...
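For anyone curious how the tiered pools are carved up, here's a minimal sketch using device-class based CRUSH rules. Rule names, pool names, and PG counts below are placeholders, not our actual config:

# hypothetical names and PG counts -- one replicated CRUSH rule per device class
ceph osd crush rule create-replicated nvme-rule default host nvme
ceph osd crush rule create-replicated ssd-rule default host ssd
ceph osd crush rule create-replicated hdd-rule default host hdd

# dedicated pools pinned to each rule
ceph osd pool create vm-nvme 512 512 replicated nvme-rule
ceph osd pool create vm-ssd 256 256 replicated ssd-rule
ceph osd pool create bulk-hdd 128 128 replicated hdd-rule

# "HDD fronted by SSD" here means the HDD OSDs keep their BlueStore DB/WAL on flash,
# e.g. when the OSD is created (device paths are examples only):
ceph-volume lvm create --data /dev/sdX --block.db /dev/sdY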
I've removed all the nodes with the Myricom interfaces from the cluster, although they are still part of Ceph. I also had a node in a data closet, connected over a 10G long-haul link, that I removed from the cluster, and now everything seems good.
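Roughly what "removed from the cluster but still part of Ceph" looked like, as a sketch rather than the book procedure (pve12 is a placeholder node name):

# on the node being pulled out: stop the PVE cluster stack so it can't vote
systemctl stop pve-cluster corosync

# on a healthy, quorate node: drop it from the corosync membership
pvecm delnode pve12

# Ceph is untouched -- its mons/OSDs keep running on that box. Note that on a
# PVE-managed Ceph, /etc/ceph/ceph.conf is a symlink into /etc/pve, which is
# unavailable while pve-cluster is stopped, so don't restart the Ceph daemons
# while the box is in that state.
ceph -s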
What I don't like -- a single node shouldn't be able to...
I've downed corosync on the problematic nodes mentioned above; it is running on all the other nodes that aren't misbehaving. Those other nodes are still not rejoining the cluster even though they are connected.
root@pve01:~# corosync-cfgtool -s
Printing link status.
Local node ID 1...
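For reference, this is roughly what I've been running on each box while bringing things back, nothing exotic, just the stock tools:

# PVE's view of membership and quorum
pvecm status

# corosync's own view of quorum and members
corosync-quorumtool -s

# per-link knet status (same command as the output above)
corosync-cfgtool -s

# bring corosync back on one node at a time and watch what it logs
systemctl start corosync
journalctl -fu corosync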
Each host has two 1G network ports connected to physical switches that are "old school" segmented: one is SERVER, one is BASTION. The cluster IP lives on SERVER (used for management via the web UI and for joining). There is a top-of-rack 10G Cisco Nexus that is trunked to our core for the rest of our...
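In corosync terms that gives each node two potential knet links, one per segment. A trimmed-down /etc/pve/corosync.conf sketch, with made-up addresses, showing how SERVER and BASTION could be declared as link 0 and link 1:

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.11   # SERVER segment (where the cluster IP lives)
    ring1_addr: 10.10.20.11   # BASTION segment as a fallback link
  }
  # ...one entry per node...
}

totem {
  cluster_name: prod
  config_version: 16
  interface {
    linknumber: 0
    knet_link_priority: 10
  }
  interface {
    linknumber: 1
    knet_link_priority: 5
  }
  ip_version: ipv4-6
  link_mode: passive
  version: 2
}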
I've now got 10 nodes online and stable. Of the remaining 6, if I bring corosync up it starts splitting other nodes off. Five of these nodes have Myricom 10G network cards in them -- they are the only nodes with them (but they had been working fine in the cluster since November). pve06 has an...
First time posting, and I'm axle-wrapped around this one.
I have a 15-node PVE cluster with Ceph. It has been running peachy since November. Today I went to add another node and it hung waiting for quorum (I added it at the command line). Eventually I had to kill the join. At this point all 15...
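For context, the join was the plain CLI flavor, something along these lines (addresses are placeholders here, not the real ones):

# on the new node, pointing at an existing member's SERVER address
pvecm add 10.10.10.11 --link0 10.10.10.26

# it sat at "waiting for quorum..." until I killed it, then I checked
# what the rest of the cluster thought:
pvecm status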