Search results

  1. I/O Performance issues with Ceph
     After extensive testing today, it turns out we've got some crap NVME drives (40 of them).
  2. I/O Performance issues with Ceph
     We've had Proxmox with Ceph for over 5 years now and have deployed a production cluster to move off of VMware and NetApp. We've got about 60T of NVME in a dedicated pool, 15T of SSD, and 20T of HDD fronted by SSD configured in Ceph. Overview: 5 dedicated storage nodes and 4 compute nodes with...
  3. corosync split, some nodes not rejoining
     I've removed all the nodes with the Myricom interfaces from the cluster, although they are still part of CEPH. I also had a node in a data closet, connected over 10G long haul, that I removed from the cluster, and now everything seems good. What I don't like -- a single node shouldn't be able to...
  4. corosync split, some nodes not rejoining
     Also, versions: root@pve01:~# pveversion -v proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve) pve-manager: 6.1-8 (running version: 6.1-8/806edfe1) pve-kernel-helper: 6.1-8 pve-kernel-5.3: 6.1-6 pve-kernel-5.3.18-3-pve: 5.3.18-3 pve-kernel-5.3.18-2-pve: 5.3.18-2 pve-kernel-5.3.10-1-pve: 5.3.10-1...
  5. corosync split, some nodes not rejoining
     I've downed corosync on the nodes that are problematic, as mentioned above. It is running on all other nodes that aren't misbehaving. Those other nodes are still not rejoining the cluster even though they are connected. root@pve01:~# corosync-cfgtool -s Printing link status. Local node ID 1...
  6. corosync split, some nodes not rejoining
     My CEPH is also working perfectly fine through this. I suspect this is a corosync 3 issue, but I cannot find a way to keep my nodes synced any more.
  7. corosync split, some nodes not rejoining
     Each host has two one-gig network ports connected to physical switches that are "old school" segmented. One is SERVER, one is BASTION. The cluster IP lives in SERVER (for management via web and joining). There is a top-of-rack 10G Cisco Nexus that is trunked to our core for the rest of our...
  8. corosync split, some nodes not rejoining
     I've now got 10 nodes online and stable. Of the remaining 6, if I bring corosync up it will start splitting off other nodes. Five of these nodes have Myricom 10G network cards in them -- they are the only nodes with them (but they were working fine in the cluster since November). pve06 has an...
  9. corosync split, some nodes not rejoining
     First time posting; I'm axle-wrapped around this one. I have a 15-node PVE cluster with CEPH. It has been running peachy since November. Today I went to add another node and it hung on waiting for quorum (I added at the command line). Eventually I had to kill the join. At this point all 15...