Proxmox & Ceph cluster reduction

alvaroag

Member
Nov 5, 2015
Hello everyone.

I've been working on this for a while, but recently started to worry about the possible outcomes.

We initially had six PVE servers, three of them in North America and the other three in Europe. All of them are also part of a Ceph cluster which holds a pool with 2 VMs, and another 2 pools for a CephFS (filesystem & metadata). In the CRUSH map, all nodes have weight 1.000.

To reduce costs, I've been ordered to remove the servers in Europe, so I started doing that. But I couldn't exclude the first one without Ceph warning about "degraded data redundancy" each time I either used "ceph osd out" or set that node's weight to 0.000 in the CRUSH map.

Then disaster struck. The hardware of another European server failed (not the one I was trying to remove), so there was no option other than forcing its removal from the cluster using "ceph osd destroy" followed by "ceph osd rm". Before that, "ceph osd safe-to-destroy" reported no risk.

Now another of the servers in Europe is having a hardware failure. I might be able to recover it and get it up and running, but I'd do so only if it is actually required, as it would take some time and effort. This time, when I run "ceph osd safe-to-destroy", it says "Error EAGAIN: OSD(s) 3 have no reported stats, and not all PGs are active+clean; we cannot draw any conclusions". That makes me unsure whether it can be safely deleted.

My CRUSH map looks something like this:

Code:
root default {
    id -1        # do not change unnecessarily
    id -17 class ssd        # do not change unnecessarily
    # weight 5.000
    alg straw
    hash 0    # rjenkins1
    item NA1 weight 1.000
    item E1 weight 1.000
    item NA2 weight 1.000
    item NA3 weight 1.000
    item E3 weight 0.000
}
root ha-global {
    id -200        # do not change unnecessarily
    id -16 class ssd        # do not change unnecessarily
    # weight 5.000
    alg straw
    hash 0    # rjenkins1
    item NA1 weight 1.000
    item NA2 weight 1.000
    item NA3 weight 1.000
    item E1 weight 1.000
    item E3 weight 0.000
}

(This is only part of the CRUSH map. The server I'm trying to remove now is E3.)
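To double-check the weights, here is a minimal Python sketch that parses the two buckets shown above and sums their item weights. The parser is my own illustration (the function name `bucket_weights` is hypothetical, not part of any Ceph tooling) and it only handles the simple `root NAME { ... }` / `item NAME weight X` lines from this excerpt, not a full decompiled CRUSH map.

```python
import re

# Excerpt of the CRUSH map from this post, pasted by hand.
CRUSH_TEXT = """
root default {
    id -1        # do not change unnecessarily
    id -17 class ssd        # do not change unnecessarily
    # weight 5.000
    alg straw
    hash 0    # rjenkins1
    item NA1 weight 1.000
    item E1 weight 1.000
    item NA2 weight 1.000
    item NA3 weight 1.000
    item E3 weight 0.000
}
root ha-global {
    id -200        # do not change unnecessarily
    id -16 class ssd        # do not change unnecessarily
    # weight 5.000
    alg straw
    hash 0    # rjenkins1
    item NA1 weight 1.000
    item NA2 weight 1.000
    item NA3 weight 1.000
    item E1 weight 1.000
    item E3 weight 0.000
}
"""


def bucket_weights(text):
    """Return {bucket_name: {item_name: weight}} for each 'root NAME {' block."""
    buckets = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        start = re.match(r"root\s+(\S+)\s*\{", line)
        if start:
            current = buckets.setdefault(start.group(1), {})
            continue
        if line == "}":
            current = None
            continue
        item = re.match(r"item\s+(\S+)\s+weight\s+([\d.]+)", line)
        if item and current is not None:
            current[item.group(1)] = float(item.group(2))
    return buckets


weights = bucket_weights(CRUSH_TEXT)
for name, items in weights.items():
    zero = [host for host, w in items.items() if w == 0.0]
    print(f"{name}: total item weight {sum(items.values()):.3f}, zero-weight items: {zero}")
```

In both roots, E3 is the only item at weight 0.000, while E1 still carries weight 1.000.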

So, my question is, given the current situation, is it safe to destroy that node from the Ceph cluster?

Also, when I check my Ceph cluster status, I have 33.333% degraded objects, with 36 PGs "active+undersized+degraded" (64 active+clean). Half of those objects are degraded because of the recently destroyed node, and half because of the soon-to-be-destroyed one. Is there any way to get back to a healthy cluster, considering those PGs are never going to be active+clean again?
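For reference, a small Python sketch of how the PG counts above add up. The dictionary is typed in by hand from the status output quoted in this post; it is not produced by any Ceph API, and the variable names are only illustrative.

```python
# PG state counts as reported in the cluster status (entered manually).
pg_states = {
    "active+clean": 64,
    "active+undersized+degraded": 36,
}

total_pgs = sum(pg_states.values())

# Everything that is not plain active+clean still needs recovery or remapping.
not_clean = {state: n for state, n in pg_states.items() if state != "active+clean"}
not_clean_share = 100.0 * sum(not_clean.values()) / total_pgs

print(f"total PGs: {total_pgs}")
print(f"PGs not active+clean: {sum(not_clean.values())} ({not_clean_share:.1f}%)")
```

Note that this is the share of PGs, which is a different figure from the 33.333% of degraded objects reported by Ceph.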

As a side note, I've been able to make a full backup of my CephFS content, as well as of the VMs stored in Ceph.

Code:
# ceph versions
{
    "mon": {
        "ceph version 12.2.13 (8308e37990ecad1a20789adfd1c3d487ca084b7d) luminous (stable)": 1,
        "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 4
    },
    "mgr": {
        "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 4
    },
    "osd": {
        "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 4
    },
    "mds": {
        "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 2
    },
    "overall": {
        "ceph version 12.2.13 (8308e37990ecad1a20789adfd1c3d487ca084b7d) luminous (stable)": 1,
        "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 14
    }
}

# pveversion
pve-manager/5.4-13/aee6f0ec (running kernel: 4.15.18-25-pve)
(Same for all servers)

Thanks for any help.
 
