Ceph emulating RAID 1

Oct 21, 2020
How can I make Ceph emulate some sort of RAID 1?

I am aware that Ceph is a distributed storage system and that with fewer than three nodes it does not work as it should.
I'm still fighting for the third node, and to put the second one into production I first have to convince my bosses to buy a 25Gb switch.
In the meantime I would like to put the disks I managed to get purchased into production, but with only one node I cannot have redundancy.


Is there a way to do that, or does Ceph need at least two nodes to have redundancy?
 
Yes, configure RF=2 (a replica size of 2), though it is not recommended for a production setup, as a node failure will leave the setup unusable.
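For an existing pool that would look roughly like this (the pool name is a placeholder):

Code:
# keep only 2 copies of each object (not recommended for production)
ceph osd pool set <pool name> size 2
# allow I/O to continue with a single surviving copy
ceph osd pool set <pool name> min_size 1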
 
ceph osd tree:
Code:
ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       14.88217 root default                           
-3       14.88217     host pveZZ                         
 0 NVMe1  0.45479         osd.0      up  1.00000 1.00000
 1 NVMe1  0.45479         osd.1      up  1.00000 1.00000
 2 NVMe2  6.98630         osd.2      up  1.00000 1.00000
 3 NVMe2  6.98630         osd.3      up  1.00000 1.00000

ceph -s
Code:
  cluster:
    id:     dc0e8f62-4ab8-441b-9891-0cf905b52e87
    health: HEALTH_WARN
            1 pool(s) have no replicas configured
            Reduced data availability: 128 pgs inactive
            Degraded data redundancy: 128 pgs undersized
            mon is allowing insecure global_id reclaim
 
  services:
    mon: 1 daemons, quorum pveZZ (age 16h)
    mgr: pveZZ(active, since 16h)
    osd: 4 osds: 4 up (since 16h), 4 in (since 5d)
 
  data:
    pools:   2 pools, 256 pgs
    objects: 97.36k objects, 376 GiB
    usage:   115 GiB used, 15 TiB / 15 TiB avail
    pgs:     50.000% pgs not active
             128 active+clean
             128 undersized+peered
 
According to the ceph osd tree output, you only have one node, and it holds all the NVMe disks. Where is the second node, or did you clip the output?

Otherwise, to make it work, the CRUSH rule needs to be modified, but you will not get any real advantage: it will just work, it won't survive a node failure.
 
It'll not work with 2 nodes either, as you need quorum for the monitors, so you need a minimum of 3 monitors.
If you run only 2 monitors, the cluster will go read-only when 1 host is down.
The OSDs/storage themselves can be replicated on 2 nodes, though:

Code:
ceph osd pool set <pool name> size 2
ceph osd pool set <pool name> min_size 1
 
To clarify, I mean redundancy at the disk level, like RAID 1.
 
Well, one way to start with a single-node setup is to modify the CRUSH rule so that replication happens across OSDs rather than across nodes. You can do that by manually editing the CRUSH map. Once you keep adding nodes and have a sufficient number of them, change the CRUSH map again so data is distributed across nodes rather than OSDs.

For this to work, make sure you have a minimum of 3 OSDs per server, and keep in mind that this is not a recommended setup at all.
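As an alternative to hand-editing the CRUSH map (the full walkthrough is further down), recent Ceph releases can also create a rule with an OSD-level failure domain directly; a rough sketch, with the rule name and pool name as placeholders:

Code:
# create a replicated rule that picks OSDs instead of hosts
ceph osd crush rule create-replicated replicated_osd default osd
# point the pool at the new rule
ceph osd pool set <pool name> crush_rule replicated_osd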


If you don't need to go with Ceph right away, you can configure LVM-thin on top of a RAID 1 of the existing disks (set up through the BIOS/controller), or use ZFS RAID 1, and migrate the data later once you have enough nodes for Ceph.
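A rough sketch of the ZFS route, assuming two spare disks /dev/sdb and /dev/sdc and a storage name of your choosing (both are placeholders):

Code:
# build a two-disk ZFS mirror (the RAID 1 equivalent)
zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc
# register it in Proxmox as a VM/CT storage
pvesm add zfspool local-zfs-mirror --pool tank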
 
Follow these commands to start with single-node Ceph.

Assumptions: you have a single node, and you have already created a cluster with that single node as its only member for now.

pveceph init --network=172.19.X.Y/24

Specify your 10G or faster interface network here.

pveceph mon create

Find the list of drives using:
lsblk -f
I have added 3 disks for testing: /dev/sdb, /dev/sdc, /dev/sdd


pveceph osd create /dev/sdb --crush-device-class hdd
pveceph osd create /dev/sdc --crush-device-class hdd
pveceph osd create /dev/sdd --crush-device-class hdd

Choose the device class properly; if you have SSDs, use "ssd" after --crush-device-class.
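If you want to double-check how the OSDs were classified, something like this should show it:

Code:
# list the device classes CRUSH knows about
ceph osd crush class ls
# the CLASS column should match what you passed to --crush-device-class
ceph osd tree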


Now it's time to edit the CRUSH map:

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

Edit the crush.txt file and you will find a rule like this:


Code:
rule replicated_rule {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}


Change the line "step chooseleaf firstn 0 type host" to "step chooseleaf firstn 0 type osd".
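After the edit the rule should read (everything else stays unchanged):

Code:
rule replicated_rule {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type osd
	step emit
}

Then recompile and load the map: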


crushtool -c crush.txt -o crushnew.bin



ceph osd setcrushmap -i crushnew.bin
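To confirm the new map is active, you can dump the rule again:

Code:
# the chooseleaf step in the output should now use type "osd"
ceph osd crush rule dump replicated_rule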



Now create a pool with size 2:



pveceph pool create vm2 --size 2
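It doesn't hurt to confirm the pool picked up the intended settings:

Code:
# should report size: 2
ceph osd pool get vm2 size
# how many copies must be available for I/O to continue
ceph osd pool get vm2 min_size
# which CRUSH rule the pool uses
ceph osd pool get vm2 crush_rule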


Create a dummy file

dd if=/dev/zero of=test bs=1M count=4

Put the file into RADOS:


rados -p vm2 put obj1 test

Verify the data placement

ceph osd map vm2 obj1

osdmap e25 pool 'vm2' (2) object 'obj1' -> pg 2.6cf8deff (2.7f) -> up ([2,1], p2) acting ([2,1], p2)

The [2,1] shows the object has two copies, on osd.2 and osd.1, so you get RAID-1-style redundancy across OSDs even though both live on the same host.


Check the health status:

ceph -s

Code:
  cluster:
    id:     fad45a97-e141-49d0-8485-14389d69204c
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum pve1 (age 18m)
    mgr: pve1(active, since 18m)
    osd: 3 osds: 3 up (since 14m), 3 in (since 14m)

  data:
    pools:   1 pools, 128 pgs
    objects: 1 objects, 4 MiB
    usage:   3.0 GiB used, 297 GiB / 300 GiB avail
    pgs:     128 active+clean
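And once the additional nodes (and that 25Gb switch) are in place, the same procedure can be used in reverse to move the failure domain back to the host level, roughly:

Code:
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# change "step chooseleaf firstn 0 type osd" back to "type host", then:
crushtool -c crush.txt -o crushnew.bin
ceph osd setcrushmap -i crushnew.bin
# and raise the replica count again, for example:
ceph osd pool set vm2 size 3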

I hope it helps
 
