Question about high availability and Ceph with(out) RAID

supervache

Hello.
I have a cluster with 6 proxmox nodes. My developer team uses it every day and we are very happy with it.

Now we would like to use this cluster in production, and therefore add high availability (being able to migrate the CTs from one node to another, and be resilient to the failure of one of the servers).

Each node is different (they are in a datacenter at OVH):
Some have hard drives, others have SSDs/NVMe, and some have both.

I wanted to try Ceph, but I realized that Ceph refuses to run on RAID volumes (even software RAID).

So I have a few questions:
1- can we configure 2 identical disks on the same node in a ceph cluster?

2- can we do 2 ceph volumes (one faster with SSD / NVME, the other slower with hard drives)?

3- Can we use ceph with a partition? Some servers have only 2 hard disks: a RAID partition for Proxmox, and another mounted on /var/lib/vz/datastorage-hdd for CTs. I would like to keep the RAID for Proxmox and use the datastorage-hdd partition for Ceph (see question 1, since I would have 2 identical partitions without RAID)

4- Can we use another technology like glusterfs instead of ceph if it is more flexible for my needs? and how to implement this solution if the answer is yes?

Thanks for your help
 
1- can we configure 2 identical disks on the same node in a ceph cluster?
Yes, you can have multiple OSDs (Ceph's backing disks) on each node without an issue. Together with the OSDs on all other nodes and the CRUSH algorithm as data balancer, they somewhat act like a "software RAID".
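For example (a sketch only; the device names are placeholders, not taken from this thread), two spare disks on one node would each become their own OSD:
Code:
# run on the node itself: each spare disk becomes one OSD
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc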

2- can we do 2 ceph volumes (one faster with SSD / NVME, the other slower with hard drives)?

You can have different pools with different CRUSH rules, which then target only certain OSD devices (or device classes).
So yes, you could have two pools, one with the data on the faster devices, one with the data on the slower ones.
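A rough sketch of how that could look from the CLI (rule and pool names here are made up for illustration, and PG counts depend on your cluster):
Code:
# one replicated CRUSH rule per device class (root "default", failure domain "host")
ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd crush rule create-replicated replicated_hdd default host hdd
# one pool per rule
ceph osd pool create fast-pool 128 128 replicated replicated_ssd
ceph osd pool create slow-pool 128 128 replicated replicated_hdd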

3- Can we use ceph with a partition? Some servers have only 2 hard disks: a RAID partition for Proxmox, and another mounted on /var/lib/vz/datastorage-hdd for CTs. I would like to keep the RAID for Proxmox and use the datastorage-hdd partition for Ceph (see question 1, since I would have 2 identical partitions without RAID)

Yes, but it needs some manual work; currently the web interface does not allow selecting single partitions. So you need to use ceph-volume lvm create /dev/... from the CLI (see the sketch below). After the initial Ceph installation on all nodes and the setup of the monitors, this will integrate into PVE-Ceph just fine.
But be aware that OSDs can see high IO and are meant to be swapped out as a whole. So it's really good to separate them from the OS, IMO, as otherwise a failing disk gives you two problems: an OSD to recover and a RAID to resilver. I'd not recommend it outside of testing.
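A minimal sketch of that manual route, assuming /dev/sdX4 is the spare partition (the device and LVM names are placeholders only):
Code:
# wrap the partition in LVM, then hand the logical volume to ceph-volume
pvcreate /dev/sdX4
vgcreate ceph-vg /dev/sdX4
lvcreate -l 100%FREE -n osd-data ceph-vg
ceph-volume lvm create --data ceph-vg/osd-data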

Only 2 disks is IMO rather too little for a Ceph server, but you could add only one OSD and use those servers more for "compute", as they have less to do with Ceph and so more CPU resources available (depends naturally on what CPUs are in your cluster).
But as they're hosted by OVH you won't get more disks too easily, at least not cheaply; otherwise I'd recommend adding a small SSD for the OS (the better-known ones are really resilient, IMO) and using the two other disks as Ceph OSDs only. You can then move CTs to Ceph and have them available on all nodes.

4- Can we use another technology like glusterfs instead of ceph if it is more flexible for my needs? and how to implement this solution if the answer is yes?

GlusterFS is much less flexible than Ceph regarding ease of scaling and recoverability in case of HW failure of any sort.
Most setups I know of are three-node setups, or two nodes + arbiter. If I were setting up a cluster with storage now, I'd go for Ceph.
Proxmox VE has integrated management for a lot of Ceph, but none for GlusterFS (FYI).
 
Hi.
Thank you for your answer.

So I took 4 test servers (2 with SSD/NVMe and 2 with HDD). I installed Proxmox on a RAID partition, then I manually created 2 "no-RAID" partitions using fdisk and mkfs.ext4 and mounted them on /mnt/sda4 and /mnt/sdb4 (for the HDD servers).

I created a cluster and joined all servers to it. (I gave them 2 links: the private IP as link0 and the public IP as link1. It's probably a mistake, because the servers communicate via the public IP (link1) without firewall rules on them and I don't know why... whatever, it works for my POC.)
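For reference, the cluster setup was roughly like this (cluster name and IPs below are placeholders):
Code:
# on the first node
pvecm create my-cluster --link0 10.0.0.1 --link1 203.0.113.1
# on every other node, joining via the first node
pvecm add 10.0.0.1 --link0 10.0.0.2 --link1 203.0.113.2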

I have also installed Ceph from the web GUI.
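(For reference, the CLI equivalent would be roughly this, run on each node:)
Code:
pveceph install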

I have 2 problems:

1)
On my first node (proxmox-0007), all seems good. But on the other nodes (e.g. proxmox-0008), I can't finish the Ceph installation because of this message in the wizard:
No active IP found for the requested ceph public network '149.***.***.77/24' on node 'proxmox-0008' (500)
The public IP shown is the proxmox-0007 (my first node) IP.

2) On the only node where the installation is completely done, I can't create an OSD in Ceph.
In your latest answer, you told me that "currently the web interface does not allow selecting single partitions. So you need to use ceph-volume lvm create /dev/... from the CLI. After the initial Ceph installation on all nodes and the setup of the monitors, this will integrate into PVE-Ceph just fine."

I really don't know how to do that... I tried this:
Code:
 ceph-volume lvm create /dev/sda4
but it doesn't work.

Can you help me again, please?
 
No active IP found for the requested ceph public network '149.***.***.77/24' on node 'proxmox-0008' (500)
The public IP shown is the proxmox-0007 (my first node) IP.

I mean, does the second node have an IP in the 149.***.***.77/24 network? What's the output of a (slightly censored) ip addr?

I really don't know how to do that... I tried this: ceph-volume lvm create /dev/sda4
but it doesn't work.
Did you create a monitor? Also, what error do you get?
 
I mean, does the second node have an IP in the 149.***.***.77/24 network? What's the output of a (slightly censored) ip addr?
No. The second node is in another IP range (54.***.***.***).
So what is the problem, and do you have a solution?

Did you create a monitor? Also, what error do you get?

My node has a monitor, which is itself. And I can't create more monitors on the other nodes, because I get the same error: No active IP found for the requested ceph public network '149.***.***.77/24'

My error is "ceph-volume lvm create: error: argument --data is required".
Actually, I have never used Ceph before, so I really don't know this technology. Is there an article that explains how to create a Ceph replicated volume on partitions?


Thank you for your help :)
 
I've tried something:
Code:
pveceph init --network 10.64.0.0/10
then edited /etc/pve/ceph.conf on each node, putting the private address instead of the public address.
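For reference, the relevant part of /etc/pve/ceph.conf then looks roughly like this (excerpt only; the exact key spelling can differ between versions):
Code:
[global]
    cluster_network = 10.64.0.0/10
    public_network = 10.64.0.0/10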

Now I have 3 monitors: one on each node.

So I think Ceph is working "fine".
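One way to double-check that the three monitors really have quorum is:
Code:
ceph -s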

---- EDIT ---

Now I come back to creating an OSD on a partition. I used gdisk to create an LVM-typed partition /dev/sda4, then (after a partprobe) I did this:
Code:
root@proxmox-0007:~# pvdisplay
  "/dev/sda4" is a new physical volume of "1.76 TiB"
  --- NEW Physical volume ---
  PV Name               /dev/sda4
  VG Name              
  PV Size               1.76 TiB
  Allocatable           NO
  PE Size               0  
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               b3qdgX-3Et7-nXzs-Am84-UpaC-5196-EcMyHX
  
root@proxmox-0007:~# pvcreate /dev/sda4
  Physical volume "/dev/sda4" successfully created.

root@proxmox-0007:~# vgcreate ceph-pool-a /dev/sda4
  Volume group "ceph-pool-a" successfully created

root@proxmox-0007:~# lvcreate -L 1800000m -n OSD_data-a ceph-pool-a
WARNING: ext4 signature detected on /dev/ceph-pool-a/OSD_data-a at offset 1080. Wipe it? [y/n]: y
  Wiping ext4 signature on /dev/ceph-pool-a/OSD_data-a.
  Logical volume "OSD_data-a" created.

root@proxmox-0007:~# ceph-volume lvm create --data ceph-pool-a/OSD_data-a --journal ceph-pool-a/OSD_data-a
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 5abe1b31-5f03-4af6-af40-5299d6f6e5b7
stderr: [errno 2] error connecting to the cluster
-->  RuntimeError: Unable to create a new OSD id

I'm stuck on OSD creation from the CLI with partitions instead of full hard drives. Can you help me on this point?
I think I need a step-by-step example to create a partition and use it as a Ceph OSD on this node; then I can probably adapt this to the rest of my nodes by myself (I hope) :rolleyes:.
 