ceph, autoscaler

pille99

Sep 14, 2022
hello all

I am wondering if Ceph can handle autobalancing. In more detail: I want a script that runs once a day and automatically balances the VMs and their resources between the 4 nodes in a cluster. The question is whether Ceph can handle that.

4 nodes, 8 Ceph storage disks / OSDs (2 per node)
A VM is running on node 1.

The VM writes a part of its disk to nodes 2, 3 and 4, 1/3 to each drive. In case node 1 goes down, the VM will be restored from the disk information stored on drives 2, 3 and 4 and will be recreated on any node (let's suppose node 2).
But how does Ceph handle VMs that move around the cluster? Meaning: if the VM is moved to node 4 (by the autoscaler), does Ceph automatically delete the "backup_infos" from node 4 and write them to node 1? Does the recovery still work?
 
I think there is a bit of a misunderstanding of how Ceph works and which services / functionality do what. The data is stored somewhere across the cluster. If you have a replicated pool with size=3, Ceph will store 3 copies of the data. Unless you changed anything, the failure domain is "host". This means that each replica is placed on a separate node.
Overall, the disk images are spread across all nodes in the cluster.
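
You can check both of these yourself. Assuming your pool is called "ceph-vm" (replace with your actual pool name), something along these lines will show the replica count, the CRUSH rule the pool uses and the rule's failure domain:

  ceph osd pool get ceph-vm size
  ceph osd pool get ceph-vm crush_rule
  ceph osd crush rule dump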

The individual objects that make up a disk image are grouped in so-called "Placement Groups" (PGs). The decision on which node and which disk (managed by its OSD daemon) the data ends up is made at the PG level. Each PG has a primary OSD. The client (the VM) talks to the primary OSD, which then makes sure that the replicas are stored on OSDs on other nodes as well.
Since a disk image is usually much larger than a single object, its objects are spread over multiple PGs with different primary OSDs on different servers.
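
If you want to see that mapping in action, you can ask Ceph where a given object would land. The pool and object name below are only placeholders for illustration (RBD data objects are named rbd_data.<image id>.<object number>):

  ceph pg dump pgs_brief
  ceph osd map ceph-vm rbd_data.abc123.0000000000000000

The second command prints the PG the object maps to and the up/acting set of OSDs, which with the default failure domain will usually sit on three different nodes.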

In short, Ceph does not really have a concept of locality. No matter where the VM is running, the data will be spread over the cluster.

The autoscaler handles a whole different thing. Each pool has a config option for how many PGs it should use. How large that number should be depends on how many OSDs are usable by the pool and how many other pools there are. The autoscaler helps adjust the pg_num for each pool, either based on how full the pools currently are or, if configured, on their target_size_bytes or target_size_ratio.
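
If you want to see what the autoscaler sees and does, these should be a good starting point ("ceph-vm" again being a placeholder pool name):

  ceph osd pool autoscale-status
  ceph osd pool set ceph-vm pg_autoscale_mode on
  ceph osd pool set ceph-vm target_size_ratio 0.5

The first command lists each pool with its current and suggested pg_num; the other two enable the autoscaler for a pool and give it an expected capacity ratio so it can plan pg_num ahead of time instead of reacting to actual usage.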

Overall, I would recommend that you read up on how Ceph works :)
 
