Low Ceph performance during a Disaster Recovery simulation

alanspa
Hi,
our scenario is the following:
5 nodes in HA
2 nodes installed at one site
2 nodes installed at a second site
1 node installed at a third site, without Ceph storage

All sites are connected over a 10-gigabit LAN.


If I turn off 2 nodes at the same time in the same place, 50% of the Ceph OSDs are down; the data is still there, but performance is so slow that it is impossible to work. Why is that?

I have attached screenshots of the Ceph configuration and some taken during the shutdown of the two nodes.

Question: would performance have improved and returned to normal once the rebalancing finished?

Thank you
 

Attachments

  • Screenshot 2024-05-30 150617.png
  • Screenshot 2024-05-30 151902.png
  • Screenshot 2024-06-01 122504.png
  • Screenshot 2024-06-01 122859.png
I don't know much about this, but Proxmox does not like remote nodes (due to increased latency). Assuming that is not a problem in your setup, Ceph really needs three working nodes (it limps along and rebuilds in a panic with two, I believe), which is not the case in your disaster scenario. I hope other people here who know the details of all this will correct me.
 
If I turn off 2 nodes at the same time in the same place, 50% of the Ceph OSDs are down,
Yes. The missing information here is: what is your "size/min_size" setting? (Found in Pools --> row <yourpool> --> column "Size/min")
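If you prefer the command line, the same values can be read with the standard Ceph pool commands (<yourpool> is just a placeholder for your pool name):

ceph osd pool get <yourpool> size
ceph osd pool get <yourpool> min_size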
the data is still there, but performance is very slow,
The remaining nodes may begin to shuffle a lot of data around to get re-balanced. No?
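If you want to verify that recovery/backfill is what is eating the performance, the cluster status shows it directly; these are standard commands, nothing specific to this setup:

ceph -s         # look for recovery/backfill lines and degraded object counts
ceph pg stat    # summary of PG states (active+clean vs. degraded/backfilling)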


Disclaimer: I am NOT a Ceph specialist.
 
"Size=3/Min Size=2" means: Ceph is going to write three copies. As long as two copies are available everything is fine.

You had four nodes with OSDs. You turned off two nodes with OSDs.

What if two of the three data blocks written are on the turned-off nodes? That placement group is now read-only! Every VM writing data in this area will stop working immediately. (Depending on the application, different things may happen, from simple "cannot write this data" messages to crashed systems.)
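To see whether any placement groups actually went inactive or undersized during such a test, the standard PG queries should show it:

ceph pg dump_stuck inactive      # PGs that cannot serve I/O at all
ceph pg dump_stuck undersized    # PGs currently running with fewer copies than "size"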

The only way to allow two nodes to fail without quickly getting into real trouble is to set at least "size=4/min_size=2".

The number of nodes allowed to fail is the difference between these two numbers. Your setup is "3 - 2 = 1", meaning only a single node may fail.

Please make sure you understand the implications (e.g. less usable space) before you change that setting. Read about it in the official Ceph documentation. As already said, I am not the specialist here...
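If you do decide to go back to size=4/min_size=2, it is a per-pool setting; a sketch with <yourpool> as a placeholder. Regarding space: usable capacity is roughly raw capacity divided by "size", so four copies instead of three means usable capacity drops by about a quarter compared to size=3.

ceph osd pool set <yourpool> size 4
ceph osd pool set <yourpool> min_size 2
ceph df    # afterwards, shows raw vs. stored usage per pool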

Good luck!
 
Until a few months ago it was set to 4, and indeed the disaster recovery simulation went better.

Now everything makes sense.

I hope we can work out from the documentation how much space will be taken up.

Two questions, even if you are not an expert on Ceph:

- What happens if I fill the entire pool? Apart from freezing all the VMs, is there a way to return to normal without losing data by setting the size back to 3?

- In today's scenario, once rebalancing is complete, do you confirm that performance improves?
 
- What happens if I fill the entire pool? Apart from freezing all the VMs, is there a way to return to normal without losing data by setting the size back to 3?

- In today's scenario, once rebalancing is complete, do you confirm that performance improves?
Try under all circumstances not to fill up the whole pool. It can be really troublesome to get back to "normal". There should be no data loss, since everything goes read-only, but I would not bet on it.
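For what it is worth, Ceph starts refusing writes before the disks are physically full; the thresholds and current usage can be checked with standard commands (defaults are typically nearfull at 85%, backfillfull at 90% and full at 95%, but verify on your own cluster):

ceph df                          # overall and per-pool usage
ceph osd dump | grep -i ratio    # shows full_ratio / backfillfull_ratio / nearfull_ratio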

Probably yes. But I've not tested many corner cases and... I am not an expert ;-)
 
If I turn off 2 nodes at the same time in the same place, 50% of the Ceph OSDs are down; the data is still there, but performance is so slow that it is impossible to work. Why is that?
Before we discuss a remedy, we need to discuss what your intended logical configuration is supposed to be.

Ceph is a hierarchical system that operates on the basis of failure domains. A "failure domain" describes a layer of the system at which you intend to tolerate faults. In your case, I count three separate failure domains: OSD, node, and DC (location). EACH of the failure domains needs to contain a full replica. With a 3:2 rule, 2 replicas must be alive for the system to offer full functionality.

You have 3 sites, but only 2 contain OSDs, which means you are logically compromised from the start. If this is by design, we can move on, but understand that each of your DCs contains only 2 OSD nodes. This means neither can ever satisfy a 3-replica requirement on its own, and you can get into a lot of trouble on a link disruption. This configuration is a time bomb; service loss and possibly data loss are all but guaranteed in your future.
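To see how Ceph currently maps these failure domains, the CRUSH hierarchy and rules can be inspected with the standard commands below (<yourpool> is a placeholder):

ceph osd tree                             # which OSDs sit under which host/site buckets
ceph osd crush rule dump                  # replication rules and their failure domain
ceph osd pool get <yourpool> crush_rule   # which rule the pool actually uses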

Now to address the performance loss. Understand that a Proxmox/Ceph hyperconverged cluster requires a minimum of 3 networks (even if, in your case, they travel over the same link): cluster (Corosync) traffic, Ceph private (cluster) traffic, and Ceph public traffic. You describe your inter-DC connection as 10 Gb, but not how many links there are or what the latency is. Your guest traffic travels over the Ceph public network, which is fighting for latency with the other traffic types; if the Ceph private network is filled with frantic rebalancing across DCs, you can guess what happens to your guest performance.
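For reference, the split between the Ceph public and cluster (private) networks lives in /etc/pve/ceph.conf on a Proxmox node; a minimal sketch with purely illustrative subnets:

[global]
    # network used by clients/guests and the MONs ("public")
    public_network = 10.10.10.0/24
    # dedicated network for OSD replication and recovery traffic ("cluster"/private)
    cluster_network = 10.10.20.0/24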
 
What is your Ceph version?

Ceph 17 (Quincy) has a QoS mechanism (mClock) to prioritize client access vs. repair:


ceph config set global osd_mclock_profile high_client_ops
ceph config set global osd_mclock_profile balanced            # default
ceph config set global osd_mclock_profile high_recovery_ops
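To check which profile is currently active (assuming the OSDs run the default mClock scheduler of Quincy and later), something like:

ceph config get osd osd_mclock_profile        # value from the config database
ceph config show osd.0 osd_mclock_profile     # what a specific running OSD uses (osd.0 as an example)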
 
