Ceph | Trying to understand fault tolerance

Anotheruser
Sep 21, 2022
I am currently looking into Ceph and have read several posts strongly discouraging a 2-node setup.

Let's say you are running a 2-node cluster with 2 replicas and a minimum of 1 replica (size=2, min_size=1), plus an additional monitor as a third voting device.
From my understanding, this carries a similar risk of data loss as a RAID1 / RAIDZ1 pool (ignoring the fact that Ceph has more potential failure points than a conventional disk + HBA). To put it differently: I can lose one disk / one node without losing any data; if the second one also goes, the pool is obviously lost.
Is this correct?
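Concretely, the setup I have in mind would be created with something like this (pool name and PG count are just placeholders, not a recommendation):

ceph osd pool create testpool 128 128 replicated   # example pool with 2 copies spread over the 2 nodes
ceph osd pool set testpool size 2                  # keep 2 replicas
ceph osd pool set testpool min_size 1              # keep serving I/O even with only 1 copy left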

Does Ceph store checksums or something similar to ZFS to ensure data integrity on every node, or does it use / require a separate replica to verify / compare data integrity?

Thanks for any help :)
 
From my understanding, this carries a similar risk of data loss as a RAID1 / RAIDZ1 pool
Yes and no. RAID is not host-domain aware: RAID is meant to never have its disks operate independently, whereas Ceph is. Consequently, the default rules would not allow operation with only one OSD member present, and while you could override that (see the sketch below), you should think long and hard about what you are expecting from your storage; most people expect their storage to take data and retain it properly.
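For reference, the override in question is just a pool setting; it looks roughly like this (the pool name "rbd" is only an example):

ceph osd pool get rbd size        # how many copies the pool keeps
ceph osd pool get rbd min_size    # how many copies must be available before I/O is allowed
ceph osd pool set rbd min_size 1  # the discouraged override: accept I/O with a single surviving copy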

I can lose one disk / one node without losing any data; if the second one also goes, the pool is obviously lost.
This is not guaranteed. Data integrity is provided by comparing the copies of each placement group (PG) across all members of its acting set. If there are no other members to compare against, the storage cannot guarantee that what it gives back to you is what was written.
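To make that concrete, the cross-replica comparison happens during scrubbing. A rough sketch of the commands involved (the PG id 2.1f is just a placeholder); note that repair needs healthy copies to recover from:

ceph pg deep-scrub 2.1f                                  # read and compare all replicas of this PG
rados list-inconsistent-obj 2.1f --format=json-pretty    # list objects whose copies disagree
ceph pg repair 2.1f                                      # fix inconsistencies using the remaining good copies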

Again, the question you should be asking is "what am I expecting from the storage?" Yes, you can operate in the manner you describe, but only if you don't care about the data.
 
What happens if the two nodes lose the connection between them and both continue to write locally? How would you merge the differences?
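To make the scenario concrete, this is roughly how one would check which side still holds monitor quorum during such a partition (the side without a monitor majority stops accepting writes, so in principle the two halves should not diverge):

ceph quorum_status --format json-pretty   # which monitors currently form the quorum
ceph mon stat                             # quick summary of monitor ranks and quorum membership
ceph health detail                        # on the surviving side this shows degraded/undersized PGs; on the quorum-less side ceph commands simply hang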