Hi all,
We're running a reasonably small Proxmox and Ceph deployment: four servers split across two buildings at our school, plus a fifth witness server (which doesn't host any VMs or storage) in a third location.
To keep things going in the event of a failure, the Ceph pools are set to 4/2 (size 4, min_size 2), so any two servers can go down and everything keeps running. Everything (except the witness...) is networked at 10Gb.
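For reference, that's set per pool; the pool name below is just a placeholder for ours:

  ceph osd pool set vm-pool size 4        # keep four copies of every object
  ceph osd pool set vm-pool min_size 2    # keep serving I/O as long as at least two copies are up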
We've noticed that some of our DB servers are running rather slower than in our previous setup, and we've had a few moans about it....
We've ended up running a couple of virtual disks in writeback (unsafe) cache mode to work around the slowdowns; the problem seems to arise wherever programs wait for writes to be confirmed before moving on.
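This is roughly how those disks were switched over (the VM ID, bus and volume name below are examples, not our exact config); the same setting is available in the GUI under the disk's cache option:

  qm set 101 --scsi0 ceph-vm:vm-101-disk-0,cache=unsafe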
Reading the Ceph documentation, I noticed this passage:
"Ceph will NOT acknowledge a write operation to a client, until all OSDs of the acting set persist the write operation. This practice ensures that at least one member of the acting set will have a record of every acknowledged write operation since the last successful peering operation."
Given we're set to keep four copies, it seems likely that a write isn't confirmed as finished until all four of our servers have committed it to disk.
This feels a lot like overkill, given that all four servers are protected by UPS (two UPSes in each location, so we can also ride out a failure of either of those...) and the buildings are roughly 500m apart.
Is there any way to relax Ceph's handling of these writes? An equivalent of ZFS's sync=disabled, perhaps?
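For anyone unfamiliar, this is the sort of knob I mean on the ZFS side (the dataset name is just an example):

  zfs set sync=disabled tank/db    # stop honouring synchronous write requests for this dataset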
Looking at LINSTOR/DRBD, they offer different replication modes, notably 'Protocol B', which treats a write as completed once the local disk write has finished and the replication packets have reached the remote nodes:
https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-replication-protocols
This basically sounds perfect to me, although obviously it's going to be a fair job to switch things across to DRBD (and no doubt there are other caveats to consider!)
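If we went that route, my reading of the guide is that it boils down to a resource definition along these lines; the hostnames, devices and addresses below are placeholders rather than our actual setup, and with LINSTOR managing things this would presumably be set through its tooling rather than hand-written resource files:

  resource r0 {
    net {
      protocol B;    # acknowledge once the write has hit local disk and reached the peer's buffer
    }
    on node-a {
      device    /dev/drbd1;
      disk      /dev/sdb1;
      address   10.0.0.1:7789;
      meta-disk internal;
    }
    on node-b {
      device    /dev/drbd1;
      disk      /dev/sdb1;
      address   10.0.0.2:7789;
      meta-disk internal;
    }
  }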
Any suggestions/thoughts?
My concern with the current workaround of writeback (unsafe) / turning off buffer flushing in the VM is that it's vulnerable to data loss if that particular host crashes - avoiding exactly that was part of the reason we wanted to bring in Ceph in the first place. Performance of the cluster is otherwise superb....