Ceph ext4 mysql

TwiX · Mar 20, 2019

Hi,

I noticed that the only way to get low iowait with Ceph is to set the VM disk cache to writeback.

But that's not enough: with MySQL (InnoDB) we still see high iowait under heavy load. We have to disable barriers in the ext4 mount options (nobarrier / barrier=0). After that, disk performance is OK.
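To check which guests are actually running with barriers off, you can scan the mount table from inside the VM. The following is a minimal sketch, assuming a Linux guest where /proc/mounts uses the usual "device mountpoint fstype options dump pass" layout and ext4 reports disabled barriers as either `nobarrier` or `barrier=0`. The function name is hypothetical, not part of any tool.

```python
"""Sketch: list ext4 mounts whose options disable write barriers.

Assumes a Linux guest; /proc/mounts lines look like
"device mountpoint fstype options dump pass".
"""
from pathlib import Path


def ext4_mounts_without_barriers(mounts_text: str) -> list[str]:
    """Return mount points of ext4 filesystems mounted with barriers off."""
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = fields[3].split(",")
        # ext4 accepts both spellings for disabling barriers.
        if "nobarrier" in options or "barrier=0" in options:
            flagged.append(fields[1])
    return flagged


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():  # only meaningful on Linux
        for mountpoint in ext4_mounts_without_barriers(mounts.read_text()):
            print(f"barriers disabled on {mountpoint}")
```

Any mount point it prints is one where ext4 will not send cache-flush requests down the stack, which is exactly the data-loss window discussed in the replies below.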

On a 5-node cluster with 20 SSD OSDs and a 20Gbps network, I would have expected better performance without having to turn off these options.

I guess that in case of a disaster it's possible to lose data: the VMs reboot on another node, but the data may not be consistent.

Is there a way with Ceph to balance safety and performance?

Thanks in advance.

Antoine

guletz · Mar 20, 2019

Hi Antoine,

TwiX said:
I guess, in case of a disaster, it's possible to lose data

Yes, you are very likely to lose data with nobarrier and a writeback cache policy in case of a disaster!

twister · Mar 20, 2019

Have you benchmarked your Ceph cluster?
If so, do the results from within the VMs and directly from the nodes differ greatly?

TwiX · Mar 20, 2019

Hi,

Yes, benchmarks were done. The results are close to what the Proxmox team got.
Benchmarking now, with 40 VMs in production, could give misleading results.

The MySQL instances each handle about 500 to 1000 queries/sec.

If nothing can be done, I have no choice: I must set barrier=1 and keep writeback for the VM disk cache.
I also know that, in general, Ceph performance with 4k/8k block sizes is not very good.

TwiX · Mar 20, 2019

Would enabling MySQL binary logs while keeping ext4 nobarrier be safe enough?

guletz · Mar 20, 2019

Hi

TwiX said:
Would enabling MySQL binary logs while keeping ext4 nobarrier be safe enough?

Disabling barriers when the disks cannot guarantee that their caches are written out on power failure can lead to severe filesystem corruption and data loss. Inside a VM, a kernel crash alone can leave you with a corrupted filesystem, and if the filesystem is corrupted the binary logs may be unusable too, so the database cannot be recovered from them. Adding the writeback cache for the VM makes it worse: from the VM's perspective the cache has been flushed to disk (so ext4 believes all data is on disk, and MySQL believes the same), but that is not true until the Proxmox (PMX) host actually flushes the data to disk. If the PMX node crashes before flushing the data to disk (even without any power loss), you can end up with an inconsistent database at the very least!
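The point about flushes can be made concrete from the application side. A database like MySQL only treats a write as durable once an fsync-style call returns; whether that data really reached stable storage depends on every layer underneath (ext4, the VM disk cache, the host) honoring the flush, and ext4 only issues cache-flush requests when barriers are enabled. The sketch below shows that generic POSIX pattern in Python; `durable_write` is a hypothetical helper for illustration, not what InnoDB actually calls internally.

```python
"""Sketch: the write-then-fsync pattern applications rely on for durability.

Assumes Linux/POSIX semantics (directory fsync is supported).
"""
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and request that it reach stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's buffer -> kernel page cache
        os.fsync(f.fileno())  # page cache -> block device; ext4 adds a cache flush only if barriers are on
    # Also persist the directory entry, so a newly created file survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.bin")
        durable_write(target, b"committed row")
        print(open(target, "rb").read())
```

With nobarrier, the fsync still returns successfully, so the application cannot tell that the data may only be sitting in a volatile cache.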

Good luck!

Knuuut · Mar 21, 2019

TwiX said:
On a 5-node cluster with 20 SSD OSDs and a 20Gbps network, I would have expected better performance without having to turn off these options.

IOPS depend heavily on network latency. Consider a 40/100Gb/s network infrastructure and more OSDs per node.
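The reason latency matters more than bandwidth here follows from Little's law: with a fixed number of I/Os in flight, achievable IOPS is at most queue depth divided by per-I/O latency. A commit-heavy database that waits on each sync write behaves close to queue depth 1, so every extra fraction of a millisecond of network round trip directly caps IOPS, and small 4k/8k blocks turn that into very little MB/s. The sketch below just does that arithmetic; the latency values in the demo are hypothetical, not measurements from this cluster.

```python
"""Sketch: upper bound on IOPS and throughput from per-I/O latency.

Uses Little's law (throughput = concurrency / latency). Latencies below
are illustrative assumptions, not benchmark results.
"""


def iops_bound(latency_ms: float, queue_depth: int = 1) -> float:
    """Maximum IOPS if each I/O takes latency_ms and queue_depth are in flight."""
    return queue_depth * 1000.0 / latency_ms


def throughput_mib_s(iops: float, block_kib: float) -> float:
    """Throughput in MiB/s for a given IOPS rate and block size in KiB."""
    return iops * block_kib / 1024.0


if __name__ == "__main__":
    for latency in (0.5, 1.0, 2.0):  # hypothetical end-to-end write latencies
        iops = iops_bound(latency)
        print(f"{latency} ms at QD1: {iops:.0f} IOPS, "
              f"{throughput_mib_s(iops, 4):.1f} MiB/s with 4 KiB blocks")
```

Faster NICs help mainly by lowering per-I/O latency, while more OSDs help by allowing more I/Os to proceed in parallel.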

Cheers Knuuut

TwiX · Apr 2, 2019

Hi,

We plan to re-enable barriers. Our solution is to add lots of RAM for MySQL's innodb_buffer_pool_size.
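When sizing the buffer pool, note that in MySQL 5.7 and later the manual states innodb_buffer_pool_size is adjusted to a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances (chunk size defaults to 128 MiB), rounding up if needed. The sketch below estimates the effective value; the 75% fraction and the instance count are assumptions for a dedicated database VM, and `buffer_pool_size` is a hypothetical helper, not a MySQL API.

```python
"""Sketch: estimate the effective innodb_buffer_pool_size for a given RAM size.

Assumptions: dedicated DB VM, a chosen RAM fraction, default 128 MiB chunk
size, and MySQL rounding the value up to a multiple of chunk * instances.
"""

MIB = 1024 * 1024
GIB = 1024 * MIB


def buffer_pool_size(ram_bytes: int, fraction: float = 0.75,
                     chunk_size: int = 128 * MIB, instances: int = 8) -> int:
    """Return the buffer pool size in bytes after MySQL-style rounding."""
    target = int(ram_bytes * fraction)
    unit = chunk_size * instances
    # Ceiling division: round up to the next multiple of unit, as MySQL does.
    # Note this can exceed the requested fraction of RAM slightly.
    return -(-target // unit) * unit


if __name__ == "__main__":
    for ram_gib in (8, 16, 32):
        size = buffer_pool_size(ram_gib * GIB)
        print(f"{ram_gib} GiB RAM -> innodb_buffer_pool_size = {size // MIB}M")
```

Rounding up is why it is worth leaving some headroom: with many instances the adjustment unit can be several GiB.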

I guess you're right: 10GB with 20 SSDs may not be enough.

Thanks to all of you guys
