Ceph ext4 mysql

TwiX

Active Member
Feb 3, 2015
294
21
38
Hi,

I noticed that the only way to get low iowait with Ceph is to set the VM disk cache to writeback.

But that's not enough: with MySQL (InnoDB) we still see high iowait under heavy load. We have to disable barriers in the ext4 mount options. After that, disk performance is OK.
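
To see which ext4 filesystems inside a VM are currently mounted without barriers, the mount table can be parsed. What follows is a minimal Python sketch, not something taken from the setup described here: the function name `ext4_barrier_status` is made up for illustration, and it assumes that disabled barriers appear as `nobarrier` or `barrier=0` in the options column of a `/proc/mounts`-style listing.

```python
def ext4_barrier_status(mounts_text):
    """Map each ext4 mount point to True (barriers on) or False (barriers off).

    Assumption: barriers are on unless the options column contains
    'nobarrier' or 'barrier=0'.
    """
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = fields[3].split(",")
        disabled = "nobarrier" in options or "barrier=0" in options
        status[fields[1]] = not disabled
    return status


# Illustrative sample, not output from a real system.
sample = (
    "/dev/vda1 / ext4 rw,relatime 0 0\n"
    "/dev/vdb1 /var/lib/mysql ext4 rw,noatime,nobarrier 0 0\n"
    "tmpfs /run tmpfs rw,nosuid 0 0\n"
)
for mount_point, enabled in ext4_barrier_status(sample).items():
    print(mount_point, "barriers on" if enabled else "barriers OFF")
```

On a real guest the same function can be fed the contents of `/proc/mounts` instead of the sample string.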

On a 5-node cluster with 20 SSD OSDs and a 20Gbps network, I would have expected better performance without turning off these options.

I guess that, in case of a disaster, it's possible to lose data. VMs reboot on another node, but the data may not be consistent.

Is there a way with Ceph to combine safety and performance?

Thanks in advance.

Antoine
 

twister

Member
Mar 20, 2019
4
0
6
46
Have you benchmarked your Ceph cluster?
If so, do the results from within the VMs and directly from the nodes differ greatly?
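
For a quick VM-versus-node comparison of the write pattern that hurts here (small writes that each wait for a flush), a rough Python sketch like the one below can be run in both places. It is not a replacement for proper tools such as fio or rados bench. The function name `fsync_write_latencies`, the 4 KiB block size and the write count are illustrative assumptions.

```python
import os
import statistics
import tempfile
import time


def fsync_write_latencies(path, count=100, block_size=4096):
    """Write `count` blocks to `path`, fsync after each, return latencies in seconds."""
    block = b"\0" * block_size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # wait until the storage stack acknowledges the flush
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    # Uses a temporary directory; point `target` at the filesystem under test instead.
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "fsync-test.bin")
        results = fsync_write_latencies(target)
        print(f"median fsync write latency: {statistics.median(results) * 1000:.3f} ms")
        print(f"slowest fsync write:        {max(results) * 1000:.3f} ms")
```

Running it once inside a guest and once on a node against comparable storage gives a first idea of how much latency the virtualization and cache layers add.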
 

TwiX

Hi,

Yes, benchmarks have been done. The results are almost the same as what the Proxmox team got.
Benchmarking now, with 40 VMs in production, might not be meaningful.

The MySQL instances each handle about 500 to 1000 queries/sec.

If nothing can be done, I have no choice: I must set barrier=1 and keep writeback for the VM disk cache.
I also know that, in general, Ceph performance with 4k/8k block sizes is not very good.
 

TwiX

Would enabling MySQL binary logs and keeping ext4 nobarrier be safe enough?
 

guletz

Famous Member
Apr 19, 2017
1,584
260
103
Brasov, Romania
Hi

TwiX said: "Would enabling MySQL binary logs and keeping ext4 nobarrier be safe enough?"

Disabling barriers when the disks cannot guarantee that their caches are properly written out on power failure can lead to severe file system corruption and data loss. For a VM, a kernel crash can leave you with a corrupted FS. If the FS is corrupted, the binary logs could also be unusable, so the DB cannot be recovered.

Adding writeback cache for the VM makes this worse: from the VM's perspective the cache has been flushed to disk (so ext4 thinks all data is on disk, and MySQL thinks the same), but this is not true until the PMX host flushes the data to disk. If the PMX node crashes before flushing the data to disk (even without any power loss), you can end up with an inconsistent DB at the very least!

Good luck!
 

Knuuut

Member
Jun 7, 2018
91
9
8
57
TwiX said: "On a 5 nodes cluster with 20 SSD OSDs and 20Gbps network, I would have thought it could be better without turning off these options."

IOPS depend on network latency. Consider a 40/100Gb/s network infrastructure and more OSDs per node.
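
The link between latency and IOPS can be made concrete with a little arithmetic: when every write has to wait for its acknowledgement, the achievable rate is bounded by queue depth divided by round-trip latency. The Python sketch below only illustrates that bound. The latency figures are made-up examples, not measurements from this cluster, and `max_sync_iops` is a hypothetical helper name.

```python
def max_sync_iops(round_trip_latency_s, queue_depth=1):
    """Upper bound on IOPS when each write waits for its acknowledgement."""
    if round_trip_latency_s <= 0:
        raise ValueError("latency must be positive")
    return queue_depth / round_trip_latency_s


# Illustrative per-write latencies in milliseconds (assumed values).
for latency_ms in (2.0, 1.0, 0.5):
    iops = max_sync_iops(latency_ms / 1000.0)
    print(f"{latency_ms} ms per write: at most {iops:.0f} IOPS at queue depth 1")
```

At 1 ms per acknowledged write, a single-threaded writer cannot exceed 1000 IOPS however fast the SSDs are, which is why lower-latency networking helps small synchronous writes.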

Cheers Knuuut
 

TwiX

Hi,

We plan to re-enable barriers. The solution is to add lots of RAM for MySQL's innodb_buffer_pool_size.
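
As a rough aid for picking a value, the Python sketch below derives a candidate innodb_buffer_pool_size from the RAM of a VM. Everything in it is an assumption to adapt: the 70% share of RAM is only a common rule of thumb for dedicated database servers, the 128 MiB chunk size and 8 instances are assumed MySQL defaults, the rounding reflects MySQL sizing the pool in multiples of chunk size times instances, and `buffer_pool_bytes` is a made-up helper name.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def buffer_pool_bytes(total_ram_bytes, ram_fraction=0.7,
                      chunk_bytes=128 * MIB, instances=8):
    """Candidate buffer pool size, rounded down to a multiple of chunk * instances."""
    unit = chunk_bytes * instances
    target = int(total_ram_bytes * ram_fraction)
    # Never return less than one full unit.
    return max(unit, (target // unit) * unit)


# Illustrative VM sizes in GiB (assumed values).
for ram_gib in (16, 32, 64):
    size = buffer_pool_bytes(ram_gib * GIB)
    print(f"{ram_gib} GiB RAM: innodb_buffer_pool_size = {size // MIB}M")
```

The result still has to leave room for the guest OS, per-connection buffers and anything else running in the VM.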

I guess you're right, 10GB with 20 SSDs may not be enough.

Thanks to all of you guys :)
 
