pveperf and ext4

Hi all,

The poor scores for ext4 in the pveperf benchmark have been haunting me for some time now, and a posting by Tom triggered me to do some serious digging into the problem, which I have now done. Here comes the problem:

As of kernel 2.6.32 the design of ext4 has changed dramatically, favoring data safety over performance and resulting in bad default behavior for ext4. The change boils down to the following commit to the Linux kernel:
http://git.kernel.org/cgit/linux/ke.../?id=5f3481e9a80c240f169b36ea886e2325b9aeb745

Story and background can be read here: http://www.postgresql.org/message-id/4B512D0D.4030909@2ndquadrant.com. Especially follow the links to phoronix.com.

What importance does this have for pveperf regarding ext4?
The impact on the pveperf test for ext4 is that the fsyncs-per-second figure is measured by counting how many fsyncs are possible within a time frame, which you would think is fine, wouldn't you? Well, in the case of ext4 this is utterly wrong: ext4 in its default configuration already, implicitly, makes an fsync for each write, so what the pveperf benchmark tests is wrong, since it explicitly issues an fsync after each write as well and thereby doubles the number of fsyncs made per write.
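To make it concrete what I mean by "counting fsyncs within a time frame": pveperf itself is a small Perl script, but a minimal Python sketch of that kind of loop could look like this (the file name, write size and 5 second window are just for illustration, not what pveperf actually uses):

```python
import os
import time

# Minimal sketch of an fsync-per-second measurement, similar in spirit to
# what pveperf reports as FSYNCS/SECOND. File name, write size and the
# 5-second window are arbitrary choices for illustration only.
def fsyncs_per_second(path="pveperf_test.dat", seconds=5.0):
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    count = 0
    deadline = time.time() + seconds
    try:
        while time.time() < deadline:
            os.write(fd, b"x" * 4096)   # one small write ...
            os.fsync(fd)                # ... followed by an explicit fsync
            count += 1
    finally:
        os.close(fd)
        os.unlink(path)
    return count / seconds

if __name__ == "__main__":
    print("fsyncs/second: %.1f" % fsyncs_per_second())
```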

How should pveperf then do the test?
The proper way of dealing with fsyncs per second on ext4 would simply be to leave out the explicit fsync and just hammer the disk with write commands, and only stick to the current behavior if the file system is mounted with -o nobarrier.
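A rough sketch of the barrier-aware test I have in mind could look like the following; the /proc/mounts check is simplistic and the helper names are my own invention, not anything taken from pveperf:

```python
import os
import time

def mounted_with_nobarrier(mountpoint):
    # Simplistic check of /proc/mounts for the nobarrier/barrier=0 option.
    # A real implementation would resolve which mount entry actually covers
    # the benchmark directory instead of matching the path literally.
    with open("/proc/mounts") as f:
        for line in f:
            device, mnt, fstype, options = line.split()[:4]
            if mnt == mountpoint:
                opts = options.split(",")
                return "nobarrier" in opts or "barrier=0" in opts
    return False

def benchmark(path, mountpoint, seconds=5.0):
    # Only issue the explicit fsync when barriers are disabled; with barriers
    # enabled, ext4 is assumed (per the argument above) to flush on each write.
    explicit_fsync = mounted_with_nobarrier(mountpoint)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    count = 0
    deadline = time.time() + seconds
    try:
        while time.time() < deadline:
            os.write(fd, b"x" * 4096)
            if explicit_fsync:
                os.fsync(fd)
            count += 1
    finally:
        os.close(fd)
        os.unlink(path)
    return count / seconds
```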

Later this evening I will make some tests showing the difference between the current pveperf and a pveperf incorporating my ideas - both against ext3 and ext4. Stay tuned for the next episode of pveperf and ext4 ;-)
 
Well, in the case of ext4 this is utterly wrong: ext4 in its default configuration already, implicitly, makes an fsync for each write, so what the pveperf benchmark tests is wrong, since it explicitly issues an fsync after each write as well and thereby doubles the number of fsyncs made per write.

You talk about barrier=1? AFAIK a barrier and an fsync are two different things.
 
Also, I guess most software simply uses fsync, instead of querying the properties of the underlying file system (is there an API for that?)?
 
Yes, barriers and fsync are of course different, but in the specific case of ext4, the change made in the commit I pointed to means that setting barrier=0 will disable the explicit disk flush after every write.

"[This change] is required for safe behavior with volatile write caches on drives. You could mount with -o nobarrier and [the performance drop] would go away, but a sequence like write->fsync->lose power->reboot may well find your file without the data that you synced, if the drive had write caches enabled. If you know you have no write cache, or that it is safely battery backed, then you can mount with -o nobarrier, and not incur this penalty."

 
By doing a loop. What you measure is how many times we can do this loop, containing a write and an fsync, within a certain amount of time.

For ext3: write->fsync
For ext4 with barrier=1: (write, flush disk)->fsync

Which means that, in the case of ext3 and ext4, we are comparing apples with pears.
 
I found this interesting article which discusses in detail file system optimization when running KVM on Red Hat 5 and 6 kernels. As can also be seen on page 12, barriers are still an option.
http://www.redhat.com/summit/2011/p...day/shak_barry_w_0530_fileperf_summit2011.pdf

As a side note: Red Hat claims 4-12% performance improvements for ext4 in RHEL6 over ext3 in RHEL5 (page 36).
The same page also mentions a KVM storage option: aio=native. Is this used in PVE?
 
As a side note: Red Hat claims 4-12% performance improvements for ext4 in RHEL6 over ext3 in RHEL5 (page 36).

Benchmarks and performance claims ... (I will not comment on that).

The same page also mentions a KVM storage option: aio=native. Is this used in PVE?

yes
 
Barriers are not used anymore in recent Linux kernels (> 2.6.39) if I remember correctly, including the Red Hat 2.6.32 kernel.

Thanks for that link. Glad to see that the developers finally recognize that performance with barriers is unusably bad (and this is exactly what pveperf shows?).
 
