pveperf and ext4

mir · May 30, 2013

Hi all,

The poor ext4 scores in the pveperf benchmark have been haunting me for some time now, and a post by Tom prompted me to do some serious digging into the problem, which I have now done. Here is the problem:

As of kernel 2.6.32 the design of ext4 has changed dramatically, favoring data safety over performance, which results in bad default behavior for ext4. The change boils down to the following commit to the Linux kernel:
http://git.kernel.org/cgit/linux/ke.../?id=5f3481e9a80c240f169b36ea886e2325b9aeb745

Story and background can be read here: http://www.postgresql.org/message-id/4B512D0D.4030909@2ndquadrant.com. Especially follow the links to phoronix.com.

What importance does this have for pveperf regarding ext4?
The impact on the ext4 test in pveperf is this: fsyncs per second is measured by counting how many fsyncs are possible within a time frame, which you would think is fine, wouldn't you? Well, in the case of ext4 this is utterly wrong, since ext4 in its default configuration already implicitly makes an fsync for each write. What the pveperf benchmark tests is therefore wrong, since it also does an explicit fsync after each write and thereby doubles the number of fsyncs made per write.

How should pveperf then do the test?
The proper way of measuring fsyncs per second on ext4 would be to leave out the explicit fsync and simply hammer the disk with write commands, sticking to the current behavior only if the file system is mounted with -o nobarrier.
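To make the two variants concrete, here is a minimal sketch in Python. It is not the actual pveperf code; the function name `count_ops`, the 4 KiB block size, and the choice to overwrite the same block on every iteration are my own assumptions for illustration. It counts how many loop iterations complete within a time window, once with the explicit fsync and once without.

```python
import os
import tempfile
import time


def count_ops(directory, duration=1.0, do_fsync=True, block=b"x" * 4096):
    """Return loop iterations per second for write (+ optional fsync).

    Hypothetical helper, not taken from pveperf.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    try:
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            os.lseek(fd, 0, os.SEEK_SET)  # assumption: overwrite the same block
            os.write(fd, block)
            if do_fsync:
                os.fsync(fd)  # the explicit fsync that the benchmark counts
            count += 1
    finally:
        os.close(fd)
        os.unlink(path)
    return count / duration


if __name__ == "__main__":
    # Assumption: replace this with a directory on the file system under test.
    target = tempfile.gettempdir()
    print("write+fsync per second:", count_ops(target, do_fsync=True))
    print("write only per second: ", count_ops(target, do_fsync=False))
```

Note that without the fsync the writes will mostly land in the page cache, so the second number says little about the disk itself unless the file system really does flush on every write.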

Later this evening I will run some tests showing the difference between the current pveperf and a pveperf incorporating my ideas, against both ext3 and ext4. Stay tuned for the next episode of pveperf and ext4 ;-)

dietmar · May 31, 2013

mir said:
Well, in the case of ext4 this is utterly wrong, since ext4 in its default configuration already implicitly makes an fsync for each write. What the pveperf benchmark tests is therefore wrong, since it also does an explicit fsync after each write and thereby doubles the number of fsyncs made per write.

Are you talking about barrier=1? AFAIK a barrier and an fsync are different things.

dietmar · May 31, 2013

Also, I guess most software simply uses fsync instead of querying the properties of the underlying file system (is there an API for that?).

mir · May 31, 2013

Yes, barrier and fsync are of course different, but in the specific case of ext4, the change made by the commit I pointed to means that setting barrier=0 will disable the explicit disk flush after every write.

"[This change] is required for safe behavior with volatile write caches on drives. You could mount with -o nobarrier and [the performance drop] would go away, but a sequence like write->fsync->lose power->reboot may well find your file without the data that you synced, if the drive had write caches enabled. If you know you have no write cache, or that it is safely battery backed, then you can mount with -o nobarrier, and not incur this penalty."

dietmar · May 31, 2013

I do not get the point. pveperf measures FSYNCS/Second.

mir · May 31, 2013

By doing a loop. What you measure is how many times this loop, containing a write and an fsync, can run within a certain amount of time.

For ext3: write->fsync
For ext4 with barrier=1: (write, flush disk)->fsync

Which means, in the case of ext3 and ext4, comparing apples with pears.

dietmar · May 31, 2013

mir said:
Which means, in the case of ext3 and ext4, comparing apples with pears.

But I do not compare ext3 with ext4 (you do that).

dietmar · May 31, 2013

mir said:
Which means, in the case of ext3 and ext4, comparing apples with pears.

And as long as there is no way to query those features, how should an application know that fsync is not needed?
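One partial answer, as far as I know, is that on Linux the mount options of each file system can be read from /proc/mounts. The sketch below (Python, with hypothetical helper names `mount_options` and `barriers_disabled`, and made-up device names in the sample) parses text in that format and checks whether barriers were explicitly switched off. Whether a default-enabled barrier is listed at all depends on the kernel and file system, so the absence of nobarrier does not prove that flushes actually reach the disk.

```python
def mount_options(mount_point, mounts_text):
    """Return (fstype, [options]) for mount_point from /proc/mounts-style text.

    Returns None if the mount point is not listed. Hypothetical helper.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point overrides an earlier one.
            found = (fields[2], fields[3].split(","))
    return found


def barriers_disabled(options):
    """True if the option list explicitly turns write barriers off."""
    return "nobarrier" in options or "barrier=0" in options


if __name__ == "__main__":
    # Sample text with made-up devices; on a real system read /proc/mounts instead.
    sample = (
        "/dev/sda1 / ext4 rw,relatime,nobarrier,data=ordered 0 0\n"
        "/dev/sdb1 /var/lib/vz ext3 rw,relatime,barrier=1 0 0\n"
    )
    for point in ("/", "/var/lib/vz"):
        fstype, options = mount_options(point, sample)
        print(point, fstype, "barriers disabled:", barriers_disabled(options))
```

This only tells an application what was requested at mount time; it does not tell it that fsync can safely be skipped.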

dietmar · May 31, 2013

Also, if you set the same mount options, then you get more comparable results.

spirit · May 31, 2013

Hi Guys,

Barriers are not used anymore in recent Linux kernels (> 2.6.39), if I remember correctly, including the Red Hat 2.6.32 kernel.

http://lwn.net/Articles/400541/

mir · May 31, 2013

I found this interesting article, which discusses in detail file system optimization when running KVM on Red Hat 5 and 6 kernels. As can also be seen on page 12, barriers are still an option.
http://www.redhat.com/summit/2011/p...day/shak_barry_w_0530_fileperf_summit2011.pdf

As a side note: Red Hat claims 4-12% performance improvements for ext4 in RHEL6 over ext3 in RHEL5 (page 36).
The same page also mentions a KVM storage option: aio=native. Is this used in PVE?

dietmar · May 31, 2013

mir said:
As a side note: Red Hat claims 4-12% performance improvements for ext4 in RHEL6 over ext3 in RHEL5 (page 36).

benchmarks and performance claims ..... (will not comment on that).

mir said:
The same page also mentions a KVM storage option: aio=native. Is this used in PVE?

yes

dietmar · May 31, 2013

spirit said:
Barriers are not used anymore in recent Linux kernels (> 2.6.39), if I remember correctly, including the Red Hat 2.6.32 kernel.

Thanks for that link. Glad to see that developers finally see that performance using barriers is unusably bad (and this is exactly what pveperf shows?)
