Hi all,
The poor scores for ext4 in the pveperf benchmark have been haunting me for some time now, and a posting by Tom prompted me to do some serious digging into the problem. Here is what I found:
As of kernel 2.6.32 the design of ext4 has changed dramatically, favoring data safety over performance, which results in poor default behavior for ext4 where performance is concerned. The change boils down to the following commit to the Linux kernel:
http://git.kernel.org/cgit/linux/ke.../?id=5f3481e9a80c240f169b36ea886e2325b9aeb745
Story and background can be read here: http://www.postgresql.org/message-id/4B512D0D.4030909@2ndquadrant.com. Especially follow the links to phoronix.com.
What does this mean for pveperf on ext4?
pveperf measures fsyncs per second by counting how many fsyncs it can complete within a fixed time frame, which you would think is fine, wouldn't you? In the case of ext4, though, this is misleading: in its default configuration ext4 already performs an implicit fsync for each write, and since pveperf also issues an explicit fsync after each write, the benchmark ends up doubling the number of fsyncs made per write.
How should pveperf then do the test?
The proper way to measure fsyncs per second on ext4 would simply be to leave out the explicit fsync and just hammer the disk with write calls, sticking to the current behavior only if the file system is mounted with -o nobarrier.
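To make the two measuring modes concrete, here is a rough sketch in Python. It is not pveperf's actual code: the function name ops_per_second, the 512-byte write size and the one-second default window are all my own assumptions, picked only for illustration. It counts how many small writes complete within a time window, with or without an explicit fsync after each one.

```python
import os
import tempfile
import time


def ops_per_second(directory, duration=1.0, explicit_fsync=True):
    """Count small writes finished within `duration` seconds.

    Hypothetical helper for illustration only, not taken from pveperf.
    With explicit_fsync=True every write is followed by os.fsync();
    with explicit_fsync=False the writes are issued back to back.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    payload = b"x" * 512  # assumed write size
    count = 0
    try:
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            # Rewrite the same offset so the scratch file stays small.
            os.pwrite(fd, payload, 0)
            if explicit_fsync:
                os.fsync(fd)
            count += 1
    finally:
        os.close(fd)
        os.unlink(path)
    return count / duration
```

Calling ops_per_second(some_dir, explicit_fsync=True) corresponds to the current style of test, while explicit_fsync=False corresponds to the variant proposed above. The directory must sit on the file system you want to measure.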
Later this evening I will run some tests showing the difference between the current pveperf and a pveperf incorporating my ideas, against both ext3 and ext4. Stay tuned for the next episode of pveperf and ext4 ;-)