EXT4 or not EXT4 (+ mount options)

TiME

New Member
Apr 25, 2014
I know this question has been asked a few times already and I have read through every topic in this forum I could find about it, but some are outdated or only contain inconclusive answers. The question is: Is it safe or even recommended to use EXT4 as the storage file system (in my case local SATA storage, RAID-10) and/or install Proxmox with the boot parameter "linux ext4"? If so, why isn't it the default yet? I mean, have there been any recent reports about bugs with Proxmox and ext4, or is it just to stay "as stable as possible"?

My second question is about the mount options for local ext4 storage on a SATA RAID-10 array with BBU (KVM images only): I noticed that it's possible to gain quite a huge performance boost from using "defaults,noatime,nodiratime,data=writeback,nobarrier,nobh,commit=10,nodelalloc" with the setup I mentioned (especially "nodelalloc" increased the fsyncs a lot, and in addition I use "blockdev --setra 512 /dev/mapper/pve-data"). I came up with these settings after a while of research and I'm almost certain that they would work great on a Proxmox node, but I'm not absolutely sure about how secure and stable they are in a production environment. Would anyone be willing to try these settings in a similar testing environment or elaborate on any issues that might occur with these mount options?
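For reference, here is roughly how I apply this on my test node, assuming /dev/mapper/pve-data is mounted at /var/lib/vz as on a default Proxmox VE install (a sketch of my own setup, not a recommendation):

  # /etc/fstab entry with the tuned mount options
  /dev/mapper/pve-data /var/lib/vz ext4 defaults,noatime,nodiratime,data=writeback,nobarrier,nobh,commit=10,nodelalloc 0 2
  # read-ahead of 512 sectors (256 KiB); not persistent, so re-run after each boot
  blockdev --setra 512 /dev/mapper/pve-data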
 
Both nobarrier and writeback can be dangerous for your data and normally would not be recommended in production.

*If* your SATA controller performs no caching - i.e. it writes directly to disk - then nobarrier should be safe.

data=writeback will always be dangerous: the system reports writes as completed before they are written to disk. In the event of a system crash or power failure there is a good chance of data loss.


ext4 is quite stable, I would be fine with using it for data storage and system. If you are using an SSD for the system, then definitely use ext4, as it is much better for SSDs.
 
The question is: Is it safe or even recommended to use EXT4 as the storage file system (in my case local SATA storage, RAID-10) and/or install Proxmox with the boot parameter "linux ext4"?

We have been installing Proxmox with "linux ext4" for several years now on all nodes in our cluster, over hardware RAID10 (without battery protection). Ext4 works very reliably and has survived many kernel panics and unplanned cold resets without any data corruption.

It also provides several advantages over ext3 that we observed:
- it is considerably faster during normal operation, especially on slower disk subsystems (single disk, mirror)
http://www.phoronix.com/scan.php?page=article&item=ext4_benchmarks&num=3
- it is orders of magnitude faster when doing recovery or filesystem-check (fsck) which can mean only minutes of downtime (instead of hours) after an unclean restart
http://en.wikipedia.org/wiki/File:E2fsck-uninit.svg
- much faster with large number of files and directories (speeding up vzdump / vzmigrate in our tests)
- much less fragmentation by design + online defragmentation
- larger maximum filesystem and file sizes

My second question is about the mount options for local ext4 storage on a SATA RAID-10 array with BBU (KVM images only): I noticed that it's possible to gain quite a huge performance boost from using "defaults,noatime,nodiratime,data=writeback,nobarrier,nobh,commit=10,nodelalloc" with the setup I mentioned (especially "nodelalloc" increased the fsyncs a lot, and in addition I use "blockdev --setra 512 /dev/mapper/pve-data"). I came up with these settings after a while of research and I'm almost certain that they would work great on a Proxmox node, but I'm not absolutely sure about how secure and stable they are in a production environment. Would anyone be willing to try these settings in a similar testing environment or elaborate on any issues that might occur with these mount options?

We have also tested most of these mount options, but since the journal itself is the biggest factor in the reliability of ext4, we decided against using them, eventually keeping only noatime,nodiratime. For testing we used bonnie++, as the fsync/s number reported by pveperf is far from being a comprehensive I/O benchmark.
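For anyone who wants to reproduce our tests, a typical bonnie++ invocation looks roughly like this (the directory and sizes are examples; -s should be at least twice your RAM so the page cache doesn't skew the results):

  # -d test directory, -s file size for sequential I/O, -n small-file count (x1024), -u run as user
  bonnie++ -d /var/lib/vz/bonnie -s 16g -n 128 -u root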

data=writeback
According to the documentation, using data=writeback may lead to correctly written files that contain obsolete data (only metadata is written, file contents are not) after a crash, which would be unacceptable for us:
http://unix.stackexchange.com/quest...journal-vs-data-writeback-in-ext4-file-system

nobarrier
Nobarrier is also dangerous, especially if your RAID controller has no BBU and/or your server has no UPS:
http://www.phoronix.com/scan.php?page=article&item=ext4_linux35_tuning&num=1

nodelalloc
On the other hand, nodelalloc may be useful for data integrity (it may increase performance on many-disk arrays but decrease it on single disks and mirrors; we never tested that):
http://www.pointsoftware.ch/en/4-ext4-vs-ext3-filesystem-and-why-delayed-allocation-is-bad/
http://www.phoronix.com/scan.php?page=article&item=ext4_linux35_tuning&num=2

read-ahead
Setting read-ahead should be done carefully, as the right value depends very much on your RAID stripe size and also on your I/O profile (how sequential your average I/O operation is, etc.), so running bonnie++ with many candidate settings is very useful before settling on a value. We didn't find a big effect on a 6-disk hardware RAID10 compared to the default, but on mdraid it may prove useful.
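For example, a tuning pass could check the current value, set a candidate, and re-run the benchmark (device name as in your setup):

  # read-ahead is expressed in 512-byte sectors; 512 sectors = 256 KiB
  blockdev --getra /dev/mapper/pve-data
  blockdev --setra 512 /dev/mapper/pve-data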

mdraid vs. hardware RAID
Not long ago the Proxmox kernel wasn't as stable as it is now, and a kernel panic and subsequent cold reset on a software RAID system did cause data corruption and even RAID collapse for us, even though our servers were UPS protected. Therefore, if data integrity is important to you, you SHOULD use a hardware RAID controller that doesn't go down even when your kernel does. Together with a journaling filesystem and a UPS, it will provide data integrity even if your server crashes 5 times a day, because the controller and the disks will always complete the writes they received from the OS, and the filesystem recovery will preserve atomicity when replaying the journal - that's why a BBU is not really necessary if journaling is functioning properly.


Summary
To summarize it all: you should use ext4, it's safe and reliable, but messing with the default journaling options may be bad for data integrity, especially on software RAID or single disks. We chose data integrity over maximum performance, but on an affordable Adaptec 6805E controller with a 6-disk RAID10, pveperf gives us 700 MB/s sequential and 3000 fsyncs/s random with only the noatime mount option, so no complaints here.
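For reference, those numbers come from pointing pveperf at the storage mount point:

  # benchmarks CPU, buffered reads and fsyncs/s on the given path (defaults to /)
  pveperf /var/lib/vz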
 
Thanks for your response!

Both nobarrier and writeback can be dangerous for your data and normally would not be recommended in production.
*If* your SATA controller performs no caching - i.e. it writes directly to disk - then nobarrier should be safe.
data=writeback will always be dangerous: the system reports writes as completed before they are written to disk. In the event of a system crash or power failure there is a good chance of data loss.
As stated in my OP, the RAID controllers I use have battery backup units, so the write cache should be written to disk even in case of a power loss/PSU failure. Am I right in assuming that this makes it fine to use writeback and nobarrier, or would either of those still be a concern?

ext4 is quite stable, I would be fine with using it for data storage and system. If you are using an SSD for the system, then definitely use ext4, as it is much better for SSDs.
I don't use SSDs, otherwise I would of course use ext4 without much questioning to make sure I can use the drives for more than a few months. I also assume that ext4 can be considered stable, because many distros already use it as their default filesystem, but then I wonder why it isn't the default filesystem in Proxmox VE? There surely must be a reason for that? Or maybe the reason is that VE 3 was released before ext4 was stable enough and the Proxmox team didn't want to change the default filesystem with minor versions like 3.3, even if it is stable enough now? Just trying to figure this out. :)
 
As stated in my OP, the RAID controllers I use have battery backup units, so the write cache should be written to disk even in case of a power loss/PSU failure. Am I right in assuming that this makes it fine to use writeback and nobarrier, or would either of those still be a concern?

If I understand correctly, the problem with nobarrier/data=writeback could still manifest itself if your kernel crashes at a bad time: during a file overwrite the metadata could already have been written out while the file itself is not yet overwritten with the new data, so the journal does not properly represent the actual state. One of my links above has an extensive writeup about that.
 
As stated in my OP, the RAID controllers I use have battery backup units, so the write cache should be written to disk even in case of a power loss/PSU failure. Am I right in assuming that this makes it fine to use writeback and nobarrier, or would either of those still be a concern?
No, you are not safe even if you use a hardware RAID controller with BBU. Writeback for the journal means that the filesystem informs the kernel that data is persistently written to storage as soon as the metadata is committed to the journal, i.e. the filesystem does not wait for a commit OK from the controller for the actual data.
 
Ok, I was somehow under the impression that BBUs would "fix" the dangers of data=writeback, but apparently I was wrong. But with BBUs it should still be safe to disable barriers, right? I mean at least the official documentation says so.
 
We have been installing Proxmox with "linux ext4" for several years now on all nodes in our cluster, over hardware RAID10 (without battery protection). Ext4 works very reliably and has survived many kernel panics and unplanned cold resets without any data corruption.

It also provides several advantages over ext3 that we observed:
- it is considerably faster during normal operation, especially on slower disk subsystems (single disk, mirror)
http://www.phoronix.com/scan.php?page=article&item=ext4_benchmarks&num=3
- it is orders of magnitude faster when doing recovery or filesystem-check (fsck) which can mean only minutes of downtime (instead of hours) after an unclean restart
http://en.wikipedia.org/wiki/File:E2fsck-uninit.svg
- much faster with large number of files and directories (speeding up vzdump / vzmigrate in our tests)
- much less fragmentation by design + online defragmentation
- larger maximum filesystem and file sizes
That's exactly what I needed - thanks a lot! I also assumed that it has to be stable enough if that many distros use it as their default filesystem, but I was just wondering why exactly Proxmox VE doesn't use it by default, and I found some (older) reports in the forums that it would cause issues with snapshots or live backups or something. But if you have used it successfully for years, I'm sure it will be stable enough for my purpose as well.

About the rest of your post: Thanks for that too - very useful. So at least in production, or if my data is important to me, I should forget about data=writeback, but nobarrier as well as nodelalloc should be fine (at least data-safety-wise) with BBUs? I'll probably stay close to the default mount options then, except for disabling access times of course, and probably nobarrier and nodelalloc if those are safe.

PS: And by the way I'm using HP Smart Array controllers.
 
nobarrier is completely safe with or without hardware RAID with BBU. The current version of ext4 has taken other measures which make nobarrier as safe on ext4 as it is on ext3, and since Proxmox defaults to nobarrier with ext3, you get equal safety.

For nodelalloc the good part is that your filesystem will benefit from both better safety and higher performance at the same time. A win-win situation ;)
 
So to summarize, nobarrier and nodelalloc are the way to go then, as long as BBUs are present one way or another, and it's also fine to use ext4 with Proxmox VE to make use of its benefits. Thank you very much for the help guys, really appreciated.

Note to myself and other readers: To get the best possible performance without risking data integrity, type "linux ext4" at the boot prompt when installing Proxmox VE, or later convert the /dev/mapper/pve-data LVM volume to ext4 using tune2fs (don't forget to run fsck on it afterwards) and mount that partition with the mount options "defaults,noatime,nodiratime,nobarrier,nodelalloc" (and probably "commit=10") in /etc/fstab, if you use a RAID array with battery backup.
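A sketch of what I mean by that conversion, assuming /dev/mapper/pve-data currently holds ext3 and is unmounted first (try it on expendable data before touching a production node):

  umount /var/lib/vz
  # enable the main ext4 features on the existing ext3 filesystem
  tune2fs -O extents,uninit_bg,dir_index /dev/mapper/pve-data
  # mandatory fsck after changing features (-f force check, -D optimize directories)
  e2fsck -fD /dev/mapper/pve-data
  # then update /etc/fstab, e.g.:
  /dev/mapper/pve-data /var/lib/vz ext4 defaults,noatime,nodiratime,nobarrier,nodelalloc,commit=10 0 2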
 
nobarrier is completely safe with or without hardware RAID with BBU. The current version of ext4 has taken other measures which make nobarrier as safe on ext4 as it is on ext3, and since Proxmox defaults to nobarrier with ext3, you get equal safety.

According to the RHEL documentation (and other sources), the performance penalty of write barriers is a negligible 3%, so I don't recommend turning them off.

For nodelalloc the good part is that your filesystem will benefit from both better safety and higher performance at the same time. A win-win situation ;)

This is not entirely true; there are reported cases where disabling delayed allocation actually decreases performance, especially on single-disk or single-SSD storage.
http://comments.gmane.org/gmane.comp.file-systems.ext4/37924
http://www.phoronix.com/scan.php?page=article&item=ext4_linux35_tuning&num=3

According to some other sources, the nodelalloc mount option was only kept for ext3 compatibility; ext4 contains code that works around delayed allocation to ensure data integrity, so it's not advised to use it at all.


But it is a heavily debated topic:
http://www.pointsoftware.ch/en/4-ext4-vs-ext3-filesystem-and-why-delayed-allocation-is-bad/
 
but I was just wondering why exactly Proxmox VE doesn't use it by default

It's a good question; ext4 is well out of its dev phase and is the default for most distros now. And given how common it is to use an SSD for boot these days, ext3 as a default is positively dangerous.
 
And seeing as we are discussing ext4 :)

ext4 or xfs for VM image storage? I've been testing a lot lately and ext4 just seems to give better read/write performance, both sustained and random - especially with an external SSD journal. This is with large spindles, no RAID (3TB WD Red).

You can use an external journal with xfs too, but the options and tools for it are so limited that I find the thought quite risky.
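For comparison, the ext4 external-journal setup is a minimal two-step process (the device names below are just examples):

  # create a dedicated journal device on the SSD partition
  mke2fs -O journal_dev /dev/sdb1
  # create the ext4 filesystem on the spindle, attached to that journal
  mkfs.ext4 -J device=/dev/sdb1 /dev/sda1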


I'm testing in a Gluster setup, where xfs is heavily recommended, but I'm wondering if that's because of its performance with large numbers of small files, which is not an issue for image storage.
 
I don't know about the RHEL documentation, but "nobarrier" makes a huge difference (much worse if enabled) if you test performance with pveperf (fsync/sec drops dramatically). Any idea?
 
I don't know about the RHEL documentation, but "nobarrier" makes a huge difference (much worse if enabled) if you test performance with pveperf (fsync/sec drops dramatically). Any idea?
Do you mean much worse if barrier=1? If that's the case, I can confirm that performance increases by a factor of 5-10 when applying the mount option barrier=0.
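A quick way to compare the two on a mounted filesystem, assuming your storage is at /var/lib/vz:

  # barriers off, then on, running pveperf after each remount
  mount -o remount,barrier=0 /var/lib/vz && pveperf /var/lib/vz
  mount -o remount,barrier=1 /var/lib/vz && pveperf /var/lib/vz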
 
What you are referring to as an ext4 problem is actually an md problem. So no, you did not have problems with ext4; you had problems with md.
 
Can you prove it? We have quite a lot of software RAIDs, and we experienced problems only with ext4 on them, when sparse qcow2 files were expanded greatly.
 
