Best file system for two nvme drives?

Nicolas Simond

Good morning,

Currently, my personal Proxmox runs on one NVMe drive for the system and my VMs are on two SATA SSDs in XFS.

I'm running out of space and I'm planning to replace the single NVMe system drive and the two SSDs with two FireCuda 530s in RAID 1.
What is the best way to get the most performance out of these drives?

Intel integrated RAID? XFS? Btrfs?

I've read on some forums that XFS doesn't have good performance, especially with NVMe drives. Is that true?
Besides this, is it recommended to have PVE installed directly on XFS with the VMs on the same disk?

Thanks for your feedback,
 
Are you sure you are talking about XFS and not ZFS?
If you are talking about ZFS, then yes, performance won't be the best. First, those FireCudas lack the PLP (power-loss protection) that is recommended for ZFS and server workloads; without it, performance will drop to HDD levels and wear will be high once sync writes are involved (and ZFS does a lot of them). Second, ZFS is about data integrity and features. All those additional data integrity checks will cost you performance.
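If you want to see the sync-write penalty for yourself, a short fio run that forces sync writes usually makes it very visible; this is just a sketch, the filename and size are placeholders:

Code:
fio --name=syncwrite --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --size=1G --filename=/path/to/testfile

A drive with PLP can keep these writes in its DRAM cache and stay fast; a consumer drive has to flush every write to NAND and drops to a small fraction of its cached write speed.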

Not sure if it got fixed in the meantime, but in the past, running btrfs in a 2-disk mirror was problematic once a disk failed.

Integrated raid is usually not a great option as it combines the downsides of software raid and hardware raid without the benefits. ;)

The best performing option would probably be an mdadm RAID1, but that isn't officially supported, and if you want to use it as a boot drive you would have to install Debian 12 with PVE on top instead of using the PVE ISO: https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm
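Just as a rough sketch of what the mdadm route looks like once Debian is installed (device and array names are examples, not a full guide):

Code:
# mirror two NVMe partitions and persist the array config
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p3 /dev/nvme1n1p3
mkfs.ext4 /dev/md0
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u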
 
Are you sure you are talking about XFS and not ZFS?
If you are talking about ZFS, then yes, performance won't be the best. First, those FireCudas lack the PLP (power-loss protection) that is recommended for ZFS and server workloads; without it, performance will drop to HDD levels and wear will be high once sync writes are involved (and ZFS does a lot of them). Second, ZFS is about data integrity and features. All those additional data integrity checks will cost you performance.

Not sure if it got fixed in the meantime, but in the past, running btrfs in a 2-disk mirror was problematic once a disk failed.
Yep, typo here, I meant ZFS.

Power loss protection doesn't matter, as I have an enterprise-grade UPS that lasts 1h+, with NUT configured on the Proxmox host to shut everything down in the event of a power loss.
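For reference, the shutdown part of such a NUT setup on the PVE host boils down to something like this (UPS name, driver and credentials are just placeholders):

Code:
# /etc/nut/ups.conf
[myups]
    driver = usbhid-ups
    port = auto

# /etc/nut/upsmon.conf
MONITOR myups@localhost 1 upsmon secret master
SHUTDOWNCMD "/sbin/shutdown -h +0"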
 
Power loss protection doesn't matter, as I have an enterprise-grade UPS that lasts 1h+, with NUT configured on the Proxmox host to shut everything down in the event of a power loss.
Without PLP sync writes cannot be cached and optimized by the drives (and this causes wear and slowness with ZFS). A UPS does not change that fact (but is a good idea anyway).
 
Power loss protection doesn't matter, as I have an enterprise-grade UPS that lasts 1h+, with NUT configured on the Proxmox host to shut everything down in the event of a power loss.
No, it matters. Consumer/prosumer SSDs without PLP can't use their DRAM cache for sync writes, resulting in orders of magnitude less performance and higher wear when doing them. Having a UPS doesn't help with SSD performance, as the SSD's firmware doesn't know whether there is a redundant PSU and a UPS behind it or nothing at all. All the SSD's firmware cares about is whether the SSD has its own backup power built in, so the volatile DRAM cache can be quickly dumped into the SLC cache once a power outage is detected, turning that DRAM into a kind of non-volatile memory and therefore allowing it to be used for important sync writes.
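If you want to confirm that sync writes are what is hurting you, you can temporarily compare against a test dataset with sync disabled (benchmarking only, never leave it like this for real data; pool/dataset names are examples):

Code:
zfs create rpool/synctest
zfs set sync=disabled rpool/synctest
# run the same fio job against /rpool/synctest, then revert:
zfs set sync=standard rpool/synctest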
 
You could consider buying 2 or maybe 4 used enterprise drives with PLP. You won't get NVMe drives of this class for cheap, but you can get solid SSDs. With 4 drives, a striped ZFS mirror would give you the best performance. As @Dunuin wrote, an alternative would be an mdadm mirror with a Debian 12 base install and PVE on top of it.
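For reference, such a striped mirror (RAID10-style) would be created roughly like this; device names are placeholders, in practice you would use /dev/disk/by-id paths:

Code:
zpool create -o ashift=12 tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd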
 
I don't need the "best" performance and I don't need 4 drives, I just wanted to know if the old big drops in ZFS performance still exist.
ZFS has been running just fine on two Samsung 850 Evos for years without any optimization, despite Proxmox indicating a wearout level of 99% for years.
 
Another way is ext4 (and LVM-thin for the VMs) + daily backups with PBS.
Plus some third-party backup tool to back up the PVE host itself, as PVE is lacking the ability to back itself up or export its configs (yet).
But proper backups should be done anyway, no matter if RAID1 or single disks are used. RAID1 simply saves a lot of work and downtime.
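Until PVE can do that itself, a simple workaround is a small cron job that tars the relevant config directories to your backup storage; paths are examples, adjust to whatever you actually changed:

Code:
tar czf /mnt/backup/pve-host-config-$(date +%F).tar.gz \
    /etc/pve /etc/network/interfaces /etc/hosts /etc/fstab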
 
@Nicolas Simond As a data point, there are a lot of super cheap SAS SSDs on eBay. The 400GB ones are especially plentiful, have tonnes of endurance, and play nicely in arrays. Drives like these are all about US$20 each and would likely have 90%+ of their endurance left:
Here are some 800GB ones, for about US$45 each:
 
@Nicolas Simond As a data point, there are a lot of super cheap SAS SSDs on eBay. The 400GB ones are especially plentiful, have tonnes of endurance, and play nicely in arrays. Drives like these are all about US$20 each and would likely have 90%+ of their endurance left:
Here are some 800GB ones, for about US$45 each:
I don't mind buying new disks, and I don't have a SAS interface anyway.

There is no point giving advice nobody asked for without knowing the underlying hardware and what the project is. We are talking about NVMe, not buying used SAS disks on eBay.

Edit: My two 840 Evos that have been running XFS and my VMs for years currently have more than 240TB written each, and they still run at full speed without any problem.
 
I don't have any problems here, that's the point.

I just wondered if the old issues I've read about on the internet regarding NVMe drives and ZFS (and Proxmox), like the one here (https://www.reddit.com/r/zfs/comments/112v7n9/terrible_performance_loss_on_nvme_drives_with_zfs/), were still a thing.

Since the 5th reply, everything has been off-topic.
But it's not just a ZFS thing. No matter what filesystem you use, once you hit a consumer/prosumer SSD with server workloads (databases, continuous writes, ...) it will suck. ZFS just makes this even worse because of the massive overhead.
 
I just wondered if the old issues I've read about on the internet regarding NVMe drives and ZFS (and Proxmox), like the one here (https://www.reddit.com/r/zfs/comments/112v7n9/terrible_performance_loss_on_nvme_drives_with_zfs/), were still a thing.
Interesting link. There's a detailed response to it which makes it look like the initial results were from a really bad misconfiguration, but there's no further follow-up afterwards with updated tests and results. :(
 
Interesting link. There's a detailed response to it which makes it look like the initial results were from a really bad misconfiguration, but there's no further follow-up afterwards with updated tests and results. :(
I've done some tests from my side with one disk right now:

Command used:
Code:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=64 --size=15G --readwrite=randrw --rwmixread=75



BTRFS - default settings

Code:
Run status group 0 (all jobs):
   READ: bw=3054MiB/s (3202MB/s), 3054MiB/s-3054MiB/s (3202MB/s-3202MB/s), io=11.2GiB (12.0GB), run=3749-3749msec
  WRITE: bw=1043MiB/s (1094MB/s), 1043MiB/s-1043MiB/s (1094MB/s-1094MB/s), io=3912MiB (4102MB), run=3749-3749msec

ZFS - no compression - ashift=13 (default is 12)

Code:
Run status group 0 (all jobs):
   READ: bw=1550MiB/s (1625MB/s), 1550MiB/s-1550MiB/s (1625MB/s-1625MB/s), io=11.2GiB (12.0GB), run=7387-7387msec
  WRITE: bw=530MiB/s (555MB/s), 530MiB/s-530MiB/s (555MB/s-555MB/s), io=3912MiB (4102MB), run=7387-7387msec


Results with EXT4 are almost the same as with BTRFS.
I think I will go with BTRFS for the reinstallation of this server, so I can test this file system for the first time, since ZFS doesn't perform well with the FireCuda 530s.

It's even worse with the default ashift of 12; I don't even get a third of the speed I get with ashift=13.
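In case it helps anyone following along, the two-disk btrfs RAID1 for the reinstall would be created roughly like this (device names are examples), mirroring both data and metadata:

Code:
mkfs.btrfs -m raid1 -d raid1 /dev/nvme0n1 /dev/nvme1n1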
 
It's even worse with the default ashift of 12; I don't even get a third of the speed I get with ashift=13.
Cool. That sounds like the threshold mentioned in that issue. Did you try with ashift=14 (or higher) as well, just to see where the upper limit is?
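If you want to probe that, recreating a throwaway test pool at different ashift values only takes a minute; a sketch, with pool/device names as placeholders:

Code:
for a in 12 13 14; do
    zpool create -f -o ashift=$a testpool /dev/nvme0n1
    zdb -C testpool | grep ashift   # confirm the value actually in use
    # run the same fio job against /testpool here
    zpool destroy testpool
done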
 
