Epyc 7402P, 256GB RAM, NVMe SSDs, ZFS = terrible performance

tsoo

New Member
Jan 6, 2020
Hello,
I've been struggling with this for weeks, which is why I'm now opening this thread.

I've bought a new Supermicro server recently with the following configuration:
- AMD EPYC 7402P 24c/48t CPU
- 256GB ECC RAM
- 2x Samsung EVO 1TB M.2 NVMe SSD (onboard)
- 2x Samsung PM983 2.5" U.2 NVMe SSD
- 2x 1TB 2.5" SATA HDD

What I would like to have configuration wise (feel free to suggest something better):
- Proxmox 6.1
- ZFS Raid 1 on the Samsung EVOs for VM storage as well as running Proxmox on them
- ZFS Raid 1 on the Samsung PM983s for VM storage
- ZFS Raid 1 on the HDDs for VMs that don't need much IO, as well as for log storage, so as not to wear out the SSDs quite as fast

What I've done:
- I've set up the config above with the Proxmox default settings (8k volblocksize, ashift=12, lz4 compression; verified with the commands below)
- Installed a test Windows Server 2019 VM (150GB SCSI, VirtIO NIC, 6 Cores, 4GB RAM)
- Copied a large test file (~6GB) to this VM
- Then made a local copy of this file inside the VM
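
For reference, this is how I checked those settings after the install (pool and dataset names are from my setup with the default rpool layout and may differ on other systems):

# pool-wide sector size setting
zpool get ashift rpool
# block size and compression of the VM's zvol
zfs get volblocksize,compression rpool/data/vm-100-disk-0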

The issue I encounter:
- While it copies this file, the host machine goes up to almost 100% CPU on all cores.
- Over the span of some minutes, memory consumption on the host goes to 130+GB with only one VM running. This can be accelerated by repeating the copy operation in the VM.
- Transfer speeds for this copy operation are ~300-600MB/s. I know the EVOs are not enterprise SSDs, but they are rated for 3+GB/s and more or less deliver those speeds if I install plain Windows on the host. Also, it's not a "steady" copy: it does 600MB/s, then 0, then 350, and so on until it finishes.
- atop shows 100% busy time on the SSDs in question
- During the copy, there are many threads of the "zvol" process, which seem to lock up the system.

What I've checked/tried:
- Lowering the max. zvol thread count to 8. Not good; even worse performance.
- Limiting the amount of RAM ZFS can use for caching (the ARC). Didn't help either (see the commands after this list).
- Changing the block size to 128k (also shown below). This actually helped: CPU "only" at around 50%, transfer speeds around 1GB/s, memory consumption still high, but if it gets released when VMs need it, that's fine for me. The speeds are still far from what they should be, though, which leads me to believe that it's indeed a configuration issue.
- Tried LVM instead of ZFS. Works as intended: fast, no CPU issues during the copy, normal RAM consumption. But I'd like to set up a supported ZFS Raid 1 instead of LVM/mdraid.
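
Concretely, the ARC limit and block size changes looked roughly like this (the 8GiB value and the storage name local-zfs are just examples):

# cap the ARC at 8GiB on the running system (value in bytes)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
# make the limit persistent across reboots
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u
# switch the zfspool storage to 128k; this only affects newly
# created disks, existing zvols keep their volblocksize
pvesm set local-zfs --blocksize 128k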

I would be very thankful if someone could give me some ideas as to what the cause could be and, ideally, how to make this server perform as it should :)
Thanks!

Regards,
tsoo
 
Some things I would try:
Install some monitoring tool, especially to see how much RAM is going towards ZFS's ARC (the in-RAM cache) and to get some stats from the disks (avg write delay, queue length, ...), but other system stats might give insight as well.
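
For example (the ZFS userland ships ready-made reporters; tool names can vary slightly between versions):

# ARC size, hit rates and tunables
arc_summary
# current ARC size straight from the kernel stats (bytes)
awk '/^size/ {print $3}' /proc/spl/kstat/zfs/arcstats
# per-disk utilization and latency, refreshed every second
iostat -x 1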

How did you configure the disks of the VM? SCSI with the VirtIO SCSI controller?
Can you post the config of the VM? (qm config <vmid>)

Enabling IO threads for the VM's disks might help as well, for example:
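
Something like this (VM ID, storage and disk names are placeholders; iothread only takes effect with the virtio-scsi-single controller):

qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 local-zfs:vm-100-disk-0,iothread=1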

To monitor how ZFS is doing, you can look at (and maybe log) zpool iostat -v -n 1.
 
Too bad, but I guess you need to use the machine in production and cannot tinker around for too long.

It would have been interesting to know what the cause was. I doubt it was the same problem that LTT encountered in that video, because you are far from having that many really fast SSDs. There was another thread here in the forum where one user ran into the same problem, but that was with 14 enterprise SSDs.

https://forum.proxmox.com/threads/clone-vm-on-zfs-raid10-very-slow.64332
 
Too bad, but I guess you need to use the machine in production and cannot tinker around for too long.

Correct.


I doubt it was the same problem that LTT encountered in that video, because you are far from having that many really fast SSDs. There was another thread here in the forum where one user ran into the same problem, but that was with 14 enterprise SSDs.
Which makes me wonder whether it is actually related to the quality/grade of the SSDs, since various configurations seem to have similar issues. Not saying they all have the same root cause, but it's strange nonetheless.
 
Hi,
sure. It's a Supermicro H12SSW-NT.
Hello.

I've got the same motherboard and the same CPU. I bought 2x Corsair PCIe 4.0 M.2 SSDs, which turned out not to be a perfect match for the 2x M.2 slots on this motherboard, even if it's OK.

Reading the manual, I discovered:

1) M.2 interface: 2x PCI-E 3.0 <-- not PCIe 4.0
2) "These particular PCI-E M.2 supports M-Key (PCI-E x2) storage cards. M.2-C1 can support a speed of PCI-E x4, when one M.2 device is installed"


I will now order 2x Samsung PM1733 with the U.2 interface; I hope it gives me insane speed :)
 
Interesting... I have an almost identical configuration and pretty much the same problems. One of my VMs monitors CCTV IP cameras, so it's writing constantly, and the culprit definitely appears to be ZFS. I'm going to install hardware RAID for the SSDs, re-install Proxmox 6.2 on the server, restore the VMs, and see if that helps. The slow performance is definitely related to disk I/O, and I have super fast SSDs on this Supermicro mobo. But alas, after every attempt to configure within the setup, the performance is terrible. I'll post back with the results after moving to hardware RAID, but I'm glad to have found this thread. I was sure I wasn't alone here...
 
I have similar issues, any updates on this?
I got two of my PVE boxes fixed, but I left ZFS and went back to hardware RAID cards (Dell H700s), and all problems went away immediately. I also noticed a massive improvement based on the types of HDDs being used: SAS 15K drives with ZFS were bearable, although not optimal. The same drive arrays with LVM work flawlessly. Those HW RAID cards are cheap on eBay, BTW.

Myles
 
Correct.



Which makes me wonder whether it is actually related to the quality/grade of the SSDs, since various configurations seem to have similar issues. Not saying they all have the same root cause, but it's strange nonetheless.
We use EVO SSDs in large numbers and there is no specific speed problem with them.

You just cannot use them in a host with high IO. I've seen EVOs going to 99% wearout within one month under certain workloads (mainly DBs); others run for years without problems. And of course there are fast drives around.
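
If anyone wants to check their own drives, the wearout shows up in SMART (device names are examples):

# NVMe: look for the "Percentage Used" line
smartctl -a /dev/nvme0
# SATA Samsung: attribute 177 Wear_Leveling_Count counts down from 100
smartctl -A /dev/sda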
 
We use EVO SSDs in large numbers and there is no specific speed problem with them.

You just cannot use them in a host with high IO. I've seen EVOs going to 99% wearout within one month under certain workloads (mainly DBs); others run for years without problems. And of course there are fast drives around.
Samsung EVOs have terrible sync write performance, so you should really avoid them for ZFS or Ceph (for their journals).

https://www.sebastien-han.fr/blog/2...-if-your-ssd-is-suitable-as-a-journal-device/

2MB/s for 4k writes, 500 IOPS...

So for read workloads, why not, but for any write workload (databases, ...) you should really avoid them. (And endurance is pretty bad too.)
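
The test from that article boils down to a single 4k sync write job, which can be reproduced with fio (file path and size are examples; don't point it at a device or file holding data you care about):

fio --name=journal-test --filename=/tmp/fio-test --size=1G \
    --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based

Enterprise SSDs with power-loss protection sustain tens of thousands of IOPS in this kind of test; the EVOs collapse to the few hundred quoted above.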
 
I used ZFS in the past for a while and it was a huge PITA. I love the theory, but in practice it performs poorly for intensive applications and causes a lot of headaches. On the flip side, I have an HP server with 4x NVMe in RAID 10 on mdadm/LVM/ext4 and had major corruption which eventually destroyed the data completely. Still working on it.
 
ZFS offers very much and I'm a big fan of it.

But no one can deny that ZFS is, in most cases, even with block tuning and whatever else, much slower than anything else (LVM etc.).
Especially on consumer disks it's a very big hit.

So my main recommendation would always be:
- If you only need snapshots, you'll probably do better with LVM; even shadow copies work with LVM (see the example after this list)...
- If you have enterprise SSDs, the ARC and other benefits of ZFS generally outweigh the downsides, so go with ZFS.
- If you need anything that LVM doesn't provide, go with ZFS.
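
On Proxmox both backends expose snapshots the same way, e.g. (VM ID and snapshot name are just examples):

qm snapshot 100 pre-upgrade    # take a snapshot
qm rollback 100 pre-upgrade    # roll back to it
qm delsnapshot 100 pre-upgrade # remove it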

Btrfs is probably worth a look, but I personally don't have long-term experience with it.
Btrfs offers almost the same feature set as ZFS, but probably with more performance. (That's what I think, but I haven't compared or tested it.)

However, don't take this as "ZFS is worse" etc... ZFS has the ARC and caching and HA and 500 more things; it's amazing.
I'm just trying to say that you shouldn't forget there are other options that might fit your needs better.
Simply try it. We are on Proxmox, where everything is supported; that's the big strength.

Edit: I forgot about XFS. I used it with CentOS for a long time, long ago; performance is great and XFS has many of the most important features as well.
But as a direct competitor to ZFS, there is really only Btrfs.

Cheers.
 