ZFS zvol vs directory

tjk

Member
May 3, 2021
Hi,

Is it possible to create a ZFS pool on a local node and use a directory on top of it to store the VMs, instead of using zvols? When I go to add a directory storage path, it doesn't show the ZFS pool that I created on the node.

I'm seeing host node loads of 120+ when doing heavy IO on a node with 6x enterprise U.2 NVMe drives in ZFS RAID10 using zvols. If I live-migrate another VM to this host node, it impacts all the VMs running on the node: 120+ load, zvol processes chewing up all the CPU, etc.

Crazy, because I run several dozen TrueNAS filers using ZFS with NFS presented to the host nodes, with the same sort of load, and I don't break a sweat on the TrueNAS servers. They are not running iSCSI/zvols, just ZFS with NFS.

It looks like zvols just can't keep up with the IO when running on the local node, and it crushes my nodes.
 
Yes, you can create a dataset and then create a directory storage pointing to the mountpoint of the dataset. But don't forget to run pvesm set YourDirStorageId --is_mountpoint yes after creating it, or you might run into problems when the mount fails, filling up your root filesystem.
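A minimal sketch of the commands, assuming a pool named tank and a hypothetical storage ID "vmdir":

Code:
# create a dataset on the existing pool
zfs create tank/vmdir
# add its mountpoint to PVE as a directory storage
pvesm add dir vmdir --path /tank/vmdir --content images,rootdir
# mark it as a mountpoint so PVE won't write into the empty directory if the mount fails
pvesm set vmdir --is_mountpoint yes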
And a directory storage will use qcow2 instead of raw, so that's CoW on top of CoW, which should result in even worse performance.

Maybe you should check whether you can optimize your pool. For example, check whether the SSDs are formatted for 512B or 4K sectors, whether the ZFS storage is using an 8K or 16K block size, whether you have enough RAM for your ARC, and whether you have set relatime=on for your pool.
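A sketch of how those points can be checked, assuming a pool named rpool; device and zvol names are examples, adjust to your layout:

Code:
# LBA format of an NVMe namespace (512B vs 4K)
nvme id-ns /dev/nvme0n1 -H | grep "LBA Format"
# pool ashift (12 = 4K sectors)
zpool get ashift rpool
# block size of one of the zvols (disk name is an example)
zfs get volblocksize rpool/data/vm-100-disk-0
# current ARC size and limits
awk '/^size|^c_min|^c_max/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats
# enable relatime for the pool
zfs set relatime=on rpool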
 
Maybe you should check whether you can optimize your pool. For example, check whether the SSDs are formatted for 512B or 4K sectors, whether the ZFS storage is using an 8K or 16K block size, whether you have enough RAM for your ARC, and whether you have set relatime=on for your pool.
Thanks, I've been fighting this on and off for almost a year, and everyone tells me it is a config issue. I'm willing to keep testing things, but I'm pretty sure I've exhausted my config options.

It is frustrating because, like I said, I can run the same write workloads over NFS to TrueNAS and the TrueNAS devices don't spike in load anywhere near what I see on the Proxmox node. And yes, I understand I'm also running VMs on top of the same node, but when 40-50 zvol processes show up in top and my load shoots up to 120-140, I think this is a local ZFS performance issue.

I'm not sure whether TrueNAS 13 Core (FreeBSD-based) is using the same OpenZFS codebase that Proxmox uses on Linux.

To answer your questions: the NVMe drives are 4K-formatted, I've used 8K, 12K, and 16K block sizes on the pool, I have 256 GB of RAM in the server and only run 3 or 4 VMs with 16 GB of RAM, and I have the ARC set to 8 GB min and 16 GB max. The only option you pointed out that I have not set is relatime; I will give that a try.
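For reference, ARC limits like that are typically pinned on PVE via modprobe options; the file contents below are illustrative (matching the 8G/16G values), not pasted from my node:

Code:
# /etc/modprobe.d/zfs.conf -- 8 GiB min, 16 GiB max
options zfs zfs_arc_min=8589934592
options zfs zfs_arc_max=17179869184
# apply with: update-initramfs -u (then reboot),
# or write the values to /sys/module/zfs/parameters/ at runtime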

Thank you for your reply. I am happy to post any info from the node, set any config options, etc., to work through this, if it is solvable.
 
A 12K block size isn't possible with ZFS; it must be a power of two (2^x). So only 8K or 16K are valid, and neither is a perfect fit (12K would be, but it isn't supported).

And PVE and TrueNAS both use OpenZFS of similar versions.
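If you want to try a different block size, it can be set per ZFS storage and only affects newly created disks, e.g. (storage ID is an example):

Code:
# zvols created on this storage after the change will use a 16K volblocksize
pvesm set local-zfs --blocksize 16k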
 
Ok, I set relatime=on on the pool.

1. ZERO VMs are running right now.
2. I am moving a disk from TrueNAS to the RAID10 6x NVMe ZFS pool on the local host node; I'm reading around 300MB/s from TrueNAS to the local node.
3. Local Proxmox load is 50+ and climbing; see the attached top capture (and the monitoring sketch below).
4. When I've done some of this testing with other VMs running on the same node, the load climbs to 120+ and the running VMs start to experience IO errors.
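A sketch of how this can be watched live while the move runs; the pool name is an example, and the kernel thread names depend on the ZFS version:

Code:
# load plus the ZFS/zvol kernel worker threads eating CPU
top -b -n 1 | grep -E 'zvol|z_wr|z_rd' | head -20
# per-vdev latency and throughput during the transfer
zpool iostat -vly tank 5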
 

Attachments

  • top.png (208.2 KB)
I'm so sorry for @tjk. I can feel your pain. This is really, really odd. I'm using ZFS on much, much weaker hardware (even on Atom-based systems with SATA enterprise SSDs) and get better performance. The only thing I don't see in this thread is trying out different kernel versions; maybe that could shed some light on the problem. Also try monitoring the I/O times with "ordinary" iostat to see whether one disk is much slower than the others. I had a similar problem once with spinning rust and identified it with iostat.
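For example (assuming the sysstat package for iostat):

Code:
# extended per-device statistics every 5 seconds; compare await and %util across the NVMe drives
iostat -xm 5
# and note the running kernel when comparing behaviour across versions
uname -r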
 
I'm so sorry for @tjk. I can feel your pain. This is really, really odd. I'm using ZFS on much, much weaker hardware (even on Atom-based systems with SATA enterprise SSDs) and get better performance. The only thing I don't see in this thread is trying out different kernel versions; maybe that could shed some light on the problem. Also try monitoring the I/O times with "ordinary" iostat to see whether one disk is much slower than the others. I had a similar problem once with spinning rust and identified it with iostat.
Thanks. The IO performance is about where it should be; my point is that ZFS overhead consumes so many CPU resources during heavy IO workloads running on the same box that it is causing problems.

I've tried various NVMe drives (SN630, SK Hynix, HGST SN200s, and Micron 9300 MAX Pros), ranging from 6x to 10x drives in RAID10, and all show the same performance.

For the folks claiming to run very high IO (write) workloads on their host nodes without seeing this, I'd like to see your performance/iostat numbers along with your top output.
 
For the folks claiming to run very high IO (write) workloads on their host nodes without seeing this, I'd like to see your performance/iostat numbers along with your top output.
What commands exactly? I can run them on my machines, which are on the lower end.
 
