Apply ZFS xattr=sa dnodesize=auto on existing pool with data.

Jota V.

Hi, we have a cluster with five nodes. All nodes have Proxmox installed on an SSD and 4 x 2 TB SATA drives (3.5" 7200 RPM) in ZFS RAID 10. All nodes have between 90 GB and 144 GB of RAM.

On nodes 1 to 4, we have about 30-40 LXC containers with Moodle on each node. All databases are on an external server.

Each LXC container is replicated once to another node: all containers on node 1 are replicated to node 3 (and vice versa), and all containers on node 2 are replicated to node 4 (and vice versa). Replication runs every 30 minutes.

We are seeing that running "du -sh /var/www/moodledata" inside an LXC container, or on the directory pool, takes minutes!! (The data is between 50 and 200 GB.)

Looking at the wiki, we found this: https://pve.proxmox.com/wiki/ZFS:_Tips_and_Tricks#LXC_with_ACL_on_ZFS

By default, ZFS stores ACLs in hidden files on the filesystem. This reduces performance enormously, and with several thousand files a system can feel unresponsive. Storing the xattr in the inode resolves this performance issue.

Code:
zfs set xattr=sa dnodesize=auto vmstore/data

We applied this to rpool/data, but the pool already has all our data on it.
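
As far as I understand, xattr=sa and dnodesize=auto only affect newly written files, so checking that the properties are now set on the dataset is easy enough, but the existing data is untouched:

Code:
# verify the new property values on the dataset and its children
zfs get -r xattr,dnodesize rpool/data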

How can we apply this change to the existing data? Maybe replicate each container to the other node, delete the local copy, then replicate it back and move it to the original node?

Is this our bottleneck?

Can we improve our performance?

Also, nodes 1-3 have an NVMe SSD on a PCIe card, but it is a consumer SSD (Samsung SSD 970 EVO Plus 250 GB). Would using it as cache (L2ARC) improve performance?
 
How can we apply this change to the existing data? Maybe replicate each container to the other node, delete the local copy, then replicate it back and move it to the original node?
This is usually the way to force ZFS to write the data stream again. I think it should adhere to the new xattr setting.
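
A rough sketch of that, done locally with send/receive (the dataset name subvol-101-disk-0 is only a placeholder, and I have not verified that the received copy really converts the old xattrs, so test this on one container first):

Code:
# snapshot and copy the container's dataset; the new dataset inherits xattr=sa and dnodesize=auto from rpool/data
zfs snapshot rpool/data/subvol-101-disk-0@rewrite
zfs send rpool/data/subvol-101-disk-0@rewrite | zfs receive rpool/data/subvol-101-disk-0-new
# with the container stopped, swap the datasets and clean up
zfs rename rpool/data/subvol-101-disk-0 rpool/data/subvol-101-disk-0-old
zfs rename rpool/data/subvol-101-disk-0-new rpool/data/subvol-101-disk-0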

Also, nodes 1-3 have an NVMe SSD on a PCIe card, but it is a consumer SSD (Samsung SSD 970 EVO Plus 250 GB). Would using it as cache (L2ARC) improve performance?
That really depends. An L2ARC still needs some RAM to hold the index.

If you can, set up some performance monitoring and have a look at the ARC size and the ARC hit ratio. If you don't come close to 100% during regular operations, more cache might help.
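
A quick way to eyeball this without a full monitoring stack (a sketch; arcstat should be part of the ZFS tools on PVE, and the kstat path below is the standard ZFS-on-Linux one):

Code:
# rolling view of ARC size and hit/miss rates, one line every 5 seconds
arcstat 5
# or read the raw counters straight from the kstats
awk '$1=="size" || $1=="hits" || $1=="misses" {print $1, $3}' /proc/spl/kstat/zfs/arcstats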

Another thing that can speed up access to metadata (and, if configured, small files) is the new "special" VDEV class introduced with ZFS 0.8. See https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_zfs_special_device
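
For reference, adding one would look roughly like this (the device paths are placeholders; the special vdev should be mirrored, because losing it means losing the pool, and the small-block setting is optional):

Code:
# add a mirrored special vdev for metadata (example device names)
zpool add rpool special mirror /dev/disk/by-id/nvme-DISK1 /dev/disk/by-id/nvme-DISK2
# optionally also store small blocks (here <= 32K) on the special vdev
zfs set special_small_blocks=32K rpool/data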
 
This is usually the way to force ZFS to write the data stream again. I think it should adhere to the new xattr setting.

Is there any way to check this?
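
Would dumping a file's object with zdb show where the xattrs end up? Something like this (path and object number are just examples from our layout):

Code:
# get the file's object (inode) number inside the container's dataset
ls -i /rpool/data/subvol-101-disk-0/var/www/moodledata/somefile
# dump that object; xattrs stored as system attributes should show up as "SA xattrs"
zdb -ddddd rpool/data/subvol-101-disk-0 <object-number>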

That really depends. An L2ARC still needs some RAM to hold the index.

If you can, set up some performance monitoring and have a look at the ARC size and the ARC hit ratio. If you don't come close to 100% during regular operations, more cache might help.

Here is my arc_summary https://pastebin.com/cU8Vrfv8

Another thing that can speed up access to metadata (and, if configured, small files) is the new "special" VDEV class introduced with ZFS 0.8. See https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_zfs_special_device

I've seen this, but we only have one NVMe per node. A special vdev needs redundancy, and right now we don't have it. Maybe in the near future ;-)
 
