Apply ZFS xattr=sa dnodesize=auto on existing pool with data.

Jota V.

Hi, we have a cluster with five nodes. All nodes have Proxmox installed on an SSD and 4 x 2 TB SATA drives (3.5" 7200 RPM) in ZFS RAID 10. All nodes have between 90 GB and 144 GB of RAM.

On nodes 1 to 4, we have about 30-40 LXC containers with Moodle on each node. All databases are on an external server.

Each LXC container is replicated once to another node: all containers on node 1 are replicated to node 3 (and vice versa), and all containers on node 2 are replicated to node 4 (and vice versa). Replication runs every 30 minutes.

We are seeing that running "du -sh /var/www/moodledata" inside an LXC container, or on the directory pool, takes minutes!! (The data is between 50 and 200 GB.)

Looking at the wiki, we found this: https://pve.proxmox.com/wiki/ZFS:_Tips_and_Tricks#LXC_with_ACL_on_ZFS

By default, ZFS stores ACLs in hidden files on the filesystem. This reduces performance enormously, and with several thousand files a system can feel unresponsive. Storing the xattr in the inode resolves this performance issue.

Code:
zfs set xattr=sa dnodesize=auto vmstore/data

We applied this to rpool/data, but the pool already has all our data on it.
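
As far as I understand, xattr=sa and dnodesize=auto only affect newly written files, so checking that the properties are now set on the dataset is easy enough, but the existing data is untouched:

Code:
# verify the new property values on the dataset and its children
zfs get -r xattr,dnodesize rpool/data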

How can we apply this change to the existing data? Maybe replicate each container to the other node, delete the local copy, then replicate it back and move it to the original node?

Is this our bottleneck?

Can we improve our performance?

Also, nodes 1-3 have an NVMe SSD on a PCIe card, but it is a consumer SSD (Samsung SSD 970 EVO Plus 250 GB). Would using it as cache (L2ARC) improve performance?
 
How can we apply this change to the existing data? Maybe replicate each container to the other node, delete the local copy, then replicate it back and move it to the original node?
This is usually the way to force ZFS to write the data stream again. I think it should adhere to the new xattr setting.
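
A rough sketch of that, done locally with send/receive (the dataset name subvol-101-disk-0 is only a placeholder, and I have not verified that the received copy really converts the old xattrs, so test this on one container first):

Code:
# snapshot and copy the container's dataset; the new dataset inherits xattr=sa and dnodesize=auto from rpool/data
zfs snapshot rpool/data/subvol-101-disk-0@rewrite
zfs send rpool/data/subvol-101-disk-0@rewrite | zfs receive rpool/data/subvol-101-disk-0-new
# with the container stopped, swap the datasets and clean up
zfs rename rpool/data/subvol-101-disk-0 rpool/data/subvol-101-disk-0-old
zfs rename rpool/data/subvol-101-disk-0-new rpool/data/subvol-101-disk-0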

Also, nodes 1-3 have an NVMe SSD on a PCIe card, but it is a consumer SSD (Samsung SSD 970 EVO Plus 250 GB). Would using it as cache (L2ARC) improve performance?
That really depends. An L2ARC still needs some RAM to hold the index.

If you can, set up some performance monitoring and have a look at the ARC size and the ARC hit ratio. If you don't come close to 100% during regular operations, more cache might help.
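
A quick way to eyeball this without a full monitoring stack (a sketch; arcstat should be part of the ZFS tools on PVE, and the kstat path below is the standard ZFS-on-Linux one):

Code:
# rolling view of ARC size and hit/miss rates, one line every 5 seconds
arcstat 5
# or read the raw counters straight from the kstats
awk '$1=="size" || $1=="hits" || $1=="misses" {print $1, $3}' /proc/spl/kstat/zfs/arcstats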

Another thing that can speed up access to metadata (and, if configured, small files) is the new "special" VDEV class introduced with ZFS 0.8. See https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_zfs_special_device
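
For reference, adding one would look roughly like this (the device paths are placeholders; the special vdev should be mirrored, because losing it means losing the pool, and the small-block setting is optional):

Code:
# add a mirrored special vdev for metadata (example device names)
zpool add rpool special mirror /dev/disk/by-id/nvme-DISK1 /dev/disk/by-id/nvme-DISK2
# optionally also store small blocks (here <= 32K) on the special vdev
zfs set special_small_blocks=32K rpool/data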
 
This is usually the way to force ZFS to write the data stream again. I think it should adhere to the new xattr setting.

Is there any way to check this?
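
Would dumping a file's object with zdb show where the xattrs end up? Something like this (path and object number are just examples from our layout):

Code:
# get the file's object (inode) number inside the container's dataset
ls -i /rpool/data/subvol-101-disk-0/var/www/moodledata/somefile
# dump that object; xattrs stored as system attributes should show up as "SA xattrs"
zdb -ddddd rpool/data/subvol-101-disk-0 <object-number>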

That really depends. An L2ARC still needs some RAM to hold the index.

If you can, set up some performance monitoring and have a look at the ARC size and the ARC hit ratio. If you don't come close to 100% during regular operations, more cache might help.

Here is my arc_summary https://pastebin.com/cU8Vrfv8

Another thing that can speed up access to metadata (and, if configured, small files) is the new "special" VDEV class introduced with ZFS 0.8. See https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_zfs_special_device

I've seen this, but we only have one NVMe per node. A special vdev needs redundancy, and right now we don't have it. Maybe in the near future ;-)
 
