Performance problem with replication to HDD

RobertWojtowicz

May 13, 2024
Hi,

Maybe you have some ideas for optimizing this or solving the problem:
1. A direct 10G network link was set up between the two servers for Proxmox replication.
2. To improve replication and ZFS performance, I ran zfs set sync=disabled and disabled traffic encryption for replication (see the sketch after this list).
3. ARC is set to min=16GB and max=64GB of RAM; the test virtual machines each have a 1TB disk on the ZFS pool backed by HDDs (raidz2 with 8 disks).
4. Proxmox 8.2.2.
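
For reference, the setup commands look roughly like this (the dataset name 'tank/vmdata' is a placeholder for my pool):

    # Disable synchronous writes on the replicated dataset (dataset name assumed).
    zfs set sync=disabled tank/vmdata

    # /etc/modprobe.d/zfs.conf -- ARC limits in bytes (16 GiB min, 64 GiB max).
    options zfs zfs_arc_min=17179869184 zfs_arc_max=68719476736

    # Rebuild the initramfs so the ARC options apply at boot.
    update-initramfs -u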

The problem is a bottleneck on writes: if I don't limit the bandwidth to 100 MB/s, I first see a spike to 160 MB/s and then it settles at 60 MB/s and never recovers, as if something gets clogged up. When I capped it at a fixed 100 MB/s, I got a steady 100 MB/s and consequently faster copying overall.
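
The cap is set on the replication job itself; something like this should work (the job ID '100-0' is just an example):

    # Limit the replication job to 100 MB/s.
    pvesr update 100-0 --rate 100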

During writes I notice high I/O delay on the CPU. The question is how to solve this; it is probably not a problem of too small an ARC, so should I think about an L2ARC?
Maybe other ideas?

What happens when the L2ARC is damaged?
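
From what I've read, L2ARC is a pure read cache, so losing the cache device should not cost any data; attaching and detaching one would look roughly like this (pool name and device path are placeholders):

    # Add an SSD/NVMe device as L2ARC to pool 'tank'.
    zpool add tank cache /dev/disk/by-id/nvme-EXAMPLE
    # Remove it again; ZFS simply falls back to reading from the pool.
    zpool remove tank /dev/disk/by-id/nvme-EXAMPLE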


BR,
Robert
 
You may try adjusting the value of zfs_dirty_data_max. Others have reported benefits on ZFS RAIDz with big writes [1]. I've personally used it to try to optimize some PBS datastores in the past. I didn't notice much benefit, although reducing it did help stabilize the write speed at the cost of maximum throughput. I'm not currently using it in production.
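
A minimal sketch of how I'd experiment with it at runtime (the 1 GiB value is only an example, not a recommendation):

    # Read the current limit (bytes).
    cat /sys/module/zfs/parameters/zfs_dirty_data_max

    # Lower it on the fly, e.g. to 1 GiB, and re-test the write pattern.
    echo 1073741824 > /sys/module/zfs/parameters/zfs_dirty_data_max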

ARC and L2ARC are used for reads only, so I wouldn't expect them to help in this scenario.

I set zfs set sync=disabled
That's a very bad idea if you value your data: you will eventually lose data if anything goes wrong.

[1] https://forum.proxmox.com/threads/windows-vm-i-o-problems-only-with-zfs.136368/post-628417
 
That's a very bad idea if you value your data: you will eventually lose data if anything goes wrong.
Yes, I thought about that. I'm not exposing myself to data corruption, only to the loss of roughly the last 5 seconds of data (for example, in a power outage, but that is not a real threat since everything is on a UPS). This is for building an archive; it is not a transactional system, and the data will be verified and then read-only.
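
If I understand correctly, the ~5 s window corresponds to the default transaction group timeout:

    # Default is 5 (seconds); dirty data is flushed to disk at least this often.
    cat /sys/module/zfs/parameters/zfs_txg_timeout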

OK, thanks for the tip about zfs_dirty_data_max; I'm thinking about how much to set it to.
 
Just to clarify: it's not just power outages. Anything that may fail in the path from the destination server's NIC (replication uses TCP, so let's rule out network issues, as they would hopefully be detected by the protocol) to the disk itself may cause data loss: kernel issues, HBA issues, a flaky cable, drive issues. Once you sort out the performance issue you are seeing, try enabling sync again; it will probably not hurt performance.
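
Re-enabling it is a one-liner (dataset name is a placeholder):

    # Restore the default sync behaviour on the dataset.
    zfs set sync=standard tank/vmdata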
 
Just to clarify: it's not just power outages. Anything that may fail in the path from the destination server's NIC (replication uses TCP, so let's rule out network issues, as they would hopefully be detected by the protocol) to the disk itself may cause data loss: kernel issues, HBA issues, a flaky cable, drive issues. Once you sort out the performance issue you are seeing, try enabling sync again; it will probably not hurt performance.
OK,
I now have 128 GB of RAM, and /sys/module/zfs/parameters/zfs_dirty_data_max is set to 4294967296 (4 GiB). What value do you suggest starting with?
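
From what I can tell, 4294967296 is simply the stock ceiling: OpenZFS defaults zfs_dirty_data_max to 10% of RAM but caps it at 4 GiB via zfs_dirty_data_max_max. If a lower value helps, persisting it would look something like this (the 2 GiB figure is only an example, to be tuned by measurement):

    # /etc/modprobe.d/zfs.conf -- example value, not a recommendation.
    options zfs zfs_dirty_data_max=2147483648

    # Rebuild the initramfs so the option is applied at boot.
    update-initramfs -u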
 
