Question regarding Disk Write Cache

SamFischer

Hello,
I need some clarification regarding disk caching.
Here you can find a table regarding different caching options (none, writethrough, writeback, directsync, unsafe):
https://pve.proxmox.com/wiki/Performance_Tweaks

There is a column called "Disk Write Cache".
Which disk cache does it refer to? The one on the physical device (the onboard cache of HDDs and SSDs)?
Or is it the write cache from ZFS (the ARC, which lives in the host system's RAM)?

Regards
Sam
 
It's about the setting for the virtual disk, which in itself has nothing to do with the host, the physical drives or ZFS, although it does have an impact on their behaviour. That's all explained there if you read the section.
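For reference, this is how I read the columns of that table for the five modes - my own summary of the wiki and the QEMU documentation, not an authoritative mapping:

# Rough summary (my reading, not authoritative) of what each cache mode means
# for the virtual disk: whether the host page cache is used, whether the guest
# sees the virtual disk's write cache as enabled, and whether guest flushes
# are honoured.
CACHE_MODES = {
    "none":         {"host_page_cache": False, "disk_write_cache": True,  "flushes_honoured": True},
    "writethrough": {"host_page_cache": True,  "disk_write_cache": False, "flushes_honoured": True},
    "writeback":    {"host_page_cache": True,  "disk_write_cache": True,  "flushes_honoured": True},
    "directsync":   {"host_page_cache": False, "disk_write_cache": False, "flushes_honoured": True},
    "unsafe":       {"host_page_cache": True,  "disk_write_cache": True,  "flushes_honoured": False},
}

for mode, props in CACHE_MODES.items():
    print(mode, props)

In other words, as I read it the "Disk Write Cache" column is about the write cache the guest sees on its virtual disk, not the physical drive's onboard cache and not ZFS.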
 
Thanks for the quick answer.

Let me ask some other questions to make it clear for me.
1. Where is data cached when using ZFS (RAIDZ) with "none", "writeback" or "unsafe" as the VM's disk cache?
2. Where is data cached when using hardware RAID with "none", "writeback" or "unsafe" as the VM's disk cache?

All three options enable "Disk Write Cache" according to the mentioned table. From my point of view, the only place left for that caching is the host system's RAM.

The hardware RAID controller has similar options (writeback, writethrough), but what is configured as the VM's disk cache does not change the behaviour of the hardware RAID controller, right?

Regards
Sam
 
If there is a hardware RAID controller in the physical host box, set it to passthrough mode, a.k.a. HBA mode.
For ZFS this is required.

For standalone VMs, the best option is "VirtIO SCSI".
 
Thank you both for sharing your knowledge.

First I'd like to say: please be careful when recommending putting a RAID controller in HBA mode and using ZFS. I have seen this recommendation a lot in this forum, most of the time without precise information about the scenario, and according to my tests it is not advisable in every case. Have you ever run tests comparing ZFS (RAIDZ) with hardware RAID 5? If so, you should have seen that under heavy IO the CPU usage on the host can reach 70% from ZFS and its storage tasks alone (tested with fio random read/write and five datacentre SSDs on a two-socket system). That is a lot, and it can cause serious trouble for all VMs on that host.
If you have a separate storage server whose CPU is only used for storage operations, ZFS might be fine, but not in every case, e.g. if you also need the host's CPU for VMs.
One reason for buying a hardware RAID controller is precisely to have a separate processor for storage tasks. The same fio tests show almost no impact on the host's CPU when run against hardware RAID 5.

I also did some tests with "none", "writeback" and "directsync" as the VM's disk cache. With ZFS, "none" seems to use a write cache, but with hardware RAID the options "none" and "directsync" seem to behave the same and have no impact in performance tests. Therefore, I assume that no cache is used with hardware RAID for either "none" or "directsync".
Using "writeback", however, does impact performance (much higher reads and writes) according to fio tests, not only on ZFS but also with hardware RAID. But be aware that, as far as I can tell, the host's memory (RAM) is used for caching in this mode, also in the hardware RAID case. In case of a power outage, or if someone hits the power button because the server no longer responds, you can lose data or end up with corrupted data.
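To illustrate the risk with a minimal sketch in Python (the file path is just an example): with writeback-style caching, written data can sit in volatile memory until something forces a flush, and only after that flush is it safe against a power loss.

import os

path = "/tmp/writeback-demo.bin"              # example path, not meaningful
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"x" * 4096)                     # lands in the (volatile) cache first
# If power is lost at this point, the data above may be gone.
os.fsync(fd)                                  # force the data down to stable storage
os.close(fd)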

I also noticed something interesting when using fio on ZFS: the measurements you get can be deceptive. Starting the same test at the same time on a server with hardware RAID 5 and in parallel on a server with ZFS (RAIDZ) shows higher performance on ZFS, while both tests finish in almost the same wall-clock time (both servers have identical hardware). Even more interesting, the fio result from ZFS reports a much lower run time than the result from hardware RAID. So fio results are not meaningful in every case, and you should be careful when comparing them.
What you can see is that fio starts printing its output later than on the system with hardware RAID. I assume this is because of ZFS caching: fio may start measuring later even though its read/write operations have already begun, which then incorrectly shows higher performance.
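If deferred/cached writes really are what skews the numbers, one way to make the two runs more comparable is to let fio flush at the end and discard the warm-up phase, so cached writes are included in the measured result. A sketch only, with example parameters - adjust the directory, size and runtime to your setup:

import json, subprocess

cmd = [
    "fio", "--name=randwrite", "--rw=randwrite", "--bs=4k",
    "--size=4G", "--directory=/mnt/testvol",   # example mount point
    "--ioengine=libaio", "--direct=1",         # --direct=1 may need to be dropped on older ZFS versions
    "--ramp_time=10", "--runtime=60", "--time_based",
    "--end_fsync=1",                           # include the final flush in the result
    "--output-format=json",
]
out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
write = json.loads(out)["jobs"][0]["write"]
print("write IOPS:", write["iops"], " bandwidth (KiB/s):", write["bw"])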
 
Ok, going back to your specific questions:

1. Where is data cached when using ZFS (RAIDZ) with "none", "writeback" or "unsafe" as the VM's disk cache?
ZFS does this within the ZIO pipeline (in host memory) for all write operations - this happens before the record is assembled and becomes part of the transaction group (TXG), which is then flushed synchronously. Semantically I would call it buffered rather than cached at this stage. Therefore all three options - "none", "writeback" and "unsafe" - effectively cause double caching: once by ZFS and separately by whatever onboard disk write cache buffer there is.

When using ZFS, you only want directsync. This is why I always choose disks with the smallest onboard write cache, as ZFS is effectively ignoring/bypassing it entirely. It goes RAM -> straight to disk.
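If you want to see (or change) what a drive itself reports for its onboard write cache, something like this works for SATA disks - a sketch assuming hdparm is installed, /dev/sda is just an example device, and root privileges are available:

import subprocess

disk = "/dev/sda"                                  # example device name
# Print the drive's current write-caching setting.
subprocess.run(["hdparm", "-W", disk], check=True)
# Uncomment to disable the drive's volatile write cache entirely:
# subprocess.run(["hdparm", "-W", "0", disk], check=True)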

2. Where is data cached when using hardware RAID with "none", "writeback" or "unsafe" as the VM's disk cache?
The host RAM is not used (for writes, that is). Caching only happens within whatever memory is available to the hardware RAID controller, usually 2 GB, 4 GB or 8 GB onboard, and/or the disk buffer. Subsequent reads will come from the ZFS ARC if you are using that ... combo.

3. The hardware RAID controller has similar options (writeback, writethrough), but what is configured as the VM's disk cache does not change the behaviour of the hardware RAID controller, right? - Correct, this cannot change the configured controller setting.

This seems to have been asked elsewhere as well, so the thread below may help too. Perhaps I can look into helping them update that documentation in some form:
- https://forum.proxmox.com/threads/disk-cache-wiki-documentation.125775/

If this is more about obtaining better performance:

"A SCSI controller of type VirtIO SCSI single and enabling the IO Thread setting for the attached disks is recommended if you aim for performance. This is the default for newly created Linux VMs since Proxmox VE 7.3. Each disk will have its own VirtIO SCSI controller,and QEMU will handle the disks IO in a dedicated thread." - ( this is a per VM setting )

It looks like the disk cache modes depend on the specific hardware used; I myself cannot see any options for these in my setup in the GUI console. Where can I see them? I just use ZFS for everything, as you can maybe tell. :)
 
Regarding your points about ZFS vs. hardware RAID CPU usage and the fio measurements: that looks like a longer, separate conversation that can be discussed outside of this exact topic.
I do have questions and comments to think about.
 
