Strange disk caching mode results and questions

Republicus

Well-Known Member
Aug 7, 2017
I have been testing my VM disk performance on synchronous NFS shared storage. The results have left me scratching my head trying to figure out what's going on. They may be the expected behavior, but even so I don't understand how.

On my synchronous NFS share I created a VM (Linux Mint) and attached several disks in both QCOW2 and RAW format. I noticed some massive differences in performance depending on the file system (EXT4 or XFS) of those disks. This led me to believe the cache mode of the disks was hitting the SLOG/ZIL on my zpool differently, so I began to investigate my zpool arc summary.

To answer my first suspicion: XFS outperforms EXT4 in all of the test results that follow, but other questions linger.

The sync setting on the SAN/NAS dataset that holds all of the tested disks:

Code:
root@pvesan:/# zfs get sync raid50vol/pve/nfs-storage
NAME                       PROPERTY  VALUE     SOURCE
raid50vol/pve/nfs-storage  sync      always    local

My zpool ZIL summary before No-Cache tests:

Code:
ZIL committed transactions:                                         4.4M
        Commit requests:                                            3.2M
        Flushes to stable storage:                                  2.7M
        Transactions to SLOG storage pool:          113.1 GiB       2.4M
        Transactions to non-SLOG storage pool:      232 Bytes          1

My VM disks:
1) scsi0: Linux OS disk; QCOW2; EXT4 formatted
2) scsi1: QCOW2; EXT4 formatted
3) scsi2: RAW; EXT4 formatted
4) scsi3: QCOW2; XFS formatted
5) scsi4: RAW; XFS formatted

My test results, first with all disk caches set to "No Cache":

Code:
> EXT4 QCOW2 Linux Mint Guest root (scsi0) disk:

root@nfs-linuxmint:/home/republicus# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.94764 s, 722 MB/s

ZIL committed transactions:                                         4.4M
        Commit requests:                                            3.2M
        Flushes to stable storage:                                  2.7M
        Transactions to SLOG storage pool:          113.1 GiB       2.4M
        Transactions to non-SLOG storage pool:      232 Bytes          1

> EXT4 QCOW2 (scsi1) disk:

root@nfs-linuxmint:/mnt/republicus/EXT4-QCOW# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.53517 s, 776 MB/s

ZIL committed transactions:                                         4.4M
        Commit requests:                                            3.2M
        Flushes to stable storage:                                  2.7M
        Transactions to SLOG storage pool:          113.1 GiB       2.4M
        Transactions to non-SLOG storage pool:      232 Bytes          1

> EXT4 RAW (scsi2) disk:

root@nfs-linuxmint:/mnt/republicus/EXT4-RAW# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.98988 s, 614 MB/s

ZIL committed transactions:                                         4.5M
        Commit requests:                                            3.3M
        Flushes to stable storage:                                  2.8M
        Transactions to SLOG storage pool:          117.1 GiB       2.4M
        Transactions to non-SLOG storage pool:      232 Bytes          1

> XFS QCOW (scsi3) disk:

root@nfs-linuxmint:/mnt/republicus/XFS-QCOW# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.06323 s, 1.4 GB/s

ZIL committed transactions:                                         4.5M
        Commit requests:                                            3.3M
        Flushes to stable storage:                                  2.8M
        Transactions to SLOG storage pool:          117.1 GiB       2.5M
        Transactions to non-SLOG storage pool:      232 Bytes          1

> XFS-RAW (scsi4) disk:

root@nfs-linuxmint:/mnt/republicus/XFS-RAW# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.88909 s, 1.1 GB/s

ZIL committed transactions:                                         4.6M
        Commit requests:                                            3.3M
        Flushes to stable storage:                                  2.8M
        Transactions to SLOG storage pool:          121.1 GiB       2.5M
        Transactions to non-SLOG storage pool:      232 Bytes          1

My zpool ZIL summary prior to Write-back tests:


Code:
ZIL committed transactions:                                         4.6M
        Commit requests:                                            3.4M
        Flushes to stable storage:                                  2.9M
        Transactions to SLOG storage pool:          121.2 GiB       2.5M
        Transactions to non-SLOG storage pool:      232 Bytes          1

Second round of tests, with all disk caches set to "Write-back":

Code:
> EXT4 QCOW2 Linux Mint Guest root (scsi0) disk:

root@nfs-linuxmint:/home/republicus# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.91078 s, 727 MB/s

ZIL committed transactions:                                         4.6M
        Commit requests:                                            3.4M
        Flushes to stable storage:                                  2.9M
        Transactions to SLOG storage pool:          121.2 GiB       2.5M
        Transactions to non-SLOG storage pool:      232 Bytes          1

> EXT4 QCOW2 (scsi1) disk:

root@nfs-linuxmint:/mnt/republicus/EXT4-QCOW# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.7421 s, 748 MB/s

ZIL committed transactions:                                         4.6M
        Commit requests:                                            3.4M
        Flushes to stable storage:                                  2.9M
        Transactions to SLOG storage pool:          121.2 GiB       2.5M
        Transactions to non-SLOG storage pool:      232 Bytes          1

> EXT4 RAW (scsi2) disk:

root@nfs-linuxmint:/mnt/republicus/EXT4-RAW# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.47143 s, 664 MB/s

ZIL committed transactions:                                         4.7M
        Commit requests:                                            3.4M
        Flushes to stable storage:                                  2.9M
        Transactions to SLOG storage pool:          125.2 GiB       2.5M
        Transactions to non-SLOG storage pool:      232 Bytes          1


> XFS QCOW (scsi3) disk:

root@nfs-linuxmint:/mnt/republicus/XFS-QCOW# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.34364 s, 1.3 GB/s

ZIL committed transactions:                                         4.7M
        Commit requests:                                            3.4M
        Flushes to stable storage:                                  2.9M
        Transactions to SLOG storage pool:          125.2 GiB       2.5M
        Transactions to non-SLOG storage pool:      232 Bytes          1


> XFS-RAW (scsi4) disk:

root@nfs-linuxmint:/mnt/republicus/XFS-RAW# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 4.34982 s, 987 MB/s

ZIL committed transactions:                                         4.8M
        Commit requests:                                            3.5M
        Flushes to stable storage:                                  3.0M
        Transactions to SLOG storage pool:          129.2 GiB       2.6M
        Transactions to non-SLOG storage pool:      232 Bytes          1


Conclusion:

1) The disk cache mode set in the PVE GUI is ignored on synchronous NFS shares.
2) QCOW2 disks do NOT commit to the ZIL on synchronous NFS shares (regardless of disk cache mode).
3) RAW disks DO commit to the ZIL on synchronous NFS shares (regardless of disk cache mode).
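
As a rough cross-check of these conclusions, here is a minimal Python sketch that subtracts the "Transactions to SLOG storage pool" value before each dd run from the value after it. The numbers are copied from the arc summaries above; the names `READINGS` and `slog_deltas` are made up for this illustration. Keep in mind that the counters are pool-wide and rounded to 0.1 GiB, so other sync traffic on the pool would show up in them as well.

```python
# "Transactions to SLOG storage pool" (GiB) before and after each dd run,
# copied from the arc summaries above. Each dd run writes 4096 x 1 MiB = 4 GiB.
READINGS = [
    # (cache mode, disk, GiB before, GiB after)
    ("no-cache", "scsi0 EXT4 QCOW2", 113.1, 113.1),
    ("no-cache", "scsi1 EXT4 QCOW2", 113.1, 113.1),
    ("no-cache", "scsi2 EXT4 RAW", 113.1, 117.1),
    ("no-cache", "scsi3 XFS QCOW", 117.1, 117.1),
    ("no-cache", "scsi4 XFS RAW", 117.1, 121.1),
    ("write-back", "scsi0 EXT4 QCOW2", 121.2, 121.2),
    ("write-back", "scsi1 EXT4 QCOW2", 121.2, 121.2),
    ("write-back", "scsi2 EXT4 RAW", 121.2, 125.2),
    ("write-back", "scsi3 XFS QCOW", 125.2, 125.2),
    ("write-back", "scsi4 XFS RAW", 125.2, 129.2),
]


def slog_deltas(readings):
    """Return {(cache mode, disk): GiB added to the SLOG counter by that run}."""
    return {
        (mode, disk): round(after - before, 1)
        for mode, disk, before, after in readings
    }


if __name__ == "__main__":
    for (mode, disk), delta in slog_deltas(READINGS).items():
        print(f"{mode:<10} {disk:<18} +{delta:.1f} GiB")
```

With these readings, every RAW run adds about the 4 GiB that dd wrote, and every QCOW2 run adds nothing, in both cache modes.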

Questions:

1) Is this the expected behavior?
2) Is the write cache of QCOW2 disks always write-through on synchronous NFS shares?
3) Is the write cache of RAW disks always write-back on synchronous NFS shares?
4) Is it possible to set write-back on QCOW2 disks?

My understanding (which could be wrong) is that on a zpool with an SLOG/ZIL attached and sync=always, a write-through write should hit the ZIL and the VM will wait for the ZIL to flush to disk, whereas a write-back write will be committed to the ZIL and the VM is told the write has completed before it is flushed to disk.

If this is correct, it would seem that QCOW2 disks are somehow behaving as async even when stored on a synchronously mounted NFS share. At the same time, the QCOW2 disks behave as though the zpool were set to sync=disabled, not hitting the ZIL at all, whereas RAW disks behave normally on the same storage.
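
For reference, here is a small Python sketch of how the QEMU documentation describes the cache modes on the hypervisor side, as combinations of the three `-drive` options cache.writeback, cache.direct, and cache.no-flush. The `CacheMode` class and the helper name are made up for this illustration. The table only says what QEMU asks of the host; it says nothing about what the NFS server or ZFS does with those writes afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheMode:
    writeback: bool  # cache.writeback: writes are acknowledged before a flush
    direct: bool     # cache.direct: bypass the host page cache (O_DIRECT)
    no_flush: bool   # cache.no-flush: guest flush requests are ignored


# Combinations as listed in the QEMU -drive documentation.
# The PVE GUI's "No cache" corresponds to "none".
CACHE_MODES = {
    "writeback": CacheMode(writeback=True, direct=False, no_flush=False),
    "none": CacheMode(writeback=True, direct=True, no_flush=False),
    "writethrough": CacheMode(writeback=False, direct=False, no_flush=False),
    "directsync": CacheMode(writeback=False, direct=True, no_flush=False),
    "unsafe": CacheMode(writeback=True, direct=False, no_flush=True),
}


def acknowledged_before_flush(mode: str) -> bool:
    """True if QEMU may report a write as complete before it is flushed."""
    return CACHE_MODES[mode].writeback


if __name__ == "__main__":
    for name in CACHE_MODES:
        print(f"{name:<13} ack before flush: {acknowledged_before_flush(name)}")
```

Going by this table, "No cache" and "Write-back" differ only in whether the host page cache is bypassed, so the two test rounds above compare modes that both acknowledge writes before a flush.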
 
I have now tested all caching modes.

The caching mode is completely ignored in all circumstances.

RAW will cache as writeback, regardless of config setting.
Likewise, QCOW2 is caching as writethrough, none, or directsync every time regardless of disk caching mode.

RAW disk set to write-through:
Code:
ARC BEFORE

ZIL committed transactions:                                         5.0M
        Commit requests:                                            3.7M
        Flushes to stable storage:                                  3.1M
        Transactions to SLOG storage pool:          134.8 GiB       2.6M
        Transactions to non-SLOG storage pool:      232 Bytes          1


> XFS-RAW (scsi4) disk:

root@nfs-linuxmint:/mnt/republicus/XFS-RAW# sync; dd if=/dev/zero of=tempfile bs=1M count=4096; sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 4.38317 s, 980 MB/s


ARC AFTER

ZIL committed transactions:                                         5.1M
        Commit requests:                                            3.8M
        Flushes to stable storage:                                  3.2M
        Transactions to SLOG storage pool:          138.8 GiB       2.7M
        Transactions to non-SLOG storage pool:      232 Bytes          1

qm showcmd VMID shows the right caching mode as configured.
 
I can't seem to get any attention to this thread.
If someone could verify my findings, it would be helpful in troubleshooting.
I've created a bug report #2411.

UPDATE:
This applies to PVE 6. My nodes are on the latest 6.0-2.

I just tested PVE 5.4-1 and QCOW2 disks are properly syncing to SLOG on the same NFS synchronous share on that node.

I have not yet tested RAW disks but will test further as I can (only a 1 GbE network). Nonetheless, I am able to verify that at least QCOW2 disks are synchronous on PVE 5.4-1 and not on PVE 6.
 
Hi,

Your testing has some major problems.
1.) You use bs=1M, but the ZIL is only used by small sync writes. I guess the threshold is 64K.
2.) dd is not a proper benchmark tool. Use fio instead.
3.) Using zeros for benchmarking does not reflect real-world use and can skew the results.
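
To illustrate the kind of workload meant here (small synchronous writes of non-zero data), a minimal Python sketch follows. It is not a replacement for fio, only a way to show the idea; the function name `sync_write_test` and its default block size and count are assumptions for this example. It opens the file with O_DSYNC, so each write has to reach stable storage before the call returns.

```python
import os
import tempfile
import time


def sync_write_test(path, block_size=4096, count=256):
    """Write `count` random blocks with O_DSYNC; return (seconds, MB/s)."""
    # Fall back to O_SYNC on platforms that do not define O_DSYNC.
    sync_flag = getattr(os, "O_DSYNC", os.O_SYNC)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag

    # Generate random (non-zero, incompressible) data before the timer starts.
    blocks = [os.urandom(block_size) for _ in range(count)]

    fd = os.open(path, flags, 0o600)
    try:
        written = 0
        start = time.perf_counter()
        for block in blocks:
            written += os.write(fd, block)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed, written / elapsed / 1e6


if __name__ == "__main__":
    # Demo on a temporary directory; point `path` at the disk under test instead.
    with tempfile.TemporaryDirectory() as tmp:
        seconds, mbps = sync_write_test(os.path.join(tmp, "tempfile"))
        print(f"{seconds:.3f} s, {mbps:.1f} MB/s")
```

Run inside the guest on each mounted disk, a test like this would issue many small sync writes instead of a few large buffered ones, which is closer to what the points above ask for.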
 