No network traffic for qcow2 hard disk IO over NFS

airpett

Member
May 24, 2021
Helsinki, Finland
I have an Ubuntu Desktop VM with two hard disks. Both reside on an NFS server and the connection is 10 GbE. One disk is in raw format and one in qcow2 format. The image files are called vm-106-disk-0.raw and vm-106-disk-0.qcow2. Both are 200 GB in size and attached using the VirtIO SCSI controller with SCSI as Bus/Device. Neither one has any advanced options set and cache is set to Default (no cache).
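
For reference, the corresponding disk lines of qm config 106 should look roughly like the sketch below (the storage ID qnap-1 is taken from the mount path mentioned further down; the exact option strings are an assumption and may differ):

Code:
# Sketch of the relevant "qm config 106" output; actual options may differ.
scsi0: qnap-1:106/vm-106-disk-0.qcow2,size=200G
scsi1: qnap-1:106/vm-106-disk-0.raw,size=200G
scsihw: virtio-scsi-pci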

Now when I run the following script (qcow2 mounted at /data and raw mounted at /data2; I change that in the script between tests):

Code:
#!/bin/bash

for i in {1..16}
do
        dd if=/dev/zero of=/data/test-$i bs=4k count=3000000 &
done
time -p wait
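
For completeness, the same test can also be run with O_DIRECT so the guest page cache is taken out of the picture. This is only a sketch, adding dd's oflag=direct and conv=fdatasync to the loop above:

Code:
#!/bin/bash
# Same write test, but each dd opens the file with O_DIRECT and syncs
# its data before exiting, bypassing the guest page cache.
for i in {1..16}
do
        dd if=/dev/zero of=/data/test-$i bs=4k count=3000000 oflag=direct conv=fdatasync &
done
time -p wait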

I get weird results.

For the qcow2 disk, iotop shows Total DISK WRITE: 3.6 GB/s (roughly 29 Gb/s, impossible over the 10 GbE NIC), and no network traffic shows up on the node's summary page or on the NFS server.
For the raw disk, iotop shows Total DISK WRITE: 970 MB/s (roughly 7.8 Gb/s, i.e. most of the 10 GbE NIC's bandwidth), and the traffic shows up both on the Proxmox node summary page and in the Resource Monitor of the NFS server.
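
To double-check the network side independently of the GUI graphs, the raw interface byte counters on the PVE node can be watched during the test. A sketch (the interface name ens1f0 is a placeholder for the actual 10 GbE port):

Code:
# On the PVE node; replace ens1f0 with the real 10 GbE interface name.
watch -n 1 'cat /sys/class/net/ens1f0/statistics/tx_bytes /sys/class/net/ens1f0/statistics/rx_bytes'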

In the following screenshots you can see the image files next to each other in the same storage (first image from QNAP File Station and second from PVE).


Screenshot 2021-05-24 023845.png
Screenshot 2021-05-24 023933.png

Here is a screenshot from the PVE node summary page. The write and read tests on the qcow2 disk were done between the red markers. There was also nothing on the QNAP Resource Monitor, although it showed the correct speed for the raw disk (at the same time as the graphs after the latter red marker in the PVE screenshot).

1621813355156.png

The same behaviour showed up in a read test done using the following script (again /data = qcow2, /data2 = raw):

Code:
#!/bin/bash

for i in {1..16}
do
        pv /data/test5-$i > /dev/null &
done
time -p wait
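
Since those files were written just beforehand, a fairer variant drops the guest page cache first so the reads cannot be served from RAM. A sketch (needs to run as root):

Code:
#!/bin/bash
# Flush dirty pages and drop the guest page cache before reading.
sync
echo 3 > /proc/sys/vm/drop_caches
for i in {1..16}
do
        pv /data/test5-$i > /dev/null &
done
time -p wait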

None of this weird behaviour occurs when running the tests directly on the node against the NFS mount path /mnt/pve/qnap-1.

What could be causing this, and where is the data going? It can't be going over the network at ~29 Gb/s, since the NIC is only 10 GbE. df -h reports the space on both drives filling up, so the data has to go somewhere. There's also no network traffic after the test (like a cache being flushed), and the tested data size of 16 × 12 GB is more than the guest OS has RAM and even more than the node has installed, so it can't all be written to RAM.
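
One thing that can be checked directly on the node is whether the qcow2 image is actually growing on the NFS storage. A sketch (the images/106 subdirectory is the usual layout for this kind of storage; adjust if it differs):

Code:
# On the PVE node; compare apparent size vs. space actually allocated.
ls -lh /mnt/pve/qnap-1/images/106/vm-106-disk-0.qcow2
du -h /mnt/pve/qnap-1/images/106/vm-106-disk-0.qcow2
# -U (force share) because the running VM holds the image lock.
qemu-img info -U /mnt/pve/qnap-1/images/106/vm-106-disk-0.qcow2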

VM config:

1621813865476.png

EDIT:

In case it's important, the NFS server is a QNAP TS-1277XU-RP with a Ryzen 5 2600 CPU and 8 GB of RAM. The disks are 12 TB Seagate IronWolf Pro drives, 10 of them in RAID 10 plus 2 hot spares. Additionally, there is a write-only cache of two 500 GB Samsung 980 NVMe SSDs in RAID 1. The cache is enabled for random I/O with data blocks under 4 MB.

It's connected to both PVE nodes directly, with no switch, using 10 GbE DAC cables, and the PVE nodes are HP ProLiant DL380p Gen8 servers with HP NC523 dual-port 10 GbE SFP+ NICs.
 
Last edited:
In general, to benchmark disk/network I'd avoid using /dev/zero, as that is highly compressible. Use /dev/urandom instead, for example (or the tool 'fio' for disk benchmarks).
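
For example, something along these lines (the parameters are only a sketch mirroring the 16 × 12 GB pattern above):

Code:
fio --name=writetest --directory=/data --rw=write --bs=4k --size=12G \
    --numjobs=16 --ioengine=libaio --direct=1 --group_reporting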
 
