Poor disk performance

vstarikov

New Member
Feb 3, 2021
We have migrated multiple Windows VMs from Windows hosts to Proxmox (same hardware).
There were no performance problems on the Windows hosts, but the same hardware running Proxmox gives very poor performance during intensive disk I/O in the VMs or on the host itself.

We have tried different configurations:
1. ZFS on SSD.
2. ZFS on HDD.
3. LVM on HDD.

Tried different settings for the Windows guests (Write back / No cache, etc.).
Tried changing dirty_background_bytes and dirty_bytes on the host (see the sketch after this list).
Tried moving VMs between hosts.
Tried moving VMs and disks between storages (ZFS, LVM).
Tried moving VMs and disks between different disk types (SSD/HDD).
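
For reference, this is roughly how the dirty page limits were adjusted on the host; the values below are only examples for illustration, not a recommendation (by default the kernel uses the ratio-based vm.dirty_ratio / vm.dirty_background_ratio instead):

sysctl -w vm.dirty_background_bytes=67108864    # 64 MiB
sysctl -w vm.dirty_bytes=268435456              # 256 MiB
# to make this persistent, put the same keys in a file under /etc/sysctl.d/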

Nothing helps. The problem appears on all hosts whenever any intensive disk operation is performed, e.g.:
1. copy large file in a VM
2. clone VM
3. create VM backup
4. replication

This is a real problem for us. We like Proxmox for its functionality, but we may be forced to go back to Windows hosts because of it.

Example server config:
CPU(s) 48 x Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz (2 Sockets)
Kernel Version Linux 5.4.73-1-pve #1 SMP PVE 5.4.73-1
PVE Manager Version pve-manager/6.2-15/48bd51b
RAM: 188 GB

Any clues?
 

Attachments

  • Screenshot 2021-02-03 at 16.40.43.png
How are the zpools set up? Please post the output of zpool status.

How are the disks connected? Is there a RAID controller in between?
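
As a quick check on the host, something like this shows how each disk is attached and whether a RAID controller presents a logical volume instead of the raw disks (MODEL would then show the controller's volume rather than the disk itself):

lsblk -o NAME,MODEL,SIZE,TRAN,ROTA
# TRAN = transport (sata, sas, nvme, ...), ROTA = 1 for spinning disks, 0 for SSDs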
 
1. There are different ZFS setups on the different hosts.
The most powerful host has a single-disk ZFS pool on an SSD.

zpool status

  pool: zfs
 state: ONLINE
  scan: scrub repaired 0B in 0 days 10:42:40 with 0 errors on Sun Jan 10 12:06:41 2021
config:

        NAME                                 STATE     READ WRITE CKSUM
        zfs                                  ONLINE       0     0     0
          ata-HGST_HUS728T8TALE6L4_VAHT0VAL  ONLINE       0     0     0

errors: No known data errors

  pool: zfs2
 state: ONLINE
  scan: scrub repaired 0B in 0 days 01:55:31 with 0 errors on Sun Jan 10 03:19:33 2021
config:

        NAME                        STATE     READ WRITE CKSUM
        zfs2                        ONLINE       0     0     0
          wwn-0x5002538e40f10197    ONLINE       0     0     0

errors: No known data errors


2. One of the hosts has ZFS on top of a hardware RAID controller, but only one of the four hosts. The problem appears on all hosts regardless of the ZFS or LVM configuration and the disk type (SSD or HDD).
 
Those I/O delay numbers are huge.

How was it installed? Booted from the ISO or on top of Debian?
Is this a clustered setup?

Can you post the output of

pvesm status
 
It was installed from ISO by our provider.

pvesm status
Name       Type     Status    Total       Used        Available   %
backup3    dir      disabled  0           0           0           N/A
backupsrv  lvm      disabled  0           0           0           N/A
local      dir      active    30832548    17099408    12143892    55.46%
lvm        lvm      disabled  0           0           0           N/A
lvm1       lvm      disabled  0           0           0           N/A
zfs        zfspool  active    7557611100  4301049284  3256561815  56.91%
zfs2       zfspool  active    3770678228  3201684844  568993384   84.91%
 
What is the hardware configuration of the VMs? And did you install the VirtIO drivers? Without them, Windows VMs perform very poorly in my experience; that would explain the poor I/O performance even on SSDs.
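
For reference, the VM hardware can be dumped on the host with qm config (100 below is just a placeholder VMID); the scsihw line and the bus of each disk entry (sata0, scsi0, virtio0, ...) are the interesting parts:

qm config 100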
 
VirtIO drivers are installed and active.
VM hardware attached.
 

Attachments

  • Screenshot 2021-02-16 at 12.09.04.png

The SCSI controller should be set to "VirtIO SCSI" and the NIC to "VirtIO". Changing the NIC is pretty easy, but changing the controller can be tricky, since Windows will usually fail to boot because of the changed disk paths.

Before you start doing that, perhaps you could build a new VM, install the VirtIO drivers during the Windows installation to get it right from the start, and then use that VM to evaluate the performance? Alternatively, you can add an additional controller of type "VirtIO SCSI" to an existing VM and evaluate performance on a disk attached to it (see the sketch below).
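
A minimal sketch of that second option from the host CLI, assuming VMID 100 and a storage called zfs (both placeholders, take the real names from qm config and pvesm status): switch the controller type and add a small test disk on it, then run the copy test against that disk inside Windows.

qm set 100 --scsihw virtio-scsi-pci
qm set 100 --scsi1 zfs:32    # allocates a new 32 GB test disk on the VirtIO SCSI controller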
 
You could also try to set the VM disks cache mode to "write back". This could improve the situation a bit.
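
A hedged CLI example of that change (VMID, bus and volume name are placeholders; copy the exact disk line from qm config first, because the whole volume specification has to be repeated when setting options):

qm set 100 --scsi0 zfs:vm-100-disk-0,cache=writeback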

Also, are you aware that these ZFS pools (zfs, zfs2) each consist of only a single disk? The zfs pool is using an HGST 8 TB datacenter disk with 7200 rpm, which will not have the best performance anyway, especially with multiple VMs running on it. The disk backing the other pool (zfs2) is not as easy to identify because the pool references it by its wwn identifier (at least in the output posted earlier).

This means you have no redundancy: if the disk fails, the pool is lost.
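
If a second disk can be added later, a single-disk pool can be turned into a mirror online; roughly (the new device path is a placeholder):

zpool attach zfs ata-HGST_HUS728T8TALE6L4_VAHT0VAL /dev/disk/by-id/<new-disk>
zpool status zfs    # should now show a mirror-0 vdev resilvering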


Regarding the disk setup, in addition to what @Bengt Nolin mentioned: besides using the VirtIO SCSI controller ("VirtIO SCSI single" will be better if you have multiple disks in the VM), you will also have to change the disks to use the "scsi" bus instead of SATA. For an existing Windows machine this is a bit tricky, but it is explained in the PVE wiki (bottom of the page). SATA as a bus type will also cost you a bit of performance; a sketch of the bus change follows below.
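
A rough outline of that SATA-to-SCSI change from the host CLI, once Windows has the VirtIO SCSI driver loaded (VMID, disk key and volume name are placeholders, take them from qm config; the wiki article covers the Windows side):

qm set 100 --scsihw virtio-scsi-single
qm set 100 --delete sata0            # the volume reappears as unused0
qm set 100 --scsi0 zfs:vm-100-disk-0
qm set 100 --bootdisk scsi0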
 
Thank you very much for help.

First we tried the suggestions regarding the VM hardware:
1. Changed the SCSI controller to VirtIO SCSI.
2. Changed both disks to VirtIO.
3. Changed the NIC to VirtIO.

Unfortunately, that did not help. Copying a 5 GB file inside the VM hung (see the screenshot).

Also, any long disk write operation on the host itself (e.g. a backup) pushes the I/O delay to 40-60% and slows everything down, so many VMs almost stop working.
Could that be ZFS? If so, what can I do? Switch to LVM or something else?
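
For reference, a rough way to watch the pool and the ZFS ARC while such a copy or backup runs (pool name zfs taken from the zpool status output above):

zpool iostat -v zfs 5     # per-vdev bandwidth and IOPS every 5 seconds
arc_summary | head -n 40  # ARC size and hit rate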
 

Attachments

  • Screenshot 2021-02-17 at 09.01.37.png
  • Screenshot 2021-02-17 at 09.01.12.png
