Hey folks,
I was playing around with storage migration earlier today and stumbled into something I'd like to figure out. I have a Dell R620 host with the H310 Mini in non-RAID mode and two cheap 120 GB SSDs in a mirror for the rpool. My normal storage devices are two separate boxes running OmniOS (one with a ZFS striped mirror of SSDs, the other with two striped RAIDZ2 vdevs of spinning disks), and I connect to both of them over 10G via ZFS over iSCSI.
I also manage another Proxmox node in a different environment, and it has the same issue. It's also an R620 with the same H310 Mini in non-RAID mode, but it has only local SSDs for VM storage: four Samsung 860 EVOs in RAIDZ1.
What I have noticed on both of these is that the hypervisor completely stops responding whenever the rpool is hit with a heavy write workload. Everything hangs, and the VMs become inaccessible. So far the triggers I have seen are running read/write tests in a guest OS with CrystalDiskMark, and, today, a disk migration onto the rpool. On the first node with the two ZFS over iSCSI pools, this does not happen when the guest disks live on those pools; it only happens when there is a heavy write workload on the host's rpool.
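For anyone who wants to try reproducing this without running CrystalDiskMark in a guest, the rough equivalent I've been using on the host is a plain fio sequential write against a scratch directory on the rpool (the /rpool/fio-test path and the sizes are just what I picked, nothing special):

    mkdir -p /rpool/fio-test
    fio --name=rpool-write --directory=/rpool/fio-test --rw=write --bs=1M \
        --size=8G --numjobs=4 --ioengine=libaio --time_based --runtime=60

While that runs I keep "zpool iostat -v rpool 1" and "dmesg -w" open in other shells to see whether the pool stalls and whether the kernel logs any hung-task warnings.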
I am about to add two more SSDs and create a new mirrored pool on one of these hosts to see whether heavy write workloads on it still hang the hypervisor and all the other VMs; the rough commands are below. In the meantime, I would like to hear your thoughts on what could be going on. Does anyone have a similar setup, maybe with a different HBA, who could run some tests?
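For reference, the new test pool will be set up along these lines (the disk IDs are placeholders, and the ashift and storage name are just my usual defaults):

    zpool create -o ashift=12 testpool mirror \
        /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2
    pvesm add zfspool testpool-vm --pool testpool --content images,rootdir

Then I'll move a VM disk onto it and rerun the same write tests to see whether the hang is specific to the rpool or happens on any local pool.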
Thanks,
Stan