So I decided to change from hardware RAID to ZFS on my homelab Proxmox node. The hardware is:
Dell R710
120GB RAM
2x 6-core/12-thread CPUs
Dell H200 RAID controller flashed to LSI 9211-8i IT mode
2x Seagate Constellation ES.1 2TB 7.2K 6Gb/s SAS drives in a ZFS mirror
I've limited ZFS (the ARC) to 8GB of RAM (see the snippet right after this list)
Proxmox 6.1 (fresh install)
ZFS version 0.8.3-pve1
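For reference, this is how I capped the ARC; a minimal sketch of the usual ZFS-on-Linux approach on Proxmox (8589934592 is just 8 GiB in bytes, adjust to taste):

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592

# then rebuild the initramfs and reboot, or set it live without rebooting:
update-initramfs -u -k all
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max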
Ever since I switched to ZFS my IO delay has gone up an unreasonable amount. At idle (no VMs running) it sits at 4%. When I ran a VM restore it jumped to the high 80s. A few times I lost access to the GUI (SSH still worked intermittently). I ran pveperf, which gave me this:
CPU BOGOMIPS: 127676.88
REGEX/SECOND: 1762144
HD SIZE: 1749.87 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 4.16
DNS EXT: 43.63 ms
DNS INT: 63.36 ms
Then I did some reading that suggested turning off sync to see whether I would benefit from a dedicated ZIL (SLOG) device (commands further down). After disabling sync I got this from pveperf (again with NO VMs running):
CPU BOGOMIPS: 127676.88
REGEX/SECOND: 1748815
HD SIZE: 1749.87 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 565.14
DNS EXT: 38.61 ms
DNS INT: 60.58 ms
That's loads better, but still pretty bad in the grand scheme of things, from my understanding. Does anyone have any idea why I would be getting such high IO delays, or what I could do to make Proxmox usable again?
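For anyone who wants to run the same sync test, this is roughly what I did; a sketch assuming the default Proxmox pool name rpool (keep in mind sync=disabled trades a few seconds of data safety for speed):

zfs get sync rpool            # check the current setting (standard by default)
zfs set sync=disabled rpool   # skip sync writes so fsyncs aren't bound by the ZIL
pveperf                       # re-run the benchmark against the root filesystem
zfs set sync=standard rpool   # switch back if you don't want to run without sync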
Edit:
I just did a VM restore from backup and it's back to 80% IO delay, even with ZFS sync=disabled.
Edit 2: I fixed it on my own. It was a bad drive that wasn't registering as bad in ZFS. For anyone with the same symptoms: zpool iostat -v and a look through the syslog made it pretty obvious, since the bandwidth on one drive was terrible. I replaced the drive and my fsyncs are up in the 3600s, which I'm more than happy with on spinning rust (although that's still with sync=disabled; I still need to get a ZIL drive).
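Roughly the commands involved, as a sketch; the device names below are placeholders, so check zpool status for your own (and note that replacing a disk in a boot pool like rpool also needs the partition table copied and the bootloader reinstalled, per the Proxmox ZFS docs):

zpool iostat -v 5                  # per-disk bandwidth/ops; the bad disk was far slower than its mirror partner
zpool status -v rpool              # map the pool to physical devices and check for errors
journalctl -k | grep -i 'sd\|scsi' # kernel log; look for timeouts/resets on the suspect disk
zpool replace rpool /dev/sdX /dev/sdY   # hypothetical old/new device names; resilver starts automatically
zpool status rpool                 # watch the resilver progress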