Hey folks,
Signed up to the forums as I've been having some issues lately and haven't been able to find the root cause. I've been running my Proxmox server with the same config for about the last six years (I do fairly regular Proxmox version updates), and these issues have only started in the last month or so. No major changes have been made to what the server is doing or how it's being accessed.
CPU: 2x X5670
RAM: 80GB ECC
OS Pool: 2x120GB SSD - ZFS Mirror
Storage Pool: 2x6x3TB 3.5" - ZFS RAIDZ2 (two 6-disk vdevs)
Workload:
PiHole VM, Plex VM, Deluge VM etc.
Occasional random test VMs for lab purposes
The problem I'm getting is very slow pool performance and very high iowait, with VMs becoming unresponsive for hours until the iowait clears and whatever stuck process was causing the trouble finally dies off. If I try to reboot while things are locked up it doesn't help; I just get console errors for a few hours until the storage calms down again. It's usually possible to mount the pool read-only with no issues when this happens. The main storage pool did reach about 75% usage recently, so I thought maybe it was getting too full. I deleted about 5TB of data and tried moving some of the newest VM disks around to 're-write' them, but that only seems to have made things worse. It feels like if I even look at the storage wrong right now, or ask it to do anything remotely disk intensive, it locks up and can take a few hours to come good again.
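In case it helps while I can't log in: my rough understanding is that RAIDZ pools can slow right down once they get full or heavily fragmented, so when I get back on the box I plan to capture capacity and fragmentation, plus per-vdev latency while it's struggling. A sketch of what I intend to run (I've used 'tank' as a placeholder pool name; substitute the real one):

# Capacity, free space and fragmentation for the pool
zpool list -o name,size,alloc,free,frag,cap,health tank
# Per-vdev and per-disk latency breakdown, refreshed every 5 seconds
zpool iostat -vl tank 5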
There are no errors in zpool status. I run Scrutiny to review HDD SMART data and I'm not seeing any significant issues with the drives, so I'm not sure whether it's a hardware issue or something else. There's 80GB of RAM in total, and the VMs I'm running use about 24GB or less.
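On the RAM side, I also want to rule out the ARC fighting the VMs for memory and confirm no single disk is dragging the pool down, so I plan to grab something like the following next time it acts up (iostat comes from the sysstat package, and /dev/sdX is a placeholder for whichever disk looks suspect):

# Current ARC size vs. target (I believe ARC defaults to roughly half of RAM unless capped)
arc_summary | grep -i "arc size"
# Per-disk utilisation and wait times during an iowait spike
iostat -x 5
# Full SMART output for any disk that stands out above
smartctl -a /dev/sdX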
Is anyone able to help me narrow down where the performance issue is originating and how I can get back to some stability? I can't log into the server right now to collect any more data or logs, but I have a few screenshots. Thanks in advance!
Example of iowait when the pool is struggling:
Example of 'normal' performance:
Thanks!