Hi, I am currently operating a 3-node PVE 7.0 & Ceph cluster (10 NVMe / 10 SATA SSDs in each node). The Ceph network is a dedicated 100 Gb Ethernet link.
In Windows guests the disk performance is maxing out at 100% disk usage, and latency sits at several hundred up to several thousand milliseconds, making the guest systems absolutely unresponsive...
This is not the normal behavior. I checked for scrubbing/deep-scrubbing on the PVE nodes, but there is currently no such operation running (all PGs are active+clean). Ceph health shows HEALTH_OK. Reads/writes are at around 5 MiB/s and IOPS at around 100 on the "PVE - Datacenter - Ceph" screen. The OSDs are all under 30% usage.
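For reference, this is roughly the kind of per-OSD latency check I mean, as a minimal sketch run on one of the PVE nodes. It assumes the ceph CLI is available there; the JSON layout of "ceph osd perf" differs slightly between Ceph releases, hence the defensive parsing:

#!/usr/bin/env python3
# Quick check for a single slow OSD dragging the whole pool down.
# Assumption: the ceph CLI is installed on this PVE node and the admin
# keyring is readable. The JSON key layout of "ceph osd perf" varies a
# bit between Ceph releases, so both known layouts are handled below.
import json
import subprocess

def ceph_json(*args):
    # Run a ceph subcommand and return its parsed JSON output.
    out = subprocess.run(["ceph", *args, "--format", "json"],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

perf = ceph_json("osd", "perf")
# Newer releases wrap the list in "osdstats"; older ones expose it directly.
infos = perf.get("osdstats", perf).get("osd_perf_infos", [])

# Sort by commit latency so a single misbehaving OSD stands out immediately.
for info in sorted(infos, key=lambda i: i["perf_stats"]["commit_latency_ms"],
                   reverse=True):
    stats = info["perf_stats"]
    print(f"osd.{info['id']:<3} commit {stats['commit_latency_ms']:>5} ms  "
          f"apply {stats['apply_latency_ms']:>5} ms")

Sorting by commit latency like this makes it obvious whether one OSD (or one node's OSDs) is an outlier rather than the whole cluster being slow.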
The guests use the VirtIO SCSI controller and IDE disks, with no caching and Discard enabled. The guest itself is idle. I updated the virtio drivers to the current version without any change...
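To tell whether the latency comes from Ceph itself or only shows up in the guest/virtio path, something like the following rough probe could measure raw write latency directly from a node, bypassing QEMU entirely. It is only a sketch using the python3-rados bindings; the pool name "rbd" is a placeholder for whatever pool the VM disks actually live on:

#!/usr/bin/env python3
# Rough 4 KiB write-latency probe straight against a Ceph pool, bypassing
# QEMU/virtio entirely. Assumptions: the python3-rados bindings are
# installed, /etc/ceph/ceph.conf and the admin keyring are readable, and
# POOL names the pool backing the VM disks.
import time
import rados

POOL = "rbd"          # placeholder: replace with the pool backing the VM disks
SAMPLES = 50
PAYLOAD = b"\0" * 4096

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx(POOL)

latencies = []
try:
    for _ in range(SAMPLES):
        start = time.perf_counter()
        ioctx.write_full("latency-probe", PAYLOAD)   # synchronous full-object write
        latencies.append((time.perf_counter() - start) * 1000.0)
finally:
    ioctx.remove_object("latency-probe")
    ioctx.close()
    cluster.shutdown()

latencies.sort()
print(f"min {latencies[0]:.1f} ms  median {latencies[len(latencies)//2]:.1f} ms  "
      f"max {latencies[-1]:.1f} ms")

If the numbers here stay in the low single-digit milliseconds while the guest still reports hundreds, the problem is more likely in the guest/controller configuration than in Ceph itself.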
Any hints on where to start diagnosing the issue?
Thanks a lot.