Windows guest IO performance improved with krbd and writethrough

Researching a customer request, we found a significant improvement in Windows IO benchmarks (the customer uses CrystalDiskMark), roughly 10x, using krbd together with the writethrough guest cache mode. I was able to replicate this on two of our lab clusters running NVMe Ceph pools. None of the other settings we tried (aio mode, PG count, virtual hardware, disk emulation) produced anything close. The AI explanation is that, while krbd naturally uses the host page cache, writethrough removes the need for the frequent FUAs (Force Unit Access flushes) that Windows issues.
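For reference, this is roughly what the relevant configuration looks like on the PVE side; the storage name, pool, and VM ID below are placeholders, not our actual setup:

Code:
# /etc/pve/storage.cfg -- hypothetical storage/pool names; krbd 1 makes PVE
# map images through the kernel RBD client instead of QEMU's librbd
rbd: ceph-nvme
        content images
        pool nvme-pool
        krbd 1

# Disk entry in /etc/pve/qemu-server/<vmid>.conf with writethrough caching,
# e.g. set via: qm set 101 --scsi0 ceph-nvme:vm-101-disk-0,cache=writethrough
scsi0: ceph-nvme:vm-101-disk-0,cache=writethrough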

I am interested in others' experiences and takes on Windows guest behavior. The settings above may be helpful to some.

Update: it looks like when the host saturates, either due to memory exhaustion or IO saturation of the backing store (the drives), krbd IO will also pause. So using krbd in PVE successfully requires sufficient RAM and good response times on the host.
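If you suspect this is happening, a couple of quick host-side checks (standard Linux counters, nothing PVE-specific) can point to page-cache pressure:

Code:
# Dirty and Writeback climbing and staying high means the page cache is
# backed up behind the drives, which is when krbd IO pauses
grep -E '^(Dirty|Writeback):' /proc/meminfo

# The writeback thresholds currently in effect on the host
sysctl vm.dirty_ratio vm.dirty_background_ratio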

It is very likely that if the normal (librbd) RBD storage is not performing well for some reason (in real-world terms, latency at queue depth 1, not some meaningless benchmark number), then there is an underlying problem with the network or the backing storage that needs to be fixed first, and krbd would only be a workaround. It could be a solution for systems that are 99% adequate but need to handle an occasional spike in IO.
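On measuring that: a short fio run of synchronous 4k random writes at queue depth 1 is a reasonable stand-in for the real-world latency I mean; the path and sizes here are just examples, point it at a file on the storage under test:

Code:
# 4k sync random writes at QD1; the completion latency percentiles, not the
# bandwidth number, tell you whether the underlying storage is healthy
fio --name=qd1-write-lat --filename=/mnt/test/fio.tmp --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --direct=1 --fsync=1 --ioengine=psync \
    --runtime=30 --time_based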
 