Hey,
we are seeing major performance issues while running fstrim on VMs backed by an SSD pool (3 replicas, 50 OSDs) on Ceph 16.2.7 under Proxmox. Our workload causes fairly large data fluctuations on the VMs (CentOS 7), so we enabled discard on the disks and run fstrim once a week.

At first we saw severe performance problems whenever multiple fstrim processes were running at the same time. To mitigate this, we wrote some software that schedules the fstrim runs so that only one is running at a time across the whole cluster. This improved the situation somewhat but did not solve the problem: IO still nearly blocks while fstrim runs, and we are not sure why.

Today we also found that some strange outages reported by our monitoring are caused by the fstrim runs. At these moments, node exporter needs 135 seconds to read 40 KB from disk. That is not at all what we expected.

Is there a known issue with fstrim on RBD devices containing an XFS file system? Or, more generally, are there parameters we can tune to get better performance for small reads and writes?