Hello,
I have a similar (or the same) problem as in this topic: https://forum.proxmox.com/threads/backup-slowing-down-to-a-crawl.44870. I was asked to create a new thread because my setup is different.
In my case the storage is an HPE MSA 2050 (attached by SAS). There are 4 LUNs where the VMs are located as LVM, shared between multiple nodes. Backups (snapshot mode) run weekly, and at first performance is great, with three-digit speeds. A pure read of sparse data can go up to 400 MB/s.
At some point the speed goes down, but not everywhere at once, only on particular LUN(s). The read speed of sparse data drops to 20 MB/s. It happens almost simultaneously on multiple nodes if backups are running on more than one node at once, but only for particular LUNs, and I think these are the most used LUNs (they hold more VMs than the others).
Somehow, restarting a controller on the storage fixes the problem, and the backup returns to full speed. Thanks to multipath I can do that safely, but it has to be done manually every few hours, because the read speed drops again.
The problem is probably not in the storage itself, because only the read speed of vzdump is affected; everything else works fine. I can copy data from the same LUN at high speed while the backup is already slowed down. I'm not sure when this problem started, and I'd appreciate any suggestions for finding the cause.
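In case it helps to make the "copy is fast, backup is slow" comparison reproducible, here is a minimal sketch of how plain sequential read speed from a volume could be measured outside of vzdump. It is only an illustration: the function name `measure_read_throughput` and the paths passed on the command line are hypothetical, the block size and the 1 GiB limit are arbitrary choices, and reads go through the page cache, so a repeated run on the same data may look faster than the storage really is.

```python
import sys
import time


def measure_read_throughput(path, block_size=4 * 1024 * 1024, max_bytes=1024 ** 3):
    """Read up to max_bytes sequentially from path.

    Returns (bytes_read, rate) where rate is in MB/s (1 MB = 10**6 bytes).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            chunk = f.read(min(block_size, max_bytes - total))
            if not chunk:  # end of file or device
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else 0.0
    return total, rate


if __name__ == "__main__":
    # Paths are given on the command line, e.g. one logical volume per LUN.
    for path in sys.argv[1:]:
        n, rate = measure_read_throughput(path)
        print(f"{path}: read {n} bytes at {rate:.1f} MB/s")
```

Running it (as root, for block devices) against one volume on a slowed LUN and one on an unaffected LUN while the backup is crawling would show whether plain reads differ between them.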