Live Migrations get slower over time

Nov 23, 2023
Hi,

we are experiencing the problem that live migrations get slower over time: the higher the uptime of a host, the slower they get. I know there are already several posts regarding slow live migrations; however, we

- have 2 x 100Gbit connections per host dedicated to live migrations
- have the migration type already set to "insecure"
- are not doing storage migrations; the storage is Ceph with 2 x 100Gbit connections (each: both the global and the cluster network have that)
- can repeatedly see that the live migration speed between hosts drops to 10MB/s; sometimes it goes up to 100MB/s, sometimes it even drops below 1MB/s. After a reboot, if I migrate between 2 freshly rebooted nodes, the live migration speed is 2 - 15 GB/s. See the attached screenshots from the same cluster (and the slow one shown was a fast one of the slow ones).

I saw posts that blame memory fragmentation, which sounds odd for Linux.
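
To test the fragmentation theory, one option would be to compare /proc/buddyinfo on a freshly rebooted node and on a long-running one. The following is only a minimal sketch: the function names and the choice of order 9 as the threshold (2 MiB blocks, assuming 4 KiB pages) are my own assumptions, and a low value would not by itself prove that fragmentation causes the slow migrations.

```python
from pathlib import Path


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order]} from buddyinfo text."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected layout: "Node 0, zone Normal c0 c1 ... cN"
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = parts[1].rstrip(",")
        zones[(node, parts[3])] = [int(c) for c in parts[4:]]
    return zones


def high_order_share(counts, min_order=9):
    """Fraction of free pages held in blocks of at least 2**min_order pages."""
    # A block of order n spans 2**n contiguous pages.
    pages = [count << order for order, count in enumerate(counts)]
    total = sum(pages)
    return sum(pages[min_order:]) / total if total else 0.0


def report(path="/proc/buddyinfo"):
    """Print the high-order share per zone; do nothing if the file is unreadable."""
    try:
        text = Path(path).read_text()
    except OSError:
        return
    for (node, zone), counts in parse_buddyinfo(text).items():
        share = high_order_share(counts)
        print(f"node {node} zone {zone}: {share:.1%} of free pages in order>=9 blocks")


if __name__ == "__main__":
    report()
```

Running this on a fast (freshly rebooted) and a slow (long uptime) host and comparing the percentages for the Normal zone would at least show whether large contiguous free blocks become scarce as uptime grows.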

Does anyone have an idea why this is happening? (@proxmox staff: the nodes all have a basic subscription. We could also open a ticket if you think it is necessary and want to look into it together.)
 

Attachments

  • fast.png (38.7 KB)
  • slow.png (92.9 KB)