Live Migrations get slower over time

Nov 23, 2023
Hi,

we are experiencing the problem that live migrations get slower over time: the higher the uptime of a host, the slower they get. I know there are already several posts regarding slow live migrations; however, we

- have 2 x 100Gbit connections per host dedicated to live migrations
- have the migration type already set to "insecure"
- are not doing storage migrations; storage is Ceph with 2 x 100Gbit connections (each: both the global and the cluster network have that)
- can repeatedly see that the live migration speed between hosts drops to around 10MB/s; sometimes it goes up to 100MB/s, sometimes it even drops below 1MB/s. After a reboot of the hosts, if I migrate between 2 freshly rebooted nodes, the live migration speed is 2 - 15 GB/s. See the attached screenshots from the same cluster (and that was a fast one of the slow ones).

I saw posts that blame memory fragmentation, which sounds odd for Linux.
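
One way to check the fragmentation theory would be to compare /proc/buddyinfo on a long-running node and on a freshly rebooted one. The following is only a rough sketch: the function names are made up for illustration, and the choice of order 9 as the threshold assumes 4 KiB base pages (so an order-9 block is 2 MiB). It shows how much of the free memory still sits in large contiguous blocks; it does not prove that fragmentation is what slows the migration.

```python
from pathlib import Path


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order]} from /proc/buddyinfo text."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node 0, zone Normal c0 c1 ... cN
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(count) for count in parts[4:]]
    return zones


def high_order_share(counts, min_order=9):
    """Fraction of free pages held in blocks of at least 2**min_order pages."""
    # A free block of a given order spans 2**order pages.
    pages = [count << order for order, count in enumerate(counts)]
    total = sum(pages)
    return sum(pages[min_order:]) / total if total else 0.0


if __name__ == "__main__":
    path = Path("/proc/buddyinfo")
    if path.exists():
        for (node, zone), counts in parse_buddyinfo(path.read_text()).items():
            share = high_order_share(counts)
            print(f"node {node} zone {zone}: {share:.1%} of free pages in order>=9 blocks")
```

If the share is much lower on the slow nodes than on the freshly rebooted ones, that would at least be consistent with the fragmentation explanation.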

Does anyone have an idea why this is happening? (@proxmox staff: the nodes all have a basic subscription. We could also open a ticket if you think it is necessary and want to look into it together.)
 

Attachments

  • fast.png
  • slow.png
Hi,
please share the configuration of an affected VM (qm config ID, with ID being the numerical ID of the VM), the output of pveversion -v, and the full system journal from the source and target nodes, from boot up to and including the problematic migration. The start and end of the migration task logs (one fast and one slow) would also be interesting.
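
The requested output could be gathered on each node with something like the sketch below. It is only an illustration: the helper names and output file names are made up, the VM ID 100 in the usage note is a placeholder, and it assumes that journalctl -b (the journal of the current boot) covers the period in question.

```python
import shutil
import subprocess


def diagnostic_commands(vmid):
    """Commands matching the request: VM config, package versions, current-boot journal."""
    return [
        ["qm", "config", str(vmid)],
        ["pveversion", "-v"],
        ["journalctl", "-b", "--no-pager"],
    ]


def collect(vmid, out_prefix="migration-diag"):
    """Run each command and write its output to <out_prefix>-<tool>.txt."""
    written = []
    for cmd in diagnostic_commands(vmid):
        if shutil.which(cmd[0]) is None:
            # Not on a Proxmox VE node, or the tool is not in PATH.
            print(f"skipping {' '.join(cmd)}: {cmd[0]} not found")
            continue
        out_file = f"{out_prefix}-{cmd[0]}.txt"
        with open(out_file, "w") as fh:
            subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT, check=False)
        written.append(out_file)
    return written
```

Calling collect(100) as root on the source node and again on the target node (with the real VM ID) would leave up to three text files per node that can be attached to the thread. The migration task logs are not covered by this sketch and would still need to be copied separately.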