slow migrations

So in our case, we think the network was the cause.

I will leave the thread open as someone else posted their issue, and will wait a few days to make sure there is not a repeat.
Not for us: I set up LACP (LAG) for the migration network, a 3 GB bond.

And it still always fails with: command 'zfs snapshot zPool-8TB/vm-110-disk-0@__replicate_110-0_1631646062__' failed: got timeout
 
I just checked the kernel log on one of the cluster nodes, and the dmesg info is very different. There are lines like these:
Code:
Sep 12 07:39:41 pve2 kernel: [ 3385.014666] INFO: task jbd2/rbd0-8:4096 blocked for more than 120 seconds.
Sep 12 07:39:41 pve2 kernel: [ 3385.014698] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

So the offsite system hang is just a coincidence.
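In case it helps anyone compare nodes, here is a minimal Python sketch that pulls the hung-task warnings out of kernel log lines. The regular expression is only based on the two lines quoted above, and the function name `find_hung_tasks` is made up for this example, so treat it as a starting point rather than a finished tool.

```python
import re

# Matches kernel hung-task warnings of the form quoted in this thread:
#   INFO: task jbd2/rbd0-8:4096 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return a list of (task name, pid, seconds) for each hung-task warning."""
    hits = []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            hits.append((m.group("name"), int(m.group("pid")), int(m.group("secs"))))
    return hits


# Sample input: the two lines from the post above.
sample = [
    "Sep 12 07:39:41 pve2 kernel: [ 3385.014666] INFO: task jbd2/rbd0-8:4096 "
    "blocked for more than 120 seconds.",
    'Sep 12 07:39:41 pve2 kernel: [ 3385.014698] "echo 0 > '
    '/proc/sys/kernel/hung_task_timeout_secs" disables this message.',
]

for name, pid, secs in find_hung_tasks(sample):
    print(f"{name} (pid {pid}) blocked for more than {secs}s")
```

To run it against a real node, pass an open log file instead of `sample` (the log path depends on your setup, so I have not hard-coded one).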
Same message for a VM (KVM)!
 
I got this message: "command 'zfs snapshot zPool-8TB/vm-413-disk-0@__replicate_413-0_1631739634__' failed: got timeout"
 
So the Cumulus switches are the gateway for the VMs?


I don't know whether you run active-passive (VRRP) or active-active (VRR).

With VRRP, I think it should work without problems using static routes.


If you use VRR, be careful, because I don't think it works well without BGP and ECMP paths.


https://docs.nvidia.com/networking-...yer-2/Virtual-Router-Redundancy-VRR-and-VRRP/
Hi, I went back and checked our reasoning for not using BGP. Using static routes works too. See https://docs.nvidia.com/networking-...ulus-linux-43/Layer-3/Routing/Static-Routing/
"You can use static routing if you don’t require the complexity of a dynamic routing protocol (such as BGP or OSPF), if you have routes that do not change frequently and for which the destination is only one or two paths away."

Since we have just one or two WAN connections and around 15 VLANs, this works in our case.


PS: thank you for the help on this.
 
Hello
I am still seeing slow migrations.
Let me know what I can do to debug. There may be some suggestions above, and I'll check them later, as I have a big issue to deal with right now.

Any suggestions for debugging the issue?
 
Code:
2021-10-15 15:49:06 migration active, transferred 12.5 GiB of 32.0 GiB VM-state, 49.5 MiB/s
2021-10-15 15:49:07 migration active, transferred 12.5 GiB of 32.0 GiB VM-state, 57.2 MiB/s
2021-10-15 15:49:08 migration active, transferred 12.6 GiB of 32.0 GiB VM-state, 37.1 MiB/s
..
2021-10-15 15:49:20 migration active, transferred 13.1 GiB of 32.0 GiB VM-state, 59.2 MiB/s
2021-10-15 15:49:21 migration active, transferred 13.2 GiB of 32.0 GiB VM-state, 71.9 MiB/s
2021-10-15 15:49:22 migration active, transferred 13.2 GiB of 32.0 GiB VM-state, 31.0 MiB/s
2021-10-15 15:49:23 migration active, transferred 13.2 GiB of 32.0 GiB VM-state, 35.5 MiB/s
2021-10-15 15:49:24 migration active, transferred 13.3 GiB of 32.0 GiB VM-state, 33.1 MiB/s
2021-10-15 15:49:25 migration active, transferred 13.3 GiB of 32.0 GiB VM-state, 31.9 MiB/s
2021-10-15 15:49:26 migration active, transferred 13.3 GiB of 32.0 GiB VM-state, 33.6 MiB/s
.
2021-10-15 15:49:51 migration active, transferred 14.3 GiB of 32.0 GiB VM-state, 60.9 MiB/s
2021-10-15 15:49:52 migration active, transferred 14.3 GiB of 32.0 GiB VM-state, 57.4 MiB/s
2021-10-15 15:49:53 migration active, transferred 14.3 GiB of 32.0 GiB VM-state, 45.1 MiB/s
2021-10-15 15:49:54 migration active, transferred 14.4 GiB of 32.0 GiB VM-state, 31.6 MiB/s
2021-10-15 15:49:55 migration active, transferred 14.4 GiB of 32.0 GiB VM-state, 27.6 MiB/s
..
2021-10-15 15:50:16 migration active, transferred 15.2 GiB of 32.0 GiB VM-state, 52.7 MiB/s
2021-10-15 15:50:17 migration active, transferred 15.3 GiB of 32.0 GiB VM-state, 39.2 MiB/s
2021-10-15 15:50:18 migration active, transferred 15.3 GiB of 32.0 GiB VM-state, 52.1 MiB/s
..
2021-10-15 15:50:37 migration active, transferred 16.2 GiB of 32.0 GiB VM-state, 76.3 MiB/s
2021-10-15 15:50:39 migration active, transferred 16.3 GiB of 32.0 GiB VM-state, 45.4 MiB/s
2021-10-15 15:50:40 migration active, transferred 16.4 GiB of 32.0 GiB VM-state, 26.5 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:50:41 migration active, transferred 16.4 GiB of 32.0 GiB VM-state, 38.1 MiB/s
2021-10-15 15:50:42 migration active, transferred 16.4 GiB of 32.0 GiB VM-state, 28.2 MiB/s, VM dirties lots of memory: 29.0 MiB/s
..

2021-10-15 15:52:04 migration active, transferred 19.8 GiB of 32.0 GiB VM-state, 49.1 MiB/s
2021-10-15 15:52:05 migration active, transferred 19.9 GiB of 32.0 GiB VM-state, 74.8 MiB/s
2021-10-15 15:52:06 migration active, transferred 19.9 GiB of 32.0 GiB VM-state, 36.1 MiB/s
2021-10-15 15:52:07 migration active, transferred 19.9 GiB of 32.0 GiB VM-state, 37.9 MiB/s
2021-10-15 15:52:08 migration active, transferred 20.0 GiB of 32.0 GiB VM-state, 43.7 MiB/s
2021-10-15 15:52:09 migration active, transferred 20.0 GiB of 32.0 GiB VM-state, 58.7 MiB/s
2021-10-15 15:52:10 migration active, transferred 20.1 GiB of 32.0 GiB VM-state, 51.6 MiB/s
2021-10-15 15:52:11 migration active, transferred 20.1 GiB of 32.0 GiB VM-state, 51.0 MiB/s
2021-10-15 15:52:12 migration active, transferred 20.2 GiB of 32.0 GiB VM-state, 41.0 MiB/s
2021-10-15 15:52:13 migration active, transferred 20.2 GiB of 32.0 GiB VM-state, 23.0 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:52:14 migration active, transferred 20.2 GiB of 32.0 GiB VM-state, 25.7 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:52:15 migration active, transferred 20.3 GiB of 32.0 GiB VM-state, 46.6 MiB/s
2021-10-15 15:52:16 migration active, transferred 20.3 GiB of 32.0 GiB VM-state, 41.3 MiB/s
2021-10-15 15:52:17 migration active, transferred 20.4 GiB of 32.0 GiB VM-state, 52.5 MiB/s
2021-10-15 15:52:18 migration active, transferred 20.4 GiB of 32.0 GiB VM-state, 46.3 MiB/s
2021-10-15 15:52:19 migration active, transferred 20.5 GiB of 32.0 GiB VM-state, 52.7 MiB/s
2021-10-15 15:52:20 migration active, transferred 20.5 GiB of 32.0 GiB VM-state, 58.3 MiB/s
2021-10-15 15:52:21 migration active, transferred 20.6 GiB of 32.0 GiB VM-state, 52.8 MiB/s
2021-10-15 15:52:22 migration active, transferred 20.6 GiB of 32.0 GiB VM-state, 22.1 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:52:23 migration active, transferred 20.7 GiB of 32.0 GiB VM-state, 29.3 MiB/s
2021-10-15 15:52:24 migration active, transferred 20.7 GiB of 32.0 GiB VM-state, 33.9 MiB/s
2021-10-15 15:52:25 migration active, transferred 20.7 GiB of 32.0 GiB VM-state, 25.5 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:52:26 migration active, transferred 20.8 GiB of 32.0 GiB VM-state, 33.4 MiB/s
2021-10-15 15:52:27 migration active, transferred 20.8 GiB of 32.0 GiB VM-state, 66.2 MiB/s
2021-10-15 15:52:28 migration active, transferred 20.9 GiB of 32.0 GiB VM-state, 76.1 MiB/s
2021-10-15 15:52:29 migration active, transferred 20.9 GiB of 32.0 GiB VM-state, 56.9 MiB/s
2021-10-15 15:52:30 migration active, transferred 21.0 GiB of 32.0 GiB VM-state, 60.2 MiB/s
2021-10-15 15:52:31 migration active, transferred 21.0 GiB of 32.0 GiB VM-state, 58.5 MiB/s
2021-10-15 15:52:32 migration active, transferred 21.1 GiB of 32.0 GiB VM-state, 44.9 MiB/s
2021-10-15 15:52:33 migration active, transferred 21.1 GiB of 32.0 GiB VM-state, 55.7 MiB/s
2021-10-15 15:52:34 migration active, transferred 21.2 GiB of 32.0 GiB VM-state, 33.2 MiB/s
2021-10-15 15:52:35 migration active, transferred 21.2 GiB of 32.0 GiB VM-state, 42.8 MiB/s
2021-10-15 15:52:36 migration active, transferred 21.2 GiB of 32.0 GiB VM-state, 24.7 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:52:37 migration active, transferred 21.3 GiB of 32.0 GiB VM-state, 26.8 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:52:38 migration active, transferred 21.3 GiB of 32.0 GiB VM-state, 32.9 MiB/s
2021-10-15 15:52:39 migration active, transferred 21.3 GiB of 32.0 GiB VM-state, 50.8 MiB/s
2021-10-15 15:52:40 migration active, transferred 21.4 GiB of 32.0 GiB VM-state, 51.3 MiB/s
2021-10-15 15:52:41 migration active, transferred 21.4 GiB of 32.0 GiB VM-state, 21.1 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:52:42 migration active, transferred 21.5 GiB of 32.0 GiB VM-state, 56.8 MiB/s
2021-10-15 15:52:43 migration active, transferred 21.5 GiB of 32.0 GiB VM-state, 52.3 MiB/s
2021-10-15 15:52:44 migration active, transferred 21.6 GiB of 32.0 GiB VM-state, 59.2 MiB/s
..
2021-10-15 15:53:18 migration active, transferred 23.1 GiB of 32.0 GiB VM-state, 33.2 MiB/s
2021-10-15 15:53:19 migration active, transferred 23.2 GiB of 32.0 GiB VM-state, 72.9 MiB/s
2021-10-15 15:53:20 migration active, transferred 23.2 GiB of 32.0 GiB VM-state, 39.9 MiB/s
2021-10-15 15:53:21 migration active, transferred 23.3 GiB of 32.0 GiB VM-state, 179.9 MiB/s
2021-10-15 15:53:22 migration active, transferred 23.4 GiB of 32.0 GiB VM-state, 19.5 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:23 migration active, transferred 23.4 GiB of 32.0 GiB VM-state, 20.4 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:24 migration active, transferred 23.4 GiB of 32.0 GiB VM-state, 88.2 MiB/s
2021-10-15 15:53:25 migration active, transferred 23.4 GiB of 32.0 GiB VM-state, 20.6 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:26 migration active, transferred 23.5 GiB of 32.0 GiB VM-state, 30.4 MiB/s
2021-10-15 15:53:27 migration active, transferred 23.5 GiB of 32.0 GiB VM-state, 32.4 MiB/s
2021-10-15 15:53:28 migration active, transferred 23.5 GiB of 32.0 GiB VM-state, 20.4 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:29 migration active, transferred 23.5 GiB of 32.0 GiB VM-state, 32.4 MiB/s
2021-10-15 15:53:30 migration active, transferred 23.6 GiB of 32.0 GiB VM-state, 32.8 MiB/s
2021-10-15 15:53:31 migration active, transferred 23.6 GiB of 32.0 GiB VM-state, 28.7 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:32 migration active, transferred 23.7 GiB of 32.0 GiB VM-state, 81.0 MiB/s
..
2021-10-15 15:53:45 migration active, transferred 24.2 GiB of 32.0 GiB VM-state, 89.4 MiB/s
2021-10-15 15:53:46 migration active, transferred 24.3 GiB of 32.0 GiB VM-state, 21.0 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:47 migration active, transferred 24.4 GiB of 32.0 GiB VM-state, 20.4 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:48 migration active, transferred 24.4 GiB of 32.0 GiB VM-state, 26.1 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:49 migration active, transferred 24.4 GiB of 32.0 GiB VM-state, 15.2 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:50 migration active, transferred 24.4 GiB of 32.0 GiB VM-state, 18.8 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:51 migration active, transferred 24.5 GiB of 32.0 GiB VM-state, 32.1 MiB/s
2021-10-15 15:53:52 migration active, transferred 24.5 GiB of 32.0 GiB VM-state, 33.1 MiB/s
2021-10-15 15:53:53 migration active, transferred 24.5 GiB of 32.0 GiB VM-state, 20.6 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:54 migration active, transferred 24.5 GiB of 32.0 GiB VM-state, 32.3 MiB/s
2021-10-15 15:53:55 migration active, transferred 24.6 GiB of 32.0 GiB VM-state, 32.7 MiB/s
2021-10-15 15:53:56 migration active, transferred 24.6 GiB of 32.0 GiB VM-state, 29.6 MiB/s
2021-10-15 15:53:57 migration active, transferred 24.6 GiB of 32.0 GiB VM-state, 16.6 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:58 migration active, transferred 24.7 GiB of 32.0 GiB VM-state, 27.0 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:53:59 migration active, transferred 24.7 GiB of 32.0 GiB VM-state, 25.2 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:54:00 migration active, transferred 24.7 GiB of 32.0 GiB VM-state, 23.2 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:54:01 migration active, transferred 24.7 GiB of 32.0 GiB VM-state, 59.6 MiB/s
2021-10-15 15:54:02 migration active, transferred 24.8 GiB of 32.0 GiB VM-state, 37.7 MiB/s
..

2021-10-15 15:54:18 migration active, transferred 25.4 GiB of 32.0 GiB VM-state, 26.7 MiB/s, VM dirties lots of memory: 29.0 MiB/s
2021-10-15 15:54:19 migration active, transferred 25.5 GiB of 32.0 GiB VM-state, 72.4 MiB/s
2021-10-15 15:54:20 migration active, transferred 25.5 GiB of 32.0 GiB VM-state, 36.2 MiB/s
2021-10-15 15:54:21 migration active, transferred 25.5 GiB of 32.0 GiB VM-state, 38.7 MiB/s

..

2021-10-15 15:55:02 migration active, transferred 27.1 GiB of 32.0 GiB VM-state, 32.6 MiB/s
2021-10-15 15:55:03 migration active, transferred 27.2 GiB of 32.0 GiB VM-state, 350.3 MiB/s
2021-10-15 15:55:03 xbzrle: send updates to 10643 pages in 1.4 MiB encoded memory, cache-miss 97.20%, overflow 148
2021-10-15 15:55:04 migration active, transferred 27.2 GiB of 32.0 GiB VM-state, 115.9 MiB/s
2021-10-15 15:55:04 xbzrle: send updates to 32013 pages in 14.7 MiB encoded memory, cache-miss 97.20%, overflow 1824

The transfer is stuck there.
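For anyone who wants to put numbers on a log like the one above, here is a rough Python sketch that summarizes the progress lines: average, minimum and maximum transfer rate, how often the "VM dirties lots of memory" warning appears, and how often the transfer rate was below the reported dirty rate. The regular expression is only based on the line format in this post, and the names `PROGRESS` and `summarize` are made up for the example. If the VM dirties memory about as fast as it is being sent, the migration may struggle to converge, so that last counter is the one I would look at first.

```python
import re
from statistics import mean

# Matches progress lines of the form shown in this thread, with the optional
# "VM dirties lots of memory" suffix:
#   ... transferred 16.4 GiB of 32.0 GiB VM-state, 26.5 MiB/s, VM dirties lots of memory: 29.0 MiB/s
PROGRESS = re.compile(
    r"transferred (?P<done>[\d.]+) GiB of (?P<total>[\d.]+) GiB VM-state, "
    r"(?P<rate>[\d.]+) MiB/s"
    r"(?:, VM dirties lots of memory: (?P<dirty>[\d.]+) MiB/s)?"
)


def summarize(lines):
    """Summarize transfer rates and dirty-memory warnings in a migration log."""
    rates = []
    dirty_warnings = 0
    rate_below_dirty = 0
    for line in lines:
        m = PROGRESS.search(line)
        if not m:
            continue  # skip xbzrle lines, ".." placeholders, blank lines
        rate = float(m.group("rate"))
        rates.append(rate)
        if m.group("dirty") is not None:
            dirty_warnings += 1
            if rate < float(m.group("dirty")):
                rate_below_dirty += 1
    if not rates:
        return None
    return {
        "samples": len(rates),
        "mean_rate": mean(rates),
        "min_rate": min(rates),
        "max_rate": max(rates),
        "dirty_warnings": dirty_warnings,
        "rate_below_dirty": rate_below_dirty,
    }


# Sample input: five consecutive lines copied from the log above.
sample = [
    "2021-10-15 15:50:37 migration active, transferred 16.2 GiB of 32.0 GiB VM-state, 76.3 MiB/s",
    "2021-10-15 15:50:39 migration active, transferred 16.3 GiB of 32.0 GiB VM-state, 45.4 MiB/s",
    "2021-10-15 15:50:40 migration active, transferred 16.4 GiB of 32.0 GiB VM-state, 26.5 MiB/s, VM dirties lots of memory: 29.0 MiB/s",
    "2021-10-15 15:50:41 migration active, transferred 16.4 GiB of 32.0 GiB VM-state, 38.1 MiB/s",
    "2021-10-15 15:50:42 migration active, transferred 16.4 GiB of 32.0 GiB VM-state, 28.2 MiB/s, VM dirties lots of memory: 29.0 MiB/s",
]

print(summarize(sample))
```

Feeding it the whole task log instead of the five sample lines would show whether the slow phases line up with the dirty-memory warnings or happen independently of them, which would point more towards the network.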

From dmesg on the PVE host:
Code:
## this is why I was moving VMs and getting ready to restart
[Fri Oct 15 14:16:11 2021] Code: 7f 07 c5 fe 7f 4f 20 c5 fe 7f 54 17 e0 c5 fe 7f 5c 17 c0 c5 f8 77 c3 48 39 f7 0f 87 ab 00 00 00 0f 84 e5 fe ff ff c5 fe 6f 26 <c5> fe 6f 6c 16 e0 c5 fe 6f 74 16 c0 c5 fe 6f 7c 16 a0 c5 7e 6f 44
[Fri Oct 15 14:16:11 2021] sogod[856307]: segfault at 7fff606a5bf3 ip 00007f9711952820 sp 00007ffee06a5b08 error 4 in libc-2.28.so[7f9711818000+148000]
[Fri Oct 15 14:16:11 2021] Code: 7f 07 c5 fe 7f 4f 20 c5 fe 7f 54 17 e0 c5 fe 7f 5c 17 c0 c5 f8 77 c3 48 39 f7 0f 87 ab 00 00 00 0f 84 e5 fe ff ff c5 fe 6f 26 <c5> fe 6f 6c 16 e0 c5 fe 6f 74 16 c0 c5 fe 6f 7c 16 a0 c5 7e 6f 44
[Fri Oct 15 14:16:11 2021] sogod[856300]: segfault at 7fff606a5bf3 ip 00007f9711952820 sp 00007ffee06a5b08 error 4 in libc-2.28.so[7f9711818000+148000]
[Fri Oct 15 14:16:11 2021] Code: 7f 07 c5 fe 7f 4f 20 c5 fe 7f 54 17 e0 c5 fe 7f 5c 17 c0 c5 f8 77 c3 48 39 f7 0f 87 ab 00 00 00 0f 84 e5 fe ff ff c5 fe 6f 26 <c5> fe 6f 6c 16 e0 c5 fe 6f 74 16 c0 c5 fe 6f 7c 16 a0 c5 7e 6f 44

## then the system froze as I was migrating the KVM
root@pve4:[~]:# client_loop: send disconnect: Broken pipe

Luckily the KVM ended up on the other node.

Earlier in the day I had been adjusting the network MTU.

I was able to log in to the node over the Ceph network, which is on a different ConnectX-5 card than the PVE network.

I was then able to reboot.


We are still seeing slow migrations. However, it could be due to something in our setup or configuration.
 
