[solved] multiple migrations

davlaw

Renowned Member
Apr 4, 2012
I used to open a couple of browser tabs and migrate multiple VMs at once when doing maintenance or upgrades.

But as of late I have noticed that when the migrations start, they all use the same port.

Code:
Oct 24 09:22:29 starting online/live migration on localhost:60000

So if I start a single migration from host 1 to host 2, this is fine. But if I start another migration, the port does not change like it used to.

The port number no longer increments for each migration, and I suspect that, depending on what stage each one is in, either migration can get confused and I get an error:

Code:
Oct 24 09:11:00 migration status: completed
Oct 24 09:11:32 ssh tunnel still running - terminating now with SIGTERM
Oct 24 09:11:33 migration finished successfuly (duration 00:00:52)
TASK OK

My feeling is that it detects the other running migration, terminates that instance, and the migration fails.

Is this proper behavior?
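While two migrations are in flight, a quick way to see whether each one really gets its own port is to check what the incoming kvm processes are listening on from the target node (assuming net-tools is installed):

Code:
# List listening TCP sockets owned by kvm processes on the target node.
# With two concurrent migrations I'd expect two ports (60000, 60001, ...);
# a single 60000 listener for both would confirm the clash.
netstat -tlnp | grep kvm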
 
Re: multiple migrations

Guess I have something different than most, but it appears that when a migration runs, the guest's command line gets port 60000 appended. This machine was migrated this morning; note the timestamp. I still have many guests running in this config and am working to restart them.


Code:
root        5204 15.2  1.9 2479220 1301296 ?     Sl   08:26  65:35 /usr/bin/kvm -id 101 -chardev socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/101.vnc,x509,password -pidfile /var/run/qemu-server/101.pid -daemonize -name roes1 -smp sockets=1,cores=2 -nodefaults -boot menu=on -vga vmware -cpu kvm64,+x2apic,+sep -k en-us -m 2048 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -drive file=/mnt/pve/proxmoxnfs/images/101/vm-101-disk-1.qcow2,if=none,id=drive-virtio1,format=qcow2,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,bootindex=100 -drive if=none,id=drive-ide2,media=cdrom,aio=native -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -netdev type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,vhost=on -device virtio-net-pci,mac=02:39:A4:34:D1:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,base=localtime -machine type=pc-i440fx-1.4 -incoming tcp:localhost:60000 -S


Now after stop/start of node

Code:
root       55596 23.0  3.1 2409024 2107804 ?     Sl   15:47   0:41 /usr/bin/kvm -id 101 -chardev socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/101.vnc,x509,password -pidfile /var/run/qemu-server/101.pid -daemonize -name roes1 -smp sockets=1,cores=2 -nodefaults -boot menu=on -vga vmware -cpu kvm64,+x2apic,+sep -k en-us -m 2048 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -drive file=/mnt/pve/proxmoxnfs/images/101/vm-101-disk-1.qcow2,if=none,id=drive-virtio1,format=qcow2,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,bootindex=100 -drive if=none,id=drive-ide2,media=cdrom,aio=native -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -netdev type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,vhost=on -device virtio-net-pci,mac=02:39:A4:34:D1:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,base=localtime
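To spot which guests are still carrying the leftover listener flag, I am just grepping the process list; a rough check:

Code:
# Show running kvm processes that still have the migration listener
# on their command line ([k]vm keeps grep from matching itself):
ps aux | grep '[k]vm' | grep -- '-incoming'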

I have 4 nodes, all showing the same versions.

Code:
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-21 (running version: 3.1-21/93bf03d4)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-25-pve: 2.6.32-113
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-8
libpve-access-control: 3.0-7
libpve-storage-perl: 3.0-17
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-4
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

Just completed updates from the no-subscription repo this morning.

Have I missed something?
 
Re: multiple migrations

Looking for the matching migration failure that fits this time frame, but it looks like the GUI is not showing all the failures, just some. So, still looking.

Would a failure caused by SIGTERM be logged?
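In the meantime I am grepping the stored task logs on the nodes directly; if I have the PVE 3.x layout right, they live under /var/log/pve/tasks/, so something like:

Code:
# Search all stored task logs for SIGTERM mentions (log path is an
# assumption based on the PVE 3.x defaults on my nodes):
grep -r "SIGTERM" /var/log/pve/tasks/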

Code:
Oct 24 08:43:06 starting migration of VM 142 to node 'proliant01' (10.10.0.200)
Oct 24 08:43:06 copying disk images
Oct 24 08:43:06 starting VM 142 on remote node 'proliant01'
Oct 24 08:43:09 starting ssh migration tunnel
Oct 24 08:43:09 starting online/live migration on localhost:60000
Oct 24 08:43:09 migrate_set_speed: 8589934592
Oct 24 08:43:09 migrate_set_downtime: 0.1
Oct 24 08:43:11 migration status: active (transferred 104229362, remaining 4187598848), total 4312137728)
Oct 24 08:43:13 migration status: active (transferred 229565976, remaining 3888996352), total 4312137728)
Oct 24 08:43:15 migration status: active (transferred 298789978, remaining 3813490688), total 4312137728)
Oct 24 08:43:17 migration status: active (transferred 474001552, remaining 3623948288), total 4312137728)
Oct 24 08:43:21 migration status: active (transferred 632543309, remaining 3461935104), total 4312137728)
Oct 24 08:43:25 migration status: active (transferred 720579859, remaining 3373326336), total 4312137728)
Oct 24 08:43:27 migration status: active (transferred 808710018, remaining 3284979712), total 4312137728)
Oct 24 08:43:31 migration status: active (transferred 898863044, remaining 3194798080), total 4312137728)
Oct 24 08:43:33 migration status: active (transferred 988088296, remaining 3105665024), total 4312137728)
Oct 24 08:43:35 migration status: active (transferred 1075265535, remaining 3018629120), total 4312137728)
Oct 24 08:43:37 migration status: active (transferred 1161909272, remaining 2932117504), total 4312137728)
Oct 24 08:43:39 migration status: active (transferred 1328388165, remaining 2765910016), total 4312137728)
Oct 24 08:43:41 migration status: active (transferred 1408988447, remaining 2684641280), total 4312137728)
Oct 24 08:43:43 migration status: active (transferred 1484726236, remaining 2604158976), total 4312137728)
Oct 24 08:43:45 migration status: active (transferred 1642783757, remaining 2446340096), total 4312137728)
Oct 24 08:43:47 migration status: active (transferred 1721142985, remaining 2367430656), total 4312137728)
Oct 24 08:43:49 migration status: active (transferred 1876725974, remaining 2210136064), total 4312137728)
Oct 24 08:43:51 migration status: active (transferred 1955576602, remaining 2131226624), total 4312137728)
Oct 24 08:43:53 migration status: active (transferred 2032634772, remaining 2051792896), total 4312137728)
Oct 24 08:43:55 migration status: active (transferred 2111591870, remaining 1972883456), total 4312137728)
Oct 24 08:43:57 migration status: active (transferred 2189650435, remaining 1894760448), total 4312137728)
Oct 24 08:43:59 migration status: active (transferred 2268123747, remaining 1816113152), total 4312137728)
Oct 24 08:44:03 migration status: active (transferred 2343389585, remaining 1739825152), total 4312137728)
Oct 24 08:44:05 migration status: active (transferred 2421131683, remaining 1662226432), total 4312137728)
Oct 24 08:44:09 migration status: active (transferred 2499091392, remaining 1584365568), total 4312137728)
Oct 24 08:44:11 migration status: active (transferred 2576312849, remaining 1507028992), total 4312137728)
Oct 24 08:44:15 migration status: active (transferred 2654072479, remaining 1428905984), total 4312137728)
Oct 24 08:44:17 migration status: active (transferred 2729161530, remaining 1349210112), total 4312137728)
Oct 24 08:44:19 migration status: active (transferred 2805905600, remaining 1271087104), total 4312137728)
Oct 24 08:44:23 migration status: active (transferred 2883687433, remaining 1192177664), total 4312137728)
Oct 24 08:44:25 migration status: active (transferred 2960933460, remaining 1114841088), total 4312137728)
Oct 24 08:44:29 migration status: active (transferred 3037953623, remaining 1035931648), total 4312137728)
Oct 24 08:44:31 migration status: active (transferred 3114374764, remaining 957546496), total 4312137728)
Oct 24 08:44:35 migration status: active (transferred 3190829241, remaining 878899200), total 4312137728)
Oct 24 08:44:37 migration status: active (transferred 3260217800, remaining 791863296), total 4312137728)
Oct 24 08:44:39 migration status: active (transferred 3330726661, remaining 705613824), total 4312137728)
Oct 24 08:44:43 migration status: active (transferred 3411043812, remaining 624607232), total 4312137728)
Oct 24 08:44:45 migration status: active (transferred 3494452499, remaining 538095616), total 4312137728)
Oct 24 08:44:49 migration status: active (transferred 3582704932, remaining 450011136), total 4312137728)
Oct 24 08:44:51 migration status: active (transferred 3673017663, remaining 359829504), total 4312137728)
Oct 24 08:44:55 migration status: active (transferred 3756016976, remaining 276987904), total 4312137728)
Oct 24 08:44:59 migration status: active (transferred 3835552621, remaining 197554176), total 4312137728)
Oct 24 08:45:01 migration status: active (transferred 3916102611, remaining 116809728), total 4312137728)
Oct 24 08:45:03 migration status: active (transferred 3993286574, remaining 37900288), total 4312137728)
Oct 24 08:45:06 migration status: active (transferred 4017383003, remaining 0), total 4312137728)
Oct 24 08:45:07 migration status: active (transferred 4068612953, remaining 0), total 4312137728)
Oct 24 08:45:09 migration status: active (transferred 4077436586, remaining 0), total 4312137728)
Oct 24 08:45:09 migration status: active (transferred 4082234186, remaining 0), total 4312137728)
Oct 24 08:45:10 migration status: active (transferred 4089079692, remaining 0), total 4312137728)
Oct 24 08:45:10 migration status: active (transferred 4097780222, remaining 0), total 4312137728)
Oct 24 08:45:10 migrate_set_downtime: 0.2
Oct 24 08:45:10 migration status: active (transferred 4101748807, remaining 0), total 4312137728)
Oct 24 08:45:11 migration speed: 33.57 MB/s - downtime 148 ms
Oct 24 08:45:11 migration status: completed
Oct 24 08:45:42 ssh tunnel still running - terminating now with SIGTERM
Oct 24 08:45:44 migration finished successfuly (duration 00:02:38)
TASK OK


I think this is the matching failure

Code:
 Oct 24 08:43:51 starting migration of VM 146 to node 'proliant01' (10.10.0.200)
Oct 24 08:43:51 copying disk images
Oct 24 08:43:51 starting VM 146 on remote node 'proliant01'
Oct 24 08:43:54 starting ssh migration tunnel
Oct 24 08:43:54 starting online/live migration on localhost:60000
Oct 24 08:43:54 migrate_set_speed: 8589934592
Oct 24 08:43:54 migrate_set_downtime: 0.1
Oct 24 08:43:56 migration status: active (transferred 54600970, remaining 4242067456), total 4312072192)
Oct 24 08:43:58 migration status: active (transferred 121032497, remaining 4175736832), total 4312072192)
Oct 24 08:44:00 migration status: active (transferred 167756617, remaining 4129071104), total 4312072192)
Oct 24 08:44:02 migration status: active (transferred 233145693, remaining 4063793152), total 4312072192)
Oct 24 08:44:04 migration status: active (transferred 306939761, remaining 3990126592), total 4312072192)
Oct 24 08:44:06 migration status: active (transferred 387255184, remaining 3909906432), total 4312072192)
Oct 24 08:44:10 migration status: active (transferred 463425441, remaining 3833880576), total 4312072192)
Oct 24 08:44:12 migration status: active (transferred 540346811, remaining 3757068288), total 4312072192)
Oct 24 08:44:16 migration status: active (transferred 616496593, remaining 3681042432), total 4312072192)
Oct 24 08:44:18 migration status: active (transferred 693438438, remaining 3604230144), total 4312072192)
Oct 24 08:44:20 migration status: active (transferred 769822211, remaining 3527942144), total 4312072192)
Oct 24 08:44:24 migration status: active (transferred 846096571, remaining 3451129856), total 4312072192)
Oct 24 08:44:26 migration status: active (transferred 928792278, remaining 3368550400), total 4312072192)
Oct 24 08:44:30 migration status: active (transferred 1013621993, remaining 3283873792), total 4312072192)
Oct 24 08:44:32 migration status: active (transferred 1098431233, remaining 3199197184), total 4312072192)
Oct 24 08:44:36 migration status: active (transferred 1184048916, remaining 3113734144), total 4312072192)
Oct 24 08:44:40 migration status: active (transferred 1271759657, remaining 3026173952), total 4312072192)
Oct 24 08:44:42 migration status: active (transferred 1357627711, remaining 2940448768), total 4312072192)
Oct 24 08:44:46 migration status: active (transferred 1444041552, remaining 2854199296), total 4312072192)
Oct 24 08:44:48 migration status: active (transferred 1529400674, remaining 2768998400), total 4312072192)
Oct 24 08:44:52 migration status: active (transferred 1614759796, remaining 2683797504), total 4312072192)
Oct 24 08:44:56 migration status: active (transferred 1699839882, remaining 2598858752), total 4312072192)
Oct 24 08:44:58 migration status: active (transferred 1785194909, remaining 2513657856), total 4312072192)
Oct 24 08:45:02 migration status: active (transferred 1870258615, remaining 2428719104), total 4312072192)
Oct 24 08:45:04 migration status: active (transferred 1955355081, remaining 2343780352), total 4312072192)
Oct 24 08:45:08 migration status: active (transferred 2040710108, remaining 2258579456), total 4312072192)
Oct 24 08:45:12 migration status: active (transferred 2194848264, remaining 2104692736), total 4312072192)
Oct 24 08:45:14 migration status: active (transferred 2271502883, remaining 2028142592), total 4312072192)
Oct 24 08:45:16 migration status: active (transferred 2348707384, remaining 1951068160), total 4312072192)
Oct 24 08:45:18 migration status: active (transferred 2502825065, remaining 1797181440), total 4312072192)
Oct 24 08:45:20 migration status: active (transferred 2593425022, remaining 1706737664), total 4312072192)
Oct 24 08:45:22 migration status: active (transferred 2685561117, remaining 1614196736), total 4312072192)
Oct 24 08:45:24 migration status: active (transferred 2778787634, remaining 1521131520), total 4312072192)
Oct 24 08:45:26 migration status: active (transferred 2965195623, remaining 1335001088), total 4312072192)
Oct 24 08:45:28 migration status: active (transferred 3059193728, remaining 1241149440), total 4312072192)
Oct 24 08:45:30 migration status: active (transferred 3153220498, remaining 1147297792), total 4312072192)
Oct 24 08:45:32 migration status: active (transferred 3246693291, remaining 1053970432), total 4312072192)
Oct 24 08:45:34 migration status: active (transferred 3340945862, remaining 959856640), total 4312072192)
Oct 24 08:45:36 migration status: active (transferred 3435477469, remaining 865480704), total 4312072192)
Oct 24 08:45:38 migration status: active (transferred 3611473979, remaining 689573888), total 4312072192)
Oct 24 08:45:40 migration status: active (transferred 3679533139, remaining 594411520), total 4312072192)
Oct 24 08:45:42 migration status: active (transferred 3836416889, remaining 394911744), total 4312072192)
Oct 24 08:45:44 ERROR: online migrate failure - aborting
Oct 24 08:45:44 aborting phase 2 - cleanup resources
Oct 24 08:45:44 migrate_cancel
Oct 24 08:45:45 ERROR: migration finished with problems (duration 00:01:55)
TASK ERROR: migration problems



Another occurrence

Code:
  Oct 24 09:10:41 starting migration of VM 153 to node 'proliant02' (10.10.0.201)
Oct 24 09:10:41 copying disk images
Oct 24 09:10:41 starting VM 153 on remote node 'proliant02'
Oct 24 09:10:43 starting ssh migration tunnel
Oct 24 09:10:43 starting online/live migration on localhost:60000
Oct 24 09:10:43 migrate_set_speed: 8589934592
Oct 24 09:10:43 migrate_set_downtime: 0.1
Oct 24 09:10:45 migration status: active (transferred 157293913, remaining 1966723072), total 2164654080)
Oct 24 09:10:47 migration status: active (transferred 169226815, remaining 805650432), total 2164654080)
Oct 24 09:10:49 migration status: active (transferred 304766407, remaining 507842560), total 2164654080)
Oct 24 09:10:51 migration status: active (transferred 377049870, remaining 430243840), total 2164654080)
Oct 24 09:10:53 migration status: active (transferred 451077290, remaining 354742272), total 2164654080)
Oct 24 09:10:55 migration status: active (transferred 601343241, remaining 198234112), total 2164654080)
Oct 24 09:10:57 migration status: active (transferred 675619880, remaining 122994688), total 2164654080)
Oct 24 09:10:59 migration status: active (transferred 767335670, remaining 0), total 2164654080)
Oct 24 09:11:00 migration status: active (transferred 783305618, remaining 0), total 2164654080)
Oct 24 09:11:00 migration speed: 120.47 MB/s - downtime 35 ms
Oct 24 09:11:00 migration status: completed
Oct 24 09:11:32 ssh tunnel still running - terminating now with SIGTERM
Oct 24 09:11:33 migration finished successfuly (duration 00:00:52)
TASK OK

matching migration error

Code:
 Oct 24 09:10:57 starting migration of VM 142 to node 'proliant02' (10.10.0.201)
Oct 24 09:10:57 copying disk images
Oct 24 09:10:57 starting VM 142 on remote node 'proliant02'
Oct 24 09:10:59 starting ssh migration tunnel
Oct 24 09:11:00 starting online/live migration on localhost:60000
Oct 24 09:11:00 migrate_set_speed: 8589934592
Oct 24 09:11:00 migrate_set_downtime: 0.1
Oct 24 09:11:02 migration status: active (transferred 166450135, remaining 4125478912), total 4312137728)
Oct 24 09:11:04 migration status: active (transferred 223639463, remaining 3895046144), total 4312137728)
Oct 24 09:11:06 migration status: active (transferred 376473366, remaining 3732770816), total 4312137728)
Oct 24 09:11:08 migration status: active (transferred 452604662, remaining 3646521344), total 4312137728)
Oct 24 09:11:10 migration status: active (transferred 533907430, remaining 3562369024), total 4312137728)
Oct 24 09:11:12 migration status: active (transferred 617320275, remaining 3477692416), total 4312137728)
Oct 24 09:11:14 migration status: active (transferred 772589164, remaining 3321708544), total 4312137728)
Oct 24 09:11:16 migration status: active (transferred 850578114, remaining 3243585536), total 4312137728)
Oct 24 09:11:18 migration status: active (transferred 926863607, remaining 3167297536), total 4312137728)
Oct 24 09:11:20 migration status: active (transferred 1097762607, remaining 2996633600), total 4312137728)
Oct 24 09:11:22 migration status: active (transferred 1177322822, remaining 2917199872), total 4312137728)
Oct 24 09:11:24 migration status: active (transferred 1255323481, remaining 2839339008), total 4312137728)
Oct 24 09:11:26 migration status: active (transferred 1410752075, remaining 2683355136), total 4312137728)
Oct 24 09:11:28 migration status: active (transferred 1483075336, remaining 2606280704), total 4312137728)
Oct 24 09:11:30 migration status: active (transferred 1562368800, remaining 2527109120), total 4312137728)
Oct 24 09:11:32 ERROR: online migrate failure - aborting
Oct 24 09:11:32 aborting phase 2 - cleanup resources
Oct 24 09:11:32 migrate_cancel
Oct 24 09:11:33 ERROR: migration finished with problems (duration 00:00:36)
TASK ERROR: migration problems
 
Re: multiple migrations

I thought it odd that Proxmox no longer incremented the port numbers when running more than one migration...
 
Re: multiple migrations

Good news, solved: I had installed fail2ban with no Proxmox-specific config, and it was interfering with the SSH migration tunnels between the nodes.
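For anyone else who hits this: the fix on my side was telling fail2ban to leave the cluster traffic alone. A minimal sketch, assuming the stock ssh jail was what banned the nodes; 10.10.0.0/24 is my cluster network, adjust for yours:

Code:
# /etc/fail2ban/jail.local
[DEFAULT]
# Never ban localhost or the Proxmox cluster subnet, otherwise the
# ssh migration tunnels between nodes get killed mid-migration.
ignoreip = 127.0.0.1/8 10.10.0.0/24

Then restart fail2ban (service fail2ban restart) and clear any bans it already has in place.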

I'm guessing the last part of the command line is benign and just part of the migration process? It's still there after migration; I had just never looked that far down in htop.

Code:
-incoming tcp:localhost:60000 -S


Guess I need a nudge in the right direction to re-establish trust between the servers. Update the certs on each?
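Would something like this, run on each node, be the right approach? As I understand it, pvecm updatecerts regenerates the node certificates and refreshes the cluster-wide SSH known hosts used for the migration tunnels, but treat that as my reading of it rather than gospel:

Code:
# Regenerate this node's certs and update the cluster's shared SSH
# known_hosts (my understanding of what the command does):
pvecm updatecerts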
 
