[SOLVED] Migrate VM to another node, hangs at 100%

fireon

Distinguished Member
Oct 25, 2010
Austria/Graz
deepdoc.at
Hello,

We have a small PVE 4 cluster with 3 nodes and local storage. I migrated 2 VMs. The first VM was fine. The second VM hung on its second disk for 7 hours. Then I killed the process on the command line with "kill -9", but the kill did not work. The process is still running. So what is going on here? What can I do? A reboot is OK, at night. But what about the VM, and why does the process hang?

Code:
proxmox-ve: 4.0-16 (running kernel: 4.2.2-1-pve)
pve-manager: 4.0-50 (running version: 4.0-50/d3a6b7e5)
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-23
qemu-server: 4.0-31
pve-firmware: 1.1-7
libpve-common-perl: 4.0-32
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-27
pve-libspice-server1: 0.12.5-1
vncterm: 1.2-1
pve-qemu-kvm: 2.4-10
pve-container: 1.0-10
pve-firewall: 2.0-12
pve-ha-manager: 1.0-10
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.3-1
lxcfs: 0.9-pve2
cgmanager: 0.37-pve2
criu: 1.6.0-1
zfsutils: 0.6.5-pve4~jessie

Here is the last log:

Code:
45,897,449,472  98%   37.94MB/s    0:00:17  
 45,922,058,240  98%   31.70MB/s    0:00:20  
 45,937,721,344  98%   18.27MB/s    0:00:34  
 45,974,519,808  98%   16.62MB/s    0:00:36  
 46,089,895,936  98%   28.38MB/s    0:00:17  
 46,207,139,840  99%   42.03MB/s    0:00:08  
 46,324,219,904  99%   92.22MB/s    0:00:02  
 46,440,906,752  99%  111.28MB/s    0:00:01  
 46,488,027,136  99%   91.73MB/s    0:00:01  
 46,588,035,072 100%   70.01MB/s    0:10:34 (xfr#1, to-chk=0/1)
vm-107-state-vor_driverupdate.raw


         32,768   0%    0.00kB/s    0:00:00  
      2,293,760   0%    1.77MB/s    0:19:07  
     73,039,872   3%   31.42MB/s    0:01:02  
    110,657,536   5%   32.54MB/s    0:00:59  
    177,471,488   8%   29.49MB/s    0:01:03  
    296,615,936  14%   50.85MB/s    0:00:34  
    413,925,376  19%   58.89MB/s    0:00:27  
    530,776,064  25%   72.94MB/s    0:00:20  
    642,023,424  30%  106.47MB/s    0:00:13  
    692,387,840  33%   90.73MB/s    0:00:15  
    699,498,496  33%   65.47MB/s    0:00:20  
    818,610,176  39%   65.99MB/s    0:00:18  
    935,952,384  44%   70.13MB/s    0:00:16  
  1,052,934,144  50%   86.03MB/s    0:00:11  
  1,170,440,192  56%  112.37MB/s    0:00:07  
  1,286,897,664  61%  111.70MB/s    0:00:06  
  1,360,035,840  65%   87.69MB/s    0:00:08  
  1,455,816,704  69%   83.29MB/s    0:00:07  
  1,483,505,664  71%   60.75MB/s    0:00:09  
  1,588,002,816  76%   54.93MB/s    0:00:08  
  1,594,359,808  76%   43.45MB/s    0:00:11  
  1,669,201,920  79%   39.58MB/s    0:00:10  
  1,786,413,056  85%   59.67MB/s    0:00:04  
  1,887,076,352  90%   63.00MB/s    0:00:03  
  1,982,365,696  95%   92.58MB/s    0:00:01  
  2,051,735,552  98%   91.27MB/s    0:00:00  
  2,086,560,256 100%   68.76MB/s    0:00:28 (xfr#1, to-chk=0/1)
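Both transfers in this log end with a line like `(xfr#1, to-chk=0/1)`. In rsync's progress output, `to-chk=0/1` means zero of one files remain to be checked, so rsync itself reported the copy as complete. That fits the thread title and suggests the task stalled after the copy rather than during it. A minimal Python sketch for pulling the final progress line out of such a log; the regex is written against the lines shown here and is an assumption, not a documented rsync format.

```python
import re

# Written against rsync --progress lines as they appear in this log, e.g.
#   46,588,035,072 100%   70.01MB/s    0:10:34 (xfr#1, to-chk=0/1)
PROGRESS_RE = re.compile(
    r"^\s*(?P<bytes>[\d,]+)\s+(?P<pct>\d+)%\s+(?P<rate>\S+)\s+(?P<eta>\d+:\d{2}:\d{2})"
    r"(?:\s+\(xfr#(?P<xfr>\d+), to-chk=(?P<left>\d+)/(?P<total>\d+)\))?"
)


def last_progress(lines):
    """Return a summary of the last rsync progress line, or None if there is none."""
    last = None
    for line in lines:
        m = PROGRESS_RE.match(line)
        if m:
            last = m
    if last is None:
        return None
    return {
        "bytes": int(last["bytes"].replace(",", "")),
        "percent": int(last["pct"]),
        # to-chk=0/N: no files left to check, i.e. rsync reported the file as done.
        "finished": last["left"] is not None and int(last["left"]) == 0,
    }
```

Fed the last two lines of the second transfer, `last_progress` reports 2,086,560,256 bytes at 100% with `finished` set to `True`.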
And the task:
Code:
6602 ?        Ds     0:03 task UPID:srv-virtu01:000019CA:00032E45:562B5AFF:qmigrate:107:root@pam:
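The process title in this `ps` line is the task's UPID. A minimal sketch for decoding it, assuming the field layout PVE's task code uses: node, then PID, process start (clock ticks since boot) and start time (Unix timestamp) as upper-case hex, then task type, ID and user. The helper name `decode_upid` is hypothetical.

```python
from datetime import datetime, timezone


def decode_upid(upid):
    """Split a Proxmox VE task UPID into its fields.

    Assumed layout: UPID:node:PID:PSTART:STARTTIME:TYPE:ID:USER:
    with PID, PSTART and STARTTIME written in hex.
    """
    parts = upid.split(":")
    if len(parts) < 8 or parts[0] != "UPID":
        raise ValueError("not a UPID: %r" % upid)
    node, pid, pstart, starttime, task_type, task_id, user = parts[1:8]
    return {
        "node": node,
        "pid": int(pid, 16),
        "pstart": int(pstart, 16),  # process start, clock ticks since boot
        "starttime": datetime.fromtimestamp(int(starttime, 16), tz=timezone.utc),
        "type": task_type,
        "id": task_id,
        "user": user,
    }


info = decode_upid("UPID:srv-virtu01:000019CA:00032E45:562B5AFF:qmigrate:107:root@pam:")
print(info["pid"], info["type"], info["id"], info["starttime"].isoformat())
# → 6602 qmigrate 107 2015-10-24T10:18:39+00:00
```

The hex PID `000019CA` decodes to 6602, the same PID `ps` shows, so the stuck process is the `qmigrate` task for VM 107.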

Best Regards
Fireon
 
Most likely a failed NFS server? What kind of (shared) storage do you use, or do you only use local storage?
 
Then I killed the process on the command line with "kill -9", but the kill did not work.

Really strange if you only use local storage.

The process is still running. So what is going on here? What can I do? A reboot is OK, at night. But what about the VM, and why does the process hang?

Not sure what kind of bug is triggered here, but a reboot will at least abort that hanging process.
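The `ps` line in the first post shows the task in state `Ds`: `D` is uninterruptible sleep (usually waiting on I/O) and `s` marks a session leader. A process in uninterruptible sleep generally cannot be killed, not even with SIGKILL, until the kernel operation it is blocked on returns, which matches `kill -9` having no effect. A minimal Linux-only sketch for listing such processes by reading `/proc/<pid>/stat`; `state_from_stat` and `uninterruptible_processes` are hypothetical helpers, not part of any Proxmox tool.

```python
import os


def state_from_stat(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces
    or ')', so cut at the LAST ')' instead of splitting on whitespace.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 2]  # state field follows ") "
    return comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state 'D'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = state_from_stat(f.read())
        except OSError:
            continue  # process exited while scanning, or stat was unreadable
        if state == "D":
            yield int(entry), comm
```

On an affected node, `for pid, comm in uninterruptible_processes(): print(pid, comm)` would be expected to list the hung migration task together with anything else blocked on I/O.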
 
OK, I migrated another VM. It copied at more than 100 MB/s and everything went fine, but after the last disk the same error occurred. When I rebooted the machine, the process was killed with this message:

Code:
 42,976,215,040 100%  101.15MB/s    0:06:45 (xfr#1, to-chk=0/1)
Oct 24 23:15:34 # /usr/bin/ssh -o 'BatchMode=yes' root@10.70.99.9 qm unlock 109
Oct 24 23:15:34 Connection to 10.70.99.9 closed by remote host.
Oct 24 23:15:34 ERROR: failed to clear migrate lock: exit code 255
Oct 24 23:15:34 ERROR: migration finished with problems (duration 00:24:28)
TASK ERROR: migration problems

That was one of the tasks. The other one (the first task) could not be killed during the machine shutdown, even after waiting more than half an hour. The only way out was to reset the server via iLO. After the reboot everything was OK, and both migrated machines are running normally.
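The error lines in the log above come from the cleanup step: the migration task runs `qm unlock 109` on the target node over SSH, and that node was going down for the reboot, so SSH reported the connection closed. `ssh` exits with the remote command's status, or with 255 if SSH itself failed, so exit code 255 here points at the connection rather than at `qm`. If the VM still carries a migrate lock afterwards, the same command may be repeated by hand once the node is back. A minimal sketch that rebuilds the exact command from the log and classifies the exit code; `unlock_command`, `describe_ssh_exit` and `unlock_vm` are hypothetical helpers.

```python
import subprocess


def unlock_command(host, vmid):
    """Rebuild the cleanup command the migration task logged."""
    return ["/usr/bin/ssh", "-o", "BatchMode=yes", "root@" + host,
            "qm", "unlock", str(vmid)]


def describe_ssh_exit(code):
    """ssh returns the remote command's status, or 255 if ssh itself failed.

    A remote command could also exit with 255, so this is only a heuristic.
    """
    if code == 0:
        return "ok"
    if code == 255:
        return "ssh error (connection failed or was closed)"
    return "remote command failed with exit code %d" % code


def unlock_vm(host, vmid):
    """Run the unlock over ssh and describe the result."""
    result = subprocess.run(unlock_command(host, vmid))
    return describe_ssh_exit(result.returncode)
```

With the values from this log, `unlock_vm("10.70.99.9", 109)` runs the same command line the task printed.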
 