kvm migration fails, exit code 250

bread-baker

I get this when trying to migrate a KVM:

Code:
Executing HA migrate for VM 101 to node fbc241
Trying to migrate pvevm:101 to fbc241...Temporary failure; try again
TASK ERROR: command 'clusvcadm -M pvevm:101 -m fbc241' failed: exit code 250

However, 4 other KVMs migrate back and forth without an issue.

Will check logs and try migration in stop mode...
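(A minimal sketch of what I'll watch, assuming the stock PVE 2.x syslog and rgmanager log locations:)

Code:
# follow the system and rgmanager logs while retrying the migration
tail -f /var/log/syslog
tail -f /var/log/cluster/rgmanager.log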

Any suggestions?
 
Here is the conf file:

Code:
# on drbd for high availability
bootdisk: virtio0
cores: 4
cpu: host
ide2: cdrom,media=cdrom
memory: 2048
name: mail-system
net0: virtio=86:CF:B2:A1:41:7C,bridge=vmbr0
onboot: 1
ostype: l26
sockets: 1
virtio0: drbd-fbc241:vm-101-disk-1
 
And lvs and qm list output for the 2 DRBD systems:

fbc240, where KVM 101 is running:
Code:
fbc240 s009 /etc/pve/qemu-server # lvs
  LV             VG          Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  vm-1023-disk-1 drbd-fbc240 -wi-ao--  32.01g                                           
  vm-200-disk-1  drbd-fbc240 -wi-ao--  15.02g                                           
  vm-2091-disk-1 drbd-fbc240 -wi-ao--  17.00g                                           
  vm-100-disk-1  drbd-fbc241 -wi-----  32.00g                                           
  vm-101-disk-1  drbd-fbc241 -wi-ao--  32.00g                                           
  vm-102-disk-1  drbd-fbc241 -wi-----   2.01g                                           
  vm-103-disk-1  drbd-fbc241 -wi-----   4.00g                                           
  vm-103-disk-2  drbd-fbc241 -wi----- 120.00g                                           
  vm-115-disk-1  drbd-fbc241 -wi-----   9.01g                                           
  bkup           fbc240-vg   -wi-ao-- 362.60g                                           
  data           pve         -wi-ao-- 465.50g                                           
  root           pve         -wi-ao--  10.00g                                           
  swap           pve         -wi-ao--   8.00g                                           
fbc240 s009 /etc/pve/qemu-server # qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID       
       101 mail-system          running    2048              32.00 20424     
       200 winxp                running    2048              15.02 20343     
      1003 ltsp-term-KVM        stopped    256                0.00 0         
      1023 fbc123-x2go-wheezy   running    1536              32.01 20334     
      2091 asterisk-fbc91       running    1024              17.00 20339

And the other system:
Code:
fbc241 s012 ~ # lvs
  LV             VG          Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  vm-1023-disk-1 drbd-fbc240 -wi-----  32.01g                                           
  vm-200-disk-1  drbd-fbc240 -wi-----  15.02g                                           
  vm-2091-disk-1 drbd-fbc240 -wi-----  17.00g                                           
  vm-100-disk-1  drbd-fbc241 -wi-----  32.00g                                           
  vm-101-disk-1  drbd-fbc241 -wi-----  32.00g                                           
  vm-102-disk-1  drbd-fbc241 -wi-ao--   2.01g                                           
  vm-103-disk-1  drbd-fbc241 -wi-ao--   4.00g                                           
  vm-103-disk-2  drbd-fbc241 -wi-ao-- 120.00g                                           
  vm-115-disk-1  drbd-fbc241 -wi-ao--   9.01g                                           
  bkup           fbc241-vg   -wi-ao-- 500.00g                                           
  data           pve         -wi-ao--   1.88t                                           
  root           pve         -wi-ao--  96.00g                                           
  swap           pve         -wi-ao--  23.00g                                           
fbc241 s012 ~ # qm list
qemu-img: Could not open '/dev/drbd-fbc241/vm-100-disk-1': No such file or directory
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID       
       100 wheezy-kvm           stopped    1024               0.00 0         
       102 sarge-edi            running    256                2.01 6017      
       103 etch-kvm-20          running    512                4.00 5768      
       115 squeeze-kvm          running    1024               9.01 6298      
      1001 ltsp-term-KVM        stopped    512                0.00 0         
      8016 fbc16-x2go           running    4096               8.00 4906      
      8030 fbc30-edi-x2go       running    1024               8.00 5183
 
Just noticed:

qemu-img: Could not open '/dev/drbd-fbc241/vm-100-disk-1': No such file or directory

That is from a different KVM; trying to start that results in:

Code:
fbc241 s012 ~ # qm start 100
Executing HA start for VM 100
Member fbc241 trying to enable pvevm:100...Aborted; service failed
command 'clusvcadm -e pvevm:100 -m fbc241' failed: exit code 254
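(Side note: in the lvs listings above, vm-100-disk-1 has attr -wi----- on both nodes, i.e. the LV is not active, which would explain the missing /dev node. If that were the only problem, activating it by hand would be something like:)

Code:
# activate the LV so /dev/drbd-fbc241/vm-100-disk-1 appears
lvchange -ay drbd-fbc241/vm-100-disk-1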

Note that on the 2 DRBD systems - fbc240 and fbc241 - I had to add a NIC, so at the same time I was testing HA and migration for the cluster.
 
More info:
Code:
fbc241 s012 ~ # fence_tool ls
fence domain
member count  3
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2 4 

fbc241 s012 ~ # clustat
Cluster Status for fbcluster @ Sat May  5 11:09:25 2012
Member Status: Quorate

 Member Name                                       ID   Status
 ------ ----                                       ---- ------
 fbc240                                                1 Online, rgmanager
 fbc246                                                2 Online, rgmanager
 fbc241                                                4 Online, Local, rgmanager

 Service Name                             Owner (Last)                             State         
 ------- ----                             ----- ------                             -----         
 pvevm:100                                (fbc241)                                 failed        
 pvevm:101                                (fbc240)                                 failed        
 pvevm:102                                fbc241                                   started       
 pvevm:1023                               fbc240                                   started       
 pvevm:103                                fbc241                                   started       
 pvevm:115                                fbc241                                   started       
 pvevm:200                                fbc240                                   started       
 pvevm:2091                               fbc240                                   started
 
Code:
fbc241 s012 ~ # clusvcadm -d pvecm:100
Local machine disabling pvecm:100...Service does not exist

Code:
fbc240 s009 /etc/pve/qemu-server # clusvcadm -d pvecm:100
Local machine disabling pvecm:100...Service does not exist

(In hindsight, I typed pvecm:100 in both commands, but clustat lists the service as pvevm:100, which is why it reports "Service does not exist".)
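(With the correct service name, the disable/re-enable cycle would presumably be:)

Code:
clusvcadm -d pvevm:100   # disable the failed service
clusvcadm -e pvevm:100   # enable it again once the underlying problem is fixed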

Next I'll try the suggestion from http://forum.proxmox.com/threads/9351-HA-errors-and-odd-behaviour-when-adding-servers-to-HA-Cluster?highlight=exit+code+254 :
"edited /etc/pve/cluster/cluster.conf to remove debug l..."
 
OK, that debug edit did not work; I did not find the line.

So I tried:
1- removed 100 from the cluster on the PVE web page
2- started it from the CLI and got:
Code:
qm start 100
VM is locked (backup)

So:
Code:
qm unlock 100
qm start 100

That started it.

I think the issue was caused by rebooting the systems during the weekly backup. My mistake: I had forgotten that the backups were scheduled for the same time I came in to add the NIC.
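(A quick way to confirm a stale backup lock before unlocking, assuming the usual clustered config path:)

Code:
# a leftover backup lock shows up as a "lock: backup" line in the VM config
grep '^lock:' /etc/pve/qemu-server/100.conf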

So I'll add 100 back to the cluster...

Next, try to make 101 migrate...
 
OK, I got 101 to migrate. Here is what it took:

First tried:
Code:
qm unlock 101
which still failed the same way:
Code:
Executing HA migrate for VM 101 to node fbc241
Trying to migrate pvevm:101 to fbc241...Temporary failure; try again
TASK ERROR: command 'clusvcadm -M pvevm:101 -m fbc241' failed: exit code 250


Then tried: remove it from the cluster, then migrate. Got this:
Code:
May 05 11:33:44 starting migration of VM 101 to node 'fbc241' (10.100.100.241)
May 05 11:33:44 copying disk images
May 05 11:33:44 ERROR: Failed to sync data - cant migrate local cdrom drive
May 05 11:33:44 aborting phase 1 - cleanup resources
May 05 11:33:44 ERROR: migration aborted (duration 00:00:00): Failed to sync data - cant migrate local cdrom drive
TASK ERROR: migration aborted

So I thought there must be an ISO image loaded instead of just an empty CD-ROM drive.

That was not the case; there was just a plain IDE CD-ROM.
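(For the record, the CLI equivalent of deleting that drive in the web UI should be qm set with -delete:)

Code:
# drop the ide2 cdrom entry from the VM config
qm set 101 -delete ide2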

So I deleted the CD-ROM hardware, and then:
Code:
May 05 11:34:42 starting migration of VM 101 to node 'fbc241' (10.100.100.241)
May 05 11:34:42 copying disk images
May 05 11:34:42 starting VM 101 on remote node 'fbc241'
May 05 11:34:43 starting migration tunnel
May 05 11:34:44 starting online/live migration on port 60000
May 05 11:34:46 migration status: active (transferred 53769KB, remaining 909240KB), total 2113920KB)
May 05 11:34:48 migration status: active (transferred 167394KB, remaining 807820KB), total 2113920KB)
May 05 11:35:03 migration status: completed
May 05 11:35:03 migration speed: 107.79 MB/s
May 05 11:35:05 migration finished successfuly (duration 00:00:23)
TASK OK
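(To double-check, qm list on the target node should now show 101 running there:)

Code:
# run on fbc241 after the migration completes
qm list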






 
From last night's backup, this is the original 101.conf:
Code:
fbc241 s012 /bkup/rsnapshot-for-systems/daily.0/fbc241/etc/pve/nodes/fbc241/qemu-server # cat 101.conf
#will move mail here
#10.100.1.5   srv5.fantinibakery.com srv5  #  mail wheezy  kvm 2012-05-02
# on drbd for high availability
bootdisk: virtio0
cores: 4
cpu: host
ide2: cdrom,media=cdrom
memory: 2048
name: mail-system
net0: virtio=86:CF:B2:A1:41:7C,bridge=vmbr0
onboot: 1
ostype: l26
sockets: 1
virtio0: drbd-fbc241:vm-101-disk-1

And the current one:
Code:
fbc241 s012 /etc/pve/qemu-server # cat 101.conf
#will move mail here
#10.100.1.5   srv5.fantinibakery.com srv5  #  mail wheezy  kvm 2012-05-02
# on drbd for high availability
bootdisk: virtio0
cores: 4
cpu: host
memory: 2048
name: mail-system
net0: virtio=86:CF:B2:A1:41:7C,bridge=vmbr0
onboot: 1
ostype: l26
sockets: 1
virtio0: drbd-fbc241:vm-101-disk-1
 
"
ide2: cdrom,media=cdrom"
The above means a psychical cdrom device with an attached cdrom. If you however, did the following
"ide2: none,media=cdrom" I am quit convinced the migration would have succeeded.
 
Thanks, this command works for me:

pvecm e 1

Many thanks. Great work with Proxmox!!!
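(For context, hedged: the long form is presumably pvecm expected, which lowers the number of votes needed for quorum so a partially-up cluster becomes quorate again:)

Code:
# lower the expected vote count so the cluster regains quorum (use with care)
pvecm expected 1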