container live migration failure

adoreadonai

New Member
Aug 4, 2013
I have a working 3-node HA cluster. I can successfully live-migrate any KVM guest, but all containers fail to live-migrate.
I can't find any reason for this to be happening. The file the restore complains about, /lib64/libnss_files-2.12.so, exists within the container:

Code:
-rwxr-xr-x 1 root root   65928 Aug 27  2012 libnss_files-2.12.so
lrwxrwxrwx 1 root root      20 Oct 23  2012 libnss_files.so.2 -> libnss_files-2.12.so
Not sure where to start here. Any advice would be greatly appreciated. Thanks
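
In case it helps anyone reproduce, something like the following should confirm the file is visible both inside the running container and from the host side (the private-area path is an assumption based on the 'proxCT_0' shared storage mount shown in the log below; adjust to your layout):

Code:
# inside the running container, via vzctl
vzctl exec 225 ls -l /lib64/libnss_files-2.12.so
# from the host's view of the container's private area on shared storage
ls -l /mnt/pve/proxCT_0/private/225/lib64/libnss_files-2.12.so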

Code:
Aug 08 09:01:56 starting migration of CT 225 to node 'proxmox1' (10.10.0.10)
Aug 08 09:01:56 container is running - using online migration
Aug 08 09:01:56 container data is on shared storage 'proxCT_0'
Aug 08 09:01:56 start live migration - suspending container
Aug 08 09:01:57 dump container state
Aug 08 09:01:57 dump 2nd level quota
Aug 08 09:01:58 initialize container on remote node 'proxmox1'
Aug 08 09:01:58 initializing remote quota
Aug 08 09:02:11 turn on remote quota
Aug 08 09:02:11 load 2nd level quota
Aug 08 09:02:11 starting container on remote node 'proxmox1'
Aug 08 09:02:11 restore container state
Aug 08 09:02:12 # /usr/bin/ssh -o 'BatchMode=yes' root@10.10.0.10 vzctl restore 225 --undump --dumpfile /mnt/pve/proxCT_0/dump/dump.225 --skip_arpdetect
Aug 08 09:02:11 Restoring container ...
Aug 08 09:02:12 Starting container ...
Aug 08 09:02:12 Container is mounted
Aug 08 09:02:12     undump...
Aug 08 09:02:12 Setting CPU units: 1000
Aug 08 09:02:12 Setting CPUs: 1
Aug 08 09:02:12 Configure veth devices: veth225.0 
Aug 08 09:02:12 Adding interface veth225.0 to bridge vmbr0 on CT0 for CT225
Aug 08 09:02:12 vzquota : (warning) Quota is running for id 225 already
Aug 08 09:02:12 Error: undump failed: Not a directory
Aug 08 09:02:12 Restoring failed:
Aug 08 09:02:12 Error: can't open file /lib64/libnss_files-2.12.so
Aug 08 09:02:12 Error: do_rst_vma: rst_file: 83824
Aug 08 09:02:12 Error: do_rst_mm: failed to restore vma: -20
Aug 08 09:02:12 Error: do_rst_mm 238784
Aug 08 09:02:12 Error: rst_mm: -20
Aug 08 09:02:12 Container start failed
Aug 08 09:02:12 ERROR: online migrate failure - Failed to restore container: Can't umount /var/lib/vz/root/225: Device or resource busy
Aug 08 09:02:12 start final cleanup
Aug 08 09:02:12 ERROR: migration finished with problems (duration 00:00:16)
TASK ERROR: migration problems
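
Since the restore command is logged verbatim, the failing step can be replayed by hand on the target node, which takes the migration wrapper out of the picture (the dump file may not survive a failed run, so a fresh suspend/dump might be needed first):

Code:
# on proxmox1, replay the restore step exactly as the task log ran it
vzctl restore 225 --undump --dumpfile /mnt/pve/proxCT_0/dump/dump.225 --skip_arpdetect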



proxmox2 /var/log/messages
Code:
Aug  8 09:01:30 proxmox2 kernel: CT: 225: started
Aug  8 09:01:30 proxmox2 kernel: device veth225.0 entered promiscuous mode
Aug  8 09:01:30 proxmox2 kernel: vmbr0: port 5(veth225.0) entering forwarding state
Aug  8 09:01:31 proxmox2 kernel: Pid: 227399, comm: rc.sysinit veid: 225 Not tainted 2.6.32-22-pve #1
Aug  8 09:01:31 proxmox2 kernel: Call Trace:
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa0597675>] ? vzquota_inode_qmblk_recalc+0x335/0x460 [vzdquota]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa0597fd1>] ? vzquota_inode_data+0x81/0x100 [vzdquota]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa059b259>] ? __vzquota_alloc_space+0x39/0x350 [vzdquota]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa059b5a0>] ? vzquota_alloc_space+0x10/0x20 [vzdquota]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa035f34c>] ? nfs_dq_sync_blocks+0x18c/0x1f0 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa0324080>] ? nfs_find_actor+0x0/0x70 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa0325a27>] ? nfs_refresh_inode+0x77/0x90 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa03264e3>] ? nfs_fhget+0x3f3/0x6b0 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa031f8d1>] ? nfs_readdir_page_filler+0x341/0x5a0 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa031fd08>] ? nfs_readdir_xdr_to_array+0x1d8/0x2c0 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa031fe16>] ? nfs_readdir_filler+0x26/0xa0 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffff8112c56b>] ? do_read_cache_page+0xab/0x1d0
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa031fdf0>] ? nfs_readdir_filler+0x0/0xa0 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffff811b4760>] ? filldir+0x0/0xf0
Aug  8 09:01:31 proxmox2 kernel: [<ffffffff8112c6d9>] ? read_cache_page_async+0x19/0x20
Aug  8 09:01:31 proxmox2 kernel: [<ffffffff8112c6ee>] ? read_cache_page+0xe/0x20
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa031fff2>] ? nfs_readdir+0x162/0x5b0 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffffa03515f0>] ? nfs4_decode_dirent+0x0/0x1c0 [nfs]
Aug  8 09:01:31 proxmox2 kernel: [<ffffffff811b4760>] ? filldir+0x0/0xf0
Aug  8 09:01:31 proxmox2 kernel: [<ffffffff811b49d0>] ? vfs_readdir+0xa0/0xd0
Aug  8 09:01:31 proxmox2 kernel: [<ffffffff811b4afa>] ? sys_getdents+0x8a/0x100
Aug  8 09:01:31 proxmox2 kernel: [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
Aug  8 09:01:31 proxmox2 kernel: eth0: IPv6 duplicate address fe80::e0aa:ccff:fea2:164f detected!
Aug  8 09:01:56 proxmox2 kernel: Holy Crap 1 0 227976,591(mingetty)
Aug  8 09:01:57 proxmox2 kernel: vmbr0: port 5(veth225.0) entering disabled state
Aug  8 09:01:57 proxmox2 kernel: device veth225.0 left promiscuous mode
Aug  8 09:01:57 proxmox2 kernel: vmbr0: port 5(veth225.0) entering disabled state
Aug  8 09:01:57 proxmox2 kernel: CT: 225: stopped
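
The trace above sits in the vzdquota path on NFS, and the task log also warned 'Quota is running for id 225 already'. Purely a guess on my part, but a stale quota seems worth ruling out on both nodes:

Code:
# show the quota state for CT 225 (run on both nodes)
vzquota stat 225
# if a stale quota is left behind, turn it off and drop it
vzquota off 225
vzquota drop 225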



proxmox1 /var/log/syslog
Code:
Aug  8 09:02:12 proxmox1 kernel: CT: 225: started
Aug  8 09:02:12 proxmox1 kernel: device veth225.0 entered promiscuous mode
Aug  8 09:02:12 proxmox1 kernel: vmbr0: port 10(veth225.0) entering forwarding state
Aug  8 09:02:12 proxmox1 kernel: CPT ERR: ffff8804b4636000,225 :can't open file /lib64/libnss_files-2.12.so
Aug  8 09:02:12 proxmox1 kernel: CPT ERR: ffff8804b4636000,225 :do_rst_vma: rst_file: 83824
Aug  8 09:02:12 proxmox1 kernel: CPT ERR: ffff8804b4636000,225 :do_rst_mm: failed to restore vma: -20
Aug  8 09:02:12 proxmox1 kernel: CPT ERR: ffff8804b4636000,225 :do_rst_mm 238784
Aug  8 09:02:12 proxmox1 kernel: CPT ERR: ffff8804b4636000,225 :rst_mm: -20
Aug  8 09:02:12 proxmox1 kernel: vmbr0: port 10(veth225.0) entering disabled state
Aug  8 09:02:12 proxmox1 kernel: device veth225.0 left promiscuous mode
Aug  8 09:02:12 proxmox1 kernel: vmbr0: port 10(veth225.0) entering disabled state
Aug  8 09:02:12 proxmox1 kernel: CT: 225: stopped
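
Side note on the numbers: the -20 in the CPT errors looks like a negated errno, and errno 20 on Linux is ENOTDIR, which matches the 'undump failed: Not a directory' line in the task log. A quick one-liner to decode an errno:

Code:
# print the message for errno 20 ("Not a directory")
perl -e '$! = 20; print "$!\n"'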


Code:
root@proxmox2:~# pveversion -v
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-22-pve
proxmox-ve-2.6.32: 3.0-107
pve-kernel-2.6.32-14-pve: 2.6.32-74
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-19-pve: 2.6.32-96
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-23
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1
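
Since the undump is done by the kernel, I understand checkpoint/restore can be sensitive to kernel skew between nodes. I'm not claiming that's the cause here, but confirming both ends run the same pve kernel is cheap:

Code:
# compare running kernels on source and target
uname -r
ssh root@10.10.0.10 uname -r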
 
Re: container live migration failure - ipv6 duplicate address

Just to help out anyone who may be looking at this: the IPv6 issue has been resolved. It turns out we had 3 containers whose interfaces shared the same MAC. That leaves the undump part as the last issue we have. After the undump error it still reports that it can't open /lib64/libnss_files-2.12.so, which in container 225 is: -rwxr-xr-x 1 root root 65928 Aug 27 2012 libnss_files-2.12.so

So as of right now I have no idea why it wouldn't be able to open the file.
 
Re: container live migration failure - ipv6 duplicate address

Right, I removed each network device and recreated it with a random MAC address. The IPv6 issue must have been caused by restoring/cloning vzdump backups.
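
For anyone hitting the same thing, a quick way to spot MACs shared between container configs (the config path is an assumption for a standard OpenVZ layout; on Proxmox the configs may live under /etc/pve/openvz instead):

Code:
# list any MAC address that appears more than once across CT configs
grep -ho 'mac=[0-9A-Fa-f:]*' /etc/vz/conf/*.conf | sort | uniq -d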

Either way, the IPv6 issue is resolved and moot now, but the rest remains.