proxmox 2.1 - openvz live migration fails

ibsa

New Member
Sep 26, 2012
I have a reproducible issue with OpenVZ live migration. The pveversion output is below. Any ideas appreciated :confused:

Sep 26 11:41:33 starting migration of CT 7002 to node 'pvebc-master' (10.254.254.251)
Sep 26 11:41:33 container is running - using online migration
Sep 26 11:41:33 starting rsync phase 1
Sep 26 11:41:33 # /usr/bin/rsync -aHAX --delete --numeric-ids --sparse /var/lib/vz/private/7002 root@10.254.254.251:/var/lib/vz/private
Sep 26 11:45:42 start live migration - suspending container
Sep 26 11:45:42 dump container state
Sep 26 11:45:52 copy dump file to target node
Sep 26 11:46:34 starting rsync (2nd pass)
Sep 26 11:46:34 # /usr/bin/rsync -aHAX --delete --numeric-ids /var/lib/vz/private/7002 root@10.254.254.251:/var/lib/vz/private
Sep 26 11:46:39 dump 2nd level quota
Sep 26 11:46:39 copy 2nd level quota to target node
Sep 26 11:46:41 initialize container on remote node 'pvebc-master'
Sep 26 11:46:41 initializing remote quota
Sep 26 11:46:42 turn on remote quota
Sep 26 11:46:42 load 2nd level quota
Sep 26 11:46:42 starting container on remote node 'pvebc-master'
Sep 26 11:46:42 restore container state
Sep 26 11:46:44 # /usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@10.254.254.251 vzctl restore 7002 --undump --dumpfile /var/lib/vz/dump/dump.7002 --skip_arpdetect
Sep 26 11:46:42 Restoring container ...
Sep 26 11:46:42 Starting container ...
Sep 26 11:46:42 Container is mounted
Sep 26 11:46:42 undump...
Sep 26 11:46:42 Adding IP address(es): 10.5.0.139
Sep 26 11:46:42 Setting CPU units: 1000
Sep 26 11:46:42 Setting CPUs: 2
Sep 26 11:46:44 vzquota : (warning) Quota is running for id 7002 already
Sep 26 11:46:44 Error: undump failed: Bad address
Sep 26 11:46:44 Restoring failed:
Sep 26 11:46:44 Error: do_rst_mm: failed to restore vma: -14
Sep 26 11:46:44 Error: do_rst_mm 17381512
Sep 26 11:46:44 Error: rst_mm: -14
Sep 26 11:46:44 Error: make_baby: -14
Sep 26 11:46:44 Error: rst_clone_children
Sep 26 11:46:44 Error: make_baby: -14
Sep 26 11:46:44 Error: rst_clone_children
Sep 26 11:46:44 Container start failed
Sep 26 11:46:44 ERROR: online migrate failure - Failed to restore container: Can't umount /var/lib/vz/root/7002: Device or resource busy
Sep 26 11:46:44 removing container files on local node
Sep 26 11:46:49 start final cleanup
Sep 26 11:46:51 ERROR: migration finished with problems (duration 00:05:19)
TASK ERROR: migration problems
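
To narrow down whether the failure is in the dump/undump step itself rather than in the transfer between nodes, a minimal local checkpoint/restore test on the source node might help (a sketch only: the dump path is illustrative, the container is briefly suspended while the test runs, and these are the same vzctl calls the migration task uses):

vzctl chkpnt 7002 --dumpfile /var/lib/vz/dump/test.7002
vzctl restore 7002 --dumpfile /var/lib/vz/dump/test.7002

If the local restore fails with the same do_rst_mm / -14 errors, the problem is with checkpointing this particular workload rather than with the migration transport.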

#pveversion -v
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-74
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-49
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1
 
What OS/services run inside the container - anything special?

The containers run Jetty and PostgreSQL on Ubuntu 10.04 LTS. They were restored from vzdump backups created on a Proxmox 1.9 host. I wouldn't consider these networked services to be anything special.
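
For context, the "Bad address" in the log above is errno 14 (EFAULT), returned while the kernel's checkpoint/restore code rebuilds a process's memory areas, so the exact set of processes in the container at dump time may matter (jsvc, which appears in a kernel message further down, is the Apache commons-daemon launcher commonly used to run Java services such as Jetty as daemons). A simple way to see what the checkpointer has to dump, run from the host:

vzctl exec 7002 ps ax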


A search of these forums shows a number of similar threads, but I couldn't find anything related to "Error: undump failed: Bad address". What does this error mean?


EDIT: I found the following odd entry on the source (migrate-from) machine.

Oct 2 13:41:54 pvebc-master kernel: Holy Crap 1 0 543383,262(jsvc)

The timestamp corresponds to the beginning of the migration process.

After enabling restore debug logging with "echo 3 > /proc/sys/debug/rst", the destination machine produces over 700 lines of kernel log during the migration. Here are some excerpts (a short capture sketch follows them):

CPT DBG: ffff880681e05000,1460: JFixup 5699 14559378000
CPT WRN: ffff880681e05000,1460: SNMP stats trimmed
CPT WRN: ffff880681e05000,1460: VMA 7fb4ada91000@2599120 flag mismatch 08201875 08001875
CPT DBG: ffff880681e05000,1460: rst_file: file obtained by dentry_open
CPT WRN: ffff880681e05000,1460: file 2378456 mode mismatch 00000012 00000002
CPT DBG: ffff880681e05000,1460: file is attached to a socket
CPT WRN: ffff880681e05000,1460: fixup_file_flags: oops... creds mismatch
CPT DBG: ffff880681e05000,1460: bsd lock restored
CPT DBG: ffff880681e05000,1460: 430712,162(syslog-ng)open RDONLY fifo ino 2349704 ffff88068ab885c0 11a0
CPT DBG: ffff880681e05000,1460: restoring SHM block: 00000000-00034000
CPT DBG: ffff880681e05000,1460: vma pgoff mismatch, fixing
CPT DBG: ffff880681e05000,1460: vma 6234224 merged, split

CPT ERR: ffff880681e05000,1460 :do_rst_mm: failed to restore vma: -14
CPT ERR: ffff880681e05000,1460 :do_rst_mm 6647344
CPT ERR: ffff880681e05000,1460 :rst_mm: -14
CPT DBG: ffff880681e05000,1460: leaked through 430719/249 ffff8807d437d9c0
CPT ERR: ffff880681e05000,1460 :make_baby: -14
CPT ERR: ffff880681e05000,1460 :rst_clone_children
CPT DBG: ffff880681e05000,1460: leaked through 430718/248 ffff88043d884bc0
CPT ERR: ffff880681e05000,1460 :make_baby: -14
CPT ERR: ffff880681e05000,1460 :rst_clone_children
CPT DBG: ffff880681e05000,1460: leaked through 430681/1 ffff880791df27c0
CPT DBG: ffff880681e05000,1460: leaked through 430707/150 ffff88043da4c280
CPT DBG: ffff880681e05000,1460: leaked through 430715/180 ffff88077ed942c0
CPT DBG: ffff880681e05000,1460: leaked through 430711/164 ffff88083d854c00
CPT DBG: ffff880681e05000,1460: leaked through 430708/153 ffff8803de4c8840
CPT DBG: ffff880681e05000,1460: leaked through 430714/179 ffff88083d8a66c0
CPT DBG: ffff880681e05000,1460: leaked through 430709/155 ffff8803de4c8dc0
CPT DBG: ffff880681e05000,1460: leaked through 430713/176 ffff88043d8840c0
CPT DBG: ffff880681e05000,1460: leaked through 430717/182 ffff88083d8a71c0
CPT DBG: ffff880681e05000,1460: leaked through 430710/161 ffff88043d934c80
CPT DBG: ffff880681e05000,1460: leaked through 430716/181 ffff88077ed958c0
CPT DBG: ffff880681e05000,1460: leaked through 430712/162 ffff8803de4c9340
CT: 1460: stopped
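
For anyone wanting to reproduce this capture: a minimal sequence on the destination node, assuming the debug.rst setting above and that the CPT messages land in the kernel ring buffer (the log file name is arbitrary), is roughly

echo 3 > /proc/sys/debug/rst
dmesg -c > /dev/null
(retry the online migration from the source node)
dmesg > /tmp/rst-debug.7002.log

where "dmesg -c" clears the ring buffer first, so the saved file contains only the new CPT DBG/WRN/ERR lines.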




Can anyone suggest additional tests to perform, or workarounds, that would get online OpenVZ migration working reliably?

thanks..
 
Also, what is the significance of the quota warning?

Sep 26 11:46:44 vzquota : (warning) Quota is running for id 7002 already
 
You can ignore that.

OK, that's good to know, as it appears frequently.

Have you any ideas about the root cause of the migration failure?
I see a number of suspicious log entries but don't know what is relevant/important.

Sep 26 11:46:44 Error: undump failed: Bad address
Sep 26 11:46:44 Restoring failed:
Sep 26 11:46:44 Error: do_rst_mm: failed to restore vma: -14
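
For what it's worth, OpenVZ checkpoint/restore is generally expected to work only between nodes running the same kernel, and the package list above shows both 2.6.32-11-pve and 2.6.32-14-pve installed. A quick sanity check from the source node (destination IP as in the migration log):

uname -r
ssh root@10.254.254.251 uname -r

If the two differ, booting both nodes into the same pve kernel before retrying would rule that out.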

cheers
 
