[SOLVED] Proxmox 7.0-14+1 crashes VM during migration to another host

hvillemoes

Member
Hi

I get this kernel error during migration to another host. It stops the VM and aborts the migration.
I migrated the very same VM to this host a few days ago.

Any ideas will be appreciated.

syslog:
Nov 14 10:30:21 proxmox3 kernel: [94205.987480] kvm[2288]: segfault at 68 ip 000055f4a3123991 sp 00007f5a200c0eb0 error 6 in qemu-system-x86_64[55f4a2cd8000+545000]
Nov 14 10:30:21 proxmox3 kernel: [94205.987500] Code: 49 89 c1 48 8b 47 38 4c 01 c0 48 01 f0 48 f7 f1 48 39 fd 74 d4 4d 39 e1 77 cf 48 83 e8 01 49 39 c6 77 c6 48 83 7f 68 00 75 bf <48> 89 7d 68 31 f6 48 83 c7 50 e8 a0 0d 0f 00 48 c7 45 68 00 00 00
Nov 14 10:30:21 proxmox3 kernel: [94206.226401] fwbr10013i0: port 2(tap10013i0) entered disabled state
Nov 14 10:30:21 proxmox3 kernel: [94206.226893] fwbr10013i0: port 2(tap10013i0) entered disabled state
Nov 14 10:30:21 proxmox3 systemd[1]: 10013.scope: Succeeded.
Nov 14 10:30:21 proxmox3 systemd[1]: 10013.scope: Consumed 13h 2min 53.425s CPU time.
Nov 14 10:30:21 proxmox3 pvedaemon[1037823]: VM 10013 qmp command failed - VM 10013 not running
Nov 14 10:30:21 proxmox3 pvedaemon[1037823]: VM 10013 qmp command failed - VM 10013 not running
Nov 14 10:30:21 proxmox3 pvedaemon[1037823]: VM 10013 qmp command failed - VM 10013 not running
Nov 14 10:30:21 proxmox3 pvedaemon[1037823]: VM 10013 qmp command failed - VM 10013 not running
Nov 14 10:30:21 proxmox3 pvedaemon[1037823]: VM 10013 qmp command failed - VM 10013 not running
Nov 14 10:30:21 proxmox3 pvedaemon[1037823]: VM 10013 qmp command failed - VM 10013 not running
Nov 14 10:30:21 proxmox3 pvedaemon[1037823]: VM 10013 qmp command failed - VM 10013 not running
Nov 14 10:30:21 proxmox3 pvedaemon[1037823]: VM 10013 qmp command failed - VM 10013 not running

/ Harald
 

Attachments

  • proxmox3_migrate.log (18.6 KB)
I tried the command-line migration with a reduced transfer rate, but with the same result:
qm migrate 10013 proxmox2 --bwlimit 50000 --migration_type insecure --targetstorage data2 --online yes --with-local-disks yes
...
drive-sata0: transferred 31.5 GiB of 100.0 GiB (31.52%) in 11m 4s
drive-sata3: Cancelling block job
drive-sata0: Cancelling block job
drive-sata1: Cancelling block job
drive-sata2: Cancelling block job
2021-11-14 16:24:12 ERROR: online migrate failure - block job (mirror) error: VM 10013 not running
2021-11-14 16:24:12 aborting phase 2 - cleanup resources
2021-11-14 16:24:12 migrate_cancel
2021-11-14 16:24:12 migrate_cancel error: VM 10013 not running
drive-sata3: Cancelling block job
drive-sata0: Cancelling block job
drive-sata1: Cancelling block job
drive-sata2: Cancelling block job
2021-11-14 16:24:12 ERROR: VM 10013 not running
2021-11-14 16:24:18 ERROR: migration finished with problems (duration 03:59:05)
migration problems
 
Could you please post the following:
  • pveversion -v
  • the config of the VM
  • storage.cfg
Thanks!
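
(a quick sketch of how to gather those on the source node - the VM ID is just the one from this thread:)

Code:
pveversion -v
qm config 10013
cat /etc/pve/storage.cfg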
 
root@proxmox3:~# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-7-pve)
pve-manager: 7.0-14+1 (running version: 7.0-14+1/08975a4c)
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
pve-kernel-5.11.22-3-pve: 5.11.22-7
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-12
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-13
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.13-1
proxmox-backup-file-restore: 2.0.13-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.1.0-1
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-18
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

root@proxmox3:/etc/pve# cat storage.cfg
dir: local
path /var/lib/vz
content iso,backup,vztmpl

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

nfs: QnapProxmox
export /proxmox
path /mnt/pve/QnapProxmox
server 193.162.159.88
content rootdir,images,iso
prune-backups keep-all=1

pbs: pbs
datastore pbs-qnap
server proxmox2.agurk.dk
content backup
fingerprint 14:04:e5:f3:b6:55:ca:06:5b:35:8b:e9:74:42:8f:a4:b8:ec:d4:ff:91:80:34:39:7d:bf:ea:49:54:3d:4c:07
prune-backups keep-all=1
username root@pam

nfs: hp1nas
export /mnt/data1
path /mnt/pve/hp1nas
server hp1nas.agurk.dk
content vztmpl,iso,snippets,rootdir,backup,images
prune-backups keep-all=1

dir: data2
path /mnt/pve/data2
content vztmpl,snippets,iso,rootdir,images,backup
is_mountpoint 1
nodes proxmox2

dir: data1
path /mnt/pve/data1
content iso,snippets,vztmpl,backup,images,rootdir
is_mountpoint 1
nodes proxmox1

dir: data3
path /mnt/pve/data3
content iso,snippets,vztmpl,backup,images,rootdir
is_mountpoint 1
nodes proxmox3

root@proxmox3:/etc/pve#

root@proxmox3:/etc/pve/nodes# cat proxmox3/qemu-server/10013.conf
agent: 1
boot: order=sata0
cores: 8
memory: 32924
name: agurk8-webhotel
net0: e1000=76:23:91:7F:A5:94,bridge=vmbr0,firewall=1
onboot: 1
ostype: l26
sata0: local-lvm:vm-10013-disk-0,format=raw,size=100G
sata1: local-lvm:vm-10013-disk-1,format=raw,size=100G
sata2: local-lvm:vm-10013-disk-2,format=raw,size=300G
sata3: local-lvm:vm-10013-disk-3,format=raw,size=250G
sata4: local-lvm:vm-10013-disk-4,format=raw,size=500G
smbios1: uuid=8a1498e8-bd63-4cc3-a80b-d06d80577bcf
startup: up=5,down=30
vmgenid: ed69a0a9-51c1-413b-afb2-e49484cacd90
root@proxmox3:/etc/pve/nodes#
 
Could you try adding ',aio=native' to the SATA disks in the VM config, then power-cycle the VM (full shutdown, then cold start) and report back whether the issue persists?
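
e.g., one way to apply that (untested sketch, shown for sata0 only - repeat for the other disks, keeping your existing drive options):

Code:
qm set 10013 --sata0 local-lvm:vm-10013-disk-0,format=raw,size=100G,aio=native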
 
Same result, the problem persists.
VM config:
root@proxmox3:/etc/pve/nodes/proxmox3/qemu-server# cat 10013.conf
agent: 1
boot: order=sata0
cores: 8
memory: 32924
name: agurk8-webhotel
net0: e1000=76:23:91:7F:A5:94,bridge=vmbr0,firewall=1
onboot: 1
ostype: l26
sata0: local-lvm:vm-10013-disk-0,format=raw,size=100G,aio=native
sata1: local-lvm:vm-10013-disk-1,format=raw,size=100G,aio=native
sata2: local-lvm:vm-10013-disk-2,format=raw,size=300G,aio=native
sata3: local-lvm:vm-10013-disk-3,format=raw,size=250G,aio=native
sata4: local-lvm:vm-10013-disk-4,format=raw,size=500G,aio=native
smbios1: uuid=8a1498e8-bd63-4cc3-a80b-d06d80577bcf
startup: up=5,down=30
vmgenid: ed69a0a9-51c1-413b-afb2-e49484cacd90
root@proxmox3:/etc/pve/nodes/proxmox3/qemu-server#
 
Okay, then the next step would be to install the QEMU debug symbols, attach a gdb instance to the running source VM, and collect some backtraces to see where the segfault is happening.

e.g., the following should work:

Code:
apt install gdb pve-qemu-kvm-dbg   # do this beforehand

# start your VM if not already running

VMID=100 # change this!
VM_PID=$(cat /var/run/qemu-server/${VMID}.pid)

gdb -p $VM_PID -ex='handle SIGUSR1 nostop noprint pass' -ex='handle SIGPIPE nostop print pass' -ex='set logging on' -ex='set pagination off' -ex='cont'

Leave that running (for example, inside a tmux or screen session) and attempt a migration. Once the migration task output indicates you have triggered the crash, enter 'thread apply all bt' in the gdb terminal (this should print lots of info), followed by 'quit' to exit gdb (allowing the kvm process to finish crashing, so you can restart the VM and resume operations). A file called gdb.txt should be generated in the directory where you started gdb; it should contain more pointers w.r.t. what's going on - please post it here as an attachment!
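
i.e., at the gdb prompt once the crash has been triggered:

Code:
(gdb) thread apply all bt
(gdb) quit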

Edit: removed the erroneous 'attach' and replaced it with '-p'; the results should be the same, but without a warning about the nonexistence of a file called 'attach' ;)
 
Found the issue - it's already fixed upstream, and I just sent a patch to include the fix in our next QEMU package release (>= 6.1.0-2). Once that hits the repos and you have upgraded, you'll need to cold-restart (power off, then start again) your VMs so that they run the updated, fixed binary.

Migration should work again then. Note that this only affects live migration with local disks, so any VMs that don't use local disks should be unaffected (and consequently don't need to be restarted either).
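
(rough sketch of the upgrade + cold-restart sequence once the package is available - adjust the VM ID:)

Code:
apt update
apt full-upgrade                      # should pull in pve-qemu-kvm >= 6.1.0-2 once released
pveversion -v | grep pve-qemu-kvm     # verify the installed version
qm shutdown 10013 && qm start 10013   # cold restart so the VM picks up the new binary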
 
Thank you for your advice.
I tried to "migrate" by restoring from my Proxmox Backup Server. When I powered up the restored server, I got error messages from the /home XFS file system. I had similar problems when I "migrated" the server from ESXi to Proxmox.
My conclusion is that I must either somehow repair these errors or move the apps (VirtualMin) to a new server. I could switch from COS6 to Debian 11 at the same time.
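
(In case it helps anyone else: a rough sketch of what an offline XFS repair attempt could look like - the device path is only a placeholder for the actual /home block device:)

Code:
umount /home
xfs_repair /dev/mapper/vg-home   # placeholder device name - substitute the real one
mount /home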
Great - thank you

/ Harald
 
I can add that pve-qemu-kvm 6.1.0-2 worked flawlessly.

Again, thank you for the fast response and fix.

/ Harald
 
