I am experiencing a recurring issue with online (live) migration of a VM in a Proxmox VE cluster.
The migration starts normally and progresses as expected for most of the process, but it consistently fails during the final completion phase, even after Proxmox automatically increases the allowed downtime.
Below is a summary of the behavior and the relevant logs.
- Environment:
Ceph shared storage
Online / live migration
VM with 16 GiB RAM
Dedicated migration network (high throughput observed)
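For reference, the dedicated migration network is selected via the migration setting in /etc/pve/datacenter.cfg, along these lines (the subnet below is only a placeholder, not the real one):
Code:
# /etc/pve/datacenter.cfg (excerpt, placeholder subnet)
migration: secure,network=10.10.10.0/24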
- Observed behavior:
Live migration starts correctly.
Memory state transfer progresses normally up to about 14.9 GiB / 16.0 GiB.
Transfer rates are variable but generally high (peaks close to 900 MiB/s).
Near the end of the migration, the transfer stalls at 14.9 GiB with 0.0 B/s throughput.
Proxmox automatically increases the allowed downtime multiple times:
100 ms → 200 ms → 400 ms → … → up to 204800 ms
Despite this, the migration never completes.
- Final error:
Code:
migration status error: failed - Error in migration completion: Bad address
ERROR: online migrate failure - aborting
ERROR: migration finished with problems
- Additional message seen at the beginning:
Code:
conntrack state migration not supported or disabled, active connections might get dropped
(The migration continues despite this warning.)
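In case it matters for the diagnosis, this is roughly how I would verify that connection tracking is actually in use on the source node (assuming the conntrack-tools package is installed):
Code:
# Number of connections currently tracked by the kernel
sysctl net.netfilter.nf_conntrack_count
# Sample of tracked entries (from conntrack-tools)
conntrack -L | head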
- Result:
Cleanup is triggered.
The VM remains on the source node.
The issue is reproducible for this VM.
- Open questions:
Could this be related to Ceph, the migration network, or kernel-level networking (conntrack)?
Are there recommended tunings or workarounds (migration cache size, downtime limits, precopy/postcopy, disabling conntrack, etc.)?
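If it helps, I can attach to the QEMU monitor of the VM on the source node while the transfer is stalled and capture the migration state; the tunings listed below are the ones I had in mind, but I am not sure which of them are supported or sensible when the migration is driven by Proxmox (the VMID and values are only examples):
Code:
# On the source node: open the QEMU monitor of the VM (example VMID)
qm monitor 100

# Inside the monitor: current migration status, parameters and capabilities
info migrate
info migrate_parameters
info migrate_capabilities

# Tunings I was considering (not applied yet):
migrate_set_parameter downtime-limit 2000    # allowed downtime in ms
migrate_set_capability postcopy-ram on       # followed by migrate_start_postcopy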
Thanks in advance for any hints.