[SOLVED] Live migration doesn't work after upgrade to 8.4.0

vegarnilsen

New Member
Apr 29, 2025
We have a two-node cluster that we upgraded to 8.4.0; both nodes are set up with local storage (directory). Prior to 8.4.0 we could migrate VMs between nodes by specifying the target storage, but this no longer works after the upgrade:

[screenshot: Migrate dialog with a target storage selected, showing the error below]
The "Storage not available on selected target." error doesn't go away after I select target storage, and the Migrate button stays inactive. (The VM is of course running.)

We have another cluster with the same storage setup, whose nodes are on 8.1.0 and 8.2.0 respectively, and migration works there with no issues.
"pveversion -v" output from both nodes:

node1:
Bash:
proxmox-ve: 8.4.0 (running kernel: 6.8.12-9-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8: 6.8.12-9
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250211.1~deb12u1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.0-1
proxmox-backup-file-restore: 3.4.0-1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: not correctly installed
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0

node2:
Bash:
proxmox-ve: 8.4.0 (running kernel: 6.8.12-10-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8: 6.8.12-10
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250211.1~deb12u1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: not correctly installed
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
 
Can you please post the VM config and the storage configuration?

Bash:
qm config VMID

cat /etc/pve/storage.cfg
 
Bash:
clsvegar@widepmx5017:~$ sudo qm config 101
agent: enabled=1
balloon: 0
boot: order=scsi0
cicustom: vendor=space0:snippets/ubuntu-jammy-vendor1.yaml
ciuser: ubuntu
cores: 1
cpu: host
hotplug: network,disk,cpu,memory,usb
ipconfig0: ip=***/24,gw=***
kvm: 1
memory: 1024
meta: creation-qemu=8.1.5,ctime=1714040118
name: ***
nameserver: IP1 IP2
net0: virtio=BC:24:11:72:77:25,bridge=vmbr828
numa: 1
onboot: 1
ostype: l26
scsi0: space0:101/vm-101-disk-0.qcow2,discard=on,iothread=1,size=15G
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=31531bb0-5f99-4a5b-8235-a55b168600bf
sshkeys: ***
startup: order=1
vga: serial0
vmgenid: c87fbb4c-57d3-42b6-8501-f0da0cdea19a
clsvegar@widepmx5017:~$ sudo cat /etc/pve/storage.cfg
dir: local
    path /var/lib/vz
    content iso
    prune-backups keep-all=1
    shared 0

dir: space0
    path /space0/proxmox
    content vztmpl,snippets,backup,images,rootdir
    nodes widepmx5017
    prune-backups keep-all=1
    shared 0

dir: widepmx5517space0
    path /space0/proxmox
    content rootdir,iso,vztmpl,backup,snippets,images
    nodes widepmx5517
    prune-backups keep-all=1
    shared 0

Cheers
 
Hi,
Bash:
cicustom: vendor=space0:snippets/ubuntu-jammy-vendor1.yaml

dir: space0
    path /space0/proxmox
    content vztmpl,snippets,backup,images,rootdir
    nodes widepmx5017
    prune-backups keep-all=1
    shared 0
The issue is that the snippets won't be available on the target node, so the migration is blocked. Snippets are not migrated to other storages as part of a migration; they currently need to be on shared storage.
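Just for illustration (the storage name and path here are made up, and the directory would need to already be a shared mount such as NFS that exists on both nodes), defining a snippet storage that every node can use could look something like this:

Bash:
# Hypothetical example: 'shared-snippets' and its path are placeholders.
# '--shared 1' only tells PVE the contents are identical on all nodes; the
# underlying shared mount (e.g. NFS) must already be present on every node.
pvesm add dir shared-snippets --path /mnt/pve/shared-snippets --content snippets --shared 1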
 
Hi,

The issue is that the snippets won't be available on the target node, so the migration is blocked. Snippets are not migrated to other storages as part of a migration; they currently need to be on shared storage.
Ok, so since we're not using Cloud-init or the snippet config after the first boot, can I fix the migration issue by just removing the cicustom config from each VM? (We've used this setup for a while, and it didn't cause migration issues earlier, so this looks to me like a regression in 8.4.)
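In case it matters, this is roughly what I had in mind for stripping the key from every VM on a node (untested sketch; it assumes qm set's --delete option and that every VM listed by qm list should be touched):

Bash:
# Untested sketch: drop the cicustom key from the config of every VM on this node.
# VM IDs come from `qm list` (the header row is skipped).
for vmid in $(sudo qm list | awk 'NR>1 {print $1}'); do
    sudo qm set "$vmid" --delete cicustom
done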

Cheers
 
I tried removing the vendor snippet config:
Bash:
clsvegar@widepmx5017:~$ sudo qm set 101 --cicustom ""
update VM 101: -cicustom
clsvegar@widepmx5017:~$ sudo qm config 101 | grep cicustom
cicustom:
Then I tried migrating that VM to the other cluster node, but that just gave the same error as before:
[screenshot: the same "Storage not available on selected target." error in the Migrate dialog]

In case it would make a difference, I also shut down the VM and booted it back up, then tried another migration; same error.
 
I spun up a test VM so that I could experiment more freely. It was created from an Ubuntu 24.04 template, but we build all our templates with the same script, so it has the same cicustom setup:

Bash:
clsvegar@widepmx5517:~$ sudo qm config 102 | grep cicustom
cicustom: vendor=space0:snippets/ubuntu-noble-vendor1.yaml

Then I ran a live migration from the command line, like this:

Bash:
clsvegar@widepmx5017:~$ sudo qm migrate 102 widepmx5517 --online 1 --targetstorage widepmx5517space0 --with-local-disks 1
2025-04-30 12:17:31 starting migration of VM 102 to node 'widepmx5517' (10.47.47.70)
2025-04-30 12:17:31 found local disk 'space0:102/vm-102-disk-0.qcow2' (attached)
2025-04-30 12:17:31 starting VM 102 on remote node 'widepmx5517'
2025-04-30 12:17:36 volume 'space0:102/vm-102-disk-0.qcow2' is 'widepmx5517space0:102/vm-102-disk-0.qcow2' on the target
2025-04-30 12:17:36 start remote tunnel
2025-04-30 12:17:38 ssh tunnel ver 1
2025-04-30 12:17:38 starting storage migration
2025-04-30 12:17:38 scsi0: start migration to nbd:unix:/run/qemu-server/102_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 351.0 MiB of 15.0 GiB (2.29%) in 1s
drive-scsi0: transferred 1.2 GiB of 15.0 GiB (8.19%) in 2s
drive-scsi0: transferred 1.4 GiB of 15.0 GiB (9.47%) in 3s
drive-scsi0: transferred 1.6 GiB of 15.0 GiB (10.62%) in 4s
drive-scsi0: transferred 1.8 GiB of 15.0 GiB (12.21%) in 5s
drive-scsi0: transferred 2.1 GiB of 15.0 GiB (14.13%) in 6s
drive-scsi0: transferred 2.6 GiB of 15.0 GiB (17.12%) in 7s
drive-scsi0: transferred 2.8 GiB of 15.0 GiB (18.51%) in 8s
drive-scsi0: transferred 3.0 GiB of 15.0 GiB (19.93%) in 9s
drive-scsi0: transferred 4.3 GiB of 15.0 GiB (28.93%) in 10s
drive-scsi0: transferred 4.7 GiB of 15.0 GiB (31.14%) in 11s
drive-scsi0: transferred 5.2 GiB of 15.0 GiB (34.74%) in 12s
drive-scsi0: transferred 15.0 GiB of 15.0 GiB (100.00%) in 13s, ready
all 'mirror' jobs are ready
2025-04-30 12:17:51 switching mirror jobs to actively synced mode
drive-scsi0: switching to actively synced mode
drive-scsi0: successfully switched to actively synced mode
2025-04-30 12:17:52 starting online/live migration on unix:/run/qemu-server/102.migrate
2025-04-30 12:17:52 set migration capabilities
2025-04-30 12:17:52 migration downtime limit: 100 ms
2025-04-30 12:17:52 migration cachesize: 256.0 MiB
2025-04-30 12:17:52 set migration parameters
2025-04-30 12:17:52 start migrate command to unix:/run/qemu-server/102.migrate
2025-04-30 12:17:53 migration active, transferred 129.6 MiB of 2.0 GiB VM-state, 161.1 MiB/s
2025-04-30 12:17:54 migration active, transferred 297.0 MiB of 2.0 GiB VM-state, 162.1 MiB/s
2025-04-30 12:17:55 average migration speed: 682.9 MiB/s - downtime 34 ms
2025-04-30 12:17:55 migration completed, transferred 426.5 MiB VM-state
2025-04-30 12:17:55 migration status: completed
all 'mirror' jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0: mirror-job finished
2025-04-30 12:17:57 stopping NBD storage migration server on target.
2025-04-30 12:18:05 migration finished successfully (duration 00:00:35)

The VM worked perfectly after the migration, as expected.

It looks to me like the problem is that the web UI doesn't update when I pick a different target storage, and thus it won't let me start the migration, even though the migration itself would succeed.
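As a cross-check (assuming pvesh and the migrate precondition endpoint, which I believe is what the Migrate dialog queries), the same precondition result can be fetched straight from the API, e.g. for the test VM now sitting on widepmx5517:

Bash:
# Sketch: ask the API for the migration preconditions of VM 102 directly,
# bypassing the web UI. Adjust the node name and VMID as needed.
sudo pvesh get /nodes/widepmx5517/qemu/102/migrate --output-format json-pretty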
 
Oh you're right, the snippets are not being checked after all. So that is a different issue.

Did you reload the UI after upgrading? I wasn't able to reproduce the issue here.
 
Oh you're right, the snippets are not being checked after all. So that is a different issue.

Did you reload the UI after upgrading? I wasn't able to reproduce the issue here.
Hm, at first I thought you meant reloading something server-side, but I just tried a different browser locally, and the error message is gone there. Looks like it was a browser cache issue after all.

Thanks!
 