[SOLVED] Stopped migration stuck in gui w/ permission errors.

dtiKev · Feb 11, 2021

After losing one node and adding in a new one I was moving some guests to the new member node. One of them has failed and is stuck from the gui side. Is there a manual way to reset this back to how it was before the start of the migration? I'm not sure why there would be permission problems as several others migrated without problems. Output is shown below. If it matters, I do have a recent backup on the node that I was trying to move from.

Code:

2021-02-10 08:31:17 starting migration of VM 108 to node 'pvnOne' (x.y.z.244)
2021-02-10 08:31:18 found local disk 'TwoT:108/vm-108-disk-0.qcow2' (in current VM config)
2021-02-10 08:31:18 copying local disk images
2021-02-10 08:31:21 Formatting '/storage/2t/images/108/vm-108-disk-0.qcow2', fmt=qcow2 cluster_size=65536 preallocation=metadata compression_type=zlib size=536870912000 lazy_refcounts=off refcount_bits=16
2021-02-10 09:52:54 131092064+0 records in
2021-02-10 09:52:54 131092064+0 records out
2021-02-10 09:52:54 536953094144 bytes (537 GB, 500 GiB) copied, 4894.18 s, 110 MB/s
2021-02-10 09:52:54 12594+31614398 records in
2021-02-10 09:52:54 12594+31614398 records out
2021-02-10 09:52:54 536953094144 bytes (537 GB, 500 GiB) copied, 4627.14 s, 116 MB/s
2021-02-10 09:52:54 successfully imported 'TwoT:108/vm-108-disk-0.qcow2'
2021-02-10 09:52:54 volume 'TwoT:108/vm-108-disk-0.qcow2' is 'TwoT:108/vm-108-disk-0.qcow2' on the target
2021-02-10 09:52:54 ERROR: unable to open file '/etc/pve/nodes/pvn2/qemu-server/108.conf.tmp.1064' - Permission denied
2021-02-10 09:52:54 aborting phase 1 - cleanup resources
2021-02-10 09:52:54 ERROR: unable to open file '/etc/pve/nodes/pvn2/qemu-server/108.conf.tmp.1064' - Permission denied
2021-02-10 09:52:54 ERROR: found stale volume copy 'TwoT:108/vm-108-disk-0.qcow2' on node 'pvnOne'
2021-02-10 09:52:54 ERROR: migration aborted (duration 01:21:38): unable to open file '/etc/pve/nodes/pvn2/qemu-server/108.conf.tmp.1064' - Permission denied
TASK ERROR: migration aborted

Moayad · Feb 12, 2021

What say...

Bash:

qm config 108
systemctl status pve-cluster corosync
pveversion -v

dtiKev · Feb 12, 2021

Code:

~# qm config 108
bootdisk: ide0
cores: 4
ide0: TwoT:108/vm-108-disk-0.qcow2,size=500G
ide2: none,media=cdrom
lock: migrate
memory: 24576
name: scanDev
net0: e1000=D6:6B:5C:F2:C9:15,bridge=vmbr0,firewall=1
numa: 0
ostype: win8
scsihw: virtio-scsi-pci
smbios1: uuid=3b70356a-cc26-428a-8a3f-cdf8e329e181
sockets: 2
vmgenid: dfa667a7-f8ff-439d-afc3-f4aa2b55aa0d

Code:

# systemctl status pve-cluster corosync
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-02-10 15:14:23 EST; 1 day 17h ago
  Process: 1919 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
 Main PID: 1932 (pmxcfs)
    Tasks: 7 (limit: 4915)
   Memory: 65.3M
   CGroup: /system.slice/pve-cluster.service
           └─1932 /usr/bin/pmxcfs

Feb 12 07:46:50 pvn2 pmxcfs[1932]: [dcdb] notice: data verification successful
Feb 12 07:57:07 pvn2 pmxcfs[1932]: [status] notice: received log
Feb 12 08:12:07 pvn2 pmxcfs[1932]: [status] notice: received log
Feb 12 08:16:39 pvn2 pmxcfs[1932]: [status] notice: received log
Feb 12 08:16:39 pvn2 pmxcfs[1932]: [status] notice: received log
Feb 12 08:16:39 pvn2 pmxcfs[1932]: [status] notice: received log
Feb 12 08:17:07 pvn2 pmxcfs[1932]: [status] notice: received log
Feb 12 08:17:07 pvn2 pmxcfs[1932]: [status] notice: received log
Feb 12 08:17:14 pvn2 pmxcfs[1932]: [status] notice: received log
Feb 12 08:17:25 pvn2 pmxcfs[1932]: [status] notice: received log

● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-02-10 15:14:24 EST; 1 day 17h ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 2048 (corosync)
    Tasks: 9 (limit: 4915)
   Memory: 157.8M
   CGroup: /system.slice/corosync.service
           └─2048 /usr/sbin/corosync -f

Feb 10 16:57:32 pvn2 corosync[2048]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 10 16:57:32 pvn2 corosync[2048]:   [KNET  ] host: host: 1 has no active links
Feb 10 16:57:34 pvn2 corosync[2048]:   [KNET  ] rx: host: 1 link: 0 is up
Feb 10 16:57:34 pvn2 corosync[2048]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 10 16:58:48 pvn2 corosync[2048]:   [TOTEM ] Token has not been received in 2737 ms
Feb 12 08:18:25 pvn2 corosync[2048]:   [KNET  ] link: host: 1 link: 0 is down
Feb 12 08:18:25 pvn2 corosync[2048]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 12 08:18:25 pvn2 corosync[2048]:   [KNET  ] host: host: 1 has no active links
Feb 12 08:18:27 pvn2 corosync[2048]:   [KNET  ] rx: host: 1 link: 0 is up
Feb 12 08:18:27 pvn2 corosync[2048]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)

Code:

# pveversion -v

proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-4
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-4
pve-cluster: 6.2-1
pve-container: 3.3-3
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-4
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

dtiKev · Feb 15, 2021

Ideas?

Moayad · Feb 16, 2021

sorry for the late reply. could you start the VM from CLI? qm start 108

dtiKev said:
Is there a manual way to reset this back to how it was before the start of the migration?

Backup the VM 108 and restore it in the new node [0]

[0] https://pve.proxmox.com/pve-docs/chapter-vzdump.html

dtiKev · Feb 16, 2021

qm start 108 results in:
VM is locked (migrate)

I feel like this has happened before. Looking at your link now.

dtiKev · Feb 16, 2021

And I knew this sounded familiar... I had this same problem a couple years back. Different reason this time but I'm pulling on that thread now to see if it'll do: https://forum.proxmox.com/threads/error-while-cold-migrating-after-upgrade.74143/#post-330951

dtiKev · Feb 16, 2021

And bingo!! That did it. I stopped the migration, moved the disk to local/zfs, cold migrated it to the new server and it starts up.

Moayad · Feb 16, 2021

Glad to solve your issue

Have a nice day

Search

Search

[SOLVED] Stopped migration stuck in gui w/ permission errors.

dtiKev

Well-Known Member

Moayad

Proxmox Staff Member

dtiKev

Well-Known Member

dtiKev

Well-Known Member

Moayad

Proxmox Staff Member

dtiKev

Well-Known Member

dtiKev

Well-Known Member

dtiKev

Well-Known Member

Moayad

Proxmox Staff Member

We value your privacy