This is pretty insane. I was migrating VM 401 from PVE1 to PVE2. In the end, I got this error:
Code:
[...]
2024-09-30 23:48:16 migration active, transferred 4.9 GiB of 2.0 GiB VM-state, 16.6 MiB/s
2024-09-30 23:48:16 xbzrle: send updates to 384647 pages in 156.9 MiB encoded memory, cache-miss 65.34%, overflow 22795
2024-09-30 23:48:18 migration active, transferred 4.9 GiB of 2.0 GiB VM-state, 19.1 MiB/s, VM dirties lots of memory: 21.8 MiB/s
2024-09-30 23:48:18 xbzrle: send updates to 386878 pages in 157.3 MiB encoded memory, cache-miss 63.62%, overflow 22822
2024-09-30 23:48:18 auto-increased downtime to continue migration: 12800 ms
2024-09-30 23:48:29 average migration speed: 3.6 MiB/s - downtime 9352 ms
2024-09-30 23:48:29 migration status: completed
all 'mirror' jobs are ready
drive-efidisk0: Completing block job...
drive-efidisk0: Completed successfully.
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi1: Completing block job...
drive-scsi1: Completed successfully.
drive-efidisk0: mirror-job finished
drive-scsi0: mirror-job finished
drive-scsi1: mirror-job finished
2024-09-30 23:48:31 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve2' -o 'UserKnownHostsFile=/etc/pve/nodes/pve2/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10.227.1.21 pvesr set-state 401 \''{"local/pve1":{"last_sync":1727764103,"last_node":"pve1","last_iteration":1727764103,"duration":580.699161,"storeid_list":["local-zfs"],"last_try":1727764103,"fail_count":0}}'\'
2024-09-30 23:48:32 stopping NBD storage migration server on target.
2024-09-30 23:48:41 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve2' -o 'UserKnownHostsFile=/etc/pve/nodes/pve2/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10.227.1.21 qm unlock 401
2024-09-30 23:48:41 ERROR: failed to clear migrate lock: Configuration file 'nodes/pve2/qemu-server/401.conf' does not exist
2024-09-30 23:48:41 ERROR: migration finished with problems (duration 00:20:20)
TASK ERROR: migration problems
The VM is still reachable via ping, and the kvm process is running on PVE2. But it doesn't show up in the Proxmox GUI on either PVE, nor in "qm list".
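For reference, this is roughly how I checked that; the grep pattern just relies on the "-id 401" argument Proxmox passes to the QEMU process, so treat it as a sketch for my setup rather than anything official:
Code:
# on PVE2: the QEMU process for VMID 401 is still there...
ps aux | grep '[k]vm.*-id 401'
# ...even though qm knows nothing about it any more
qm list | grep 401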
The source PVE1 looks like this:
And on both PVEs (!) /etc/pve/qemu-server/401.conf does not exist any more. It's just gone.
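Since (as far as I understand) /etc/pve/qemu-server is only a symlink to the local node's directory under /etc/pve/nodes/, I also searched the whole shared pmxcfs tree on both nodes, roughly like this (hostnames as in my cluster):
Code:
# run on both PVE1 and PVE2 - /etc/pve is the shared pmxcfs mount
find /etc/pve/nodes -name '401.conf'
ls -l /etc/pve/nodes/pve1/qemu-server/ /etc/pve/nodes/pve2/qemu-server/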
- Is 401.conf still available somewhere (apart from the backup, which I should have)?
- How do I bring this back into a consistent state? What happens to changes made inside the VM in the meantime? Is my VM now "officially" running on PVE2 or not?
- Most importantly: How on earth can something like this happen?
EDIT: It just gets crazier and crazier. I am trying to regenerate 401.conf from the parameters I find in the process list (since the VM is still running). On one hand, the config file does not exist. On the other hand, it does. Now what? WTF?
Code:
root@pve1:/etc/pve/qemu-server# cat 401.conf
cat: 401.conf: No such file or directory
root@pve1:/etc/pve/qemu-server# cp /tmp/401.conf .
cp: cannot create regular file './401.conf': File exists
root@pve1:/etc/pve/qemu-server#
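For completeness, this is roughly how I am pulling the parameters out of the still-running process to rebuild the config by hand. The file names under /tmp are just my own scratch names, and the resulting 401.conf is hand-written, not produced by any Proxmox tool:
Code:
# on PVE2, where the orphaned kvm process lives: dump its full command line
pid=$(pgrep -f 'kvm.*-id 401')
tr '\0' '\n' < /proc/"$pid"/cmdline > /tmp/401-cmdline.txt
# then translate the -m / -smp / -drive / -netdev / -device arguments
# back into 401.conf syntax manually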