VM migration errors

royalj7

Member
Aug 11, 2020
Relative Proxmox and Linux newbie here. I was playing around with HA on a 3-node cluster and seem to have hosed things up for one of my VMs. When I simulated a node failure to check HA, the migration failed with a "TASK ERROR: volume 'SSD:110/vm-110-disk-0.qcow2' does not exist". I also got an error that the VM wouldn't start: "TASK ERROR: command 'ha-manager set vm:110 --state started' failed: exit code 255". No big deal, I was going to move the VM back to the original node and start investigating what I did wrong.
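In case it matters, my understanding is that on a directory storage the qcow2 files should live under <path>/images/<vmid>/, so I figure the first check is whether the file behind that volume ID even exists on the node. Something like this should show it (the /mnt/ssd path is just a placeholder for wherever the SSD storage actually points):

# Resolve the volume ID to a filesystem path
pvesm path SSD:110/vm-110-disk-0.qcow2
# Check whether the qcow2 file is actually present (adjust the base path)
ls -l /mnt/ssd/images/110/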

Problem now is I cannot move the VM to either of the other two nodes, nor can I start it on the 3rd node. When moving it back to the original node it came from, I get:
2022-05-20 14:28:41 ssh: connect to host 192.168.1.30 port 22: Connection refused
2022-05-20 14:28:41 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted.
I had previously changed my SSH port, but I'm not sure how to tell Proxmox about that so I don't get this error.
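As far as I can tell, node-to-node migration expects sshd to be listening on the standard port 22 at the cluster address, so a basic connectivity check from the source node (address as in the log above) would be something like:

# Is the destination's sshd reachable on port 22?
nc -zv 192.168.1.30 22
# Can the source node log in non-interactively with the cluster keys?
ssh root@192.168.1.30 hostname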

However, I changed the SSH port back to the default 22 on the 2nd node, and I get this error when trying to migrate the VM there:
ERROR: migration aborted (duration 00:00:00): storage migration for 'SSD:110/vm-110-disk-0.qcow2' to storage '' failed - no storage ID specified
TASK ERROR: migration aborted.
So it's not the SSH problem, but something else. Being a newbie, I'm not sure how to fix this. "SSD" is a directory storage set up on each of the 3 nodes.
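Since "SSD" is a directory storage (local to each node, not shared), I assume it's worth comparing the cluster-wide definition with what each node reports as active:

# Cluster-wide storage definitions
cat /etc/pve/storage.cfg
# Per-node view: is the SSD storage listed as active on every node?
pvesm status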

When I just try to start the VM on the node it got moved to during the HA failure, I get:
TASK ERROR: volume 'SSD:110/vm-110-disk-0.qcow2' does not exist
So, a storage/disk issue, but where to go from here? All three nodes are fully up to date.
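I figure comparing what the VM config expects with what the storage on this node actually contains should show whether the disk ever made it over (110 being the VMID from the errors above):

# Disks the VM config references
qm config 110
# Volumes the SSD storage actually holds on this node
pvesm list SSD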

I appreciate the help!
--James
 
I changed the SSH port on the 1st node back to 22, and I get the same "no storage ID specified" error as I got on node 2. Doing some research on that error, I came across this forum thread. However, I have already updated all my nodes and I'm on pve-container version 4.2-1, which is newer than the version that was required to fix the "no storage ID specified" error in that instance. So I can't move the VM to either of the other cluster nodes, and I can't start it on the node it's shown on in the Proxmox web GUI. Anyone have any ideas on a fix? This was my DNS/ad-blocking VM. Luckily, I have a backup DNS running on a RPi4, but I'd like to get this solved just in case.
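One thing I'm wondering about (just a guess on my part): since a directory storage isn't shared between nodes, it looks like the failed HA move may have left the VM's config on a node that never received the disk. If the qcow2 is still sitting on the original node, maybe moving the config file back there, so config and disk are on the same node again, would let it start? The node names below are placeholders, and I assume the VM should be taken out of HA first so the manager doesn't fight the move:

# Take the VM out of HA management first
ha-manager remove vm:110
# Move the config back to the node that still has the disk
# (replace new-node / original-node with the real node names)
mv /etc/pve/nodes/new-node/qemu-server/110.conf /etc/pve/nodes/original-node/qemu-server/110.conf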
 
Anyone have any ideas on how to fix or diagnose this? My google-fu doesn't seem to be up to the task...

Thanks!
 
Please post:
- pveversion -v
- /etc/pve/storage.cfg
- the VM config
- the full task logs of the failed tasks
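For the first three, something like this should do; the task logs can be copied from the task window in the web GUI (on the node itself they end up under /var/log/pve/tasks/):

pveversion -v
cat /etc/pve/storage.cfg
qm config 110
# Task logs: copy from the GUI task window, or look under /var/log/pve/tasks/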