can't migrate vm unless i'm logged into that exact node

jwsl224

i have this weird issue with live migration where pve can't do it unless i log into exactly the source node when doing the migration; connecting via any other node in this 5-node cluster makes the migration fail. using 'qm migrate' also fails; it can only be done via the gui. this is so weird i don't even know where to start looking.

firstly, the issue seems to center around pve thinking the vm disks are "attached" storage. they're clearly not; they're regular zfs volumes like everything else.

when connected to the gui via any other node, here is what the migration window says:
[screenshot: migration window with the Migrate button greyed out]
the vm is of course powered on. this remains, and the 'migrate' button stays greyed out, even after i select an appropriate target storage. the target storage for the target nodes does populate correctly.
here is the error in detail when i try to start the migration via shell using qm migrate:
[screenshot: error output from qm migrate]

when i connect to the gui via the source node, i can use the gui to migrate the vm, but not qm migrate. any ideas?
 
Hi,
qm migrate needs to be issued on the source node, that is by design. And if your VM has local disks, you need to specify the --with-local-disks flag like the log tells you. To issue the migration call for a VM on another node, you can do e.g.
Code:
pvesh create /nodes/pve8a2/qemu/106/migrate --target pve8a1 --online 1 --with-local-disks 1
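Issued directly on the source node, the equivalent CLI call would look roughly like this (same example VM ID and target node as above, adjust to your setup; add --targetstorage if the target node uses a different storage name):
Code:
qm migrate 106 pve8a1 --online --with-local-disks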

As for why the UI is behaving differently, that sounds like the status of the VM is not being detected correctly. Do you see any errors when you check the Network tab in your browser's developer tools (often Ctrl+Shift+C)? Is your pvestatd service functioning properly on all nodes? Do you see any interesting messages in the system logs/journal of the relevant nodes?
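As a rough sketch of what to look at on each relevant node (the time window is just an example):
Code:
systemctl status pvestatd
journalctl -u pvestatd -u pvedaemon -u pveproxy --since "1 hour ago"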
 
qm migrate needs to be issued on the source node, that is by design.
certainly. this is what i am speaking of. what i meant is connecting to the cluster has to be done via the source node. it should technically work to connect to the cluster from any node, and then issue the command by connecting to the source node shell, right?

And if your VM has local disks, you need to specify the --with-local-disks flag like the log tells you.
the vm has no local disks. that's what is confusing me. it just has regular zfs virtual disks like all the rest of the vms. and again, the migration works fine when connecting to the cluster using the IP address of the source node. and only that way.


Do you see any errors when you check the Network tab in your browser's developer tools (often Ctrl+Shift+C)?
hmm. wow. never seen that tab before :p but no i don't see any errors. the "status" column shows a list of green "200". lots of activity though. evidently clustering is hard work :)

Is your pvestatd service functioning properly on all nodes?
yes. it is "running" on relevant nodes.

Do you see any interesting messages in the system logs/journal of the relevant nodes?
i do not see anything interesting in journalctl. is there a specific something you'd like me to filter for?
 
certainly. this is what i am speaking of. what i meant is connecting to the cluster has to be done via the source node. it should technically work to connect to the cluster from any node, and then issue the command by connecting to the source node shell, right?
Yes, if you issue the qm migrate command on the node where the VM currently is, it should work.
the vm has no local disks. that's what is confusing me. it just has regular zfs virtual disks like all the rest of the vms. and again, the migration works fine when connecting to the cluster using the IP address of the source node. and only that way.
ZFS virtual disks are local disks. ZFS storage is not shared (except in the ZFS over iSCSI case): https://pve.proxmox.com/pve-docs/chapter-pvesm.html#_storage_types

Please share the VM configuration qm config 100 and storage configuration /etc/pve/storage.cfg as well as the output of pveversion -v on both nodes.
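For reference, a ZFS storage definition in /etc/pve/storage.cfg usually looks something like this (pool and node names here are illustrative, not taken from your setup); a nodes line restricts on which nodes the storage is considered available:
Code:
zfspool: nvme4
        pool tank/vmdata
        content images,rootdir
        sparse
        nodes pve8a1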
 
this is the only node i am having this issue with. and only when not connecting to its ip address for the web gui, as mentioned.
Can you ping and ssh between the node and the other node in both directions?

What do you get when you run the following command on both the node itself and on the other node: pvesh get /nodes/pve8a2/qemu/106/migrate --output-format json-pretty (replacing the node name and VM ID with yours for which the error in the UI occurs)
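As a rough sketch of both checks (node names, addresses and VM ID are placeholders, adjust to yours):
Code:
# connectivity, run from each node towards the other one
ping -c 3 <other-node-ip>
ssh root@<other-node-ip> true

# migration precondition check, as mentioned above
pvesh get /nodes/<source-node>/qemu/<vmid>/migrate --output-format json-pretty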
 
Can you ping and ssh between the node and the other node in both directions
hmm. when i try to ping from the offending node to the other node, it works great. but trying to ssh brings us to this:
[screenshot: SSH error when connecting from the offending node]

edit: ssh'ing from the other node to the offending node works fine.
 
It means that the SSH host keys have changed for some reason. You might need to regenerate them. There are a lot of threads about similar situations in the forum.
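A quick way to see which entry is stale (a sketch; key type and names may differ on your setup) is to compare the host key the target node actually presents with what the connecting node has stored:
Code:
# on the target node: fingerprint of the host key it serves
ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub

# on the connecting node: the stored entry for that host
ssh-keygen -F <target-node-name-or-ip> -f /etc/pve/priv/known_hosts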
 
It means that the SSH host keys have changed for some reason. You might need to regenerate them. There are a lot of threads about similar situations in the forum.
there seem to be a lot of threads about people just yolo'ing it. lol. is there an official docs entry on how to do this safely without jeopardizing the cluster?
 
there seem to be a lot of threads about people just yolo'ing it. lol. is there an official docs entry on how to do this safely without jeopardizing the cluster?
Not that I know of. What exactly you need to do depends on what precisely the issue is and that needs to be examined on the cluster itself.
 
Not that I know of. What exactly you need to do depends on what precisely the issue is and that needs to be examined on the cluster itself.
ok, i'm not well versed in cryptography. here is the error when trying to ssh from the issue host into another:

[screenshot: SSH error output]

some forum entries mention checking the entries in /etc/pve/priv/authorized_keys and /etc/pve/priv/known_hosts. are all the keys for all hosts supposed to be the same? i don't know what i'm "checking".

pve staff suggested running
Code:
pvecm updatecert -f
on one host, and that should update the keys across all hosts. this has been running so well for months now that i never had to dig into the inner workings of pve clustering. lol.
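presumably the result can be verified afterwards by simply re-trying ssh between the nodes, e.g. (node name is just an example):
Code:
ssh root@pve8a2 true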
 
What do you get when you run the following command on both the node itself and on the other node: pvesh get /nodes/pve8a2/qemu/106/migrate --output-format json-pretty (replacing the node name and VM ID with yours for which the error in the UI occurs)
ok, it seems to be something about the specific storage. migrating the vm disk to a different local storage makes the migration work as expected. fixing the ssh keys was unrelated. here is what it says when running the command you asked for:
Code:
root@pve8a1:~# pvesh get /nodes/pve8a1/qemu/188/migrate --output-format json-pretty
{
   "allowed_nodes" : [],
   "local_disks" : [
      {
         "cdrom" : 0,
         "drivename" : "scsi0",
         "is_attached" : 1,
         "is_tpmstate" : 0,
         "is_unused" : 0,
         "is_vmstate" : 0,
         "referenced_in_snapshot" : {
            "freepbxinstalled" : 1,
            "installedfresh" : 1
         },
         "replicate" : 1,
         "shared" : 0,
         "size" : 53687091200,
         "volid" : "nvme4:vm-188-disk-0"
      },
      {
         "cdrom" : 0,
         "is_attached" : 0,
         "is_tpmstate" : 0,
         "is_unused" : 0,
         "is_vmstate" : 1,
         "referenced_in_snapshot" : {
            "installedfresh" : 1
         },
         "replicate" : 1,
         "shared" : 0,
         "volid" : "nvme4:vm-188-state-installedfresh"
      },
      {
         "cdrom" : 0,
         "is_attached" : 0,
         "is_tpmstate" : 0,
         "is_unused" : 0,
         "is_vmstate" : 1,
         "referenced_in_snapshot" : {
            "freepbxinstalled" : 1
         },
         "replicate" : 1,
         "shared" : 0,
         "volid" : "nvme4:vm-188-state-freepbxinstalled"
      }
   ],
   "local_resources" : [],
   "mapped-resource-info" : {},
   "mapped-resources" : [],
   "not_allowed_nodes" : {
      "pve8a2" : {
         "unavailable_storages" : [
            "nvme4"
         ]
      },
      "pve8a3" : {
         "unavailable_storages" : [
            "nvme4"
         ]
      },
      "pve8a4" : {
         "unavailable_storages" : [
            "nvme4"
         ]
      },
      "pve8a5" : {
         "unavailable_storages" : [
            "nvme4"
         ]
      },
      "pve8a6" : {
         "unavailable_storages" : [
            "nvme4"
         ]
      }
   },
   "running" : 0
}
 
I don't see an indication that the storage itself has any issues from what you posted. Please share the details about the current situation again: do you still get the error in the UI with the VM running? Because according to the pvesh output you posted, the VM was not running.
Code:
"running" : 0
Did you ever attempt to use the --with-local-disks option for the CLI command? Because, again, a ZFS storage with the same name, which is available on multiple nodes, does count as local. Shared storages are really only the ones serving a common state to multiple nodes at the same time: https://pve.proxmox.com/pve-docs/chapter-pvesm.html#_storage_types
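To try that from the source node, the call would be something along these lines (a sketch using the VM ID and node names from your output; the target storage is a placeholder for one that actually exists on the target node):
Code:
# sketch: migrate VM 188 from the source node to pve8a2, sending its local disks along
qm migrate 188 pve8a2 --online --with-local-disks --targetstorage <storage-on-target>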