[SOLVED] VMs failing to start after migration

Micleah

New Member
Aug 30, 2023
I have a Proxmox cluster with 3 nodes (Node 1, Node 2, and Node 3). Node 2 has been offline for some time, and I suspect the issue is related to storage, as the node shows a grey question mark with 98% storage usage. Today, Node 1 restarted, and as a result, VMs were automatically migrated to Node 2. However, the VMs on Node 2 also became greyed out, and I can no longer access Node 2 via the GUI.

To work around this, I manually migrated the VMs from Node 2 to Node 1 using the mv command. However, after migration, the VMs failed to start, and there are no logs to indicate the root cause.

Has anyone encountered a similar issue? Any suggestions on how I can troubleshoot this and get the VMs running again?
 
1. How is storage handled by the cluster? Do the VM disks reside on node-local volumes?
2. How or why were VMs migrated from node 1 to node 2 when node 2 was not active?
3. Can you provide the exact commands you used to manually migrate the VMs from node 2 to node 1?
4. Which volumes are full on node 2?
5. Can you try rebooting node 2 to see if it becomes active again?
 
1. Local storage for ISO images, backup files and container templates, and VM disks on local-zfs.
2. The node is greyed out with a question mark but is accessible via SSH.
3. I used the command mv /etc/pve/nodes/node2/qemu-server/111.conf /etc/pve/nodes/node1/qemu-server/
4. Both local and local-zfs are affected; they are both at 98%.
5. The node becomes active after a reboot but greys out again later.
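
For reference, this is roughly how the usage can be confirmed from the shell (storage names here are the stock defaults, adjust if yours differ):

df -h /                          # root filesystem backing the "local" directory storage
zfs list -o name,used,avail      # usage of the local-zfs pool and its datasets
pvesm status                     # Proxmox's own view of every configured storage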
 
mv is the Linux command to move files within a filesystem. It is NOT the method used to migrate VMs between PVE nodes. You have probably messed up the VMs.
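
For reference, migration is normally done through the GUI or with qm - roughly like this, using the VM ID from this thread (run on the node currently holding the VM; --online only if the guest is running):

qm migrate 111 node1 --online

That only works while the source node and cluster quorum are healthy, which is exactly why a wedged node pushes people toward hacks like mv.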
The qm command was not working, so I used mv /etc/pve/nodes/node2/qemu-server/111.conf /etc/pve/nodes/node1/qemu-server/
 
Move 111.conf back to node 2. The issue is that the conf file still references storage on node 2, which is inaccessible from node 1.

You need to solve the ? issue first. SSH into node 2 and check journalctl --no-pager for anything interesting. From what I understand, ZFS doesn't like being over 80% capacity, and I'm honestly not sure how ZFS is set up for you - whether 98% really means 98% of the volume or 98% of 80% of the volume. Whatever the case, try deleting some ISOs or anything else you created and don't need on those local volumes.
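
Something along these lines over SSH should show what is going on and what can be freed up (paths assume the default "local" directory storage layout; adjust to whatever pvesm status reports):

journalctl --no-pager -b | tail -n 200    # recent messages from the current boot
ls -lh /var/lib/vz/template/iso/          # ISOs kept on the "local" directory storage
ls -lh /var/lib/vz/dump/                  # old vzdump backups, often the biggest space hogs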
 
The node is now okay and no longer greyed out, but after moving the 111.conf the VM won't start. I am getting the following log messages:
Dec 29 15:52:14 node1 pvedaemon[27355]: <root@pam> starting task UPID:node1:000089E4:000B4C09:677145FE:hastart:111:root@pam:
Dec 29 15:52:14 node1 pvedaemon[27355]: <root@pam> end task UPID:node1:000089E4:000B4C09:677145FE:hastart:111:root@pam: OK

I have also noticed the VM's HA status is showing request_stop.
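
(The HA state can be inspected and changed with ha-manager - a rough sketch, assuming the resource is named vm:111:)

ha-manager status                        # current state of all HA resources
ha-manager config                        # how each resource is configured
ha-manager set vm:111 --state started    # ask HA to start the resource again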
 
The qm command was not working and I used mv
Couldn't find the key to my front-door, so I found a sledgehammer and knocked the door down.

Get your cluster up & running smoothly, delete those VMs, & restore from backups. You should have them anyway.
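
If the backups sit on the local storage, a restore is roughly (the archive name below is just a placeholder - use whatever ls /var/lib/vz/dump/ actually shows, and pick a storage with enough free space):

qmrestore /var/lib/vz/dump/vzdump-qemu-111-EXAMPLE.vma.zst 111 --storage local-zfs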
 
Thank you, everyone. After creating some space on Node 2 (now it's at 60%) and migrating the VMs to Node 2, the VMs are now starting up properly.
 
Happy you got sorted. You must always monitor the space available on the host, and if the host's root filesystem gets full you could lose access to the node - as appears to have happened in your case. Also never try to force-engineer something that is not working - try & track down the root cause & deal with it. Anyway, enough of my ranting...
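
Even a one-line cron check helps - a rough sketch, assuming the pool is called rpool (cron mails any output to root by default):

#!/bin/sh
# Warn when the ZFS pool crosses 80% capacity; adjust pool name and threshold as needed.
CAP=$(zpool list -H -o capacity rpool | tr -d '%')
[ "$CAP" -gt 80 ] && echo "rpool is at ${CAP}% on $(hostname)"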

Since it appears you have solved your issue, maybe mark this thread as Solved: at the top of the thread, choose the Edit thread button, then from the (no prefix) dropdown choose Solved.
 
