[SOLVED] VMs failing to start after migration

Micleah

New Member
Aug 30, 2023
I have a Proxmox cluster with 3 nodes (Node 1, Node 2, and Node 3). Node 2 has been offline for some time, and I suspect the issue is related to storage, as the node shows a grey question mark with 98% storage usage. Today, Node 1 restarted, and as a result, VMs were automatically migrated to Node 2. However, the VMs on Node 2 also became greyed out, and I can no longer access Node 2 via the GUI.

To work around this, I manually migrated the VMs from Node 2 to Node 1 using the mv command. However, after migration, the VMs failed to start, and there are no logs to indicate the root cause.

Has anyone encountered a similar issue? Any suggestions on how I can troubleshoot this and get the VMs running again?
 
1. How is storage handled by the cluster? Do the VM disks reside on node-local volumes?
2. How or why were VMs migrated from node 1 to node 2 when node 2 was not active?
3. Can you provide the exact commands you used to manually migrate the VMs from node 2 to node 1?
4. Which volumes are full on node 2?
5. Can you try rebooting node 2 to see if it becomes active again?
 
1. Local storage for ISO images, backup files and container templates, and VM disks on local-zfs.
2. The node is greyed out with a question mark but is accessible via SSH.
3. I used the command mv /etc/pve/nodes/node2/qemu-server/111.conf /etc/pve/nodes/node1/qemu-server/
4. Both local and local-zfs are affected; they are both at 98%.
5. The node becomes active after a reboot but greys out again later.
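
For reference, this is roughly how the usage can be confirmed from the shell (storage names here are the stock defaults, adjust if yours differ):

df -h /                          # root filesystem backing the "local" directory storage
zfs list -o name,used,avail      # usage of the local-zfs pool and its datasets
pvesm status                     # Proxmox's own view of every configured storage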
 
mv is the Linux command to move files within a filesystem. It is NOT the method used to migrate VMs between PVE nodes. You have probably messed up the VMs.
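
For reference, migration is normally done through the GUI or with qm - roughly like this, using the VM ID from this thread (run on the node currently holding the VM; --online only if the guest is running):

qm migrate 111 node1 --online

That only works while the source node and cluster quorum are healthy, which is exactly why a wedged node pushes people toward hacks like mv.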
The qm command was not working, so I used mv /etc/pve/nodes/node2/qemu-server/111.conf /etc/pve/nodes/node1/qemu-server/
 
Move 111.conf back to node 2. The issue is that the conf file still references storage on node 2, which is inaccessible from node 1.

You need to solve the ? issue first. SSH into node 2 and check journalctl --no-pager for anything interesting. From what I understand, ZFS doesn't like being over 80% capacity, and I'm honestly not sure how ZFS is set up for you - whether 98% really means 98% of the volume or 98% of 80% of the volume. Whatever the case, try deleting some ISOs or anything else you created and don't need on those local volumes.
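
Something along these lines over SSH should show what is going on and what can be freed up (paths assume the default "local" directory storage layout; adjust to whatever pvesm status reports):

journalctl --no-pager -b | tail -n 200    # recent messages from the current boot
ls -lh /var/lib/vz/template/iso/          # ISOs kept on the "local" directory storage
ls -lh /var/lib/vz/dump/                  # old vzdump backups, often the biggest space hogs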
 
The node is now okay and no longer greyed out, but after moving the 111.conf the VM won't start. I am getting the following log messages:
Dec 29 15:52:14 node1 pvedaemon[27355]: <root@pam> starting task UPID:node1:000089E4:000B4C09:677145FE:hastart:111:root@pam:
Dec 29 15:52:14 node1 pvedaemon[27355]: <root@pam> end task UPID:node1:000089E4:000B4C09:677145FE:hastart:111:root@pam: OK

I have also noticed the VM's HA status is showing request_stop.
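
(The HA state can be inspected and changed with ha-manager - a rough sketch, assuming the resource is named vm:111:)

ha-manager status                        # current state of all HA resources
ha-manager config                        # how each resource is configured
ha-manager set vm:111 --state started    # ask HA to start the resource again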
 
The qm command was not working and I used mv
Couldn't find the key to my front-door, so I found a sledgehammer and knocked the door down.

Get your cluster up & running smoothly, delete those VMs, & restore from backups. You should have them anyway.
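
If the backups sit on the local storage, a restore is roughly (the archive name below is just a placeholder - use whatever ls /var/lib/vz/dump/ actually shows, and pick a storage with enough free space):

qmrestore /var/lib/vz/dump/vzdump-qemu-111-EXAMPLE.vma.zst 111 --storage local-zfs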
 
Thank you, everyone. After creating some space on Node 2 (now it's at 60%) and migrating the VMs to Node 2, the VMs are now starting up properly.
 
Happy you got sorted. You must always monitor the space available on the host, and if the host's root filesystem gets full you could lose access to the node - as appears to have happened in your case. Also never try to force-engineer something that is not working - try & track down the root cause & deal with it. Anyway, enough of my ranting...
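
Even a one-line cron check helps - a rough sketch, assuming the pool is called rpool (cron mails any output to root by default):

#!/bin/sh
# Warn when the ZFS pool crosses 80% capacity; adjust pool name and threshold as needed.
CAP=$(zpool list -H -o capacity rpool | tr -d '%')
[ "$CAP" -gt 80 ] && echo "rpool is at ${CAP}% on $(hostname)"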

Since it appears you have solved your issue, maybe mark this thread as Solved: at the top of the thread, choose the Edit thread button, then from the (no prefix) dropdown choose Solved.
 
