unable to find configuration file for VM 220 on node 'pve-a'

lknite

Member
Sep 27, 2024
Am working on a script to reboot all vms with a certain tag.

Writing the script is going well, but right off encountered an error (see title).

Error encountered with:
Code:
qm reboot 220

The vm is there. I can right click and reboot it. (also, several other vms are showing the same error when i attempt to reboot them with the script)
Code:
root@pve-a:~# qm status 220
Configuration file 'nodes/pve-a/qemu-server/220.conf' does not exist
root@pve-a:~# qm status 221
status: running

A little research shows that maybe I need to manually copy a configuration file from another node over to this one.

Additional research shows there is already something which is supposed to do this.

Is this a sign of something? Why would the syncing app not sync this vm config between all the proxmox nodes?

Instead of manually copying, is there a better solution? Like, is there a service I should restart? (I tried 'systemctl restart corosync', didn't seem to help)

I'm assuming if I restart the node that'd force the vms to resync? Seems a bit extreme ... and maybe it wouldn't work anyway.
 
Tried this as a workaround:
Code:
# ssh pve-c qm status 220
ssh: connect to host pve-c port 22: Connection timed out

Was surprised to see it time out. Have been using this setup for over a year.

Could it be that the host name not resolving is the issue? I'd assume proxmox is using the ip instead, I mean it must be ...
 
3 node cluster:
pve-a
pve-b
pve-c

Running the script on pve-a.

Looks like 220 is currently running on pve-c.
 
Your timeout error is between your client and the PVE machine.

The qm command only operates on VMs that are local to the host where you execute it. It seems like one of the hosts in your cluster is down.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
You are trying to stop VM 220, and then status check 221



I did a status check on two vms to show the status worked on one and not the other. The one it didn't work on is the one that didn't reboot. Whether rebooting or checking the status it says it can't find the configuration file.
 
I can right-click on the vm and reboot it in the gui, which tells me the host is up. However, I could see maybe a partial outage, though there is no indication in the gui.

Another source told me to add '--node <node>' to the reboot command, but the man page for the 'qm' command doesn't show a '--node' parameter.
 
This isn't a timeout issue. It is known that the hostname doesn't resolve in DNS; by IP it's fine. Unless you believe Proxmox uses the hostname somewhere.
 
The "qm" command only operates on local VMs. You are executing it from host A, you said the VM is on host C - the error is expected.
You can run "qm list" to see what VMs you can access locally.

You can run "pvecm status" to check the status of your cluster communication.
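
If that check needs to happen inside a script, a minimal sketch is below. It assumes the "pvecm status" output contains a line of the form "Quorate: Yes"; the function name is_quorate is made up for illustration.

```shell
#!/bin/bash

# Succeeds (exit 0) only if the `pvecm status` output read from stdin
# reports that the cluster is quorate.
# Assumption: the output has a line such as "Quorate:          Yes".
is_quorate() {
  grep -Eq '^[[:space:]]*Quorate:[[:space:]]+Yes'
}

# Hypothetical usage on a cluster node:
#   pvecm status | is_quorate || echo "no quorum, not rebooting anything"
```

Reading from stdin keeps the check separate from the command that produces the output, so it can be tried against saved output first.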

If you are able to access host C, you can run the qm command there against VM 220.
You can also try to use the API, but I suspect it will fail as well in your current state.
The fact that you can right click on the VM in UI does not indicate to us that the host is healthy.
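
For reference, the API route mentioned above could look roughly like the sketch below. It assumes a healthy, quorate cluster and uses pvesh to POST to the VM's status/reboot endpoint on the node that hosts it; the function names reboot_api_path and reboot_via_api are made up for illustration.

```shell
#!/bin/bash

# Build the API path used to reboot a VM: /nodes/<node>/qemu/<vmid>/status/reboot
reboot_api_path() {
  local node=$1 vmid=$2
  printf '/nodes/%s/qemu/%s/status/reboot\n' "$node" "$vmid"
}

# Hypothetical wrapper: ask the API to reboot the VM on the node hosting it.
# `pvesh create` issues a POST request; this only works if that node is reachable.
reboot_via_api() {
  pvesh create "$(reboot_api_path "$1" "$2")"
}

# Hypothetical usage from any cluster node:
#   reboot_via_api pve-c 220
```

Unlike qm, this does not need to run on the node where the VM lives, but the node still has to be named explicitly.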

If the host could not be resolved you would get a name resolution error. Not a timeout error. Perhaps your DNS is wrong, either in /etc/hosts or in your DNS server.
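
To make the distinction concrete, here is a small sketch that sorts an ssh error message into the two cases. The function name classify_ssh_error and its category labels are made up for illustration; the matched strings are the usual OpenSSH client messages.

```shell
#!/bin/bash

# Classify an ssh client error message by its likely cause.
classify_ssh_error() {
  case "$1" in
    *"Could not resolve hostname"*) echo "dns" ;;      # the name did not resolve at all
    *"Connection timed out"*)       echo "timeout" ;;  # name resolved, but the address did not answer
    *"Connection refused"*)         echo "refused" ;;  # host answered, nothing listening on the port
    *)                              echo "other" ;;
  esac
}

# Hypothetical usage with the message from earlier in the thread:
#   classify_ssh_error "ssh: connect to host pve-c port 22: Connection timed out"
```

A "timeout" result means the name resolved to some address, so the next thing to check is whether that address is the right one.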



 
Hi lknite,

In a cluster, configuration files for the VMs usually don't need to be copied (they live in /etc/pve/nodes/<hostname> as part of the Proxmox cluster filesystem). Maybe you are using features like hookscripts that are stored on local storage, but that should only affect migrations. Or maybe one of the hosts is not part of the quorum anymore?
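
Because every node sees the same /etc/pve tree, the node that currently owns a VM can be read from the path of its config file. A minimal sketch, assuming the layout nodes/<node>/qemu-server/<vmid>.conf shown in the error message above; the function name find_vm_node and its optional second argument are made up for illustration.

```shell
#!/bin/bash

# Print the name of the node whose directory holds <vmid>.conf.
# The second argument overrides the root directory (default: /etc/pve).
find_vm_node() {
  local vmid=$1 root=${2:-/etc/pve} conf
  for conf in "$root"/nodes/*/qemu-server/"$vmid".conf; do
    [ -e "$conf" ] || continue      # glob matched nothing
    conf=${conf%/qemu-server/*}     # strip "/qemu-server/<vmid>.conf"
    echo "${conf##*/}"              # what remains ends in the node name
    return 0
  done
  return 1                          # no node has a config for this vmid
}

# Hypothetical usage:
#   find_vm_node 220
```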

Can you provide the job log output of the reboot command.

Can you check the uptime of the three hosts?
If your connection timed out at that moment, maybe the host was rebooting at that time due to lost quorum.

Like bbgeek17 said, the qm command only works locally. Maybe the suggestion for --node came from an AI tool?

BR, Lucas
 
k, i understand now that a 'qm' command must be run on the host where the vm currently lives.

I'll post the resulting script here in a few minutes.
 
Code:
# cat reboot_by_tag.sh
#!/bin/bash


# display syntax if no parameter
if [[ -z "$1" ]]; then
  echo "Will reboot all vms with specified tag."
  echo ""
  echo "Syntax:"
  echo "  $0 <tag>"
  exit 1
fi
tag=$1

# use to get ip of proxmox host
# (the host name is passed to jq as a variable instead of being spliced into the filter)
getIpByHost() {
  jq -r --arg host "$1" '.nodelist[$host].ip' /etc/pve/.members
}

# get vms to reboot
# .tags holds all tags of a vm in one string (assumed to be ';'-separated),
# so split it rather than comparing the whole string to the requested tag
vmids=$(pvesh get /cluster/resources --output-format json | jq -r --arg tag "$tag" '.[] | select(.type == "qemu" and ((.tags // "") | split(";") | any(. == $tag))) | "\(.vmid),\(.name),\(.node)"')

# loop through vms rebooting one at a time
for next in ${vmids}; do
  # parse vm id, name & host where it is located
  IFS=',' read -r vmid name host <<< "$next"

  echo "next: $name"
  echo "- rebooting $vmid (via $host)"

  # qm only works on the node that hosts the vm, so run it there over ssh
  # pause after reboot to allow vm some time to start back up before rebooting next
  if ssh "$(getIpByHost "$host")" qm reboot --timeout 120 "$vmid"; then
    echo "- vm has started back up, pausing a moment to give it a little time to start things"
    sleep 60
  else
    echo "- unable to reboot"
  fi
done
 
example output:

Code:
# ./reboot_by_tag.sh k-core
next: k-core
- rebooting 220 (via pve-c)
- vm has started back up, pausing a moment to give it a little time to start things
next: k-core-c1
- rebooting 221 (via pve-a)
- vm has started back up, pausing a moment to give it a little time to start things
next: k-core-c2
- rebooting 222 (via pve-b)
- vm has started back up, pausing a moment to give it a little time to start things
next: k-core-c3
- rebooting 223 (via pve-c)
- vm has started back up, pausing a moment to give it a little time to start things
next: k-core-w1
- rebooting 224 (via pve-a)
- vm has started back up, pausing a moment to give it a little time to start things
next: k-core-w2
- rebooting 225 (via pve-b)
- vm has started back up, pausing a moment to give it a little time to start things
next: k-core-w3
- rebooting 226 (via pve-c)
- vm has started back up, pausing a moment to give it a little time to start things