Hi guys. Sorry for the late update @damarges @Robert.H
The error has not appeared in the last year or two.
Steps we always try to follow with the GPU VMs:
- Shut down the VM properly before removing the GPU
- No ballooning, always fixed RAM
- Assign the GPU only to a powered-off VM
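The "no ballooning" rule can be checked mechanically. Below is a minimal sketch, assuming the usual Proxmox VM config layout (`key: value` lines in `/etc/pve/qemu-server/<vmid>.conf`, `hostpciN` entries for passthrough devices, `balloon: 0` to disable ballooning, snapshot sections starting with `[`). The helper names `parse_vm_config` and `check_gpu_vm` are hypothetical, not part of any Proxmox tool.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines of the current VM config (hypothetical helper).

    Stops at the first snapshot section ("[name]"), because those sections
    describe old states of the VM rather than the current one.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # snapshot sections follow; ignore them
        if not line or line.startswith("#"):
            continue  # skip blank lines and description comments
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def check_gpu_vm(config):
    """Return warnings for a VM with a PCI passthrough device (hypothetical helper)."""
    warnings = []
    has_gpu = any(key.startswith("hostpci") for key in config)
    # Assumption: ballooning is only considered disabled when "balloon: 0" is set.
    if has_gpu and config.get("balloon") != "0":
        warnings.append("ballooning is not disabled (expected 'balloon: 0')")
    return warnings
```

For example, `check_gpu_vm(parse_vm_config(open("/etc/pve/qemu-server/160.conf").read()))` would return an empty list for a GPU VM with fixed RAM. Only the ballooning rule is checked here; the shutdown and assignment steps depend on how the VM is operated and cannot be read from the config.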
Maybe this helps...
Hello all,
I want to give a user permission to change the boot order under the VM's "Options" menu:
However, the user cannot change it when I grant them VM.Config.Options (according to https://pve.proxmox.com/wiki/User_Management this user should be able to "modify any other VM...
Sorry for spamming my own thread, but I'll report any findings here in the hope that someone with a similar issue finds help one day.
That being said, I think we have a new clue: most of the time, the issue occurs when some sort of backup mechanism is at work. Either when restoring a...
Hi all,
this problem still persists to this day :mad:
Today, the problem occurred again after we restored a backup of a virtual machine with a GPU attached (the VM was shut down first). We get exactly the same error messages as before.
We're going crazy with this damn error!!!!
Best,
Andy
Hello again
sorry to revive this topic, but unfortunately, the exact same error occurred again.
We shut down a VM with an attached GPU, set the CPU type to "host" instead of kvm64 and rebooted it, only to face the same problem again: the VM fails to boot, and afterwards nvidia-smi shows Failed to initialize...
The person responsible just told me that they did not (re-)install the guest-agent.
It's a Fedora CoreOS VM that's provisioned via a JSON file. We told the person to install the guest-agent, and they did for a couple of months. During the latest iteration, the guest-agent somehow did not make it into...
Quick update to this:
We solved the issue *fingers crossed*. After further investigation, we saw in the log that one VM was doing wonky stuff:
Apr 22 10:32:07 HOST pvedaemon[78720]: VM 172 qmp command failed - VM 172 qmp command 'guest-ping' failed - unable to connect to VM 172 qga socket -...
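To find which VM is flooding the log, one can count the failing `guest-ping` commands per VM ID. This is a minimal sketch, assuming the journal lines look like the one quoted above (for example piped in from `journalctl -u pvedaemon`); the function name `count_guest_ping_failures` is hypothetical. Only the part of the message up to "failed" is matched, so the truncated tail of the line does not matter.

```python
import re
import sys
from collections import Counter

# Matches e.g. "VM 172 qmp command 'guest-ping' failed" inside a pvedaemon line.
GUEST_PING_FAILED = re.compile(r"VM (\d+) qmp command 'guest-ping' failed")


def count_guest_ping_failures(lines):
    """Return a Counter mapping VM ID -> number of failed guest-ping commands."""
    counts = Counter()
    for line in lines:
        match = GUEST_PING_FAILED.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Usage (assumption): journalctl -u pvedaemon | python3 count_pings.py
    for vmid, n in count_guest_ping_failures(sys.stdin).most_common():
        print(f"VM {vmid}: {n} failed guest-ping commands")
```

A VM that shows up with a high count is a good candidate for a missing or broken guest agent.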
Thanks Thomas for your input.
Both hosts should be the same, they were updated almost at the same time.
They use the same NFS storage where they backup to. VMs are running mostly locally (fast NVMe disks; we had speed issues when having the system disks on NFS) with ("slow") data disks on...
Hi Moayad,
thanks for your reply.
I assume we'll need to reboot for that to take effect, is that correct? That is not possible until our mid-May maintenance window, I'm afraid :-(
Hi fellow Proxmoxers
two days ago, our nightly backup raised an error with one VM:
ERROR: Backup of VM 160 failed - VM 160 qmp command 'query-backup' failed - got wrong command id '12299:46449189' (expected 27831:3521)
We manually tried to backup the VM and it failed. When we set the backup...
Well, then we have to wait until the next maintenance window so we can restart the host :-(
Here are the journal entries that are relevant (during the time I was working on said VM):
To add to this:
I think this line is the problem: [1243044.289447] nvidia 0000:37:00.0: MDEV: Unregistering
dmesg for mdev:
[ 51.073175] nvidia 0000:37:00.0: MDEV: Registered
[...]
[1243044.289447] nvidia 0000:37:00.0: MDEV: Unregistering
Now I wonder why it got unregistered.
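The dmesg timestamps are seconds since kernel boot, so the two MDEV lines above can be put on a timeline: the device was registered about 51 seconds after boot and unregistered roughly 14.4 days later. This is a minimal sketch for extracting such events, assuming the exact line format quoted above; the helper names `mdev_events` and `seconds_to_days` are hypothetical.

```python
import re

# dmesg prefixes each message with "[seconds.microseconds]" since kernel boot.
MDEV_EVENT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvidia\s+(?P<dev>[0-9a-f:.]+):\s+MDEV:\s+(?P<event>\w+)"
)


def mdev_events(lines):
    """Yield (uptime_seconds, pci_address, event) for NVIDIA MDEV dmesg lines."""
    for line in lines:
        m = MDEV_EVENT.search(line)
        if m:
            yield float(m.group("ts")), m.group("dev"), m.group("event")


def seconds_to_days(seconds):
    """Convert a dmesg uptime in seconds to days."""
    return seconds / 86400


if __name__ == "__main__":
    sample = [
        "[ 51.073175] nvidia 0000:37:00.0: MDEV: Registered",
        "[1243044.289447] nvidia 0000:37:00.0: MDEV: Unregistering",
    ]
    for ts, dev, event in mdev_events(sample):
        print(f"{dev} {event} after {seconds_to_days(ts):.1f} days of uptime")
```

Correlating the unregistration time with the host journal (backups, restores, VM stops) around that point is what would explain why it happened.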