All VMs locking up after latest PVE update

There's now a new QEMU package on our pvetest repository which includes Stefan's fix; it's pve-qemu-kvm version 5.2.0-4.

[EDIT 2021-03-24]: The package is now on pve-no-subscription, so there is no need to enable the pvetest repo anymore.

The quickest and most secure way to upgrade just that package from pvetest would be:

Bash:
# enable pvetest
echo 'deb http://download.proxmox.com/debian/pve buster pvetest' > /etc/apt/sources.list.d/pvetest.list
# update available packages
apt update
# instruct apt to only install the 'pve-qemu-kvm' package, it will get the newer one from pvetest
apt install pve-qemu-kvm
# disable pvetest again
rm /etc/apt/sources.list.d/pvetest.list
apt update
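
To double-check that the package really came from pvetest and is the fixed build, something like this can be used (standard apt/dpkg tooling; output will vary per system):

Bash:
# show the installed version and which repositories offer which versions
apt-cache policy pve-qemu-kvm
# or just print the installed version
dpkg-query -W -f='${Version}\n' pve-qemu-kvm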


Remember, you always need to either fully restart the VM after the upgrade or migrate it to an upgraded PVE node, else the VM is still running the older QEMU version, and you won't have the fix active.
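
A minimal sketch of both options, assuming a placeholder VMID of 100 and a target node named pve2 (adjust to your setup):

Bash:
# option 1: fully stop and start the VM so it runs the new QEMU binary
qm stop 100
qm start 100
# option 2: live-migrate it to a node that already has the fixed pve-qemu-kvm installed
qm migrate 100 pve2 --online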
 
Additional information: after upgrading Proxmox (pve-qemu-kvm), a VM with Windows 2019 lost its network adapter and detected a new one (the network settings were lost as a result). After downgrading Proxmox, the old adapter returned and the new one disappeared.
I've noticed the same phenomenon with one of our 2012R2 servers. It happened at the same time we had the issue reported in this thread. It has also happened before the issue in this thread came to be. I noticed it the first time after a Windows update.
 
You can check with:
/usr/bin/kvm --version

Should show the following:
Code:
QEMU emulator version 5.2.0 (pve-qemu-kvm_5.2.0)
Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers

Also:
Code:
# dpkg -l pve-qemu-kvm
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-===================================
ii  pve-qemu-kvm   5.2.0-4      amd64        Full virtualization on x86 hardware
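
The commands above only show the installed package. To see which QEMU version a running VM was actually started with, something along these lines should work, assuming your qemu-server version already exposes the 'running-qemu' field in the verbose status (VMID 122 is just an example):

Code:
qm status 122 --verbose | grep running-qemu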

Be sure to actually stop/start the virtual machines after installing. A migration off and back onto the node should be sufficient, if I read correctly. I've been migrating systems off, upgrading with:

Code:
# Update repositories
apt update
# Upgrade OS
apt dist-upgrade

Then I use t.lamprecht's update instructions to upgrade pve-qemu-kvm. After fully updating, I've been rebooting the hosts to ensure all kernel updates are applied and then migrating VMs back.
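
For anyone doing the same, a rough sketch of the evacuation step (the target node name pve2 and the plain loop over 'qm list' are just my own approach, not an official procedure):

Bash:
# live-migrate all running VMs off this node before upgrading it
TARGET=pve2
for vmid in $(qm list | awk '/running/ {print $1}'); do
    qm migrate "$vmid" "$TARGET" --online
done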
 
You can check with:
/usr/bin/kvm --version

Should show the following:
Code:
QEMU emulator version 5.2.0 (pve-qemu-kvm_5.2.0)
Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers

Also:
Code:
# dpkg -l pve-qemu-kvm
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-===================================
ii  pve-qemu-kvm   5.2.0-4      amd64        Full virtualization on x86 hardware

Be sure to actually stop/start the virtual machines after installing. A migration off and back onto the node should be sufficient, if I read correctly. I've been migrating systems off, upgrading with:

Code:
# Update repositories
apt update
# Upgrade OS
apt dist-upgrade

Then I use t.lamprecht's update instructions to upgrade pve-qemu-kvm. After fully updating, I've been rebooting the hosts to ensure all kernel updates are applied and then migrating VMs back.
Thank you!

The first command yields the same on my production versus updated POC cluster. The second command does show 5.2.0-4 versus 5.2.0-3 so it appears to have worked. I live migrated the VMs back and forth on the POC. I'll update and do the same migrating on the production cluster now.
 
Bash:
# enable pvetest
echo 'deb http://download.proxmox.com/debian/pve buster pvetest' > /etc/apt/sources.list.d/pvetest.list
# update available packages
apt update
# instruct apt to only install the 'pve-qemu-kvm' package, it will get the newer one from pvetest
apt install pve-qemu-kvm
# disable pvetest again
rm /etc/apt/sources.list.d/pvetest.list
apt update
I tried that and rebooted the host. Looks like it fixed it.
All snapshot tasks still produce a "VM 122 qmp command failed - VM 122 qmp command 'query-proxmox-support' failed - unable to connect to VM 122 qmp socket - timeout after 31 retries" in the syslog, but at least VMs don't become unresponsive anymore.
 
I've noticed the same phenomenon with one of our 2012R2 servers. It happened at the same time we had the issue reported in this thread. It has also happened before the issue in this thread came to be. I noticed it the first time after a Windows update.

FYI: Seem unrelated to this specific issue, rather you were/are affected by this bug:
https://forum.proxmox.com/threads/w...s-6-3-4-patch-inside.84915/page-2#post-373380

And here's the available fix:
https://forum.proxmox.com/threads/w...s-6-3-4-patch-inside.84915/page-3#post-374993
 
Excellent, thanks Proxmox team!

What is the cause of the "VM ### qmp command failed - VM ### qmp command 'query-proxmox-support' failed - unable to connect to VM ### qmp socket - timeout after 31 retries" syslog event that Dunuin reported above? Is this message safe to ignore?
 
What is the cause of the "VM ### qmp command failed - VM ### qmp command 'query-proxmox-support' failed - unable to connect to VM ### qmp socket - timeout after 31 retries" syslog event that Dunuin reported above? Is this message safe to ignore?
Hm, do you also see that? After applying the upgrade? If so, please provide more details about your configuration, when the message appears, and potentially anything else that shows up in your logs. In general it is not safe to ignore; it means something has gone wrong, but the question is whether it is related to the error here.

There is a similar report on our bugtracker: https://bugzilla.proxmox.com/show_bug.cgi?id=3360
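
If it helps, something along these lines can be used to pull the relevant entries out of the logs (VMID 122 and the one-hour window are just placeholders):

Bash:
# recent QMP-related messages for a specific VM from the journal
journalctl --since "1 hour ago" | grep "VM 122 qmp"
# or, on hosts that still log to /var/log/syslog
grep "qmp command 'query-proxmox-support' failed" /var/log/syslog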
 
Edit: I re-read your post; after I upgrade, I'll let you know if I'm getting this syslog message as well.

I have not upgraded yet, but the user above (Dunuin) said he upgraded from the test repo, didn't get VM freezing, but still got that syslog message:

I tried that and rebooted the host. Looks like it fixed it.
All snapshot tasks still produce a "VM 122 qmp command failed - VM 122 qmp command 'query-proxmox-support' failed - unable to connect to VM 122 qmp socket - timeout after 31 retries" in the syslog, but at least VMs don't become unresponsive anymore.
 
Yes, but with the bugged version where the VMs became unresponsive I got that error message several times per minute for hours. Everything was fine until I started any snapshot. Then the VMs became unresponsive; the snapshot finished with "OK" or "Error", but it looked like the snapshot got stuck somehow, because there was a 100-1000x increase in disk IO and that error message was spamming the syslog all the time until I rebooted the host. After the reboot everything was fine again.

But now with the fixed version the VMs are working fine and I can use snapshotting again. The error message is still there, but only while the snapshot is running and not afterwards. And I also got that message while snapshotting with "pve-qemu-kvm 5.1.0-8".
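
For anyone who wants to check this on their own setup, a rough test could look like the following (VMID 122 and the snapshot name are placeholders; iostat comes from the sysstat package):

Bash:
# take a test snapshot of a non-critical VM and watch host disk IO while it runs
qm snapshot 122 testsnap
iostat -x 5    # look for a sustained, massive jump in IO during the snapshot
# remove the test snapshot afterwards
qm delsnapshot 122 testsnap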
 
Hi,

an update from pve-no-subscription shows me qemu-server 6.3-8 and mentions no fix in the changelog. Which package do I have to install to get the fix from there?

Thx.
 
Hi,

my system is on pve-qemu-kvm 5.2.0-2 and wants to update to 5.2.0-4.

Thx.
 
Hi,

my system is on pve-qemu-kvm 5.2.0-2 and wants to update to 5.2.0-4.

Thx.
Sorry, the new version that should work is "pve-qemu-kvm 5.2.0-4". "pve-qemu-kvm 5.1.0-8" is the last version that was working before the bug.
 
Can anyone confirm which pve-qemu-kvm versions produce the error? I see that 5.2.0-4 fixed it and that the last working version was 5.1.0-8, but which versions in between are affected?
 
Can anyone confirm which pve-qemu-kvm versions produce the error? I see that 5.2.0-4 fixed it and that the last working version was 5.1.0-8, but which versions in between are affected?
That regression came in with 5.2-1 (the first 5.2 release) and was fixed with 5.2-4, so all versions in-between were affected.
Why do you ask?
 
That regression came in with 5.2-1 (the first 5.2 release) and was fixed with 5.2-4, so all versions in-between were affected.
Why do you ask?
I have this issue widespread across 20+ nodes, so I just want to check which ones are affected, as it is hard to keep track.
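
A quick way to get an overview across many nodes could be a loop like this, run from a machine with SSH access to all of them (the node names are placeholders; everything from 5.2.0-1 up to, but not including, 5.2.0-4 is affected):

Bash:
# print the installed pve-qemu-kvm version on every node
for node in pve01 pve02 pve03; do
    printf '%s: ' "$node"
    ssh root@"$node" "dpkg-query -W -f='\${Version}\n' pve-qemu-kvm"
done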
 
