VM update command hangs server

LM1980

I am running proxmox-ve: 8.2.0 (running kernel: 6.8.4-3-pve). There is not much load on the server, only a few development VMs. However, whenever I run an apt upgrade command inside any of my VMs, the whole server freezes. I also had my CPU replaced and performed a clean install of Proxmox, but the result is the same.
------------[ cut here ]------------
Jun 14 07:12:04 in kernel: WARNING: CPU: 0 PID: 3006 at kernel/exit.c:820 do_exit+0x8dd/0xae0
Jun 14 07:12:04 in kernel: Modules linked in: veth ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_NFLOG xt_limit xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack xt_tcpudp iptable_filter ip_set_hash_net ip_set nf_conntrack_netlink nfnetlink_acct wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel softdog nf_tables iptable_raw xt_CT iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sunrpc binfmt_misc nfnetlink_log nfnetlink bonding tls ipmi_ssif intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 joydev input_leds acpi_ipmi aesni_intel hid_generic crypto_simd usbkbd usbmouse ipmi_si cryptd jc42 cmdlinepart intel_pmc_core cdc_ether
Jun 14 07:12:04 in kernel: usbhid mei_me intel_vsec spi_nor ipmi_devintf usbnet pmt_telemetry intel_cstate ee1004 mtd hid mei i2c_algo_bit mii ipmi_msghandler pmt_class acpi_tad mac_hid isofs zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 raid1 xhci_pci nvme xhci_pci_renesas i40e crc32_pclmul intel_lpss_pci ahci xhci_hcd nvme_core i2c_i801 spi_intel_pci intel_lpss spi_intel i2c_smbus libahci idma64 nvme_auth video wmi pinctrl_tigerlake
Jun 14 07:12:04 in kernel: CPU: 0 PID: 3006 Comm: kvm Tainted: P D O 6.8.4-3-pve #1
Jun 14 07:12:04 in kernel: Hardware name: E3C252D4U-2T/E3C252D4U-2T/OVH, BIOS 4.03.OV01 05/28/2024
Jun 14 07:12:04 in kernel: RIP: 0010:do_exit+0x8dd/0xae0
Jun 14 07:12:05 in kernel: Code: e9 42 f8 ff ff 48 8b bb e0 09 00 00 31 f6 e8 9a e0 ff ff e9 ee fd ff ff 4c 89 ee bf 05 06 00 00 e8 08 3a 01 00 e9 6e f8 ff ff <0f> 0b e9 9c f7 ff ff 0f 0b e9 55 f7 ff ff 48 89 df e8 0d 2f 14 00
Jun 14 07:12:05 in kernel: RSP: 0018:ffffb7ad4f863ec8 EFLAGS: 00010282
Jun 14 07:12:05 in kernel: RAX: 0000000000000000 RBX: ffff9a6812a38000 RCX: 0000000000000000
Jun 14 07:12:05 in kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jun 14 07:12:05 in kernel: RBP: ffffb7ad4f863f20 R08: 0000000000000000 R09: 0000000000000000
Jun 14 07:12:05 in kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a68131e4380
Jun 14 07:12:05 in kernel: R13: 0000000000000009 R14: ffff9a6816aa98c0 R15: 0000000000000000
Jun 14 07:12:05 in kernel: FS: 00007bf59d6006c0(0000) GS:ffff9a773f000000(0000) knlGS:0000000000000000
Jun 14 07:12:05 in kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 07:12:05 in kernel: CR2: 0000000000000008 CR3: 0000000130de2003 CR4: 0000000000772ef0
Jun 14 07:12:05 in kernel: PKRU: 55555554
Jun 14 07:12:05 in kernel: Call Trace:
Jun 14 07:12:05 in kernel: <TASK>
Jun 14 07:12:05 in kernel: ? show_regs+0x6d/0x80
Jun 14 07:12:05 in kernel: ? __warn+0x89/0x160
Jun 14 07:12:05 in kernel: ? do_exit+0x8dd/0xae0
Jun 14 07:12:05 in kernel: ? report_bug+0x17e/0x1b0
Jun 14 07:12:05 in kernel: ? handle_bug+0x46/0x90
Jun 14 07:12:05 in kernel: ? exc_invalid_op+0x18/0x80
Jun 14 07:12:05 in kernel: ? asm_exc_invalid_op+0x1b/0x20
Jun 14 07:12:05 in kernel: ? do_exit+0x8dd/0xae0
Jun 14 07:12:05 in kernel: ? do_exit+0x72/0xae0
Jun 14 07:12:05 in kernel: ? _printk+0x60/0x90
Jun 14 07:12:05 in kernel: make_task_dead+0x83/0x170
Jun 14 07:12:05 in kernel: rewind_stack_and_make_dead+0x17/0x20
Jun 14 07:12:05 in kernel: RIP: 0033:0x7bf5aae4fb95
Jun 14 07:12:05 in kernel: Code: 00 00 00 44 89 d0 41 b9 08 00 00 00 83 c8 10 f6 87 d0 00 00 00 01 8b bf cc 00 00 00 44 0f 45 d0 45 31 c0 b8 aa 01 00 00 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 41 83 e2 02 74 c2 f0 48 83 0c 24
Jun 14 07:12:05 in kernel: RSP: 002b:00007bf59d5fafa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
Jun 14 07:12:05 in kernel: RAX: ffffffffffffffda RBX: 00005cdae0e4cd90 RCX: 00007bf5aae4fb95
Jun 14 07:12:05 in kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000024
Jun 14 07:12:05 in kernel: RBP: 00005cdae0e4cd98 R08: 0000000000000000 R09: 0000000000000008
Jun 14 07:12:05 in kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00005cdae0e4ce80
Jun 14 07:12:05 in kernel: R13: 0000000000000001 R14: 00005cdae0cfdac8 R15: 0000000000000000
Jun 14 07:12:05 in kernel: </TASK>
Jun 14 07:12:05 in kernel: ---[ end trace 0000000000000000 ]---

After this I cannot connect:
Jun 14 07:22:54 in pvestatd[2297]: VM 107 qmp command failed - VM 107 qmp command 'query-proxmox-support' failed - got timeout
Jun 14 07:22:54 in pvestatd[2297]: status update time (8.030 seconds)
Jun 14 07:23:04 in pvestatd[2297]: VM 107 qmp command failed - VM 107 qmp command 'query-proxmox-support' failed - unable to connect to VM 107 qmp socket - timeout after 51 retries
Jun 14 07:23:04 in pvestatd[2297]: status update time (8.032 seconds)
Jun 14 07:23:17 in pvestatd[2297]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - got timeout
Jun 14 07:23:22 in pvestatd[2297]: VM 107 qmp command failed - VM 107 qmp command 'query-proxmox-support' failed - unable to connect to VM 107 qmp socket - timeout after 51 retries
Jun 14 07:23:22 in pvestatd[2297]: status update time (16.040 seconds)
Jun 14 07:23:30 in pvestatd[2297]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Jun 14 07:23:35 in pvestatd[2297]: VM 107 qmp command failed - VM 107 qmp command 'query-proxmox-support' failed - unable to connect to VM 107 qmp socket - timeout after 51 retries
Jun 14 07:23:35 in pvestatd[2297]: status update time (13.042 seconds)
Jun 14 07:23:46 in pvestatd[2297]: VM 107 qmp command failed - VM 107 qmp command 'query-proxmox-support' failed - unable to connect to VM 107 qmp socket - timeout after 51 retries
Jun 14 07:23:51 in pvestatd[2297]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Jun 14 07:23:51 in pvestatd[2297]: status update time (16.046 seconds)
 
Eventually I have to do a hard shutdown of the server.
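To see at a glance which VMs stopped answering, the pvestatd messages can be tallied per VM. This is only a minimal sketch: the function name `qmp_failures_per_vm` is made up for illustration, and it assumes the log lines look exactly like the ones quoted above.

```shell
# Hypothetical helper: count "qmp command failed" messages per VM ID.
# Reads log text on stdin and prints one "VM <id>: <count>" line per VM.
qmp_failures_per_vm() {
    grep -F 'qmp command failed' \
        | sed -n 's/.*: VM \([0-9][0-9]*\) qmp command failed.*/\1/p' \
        | sort -n \
        | uniq -c \
        | awk '{ print "VM " $2 ": " $1 }'
}
```

It could be fed from the journal of the previous boot, for example with `journalctl -u pvestatd -b -1 | qmp_failures_per_vm`, assuming the journal is persistent across the hard shutdown. On the pvestatd excerpt above it reports 3 failures for VM 101 and 5 for VM 107.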
 
Hi,
What kind of physical CPU do you have? Do you have the latest BIOS updates and CPU microcode installed: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_firmware_cpu ?
If you have an Intel CPU, consider trying the intel_iommu=off setting: https://pve.proxmox.com/wiki/Roadmap#8.2-known-issues
There is also kernel 6.8.8 in the pvetest repository, which you could give a shot.

Another thing you might try is changing the virtual CPU type in the VM configurations.
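A quick way to collect the information asked for above is to read it from /proc on the host. The following is a rough sketch rather than an official procedure: the function names `cpuinfo_field` and `has_kernel_param` are invented for this example, and the optional file argument exists only so the helpers can be tried against a saved copy of the files.

```shell
# Hypothetical helper: print the first value of a /proc/cpuinfo field,
# e.g. "model name" or "microcode". The second argument is the file to
# read and defaults to /proc/cpuinfo.
cpuinfo_field() {
    awk -F': *' -v key="$1" '
        { k = $1; sub(/[ \t]+$/, "", k) }   # strip padding before the colon
        k == key { print $2; exit }
    ' "${2:-/proc/cpuinfo}"
}

# Hypothetical helper: succeed if the parameter appears as a whole word on
# the kernel command line. The second argument defaults to /proc/cmdline.
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

On the host, `cpuinfo_field "model name"` and `cpuinfo_field microcode` show the CPU and the loaded microcode revision, and `has_kernel_param intel_iommu=off && echo "IOMMU disabled"` confirms after a reboot whether the setting actually took effect.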
It is running an Intel Xeon-E 2386G (6c/12t, 3.5 GHz/4.7 GHz).

I also realized that Proxmox freezes when this one command is run

sudo dpkg --configure -a

or whenever

update-initramfs: Generating /boot/initrd.img-5.15.0-105-generic

is running in any VM running Ubuntu 22.04.
 
I downgraded the whole server from v8 to v7. It ran the whole night; I uploaded all VM backups, restored them, and started them successfully. So far, everything is working perfectly.

I had a meltdown when the server locked me out, but then I realized it was the firewall blocking me. So all is good, but I can see that version 7 is heading for EOL, and I am worried that upgrading to version 8 will bring back the old problems.
 