VMs freezing randomly

rzv · Aug 1, 2022

Hi everyone,

I have an issue with VMs randomly freezing. It happens multiple times a day at seemingly random times. Not all my VMs freeze at once, but all of them are affected at one time or another. All my VMs run Linux, with different distros and different kernels. There is no information in the host syslog or in any of the guest logs.

Failure mode:
VM becomes totally unresponsive and CPU usage goes up to 100% on a single core. Proxmox will report 50% on VMs with 2 cores, 25% usage on VMs with 4 cores and so on.
All disk and network activity stops for that VM (according to my monitoring). The host has never frozen or become unstable.
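To make the percentages above concrete, here is a rough sketch of the arithmetic and of a freeze heuristic built on it. The function names, the tolerance value, and the idea of sampling disk and network byte counters are assumptions for illustration, not something Proxmox provides in this form.

```python
def pinned_core_share(vcpus: int) -> float:
    """Aggregate CPU percentage shown when exactly one vCPU sits at 100%."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return 100.0 / vcpus


def looks_frozen(cpu_percent: float, vcpus: int,
                 disk_bytes: int, net_bytes: int,
                 tolerance: float = 2.0) -> bool:
    """Heuristic only: one core pinned and no disk or network traffic
    during the sample window. The tolerance (in percentage points) is
    an arbitrary illustrative value."""
    pinned = abs(cpu_percent - pinned_core_share(vcpus)) <= tolerance
    return pinned and disk_bytes == 0 and net_bytes == 0
```

With 2 vCPUs this gives 50%, and with 4 vCPUs 25%, matching what the graphs show during a freeze. A busy but healthy VM would normally still move some disk or network bytes, which is why both counters are checked.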

Recovery:
Forcefully stop the VM then start it again.

Hardware:
Intel NUC 11 (NUC11ATKC4)
16 GB DDR4 RAM
M.2 SSD (ext4 filesystem, qcow2 virtual disks)
Realtek NIC chipset.

Things I've already tried:

Updating BIOS.
Disabling C-states.
All combinations of power management features from the BIOS.
Disabling Intel SpeedStep and TurboBoost.
Disabling suspend for PCIe devices.
Changing machine type (to q35)
VM BIOS: both legacy and UEFI
Updating the kernels on both the host and guests. This has been happening from the moment I installed proxmox a few months ago and I always keep my kernels updated.
memtest (a long shot, since everything else keeps working while a VM is frozen).

Another piece of possibly important info is that before Proxmox I had Fedora Server installed on this device and I was using pure libvirt to manage VMs. I can't be sure about this but I was having seemingly the same issues there. My tentative conclusion would be that this is an issue with KVM/QEMU and not specifically with PVE.

As a workaround I am now using watchdog to restart the VMs after they hang. This is working but it's not a long term solution as I don't want my VMs to restart several times a day.

I have now spent many days on troubleshooting this so any info or idea you have will be highly appreciated.
Thanks

shrdlicka · Aug 1, 2022

Hi,
you've already tried a lot of things. Something else you can try is installing the intel-microcode package (https://packages.debian.org/bullseye/intel-microcode). It may contain newer microcode than the BIOS.

You could also try setting up some local monitoring with atop on the host and inside the VMs. That may help you see what's going on when this happens.

Did you check Wait-IO while a VM was stuck?

Regards

rzv · Aug 1, 2022

Hi and thanks for the quick reply,

Iowait looks normal while a VM is frozen.
The last thing I tried was setting the intel_idle.max_cstate=1 kernel parameter. It hasn't crashed for a few hours now.
If it freezes again I will try to install the microcode package and see if that helps.
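For anyone who wants to confirm that the parameter is actually active after a reboot, here is a small sketch that parses a kernel command line string. The helper names are made up for illustration, and the parser does not handle quoted values containing spaces.

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into key -> value (None for bare flags)."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def max_cstate(cmdline: str) -> Optional[int]:
    """Return the intel_idle.max_cstate value, or None if it is not set."""
    value = parse_cmdline(cmdline).get("intel_idle.max_cstate")
    if value is None or not value.isdigit():
        return None
    return int(value)
```

On the machine where the parameter was added, something like `max_cstate(open("/proc/cmdline").read())` should then return 1.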

Regards

delVhar · Aug 4, 2022

rzv said:
Hi and thanks for the quick reply,

Iowait looks normal while a VM is frozen.
The last thing I tried was setting the intel_idle.max_cstate=1 kernel parameter. It hasn't crashed for a few hours now.
If it freezes again I will try to install the microcode package and see if that helps.

Regards

Hi, I've also been having a seemingly similar issue with a VM locking and wasn't able to catch anything obvious looking at iowait etc.

Did you have success with that kernel parameter?
And did you set it on the host or the VM?

rzv · Aug 4, 2022

Hi,

After another few days of troubleshooting the freezes are still happening.
I've managed to use netconsole to capture a kernel panic inside one of my VMs.

I'm not sure I want to go down the rabbit hole of debugging this down to the line of code that is causing it until I can confirm that the same thing happens in all my VMs.

I have moved a few of the VM disks off local storage to an NFS server and haven't had a freeze in 24 hours. So maybe this is a storage issue after all.

gyrex · Aug 4, 2022

I'm also having the same issue with an N5105 CPU, and the person in the thread below is having the same issue with his N5105 CPU. This appears to be a common issue with the N5105. How do we capture and send these logs, and who do we send them to? Is this a kernel issue with this CPU?

My thread: https://forum.proxmox.com/threads/proxmox-vm-crash-freeze.113177/

Added more info to this thread: https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/

Other threads with N5105 freezing issues:

https://forum.proxmox.com/threads/pfsense-vm-keeps-freezing-crashing.112439/
https://forum.proxmox.com/threads/qemu-failure-on-the-latest-kernel-5-15-39-2-pve.112988/

rzv · Aug 4, 2022

gyrex said:
I'm also having the same issue with an N5105 CPU, and the person in the thread below is having the same issue with his N5105 CPU. This appears to be a common issue with the N5105. How do we capture and send these logs, and who do we send them to? Is this a kernel issue with this CPU?

It looks to me like a kernel issue.

You could capture the kernel panic using netconsole (I used this guide). But to go any further than this we would need to debug the kernel.
Unless someone does that and uncovers a possible bug in the kernel with this CPU I think we are out of luck.
For now I am just waiting and updating kernels as soon as they become available.
It's a shame since this CPU gave me the same performance as my old Xeon with 1/6 as much power consumed.
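For reference, the receiving side of netconsole can be very small, since netconsole sends plain-text kernel messages over UDP. The sketch below is not taken from the guide mentioned above. The port number (6666), the log file name, and the function names are assumptions and must match whatever the guest's netconsole configuration points at.

```python
import socket


def format_record(payload: bytes, sender: str) -> str:
    """Turn one netconsole datagram into a log line prefixed with the sender IP."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"{sender} {text}"


def listen(port: int = 6666, logfile: str = "netconsole.log") -> None:
    """Append every datagram received on the given UDP port to logfile.

    Runs until interrupted. The port and file name are placeholder values.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        with open(logfile, "a", encoding="utf-8") as out:
            while True:
                payload, (host, _port) = sock.recvfrom(65535)
                out.write(format_record(payload, host) + "\n")
                out.flush()  # keep the file current in case the receiver dies too
```

Calling `listen()` on another machine on the same network keeps a file that survives the guest's crash, which is the whole point, as nothing reaches the guest's own disk once it panics.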

gyrex · Aug 4, 2022

rzv said:
It looks to me like a kernel issue.

You could capture the kernel panic using netconsole (I used this guide). But to go any further than this we would need to debug the kernel.
Unless someone does that and uncovers a possible bug in the kernel with this CPU I think we are out of luck.
For now I am just waiting and updating kernels as soon as they become available.
It's a shame since this CPU gave me the same performance as my old Xeon with 1/6 as much power consumed.

Have you managed to log the kernel crash/freeze through netconsole? If so, I've filed a bug report here: https://bugzilla.proxmox.com/show_bug.cgi?id=4188

Can you please add your log files to that bug and also state that you're experiencing the same issue?

In the meantime, I'll also run netconsole and will also try and collect some logs.

rzv · Aug 4, 2022

gyrex said:
Have you managed to log the kernel crash/freeze through netconsole? If so, I've filed a bug report here: https://bugzilla.proxmox.com/show_bug.cgi?id=4188

Can you please add your log files to that bug and also state that you're experiencing the same issue?

In the meantime, I'll also run netconsole and will also try and collect some logs.

I've started logging and I've collected a crash log already. I will wait to gather 2 or 3 more before I upload them.
It seems to be the same error every time: "stack guard page was hit" although different processes are triggering it.
Can you please upload one of yours so I can compare?

gyrex · Aug 4, 2022

rzv said:
I've started logging and I've collected a crash log already. I will wait to gather 2 or 3 more before I upload them.
It seems to be the same error every time: "stack guard page was hit" although different processes are triggering it.
Can you please upload one of yours so I can compare?

I've only just started running netconsole so I don't have any logs yet. I'll upload them when my vm freezes again.

gyrex · Aug 5, 2022

My pfSense VM froze overnight. Do you happen to run pfSense as well? Have you seen it hang or freeze? Nothing was logged to the console; it was just a dead VM which I had to hard reset.

rzv · Aug 5, 2022

I don't run pfSense but I have 4 VMs running different distros with slightly different kernels. They all freeze eventually, but some do it more frequently than others. My Docker VM freezes 3-4 times a day.

There are 4 crashes in the log from yesterday. I also added this log to the bug you opened on bugzilla.

gyrex · Aug 5, 2022

rzv said:
I don't run pfSense but I have 4 VMs running different distros with slightly different kernels. They all freeze eventually, but some do it more frequently than others. My Docker VM freezes 3-4 times a day.

There are 4 crashes in the log from yesterday. I also added this log to the bug you opened on bugzilla.

Yeh, that's the same error I saw on my console screen. There's nothing logged to the console screen when pfSense freezes.

So the common denominators are that we're all running the Intel N5105 and Proxmox. We'd probably need someone skilled to look at the logs but maybe it's a Proxmox kernel issue? Hopefully someone will address the bugzilla report.

If this continues to happen, I'll swap out Proxmox for VMware ESXi and see if the VMs on that are more stable, then we can further isolate the issue to Proxmox and not the VMs themselves.

rzv · Aug 5, 2022

gyrex said:
Yeh, that's the same error I saw on my console screen. There's nothing logged to the console screen when pfSense freezes.

So the common denominators are that we're all running the Intel N5105 and Proxmox. We'd probably need someone skilled to look at the logs but maybe it's a Proxmox kernel issue? Hopefully someone will address the bugzilla report.

If this continues to happen, I'll swap out Proxmox for VMware ESXi and see if the VMs on that are more stable, then we can further isolate the issue to Proxmox and not the VMs themselves.

I'm not convinced this is a Proxmox issue. I don't know how much the PVE kernel differs from upstream.
Before Proxmox I had Fedora Server installed on this machine and I used libvirt to manage VMs. They also froze back then. Of course there's no way to know if it's the same bug, but it does seem to point to an upstream kernel issue.

gyrex · Aug 5, 2022

rzv said:
I'm not convinced this is a Proxmox issue. I don't know how much the PVE kernel differs from upstream.
Before Proxmox I had Fedora Server installed on this machine and I used libvirt to manage VMs. They also froze back then. Of course there's no way to know if it's the same bug, but it does seem to point to an upstream kernel issue.

I should clarify: I don't mean a Proxmox issue in terms of the stack they've loaded on top of the kernel, but an issue with the Proxmox package as a whole (including the kernel, which they don't maintain). I'm not an expert, but it does seem like an upstream kernel issue.

Boppel · Aug 9, 2022

Hi, I am facing the same issue with my Intel N5105.
I started using Proxmox with the current version, and VMs have been crashing randomly ever since.

All VMs are Ubuntu VMs running version 20.04 or 22.04 (qcow, single core, 2 GB RAM, 32 disk) except for one, which is running FreeBSD (OPNsense).
The OPNsense VM hasn't crashed once and is running fine.

My VMs are running on a local Samsung 970 NVMe SSD.
So I guess this isn't an issue with attached storage.

I think this might be an issue with the guest tools.

Currently I'm testing VMs with the following setups:

2 cores, 2 GB RAM, NVMe SSD, guest tools installed
2 cores, 4 GB RAM, NVMe SSD, guest tools installed
1 core, 4 GB RAM, NVMe SSD, guest tools installed
1 core, 2 GB RAM, local SATA SSD, no guest tools installed

First I want to make sure that this isn't a resource issue, and second that it isn't an NVMe issue.

rzv · Aug 9, 2022

Boppel said:
First I want to make sure that this isn't a resource issue, and second that it isn't an NVMe issue.

It also happens with remote storage on NFS.

gyrex · Aug 10, 2022

My Ubuntu VM finally froze again today, but thankfully I was able to capture the kernel panic via netconsole. I've included the output below and also attached it to the bug I've filed on Proxmox's bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=4188

I'm including the output of the log to add as much information to this thread as possible.

Code:

[12361.508193] BUG: kernel NULL pointer dereference, address: 0000000000000000
[12361.509399] #PF: supervisor write access in kernel mode
[12361.510524] #PF: error_code(0x0002) - not-present page
[12361.511847] PGD 0 P4D 0
[12361.513120] Oops: 0002 [#1] SMP PTI
[12361.514392] CPU: 0 PID: 3268 Comm: python3 Not tainted 5.15.0-46-generic #49-Ubuntu
[12361.515796] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[12361.518606] RIP: 0010:asm_exc_general_protection+0x4/0x30
[12361.520233] Code: c4 48 8d 6c 24 01 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff ff e8 ea 7f f9 ff e9 05 0b 00 00 0f 1f 44 00 00 0f 1f 00 e8 <c8> 09 00 00 48 89 c4 48 8d 6c 24 01 48 89 e7 48 8b 74 24 78 48 c7
[12361.523251] RSP: 0018:ffffa7498342f010 EFLAGS: 00010046
[12361.524599] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000001
[12361.525806] RDX: ffff8fed49a6ed00 RSI: ffff8fed4b178000 RDI: ffff8fec418a9400
[12361.527014] RBP: ffffa7498342f8b0 R08: 0000000000000015 R09: ffff8fed4b1780a8
[12361.527868] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fed57e4f180
[12361.528754] R13: 0000000000004000 R14: 0000000000000015 R15: 0000000000000001
[12361.529623] FS:  00007f291afb8b30(0000) GS:ffff8fed7bc00000(0000) knlGS:0000000000000000
[12361.530318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12361.530941] CR2: 0000000000000000 CR3: 0000000102ad8000 CR4: 00000000000006f0
[12361.531602] Call Trace:
[12361.532257]  <TASK>
[12361.532953]  ? asm_exc_int3+0x40/0x40
[12361.533565]  ? asm_exc_general_protection+0x4/0x30
[12361.534192]  ? asm_exc_int3+0x40/0x40
[12361.534823]  ? asm_exc_general_protection+0x4/0x30
[12361.535450]  ? asm_exc_int3+0x40/0x40
[12361.536063]  ? asm_exc_general_protection+0x4/0x30
[12361.536675]  ? asm_exc_int3+0x40/0x40
[12361.537262]  ? asm_exc_general_protection+0x4/0x30
[12361.537845]  ? asm_exc_int3+0x40/0x40
[12361.538425]  ? asm_exc_general_protection+0x4/0x30
[12361.539015]  ? asm_exc_int3+0x40/0x40
[12361.539630]  ? asm_exc_general_protection+0x4/0x30
[12361.540212]  ? asm_exc_int3+0x40/0x40
[12361.540825]  ? asm_exc_general_protection+0x4/0x30
[12361.541561]  ? asm_exc_int3+0x40/0x40
[12361.542191]  ? asm_exc_general_protection+0x4/0x30
[12361.542761]  ? asm_exc_int3+0x40/0x40
[12361.543325]  ? asm_exc_general_protection+0x4/0x30
[12361.543909]  ? asm_exc_int3+0x40/0x40
[12361.544481]  ? asm_exc_general_protection+0x4/0x30
[12361.545062]  ? asm_exc_int3+0x40/0x40
[12361.545677]  ? asm_exc_general_protection+0x4/0x30
[12361.546270]  ? asm_exc_int3+0x40/0x40
[12361.546861]  ? asm_exc_general_protection+0x4/0x30
[12361.547466]  ? asm_exc_int3+0x40/0x40
[12361.548071]  ? asm_exc_general_protection+0x4/0x30
[12361.548669]  ? asm_exc_int3+0x40/0x40
[12361.549258]  ? asm_exc_general_protection+0x4/0x30
[12361.549844]  ? asm_exc_int3+0x40/0x40
[12361.550425]  ? asm_exc_general_protection+0x4/0x30
[12361.551007]  ? asm_exc_int3+0x40/0x40
[12361.551594]  ? asm_exc_general_protection+0x4/0x30
[12361.552138]  ? asm_exc_int3+0x40/0x40
[12361.552671]  ? asm_exc_general_protection+0x4/0x30
[12361.553201]  ? asm_exc_int3+0x40/0x40
[12361.553737]  ? asm_exc_general_protection+0x4/0x30
[12361.554226]  ? asm_exc_int3+0x40/0x40
[12361.554706]  ? asm_exc_general_protection+0x4/0x30
[12361.555175]  ? asm_exc_int3+0x40/0x40
[12361.555646]  ? asm_exc_general_protection+0x4/0x30
[12361.556093]  ? asm_exc_int3+0x40/0x40
[12361.556549]  ? asm_exc_general_protection+0x4/0x30
[12361.556992]  ? asm_exc_int3+0x40/0x40
[12361.557420]  ? asm_sysvec_spurious_apic_interrupt+0x20/0x20
[12361.557849]  ? schedule_hrtimeout_range_clock+0xa0/0x120
[12361.558272]  ? __fget_files+0x51/0xc0
[12361.558707]  ? __hrtimer_init+0x110/0x110
[12361.559140]  __fget_light+0x32/0x90
[12361.559560]  __fdget+0x13/0x20
[12361.559989]  do_select+0x302/0x850
[12361.560405]  ? __pollwait+0xe0/0xe0
[12361.560820]  ? __pollwait+0xe0/0xe0
[12361.561261]  ? __pollwait+0xe0/0xe0
[12361.561648]  ? __pollwait+0xe0/0xe0
[12361.562028]  ? cpumask_next_and+0x24/0x30
[12361.562443]  ? update_sg_lb_stats+0x78/0x580
[12361.562857]  ? kfree_skbmem+0x81/0xa0
[12361.563266]  ? update_group_capacity+0x2c/0x2d0
[12361.563725]  ? update_sd_lb_stats.constprop.0+0xe0/0x250
[12361.564130]  ? __check_object_size.part.0+0x3a/0x150
[12361.564518]  ? __check_object_size+0x1d/0x30
[12361.564904]  ? core_sys_select+0x246/0x420
[12361.565288]  core_sys_select+0x1dd/0x420
[12361.565684]  ? ktime_get_ts64+0x55/0x100
[12361.566086]  ? _copy_to_user+0x20/0x30
[12361.566495]  ? poll_select_finish+0x121/0x220
[12361.566899]  ? kvm_clock_get_cycles+0x11/0x20
[12361.567313]  kern_select+0xdd/0x180
[12361.567744]  __x64_sys_select+0x21/0x30
[12361.568148]  do_syscall_64+0x5c/0xc0
[12361.568546]  ? __do_softirq+0xd9/0x2e7
[12361.568947]  ? exit_to_user_mode_prepare+0x37/0xb0
[12361.569349]  ? irqentry_exit_to_user_mode+0x9/0x20
[12361.569753]  ? irqentry_exit+0x1d/0x30
[12361.570154]  ? sysvec_apic_timer_interrupt+0x4e/0x90
[12361.570558]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[12361.570970] RIP: 0033:0x7f292739f4a3
[12361.571394] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 c7 d1 ff ff 41 54 b8 02 00 00 00 49 89 f4 be 00 88 08 00 55
[12361.572283] RSP: 002b:00007f291afaaf68 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
[12361.572752] RAX: ffffffffffffffda RBX: 00007f291afb8b30 RCX: 00007f292739f4a3
[12361.573227] RDX: 00007f291afab090 RSI: 00007f291afab010 RDI: 0000000000000017
[12361.573706] RBP: 00007f291afab010 R08: 00007f291afaafb0 R09: 0000000000000000
[12361.574182] R10: 00007f291afab110 R11: 0000000000000246 R12: 0000000000000017
[12361.574656] R13: 00007f291afab090 R14: 00007f291afab190 R15: 00007f291afaf1a0
[12361.575144]  </TASK>
[12361.575640] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay sch_fq_codel joydev input_leds cp210x serio_raw usbserial cdc_acm qemu_fw_cfg mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore pstore_blk mtd ramoops netconsole reed_solomon ipmi_devintf ipmi_msghandler msr pstore_zone ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic bochs drm_vram_helper drm_ttm_helper ttm psmouse drm_kms_helper usbhid syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover failover cec hid rc_core virtio_scsi drm i2c_piix4 pata_acpi floppy
[12361.580240] CR2: 0000000000000000
[12361.580896] ---[ end trace 2596706ab1b3b337 ]---
[12361.581518] RIP: 0010:asm_exc_general_protection+0x4/0x30
[12361.582178] Code: c4 48 8d 6c 24 01 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff ff e8 ea 7f f9 ff e9 05 0b 00 00 0f 1f 44 00 00 0f 1f 00 e8 <c8> 09 00 00 48 89 c4 48 8d 6c 24 01 48 89 e7 48 8b 74 24 78 48 c7
[12361.583552] RSP: 0018:ffffa7498342f010 EFLAGS: 00010046
[12361.584323] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000001
[12361.585078] RDX: ffff8fed49a6ed00 RSI: ffff8fed4b178000 RDI: ffff8fec418a9400
[12361.585828] RBP: ffffa7498342f8b0 R08: 0000000000000015 R09: ffff8fed4b1780a8
[12361.586563] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fed57e4f180
[12361.587283] R13: 0000000000004000 R14: 0000000000000015 R15: 0000000000000001
[12361.588012] FS:  00007f291afb8b30(0000) GS:ffff8fed7bc00000(0000) knlGS:0000000000000000
[12361.588742] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12361.589472] CR2: 0000000000000000 CR3: 0000000102ad8000 CR4: 00000000000006f0
[12394.744918] BUG: kernel NULL pointer dereference, address: 0000000000000045
[12394.745723] #PF: supervisor instruction fetch in kernel mode
[12394.746513] #PF: error_code(0x0010) - not-present page
[12394.747292] PGD 0 P4D 0
[12394.748083] Oops: 0010 [#2] SMP PTI
[12394.748858] CPU: 0 PID: 3950 Comm: mosquitto Tainted: G      D           5.15.0-46-generic #49-Ubuntu
[12394.749639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[12394.751251] RIP: 0010:0x45
[12394.752088] Code: Unable to access opcode bytes at RIP 0x1b.
[12394.752907] RSP: 0018:ffffa74980003648 EFLAGS: 00010046
[12394.753731] RAX: 0000000000000045 RBX: ffff8fed57f082c8 RCX: 00000000000000c3
[12394.754576] RDX: 0000000000000010 RSI: 0000000000000001 RDI: ffffa7498342fa00
[12394.755413] RBP: ffffa74980003690 R08: 00000000000000c3 R09: ffffa749800036a8
[12394.756244] R10: 00000000b140ae3e R11: ffffa74980003730 R12: 0000000000000000
[12394.757091] R13: 0000000000000000 R14: 0000000000000010 R15: 00000000000000c3
[12394.757972] FS:  00007f250ea9ab48(0000) GS:ffff8fed7bc00000(0000) knlGS:0000000000000000
[12394.758803] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12394.759627] CR2: 0000000000000045 CR3: 0000000026064000 CR4: 00000000000006f0
[12394.760488] Call Trace:
[12394.761303]  <IRQ>
[12394.762148]  ? __wake_up_common+0x7d/0x140
[12394.762979]  __wake_up_common_lock+0x7c/0xc0
[12394.763834]  __wake_up_sync_key+0x20/0x30
[12394.764666]  sock_def_readable+0x3b/0x80
[12394.765471]  tcp_data_ready+0x31/0xe0
[12394.766280]  tcp_data_queue+0x315/0x610
[12394.767028]  tcp_rcv_established+0x25f/0x6d0
[12394.767799]  tcp_v4_do_rcv+0x155/0x260
[12394.768568]  tcp_v4_rcv+0xd9d/0xed0
[12394.769302]  ip_protocol_deliver_rcu+0x3d/0x240
[12394.770033]  ip_local_deliver_finish+0x48/0x60
[12394.770726]  ip_local_deliver+0xfb/0x110
[12394.771387]  ? ip_protocol_deliver_rcu+0x240/0x240
[12394.772059]  ip_rcv_finish+0xbe/0xd0
[12394.772746]  ip_sabotage_in+0x5f/0x70 [br_netfilter]
[12394.773425]  nf_hook_slow+0x44/0xc0
[12394.774105]  ip_rcv+0x8a/0x190
[12394.774731]  ? ip_sublist_rcv+0x200/0x200
[12394.775349]  __netif_receive_skb_one_core+0x8a/0xa0
[12394.775959]  __netif_receive_skb+0x15/0x60
[12394.776551]  netif_receive_skb+0x43/0x140
[12394.777140]  ? fdb_find_rcu+0xb1/0x130 [bridge]
[12394.777769]  br_pass_frame_up+0x151/0x190 [bridge]
[12394.778382]  br_handle_frame_finish+0x1a5/0x520 [bridge]
[12394.778981]  ? __nf_ct_refresh_acct+0x55/0x60 [nf_conntrack]
[12394.779589]  ? nf_conntrack_tcp_packet+0x61f/0xf60 [nf_conntrack]
[12394.780171]  ? br_pass_frame_up+0x190/0x190 [bridge]
[12394.780758]  br_nf_hook_thresh+0xe1/0x120 [br_netfilter]
[12394.781337]  ? br_pass_frame_up+0x190/0x190 [bridge]
[12394.781937]  br_nf_pre_routing_finish+0x16e/0x430 [br_netfilter]
[12394.782517]  ? br_pass_frame_up+0x190/0x190 [bridge]
[12394.783122]  ? nf_nat_ipv4_pre_routing+0x4a/0xc0 [nf_nat]
[12394.783755]  br_nf_pre_routing+0x245/0x550 [br_netfilter]
[12394.784323]  ? tcp_write_xmit+0x690/0xb10
[12394.784872]  ? br_nf_forward_arp+0x320/0x320 [br_netfilter]
[12394.785424]  br_handle_frame+0x211/0x3c0 [bridge]
[12394.785995]  ? fib_multipath_hash+0x4a0/0x6a0
[12394.786535]  ? br_pass_frame_up+0x190/0x190 [bridge]
[12394.787075]  ? br_handle_frame_finish+0x520/0x520 [bridge]
[12394.787615]  __netif_receive_skb_core.constprop.0+0x23a/0xef0
[12394.788148]  ? ip_rcv+0x16f/0x190
[12394.788718]  __netif_receive_skb_one_core+0x3f/0xa0
[12394.789306]  __netif_receive_skb+0x15/0x60
[12394.789831]  process_backlog+0x9e/0x170
[12394.790353]  __napi_poll+0x33/0x190
[12394.790860]  net_rx_action+0x126/0x280
[12394.791351]  __do_softirq+0xd9/0x2e7
[12394.791846]  do_softirq+0x7d/0xb0
[12394.792350]  </IRQ>
[12394.792855]  <TASK>
[12394.793338]  __local_bh_enable_ip+0x54/0x60
[12394.793830]  ip_finish_output2+0x1a2/0x580
[12394.794331]  __ip_finish_output+0xb7/0x180
[12394.794823]  ip_finish_output+0x2e/0xc0
[12394.795316]  ip_output+0x78/0x100
[12394.795803]  ? __ip_finish_output+0x180/0x180
[12394.796322]  ip_local_out+0x5e/0x70
[12394.796816]  __ip_queue_xmit+0x180/0x440
[12394.797311]  ? page_counter_cancel+0x2e/0x80
[12394.797820]  ip_queue_xmit+0x15/0x20
[12394.798322]  __tcp_transmit_skb+0x8dd/0xa00
[12394.798813]  tcp_write_xmit+0x3ab/0xb10
[12394.799303]  ? __check_object_size.part.0+0x4a/0x150
[12394.799808]  __tcp_push_pending_frames+0x37/0x100
[12394.800308]  tcp_push+0xd6/0x100
[12394.800806]  tcp_sendmsg_locked+0x883/0xc80
[12394.801303]  tcp_sendmsg+0x2d/0x50
[12394.801793]  inet_sendmsg+0x43/0x80
[12394.802302]  sock_sendmsg+0x62/0x70
[12394.802787]  sock_write_iter+0x93/0xf0
[12394.803277]  new_sync_write+0x193/0x1b0
[12394.803770]  vfs_write+0x1d5/0x270
[12394.804276]  ksys_write+0xb5/0xf0
[12394.804737]  ? syscall_trace_enter.constprop.0+0xa7/0x1c0
[12394.805205]  __x64_sys_write+0x19/0x20
[12394.805665]  do_syscall_64+0x5c/0xc0
[12394.806129]  ? syscall_exit_to_user_mode+0x27/0x50
[12394.806592]  ? do_syscall_64+0x69/0xc0
[12394.807059]  ? do_syscall_64+0x69/0xc0
[12394.807549]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[12394.808008] RIP: 0033:0x7f250ea593ad
[12394.808499] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 8a d2 ff ff 41 54 b8 02 00 00 00 49 89 f4 be 00 88 08 00 55
[12394.809442] RSP: 002b:00007ffea08ec188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[12394.809945] RAX: ffffffffffffffda RBX: 00007f250ea9ab48 RCX: 00007f250ea593ad
[12394.810440] RDX: 00000000000000a2 RSI: 00007f250e79c810 RDI: 0000000000000009
[12394.810933] RBP: 00007f250e7d7e80 R08: 0000000000000000 R09: 0000000000000000
[12394.811451] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[12394.811938] R13: 000000000000009f R14: 0000000000000000 R15: 00007f250e7d7e80
[12394.812449]  </TASK>
[12394.812930] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay sch_fq_codel joydev input_leds cp210x serio_raw usbserial cdc_acm qemu_fw_cfg mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore pstore_blk mtd ramoops netconsole reed_solomon ipmi_devintf ipmi_msghandler msr pstore_zone ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic bochs drm_vram_helper drm_ttm_helper ttm psmouse drm_kms_helper usbhid syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover failover cec hid rc_core virtio_scsi drm i2c_piix4 pata_acpi floppy
[12394.817596] CR2: 0000000000000045
[12394.818324] ---[ end trace 2596706ab1b3b338 ]---
[12394.819007] RIP: 0010:asm_exc_general_protection+0x4/0x30
[12394.819695] Code: c4 48 8d 6c 24 01 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff ff e8 ea 7f f9 ff e9 05 0b 00 00 0f 1f 44 00 00 0f 1f 00 e8 <c8> 09 00 00 48 89 c4 48 8d 6c 24 01 48 89 e7 48 8b 74 24 78 48 c7
[12394.821094] RSP: 0018:ffffa7498342f010 EFLAGS: 00010046
[12394.821847] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000001
[12394.822622] RDX: ffff8fed49a6ed00 RSI: ffff8fed4b178000 RDI: ffff8fec418a9400
[12394.823371] RBP: ffffa7498342f8b0 R08: 0000000000000015 R09: ffff8fed4b1780a8
[12394.824113] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fed57e4f180
[12394.824874] R13: 0000000000004000 R14: 0000000000000015 R15: 0000000000000001
[12394.825623] FS:  00007f250ea9ab48(0000) GS:ffff8fed7bc00000(0000) knlGS:0000000000000000
[12394.826391] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12394.827160] CR2: 0000000000000045 CR3: 0000000026064000 CR4: 00000000000006f0
[12394.827934] Kernel panic - not syncing: Fatal exception in interrupt
[12394.828901] Kernel Offset: 0x8200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[12394.829699] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
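When comparing several captures like the one above, it can help to reduce each log to one line per oops. The following is only a sketch: the regular expressions are written against the line format shown in this particular log (lines starting with a bracketed timestamp), and the function name and dictionary keys are made up for illustration.

```python
import re
from typing import Dict, Iterable, List

# Matches e.g. "[12361.508193] BUG: kernel NULL pointer dereference, address: ..."
BUG_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\] BUG: (?P<msg>.+)$")
# Matches e.g. "[12361.514392] CPU: 0 PID: 3268 Comm: python3 Not tainted ..."
COMM_RE = re.compile(r"^\[\s*\d+\.\d+\] CPU: \d+ PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")


def summarize_oopses(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Return one record per 'BUG:' line, with the PID and process name
    taken from the first 'CPU: ... Comm:' line that follows it."""
    events: List[Dict[str, object]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        bug = BUG_RE.match(line)
        if bug:
            events.append({"timestamp": float(bug["ts"]),
                           "message": bug["msg"],
                           "pid": None, "comm": None})
            continue
        comm = COMM_RE.match(line)
        if comm and events and events[-1]["comm"] is None:
            events[-1]["pid"] = int(comm["pid"])
            events[-1]["comm"] = comm["comm"]
    return events
```

Run over the log above, this would report two events about 33 seconds apart, the first in python3 and the second in mosquitto, which makes it easier to see whether different captures share the same message while the triggering process varies.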

rRobbie · Aug 25, 2022

Same configuration, exact same problem. I did the installation yesterday with the latest VE version and all updates. The Ubuntu VM froze tonight.

cameloid · Sep 12, 2022

Same issue, similar CPU (Celeron N5095).
