System Lockup - How to Diagnose?

Davidoff

Well-Known Member
Nov 19, 2017
Hi there. I'm running PVE 7.3-3 on a single node which hosts a bunch of LXC containers. The machine has 128GB of RAM.

This morning, I discovered that the system was non-responsive. Web UIs for various containerized apps didn't respond, I couldn't SSH in, and I couldn't log in at the console either - it showed the login prompt but didn't respond to any keystrokes. Everything seemed to be in order once I powered down and powered back up, as far as I could tell. However, I thought this was a bit worrisome and was hoping to diagnose it further to figure out what went wrong, but I'm not quite sure how to proceed.

Based on a review of the system logs, the problem seemed to start at 3:01 am:

Code:
Jan 24 03:01:05 fava2 kernel: watchdog: BUG: soft lockup - CPU#18 stuck for 26s! [swapper/18:0]
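To find every occurrence of this message in a saved log, a minimal Python sketch along these lines may help. It assumes the message format is exactly the one in the line above; the function name `find_soft_lockups` and the `sample` list are made up for illustration.

```python
import re

# Matches the kernel watchdog message format seen in the log line above.
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^:\]]+):(?P<pid>\d+)\]"
)

def find_soft_lockups(lines):
    """Return (cpu, seconds_stuck, task, pid) for each soft lockup line."""
    hits = []
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            hits.append((int(m["cpu"]), int(m["secs"]), m["task"], int(m["pid"])))
    return hits

# Two lines copied from this thread; only the first is a soft lockup report.
sample = [
    "Jan 24 03:01:05 fava2 kernel: watchdog: BUG: soft lockup - CPU#18 stuck for 26s! [swapper/18:0]",
    "Jan 24 03:01:37 fava2 bash[608776]: command 'lxc-stop -n 111 --nokill --timeout 60' failed: got timeout",
]
print(find_soft_lockups(sample))
```

On a system with a persistent journal, the input could come from `journalctl -k -b -1` (kernel messages from the previous boot) instead of the hard-coded sample.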

My understanding is that this type of error has something to do with overtaxed resources. However, looking at the Proxmox webgui, at 3:00 am I see CPU at 7.13%, IO delay at 2.14%, Server load at 4.18 and RAM usage at 68.36GB (about 53% of total RAM). So it looks like the problem isn't overtaxed hardware, as far as I can tell.

I do have a script that starts running at 3:00 am every night, so I thought that might be the cause, but that doesn't seem to be the case. The first thing the script does is shut down all containers, and the problem seems to arise just as it's shutting down one of the containers - the last line in the log output from the script is the following:

Code:
Jan 24 03:01:37 fava2 bash[608776]: command 'lxc-stop -n 111 --nokill --timeout 60' failed: got timeout
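The order of events can be estimated from the two timestamps. The sketch below is only arithmetic, and it rests on two assumptions: that the timeout was logged as soon as the 60 seconds expired, and that "stuck for 26s" is roughly how long CPU 18 had been stuck when the watchdog reported it. The helper names are hypothetical.

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def to_hms(seconds):
    """Convert seconds since midnight back to 'HH:MM:SS'."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

# Timestamps and durations copied from the two log lines in this post.
lockup_logged = to_seconds("03:01:05")   # watchdog message
timeout_logged = to_seconds("03:01:37")  # lxc-stop failure message

stop_started = timeout_logged - 60       # --timeout 60
cpu_stuck_since = lockup_logged - 26     # "stuck for 26s"
gap = cpu_stuck_since - stop_started

print("lxc-stop started about", to_hms(stop_started))
print("CPU 18 stuck since about", to_hms(cpu_stuck_since))
print("gap in seconds:", gap)
```

Under those assumptions, the stop of container 111 began a couple of seconds before the CPU stopped making progress, which would be consistent with the lockup starting during the shutdown of that container.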

Lastly, after the first error message about the soft lockup, there's this:

Code:
Jan 24 03:01:05 fava2 kernel: Modules linked in: ip6t_REJECT nf_reject_ipv6 nft_counter ipt_REJECT nf_reject_ipv4 xt_addrtype xt_mark nft_compat nft_fib_ipv4 nft_ct nft_fib_ipv6 nft_fib binfmt_misc veth ebtable_filter ebtables ip_set ip>
Jan 24 03:01:05 fava2 kernel:  chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) >
Jan 24 03:01:05 fava2 kernel: CPU: 18 PID: 0 Comm: swapper/18 Tainted: P           O      5.15.83-1-pve #1
Jan 24 03:01:05 fava2 kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.3 05/23/2018
Jan 24 03:01:05 fava2 kernel: RIP: 0010:netdev_pick_tx+0xd9/0x310
Jan 24 03:01:05 fava2 kernel: Code: 84 2e 01 00 00 41 0f b7 46 7c 66 85 c0 0f 84 53 01 00 00 8d 48 ff 0f b7 c1 66 39 ca 0f 86 dd 01 00 00 39 c3 0f 87 55 01 00 00 <29> d8 39 c3 0f 87 4b 01 00 00 eb f4 0f 1f 44 00 00 49 8b 94 24 28
Jan 24 03:01:05 fava2 kernel: RSP: 0018:ffffaca706864618 EFLAGS: 00000297
Jan 24 03:01:05 fava2 kernel: RAX: 0000000000000007 RBX: 0000000000000000 RCX: 00000000000000da
Jan 24 03:01:05 fava2 kernel: RDX: 0000000000000000 RSI: ffff90cde9bca400 RDI: ffff90da06602000
Jan 24 03:01:05 fava2 kernel: RBP: ffffaca706864658 R08: 0000000000000000 R09: ffff90cde9bca400
Jan 24 03:01:05 fava2 kernel: R10: ffffaca706864970 R11: ffffaca7068649a0 R12: ffff90da06602000
Jan 24 03:01:05 fava2 kernel: R13: 00000000ffffffff R14: ffff90cde9bca400 R15: ffff90da06602000
Jan 24 03:01:05 fava2 kernel: FS:  0000000000000000(0000) GS:ffff90d9bfc80000(0000) knlGS:0000000000000000
Jan 24 03:01:05 fava2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 03:01:05 fava2 kernel: CR2: 00007fa57d7810a0 CR3: 0000001d1c810003 CR4: 00000000000606e0
Jan 24 03:01:05 fava2 kernel: Call Trace:
Jan 24 03:01:05 fava2 kernel:  <IRQ>
Jan 24 03:01:05 fava2 kernel:  netdev_core_pick_tx+0xa4/0xb0
Jan 24 03:01:05 fava2 kernel:  __dev_queue_xmit+0x1b8/0xb30
Jan 24 03:01:05 fava2 kernel:  ? netif_rx_internal+0x3a/0x100
Jan 24 03:01:05 fava2 kernel:  dev_queue_xmit+0x10/0x20
Jan 24 03:01:05 fava2 kernel:  ovs_vport_send+0xae/0x170 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  do_output+0x59/0x180 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  do_execute_actions+0xabf/0x1b40 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ? update_process_times+0xc4/0xd0
Jan 24 03:01:05 fava2 kernel:  ? tick_do_update_jiffies64.part.0+0xa0/0xa0
Jan 24 03:01:05 fava2 kernel:  ? tick_do_update_jiffies64.part.0+0xa0/0xa0
Jan 24 03:01:05 fava2 kernel:  ? timerqueue_add+0xa7/0xd0
Jan 24 03:01:05 fava2 kernel:  ? __skb_flow_dissect+0xea9/0x1920
Jan 24 03:01:05 fava2 kernel:  ? lapic_next_deadline+0x2c/0x40
Jan 24 03:01:05 fava2 kernel:  ? flow_lookup.constprop.0+0x5c/0x110 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ovs_execute_actions+0x48/0x110 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ? ovs_execute_actions+0x48/0x110 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ovs_dp_process_packet+0xa1/0x200 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ? ovs_ct_update_key.isra.0+0xa8/0x120 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ? ovs_ct_fill_key+0x1d/0x30 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ? ovs_flow_key_extract+0x2da/0x350 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ovs_vport_receive+0x77/0xd0 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ? __wake_up_common_lock+0x8a/0xc0
Jan 24 03:01:05 fava2 kernel:  ? __wake_up+0x13/0x20
Jan 24 03:01:05 fava2 kernel:  ? ep_poll_callback+0x23d/0x290
Jan 24 03:01:05 fava2 kernel:  ? __wake_up_common+0x7e/0x140
Jan 24 03:01:05 fava2 kernel:  netdev_frame_hook+0xdf/0x1b0 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ? netdev_create+0x40/0x40 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  __netif_receive_skb_core+0x238/0xef0
Jan 24 03:01:05 fava2 kernel:  ? kfree_skb_reason.part.0+0x37/0x70
Jan 24 03:01:05 fava2 kernel:  ? kfree_skb_reason+0x1e/0x60
Jan 24 03:01:05 fava2 kernel:  ? br_forward+0xe8/0x120
Jan 24 03:01:05 fava2 kernel:  __netif_receive_skb_list_core+0x107/0x260
Jan 24 03:01:05 fava2 kernel:  netif_receive_skb_list_internal+0x1a1/0x2c0
Jan 24 03:01:05 fava2 kernel:  ? dev_gro_receive+0x2d6/0x7b0
Jan 24 03:01:05 fava2 kernel:  ? kmem_cache_alloc+0x1ab/0x2f0
Jan 24 03:01:05 fava2 kernel:  ? __build_skb+0x26/0x60
Jan 24 03:01:05 fava2 kernel:  napi_complete_done+0x7a/0x1c0
Jan 24 03:01:05 fava2 kernel:  igb_poll+0xc95/0x1490 [igb]
Jan 24 03:01:05 fava2 kernel:  __napi_poll+0x33/0x180
Jan 24 03:01:05 fava2 kernel:  net_rx_action+0x126/0x280
Jan 24 03:01:05 fava2 kernel:  __do_softirq+0xd9/0x2ea
Jan 24 03:01:05 fava2 kernel:  irq_exit_rcu+0x94/0xc0
Jan 24 03:01:05 fava2 kernel:  common_interrupt+0x8e/0xa0
Jan 24 03:01:05 fava2 kernel:  </IRQ>
Jan 24 03:01:05 fava2 kernel:  <TASK>
Jan 24 03:01:05 fava2 kernel:  asm_common_interrupt+0x27/0x40
Jan 24 03:01:05 fava2 kernel: RIP: 0010:cpuidle_enter_state+0xd9/0x620
Jan 24 03:01:05 fava2 kernel: Code: 3d f4 3f 9f 79 e8 27 00 6e ff 49 89 c7 0f 1f 44 00 00 31 ff e8 68 0d 6e ff 80 7d d0 00 0f 85 5e 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6a 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e5 03 00 00
Jan 24 03:01:05 fava2 kernel: RSP: 0018:ffffaca706443e38 EFLAGS: 00000246
Jan 24 03:01:05 fava2 kernel: RAX: ffff90d9bfcb0bc0 RBX: ffffcc96ffc81ea8 RCX: 0000000000000000
Jan 24 03:01:05 fava2 kernel: RDX: 00000000000a12fb RSI: 00000000313b14ef RDI: 0000000000000000
Jan 24 03:01:05 fava2 kernel: RBP: ffffaca706443e88 R08: 000264166945108a R09: 00026412d98d1274
Jan 24 03:01:05 fava2 kernel: R10: 00026412d134d774 R11: 071c71c71c71c71c R12: ffffffff87ad4420
Jan 24 03:01:05 fava2 kernel: R13: 0000000000000004 R14: 0000000000000004 R15: 000264166945108a
Jan 24 03:01:05 fava2 kernel:  ? cpuidle_enter_state+0xc8/0x620
Jan 24 03:01:05 fava2 kernel:  cpuidle_enter+0x2e/0x50
Jan 24 03:01:05 fava2 kernel:  do_idle+0x20d/0x2b0
Jan 24 03:01:05 fava2 kernel:  cpu_startup_entry+0x20/0x30
Jan 24 03:01:05 fava2 kernel:  start_secondary+0x12a/0x180
Jan 24 03:01:05 fava2 kernel:  secondary_startup_64_no_verify+0xc2/0xcb
Jan 24 03:01:05 fava2 kernel:  </TASK>
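For readers who want a quick summary of a trace like this, here is a rough Python sketch that counts which kernel modules appear in the call trace. It skips frames prefixed with `?`, which the kernel prints for addresses it found on the stack but could not confirm as part of the call chain. The `vmlinux` label for frames without a module tag, the function name, and the shortened sample are assumptions made for illustration.

```python
import re
from collections import Counter

# A frame looks like "  ovs_vport_send+0xae/0x170 [openvswitch]".
# A leading "? " marks an unreliable frame.
FRAME = re.compile(
    r"kernel:\s+(?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?\s*$"
)

def reliable_frames(lines):
    """Return (symbol, module) for every frame not marked with '?'."""
    frames = []
    for line in lines:
        m = FRAME.search(line)
        if m and not m["guess"]:
            # Frames without a [module] tag belong to the core kernel image.
            frames.append((m["sym"], m["mod"] or "vmlinux"))
    return frames

# A shortened excerpt of the trace above, top to bottom.
sample = """\
Jan 24 03:01:05 fava2 kernel:  <IRQ>
Jan 24 03:01:05 fava2 kernel:  netdev_core_pick_tx+0xa4/0xb0
Jan 24 03:01:05 fava2 kernel:  __dev_queue_xmit+0x1b8/0xb30
Jan 24 03:01:05 fava2 kernel:  ? netif_rx_internal+0x3a/0x100
Jan 24 03:01:05 fava2 kernel:  dev_queue_xmit+0x10/0x20
Jan 24 03:01:05 fava2 kernel:  ovs_vport_send+0xae/0x170 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  do_output+0x59/0x180 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  ? flow_lookup.constprop.0+0x5c/0x110 [openvswitch]
Jan 24 03:01:05 fava2 kernel:  igb_poll+0xc95/0x1490 [igb]
Jan 24 03:01:05 fava2 kernel:  __napi_poll+0x33/0x180
""".splitlines()

frames = reliable_frames(sample)
by_module = Counter(mod for _, mod in frames)
print(by_module.most_common())
```

Read from the bottom up, the full trace shows the igb driver receiving a packet in softirq context, openvswitch forwarding it, and the CPU sitting in `netdev_pick_tx` when the watchdog fired. That says where the CPU was stuck, not necessarily what caused it.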

Unfortunately, I don't have the expertise to analyze the above to look into it further, other than noting that it seems to have something to do with openvswitch. I do use openvswitch at the host level for channel bonding. So far, it seems to be working just fine, though I did note some error messages, as follows:

Code:
Jan 24 16:05:55 fava2 systemd-udevd[68844]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 24 16:05:55 fava2 systemd-udevd[68844]: Using default interface naming scheme 'v247'.
Jan 24 16:05:56 fava2 ovs-vsctl[68966]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port veth113i0
Jan 24 16:05:56 fava2 ovs-vsctl[68966]: ovs|00002|db_ctl_base|ERR|no port named veth113i0
Jan 24 16:05:56 fava2 ovs-vsctl[68967]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln113i0
Jan 24 16:05:56 fava2 ovs-vsctl[68967]: ovs|00002|db_ctl_base|ERR|no port named fwln113i0
Jan 24 16:05:56 fava2 ovs-vsctl[68968]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl -- add-port vmbr0 veth113i0 tag=20 -- set Interface veth113i0 mtu_request=1500
Jan 24 16:05:56 fava2 kernel: device veth113i0 entered promiscuous mode

I don't think the above errors relate to the system hanging though - a quick scan through shows the same errors popping up at various times over the past few months. They haven't seemed to cause any problems in the past - networking has worked just fine despite these errors.
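One way to check where these errors come from is to match each "no port named" error to the `ovs-vsctl` call with the same PID. The following Python sketch does that for the log format shown above; the function name and the inline sample are hypothetical.

```python
import re

CALL = re.compile(r"ovs-vsctl\[(?P<pid>\d+)\]: .*\|INFO\|Called as \S+ (?P<args>.+)$")
ERR = re.compile(r"ovs-vsctl\[(?P<pid>\d+)\]: .*\|ERR\|no port named (?P<port>\S+)")

def explain_port_errors(lines):
    """Pair each 'no port named X' error with the command that triggered it."""
    calls = {}    # pid -> argument string of the ovs-vsctl invocation
    pairs = []
    for line in lines:
        m = CALL.search(line)
        if m:
            calls[m["pid"]] = m["args"]
            continue
        m = ERR.search(line)
        if m:
            pairs.append((m["port"], calls.get(m["pid"], "unknown")))
    return pairs

# The ovs-vsctl lines from the log excerpt above.
sample = [
    "Jan 24 16:05:56 fava2 ovs-vsctl[68966]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port veth113i0",
    "Jan 24 16:05:56 fava2 ovs-vsctl[68966]: ovs|00002|db_ctl_base|ERR|no port named veth113i0",
    "Jan 24 16:05:56 fava2 ovs-vsctl[68967]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln113i0",
    "Jan 24 16:05:56 fava2 ovs-vsctl[68967]: ovs|00002|db_ctl_base|ERR|no port named fwln113i0",
    "Jan 24 16:05:56 fava2 ovs-vsctl[68968]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl -- add-port vmbr0 veth113i0 tag=20 -- set Interface veth113i0 mtu_request=1500",
]
for port, command in explain_port_errors(sample):
    print(f"{port}: error came from '{command}'")
```

In this excerpt both errors come from `del-port` calls on ports that did not exist yet, issued just before the `add-port` for the same container interface. That pattern looks like cleanup before the port is created, which would fit with the errors being harmless, though the sketch does not prove it.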

If anyone has any thoughts or suggestions on how I can further diagnose and/or fix this I'd be very grateful.
 
Hi,
I see the same OVS log entries on PVE 8.0.3:
Code:
Jan 24 16:05:56 fava2 ovs-vsctl[68966]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port veth113i0
Jan 24 16:05:56 fava2 ovs-vsctl[68966]: ovs|00002|db_ctl_base|ERR|no port named veth113i0
Jan 24 16:05:56 fava2 ovs-vsctl[68967]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln113i0
Jan 24 16:05:56 fava2 ovs-vsctl[68967]: ovs|00002|db_ctl_base|ERR|no port named fwln113i0

Yet I don't have the kind of issue you describe above (I have others, for sure, but apparently not related to OVS).

Regarding the logs mentioned: do you use VLANs? It seems that if a VLAN is not known to the host, OVS will produce this kind of log entry.
 

I have the same, or almost the same, problem. It always happens after a random number of VM stops and starts...
 
