No Connection if run with Kernel version >= 5.15.35

icodetea

Jul 3, 2022
Since the kernel update to 5.15.XX, my Proxmox host drops all connections after a short while:
no container, VM, or the Proxmox host itself is reachable.
If I log in on the machine itself to troubleshoot, even ping 8.8.8.8 does not work.
I already checked /etc/network/interfaces:
Bash:
root@pve:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.2/24
        gateway 192.168.1.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0
So nothing suspicious here; the name of my Ethernet interface is "enp1s0".

I also checked whether the interface had been renamed. It had not:
Bash:
root@pve:~# lshw | grep -B 5 enp
                description: Ethernet interface
                product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
                vendor: Realtek Semiconductor Co., Ltd.
                physical id: 0
                bus info: pci@0000:01:00.0
                logical name: enp1s0
The name matches.
I attached the syslog and the output of "ip address", "ip link" and "ip route".
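
A related check is which kernel driver is bound to the interface. This is only a minimal sketch: the function name nic_driver and its optional second argument (an alternative sysfs root, there purely so the function can be tried without the real hardware) are my own, and it assumes a readlink that supports -f.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel driver bound to a network interface
# by resolving the sysfs "driver" symlink of its device.
# $1 = interface name, $2 = optional sysfs root (defaults to /sys/class/net).
nic_driver() {
    link="${2:-/sys/class/net}/$1/device/driver"
    [ -e "$link" ] || return 1
    basename "$(readlink -f "$link")"
}

# Interface name taken from my /etc/network/interfaces.
nic_driver enp1s0 || echo "no driver link found for enp1s0"
```

Virtual interfaces such as lo or a bridge have no device/driver link, so the function returns non-zero for them.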

The crazy thing is: if I reboot, select the advanced boot options, and boot an older kernel version (5.13.19-15), the connection works fine again.
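
To tell quickly whether the running kernel is in the range where I see the problem (5.15.35 and newer), something like the following could be used. It is a rough sketch: kernel_at_least is a made-up helper name, and it assumes GNU sort with -V (version sort) is available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Version-sorts both values and checks that $2 comes out first.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the "-N-pve" suffix so only the upstream version is compared.
running="$(uname -r | cut -d- -f1)"

if kernel_at_least "$running" 5.15.35; then
    echo "kernel $running is in the range where I see the problem"
else
    echo "kernel $running is older than 5.15.35"
fi
```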

So I cannot update my Proxmox kernel anymore. Any idea what's going on?
 


Today I tried the most recent kernel version, 5.15.74, and still have the same problem. I am now 4 months behind on kernel versions. Can someone help me find out how to get my NIC controller working with the most recent updates, or at least how to track whether there is a fix for it?
 


OK, I found the interesting part in the log.

A similar problem is described here: https://forum.proxmox.com/threads/n...88179_178a-transmit-queue-0-timed-out.117729/
I also tried the workaround from here (ethtool -K <interface> tso off gso off):
https://forum.proxmox.com/threads/e1000-driver-hang.58284/#post-279338
but no luck.
I would be happy for any hints...
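
For reference, this is roughly how the workaround can be applied and then verified. Treat it as a sketch under assumptions: offload_state is a hypothetical helper, and it assumes ethtool -k prints the usual "feature-name: on|off" lines. The ethtool -K call needs root and only runs if ethtool and the interface are actually present.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the state (on/off) of the named feature, e.g. tcp-segmentation-offload.
offload_state() {
    awk -v feat="$1" '{
        name = $1
        sub(/:$/, "", name)              # drop the trailing colon
        if (name == feat) { print $2; exit }
    }'
}

iface="enp1s0"   # interface name from my /etc/network/interfaces

# Only touch the NIC when ethtool and the interface are actually present.
if command -v ethtool >/dev/null 2>&1 && [ -e "/sys/class/net/$iface" ]; then
    ethtool -K "$iface" tso off gso off
    ethtool -k "$iface" | offload_state tcp-segmentation-offload
    ethtool -k "$iface" | offload_state generic-segmentation-offload
fi
```

Settings changed with ethtool -K do not survive a reboot, so they have to be reapplied after every boot while testing.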


Code:
Nov 23 17:52:38 pve kernel: ------------[ cut here ]------------
Nov 23 17:52:38 pve kernel: NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out
Nov 23 17:52:38 pve kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x277/0x280
Nov 23 17:52:38 pve kernel: Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nf_conntrack_netlink xt_state tcp_diag inet_diag iptable_nat xt_nat nft_chain_nat xt_MASQUERADE nf_nat xfrm_user xfrm_algo nft_counter nft_compat overlay binfmt_misc veth ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac ipt_REJECT nf_reject_ipv4 xt_set xt_physdev xt_addrtype xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_tcpudp xt_mark iptable_filter bpfilter ip_set_hash_net ip_set nf_tables softdog bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal snd_hda_codec_hdmi intel_powerclamp mei_hdcp coretemp rtl8723be btcoexist rtl8723_common snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic i915 ledtrig_audio rtl_pci snd_hda_intel snd_intel_dspcfg rtlwifi kvm_intel btusb ttm uvcvideo
Nov 23 17:52:38 pve kernel:  btrtl videobuf2_vmalloc btbcm videobuf2_memops videobuf2_v4l2 kvm btintel videobuf2_common snd_intel_sdw_acpi mac80211 irqbypass drm_kms_helper videodev snd_hda_codec crct10dif_pclmul ghash_clmulni_intel bluetooth aesni_intel crypto_simd cryptd rapl hp_wmi snd_hda_core platform_profile joydev pcspkr mc intel_cstate sparse_keymap input_leds wmi_bmof ecdh_generic snd_hwdep cec rc_core ecc serio_raw snd_pcm cfg80211 efi_pstore snd_timer mei_me i2c_algo_bit fb_sys_fops syscopyarea sysfillrect snd at24 soundcore sysimgblt libarc4 mei mac_hid hp_accel lis3lv02d wireless_hotkey acpi_pad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c rtsx_pci_sdmmc xhci_pci
Nov 23 17:52:38 pve kernel:  crc32_pclmul rtsx_pci psmouse ahci ehci_pci i2c_i801 libahci ehci_hcd i2c_smbus lpc_ich r8169 xhci_pci_renesas realtek xhci_hcd wmi video
Nov 23 17:52:38 pve kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: P           O      5.15.74-1-pve #1
Nov 23 17:52:38 pve kernel: Hardware name: Hewlett-Packard HP 350 G2/803A, BIOS F.13 06/10/2015
Nov 23 17:52:38 pve kernel: RIP: 0010:dev_watchdog+0x277/0x280
Nov 23 17:52:38 pve kernel: Code: eb 97 48 8b 5d d0 c6 05 e5 a7 4d 01 01 48 89 df e8 0e 55 f9 ff 44 89 e1 48 89 de 48 c7 c7 10 9a ea 87 48 89 c2 e8 76 90 1c 00 <0f> 0b eb 80 e9 9a bf 25 00 0f 1f 44 00 00 55 49 89 ca 48 89 e5 41
Nov 23 17:52:38 pve kernel: RSP: 0018:ffffb054c0168e70 EFLAGS: 00010282
Nov 23 17:52:38 pve kernel: RAX: 0000000000000000 RBX: ffff8e8d8e07c000 RCX: 0000000000000000
Nov 23 17:52:38 pve kernel: RDX: ffff8e8fd1d2c240 RSI: ffff8e8fd1d20580 RDI: 0000000000000300
Nov 23 17:52:38 pve kernel: RBP: ffffb054c0168ea8 R08: 0000000000000003 R09: 0000000000000001
Nov 23 17:52:38 pve kernel: R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
Nov 23 17:52:38 pve kernel: R13: ffff8e8d801fb880 R14: 0000000000000001 R15: ffff8e8d8e07c4c0
Nov 23 17:52:38 pve kernel: FS:  0000000000000000(0000) GS:ffff8e8fd1d00000(0000) knlGS:0000000000000000
Nov 23 17:52:38 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 23 17:52:38 pve kernel: CR2: 00007fe42f71e000 CR3: 00000002bb210006 CR4: 00000000003706e0
Nov 23 17:52:38 pve kernel: Call Trace:
Nov 23 17:52:38 pve kernel:  <IRQ>
Nov 23 17:52:38 pve kernel:  ? pfifo_fast_enqueue+0x160/0x160
Nov 23 17:52:38 pve kernel:  call_timer_fn+0x2b/0x120
Nov 23 17:52:38 pve kernel:  __run_timers.part.0+0x1e1/0x270
Nov 23 17:52:38 pve kernel:  ? ktime_get+0x46/0xc0
Nov 23 17:52:38 pve kernel:  ? native_x2apic_icr_read+0x20/0x20
Nov 23 17:52:38 pve kernel:  ? lapic_next_event+0x21/0x30
Nov 23 17:52:38 pve kernel:  ? clockevents_program_event+0xab/0x130
Nov 23 17:52:38 pve kernel:  run_timer_softirq+0x2a/0x60
Nov 23 17:52:38 pve kernel:  __do_softirq+0xd9/0x2ea
Nov 23 17:52:38 pve kernel:  irq_exit_rcu+0x94/0xc0
Nov 23 17:52:38 pve kernel:  sysvec_apic_timer_interrupt+0x80/0x90
Nov 23 17:52:38 pve kernel:  </IRQ>
Nov 23 17:52:38 pve kernel:  <TASK>
Nov 23 17:52:38 pve kernel:  asm_sysvec_apic_timer_interrupt+0x1b/0x20
Nov 23 17:52:38 pve kernel: RIP: 0010:cpuidle_enter_state+0xd9/0x620
Nov 23 17:52:38 pve kernel: Code: 3d 64 64 df 78 e8 27 24 6e ff 49 89 c7 0f 1f 44 00 00 31 ff e8 68 31 6e ff 80 7d d0 00 0f 85 5e 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6a 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e5 03 00 00
Nov 23 17:52:38 pve kernel: RSP: 0018:ffffb054c00efe38 EFLAGS: 00000246
Nov 23 17:52:38 pve kernel: RAX: ffff8e8fd1d30bc0 RBX: ffffd054bfd00000 RCX: 0000000000000000
Nov 23 17:52:38 pve kernel: RDX: 0000000000000006 RSI: 000000003a510811 RDI: 0000000000000000
Nov 23 17:52:38 pve kernel: RBP: ffffb054c00efe88 R08: 0000000ac0caf5c9 R09: 0000000000000000
Nov 23 17:52:38 pve kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff886d4280
Nov 23 17:52:38 pve kernel: R13: 0000000000000008 R14: 0000000000000008 R15: 0000000ac0caf5c9
Nov 23 17:52:38 pve kernel:  ? cpuidle_enter_state+0xc8/0x620
Nov 23 17:52:38 pve kernel:  cpuidle_enter+0x2e/0x50
Nov 23 17:52:38 pve kernel:  do_idle+0x20d/0x2b0
Nov 23 17:52:38 pve kernel:  cpu_startup_entry+0x20/0x30
Nov 23 17:52:38 pve kernel:  start_secondary+0x12a/0x180
Nov 23 17:52:38 pve kernel:  secondary_startup_64_no_verify+0xc2/0xcb
Nov 23 17:52:38 pve kernel:  </TASK>
Nov 23 17:52:38 pve kernel: ---[ end trace 2cf9f2d86c92dcd4 ]---
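
The line that matters in this trace is the NETDEV WATCHDOG one. To pull only those lines out of a long log, a small filter like this could help. It is a sketch: watchdog_timeouts is a made-up name, and the pattern only covers the message format shown in my log.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<interface> <driver> <queue>" for every NETDEV WATCHDOG timeout.
watchdog_timeouts() {
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)): transmit queue \([0-9]*\) timed out.*/\1 \2 \3/p'
}

# Sample line copied from the trace above.
echo 'Nov 23 17:52:38 pve kernel: NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out' \
    | watchdog_timeouts
# prints: enp1s0 r8169 0
```

On a live system the input could come from `journalctl -k` or `dmesg` instead of the sample line.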
 
OK, now I tried a workaround: I bought a new USB Ethernet adapter and installed it on the Proxmox host.
I assigned it a different IP address and created a new interface for it, so my PVE GUI is now reachable from 2 IP addresses.
It works fine again with kernel version 5.13.19-6.
If I boot a newer kernel (5.15.74), my old problem is still there, and when I try to reach Proxmox through the new USB Ethernet adapter, I get no response in the browser. In the syslog I see:
Code:
Dec 01 20:27:42 pve kernel: INFO: task kworker/3:2:199 blocked for more than 241 seconds.
Dec 01 20:27:42 pve kernel:       Tainted: P        W  O      5.15.74-1-pve #1
Dec 01 20:27:42 pve kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 01 20:27:42 pve kernel: task:kworker/3:2     state:D stack:    0 pid:  199 ppid:     2 flags:0x00004000
Dec 01 20:27:42 pve kernel: Workqueue: events rtl_task [r8169]
Dec 01 20:27:42 pve kernel: Call Trace:
Dec 01 20:27:42 pve kernel:  <TASK>
Dec 01 20:27:42 pve kernel:  __schedule+0x34e/0x1740
Dec 01 20:27:42 pve kernel:  ? hrtimer_reprogram+0x52/0xb0
Dec 01 20:27:42 pve kernel:  ? schedule+0x85/0x110
Dec 01 20:27:42 pve kernel:  ? schedule_hrtimeout_range_clock+0xa3/0x130
Dec 01 20:27:42 pve kernel:  schedule+0x69/0x110
Dec 01 20:27:42 pve kernel:  schedule_preempt_disabled+0xe/0x20
Dec 01 20:27:42 pve kernel:  __mutex_lock.constprop.0+0x255/0x480
Dec 01 20:27:42 pve kernel:  ? net_ratelimit+0x1c/0x30
Dec 01 20:27:42 pve kernel:  ? rtl_hw_start_8411_2+0x940/0x940 [r8169]
Dec 01 20:27:42 pve kernel:  __mutex_lock_slowpath+0x13/0x20
Dec 01 20:27:42 pve kernel:  mutex_lock+0x38/0x50
Dec 01 20:27:42 pve kernel:  rtl_reset_work+0x17b/0x460 [r8169]
Dec 01 20:27:42 pve kernel:  rtl_task+0x4d/0x70 [r8169]
Dec 01 20:27:42 pve kernel:  process_one_work+0x22b/0x3d0
Dec 01 20:27:42 pve kernel:  worker_thread+0x53/0x420
Dec 01 20:27:42 pve kernel:  ? process_one_work+0x3d0/0x3d0
Dec 01 20:27:42 pve kernel:  kthread+0x12a/0x150
Dec 01 20:27:42 pve kernel:  ? set_kthread_struct+0x50/0x50
Dec 01 20:27:42 pve kernel:  ret_from_fork+0x22/0x30
Dec 01 20:27:42 pve kernel:  </TASK>

Seriously, guys, I know it is probably a kernel problem again, but I need help or I will have to throw Proxmox away...