NETDEV WATCHDOG r8169 transmit queue 0 timed out

ulf.kosack

Hi there,

At the moment I'm looking for a solution for my 4-port NIC with a Realtek chip, which is not working. After boot I get a kernel warning with a stack trace in the log:

Code:
Sep 11 13:14:13 pve05 kernel: ------------[ cut here ]------------
Sep 11 13:14:13 pve05 kernel: NETDEV WATCHDOG: enp47s0 (r8169): transmit queue 0 timed out
Sep 11 13:14:13 pve05 kernel: WARNING: CPU: 11 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x277/0x280
Sep 11 13:14:13 pve05 kernel: Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd amdgpu snd_hda_codec_hdmi kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core irqbypass crct10dif_pclmul snd_pci_acp6x ghash_clmulni_intel snd_hwdep ast iommu_v2 snd_pcm aesni_intel drm_vram_helper gpu_sched drm_ttm_helper snd_timer snd_pci_acp5x ttm crypto_simd snd cryptd snd_rn_pci_acp3x rapl wmi_bmof pcspkr k10temp joydev snd_pci_acp3x soundcore drm_kms_helper ccp cdc_ether usbnet input_leds cec rc_core mii fb_sys_fops syscopyarea sysfillrect sysimgblt acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O)
Sep 11 13:14:13 pve05 kernel:  zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor zstd_compress raid6_pq simplefb usbmouse hid_generic usbkbd usbhid hid xhci_pci xhci_pci_renesas crc32_pclmul i2c_piix4 r8169 nvme realtek ahci igb xhci_hcd libahci bnx2x i2c_algo_bit nvme_core dca mdio libcrc32c wmi video
Sep 11 13:14:13 pve05 kernel: CPU: 11 PID: 0 Comm: swapper/11 Tainted: P           O      5.15.53-1-pve #1
Sep 11 13:14:13 pve05 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570D4U, BIOS P1.20 05/19/2021
Sep 11 13:14:13 pve05 kernel: RIP: 0010:dev_watchdog+0x277/0x280
Sep 11 13:14:13 pve05 kernel: Code: eb 97 48 8b 5d d0 c6 05 81 c0 4d 01 01 48 89 df e8 ee 57 f9 ff 44 89 e1 48 89 de 48 c7 c7 48 82 ca 9b 48 89 c2 e8 81 7e 1c 00 <0f> 0b eb 80 e9 07 ac 25 00 0f 1f 44 00 00 55 49 89 ca 48 89 e5 41
Sep 11 13:14:13 pve05 kernel: RSP: 0018:ffff9e0a804a0e70 EFLAGS: 00010282
Sep 11 13:14:13 pve05 kernel: RAX: 0000000000000000 RBX: ffff919195074000 RCX: 0000000000000000
Sep 11 13:14:13 pve05 kernel: RDX: ffff91b0bdcec240 RSI: ffff91b0bdce0580 RDI: 0000000000000300
Sep 11 13:14:13 pve05 kernel: RBP: ffff9e0a804a0ea8 R08: 0000000000000003 R09: 0000000000000001
Sep 11 13:14:13 pve05 kernel: R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
Sep 11 13:14:13 pve05 kernel: R13: ffff919195052c80 R14: 0000000000000001 R15: ffff9191950744c0
Sep 11 13:14:13 pve05 kernel: FS:  0000000000000000(0000) GS:ffff91b0bdcc0000(0000) knlGS:0000000000000000
Sep 11 13:14:13 pve05 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 11 13:14:13 pve05 kernel: CR2: 00007f2660006558 CR3: 0000001fdda10000 CR4: 0000000000750ee0
Sep 11 13:14:13 pve05 kernel: PKRU: 55555554
Sep 11 13:14:13 pve05 kernel: Call Trace:
Sep 11 13:14:13 pve05 kernel:  <IRQ>
Sep 11 13:14:13 pve05 kernel:  ? pfifo_fast_enqueue+0x160/0x160
Sep 11 13:14:13 pve05 kernel:  call_timer_fn+0x2b/0x120
Sep 11 13:14:13 pve05 kernel:  __run_timers.part.0+0x1e1/0x270
Sep 11 13:14:13 pve05 kernel:  ? ktime_get+0x46/0xc0
Sep 11 13:14:13 pve05 kernel:  ? native_x2apic_icr_read+0x20/0x20
Sep 11 13:14:13 pve05 kernel:  ? lapic_next_event+0x21/0x30
Sep 11 13:14:13 pve05 kernel:  ? clockevents_program_event+0xab/0x130
Sep 11 13:14:13 pve05 kernel:  run_timer_softirq+0x2a/0x60
Sep 11 13:14:13 pve05 kernel:  __do_softirq+0xd9/0x2ea
Sep 11 13:14:13 pve05 kernel:  irq_exit_rcu+0x94/0xc0
Sep 11 13:14:13 pve05 kernel:  sysvec_apic_timer_interrupt+0x80/0x90
Sep 11 13:14:13 pve05 kernel:  </IRQ>
Sep 11 13:14:13 pve05 kernel:  <TASK>
Sep 11 13:14:13 pve05 kernel:  asm_sysvec_apic_timer_interrupt+0x1a/0x20
Sep 11 13:14:13 pve05 kernel: RIP: 0010:cpuidle_enter_state+0xd9/0x620
Sep 11 13:14:13 pve05 kernel: Code: 3d 84 7d ff 64 e8 07 3a 6e ff 49 89 c7 0f 1f 44 00 00 31 ff e8 48 47 6e ff 80 7d d0 00 0f 85 5e 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6a 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e5 03 00 00
Sep 11 13:14:13 pve05 kernel: RSP: 0018:ffff9e0a801bfe38 EFLAGS: 00000246
Sep 11 13:14:13 pve05 kernel: RAX: ffff91b0bdcf0bc0 RBX: ffff9191867f2c00 RCX: 0000000587bea45e
Sep 11 13:14:13 pve05 kernel: RDX: 000000000000051e RSI: 0000000587bea45e RDI: 0000000000000000
Sep 11 13:14:13 pve05 kernel: RBP: ffff9e0a801bfe88 R08: 0000000587bea97c R09: 00000000000aae60
Sep 11 13:14:13 pve05 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9c4e6a00
Sep 11 13:14:13 pve05 kernel: R13: 0000000000000003 R14: 0000000000000003 R15: 0000000587bea97c
Sep 11 13:14:13 pve05 kernel:  ? cpuidle_enter_state+0xc8/0x620
Sep 11 13:14:13 pve05 kernel:  cpuidle_enter+0x2e/0x50
Sep 11 13:14:13 pve05 kernel:  do_idle+0x20d/0x2b0
Sep 11 13:14:13 pve05 kernel:  cpu_startup_entry+0x20/0x30
Sep 11 13:14:13 pve05 kernel:  start_secondary+0x12a/0x180
Sep 11 13:14:13 pve05 kernel:  secondary_startup_64_no_verify+0xc2/0xcb
Sep 11 13:14:13 pve05 kernel:  </TASK>
Sep 11 13:14:13 pve05 kernel: ---[ end trace e7be187b3d8bae88 ]---

For this card

Bash:
root@pve05:~# lspci -v -s 2f:00.0
2f:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
        Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
        Flags: bus master, fast devsel, latency 0, IRQ 71, IOMMU group 27
        I/O ports at b000 [size=256]
        Memory at fc100000 (64-bit, non-prefetchable) [size=4K]
        Memory at e2d00000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170] Latency Tolerance Reporting
        Kernel driver in use: r8169
        Kernel modules: r8169

In the journal I get this message several times:

Code:
Sep 11 13:18:57 pve05 kernel: r8169 0000:2f:00.0 enp47s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
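
To see how often each of these r8169 polling timeouts shows up, the kernel log can be tallied with a small shell helper. This is only a sketch: the function name `summarize_r8169_timeouts` is made up for illustration, and it assumes the message format shown in the line above.

```shell
# Summarize r8169 register-poll timeouts from kernel log text on stdin.
# Prints one "count condition_name" line per condition, most frequent first.
summarize_r8169_timeouts() {
    grep -o 'r8169 [^ ]* [^ ]*: rtl_[a-z0-9_]*_cond == [01]' \
        | awk '{ print $4 }' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}

# Usage on a live system (kernel messages of the current boot):
#   journalctl -k -b | summarize_r8169_timeouts
```

A single condition dominating the count can help when comparing against other reports of the same driver problem.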

If I use this NIC for the vmbr interface, VMs in the same subnet are not pingable.

If I use the built-in NIC, the same VM is pingable.

Code:
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.53-1-pve)
 
Base Board Information
        Manufacturer: ASRockRack
        Product Name: X570D4U

Handle 0x000D, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: Multi-bit ECC
        Maximum Capacity: 128 GB
        Error Information Handle: 0x000C
        Number Of Devices: 4

Handle 0x000E, DMI type 19, 31 bytes
Memory Array Mapped Address
        Starting Address: 0x00000000000
        Ending Address: 0x01FFFFFFFFF
        Range Size: 128 GB
        Physical Array Handle: 0x000D
        Partition Width: 4

Processor Information
        Socket Designation: CPUSocket
        Type: Central Processor
        Family: Zen
        Manufacturer: Advanced Micro Devices, Inc.
        ID: 00 0F A5 00 FF FB 8B 17
        Signature: Family 25, Model 80, Stepping 0
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                MMX (MMX technology supported)
                FXSR (FXSAVE and FXSTOR instructions supported)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                HTT (Multi-threading)
        Version: AMD Ryzen 7 PRO 5750G with Radeon Graphics
        Voltage: 1.4 V
        External Clock: 100 MHz
        Max Speed: 4650 MHz
        Current Speed: 3800 MHz
        Status: Populated, Enabled
        Upgrade: Socket AM4
...
        Core Count: 8
        Core Enabled: 8
        Thread Count: 16
        Characteristics:
                64-bit capable
                Multi-Core
                Hardware Thread
                Execute Protection
                Enhanced Virtualization
                Power/Performance Control

How can I get the nic up and running?

Thanks
Ulf
 
With kernel 6.2 (Proxmox 8) this rears its head again.
I have some SBCs (can change the controller) on which the r8169 network adapter crashes with:

Code:
[75932.570796] r8169 0000:02:00.0 eno0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[75933.163139] r8169 0000:02:00.0 eno0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[75933.163776] r8169 0000:02:00.0 eno0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[75933.164417] r8169 0000:02:00.0 eno0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[75933.264204] vmbr0: port 1(eno0) entered blocking state
[75933.264208] vmbr0: port 1(eno0) entered disabled state
[75933.264253] device eno0 entered promiscuous mode
[75933.268756] r8169 0000:02:00.0 eno0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[75933.293554] Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
[75933.294089] r8169 0000:02:00.0 eno0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[75933.334030] r8169 0000:02:00.0 eno0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[75933.350024] r8169 0000:02:00.0 eno0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[75933.350617] r8169 0000:02:00.0 eno0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[75933.351204] r8169 0000:02:00.0 eno0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[75933.984846] Generic FE-GE Realtek PHY r8169-0-200:00: r8169_apply_firmware failed: -110
[75934.593845] Generic FE-GE Realtek PHY r8169-0-200:00: phy_poll_reset failed: -110

And:

Code:
2023-07-14T20:40:20.269783+02:00 pve2 kernel: [ 8714.536952] ------------[ cut here ]------------
2023-07-14T20:40:20.269790+02:00 pve2 kernel: [ 8714.536961] NETDEV WATCHDOG: eno0 (r8169): transmit queue 0 timed out
2023-07-14T20:40:20.269794+02:00 pve2 kernel: [ 8714.536971] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x23a/0x250
2023-07-14T20:40:20.269864+02:00 pve2 kernel: [ 8714.536976] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_NFLOG xt_limit xt_physdev xt_addrtype xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_tcpudp xt_set xt_mark iptable_filter bpfilter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nf_tables scsi_transport_iscsi nvme_fabrics 8021q garp mrp bonding tls softdog nfnetlink_log nfnetlink sunrpc binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core intel_tcc_cooling snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus x86_pkg_temp_thermal intel_powerclamp snd_soc_core coretemp i915 snd_compress ac97_bus
2023-07-14T20:40:20.269979+02:00 pve2 kernel: [ 8714.537053]  snd_pcm_dmaengine kvm_intel drm_buddy video wmi kvm ttm snd_hda_intel irqbypass snd_intel_dspcfg drm_display_helper ath10k_pci crct10dif_pclmul polyval_clmulni snd_intel_sdw_acpi snd_hda_codec polyval_generic cec ath10k_core ghash_clmulni_intel btusb snd_hda_core rc_core sha512_ssse3 btrtl ath snd_hwdep btbcm drm_kms_helper aesni_intel snd_pcm mac80211 btintel processor_thermal_device_pci_legacy i2c_algo_bit snd_timer processor_thermal_device crypto_simd btmtk syscopyarea intel_rapl_msr cmdlinepart cryptd processor_thermal_rfim processor_thermal_mbox snd sysfillrect spi_nor bluetooth sysimgblt rapl cfg80211 ecdh_generic processor_thermal_rapl soundcore ecc intel_rapl_common libarc4 mtd pcspkr ee1004 intel_cstate int340x_thermal_zone intel_soc_dts_iosf joydev input_leds intel_pch_thermal mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap dm_multipath drm efi_pstore dmi_sysfs ip_tables x_tables autofs4
2023-07-14T20:40:20.270021+02:00 pve2 kernel: [ 8714.537171]  xfs btrfs blake2b_generic xor raid6_pq simplefb hid_generic usbkbd usbmouse uas usbhid usb_storage hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xhci_pci nvme spi_intel_pci xhci_pci_renesas crc32_pclmul r8169 ahci nvme_core spi_intel i2c_i801 i2c_smbus nvme_common realtek xhci_hcd libahci pinctrl_cannonlake
2023-07-14T20:40:20.270024+02:00 pve2 kernel: [ 8714.537216] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           O       6.2.16-4-pve #1
2023-07-14T20:40:20.270024+02:00 pve2 kernel: [ 8714.537219] Hardware name: Nitrokey NitroPC/NitroPC, BIOS 4.13-dirty 11/20/2020
2023-07-14T20:40:20.270030+02:00 pve2 kernel: [ 8714.537220] RIP: 0010:dev_watchdog+0x23a/0x250
2023-07-14T20:40:20.270031+02:00 pve2 kernel: [ 8714.537225] Code: 00 e9 2b ff ff ff 48 89 df c6 05 4a 6c 7d 01 01 e8 6b 08 f8 ff 44 89 f1 48 89 de 48 c7 c7 98 64 e0 b3 48 89 c2 e8 86 a6 30 ff <0f> 0b e9 1c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
2023-07-14T20:40:20.270036+02:00 pve2 kernel: [ 8714.537226] RSP: 0018:ffffb5d500003e38 EFLAGS: 00010246
2023-07-14T20:40:20.270039+02:00 pve2 kernel: [ 8714.537229] RAX: 0000000000000000 RBX: ffff8d73c1cd0000 RCX: 0000000000000000
2023-07-14T20:40:20.270042+02:00 pve2 kernel: [ 8714.537231] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2023-07-14T20:40:20.270042+02:00 pve2 kernel: [ 8714.537232] RBP: ffffb5d500003e68 R08: 0000000000000000 R09: 0000000000000000
2023-07-14T20:40:20.270043+02:00 pve2 kernel: [ 8714.537233] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d73c1cd04c8
2023-07-14T20:40:20.270043+02:00 pve2 kernel: [ 8714.537235] R13: ffff8d73c1cd041c R14: 0000000000000000 R15: 0000000000000000
2023-07-14T20:40:20.270044+02:00 pve2 kernel: [ 8714.537236] FS:  0000000000000000(0000) GS:ffff8d82de400000(0000) knlGS:0000000000000000
2023-07-14T20:40:20.270045+02:00 pve2 kernel: [ 8714.537238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-07-14T20:40:20.270048+02:00 pve2 kernel: [ 8714.537239] CR2: 00007fe6d42ec3e0 CR3: 0000000496010005 CR4: 00000000003706f0
2023-07-14T20:40:20.270051+02:00 pve2 kernel: [ 8714.537241] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2023-07-14T20:40:20.270051+02:00 pve2 kernel: [ 8714.537242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2023-07-14T20:40:20.270051+02:00 pve2 kernel: [ 8714.537243] Call Trace:
2023-07-14T20:40:20.270052+02:00 pve2 kernel: [ 8714.537245]  <IRQ>
2023-07-14T20:40:20.270057+02:00 pve2 kernel: [ 8714.537247]  ? __pfx_dev_watchdog+0x10/0x10
2023-07-14T20:40:20.270058+02:00 pve2 kernel: [ 8714.537251]  call_timer_fn+0x29/0x160
2023-07-14T20:40:20.270062+02:00 pve2 kernel: [ 8714.537254]  ? __pfx_dev_watchdog+0x10/0x10
2023-07-14T20:40:20.270063+02:00 pve2 kernel: [ 8714.537257]  __run_timers+0x259/0x310
2023-07-14T20:40:20.270068+02:00 pve2 kernel: [ 8714.537259]  run_timer_softirq+0x1d/0x40
2023-07-14T20:40:20.270069+02:00 pve2 kernel: [ 8714.537262]  __do_softirq+0xd6/0x346
2023-07-14T20:40:20.270074+02:00 pve2 kernel: [ 8714.537265]  ? hrtimer_interrupt+0x11f/0x250
2023-07-14T20:40:20.270075+02:00 pve2 kernel: [ 8714.537268]  __irq_exit_rcu+0xa2/0xd0
2023-07-14T20:40:20.270078+02:00 pve2 kernel: [ 8714.537271]  irq_exit_rcu+0xe/0x20
2023-07-14T20:40:20.270079+02:00 pve2 kernel: [ 8714.537273]  sysvec_apic_timer_interrupt+0x92/0xd0
2023-07-14T20:40:20.270080+02:00 pve2 kernel: [ 8714.537275]  </IRQ>
2023-07-14T20:40:20.270082+02:00 pve2 kernel: [ 8714.537276]  <TASK>
2023-07-14T20:40:20.270085+02:00 pve2 kernel: [ 8714.537278]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
2023-07-14T20:40:20.270087+02:00 pve2 kernel: [ 8714.537280] RIP: 0010:cpuidle_enter_state+0xde/0x6f0
2023-07-14T20:40:20.270091+02:00 pve2 kernel: [ 8714.537283] Code: 27 f7 4c e8 d4 79 4a ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 02 82 49 ff 80 7d d0 00 0f 85 eb 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 12 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c7 04 00 00
2023-07-14T20:40:20.270092+02:00 pve2 kernel: [ 8714.537285] RSP: 0018:ffffffffb4603da8 EFLAGS: 00000246
2023-07-14T20:40:20.270093+02:00 pve2 kernel: [ 8714.537287] RAX: 0000000000000000 RBX: ffffd5d4ffc00000 RCX: 0000000000000000
2023-07-14T20:40:20.270094+02:00 pve2 kernel: [ 8714.537289] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2023-07-14T20:40:20.270099+02:00 pve2 kernel: [ 8714.537290] RBP: ffffffffb4603df8 R08: 0000000000000000 R09: 0000000000000000
2023-07-14T20:40:20.270100+02:00 pve2 kernel: [ 8714.537291] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb48c33a0
2023-07-14T20:40:20.270103+02:00 pve2 kernel: [ 8714.537292] R13: 0000000000000008 R14: 0000000000000008 R15: 000007ed02e101ea
2023-07-14T20:40:20.270105+02:00 pve2 kernel: [ 8714.537294]  ? cpuidle_enter_state+0xce/0x6f0
2023-07-14T20:40:20.270105+02:00 pve2 kernel: [ 8714.537296]  cpuidle_enter+0x2e/0x50
2023-07-14T20:40:20.270108+02:00 pve2 kernel: [ 8714.537298]  do_idle+0x216/0x2a0
2023-07-14T20:40:20.270110+02:00 pve2 kernel: [ 8714.537301]  cpu_startup_entry+0x1d/0x20
2023-07-14T20:40:20.270110+02:00 pve2 kernel: [ 8714.537303]  rest_init+0xdc/0x100
2023-07-14T20:40:20.270115+02:00 pve2 kernel: [ 8714.537305]  ? acpi_enable_subsystem+0xe6/0x2a0
2023-07-14T20:40:20.270116+02:00 pve2 kernel: [ 8714.537308]  arch_call_rest_init+0xe/0x30
2023-07-14T20:40:20.270119+02:00 pve2 kernel: [ 8714.537311]  start_kernel+0x6b0/0xb80
2023-07-14T20:40:20.270121+02:00 pve2 kernel: [ 8714.537314]  ? load_ucode_intel_bsp+0x3d/0x80
2023-07-14T20:40:20.270125+02:00 pve2 kernel: [ 8714.537317]  x86_64_start_kernel+0x102/0x180
2023-07-14T20:40:20.270128+02:00 pve2 kernel: [ 8714.537319]  secondary_startup_64_no_verify+0xe5/0xeb
2023-07-14T20:40:20.270129+02:00 pve2 kernel: [ 8714.537323]  </TASK>
2023-07-14T20:40:20.270130+02:00 pve2 kernel: [ 8714.537324] ---[ end trace 0000000000000000 ]---
2023-07-14T20:40:20.301617+02:00 pve2 kernel: [ 8714.566763] r8169 0000:02:00.0 eno0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
2023-07-14T20:40:20.301621+02:00 pve2 kernel: [ 8714.568345] r8169 0000:02:00.0 eno0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
2023-07-14T20:40:20.305626+02:00 pve2 kernel: [ 8714.569924] r8169 0000:02:00.0 eno0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
2023-07-14T20:40:20.305630+02:00 pve2 kernel: [ 8714.571439] r8169 0000:02:00.0 eno0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
2023-07-14T20:40:20.309630+02:00 pve2 kernel: [ 8714.572970] r8169 0000:02:00.0 eno0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
2023-07-14T20:40:20.309633+02:00 pve2 kernel: [ 8714.574483] r8169 0000:02:00.0 eno0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
2023-07-14T20:40:20.309634+02:00 pve2 kernel: [ 8714.575973] r8169 0000:02:00.0 eno0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
2023-07-14T20:40:20.337635+02:00 pve2 kernel: [ 8714.602138] r8169 0000:02:00.0 eno0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
2023-07-14T20:40:20.365704+02:00 pve2 kernel: [ 8714.628860] r8169 0000:02:00.0 eno0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
2023-07-14T20:40:20.389621+02:00 pve2 kernel: [ 8714.655522] r8169 0000:02:00.0 eno0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
2023-07-14T20:40:20.633665+02:00 pve2 kernel: [ 8714.897583] r8169 0000:02:00.0 eno0: Link is Down
2023-07-14T20:40:20.633685+02:00 pve2 kernel: [ 8714.897602] vmbr0: port 1(eno0) entered disabled state
2023-07-14T20:40:20.633686+02:00 pve2 kernel: [ 8714.897694] vmbr11: port 1(vlan11) entered disabled state
2023-07-14T20:40:20.633686+02:00 pve2 kernel: [ 8714.897816] vmbr12: port 1(vlan12) entered disabled state
2023-07-14T20:40:20.633687+02:00 pve2 kernel: [ 8714.897851] vmbr13: port 1(vlan13) entered disabled state
2023-07-14T20:40:20.633687+02:00 pve2 kernel: [ 8714.897901] vmbr14: port 1(vlan14) entered disabled state
2023-07-14T20:40:20.633688+02:00 pve2 kernel: [ 8714.897923] vmbr15: port 1(vlan15) entered disabled state
2023-07-14T20:40:20.633688+02:00 pve2 kernel: [ 8714.897956] vmbr20: port 1(vlan20) entered disabled state
2023-07-14T20:40:50.225688+02:00 pve2 kernel: [ 8744.488955]  session4: session recovery timed out after 120 secs
2023-07-14T20:40:52.273683+02:00 pve2 kernel: [ 8746.536868]  session3: session recovery timed out after 120 secs
2023-07-14T20:40:52.273690+02:00 pve2 kernel: [ 8746.536874]  session2: session recovery timed out after 120 secs
2023-07-14T20:40:52.273690+02:00 pve2 kernel: [ 8746.536881] sd 5:0:0:0: rejecting I/O to offline device
2023-07-14T20:40:52.273690+02:00 pve2 kernel: [ 8746.536883]  session5: session recovery timed out after 120 secs
2023-07-14T20:40:52.273691+02:00 pve2 kernel: [ 8746.536886] sd 4:0:0:0: rejecting I/O to offline device
2023-07-14T20:40:52.273691+02:00 pve2 kernel: [ 8746.536887]  session6: session recovery timed out after 120 secs
2023-07-14T20:40:52.273692+02:00 pve2 kernel: [ 8746.536887] I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
2023-07-14T20:40:52.273692+02:00 pve2 kernel: [ 8746.536894]  session1: session recovery timed out after 120 secs
2023-07-14T20:40:52.273693+02:00 pve2 kernel: [ 8746.536902] sd 8:0:0:0: rejecting I/O to offline device
2023-07-14T20:40:52.273693+02:00 pve2 kernel: [ 8746.536904] I/O error, dev sdg, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
2023-07-14T20:40:52.273693+02:00 pve2 kernel: [ 8746.536912] sd 3:0:0:0: rejecting I/O to offline device
2023-07-14T20:40:52.273694+02:00 pve2 kernel: [ 8746.536913] I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
2023-07-14T20:40:52.273694+02:00 pve2 kernel: [ 8746.537133] I/O error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class
The sd devices are iSCSI disks.

Does anyone have an idea?
(As a trial remedy I installed r8168-dkms from the non-free repository.)
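
After installing a DKMS driver it is worth confirming which module is actually bound to the adapter. A minimal sketch, assuming the usual "Kernel driver in use:" line of `lspci`; the helper name `driver_in_use` is hypothetical:

```shell
# Print the kernel driver name from "lspci -k" (or -v / -vv) output on stdin.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage (device address as reported by lspci on your system):
#   lspci -k -s 02:00.0 | driver_in_use
```

If this still prints r8169, the r8168 module was built but is not the one driving the card.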

There might be a link with undersized frames and missing interrupts ... (from other forums, where the exact errors are slightly different).

IMHO there is a link with jumbo frames, as the crash can be triggered by sending a ping -s 8000 to the correct VLAN.
After a few packets the crash occurs.
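
For anyone trying to reproduce this: `ping -s 8000` sends an 8000-byte ICMP payload, so the IPv4 packet is 8028 bytes (8 bytes ICMP header plus 20 bytes IPv4 header). It only travels as a single jumbo frame if the interface MTU is at least that large; otherwise it is fragmented. A rough sketch of that check, with hypothetical helper names and a placeholder target address:

```shell
# IPv4 packet size produced by "ping -s <payload>":
# payload + 8-byte ICMP header + 20-byte IPv4 header (no IP options).
icmp_packet_size() {
    echo $(( $1 + 8 + 20 ))
}

# Succeeds (exit 0) if a ping payload of $1 bytes fits in one packet at MTU $2.
fits_in_mtu() {
    [ "$(icmp_packet_size "$1")" -le "$2" ]
}

# Example on a live system (TARGET_IP is a placeholder, eno0 as in the logs):
#   mtu=$(cat /sys/class/net/eno0/mtu)
#   fits_in_mtu 8000 "$mtu" && ping -M do -c 5 -s 8000 TARGET_IP
# "-M do" forbids fragmentation, so the test really exercises jumbo frames.
```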
 
Device info:
Code:
root@pve2:~# lspci -s 2:0.0 -vv
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        DeviceName: Ethernet controller
        Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at 2000 [size=256]
        Region 2: Memory at 9fa04000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at 9fa00000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Via message/WAKE#, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: No such device
                Not readable
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170 v1] Latency Tolerance Reporting
                Max snoop latency: 3145728ns
                Max no snoop latency: 3145728ns
        Capabilities: [178 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
                L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+
                           T_CommonMode=0us LTR1.2_Threshold=306176ns
                L1SubCtl2: T_PwrOn=150us
        Kernel driver in use: r8168
        Kernel modules: r8168
 
The description looks a lot like this one:
https://bugzilla.kernel.org/show_bug.cgi?id=107421#c36

I had the same issue (network driver crashes after a while) with kernel 5.x and 6.x on Debian 11 with Proxmox.

But I could resolve the problem by disabling ASPM, editing /etc/default/grub with:

GRUB_CMDLINE_LINUX="pcie_aspm=off pcie_port_pm=off"
....
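
Editing the file alone does not change the running kernel: the boot configuration has to be regenerated and the host rebooted. A sketch of the remaining steps and a quick check afterwards; the helper `cmdline_has` is a made-up name, and the systemd-boot note only applies to hosts that do not boot via GRUB:

```shell
# After editing /etc/default/grub, regenerate the GRUB config and reboot:
#   update-grub
#   reboot
# (Proxmox hosts that boot via systemd-boot keep the kernel command line
#  in /etc/kernel/cmdline and apply it with "proxmox-boot-tool refresh".)

# Succeeds (exit 0) if kernel command line $1 contains the exact parameter $2.
cmdline_has() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# After the reboot, confirm the parameters are active:
#   cmdline_has "$(cat /proc/cmdline)" pcie_aspm=off && echo "pcie_aspm=off is set"
#   cmdline_has "$(cat /proc/cmdline)" pcie_port_pm=off && echo "pcie_port_pm=off is set"
```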

I opted to specifically disable ASPM for the network adapter involved.
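
Whether ASPM is really off for one adapter can be read from the LnkCtl line of `lspci -vv`. A minimal sketch with a hypothetical helper name; the sysfs path in the comment is an assumption about kernels that expose per-link ASPM attributes, not necessarily the method used here:

```shell
# Print the ASPM state of a PCIe link from "lspci -vv" output on stdin,
# for example "L1 Enabled" or "Disabled".
aspm_state() {
    sed -n 's/^[[:space:]]*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Usage (as root, with the adapter's PCI address):
#   lspci -vv -s 02:00.0 | aspm_state
#
# Assumption: on kernels that expose per-link ASPM attributes in sysfs,
# L1 can be switched off for a single device with something like
#   echo 0 > /sys/bus/pci/devices/0000:02:00.0/link/l1_aspm
# Check that the file exists on your system first.
```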
 
Tried disabling ASPM using the echo 0 >/sys/.... method...
The system keeps running, even with some load on it.

It does cause some resets.
Code:
[Sun Jul 16 23:04:18 2023]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4322120976, last ping 4322122240, now 4322123520
[Sun Jul 16 23:04:18 2023]  connection2:0: detected conn error (1022)
[Sun Jul 16 23:04:49 2023] sd 4:0:0:1: Power-on or device reset occurred
[Sun Jul 16 23:10:12 2023]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4322209533, last ping 4322210784, now 4322212096
[Sun Jul 16 23:10:12 2023]  connection2:0: detected conn error (1022)
[Sun Jul 16 23:10:44 2023] sd 4:0:0:1: Power-on or device reset occurred
[Sun Jul 16 23:13:19 2023]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4322256320, last ping 4322257600, now 4322258880
[Sun Jul 16 23:13:19 2023]  connection2:0: detected conn error (1022)
[Sun Jul 16 23:13:44 2023] sd 4:0:0:1: Power-on or device reset occurred

It apparently does cause fscks on the container volumes active there.

Code:
[Sun Jul 16 18:01:18 2023] sd 4:0:0:1: Power-on or device reset occurred
[Sun Jul 16 18:31:48 2023] EXT4-fs (dm-17): error count since last fsck: 2
[Sun Jul 16 18:31:48 2023] EXT4-fs (dm-37): error count since last fsck: 16
[Sun Jul 16 18:31:48 2023] EXT4-fs (dm-39): error count since last fsck: 3
[Sun Jul 16 18:31:48 2023] EXT4-fs (dm-39): initial error at time 1681300837: kmmpd:179
[Sun Jul 16 18:31:48 2023] EXT4-fs (dm-39): last error at time 1681301477: ext4_journal_check_start:83
[Sun Jul 16 18:31:48 2023] EXT4-fs (dm-17): initial error at time 1681300837: kmmpd:179
[Sun Jul 16 18:31:48 2023] EXT4-fs (dm-37): initial error at time 1682805271: kmmpd:179
[Sun Jul 16 18:31:48 2023] EXT4-fs (dm-17): last error at time 1681301101: ext4_journal_check_start:83
[Sun Jul 16 18:31:48 2023] EXT4-fs (dm-37): last error at time 1682865000: ext4_discard_preallocations:5036
 
Regarding "I opted to specifically disable ASPM for the network adapter involved": I tried that and it crashed again the next day. Kernel 6.5 has some recently added ASPM-related commits for r8169. Let's wait and see whether they help.

As for the r8168 driver, for me things are slightly less bad, i.e. instead of crashing, the speed drops from 1 Gbit to 10 Mbit.
 
I have the same issue on my router with Proxmox 7. I don't want to use PVE 8, since this router host is too cheap to run a higher version of Debian. Will a newer kernel like 6.5 or 6.x work well in PVE 7?
 
