[SOLVED] Thunderbolt: Linux Kernel Error "kernel: thunderbolt 1-1: failed request link state change, aborting"

Both patches are now also cherry-picked into our kernel, so those fixes will be included in the next kernel bump (probably done in a few days, ~two weeks max).
Sweet - do you want me to test before the go-live, or wait until then? (Also can't wait to get back on your kernel... seeing some weird Ceph issues hard-crashing the node...) :cool:
 
do you want me to test before the go-live, or wait until then?
No need to wait, that's what our test repository is for.
So, I just uploaded that kernel as proxmox-kernel-6.2.16-13-pve in version 6.2.16-13, plus related packages, to the pvetest repo of Proxmox VE 8.

So you could just add that repo (e.g., add it via the Repository management UI in the Node panel) and either do a normal full-upgrade (that would pull in a few other test packages too) or, to pull in just the kernel, run the following commands once the test repo has been added:

Code:
# refresh the package index so the new pvetest packages show up
apt update
# install the 6.2 kernel meta-package, which pulls in the newest 6.2 kernel
apt install proxmox-kernel-6.2
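
For reference, the pvetest repository entry for Proxmox VE 8 (Debian Bookworm) looks like the following; the filename below is just an example, the Repository UI manages this for you:

Code:
# /etc/apt/sources.list.d/pvetest.list (example filename)
deb http://download.proxmox.com/debian/pve bookworm pvetest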

You might want to throw in proxmox-headers-6.2 if you need the current kernel headers, e.g., for some DKMS build or the like.
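For example, to pull both in one go:

Code:
apt install proxmox-kernel-6.2 proxmox-headers-6.2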

Afterwards you can disable the test repo again.
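A minimal sketch of doing that on the CLI, assuming the repo was added as its own file (you can also just toggle it in the Repository UI; the filename is the example one from above):

Code:
# comment out the pvetest entry, then refresh the package lists
sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pvetest.list
apt update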
 
No need to wait, that's what our test repository is for.
Looks like proxmox-kernel-6.2.16-14-pve was uploaded since your last post, so I used that.

1. Worked perfectly - I have IPv6 and all my disconnection issues are resolved (not surprising given how small and self-contained the patches were).

2. I can confirm that reverting to 6.2.16-14 from 6.5.2 fixed my hard crash when uploading large ISOs to a CephFS volume - amusingly, uploading the ISO into CephFS on the node running 6.2.16-14 instantly crashed node2 and node3, which were running 6.5.2... just FYI (I assume due to a mismatch of Ceph stuff in kernel vs. user space).
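
For anyone following along, a quick sanity check that the node actually booted the new kernel (a sketch; the version string is just what I'd expect on this box):

Code:
uname -r
# expected: 6.2.16-14-pve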
 
Though should this be a cause for concern?


Code:
Sep 19 17:09:36 pve1 kernel: ------------[ cut here ]------------
Sep 19 17:09:36 pve1 kernel: thunderbolt 0000:00:0d.3: interrupt for TX ring 1 is already enabled
Sep 19 17:09:36 pve1 kernel: WARNING: CPU: 4 PID: 12826 at drivers/thunderbolt/nhi.c:137 ring_interrupt_active+0x270/0x310 [thunderbolt]
Sep 19 17:09:36 pve1 kernel: Modules linked in: tcp_diag inet_diag cmac nls_utf8 cifs cifs_arc4 rdma_cm iw_cm ib_cm ib_core cifs_md4 ceph libceph fscache netfs ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables nvme_fabrics bonding tls qrtr softdog sunrpc nfnetlink_log nfnetlink binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci iwlmvm snd_sof_xtensa_dsp intel_rapl_msr snd_sof intel_rapl_common snd_sof_utils mac80211 snd_soc_hdac_hda snd_hda_ext_core libarc4 snd_soc_acpi_intel_match snd_soc_acpi x86_pkg_temp_thermal soundwire_bus intel_powerclamp i915 snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine coretemp kvm_intel drm_buddy snd_hda_intel ttm snd_intel_dspcfg kvm snd_intel_sdw_acpi drm_display_helper irqbypass snd_hda_codec
Sep 19 17:09:36 pve1 kernel:  cec crct10dif_pclmul snd_hda_core polyval_clmulni btusb polyval_generic rc_core ghash_clmulni_intel btrtl sha512_ssse3 snd_hwdep drm_kms_helper btbcm aesni_intel iwlwifi snd_pcm btintel ov13858 i2c_algo_bit cmdlinepart mei_hdcp mei_pxp btmtk crypto_simd syscopyarea snd_timer spi_nor v4l2_fwnode sysfillrect pmt_telemetry cryptd pmt_class bluetooth snd v4l2_async ucsi_acpi mei_me typec_ucsi joydev videodev ecdh_generic rapl input_leds cfg80211 mtd soundcore intel_cstate pcspkr sysimgblt ee1004 wmi_bmof mei ecc typec intel_vsec mc acpi_pad acpi_tad mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap thunderbolt_net msr drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb uas usb_storage hid_generic usbmouse usbkbd usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme xhci_pci xhci_pci_renesas crc32_pclmul intel_lpss_pci spi_intel_pci
Sep 19 17:09:36 pve1 kernel:  thunderbolt i2c_i801 nvme_core ahci xhci_hcd igc spi_intel intel_lpss i2c_smbus video libahci idma64 nvme_common wmi pinctrl_tigerlake
Sep 19 17:09:36 pve1 kernel: CPU: 4 PID: 12826 Comm: kworker/4:1 Tainted: P     U  W  O       6.2.16-14-pve #1
Sep 19 17:09:36 pve1 kernel: Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
Sep 19 17:09:36 pve1 kernel: Workqueue: events_long tbnet_connected_work [thunderbolt_net]
Sep 19 17:09:36 pve1 kernel: RIP: 0010:ring_interrupt_active+0x270/0x310 [thunderbolt]
Sep 19 17:09:36 pve1 kernel: Code: 89 5d c8 44 89 45 d4 e8 fe 6b c4 ed 44 8b 45 d4 48 8b 4d c0 49 89 d9 48 8b 55 b8 48 89 c6 48 c7 c7 18 01 6a c0 e8 90 ff 27 ed <0f> 0b 44 8b 5d c8 49 8b 47 08 45 84 e4 0f 85 d9 fe ff ff f6 40 70
Sep 19 17:09:36 pve1 kernel: RSP: 0018:ffffafc8cf24bdb0 EFLAGS: 00010046
Sep 19 17:09:36 pve1 kernel: RAX: 0000000000000000 RBX: ffffffffc069e8ef RCX: 0000000000000000
Sep 19 17:09:36 pve1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 19 17:09:36 pve1 kernel: RBP: ffffafc8cf24be00 R08: 0000000000000000 R09: 0000000000000000
Sep 19 17:09:36 pve1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Sep 19 17:09:36 pve1 kernel: R13: 0000000000000002 R14: 0000000000038200 R15: ffff890656211740
Sep 19 17:09:36 pve1 kernel: FS:  0000000000000000(0000) GS:ffff891497700000(0000) knlGS:0000000000000000
Sep 19 17:09:36 pve1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 19 17:09:36 pve1 kernel: CR2: 0000561901c0e420 CR3: 00000007a4410000 CR4: 0000000000750ee0
Sep 19 17:09:36 pve1 kernel: PKRU: 55555554
Sep 19 17:09:36 pve1 kernel: Call Trace:
Sep 19 17:09:36 pve1 kernel:  <TASK>
Sep 19 17:09:36 pve1 kernel:  ? show_regs+0x6d/0x80
Sep 19 17:09:36 pve1 kernel:  ? __warn+0x89/0x160
Sep 19 17:09:36 pve1 kernel:  ? ring_interrupt_active+0x270/0x310 [thunderbolt]
Sep 19 17:09:36 pve1 kernel:  ? report_bug+0x17e/0x1b0
Sep 19 17:09:36 pve1 kernel:  ? handle_bug+0x46/0x90
Sep 19 17:09:36 pve1 kernel:  ? exc_invalid_op+0x18/0x80
Sep 19 17:09:36 pve1 kernel:  ? asm_exc_invalid_op+0x1b/0x20
Sep 19 17:09:36 pve1 kernel:  ? ring_interrupt_active+0x270/0x310 [thunderbolt]
Sep 19 17:09:36 pve1 kernel:  tb_ring_start+0x17e/0x330 [thunderbolt]
Sep 19 17:09:36 pve1 kernel:  tbnet_connected_work+0xd7/0x310 [thunderbolt_net]
Sep 19 17:09:36 pve1 kernel:  ? queue_work_on+0x67/0x70
Sep 19 17:09:36 pve1 kernel:  process_one_work+0x222/0x430
Sep 19 17:09:36 pve1 kernel:  worker_thread+0x50/0x3e0
Sep 19 17:09:36 pve1 kernel:  ? __pfx_worker_thread+0x10/0x10
Sep 19 17:09:36 pve1 kernel:  kthread+0xe6/0x110
Sep 19 17:09:36 pve1 kernel:  ? __pfx_kthread+0x10/0x10
Sep 19 17:09:36 pve1 kernel:  ret_from_fork+0x29/0x50
Sep 19 17:09:36 pve1 kernel:  </TASK>
Sep 19 17:09:36 pve1 kernel: ---[ end trace 0000000000000000 ]---
Sep 19 17:09:36 pve1 kernel: ------------[ cut here ]------------
Sep 19 17:09:36 pve1 kernel: thunderbolt 0000:00:0d.3: interrupt for RX ring 1 is already enabled
Sep 19 17:09:36 pve1 kernel: WARNING: CPU: 4 PID: 12826 at drivers/thunderbolt/nhi.c:137 ring_interrupt_active+0x270/0x310 [thunderbolt]
Sep 19 17:09:36 pve1 kernel: Modules linked in: tcp_diag inet_diag cmac nls_utf8 cifs cifs_arc4 rdma_cm iw_cm ib_cm ib_core cifs_md4 ceph libceph fscache netfs ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables nvme_fabrics bonding tls qrtr softdog sunrpc nfnetlink_log nfnetlink binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci iwlmvm snd_sof_xtensa_dsp intel_rapl_msr snd_sof intel_rapl_common snd_sof_utils mac80211 snd_soc_hdac_hda snd_hda_ext_core libarc4 snd_soc_acpi_intel_match snd_soc_acpi x86_pkg_temp_thermal soundwire_bus intel_powerclamp i915 snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine coretemp kvm_intel drm_buddy snd_hda_intel ttm snd_intel_dspcfg kvm snd_intel_sdw_acpi drm_display_helper irqbypass snd_hda_codec
Sep 19 17:09:36 pve1 kernel:  cec crct10dif_pclmul snd_hda_core polyval_clmulni btusb polyval_generic rc_core ghash_clmulni_intel btrtl sha512_ssse3 snd_hwdep drm_kms_helper btbcm aesni_intel iwlwifi snd_pcm btintel ov13858 i2c_algo_bit cmdlinepart mei_hdcp mei_pxp btmtk crypto_simd syscopyarea snd_timer spi_nor v4l2_fwnode sysfillrect pmt_telemetry cryptd pmt_class bluetooth snd v4l2_async ucsi_acpi mei_me typec_ucsi joydev videodev ecdh_generic rapl input_leds cfg80211 mtd soundcore intel_cstate pcspkr sysimgblt ee1004 wmi_bmof mei ecc typec intel_vsec mc acpi_pad acpi_tad mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap thunderbolt_net msr drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb uas usb_storage hid_generic usbmouse usbkbd usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme xhci_pci xhci_pci_renesas crc32_pclmul intel_lpss_pci spi_intel_pci
Sep 19 17:09:36 pve1 kernel:  thunderbolt i2c_i801 nvme_core ahci xhci_hcd igc spi_intel intel_lpss i2c_smbus video libahci idma64 nvme_common wmi pinctrl_tigerlake
Sep 19 17:09:36 pve1 kernel: CPU: 4 PID: 12826 Comm: kworker/4:1 Tainted: P     U  W  O       6.2.16-14-pve #1
Sep 19 17:09:36 pve1 kernel: Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
Sep 19 17:09:36 pve1 kernel: Workqueue: events_long tbnet_connected_work [thunderbolt_net]
Sep 19 17:09:36 pve1 kernel: RIP: 0010:ring_interrupt_active+0x270/0x310 [thunderbolt]
Sep 19 17:09:36 pve1 kernel: Code: 89 5d c8 44 89 45 d4 e8 fe 6b c4 ed 44 8b 45 d4 48 8b 4d c0 49 89 d9 48 8b 55 b8 48 89 c6 48 c7 c7 18 01 6a c0 e8 90 ff 27 ed <0f> 0b 44 8b 5d c8 49 8b 47 08 45 84 e4 0f 85 d9 fe ff ff f6 40 70
Sep 19 17:09:36 pve1 kernel: RSP: 0018:ffffafc8cf24bdb0 EFLAGS: 00010046
Sep 19 17:09:36 pve1 kernel: RAX: 0000000000000000 RBX: ffffffffc069e8ef RCX: 0000000000000000
Sep 19 17:09:36 pve1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 19 17:09:36 pve1 kernel: RBP: ffffafc8cf24be00 R08: 0000000000000000 R09: 0000000000000000
Sep 19 17:09:36 pve1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Sep 19 17:09:36 pve1 kernel: R13: 0000000000002000 R14: 0000000000038200 R15: ffff890656211800
Sep 19 17:09:36 pve1 kernel: FS:  0000000000000000(0000) GS:ffff891497700000(0000) knlGS:0000000000000000
Sep 19 17:09:36 pve1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 19 17:09:36 pve1 kernel: CR2: 0000561901c0e420 CR3: 00000007a4410000 CR4: 0000000000750ee0
Sep 19 17:09:36 pve1 kernel: PKRU: 55555554
Sep 19 17:09:36 pve1 kernel: Call Trace:
Sep 19 17:09:36 pve1 kernel:  <TASK>
Sep 19 17:09:36 pve1 kernel:  ? show_regs+0x6d/0x80
Sep 19 17:09:36 pve1 kernel:  ? __warn+0x89/0x160
Sep 19 17:09:36 pve1 kernel:  ? ring_interrupt_active+0x270/0x310 [thunderbolt]
Sep 19 17:09:36 pve1 kernel:  ? report_bug+0x17e/0x1b0
Sep 19 17:09:36 pve1 kernel:  ? handle_bug+0x46/0x90
Sep 19 17:09:36 pve1 kernel:  ? exc_invalid_op+0x18/0x80
Sep 19 17:09:36 pve1 kernel:  ? asm_exc_invalid_op+0x1b/0x20
Sep 19 17:09:36 pve1 kernel:  ? ring_interrupt_active+0x270/0x310 [thunderbolt]
Sep 19 17:09:36 pve1 kernel:  tb_ring_start+0x17e/0x330 [thunderbolt]
Sep 19 17:09:36 pve1 kernel:  tbnet_connected_work+0xe3/0x310 [thunderbolt_net]
Sep 19 17:09:36 pve1 kernel:  ? queue_work_on+0x67/0x70
Sep 19 17:09:36 pve1 kernel:  process_one_work+0x222/0x430
Sep 19 17:09:36 pve1 kernel:  worker_thread+0x50/0x3e0
Sep 19 17:09:36 pve1 kernel:  ? __pfx_worker_thread+0x10/0x10
Sep 19 17:09:36 pve1 kernel:  kthread+0xe6/0x110
Sep 19 17:09:36 pve1 kernel:  ? __pfx_kthread+0x10/0x10
Sep 19 17:09:36 pve1 kernel:  ret_from_fork+0x29/0x50
Sep 19 17:09:36 pve1 kernel:  </TASK>
Sep 19 17:09:36 pve1 kernel: ---[ end trace 0000000000000000 ]---
Sep 19 17:09:36 pve1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): en06: link becomes ready

CephFS and my Ceph RBDs are fully working over the IPv6 Thunderbolt mesh, so I assume this is not an issue for now...
 
Though should this be a cause for concern?
Oh, I think I get it...

I think maybe the test-branch PVE kernels have debug and tracing turned on - I am seeing all sorts of things in the PVE syslog view and dmesg from other components that I don't recall seeing, including Thunderbolt debug messages... so yeah, seems like a transient I can ignore...?
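
One way to check that theory directly rather than guessing - a sketch, assuming debugfs is mounted and the kernel was built with CONFIG_DYNAMIC_DEBUG:

Code:
# list thunderbolt dynamic-debug call sites; '=_' means disabled, '=p' means printing
grep thunderbolt /sys/kernel/debug/dynamic_debug/control | head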
 
Though should this be a cause for concern?
It's at least not ideal; while kernel warnings are sometimes mostly cosmetic, they can also point to a real underlying problem.

I checked and found a bug report that sounds like this issue, mentioning the following patch:
https://git.kernel.org/pub/scm/linu...6&id=9f9666e65359d5047089aef97ac87c50f624ecb0

As that patch got into 6.4, you did not see this warning with the newer mainline kernel you tested.
I backported that fix and applied it to our kernel now, but it might take a bit longer until we do a new bump+build again (we just did one, and I do not see this warning as critical enough to immediately redo the work).
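
If you want to check whether a given boot ran into this warning, grepping the kernel log for the message should do; a sketch (the pattern simply matches the warning text quoted above):

Code:
journalctl -k -b | grep 'is already enabled'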
 
I think maybe the test-branch PVE kernels have debug and tracing turned on - I am seeing all sorts of things in the PVE syslog view and dmesg from other components that I don't recall seeing, including Thunderbolt debug messages... so yeah, seems like a transient I can ignore...?
No - as our packages get moved 1:1, as-is, from internal -> pvetest -> pve-no-subscription -> pve-enterprise, they're always the same and do not have different levels of logging enabled or the like.
 
I backported that fix and applied it to our kernel now, but it might take a bit longer until we do a new bump+build again (we just did one, and I do not see this warning as critical enough to immediately redo the work).
Thanks.

This error was seen on a running node when another node in the TB ring went down. I can thus understand why people might see it in suspend scenarios. I think (not that my opinion matters) that your logic is sound and this is the same issue.

Thanks for backporting this fix too, and for explaining how your packages move. I do not see an issue with this taking a while until another bump. So far it appears cosmetically scary rather than having a real impact, as when the TB connection comes back, the connection is re-initiated correctly.
 
