Kernel 6.8.8-2 broke my Thunderbolt-connected Intel PCIe NIC

Hello!

I installed the offered updates today (10 July 2024), which included kernel 6.8.8-2, firmware 3.12-1, and a few other things. After the reboot, the host and the VMs using one of the three NICs had no networking. This was the NIC carrying the default gateway for the host box.

I run a 13th Gen Intel NUC Pro with a Thunderbolt-connected Intel NIC, which shows up as an 82599ES. The Thunderbolt box is a Sonnet Twin, which acts as a bridge to what I assume is a standard low-profile Intel 10G NIC.

Code:
02:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
03:01.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
03:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
2d:00.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)
2e:01.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)
2e:02.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)

I did considerable poking and prodding both in hardware and software, but the only thing that made it work again was reverting to kernel 6.8.4-3.

I even went the extra mile: after getting it working with 6.8.4-3, I rebooted back into 6.8.8-2, and again it did not work. Then I went back to 6.8.4-3 and it works again.

Let me know if pulling any other configs would help diagnose, but I do need the box working so going back to the state where the NIC isn't working may not be possible. Thanks.

Update: I am leaning toward a Thunderbolt issue, since there were also some problem messages during boot related to a Thunderbolt-connected NVMe SSD on the new kernel.
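For anyone collecting details for a report like this, the following is a minimal sketch of what could be gathered on the affected host. The function names `filter_tb_log` and `collect_diag` are made up for illustration, and the grep pattern assumes the 82599ES is driven by the `ixgbe` module, as is usual for that chip.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any tool).

# Keep only kernel log lines that mention Thunderbolt, the ixgbe NIC
# driver (assumed to be the driver for the 82599ES), or PCIe port services.
filter_tb_log() {
    grep -iE 'thunderbolt|ixgbe|pcieport'
}

# Print the kernel version, PCI devices with their bound drivers,
# the filtered kernel messages, and the link state of all interfaces.
# Intended to be run as root on the affected host.
collect_diag() {
    uname -r
    lspci -nnk
    dmesg | filter_tb_log
    ip -br link
}
```

Running `collect_diag > diag-$(uname -r).txt` once under each kernel would give two files that can be compared with `diff`.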
 
I believe there were at least one or two other threads related to Thunderbolt issues on the 6.8 kernel. It's not something that affects me, so I'm not up on it, but you might want to search for the other threads to see if there is anything useful there. I think downgrading the kernel and waiting for a fix is probably the solution.
 
Thanks... it doesn't seem to be solved, though! I do have 'bolt' installed (though boltd is not running). Under 6.8.4-3 it reports:

Code:
# boltctl
 ○ Sabrent Enclosure
   ├─ type:          peripheral
   ├─ name:          Enclosure
   ├─ vendor:        Sabrent
   ├─ uuid:          c4010000-0082-8088-206b-x
   ├─ generation:    Thunderbolt 3
   ├─ status:        disconnected
   ├─ authorized:    Sun 19 May 2024 08:15:29 PM UTC
   ├─ connected:     Sun 19 May 2024 08:15:29 PM UTC
   └─ stored:        Mon 27 May 2024 08:05:50 PM UTC
      ├─ policy:     iommu
      └─ key:        no

 ● Sonnet Technologies, Inc. Echo Express SEL TB3
   ├─ type:          peripheral
   ├─ name:          Echo Express SEL TB3
   ├─ vendor:        Sonnet Technologies, Inc.
   ├─ uuid:          d5010000-0072-7508-22ea-x
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     f0058780-60e0-fd4d-ffff-ffffffffffff
   │  ├─ rx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  ├─ tx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  └─ authflags:  boot
   ├─ authorized:    Wed 10 Jul 2024 11:32:41 AM UTC
   ├─ connected:     Wed 10 Jul 2024 11:32:41 AM UTC
   └─ stored:        Mon 27 May 2024 08:05:50 PM UTC
      ├─ policy:     iommu
      └─ key:        no

 ● Sabrent Dual enclosure
   ├─ type:          peripheral
   ├─ name:          Dual enclosure
   ├─ vendor:        Sabrent
   ├─ uuid:          d3010000-0080-7708-234b-x
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     37a48780-a1cd-b401-ffff-ffffffffffff
   │  ├─ rx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  ├─ tx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  └─ authflags:  boot
   ├─ authorized:    Wed 10 Jul 2024 11:32:41 AM UTC
   ├─ connected:     Wed 10 Jul 2024 11:32:41 AM UTC
   └─ stored:        Thu 06 Jun 2024 03:55:08 PM UTC
      ├─ policy:     iommu
      └─ key:        no
 
You might try the latest 6.5 kernel. I think the problems started in 6.8, but search and check. I am not using Thunderbolt, so my problems with 6.8 were different. The latest 6.8 is working for my issues, but I had to stick with 6.5 for a while because of them.
 
Code:
╭─flo at nuc01 in ~ 24-07-19 - 10:30:59
╰─○ sudo proxmox-boot-tool kernel list
Manually selected kernels:
None.


Automatically selected kernels:
6.5.13-5-pve
6.8.4-2-pve
6.8.8-2-pve

╭─flo at nuc01 in ~ 24-07-19 - 10:31:00
╰─○ sudo proxmox-boot-tool kernel pin 6.8.4-2-pve

Set kernel '6.8.4-2-pve' in /etc/kernel/proxmox-boot-pin.
Refresh the actual boot ESPs now? [yN] y
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
WARN: /dev/disk/by-uuid/50BC-7243 does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
Copying and configuring kernels on /dev/disk/by-uuid/9A7C-7602
    Copying kernel and creating boot-entry for 6.5.13-5-pve
    Copying kernel and creating boot-entry for 6.8.4-2-pve
    Copying kernel and creating boot-entry for 6.8.8-2-pve
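
After the next reboot it may be worth confirming that the pinned kernel is the one actually running. This is a minimal sketch, assuming that `/etc/kernel/proxmox-boot-pin` (the file named in the output above) holds just the pinned version string; the function name `check_pin` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare the pinned kernel with the running one.
# $1: pin file (defaults to the path shown in the pin output above)
# $2: running kernel version (defaults to `uname -r`)
check_pin() {
    pin_file=${1:-/etc/kernel/proxmox-boot-pin}
    running=${2:-$(uname -r)}

    # No readable pin file means nothing is pinned.
    if [ ! -r "$pin_file" ]; then
        echo "no pin set"
        return 1
    fi

    # Assumption: the file contains only the version string.
    pinned=$(cat "$pin_file")
    if [ "$pinned" = "$running" ]; then
        echo "running pinned kernel $pinned"
    else
        echo "pinned $pinned but running $running"
        return 1
    fi
}
```

To try a newer kernel once without giving up the pin, `proxmox-boot-tool kernel pin <version> --next-boot` should apply to a single boot only, and `proxmox-boot-tool kernel unpin` removes the pin again.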
 
Yeah, I have Thunderbolt mesh networking between two nodes and when you use the latest 6.8 kernels, it causes a bunch of retries and throughput issues with Thunderbolt when you use iperf3 to test bandwidth. I had to revert back to 6.5.13-5-pve to make everything work right. There might be some 6.8 versions that work too, but I haven't had the time to try all of them. Is there a list of all of the available kernel versions somewhere?
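
On the question of a list of kernel versions: one way to see what the configured repositories offer is to ask apt. This is a sketch, assuming the packages follow the `proxmox-kernel-<version>-pve` naming seen in this thread (older releases use other names); `kernel_versions` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "package - description" lines (the format
# printed by `apt-cache search`) and print the unique kernel versions.
# Meta-packages such as "proxmox-kernel-6.8" do not match and are skipped.
kernel_versions() {
    sed -n 's/^proxmox-kernel-\([0-9][0-9.]*-[0-9][0-9]*-pve\).*/\1/p' | sort -u -V
}
```

Usage would be `apt-cache search --names-only '^proxmox-kernel-' | kernel_versions`, after which a specific version can be installed with `apt install` and pinned with `proxmox-boot-tool kernel pin`.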
 
Has anyone tried proxmox-kernel-6.8.8-4?

(Looking at the changelogs on GitHub, it doesn't seem to be something that has been worked on recently.) I suppose I should look further to see what broke between 6.8.4-3, which works for me, and 6.8.8-2, which doesn't.
 
Thanks for the notes here. I downgraded to 6.5.13-6-pve and this brought it back. My adapter is a StarTech Thunderbolt one.


Jul 28 07:10:04 nucpm kernel: thunderbolt 1-1: new device found, vendor=0x6f device=0x21
Jul 28 07:10:04 nucpm kernel: thunderbolt 1-1: StarTech.com TB310G2
Jul 28 07:10:04 nucpm kernel: 8021q: 802.1Q VLAN Support v1.8

Running on a NUC also.

I'm assuming I need to wait until the following are upgraded before I test again:

6.8.8-2-pve
6.8.8-4-pve
 
I am experiencing the same issue with 6.8 kernels on my 3 node setup with Thunderbolt networking. Reverting solves this for me as well.
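
To compare kernels on the retries and throughput symptom, the sender summary line of an iperf3 TCP run can be reduced to bitrate and retransmit count. This is only a sketch: `iperf_sender_summary` is a hypothetical name, and it assumes the default human-readable TCP output, where the summary line ends in `sender` and the `Retr` column comes right before it.

```shell
#!/bin/sh
# Hypothetical helper: from iperf3 client output on stdin, print the
# bitrate and retransmit count of each TCP sender summary line.
# Assumed line shape (default TCP text output):
#   [  5]   0.00-10.00  sec  <transfer> <unit>  <rate> <unit>  <retr>  sender
iperf_sender_summary() {
    awk '$NF == "sender" { print $(NF-3), $(NF-2), "retransmits=" $(NF-1) }'
}
```

With `iperf3 -s` running on one node, something like `iperf3 -c <peer-address> -t 10 | iperf_sender_summary` on the other, repeated under each kernel, would give directly comparable numbers.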
 
I’m now on 6.8.12-1 and am getting the full 26 Gb/s. You do have to pin Thunderbolt to the P-cores on the CPU, though.
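
The post above does not say how the pinning was done. One common approach is to set the CPU affinity of the Thunderbolt interrupts, sketched here under that assumption. The helper names `tb_irqs` and `pin_tb_irqs` are hypothetical, and the CPU list passed in is machine-specific; on hybrid Intel CPUs, recent kernels usually expose the P-core list in `/sys/devices/cpu_core/cpus`, but verify this on your own system.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/interrupts-style text on stdin and
# print the IRQ numbers of lines that mention "thunderbolt".
tb_irqs() {
    awk -F: '/thunderbolt/ { gsub(/ /, "", $1); print $1 }'
}

# Hypothetical helper: restrict every Thunderbolt IRQ to the CPU list
# given as $1 (for example "0-7"). Needs root, and a running irqbalance
# may later change the affinity again.
pin_tb_irqs() {
    cpus=$1
    tb_irqs < /proc/interrupts | while read -r irq; do
        echo "$cpus" > "/proc/irq/$irq/smp_affinity_list"
    done
}
```

Calling `pin_tb_irqs "$(cat /sys/devices/cpu_core/cpus)"` as root would apply it, and the setting does not persist across reboots unless it is run again at boot.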
 
Thanks for the info. I just tested the new kernel, but unfortunately all networking goes down for me on 6.8.12-1. Same issue as here.
 
