Warning: Proxmox Updates Broke Thunderbolt Ceph Cluster

Envy8181

Apr 7, 2024
I have a 3-node Intel NUC cluster with meshed Thunderbolt connections between the nodes for a Ceph storage pool. After applying the latest four updates and rebooting, the Ceph cluster is no longer available. Apart from an update a few months ago that moved FRR to 10.2.2, it has been rock solid for at least a year.

Updates applied:
proxmox-kernel-6.8.12-13
proxmox-kernel-6.8.12-13-pve-signed
proxmox-kernel-helper 8.1.4
pve-container 5.3.0

Normally I'm more cautious: I update one host at a time and check the Ceph storage in between. But Murphy whispered in my ear that these updates looked innocuous enough to abandon caution and update all three hosts at once.

I haven't gone deep into troubleshooting, but it appears something changed with Thunderbolt and therefore with the networking.
In dmesg I see the following, which I don't think was there before:


Code:
[    4.943015] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[    5.013911] typec port1: bound usb1-port5 (ops connector_ops)
[    5.013919] typec port1: bound usb4-port3 (ops connector_ops)
[    5.054221] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[    5.090882] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[    5.209368] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[    5.908043] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[    5.949000] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[    6.228175] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[    6.268344] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[    9.134896] thunderbolt 0-1: new host found, vendor=0x8086 device=0x1
[    9.134901] thunderbolt 0-1: Intel Corp. pve03
[    9.138077] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
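
One quick way to test the networking theory is to check, on each node, whether the Thunderbolt mesh interfaces exist and what state the kernel reports for them. Below is a minimal, read-only Python sketch that reads the standard Linux /sys/class/net/<iface>/operstate file. It assumes the interfaces are named en05 (as in the dmesg output above) and en06; the second name is a hypothetical placeholder for a typical two-link-per-node mesh, so adjust both to match your own setup.

```python
from pathlib import Path

# en05 appears in the dmesg output; en06 is a hypothetical second
# mesh link name and may differ on your nodes.
EXPECTED_IFACES = ("en05", "en06")


def link_states(ifaces=EXPECTED_IFACES, sysfs_net="/sys/class/net"):
    """Return {iface: operstate}, using 'missing' when the interface is absent."""
    states = {}
    for name in ifaces:
        state_file = Path(sysfs_net) / name / "operstate"
        if state_file.is_file():
            # operstate holds a single word such as "up" or "down".
            states[name] = state_file.read_text().strip()
        else:
            states[name] = "missing"
    return states


if __name__ == "__main__":
    # Only reads sysfs; it does not change any network configuration.
    for name, state in link_states().items():
        print(f"{name}: {state}")
```

A "missing" result points at device detection or renaming, while a present-but-down link points further up the stack (interface config or FRR).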

I'm not sure what to do at this point. I guess I'll look into reverting to the previous kernel and hope that recovers it.