I have a 3-node Intel NUC cluster with meshed Thunderbolt connections between the nodes for a Ceph storage pool. After applying the latest four updates and rebooting, the Ceph cluster is no longer available. Other than an update a few months ago, where FRR was updated to 10.2.2, it has been rock solid for at least a year.
Updates applied:
proxmox-kernel-6.8.12-13
proxmox-kernel-6.8.12-13-pve-signed
proxmox-kernel-helper 8.1.4
pve-container 5.3.0
Normally I'm more cautious and only do one host at a time, checking the Ceph storage after each, but Murphy whispered in my ear that these updates look innocuous enough, so I abandoned caution and updated all three hosts.
I haven't gone deep into troubleshooting, but it appears something changed with Thunderbolt and therefore the networking.
In dmesg I see this, which I don't think was there before:
Code:
[ 4.943015] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 5.013911] typec port1: bound usb1-port5 (ops connector_ops)
[ 5.013919] typec port1: bound usb4-port3 (ops connector_ops)
[ 5.054221] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 5.090882] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 5.209368] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 5.908043] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 5.949000] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 6.228175] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 6.268344] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 9.134896] thunderbolt 0-1: new host found, vendor=0x8086 device=0x1
[ 9.134901] thunderbolt 0-1: Intel Corp. pve03
[ 9.138077] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
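For anyone comparing the three nodes, a rough sketch of how the relevant lines could be pulled out of a dmesg dump. This is only an illustration: the function name `summarize_dmesg` and the regular expressions are my own assumptions, based solely on the message formats in the excerpt above, and other kernels may word these lines differently.

```python
import re

# Trimmed sample taken from the dmesg excerpt above.
SAMPLE = """\
[ 4.943015] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 9.134896] thunderbolt 0-1: new host found, vendor=0x8086 device=0x1
[ 9.134901] thunderbolt 0-1: Intel Corp. pve03
[ 9.138077] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
"""

# Patterns are assumptions derived from the excerpt, not an official format.
UCSI_RE = re.compile(r"UCSI_GET_PDOS failed")
HOST_RE = re.compile(r"thunderbolt (\S+): new host found")
RENAME_RE = re.compile(r"thunderbolt-net \S+ (\S+): renamed from (\S+)")


def summarize_dmesg(text):
    """Count UCSI failures and list Thunderbolt peers and network interfaces."""
    summary = {"ucsi_failures": 0, "tb_hosts": [], "tb_interfaces": []}
    for line in text.splitlines():
        if UCSI_RE.search(line):
            summary["ucsi_failures"] += 1
        host = HOST_RE.search(line)
        if host:
            summary["tb_hosts"].append(host.group(1))
        rename = RENAME_RE.search(line)
        if rename:
            # group(1) is the final interface name, e.g. en05
            summary["tb_interfaces"].append(rename.group(1))
    return summary


if __name__ == "__main__":
    print(summarize_dmesg(SAMPLE))
```

On a three-node mesh each node has two Thunderbolt links, so two entries would be expected in both `tb_hosts` and `tb_interfaces`. The excerpt above shows only one of each (pve03 on en05), though it may simply be cut off.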
I'm not sure what to do at this point. I guess I'll look into reverting to the previous kernel and hope that recovers it.