Hi there,
I am in the middle of upgrading my 4-node cluster from the latest 7.4 to the current 8.1. While the upgrade procedure itself worked like a charm, I am running into RAID trouble with the current kernel (same as here: https://bugzilla.kernel.org/show_bug.cgi?id=217599#c30, so it really seems to be a kernel issue). I am getting aborted requests, making the system unavailable for anywhere from several seconds up to minutes.
Two nodes have been updated so far, and I of course fear losing quorum if two nodes fail at the same time, especially since I also run hyperconverged Ceph (currently 17.2).
Reverting to the latest previous kernel available through apt (6.2.16-5-pve) seems to mitigate the problem, but of course I do not want to keep the cluster in this state.
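For reference, this is how I keep the nodes on the older kernel for now, using proxmox-boot-tool's kernel pinning (the exact version string 6.2.16-5-pve is what is installed on my nodes; adjust as needed):

```shell
# List kernels currently known to the boot tool
proxmox-boot-tool kernel list

# Pin the known-good kernel so it stays the default across reboots
proxmox-boot-tool kernel pin 6.2.16-5-pve

# Later, once a fixed kernel is available, remove the pin again
proxmox-boot-tool kernel unpin
```

I understand this is only a stopgap, not a fix.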
Searching the forums, the only option I found for reverting the upgrade was pinning all changed packages to their previous versions, but that does not seem realistic to me for a total of 658 upgraded packages.
So I am thinking of taking the affected nodes out of the cluster and reinstalling them from the latest 7.4 image. Does that seem like a good approach to you?