Search results

  1. PVE 5.4-11 + Corosync 3.x: major issues

    Hi @Fabio, @fabian, this morning sanctuary dropped out of the cluster with a divide error; load would have been negligible at the time: Aug 7 00:10:37 sanctuary kernel: [739314.684408] show_signal: 6 callbacks suppressed Aug 7 00:10:37 sanctuary kernel: [739314.684410] traps...
  2. PVE 5.4-11 + Corosync 3.x: major issues

    I can report our cluster is still green again this morning for all 3 nodes
  3. PVE 5.4-11 + Corosync 3.x: major issues

    Hi Apollon77, Marin Bernard's logs did show pmtud log entries, which led me to suspect their issue was the same as mine, but it's possible there are multiple things at work here (in which case I apologize for any potential thread hijacking!). You might want to grep your logs for it to confirm...
  4. PVE 5.4-11 + Corosync 3.x: major issues

    I did try that the night before you posted the test version of libknet, and that morning the cluster was green and not reporting the PMTUD issue. However, I undid this change to test that version of libknet.
  5. PVE 5.4-11 + Corosync 3.x: major issues

    I think previously, on the original libknet version, other hosts were flapping too. I can go back further in the logs to find out if needed. The workload on scramjet isn't very different in nature but it is higher; being an EPYC system with more RAM than the other hosts, it'll typically run more...
  6. PVE 5.4-11 + Corosync 3.x: major issues

    Hi Fabio, we have some hardware coming in this week for a new production cluster, so we're more than happy to do whatever we can to help fix this issue. Attached are the logs from all three hosts from 12am Saturday morning until I manually restarted corosync on scramjet around midday. Scramjet is the...
  7. PVE 5.4-11 + Corosync 3.x: major issues

    This morning the cluster is green, no hosts marked as offline. I hope this means the specific issue with knet pmtud and crypto has been resolved, and the floating point exception I saw over the weekend was an anomaly. When I get to the office this morning I'll dig through logs for any sign of...
  8. PVE 5.4-11 + Corosync 3.x: major issues

    Aug 4 00:18:16 scramjet corosync[1472229]: [KNET ] pmtud: possible MTU misconfiguration detected. kernel is reporting MTU: 1500 bytes for host 2 link 0 but the other node is not acknowledging packets of this size. Aug 4 00:18:16 scramjet corosync[1472229]: [KNET ] pmtud: This can be...
  9. PVE 5.4-11 + Corosync 3.x: major issues

    Hi @fabian, sorry to report this version still has issues. I don't see the PMTUD issue in the logs today, but one host still had issues keeping quorum and reported lost tokens for about an hour (all of which might be a separate issue elsewhere), but then its kernel reported a crash in libknet.so...
  10. PVE 5.4-11 + Corosync 3.x: major issues

    Hi Fabian, I'd be happy to test. I'll report back after the weekend if it's stable, or sooner if not.
  11. PVE 5.4-11 + Corosync 3.x: major issues

    *maybe* possibly related to this issue on knet with pmtud when using crypto? ://github.com/kronosnet/kronosnet/pull/242/commits if so it looks like a possible fix is in the pipeline already then (had to remove the https because the forum won't let me post links)
  12. PVE 5.4-11 + Corosync 3.x: major issues

    We are seeing the same behaviour on a PVE 6 cluster which was upgraded from PVE 5. Not sure if the root cause is the same but the symptom surely is. Logs seem to indicate pvesr hanging on one or another host right before the quorum loss, but we don't have any replication jobs configured, so...
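
Several of the posts above suggest grepping the logs for knet pmtud entries to confirm whether a node is hitting the same problem. A minimal sketch of that check in Python follows. The regular expression is an assumption based only on the single log line quoted in result 8, and `find_pmtud_lines` is a hypothetical helper name, not part of corosync or Proxmox tooling.

```python
import re

# Pattern for syslog-style corosync lines carrying a KNET pmtud message.
# Assumption: the layout matches the sample quoted in result 8
# ("Aug 4 00:18:16 scramjet corosync[1472229]: [KNET ] pmtud: ...").
PMTUD_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[KNET\s*\]\s+pmtud:\s+(?P<msg>.*)$"
)


def find_pmtud_lines(lines):
    """Return (timestamp, host, message) for each KNET pmtud log line."""
    hits = []
    for line in lines:
        match = PMTUD_RE.match(line.strip())
        if match:
            hits.append(
                (match.group("stamp"), match.group("host"), match.group("msg"))
            )
    return hits


# First line is shortened from the log quoted in result 8; the second is a
# made-up non-matching line used only to show that other entries are skipped.
sample = [
    "Aug 4 00:18:16 scramjet corosync[1472229]: [KNET ] pmtud: possible MTU misconfiguration detected.",
    "Aug 4 00:18:17 scramjet corosync[1472229]: [TOTEM ] hypothetical unrelated line",
]

for stamp, host, msg in find_pmtud_lines(sample):
    print(f"{stamp} {host}: {msg}")
```

To run it against a real node, pass the function the lines of whichever file your corosync messages are written to, for example `find_pmtud_lines(open(path))`, where `path` is your own log location.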
