Recent content by ahovda

  1. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Yes, all nodes are completely up to date as of today (and corosync and/or the node restarted). The cluster has been stable ever since the first patched 1.11 knet update from pvetest, and was later upgraded to the latest version, 1.12. I've also applied the suggested knet timeout tweaks since we only have a single ring.
  2. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Got a segfault today. Attached coredump (recompressed with xz for size and gz for accepted filetype in forum). PID: 2200707 (corosync) UID: 0 (root) GID: 0 (root) Signal: 11 (SEGV) Timestamp: Thu 2019-10-03 16:42:21 CEST (18min ago) Command Line...
  3. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Yeah, according to the lacp specs, correct ordering is required, so since I'm on lacp+balance-tcp I'm good from that standpoint. The xor hashing both in ovs and the Cisco switch with src-dst-tcp should be 100% deterministic and make each tcp stream stay on a single interface. This is my...
  4. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Sorry, I got that mixed up. I meant balance-tcp and was referring to https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#_linux_bond which says that But during my limited testing it looks as if other bonding modes can work fine too. (Perhaps getting a bit OT here.)
  5. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    If that was referring to me, not really any problems with ovs. I'm running balance-tcp lacp on a 16-node cluster since the patch was installed. In fact, there has not been even a single log entry for corosync since, and I've tried to stress test it by running iperf sessions in both directions (even simultaneously...
  6. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Thanks a lot! I've reverted the kernel/driver flags and installed the new patched version and after initial tests it looks very promising. There are still retransmits while the LACP bonds are renegotiating if I play with the bonding settings and such but that is to be expected. I'm...
  7. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Unfortunately, I don't think it made a difference, I still have problems, and had to recover from a cluster meltdown yesterday. I have tried both intremap=off and disable_msi=1, both with no success -- which was expected, since all the posts suggesting those flags are for older kernel versions...
  8. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    I found I had a flaky LACP bond between two switches. Rebooted the switches and that seems resolved. I've added options bnx2 disable_msi=1 to /etc/modprobe.d/bnx2.conf and rebooted all hosts (checked lspci -v for MSI-X: Enable- afterwards). So far no more corosync segfaults, but I have yet to...
  9. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Hehe, right. Guess I was a bit quick on the zpool upgrade command. Just a test machine on 4.15 so far, with nothing in the pool, but yes, the machine came up but zpool-import failed as expected. root@osl108pve:~# /sbin/zpool import -aN -d /dev/disk/by-id -o cachefile=none This pool uses the...
  10. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Good idea. Is that going to work, or does some part of pve 6 depend on the new kernel? I'll give it a go and find out, I guess.
  11. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    You're right, it did not really help; the whole cluster crashed and I'm in late to recover. :cool: I think I might have some problems with the bnx2 driver, since ethernet ports are going down and up. I've searched and have added intremap=off to the kernel cmdline and after rebooting nodes I'll...
  12. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Same deal here. To mitigate for now, I've added [Service] Restart=on-failure to /etc/systemd/system/corosync.service.d/override.conf and ran systemctl daemon-reload through Ansible on our 16-node cluster. I have collected a few coredumps as well if that helps, but they seem too big to...
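
The mitigation in item 12 (a systemd drop-in that restarts corosync after a crash) could look roughly like the sketch below. The drop-in content and the target path come from the post. The `DROPIN_DIR` variable is a hypothetical addition that defaults to a scratch directory, so the script can be tried without root and without touching a real node.

```shell
#!/bin/sh
# Sketch only: write the corosync restart drop-in described in the post.
# DROPIN_DIR is a hypothetical variable (not from the post). It defaults to a
# scratch directory; on a real node it would be set to
# /etc/systemd/system/corosync.service.d
set -eu

DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"
mkdir -p "$DROPIN_DIR"

# Drop-in content as quoted in the post: restart the service when it exits
# uncleanly (for example after a segfault).
cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Service]
Restart=on-failure
EOF

echo "wrote $DROPIN_DIR/override.conf"

# On a real node, systemd has to re-read its unit files afterwards:
#   systemctl daemon-reload
```

This only papers over the crashes by restarting the daemon; it does not address the underlying segfault discussed in the thread.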
