Search results

  1. Best Practice Network

    I would not do this, especially if you are taking a lot of effort to separate your networks. Corosync should have a minimum of 1 (better 2) separate physical links (not shared with any other traffic) because of latency. Corosync links do not need a high-bandwidth connection, but a stable and low...
  2. Proxmox performance

    Can you give us more details on your system configuration? You are using Ceph as storage for your VMs? What you should check from my perspective ([...] slow starting arount 12 and from ~15:00 would be to slow to work [...]): Check nf_conntrack: This connection tracking and limiting system is...
  3. Proxmox nodes are not showing up status and vote quorum has 2 activity blocked

    Are you using Ceph? Can you tell us a little bit more about your network setup? Is the corosync link separated from other traffic?
  4. Proxmox HA/Ceph cluster - VLANs for VM traffic, Ceph and Corosync - what traffic needs to go between VLANs?

    Hello victorhooi, you should separate the traffic as @Alwin said. Especially for corosync it is necessary to have low latency on the connection. When all traffic is running over the same link, the latency may become too high and you will get problems with the cluster. The separated...
  5. [SOLVED] Ceph data not available

    Good that you could solve the problem. Great that they now have separated links. Could you mark the thread as solved?
  6. [SOLVED] Ceph data not available

    Now all nodes are up again? https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#failures-osd-peering They become inactive if the placement group has not been active for too long (i.e., it hasn’t been able to service read/write requests). Maybe because 2 nodes went down...
  7. [SOLVED] corosync crash when adding a 15th node

    Perfect. Maybe it is possible to add a separate link for corosync in the future?! Please mark this thread as solved. Thank you.
  8. [SOLVED] corosync crash when adding a 15th node

    Have you checked against the requirements: All nodes must be able to connect to each other via UDP ports 5404 and 5405 for corosync to work. (Firewall?) Date and time have to be synchronized. (Essential for the cluster to run!) An SSH tunnel on TCP port 22 between nodes is used. If you...
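    The requirements listed above can be sketched as quick checks with standard tools; a minimal sketch, assuming a Debian-based node with systemd, where `peer-node` is a placeholder hostname:

```shell
# Corosync should be listening on its UDP ports locally:
ss -ulpn | grep -E ':(5404|5405)'
# Clock synchronization status (should print "yes"):
timedatectl show -p NTPSynchronized --value
# TCP port 22 reachable on a peer node:
nc -z -w2 peer-node 22 && echo "ssh reachable"
```

    Run the same checks on every node; note that UDP reachability between nodes is hard to probe reliably, so checking the local listeners plus the firewall rules is the practical approach.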
  9. [SOLVED] corosync crash when adding a 15th node

    10.123.1.188 : unicast, xmt/rcv/%loss = 9014/9005/0%, min/avg/max/std-dev = 0.067/0.176/22.110/0.755
    10.123.1.114 : unicast, xmt/rcv/%loss = 9179/9172/0%, min/avg/max/std-dev = 0.060/0.192/17.492/0.586
    10.123.1.83 : unicast, xmt/rcv/%loss = 10000/9999/0%, min/avg/max/std-dev =...
  10. Total Cluster Failure

    If you have updated all firmware and BIOS, I think it could be a problem with the C-states. Is there any possibility in the BIOS to change things around the C-states? (Have never worked with Dell hardware.) Best would be to use only C0 and C1. Maybe try this on one of your nodes. To disable it in...
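    Besides the BIOS, a common way to limit C-states is via kernel boot parameters; a sketch, assuming an Intel CPU and GRUB on Debian/Proxmox (verify the exact parameters for your hardware):

```shell
# /etc/default/grub -- limit the CPU to C0/C1:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=1 processor.max_cstate=1"
# Apply and reboot:
#   update-grub && reboot
```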
  11. Total Cluster Failure

    You did the ping over one of the VLAN networks, which is on the bond? You see that the ping to NODE7 (CEPH OSD'S & VM'S) 172.16.3.27 is much higher than to a node which only has VMs on it. Because it has the highest latency of all the Ceph nodes, I think that this host was master when you...
  12. Total Cluster Failure

    Concerning a possible network infrastructure for you: You do not need 6 switches if there are enough free ports on your existing ones. Separate the corosync links onto 2 (maybe also use cheaper switches; even 2 100 MBit/s or 2 1 GBit/s separated interfaces are better than on a bond with...
  13. Total Cluster Failure

    Concerning Ceph:
    ceph: 14.2.15-pve3, ceph-fuse: 14.2.15-pve3 on the old cluster nodes
    ceph-fuse: 12.2.11+dfsg1-2.1+b1 on the new node8
    Is node 8 connected to your 4 Ceph nodes? You should use the same versions. Do you get any warnings from Ceph? Could be a problem with the bond. You could...
  14. Total Cluster Failure

    Concerning omping: Beware, if you are already experiencing issues, the steps taken to diagnose the problem may make things worse in the short term! You have to install omping on all the machines you want to test. Then you have to fire up omping -c 10000 -i 0.001 -F -q node1 node2 node3...
  15. Total Cluster Failure

    Concerning the Ceph network you can use iperf. Concerning the corosync network you can use omping.
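    A minimal sketch of both tests, assuming iperf3 is installed and the hostnames are placeholders:

```shell
# Ceph network throughput with iperf3:
iperf3 -s                    # on the receiving node
iperf3 -c ceph-node1 -t 30   # on the sending node, 30-second test
# Corosync network latency/loss with omping (run on every node):
omping -c 600 -i 1 -q node1 node2 node3
```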
  16. Total Cluster Failure

    Even if not saturated, traffic on the network will raise the latency. If it works out and you are fine with that, leave it. But it is always highly recommended to split the links. MTU is absolutely relevant: it has to be set up everywhere with the same MTU value. Requirements: All nodes...
  17. Total Cluster Failure

    To understand it better: You added the 8th node and there was a problem with your VLAN 0 and 1. (This affected NET2 - Ceph Cluster and NET3 - Ceph Public on all nodes?!) After fixing this, you added the node again. Ceph tries to recover after the restart, and this is your main traffic on the bond...
  18. Total Cluster Failure

    I think 1 really separated link and another one as backup over a VLAN or whatever on another physical link will be much better for the moment. You can add this link, and when it works out fine (test before) you can delete the other two (nonsense) links on the bond. The cluster should be in a fine...
  19. Total Cluster Failure

    First, in a proper setup, when HA is needed (which I expect when hosting customers) there should be a separate interface (better 2 for redundancy) for corosync. When they share the same links, only separated with VLANs, you may get into trouble with latency. Same problem with Ceph. I...
  20. Total Cluster Failure

    When checking the package versions, you should do that on all nodes by comparing the outputs of pveversion -v. Is the 8th node maybe on a newer or older version than the rest of the cluster?
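    Comparing pveversion -v output across nodes can be scripted; a sketch assuming root SSH access between the nodes, with placeholder hostnames:

```shell
# Collect the version report from each node:
for node in node1 node2 node3; do
  ssh root@"$node" pveversion -v > "/tmp/pveversion-$node.txt"
done
# Any diff output means the nodes are out of sync:
diff /tmp/pveversion-node1.txt /tmp/pveversion-node2.txt
```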
