Cluster Corosync Fails when clustering

ImpactStrafe

New Member
Jan 8, 2016
Hi,

Since 4.x was released I've built about fifteen clusters of 4+ machines each. Most have worked just fine, but four or five of them have failed while I was setting up the cluster. Each time, it hangs after the "Backing up the old Database" step.

Each time it reports: "Job for corosync.service failed. See 'systemctl status corosync.service' and 'journalctl -xn' for details."

I've attached a picture of the error that I get each time it fails. It's always the same error.

These are all being built on identical hardware, so I doubt hardware is the issue. They are all on a LAN with only a switch connecting them; I've tried multiple switches and the same thing happens. The failure also appears to be random: I never know when it is going to happen.

Machine hardware:
2 x 8 Core Xeons
48 GB of RAM
10 Gb Mellanox NIC
1 Gb LAN NIC
3 x 1 TB WD Blue drives

I have one 10 Gb network for Ceph/HA, which gets set up after clustering, but I never get to that point.

These are all base Proxmox installs, and the only addition I make is some modules that allow the Mellanox card to work.

Anyone have any thoughts?
 

Attachments

  • IMAG0373.jpg
It looks like corosync is timing out. Is your network wide open? If an upstream firewall is blocking traffic to or from UDP port 5405 on any segment, you could see this issue. Also, verify that your switch supports multicast.
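One way to check the multicast suggestion is with omping, a separate tool that is not part of a base install. The sketch below is only a starting point: the `multicast_test` function name and the node names in the usage note are hypothetical, and it assumes omping has already been installed on every node.

```shell
#!/bin/sh
# Sketch: test multicast between cluster nodes with omping.
# Assumption: omping is installed on every node (it is a separate package).
# Start the test on all nodes at roughly the same time; each node should
# then report multicast replies from every other node with little or no loss.
multicast_test() {
    if ! command -v omping >/dev/null 2>&1; then
        echo "omping is not installed on this node" >&2
        return 127
    fi
    # -c 600: send 600 probes, -i 1: one probe per second, -q: summary only
    omping -c 600 -i 1 -q "$@"
}
```

Usage, with hypothetical node names: `multicast_test node1 node2 node3`. A run of this length takes about ten minutes, which may help reveal multicast that works at first and then stops, for example because of IGMP snooping settings on the switch.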
 
What about the firewall configuration on each Proxmox node? Is the default option set to 'ALLOW'?
Compare the contents of "cluster.conf" between the nodes.
Another thing to note: if you remove a node from the cluster (for example, the Proxmox2 node) and add it again later, artifacts of the original node remain in the cluster. You should never re-add a node that existed previously unless you know how to remove these artifacts, because they can cause issues.
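The configuration comparison suggested above could be sketched roughly as follows. This is an illustration under assumptions: on Proxmox VE 4.x the cluster configuration is expected at /etc/pve/corosync.conf (cluster.conf was the name used in 3.x), and the `compare_conf` helper and host names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the cluster configuration file copied from two nodes.
# Assumption: on Proxmox VE 4.x the file is /etc/pve/corosync.conf.
# Override CONF if your version stores it elsewhere.
CONF="${CONF:-/etc/pve/corosync.conf}"

# Print "same" if the two local copies are byte-identical; otherwise print
# "different" followed by a unified diff, and return non-zero.
compare_conf() {
    if cmp -s "$1" "$2"; then
        echo "same"
        return 0
    fi
    echo "different"
    diff -u "$1" "$2"
    return 1
}

# Example with hypothetical host names: copy each node's file, then compare.
#   scp root@node1:"$CONF" /tmp/node1.conf
#   scp root@node2:"$CONF" /tmp/node2.conf
#   compare_conf /tmp/node1.conf /tmp/node2.conf
```

If a node never finished joining, its copy of the file may be missing entirely, which is itself a useful data point.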
 
Sorry, I got busy during the week. These are all brand-new nodes with fresh installs and no firewall rules. This happens when I do the following:

Install Proxmox with base settings. Assign an IP address. Confirm connectivity between all the nodes by DNS name and by IP (using the hosts file). Create the cluster using pvecm create COOLNAME, then add the other nodes using pvecm add x.x.x.x.
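The steps above can be sketched as a small script. The `pvecm create` and `pvecm add` calls are the ones named in the post; the `run`, `check_resolves`, `create_cluster`, and `join_cluster` helpers are hypothetical names introduced for illustration, and the added `pvecm status` call is just a convenient way to look at the result after each step. The script defaults to a dry run, so nothing is executed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch of the setup steps. Dry run by default: commands are printed,
# not executed. Set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Return non-zero if any given host name does not resolve on this machine
# (getent consults /etc/hosts as well as DNS).
check_resolves() {
    rc=0
    for h in "$@"; do
        if ! getent hosts "$h" >/dev/null; then
            echo "cannot resolve: $h" >&2
            rc=1
        fi
    done
    return $rc
}

# On the first node: create the cluster, then show its state.
create_cluster() {
    run pvecm create "$1"
    run pvecm status
}

# On each additional node: join via the IP of an existing cluster member.
join_cluster() {
    run pvecm add "$1"
    run pvecm status
}
```

Usage: on the first node, `create_cluster COOLNAME`; on every other node, `join_cluster x.x.x.x`, where x.x.x.x is the address of the first node. Checking `pvecm status` after each join, before adding the next node, may make it easier to tell which join triggers the failure.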

I'll try to compare the cluster.conf files later this week.
 
