Problems with cluster...

I'm still having the issue, although I have not made any changes either. If it worked under 3.2, why isn't the same config working under 3.3? Also, two of the three nodes work fine. Why aren't they having the same issue?
 

I would suggest using tcpdump to debug your problem. Try capturing traffic for the multicast address (shown by pvecm status) and see whether packets stop arriving or being sent.
 
Because with 3.3 you updated corosync too. I would suggest using tcpdump. Try capturing multicast packets (the address is shown by pvecm status) while the node is in the failed state. Also try omping while the node is failed to see whether it stops receiving the heartbeat. As a final workaround for a 3-node cluster, you can use unicast instead of multicast and see if that helps. The nature of your issue is not clear, and all of these steps will help narrow down where exactly the problem is.
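The tcpdump and omping checks suggested above could look roughly like the sketch below. The helper names, the interface vmbr0, the address 239.192.0.1, and the node names are placeholders and not values from this thread; use the multicast address that pvecm status reports on your cluster.

```shell
#!/bin/sh
# Sketch only: function names, interface, and addresses are hypothetical.

# Build a tcpdump filter matching traffic to or from the cluster multicast address.
mcast_filter() {
    printf 'host %s' "$1"
}

# Capture on the interface that carries cluster traffic.
# Run as root on the node that drops out of the cluster.
capture_cluster_mcast() {
    tcpdump -n -i "$1" "$(mcast_filter "$2")"
}

# Example (not run automatically):
#   capture_cluster_mcast vmbr0 239.192.0.1
#   omping -c 600 -i 1 -q node1 node2 node3   # start on every node at about the same time
```

If the capture goes quiet a few minutes after the node joins, that would point at multicast being filtered somewhere between the nodes rather than at corosync itself.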
 
Symptoms were the same as in another thread, although it was in regard to running a mixed cluster of 3.10 and 2.6.32 kernels, so I thought I would give the fix a try, and it worked. However, I'm not running a mixed-kernel cluster; everything is 3.10-4. Anyway, this worked and fixed the issue. But, WHY must this be done?
find /sys/devices/virtual/net -name 'multicast_snooping' -exec sh -c 'echo 0 > {}' \;
find /sys/devices/virtual/net -name 'multicast_querier' -exec sh -c 'echo 0 > {}' \;
service cman stop; service cman start; service pve-cluster restart
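Before changing anything, it may help to record what the bridges are currently set to, so the change can be reverted later. The following is a read-only sketch that lists the same sysfs files the commands above write to. The function name and the optional root-directory argument are additions for illustration, not part of the original fix.

```shell
#!/bin/sh
# Print the current multicast_snooping and multicast_querier value for every bridge.
# The optional first argument overrides the sysfs root (hypothetical, handy for testing).
show_bridge_mcast() {
    root=${1:-/sys/devices/virtual/net}
    find "$root" \( -name multicast_snooping -o -name multicast_querier \) 2>/dev/null |
        sort |
        while read -r f; do
            # One line per setting, in the form path=value
            printf '%s=%s\n' "$f" "$(cat "$f")"
        done
}

show_bridge_mcast
```

Note that values written to sysfs do not survive a reboot, so the setting has to be reapplied after each boot if it turns out to be the fix.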
 
If I remember correctly, multicast_querier is disabled by default in kernel 3.10 because of bugs and incompatibility with Cisco switches.

You should have an IGMP querier somewhere on your physical switches to manage multicast (or disable snooping on vmbrX if your switches are not manageable and don't support an IGMP querier).
 
Try downgrading the kernel to 2.6.32. According to spirit's post, kernel 3.10 has multicast_querier disabled. Sorry, I forgot to mention that I use the 2.6.32 kernel with PVE 3.3. Maybe that's why I don't have this issue after the upgrade.
 
