Corosync redundancy over second NIC with public IP

TheMrg

Aug 1, 2019
We have 3 nodes. Each has eth0 with a private IP and eth1 with a public IP. No more NICs are possible.

private net: 192.168.0.0 - eth0
public: different public IPs, but all in the same datacenter - eth1
We are not able to change this. 2 private networks are not possible due to datacenter restrictions.

We joined the nodes with:
pvecm add 192.168.0.1 -link0 192.168.0.2

Sadly, if eth0 has a lot of traffic we sometimes get:
Aug 01 20:43:07 storage1 corosync[3283]: [KNET ] link: host: 4 link: 0 is down
Aug 01 20:43:07 storage1 corosync[3283]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Aug 01 20:43:07 storage1 corosync[3283]: [KNET ] host: host: 4 has no active links
Aug 01 20:43:07 storage1 corosync[3283]: [TOTEM ] Token has not been received in 61 ms
Aug 01 20:43:07 storage1 corosync[3283]: [TOTEM ] Retransmit List: e89
Aug 01 20:43:07 storage1 corosync[3283]: [TOTEM ] Retransmit List: e89
Aug 01 20:43:07 storage1 corosync[3283]: [TOTEM ] Retransmit List: e89

So we would like to know how to send the corosync traffic via eth1 (public IP),
or how to set up redundancy as described in:
https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_redundancy
so that traffic goes via eth1 if eth0 is slow.

How can we do this, so that corosync does not lose its link, and sometimes kill the nodes, when eth0 traffic is high?

Thanks
 
Hi,

we have a similar setup up and running: ring0 with private IP over eth0, ring1 with public IP over eth1. We only got it working with public IPs being from the same /24 subnet, so it depends on what public IPs you have assigned.

So e.g.

103.43.75.180
103.43.75.190
103.43.75.232

works for us, while

103.43.75.180
103.43.75.190
103.43.88.232

does not.
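
To check a set of addresses against this rule before configuring the second link, a small sketch like the following may help. The helper names `prefix24` and `same_slash24` are made up for illustration, and the check only compares the first three octets; it says nothing about whether corosync will actually accept the link.

```shell
#!/usr/bin/env bash
# Print the /24 prefix (first three octets) of an IPv4 address.
prefix24() {
    local ip=$1
    echo "${ip%.*}"
}

# Return 0 if all given addresses share one /24 prefix, 1 otherwise.
same_slash24() {
    local first ip
    first=$(prefix24 "$1")
    for ip in "$@"; do
        [ "$(prefix24 "$ip")" = "$first" ] || return 1
    done
    return 0
}

# The two address sets from this post.
same_slash24 103.43.75.180 103.43.75.190 103.43.75.232 && echo "same /24"
same_slash24 103.43.75.180 103.43.75.190 103.43.88.232 || echo "different /24"
```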
 
As I wrote: we only got it working with IPs from the same /24 subnet. Using /16 with corosync 2 did not work for us, but maybe things have changed with corosync 3.
 
Hi,

Maybe you can try one of two variants:

- create a VPN over the external interface (a UDP VPN), using a /24 address range for the second ring
- use a DSCP (flash) mark for corosync traffic (this is what I use and I do not see any corosync problems, but I mark with DSCP at the switch level)

Good luck
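
For the DSCP variant, marking on the host itself instead of on the switch might look roughly like the sketch below. This is not the switch-level setup described above, and it rests on assumptions: that corosync uses UDP port 5405, that "flash" corresponds to DSCP class CS3, and that the network between the nodes actually honors the mark. The function name `dscp_rule` is made up, and the script only prints the rule so it can be reviewed before anyone runs it as root.

```shell
#!/usr/bin/env bash
# Assumptions: corosync sends on UDP port 5405 and "flash" maps to class CS3.
COROSYNC_PORT=${COROSYNC_PORT:-5405}
DSCP_CLASS=${DSCP_CLASS:-CS3}

# Build the iptables rule that would mark outgoing corosync packets.
dscp_rule() {
    echo "iptables -t mangle -A OUTPUT -p udp --dport ${COROSYNC_PORT} -j DSCP --set-dscp-class ${DSCP_CLASS}"
}

# Dry run: print the rule only; nothing is changed on the system.
dscp_rule
```

The mark only helps if the switches and routers in between prioritize on it, which is outside the host's control.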

 
Thanks. Any suggestions for a VPN?
We tested with tinc.
But it causes high CPU usage when we test with heavy traffic on this net.

2nd: Sadly we have no access to the infrastructure in our datacenter. So a VPN is the only way we see.
 
Any suggestions for a VPN?
We tested with tinc.
But it causes high CPU usage when we test with heavy traffic on this

No. You will use this VPN only for corosync, and not for anything else. Corosync does not generate a lot of traffic, so I cannot imagine how you could get high CPU usage with any UDP VPN!
 
we have ips like 52.78.. 52.76 and so on
Maybe a bit offtopic but just a suggestion:

Judging by the IP range, you are using Amazon AWS [1][2], which is not a good platform to run Proxmox on: you are trying to run a hypervisor (Proxmox/KVM) on virtualized hardware (EC2/Xen), so if the CPU load on the host is high, your virtualized network cards will start to lag, which is exactly what you are seeing.

I'd suggest looking for a different datacenter where you can rent bare metal dedicated servers - and most probably you will get 2 private networks/NICs there as well.

[1] http://geoiplookup.net/ip/52.78.0.0
[2] http://geoiplookup.net/ip/52.76.0.0
 
Thanks. With only corosync on it, it works well.
Thanks, the IPs are examples.
Corosync is all fine with tinc now.

But we have other "Problems".

We use eth0 for corosync, and eth1 is used for the Proxmox internal net (communication between hosts/nodes) and as the internal net for the KVM VMs.
So if eth1 is down, corosync still says all is fine, but it is not:
The GUI does not find the other nodes.
The internal net between the VMs is offline.
So HA VMs are also not restarted on other hosts, because corosync says all is fine. No fencing is done.

How can we solve this?
 
Good evening,

I have exactly the same problem, and I haven't found a solution.

I think this problem comes from the hosts file.

So, what is the point of several Corosync links?

Chris
 

I think we're talking at cross purposes.

ringX_addr actually specifies a corosync link address; the name "ring" is a remnant of older corosync versions that is kept for backwards compatibility.
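
To make the ringX_addr naming concrete, here is a rough sketch that lists which address belongs to which link. The nodelist excerpt is a hypothetical example (node name and addresses are placeholders borrowed from this thread, not anyone's real config), and `list_links` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical nodelist excerpt; name and addresses are placeholders.
sample_conf=$(cat <<'EOF'
nodelist {
  node {
    name: storage1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.1
    ring1_addr: 103.43.75.180
  }
}
EOF
)

# Print "link N -> address" for every ringN_addr entry read from stdin.
list_links() {
    awk '/ring[0-9]+_addr:/ {
        n = $1
        gsub(/[^0-9]/, "", n)   # keep only the link number
        print "link " n " -> " $2
    }'
}

printf '%s\n' "$sample_conf" | list_links
```

On a real node the same function could be fed the actual file, e.g. `list_links < /etc/pve/corosync.conf`.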


Corosync is all fine with tinc now.

But we have other "Problems".
To me, this sounds like a good moment to open a new thread.

How can we solve this?

Could you provide the (redacted) output of cat /etc/network/interfaces and cat /etc/pve/corosync.conf?
 
