Random fencing

stann · Monday at 11:16

Hello,

We are experiencing random fencing on our 4 PVE nodes + 1 node tie-breaker (stretched cluster). It happened for the third time.

Our topology:

- DC1: Node 1 and Node 2
- DC2: Node 3 and Node 4
- DC3: Node 5. This node acts as a tie-breaker. It is connected via a VPN because it is our only option to have a 3rd DC.
We have no issues with Ceph.

Context and hardware history:

- Fencing 1: We do not have the logs, but it looked very similar to fencing 2.
- Fencing 2 (June 15): Node 1 and Node 2 (in DC1) fenced and rebooted. Node 3 and Node 4 (in DC2) stayed online. At that time, Node 5 was a mini PC that suffered from frequent hard freezes.
- Hardware change: Between June 15 and June 26, we replaced the mini PC with a standard PC to resolve the hardware freezes.
- Fencing 3 (June 26): Node 1 and Node 2 (in DC1) fenced and rebooted. Node 3 and Node 4 (in DC2) stayed online. The logs for this third event are very different from the second one.

The problem: Recently, Node 1 and Node 2 (in DC1) fenced and rebooted. Node 3 and Node 4 (in DC2) did not fence.

Looking at the logs, we assume this is caused by an MTU mismatch due to the VPN encapsulation for Node 5.

Bash:

Jun 15 15:29:00 NODE-1 corosync[2459]:   [KNET  ] pmtud: possible MTU misconfiguration detected. kernel is reporting MTU: 1500 bytes for host 5 link 0 but the other node is not acknowledging packets of this size.
Jun 15 15:29:00 NODE-1 corosync[2459]:   [KNET  ] pmtud: This can be caused by this node interface MTU too big or a network device that does not support or has been misconfigured to manage MTU of this size, or packet loss. knet will continue to run but performances might be affected.

(I have also attached the error logs from Node 1, Node 2 and Node 5 for the second and third fencing to this post.)
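
To check how often this warning appears across the attached logs, and for which host and link, the lines can be filtered with a small script. This is only a sketch: the regular expression is written against the exact message wording shown above, and the function name `find_mtu_warnings` is made up for illustration.

```python
import re

# Matches the knet PMTUd warning exactly as it is worded in the excerpt above.
PMTUD_RE = re.compile(
    r"pmtud: possible MTU misconfiguration detected\. "
    r"kernel is reporting MTU: (?P<mtu>\d+) bytes "
    r"for host (?P<host>\d+) link (?P<link>\d+)"
)


def find_mtu_warnings(lines):
    """Return a (host, link, mtu) tuple for every PMTUd warning in `lines`."""
    hits = []
    for line in lines:
        match = PMTUD_RE.search(line)
        if match:
            hits.append(
                (int(match["host"]), int(match["link"]), int(match["mtu"]))
            )
    return hits


# The log line from Node 1 quoted above.
sample = [
    "Jun 15 15:29:00 NODE-1 corosync[2459]:   [KNET  ] pmtud: possible MTU "
    "misconfiguration detected. kernel is reporting MTU: 1500 bytes for "
    "host 5 link 0 but the other node is not acknowledging packets of this size.",
]
print(find_mtu_warnings(sample))
```

If every hit names host 5, that would support the assumption that only the VPN path to the tie-breaker is affected.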

We plan to create a dedicated 1 Gbps network strictly for Corosync. To solve the VPN overhead issue, we intend to lower the MTU to around 1400 on these dedicated Corosync interfaces across all 5 nodes.

Will setting a lower MTU globally for Corosync be enough to stabilize the heartbeat over the VPN and prevent the watchdog from fencing the nodes? Or is there anything else we need to configure?

Thank you for your help.

j.theisen · Monday at 14:03

Hi @paname

thanks for posting on the forum!

I assume you have had a look at the guide on stretch clusters [1]; if not, here it is for reference.

From the logs, it seems your main problem is the pve-ha-lrm losing its lock on the pmxcfs and being unable to renew it.
This might be due to the MTU size mismatch, which is a good place to start.

What kind of VPN are you using to connect the tie-breaker?
If it is IPsec, are you using IKEv1 or IKEv2, and if the latter, are you using rekeying?

Yours sincerely
Jonas

[1] https://pve.proxmox.com/wiki/Stretch_Cluster

stann · Tuesday at 09:30

Hello Jonas,

Thanks for your answer. Yes I did take a look at the stretch cluster guide.

We will add the dedicated link with a lower MTU soon, probably today or tomorrow.

Regarding the VPN, I just checked our FortiGate firewall. We use IPsec IKEv2 and the Rekeying is active (a key lifetime is configured).

Best regards,

Stan

(I changed my username)

j.theisen · Tuesday at 10:32

stann said:
Regarding the VPN, I just checked our FortiGate firewall. We use IPsec IKEv2 and the Rekeying is active (a key lifetime is configured).

Is Dead Peer Detection also enabled?
Since the communication outage seems to have lasted over a minute, it might be a good idea to raise the check frequency there:

Code:

Jun 26 08:56:15 NODE-5-TIE-BREAKER corosync[995]:   [QUORUM] Sync joined[4]: 1 2 3 4
Jun 26 08:56:15 NODE-5-TIE-BREAKER corosync[995]:   [TOTEM ] A new membership (1.7a9) was formed. Members joined: 1 2 3 4
Jun 26 08:56:17 NODE-5-TIE-BREAKER pmxcfs[842]: [status] notice: cpg_send_message retry 10
Jun 26 08:56:18 NODE-5-TIE-BREAKER pmxcfs[842]: [status] notice: cpg_send_message retry 20
[...]
Jun 26 08:57:30 NODE-5-TIE-BREAKER pmxcfs[842]: [status] notice: cpg_send_message retry 40
Jun 26 08:57:31 NODE-5-TIE-BREAKER pmxcfs[842]: [dcdb] notice: cpg_join retry 700
Jun 26 08:57:31 NODE-5-TIE-BREAKER corosync[995]:   [QUORUM] Sync members[3]: 3 4 5
Jun 26 08:57:31 NODE-5-TIE-BREAKER corosync[995]:   [QUORUM] Sync joined[2]: 3 4

In any case this setup is less than ideal, since the VPN connected node is not only a tie-breaker where a QDevice would suffice, but rather a fully-functional node.
This technically also means that it has to fulfill the requirements of LAN-level latencies for corosync to work properly.
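
The length of the gap can be read off the timestamps in the excerpt above. As a rough sketch, the difference between the first membership line and the line where the membership shrinks to nodes 3, 4 and 5 can be computed like this. The function name `outage_seconds` is made up, the year is only a placeholder because the log lines carry none, and `%b` assumes English month names.

```python
from datetime import datetime

# The syslog timestamps have no year, so a placeholder year is prepended
# before parsing. Assumption: both stamps fall in the same year.
PLACEHOLDER_YEAR = "2000"
FMT = "%Y %b %d %H:%M:%S"


def outage_seconds(first_stamp: str, second_stamp: str) -> float:
    """Seconds elapsed between two syslog-style timestamps."""
    t0 = datetime.strptime(f"{PLACEHOLDER_YEAR} {first_stamp}", FMT)
    t1 = datetime.strptime(f"{PLACEHOLDER_YEAR} {second_stamp}", FMT)
    return (t1 - t0).total_seconds()


# First and last corosync lines of the excerpt above.
gap = outage_seconds("Jun 26 08:56:15", "Jun 26 08:57:31")
print(f"gap between membership changes: {gap:.0f} s")
```

For the two timestamps above this gives 76 seconds, which matches "over a minute".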

stann · Tuesday at 10:43

Dead Peer Detection is set to "On Demand".

Regarding your last point, we are completely aware that this is not the ideal setup. However, we absolutely need a Ceph Monitor in this 3rd datacenter.

And the guide [1] says: "The use of a Proxmox VE installation is strongly recommended, as it allows the use of the Proxmox VE Ceph tools to set up the monitor."

[1] https://pve.proxmox.com/wiki/Stretch_Cluster#Tie-Breaker_Node

j.theisen · Tuesday at 10:55

stann said:
Dead Peer Detection is set to "On Demand".

This seems to be the correct setting.
What are the retry counts and intervals set to?

stann said:
And the guide [1] says : "The use of a Proxmox VE installation is strongly recommended, as it allows the use of the Proxmox VE Ceph tools to set up the monitor."

This is correct and I don't argue with that decision.
However the guide also states that it has to be ensured "that all Corosync links and the Ceph (public) network are always accessible with low bandwidth"

So you should probably also look into your VPN configuration, if and why the connection drops for some reason.

stann · Tuesday at 11:00

j.theisen said:
This seems to be the correct setting.
What are the retry counts and intervals set to?

DPD retry count : 3
DPD retry interval : 20s

j.theisen said:
This is correct and I don't argue with that decision.
However the guide also states that it has to be ensured "that all Corosync links and the Ceph (public) network are always accessible with low bandwidth"

So you should probably also look into your VPN configuration, if and why the connection drops for some reason.

Ok I will ask my network administrator to check the VPN configuration. Thanks.

j.theisen · Tuesday at 11:04

stann said:
DPD retry count : 3
DPD retry interval : 20s

OK, this seems a little high for this application. You might also ask your network admin to lower the interval.
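
To put the quoted DPD values in perspective, here is a rough back-of-the-envelope sketch. It assumes a dead peer is declared after roughly retry count times retry interval, which is a simplification of how the firewall actually behaves. The Corosync side uses the token timeout formula from the corosync.conf man page, with defaults of 3000 ms for `token` and 650 ms for `token_coefficient`; those defaults are an assumption here and should be checked against the cluster's own corosync.conf.

```python
def dpd_detection_seconds(retry_count: int, retry_interval_s: int) -> int:
    """Approximate time until DPD declares the peer dead (simplified model)."""
    return retry_count * retry_interval_s


def corosync_token_ms(nodes: int, token_ms: int = 3000,
                      coefficient_ms: int = 650) -> int:
    """Effective token timeout: token + (nodes - 2) * token_coefficient.

    The coefficient only applies with at least 3 nodes in the nodelist.
    The default values are assumptions; verify them on the cluster.
    """
    if nodes < 3:
        return token_ms
    return token_ms + (nodes - 2) * coefficient_ms


dpd = dpd_detection_seconds(retry_count=3, retry_interval_s=20)
token = corosync_token_ms(nodes=5)
print(f"DPD detection time:     ~{dpd} s")
print(f"Corosync token timeout: ~{token / 1000:.2f} s")
```

Under these assumptions the tunnel could be down for about a minute before the firewall reacts, while Corosync already notices a missing node after about five seconds.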

stann · Tuesday at 11:09

To what values should we lower it? I need to be careful because this VPN also carries regular office traffic.

j.theisen · Tuesday at 11:13

Well this depends on the underlying internet connection and the reason the VPN connection drops in the first place (if it even drops at all).
In case there seems to be no outage of the VPN, you might want to consider applying QoS rules to the Corosync traffic. This would be just a band-aid but might also improve the situation.

stann · Tuesday at 11:18

In case the VPN itself is stable, how would you recommend configuring this QoS rule for Corosync?

stann · Tuesday at 12:03

@j.theisen

We're going to add the dedicated Corosync link today and change the MTU.

Should we just lower the MTU in the Proxmox GUI?

Also, on the VPN side we don't have any logs pointing to an event that could have led to a fencing, so we are uncertain whether the VPN drops or not.

guruevi · Tuesday at 16:19

If you’re using an IPsec VPN, you are encapsulating packets over whatever link you have. If your IPsec tunnel runs over a link with a 1500-byte MTU, the IPsec header takes about 60 bytes if I’m not mistaken, so your inner packets need to be smaller to leave room for it. There is a way to test this with ping, but it is safer to check your firewall configuration. Over a WAN you cannot even assume a 1500-byte MTU; many commercial Metro Ethernet links only give you about 1400 bytes or less.

stann · Tuesday at 16:23

I tried pinging with different packet sizes.

After many attempts, a ping payload of 1410 bytes works, which comes to 1438 bytes once the 8-byte ICMP header and the 20-byte IP header are added.

So the IPsec overhead should be about 62 bytes (1500 - 1438).
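
The arithmetic behind these numbers can be sketched as follows. It assumes IPv4 without IP options (20-byte header) plus the 8-byte ICMP echo header, and it assumes the pings were sent with fragmentation disabled (on Linux, for example, `ping -M do -s <payload>`). The function names are made up for illustration.

```python
ICMP_HEADER = 8   # bytes, ICMP echo header
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)


def packet_size(ping_payload: int) -> int:
    """Size of the IP packet produced by a ping with the given payload."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def tunnel_overhead(link_mtu: int, largest_working_payload: int) -> int:
    """Bytes the tunnel consumes, given the largest payload that passes."""
    return link_mtu - packet_size(largest_working_payload)


def max_ping_payload(mtu: int) -> int:
    """Largest ping payload that fits into one packet of the given MTU."""
    return mtu - ICMP_HEADER - IPV4_HEADER


print(packet_size(1410))            # packet size for the payload that works
print(tunnel_overhead(1500, 1410))  # overhead taken by the VPN
print(max_ping_payload(1400))       # payload to test for a planned MTU of 1400
```

With a 1410-byte payload the packet is 1438 bytes, leaving 62 bytes of overhead on a 1500-byte link. The planned MTU of 1400 stays below that limit, and it can be verified with a 1372-byte ping payload.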
