Hello,
Since we've seen (and fixed) "corosync [TOTEM ] Retransmit List: XXXX" errors in /var/log/cluster/corosync.log on several Proxmox VE clusters, and the information on the internet about the solution is not always very clear, I thought it would be a good idea to share how we fixed them. I strongly recommend fixing these errors, even if you only see a couple of them and have no problems right now. I've seen a Proxmox VE cluster run for 137+ days without any noticeable issues (apart from some retransmit errors), until one day one node produced a lot of totem retransmits, the other nodes in the cluster stopped seeing it and fenced it.
1. First of all, be absolutely sure multicast traffic is working fine. Please see: https://pve.proxmox.com/wiki/Multicast_notes. I've had the best results with an IGMP querier enabled per VLAN on the switch(es) and the querier disabled on the Linux bridge(s) of the Proxmox VE nodes. In a HA setup, be sure to configure the IGMP querier on at least 2 switches: if it only runs on 1 switch and that switch fails, your whole cluster will fail within a couple of minutes. To disable the querier on the bridge, add the following to the bridge stanza in /etc/network/interfaces:
Code:
post-up ( echo 0 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
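For context, a complete bridge stanza in /etc/network/interfaces would then look roughly like the sketch below (vmbr0, eth0 and the addresses are just placeholders for your own setup). You can verify the setting afterwards with "cat /sys/class/net/vmbr0/bridge/multicast_querier", which should print 0:
Code:
auto vmbr0
iface vmbr0 inet static
        address 192.0.2.11
        netmask 255.255.255.0
        gateway 192.0.2.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        # disable the bridge's own IGMP querier; the switch(es) act as querier per VLAN
        post-up ( echo 0 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )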
If you use the Proxmox VE built-in firewall, be sure to allow multicast traffic. Once everything is configured, test multicast traffic with omping, as shown below.
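The test from the Multicast notes wiki page looks something like this (run it on all nodes at the same time and replace node1 node2 node3 with your own node names); the multicast loss reported at the end should be 0%:
Code:
omping -c 600 -i 1 -q node1 node2 node3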
2. If you are sure multicast traffic is working fine but still get some of these errors AND you have the default system/network MTU of 1500, consider changing your system/network MTU to 9000 (enable jumbo frames on the switch(es) first and then on the network interfaces of your Proxmox VE nodes in /etc/network/interfaces). If this isn't possible and you need to keep the default MTU of 1500, edit the /etc/pve/cluster.conf file (please read the instructions at http://pve.proxmox.com/wiki/Fencing#General_HowTo_for_editing_the_cluster.conf first and don't forget to increase config_version!) and add the section:
Code:
<totem netmtu="1480"/>
For example:
Code:
<?xml version="1.0"?>
<cluster name="clustername" config_version="2">
  <totem netmtu="1480"/>
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>
  <clusternodes>
    <clusternode name="node1" votes="1" nodeid="1"/>
    <clusternode name="node2" votes="1" nodeid="2"/>
    <clusternode name="node3" votes="1" nodeid="3"/>
  </clusternodes>
</cluster>
If you have (already) changed your MTU to 9000, you don't need to change the netmtu in your cluster.conf: it will stay at the corosync default of 1500, which is fine in that case.
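For reference, a jumbo-frame setup in /etc/network/interfaces could look roughly like the sketch below (vmbr0 and eth0 are again just placeholder names, and the switch ports have to be configured for jumbo frames first). You can verify the effective MTU afterwards with "ip link show vmbr0":
Code:
auto vmbr0
iface vmbr0 inet static
        address 192.0.2.11
        netmask 255.255.255.0
        gateway 192.0.2.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        # raise the MTU on the physical port first, then on the bridge itself
        post-up ip link set eth0 mtu 9000
        post-up ip link set vmbr0 mtu 9000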
3. If you are sure multicast traffic is working fine, have an MTU of 9000 (or changed the netmtu to 1480 in cluster.conf, see above), but still get some of these errors AND some nodes in your cluster are noticeably slower than others, you may consider changing corosync's window size. The default is 50. Never go higher than 256000 / MTU, so with an MTU of 1500 your maximum window size is 170 (256000 / 1500 = 170). This should be safe (however, I didn't test this!) with a system MTU of 9000 as well, because the corosync netmtu will stay at the default of 1500. If you also increase the corosync netmtu (for example to 8980), your maximum window size drops to 28 (which is lower than the default value!). I don't recommend doing this and have not yet seen a configuration where it was needed: just leave netmtu at the default when using a system MTU of 9000, so your window size can stay at the default (or a bit higher) as well. In general you should be very reserved about changing your window size; the default of 50 is a safe value in most cases (there is a small calculation sketch after the example below). However, if needed you can do this as follows:
Edit the /etc/pve/cluster.conf file (please read the instructions at http://pve.proxmox.com/wiki/Fencing#General_HowTo_for_editing_the_cluster.conf first and don't forget to increase config_version!) and add the section:
Code:
<totem window_size="170"/>
For example:
Code:
<?xml version="1.0"?>
<cluster name="clustername" config_version="2">
  <totem window_size="170"/>
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>
  <clusternodes>
    <clusternode name="node1" votes="1" nodeid="1"/>
    <clusternode name="node2" votes="1" nodeid="2"/>
    <clusternode name="node3" votes="1" nodeid="3"/>
  </clusternodes>
</cluster>
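To double-check the 256000 / MTU limit for your own values, here is a quick shell calculation (the numbers are just the examples used in this post):
Code:
# maximum window_size = 256000 / netmtu
echo $((256000 / 1500))   # 170 -> with the default corosync netmtu of 1500
echo $((256000 / 8980))   # 28  -> with netmtu raised to 8980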
4. If the steps above didn't help, I suggest you check your network drivers and hardware: is there a node with a very high load, a configuration error on one or more switch ports, a bad network cable or card, etc.? In most of these cases you will also see multicast traffic being (partially) dropped when testing with omping.
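A quick way to spot a bad network cable, card or driver is to look at the error and drop counters on every node (eth0 below is just a placeholder for your cluster interface); counters that keep increasing usually point to the faulty node or switch port:
Code:
ip -s link show eth0                       # RX/TX errors and drops for the interface
ethtool -S eth0 | grep -i -E 'err|drop'    # per-NIC statistics (availability depends on the driver)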
I hope the above information helps someone.