VE Cluster with 5 servers - issue


Hi, we have a PVE cluster with 5 servers; all servers are:
Supermicro Server CSE-819U, 2x 14-core Xeon E5-2690 v4 2.6 GHz, 128 GB RAM, 9361-8i
prox1 to prox5 have the same network config, 192.168.1.150-154 (admin net).
There are 2-3 VMs running on each server with local ZFS storage; there is also shared storage and 3 backup servers attached to the cluster, and both the shared storage and the backup servers are on the same network.

The strange thing is that the cluster does not run well with all 5 servers online, because prox2 goes offline:
- the watchdog shuts prox2 down,
- prox2 boots again but does not come back online
- in the cluster I can see that prox2 is offline; I have to hard-reset the server
- we have replaced the hardware on prox2, including the network card and the mainboard
- after replacing all the hardware, the server ran standalone with Proxmox and a test VM without issues
- now, after 10 days, we have added prox2 back to the cluster, and it went offline after about 1 hour
- when we remove the server from the cluster, everything runs without issues
- the difference between prox2 and the other servers is that prox2 was freshly installed with version 8.2 and upgraded to 8.3, while the other servers were installed with version 8.0 and upgraded to 8.4

The internal network is a 10 Gbit network.
HA is activated on almost all VMs.
All Proxmox installations are default and are now running version 8.4.

Any ideas what is going on?
 
We have changed the network config; this is the current config:
We will also add a network for backup and the local server LAN; maybe I can post the final config tomorrow.
Code:
auto lo
iface lo inet loopback

auto enp1s0f0
iface enp1s0f0 inet manual

auto enp1s0f1
iface enp1s0f1 inet manual

auto enp2s0f0
iface enp2s0f0 inet manual

auto enp2s0f1
iface enp2s0f1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0 enp1s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
#adm

auto bond0.20
iface bond0.20 inet manual
#cluster

auto bond0.30
iface bond0.30 inet manual
#storage

auto bond0.3500
iface bond0.3500 inet manual
#wan

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.150/24
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#adm

auto vmbr1
iface vmbr1 inet static
    address 10.10.20.150/24
    bridge-ports bond0.20
    bridge-stp off
    bridge-fd 0
#cluster

auto vmbr2
iface vmbr2 inet static
    address 10.10.30.150/24
    bridge-ports bond0.30
    bridge-stp off
    bridge-fd 0
#storage

auto vmbr3
iface vmbr3 inet manual
    bridge-ports enp2s0f0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 3500
#wan

source /etc/network/interfaces.d/*
 
and this is the corosync config:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: prox1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.20.150
    ring1_addr: 192.168.1.150
  }
  node {
    name: prox3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.20.152
    ring1_addr: 192.168.1.152
  }
  node {
    name: prox4
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.10.20.153
    ring1_addr: 192.168.1.153
  }
  node {
    name: prox5
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 10.10.20.154
    ring1_addr: 192.168.1.154
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: site
  config_version: 15
  ip_version: ipv4
  secauth: on
  version: 2

  interface {
    ringnumber: 0
    bindnetaddr: 10.10.20.0
    mcastport: 5405
    ttl: 1
  }

  interface {
    ringnumber: 1
    bindnetaddr: 192.168.1.0
    mcastport: 5406
    ttl: 1
  }
}
 
Could you please show us the output of

```
corosync-cfgtool -n
```

and
```
pvesm status
```
 
prox2 will be added in the next few days

Code:
root@prox1:~# corosync-cfgtool -n
Local node ID 1, transport knet
nodeid: 3 reachable
   LINK: 0 udp (10.10.20.150->10.10.20.152) enabled connected mtu: 1397
   LINK: 1 udp (192.168.1.150->192.168.1.152) enabled connected mtu: 1397

nodeid: 4 reachable
   LINK: 0 udp (10.10.20.150->10.10.20.153) enabled connected mtu: 1397
   LINK: 1 udp (192.168.1.150->192.168.1.153) enabled connected mtu: 1397

nodeid: 5 reachable
   LINK: 0 udp (10.10.20.150->10.10.20.154) enabled connected mtu: 1397
   LINK: 1 udp (192.168.1.150->192.168.1.154) enabled connected mtu: 1397

root@prox1:~#

root@prox1:~# pvesm status
Name             Type     Status           Total            Used       Available        %
backup1           pbs     active      1885686784       662208896      1223477888   35.12%
backup2           pbs     active      1885706112       658345856      1227360256   34.91%
backup3           pbs     active      2129629056       393100160      1736528896   18.46%
local             dir     active        40379648        21153840        17142380   52.39%
local-lvm     lvmthin   disabled               0               0               0      N/A
nas1              nfs     active      1876048896         4664320      1871384576    0.25%
nas2              nfs     active      1876132864         4664320      1871468544    0.25%
zpool1        zfspool     active      1885863936      1041320212       844543724   55.22%
root@prox1:~#
 
We recommend having at least one dedicated NIC for Corosync; this NIC should be given directly to Corosync, without a bond. Corosync can do redundancy natively, and bonds increase latency; Corosync is extremely sensitive to latency. We also recommend giving Corosync at least two different links.

In your example both Corosync networks run on the same bond0, just on different VLANs. So if that single network is saturated, you will run into a fence.
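
For illustration, a minimal sketch of what such a dedicated link could look like in /etc/network/interfaces, assuming a spare port (here hypothetically enp2s0f1) and a dedicated Corosync subnet; no bond and no bridge, just a plain static address that only Corosync uses:

```
auto enp2s0f1
iface enp2s0f1 inet static
    address 10.10.20.150/24
# dedicated Corosync link: no bond, no bridge, no other traffic
```

The second (redundant) Corosync link can then stay on another network, for example the admin network, as ring1.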
 
We have a spare NIC on all servers; we will add it as a dedicated NIC for Corosync. Thanks for your help :)
I will add it today and post the config later.
 
OK, you can use the `corosync-cfgtool -n` command to determine whether all links are working OK (you don't want to find out that the redundant link is not set up properly when a failover is needed). If the tool reports both "enabled" and "connected" on all links, and the links are on the interfaces you defined, for all nodes, then it should be fine.

Please take a look at our documentation, which explains how to change the Corosync config [1]. Note that changes only take effect if the config version value is increased. Some specific changes might require restarting the corosync service manually, but this would appear in the system logs.

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_configuration
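
A rough sketch of that procedure, based on [1] (always keep a backup and double-check the edited file before moving it into place):

```
# keep a backup and work on a copy of the cluster-wide config
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
nano /etc/pve/corosync.conf.new   # adjust ring addresses etc. and increment config_version
# moving the edited copy into place applies it cluster-wide
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
# afterwards, verify the links
corosync-cfgtool -n
```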
 
Now the interfaces look like this:
Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto enp1s0f0
iface enp1s0f0 inet manual

auto enp1s0f1
iface enp1s0f1 inet manual

auto enp2s0f0
iface enp2s0f0 inet manual

auto enp2s0f1
iface enp2s0f1 inet manual

auto enp2s0f1.20
iface enp2s0f1.20 inet manual
# VLAN 20 on enp2s0f1

auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0 enp1s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
#adm

auto bond0.30
iface bond0.30 inet manual
#storage

auto bond0.3500
iface bond0.3500 inet manual
#wan

auto bond0.40
iface bond0.40 inet manual
#backup

auto bond0.55
iface bond0.55 inet manual
#serverlan

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.150/24
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#adm

auto vmbr1
iface vmbr1 inet static
    address 10.10.20.150/24
    bridge-ports enp2s0f1.20
    bridge-stp off
    bridge-fd 0
#cluster

auto vmbr2
iface vmbr2 inet static
    address 10.10.30.150/24
    bridge-ports bond0.30
    bridge-stp off
    bridge-fd 0
#storage

auto vmbr3
iface vmbr3 inet manual
    bridge-ports enp2s0f0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 3500
#wan

auto vmbr4
iface vmbr4 inet static
    address 10.10.40.150/24
    bridge-ports bond0.40
    bridge-stp off
    bridge-fd 0
#backup

auto vmbr5
iface vmbr5 inet manual
    bridge-ports bond0.55
    bridge-stp off
    bridge-fd 0
#serverlan

source /etc/network/interfaces.d/*
 
Are the ping statistics OK?
Code:
root@prox1:~# ping 10.10.20.152
PING 10.10.20.152 (10.10.20.152) 56(84) bytes of data.
64 bytes from 10.10.20.152: icmp_seq=1 ttl=64 time=0.229 ms
64 bytes from 10.10.20.152: icmp_seq=2 ttl=64 time=0.169 ms
64 bytes from 10.10.20.152: icmp_seq=3 ttl=64 time=0.178 ms
^C
--- 10.10.20.152 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 0.169/0.192/0.229/0.026 ms
root@prox1:~# ping 10.10.20.153
PING 10.10.20.153 (10.10.20.153) 56(84) bytes of data.
64 bytes from 10.10.20.153: icmp_seq=1 ttl=64 time=0.134 ms
64 bytes from 10.10.20.153: icmp_seq=2 ttl=64 time=0.160 ms
64 bytes from 10.10.20.153: icmp_seq=3 ttl=64 time=0.195 ms
^C
--- 10.10.20.153 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2069ms
rtt min/avg/max/mdev = 0.134/0.163/0.195/0.025 ms
root@prox1:~# ping 10.10.20.154
PING 10.10.20.154 (10.10.20.154) 56(84) bytes of data.
64 bytes from 10.10.20.154: icmp_seq=1 ttl=64 time=0.218 ms
64 bytes from 10.10.20.154: icmp_seq=2 ttl=64 time=0.208 ms
64 bytes from 10.10.20.154: icmp_seq=3 ttl=64 time=0.230 ms
^C
--- 10.10.20.154 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2045ms
rtt min/avg/max/mdev = 0.208/0.218/0.230/0.009 ms
root@prox1:~# ping 10.10.20.15
 
That looks better. Please ensure that only Corosync runs on its first link (e.g. no management traffic, VM traffic, or VM backups).

If you are still using 10.10.20.150/24 as the primary Corosync link, that subnet is currently handled by vmbr1. You could simplify this and define the IP directly on `enp2s0f1.20` instead of having a bridge on top of it. This can be done if there are no VMs using vmbr1; see the sketch below.
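
For example, if no guest uses vmbr1, the vmbr1 definition from your posted config could be replaced by something like the following sketch (address adjusted per node):

```
auto enp2s0f1.20
iface enp2s0f1.20 inet static
    address 10.10.20.150/24
#cluster (Corosync ring0 directly on the VLAN interface, no bridge)
```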
 
Hm, yes, OK, that would also be an option. We will add prox2 next week, so I hope it will work, because we have 5 more servers that we want to add to this cluster.
 
Now we have added prox2 to the cluster; this is the ping over the cluster network:
Code:
root@prox1:~# ping 10.10.20.151
PING 10.10.20.151 (10.10.20.151) 56(84) bytes of data.
64 bytes from 10.10.20.151: icmp_seq=1 ttl=64 time=0.216 ms
64 bytes from 10.10.20.151: icmp_seq=2 ttl=64 time=0.235 ms
^C
--- 10.10.20.151 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1046ms
rtt min/avg/max/mdev = 0.216/0.225/0.235/0.009 ms
root@prox1:~# ping 10.10.20.152
PING 10.10.20.152 (10.10.20.152) 56(84) bytes of data.
64 bytes from 10.10.20.152: icmp_seq=1 ttl=64 time=0.165 ms
64 bytes from 10.10.20.152: icmp_seq=2 ttl=64 time=0.168 ms
^C
--- 10.10.20.152 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1010ms
rtt min/avg/max/mdev = 0.165/0.166/0.168/0.001 ms
root@prox1:~# ping 10.10.20.153
PING 10.10.20.153 (10.10.20.153) 56(84) bytes of data.
64 bytes from 10.10.20.153: icmp_seq=1 ttl=64 time=0.207 ms
^C
--- 10.10.20.153 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.207/0.207/0.207/0.000 ms
root@prox1:~# ping 10.10.20.154
PING 10.10.20.154 (10.10.20.154) 56(84) bytes of data.
64 bytes from 10.10.20.154: icmp_seq=1 ttl=64 time=0.157 ms
64 bytes from 10.10.20.154: icmp_seq=2 ttl=64 time=0.157 ms
^C
--- 10.10.20.154 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1048ms
rtt min/avg/max/mdev = 0.157/0.157/0.157/0.000 ms
root@prox1:~# ping 10.10.20.150
PING 10.10.20.150 (10.10.20.150) 56(84) bytes of data.
64 bytes from 10.10.20.150: icmp_seq=1 ttl=64 time=0.034 ms
64 bytes from 10.10.20.150: icmp_seq=2 ttl=64 time=0.035 ms
^C
--- 10.10.20.150 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1060ms
rtt min/avg/max/mdev = 0.034/0.034/0.035/0.000 ms
root@prox1:~#
 
Now the cluster is offline again after we added prox2 to it. We are currently trying to find out why and will send logs ASAP.
Code:
The node 'prox2' in cluster 'site' failed and needs manual intervention.

The PVE HA manager will now fence 'prox2'.

Status: Try to fence node 'prox2'
Timestamp: 2025-06-05 06:45:03
Cluster Node Status:

    prox1: online [master]
    prox2: unknown
    prox3: online
    prox4: online
    prox5: online

HA Resources:
The following HA resources were running on the failed node and will be recovered to a healthy node if possible:

    vm:105 [prox2]: started

The other HA resources in this cluster are:

    vm:100 [prox1]: started
    vm:101 [prox1]: started
    vm:102 [prox3]: started
    vm:103 [prox3]: started
    vm:104 [prox1]: started
    vm:106 [prox4]: started
    vm:107 [prox4]: started
    vm:240 [prox5]: started
 
from prox1:
Code:
Jun 05 06:46:13 pve-ha-crm[3336]: service 'vm:105': state changed from 'recovery' to 'started'  (node = prox5)
Jun 05 06:46:13 pve-ha-crm[3336]: recover service 'vm:105' from fenced node 'prox2' to node 'prox5'
Jun 05 06:46:13 pve-ha-crm[3336]: service 'vm:105': state changed from 'fence' to 'recovery'
Jun 05 06:46:13 perl[3336]: notified via target `mail-to-root`
Jun 05 06:46:13 pve-ha-crm[3336]: node 'prox2': state changed from 'fence' => 'unknown'
Jun 05 06:46:13 pve-ha-crm[3336]: fencing: acknowledged - got agent lock for node 'prox2'
Jun 05 06:46:13 pve-ha-crm[3336]: successfully acquired lock 'ha_agent_prox2_lock'
Jun 05 06:45:13 perl[3336]: notified via target `mail-to-root`
Jun 05 06:45:13 pve-ha-crm[3336]: node 'prox2': state changed from 'unknown' => 'fence'
Jun 05 06:45:13 pve-ha-crm[3336]: service 'vm:105': state changed from 'started' to 'fence'
Jun 05 06:44:23 pve-ha-crm[3336]: node 'prox2': state changed from 'online' => 'unknown'
pve-cluster:
Code:
Jun 05 09:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
Jun 05 09:00:05 pmxcfs[2689]: [status] notice: received log
Jun 05 09:00:05 pmxcfs[2689]: [status] notice: received log
Jun 05 09:00:00 pmxcfs[2689]: [status] notice: received log
Jun 05 08:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
Jun 05 07:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
Jun 05 06:46:25 pmxcfs[2689]: [status] notice: received log
Jun 05 06:46:23 pmxcfs[2689]: [status] notice: received log
Jun 05 06:44:13 pmxcfs[2689]: [status] notice: dfsm_deliver_queue: queue length 49
Jun 05 06:44:13 pmxcfs[2689]: [status] notice: all data is up to date
Jun 05 06:44:13 pmxcfs[2689]: [status] notice: received all states
Jun 05 06:44:13 pmxcfs[2689]: [dcdb] notice: dfsm_deliver_queue: queue length 15
Jun 05 06:44:13 pmxcfs[2689]: [dcdb] notice: all data is up to date
Jun 05 06:44:13 pmxcfs[2689]: [dcdb] notice: sent all (0) updates
Jun 05 06:44:13 pmxcfs[2689]: [dcdb] notice: start sending inode updates
Jun 05 06:44:13 pmxcfs[2689]: [dcdb] notice: synced members: 1/2689, 3/4291, 4/2505, 5/2544
Jun 05 06:44:13 pmxcfs[2689]: [dcdb] notice: leader is 1/2689
Jun 05 06:44:13 pmxcfs[2689]: [dcdb] notice: received all states
Jun 05 06:44:13 pmxcfs[2689]: [status] notice: received sync request (epoch 1/2689/00000012)
Jun 05 06:44:13 pmxcfs[2689]: [dcdb] notice: received sync request (epoch 1/2689/00000013)
Jun 05 06:44:13 pmxcfs[2689]: [status] notice: starting data syncronisation
Jun 05 06:44:13 pmxcfs[2689]: [status] notice: members: 1/2689, 3/4291, 4/2505, 5/2544
Jun 05 06:44:13 pmxcfs[2689]: [dcdb] notice: cpg_send_message retried 1 times
Jun 05 06:44:12 pmxcfs[2689]: [dcdb] notice: starting data syncronisation
Jun 05 06:44:12 pmxcfs[2689]: [dcdb] notice: members: 1/2689, 3/4291, 4/2505, 5/2544
Jun 05 06:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
Jun 05 05:49:23 pmxcfs[2689]: [status] notice: received log
Jun 05 05:49:20 pmxcfs[2689]: [status] notice: received log
Jun 05 05:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
Jun 05 05:03:56 pmxcfs[2689]: [status] notice: received log
Jun 05 05:03:53 pmxcfs[2689]: [status] notice: received log
Jun 05 04:35:59 pmxcfs[2689]: [status] notice: received log
Jun 05 04:35:56 pmxcfs[2689]: [status] notice: received log
Jun 05 04:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
Jun 05 03:36:51 pmxcfs[2689]: [status] notice: received log
Jun 05 03:36:47 pmxcfs[2689]: [status] notice: received log
Jun 05 03:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
Jun 05 02:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
Jun 05 01:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
Jun 05 01:00:04 pmxcfs[2689]: [status] notice: received log
Jun 05 01:00:03 pmxcfs[2689]: [status] notice: received log
Jun 05 01:00:01 pmxcfs[2689]: [status] notice: received log
Jun 05 01:00:01 pmxcfs[2689]: [status] notice: received log
Jun 05 00:24:45 pmxcfs[2689]: [dcdb] notice: data verification successful
I sent a ping to prox2 with no result, so it was completely offline; I stopped the server with the hardware power button.
 