Hi,
I have 3 Proxmox servers with redundant network interfaces. Each server is connected to 2 different switches, so that it survives the failure of a switch (or just a link). Bonding is configured on both sides (server and switch) with LACP. (No OSDs are defined at the moment.)
If one link fails (e.g. I cut the connection to the switch), it takes a couple of seconds and the server is reachable via ping again. But the Ceph cluster never returns to full quorum.
Why does the failover work at the operating system level (tested with ping), while Ceph never becomes healthy again?
My Configuration:
ceph.conf
Code:
[global]
    auth client required = cephx
    auth cluster required = cephx
    auth service required = cephx
    cluster network = 192.168.17.0/24
    fsid = 5070e036-8f6c-4795-a34d-9035472a628d
    keyring = /etc/pve/priv/$cluster.$name.keyring
    mon allow pool delete = true
    osd journal size = 5120
    osd pool default min size = 2
    osd pool default size = 3
    public network = 192.168.17.0/24

[osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.ariel1]
    host = ariel1
    mon addr = 192.168.17.31:6789

[mon.ariel4]
    host = ariel4
    mon addr = 192.168.17.34:6789

[mon.ariel2]
    host = ariel2
    mon addr = 192.168.17.32:6789
/etc/network/interfaces (of ariel1; all IPs of ariel2 end in 32, those of ariel4 in 34)
eth0, eth2 and eth4 are connected to switch-1
eth1, eth3 and eth5 are connected to switch-2
Code:
auto lo
iface lo inet loopback

iface eth0 inet manual
iface eth1 inet manual
iface eth2 inet manual
iface eth3 inet manual
iface eth4 inet manual
iface eth5 inet manual

auto bond0
iface bond0 inet manual
    slaves eth0 eth1
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer3+4
#frontside

auto bond1
iface bond1 inet static
    address 192.168.16.31
    netmask 255.255.255.0
    slaves eth2 eth3
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer3+4
    pre-up (ifconfig eth2 mtu 8996 && ifconfig eth3 mtu 8996)
    mtu 8996
#corosync

auto bond2
iface bond2 inet static
    address 192.168.17.31
    netmask 255.255.255.0
    slaves eth4 eth5
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer3+4
    pre-up (ifconfig eth4 mtu 8996 && ifconfig eth5 mtu 8996)
    mtu 8996
#ceph

auto vmbr0
iface vmbr0 inet static
    address 192.168.19.31
    netmask 255.255.255.0
    gateway 192.168.19.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
Pings to all IPs in network 192.168.17.0/24 (.31, .32, .34) succeed from all servers.
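A sketch of how this reachability check could be scripted, for anyone who wants to repeat it. It is not the exact command used for the test above. It assumes the Linux iputils `ping` (for the `-M do` and `-W` options), and it adds a second ping with a full-size, non-fragmentable payload, because a default ping only sends small packets and so says nothing about the 8996-byte MTU set on bond2. The `check_host` function and the `PING_CMD` variable are names made up for this example.

```shell
#!/bin/sh
# Sketch: ping every address on the Ceph network, once with the default
# payload and once with a payload that fills the configured MTU.
MTU=8996                      # value from the bond2 stanza
PAYLOAD=$((MTU - 28))         # minus 20 bytes IPv4 header and 8 bytes ICMP header
PING_CMD=${PING_CMD:-ping}    # hypothetical override hook; assumes iputils ping

check_host() {
    # $1 = address to test; prints "<address> ok" or "<address> FAILED"
    if "$PING_CMD" -c 2 -W 1 "$1" >/dev/null 2>&1 &&
       "$PING_CMD" -M do -s "$PAYLOAD" -c 2 -W 1 "$1" >/dev/null 2>&1; then
        echo "$1 ok"
    else
        echo "$1 FAILED"
    fi
}

# Last octets of ariel1, ariel2 and ariel4 on the 192.168.17.0/24 network
for last in 31 32 34; do
    check_host "192.168.17.$last"
done
```

Running it on each of the three servers, both before and after pulling a link, would show whether large frames still pass over the remaining bond member.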
ceph status
Code:
  cluster:
    id:     5070e036-8f6c-4795-a34d-9035472a628d
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ariel1,ariel2,ariel4
    mgr: ariel2(active), standbys: ariel4
    osd: 0 osds: 0 up, 0 in
Now I pull the cable out of eth4 on ariel4. After waiting a couple of seconds, ping works again without any errors.
But the Ceph cluster does not recover:
Code:
root@ariel1:~# ceph status
  cluster:
    id:     5070e036-8f6c-4795-a34d-9035472a628d
    health: HEALTH_WARN
            1/3 mons down, quorum ariel1,ariel2

  services:
    mon: 3 daemons, quorum ariel1,ariel2, out of quorum: ariel4
    mgr: ariel2(active), standbys: ariel4
    osd: 0 osds: 0 up, 0 in
Is some configuration missing, or is this a bug?
Please help.
Kind regards,
Harry