Proxmox 6.2-10 Ceph cluster reboots if one node is shut down or rebooted

dan.ger

Hello,

after updating the Proxmox nodes a few days ago to the latest Ceph version, something strange happens: if a node is rebooted, all HA cluster nodes reboot.

In the log I saw something like this on a node that was not rebooted:

Code:
Jul 23 13:36:28 hyperx-01 ceph-mon[2793]: 2020-07-23 13:36:28.792 7fb71a14c700 -1 mon.pve-01@0(electing) e5 failed to get devid for : fallback method has serial ''but no model
Jul 23 13:36:32 hyperx-01 pvestatd[2980]: got timeout
Jul 23 13:36:32 pve-01 ceph-mon[2793]: 2020-07-23 13:36:32.528 7fb71a14c700 -1 mon.pve-01@0(electing) e5 get_health_metrics reporting 6 slow ops, oldest is auth(proto 0 34 bytes epoch 0)
Jul 23 13:36:32 pve-01 pvestatd[2980]: status update time (5.172 seconds)
Jul 23 13:36:35 pve-01 ceph-osd[2862]: 2020-07-23 13:36:35.052 7f90f53bec80 -1 unable to find any IPv4 address in networks '10.10.10.0/24' interfaces ''
Jul 23 13:36:35 pve-01 ceph-osd[2858]: 2020-07-23 13:36:35.052 7fe504797c80 -1 unable to find any IPv4 address in networks '10.10.10.0/24' interfaces ''
Jul 23 13:36:35 pve-01 ceph-osd[2862]: 2020-07-23 13:36:35.052 7f90f53bec80 -1 unable to find any IPv4 address in networks '10.10.10.0/24' interfaces ''
Jul 23 13:36:35 pve-01 ceph-osd[2858]: 2020-07-23 13:36:35.052 7fe504797c80 -1 unable to find any IPv4 address in networks '10.10.10.0/24' interfaces ''
Jul 23 13:36:35 pve-01 ceph-osd[2864]: 2020-07-23 13:36:35.060 7f5bc1187c80 -1 unable to find any IPv4 address in networks '10.10.10.0/24' interfaces ''
Jul 23 13:36:35 pve-01 ceph-osd[2864]: 2020-07-23 13:36:35.064 7f5bc1187c80 -1 unable to find any IPv4 address in networks '10.10.10.0/24' interfaces ''
Jul 23 13:36:35 pve-01 ceph-osd[2847]: 2020-07-23 13:36:35.076 7f8466cbfc80 -1 unable to find any IPv4 address in networks '10.10.10.0/24' interfaces ''
Jul 23 13:36:35 pve-01 ceph-osd[2847]: 2020-07-23 13:36:35.076 7f8466cbfc80 -1 unable to find any IPv4 address in networks '10.10.10.0/24' interfaces ''

Nodes are:
pve-01: 10.10.10.1
pve-02: 10.10.10.2
pve-03: 10.10.10.3

Each node has two bonds of 10 GbE NICs (MTU 9000), one for the Ceph cluster and one for the default network. I can ping each host in the cluster from every node.
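A plain ping does not catch an MTU mismatch, because small packets fit into 1500-byte frames either way. As a rough sketch (using the node IPs from above; 8972 bytes = 9000 minus 28 bytes of IP/ICMP headers), a ping with the don't-fragment bit set would verify that jumbo frames actually pass end to end on the Ceph bond:

Code:
# from pve-01: test jumbo frames towards the other nodes
# -M do sets the don't-fragment bit, -s 8972 fills a 9000-byte frame
ping -M do -s 8972 -c 3 10.10.10.2
ping -M do -s 8972 -c 3 10.10.10.3
# check the configured MTU of the Ceph bond
ip link show bond1

If a node still runs MTU 1500 somewhere on the path, the large ping fails (locally with "message too long" or by timing out) while a normal ping keeps working.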

Any suggestions?

 
What is the network config of pve-01? And what does the ceph.conf look like?
 
Here is the network config of pve-01 (it is the same for pve-02/pve-03):

Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
    mtu 9000

auto eth1
iface eth1 inet manual
    mtu 9000

auto eth2
iface eth2 inet manual
    mtu 9000

auto eth3
iface eth3 inet manual
    mtu 9000

auto bond0
iface bond0 inet manual
    bond-slaves eth0 eth1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    mtu 9000

auto bond1
iface bond1 inet static
    address 10.10.10.1/24
    bond-slaves eth2 eth3
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address xx.xxx.xxx.xxx/27
    gateway xx.xxx.xxx.xxx
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    mtu 9000
#Wan

ceph.conf:
Code:
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 10.10.10.0/24
     fsid = 15ebfe6d-db76-4ed3-bf14-3a31243ca94e
     mon allow pool delete = true
     osd journal size = 5120
     osd pool default min size = 1
     osd pool default size = 2
     public network = 10.10.10.0/24
     mon_host = 10.10.10.1 10.10.10.2 10.10.10.3

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve-01]
     host = pve-01
     mon addr = 10.10.10.1:6789

[mon.pve-02]
     host = pve-02
     mon addr = 10.10.10.2:6789

[mon.pve-03]
     host = pve-03
     mon addr = 10.10.10.3:6789
 
iface bond1 inet static address 10.10.10.1/24
Try to set the netmask in the old style, as a separate config line. Or maybe it doesn't see the bond interface at all.
 
Hmm, really? It was working for more than a year until the upgrade. I do not think that is really the issue, since Proxmox uses Debian Buster under the hood.
 
It's the ceph-osd service that is complaining.
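The error indicates that ceph-osd could not find a local IPv4 address inside the configured cluster/public network (10.10.10.0/24) when it started. A quick check on the affected node could look like this (a sketch, using the network from the posted ceph.conf):

Code:
# is there currently an address in the Ceph network on this host?
ip -4 addr show | grep '10\.10\.10\.'
# which interface carries the route for the cluster/public network?
ip -4 route show 10.10.10.0/24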
 
Changing from

Code:
iface bond1 inet static
    address 10.10.10.1/24

to
Code:
iface bond1 inet static
    address 10.10.10.1
    netmask 255.255.255.0

has no effect; all nodes reboot, same as before. Any other suggestions?
 
has no effect; all nodes reboot, same as before. Any other suggestions?
ATM, I would think it could be the bond interface, since the interface field in the log message is empty.
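If the bond is suspect, its state can be inspected directly; a small sketch using the interface names from the posted config:

Code:
# LACP / slave state of the Ceph bond
cat /proc/net/bonding/bond1
# MTU and operational state of the bond and its members
ip -d link show bond1
ip link show eth2
ip link show eth3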
 
Hmm, I can reproduce it: rebooting pve-01 or pve-02 is not a problem, everything is fine, but rebooting pve-03 causes pve-01 and pve-02 to reboot.
 
Hmm, I can reproduce it: rebooting pve-01 or pve-02 is not a problem, everything is fine, but rebooting pve-03 causes pve-01 and pve-02 to reboot.
Do you reboot both at the same time?
 
The nodes are connected to a 10 GbE switch with configured trunks, so cabling shouldn't be the problem. Ceph is working without issues; only the monitors seem to restart the nodes.
 
Do you reboot both at the same time?


No. I first restart pve-03, wait roughly 30 seconds, and then pve-01 and pve-02 restart automatically... It is a bit annoying and drives me round the bend, because this is a production system...

If I only restart pve-01, everything is fine and all other nodes stay online. The same happens if I restart pve-02.
 
If those nodes get fenced, then you should be able to find out from the logs (syslog/messages) why that was the case. I'd assume some issue with your Ceph/OSD config. I was actually wondering about this config:

Code:
osd pool default min size = 1
osd pool default size = 2

when you have a 3-node setup, since for a 3-node cluster it should probably look more like this:

Code:
osd pool default min size = 2
osd pool default size = 3

This way, you can lose one Ceph node and still run the Ceph cluster in a degraded state; once you lose the second node, the Ceph pool becomes read-only.
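For an existing pool, those values can be checked and adjusted at runtime; a sketch, where <poolname> is a placeholder for the actual RBD pool:

Code:
# current replication settings
ceph osd pool get <poolname> size
ceph osd pool get <poolname> min_size
# recommended for a 3-node cluster: 3 replicas, stay writable with 2 left
ceph osd pool set <poolname> size 3
ceph osd pool set <poolname> min_size 2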
 
No. I first restart pve-03, wait roughly 30 seconds, and then pve-01 and pve-02 restart automatically... It is a bit annoying and drives me round the bend, because this is a production system...
As @budy said, the nodes get fenced. Best check the syslog and remove all HA resources (no fencing). This points to a network issue, which would fit the Ceph message as well.
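A sketch of how to check the fencing side on a surviving node (the timestamp matches the log above; vm:100 is a placeholder resource ID):

Code:
# HA stack and watchdog messages around the time of the reboot
journalctl -u pve-ha-lrm -u pve-ha-crm -u watchdog-mux --since "2020-07-23 13:30"
# list HA resources and temporarily remove them while testing, so nothing gets fenced
ha-manager status
ha-manager remove vm:100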
 
I found the issue: pve-03 had an MTU of 1500 while all other nodes are set to 9000. After configuring pve-03 to MTU 9000, everything works like before.
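For reference, a sketch of that change on pve-03 (assuming ifupdown2 is installed so ifreload works; otherwise a reboot applies it), followed by the jumbo-frame check from earlier in the thread:

Code:
# /etc/network/interfaces on pve-03: bond1 and its slave NICs need mtu 9000
#     iface bond1 inet static
#         address 10.10.10.3/24
#         ...
#         mtu 9000
ifreload -a                            # or reboot the node
ping -M do -s 8972 -c 3 10.10.10.1     # jumbo frames must now pass to the other nodes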
 
