3 node cluster can't keep ceph up after a reboot

Safely2974 · Aug 10, 2023

Upgraded to this:

Code:

proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.104-1-pve: 5.15.104-2
ceph: 17.2.6-pve1
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

All nodes are the same version. Running a fully licensed cluster.

On boot, a node seems to cycle its Ceph OSDs in and out of service. The monitor/manager on one node is down and I can't re-add it. We can't keep the CephFS mounts up, so we can't load any VMs.

Syslog seems to report that the OSD heartbeat is trying to use the Ceph 'public' network. Is this right? Shouldn't it use the Ceph cluster network? Public heartbeats are failing and I'm not sure why, so I'll try to figure that out, but I don't want this traffic on the public network.

So what to do? I've been cruising the forums to try and fix this, but I'm really scratching my head.

Here's my ceph config:

Code:

[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.128.16.0/24
     fsid = 2c88d85e-8a28-4cdc-800e-1979903a8d09
     mon_allow_pool_delete = true
     mon_host = 10.128.16.11 10.128.16.12 10.128.16.10
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.128.18.0/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.VAN3PM1]
     host = VAN3PM1
     mds standby for name = pve

[mds.VAN3PM2]
     host = VAN3PM2
     mds_standby_for_name = pve

[mds.VAN3PM3]
     host = VAN3PM3
     mds_standby_for_name = pve

[mon.VAN3PM1]
     public_addr = 10.128.16.10

[mon.VAN3PM2]
     public_addr = 10.128.16.11

[mon.VAN3PM3]
     public_addr = 10.128.16.12

And you can see that the OSDs are trying to get heartbeats over the public network:

Code:

Aug 10 14:36:46 VAN3PM1 ceph-osd[38621]: 2023-08-10T14:36:46.066-0700 7fdf06b8c700 -1 osd.36 8808 heartbeat_check: no reply from 10.128.18.12:6810 osd.34 ever on either front or back, first ping sent 2023-08-10T14:33:17.768929-0700 (oldest deadline 2023-08-10T14:33:37.768929-0700)

Help!

Safely2974 · Aug 10, 2023

I see that some osd's have a layout like this:

Code:

Front Address (Client & Monitor)
    v2: 10.128.18.10:6854
    v1: 10.128.18.10:6855
Heartbeat Front Address
    v2: 10.128.18.10:6856
    v1: 10.128.18.10:6857
Back Address (OSD)
    v2: 10.128.16.10:6819
    v1: 10.128.16.10:6821
Heartbeat Back Address
    v2: 10.128.16.10:6854
    v1: 10.128.16.10:6855

So I guess they do use the public network for heartbeats, but they should use the back end too, and we have good connectivity on the cluster network.

Safely2974 · Aug 11, 2023

I restarted the osd service for this particular osd and it didn't help:

Code:

root@VAN3PM1:/var/lib/ceph# systemctl status ceph-osd@36
● ceph-osd@36.service - Ceph object storage daemon osd.36
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Thu 2023-08-10 15:02:22 PDT; 3min 7s ago
    Process: 57975 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 36 (code=exited, status=0/SUCCESS)
   Main PID: 57981 (ceph-osd)
      Tasks: 76
     Memory: 871.6M
        CPU: 11.997s
     CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@36.service
             └─57981 /usr/bin/ceph-osd -f --cluster ceph --id 36 --setuser ceph --setgroup ceph

Aug 10 15:05:28 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:28.320-0700 7f803c852700 -1 osd.36 9263 heartbeat_check: no reply from 10.128.18.12:6846 osd.33 ever on either front or back, first ping sent 2023-08-10T15:03:20.248597-0700 (>
Aug 10 15:05:28 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:28.320-0700 7f803c852700 -1 osd.36 9263 heartbeat_check: no reply from 10.128.18.12:6812 osd.35 ever on either front or back, first ping sent 2023-08-10T15:03:13.247579-0700 (>
Aug 10 15:05:28 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:28.320-0700 7f803c852700 -1 osd.36 9263 heartbeat_check: no reply from 10.128.18.12:6813 osd.40 ever on either front or back, first ping sent 2023-08-10T15:03:13.247579-0700 (>
Aug 10 15:05:28 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:28.320-0700 7f803c852700 -1 osd.36 9263 heartbeat_check: no reply from 10.128.18.12:6840 osd.41 ever on either front or back, first ping sent 2023-08-10T15:03:13.247579-0700 (>
Aug 10 15:05:29 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:29.332-0700 7f803c852700 -1 osd.36 9264 heartbeat_check: no reply from 10.128.18.11:6815 osd.21 ever on either front or back, first ping sent 2023-08-10T15:04:54.261511-0700 (>
Aug 10 15:05:29 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:29.332-0700 7f803c852700 -1 osd.36 9264 heartbeat_check: no reply from 10.128.18.11:6809 osd.22 ever on either front or back, first ping sent 2023-08-10T15:04:48.960970-0700 (>
Aug 10 15:05:29 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:29.332-0700 7f803c852700 -1 osd.36 9264 heartbeat_check: no reply from 10.128.18.12:6846 osd.33 ever on either front or back, first ping sent 2023-08-10T15:03:20.248597-0700 (>
Aug 10 15:05:29 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:29.332-0700 7f803c852700 -1 osd.36 9264 heartbeat_check: no reply from 10.128.18.12:6812 osd.35 ever on either front or back, first ping sent 2023-08-10T15:03:13.247579-0700 (>
Aug 10 15:05:29 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:29.332-0700 7f803c852700 -1 osd.36 9264 heartbeat_check: no reply from 10.128.18.12:6813 osd.40 ever on either front or back, first ping sent 2023-08-10T15:03:13.247579-0700 (>
Aug 10 15:05:29 VAN3PM1 ceph-osd[57981]: 2023-08-10T15:05:29.332-0700 7f803c852700 -1 osd.36 9264 heartbeat_check: no reply from 10.128.18.12:6840 osd.41 ever on either front or back, first ping sent 2023-08-10T15:03:13.247579-0700 (>
root@VAN3PM1:/var/lib/ceph#

This is the host that this osd resides on, and it is reporting heartbeat problems locally so this might not be a general networking problem.
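To see at a glance which peers never answered, the `heartbeat_check` lines can be grouped by peer address. The following is only a rough sketch: the regular expression is an assumption based on the line format shown in the journal output above, and the sample lines are excerpts from that output, trimmed to the part the regex needs.

```python
import re
from collections import defaultdict

# Excerpts from the journal output above, trimmed to the relevant part.
LOG_LINES = [
    "osd.36 9263 heartbeat_check: no reply from 10.128.18.12:6846 osd.33 ever on either front or back",
    "osd.36 9263 heartbeat_check: no reply from 10.128.18.12:6812 osd.35 ever on either front or back",
    "osd.36 9263 heartbeat_check: no reply from 10.128.18.12:6813 osd.40 ever on either front or back",
    "osd.36 9263 heartbeat_check: no reply from 10.128.18.12:6840 osd.41 ever on either front or back",
    "osd.36 9264 heartbeat_check: no reply from 10.128.18.11:6815 osd.21 ever on either front or back",
    "osd.36 9264 heartbeat_check: no reply from 10.128.18.11:6809 osd.22 ever on either front or back",
]

# Assumed shape: "heartbeat_check: no reply from <ip>:<port> osd.<id> ..."
PATTERN = re.compile(
    r"heartbeat_check: no reply from (?P<ip>\d+\.\d+\.\d+\.\d+):\d+ (?P<peer>osd\.\d+)"
)

def silent_peers_by_host(lines):
    """Group the OSDs that never replied by the IP address they were pinged on."""
    peers = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            peers[match.group("ip")].add(match.group("peer"))
    return {ip: sorted(osds) for ip, osds in peers.items()}

result = silent_peers_by_host(LOG_LINES)
for ip, osds in sorted(result.items()):
    print(ip, osds)
```

In the excerpt above, every peer that never replied is addressed on 10.128.18.11 or 10.128.18.12. In practice the same function could be fed the output of `journalctl -u ceph-osd@36` line by line.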

Maximiliano · Aug 11, 2023

Hello,

Could you please share the output of `systemctl status ceph-mon@NODE_NAME.service` on the node which is having problems with the monitor?

Also, could you check the output of `pvecm status` on all nodes and verify that they can see each other properly on the network?

Safely2974 · Aug 11, 2023

This is the `systemctl status` output for ceph-mon on the problem node. It reports success, but the monitor isn't actually running. I ran the same command on the other nodes and the service is running there.

Code:

root@VAN3PM1:/var/lib/ceph# systemctl status ceph-mon@VAN3PM1.service
● ceph-mon@VAN3PM1.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: inactive (dead) since Fri 2023-08-11 09:35:18 PDT; 2min 34s ago
    Process: 142007 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id VAN3PM1 --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
   Main PID: 142007 (code=exited, status=0/SUCCESS)
        CPU: 98ms

Aug 11 09:35:18 VAN3PM1 systemd[1]: Started Ceph cluster monitor daemon.
Aug 11 09:35:18 VAN3PM1 systemd[1]: ceph-mon@VAN3PM1.service: Succeeded.

On the Proxmox GUI in ceph, the PM1 monitor is gone, and the manager is offline.

I ran `pvecm status` on all nodes, and got this output on each:

Code:

root@VAN3PM1:~# pvecm status
Cluster information
-------------------
Name:             FibreTel
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Aug 11 09:15:01 2023
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.2b9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.128.17.10 (local)
0x00000002          1 10.128.17.11
0x00000003          1 10.128.17.12

The only difference between each device is the node id.

Safely2974 · Aug 11, 2023

This is the current monmap. As expected, it shows the other two nodes and not this one.

Code:

root@VAN3PM1:~# ceph mon getmap -o /tmp/monmap
got monmap epoch 6
root@VAN3PM1:~# monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 6
fsid 2c88d85e-8a28-4cdc-800e-1979903a8d09
last_changed 2023-08-10T14:01:10.242054-0700
created 2023-03-22T22:26:10.535430-0700
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:10.128.16.11:3300/0,v1:10.128.16.11:6789/0] mon.VAN3PM2
1: [v2:10.128.16.12:3300/0,v1:10.128.16.12:6789/0] mon.VAN3PM3
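A quick way to confirm which monitor dropped out is to compare the monmap against the monitors named in ceph.conf. This is a minimal sketch, assuming the `monmaptool --print` line format shown above (some header lines are trimmed here); the `CONFIGURED_MONS` set is filled in by hand from the `[mon.*]` sections posted earlier, and the helper name is only illustrative.

```python
import re

# Output of `monmaptool --print`, copied from the post above (header trimmed).
MONMAP_PRINT = """\
epoch 6
fsid 2c88d85e-8a28-4cdc-800e-1979903a8d09
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:10.128.16.11:3300/0,v1:10.128.16.11:6789/0] mon.VAN3PM2
1: [v2:10.128.16.12:3300/0,v1:10.128.16.12:6789/0] mon.VAN3PM3
"""

# Monitor names taken by hand from the [mon.*] sections of the posted ceph.conf.
CONFIGURED_MONS = {"VAN3PM1", "VAN3PM2", "VAN3PM3"}

# Assumed line shape: "<rank>: [<addr>,<addr>] mon.<name>"
MON_LINE = re.compile(r"^\d+: \[(?P<addrs>[^\]]+)\] mon\.(?P<name>\S+)$")

def mons_in_monmap(text):
    """Return {monitor name: [addresses]} for every monitor line in the printout."""
    mons = {}
    for line in text.splitlines():
        match = MON_LINE.match(line.strip())
        if match:
            mons[match.group("name")] = match.group("addrs").split(",")
    return mons

in_map = mons_in_monmap(MONMAP_PRINT)
missing = sorted(CONFIGURED_MONS - set(in_map))
print("in monmap:", sorted(in_map))
print("configured but not in monmap:", missing)
```

For the monmap above this reports VAN3PM1 as configured but absent, which matches what the GUI shows.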

Safely2974 · Aug 11, 2023

These are the lines generated when I restart the ceph-mon process:

Code:

root@VAN3PM1:/var/lib/ceph# journalctl -b -u "ceph-mon@*.service"

Aug 11 10:34:28 VAN3PM1 systemd[1]: Started Ceph cluster monitor daemon.
Aug 11 10:34:28 VAN3PM1 systemd[1]: ceph-mon@VAN3PM1.service: Succeeded.

It says it succeeded, but it is definitely not running.

Code:

root@VAN3PM1:/var/lib/ceph# systemctl status ceph-mon@VAN3PM1.service
● ceph-mon@VAN3PM1.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: inactive (dead) since Fri 2023-08-11 10:34:28 PDT; 2min 45s ago
    Process: 142676 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id VAN3PM1 --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
   Main PID: 142676 (code=exited, status=0/SUCCESS)
        CPU: 96ms

Aug 11 10:34:28 VAN3PM1 systemd[1]: Started Ceph cluster monitor daemon.
Aug 11 10:34:28 VAN3PM1 systemd[1]: ceph-mon@VAN3PM1.service: Succeeded.

Despite its reports of success, the process is dead.

Safely2974 · Aug 11, 2023

Each node has full reachability to the other nodes, but two of the nodes don't seem to be able to see themselves (in terms of proxmox stuff).

They show a grey question mark in the GUI, even when I'm directly connected to node 1: it cannot even see its own status.

Node 3 shows the same behaviour.

Node 2 can see itself but the VMs aren't loading properly and it cannot load all the ceph volumes. Probably because they're all degraded. For example I tried to start a VM and even though it claims it is running, I cannot open its console.

This is run on the same node the VM is on, so presumably no networking (off the system) is required. The error is:

Code:

VM 108 qmp command 'set_password' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
TASK ERROR: Failed to run vncproxy.

fweber · Aug 14, 2023

Hi, please have a look at the Ceph documentation on public and cluster networks [1]: The cluster network is only meant for OSD heartbeat/replication/recovery traffic. Traffic from clients to OSDs, as well as traffic between Ceph monitors and MDSs, is still routed via the public network.

However, in your /etc/ceph/ceph.conf, the monitor addresses (public_addr in the mon sections) belong to the cluster_network subnet (10.128.16.0/24). Normally, they should belong to the public_network subnet, which is currently set to 10.128.18.0/24. This might be a cause of the Ceph connectivity issues. Have you changed the cluster_network or public_network recently? I would suggest changing the public network to 10.128.16.0/24 to see if the situation improves. I'd also suggest double-checking connectivity between nodes on the 10.128.16.0/24 network.
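The mismatch can also be checked mechanically. The snippet below is a minimal sketch rather than an official tool: it embeds the relevant parts of the posted ceph.conf as a string, assumes a single `public_network` subnet (Ceph also accepts a comma-separated list, which this does not handle), and uses a hypothetical helper name.

```python
import configparser
import ipaddress

# Relevant parts of the ceph.conf posted above.
CEPH_CONF = """
[global]
cluster_network = 10.128.16.0/24
mon_host = 10.128.16.11 10.128.16.12 10.128.16.10
public_network = 10.128.18.0/24

[mon.VAN3PM1]
public_addr = 10.128.16.10

[mon.VAN3PM2]
public_addr = 10.128.16.11

[mon.VAN3PM3]
public_addr = 10.128.16.12
"""

def mons_outside_public_network(conf_text):
    """Return {section: address} for monitors whose public_addr is not in public_network."""
    # interpolation=None so values such as $cluster.$name are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Assumption: exactly one subnet is configured as public_network.
    public_net = ipaddress.ip_network(parser["global"]["public_network"])
    outside = {}
    for section in parser.sections():
        if section.startswith("mon.") and parser.has_option(section, "public_addr"):
            addr = ipaddress.ip_address(parser[section]["public_addr"])
            if addr not in public_net:
                outside[section] = str(addr)
    return outside

mismatched = mons_outside_public_network(CEPH_CONF)
for section, addr in mismatched.items():
    print(f"{section}: {addr} is not inside public_network")
```

With the configuration as posted, all three monitors are flagged, because their addresses are in 10.128.16.0/24 while public_network is 10.128.18.0/24.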

[1] https://docs.ceph.com/en/reef/rados/configuration/network-config-ref/

Safely2974 · Aug 17, 2023

Hi Friedrich, that was good advice.

I had altered the public network about a month ago, and I guess *something* happened this week to cause it to trip up. I have changed the public network back to 10.128.16.0/24 and two of our nodes are back up. I would like to get this into an optimal configuration, but I'm still in fire-fighting mode and I really want to get this system usable again.

I still have some troubles.

One node still isn't running ceph, and it isn't responding to the web dashboard (it shows as a grey question mark). On boot it comes up green, but after a few minutes it disappears.

I have a couple VMs up and running on good ceph pools so that's a huge improvement.

I am hopeful that if I can get that problem node up some of my other troubles will be resolved, but I do have a litany of other errors:
* 2x cephFS are degraded
* 1 ceph pool is unusable (even though it reports as green) so any VMs that use this pool are down

I can't migrate disks off the problem ceph pool, the transfers never get past 0%.
