Ceph storage goes offline when first server in cluster reboots

Hi there

I have three identical nodes running a Proxmox cluster, each with 4 OSDs used in a Ceph storage.
PRX01, PRX02 and PRX03

When it comes to an update I sometimes have to reboot a node, especially when a kernel update is involved.

So I put the node into maintenance mode to let the VMs migrate to another node first.
After all VMs have been migrated, I reboot the node.
Once the node is up again, I disable maintenance mode for it and wait until the VMs have migrated back before I proceed with the other nodes one by one.
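Roughly what I do, sketched as CLI commands (I believe the ha-manager node-maintenance command is the CLI equivalent of the maintenance mode; the node name is just an example):
Code:
# put the node into maintenance mode so the HA manager migrates the VMs away
ha-manager crm-command node-maintenance enable prx02

# once all VMs have left the node
reboot

# after the node is back up, let the VMs migrate back
ha-manager crm-command node-maintenance disable prx02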

So far everything went fine.

Only when I need to reboot the node PRX01 does something go terribly wrong.
The whole Ceph cluster becomes unavailable until the reboot has finished.

Does anyone have an idea why?
What info from my configuration do you need to help me?
 
you'd need to provide more details about your setup and ideally log files..

how many monitors do you have? how is your pool set up (replication settings, anything you customized?)? how many OSDs are there, and how are they distributed across the nodes?

what does "ceph -s" say when the cluster works, and what does it say when it doesn't?
 
how many monitors do you have?
- I have three monitors, one on each node.

how is your pool set up (replication settings, anything you customized?)?
Pool #                  1                     2
Name                    .mgr                  VMPool
Size/min                3/2                   3/2
# of Placement Groups   1                     128
Opt. # PGs              1                     128
Autoscale Mode          on                    on
CRUSH Rule              replicated_rule (0)   replicated_rule (0)
Used [%]                44.45 MiB (0.00%)     9.41 TiB (49.60%)
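For completeness, the same settings can presumably also be read from the CLI with something like this (a sketch, using the VMPool name from the table above):
Code:
ceph osd pool get VMPool size
ceph osd pool get VMPool min_size
ceph osd pool get VMPool pg_num
ceph osd pool get VMPool crush_rule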

how many OSDs are there, and how are they distributed across the nodes?
- there are 12 OSDs available, 4 on each PVE node.

what does "ceph -s" say when the cluster works?
...
cluster:
id: e514f756-xxxxxxxx-aa96-9304de459fd1
health: HEALTH_OK

services:
mon: 3 daemons, quorum prx02,prx03,prx01 (age 42h)
mgr: prx02(active, since 42h), standbys: prx03, prx01
osd: 12 osds: 12 up (since 42h), 12 in (since 9M)

data:
pools: 2 pools, 129 pgs
objects: 844.08k objects, 3.2 TiB
usage: 9.4 TiB used, 12 TiB / 21 TiB avail
pgs: 129 active+clean

io:
client: 120 KiB/s rd, 7.7 MiB/s wr, 24 op/s rd, 139 op/s wr
....


See also my ceph.conf:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.xxx.1.0/24
fsid = e514f756-b1ce-4429-aa96-9304de459fd1
mon_allow_pool_delete = true
mon_host = 10.xxx.1.20 10.xxx.1.30 10.xxx.1.10
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.xxx.1.0/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mon.prx01]
public_addr = 10.xxx.1.10

[mon.prx02]
public_addr = 10.xxx.1.20

[mon.prx03]
public_addr = 10.xxx.1.30
 
in case you don't want to provoke another outage (understandable ;)) could you maybe provide the journal of one of the other nodes, starting slightly before you trigger the shutdown of the first node? anything particular about your network setup (going over a switch? full mesh? ... ?)?
 
Forgive me, my Linux know-how is rapidly growing, but not as fast as I wish :)

How to get the requested journal data?
journalctl --since "25025-05-13 16:05" --until "2025-05-13 16:15" thats where the reboot of prx01 happened

But what output format should I use, and how do I get the export so I can upload it?
 
journalctl --since "2025-05-13 16:05" --until "2025-05-13 16:15" > log.txt and then you can attach the log.txt file here (you can download it using scp for example)
 
Thanks for the hint :)

I've collected the logs from all three nodes.

PRX01 was rebooted; unfortunately I didn't find much in the journal of PRX02 or PRX03.
But as far as I understand the log from PRX01, Ceph was shut down on ALL nodes during the reboot.
 

Attachments

okay, that looks fine so far.. could you also post "ceph osd crush dump" (should be the same on all nodes) and /var/log/ceph/ceph.log of nodes 2 and 3 for the problematic reboot? the lines in that file start with a timestamp in unix epoch format, you can convert that with date: date --date=@XXXXXX, e.g.:
Code:
$ date --date=@1747381287
Fri May 16 09:41:27 AM CEST 2025
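if you want to convert a whole ceph.log in one go, something along these lines should also work (just a sketch):
Code:
# print each line with the leading epoch timestamp converted to a readable date
while read -r ts rest; do
    printf '%s %s\n' "$(date --date=@"${ts%%.*}" '+%F %T')" "$rest"
done < /var/log/ceph/ceph.log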
 
those look okay as well AFAICT..

The whole Ceph cluster becomes unavailable until the reboot has finished.

how exactly did you determine this? the ceph logs only show 1 mon and 4 osds going down, but other than the PGs being undersized (which is expected and okay, they remained active!) ceph doesn't complain about anything as a result..
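next time you have to reboot prx01, it might help to watch the cluster from one of the other nodes while it's down, something like (just a sketch):
Code:
# run on prx02 or prx03 during the reboot of prx01
watch -n 5 'ceph -s; ceph health detail'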
 