Ceph not working / showing HEALTH_WARN

ptmtobi · New Member · Oct 15, 2025
Hi, I'm fairly new to Proxmox and only just set up my first actual cluster made of 3 PCs/nodes.

Everything seemed to be working fine and showed up correctly, so I started to set up Ceph for shared storage.
I gave all 3 PCs an extra physical SSD for the shared storage, in addition to the Proxmox NVMe, and installed Ceph on all 3 nodes.
Then I created the 3 OSDs with the SSDs, one per node. They all show green and up/in.
Then I created a pool, which immediately showed up below all 3 nodes on the left after refreshing.
After that I created 3 monitors, one for each pool, and they all show status running.

I was expecting it to be done and working but when I clicked on the Ceph overview, it showed status HEALTH_WARN.
The summary has two reports:
- Reduced data availability: 128 pgs inactive
- 3 slow ops, oldest one blocked for 2337 sec, osd.1 has slow ops

Before I wrote this post, the second one said 2 slow ops.
(update edit: it is now at 4)

[screenshots of the Ceph status overview and health warnings attached]

Does anyone have an idea what the issue could be? Any help is appreciated, and feel free to ask if more details are needed for troubleshooting!
 
Usually when building a Ceph cluster one starts with the MONs and not the OSDs.
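For reference, the usual order with the pveceph CLI looks roughly like this (a sketch; the network subnet and device path are just examples, adjust them for your setup):

  pveceph install                     # install the Ceph packages on each node
  pveceph init --network 10.0.0.0/24  # once, on one node; example subnet
  pveceph mon create                  # on each node that should run a monitor
  pveceph mgr create                  # at least one manager
  pveceph osd create /dev/sdb         # on each node, with its spare SSD
  pveceph pool create testpool        # finally, the pool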
To be honest I just followed a YouTube tutorial on the setup, and he started with the OSDs, so I just did it the same way lol.
What's the difference when doing it in a different order? Can that cause issues?
 
Did you not use the PVE installation wizard? The monitor "maintains a master copy of the cluster map." I would imagine doing it the other way could result in OSDs not connected to the system...?

re: SSDs, yes it can make a big difference, search the forum for enterprise (drives with PLP) vs consumer SSDs.
 
post the output of

  ceph health detail
  ceph osd tree
  ceph osd df
  cat /etc/pve/ceph.conf
Sorry for the late response, I couldn't work on my cluster until today:
[screenshots of the requested command outputs attached]
 

Did you not use the PVE installation wizard? The monitor "maintains a master copy of the cluster map." I would imagine doing it the other way could result in OSDs not connected to the system...?

re: SSDs, yes it can make a big difference, search the forum for enterprise (drives with PLP) vs consumer SSDs.
I did use the installation wizard (I clicked on Ceph on all 3 nodes, where the install wizard automatically popped up, and went through the installation process). That's the first thing I did before creating any of the other Ceph components.

I can try removing all the OSDs, monitors and the pool again and setting it up in a different order (starting with the monitors). Do you think that could help?
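(For reference, a teardown before re-creating could look roughly like this; the OSD ID 1, the pool name and <monid> are placeholders, check yours with ceph osd tree and ceph mon stat first:

  ceph osd out 1                 # take the OSD out of data placement
  systemctl stop ceph-osd@1      # stop the OSD service on its node
  pveceph osd destroy 1          # remove it; repeat for the other OSDs
  pveceph pool destroy testpool  # pool name is a placeholder
  pveceph mon destroy <monid>    # monid is usually the node name
)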
 
-----UPDATE-----

I was able to contact a Proxmox & Ceph professional and together we were able to narrow it down a lot.
I seem to have invented a new issue, and we still haven't fixed it or fully figured it out, but this is what we know:

The hardware seems to be completely fine, and he also approved of my overall setup, so that shouldn't be the cause of the issue. The SSDs aren't fast, but definitely fast enough to theoretically run the thing; the latency in the individual steps of ops is rather low. What we did find, though, was that for some reason one of the steps took really long, and we're now trying to figure out why it gets stuck and how to fix it.
This issue also only occurred on osd.1 for some reason, and even after deleting osd.1 and switching the SSD, it hasn't been resolved. It just doesn't replicate data correctly.

He searched for that specific issue in every possible forum and only found one similar case with slight differences from 8 years ago. It seems I've discovered some kind of new bug or issue.
We will try to further narrow it down and I'll update.
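(For anyone hitting something similar: one way to see which step of an op stalls is the OSD admin socket, run on the node that hosts the OSD; osd.1 is the problem OSD in this thread:

  ceph daemon osd.1 dump_ops_in_flight  # ops currently blocked
  ceph daemon osd.1 dump_historic_ops   # recent slow ops with per-event timestamps

The per-event timestamps show where the time goes, e.g. waiting on the disk vs. waiting on replicas.)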
 
The wizard creates a monitor, so I'm a bit confused about the ordering.

Do you have the node firewall enabled, and if so, is Ceph allowed?
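(As a quick sketch of how to check: the monitors listen on TCP 3300/6789 and the OSDs on 6800-7300, and those ports must be reachable between all nodes.

  pve-firewall status   # is the PVE firewall active on this node?
  ss -tlnp | grep ceph  # which ports are the Ceph daemons listening on?
)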
My bad, I just checked my documentation, and yes, the node 1 monitor was there automatically from the base installation. I just added the other 2 for nodes 2 and 3 after creating the OSDs.
There is no firewall yet.
 
-----UPDATE-----

I was able to contact a Proxmox & Ceph professional and together we were able to narrow it down a lot.
I seem to have invented a new issue, and we still haven't fixed it or fully figured it out, but this is what we know:

The hardware seems to be completely fine, and he also approved of my overall setup, so that shouldn't be the cause of the issue. The SSDs aren't fast, but definitely fast enough to theoretically run the thing; the latency in the individual steps of ops is rather low. What we did find, though, was that for some reason one of the steps took really long, and we're now trying to figure out why it gets stuck and how to fix it.
This issue also only occurred on osd.1 for some reason, and even after deleting osd.1 and switching the SSD, it hasn't been resolved. It just doesn't replicate data correctly.

He searched for that specific issue in every possible forum and only found one similar case with slight differences from 8 years ago. It seems I've discovered some kind of new bug or issue.
We will try to further narrow it down and I'll update.
It looks like you're having an issue with the SATA controller or the cable on the host for osd.1. You can see in your first screenshot that the oldest op was blocked for 2400 sec (40 minutes), i.e. one disk is effectively not working but is still considered part of the cluster, forcing all write activity to wait for it.

You also don't need 1 monitor per pool. 3 monitors is fine but perhaps not necessary for such a small cluster.

Take out the failing OSD from the cluster and see if things work properly. There will be a health warning, because you only have two copies instead of three, but it's allowed according to your rule.
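(As a sketch of that step, with osd.1 as the suspect from this thread:

  ceph osd out 1  # stop placing data on osd.1; I/O continues with 2 copies
  ceph -s         # watch whether the PGs go active and the slow ops clear
  ceph osd in 1   # undo, once the hardware is sorted out
)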

You should also try to do some testing of the disk/path directly (without Ceph), i.e. install it, put ext4 on it, run some big fio tests, and try replacing the cable, the SATA port, etc., to confirm whether the disk is working properly as is.
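(For example, a small fio sketch; the device, mountpoint and sizes are placeholders:

  mkfs.ext4 /dev/sdX  # WARNING: wipes the disk; sdX is a placeholder
  mount /dev/sdX /mnt/test
  fio --name=synctest --filename=/mnt/test/fio.bin --size=4G \
      --rw=randwrite --bs=4k --iodepth=1 --direct=1 --fsync=1 \
      --runtime=60 --time_based --group_reporting

The --fsync=1 run is the interesting one for Ceph, since OSDs write synchronously: a healthy SSD shows steady latency here, while a failing drive or cable often shows multi-second stalls.)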
 
3 monitors is fine but perhaps not necessary for such a small cluster.
Well, two is exceptionally(!) bad, because then there is no majority voting. Neither may fail --> the risk of a failure is more than twice as high as with a single one!

One is bad because... if it fails you have a problem --> single-point-of-failure for the whole cluster...

From my point of view three is the minimum.


https://docs.ceph.com/en/reef/rados/operations/add-or-rm-mons/ :
"It is best to run an odd number of monitors. This is because a cluster that is running an odd number of monitors is more resilient than a cluster running aneven number. For example, in a two-monitor deployment, no failures can be tolerated... "
 
osd.1 appears to be unreachable; in case this is a networking issue, I can't really help you, since you are paranoid about posting your actual IP addresses (or at least their respective subnets).

That said, check the host of osd.1 to see if there are in-host issues (e.g. problems with the drive).

You also don't need 1 monitor per pool. 3 monitors is fine but perhaps not necessary for such a small cluster.
The minimum number of monitors is dependent on cluster size, but not the way you think. 1 monitor is not suitable for a cluster (SPOF), and 2 are not suitable because it's an even number (no tiebreaker). That makes 3 the ABSOLUTE MINIMUM, although larger clusters (16+ nodes) could benefit from 2 more. The type and number of pools don't make any difference - monitors serve the cluster, not a specific pool. (@UdoB stated much the same, I see ;)

Speaking of cluster size - what is the purpose of this one? With only one OSD per host, and three hosts total, this cluster will perform poorly under the best possible conditions (only 1 write possible at a time) and would be very flaky. You really should read @UdoB's link in post #2.
 
osd.1 appears to be unreachable; in case this is a networking issue, I can't really help you, since you are paranoid about posting your actual IP addresses (or at least their respective subnets).

That said, check the host of osd.1 to see if there are in-host issues (e.g. problems with the drive).


The minimum number of monitors is dependent on cluster size, but not the way you think. 1 monitor is not suitable for a cluster (SPOF), and 2 are not suitable because it's an even number (no tiebreaker). That makes 3 the ABSOLUTE MINIMUM, although larger clusters (16+ nodes) could benefit from 2 more. The type and number of pools don't make any difference - monitors serve the cluster, not a specific pool. (@UdoB stated much the same, I see ;)

Speaking of cluster size - what is the purpose of this one? With only one OSD per host, and three hosts total, this cluster will perform poorly under the best possible conditions (only 1 write possible at a time) and would be very flaky. You really should read @UdoB's link in post #2.
I'm quite certain it's not a network issue. Those IPs are not from a private setup; this is a test setup in a huge company with global unicast IPs, so it can actually interact with other devices in the network.

The test setup is specifically for cases like this. I want to learn about possible issues and get used to Proxmox before starting the actual stuff. The performance doesn't matter at all, and technically it wouldn't matter if the whole setup got destroyed; it's nothing important. For now I'm an apprentice, but my practical final exam will likely include Proxmox, so until then I want to get this running so that something like this doesn't happen in the exam or afterwards.

The real setup would then include actual high budget dedicated Proxmox servers with more OSDs.
 
It looks like you're having an issue with the SATA controller or the cable on the host for osd.1. You can see in your first screenshot that the oldest op was blocked for 2400 sec (40 minutes), i.e. one disk is effectively not working but is still considered part of the cluster, forcing all write activity to wait for it.

You also don't need 1 monitor per pool. 3 monitors is fine but perhaps not necessary for such a small cluster.

Take out the failing OSD from the cluster and see if things work properly. There will be a health warning, because you only have two copies instead of three, but it's allowed according to your rule.

You should also try to do some testing of the disk/path directly (without Ceph), i.e. install it, put ext4 on it, run some big fio tests, and try replacing the cable, the SATA port, etc., to confirm whether the disk is working properly as is.
I'll leave it at 3 monitors as the others said, but your first point might've actually been the issue.

I still don't fully understand what exactly the problem was, because when I took the drive with osd.1 and plugged it into a different SATA port, the system didn't show the drive at all - lsblk only showed the Proxmox NVMe, but not the SSD as sda. When I came back today, it just randomly showed up again; due to some other issues I had to delete and reinstall the whole Ceph stack, but now it works without any issues.
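(In a case like this, the kernel log and SMART data can usually confirm a flaky SATA link or a dying drive; /dev/sda is the SSD from this thread:

  dmesg | grep -iE 'ata[0-9]|sda'  # look for link resets and I/O errors
  smartctl -a /dev/sda             # check reallocated sectors, CRC errors etc.
)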

@UdoB @gurubert @SteveITS @alexskysilk Thank you all for your help as well, learned some important things here!