Ceph freezes when a node reboots on a Proxmox cluster

Oct 30, 2025
Hello everyone,


I’m currently facing a rather strange issue on my Proxmox cluster, which uses Ceph for storage.


My infrastructure consists of 8 nodes, each equipped with 7 NVMe drives of 7.68 TB.
Each node therefore hosts 7 OSDs (one per drive), for a total of 56 OSDs across the cluster.


Each node is connected to a 40 Gbps core network, and I’ve configured several dedicated bonds and bridges for the following purposes:


  • Proxmox cluster communication
  • Ceph communication
  • Node management
  • Live migration

For virtual machine networking, I use an SDN zone in VLAN mode with dedicated VMNets.




Issue


Whenever a node reboots — either for maintenance or due to a crash — the Ceph cluster sometimes completely freezes for several minutes.


After some investigation, it appears this happens when one OSD becomes slow: Ceph reports "slow ops", and the entire cluster seems to hang.
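For anyone wanting to reproduce the diagnosis, commands along these lines show which OSD is reporting slow ops (osd.42 below is a placeholder ID; the ceph daemon calls must run on the node hosting that OSD):

    # list the OSDs currently flagged with slow ops
    ceph health detail | grep -i slow

    # inspect the operations stuck on the suspect OSD (run on its host)
    ceph daemon osd.42 dump_ops_in_flight
    ceph daemon osd.42 dump_historic_ops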


It’s quite surprising that a single slow OSD (out of 56) can have such a severe impact on the whole production environment.
Once the affected OSD is restarted, performance gradually returns to normal, but the production impact remains significant.
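For completeness, restarting a single OSD from the shell of the node that hosts it can be done via systemd, e.g. (42 being a placeholder OSD ID):

    systemctl restart ceph-osd@42.service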


For context, I recently changed the mClock profile from “balanced” to “high_client_ops” in an attempt to reduce latency.
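For reference, a cluster-wide switch of the mClock profile looks like this (assuming mClock is the active scheduler, i.e. a recent Ceph release where it is the default):

    ceph config set osd osd_mclock_profile high_client_ops
    # verify what the OSDs picked up
    ceph config get osd osd_mclock_profile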




Question


Has anyone experienced a similar issue — specifically, VMs freezing when a Ceph node reboots?
If so, what solutions or best practices did you implement to prevent this from happening again?


Thank you in advance for your help — this issue is a real challenge in my production environment.


Have a great day,
Léo
 
During maintenance windows I set the noout, norebalance and norecover flags before shutting down an OSD or server. That stops Ceph from moving data around to the other OSDs.
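Roughly like this, and don't forget to unset the flags once the node is back and healthy:

    ceph osd set noout
    ceph osd set norebalance
    ceph osd set norecover
    # ... reboot / maintain the node ...
    ceph osd unset norecover
    ceph osd unset norebalance
    ceph osd unset noout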

Some Ceph talks have mentioned that a single dying HDD can impact the whole cluster even when SMART shows no sign of the coming failure, so you need to keep an eye on per-disk activity.

As for SSD/NVMe, sometimes the device simply struggles to respond fast enough. Could that be the reason? When I run a single (not multi-parallel) write test on an SSD, I see very low numbers.
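As an example, a single-job, queue-depth-1 sync write test with fio would look something like this (the target path is a placeholder; point it at a test file or a spare device, since writing to a raw disk is destructive):

    fio --name=single-sync-write --filename=/tmp/fio-testfile --size=4G \
        --rw=write --bs=4k --iodepth=1 --numjobs=1 \
        --direct=1 --sync=1 --runtime=60 --time_based
    # watch the latency and IOPS figures: a datacenter NVMe should sustain
    # thousands of sync 4k writes per second, consumer drives often cannot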
 
I’ve already performed write tests, and the results are quite good on my side.
As for the maintenance part: indeed, setting noout, norebalance, and norecover is one way to handle planned reboots, but it doesn't address unplanned outages in production. I can't afford to have my virtual machines crash when I lose one node out of eight.