cluster performance degradation

Actually, you might want to redo your whole setup if you want to use Ceph.

Increase to 5 nodes, each node with 2x 100G NICs, plus 2x 100G switches (25G and above is OK, but 100G switches and transceivers are affordable now, so why not?)
Change all the disks to enterprise NVMe; you can google many models and capacities.


Or break up the HA:
NODE 1 with ZFS replication to NODE 2 (every 1 min)
NODE 3 kept as an emergency replacement node
Add 1 more Proxmox Backup Server for peace of mind

You might want to increase each NODE to 512GB RAM to accommodate the ZFS RAM requirement.
I disabled a node, removed it from the cluster, reinstalled Proxmox on it and set up a ZFS RAID. Now I want to copy all the VMs to the new node, but it doesn't let me access the VMs anymore. Wasn't Ceph supposed to let me access the VMs even with 2 nodes? The status tells me it's rebuilding, but shouldn't I still have had HA with 2 active nodes?

I have to say that ceph ruined my Christmas..
Did you back up your VMs to the PBS? If so, you can restore them onto the newly created standalone ZFS Proxmox node.
No, I didn't make any backups. I thought I could still access the VMs with one node turned off. What is Ceph rebuilding, then? Do I have to wait for it to finish?

It doesn't let me shut down the VMs, or even copy or migrate them, but they're offline anyway.
How many OSDs per node?

From your screenshot I saw 5 were out, leaving only 7 in? If the total is 12, it should be 4 OSDs per node, so why the uneven count?
I removed an entire node with 4 OSDs, and then I removed one more OSD from another node. I thought that with 5 OSDs out everything would still work.
Why did you remove an OSD from a working node, with no backup? I am sorry, but you might experience data loss...

If you need more disks for your new ZFS RAID, you should buy them instead of pulling OSDs out of a live cluster like this.

Since you have paid support with Proxmox, please contact them ASAP and stop doing any work without their clear instruction.
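
To illustrate why losing a whole node was survivable but pulling one more OSD from a working node blocked the VMs, here is a rough Python sketch. It is only a toy model, not how CRUSH actually places data, and it rests on assumptions that nobody in this thread has confirmed: 3 nodes with 4 OSDs each, a replicated pool with size=3 and min_size=2, and a host-level failure domain (one copy per node). The OSD names and the helper functions are made up for the example.

```python
from itertools import product

# Assumed layout: 3 hosts with 4 OSDs each (12 total). OSD names are hypothetical.
HOSTS = {
    "node1": ["osd.0", "osd.1", "osd.2", "osd.3"],
    "node2": ["osd.4", "osd.5", "osd.6", "osd.7"],
    "node3": ["osd.8", "osd.9", "osd.10", "osd.11"],
}
MIN_SIZE = 2  # assumed pool min_size; copies needed before I/O is allowed


def all_placements(hosts):
    """Every way to put one replica on one OSD of each host (size=3, host failure domain)."""
    return list(product(*hosts.values()))


def count_pg_states(placements, down_osds, min_size=MIN_SIZE):
    """Return (active, inactive) placement counts for a set of unavailable OSDs."""
    active = sum(
        1
        for placement in placements
        if sum(osd not in down_osds for osd in placement) >= min_size
    )
    return active, len(placements) - active


placements = all_placements(HOSTS)
node1_gone = set(HOSTS["node1"])

# Whole node removed: every placement still has 2 of 3 copies.
print(count_pg_states(placements, node1_gone))              # (64, 0)

# One more OSD removed from a surviving node: some placements drop to 1 copy.
print(count_pg_states(placements, node1_gone | {"osd.4"}))  # (48, 16)
```

Under these assumptions, a quarter of the possible placements fall below min_size after the fifth OSD is removed, and Ceph would hold I/O on those until recovery has recreated the missing copy on another OSD. That would fit with the VMs being unreachable while the cluster was rebuilding.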
But first I tried turning off a node, and everything worked normally. Then I removed another OSD from an online node, and now it is rebuilding the datastore. I hope that in the end it will let me access the VMs to make the backups.
For the record, Ceph has finished rebuilding and I can access the VMs again. I'm backing up all the VMs so I can restore them on the new node with ZFS. 2 VMs are still shown as being in the backup state, although no backup is actually running. How do I stop that and put them back in an active state?
Take a screenshot and show what it says.