[SOLVED] What happens when one of the raided disks dies in a 3-server HA system?

Mayank006

Member
Dec 6, 2023
Indeed, Proxmox will not work when 2 out of 3 disks die in shared storage.

My question is: what happens when each server has two disks in RAID 1 for shared storage, and one disk fails in each of two servers?
Will Proxmox keep running on the surviving mirrored disks, or will it go down?
 
I am not following your question.

What constitutes shared storage for you? Chances are, all your questions are specific to your storage and have no bearing on Proxmox itself. If your shared storage fails, it won't be available for your cluster to use.

If your question is about Ceph, you should not be operating with only 3 disks; it's not a stable configuration. Use at minimum 3-4 disks per node.
 
I have three servers, and each has 1 SSD that I am using in the pool for shared storage (Ceph). I noticed that if 2 out of 3 drives die, Proxmox stops working.

I have separate OS storage (two drives in RAID 1) per server. Since the OS drives are mirrored, the chance of a system failure is lower.
I want to keep separate OS drives and Ceph drives.

How can I reduce the chances of Ceph drive failure?
 
How can I reduce the chances of Ceph drive failure?
In principle, you don't need to worry about the probability of this. Set up the storage according to best practices and simply use 3-4 SSDs per node. If one OSD fails, just put a new one in and you're good.
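
To illustrate why a single OSD failure is survivable while losing 2 of 3 drives was not, here is a minimal sketch. It assumes a replicated pool with size=3 and min_size=2 (common defaults, but check your own pool settings) and one replica per OSD. The function name and the state labels are hypothetical simplifications, not Ceph's exact placement group states.

```python
def pool_state(size: int, min_size: int, failed_replicas: int) -> str:
    """Classify a replicated pool by how many replicas survive."""
    surviving = size - failed_replicas
    if surviving <= 0:
        return "lost"        # no copy of the data remains
    if surviving < min_size:
        return "inactive"    # copies exist, but I/O is blocked
    if surviving < size:
        return "degraded"    # I/O continues with reduced redundancy
    return "healthy"


# Three nodes with one OSD each: every object has one replica per OSD.
for failed in range(4):
    state = pool_state(size=3, min_size=2, failed_replicas=failed)
    print(failed, "OSD(s) down ->", state)
```

Under these assumptions, one failed OSD leaves the pool degraded but usable, and a second failure drops it below min_size, which blocks I/O for the VMs.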
 
How can I reduce the chances of Ceph drive failure?
If you mean the disk: you can't. The whole point of fault tolerance/redundancy techniques is to provide continuous functionality in the face of inevitable component failure. If you mean the storage: design a solution that is sustainable and resilient. In the case of Ceph, that means having, AT MINIMUM, 3 nodes with 4 OSDs each; in practice, 5 nodes and many more OSDs.

Where Ceph differs from "RAID" is that Ceph will automatically rebalance your data when the number of OSDs is increased (adding more drives) or reduced (when drives fail), keeping the data highly available, or as close to it as possible. For that to work, you need enough resources in both dimensions (nodes and OSDs) for the rebalance to target.
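
As a rough sketch of the "enough resources" point, the snippet below estimates how full the surviving OSDs in a node become after one OSD in that node dies. It assumes 3 nodes, 3 replicas, and a host-level failure domain, so the lost copy has to be recreated inside the same node. It also assumes equal-size OSDs, evenly spread data, and a nearfull warning threshold of 0.85; verify that value on your cluster. All function names are hypothetical.

```python
from typing import Optional

NEARFULL_RATIO = 0.85  # assumed warning threshold; check your cluster's setting


def fill_after_failure(osds_per_node: int, fill: float, failed: int = 1) -> Optional[float]:
    """Fill level of the surviving OSDs in one node after `failed` OSDs die.

    Returns None when no OSD is left in the node to receive the data.
    """
    remaining = osds_per_node - failed
    if remaining <= 0:
        return None
    # The same amount of data is now spread over fewer disks.
    return fill * osds_per_node / remaining


def can_self_heal(osds_per_node: int, fill: float) -> bool:
    """True if the node can absorb one OSD failure without exceeding nearfull."""
    new_fill = fill_after_failure(osds_per_node, fill)
    return new_fill is not None and new_fill <= NEARFULL_RATIO


print(can_self_heal(1, 0.30))  # one OSD per node: nowhere to rebalance to
print(can_self_heal(4, 0.60))  # four OSDs per node at 60% full
```

With one OSD per node there is no rebalance target at all, which is why that layout stays degraded until the disk is replaced.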
 
1 physical disk = 1 OSD, right?
To increase the number of OSDs, what if I replace the SSDs with HDDs and use 2 HDDs for Ceph per server?
This may decrease the number of Ceph OSD failures. Would using HDDs instead of SSDs affect the efficiency of VM migration and working speed?
 
1 physical disk = 1 osd right?
yes.
To increase the number of OSDs, what if I replace SSDs with HDDs and make a solution of 2 HDDs for Ceph per server?
Not sure what you're asking. "Replacing" is not "increasing".
Would using HDDs instead of SSDs affect the efficiency of VM migration and working speed?
A WHOLE LOT. To give you an idea of just how much: modern HDDs provide ~100-150 IOPS, while SSDs provide 10,000-500,000 IOPS.
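
For a back-of-envelope feel of that gap, this sketch computes how long a fixed number of random I/O operations takes at a given sustained IOPS rate. The workload size is a made-up figure, and the model ignores caching, queue depth, and network overhead, so treat the result as an order-of-magnitude illustration only.

```python
def seconds_for_random_io(operations: int, iops: float) -> float:
    """Time to finish `operations` random I/Os at a sustained `iops` rate."""
    return operations / iops


ops = 1_000_000  # hypothetical workload size
for label, iops in [("HDD at 150 IOPS", 150), ("SSD at 10,000 IOPS", 10_000)]:
    minutes = seconds_for_random_io(ops, iops) / 60
    print(f"{label}: {minutes:.1f} minutes")
```

Even using the low end of the SSD range, the same workload goes from roughly 111 minutes on an HDD to under 2 minutes on an SSD.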
 
