Hyperconverged Proxmox + Ceph Cluster - how to reconnect the right disks to the nodes

lenovom720q

Hi,
I created a 3-node Proxmox cluster with three Lenovo M720q machines (for simplicity I'll call the nodes N1, N2 and N3).
Then I added 4 disks (D1, D2, D3 and D4).
Everything was working fine.
Then I moved all the SFF PCs and the disks from my desk to the rack, but unfortunately I didn't write down which disk belonged to which node..
OK... don't laugh at me.. :)
The question now is:
Is there any way to figure out the original association between the 4 disks and the 3 nodes, so that the Ceph cluster is up and running again?
I had created only a few VMs, so if there is no way I'll start again from scratch.. but I'd prefer not to.. :-)
Thank you very much for the time you dedicate to answering my question.
Paul.
 
Ceph is usually not stupid. Basically, any OSD can simply be reconnected anywhere; only certain host assignments will be wrong. If you don't have a problem with possible data loss anyway, attach the disks however you think they might fit and start everything. You may get a few hits, and where you don't, you will see which OSD is which and can then attach it back to the right node. Normally Ceph catches itself again after that.
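For example, after reattaching the disks and starting the OSD services, the current assignment can be checked from any node with the standard Ceph CLI (just a quick sketch, nothing Proxmox-specific):

ceph osd tree   # shows which OSD currently sits under which host, with weight and up/down status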
 
Hello sb-jw,
Thank you for replying to me.
I finally found the right way to solve the problem.
I'm writing it down as a memo for myself and for other people who run into this particular issue.

1- I connected one disk at a time and took note of the identifier that Ceph assigned to each one (osd.0, osd.1, ...) from the point of the GUI shown below.

[Screenshot: Proxmox GUI on pve02, showing the assigned OSD identifiers]
2- I put a sticker with the printed identifier on each disk so as not to confuse them (three of them are identical).
3- In the GUI I picked one of the three nodes at random, selected the node and then Ceph > Configuration.
In the section of the configuration marked # buckets I saw the following information (the same thing can also be read from the CLI, shown further down):

# buckets
host pve03 {
	id -3		# do not change unnecessarily
	id -4 class hdd		# do not change unnecessarily
	id -5 class ssd		# do not change unnecessarily
	# weight 0.46579
	alg straw2
	hash 0	# rjenkins1
	item osd.0 weight 0.23289
	item osd.3 weight 0.23289
}
host pve02 {
	id -7		# do not change unnecessarily
	id -8 class hdd		# do not change unnecessarily
	id -9 class ssd		# do not change unnecessarily
	# weight 0.93149
	alg straw2
	hash 0	# rjenkins1
	item osd.1 weight 0.93149
}
host pve01 {
	id -10		# do not change unnecessarily
	id -11 class hdd		# do not change unnecessarily
	id -12 class ssd		# do not change unnecessarily
	# weight 1.81940
	alg straw2
	hash 0	# rjenkins1
	item osd.2 weight 1.81940
}

From this I understood that:
- I had to connect osd.0 and osd.3 to the pve03 node
- I had to connect osd.1 to the pve02 node
- I had to connect osd.2 to the pve01 node
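
By the way, the same # buckets section can be pulled from the shell on any node instead of the GUI (a small sketch; the output file names are just examples, and crushtool ships with the Ceph packages):

ceph osd getcrushmap -o crushmap.bin          # dump the compiled CRUSH map
crushtool -d crushmap.bin -o crushmap.txt     # decompile it into the same text shown above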

As soon as I connected the disks following this scheme, the cluster was back up and running.
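
If the stickers ever go missing, a disk's OSD ID can also be read directly on whichever node it is plugged into, and the end result can be verified from any node (a sketch, assuming LVM-based OSDs created with ceph-volume, which is the usual case on current Proxmox VE):

ceph-volume lvm list   # run on the node the disk is attached to; lists the OSD ID and data device of every local OSD
ceph -s                # check that the cluster reports HEALTH_OK again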

Now I've taken note of how I connected the disks so that it doesn't happen again next time..

I will soon receive two Lenovo M920x machines to add to the cluster.. so better safe than sorry.. :)

Good evening!

Paul.
 
