[SOLVED] won't boot after upgrade and ceph version mismatch

Ting

Member
Oct 19, 2021
Hi,

I have a 4-node cluster with Ceph that has been working well for a few months now. One machine has somewhat older hardware (CPU = Xeon X5670, roughly a 2010-era CPU).

Last week I upgraded to the new kernel 5.13.19-4 and rebooted, and Proxmox failed to find the boot drive. In the end I pulled out all the Ceph OSD disks and the reboot succeeded. However, after Proxmox was up I plugged all the OSD disks back in, but they show up with a version mismatch, and this node cannot connect to the Ceph storage for its VMs.

Here is my set up:
1. I have two RAID cards. Card #1 is an older RAID card with two virtual disks (RAID 0 each), set up as a ZFS mirror (zfs-1) that holds the Proxmox system.
2. My second RAID card is an IT-mode LSI card with two SSDs connected to it, used as two Ceph OSD disks.

If the two OSDs are plugged in, the system will not boot because it cannot find the zfs-1 disks. The first attachment is the output of "zpool status"; it shows the disk IDs of the two ZFS disks, which is why I do not understand how the system gets confused by the OSD disks at boot.
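(In case anyone wants to reproduce the check: assuming the default pool name "rpool" (adjust to your pool name), the following shows the device IDs ZFS expects versus what the kernel currently sees:

zpool status rpool
ls -l /dev/disk/by-id/

The mirror members listed by "zpool status" should appear unchanged in the by-id listing whether or not the OSD disks are present.)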

Once it booted up, I plugged the OSD disks back in. Ceph shows two OSDs with mismatched versions, and this node lost its connection to the Ceph storage (it cannot run VMs).

Please see the screenshot of the OSD screen.
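For question 2 below, the only commands I know of for checking which version each daemon reports are the standard ones (nothing specific to my setup):

ceph versions
ceph osd tree
pveversion -v

"ceph versions" lists the running version per mon/mgr/osd, and "pveversion -v" shows the package versions installed on this node.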

My question:
1. Why can Proxmox not find the bootable zfs-1 pool when the OSD disks are plugged in?
2. How do I fix the Ceph OSD version mismatch issue?

Many thanks.
 
On what kind of hardware are you running this?
Some old HP server by any chance?

Do you see the bootloader at all if the machine is not booting?

My suspicion is that this is caused by some BIOS setting, for example which controller should be used to look for boot devices, or something similar. Depending on the manufacturer and BIOS/model version, these settings can be a bit tricky to get right.
 
Thanks, aaron!

You are absolutely correct.

Before, I thought my IT-mode RAID card was not recognized by the BIOS, which is why I only had two disks (0 & 1; I assumed 0 & 1 were the disks on my first RAID card) in the boot order. This morning I added disks #2 & #3 and booted off disk #2 instead, and it was a success. I guess my motherboard treats the IT-mode card as card #1 and the other RAID card as card #2.
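(Side note, an extra check rather than something I relied on: once the system is up, how the kernel enumerates the disks on both controllers can be seen with

lsblk -o NAME,MODEL,SERIAL

which at least shows which controller's disks come first in the device order.)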

My Ceph issue was because I forgot to connect the cable back; it is resolved now.

Thank you for your reply.
 
Good to hear. :)

I took the liberty of marking the thread as solved. You can also do so yourself if you edit the first post and select the prefix from the drop-down menu next to the title.
 
