PVE 4.2 ZFS RAID-10 or RAID-Z2 grub failure

DynFi User

Renowned Member
Apr 18, 2016
dynfi.com
Hello folks,

We are struggling to install PVE 4.2 on a brand-new Supermicro server with an HBA and 5 SAS disks attached to it.
Configuration is quite straightforward and all disks are seen at install time.

The first goal was to configure PVE on 4 disks in ZFS RAID 10: I selected the four target disks, started the boot, and got a grub error: "error: no such device: id_of_device".

The second goal was to configure PVE on 5 disks in RAID-Z2: I selected the five devices, started the boot, and got the same grub error: "error: no such device: id_of_device".

I am not really sure what to do from here.
 

Attachment: IMG_2201.JPG
We have re-installed with a basic install using ext4 on the first disk detected by our HBA and (apart from a slower install) everything went as expected!

Server has booted.

So I can confirm that there is a bug in the installer when it comes to ZFS in RAID 10 or ZFS in RAID-Z2.

I am not really sure what to do from here.
 
I have re-installed using a simple ZFS RAID-0 on one disk only (the same disk as the one used for the ext4 boot) and I got yet another error: "cannot import 'rpool'".
 

Attachment: IMG_2202.jpg
the first error means that grub does not find /boot (or rather, it does not find the grub files there ;)). this usually indicates that the bios/uefi (or the raid controller?) did not pass all member devices of the zpool to grub (as indicated by the ls output in the rescue shell, which only shows one hard disk instead of four/five).

the second error is simply because you have disks belonging to multiple zpools named "rpool", so the initramfs does not know which of those it should import. remove/reformat the leftover disks from your previous attempt and it will work.

note that those two errors look superficially similar, but occur at very different stages in the boot process and have completely unrelated causes. the second one is easily recoverable: import the pool once by the numerical ID displayed by "zpool import", complete the boot, then clear the labels on (or reformat) the leftover disks. the first one depends very much on the root cause. it might be fixable with a configuration change in the bios/uefi or the raid controller.
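to make the duplicate-name situation concrete, here is a minimal Python sketch that scans text shaped like "zpool import" output and lists pool names that appear more than once, together with their numeric IDs. the sample text, the IDs and the device names are made up for illustration, and the helper names (pools_by_name, ambiguous_pools) are hypothetical; real output may contain additional fields.

```python
import re
from collections import defaultdict

# Illustrative text in the general shape of `zpool import` output.
# The pool IDs and device names below are invented for this example.
SAMPLE = """\
   pool: rpool
     id: 1111111111111111111
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        rpool       ONLINE
          sda2      ONLINE

   pool: rpool
     id: 2222222222222222222
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        rpool       ONLINE
          sdb2      ONLINE
"""


def pools_by_name(text):
    """Map each pool name to the list of numeric IDs reported for it."""
    found = defaultdict(list)
    current = None
    for line in text.splitlines():
        m = re.match(r"\s*pool:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"\s*id:\s*(\d+)", line)
        if m and current is not None:
            found[current].append(m.group(1))
            current = None
    return dict(found)


def ambiguous_pools(text):
    """Return only the pool names that occur more than once, with their IDs."""
    return {name: ids for name, ids in pools_by_name(text).items() if len(ids) > 1}


if __name__ == "__main__":
    for name, ids in ambiguous_pools(SAMPLE).items():
        print(f"{name} is ambiguous; candidate IDs: {', '.join(ids)}")
```

once the ID of the freshly installed pool is known, it can be passed to "zpool import" in place of the name, and the labels on the leftover disks can then be cleared (for example with "zpool labelclear") or the disks reformatted.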
 
Thanks for this answer.

So the root cause of the main scenario (RAID-Z2 or RAID 10) is that not all disks are visible to grub after the install?
And you suspect a BIOS setting is behind it.

The strange thing is that the disks are displayed correctly when I boot from the install key.
If there were a problem in the BIOS settings, shouldn't it prevent the disks from being displayed even at install time?

And after the install, it looks like only one disk is presented to grub…

There are very few options (if any) for RAID: it is an HBA, not a full-featured RAID card.
I'll check the BIOS options again, but I have already been through them.


If you have any other clues or ways to solve this, they are most welcome!


Thanks.
 
(Since you also have a support ticket open for this issue, it might make sense to move the conversation either here or there to prevent confusion).

The installer sees the disks because at that point we already have a full Linux kernel running. So yes, the problem is almost certainly the BIOS or raid controller not presenting all the disks as boot devices - grub does not initialize all the devices itself (like the Linux kernel later in the boot process or when running a LiveCD / installer). This setting is unfortunately highly vendor/hardware specific.
 
Today we had a similar problem with an HP ML10 server. This nice little machine would not boot with PVE. I was able to install PVE on it with ZFS, but once the install had finished the BIOS no longer saw the HDDs. With EXT I was able to boot. The only workaround was to install PVE on a USB stick (the install was only possible with 4.1, followed by an upgrade to 4.2, and there were many problems booting: delays, systemd service delays...), then create an extra VM pool with RAID 1. This works fine.

My problem was that it was not possible to deactivate UEFI in the BIOS.

BTW: PVE 4.1 on a USB stick works out of the box (it only needs rootdelay=10) without problems.
 
[SOLVED]

Ok - after a series of hassles :

  • BIOS update
  • UEFI update
  • SAS controller update

I have found out what the problem really was all about.

You need to configure your SAS-2308 controller so that it scans for the maximum possible number of devices.
The parameter is: "Maximum INT 13 devices for this adapter"

  • Default value is set to "1"
  • You need to set it to "24"

What this means is that your controller will look for up to 24 disks while booting (if you have fewer than 24 devices, no problem).
The problem was that with this value set to 1, it only looks for the first disk on your controller!
And with RAID-Z2 or any other multi-disk layout, data is spread across all "n" disks in your pool, so grub cannot read the pool from that one disk.
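
To illustrate why a limit of 1 breaks both layouts, here is a small Python sketch of a simplified model. It assumes (my assumption, not something I verified on the controller) that the adapter exposes the first N disks in order, and that the boot loader can read a pool as long as the layout's redundancy covers the missing disks. The function names and disk labels are made up for the example.

```python
def first_n(disks, limit):
    """Disks exposed at boot when the adapter is limited to `limit` INT 13 devices.

    Assumption: the adapter exposes disks in order, up to the limit.
    """
    return set(disks[:limit])


def raidz_readable(total_disks, parity, visible_count):
    """A RAID-Z vdev needs at least (total - parity) members to be readable."""
    return visible_count >= total_disks - parity


def striped_mirrors_readable(mirrors, visible):
    """Striped mirrors (RAID 10) need at least one visible disk in every mirror."""
    return all(any(disk in visible for disk in mirror) for mirror in mirrors)


if __name__ == "__main__":
    disks = ["d0", "d1", "d2", "d3", "d4"]      # hypothetical disk labels
    raid10 = [("d0", "d1"), ("d2", "d3")]       # 4-disk RAID 10: two mirrors

    for limit in (1, 24):
        visible10 = first_n(disks[:4], limit)
        visible_z2 = first_n(disks, limit)
        print(
            f"limit={limit}: "
            f"RAID 10 readable={striped_mirrors_readable(raid10, visible10)}, "
            f"RAID-Z2 readable={raidz_readable(5, 2, len(visible_z2))}"
        )
```

With the limit at 1, only one disk is visible, which is fewer than the 3 of 5 that RAID-Z2 needs and leaves the second mirror of the RAID 10 with no visible member. With the limit at 24, all disks are visible and both layouts are readable in this model.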

I am wondering why they set this value to 1 by default.
Probably because they enjoy having you spend a day wondering what's going on ;-)


Many thanks to the Proxmox team, who helped me figure out where all this was coming from!
 
I have an LSI 2308 too. It came in IR mode and I flashed it to IT mode. I installed with RAID 10 and with RAID-Z1 and hit the same problem as you. How do I configure the LSI 2308 so that it finds 24 hard disks? Thank you very much!
 
