ZFS boot failure (GRUB issue)

jayg30

Nov 8, 2017
I performed a Proxmox install and selected 2 disks (out of 17 available) for a RAID1 install. After rebooting it doesn't boot; I just get a black screen with a cursor. This seems to be a GRUB boot problem.

Hardware
  • Intel R2224GZ4GC4
  • Intel S2600GZ
  • Intel E5-2670 x 2
  • Hynix HMT31GR7CFR4A-H9 (8GB x 16)
  • LSI 9340-8i x 2 (SAS3008 flashed to IT mode)
  • HGST HUC106045CSS601 2.5" SAS x 16
  • Intel DC S3710 400GB x 1
I selected 2 of the HGST disks (I just wanted to get it installed and do some testing).

This hardware has run ZFS without issues on illumos (SmartOS) and FreeBSD/FreeNAS. However, those systems tend to run the hypervisor or OS from removable media that loads into RAM.

Is there any hope of resolving this so I can boot from a ZFS pool?

Thanks.
 
First, check whether your MB can boot from SAS drives. It looks like your HGST drives are SAS, and some motherboards cannot boot from SAS drives.
Second, identify the first 2 drives on the bus, i.e. the ones connected to the first ports. Sometimes Proxmox has issues booting from drives that are not physically first in line. Also check the BIOS to make sure the drives you chose are set as bootable.

Also, it looks like you have 2 controllers in this thing, SATA and SAS (going by the general specs).
If so, try connecting the boot drives to the SATA ports instead. As said earlier, Proxmox sometimes has issues booting from SAS.
 

Thanks.

I've booted from the SAS disks in the past running Windows and Ubuntu. But you reminded me that I should check the LSI controllers to make sure booting is enabled. I don't believe I turned that off since I last tested, but you never know; I might have after reading something about speeding up boot times. These machines are in a lab environment, and when I first set them up I tested them with both Windows and Linux installs.

The SAS disks are connected to the backplane of the server, which connects to the LSI controllers using SFF connections. The backplane and LSI controllers support both SAS and SATA (even mixed). The SATA SSD is connected to the onboard SATA port; I typically use the SATA SSD(s) and port for the ZFS SLOG and/or L2ARC device.

I'm using gparted to clear out all the disks again. I'll remove all but the 2 SAS disks I'm installing to from the chassis and see what happens.
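For anyone following along, the same clean-out can be done from a shell instead of gparted. A rough sketch, with /dev/sdX standing in for whichever disk is meant (this destroys the disk's metadata, so double-check the device name first):

Code:
# remove old filesystem / RAID / ZFS signatures from the disk (destructive)
wipefs -a /dev/sdX
# clear out any leftover GPT and MBR partition structures
sgdisk --zap-all /dev/sdX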
 
Another thought, try removing or disabling all drives that are NOT being used as boot disks. When setting a zpool as a boot device during installation, PVE defaults to identifying the disks used in the pool by their /dev/sdX format instead of their actual disk IDs. With all of your disks connected, it's kind of a coin toss as to which disks will be assigned to which letter.
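If it helps to see why that matters, something like the following (lsblk should be there on a stock Debian/PVE install) shows which sdX letter currently points at which physical drive, next to the stable by-id names that don't move around:

Code:
# stable identifiers and the sdX node each one currently resolves to
ls -l /dev/disk/by-id/ | grep -v part
# drive letter, model and serial side by side
lsblk -o NAME,MODEL,SERIAL,SIZE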
 

I'm in the process of doing that right now.

I removed all the disks except the 2 I'm going to install the rpool to.

I checked the LSI Configuration Utility (CTRL+C) and the 2 controllers are set to allow BIOS and OS booting. The controller with the 2 disks I'm installing to is also set first in the boot order.

I'm now walking through the install and will try a RAID1 again.
 

Sounds good. I would leave the other disks out for the time being. If you are able to boot into PVE, please log in to the command line (via SSH or the GUI console) and post the output of zpool status.
 
Congrats to you guys. It installed and booted.

Code:
root@pve01:~# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0

errors: No known data errors

Now I wonder what will happen if I add the rest of the disks back.
 
Since your drives are assigned by letter, I'm guessing you may fail to boot again. Best practice for zpools is to assign drives by-id. I am currently experiencing a similar issue. Check this thread: https://forum.proxmox.com/threads/how-to-import-zfs-root-pool-by-id.24295/
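For what it's worth, on a regular data pool the switch to by-id naming is just an export and a re-import pointed at /dev/disk/by-id; the root pool is the awkward case, since it can't be exported while the system is running from it, which is what the linked thread works around. A rough sketch for a non-root pool ('tank' standing in for the actual pool name):

Code:
# re-read the pool members using stable by-id device names
zpool export tank
zpool import -d /dev/disk/by-id tank
# members should now show up as scsi-.../ata-... instead of sdX
zpool status tank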

Yea. I know FreeNAS uses gptid, and if that gets messed up things can go bad. Ultimately you don't want the disks referenced by something like sdX, which isn't persistent and can change, or by physical location or slot in the machine; there should be a layer of abstraction.

I'm new to ZFS on Linux, but it appears they are aware of this issue per https://github.com/zfsonlinux/zfs/wiki/FAQ#selecting-dev-names-when-creating-a-pool. As you point out, clearly using sdX is NOT a smart approach for something as critical as a Proxmox server.
 
A bit off topic from why I opened this thread, but I notice that after install you can use the boot pool (rpool) to also set up and run VMs/containers. So I guess you COULD just make 1 pool with all the disks and use that for both the boot and VM storage.

I would think this would be a bad idea, no? I'm surprised Proxmox even allows it. I would have expected it to hide the rpool away from VM/container storage and require the addition of a dedicated second pool.
 

Agreed. I wish PVE would create zpools by-id during installation. FWIW, I used the method explained by user "alchemycs" here: https://forum.proxmox.com/threads/how-to-import-zfs-root-pool-by-id.24295/

Did you try editing /etc/default/zfs yet?
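If that's the knob being referred to, I believe the ZoL boot scripts read ZPOOL_IMPORT_PATH from that file, so the edit would look roughly like this, followed by an initramfs rebuild so the early boot import picks it up:

Code:
# in /etc/default/zfs, point the boot-time import at stable device names
ZPOOL_IMPORT_PATH="/dev/disk/by-id"
# then rebuild the initramfs so the change takes effect at the next boot
update-initramfs -u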


I am currently testing with all my VMs and containers in my rpool, but I do agree that best practice is probably to have your VM & container storage separate from your rpool.
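For when the rest of the disks go back in, here is a rough sketch of what that separation could look like. The pool name, the raidz2 layout, and the by-id paths below are all placeholders, and the storage can just as easily be added through the GUI under Datacenter -> Storage if I have the pvesm syntax wrong:

Code:
# separate data pool for guests, built from stable by-id names (placeholders)
zpool create -o ashift=12 tank raidz2 \
    /dev/disk/by-id/scsi-DISK1 /dev/disk/by-id/scsi-DISK2 /dev/disk/by-id/scsi-DISK3 \
    /dev/disk/by-id/scsi-DISK4 /dev/disk/by-id/scsi-DISK5 /dev/disk/by-id/scsi-DISK6
# register it with PVE as ZFS storage for VM disks and containers
pvesm add zfspool vmstore --pool tank --content images,rootdir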
 
I was under the impression that this is the default behaviour and expectation for all distros using ZFS:
one big pool divided into subvolumes (datasets) as needed, with separation of pools by usage being just an add-on, a good practice rather than an expected one.

Am I wrong?
 

Well, the BSD, illumos, and Solaris systems I've used have always separated the rpool and the data pool. Keeping the ROOT pool separate seems pretty common. Something tells me there are legitimate reasons why the rpool should be separate.

FYI, here is the output from a FreeNAS box with two pools (boot and data).
Code:
[root@freenas ~]$ zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h8m with 0 errors on Sat Oct  7 03:53:43 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        freenas-boot                                    ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/b92a7506-5735-11e5-b7f3-0cc47a335ac4  ONLINE       0     0     0
            gptid/31894a9d-cd92-11e4-89c2-0cc47a335ac4  ONLINE       0     0     0

errors: No known data errors

  pool: store
 state: ONLINE
  scan: scrub repaired 0 in 4h8m with 0 errors on Sun Oct 22 04:08:59 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        store                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/1c383e96-d315-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            gptid/90b50eaf-d315-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            gptid/284a6fc3-d316-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            gptid/c66e0391-d317-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            gptid/14a02475-d318-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            gptid/b276810c-3a12-11e5-a72f-0cc47a335ac4  ONLINE       0     0     0
 
Ohh, I agree. It's just that everywhere I look the setups always seem to use a single-pool config, and I always wondered why that is.
I had been sidestepping ZFS for a long time because of that, until I found out that it was absolutely possible to separate the OS pool and data pool(s) the way I've been used to.
 
