[solved] rpool silently fails to boot on 1 of 2 disks.

Tommmii

Well-Known Member
Jun 11, 2019
edit: solved by reinstalling the OS. The question of how this was possible remains.

Hi all,

I have an rpool, created at Proxmox installation time. Install was about 4 weeks ago from the Proxmox iso.
The pool is a 2 disk mirror.
After a power cut, the server won't boot; it just stays on a black screen with an underscore in the top-right corner.
I go into the BIOS and set the other disk as the first boot disk... the server boots.

The pool is of course degraded; I wipe the partitions from Disk 1 and re-add Disk 1 to the rpool.
The rpool then resilvers fine.

On reboot, the server will still not boot if the order of the drives is set to Disk 1, then Disk 2 (disks 3 & 4 are excluded from the boot order).
Setting Disk 2 as the primary boot drive boots fine.

Why won't the server boot from both of those disks?
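
For completeness, lsblk (below) doesn't show partition type codes; something like the following would show whether either disk carries a BIOS-boot (EF02) or EFI system (EF00) partition at all (sgdisk is part of the gdisk package):
Code:
# print the GPT of both rpool members, including partition type codes
sgdisk -p /dev/sda
sgdisk -p /dev/sdb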

Code:
root@pve-chf:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-8
pve-kernel-5.3: 6.1-6
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-7
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
root@pve-chf:~#
Code:
root@pve-chf:~# zpool status rpool
  pool: rpool
state: ONLINE
  scan: resilvered 53.9M in 0 days 00:00:01 with 0 errors on Tue May  5 14:51:55 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        rpool                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            ata-KINGSTON_SA400S37240G_50026B768340617A  ONLINE       0     0     0
            ata-KINGSTON_SA400S37240G_50026B76834060AD  ONLINE       0     0     0

errors: No known data errors
root@pve-chf:~#
Code:
root@pve-chf:~# lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda        8:0    0 223.6G  0 disk
├─sda1     8:1    0 223.6G  0 part
└─sda9     8:9    0     8M  0 part
sdb        8:16   0 223.6G  0 disk
├─sdb1     8:17   0 223.6G  0 part
└─sdb9     8:25   0     8M  0 part
sdc        8:32   0 465.8G  0 disk
├─sdc1     8:33   0 465.8G  0 part
└─sdc9     8:41   0     8M  0 part
sdd        8:48   0 465.8G  0 disk
├─sdd1     8:49   0 465.8G  0 part
└─sdd9     8:57   0     8M  0 part
sr0       11:0    1  1024M  0 rom
zd0      230:0    0    45G  0 disk
├─zd0p1  230:1    0   100M  0 part
├─zd0p2  230:2    0  31.3G  0 part
└─zd0p3  230:3    0  12.8G  0 part
zd16     230:16   0     1M  0 disk
zd32     230:32   0    40G  0 disk
├─zd32p1 230:33   0   529M  0 part
├─zd32p2 230:34   0    99M  0 part
├─zd32p3 230:35   0    16M  0 part
└─zd32p4 230:36   0  39.4G  0 part
zd48     230:48   0    10G  0 disk
├─zd48p1 230:49   0     9G  0 part
├─zd48p2 230:50   0     1K  0 part
└─zd48p5 230:53   0  1022M  0 part
zd64     230:64   0    64G  0 disk
├─zd64p1 230:65   0    60G  0 part
├─zd64p2 230:66   0     1K  0 part
└─zd64p5 230:69   0     4G  0 part
root@pve-chf:~#
Code:
root@pve-chf:~# ls -la /dev/disk/by-id
total 0
drwxr-xr-x 2 root root 540 May  6 10:02 .
drwxr-xr-x 8 root root 160 May  6 10:02 ..
lrwxrwxrwx 1 root root   9 May  6 10:02 ata-KINGSTON_SA400S37240G_50026B76834060AD -> ../../sda
lrwxrwxrwx 1 root root  10 May  6 10:02 ata-KINGSTON_SA400S37240G_50026B76834060AD-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 May  6 10:02 ata-KINGSTON_SA400S37240G_50026B76834060AD-part9 -> ../../sda9
lrwxrwxrwx 1 root root   9 May  6 10:02 ata-KINGSTON_SA400S37240G_50026B768340617A -> ../../sdb
lrwxrwxrwx 1 root root  10 May  6 10:02 ata-KINGSTON_SA400S37240G_50026B768340617A-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 May  6 10:02 ata-KINGSTON_SA400S37240G_50026B768340617A-part9 -> ../../sdb9
lrwxrwxrwx 1 root root   9 May  6 10:02 ata-ST3500514NS_9WJ14WD9 -> ../../sdc
lrwxrwxrwx 1 root root  10 May  6 10:02 ata-ST3500514NS_9WJ14WD9-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 May  6 10:02 ata-ST3500514NS_9WJ14WD9-part9 -> ../../sdc9
lrwxrwxrwx 1 root root   9 May  6 10:02 ata-ST3500514NS_9WJ15NAM -> ../../sdd
lrwxrwxrwx 1 root root  10 May  6 10:02 ata-ST3500514NS_9WJ15NAM-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  10 May  6 10:02 ata-ST3500514NS_9WJ15NAM-part9 -> ../../sdd9
lrwxrwxrwx 1 root root   9 May  6 10:02 usb-TEAC_DV-28S-W_000000000033-0:0 -> ../../sr0
lrwxrwxrwx 1 root root   9 May  6 10:02 wwn-0x5000c5002e0b5c02 -> ../../sdc
lrwxrwxrwx 1 root root  10 May  6 10:02 wwn-0x5000c5002e0b5c02-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 May  6 10:02 wwn-0x5000c5002e0b5c02-part9 -> ../../sdc9
lrwxrwxrwx 1 root root   9 May  6 10:02 wwn-0x5000c5002e0bb485 -> ../../sdd
lrwxrwxrwx 1 root root  10 May  6 10:02 wwn-0x5000c5002e0bb485-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  10 May  6 10:02 wwn-0x5000c5002e0bb485-part9 -> ../../sdd9
lrwxrwxrwx 1 root root   9 May  6 10:02 wwn-0x50026b76834060ad -> ../../sda
lrwxrwxrwx 1 root root  10 May  6 10:02 wwn-0x50026b76834060ad-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 May  6 10:02 wwn-0x50026b76834060ad-part9 -> ../../sda9
lrwxrwxrwx 1 root root   9 May  6 10:02 wwn-0x50026b768340617a -> ../../sdb
lrwxrwxrwx 1 root root  10 May  6 10:02 wwn-0x50026b768340617a-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 May  6 10:02 wwn-0x50026b768340617a-part9 -> ../../sdb9
root@pve-chf:~#

I thought perhaps GRUB wasn't on the disk because I had wiped the partition table, but GRUB is present:
Code:
root@pve-chf:~# dd if=/dev/sda bs=512 count=1 | xxd
1+0 records in
1+0 records out
512 bytes copied, 4.1342e-05 s, 12.4 MB/s
00000000: eb63 9000 0000 0000 0000 0000 0000 0000  .c..............
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0080 2200 0000  ............"...
00000060: 0000 0000 fffa 9090 f6c2 8074 05f6 c270  ...........t...p
00000070: 7402 b280 ea79 7c00 0031 c08e d88e d0bc  t....y|..1......
00000080: 0020 fba0 647c 3cff 7402 88c2 52bb 1704  . ..d|<.t...R...
00000090: f607 0374 06be 887d e817 01be 057c b441  ...t...}.....|.A
000000a0: bbaa 55cd 135a 5272 3d81 fb55 aa75 3783  ..U..ZRr=..U.u7.
000000b0: e101 7432 31c0 8944 0440 8844 ff89 4402  ..t21..D.@.D..D.
000000c0: c704 1000 668b 1e5c 7c66 895c 0866 8b1e  ....f..\|f.\.f..
000000d0: 607c 6689 5c0c c744 0600 70b4 42cd 1372  `|f.\..D..p.B..r
000000e0: 05bb 0070 eb76 b408 cd13 730d 5a84 d20f  ...p.v....s.Z...
000000f0: 83d0 00be 937d e982 0066 0fb6 c688 64ff  .....}...f....d.
00000100: 4066 8944 040f b6d1 c1e2 0288 e888 f440  @f.D...........@
00000110: 8944 080f b6c2 c0e8 0266 8904 66a1 607c  .D.......f..f.`|
00000120: 6609 c075 4e66 a15c 7c66 31d2 66f7 3488  f..uNf.\|f1.f.4.
00000130: d131 d266 f774 043b 4408 7d37 fec1 88c5  .1.f.t.;D.}7....
00000140: 30c0 c1e8 0208 c188 d05a 88c6 bb00 708e  0........Z....p.
00000150: c331 dbb8 0102 cd13 721e 8cc3 601e b900  .1......r...`...
00000160: 018e db31 f6bf 0080 8ec6 fcf3 a51f 61ff  ...1..........a.
00000170: 265a 7cbe 8e7d eb03 be9d 7de8 3400 bea2  &Z|..}....}.4...
00000180: 7de8 2e00 cd18 ebfe 4752 5542 2000 4765  }.......GRUB .Ge
00000190: 6f6d 0048 6172 6420 4469 736b 0052 6561  om.Hard Disk.Rea
000001a0: 6400 2045 7272 6f72 0d0a 00bb 0100 b40e  d. Error........
000001b0: cd10 ac3c 0075 f4c3 0000 0000 0000 00ff  ...<.u..........
000001c0: ffff eeff ffff 0100 0000 af44 f21b 0000  ...........D....
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa  ..............U.
root@pve-chf:~#
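To rule out a difference between the two disks, the boot code area (the first 440 bytes, before the disk signature and partition table) can simply be compared; a minimal check:
Code:
# compare the MBR boot code of both rpool members
dd if=/dev/sda bs=440 count=1 2>/dev/null | md5sum
dd if=/dev/sdb bs=440 count=1 2>/dev/null | md5sum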
Any input greatly appreciated!
 
Hi,

this does not look like the ZFS disk layout created by the default Proxmox VE installer.
The boot partition is missing.
I guess the easiest way is to back up the data and reinstall.
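
For comparison, a default Proxmox VE 6.x ZFS install creates three partitions on each rpool member, roughly like this (sizes approximate):
Code:
# typical layout created by the PVE 6.x installer on a ZFS boot disk
sdX1   ~1M    BIOS boot partition (GRUB embeds its core image here on legacy BIOS)
sdX2   512M   EFI system partition
sdX3   rest   ZFS partition (the actual rpool vdev)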
 
The boot partition is missing
That's what I was thinking, and that's why I had included lsblk output in my OP.

I do have external backups of all clients; I'll just need to read up on backing up the host config files, because reconfiguring those manually from scratch would be a pain.
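
Something along these lines is probably enough for the host side (the usual PVE config locations; not exhaustive, and the backup target is just a placeholder):
Code:
# grab the cluster config and basic host settings before the reinstall
tar czf /root/pve-host-config-$(date +%F).tar.gz \
    /etc/pve /etc/network/interfaces /etc/hosts /etc/hostname /etc/resolv.conf
# copy the archive off the machine
scp /root/pve-host-config-*.tar.gz someuser@backuphost:/some/path/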

No way of adding the missing partition to the rpool members, you think?
 
No way of adding the missing partition to the rpool members, you think?

In Linux, there is always at least one way, but depending on your knowledge of Linux, reinstalling may be way faster.

Just a fact: if we need to explain every step in detail to you, then reinstalling is definitely the fastest way.
 
You might have deduced from the info I gave in the OP that this isn't my first rodeo.
Nevertheless, thank you for having added something which you might consider valuable to this thread.
 
I can't check any more; the system has been reinstalled from scratch as of yesterday.
fdisk showed just 2 partitions, whereas a fresh install shows 3 partitions on the rpool members; the boot partition is the crucial missing one, I believe.
 
I can't check any more; the system has been reinstalled from scratch as of yesterday.

So is it solved then?

The layout of the Seagates also looks like a non-boot ZFS pool to me. Where do they come into play? There is still no information on how this system could boot in the first place; AFAIK you need those boot partitions for GRUB.

You might have deduced from the info I gave in the OP that this isn't my first rodeo.

I'm too unfamiliar with rodeos to express how hard this would be. Your ZFS pool consists of only one partition per disk, so to make space for a boot partition you would have to back up your pool, destroy it, repartition your disks, create a new pool, and restore everything. Depending on your platform (MBR/UEFI), you would then need to set up the boot disk accordingly.
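
Very roughly, and assuming there is enough space on another pool to stash a copy, the ZFS side would look something like this (untested sketch; pool and device names are placeholders, and details like mountpoint and bootfs properties need extra care):
Code:
# 1. snapshot the whole pool and copy it somewhere safe
zfs snapshot -r rpool@migrate
zfs send -R rpool@migrate | zfs receive -F otherpool/rpool-copy

# 2. destroy rpool, repartition both disks (BIOS-boot/ESP + ZFS partition),
#    then recreate the mirror on the new ZFS partitions
zpool destroy rpool
zpool create -o ashift=12 rpool mirror <disk1>-part3 <disk2>-part3

# 3. send everything back and put a bootloader on both disks
zfs send -R otherpool/rpool-copy@migrate | zfs receive -F rpool
grub-install <disk1>
grub-install <disk2>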
 
So is it solved then?
It is indeed solved (but unexplained), and I've now marked the thread accordingly.

The layout of the Seagates also looks like a non-boot ZFS pool to me. Where do they come into play?
Those are another non-boot raidz1 pool. They didn't come into play.

There is still no information on how this system could boot in the first place; AFAIK you need those boot partitions for GRUB.
Correct, and I fail to understand how the system booted at all, since neither of the rpool members had a boot partition, although they both had a bootloader, as shown by dd if=/dev/sda bs=512 count=1.
 
Correct, and I fail to understand how the system booted at all, since neither of the rpool members had a boot partition, although they both had a bootloader, as shown by dd if=/dev/sda bs=512 count=1.

Strange indeed.
Yeah, but the MBR bootloader itself cannot boot ZFS. The last complete bootloader that fit into those 446 bytes of the first sector was LILO (and of course DOS), and that was a looooooong time ago. Maybe it is somehow installed in the filler space at the end? The disks are small enough that a non-GPT MBR could boot them.
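
If someone wants to chase that theory on a similar system, a speculative way to look for the core image outside the usual locations would be to run strings over the gap between the GPT and the first partition, and over the small part9 at the end (offsets assume the first partition starts at sector 2048):
Code:
# scan the gap between the partition table and the first partition
dd if=/dev/sda bs=512 skip=34 count=2014 2>/dev/null | strings | grep -i grub
# scan the 8M Solaris-reserved partition at the end of the disk
dd if=/dev/sda9 2>/dev/null | strings | grep -i grub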
 
Just adding: both those drives were brand new, so whatever was on those SSDs came from the original Proxmox installation (which I had to wipe, because that was the easiest way to get back to a working system).
 
