zpool Gone After Updating from PVE 7 to 8

rbeard.js

Member · Aug 11, 2022
We were updating all of our nodes from PVE 7 to PVE 8. All the other nodes went off without a hitch, but one of our servers is missing one of its zpools and I'm not sure how to get it back.
The disks aren't even showing up in Proxmox, which I'm sure is a large part of the issue.
Looking in the iDRAC, I can see all 8 of the missing disks. They are all online, and the RAID card is still in HBA mode. It doesn't look like anything has changed on that front, but Proxmox can't see them for whatever reason.

I know the update warned about older hardware, but we have other nodes with the same RAID cards and SSDs, and they are fully functional on PVE 8.

If anyone has seen this issue before and has any ideas I can look into, I would really appreciate it.
I did the updates remotely after hours, so I haven't been able to watch Proxmox boot or see any error messages that might come up. I'm going to look into that first thing tomorrow, but I wanted to reach out in case it's something else.

Thank you
 
The main change between PVE 8 with kernel 6.8 and earlier versions is that intel_iommu=on is now the default. Maybe try intel_iommu=off (or maybe iommu=pt is enough, to use identity mapping for non-passed-through devices). Other people have reported missing drives because the device controller does not work with the IOMMU enabled.
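A quick way to see which flags the running kernel actually booted with, and whether the IOMMU came up, is (assuming a standard PVE/Linux shell; `dmesg` may need root):

```shell
# Show the kernel command line the system actually booted with
cat /proc/cmdline

# Look for DMAR/IOMMU initialization messages to confirm whether it is active
dmesg | grep -i -e DMAR -e IOMMU
```

If `/proc/cmdline` does not contain the flag you set, the bootloader config you edited is not the one being used.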
 
The main change between PVE 8 with kernel 6.8 and earlier versions is that intel_iommu=on is now the default. Maybe try intel_iommu=off (or maybe iommu=pt is enough, to use identity mapping for non-passed-through devices). Other people have reported missing drives because the device controller does not work with the IOMMU enabled.
I'll give this a try, thank you.
 
The main change between PVE 8 with kernel 6.8 and earlier versions is that intel_iommu=on is now the default. Maybe try intel_iommu=off (or maybe iommu=pt is enough, to use identity mapping for non-passed-through devices). Other people have reported missing drives because the device controller does not work with the IOMMU enabled.
I added both to my GRUB file, updated GRUB, and rebooted, but unfortunately there are still no drives showing.

I'm seeing this a bunch in dmesg:
[ 8.938879] megaraid_sas 0000:03:00.0: Could not get controller info. Fail from megasas_init_adapter_fusion 1907
[ 8.954753] megaraid_sas 0000:03:00.0: Failed from megasas_init_fw 6539

The only other thing in red is this bit:

[ 21.163962] power_meter ACPI000D:00: Waiting for ACPI IPMI timeout
[ 21.164511] ACPI Error: AE_NOT_EXIST, Returned by Handler for [IPMI] (20230628/evregion-300)
[ 21.164562] ACPI Error: Region IPMI (ID=7) has no handler (20230628/exfldio-261)

[ 21.164626] No Local Variables are initialized for Method [_GHL]

[ 21.164665] No Arguments are initialized for method [_GHL]

[ 21.164705] ACPI Error: Aborting method \_SB.PMI0._GHL due to previous error (AE_NOT_EXIST) (20230628/psparse-529)
[ 21.164772] ACPI Error: Aborting method \_SB.PMI0._PMC due to previous error (AE_NOT_EXIST) (20230628/psparse-529)
 
initrd=\EFI\proxmox\6.8.8-4-pve\initrd.img-6.8.8-4-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs

I'm fairly sure I'm using GRUB on this machine, as it always boots with the "Welcome to GRUB" screen.
I want to believe you but the output does not show intel_iommu=off, which implies that it is on with kernel version 6.8. Or did you remove it again after testing?
 
I want to believe you but the output does not show intel_iommu=off, which implies that it is on with kernel version 6.8. Or did you remove it again after testing?
Oh, I removed it again after testing. My apologies.

I'm updating the firmware on the RAID card to see if that might be the issue, but I can put the flag back and send you the output again.
 
Oh, I removed it again after testing. My apologies.

I'm updating the firmware on the RAID card to see if that might be the issue, but I can put the flag back and send you the output again.
No need, if you're sure it was applied correctly when trying it out. At least now you know how to check for it.
 
No need, if you're sure it was applied correctly when trying it out. At least now you know how to check for it.
initrd=\EFI\proxmox\6.8.8-4-pve\initrd.img-6.8.8-4-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs

Code:
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off iommu=pt"
GRUB_CMDLINE_LINUX=""

So maybe something changed in the latest update and kernel? Because you are right, I'm not seeing the flags even after putting them back and rebooting. Where else can I set these flags if I'm not using GRUB now?
The firmware update didn't help either, unfortunately.
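Worth noting: on a ZFS-root install, PVE usually boots via proxmox-boot-tool rather than plain GRUB, in which case GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub is ignored. A sketch for checking the boot situation (standard paths, but verify on your own system; `proxmox-boot-tool status` also reports how the ESPs are configured):

```shell
# Firmware mode: a UEFI-booted system exposes /sys/firmware/efi, legacy BIOS does not
if [ -d /sys/firmware/efi ]; then
    echo "booted via UEFI"
else
    echo "booted via legacy BIOS"
fi

# On proxmox-boot-tool managed systems the kernel args live here instead of /etc/default/grub:
[ -f /etc/kernel/cmdline ] && cat /etc/kernel/cmdline
```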
 
Code:
root@bixbite:~# update-grub
Generating grub configuration file ...
W: This system is booted via proxmox-boot-tool:
W: Executing 'update-grub' directly does not update the correct configs!
W: Running: 'proxmox-boot-tool refresh'

Copying and configuring kernels on /dev/disk/by-uuid/0A22-7B24
        Copying kernel and creating boot-entry for 5.15.158-2-pve
        Copying kernel and creating boot-entry for 6.8.8-4-pve
Copying and configuring kernels on /dev/disk/by-uuid/0A23-9343
        Copying kernel and creating boot-entry for 5.15.158-2-pve
        Copying kernel and creating boot-entry for 6.8.8-4-pve
Found linux image: /boot/vmlinuz-6.8.8-4-pve
Found initrd image: /boot/initrd.img-6.8.8-4-pve
Found linux image: /boot/vmlinuz-5.15.158-2-pve
Found initrd image: /boot/initrd.img-5.15.158-2-pve
Found linux image: /boot/vmlinuz-5.15.102-1-pve
Found initrd image: /boot/initrd.img-5.15.102-1-pve
done

It also looks like I still have the old kernel on the disk as well. This is the output from update-grub.
 
Oh good god
I expect that you need to edit /etc/kernel/cmdline (on a single line!) and run proxmox-boot-tool refresh: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot_edit_kernel_cmdline
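For what it's worth, the edit itself is just appending the flag to the single line in /etc/kernel/cmdline. A sketch using a scratch copy first (the root= value below is taken from the boot entry quoted earlier in the thread; adjust for your pool):

```shell
# Work on a scratch copy first; /etc/kernel/cmdline must remain ONE line
printf 'root=ZFS=rpool/ROOT/pve-1 boot=zfs\n' > /tmp/cmdline.example

# Append the flag to the end of that single line
sed -i 's/$/ intel_iommu=off/' /tmp/cmdline.example
cat /tmp/cmdline.example
# → root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=off

# On the real file you would then sync the boot entries:
# proxmox-boot-tool refresh
```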
I updated the cmdline file to add the flag and ran the refresh command, but the cat command still shows the flag missing. Maybe it will show after the reboot that is taking place now.

I also see in the documentation that you can boot into the old kernel. Do you think that would be worth testing to see if it's a kernel issue, or is that really going to bork the system?
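Booting the previous kernel once is a reasonably low-risk test. On a proxmox-boot-tool managed system there is a one-shot pin for exactly this; a dry-run sketch (the version string is taken from the update-grub output above, and --next-boot is meant to apply the pin to a single reboot only, so a bad result doesn't leave the node stuck on the old kernel):

```shell
# Dry run: print the commands rather than executing them,
# since pinning and rebooting takes the node down
OLD_KERNEL="5.15.158-2-pve"
echo "proxmox-boot-tool kernel pin ${OLD_KERNEL} --next-boot"
echo "reboot"
```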
 
