Updated my Proxmox Gen8 Microserver Homelab and Homeserver - broke everything!

theimpaler · Jul 25, 2023

Hi all

I was just updating my homelab/homeserver (HP ProLiant Gen8 MicroServer) with a fair number of updates, as I’d not gotten around to updating the server for a while (bad me, I know). I’ve been running this for a while now (probably since roughly 6.X) and have been loving learning and using Proxmox.

I noticed something didn’t seem to update properly while doing that, but I stupidly assumed it was something that would probably update fine after a reboot…. Now the web gui is not accessible and the server doesn’t even seem to request an IP address or register with my router. I get these two errors in the console at boot:

Code:

DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [01:00.0] fault index 15 [fault reason 38] Blocked an interrupt request due to source-id verification failure

along with a few disk errors as well and a failure to import my ZFS pool.

The kernel panics and the server doesn’t boot past the first couple of steps when I try to boot using 5.15.108-1-pve or 5.13.19-6-pve. It boots using 5.11.22-4-pve and 5.13.19-3-pve, but in both cases it is incredibly slow in iLO (even typing), issues the same DMAR errors and has no network. I assume I was running the 5.11.XX kernel before this as it's the oldest one listed in GRUB, but I’m not sure.

Some time ago I made some attempts at PCIe passthrough, but I (now!) know there’s an issue with IOMMU and RMRR on the Gen8 MicroServer. When I first tried to enable it, it almost brought the server down completely, so I disabled it in GRUB and it's not caused a problem since.

I’ve been troubleshooting and searching the Proxmox forums a lot, and have come up with 4 possible issues so far:

1. Network interface not getting an IP - it still doesn’t when I try to bring the interface up
2. The apt/dpkg packages which failed to install during the updates
3. Full system drive?
4. IOMMU/RMRR issue

I have the outputs of lots of different logs / files / tools as screenshots from iLO, taken while looking around trying to troubleshoot and figure out how to fix it. How best to share them on here? (There are a lot of screenshots!)

Looking at the apt / dpkg issue, I've grabbed these three (pveversion shows the vast majority of PVE packages are now "not correctly installed"). I've attached pveversion and history.log as text OCRed from the screenshots, and term.log as one long screenshot:

Code:

/var/log/apt/history.log
/var/log/apt/term.log
pveversion -v
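
For anyone following along, here is a rough sketch of how the half-installed packages could be listed straight from dpkg instead of reading through pveversion. The function name list_unfinished_packages is made up for illustration; it only filters `dpkg -l` output and changes nothing on the system. Entries in state `rc` (removed, config files left behind) are skipped as they are harmless leftovers.

```shell
# List packages that dpkg does not consider fully installed.
# Reads `dpkg -l` output on stdin and prints "<status> <package>" for each
# package whose two-letter state is neither "ii" (installed) nor "rc".
list_unfinished_packages() {
    awk '$1 ~ /^[uirph][nicUFHWt]R?$/ && $1 != "ii" && $1 != "rc" { print $1, $2 }'
}

# Usage on the affected host (read-only):
#   dpkg -l | list_unfinished_packages
```

States such as `iU` (unpacked) or `iF` (half-configured) would be the ones left behind by an interrupted update. `dpkg --audit` gives a similar report, and `dpkg --configure -a` is the usual way to ask dpkg to finish configuring them, but whether that is safe here depends on why the update failed.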

I've also looked at the following, but I'm not sure if these are useful, so I haven't attached them yet:

General info (from a recent boot, now that the server doesn't actually boot):

Code:

dmesg

Looking at whether it's a disk issue or the disks being full (they aren't):

Code:

lsblk
df -h

In relation to the networking (and I’ve tried to bring the interfaces up, but it never gets an IP and my vmbr0 seems to have vanished):

Code:

ip addr
/etc/network/interfaces

Thinking it might be an IOMMU/RMRR issue:

Code:

/etc/default/grub
cat /proc/cmdline
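
A small sketch for checking which IOMMU-related options the running kernel was actually booted with, since what is in /etc/default/grub only takes effect after GRUB's config is regenerated. The helper name iommu_params is hypothetical; intel_iommu=, iommu= and intremap= are the kernel parameters it looks for, and it may well miss others that matter on this hardware.

```shell
# Print the IOMMU-related options found in a kernel command line.
# Reads the command line on stdin; prints one matching option per line
# (prints nothing if none are set).
iommu_params() {
    tr ' ' '\n' | grep -E '^(intel_iommu|iommu|intremap)='
}

# Usage on the running system:
#   iommu_params < /proc/cmdline
```

No output would mean none of these options are set on the booted kernel, so the DMAR messages are coming from the kernel's defaults rather than from a leftover passthrough setting.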

I do have backups of some of the data, but as is often the case (yes I have learned my lesson) - they are not as up to date as I would like.

Annoyingly, I have also been tinkering with a small cluster setup on another set of devices as part of learning what I was intending to replace the G8 MicroServer with. If I’d just waited, finished that off and migrated over, I would have avoided this mess, as I could have updated one node, seen it was a problem and not worried…. *sigh*

I’d love to get this back up and running just so I can rescue my data which is on a ZFS pool passed through as a disk to an OMV VM. Any help is very much appreciated as I’m a little broken having been digging around lots and generally not wanting to break it further with running commands in a situation that feels quite delicate!

Thanks

Dunuin · Jul 25, 2023

theimpaler said:
Now the web gui is not accessible and the server doesn’t even seem to request an IP address or register with my router.

Normally a PVE will use a static IP and not request an IP via DHCP from the router.

theimpaler said:
I’d love to get this back up and running just so I can rescue my data which is on a ZFS pool passed through as a disk to an OMV VM. Any help is very much appreciated as I’m a little broken having been digging around lots and generally not wanting to break it further with running commands in a situation that feels quite delicate!

If you just want to rescue data you could boot a live Linux with ZFS support (like Ubuntu), import the ZFS pool and then dd your zvols or rsync the content of your mounted datasets or mounted filesystems on those zvols.
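
A minimal sketch of what that could look like from the live system, assuming the ZFS tools are installed there. The pool name, zvol name and destination directory are placeholders (take the real ones from `zpool import` and `zfs list -t volume`), and the helper names run and rescue_zvol are made up for this example. It only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rescue_zvol POOL ZVOL DEST_DIR
# POOL, ZVOL and DEST_DIR are placeholders for your own names.
rescue_zvol() {
    pool=$1; zvol=$2; dest=$3
    # Import read-only and without mounting datasets, so nothing on the pool is modified.
    run zpool import -N -o readonly=on "$pool"
    # Show the zvols (VM disks) that exist on the pool.
    run zfs list -t volume -r "$pool"
    # Copy the zvol block device into an image file on separate storage.
    run dd if="/dev/zvol/$pool/$zvol" of="$dest/$(basename "$zvol").raw" bs=1M conv=sparse status=progress
    # Release the pool again when done.
    run zpool export "$pool"
}

# Example (prints the plan only):
#   rescue_zvol tank vm-100-disk-0 /mnt/backup
```

Because the pool was last used by the PVE host, the import may refuse without `-f`. The resulting .raw image can be loop-mounted or attached to another VM to get at the OMV filesystem inside it.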

Did you update your PVE to the latest 6.X version before trying to upgrade to PVE 7.X? Usually an upgrade will fail if you screwed up the repos, for example by mixing buster/PVE6 and bullseye/PVE7 repos. That happens if you only change the repos in the /etc/apt/sources.list file and forget the ones in the /etc/apt/sources.list.d/ folder. So I would check if the repos are correct and then try to finish the upgrade.
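
One rough way to spot mixed repos is to list the release names used by all active "deb" lines. The function name apt_suites is made up for this sketch, and it only understands the classic one-line format, not deb822-style .sources files.

```shell
# Print the distinct release codenames used by active "deb"/"deb-src" lines.
# Reads one-line-style APT source entries on stdin.
apt_suites() {
    awk '$1 == "deb" || $1 == "deb-src" {
             i = 2
             # skip an optional [option=value ...] block before the URL
             if ($i ~ /^\[/) { while ($i !~ /\]$/ && i < NF) i++; i++ }
             print $(i + 1)
         }' | sed 's|[-/].*||' | sort -u
}

# Usage:
#   cat /etc/apt/sources.list /etc/apt/sources.list.d/*.list | apt_suites
```

On a consistent PVE 7 host this should print only bullseye. Seeing buster as well would point at a leftover PVE 6 repo.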

Did you follow the upgrade guide?:
https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0

Did you switch from grub to proxmox-boot-tool in case you are booting using grub and not systemd boot?:
https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Unable_to_boot_due_to_grub_failure

theimpaler · Jul 25, 2023

Hi Dunuin

Thanks for getting back to me on this.

Dunuin said:
Normally a PVE will use a static IP and not request an IP via DHCP from the router.

Sorry I should have been clearer - it does use a static IP. I meant it doesn't show up in my Unifi console (which it normally does).

Dunuin said:
If you just want to rescue data you could boot a live Linux with ZFS support (like Ubuntu), import the ZFS pool and then dd your zvols or rsync the content of your mounted datasets or mounted filesystems on those zvols.

This is useful to know, thank you. It's a pain to have to recreate this, but the data is the critical bit. I may need more guidance on exactly how to do this, but you've given me a lot to work with there, thank you. It sounds like I could import the pool, mount the zvol and then get the data off the OMV filesystem and move it elsewhere. There's only one large filesystem on there, which I hope makes it easier. I hope I can get the existing system back up if at all possible, as it will be easier to get the data out that way, but this is useful to know.

Dunuin said:
Did you update your PVE to the latest 6.X version before trying to upgrade to PVE 7.X?

Again, I should have been clearer: this server wasn't running 6.X recently. I'd done the upgrade to 7.X not long after 7.X came out. The update that seems to have caused this was just a minor set of updates from 7.4.X.

Dunuin said:
Did you switch from grub to proxmox-boot-tool in case you are booting using grub and not systemd boot?:

No, I never did this. The system was running fine using grub in 7.4.X though.

Thanks!
