[SOLVED] sometimes it is just maddening

Joris L. · Jun 18, 2020

because i fail to understand i share my clumsly learnings with proxmox

For some reason or other i wanted to explore how to configure two bridges using a single NIC. Eventually this resulted in the faint hearted decisions to set the gateway of vmbr0 to 0.0.0.0, because who knows, it might just work and i do not have to dig into yet anoter pile of ill written pages found through googling and waisting hours on hours.

This resulted in the system becomming entirely unresponsive., because yeah, in hindsight, don't do that, for real. So i press the powerbutton and a cleans shutdown ensues. Just do discover the machine i happily booted a number of times has now become unbootable. I need to insert the typicall root=ZFS=rpool/... sequence and it boots fine.

So i do not delay and correct the stupid error config for the gateway, run update-grub, run update-initramfs -k all -u, run pve-efiboot-mgr refresh because for all i know i automagically unmade any of those.

The system reboots and now refuses to boot with the root=ZFS=.... sequence, it now hangs on vgaarb: VGA decodes

wtf!

gone are the days systems were merciless, now they put you to sleep with all kinds of conveniences and then hit you right between the eyes

Joris L. · Jun 18, 2020

aaah. the sweet agony of systemd-boot. What works once, does not work again.

for example

module_blacklist=vfio-pci

would i dare to consider systemd has somehow something to do with this time consuming event ?
because yeah, i did run update-grub and that is not what i was supposed to do, huh

Joris L. · Jun 18, 2020

ok it works, thanks Anakin, thanks Mr. Bean

now how to repair systemd-boot

May i suggest s6 by skarnet.org as future default init system for proxmox ?

Joris L. · Jun 18, 2020

FYI the only boot parameters resulting in a functioning sytem is to set iommu=off

iommu=off text nomodeset .... root=ZFS=mypool/ROOT/....

Joris L. · Jun 18, 2020

maybe i am just impatient, may i just lack common sense
how on earth does one fix systemd-boot after doing upgrade-grub ?

running bootctl simpy lists the partition which holds the ESP
mounting this partition on /boot/efi and running bootctl again shows success
still, no boot

the ESP/EFI partition was corrupted, fsck cleared the dirty bit
still, no boot, despite bootctl status showing all good things, partition is now clean, running pve-efiboot-tool refresh

final try, then i give up and wheep on the floor

mount -t efivars efivars /sys/firmware/drivers/efivars
mount /dev/mydiskwithefipartition /boot/efi
bootctl --path=/boot/efi install
pve-efiboot-tool refresh
reboot

FAIL the system reboots reliably, however, still will not boot with iommu=pt. it now requires iommu=off or freezes with vfio-pci .... vgaabr: ....

Joris L. · Jun 19, 2020

Am i correct in assuming this is simply how systemd-boot works ? The system boots fine now, except for this.

bootctl status
Couldn't find EFI system partition. It is recommended to mount it to /boot or /efi.
Alternatively, use --path= to specify path to mount point.

Even after reinstalling systemd-boot using efibootmgr /dev/<efipartitionpath> install the system does not mount /boot/efi correctly.

Incredible all this began with a stupid mistake with network configuration and consequently performing upgrade-grub (reflex i guess) which for some reason is on the same system as where systemd-boot is already installed.

Stoiko Ivanov · Jun 19, 2020

have you seen the reference documentation: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_systemd_boot

in short - systemd-boot is used if grub is not used (so no need for update-grub), plus PVE has it's own handling of the systemd-boot config (because it needs to get replicated to all bootable disks in your zfs pool)

I hope this helps!

Joris L. · Jun 19, 2020

Stoiko Ivanov said:
have you seen the reference documentation: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_systemd_boot

in short - systemd-boot is used if grub is not used (so no need for update-grub), plus PVE has it's own handling of the systemd-boot config (because it needs to get replicated to all bootable disks in your zfs pool)

I hope this helps!

Thanks Stoiko. I had a read again on this page now i slept a few hours.

Typical as for many open source 'documentation' it is informative, not helpful. That much i could learn with 'man bootctl' and 'man systemd-boot'

What has me stumped is systemd-boot works fine but the /boot/efi partition is not even mounted. One would expect if systemd-boots it has some parameters set to automount /boot/efi Which is the crux of any issues i encounter.

Stoiko Ivanov · Jun 19, 2020

Stoiko Ivanov said:
PVE has it's own handling of the systemd-boot config (because it needs to get replicated to all bootable disks in your zfs pool)

This is the reason why the ESP(s) are not mounted during regular operations - if you need to adapt the kernel commandline - edit /etc/kernel/cmdline and run `pve-efiboot-tool refresh` afterwards.

I hope this explains it

Joris L. · Jun 19, 2020

Stoiko Ivanov said:
This is the reason why the ESP(s) are not mounted during regular operations - if you need to adapt the kernel commandline - edit /etc/kernel/cmdline and run `pve-efiboot-tool refresh` afterwards.

I hope this explains it

Ah, thanks.~~May i suggest to document~~ ~~that not-mounting bit and such ?~~ OOPS

From the page you mentioned, i missed this 3 times.

The ESPs are not kept mounted during regular operation, in contrast to grub, which keeps an ESP mounted on /boot/efi. This helps to prevent filesystem corruption to the vfat formatted ESPs in case of a system crash, and removes the need to manually adapt /etc/fstab in case the primary boot device fails.

Now i can sleep easy again. I will never fully comprehend how i evolved from break > fix. I will test a few cmdline versions i stored. Somehow with one cmdline the system boots onto a login prompt with others it stops at vfio-pci .... vgaaarb: ... but the system boots normal. My impression is the sequence of the cmdline parameters is more influential than i anticipated.

Yes, i am aware of the required `pve-efiboot-tool refresh` the pve-tools are quite nice actually.

Joris L. · Jun 20, 2020

Just one hour ago. The machine froze again.

The system which rebooted fine multiple times before is now (as per usual) showing "vfio-pci ...... vgaarb: changed VGA decodes: olddecodes=io+mem, decodes=none

wns=none" as the last line on the screen, strangely it reboots almost immediately on the three finger salute "ctrl-alt-del"

If i would be certain upgrading the CPU would make a difference i order it today, that would be from 1700X to 3700X

Joris L. · Jun 20, 2020

WTF!

while /etc/network/interface has the bridge configured and a bridge.vlanid configured it now does not know the bridge interface anymore when looking with brctl show

the file /var/log/error shows a

kernel: mce: [Hardware Error]: CPU 11: Machine Check: 0 Bank 5: bea......
kernel: mce: [Hardware Error]: TSC ADDR 1f.........c0 MISC ..... SYND ..... IPID ....
kernel: mce: [Hardware Error]: PROCESSOR 2:.... TIME ... SOCKET 0 APIC 7 microcode 8001138
kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 5: bea......
kernel: mce: [Hardware Error]: TSC ADDR 1f.........c0 MISC ..... SYND ..... IPID ....
kernel: mce: [Hardware Error]: PROCESSOR 2:.... TIME ... SOCKET 0 APIC d microcode 8001138

proxwolfe · Jun 20, 2020

sorry for my hijacking this thread.

I just stumbled across it accidentally, but now I am afraid to reboot as I may have rendered my system unbootable.

This is what I did: I wanted to amend the kernel command line in order to allow pci passthrough.

I found this: https://192.168.10.11:8006/pve-docs/chapter-qm.html#qm_pci_passthrough

and followed it to this: https://192.168.10.11:8006/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline

I don't know whether proxmox boots via grub or systemd-boot. It looked like both is possible so I went with the one I know: grub

I followed this through:
"Grub
The kernel commandline needs to be placed in the variable GRUB_CMDLINE_LINUX_DEFAULT in the file /etc/default/grub. Running update-grub appends its content to all linux entries in /boot/grub/grub.cfg."

and then happened to find this thread that now makes me fear my system won't boot up again.

Assuming this is what is going to happen, what do I need to do to rectify the situation?

Thanks

Search

Search

[SOLVED] sometimes it is just maddening

Joris L.

Active Member

Joris L.

Active Member

Joris L.

Active Member

Joris L.

Active Member

Joris L.

Active Member

Joris L.

Active Member

Stoiko Ivanov

Proxmox Staff Member

Joris L.

Active Member

Stoiko Ivanov

Proxmox Staff Member

Joris L.

Active Member

Joris L.

Active Member

Joris L.

Active Member

proxwolfe

Well-Known Member