[SOLVED] sometimes it is just maddening

May 16, 2020
because i fail to understand i share my clumsy learnings with proxmox

For some reason or other i wanted to explore how to configure two bridges using a single NIC. Eventually this resulted in the faint-hearted decision to set the gateway of vmbr0 to 0.0.0.0, because who knows, it might just work and then i would not have to dig into yet another pile of ill-written pages found through googling and waste hours upon hours.

This resulted in the system becoming entirely unresponsive, because yeah, in hindsight, don't do that, for real. So i press the power button and a clean shutdown ensues. Just to discover the machine i happily booted a number of times has now become unbootable. I need to insert the typical root=ZFS=rpool/... sequence and it boots fine.

So i do not delay and correct the stupid gateway config error, run update-grub, run update-initramfs -k all -u, run pve-efiboot-tool refresh because for all i know i automagically unmade any of those.

The system reboots and now refuses to boot with the root=ZFS=.... sequence, it now hangs on vgaarb: VGA decodes

wtf!

gone are the days systems were merciless, now they put you to sleep with all kinds of conveniences and then hit you right between the eyes


aaah. the sweet agony of systemd-boot. What works once, does not work again.

for example

module_blacklist=vfio-pci
would i dare to consider systemd has somehow something to do with this time consuming event ?
because yeah, i did run update-grub and that is not what i was supposed to do, huh




 
maybe i am just impatient, maybe i just lack common sense
how on earth does one fix systemd-boot after running update-grub ?

running bootctl simply lists the partition which holds the ESP
mounting this partition on /boot/efi and running bootctl again shows success
still, no boot

the ESP/EFI partition was corrupted, fsck cleared the dirty bit
still, no boot, despite bootctl status showing all good things, partition is now clean, running pve-efiboot-tool refresh

final try, then i give up and weep on the floor

mount -t efivarfs efivarfs /sys/firmware/efi/efivars
mount /dev/mydiskwithefipartition /boot/efi
bootctl --path=/boot/efi install
pve-efiboot-tool refresh
reboot

FAIL the system reboots reliably, however, it still will not boot with iommu=pt. it now requires iommu=off or freezes with vfio-pci .... vgaarb: ....
 
Am i correct in assuming this is simply how systemd-boot works ? The system boots fine now, except for this.

bootctl status
Couldn't find EFI system partition. It is recommended to mount it to /boot or /efi.
Alternatively, use --path= to specify path to mount point.

Even after reinstalling systemd-boot using efibootmgr /dev/<efipartitionpath> install the system does not mount /boot/efi correctly.

Incredible that all this began with a stupid mistake in the network configuration and consequently running update-grub (reflex i guess), which for some reason is present on the same system where systemd-boot is already installed.
 
have you seen the reference documentation: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_systemd_boot

in short - systemd-boot is used if grub is not used (so no need for update-grub), plus PVE has its own handling of the systemd-boot config (because it needs to get replicated to all bootable disks in your zfs pool)

I hope this helps!
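
For anyone unsure which of the two bootloaders their host uses, here is a minimal sketch of a check. It assumes the installer defaults described in the reference documentation of that time (UEFI plus ZFS root means systemd-boot, everything else means grub); the function name guess_bootloader and its overridable arguments are hypothetical, and a customised install may not follow this rule.

```shell
#!/bin/sh
# Hypothetical helper: guess which bootloader a Proxmox VE host uses.
# Assumption: UEFI + ZFS root => systemd-boot, anything else => grub.
#   $1: directory that only exists when booted via UEFI (default /sys/firmware/efi)
#   $2: file holding the running kernel command line (default /proc/cmdline)
guess_bootloader() {
    efi_dir="${1:-/sys/firmware/efi}"
    cmdline_file="${2:-/proc/cmdline}"

    if [ ! -d "$efi_dir" ]; then
        # Legacy BIOS boot: systemd-boot cannot be in use.
        echo "grub"
        return 0
    fi

    # On UEFI, a root=ZFS=... argument hints at the systemd-boot setup.
    if grep -q 'root=ZFS=' "$cmdline_file" 2>/dev/null; then
        echo "systemd-boot"
    else
        echo "grub"
    fi
}

guess_bootloader
```

If this prints systemd-boot, update-grub has no effect on what the machine actually boots.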

Thanks Stoiko. I had a read again on this page now that i have slept a few hours.

Typical as for many open source 'documentation' it is informative, not helpful. That much i could learn with 'man bootctl' and 'man systemd-boot'

What has me stumped is that systemd-boot works fine but the /boot/efi partition is not even mounted. One would expect that if the system boots via systemd-boot, it has some parameters set to automount /boot/efi. This is the crux of the issues i encounter.
 
PVE has its own handling of the systemd-boot config (because it needs to get replicated to all bootable disks in your zfs pool)
This is the reason why the ESP(s) are not mounted during regular operations - if you need to adapt the kernel commandline - edit /etc/kernel/cmdline and run `pve-efiboot-tool refresh` afterwards.

I hope this explains it
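
The edit-then-refresh step could be sketched roughly as below. This is only an illustration, assuming /etc/kernel/cmdline holds a single line of space-separated parameters; the function name add_cmdline_param is hypothetical and not part of the PVE tooling.

```shell
#!/bin/sh
# Hypothetical helper: add one parameter to a single-line kernel cmdline
# file unless it is already present. Defaults to /etc/kernel/cmdline.
add_cmdline_param() {
    param="$1"
    file="${2:-/etc/kernel/cmdline}"

    # Read the first line only; "|| true" tolerates a missing final newline.
    current=""
    IFS= read -r current < "$file" || true

    # Match whole space-separated words to avoid partial matches.
    case " $current " in
        *" $param "*) return 0 ;;
    esac

    # Rewrite the file with the parameter appended at the end.
    printf '%s%s\n' "${current:+$current }" "$param" > "$file"
}

# Example usage on a real host (run as root), left commented out on purpose:
# add_cmdline_param iommu=pt
# pve-efiboot-tool refresh
```

Keeping a copy of the old /etc/kernel/cmdline before changing it makes it easier to go back to a line that is known to boot.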
 

Ah, thanks. May i suggest to document that not-mounting bit and such ? OOPS

From the page you mentioned, i missed this 3 times.

The ESPs are not kept mounted during regular operation, in contrast to grub, which keeps an ESP mounted on /boot/efi. This helps to prevent filesystem corruption to the vfat formatted ESPs in case of a system crash, and removes the need to manually adapt /etc/fstab in case the primary boot device fails.

Now i can sleep easy again. I will never fully comprehend how i evolved from break > fix. I will test a few cmdline versions i stored. Somehow with one cmdline the system boots to a login prompt, with others it stops at vfio-pci .... vgaarb: ... but the system boots normally. My impression is the sequence of the cmdline parameters is more influential than i anticipated.

Yes, i am aware of the required `pve-efiboot-tool refresh` the pve-tools are quite nice actually.
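
When testing several stored cmdline versions, it may help to confirm after each reboot that the running kernel really received what is in /etc/kernel/cmdline. A minimal sketch, assuming both files contain space-separated parameters; missing_cmdline_params is a hypothetical name, and it only checks presence, not the order of the parameters.

```shell
#!/bin/sh
# Hypothetical helper: print every parameter from the configured cmdline
# file that is absent from the running kernel's command line.
missing_cmdline_params() {
    wanted_file="${1:-/etc/kernel/cmdline}"
    running_file="${2:-/proc/cmdline}"

    running=""
    IFS= read -r running < "$running_file" || true

    # Split the configured line into one parameter per line, then check each.
    tr -s ' \t' '\n' < "$wanted_file" |
    while IFS= read -r param || [ -n "$param" ]; do
        [ -n "$param" ] || continue
        case " $running " in
            *" $param "*) ;;            # present in the running cmdline
            *) echo "$param" ;;         # configured but not active
        esac
    done
}

# Example usage on a real host, left commented out on purpose:
# missing_cmdline_params
```

Empty output means every configured parameter is active; anything printed suggests the refresh did not reach the ESP that was actually booted.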
 
Just one hour ago. The machine froze again.

The system which rebooted fine multiple times before is now (as per usual) showing "vfio-pci ...... vgaarb: changed VGA decodes: olddecodes=io+mem, decodes=none:owns=none" as the last line on the screen, strangely it reboots almost immediately on the three-finger salute "ctrl-alt-del"

If i were certain upgrading the CPU would make a difference i would order it today, that would be from 1700X to 3700X
 
WTF!

while /etc/network/interfaces has the bridge configured and a bridge.vlanid configured, the system now does not know the bridge interface anymore when looking with brctl show

the file /var/log/error shows

kernel: mce: [Hardware Error]: CPU 11: Machine Check: 0 Bank 5: bea......
kernel: mce: [Hardware Error]: TSC ADDR 1f.........c0 MISC ..... SYND ..... IPID ....
kernel: mce: [Hardware Error]: PROCESSOR 2:.... TIME ... SOCKET 0 APIC 7 microcode 8001138
kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 5: bea......
kernel: mce: [Hardware Error]: TSC ADDR 1f.........c0 MISC ..... SYND ..... IPID ....
kernel: mce: [Hardware Error]: PROCESSOR 2:.... TIME ... SOCKET 0 APIC d microcode 8001138
 
sorry for my hijacking this thread.

I just stumbled across it accidentally, but now I am afraid to reboot as I may have rendered my system unbootable.

This is what I did: I wanted to amend the kernel command line in order to allow pci passthrough.

I found this: https://192.168.10.11:8006/pve-docs/chapter-qm.html#qm_pci_passthrough

and followed it to this: https://192.168.10.11:8006/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline

I don't know whether proxmox boots via grub or systemd-boot. It looked like both are possible so I went with the one I know: grub

I followed this through:
"Grub
The kernel commandline needs to be placed in the variable GRUB_CMDLINE_LINUX_DEFAULT in the file /etc/default/grub. Running update-grub appends its content to all linux entries in /boot/grub/grub.cfg."

and then happened to find this thread that now makes me fear my system won't boot up again.

Assuming this is what is going to happen, what do I need to do to rectify the situation?

Thanks