LXC does not start after update/upgrade - apparmor issue(?)

Palerius

Hi there!

I am running Proxmox 6.4-13, and from the information I have gathered, I seem to have an issue with AppArmor after running update & dist-upgrade inside my container.
There is a thread from 2017 that discussed a similar issue, where the solution was to downgrade the kernel. I posted in that thread, but since it is a five-year-old thread, it seemed to make more sense to open a new one.

More about my issue:
I updated a container with update & dist-upgrade and have not been able to start it since.
Looking at the syslog, I see references to issues with AppArmor.

I don't really understand the solution explained in the old thread (and am afraid it is simply outdated). So maybe there is a more recent thread someone can link me to, or someone can help me out right here.

I am running the server as a hobby, so while losing the container would not be all that dramatic, I would prefer not to, as restoring my backup would take a lot of time (and the backup is admittedly also a little older than I would like it to be).

I appreciate any input.

Cheers.
 
You know that PVE 6.x has been end of life for 3 months? You really should consider upgrading to 7.2 to be able to receive security patches again, and so on.
 
I did not know that, so I appreciate the info.
On the other hand, I would have assumed that running updates for a node would include an update to the Proxmox system itself.

Since this server is not reachable from outside the network, I don't think this is too critical for now, but it's time to put this upgrade on my bucket list.
 
On the other hand, I would have assumed that running updates for a node would include an update to the Proxmox system itself.
Only for minor upgrades. Major upgrades you have to do yourself, as these will often break things (for example, legacy-cgroup LXCs won't work anymore, only cgroup2 LXCs, at least if you don't apply the workaround). See here: https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
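If you do need the workaround for containers that still require the legacy cgroup hierarchy, the upgrade guide describes switching the host back via a kernel command line parameter. A rough sketch, assuming a grub-booted host (on systems booted via proxmox-boot-tool the parameter goes into /etc/kernel/cmdline instead; check the wiki page before applying):

Code:
# /etc/default/grub: append the parameter to the existing default line, e.g.
# GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"
update-grub
reboot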
 
I see. Thanks for the info and the link. I will certainly check it out.
Would it make sense to do this system upgrade first? Is there a chance it will fix my current issue?
 
I see. Thanks for the info and the link. I will certainly check it out.
Would it make sense to do this system upgrade first? Is there a chance it will fix my current issue?
Hard to tell. Usually it's a good idea to upgrade, in case your problem has been fixed in the meantime.
On the other hand, doing a major upgrade could cause new problems.
 
Hard to tell. Usually it's a good idea to upgrade, in case your problem has been fixed in the meantime.
On the other hand, doing a major upgrade could cause new problems.
Good, that was exactly my thought! Haha.
I guess I will just flip a coin while doing container backups and let fate decide.
 
It's always a good idea to have recent backups of the guests and the PVE node. Keep in mind that you can't downgrade a PVE node, so it wouldn't be a bad idea to create a block-level backup of the whole system disk (for example, using Clonezilla) before doing a major upgrade. That way you could restore a backup with PVE 6.4 in case PVE 7.2 causes unfixable problems.
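If you would rather do it from a shell than with Clonezilla, a plain dd image works too. A minimal sketch only; /dev/sdX and the target path are placeholders, and the node should be booted from a live medium so the system disk is not in use:

Code:
# raw image of the whole system disk to external storage
dd if=/dev/sdX of=/mnt/external/pve-system-disk.img bs=1M status=progress conv=fsync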
 
after running update & dist-upgrade inside my container.

What OS is the container running, and from which version to which version did you update/upgrade it?

Since LXCs are tightly connected to the host, I can well imagine that upgrading the PVE host [1] could help.
No guarantee, but since PVE 6 is EOL [2] anyway, it would be a really good starting point, in my opinion.

If you are going to upgrade [1], make sure to read the guide really carefully (including the "Known upgrade issues" part [3]) and double/triple check your package repositories [4].
Before you start the upgrade to PVE 7/Debian Bullseye, make sure that your current PVE 6/Debian Buster installation is on the most recent version, using an appropriate PVE 6 repository [5]; a short command sketch follows the links below.

[1] https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
[2] https://pve.proxmox.com/wiki/FAQ
[3] https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Known_upgrade_issues
[4] https://pve.proxmox.com/wiki/Package_Repositories
[5] https://pve.proxmox.com/wiki/Package_Repositories#_proxmox_ve_6_x_repositories
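
Roughly, that preparation boils down to something like this (sketch only; adapt the repositories to your setup and follow the wiki guide for the actual upgrade):

Code:
# bring the PVE 6 / Debian Buster installation fully up to date first
apt update
apt dist-upgrade

# run the checklist script shipped with PVE 6.4 and fix everything it reports
pve6to7 --full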
 
Hey Neobin,

thanks for your input!
I did indeed start with the upgrade to 7, as this was the clearest path forward for the moment.
Following the guide and checking with the pve6to7 command whether any issues are visible worked without problems.
The upgrade did go through, but I am currently working on an issue where the 5.15.60 kernel panics on boot.
I can still boot the node using the old kernel, but that is not a long-term solution.

So far I am clueless about why this happens, but I have only just started my research on the issue.
I guess the fact that I am running some old hardware could be part of it (a Supermicro board with two AMD Opteron CPUs).
If you have any ideas for a starting point, I am open to tips.
Unfortunately, I have so far only found similar issues for older upgrades/kernels, but I haven't taken to the big Googles yet :p

Once Proxmox runs stably and on the new kernel, I will get back to the actual container issue.
As for your question regarding this:
The container used to run Ubuntu 20.04 and one day showed upgrade information after a reboot. The upgrade was to 22.04, which, looking at all my containers, makes it the only one on that version. I don't know why it would prompt me for this version out of the blue, but it is what it is.
 
Quick update after a long evening:
The upgrade to 7 (even though it is currently still on the old kernel) seems to have fixed the initial issue of this post. While working on the kernel issue (and other things that came up), I noticed that the affected container was running. A quick check revealed that it is indeed back online.
Now I have to tackle the kernel issue, and the fact that while LXC containers work, VMs do not boot at all anymore.
All I get is a "booting from hard disk" and that's it. No shutdown is possible either.
This might relate to another issue where the node itself reports "/usr/sbin/grub-probe: error: disk **** not found" on most if not all(?) disks.

So overall I have a big new time-eating task at hand. I should have picked a more outdoorsy, less techy hobby! :D
 
That is unfortunate. :(

This might relate to another issue where the node itself reports "/usr/sbin/grub-probe: error: disk **** not found" on most if not all(?) disks.

Searching the forum for grub-probe brings up some posts [1], and this one [2] seems helpful. (Check whether this is actually your problem!)
Unfortunately, I have no experience with LVM, sorry.

I would first troubleshoot the grub problem, but for the kernel problem:
  • Is the BIOS and are all firmwares up to date?
  • You could try the even newer 5.19 opt-in kernel [3] and/or the previous one: pve-kernel-5.13 (a short command sketch follows the links below).
  • And/or try installing the AMD microcode package [4]. (I am not even sure it contains anything for such old CPUs.)

Otherwise I have no more ideas at the moment; sorry to not be more helpful. :(

[1] https://forum.proxmox.com/search/5421052/?q=grub-probe&t=post&o=date
[2] https://forum.proxmox.com/threads/upgrade-pve-6-x-to-7-x-grub-issues.92118/page-2#post-429676
[3] https://forum.proxmox.com/threads/opt-in-linux-5-19-kernel-for-proxmox-ve-7-x-available.115090
[4] https://wiki.debian.org/Microcode
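
Roughly, and assuming the usual package names (the opt-in kernel thread [3] has the details; amd64-microcode comes from Debian's non-free component, which has to be enabled in the sources first):

Code:
# opt-in 5.19 kernel (or pve-kernel-5.13 for the previous series)
apt update
apt install pve-kernel-5.19

# AMD CPU microcode updates (needs the non-free component enabled)
apt install amd64-microcode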
 
No worries at all! I really appreciate all the time you have put in to help me.
I think the kernel/grub issue is related to why the VMs won't start. So fixing those issues is a priority.

I will spend today trying some more troubleshooting.

I haven't checked whether the BIOS is up to date (which I doubt), so I need to look into that.
Changing to kernel 5.19 might be an option, but to be honest I don't know (yet) how to install a kernel, so research is needed there as well.
I will also read up on the microcode package.
Again, thank you so much for your help so far. I will definitely keep posting here in case this is helpful for anyone else out there :D
 
Small update:
AppArmor continues to be the source of my pain.
A ton of PVE updates fail because of their dependencies on each other, with the initial dependency, AppArmor, failing.

dpkg fails to set up apparmor, stating:
cannot compute MD5 hash for file '/etc/init.d/apparmor': failed to read (Input/output error)

Once that is fixed, it should set up 6-7 other packages, which should get me further.

This is now a race against time: do I fix it before my cloud backup finishes downloading, or does the backup finish first and I decide to just wipe the install (which I would rather not)?
 
That doesn't sound like an apparmor issue, but rather like a corrupt filesystem / broken disk...
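A few generic checks could confirm that; a sketch only, with device names as placeholders (smartctl needs the smartmontools package, and the zpool commands only apply if the root filesystem is on ZFS):

Code:
# SMART health of the SSD (replace sdX with the actual device)
smartctl -a /dev/sdX

# if the root pool is ZFS: scrub it and look for read/checksum errors
zpool scrub rpool
zpool status -v rpool

# kernel messages hinting at I/O problems
dmesg | grep -i "i/o error"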
 
Interesting point.
What's weird is that only that single file is causing problems. PVE is running on an SSD that is fairly new, too.
Manually copying or opening the file hits the same issue. I haven't seen this problem anywhere else, neither in the past nor now.
To be honest, I am at a loss on how to move on from here, since I cannot interact with the file at all and therefore cannot update/configure/fix AppArmor.
 
Update:
After running a full backup of the last things I was worried about, I gave it another restart and tried the 5.15.60 kernel for giggles.
While I still get some grub errors, it did boot all the way up. This also fixed the VMs, which start now.

The I/O issue with AppArmor persists, but at least I can run some basic services on the machine while troubleshooting it.

I figured I'd post the actual message I get for more context:

Code:
Setting up apparmor (2.13.6-10) ...
dpkg: error processing package apparmor (--configure):
 cannot compute MD5 hash for file '/etc/init.d/apparmor': failed to read (Input/output error)
dpkg: dependency problems prevent configuration of lxc-pve:
 lxc-pve depends on apparmor; however:
  Package apparmor is not configured yet.

My journey continues! :)
 
Okay, I am back.
I finally had some more time to work on the machine.
Given that the drive might be failing, as mentioned above (I cannot identify any other issue, even though the drive status is reported as okay), what would be a plan for swapping that drive out?
I do have the system drive in RAID 10, so there is a second drive with a copy of the entire system. But so far I do not know how to tell the system to switch to that drive so that I can remove the broken one and replace it with a new one.

I would be appreciative of any tips on how to approach this.
 
If you use ZFS for the RAID, you could change the boot order in the BIOS/UEFI.
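And once the node boots from a healthy disk, the failed mirror member can be replaced while the pool stays online. A rough sketch only, with device paths as placeholders; the ZFS chapter of the Proxmox admin guide describes the exact procedure for bootable disks:

Code:
# identify the pool layout and the failing device
zpool status rpool

# copy the partition layout from a healthy member to the new disk,
# then give the new disk its own unique GUIDs
sgdisk /dev/disk/by-id/HEALTHY-DISK -R /dev/disk/by-id/NEW-DISK
sgdisk -G /dev/disk/by-id/NEW-DISK

# let ZFS resilver onto the new disk (partition 3 holds the pool on a default install)
zpool replace -f rpool /dev/disk/by-id/OLD-DISK-part3 /dev/disk/by-id/NEW-DISK-part3

# make the new disk bootable as well
proxmox-boot-tool format /dev/disk/by-id/NEW-DISK-part2
proxmox-boot-tool init /dev/disk/by-id/NEW-DISK-part2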
I do! That is great info. I will look into that.
Trying to run a clone of the defective drive failed (due to the problematic file).
I will give the boot-order idea a try; if that already works, it would be great and I could just swap out the defective drive.
Otherwise, I am thinking about just running a clean install of Proxmox. Since all VMs/LXCs are on a separate drive, I assume I would only need to somehow back up the references to those, as well as the references to the connected drives and the ZFS setup?
I appreciate your input. I will keep posting on how it's going.
 
