[SOLVED] Old LXC Containers on Proxmox 7 - Watch out(s) for the 6-7 upgrade

Donovan Hoare

Active Member
So I'm writing this as help for other users. I found all the info in the forums, but it was spread over many different posts with different solutions. (This is what worked for me.)
And yes, my bad: this was done on production, critical systems.

So my scenario: I still had a cluster of 7 servers running Proxmox 5.x (at the time of writing, Proxmox 7.2-11 is out, so yes, my bad).
I just never had a need to update it. But then Proxmox released the Proxmox Backup Server, which has got to be one of the best things added to Proxmox.
So on a Saturday at 1 am I decided I was going to shut down all containers and upgrade Proxmox.

Side Notes:
1. All seven servers run OVS switch with bonded LACP interfaces
2. I don't use Ceph, so I didn't have to worry about that part of the upgrade

(Please note: across the 7 servers I have over 200 LXC containers and over 100 VMs running.)

So the 1st step was:
Code:
apt update
apt dist-upgrade

That went well (I didn't even have an up-to-date 5.x {head slap}).
Then it was time to upgrade.
I followed the official guide ( https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0 )
and its checklist.

The major source of errors is the corosync upgrade, and the official tutorial for it was awesome
(https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0#Cluster:_always_upgrade_to_Corosync_3_first)
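
For anyone skimming, the corosync step from that wiki page boiled down to roughly the following on every node, done before the main 5-to-6 upgrade. The repository line below is from memory, so take the exact URL from the wiki rather than from me:
Code:
# add the Corosync 3 repository for PVE 5 / Debian Stretch (verify the URL against the wiki)
echo "deb http://download.proxmox.com/debian/corosync-3/ stretch main" > /etc/apt/sources.list.d/corosync3.list

apt update
apt dist-upgrade   # with the system already fully updated, only the corosync packages should change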

Again that was smooth.

So I followed the upgrade steps and, to my happiness, version 6.4 was running on all 7 servers.
The problem was that it was so easy I decided to go straight from 6 to 7 as well.
And again {head slap}: the switch to the unified cgroup v2 hierarchy for LXC containers is not supported by Ubuntu 14.04, Ubuntu 16.04 or CentOS 7 (their systemd is too old).
(I don't use CentOS, but it was mentioned in all the threads, so I decided to add it here.)
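
In hindsight, a quick check like the one below on each node would have flagged the old guests before the jump. This is just a sketch of what I would run today, not something from the upgrade docs; it only works for containers that are running, and the os-release output varies a bit per distro:
Code:
# print each container's distro so you can spot Ubuntu 14.04 / 16.04 / CentOS 7 before upgrading
for id in $(pct list | awk 'NR>1 {print $1}'); do
    echo -n "CT $id: "
    pct exec "$id" -- sh -c 'grep PRETTY_NAME /etc/os-release 2>/dev/null || cat /etc/issue'
done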

Here is where the problems started.
FIRST the upgrade.

Code:
pve6to7
It said everything was OK. Hence the false sense of security.

So I did the upgrade:
Code:
sed -i 's/buster\/updates/bullseye-security/g;s/buster/bullseye/g' /etc/apt/sources.list

## change the repos as suggested.

apt dist-upgrade
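
One thing that one-liner does not touch: if you use the Proxmox enterprise repository, it lives in its own file and needs the same buster-to-bullseye change (the 6-to-7 wiki covers this; the path below is the stock default, adjust if yours differs):
Code:
sed -i 's/buster/bullseye/g' /etc/apt/sources.list.d/pve-enterprise.list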

Well here the S**T hit the proverbial fan.
ALL 7 servers stopped ("hung") at 57%; then, after a huge amount of time (60 minutes on production is huge),
they went to 58%.

So now I'm really not willing to risk a forced reboot of production servers stuck 57% into an OS upgrade.
So here a lot of Google came into play. (Yes, everyone says test with a lab system; labs always seem to work.)

The problem was that the servers hung at the memtest package install, or something like that; I didn't take notes, so I just stopped and waited.
To find where it hung I ran:
Code:
ps faxl

This showed, under the dist-upgrade process, the memtest item that was hanging all 7 of my servers.

So I risked it and stopped the process:

Code:
sudo kill -9 29303

# replace 29303 with the PID of your stuck memtest process
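
If you need to find that PID in a hurry, something like this narrows it down (assuming, as in my case, it really was a memtest-related package hook that was stuck; adjust the pattern to whatever ps faxl showed you):
Code:
# list candidate processes with their PIDs before killing anything
pgrep -af memtest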

Thank whatever powers you believe in or pray to: the installs on all seven servers continued (it did mention an error, but continued).

So, 2 hours of waiting at a pause that was not mentioned in any document. I was so happy:
the servers rebooted and all the hosts came online.

Then the VMs came online and the LXC containers said "started".
I thought I was done. No, no, no. I was wrong.

So I tried to ping a VM. Well, it's on and booted, but there is no network.
A simple ifconfig showed that not all interfaces were up.
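
For the record, these are the sort of checks I mean on the host itself; standard iproute2 / Open vSwitch tools, nothing Proxmox-specific:
Code:
ip -br link show     # brief view of which interfaces are up or down
ovs-vsctl show       # confirm the OVS bridge, bond and internal ports actually exist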

This I had experienced before in testing, so I knew that the OVS switch setup requires some config changes.
So I made the config changes.
From memory, the gist is replacing the old allow-style stanzas with plain auto lines (see the sketch a bit further down); it was a while ago that I did the research.
MAKE A BACKUP OF YOUR /etc/network/interfaces FIRST:
Code:
cp /etc/network/interfaces /root/
Because after the next error you may have to restore it. Typing it all out in the console sucks (side note: noVNC needs a copy-paste option).
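
If I remember the change correctly (treat this as an illustration of the idea, not an exact diff of my files), the old-style OVS stanzas versus what ifupdown2 on Proxmox 7 expects look roughly like this:
Code:
# Old style (ifupdown + the OVS helper scripts):
allow-vmbr0 bond0
iface bond0 inet manual
    ...

allow-ovs vmbr0
iface vmbr0 inet manual
    ...

# New style (ifupdown2 on Proxmox 7): plain "auto" lines everywhere
auto bond0
iface bond0 inet manual
    ...

auto vmbr0
iface vmbr0 inet manual
    ...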

Sample working OVS setup (bonded interface, vlan11 in my case):
Code:
auto lo
iface lo inet loopback

# Bond eno1 and eno2 together
auto eno1
iface eno1 inet manual
    ovs_mtu 9000

auto eno2
iface eno2 inet manual
    ovs_mtu 9000

auto bond0
iface bond0 inet manual
  ovs_bridge vmbr0
  ovs_type OVSBond
  ovs_bonds eno1 eno2
  ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast tag=1 vlan_mode=native-untagged
  ovs_mtu 9000
 
 # Bridge for our bond and vlan virtual interfaces (our VMs will
# also attach to this bridge)
auto vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  # NOTE: we MUST list bond0 and vlan11 here even though each of them
  #       lists ovs_bridge vmbr0!  Not sure why it needs this kind of
  #       cross-referencing, but it won't work without it!
  ovs_ports bond0 vlan11
  ovs_mtu 9000

auto vlan11
iface vlan11 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=11
    address 192.168.241.10
    netmask 255.255.255.0
    gateway 192.168.241.1
    ovs_mtu 1500

So after this, I rebooted. (Please note the above is the final WORKING config.)
After the reboot: nothing. I can't access the host.

Then I remembered that Proxmox 7 needs ifupdown2 for these new-style configs, while the default in Proxmox 5 and 6 is ifupdown.
But now I can't apt install it, as I have no network.
So I plugged in a KVM (it's a 1-hour drive to the data centre) and copied back the backed-up network file
(at least, thanks to my previous experience, I had actually made that backup).

Code:
cp /root/interfaces /etc/network/interfaces

then reboot
*** Dell servers take 10 minutes per reboot ***
(I'm sure this is where most of my time went.)

After the reboot I had network to the host, but not to the containers/VMs.
But that allowed me to run:
Code:
apt install ifupdown2

It worked.
Note: you don't have to remove ifupdown; the install removes it automatically.
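
Small note for later: once ifupdown2 is installed, you can usually apply /etc/network/interfaces changes live instead of rebooting (a standard ifupdown2 command, and it would have saved me a couple of those 10-minute Dell reboots):
Code:
ifreload -a     # re-apply the network config without a full reboot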

On a reboot *** 10 more minutes of stress *** I have host access and I can ping my VMs.
I'm so happy I think I can go home and relax.
This is now 9 am on Saturday. At this point I really wish I had stopped at Proxmox 6.4.

But I'm past the point of no return.
A client phones and says her PBX is down (it's an Ubuntu 14.04 container running Asterisk).
I had the audacity to say it should be working; I could see the container was on.
Then you click on the container: it's using a whopping 4 MB of RAM and 0 CPU. WTF.
To be honest, they do warn you here ( https://pve.proxmox.com/pve-docs/chapter-pct.html#pct_cgroup_compat )
AND here:
[attached screenshot: cgroup2.png]
Maybe the Proxmox docs should state it as: cgroup2 - CAUTION FOR OLD LXC CONTAINERS.

So now I'm really about to die/cry/give up.
But I hear there is a way to recover.
So if you have read this far, Google has found this post and, like me, you are running OLD LXC containers that are not worth upgrading.
(Please leave security concerns out of it; all my traffic is private and firewalled, but yes, still production.)

And yes, there is a way, but there are some conflicting posts on the matter, so this is what works (worked for me).

ON THE HOST RUNNING AN OLD LXC CONTAINER

Code:
nano /etc/default/grub
{
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox VE"
GRUB_CMDLINE_LINUX_DEFAULT="systemd.unified_cgroup_hierarchy=0 quiet"
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
}

Note that systemd.unified_cgroup_hierarchy=0 is listed twice.
I found multiple posts stating to place it in different places.
I DON'T care, it works. My servers had been down over 10 hours at this point; I didn't need an extra reboot to find out which placement is enough.

Code:
update-grub

That was a success. There might have been some os-prober prompts, but those were easy to deal with.

Code:
nano /etc/kernel/cmdline
{
# add this parameter (append it to the existing line if the file already exists):
systemd.unified_cgroup_hierarchy=0
}

I'm really not sure if I needed this step, but I was doing everything possible.
And again, this is a working solution even if that part turns out not to be needed.

Code:
proxmox-boot-tool refresh
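
After the next reboot you can confirm which cgroup hierarchy the host is actually running. This is a generic systemd-era check, not something from the Proxmox docs: "tmpfs" means the legacy/hybrid v1 layout (what the old containers need), while "cgroup2fs" means the unified v2 hierarchy is still active:
Code:
stat -fc %T /sys/fs/cgroup/
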
The following, which I found in another post, I can CONFIRM is NOT needed.
Don't do it (well, you can, but it's not needed):

Code:
nano /etc/pve/lxc/#id#.conf

lxc.cgroup.devices.allow =
lxc.cgroup.devices.deny =

PS Not needed

So after these changes I rebooted and started everything.
Dell again takes so long to reboot that it's stress time (I drove home while waiting).

And then the VMs were up. The Ubuntu 16.04 containers were up (I assume the CentOS 7 ones would be too).
But ah, c**p, the Ubuntu 14.04 containers are still not booting.
After another 2 hours and a lot of searching, I found this gem:

[attached screenshot: nesting.png]

Thanks pizza, this didn't require a host reboot.
Shut the container down, go to its features, turn nesting on,
then they boot.
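
The same setting from the CLI, if you would rather script it for a pile of containers instead of clicking through the GUI. <CTID> is a placeholder for your container ID, and the container has to be stopped and started again for the feature change to take effect:
Code:
pct set <CTID> --features nesting=1
pct start <CTID>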

So now I have a working cluster that can run old LXC images.

Future plans
===================
Redo PBX's on 14.04
Update 16.04 to 22.04
===================
Again, this was stressful, but so worth it for Proxmox Backup Server support.

I really hope this helps.
It's probably a 20-minute read, but it was the most stressful 12.5 hours I've had.

Started 1:00, nap time 13:45
Enjoy
 
