Q: Weird problems with LXC container copy - cannot get IP address on VM start / things are broken

fortechitsolutions

Hi,

I've got some weird issues and I'm hoping someone might recognize some symptoms and possibly be able to comment.

This is on a recently patched-to-latest Proxmox 5.4-13 host. It has been running Proxmox for over a year and has been patched gradually (i.e. every few months). I'm pretty sure the most recent patches were applied with no reboot yet, because the host is in production.

For this environment I normally deploy a new LXC VM from a backup of a good copy of an existing host that does the same job. I spin up the new copy after adjusting its hostname, MAC address, and IP address, then carry on with some small internal customization, and all is well. The base starting point is a privileged LXC VM built from a fairly standard CentOS 6.x template.
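(For reference, that routine is roughly the following sketch - the VMID and backup archive name here are hypothetical placeholders:)

Code:
# restore a backup of the known-good container under a new VMID (VMID/archive name hypothetical)
pct restore 135 /var/lib/vz/dump/vzdump-lxc-114-example.tar.gz --storage local
# adjust hostname, MAC and IP for the new copy
pct set 135 -hostname newhost -net0 name=eth0,bridge=vmbr1,firewall=1,gw=192.168.95.1,hwaddr=82:31:97:C1:7F:DF,ip=192.168.95.135/24
pct start 135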

Yesterday I tried to do the normal routine, and - first weird thing - the LXC VM does not have any IP address when it boots up.

I can manually 'ifup eth0' and bring up the interface. Then I can ping out, or ping in from the outside.
But things still don't work well after this - I think various services which depend on the network fail to start properly. There are many small problems. I can manually start httpd, for example, but from outside I cannot actually connect to port 80/443 as I normally should be able to.
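(A rough sketch of that manual poking, assuming the usual CentOS 6 tooling inside the container:)

Code:
# inside the container, after a boot with no IP
ifup eth0                             # interface comes up fine by hand
ping -c 3 192.168.95.1                # gateway is then reachable
service httpd start                   # httpd starts manually...
netstat -tlnp | grep -E ':80|:443'    # ...but check what is actually listening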

I did various tests. I tried to spin up the same VM on a second Proxmox host which has more spare resources - same outcome, so it is not a resource issue on Proxmox.
I tried to restore it as an 'unprivileged' container: no change/improvement. I tried to enable 'nesting' as a workaround, after finding this thread, which seems to have similar symptoms/issues:
https://forum.proxmox.com/threads/privileged-lxc-container-cant-get-ip-apparmor.58912/
I tried a copy of it on a Proxmox 6.x-latest, recently rebooted box. Exactly the same outcome/behaviour.
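(The unprivileged restore and nesting tests were along these lines - VMID and archive name hypothetical:)

Code:
# restore the same backup as an unprivileged container (VMID/archive name hypothetical)
pct restore 136 /var/lib/vz/dump/vzdump-lxc-135-example.tar.gz --unprivileged 1
# enable nesting on the problem container as a workaround test
pct set 135 -features nesting=1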


So far the net result is that I can't get this container to start up normally.

I can create a new LXC container from scratch from a stock CentOS LXC template, and it starts up just fine with no weird behaviour.
So I am kind of baffled as to what is different here / why this container is so unhappy.
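(The from-scratch test was roughly the following - the template filename and VMID are hypothetical:)

Code:
pveam update
pveam download local centos-6-default_20191016_amd64.tar.xz   # template filename hypothetical
pct create 200 local:vztmpl/centos-6-default_20191016_amd64.tar.xz -hostname scratchtest -net0 name=eth0,bridge=vmbr1,ip=192.168.95.200/24,gw=192.168.95.1
pct start 200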

dmesg logs on the Proxmox node tend to show this kind of thing:
Code:
[14315910.210979] EXT4-fs (loop30): mounted filesystem with ordered data mode. Opts: (null)
[14315910.231488] IPv6: ADDRCONF(NETDEV_UP): veth135i0: link is not ready
[14315910.594864] vmbr1: port 32(veth135i0) entered blocking state
[14315910.594866] vmbr1: port 32(veth135i0) entered disabled state
[14315910.594934] device veth135i0 entered promiscuous mode
[14315910.648355] eth0: renamed from vethVAXL1M
[14316038.426798] EXT4-fs (loop31): mounted filesystem with ordered data mode. Opts: (null)
[14316193.080213] EXT4-fs (loop31): mounted filesystem with ordered data mode. Opts: (null)
[14316448.851152] EXT4-fs (loop31): mounted filesystem with ordered data mode. Opts: (null)
[14316449.201362] audit: type=1400 audit(1588187659.550:541): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-136_</var/lib/lxc>" pid=2747 comm="apparmor_parser"
[14316449.203285] IPv6: ADDRCONF(NETDEV_UP): veth136i0: link is not ready
[14316449.631160] vmbr1: port 33(veth136i0) entered blocking state
[14316449.631162] vmbr1: port 33(veth136i0) entered disabled state
[14316449.631237] device veth136i0 entered promiscuous mode
[14316450.060041] eth0: renamed from vethCIUV9Q
[14316480.066097] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[14316480.066104] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[14316480.066131] vmbr1: port 33(veth136i0) entered blocking state
[14316480.066132] vmbr1: port 33(veth136i0) entered forwarding state
[14316595.560979] audit: type=1400 audit(1588187805.909:542): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-136_</var/lib/lxc>" pid=8489 comm="apparmor_parser"
[14316595.971627] vmbr1: port 33(veth136i0) entered disabled state
[14316595.975248] device veth136i0 left promiscuous mode
[14316595.975251] vmbr1: port 33(veth136i0) entered disabled state
[14316610.585643] EXT4-fs (loop31): mounted filesystem with ordered data mode. Opts: (null)
[14316610.884154] audit: type=1400 audit(1588187821.229:543): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-136_</var/lib/lxc>" pid=9520 comm="apparmor_parser"
[14316610.937981] IPv6: ADDRCONF(NETDEV_UP): veth136i0: link is not ready
[14316611.352387] vmbr1: port 33(veth136i0) entered blocking state
[14316611.352389] vmbr1: port 33(veth136i0) entered disabled state
[14316611.352457] device veth136i0 entered promiscuous mode
[14316611.439309] eth0: renamed from vethOSDKX5
[14316648.181608] audit: type=1400 audit(1588187858.529:544): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-136_</var/lib/lxc>" pid=10975 comm="apparmor_parser"
[14316648.681869] vmbr1: port 33(veth136i0) entered disabled state
[14316648.685830] device veth136i0 left promiscuous mode
[14316648.685833] vmbr1: port 33(veth136i0) entered disabled state

Inside the VM, the messages are messy - more things are visible than I really want to see (i.e. messages from other LXC containers: not relevant, but confounding, and visible due to the shared kernel IIRC). No clear smoking gun.
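(On the host side, I can at least cut the noise down by filtering the kernel log for this container's devices, e.g.:)

Code:
dmesg | grep -E 'veth136i0|loop31'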

I'm curious if this sounds vaguely familiar to anyone, in any way, and if there are any hints/suggestions on ways I can proceed to try to debug this further.

Ultimately I would like to be able to make a copy and spin up the copy VM without this kind of drama.

Thanks,

Tim
 
Digging a bit more on this today.

In case this helps:

Starting the VM thus,

Code:
root@dprox1:/tmp#  lxc-start -n 135 -F -l DEBUG -o /tmp/lxc-135.log

I see this one line immediately:
init: Failed to spawn rcS main process: unable to execute: Permission denied

and in the log for the above, I am seeing,

(see attached - too big to put here in code paste it seems?)

i.e. an absence of serious errors there - just various debug/notice/info output. No clear smoking gun.
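(A quick scan of that debug log for anything fatal-looking, e.g.:)

Code:
grep -iE 'error|denied|fail' /tmp/lxc-135.log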

I did more random searching, forums/google/etc.

- One scenario: this CentOS LXC VM is too old, and something about containers got broken by a Proxmox update? I tried to roll back an update from about a week ago, which pushed pve-container from 2.0-41 to 2.0-42.

The downgrade was done thus:

Code:
apt-get install pve-container=2.0-41

but this resulted in no change/improvement, so I then updated back to latest-current.
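(For reference, I confirmed which pve-container version was actually installed before/after, e.g.:)

Code:
pveversion -v | grep -i container
apt-cache policy pve-container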

I'm not clear whether I should expect to see a clear smoking gun somewhere (inside the VM, on Proxmox outside the VM, or elsewhere), but if anyone has banged their head on an issue like this before and wants to make a suggestion, I am very happy to hear hints - even something like "Doh, look here, it is super obvious, yes-no?!" :)



Sigh.


Tim
 


Hi,

Can you post the container config?

I'd guess it could have something to do with DHCP, though. Did you try using a static IP for the container?

Also, is IPv6 used? That could be the source of the problem as well.
 
Hi, sure, here is the config. It is static IPv4 and no IPv6:

Code:
root@dprox1:/etc/pve/lxc# cat 135.conf

#192.168.95.125 # old IP
#hostname.goes.here
#
#setup sep-17-19.TDC # Old comment
#
# clone-copy of VEID114 backup from Aug.2019
arch: amd64
cores: 2
cpulimit: 1
features: nesting=1 # this feature added apr.29.2020 as part of testing. no change.
hostname: testing
memory: 2048
net0: name=eth0,bridge=vmbr1,firewall=1,gw=192.168.95.1,hwaddr=82:31:97:C1:7F:DF,ip=192.168.95.135/24,type=veth
onboot: 1
ostype: centos
rootfs: local:135/vm-135-disk-0.raw,size=32G
swap: 2048

root@dprox1:/etc/pve/lxc#
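(For what it's worth, since this is static IPv4 with ostype centos, Proxmox should generate the interface config inside the container itself. Worth double-checking it looks roughly like this, though the exact keys may differ:)

Code:
# inside the container
cat /etc/sysconfig/network-scripts/ifcfg-eth0
# expected, roughly (exact keys may vary):
#   DEVICE=eth0
#   ONBOOT=yes
#   BOOTPROTO=none
#   IPADDR=192.168.95.135
#   NETMASK=255.255.255.0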


Thanks,


Tim
 
fortechitsolutions and community, sorry for replying under an old post, but it might be useful.
Code:
init: Failed to spawn rcS main process: unable to execute: Permission denied

may be related to the permissions on /etc/rc.d/rc.sysinit (execute permission is needed on this file).

It's hard to debug (if you don't do it all the time), but as a first step, learn how `init` works and debug the /etc/init files step by step.
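(Concretely, a first check along these lines, inside the container:)

Code:
ls -l /etc/rc.d/rc.sysinit            # should show execute bits, e.g. -rwxr-xr-x
chmod 755 /etc/rc.d/rc.sysinit        # restore them if missing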
 
Ok, thank you, I will try to chase this a bit and see if I can find anything. Since the VM(s) involved are all based on a now very old, stale version of CentOS, I think the long-term solution here is most likely: (a) set up a new VM on a new template to replace the old one, (b) migrate the Tomcat app from old to new, and (c) power off and then delete the old VM, and the migration away from the old bad config is finished.
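(Step (b) would be something like the following sketch - the VMIDs and the Tomcat path are hypothetical placeholders:)

Code:
pct mount 135     # old container; rootfs should appear under /var/lib/lxc/135/rootfs
pct mount 200     # new container
rsync -a /var/lib/lxc/135/rootfs/opt/tomcat/ /var/lib/lxc/200/rootfs/opt/tomcat/
pct unmount 135
pct unmount 200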

lots of fun!

Thank you for the suggestion though!
Tim
 
