[solved] lxc won't start on one of the VE nodes

kriznik

Member
Sep 29, 2023
I have two Proxmox servers; one is the main node and the second is a failover.
Recently I've migrated some LXC containers back and forth, and one of them won't start on the second node (it was fine a couple of days ago).
So I recreated that container from PBS, but no luck, still the same error.

Any clue?

Code:
[  OK  ] Finished ifupdown-pre.service - Helper to synchronize boot up for ifupdown.
         Mounting sys-kernel-config.mount - Kernel Configuration File System...
         Starting systemd-sysctl.service - Apply Kernel Variables...
         Starting systemd-sysusers.service - Create System Users...
[  OK  ] Finished nftables.service - nftables.
sys-kernel-config.mount: Mount process exited, code=exited, status=32/n/a
sys-kernel-config.mount: Failed with result 'exit-code'.
[FAILED] Failed to mount sys-kernel-config.mount - Kernel Configuration File System.
See 'systemctl status sys-kernel-config.mount' for details.
[  OK  ] Reached target network-pre.target - Preparation for Network.
[  OK  ] Finished systemd-sysctl.service - Apply Kernel Variables.
[  OK  ] Finished systemd-sysusers.service - Create System Users.

Code:
root@lagertha:~# systemctl status sys-kernel-config.mount
● sys-kernel-config.mount - Kernel Configuration File System
     Loaded: loaded (/lib/systemd/system/sys-kernel-config.mount; static)
     Active: active (mounted) since Tue 2024-02-13 14:29:56 CET; 40min ago
      Where: /sys/kernel/config
       What: configfs
       Docs: https://docs.kernel.org/filesystems/configfs.html
             https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
      Tasks: 0 (limit: 38388)
     Memory: 4.0K
        CPU: 2ms
     CGroup: /sys-kernel-config.mount

Notice: journal has been rotated since unit was started, output may be incomplete.
root@lagertha:~#

In the debug log I don't see anything that grabs my attention.

The same LXC container works on the other node without issues, and it was working on this node as well before.
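
For reference, this is roughly how I pulled the debug log (the log path is just an example I picked):

Code:
# start the container in the foreground with full LXC logging
lxc-start -n 105 -F --logfile /tmp/lxc-105.log --logpriority DEBUG
# pct also has a debug flag for the same purpose
pct start 105 --debug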
 

sure

Code:
root@lagertha:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-8-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5: 6.5.11-8
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 17.2.7-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.1
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.4
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-3
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.0
pve-qemu-kvm: 8.1.5-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1
root@lagertha:~#
 
sure :)

Code:
root@lagertha:~# cat /etc/pve/lxc/105.conf
## mqtt
#### debian 12
#### 10.10.10.5
arch: amd64
cores: 1
features: nesting=1,keyctl=1
hostname: mqtt
memory: 64
net0: name=eth0,bridge=vmbr0,gw=10.10.10.1,hwaddr=8A:1D:A7:6D:42:65,ip=10.10.10.5/24,tag=10,type=veth
onboot: 1
ostype: debian
rootfs: local-ssd:vm-105-disk-1,size=2G
startup: order=1
swap: 512
tags: trusted;home
unprivileged: 1
root@lagertha:~#
 
In the log you've provided, the LXC ID is 500. Did both 105 and 500 fail to start, or is 500 the backup restored under a different ID?

What does it say when you try to start the LXC from the CLI with pct start <CTID>?
 
That's just another try with another ID, restored to a different storage, and so on; it's all the same.

Code:
root@lagertha:~# pct start 105                                                                                                                     
root@lagertha:~#

It gets started and shows green in the UI, but after about 5-10 seconds the UI shows it as stopped.
 
No, it's not running; it looks like it runs for about 2-5 seconds and then stops.

Code:
root@lagertha:~# pct enter 105                                                                                                                     
container '105' not running!                                                                                                                       
root@lagertha:~#
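
One way to catch why it dies so quickly (just my approach) is to follow the host journal in a second shell while starting the CT:

Code:
# shell 1, on the PVE host: watch messages as they arrive
journalctl -f
# shell 2: start the container and watch shell 1
pct start 105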
 
Interestingly, the same issue appears on the node where the container runs just fine: when I look inside the running LXC, I see the same failed entry as in my first message. The only difference is that the LXC on this node works fine, while the same LXC on the other node won't start. So it's probably unrelated to the sys-kernel FAILED during boot, I guess.

Code:
root@mqtt:~# systemctl | grep sys-kernel                                                                                                           
* sys-kernel-config.mount                                           loaded failed failed    Kernel Configuration File System                       
root@mqtt:~#
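
As far as I understand it, sys-kernel-config.mount fails in unprivileged containers simply because configfs can't be mounted there, so the entry is cosmetic. If it bothers you, masking the unit inside the CT should be harmless (my assumption: nothing in the container actually uses configfs):

Code:
# inside the container: stop systemd from trying to mount configfs
systemctl mask sys-kernel-config.mount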
 
OK, interesting finding: I was digging in the syslog and found this during startup of LXC 105.

Code:
Feb 14 13:13:49 lagertha kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=ns,mems_allowed=0,oom_memcg=/lxc/105,task_memcg=/lxc/105/ns/system.slice/log2ram.service,task=cp,pid=20907,uid=100000
Feb 14 13:13:49 lagertha kernel: Memory cgroup out of memory: Killed process 20907 (cp) total-vm:3604kB, anon-rss:384kB, file-rss:1152kB, shmem-rss:0kB, UID:100000 pgtables:44kB oom_score_adj:0

So I changed the LXC RAM to 128 MB (it was 64) and it runs.
When I change it back to 64, it won't run again, since log2ram is set to 128M, which exceeds the container's RAM.
What is weird is that the same configuration was working before (and is still working on the other node), which apparently it shouldn't.

So it's my fault, but somehow it's accepted on one node and not on the other :D
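
For reference, either side can be adjusted; on my Debian CTs the log2ram size lives in /etc/log2ram.conf, but check your own install:

Code:
# on the PVE host: raise the container's memory limit to 128 MB
pct set 105 --memory 128
# or, inside the container: shrink log2ram's tmpfs below the CT's RAM
sed -i 's/^SIZE=.*/SIZE=40M/' /etc/log2ram.conf
systemctl restart log2ram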
 
It's the default 8G on both servers ... they are identical installations; software-wise they should be the same. The only real difference is that one has 280 GB of RAM with PVE installed on a HW-RAID virtual disk, while the failover has only 32 GB of RAM with PVE installed on ZFS RAID1.
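
One thing I still want to compare is host swap (just my guess, worth verifying): the CT's swap: 512 only helps if the host itself has swap to hand out, and as far as I know ZFS-root installs often come without any, which could explain why the same 64 MB config only OOMs on the failover:

Code:
# run on both nodes; no output from swapon --show means no host swap
swapon --show
free -h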

Strange, indeed. But at least I learnt a bit more about logging, configs, and so on :)
 
