[SOLVED] LXC won't start on one of the PVE nodes

kriznik

I have two Proxmox servers: one is the main and the second is a failover.
Recently I've migrated some LXC containers back and forth, and one of them won't start on the second node (it was OK a couple of days ago).
So I recreated that container from PBS; no luck, still the same error.
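The recreate was a plain restore from the PBS backup, roughly like this (CLI form; the storage name and snapshot timestamp below are placeholders, not the real ones):

Code:
# restore CT 105 from a PBS backup snapshot onto the local-ssd storage
# <pbs> and <TIMESTAMP> are placeholders for the real PBS storage and snapshot
pct restore 105 <pbs>:backup/ct/105/<TIMESTAMP> --storage local-ssd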

Any clue?

Code:
[  OK  ] Finished ifupdown-pre.service - Helper to synchronize boot up for ifupdown.
         Mounting sys-kernel-config.mount - Kernel Configuration File System...
         Starting systemd-sysctl.service - Apply Kernel Variables...
         Starting systemd-sysusers.service - Create System Users...
[  OK  ] Finished nftables.service - nftables.
sys-kernel-config.mount: Mount process exited, code=exited, status=32/n/a
sys-kernel-config.mount: Failed with result 'exit-code'.
[FAILED] Failed to mount sys-kernel-config.mount - Kernel Configuration File System.
See 'systemctl status sys-kernel-config.mount' for details.
[  OK  ] Reached target network-pre.target - Preparation for Network.
[  OK  ] Finished systemd-sysctl.service - Apply Kernel Variables.
[  OK  ] Finished systemd-sysusers.service - Create System Users.

Code:
root@lagertha:~# systemctl status sys-kernel-config.mount
● sys-kernel-config.mount - Kernel Configuration File System
     Loaded: loaded (/lib/systemd/system/sys-kernel-config.mount; static)
     Active: active (mounted) since Tue 2024-02-13 14:29:56 CET; 40min ago
      Where: /sys/kernel/config
       What: configfs
       Docs: https://docs.kernel.org/filesystems/configfs.html
             https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
      Tasks: 0 (limit: 38388)
     Memory: 4.0K
        CPU: 2ms
     CGroup: /sys-kernel-config.mount

Notice: journal has been rotated since unit was started, output may be incomplete.
root@lagertha:~#

In the debug log I don't see anything that grabs my attention.

The same LXC container works on the other node without issues, and it was working on this node as well before.
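(In case it helps anyone reproduce: the attached debug log was captured by starting the container in the foreground with LXC debug logging, something along these lines; the log path is just an example.)

Code:
# start the container in the foreground and write full LXC debug output to a file
lxc-start -n 105 -F -l DEBUG -o /tmp/lxc-105.log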
 

Attachments

  • lxc.log
    18 KB
Sure:

Code:
root@lagertha:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-8-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5: 6.5.11-8
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 17.2.7-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.1
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.4
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-3
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.0
pve-qemu-kvm: 8.1.5-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1
root@lagertha:~#
 
Sure :)

Code:
root@lagertha:~# cat /etc/pve/lxc/105.conf
## mqtt
#### debian 12
#### 10.10.10.5
arch: amd64
cores: 1
features: nesting=1,keyctl=1
hostname: mqtt
memory: 64
net0: name=eth0,bridge=vmbr0,gw=10.10.10.1,hwaddr=8A:1D:A7:6D:42:65,ip=10.10.10.5/24,tag=10,type=veth
onboot: 1
ostype: debian
rootfs: local-ssd:vm-105-disk-1,size=2G
startup: order=1
swap: 512
tags: trusted;home
unprivileged: 1
root@lagertha:~#
 
In the log you've provided the LXC ID is 500. Did both the 105 and 500 LXCs fail to start, or is 500 the same container restored from backup under a different ID?

What does it say when you try to start the LXC from the CLI with pct start <CTID>?
 
That's just another try with a different ID, restored to a different storage, and so on; it's all the same.

Code:
root@lagertha:~# pct start 105
root@lagertha:~#

It gets started and shows green in the UI, but after about 5-10 seconds the UI shows it as stopped.
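(The flapping can also be watched from the CLI, e.g. with something like this; the one-second interval is arbitrary.)

Code:
# poll the container state every second to catch the brief start/stop cycle
watch -n 1 pct status 105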
 
No, it's not running; it looks like it runs for about 2-5 seconds and then stops.

Code:
root@lagertha:~# pct enter 105
container '105' not running!
root@lagertha:~#
 
Interestingly, it looks like the same issue exists on the node where the container runs just fine: when I look inside the running LXC, I see the same failed entry as in my first message. The only difference is that the LXC on this node works fine, while the same LXC on the other node won't start. So I guess it might be unrelated to the sys-kernel-config.mount failure during boot.

Code:
root@mqtt:~# systemctl | grep sys-kernel
* sys-kernel-config.mount                                           loaded failed failed    Kernel Configuration File System
root@mqtt:~#
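(That unit failing inside the container is most likely harmless noise: an unprivileged container generally isn't allowed to mount configfs, so sys-kernel-config.mount fails in any such container. If the error bothers you, it can be silenced from inside the container, e.g.:)

Code:
# inside the container: stop systemd from trying to mount configfs at boot
systemctl mask sys-kernel-config.mount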
 
OK, interesting finding: I was digging in the syslog and found this during startup of LXC 105.

Code:
Feb 14 13:13:49 lagertha kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=ns,mems_allowed=0,oom_memcg=/lxc/105,task_memcg=/lxc/105/ns/system.slice/log2ram.service,task=cp,pid=20907,uid=100000
Feb 14 13:13:49 lagertha kernel: Memory cgroup out of memory: Killed process 20907 (cp) total-vm:3604kB, anon-rss:384kB, file-rss:1152kB, shmem-rss:0kB, UID:100000 pgtables:44kB oom_score_adj:0

So I changed the LXC RAM to 128 MB (it was 64) and it runs.
When I change it back to 64, it won't run again, because log2ram inside the container is set to 128M, which exceeds the container's RAM.
What is weird is that the same configuration was working before (and is still working on the other node), which apparently it should not.

So it's my fault, but somehow it's accepted on one node and not on the other :D
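(For the record, the fix was just bumping the container memory on the host; shrinking log2ram inside the guest would work too. Both sketches assume my CTID 105 and the stock log2ram config path.)

Code:
# on the host: raise the container memory limit from 64 to 128 MB
pct set 105 --memory 128

# or, inside the container: shrink log2ram below the memory limit
# (SIZE lives in /etc/log2ram.conf in a stock log2ram install)
sed -i 's/^SIZE=.*/SIZE=40M/' /etc/log2ram.conf && systemctl restart log2ram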
 
It's the default 8G on both servers ... they are identical installations; software-wise they should be identical. The only real difference is that one has 280G of RAM with PVE installed on a hardware-RAID virtual disk, while the failover has only 32G of RAM with PVE installed on a ZFS RAID1.

Strange, indeed. But at least I learnt a bit more about logging, configs and so on :)
 
