Proxmox Virtual Environment 8.2.2 - LXC High Availability after upgrade from PVE v7 to v8

Jera92

Active Member
Dear members of Proxmox forum

I have a question about HA issues with LXC's that started after the upgrade of Proxmox VE from version 7.4-18 to 8.2.2.
We have multiple Debian 12 LXC's running on our PVE clusters: one cluster is a development environment and one is a production environment.

I performed the upgrade from version 7.4-18 to 8.2.2 in our development environment.
All hosts in the two separate clusters are using Ceph v17.2.7 (before and after the upgrade).

Before upgrading, I followed all the necessary steps from the following post: https://pve.proxmox.com/wiki/Upgrade_from_7_to_8.

After the upgrade, everything went fine, except the LXC containers.

They are in an HA configuration, but when one auto-migrates to another host in the development cluster, the LXC won't start, with this error message:
Bash:
run_buffer: 571 Script exited with status 255
lxc_init: 845 Failed to run lxc.hook.pre-start for container "101"
__lxc_start: 2034 Failed to initialize container "101"
TASK ERROR: startup for container '101' failed

The LXC works when I disable HA for it, restore a backup, and run it on host 3. On hosts 1 and 2 the LXC won't start.
I want to find out why this happened after the upgrade and how I can fix it.
I plan to upgrade my production environment, which is still on v7, before it reaches its end-of-support date.

Can you help me further investigate this?

Kind regards
 
Hi,
please post the output of pct start 101 --debug when starting the container in its failing state, and post the container config from pct config 101 --current
 
Hi

Thank you for your reply.

Without doing anything to the LXC itself, and even when it is not in HA mode, the LXC crashes after 2 days on the same host where it ran well before the migration to PVE 8.
I don't know if this information is relevant, but I wanted to mention it.

Below you find the output of the provided commands:
Bash:
#  pct start 101 --debug
run_buffer: 571 Script exited with status 255
lxc_init: 845 Failed to run lxc.hook.pre-start for container "101"
__lxc_start: 2034 Failed to initialize container "101"
0 hostid 100000 range 65536
INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "101", config section "lxc"
DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 101 lxc pre-start produced output: unable to parse version info '
ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 255
ERROR    start - ../src/lxc/start.c:lxc_init:845 - Failed to run lxc.hook.pre-start for container "101"
ERROR    start - ../src/lxc/start.c:__lxc_start:2034 - Failed to initialize container "101"
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "101", config section "lxc"
startup for container '101' failed

Bash:
# pct config 101 --current
arch: amd64
cores: 2
description: # Report Sandbox LXC general%0A## Network config%0A| NIC | IPv4 | MAC | %0A| ---%3A--- | ---%3A--- | ---%3A--- |%0A| net0 | 192.168.12.101 | 32%3A60%3A62%3A96%3AB4%3A88 |%0A
features: keyctl=1,nesting=1
hostname: report-sandbox.internal.robotronic.be
memory: 2048
nameserver: 192.168.16.238
net0: name=eth0,bridge=vmbr0,gw=192.168.12.254,hwaddr=32:60:62:96:B4:88,ip=192.168.12.101/24,type=veth
onboot: 1
ostype: debian
rootfs: vm-lxc-storage:101/vm-101-disk-1.raw,size=5G
searchdomain: internal.robotronic.be
swap: 512
tags: debian12
unprivileged: 1

At the moment, none of the LXC's will boot anymore.

Thank you for your help!

Kind regards
 
Yes, I'm sure; I updated the 3 nodes on the same day from the same repos (confirmed, and posted only once below):
Code:
pve:~# cat /etc/apt/sources.list
deb http://ftp.be.debian.org/debian bookworm main contrib

deb http://ftp.be.debian.org/debian bookworm-updates main contrib

# PVE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

# security updates
deb http://security.debian.org bookworm-security main contrib

pve:~# cat /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-quincy bookworm no-subscription


Bash:
pve1-sandbox:~# pveversion
pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-3-pve)

pve2-sandbox:~# pveversion
pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-3-pve)

pve3-sandbox:~# pveversion
pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-3-pve)

I did a backup restore of LXC 101 on node pve3-sandbox without recreating the HA configuration for that LXC.
It ran well for a day, but the day after the restore on node pve3-sandbox, the LXC suddenly failed and didn't want to start anymore.
But I couldn't find out why this happens with the LXC's after the upgrade...
 
Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 101 lxc pre-start produced output: unable to parse version info '
What version of Debian are you running inside the container? The version in /etc/debian_version does not seem to match the regex pattern used to parse the version according to the error, see https://git.proxmox.com/?p=pve-cont...2239a91f2ef3de9fc1742229787c5c22d;hb=HEAD#l24
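If a container refuses to start, the version file can also be checked from the host, e.g. roughly like this (a sketch; the path assumes the default location used by pct mount):
Bash:
# mount the stopped container's root filesystem on the host
pct mount 101
# inspect the version file the pre-start hook tries to parse
cat /var/lib/lxc/101/rootfs/etc/debian_version
# unmount it again when done
pct unmount 101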

But this is independent from whether you manage the container via HA or manually start the container.

At the moment, none of the LXC's will boot anymore.
Are they all running the same version of Debian? Are they all failing with the same error message when starting in debug mode?

Without doing anything to the LXC itself, and even when it is not in HA mode, the LXC crashes after 2 days on the same host where it ran well before the migration to PVE 8.
That seems unrelated. Does the systemd journal inside the container or on the Proxmox VE host show any errors when this happens? Are you maybe overcommitting the host?
 
What version of Debian are you running inside the container? The version in /etc/debian_version does not seem to match the regex pattern used to parse the version according to the error, see https://git.proxmox.com/?p=pve-cont...2239a91f2ef3de9fc1742229787c5c22d;hb=HEAD#l24

I'm using the following version on all LXC's:

Bash:
# cat /etc/debian_version
12.4

Are they all running the same version of Debian? Are they all failing with the same error message when starting in debug mode?
All hosts in the cluster and LXC's are running the same version.
The LXC's only want to boot on pve3-sandbox after a restore from backup.
Even after a restore from a backup they don't boot on pve1-sandbox & pve2-sandbox.
Also, when I successfully restore an LXC on pve3-sandbox and it runs smoothly, the same error happens as soon as I manually migrate the LXC to another node (pve1-sandbox or pve2-sandbox): the container can't be booted.


That seems unrelated. Does the systemd journal inside the container or on the Proxmox VE host show any errors when this happens? Are you maybe overcommitting the host?

I'm not overcommitting the host; it's a small LXC.
Host hardware:
- CPU: 16 x Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz (1 Socket)
- RAM: 64GB (only 20GB used in total right now)

LXC hardware:
- CPU: 2 cores
- RAM: 2GB
- VM disk: 5GB

I get the following errors:
Bash:
Jun 18 17:21:03 pve3-sandbox kernel: loop0: detected capacity change from 0 to 20971520
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs (loop0): warning: mounting fs with errors, running e2fsck is recommended
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs (loop0): mounted filesystem 3b3635ab-5730-4b78-b20c-93c5e4fdb91e r/w with ordered data mode. Quota>
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #131073: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #16386: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #131073: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #131073: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #131073: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #131073: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #131073: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #131073: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #131073: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox kernel: EXT4-fs error (device loop0): ext4_lookup:1855: inode #131073: comm lxc-pve-prestar: iget: checksum invalid
Jun 18 17:21:04 pve3-sandbox audit[3859986]: AVC apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-102_</var/li>
Jun 18 17:21:04 pve3-sandbox kernel: audit: type=1400 audit(1718724064.406:43): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lx>
Jun 18 17:21:04 pve3-sandbox kernel: vmbr0: port 5(veth102i0) entered blocking state
Jun 18 17:21:04 pve3-sandbox kernel: vmbr0: port 5(veth102i0) entered disabled state
Jun 18 17:21:04 pve3-sandbox kernel: veth102i0: entered allmulticast mode
Jun 18 17:21:04 pve3-sandbox kernel: veth102i0: entered promiscuous mode
Jun 18 17:21:04 pve3-sandbox kernel: eth0: renamed from vethMUBGFR
Jun 18 17:21:04 pve3-sandbox pvedaemon[3859952]: startup for container '102' failed
Jun 18 17:21:05 pve3-sandbox audit[3860047]: AVC apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-102_</var/>
Jun 18 17:21:05 pve3-sandbox kernel: audit: type=1400 audit(1718724065.112:44): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/>
Jun 18 17:21:05 pve3-sandbox pvedaemon[3858816]: unable to get PID for CT 102 (not running?)
Jun 18 17:21:05 pve3-sandbox pvedaemon[3858816]: <root@pam> end task UPID:pve3-sandbox:003AE5F0:04AD99B2:6671A5DF:vzstart:102:root@pam: startup>
Jun 18 17:21:05 pve3-sandbox kernel: vmbr0: port 5(veth102i0) entered disabled state
Jun 18 17:21:05 pve3-sandbox kernel: veth102i0 (unregistering): left allmulticast mode
Jun 18 17:21:05 pve3-sandbox kernel: veth102i0 (unregistering): left promiscuous mode
Jun 18 17:21:05 pve3-sandbox kernel: vmbr0: port 5(veth102i0) entered disabled state
Jun 18 17:21:05 pve3-sandbox kernel: EXT4-fs (loop0): unmounting filesystem 3b3635ab-5730-4b78-b20c-93c5e4fdb91e.
Jun 18 17:21:06 pve3-sandbox systemd[1]: pve-container@102.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit pve-container@102.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.

Seems it's related to an EXT4 filesystem, the one inside the LXC disk then?

After I restored a backup, I tried an e2fsck on the LXC disk, but that didn't fix the issue.

Should I fix the virtual hard drive, and how can I do that?
Or what do you suggest?
 
Seems it's related to an EXT4 filesystem, the one inside the LXC disk then?

After I restored a backup, I tried an e2fsck on the LXC disk, but that didn't fix the issue.

Should I fix the virtual hard drive, and how can I do that?
Or what do you suggest?
You can try to use a pct fsck <vmid> and see if that solves your issues. But yes, it seems your filesystems got corrupted.

Can you exclude bad hardware such as bad memory modules or defective disks? Also, please check if the issues persist when you boot the nodes with a different kernel version.
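For reference, roughly what that could look like (a sketch; it assumes the nodes use proxmox-boot-tool to manage kernels and that the older kernel package is still installed, otherwise the kernel can simply be selected in the GRUB menu at boot):
Bash:
# check the container filesystem (the container must be stopped)
pct fsck 101

# list the installed kernels and pin an older one for the next boot only
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin <older-kernel-version> --next-boot
reboot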
 
You can try to use a pct fsck <vmid> and see if that solves your issues. But yes, it seems your filesystems got corrupted.
Thanks for your reply!

I tried the following:

Bash:
pve3-sandbox# pct fsck 101
fsck from util-linux 2.38.1
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

fsck.ext2: Bad magic number in super-block while trying to open /mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw
command 'fsck -a -l /mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw' failed: exit code 8

Well, it seems to me that the LXC disk is damaged, but I can't understand why it suddenly got damaged.
The strange thing to me is that it happened after the upgrade to PVE 8.
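As fsck itself suggests, I could probably retry e2fsck against one of the backup superblocks on the raw image (a sketch, only with the container stopped; 32768 assumes the default 4K block size):
Bash:
# retry the filesystem check using a backup superblock, as suggested in the fsck output above
e2fsck -b 32768 /mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw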

Can you exclude bad hardware such as bad memory modules or defective disks? Also, please check if the issues persist when you boot the nodes with a different kernel version.
Normally the servers don't have bad hardware; otherwise the system should alert us via iDRAC.
I booted the previous kernel 5.15.152-1-pve on pve2-sandbox, migrated the container to that host, started it from the GUI without debug options, and it boots up correctly.

I tried the following on pve2-sandbox (booted with previous kernel 5.15.152-1-pve):

Bash:
pve2-sandbox# pct fsck 101
fsck from util-linux 2.38.1
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw contains a file system with errors, check forced.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry '.pwd.lock' in /etc (131073) has deleted/unused inode 164574.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry 'subuid-' in /etc (131073) has deleted/unused inode 165004.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry 'subgid-' in /etc (131073) has deleted/unused inode 165107.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry 'opasswd' in /etc/security (164609) has deleted/unused inode 164614.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry 'installed' in /etc/sv/ssh/.meta (164677) has deleted/unused inode 164678.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry '.placeholder' in /etc/sensors.d (164796) has deleted/unused inode 164797.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry 'sbin.dhclient' in /etc/apparmor.d/local (165451) has deleted/unused inode 165452.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry 'usr.bin.man' in /etc/apparmor.d/local (165451) has deleted/unused inode 165453.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry '.system.lock' in /etc/.java/.systemPrefs (165460) has deleted/unused inode 165461.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: Entry '.systemRootModFile' in /etc/.java/.systemPrefs (165460) has deleted/unused inode 165462.  CLEARED.
/mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw: 36992/327680 files (0.2% non-contiguous), 394087/1310720 blocks
command 'fsck -a -l /mnt/pve/cephfs/vm-lxc-storage/images/101/vm-101-disk-1.raw' failed: exit code 1

I tried to reboot it on pve2-sandbox; it booted normally.
Then I migrated it back to pve3-sandbox and booted the LXC on the latest PVE kernel.
Then again, the following message:
Bash:
()
run_buffer: 571 Script exited with status 255
lxc_init: 845 Failed to run lxc.hook.pre-start for container "101"
__lxc_start: 2034 Failed to initialize container "101"
TASK ERROR: startup for container '101' failed

So this issue could be related to the PVE host kernel version?
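To check that theory, I could compare the running kernel on every node, roughly like this (a sketch; it assumes root SSH between the cluster nodes):
Bash:
# show the running kernel of each node in the development cluster
for node in pve1-sandbox pve2-sandbox pve3-sandbox; do
    echo -n "$node: "; ssh "$node" uname -r
done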

When I migrate the LXC101 back to node pve2-sandbox, the LXC boots fine.
I updated the whole LXC101 and rebooted after the update on pve2-sandbox:
Bash:
$ sudo apt update -y && sudo apt upgrade -y && sudo apt dist-upgrade -y

And I again did a migration to pve3-sandbox, resulting in the same error:

Bash:
()
run_buffer: 571 Script exited with status 255
lxc_init: 845 Failed to run lxc.hook.pre-start for container "101"
__lxc_start: 2034 Failed to initialize container "101"
TASK ERROR: startup for container '101' failed

And then again, migrating it back to PVE2-sandbox makes it work..
What can I do further?..
 
What can I do further?..
Please check the error you get when starting the container via the debug flag; the issues you are seeing still seem rather inconsistent, or there are multiple issues at play.

Also, please try to migrate the container to a different storage not related to the ceph cluster and check if the issue persists. Further, check the ceph cluster health status and double check the systemd journal for errors around the time you try to start the failed container.

This should help to further narrow down the issue.
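For example, roughly along these lines on the node where the start fails (a sketch; the grep pattern is only an example filter):
Bash:
# overall Ceph cluster state and any detailed warnings
ceph -s
ceph health detail

# host journal around the failed start, filtered for filesystem-related messages
journalctl --since "10 minutes ago" | grep -Ei 'ext4|loop|lxc|102'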
 
Please check the error you get when starting the container via the debug flag; the issues you are seeing still seem rather inconsistent, or there are multiple issues at play.
The result:
Bash:
pve2-sandbox# lxc-start -n 102 -F -lDEBUG
lxc-start: 102: ../src/lxc/sync.c: sync_wait: 34 An error occurred in another process (expected sequence number 7)
lxc-start: 102: ../src/lxc/start.c: __lxc_start: 2114 Failed to spawn container "102"
lxc-start: 102: ../src/lxc/tools/lxc_start.c: lxc_start_main: 307 The container failed to start
lxc-start: 102: ../src/lxc/tools/lxc_start.c: lxc_start_main: 312 Additional information can be obtained by setting the --logfile and --logpriority options

Also, please try to migrate the container to a different storage not related to the ceph cluster and check if the issue persists.
I moved the LXC's disk to the local LVM storage of node pve2-sandbox; the result after starting with the debug flag:
Bash:
pve2-sandbox# lxc-start -n 102 -F -lDEBUG -o lxc-102.log
lxc-start: 102: ../src/lxc/sync.c: sync_wait: 34 An error occurred in another process (expected sequence number 7)
lxc-start: 102: ../src/lxc/start.c: __lxc_start: 2114 Failed to spawn container "102"
lxc-start: 102: ../src/lxc/tools/lxc_start.c: lxc_start_main: 307 The container failed to start
lxc-start: 102: ../src/lxc/tools/lxc_start.c: lxc_start_main: 312 Additional information can be obtained by setting the --logfile and --logpriority options
 
Bash:
pve2-sandbox# cat lxc-102.log
lxc-start 102 20240623124819.130 INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type u nsid 0 hostid 100000 range 65536
lxc-start 102 20240623124819.130 INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 0 hostid 100000 range 65536
lxc-start 102 20240623124819.130 INFO lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
lxc-start 102 20240623124819.131 INFO utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "102", config section "lxc"
lxc-start 102 20240623124820.912 INFO cgfsng - ../src/lxc/cgroups/cgfsng.c:unpriv_systemd_create_scope:1498 - Running privileged, not using a systemd unit
lxc-start 102 20240623124820.913 DEBUG seccomp - ../src/lxc/seccomp.c:parse_config_v2:664 - Host native arch is [3221225534]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "reject_force_umount # comment this to allow umount -f; not recommended"
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "[all]"
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "kexec_load errno 1"
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[246:kexec_load] action[327681:errno] arch[0]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741827]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741886]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "open_by_handle_at errno 1"
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[304:open_by_handle_at] action[327681:errno] arch[0]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741827]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741886]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "init_module errno 1"
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[175:init_module] action[327681:errno] arch[0]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741827]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741886]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "finit_module errno 1"
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[313:finit_module] action[327681:errno] arch[0]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741827]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741886]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "delete_module errno 1"
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[176:delete_module] action[327681:errno] arch[0]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741827]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741886]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "ioctl errno 1 [1,0x9400,SCMP_CMP_MASKED_EQ,0xff00]"
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:555 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[16:ioctl] action[327681:errno] arch[0]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:555 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[16:ioctl] action[327681:errno] arch[1073741827]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:555 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[16:ioctl] action[327681:errno] arch[1073741886]
lxc-start 102 20240623124820.913 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:1036 - Merging compat seccomp contexts into main context
lxc-start 102 20240623124821.766 INFO start - ../src/lxc/start.c:lxc_init:882 - Container "102" is initialized
lxc-start 102 20240623124821.766 INFO cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_create:1669 - The monitor process uses "lxc.monitor/102" as cgroup
lxc-start 102 20240623124821.767 DEBUG storage - ../src/lxc/storage/storage.c:storage_query:231 - Detected rootfs type "dir"
lxc-start 102 20240623124821.767 DEBUG storage - ../src/lxc/storage/storage.c:storage_query:231 - Detected rootfs type "dir"
lxc-start 102 20240623124821.767 INFO cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_create:1777 - The container process uses "lxc/102/ns" as inner and "lxc/102" as limit cgroup
lxc-start 102 20240623124821.767 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUSER
lxc-start 102 20240623124821.767 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWNS
lxc-start 102 20240623124821.768 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWPID
lxc-start 102 20240623124821.768 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUTS
lxc-start 102 20240623124821.768 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWIPC
lxc-start 102 20240623124821.768 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWCGROUP
lxc-start 102 20240623124821.768 DEBUG start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved user namespace via fd 17 and stashed path as user:/proc/1252628/fd/17
lxc-start 102 20240623124821.768 DEBUG start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved mnt namespace via fd 18 and stashed path as mnt:/proc/1252628/fd/18
lxc-start 102 20240623124821.768 DEBUG start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved pid namespace via fd 19 and stashed path as pid:/proc/1252628/fd/19
lxc-start 102 20240623124821.768 DEBUG start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved uts namespace via fd 20 and stashed path as uts:/proc/1252628/fd/20
lxc-start 102 20240623124821.768 DEBUG start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved ipc namespace via fd 21 and stashed path as ipc:/proc/1252628/fd/21
lxc-start 102 20240623124821.768 DEBUG start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved cgroup namespace via fd 22 and stashed path as cgroup:/proc/1252628/fd/22
lxc-start 102 20240623124821.768 DEBUG idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newuidmap" does have the setuid bit set
lxc-start 102 20240623124821.768 DEBUG idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newgidmap" does have the setuid bit set
lxc-start 102 20240623124821.768 DEBUG idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:178 - Functional newuidmap and newgidmap binary found
lxc-start 102 20240623124821.775 INFO cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_setup_limits:3528 - Limits for the unified cgroup hierarchy have been setup
lxc-start 102 20240623124821.775 DEBUG idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newuidmap" does have the setuid bit set
lxc-start 102 20240623124821.775 DEBUG idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newgidmap" does have the setuid bit set
lxc-start 102 20240623124821.775 INFO idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:176 - Caller maps host root. Writing mapping directly
lxc-start 102 20240623124821.775 NOTICE utils - ../src/lxc/utils.c:lxc_drop_groups:1572 - Dropped supplimentary groups
lxc-start 102 20240623124821.776 INFO start - ../src/lxc/start.c:do_start:1105 - Unshared CLONE_NEWNET
lxc-start 102 20240623124821.777 NOTICE utils - ../src/lxc/utils.c:lxc_drop_groups:1572 - Dropped supplimentary groups
lxc-start 102 20240623124821.777 NOTICE utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1548 - Switched to gid 0
lxc-start 102 20240623124821.777 NOTICE utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1557 - Switched to uid 0
lxc-start 102 20240623124821.777 DEBUG start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved net namespace via fd 5 and stashed path as net:/proc/1252628/fd/5
lxc-start 102 20240623124821.785 INFO utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/lxcnetaddbr" for container "102", config section "net"
lxc-start 102 20240623124823.312 DEBUG network - ../src/lxc/network.c:netdev_configure_server_veth:876 - Instantiated veth tunnel "veth102i0 <--> vethQDsZmb"
lxc-start 102 20240623124823.312 DEBUG conf - ../src/lxc/conf.c:lxc_mount_rootfs:1240 - Mounted rootfs "/var/lib/lxc/102/rootfs" onto "/usr/lib/x86_64-linux-gnu/lxc/rootfs" with options "(null)"
lxc-start 102 20240623124823.312 INFO conf - ../src/lxc/conf.c:setup_utsname:679 - Set hostname to "dashboard-sandbox.internal.robotronic.be"
lxc-start 102 20240623124823.336 DEBUG network - ../src/lxc/network.c:setup_hw_addr:3863 - Mac address "16:85:39:1C:5A:86" on "eth0" has been setup
lxc-start 102 20240623124823.336 DEBUG network - ../src/lxc/network.c:lxc_network_setup_in_child_namespaces_common:4004 - Network device "eth0" has been setup
lxc-start 102 20240623124823.336 INFO network - ../src/lxc/network.c:lxc_setup_network_in_child_namespaces:4061 - Finished setting up network devices with caller assigned names
lxc-start 102 20240623124823.336 INFO conf - ../src/lxc/conf.c:mount_autodev:1023 - Preparing "/dev"
lxc-start 102 20240623124823.339 INFO conf - ../src/lxc/conf.c:mount_autodev:1084 - Prepared "/dev"
lxc-start 102 20240623124823.339 DEBUG conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:539 - Invalid argument - Tried to ensure procfs is unmounted
lxc-start 102 20240623124823.339 DEBUG conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:562 - Invalid argument - Tried to ensure sysfs is unmounted
lxc-start 102 20240623124823.340 DEBUG conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/sys/fs/fuse/connections" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/sys/fs/fuse/connections" to respect bind or remount options
lxc-start 102 20240623124823.340 DEBUG conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/sys/fs/fuse/connections" were 4110, required extra flags are 14
lxc-start 102 20240623124823.340 DEBUG conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/sys/fs/fuse/connections" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/sys/fs/fuse/connections" with filesystem type "none"
lxc-start 102 20240623124823.340 DEBUG conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "proc" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/dev/.lxc/proc" with filesystem type "proc"
lxc-start 102 20240623124823.340 DEBUG conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "sys" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/dev/.lxc/sys" with filesystem type "sysfs"
lxc-start 102 20240623124823.340 DEBUG cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroupfs_mount:2187 - Mounted cgroup filesystem cgroup2 onto 19((null))
lxc-start 102 20240623124823.340 INFO utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxcfs/lxc.mount.hook" for container "102", config section "lxc"
lxc-start 102 20240623124823.384 INFO utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-autodev-hook" for container "102", config section "lxc"
lxc-start 102 20240623124823.631 INFO conf - ../src/lxc/conf.c:lxc_fill_autodev:1121 - Populating "/dev"
lxc-start 102 20240623124823.631 DEBUG conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/full) to 18(full)
lxc-start 102 20240623124823.631 DEBUG conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/null) to 18(null)
lxc-start 102 20240623124823.631 DEBUG conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/random) to 18(random)
lxc-start 102 20240623124823.631 DEBUG conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/tty) to 18(tty)
lxc-start 102 20240623124823.631 DEBUG conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/urandom) to 18(urandom)
 
Bash:
lxc-start 102 20240623124823.632 DEBUG    conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/zero) to 18(zero)

lxc-start 102 20240623124823.632 INFO     conf - ../src/lxc/conf.c:lxc_fill_autodev:1209 - Populated "/dev"

lxc-start 102 20240623124823.632 INFO     conf - ../src/lxc/conf.c:lxc_transient_proc:3307 - Caller's PID is 1; /proc/self points to 1

lxc-start 102 20240623124823.632 DEBUG    conf - ../src/lxc/conf.c:lxc_setup_devpts_child:1554 - Attached detached devpts mount 20 to 18/pts

lxc-start 102 20240623124823.632 DEBUG    conf - ../src/lxc/conf.c:lxc_setup_devpts_child:1640 - Created "/dev/ptmx" file as bind mount target

lxc-start 102 20240623124823.632 DEBUG    conf - ../src/lxc/conf.c:lxc_setup_devpts_child:1647 - Bind mounted "/dev/pts/ptmx" to "/dev/ptmx"

lxc-start 102 20240623124823.632 DEBUG    conf - ../src/lxc/conf.c:lxc_allocate_ttys:908 - Created tty with ptx fd 22 and pty fd 23 and index 1

lxc-start 102 20240623124823.632 DEBUG    conf - ../src/lxc/conf.c:lxc_allocate_ttys:908 - Created tty with ptx fd 24 and pty fd 25 and index 2

lxc-start 102 20240623124823.632 INFO     conf - ../src/lxc/conf.c:lxc_allocate_ttys:913 - Finished creating 2 tty devices

lxc-start 102 20240623124823.632 DEBUG    conf - ../src/lxc/conf.c:lxc_setup_ttys:869 - Bind mounted "pts/1" onto "tty1"

lxc-start 102 20240623124823.632 DEBUG    conf - ../src/lxc/conf.c:lxc_setup_ttys:869 - Bind mounted "pts/2" onto "tty2"

lxc-start 102 20240623124823.632 INFO     conf - ../src/lxc/conf.c:lxc_setup_ttys:876 - Finished setting up 2 /dev/tty<N> device(s)

lxc-start 102 20240623124823.633 INFO     conf - ../src/lxc/conf.c:setup_personality:1720 - Set personality to "0lx0"

lxc-start 102 20240623124823.633 DEBUG    conf - ../src/lxc/conf.c:capabilities_deny:3006 - Capabilities have been setup

lxc-start 102 20240623124823.633 NOTICE   conf - ../src/lxc/conf.c:lxc_setup:4014 - The container "102" is set up

lxc-start 102 20240623124823.633 INFO     apparmor - ../src/lxc/lsm/apparmor.c:apparmor_process_label_set_at:1189 - Set AppArmor label to "lxc-102_</var/lib/lxc>//&:lxc-102_<-var-lib-lxc>:"

lxc-start 102 20240623124823.633 INFO     apparmor - ../src/lxc/lsm/apparmor.c:apparmor_process_label_set:1234 - Changed AppArmor profile to lxc-102_</var/lib/lxc>//&:lxc-102_<-var-lib-lxc>:

lxc-start 102 20240623124823.634 DEBUG    terminal - ../src/lxc/terminal.c:lxc_terminal_peer_default:716 - Using terminal "/dev/tty" as proxy

lxc-start 102 20240623124823.634 DEBUG    terminal - ../src/lxc/terminal.c:lxc_terminal_winsz:60 - Set window size to 176 columns and 56 rows

lxc-start 102 20240623124823.634 NOTICE   start - ../src/lxc/start.c:start:2201 - Exec'ing "/sbin/init"

lxc-start 102 20240623124823.634 ERROR    start - ../src/lxc/start.c:start:2204 - No such file or directory - Failed to exec "/sbin/init"

lxc-start 102 20240623124823.634 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 7)

lxc-start 102 20240623124823.724 INFO     network - ../src/lxc/network.c:lxc_delete_network_priv:3720 - Removed interface "veth102i0" from ""

lxc-start 102 20240623124823.724 DEBUG    network - ../src/lxc/network.c:lxc_delete_network:4217 - Deleted network devices

lxc-start 102 20240623124823.724 ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "102"

lxc-start 102 20240623124823.724 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 16 for process 1252647

lxc-start 102 20240623124824.539 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "102", config section "lxc"

lxc-start 102 20240623124826.272 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "102", config section "lxc"

lxc-start 102 20240623124826.775 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:lxc_start_main:307 - The container failed to start

lxc-start 102 20240623124826.775 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:lxc_start_main:312 - Additional information can be obtained by setting the --logfile and --logpriority options

Journalctl output when starting LXC102:

Bash:
pve2-sandbox# journalctl -xe



Jun 23 14:46:12 pve2-sandbox pvedaemon[1670]: <root@pam> end task UPID:pve2-sandbox:00125DEE:01F30695:6677E54A:vncshell::root@pam: OK

Jun 23 14:46:12 pve2-sandbox pveproxy[1205935]: worker exit

Jun 23 14:46:21 pve2-sandbox pvedaemon[1252011]: <root@pam> move volume CT 102: move --volume rootfs --storage local-lvm

Jun 23 14:46:21 pve2-sandbox pvedaemon[1670]: <root@pam> starting task UPID:pve2-sandbox:00131AAB:02074540:6678191D:move_volume:102:root@pam:

Jun 23 14:46:24 pve2-sandbox kernel: loop1: detected capacity change from 0 to 20971520

Jun 23 14:46:24 pve2-sandbox kernel: EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

Jun 23 14:46:24 pve2-sandbox kernel: EXT4-fs (dm-12): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

Jun 23 14:46:54 pve2-sandbox pvedaemon[1670]: <root@pam> end task UPID:pve2-sandbox:00131AAB:02074540:6678191D:move_volume:102:root@pam: OK

Jun 23 14:47:30 pve2-sandbox kernel: EXT4-fs (dm-12): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

Jun 23 14:47:30 pve2-sandbox audit[1252347]: AVC apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-102_</var/lib/lxc>" pid=1252347 comm="apparm>

Jun 23 14:47:30 pve2-sandbox kernel: audit: type=1400 audit(1719146850.905:41): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-102_</var/lib>

Jun 23 14:47:32 pve2-sandbox kernel: vmbr0: port 3(veth102i0) entered blocking state

Jun 23 14:47:32 pve2-sandbox kernel: vmbr0: port 3(veth102i0) entered disabled state

Jun 23 14:47:32 pve2-sandbox kernel: device veth102i0 entered promiscuous mode

Jun 23 14:47:32 pve2-sandbox kernel: eth0: renamed from vethYZIFxf

Jun 23 14:47:32 pve2-sandbox kernel: vmbr0: port 3(veth102i0) entered disabled state

Jun 23 14:47:32 pve2-sandbox kernel: device veth102i0 left promiscuous mode

Jun 23 14:47:32 pve2-sandbox kernel: vmbr0: port 3(veth102i0) entered disabled state

Jun 23 14:47:33 pve2-sandbox audit[1252414]: AVC apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-102_</var/lib/lxc>" pid=1252414 comm="appa>

Jun 23 14:47:33 pve2-sandbox kernel: audit: type=1400 audit(1719146853.501:42): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-102_</var/l>

Jun 23 14:47:33 pve2-sandbox pvedaemon[1671]: unable to get PID for CT 102 (not running?)

Jun 23 14:47:33 pve2-sandbox pvedaemon[1672]: unable to get PID for CT 102 (not running?)

Jun 23 14:47:33 pve2-sandbox pvestatd[1619]: unable to get PID for CT 102 (not running?)

Jun 23 14:47:33 pve2-sandbox pvestatd[1619]: status update time (5.887 seconds)

Jun 23 14:48:20 pve2-sandbox kernel: EXT4-fs (dm-12): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

Jun 23 14:48:21 pve2-sandbox audit[1252646]: AVC apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-102_</var/lib/lxc>" pid=1252646 comm="apparm>

Jun 23 14:48:21 pve2-sandbox kernel: audit: type=1400 audit(1719146901.758:43): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-102_</var/lib>

Jun 23 14:48:23 pve2-sandbox kernel: vmbr0: port 3(veth102i0) entered blocking state

Jun 23 14:48:23 pve2-sandbox kernel: vmbr0: port 3(veth102i0) entered disabled state

Jun 23 14:48:23 pve2-sandbox kernel: device veth102i0 entered promiscuous mode

Jun 23 14:48:23 pve2-sandbox kernel: eth0: renamed from vethQDsZmb

Jun 23 14:48:23 pve2-sandbox kernel: vmbr0: port 3(veth102i0) entered disabled state

Jun 23 14:48:23 pve2-sandbox kernel: device veth102i0 left promiscuous mode

Jun 23 14:48:23 pve2-sandbox kernel: vmbr0: port 3(veth102i0) entered disabled state

Jun 23 14:48:24 pve2-sandbox audit[1252713]: AVC apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-102_</var/lib/lxc>" pid=1252713 comm="appa>

Jun 23 14:48:24 pve2-sandbox kernel: audit: type=1400 audit(1719146904.530:44): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-102_</var/l>

Jun 23 14:48:24 pve2-sandbox pvedaemon[1672]: unable to get PID for CT 102 (not running?)

Jun 23 14:48:24 pve2-sandbox pvedaemon[1671]: unable to get PID for CT 102 (not running?)

Jun 23 14:48:39 pve2-sandbox pvestatd[1619]: got timeout

Jun 23 14:49:20 pve2-sandbox pveproxy[1243818]: worker exit

Jun 23 14:49:21 pve2-sandbox pveproxy[2480]: worker 1243818 finished

Jun 23 14:49:21 pve2-sandbox pveproxy[2480]: starting 1 worker(s)

Jun 23 14:49:21 pve2-sandbox pveproxy[2480]: worker 1252958 started
 
Further, check the ceph cluster health status and double check the systemd journal for errors around the time you try to start the failed container.

This should help to further narrow down the issue.

The Ceph system mentioned some errors before the upgrade due to the availability of host pve1-sandbox:

(screenshot of the Ceph health status showing these warnings)
 
lxc-start 102 20240623124823.634 ERROR start - ../src/lxc/start.c:start:2204 - No such file or directory - Failed to exec "/sbin/init"
Hmm, so you are trying with a different container now? And get once again a different error, but probably also related to a corrupt filesystem. Did you try any of this before or after you ran a filesystem check on the container?

The Ceph system mentioned some errors before the upgrade due to the availability of host pve1-sandbox:
Maybe you should investigate this further, as it seems you have issues with the underlying storage, and the container filesystems being corrupt might indicate that there is something wrong. Where are your backups stored, on the QNAP I guess?
 
Hmm, so you are trying with a different container now? And get once again a different error, but probably also related to a corrupt filesystem. Did you try any of this before or after you ran a filesystem check on the container?
Yes, LXC 101 was now running successfully and I didn't want to break it again.
Because I was experiencing the same issue with LXC 102, I tried the same steps there. But it seems this one has other problems?
Maybe you should investigate this further, as it seems you have issues with the underlying storage, and the container filesystems being corrupt might indicate that there is something wrong. Where are your backups stored, on the QNAP I guess?
What I now found out: on pve2-sandbox, disk 4 has a SMART failure, so I guess that disk is broken.
Bash:
pve2-sandbox# journalctl -xe
Jun 25 14:17:01 pve2-sandbox CRON[1891785]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 25 14:17:01 pve2-sandbox CRON[1891784]: pam_unix(cron:session): session closed for user root
Jun 25 14:44:57 pve2-sandbox smartd[870]: Device: /dev/bus/4 [megaraid_disk_04], SMART Failure: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE
Jun 25 14:54:38 pve2-sandbox pmxcfs[1140]: [dcdb] notice: data verification successful
Jun 25 15:14:57 pve2-sandbox smartd[870]: Device: /dev/bus/4 [megaraid_disk_04], SMART Failure: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE
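To double-check the drive smartd complains about, I could query it directly through the RAID controller with smartctl, something like this (a sketch; the /dev/sda device node is just a guess for my setup):
Bash:
# overall health verdict of the disk behind the MegaRAID controller (index 4 as reported by smartd)
smartctl -H -d megaraid,4 /dev/sda
# full SMART attributes and error log
smartctl -a -d megaraid,4 /dev/sda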
But what I don't understand: why does LXC 101 keep running perfectly on pve2-sandbox, which has a broken HDD, while the LXC can't run on the other nodes in the cluster with perfectly working disks?
As Ceph is shared storage, I thought the data is spread across all the disks of every host in the cluster that are configured for Ceph? Or am I wrong here?

What will happen when I remove the broken hard drive and restore the LXC102 afterwards?

Yes, the backups are stored on an SMB target on QNAP NASes.
 
Device: /dev/bus/4 [megaraid_disk_04], SMART Failure
Oh, you should not use hardware RAID in combination with Ceph OSDs, this is not recommended at all, see https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_recommendations_for_a_healthy_ceph_cluster

As Ceph is shared storage, I thought the data is spread across all the disks of every host in the cluster that are configured for Ceph? Or am I wrong here?
Yes, Ceph will give you redundancy, but it is designed with raw disk access in mind, so if you have the additional hardware RAID layer, that is calling for trouble. Further, on which OSD the data is placed is given by the CRUSH map, see https://docs.ceph.com/en/quincy/rados/operations/crush-map/ You don't have all the data on all the disks...
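For example, the usual status commands show the OSD layout per host, their utilization, and whether any placement groups are unhealthy (a sketch):
Bash:
# OSDs grouped by host, with status and utilization
ceph osd tree
ceph osd df tree

# any degraded, inconsistent or otherwise unhealthy placement groups
ceph health detail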

Device: /dev/bus/4 [megaraid_disk_04]
You should not only replace the faulty hard disk, but rather recreate the test cluster without the hardware RAID.
 
Oh, you should not use hardware RAID in combination with Ceph OSDs, this is not recommended at all, see https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_recommendations_for_a_healthy_ceph_cluster


Yes, Ceph will give you redundancy, but it is designed with raw disk access in mind, so if you have the additional hardware RAID layer, that is calling for trouble. Further, on which OSD the data is placed is given by the CRUSH map, see https://docs.ceph.com/en/quincy/rados/operations/crush-map/ You don't have all the data on all the disks...


You should not only replace the faulty hard disk, but rather recreate the test cluster without the hardware RAID.
Thank you for your reply.

I know Ceph and RAID is not a good combination, but on these older servers I was unable to remove the RAID card and add a passthrough card.
I didn't configure any RAID setup on the hardware RAID controller; I placed every disk in a bypass mode supported by the RAID card.
For a sandbox environment, I thought it was okay...

In our production environment there is a passthrough card, so no hardware RAID card is involved.

Is there anything else I can do?..
 
I did a test on one production server with the correct setup for Ceph (no hardware RAID).
I upgraded it from PVE 7.4-18 to PVE 8.2.4.

We also have Debian 12 LXC's there, and after the HA migration to the node with the latest version of PVE, they don't want to start.
I tested with a Debian LXC with ID 102:
Bash:
task started by HA resource agent
run_buffer: 571 Script exited with status 255
lxc_init: 845 Failed to run lxc.hook.pre-start for container "101"
__lxc_start: 2034 Failed to initialize container "101"
TASK ERROR: startup for container '101' failed

Journalctl -xe mentions an ext4 error:
Bash:
pve1# journalctl -xe
Jul 16 10:42:11 pve1 kernel: loop0: detected capacity change from 0 to 20971520
Jul 16 10:42:11 pve1 kernel: EXT4-fs error (device loop0): ext4_get_journal_inode:5779: inode #8: comm mount: iget: checksum invalid
Jul 16 10:42:11 pve1 kernel: EXT4-fs (loop0): no journal found
Jul 16 10:42:11 pve1 kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jul 16 10:42:11 pve1 pvedaemon[2135]: unable to get PID for CT 101 (not running?)
Jul 16 10:42:11 pve1 pve-ha-lrm[32154]: startup for container '101' failed
Jul 16 10:42:11 pve1 pve-ha-lrm[32152]: <root@pam> end task UPID:pve1:00007D9A:00015395:66963261:vzstart:101:root@pam: startup for container '101' failed
Jul 16 10:42:11 pve1 pve-ha-lrm[32152]: unable to start service ct:101
Jul 16 10:42:11 pve1 pve-ha-lrm[4461]: restart policy: retry number 2 for service 'ct:101'
Jul 16 10:42:13 pve1 systemd[1]: pve-container@101.service: Main process exited, code=exited, status=1/FAILURE

The configuration of the LXC:
Bash:
# pct config 101 --current
arch: amd64
cores: 1
description: # Vaultwarden LXC general%0A## Network config%0A| NIC | IPv4 | MAC | %0A| ---%3A--- | ---%3A--- | ---%3A--- |%0A| net0 | 192.168.16.101 | E2%3A61%3ADC%3A07%3A1F%3A8F |%0A
features: keyctl=1,nesting=1
hostname: vaultwarden.internal.robotronic.be
memory: 512
nameserver: 192.168.16.238
net0: name=eth0,bridge=vmbr0,gw=192.168.23.254,hwaddr=E2:61:DC:07:1F:8F,ip=192.168.16.101/21,type=veth
onboot: 1
ostype: debian
rootfs: vm-lxc-storage:101/vm-101-disk-0.raw,size=10G
searchdomain: internal.robotronic.be
swap: 512
tags: debian12;webserver
unprivileged: 1

I ran a file system check on the LXC disk:
Bash:
pve1# pct fsck 101


fsck from util-linux 2.38.1
/mnt/pve/cephfs_cluster/vm-lxc-storage/images/101/vm-101-disk-0.raw: Superblock has an invalid journal (inode 8).
CLEARED.
*** journal has been deleted ***

/mnt/pve/cephfs_cluster/vm-lxc-storage/images/101/vm-101-disk-0.raw: Resize inode not valid.

/mnt/pve/cephfs_cluster/vm-lxc-storage/images/101/vm-101-disk-0.raw: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)
command 'fsck -a -l /mnt/pve/cephfs_cluster/vm-lxc-storage/images/101/vm-101-disk-0.raw' failed: exit code 4


Then I ran it manually on the LXC disk (a lot of inode issues):

Bash:
pve1# fsck -l /mnt/pve/cephfs_cluster/vm-lxc-storage/images/101/vm-101-disk-0.raw
Inode 146317 ref count is 2, should be 1.  Fix? yes

Unattached inode 146319
Connect to /lost+found? yes


/mnt/pve/cephfs_cluster/vm-lxc-storage/images/101/vm-101-disk-0.raw: ***** FILE SYSTEM WAS MODIFIED *****
/mnt/pve/cephfs_cluster/vm-lxc-storage/images/101/vm-101-disk-0.raw: 31726/655360 files (0.4% non-contiguous), 608611/2621440 blocks

I tried to start up the LXC in debug mode:

Bash:
#  pct start 101 --debug
mount_autodev: 1028 Permission denied - Failed to create "/dev" directory
lxc_setup: 3898 Failed to mount "/dev"
do_start: 1273 Failed to setup container "101"
sync_wait: 34 An error occurred in another process (expected sequence number 3)
__lxc_start: 2114 Failed to spawn container "101"
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "101", config section "lxc"
DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 101 lxc pre-start produced output: /etc/os-release file not found and autodetection failed, falling back to 'unmanaged'
WARNING: /etc not present in CT, is the rootfs mounted?
got unexpected ostype (unmanaged != debian)

INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:unpriv_systemd_create_scope:1498 - Running privileged, not using a systemd unit
DEBUG    seccomp - ../src/lxc/seccomp.c:parse_config_v2:664 - Host native arch is [3221225534]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "[all]"
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "kexec_load errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[246:kexec_load] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "open_by_handle_at errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[304:open_by_handle_at] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "init_module errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[175:init_module] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "finit_module errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[313:finit_module] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "delete_module errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[176:delete_module] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "ioctl errno 1 [1,0x9400,SCMP_CMP_MASKED_EQ,0xff00]"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:555 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[16:ioctl] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:555 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[16:ioctl] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:555 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[16:ioctl] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:1036 - Merging compat seccomp contexts into main context
INFO     start - ../src/lxc/start.c:lxc_init:882 - Container "101" is initialized
INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_create:1669 - The monitor process uses "lxc.monitor/101" as cgroup
DEBUG    storage - ../src/lxc/storage/storage.c:storage_query:231 - Detected rootfs type "dir"
DEBUG    storage - ../src/lxc/storage/storage.c:storage_query:231 - Detected rootfs type "dir"
INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_create:1777 - The container process uses "lxc/101/ns" as inner and "lxc/101" as limit cgroup
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUSER
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWNS
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWPID
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUTS
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWIPC
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWCGROUP
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved user namespace via fd 17 and stashed path as user:/proc/37044/fd/17
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved mnt namespace via fd 18 and stashed path as mnt:/proc/37044/fd/18
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved pid namespace via fd 19 and stashed path as pid:/proc/37044/fd/19
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved uts namespace via fd 20 and stashed path as uts:/proc/37044/fd/20
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved ipc namespace via fd 21 and stashed path as ipc:/proc/37044/fd/21
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved cgroup namespace via fd 22 and stashed path as cgroup:/proc/37044/fd/22
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newuidmap" does have the setuid bit set
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newgidmap" does have the setuid bit set
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:178 - Functional newuidmap and newgidmap binary found
INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_setup_limits:3528 - Limits for the unified cgroup hierarchy have been setup
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newuidmap" does have the setuid bit set
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newgidmap" does have the setuid bit set
INFO     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:176 - Caller maps host root. Writing mapping directly
NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1572 - Dropped supplimentary groups
INFO     start - ../src/lxc/start.c:do_start:1105 - Unshared CLONE_NEWNET
NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1572 - Dropped supplimentary groups
NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1548 - Switched to gid 0
NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1557 - Switched to uid 0
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved net namespace via fd 5 and stashed path as net:/proc/37044/fd/5
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/lxcnetaddbr" for container "101", config section "net"
DEBUG    network - ../src/lxc/network.c:netdev_configure_server_veth:876 - Instantiated veth tunnel "veth101i0 <--> vethgd6BVY"
DEBUG    conf - ../src/lxc/conf.c:lxc_mount_rootfs:1240 - Mounted rootfs "/var/lib/lxc/101/rootfs" onto "/usr/lib/x86_64-linux-gnu/lxc/rootfs" with options "(null)"
INFO     conf - ../src/lxc/conf.c:setup_utsname:679 - Set hostname to "vaultwarden.internal.robotronic.be"
DEBUG    network - ../src/lxc/network.c:setup_hw_addr:3863 - Mac address "E2:61:DC:07:1F:8F" on "eth0" has been setup
DEBUG    network - ../src/lxc/network.c:lxc_network_setup_in_child_namespaces_common:4004 - Network device "eth0" has been setup
INFO     network - ../src/lxc/network.c:lxc_setup_network_in_child_namespaces:4061 - Finished setting up network devices with caller assigned names
INFO     conf - ../src/lxc/conf.c:mount_autodev:1023 - Preparing "/dev"
ERROR    conf - ../src/lxc/conf.c:mount_autodev:1028 - Permission denied - Failed to create "/dev" directory
INFO     conf - ../src/lxc/conf.c:mount_autodev:1084 - Prepared "/dev"
ERROR    conf - ../src/lxc/conf.c:lxc_setup:3898 - Failed to mount "/dev"
ERROR    start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "101"
ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 3)
DEBUG    network - ../src/lxc/network.c:lxc_delete_network:4217 - Deleted network devices
ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "101"
WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 16 for process 37080
startup for container '101' failed
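Since the pre-start hook says /etc is missing, I could also loop-mount the raw image read-only on the host to see how much of the root filesystem is actually left, something like this (a sketch; the image path is the one from the fsck output above):
Bash:
# mount the container image read-only for inspection
mkdir -p /mnt/ct101-check
mount -o loop,ro /mnt/pve/cephfs_cluster/vm-lxc-storage/images/101/vm-101-disk-0.raw /mnt/ct101-check
# check whether /etc and the init binary are still present
ls -l /mnt/ct101-check/etc/os-release /mnt/ct101-check/sbin/init
umount /mnt/ct101-check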

When migrating the container back to another PVE node with version 7.4-18, I now get the following error:
Bash:
task started by HA resource agent
mount_autodev: 1225 Permission denied - Failed to create "/dev" directory
lxc_setup: 4395 Failed to mount "/dev"
do_start: 1272 Failed to setup container "101"
sync_wait: 34 An error occurred in another process (expected sequence number 3)
__lxc_start: 2107 Failed to spawn container "101"
TASK ERROR: startup for container '101' failed

Any ideas?
I only want to update the whole production cluster once I'm 100% sure that the containers can run on the latest PVE version...

Thank you for further investigating this!
 
