My LXC container failed to boot

DoorTruck
Feb 13, 2025
I had a problem a month ago where I ran out of space, which caused issues I have since fixed: https://forum.proxmox.com/threads/l...tart-now-i-get-manual-repair-required.168850/

But today it failed again, this time without having run out of space. Should I be worried about that machine? Everything else runs fine on the same SSD. It's running on a https://www.gmktec.com/products/amd...&variant=be82d75a-6d7f-4cde-9455-acbdfe0ae998

I fixed it with this command, pressing 'y' a bunch of times, but I fear it will keep coming back:
Code:
fsck /dev/mapper/pve-vm--101--disk--0

Code:
run_buffer: 571 Script exited with status 32
lxc_init: 845 Failed to run lxc.hook.pre-start for container "101"
__lxc_start: 2034 Failed to initialize container "101"
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type u nsid 1000 hostid 1000 range 2000
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type u nsid 65534 hostid 165534 range 1
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 0 hostid 100000 range 100
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 100 hostid 100 range 1
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 101 hostid 100100 range 899
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 1000 hostid 1000 range 2000
INFO lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
INFO utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "101", config section "lxc"
DEBUG utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 101 lxc pre-start produced output: mount: /var/lib/lxc/.pve-staged-mounts/rootfs: can't read superblock on /dev/mapper/pve-vm--101--disk--0.
dmesg(1) may have more information after failed mount system call.

DEBUG utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 101 lxc pre-start produced output: command 'mount /dev/dm-6 /var/lib/lxc/.pve-staged-mounts/rootfs' failed: exit code 32

ERROR utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 32
ERROR start - ../src/lxc/start.c:lxc_init:845 - Failed to run lxc.hook.pre-start for container "101"
ERROR start - ../src/lxc/start.c:__lxc_start:2034 - Failed to initialize container "101"
INFO utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "101", config section "lxc"

TASK ERROR: startup for container '101' failed
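Side note on the fsck step: e2fsck's -p and -y options answer the repair prompts automatically, so there is no need to press 'y' repeatedly. A minimal sketch, demonstrated on a throwaway file-backed ext4 image rather than the real thin LV (on the node the target would be /dev/mapper/pve-vm--101--disk--0, with the container stopped first):

```shell
# Create a small scratch ext4 image so e2fsck can be shown safely.
# -F lets mke2fs write to a regular file instead of a block device.
truncate -s 16M /tmp/demo-rootfs.img
mkfs.ext4 -q -F /tmp/demo-rootfs.img

# -f forces a check even if the filesystem looks clean; -y answers 'yes'
# to every repair prompt (the non-interactive version of pressing 'y').
e2fsck -fy /tmp/demo-rootfs.img

# If the primary superblock is unreadable ("can't read superblock"), e2fsck
# can be pointed at a backup one, e.g.:  e2fsck -b 32768 -y <device>
```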
 
I'd like to take a look at the output of lsblk -o+FSTYPE,MODEL, pvs and lvs.
Code:
root@reppiks:~# lsblk -o+FSTYPE,MODEL
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS FSTYPE      MODEL
nvme0n1                      259:0    0 953.9G  0 disk                         TWSC TSC3AN1T0-F6Q10S
├─nvme0n1p1                  259:1    0  1007K  0 part                         
├─nvme0n1p2                  259:2    0     1G  0 part /boot/efi   vfat       
└─nvme0n1p3                  259:3    0 952.9G  0 part             LVM2_member
  ├─pve-swap                 252:0    0     8G  0 lvm  [SWAP]      swap       
  ├─pve-root                 252:1    0    96G  0 lvm  /           ext4       
  ├─pve-data_tmeta           252:2    0   8.3G  0 lvm                         
  │ └─pve-data-tpool         252:4    0 816.2G  0 lvm                         
  │   ├─pve-data             252:5    0 816.2G  1 lvm                         
  │   ├─pve-vm--101--disk--0 252:6    0   400G  0 lvm              ext4       
  │   ├─pve-vm--102--disk--0 252:7    0     2G  0 lvm              ext4       
  │   ├─pve-vm--100--disk--1 252:8    0    32G  0 lvm                         
  │   └─pve-vm--100--disk--0 252:9    0     4M  0 lvm                         
  └─pve-data_tdata           252:3    0 816.2G  0 lvm                         
    └─pve-data-tpool         252:4    0 816.2G  0 lvm                         
      ├─pve-data             252:5    0 816.2G  1 lvm                         
      ├─pve-vm--101--disk--0 252:6    0   400G  0 lvm              ext4       
      ├─pve-vm--102--disk--0 252:7    0     2G  0 lvm              ext4       
      ├─pve-vm--100--disk--1 252:8    0    32G  0 lvm                         
      └─pve-vm--100--disk--0 252:9    0     4M  0 lvm                         
root@reppiks:~# pvs
  PV             VG  Fmt  Attr PSize    PFree
  /dev/nvme0n1p3 pve lvm2 a--  <952.87g <7.68g
root@reppiks:~# lvs
  LV            VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-cotzM- <816.21g             42.39  2.31                           
  data_meta0    pve -wi-------   <8.33g                                                   
  root          pve -wi-ao----   96.00g                                                   
  swap          pve -wi-ao----    8.00g                                                   
  vm-100-disk-0 pve Vwi-aotz--    4.00m data        0.00                                   
  vm-100-disk-1 pve Vwi-aotz--   32.00g data        23.17                                 
  vm-101-disk-0 pve Vwi-aotz--  400.00g data        84.38                                 
  vm-102-disk-0 pve Vwi-aotz--    2.00g data        52.75
 
I'd try whether pct fstrim 101 can get the Data% of the LV a bit lower, but otherwise nothing sticks out. I've never heard of that model. Please share smartctl -a /dev/nvme0n1 too.
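For context, a sketch of what those Data% figures mean in absolute terms: Data% is the share of a thin volume's logical size that is actually mapped to pool blocks, so this is plain arithmetic on the lvs numbers above, nothing Proxmox-specific.

```shell
# Translate the lvs Data% columns into allocated sizes.
awk 'BEGIN {
  printf "vm-101-disk-0 allocated: %.1f GiB of 400 GiB\n",    400.00 * 0.8438
  printf "thin pool allocated:     %.1f GiB of 816.21 GiB\n", 816.21 * 0.4239
}'

# fstrim discards blocks the guest filesystem has already freed, which is
# why it can lower Data% without shrinking the volume's logical size.
```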
 
Would it help to recreate the LXC container? I wonder if some corruption from when it filled up has stayed behind?

Code:
root@reppiks:~# pct fstrim 101
/var/lib/lxc/101/rootfs/: 266.7 GiB (286389686272 bytes) trimmed
root@reppiks:~# smartctl -a /dev/nvme0n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-7-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       TWSC TSC3AN1T0-F6Q10S
Serial Number:                      TTSQA24CUX07924
Firmware Version:                   SN14243
PCI Vendor/Subsystem ID:            0x1e4b
IEEE OUI Identifier:                0x000000
Total NVM Capacity:                 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            000000 2400007924
Local Time is:                      Tue Aug 19 10:42:01 2025 CEST
Firmware Updates (0x1a):            5 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     95 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        0       0
 1 +     5.80W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.7460W       -        -    3  3  3  3     5000   10000
 4 -   0.7260W       -        -    4  4  4  4     8000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        42 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    1%
Data Units Read:                    4,248,511 [2.17 TB]
Data Units Written:                 5,128,116 [2.62 TB]
Host Read Commands:                 113,152,414
Host Write Commands:                152,027,023
Controller Busy Time:               471
Power Cycles:                       19
Power On Hours:                     4,835
Unsafe Shutdowns:                   4
Media and Data Integrity Errors:    7
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               42 Celsius
Temperature Sensor 2:               51 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
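A small illustration of how to keep an eye on the counters that matter here. It runs the extraction on a captured sample of the fields above; on the node you would pipe smartctl -a /dev/nvme0n1 into the same awk instead:

```shell
# Captured sample of the relevant smartctl fields (values from the output above).
cat <<'EOF' > /tmp/smart.sample
Percentage Used:                    1%
Unsafe Shutdowns:                   4
Media and Data Integrity Errors:    7
EOF

# Pull out the two counters worth re-checking after each failure; if they
# keep climbing between incidents, the drive itself is degrading.
awk -F': +' '/Unsafe Shutdowns|Media and Data Integrity Errors/ { print $1 "=" $2 }' /tmp/smart.sample
```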
 
Now it happened again :(

When I do anything on the container I just get:
Code:
root@docker:~# ls
-bash: /usr/bin/ls: Input/output error

Is it a problem that when I look at the Resources tab for the container I only see 350G?
 
I see nothing sticking out in the disk's SMART data; Unsafe Shutdowns and Media and Data Integrity Errors are probably only a concern if they have increased since the last error. I'd also take a look at journalctl -r on the node and in the CT around that time.

I'm not sure how you resized the disk for the size shown in the GUI not to be updated. AFAIK it's just a comment (though some actions like backup/restore can use it), and pct rescan might update it. I don't know whether the source of the bind mounts becoming unavailable (network issues, for example) can cause this.

If this only affects the CT and nothing else, it could help to recreate or restore it, but you'd have to try. I'd expect this to be a physical issue, but we need more logs.
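For the journalctl part, filtering the kernel messages for NVMe/media errors narrows things down quickly. The sketch below runs the grep on a small illustrative sample so the pipeline is visible; on the node you would run something like journalctl -k -r | grep -iE 'medium error|i/o error' instead:

```shell
# Illustrative sample of kernel log lines (content made up for the demo).
cat <<'EOF' > /tmp/kern.sample
Aug 19 20:15:33 reppiks kernel: critical medium error, dev nvme0n1, sector 1949616336 op 0x0:(READ)
Aug 19 20:15:33 reppiks kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1949616336, 8 blocks, I/O Error (sct 0x2 / sc 0x81)
Aug 19 20:15:33 reppiks kernel: Btrfs loaded, zoned=yes, fsverity=yes
EOF

# Count the error lines; -i for case-insensitive, -c for a line count.
grep -icE 'medium error|i/o error' /tmp/kern.sample
```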
 
I think the drive is broken; I will have to write to GMKtec support as it's only half a year old :s

Code:
root@reppiks:~# badblocks -sv /dev/nvme0n1
Checking blocks 0 to 1000204631
Checking for bad blocks (read-only test):
23614236
23614237
23614238
23614239
25.40% done, 14:10 elapsed. (4/0/0 errors)

Code:
journalctl -r

Aug 19 20:15:33 reppiks kernel: EXT4-fs (dm-1): mounted filesystem bea01f1c-cd02-4f46-8928-8f3eed1449e7 ro with ordered >
Aug 19 20:15:33 reppiks kernel: Btrfs loaded, zoned=yes, fsverity=yes
Aug 19 20:15:33 reppiks kernel: critical medium error, dev nvme0n1, sector 1949616336 op 0x0:(READ) flags 0x0 phys_seg 1>
Aug 19 20:15:33 reppiks kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1949616336, 8 blocks, I/O Error (sct 0x2 / sc 0x81)
Aug 19 20:15:33 reppiks kernel: critical medium error, dev nvme0n1, sector 1949396160 op 0x0:(READ) flags 0x0 phys_seg 1>
Aug 19 20:15:33 reppiks kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1949396160, 8 blocks, I/O Error (sct 0x2 / sc 0x81)
Aug 19 20:15:33 reppiks kernel: critical medium error, dev nvme0n1, sector 1949616336 op 0x0:(READ) flags 0x0 phys_seg 1>
Aug 19 20:15:33 reppiks kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1949616336, 8 blocks, I/O Error (sct 0x2 / sc 0x81)
Aug 19 20:15:33 reppiks kernel: critical medium error, dev nvme0n1, sector 1949396160 op 0x0:(READ) flags 0x0 phys_seg 1>
Aug 19 20:15:33 reppiks kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1949396160, 8 blocks, I/O Error (sct 0x2 / sc 0x81)
Aug 19 20:15:33 reppiks kernel: critical medium error, dev nvme0n1, sector 1949616336 op 0x0:(READ) flags 0x0 phys_seg 1>
Aug 19 20:15:33 reppiks kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1949616336, 8 blocks, I/O Error (sct 0x2 / sc 0x81)
Aug 19 20:15:33 reppiks kernel: critical medium error, dev nvme0n1, sector 1949396160 op 0x0:(READ) flags 0x0 phys_seg 1>
Aug 19 20:15:33 reppiks kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1949396160, 8 blocks, I/O Error (sct 0x2 / sc 0x81)
Aug 19 20:15:33 reppiks kernel: critical medium error, dev nvme0n1, sector 1949616336 op 0x0:(READ) flags 0x0 phys_seg 1>
Aug 19 20:15:33 reppiks kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1949616336, 8 blocks, I/O Error (sct 0x2 / sc 0x81)
Aug 19 20:15:33 reppiks kernel: critical medium error, dev nvme0n1, sector 1949396160 op 0x0:(READ) flags 0x0 phys_seg 1>
Aug 19 20:15:33 reppiks kernel: blk_print_req_error: 2 callbacks suppressed
Aug 19 20:15:33 reppiks kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1949396160, 8 blocks, I/O Error (sct 0x2 / sc 0x81)
Aug 19 20:15:33 reppiks kernel: nvme_log_error: 2 callbacks suppressed
Aug 19 20:15:33 reppiks kernel: fbcon: Taking over console
Aug 19 20:15:33 reppiks kernel: async_tx: api initialized (async)
Aug 19 20:15:33 reppiks kernel: xor: automatically using best checksumming function   avx       
Aug 19 20:15:33 reppiks kernel: raid6: using avx2x2 recovery algorithm
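To locate where those failures sit: the kernel's "sector" numbers are 512-byte LBAs (and the smartctl output above shows the namespace is formatted with 512-byte LBAs), so the byte offset is just sector × 512. A quick check with one sector from the log above:

```shell
# Convert the failing LBA from the kernel log into a byte offset on the disk,
# to see roughly how far into the ~954 GiB device the bad area sits.
awk 'BEGIN {
  sector = 1949616336   # from: critical medium error ... sector 1949616336
  printf "%.1f GiB into the disk\n", sector * 512 / (1024^3)
}'
```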