[SOLVED] LXC containers no longer start after update

at3tb · Aug 1, 2020

Hello everyone,

I updated my PVE today.
After that I rebooted, and all VMs and LXCs came up.
About 10 minutes later the PVE suddenly did a hard reset, and since then the LXCs no longer start.
I then restored my DNS LXC from backup, and it has been running again since.
The restore did create a new subvolume as the root disk, though.

When I try to start another LXC, the syslog shows:
Aug 01 11:27:52 pve systemd[1]: Started PVE LXC Container: 104.
Aug 01 11:27:53 pve pvedaemon[30537]: <root@pam> end task UPID:pve:00003A57:00089888:5F253598:vzstart:104:root@pam: OK
Aug 01 11:27:54 pve systemd[1]: pve-container@104.service: Main process exited, code=exited, status=1/FAILURE
Aug 01 11:27:54 pve systemd[1]: pve-container@104.service: Failed with result 'exit-code'.

root@pve:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
VMpool 279G 2.36T 28K /VMpool
VMpool/subvol-104-disk-0 102G 8.31G 102G /VMpool/subvol-104-disk-0
VMpool/subvol-105-disk-0 8.03G 192G 8.03G /VMpool/subvol-105-disk-0
VMpool/subvol-106-disk-0 1.03G 79.0G 1.03G /VMpool/subvol-106-disk-0
VMpool/subvol-107-disk-0 729M 3.29G 729M /VMpool/subvol-107-disk-0
VMpool/subvol-107-disk-2 726M 3.29G 726M /VMpool/subvol-107-disk-2

root@pve:~# ls -lah /VMpool/
total 5.0K
drwxr-xr-x 4 root root 4 Aug 1 10:11 .
drwxr-xr-x 19 root root 25 Dec 24 2019 ..
drwxr-xr-x 21 root root 21 Aug 1 10:50 subvol-107-disk-2

ls doesn't show me all of the LXC subvolumes either?
104, 105 and 106 are somehow missing.

With
lxc-start -n 104 -l DEBUG -F -o /tmp/ct104.log

I get this in the log:

root@pve:~# cat /tmp/ct104.log
lxc-start 104 20200801090746.480 INFO lsm - lsm/lsm.c:lsm_init:29 - LSM security driver AppArmor
lxc-start 104 20200801090746.480 INFO conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "104", config section "lxc"
lxc-start 104 20200801090747.278 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 104 lxc pre-start produced output: cannot open directory //VMpool/subvol-104-disk-0: No such file or directory

lxc-start 104 20200801090747.290 ERROR conf - conf.c:run_buffer:323 - Script exited with status 2
lxc-start 104 20200801090747.291 ERROR start - start.c:lxc_init:804 - Failed to run lxc.hook.pre-start for container "104"
lxc-start 104 20200801090747.291 ERROR start - start.c:__lxc_start:1903 - Failed to initialize container "104"
lxc-start 104 20200801090747.291 INFO conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "104", config section "lxc"
lxc-start 104 20200801090748.242 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 104 lxc post-stop produced output: umount: /var/lib/lxc/104/rootfs: not mounted

lxc-start 104 20200801090748.243 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 104 lxc post-stop produced output: command 'umount --recursive -- /var/lib/lxc/104/rootfs' failed: exit code 1

lxc-start 104 20200801090748.361 ERROR conf - conf.c:run_buffer:323 - Script exited with status 1
lxc-start 104 20200801090748.362 ERROR start - start.c:lxc_end:971 - Failed to run lxc.hook.post-stop for container "104"
lxc-start 104 20200801090748.362 ERROR lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 104 20200801090748.363 ERROR lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options

The LXCs are Debian 9.

I hope someone can help me.
Regards

H4R0 · Aug 1, 2020

Looks like the mountpoint got lost. Try:
zfs set mountpoint=/VMpool/subvol-104-disk-0 VMpool/subvol-104-disk-0

Then try to start container 104. If that works, do the same for the remaining ones.
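Before resetting the property, it can help to check whether ZFS considers the datasets mounted at all, since `zfs list` prints the configured mountpoint even when nothing is mounted there. A minimal sketch, assuming the pool is called VMpool as in the output above; the helper name `unmounted_datasets` is made up for illustration:

```shell
#!/bin/sh
# Sketch: print datasets that ZFS reports as not mounted.
# Expects the tab-separated output of
#   zfs list -H -o name,mounted,mountpoint -r VMpool
# on stdin (hypothetical helper, not part of ZFS or PVE).
unmounted_datasets() {
    awk -F '\t' '$2 == "no" { print $1 " -> " $3 }'
}

# Only query ZFS when the tool is actually installed.
if command -v zfs >/dev/null 2>&1; then
    zfs list -H -o name,mounted,mountpoint -r VMpool | unmounted_datasets
fi
```

Every line it prints is a dataset whose mountpoint property is set but which is not currently mounted.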

at3tb · Aug 1, 2020

Thanks for the suggestion!
Unfortunately no luck.

Syslog:
Aug 01 15:23:36 pve systemd[1]: Started PVE LXC Container: 104.
Aug 01 15:23:36 pve pvedaemon[1863]: <root@pam> end task UPID: pve:00006637:00129756:5F256CD7:vzstart:104:root@pam: OK
Aug 01 15:23:36 pve pvestatd[1839]: unable to get PID for CT 104 (not running?)
Aug 01 15:23:37 pve systemd[1]: pve-container@104.service: Main process exited, code=exited, status=1/FAILURE
Aug 01 15:23:37 pve systemd[1]: pve-container@104.service: Failed with result 'exit-code'.

It doesn't show up with ls -lah /VMpool/ either.

root@pve:~# lxc-start -n 104 -l DEBUG -F -o /tmp/ct104.log
lxc-start: 104: conf.c: run_buffer: 323 Script exited with status 2
lxc-start: 104: start.c: lxc_init: 804 Failed to run lxc.hook.pre-start for container "104"
lxc-start: 104: start.c: __lxc_start: 1903 Failed to initialize container "104"
lxc-start: 104: conf.c: run_buffer: 323 Script exited with status 1
lxc-start: 104: start.c: lxc_end: 971 Failed to run lxc.hook.post-stop for container "104"
lxc-start: 104: tools/lxc_start.c: main: 308 The container failed to start
lxc-start: 104: tools/lxc_start.c: main: 314 Additional information can be obtained by setting the --logfile and --logpriority options
root@pve:~#

and the log for it:

root@pve:~# cat /tmp/ct104.log
lxc-start 104 20200801132951.612 INFO lsm - lsm/lsm.c:lsm_init:29 - LSM security driver AppArmor
lxc-start 104 20200801132951.612 INFO conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "104", config section "lxc"
lxc-start 104 20200801132952.415 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 104 lxc pre-start produced output: cannot open directory //VMpool/subvol-104-disk-0: No such file or directory

lxc-start 104 20200801132952.427 ERROR conf - conf.c:run_buffer:323 - Script exited with status 2
lxc-start 104 20200801132952.427 ERROR start - start.c:lxc_init:804 - Failed to run lxc.hook.pre-start for container "104"
lxc-start 104 20200801132952.427 ERROR start - start.c:__lxc_start:1903 - Failed to initialize container "104"
lxc-start 104 20200801132952.427 INFO conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "104", config section "lxc"
lxc-start 104 20200801132953.176 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 104 lxc post-stop produced output: umount: /var/lib/lxc/104/rootfs: not mounted

lxc-start 104 20200801132953.176 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 104 lxc post-stop produced output: command 'umount --recursive -- /var/lib/lxc/104/rootfs' failed: exit code 1

lxc-start 104 20200801132953.187 ERROR conf - conf.c:run_buffer:323 - Script exited with status 1
lxc-start 104 20200801132953.188 ERROR start - start.c:lxc_end:971 - Failed to run lxc.hook.post-stop for container "104"
lxc-start 104 20200801132953.188 ERROR lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 104 20200801132953.188 ERROR lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options

H4R0 · Aug 1, 2020

Please post the output of "df -h /VMpool"

And "systemctl status | head -n 5"

Then try the following:
mkdir -p /tmp/104
zfs set mountpoint=/tmp/104 VMpool/subvol-104-disk-0

Post the output of "df -h /tmp/104" and "ls -lah /tmp/104"

zfs set mountpoint=/VMpool/subvol-104-disk-0 VMpool/subvol-104-disk-0

Is the volume under /VMpool now?
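The point of the df check is the Filesystem column: if it shows the root dataset instead of the subvolume, the path is just an empty directory on the root filesystem. A small sketch of that comparison, with hypothetical helper names `backing_fs` and `is_mounted_from`:

```shell
#!/bin/sh
# Print the filesystem that backs a path (first column of `df -P`).
backing_fs() {
    df -P -- "$1" | awk 'NR == 2 { print $1 }'
}

# Succeeds only when the path is served by the expected dataset.
is_mounted_from() {
    [ "$(backing_fs "$1")" = "$2" ]
}

# Example call (dataset and path taken from this thread):
#   is_mounted_from /tmp/104 VMpool/subvol-104-disk-0 && echo mounted
```

This assumes filesystem names without spaces, which holds for the dataset names shown here.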

at3tb · Aug 1, 2020

Alright, let's go then.

root@pve:~# df -Th /VMpool/
Filesystem Type Size Used Avail Use% Mounted on
rpool/ROOT/pve-1 zfs 250G 3.5G 247G 2% /

root@pve:~# systemctl status | head -n 5
● pve
State: degraded
Jobs: 0 queued
Failed: 3 units
Since: Sat 2020-08-01 12:00:50 CEST; 3h 52min ago

root@pve:~# df -h /tmp/104
Filesystem Size Used Avail Use% Mounted on
rpool/ROOT/pve-1 250G 3.5G 247G 2% /

root@pve:~# ls -lah /tmp/104
total 11K
drwxr-xr-x 2 root root 2 Aug 1 15:54 .
drwxrwxrwt 9 root root 10 Aug 1 15:54 ..
root@pve:~#

root@pve:~# ls -lah /VMpool/
total 6.0K
drwxr-xr-x 6 root root 6 Aug 1 12:06 .
drwxr-xr-x 19 root root 25 Dec 24 2019 ..
drwxr----- 2 root root 2 Aug 1 10:11 subvol-107-disk-2

H4R0 · Aug 1, 2020

Interesting, setting the zfs mountpoint doesn't work at all.

Your server is degraded. 3 services did not start properly. I assume ZFS is affected by this.

Please post "systemctl list-units --failed". Those services need to be brought back up.

Then run "systemctl status <service>" once for each service in the list to analyze the error. For ssh that would be e.g. "systemctl status sshd.service"; instead of .service the suffix can also be .mount, .target etc., you have to take it from the list.

Also useful is the complete log per service: "journalctl -u <service>"

And for me "zpool status" would be especially interesting.
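Going through the failed units one by one can be scripted. A rough sketch, assuming a systemd version that supports --plain and --no-legend; the function names are invented for this example:

```shell
#!/bin/sh
# Extract unit names from the output of
#   systemctl list-units --failed --plain --no-legend
failed_unit_names() {
    awk 'NF { print $1 }'
}

# Show status and the last log lines of the current boot
# for every failed unit (hypothetical helper).
show_failed_units() {
    systemctl list-units --failed --plain --no-legend | failed_unit_names |
    while read -r unit; do
        systemctl status --no-pager "$unit"
        journalctl --no-pager -b -u "$unit" -n 20
    done
}
```

Run `show_failed_units` as root on the host; nothing is executed until the function is called.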

at3tb · Aug 1, 2020

OK, thanks for that already!

root@pve:~# systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● pve-container@104.service loaded failed failed PVE LXC Container: 104
● pve-container@107.service loaded failed failed PVE LXC Container: 107
● pve-container@109.service loaded failed failed PVE LXC Container: 109

LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.

3 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
root@pve:~#

root@pve:~# zpool status
pool: VMpool
state: ONLINE
scan: scrub repaired 0B in 0 days 00:23:03 with 0 errors on Sat Aug 1 12:47:49 2020
config:

NAME STATE READ WRITE CKSUM
VMpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WD10EZEX-75WN4A0_WD-WCC6Y6FCAFL7 ONLINE 0 0 0
ata-WDC_WD10EZEX-75WN4A0_WD-WCC6Y5XZTF39 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
ata-WDC_WD10EZEX-75WN4A0_WD-WCC6Y3CNXVPL ONLINE 0 0 0
ata-WDC_WD10EZEX-75WN4A0_WD-WCC6Y4PUJFHL ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
ata-WDC_WD10EZEX-75WN4A0_WD-WCC6Y3KZCA1E ONLINE 0 0 0
ata-WDC_WD10EZEX-75WN4A0_WD-WCC6Y2XKNJ41 ONLINE 0 0 0

errors: No known data errors

pool: rpool
state: ONLINE
scan: scrub repaired 0B in 0 days 00:01:05 with 0 errors on Sat Aug 1 12:51:45 2020
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WD3000HLFS-01G6U1_WD-WXC0CA9V2677-part3 ONLINE 0 0 0
ata-WDC_WD3000HLFS-01G6U0_WD-WXL209066628-part3 ONLINE 0 0 0

errors: No known data errors
root@pve:~#

LXC 104 is my video LXC, 107 is audio, and
LXC 109 hasn't existed for a long time; it was a test container.

root@pve:~# systemctl status pve-container@104.service
● pve-container@104.service - PVE LXC Container: 104
Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2020-08-01 16:51:27 CEST; 47s ago
Docs: man:lxc-start
man:lxc
man: pct
Process: 11561 ExecStart=/usr/bin/lxc-start -F -n 104 (code=exited, status=1/FAILURE)
Main PID: 11561 (code=exited, status=1/FAILURE)

Aug 01 16:51:26 pve systemd[1]: Started PVE LXC Container: 104.
Aug 01 16:51:27 pve systemd[1]: pve-container@104.service: Main process exited, code=exited, status=1/FAILURE
Aug 01 16:51:27 pve systemd[1]: pve-container@104.service: Failed with result 'exit-code'.
root@pve:~#

H4R0 · Aug 1, 2020

Hmm, really strange. The pools are all in a RAID and show no errors.

As long as the volumes are not mounted under /VMpool, the containers won't start.

Can you give me remote access to the machine? Writing back and forth here in the forum doesn't get us far. Just send me a PM.

at3tb · Aug 1, 2020

So, new findings.

I rebooted the server.
After that there were still updates pending in APT, which I installed in the hope that things would get better.
Then I rebooted the server once more.

Now it looks like it doesn't mount the subvolumes at boot, and that's why the LXCs don't start.
When I restore the LXC from backup, the subvolume is mounted, as shown here:

root@pve:~# df -Th
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 12G 0 12G 0% /dev
tmpfs tmpfs 2.4G 9.2M 2.4G 1% /run
rpool/ROOT/pve-1 zfs 250G 3.5G 247G 2% /
tmpfs tmpfs 12G 43M 12G 1% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 12G 0 12G 0% /sys/fs/cgroup
rpool zfs 247G 128K 247G 1% /rpool
rpool/ROOT zfs 247G 128K 247G 1% /rpool/ROOT
rpool/data zfs 247G 128K 247G 1% /rpool/data
/dev/fuse fuse 30M 24K 30M 1% /etc/pve
192.168.1.20:/volume1/PVE nfs4 3.6T 1.4T 2.3T 37% /mnt/pve/Rack-NAS
VMpool/subvol-222-disk-1 zfs 8.0G 920M 7.2G 12% /VMpool/subvol-222-disk-1
tmpfs tmpfs 2.4G 0 2.4G 0% /run/user/0
root@pve:~#

But if I reboot the server now, it loses something and doesn't mount the subvolume, so the LXC doesn't start.
That's a bit awkward when the DNS runs on it, LOL.

Where and how does PVE set up the mountpoints? The fstab is empty.
Regards

H4R0 · Aug 1, 2020

ZFS imports the pools automatically from its cache file and then mounts the datasets itself. It has nothing to do with fstab.

Looks to me like the ZFS cache has problems.

Try the following:
systemctl enable zfs-import.target
zpool set cachefile=/etc/zfs/zpool.cache rpool
zpool set cachefile=/etc/zfs/zpool.cache VMpool
update-initramfs -u
reboot

Otherwise you can also try mounting the volume explicitly:
zfs mount VMpool/subvol-104-disk-0
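To see whether the cachefile setting took effect, the property can be read back. A sketch under the assumption that "none" is the only value that keeps a pool out of the cache file ("-" is what zpool prints for the default location); `pools_without_cache` is a made-up helper name:

```shell
#!/bin/sh
# Reads the tab-separated output of
#   zpool get -H -o name,value cachefile
# and prints the pools whose caching is switched off.
pools_without_cache() {
    awk -F '\t' '$2 == "none" { print $1 }'
}

# Only query ZFS when the tool is actually installed.
if command -v zpool >/dev/null 2>&1; then
    zpool get -H -o name,value cachefile | pools_without_cache
fi

# After the reboot, the import and mount units can be inspected with:
#   systemctl status zfs-import-cache.service zfs-mount.service
```

No output means every imported pool is eligible to be written to a cache file.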

at3tb · Aug 1, 2020

OK, I'm a step further.
I have now mounted all LXC subvolumes with zfs mount.
Quickly made a backup of all of them.
And once everything was mounted I ran
zpool set cachefile=/etc/zfs/zpool.cache POOLNAME
again.
Then once more update-initramfs -u -k all

Now I get an error message at boot:
Aug 01 19:31:50 pve systemd[1]: Started Import ZFS pools by cache file.
Aug 01 19:31:50 pve systemd[1]: Reached target ZFS pool import target.
Aug 01 19:31:50 pve systemd[1]: Starting Mount ZFS filesystems...
Aug 01 19:31:50 pve systemd[1]: Starting Wait for ZFS Volume (zvol) links in /dev...
Aug 01 19:31:50 pve zfs[2024]: cannot mount '/VMpool': directory is not empty
Aug 01 19:31:50 pve kernel: zd0: p1
Aug 01 19:31:50 pve zvol_wait[2026]: Testing 5 zvol links
Aug 01 19:31:50 pve kernel: zd16: p1 p2
Aug 01 19:31:50 pve kernel: zd32: p1
Aug 01 19:31:50 pve systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE
Aug 01 19:31:50 pve systemd[1]: zfs-mount.service: Failed with result 'exit-code'.
Aug 01 19:31:50 pve systemd[1]: Failed to start Mount ZFS filesystems.

And of course no LXC would start anymore.
I then tried to mount the DNS container's subvolume with zfs mount, which was answered with the error message
"directory is not empty".
So I ran
rm -r /VMpool/subvol-222-disk-0
Then again
zfs mount VMpool/subvol-222-disk-0
and it mounted it nicely, and the LXC is running.

Why doesn't it mount the second ZFS pool?
Regards

at3tb · Aug 1, 2020

YAY
After what felt like 500 reboots, IT'S ALIVE

I went fully brute-force and did an
rm -r /VMpool
and a reboot.
No error message at boot.
One LXC still wouldn't start, though, even though
df -Th showed all subvolumes mounted.
Quickly restored the backup!
And ta-da, the LXCs start again!
THANKS THANKS THANKS to this great forum!

I'm already really scared of what will go wrong at the next Proxmox update.

Long live the BACKUP!
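For anyone hitting the same "cannot mount '/VMpool': directory is not empty" error: it usually means ordinary directories exist at the mountpoint path on the root filesystem, presumably created while the pool was not mounted. A more cautious alternative to rm -r is to move the leftovers aside. This is only a sketch, not what was done in this thread, and `set_aside_stale_dir` is a hypothetical helper:

```shell
#!/bin/sh
# Move a stale directory out of the way instead of deleting it.
# Refuses to act if the path, or anything below it, is mounted.
set_aside_stale_dir() {
    dir=$1
    if mountpoint -q "$dir"; then
        echo "$dir is a mountpoint, leaving it alone" >&2
        return 1
    fi
    # A mount target below $dir appears as " $dir/..." in /proc/mounts.
    if grep -qsF " $dir/" /proc/mounts; then
        echo "something is mounted below $dir, leaving it alone" >&2
        return 1
    fi
    [ -d "$dir" ] || return 0
    mv -- "$dir" "$dir.stale.$(date +%s)"
}

# Example (as root, with the containers stopped):
#   set_aside_stale_dir /VMpool && zfs mount -a
```

Moving instead of deleting keeps the old contents around for inspection in case they turn out to be real data rather than empty leftover directories.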
