LXC containers no longer start after power outage caused by storm Sabine

joerg_re · Feb 10, 2020

After a power outage caused by storm "Sabine", the LXC containers no longer start. Proxmox 6.1-5 itself, including the web interface, seems to be fully functional. The containers all live on two ZFS file systems, one mirrored and one without any extras.

The error message for all 10 containers is:

Code:

timed out waiting for client
TASK ERROR: command '/usr/bin/termproxy 5900 --path /vms/102 --perm VM.Console -- /usr/bin/dtach -A /var/run/dtach/vzctlconsole102 -r winch -z lxc-console -n 102 -e -1' failed: exit code 4

According to the web overview, ZFS appears to be OK:

zpool status:

Code:

root@proxmox1:~# zpool status -v
  pool: WD3TB
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:03:25 with 0 errors on Sun Feb  9 00:27:26 2020
config:

    NAME                      STATE     READ WRITE CKSUM
    WD3TB                     ONLINE       0     0     0
      wwn-0x50014ee2bc21eeec  ONLINE       0     0     0

errors: No known data errors

  pool: WD4TB
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:36:51 with 0 errors on Sun Feb  9 01:00:55 2020
config:

    NAME                        STATE     READ WRITE CKSUM
    WD4TB                       ONLINE       0     0     0
      mirror-0                  ONLINE       0     0     0
        wwn-0x50014ee209a03a05  ONLINE       0     0     0
        wwn-0x50014ee211ce4311  ONLINE       0     0     0

errors: No known data errors

What is the next logical step? I think some kind of fsck would make sense ...

I'm grateful for any hint!

matrix · Feb 10, 2020

Hello,

When you start the container with the command pct start <vmid>, what is printed to the terminal?

joerg_re · Feb 10, 2020

pct start 101:

Code:

root@proxmox1:~# pct start 101
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = "de_DE.UTF-8",
    LC_MONETARY = "de_DE.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Job for pve-container@101.service failed because the control process exited with error code.
See "systemctl status pve-container@101.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@101' failed: exit code 1

systemctl status:

Code:

root@proxmox1:~# systemctl status pve-container@101.service
● pve-container@101.service - PVE LXC Container: 101
   Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor preset: enabled
   Active: failed (Result: exit-code) since Mon 2020-02-10 13:18:12 CET; 3min 5s ago
     Docs: man:lxc-start
           man:lxc
           man:pct
  Process: 87559 ExecStart=/usr/bin/lxc-start -n 101 (code=exited, status=1/FAILURE)

Feb 10 13:18:11 proxmox1 systemd[1]: Starting PVE LXC Container: 101...
Feb 10 13:18:12 proxmox1 lxc-start[87559]: lxc-start: 101: lxccontainer.c: wait_on_daemonized
Feb 10 13:18:12 proxmox1 lxc-start[87559]: lxc-start: 101: tools/lxc_start.c: main: 329 The c
Feb 10 13:18:12 proxmox1 lxc-start[87559]: lxc-start: 101: tools/lxc_start.c: main: 332 To ge
Feb 10 13:18:12 proxmox1 lxc-start[87559]: lxc-start: 101: tools/lxc_start.c: main: 335 Addit
Feb 10 13:18:12 proxmox1 systemd[1]: pve-container@101.service: Control process exited, code=
Feb 10 13:18:12 proxmox1 systemd[1]: pve-container@101.service: Failed with result 'exit-code
Feb 10 13:18:12 proxmox1 systemd[1]: Failed to start PVE LXC Container: 101.

journalctl -xe:

Code:

root@proxmox1:~# journalctl -xe
-- Support: https://www.debian.org/support
--
-- The unit UNIT has successfully entered the 'dead' state.
Feb 10 13:23:00 proxmox1 systemd[1]: var-lib-docker-overlay2-efb59ad9b92d511176b5666bb9dcf26b
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit var-lib-docker-overlay2-efb59ad9b92d511176b5666bb9dcf26b6ccc886829d55d35ebdadcfe9
Feb 10 13:23:01 proxmox1 systemd[1]: pvesr.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pvesr.service has successfully entered the 'dead' state.
Feb 10 13:23:01 proxmox1 systemd[1]: Started Proxmox VE replication runner.
-- Subject: A start job for unit pvesr.service has finished successfully
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has finished successfully.
--
-- The job identifier is 19385.
lines 2418-2440/2440 (END)

matrix · Feb 10, 2020

Try running "dpkg-reconfigure locales" (as root) [1] and select the appropriate locale.

[1]https://rephlex.de/blog/2018/07/13/proxmox-perl-warning-setting-locale-failed/

joerg_re · Feb 10, 2020

Done. The error persists; only the locale warnings are gone.

matrix · Feb 10, 2020

Which error message do you get when you start the container in the foreground: lxc-start -n 101 -F

joerg_re · Feb 10, 2020

lxc-start -n 101 -F:

Code:

root@proxmox1:~# lxc-start -n 101 -F
lxc-start: 101: conf.c: run_buffer: 352 Script exited with status 2
lxc-start: 101: start.c: lxc_init: 897 Failed to run lxc.hook.pre-start for container "101"
lxc-start: 101: start.c: __lxc_start: 2032 Failed to initialize container "101"
Segmentation fault

The other containers give the same message.

matrix · Feb 10, 2020

Hmmmm,

can you try an fsck: pct fsck <vmid>

joerg_re · Feb 10, 2020

Yes, that was my very first idea:

Code:

root@proxmox1:/WD3TB# pct fsck 101
unable to run fsck for 'WD3TB:subvol-101-disk-0' (format == subvol)

Here the error messages differ depending on which ZFS volume the container is on:

Code:

root@proxmox1:~# pct fsck 109
fsck from util-linux 2.33.1
Possibly non-existent device?
fsck.ext2: No such file or directory while trying to open /WD4TB/images/109/vm-109-disk-0.raw
command 'fsck -a -l /WD4TB/images/109/vm-109-disk-0.raw' failed: exit code 8

The problem with the second container is that the directory /WD4TB/images is empty. How can that be?

joerg_re · Feb 10, 2020

The containers are shown in the web interface.

oguz · Feb 10, 2020

hi,

try to get a debug log[0]

then post it here as an attachment

[0]: https://pve.proxmox.com/pve-docs/chapter-pct.html#_obtaining_debugging_logs

joerg_re · Feb 10, 2020

Bash:

root@proxmox1:~# lxc-start -n 101 -F -l DEBUG -o /tmp/lxc-ID.log
lxc-start: 101: conf.c: run_buffer: 352 Script exited with status 2
lxc-start: 101: start.c: lxc_init: 897 Failed to run lxc.hook.pre-start for container "101"
lxc-start: 101: start.c: __lxc_start: 2032 Failed to initialize container "101"
Speicherzugriffsfehler

Unfortunately, no file is created.

What puzzles me is the output of df -h compared to fdisk -l. The ZFS drives do not show up among the mounted file systems.

Bash:

root@proxmox1:~# df -h
Dateisystem          Größe Benutzt Verf. Verw% Eingehängt auf
udev                  7,8G       0  7,8G    0% /dev
tmpfs                 1,6G     25M  1,6G    2% /run
/dev/mapper/pve-root   57G     14G   41G   26% /
tmpfs                 7,9G     43M  7,8G    1% /dev/shm
tmpfs                 5,0M       0  5,0M    0% /run/lock
tmpfs                 7,9G       0  7,9G    0% /sys/fs/cgroup
/dev/fuse              30M     32K   30M    1% /etc/pve
overlay                57G     14G   41G   26% /var/lib/docker/overlay2/ddc01e004e232bcd8e752b943f34d29fbc2f2c819f223c26af952af7f5f7dcd7/merged
overlay                57G     14G   41G   26% /var/lib/docker/overlay2/06d9bb78f0cc115e735388740923008778bb80a39eed54df00377e15f8f60a08/merged
tmpfs                 1,6G       0  1,6G    0% /run/user/0

Code:

root@proxmox1:~# fdisk -l
Disk /dev/sdd: 232,9 GiB, 250059350016 bytes, 488397168 sectors
Disk model: VB0250EAVER     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EC51368D-0700-4F72-AA9D-4029ADBF66D6

Device       Start       End   Sectors   Size Type
/dev/sdd1       34      2047      2014  1007K BIOS boot
/dev/sdd2     2048   1050623   1048576   512M EFI System
/dev/sdd3  1050624 488397134 487346511 232,4G Linux LVM


Disk /dev/sdc: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: F3598192-C608-E34E-AB8D-E3DBE210C7E3

Device          Start        End    Sectors  Size Type
/dev/sdc1        2048 7814019071 7814017024  3,7T Solaris /usr & Apple ZFS
/dev/sdc9  7814019072 7814035455      16384    8M Solaris reserved 1


Disk /dev/sdb: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: WDC WD30EFRX-68E
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 92C48412-6DC1-4E43-89BF-9D581F9D9BC0

Device          Start        End    Sectors  Size Type
/dev/sdb1        2048 5860515839 5860513792  2,7T Solaris /usr & Apple ZFS
/dev/sdb9  5860515840 5860532223      16384    8M Solaris reserved 1


Disk /dev/sda: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: F8CB60C2-9432-8446-851A-F4FBC40A28BB

Device          Start        End    Sectors  Size Type
/dev/sda1        2048 7814019071 7814017024  3,7T Solaris /usr & Apple ZFS
/dev/sda9  7814019072 7814035455      16384    8M Solaris reserved 1


Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-root: 58 GiB, 62277025792 bytes, 121634816 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-vm--100--disk--0: 40 GiB, 42949672960 bytes, 83886080 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes


Disk /dev/mapper/pve-vm--101--disk--0: 40 GiB, 42949672960 bytes, 83886080 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes


Disk /dev/mapper/pve-vm--102--disk--0: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes


Disk /dev/mapper/pve-vm--103--disk--0: 80 GiB, 85899345920 bytes, 167772160 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes

oguz · Feb 10, 2020

joerg_re said:
Unfortunately, no file is created.

can you create files at all? or does it not work in general?

joerg_re said:
What puzzles me is the output of df -h compared to fdisk -l. The ZFS drives do not show up among the mounted file systems.

then they are most likely not mounted.. can you mount the zvols by hand?

what is in dmesg/syslog? maybe you'll find more detailed info there.
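
A quick way to answer the "can you create files at all?" question is a small write probe. This is only a minimal sketch: `can_write` is a hypothetical helper name, and the probe file name is made up for illustration.

```shell
#!/bin/sh
# can_write DIR: succeeds if a temporary file can be created in DIR
# and removed again. "can_write" is a hypothetical helper name.
can_write() {
    dir=$1
    # mktemp creates the file itself and replaces XXXXXX with random characters
    probe=$(mktemp "$dir/.write-probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
}
```

For example, `can_write /tmp && echo "/tmp is writable"` would show whether the target directory of the debug log accepts new files.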

joerg_re · Feb 11, 2020

Bash:

root@proxmox1:~# dmesg | grep -i 'error\|warn\|exception'
[    0.042223] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20190703/tbfadt-569)
[    2.218971] ERST: Failed to get Error Log Address Range.
[    2.219071] [Firmware Warn]: GHES: Poll interval is 0 for generic hardware error source: 1, disabled.
[    2.486805] RAS: Correctable Errors collector initialized.
[    4.579414] random: 7 urandom warning(s) missed due to ratelimiting
[   10.725249] EXT4-fs (dm-1): re-mounted. Opts: errors=remount-ro
[   52.362932] WARNING: CPU: 0 PID: 1588 at arch/x86/mm/extable.c:126 ex_handler_uaccess+0x52/0x60
[   52.363009]  fixup_exception+0x4a/0x61
[   61.393634] audit: type=1400 audit(1581380309.140:19): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="/usr/bin/lxc-start" name="/WD3TB/" pid=2108 comm="mount.zfs" fstype="zfs" srcname="WD3TB" flags="rw, strictatime"
[   61.552553] lxc-start[1691]: segfault at 50 ip 00007fd31863ef8b sp 00007fff6bb34620 error 4 in liblxc.so.1.6.0[7fd3185e5000+8a000]
[   62.743242] lxc-start[2118]: segfault at 50 ip 00007f12c2c02f8b sp 00007ffe56d64400 error 4 in liblxc.so.1.6.0[7f12c2ba9000+8a000]
[   64.136138] lxc-start[2134]: segfault at 50 ip 00007f22139c4f8b sp 00007fff1f5d30e0 error 4 in liblxc.so.1.6.0[7f221396b000+8a000]
[   65.296392] lxc-start[2183]: segfault at 50 ip 00007f9491e68f8b sp 00007ffc845d3220 error 4 in liblxc.so.1.6.0[7f9491e0f000+8a000]

How do I mount the drives?

Bash:

root@proxmox1:~# zfs list -r -o name,mountpoint,mounted
NAME                     MOUNTPOINT                MOUNTED
WD3TB                    /WD3TB                         no
WD3TB/share              /WD3TB/share                   no
WD3TB/subvol-101-disk-0  /WD3TB/subvol-101-disk-0       no
WD3TB/subvol-102-disk-0  /WD3TB/subvol-102-disk-0       no
WD3TB/subvol-103-disk-0  /WD3TB/subvol-103-disk-0       no
WD3TB/subvol-105-disk-0  /WD3TB/subvol-105-disk-0       no
WD3TB/subvol-110-disk-0  /WD3TB/subvol-110-disk-0       no
WD4TB                    /WD4TB                         no
WD4TB/mayan-daten        /WD4TB/mayan-daten             no
WD4TB/mayan-datenbank    /WD4TB/mayan-datenbank         no
WD4TB/subvol-109-disk-0  /WD4TB/subvol-109-disk-0       no

But there is data in the directories:

Bash:

root@proxmox1:~# ls -l /WD4TB/
insgesamt 28
drwxr-xr-x 2 root             root 4096 Feb 11 01:14 dump
drwxr-xr-x 2 root             root 4096 Feb 11 01:14 images
drwx------ 4             1000 1000 4096 Feb 11 01:14 mayan-datenbank
drwxr-xr-x 2 systemd-coredump root 4096 Feb 11 01:18 mayan-redis
drwxr-xr-x 2 root             root 4096 Feb 11 01:14 private
drwxr-xr-x 2 root             root 4096 Feb 11 01:14 snippets
drwxr-xr-x 4 root             root 4096 Feb 11 01:14 template

Bash:

root@proxmox1:~# zfs mount -a
cannot mount '/WD4TB': directory is not empty
Speicherzugriffsfehler
root@proxmox1:~# zfs mount WD4TB
cannot mount '/WD4TB': directory is not empty

But I can't delete the contents either, and that really puzzles me:

Bash:

root@proxmox1:/WD4TB# ls -l
insgesamt 24
drwxr-xr-x 2 root root 4096 Feb 11 01:34 dump
drwxr-xr-x 2 root root 4096 Feb 11 01:34 images
drwx------ 4   70 1000 4096 Feb 11 01:34 mayan-datenbank
drwxr-xr-x 2 root root 4096 Feb 11 01:34 private
drwxr-xr-x 2 root root 4096 Feb 11 01:34 snippets
drwxr-xr-x 4 root root 4096 Feb 11 01:34 template
root@proxmox1:/WD4TB# rm -rf  /WD4TB/
root@proxmox1:/WD4TB# ls -l
insgesamt 0
root@proxmox1:/WD4TB# cd
root@proxmox1:~# cd /WD4TB/
root@proxmox1:/WD4TB# ls
dump  images  mayan-datenbank  private    snippets  template
root@proxmox1:/WD4TB#

Why is the data gone at first and then suddenly back again? Where does it come from?
Why do the web interface and zpool status -v say that everything is OK? See the first post.
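
One way to see where such files actually live is to ask df which file system backs the path. The df -h output earlier in the thread lists no ZFS mounts, so /WD4TB is presumably an ordinary directory on the root file system at this point. The fresh timestamps in the second listing also suggest the directories are being recreated by some process rather than read from the pool. The names dump, images, private, snippets and template look like the layout Proxmox creates for a directory-type storage, though that is an assumption here. A minimal sketch, with `backing_fs` as a hypothetical helper name:

```shell
#!/bin/sh
# backing_fs PATH: print the device or dataset that df reports for PATH.
# "backing_fs" is a hypothetical helper name.
backing_fs() {
    # -P forces the POSIX format with one line per file system;
    # the first column of the second line is the file system source.
    df -P "$1" | awk 'NR == 2 { print $1 }'
}
```

With the pool mounted, `backing_fs /WD4TB` would be expected to print the dataset name; a result such as /dev/mapper/pve-root would mean the directory sits on the root file system instead.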

joerg_re · Feb 11, 2020

After deleting the directory /WD3TB and rebooting, I was able to mount the pool with the command

zfs mount -a

and now get the following output; I was also able to start the corresponding containers:

Bash:

root@proxmox1:~# zfs list -r -o name,mountpoint,mounted
NAME                     MOUNTPOINT                MOUNTED
WD3TB                    /WD3TB                        yes
WD3TB/share              /WD3TB/share                  yes
WD3TB/subvol-101-disk-0  /WD3TB/subvol-101-disk-0      yes
WD3TB/subvol-102-disk-0  /WD3TB/subvol-102-disk-0      yes
WD3TB/subvol-103-disk-0  /WD3TB/subvol-103-disk-0      yes
WD3TB/subvol-105-disk-0  /WD3TB/subvol-105-disk-0      yes
WD3TB/subvol-110-disk-0  /WD3TB/subvol-110-disk-0      yes
WD4TB                    /WD4TB                         no
WD4TB/mayan-daten        /WD4TB/mayan-daten             no
WD4TB/mayan-datenbank    /WD4TB/mayan-datenbank         no
WD4TB/subvol-109-disk-0  /WD4TB/subvol-109-disk-0       no

I would like the same success for the other volume (WD4TB), which is mirrored. But I can't delete the mount point, see my previous post above.
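
A less destructive variant of the step that worked for WD3TB is to rename the stale directory instead of deleting it, so nothing is lost if it turns out to hold real data. This is a minimal sketch under assumptions: `set_aside` and the `.stale.<timestamp>` suffix are made up for illustration, and it will not help for long if some service keeps recreating the directory.

```shell
#!/bin/sh
# set_aside DIR: rename DIR to DIR.stale.<epoch seconds>, unless DIR is
# currently a mountpoint. Prints the new name on success.
# "set_aside" and the ".stale." suffix are hypothetical names.
set_aside() {
    dir=${1%/}                      # drop one trailing slash, if any
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    if mountpoint -q "$dir"; then   # never move a live mount out of the way
        echo "refusing: $dir is a mountpoint" >&2
        return 1
    fi
    target="$dir.stale.$(date +%s)"
    mv -- "$dir" "$target" && printf '%s\n' "$target"
}
```

A possible use would be `set_aside /WD4TB && zfs mount -a`, run from outside the directory being moved.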

T.Herrmann · Feb 29, 2020

Yes, unfortunately this happens after an unplanned reboot: ZFS will not mount a dataset onto a mountpoint that is not truly empty (not a single directory may be in it). It has happened to me with the LXC subvol mountpoints as well.
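
The rule described above can be checked for all datasets at once. The sketch below reads the tab-separated output of `zfs list -H -o name,mountpoint,mounted` and prints every dataset that is not mounted while its mountpoint directory already contains entries. `nonempty_unmounted` is a hypothetical helper name, not a ZFS or Proxmox command.

```shell
#!/bin/sh
# nonempty_unmounted: read "name<TAB>mountpoint<TAB>mounted" lines on stdin
# and print "name<TAB>mountpoint" for each dataset that is not mounted and
# whose mountpoint directory is not empty.
# "nonempty_unmounted" is a hypothetical helper name.
nonempty_unmounted() {
    while IFS="$(printf '\t')" read -r name mountpoint mounted; do
        [ "$mounted" = "no" ] || continue
        # skips values such as "none" or "legacy" that are not directories
        [ -d "$mountpoint" ] || continue
        if [ -n "$(ls -A "$mountpoint" 2>/dev/null)" ]; then
            printf '%s\t%s\n' "$name" "$mountpoint"
        fi
    done
}
```

Typical use would be `zfs list -H -o name,mountpoint,mounted | nonempty_unmounted`; every line it prints is a candidate for the "directory is not empty" mount failure.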

joerg_re · Feb 29, 2020

T.Herrmann said:
Yes, unfortunately this happens after an unplanned reboot: ZFS will not mount a dataset onto a mountpoint that is not truly empty (not a single directory may be in it). It has happened to me with the LXC subvol mountpoints as well.

I have read that too. Is there a solution? I actually wanted to use Proxmox in class and thought long and hard about the ZFS licensing issues. But if Proxmox cannot even cope with its own storage, it feels more like a hobbyist hack than a solution you can hand to future administrators.

T.Herrmann · Feb 29, 2020

Proxmox is a professional solution, and ZFS does not lose data either. Rather, bringing the system back up after a power outage has to be supervised, because the unmounts were of course never performed, and ZFS then prefers not to mount the dataset in case there is still data in the mountpoint. This behaviour is perfectly fine as far as it goes; it can also be overridden by instructing ZFS to mount onto non-empty mountpoints as well.

>> zfs mount -O

https://docs.oracle.com/cd/E19253-01/820-2313/gaynd/index.html
Data safety and HA are two different categories, which also have different resource requirements.
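
In the `zfs mount` documentation, the overlay mount mentioned above is the upper-case `-O` flag; lower-case `-o` only passes mount options. A minimal sketch of how it might be applied to the datasets from this thread follows. `overlay_mount` is a hypothetical wrapper, and `ZFS_CMD` is an assumed override that exists only so the commands can be previewed without touching the pool.

```shell
#!/bin/sh
# overlay_mount DATASET...: mount each dataset even if its mountpoint
# directory is not empty (zfs mount -O). Stops at the first failure.
# "overlay_mount" and ZFS_CMD are hypothetical names; ZFS_CMD=echo gives
# a dry run that only prints the commands.
overlay_mount() {
    for ds in "$@"; do
        ${ZFS_CMD:-zfs} mount -O "$ds" || return 1
    done
}
```

For a dry run, set `ZFS_CMD=echo` and call `overlay_mount WD4TB`; unset it to run the real command. An overlay mount hides whatever files already sit in the mountpoint directory for as long as the dataset is mounted, so it is worth checking what is in there first.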
