Host eventually loses web GUI and SSH login; terminal shows journal service in "zombie" mode

valk

Dec 2, 2022
Hey guys. I really need some help figuring out what keeps happening to my host.
It can run for a random amount of time, but eventually it loses the web GUI and even SSH. The local terminal stays alive and usually shows systemd failing to kill several journal service processes. I stumbled across this thread on GitHub. It seems to me that my journal service is a victim and not the cause of my problems, but how do I troubleshoot this further? Any advice would be very much appreciated!
root@prox:~# journalctl -p 5 -xb
-- Journal begins at Mon 2022-12-26 18:45:43 +03, ends at Sun 2023-02-12 13:39:51 +03. --
Feb 12 10:57:38 prox kernel: Linux version 6.1.6-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT_DYNAMIC PVE 6.1.6-1>
Feb 12 10:57:38 prox kernel: Kernel command line: BOOT_IMAGE=/vmlinuz-6.1.6-1-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt
Feb 12 10:57:38 prox kernel: Unknown kernel command line parameters "BOOT_IMAGE=/vmlinuz-6.1.6-1-pve boot=zfs", will be passed to user space.
Feb 12 10:57:38 prox kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
Feb 12 10:57:38 prox kernel: #5 #6 #7
Feb 12 10:57:38 prox kernel: audit: type=2000 audit(1676188652.068:1): state=initialized audit_enabled=0 res=1
Feb 12 10:57:38 prox kernel: pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
Feb 12 10:57:38 prox kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
Feb 12 10:57:38 prox kernel: SCSI subsystem initialized
Feb 12 10:57:38 prox kernel: VFS: Disk quotas dquot_6.6.0
Feb 12 10:57:38 prox kernel: Initialise system trusted keyrings
Feb 12 10:57:38 prox kernel: Key type blacklist registered
Feb 12 10:57:38 prox kernel: integrity: Platform Keyring initialized
Feb 12 10:57:38 prox kernel: integrity: Machine keyring initialized
Feb 12 10:57:38 prox kernel: Key type asymmetric registered
Feb 12 10:57:38 prox kernel: Asymmetric key parser 'x509' registered
Feb 12 10:57:38 prox kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
Feb 12 10:57:38 prox kernel: platform eisa.0: EISA: Cannot allocate resource for mainboard
Feb 12 10:57:38 prox kernel: platform eisa.0: Cannot allocate resource for EISA slot 1
Feb 12 10:57:38 prox kernel: platform eisa.0: Cannot allocate resource for EISA slot 2
Feb 12 10:57:38 prox kernel: platform eisa.0: Cannot allocate resource for EISA slot 3
Feb 12 10:57:38 prox kernel: platform eisa.0: Cannot allocate resource for EISA slot 4
Feb 12 10:57:38 prox kernel: platform eisa.0: Cannot allocate resource for EISA slot 5
Feb 12 10:57:38 prox kernel: platform eisa.0: Cannot allocate resource for EISA slot 6
Feb 12 10:57:38 prox kernel: platform eisa.0: Cannot allocate resource for EISA slot 7
Feb 12 10:57:38 prox kernel: platform eisa.0: Cannot allocate resource for EISA slot 8
Feb 12 10:57:38 prox kernel: Bridge firewalling registered
Feb 12 10:57:38 prox kernel: Key type dns_resolver registered
Feb 12 10:57:38 prox kernel: Loading compiled-in X.509 certificates
Feb 12 10:57:38 prox kernel: Key type .fscrypt registered
Feb 12 10:57:38 prox kernel: Key type fscrypt-provisioning registered
Feb 12 10:57:38 prox kernel: Key type encrypted registered
Feb 12 10:57:38 prox kernel: Loading compiled-in module X.509 certificates
Feb 12 10:57:38 prox kernel: Loaded X.509 cert 'Build time autogenerated kernel key: 56fa3448f01c00ff0ab800a4e8fa93a1f8f88db2'
Feb 12 10:57:38 prox kernel: ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042F conflicts with OpRegion 0x0000000000000400-0x000000000000047F (\PMIO) (20220331/utaddress->
Feb 12 10:57:38 prox kernel: ACPI Warning: SystemIO range 0x0000000000000540-0x000000000000054F conflicts with OpRegion 0x0000000000000500-0x000000000000057F (\_SB.PCI0.LPCB.GPBX) (20220>
Feb 12 10:57:38 prox kernel: ACPI Warning: SystemIO range 0x0000000000000540-0x000000000000054F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20220331/utaddress->
Feb 12 10:57:38 prox kernel: ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053F conflicts with OpRegion 0x0000000000000500-0x000000000000057F (\_SB.PCI0.LPCB.GPBX) (20220>
Feb 12 10:57:38 prox kernel: ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20220331/utaddress->
Feb 12 10:57:38 prox kernel: ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x000000000000057F (\_SB.PCI0.LPCB.GPBX) (20220>
Feb 12 10:57:38 prox kernel: ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20220331/utaddress->
Feb 12 10:57:38 prox kernel: lpc_ich: Resource conflict(s) found affecting gpio_ich
Feb 12 10:57:38 prox kernel: mpt3sas 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
Feb 12 10:57:38 prox kernel: mpt2sas_cm0: overriding NVDATA EEDPTagMode setting
Feb 12 10:57:38 prox kernel: scsi 0:0:0:0: Direct-Access ATA HGST HUS724040AL AA70 PQ: 0 ANSI: 6
Feb 12 10:57:38 prox kernel: scsi 0:0:1:0: Direct-Access ATA WDC WD40PURZ-85T 0A80 PQ: 0 ANSI: 6
Feb 12 10:57:38 prox kernel: scsi 0:0:2:0: Direct-Access ATA WDC WD40PURZ-85T 0A80 PQ: 0 ANSI: 6
Feb 12 10:57:38 prox kernel: scsi 0:0:3:0: Direct-Access ATA HGST HMS5C4040BL A5D0 PQ: 0 ANSI: 6
Feb 12 10:57:38 prox kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Feb 12 10:57:38 prox kernel: sd 0:0:1:0: Attached scsi generic sg1 type 0
Feb 12 10:57:38 prox kernel: sd 0:0:2:0: Attached scsi generic sg2 type 0
Feb 12 10:57:38 prox kernel: sd 0:0:3:0: Attached scsi generic sg3 type 0
Feb 12 10:57:38 prox kernel: sd 0:0:1:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Feb 12 10:57:38 prox kernel: sd 0:0:1:0: [sdb] 4096-byte physical blocks
Feb 12 10:57:38 prox kernel: sd 0:0:2:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Feb 12 10:57:38 prox kernel: sd 0:0:2:0: [sdc] 4096-byte physical blocks
Feb 12 10:57:38 prox kernel: sd 0:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Feb 12 10:57:38 prox kernel: sd 0:0:3:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Feb 12 10:57:38 prox kernel: sd 0:0:3:0: [sdd] 4096-byte physical blocks
Feb 12 10:57:38 prox kernel: sd 0:0:1:0: [sdb] Write Protect is off
Feb 12 10:57:38 prox kernel: sd 0:0:2:0: [sdc] Write Protect is off
Feb 12 10:57:38 prox kernel: sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
Feb 12 10:57:38 prox kernel: sd 0:0:2:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
Feb 12 10:57:38 prox kernel: sd 0:0:2:0: [sdc] Attached SCSI disk
Feb 12 10:57:38 prox kernel: sd 0:0:1:0: [sdb] Attached SCSI disk
Feb 12 10:57:38 prox kernel: sd 0:0:0:0: [sda] Write Protect is off
Feb 12 10:57:38 prox kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Feb 12 10:57:38 prox kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20220331/psargs-330)
Feb 12 10:57:38 prox kernel:
Feb 12 10:57:38 prox kernel: No Local Variables are initialized for Method [_GTF]
Feb 12 10:57:38 prox kernel:
Feb 12 10:57:38 prox kernel: No Arguments are initialized for method [_GTF]
Feb 12 10:57:38 prox kernel:
Feb 12 10:57:38 prox kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.SPT5._GTF due to previous error (AE_NOT_FOUND) (20220331/psparse-529)
Feb 12 10:57:38 prox kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20220331/psargs-330)
Feb 12 10:57:38 prox kernel:
Feb 12 10:57:38 prox kernel: No Local Variables are initialized for Method [_GTF]
Feb 12 10:57:38 prox kernel:
Feb 12 10:57:38 prox kernel: No Arguments are initialized for method [_GTF]
Feb 12 10:57:38 prox kernel:
Feb 12 10:57:38 prox kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.SPT5._GTF due to previous error (AE_NOT_FOUND) (20220331/psparse-529)
Feb 12 10:57:38 prox kernel: scsi 6:0:0:0: Direct-Access ATA WDC WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
Feb 12 10:57:38 prox kernel: sd 6:0:0:0: Attached scsi generic sg4 type 0
Feb 12 10:57:38 prox kernel: sd 6:0:0:0: [sde] 7814028911 512-byte logical blocks: (4.00 TB/3.64 TiB)
Feb 12 10:57:38 prox kernel: sd 6:0:0:0: [sde] 4096-byte physical blocks
Feb 12 10:57:38 prox kernel: sd 6:0:0:0: [sde] Write Protect is off
Feb 12 10:57:38 prox kernel: sd 6:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 12 10:57:38 prox kernel: sd 0:0:3:0: [sdd] Write Protect is off
Feb 12 10:57:38 prox kernel: sd 0:0:3:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
Feb 12 10:57:38 prox kernel: sd 6:0:0:0: [sde] Attached SCSI disk
Feb 12 10:57:38 prox kernel: scsi 9:0:0:0: Direct-Access ATA Micron_5100_MTFD U027 PQ: 0 ANSI: 5
Feb 12 10:57:38 prox kernel: sd 9:0:0:0: Attached scsi generic sg5 type 0
Feb 12 10:57:38 prox kernel: sd 9:0:0:0: [sdf] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Feb 12 10:57:38 prox kernel: sd 9:0:0:0: [sdf] 4096-byte physical blocks
Feb 12 10:57:38 prox kernel: sd 9:0:0:0: [sdf] Write Protect is off
Feb 12 10:57:38 prox kernel: sd 9:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 12 10:57:38 prox kernel: scsi 10:0:0:0: Direct-Access ATA Micron_5100_MTFD U027 PQ: 0 ANSI: 5
Feb 12 10:57:38 prox kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
Feb 12 10:57:38 prox kernel: sd 10:0:0:0: [sdg] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Feb 12 10:57:38 prox kernel: sd 10:0:0:0: [sdg] 4096-byte physical blocks
Feb 12 10:57:38 prox kernel: sd 10:0:0:0: [sdg] Write Protect is off
Feb 12 10:57:38 prox kernel: sd 10:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 12 10:57:38 prox kernel: sd 10:0:0:0: [sdg] Attached SCSI removable disk
Feb 12 10:57:38 prox kernel: sd 9:0:0:0: [sdf] Attached SCSI removable disk
Feb 12 10:57:38 prox kernel: sd 0:0:0:0: [sda] Attached SCSI disk
Feb 12 10:57:38 prox kernel: sd 0:0:3:0: [sdd] Attached SCSI disk
Feb 12 10:57:38 prox kernel: random: crng init done
Feb 12 10:57:38 prox kernel: scsi 11:0:0:0: Direct-Access WD Elements 25A3 1021 PQ: 0 ANSI: 6
Feb 12 10:57:38 prox kernel: sd 11:0:0:0: Attached scsi generic sg7 type 0
Feb 12 10:57:38 prox kernel: sd 11:0:0:0: [sdh] Very big device. Trying to use READ CAPACITY(16).
Feb 12 10:57:38 prox kernel: sd 11:0:0:0: [sdh] 19532808192 512-byte logical blocks: (10.0 TB/9.09 TiB)
Feb 12 10:57:38 prox kernel: sd 11:0:0:0: [sdh] 4096-byte physical blocks
Feb 12 10:57:38 prox kernel: sd 11:0:0:0: [sdh] Write Protect is off
Feb 12 10:57:38 prox kernel: sd 11:0:0:0: [sdh] No Caching mode page found
Feb 12 10:57:38 prox kernel: sd 11:0:0:0: [sdh] Assuming drive cache: write through
Feb 12 10:57:38 prox kernel: sd 11:0:0:0: [sdh] Attached SCSI disk
Feb 12 10:57:38 prox kernel: spl: loading out-of-tree module taints kernel.
Feb 12 10:57:38 prox kernel: znvpair: module license 'CDDL' taints kernel.
Feb 12 10:57:38 prox kernel: Disabling lock debugging due to kernel taint
Feb 12 10:57:38 prox kernel: ZFS: Loaded module v2.1.9-pve1, ZFS pool version 5000, ZFS filesystem version 5
Feb 12 10:57:38 prox kernel: iscsi: registered transport (tcp)
Feb 12 10:57:38 prox kernel: iscsi: registered transport (iser)
Feb 12 10:57:38 prox systemd-modules-load[1055]: Failed to find module 'nvidia'
Feb 12 10:57:38 prox systemd-modules-load[1055]: Failed to find module 'nvidia-modeset'
Feb 12 10:57:38 prox systemd-modules-load[1055]: Failed to find module 'nvidia_uvm'
Feb 12 10:57:38 prox kernel: at24 0-0050: supply vcc not found, using dummy regulator
Feb 12 10:57:38 prox kernel: at24 0-0051: supply vcc not found, using dummy regulator
Feb 12 10:57:38 prox kernel: at24 0-0052: supply vcc not found, using dummy regulator
Feb 12 10:57:38 prox kernel: at24 0-0053: supply vcc not found, using dummy regulator
Feb 12 10:57:38 prox kernel: asus_wmi: fan_curve_get_factory_default (0x00110024) failed: -61
Feb 12 10:57:38 prox kernel: asus_wmi: fan_curve_get_factory_default (0x00110025) failed: -61
Feb 12 10:57:39 prox kernel: MXM: GUID detected in BIOS
...skipping...
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: State 'stop-watchdog' timed out. Killing.
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: Killing process 59695 (systemd-journal) with signal SIGKILL.
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: Killing process 60818 (journal-offline) with signal SIGKILL.
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: Processes still around after SIGKILL. Ignoring.
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: State 'final-sigterm' timed out. Killing.
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: Killing process 59695 (systemd-journal) with signal SIGKILL.
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: Killing process 60818 (journal-offline) with signal SIGKILL.
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
Feb 12 13:05:41 prox systemd[1]: systemd-journald.service: Found left-over process 59695 (systemd-journal) in control group while starting unit. Ignoring.
Feb 12 13:05:41 prox systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 12 13:31:41 prox login[68117]: ROOT LOGIN on '/dev/pts/0'
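
A process that is "still around after SIGKILL" is usually stuck in uninterruptible sleep (state D) inside a kernel call, often waiting on I/O, so it may help to check for such processes from the local terminal the next time the host hangs. The sketch below assumes procps `ps` and `dmesg` are available; `filter_d_state`, `list_d_state` and `hung_task_warnings` are hypothetical helper names, not Proxmox tools.

```shell
#!/bin/sh
# Hypothetical helpers for a host that still has a working local console.

# Keep the ps header (first line) plus every process whose STAT column
# starts with "D" (uninterruptible sleep). Reads "pid stat wchan comm" rows.
filter_d_state() {
  awk 'NR == 1 || $2 ~ /^D/'
}

# List D-state processes; WCHAN names the kernel function each one waits in.
list_d_state() {
  ps -eo pid,stat,wchan,comm | filter_d_state
}

# Print kernel hung-task warnings ("task ... blocked for more than N seconds").
# dmesg reads the kernel ring buffer directly, so it does not depend on journald.
# It may need root if kernel.dmesg_restrict is set.
hung_task_warnings() {
  dmesg 2>/dev/null | grep -i "blocked for more than" || true
}

list_d_state
hung_task_warnings
```

If the `systemd-journal` and `journal-offline` PIDs from the log above show up in state D, the WCHAN column and any hung-task traces are probably more telling than the journald errors themselves.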

root@prox:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 6.1.6-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-6.1: 7.3-3
pve-kernel-helper: 7.3-3
pve-kernel-5.15: 7.3-1
pve-kernel-6.1.6-1-pve: 6.1.6-1
pve-kernel-6.1.2-1-pve: 6.1.2-1
pve-kernel-6.0-edge: 6.0.15-1
pve-kernel-6.0.15-edge: 6.0.15-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1