Proxmox "crashing"/hanging randomly

wengacz

Jan 15, 2024
Hi,

I have an issue where Proxmox crashes/hangs randomly. The weird thing is that the machine is still physically running but is unreachable.

I'm attaching the log (hopefully the correct way) of the last run, which only lasted a few hours, as you can see. I omitted the first few seconds after startup so it isn't too long. I don't see anything suspicious, so I'm hoping you can help me out.

Thanks a lot.

Code:
Jan 14 20:55:41 proxmox kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Jan 14 20:55:41 proxmox kernel: fwbr100i0: port 1(fwln100i0) entered forwarding state
Jan 14 20:55:41 proxmox kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Jan 14 20:55:41 proxmox kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Jan 14 20:55:41 proxmox kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Jan 14 20:55:41 proxmox kernel: fwbr100i0: port 2(tap100i0) entered forwarding state
Jan 14 20:55:41 proxmox pvedaemon[1181]: <root@pam> successful auth for user 'root@pam'
Jan 14 20:55:45 proxmox pve-guests[1198]: <root@pam> end task UPID:proxmox:000004AF:00000B39:65A43C3B:startall::root@pam: OK
Jan 14 20:55:45 proxmox systemd[1]: Finished PVE guests.
Jan 14 20:55:45 proxmox systemd[1]: Starting Proxmox VE scheduler...
Jan 14 20:55:46 proxmox pvescheduler[1355]: starting server
Jan 14 20:55:46 proxmox systemd[1]: Started Proxmox VE scheduler.
Jan 14 20:55:46 proxmox systemd[1]: Reached target Multi-User System.
Jan 14 20:55:46 proxmox systemd[1]: Reached target Graphical Interface.
Jan 14 20:55:46 proxmox systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jan 14 20:55:46 proxmox systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jan 14 20:55:46 proxmox systemd[1]: Finished Update UTMP about System Runlevel Changes.
Jan 14 20:55:46 proxmox systemd[1]: Startup finished in 7.362s (kernel) + 28.338s (userspace) = 35.700s.
Jan 14 20:55:51 proxmox systemd[1]: systemd-fsckd.service: Succeeded.
Jan 14 20:56:00 proxmox systemd[1]: systemd-timedated.service: Succeeded.
Jan 14 20:56:29 proxmox pvedaemon[1589]: start VM 101: UPID:proxmox:00000635:00001F08:65A43C6D:qmstart:101:root@pam:
Jan 14 20:56:29 proxmox pvedaemon[1179]: <root@pam> starting task UPID:proxmox:00000635:00001F08:65A43C6D:qmstart:101:root@pam:
Jan 14 20:56:30 proxmox systemd[1]: Started 101.scope.
Jan 14 20:56:30 proxmox systemd-udevd[1605]: Using default interface naming scheme 'v247'.
Jan 14 20:56:30 proxmox systemd-udevd[1605]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 14 20:56:30 proxmox kernel: device tap101i0 entered promiscuous mode
Jan 14 20:56:30 proxmox systemd-udevd[1604]: Using default interface naming scheme 'v247'.
Jan 14 20:56:30 proxmox systemd-udevd[1604]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 14 20:56:30 proxmox systemd-udevd[1604]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 14 20:56:30 proxmox systemd-udevd[1605]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 14 20:56:30 proxmox kernel: vmbr0: port 3(fwpr101p0) entered blocking state
Jan 14 20:56:30 proxmox kernel: vmbr0: port 3(fwpr101p0) entered disabled state
Jan 14 20:56:30 proxmox kernel: device fwpr101p0 entered promiscuous mode
Jan 14 20:56:30 proxmox kernel: vmbr0: port 3(fwpr101p0) entered blocking state
Jan 14 20:56:30 proxmox kernel: vmbr0: port 3(fwpr101p0) entered forwarding state
Jan 14 20:56:30 proxmox kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
Jan 14 20:56:30 proxmox kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
Jan 14 20:56:30 proxmox kernel: device fwln101i0 entered promiscuous mode
Jan 14 20:56:30 proxmox kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
Jan 14 20:56:30 proxmox kernel: fwbr101i0: port 1(fwln101i0) entered forwarding state
Jan 14 20:56:30 proxmox kernel: fwbr101i0: port 2(tap101i0) entered blocking state
Jan 14 20:56:30 proxmox kernel: fwbr101i0: port 2(tap101i0) entered disabled state
Jan 14 20:56:30 proxmox kernel: fwbr101i0: port 2(tap101i0) entered blocking state
Jan 14 20:56:30 proxmox kernel: fwbr101i0: port 2(tap101i0) entered forwarding state
Jan 14 20:56:31 proxmox pvedaemon[1179]: <root@pam> end task UPID:proxmox:00000635:00001F08:65A43C6D:qmstart:101:root@pam: WARNINGS: 1
Jan 14 20:57:38 proxmox pvedaemon[1972]: starting vnc proxy UPID:proxmox:000007B4:000039CA:65A43CB2:vncproxy:101:root@pam:
Jan 14 20:57:38 proxmox pvedaemon[1179]: <root@pam> starting task UPID:proxmox:000007B4:000039CA:65A43CB2:vncproxy:101:root@pam:
Jan 14 20:57:39 proxmox pvedaemon[1179]: <root@pam> end task UPID:proxmox:000007B4:000039CA:65A43CB2:vncproxy:101:root@pam: OK
Jan 14 20:57:43 proxmox pvedaemon[1179]: <root@pam> starting task UPID:proxmox:000007BB:00003BD3:65A43CB7:vncproxy:101:root@pam:
Jan 14 20:57:43 proxmox pvedaemon[1979]: starting vnc proxy UPID:proxmox:000007BB:00003BD3:65A43CB7:vncproxy:101:root@pam:
Jan 14 20:58:54 proxmox chronyd[906]: Selected source 78.108.96.197 (2.debian.pool.ntp.org)
Jan 14 21:00:30 proxmox dbus-daemon[683]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service' requested by ':1.2' (uid=0 pid=692 comm="/usr/lib/snapd/snapd " label="unconfined")
Jan 14 21:00:30 proxmox systemd[1]: Starting Time & Date Service...
Jan 14 21:00:30 proxmox dbus-daemon[683]: [system] Successfully activated service 'org.freedesktop.timedate1'
Jan 14 21:00:30 proxmox systemd[1]: Started Time & Date Service.
Jan 14 21:01:00 proxmox systemd[1]: systemd-timedated.service: Succeeded.
Jan 14 21:10:17 proxmox systemd[1]: Starting Cleanup of Temporary Directories...
Jan 14 21:10:17 proxmox systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Jan 14 21:10:17 proxmox systemd[1]: Finished Cleanup of Temporary Directories.
Jan 14 21:11:23 proxmox pvedaemon[1180]: <root@pam> successful auth for user 'root@pam'
Jan 14 21:17:01 proxmox CRON[6800]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jan 14 21:17:01 proxmox CRON[6801]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 14 21:17:01 proxmox CRON[6800]: pam_unix(cron:session): session closed for user root
Jan 14 21:25:27 proxmox smartd[691]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 71 to 70
Jan 14 21:25:27 proxmox smartd[691]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 114 to 113
Jan 14 21:26:23 proxmox pvedaemon[1180]: <root@pam> successful auth for user 'root@pam'
Jan 14 21:42:08 proxmox pveproxy[1189]: worker exit
Jan 14 21:42:08 proxmox pveproxy[1187]: worker 1189 finished
Jan 14 21:42:08 proxmox pveproxy[1187]: starting 1 worker(s)
Jan 14 21:42:08 proxmox pveproxy[1187]: worker 13035 started
Jan 14 21:42:18 proxmox pveproxy[1188]: worker exit
Jan 14 21:42:18 proxmox pveproxy[1187]: worker 1188 finished
Jan 14 21:42:18 proxmox pveproxy[1187]: starting 1 worker(s)
Jan 14 21:42:18 proxmox pveproxy[1187]: worker 13076 started
Jan 14 21:42:23 proxmox pvedaemon[1180]: <root@pam> successful auth for user 'root@pam'
Jan 14 21:45:00 proxmox pveproxy[1187]: worker 1190 finished
Jan 14 21:45:00 proxmox pveproxy[1187]: starting 1 worker(s)
Jan 14 21:45:00 proxmox pveproxy[1187]: worker 13738 started
Jan 14 21:45:01 proxmox pveproxy[13737]: got inotify poll request in wrong process - disabling inotify
Jan 14 21:58:23 proxmox pvedaemon[1180]: <root@pam> successful auth for user 'root@pam'
Jan 14 22:14:23 proxmox pvedaemon[1179]: <root@pam> successful auth for user 'root@pam'
Jan 14 22:17:01 proxmox CRON[21670]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jan 14 22:17:01 proxmox CRON[21671]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 14 22:17:01 proxmox CRON[21670]: pam_unix(cron:session): session closed for user root
Jan 14 22:22:34 proxmox pveproxy[13035]: worker exit
Jan 14 22:22:34 proxmox pveproxy[1187]: worker 13035 finished
Jan 14 22:22:34 proxmox pveproxy[1187]: starting 1 worker(s)
Jan 14 22:22:34 proxmox pveproxy[1187]: worker 23059 started
Jan 14 22:26:45 proxmox pvedaemon[1179]: <root@pam> end task UPID:proxmox:000007BB:00003BD3:65A43CB7:vncproxy:101:root@pam: OK
Jan 14 22:26:45 proxmox pveproxy[13737]: worker exit
Jan 14 23:17:01 proxmox CRON[36521]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jan 14 23:17:01 proxmox CRON[36522]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 14 23:17:01 proxmox CRON[36521]: pam_unix(cron:session): session closed for user root
-- Reboot --
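A quick way to confirm where logging stopped is to pull out the last entry written before the "-- Reboot --" separator. In the log above that is the 23:17:01 hourly cron job, so the host stopped logging some time after that. The Python sketch below assumes the log was exported as plain text in the format shown, including the separator line; the exact marker text depends on the systemd version, and the function name last_entry_before_reboot is hypothetical.

```python
from typing import Optional

# Boot separator as printed in the log above; newer systemd versions may use
# a different marker, so adjust this string if needed (assumption).
REBOOT_MARKER = "-- Reboot --"


def last_entry_before_reboot(log_text: str) -> Optional[str]:
    """Return the last non-empty line before the first reboot marker.

    If no marker is present, the last non-empty line of the text is returned;
    an empty input (or a marker on the very first line) yields None.
    """
    last = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(REBOOT_MARKER):
            return last
        if stripped:
            last = stripped
    return last


# Tail of the log posted above
SAMPLE = """\
Jan 14 23:17:01 proxmox CRON[36521]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jan 14 23:17:01 proxmox CRON[36522]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 14 23:17:01 proxmox CRON[36521]: pam_unix(cron:session): session closed for user root
-- Reboot --
"""

if __name__ == "__main__":
    print(last_entry_before_reboot(SAMPLE))
```

Comparing that timestamp across several freezes can show whether the hangs line up with a scheduled job or happen at arbitrary times.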
 
The logs don't show anything wrong, which usually indicates a hardware problem. Is a physical display connected? When Proxmox freezes, it might show something that could not be written to the logs on disk.
 
I am also encountering the random freeze (crash) problem. The power light stays on, but the server is unavailable and the fan is not running.
The only thing I can do is physically restart the machine. It happens about every three or four days.
Here is my output of pveversion -v:

Code:
proxmox-ve: 7.2-1 (running kernel: 6.1.15-1-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-6.1: 7.3-6
pve-kernel-helper: 7.2-2
pve-kernel-5.15: 7.2-1
pve-kernel-6.1.15-1-pve: 6.1.15-1
pve-kernel-6.0-edge: 6.0.19-1
pve-kernel-6.0.19-edge: 6.0.19-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-1
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1-1.3~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.11-1~bpo11+1

I have tried kernels 6.1.15-1-pve and 5.15.30-2-pve, and the problem occurs with both. I also found some error messages in the syslog after rebooting.
With 6.1.15-1-pve, they look like this:
Code:
BERT: [Hardware Error]: Skipped 1 error records

With 5.15.30-2-pve, they look like this:
Code:
kernel: BERT: Error records from previous boot:
kernel: [Hardware Error]: event severity: fatal
kernel: [Hardware Error]:  Error 0, type: fatal
kernel: [Hardware Error]:   section_type: Firmware Error Record Reference
kernel: [Hardware Error]:   Firmware Error Record Type: SOC Firmware Error Record
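BERT is the ACPI Boot Error Record Table, where firmware leaves error records from the previous boot for the OS to report, so these lines are worth collecting after every freeze to see whether the same record keeps showing up. The Python sketch below pulls the key/value pairs out of "[Hardware Error]" lines; it assumes the kernel messages look like the ones above, and the function names hardware_error_messages and bert_fields are hypothetical.

```python
import re
from typing import Dict, List

# Captures the message after a "[Hardware Error]:" prefix, e.g.
# "kernel: [Hardware Error]:   section_type: Firmware Error Record Reference"
HW_ERROR_RE = re.compile(r"\[Hardware Error\]:\s*(.+)$")


def hardware_error_messages(log_text: str) -> List[str]:
    """Return the message part of every '[Hardware Error]' line."""
    messages = []
    for line in log_text.splitlines():
        match = HW_ERROR_RE.search(line)
        if match:
            messages.append(match.group(1).strip())
    return messages


def bert_fields(log_text: str) -> Dict[str, str]:
    """Split 'key: value' messages into a dict; messages without ': ' are ignored."""
    fields = {}
    for message in hardware_error_messages(log_text):
        key, sep, value = message.partition(": ")
        if sep:
            fields[key] = value
    return fields


# Lines posted above for the 5.15.30-2-pve boot
SAMPLE = """\
kernel: BERT: Error records from previous boot:
kernel: [Hardware Error]: event severity: fatal
kernel: [Hardware Error]:  Error 0, type: fatal
kernel: [Hardware Error]:   section_type: Firmware Error Record Reference
kernel: [Hardware Error]:   Firmware Error Record Type: SOC Firmware Error Record
"""

if __name__ == "__main__":
    for key, value in bert_fields(SAMPLE).items():
        print(f"{key}: {value}")
```

Lines without a "key: value" shape, such as "Skipped 1 error records", are still returned by hardware_error_messages but are left out of the dict.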

Is there any suggestion for solving this? I would really appreciate it.
 