Full system hang every few days

j1a2o

Member
Feb 14, 2021
Nothing is responsive, not even the power button on my machine. I have to pull the plug in order to restart the machine. Everything was working fine until about a week ago.

Nothing shows up in /var/log/syslog.

pve-manager/6.3-6/2184247e (running kernel: 5.4.103-1-pve)
ZFS mirror

Machine is a Ryzen 4650G, MSI B450i motherboard

Everything was stable until I did the Proxmox update that went from ZFS 0.8.5 to ZFS 2.0. I'm highly suspicious of that update.

Anyone know what I should do next?
 
Did you also check "/var/log/syslog.1"?

If even the power button isn't working anymore it sounds more like a hardware problem.
 
I actually meant that syslog didn't show anything meaningful. The log entries between when it froze and when I rebooted it were these:

Code:
Mar 17 16:39:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 17 16:39:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 17 16:40:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 17 16:40:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 17 16:40:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 17 16:41:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 17 16:41:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 17 16:41:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 17 16:42:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 17 16:42:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 17 16:42:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 17 16:43:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 17 16:43:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 17 16:43:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 17 16:44:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 17 16:44:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 17 16:44:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 17 16:45:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 17 16:45:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 17 16:45:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 17 16:45:01 pve CRON[11441]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 17 16:46:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 17 16:46:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 17 16:46:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 17 16:58:53 pve systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Mar 17 16:58:53 pve kernel: [    0.000000] Linux version 5.4.103-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.103-1 (Sun, 07 Mar 2021 15:55:09 +0100) ()
Mar 17 16:58:53 pve kernel: [    0.000000] Command line: initrd=\EFI\proxmox\5.4.103-1-pve\initrd.img-5.4.103-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs amdgpu.exp_hw_support=1
Mar 17 16:58:53 pve kernel: [    0.000000] KERNEL supported cpus:
Mar 17 16:58:53 pve kernel: [    0.000000]   Intel GenuineIntel
Mar 17 16:58:53 pve kernel: [    0.000000]   AMD AuthenticAMD
Mar 17 16:58:53 pve kernel: [    0.000000]   Hygon HygonGenuine
Mar 17 16:58:53 pve kernel: [    0.000000]   Centaur CentaurHauls
Mar 17 16:58:53 pve kernel: [    0.000000]   zhaoxin   Shanghai
Mar 17 16:58:53 pve kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Mar 17 16:58:53 pve kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Mar 17 16:58:53 pve kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Mar 17 16:58:53 pve kernel: [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Mar 17 16:58:53 pve kernel: [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
Mar 17 16:58:53 pve kernel: [    0.000000] BIOS-provided physical RAM map:

If it's a hardware problem, the timing is a remarkable coincidence with the Proxmox update from about 1-2 weeks ago...
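
For anyone who wants to find the hang windows in syslog (or syslog.1) without scrolling, here is a minimal Python sketch that flags gaps between consecutive log timestamps. It relies on the observation from the excerpt above that pvesr.service logs once a minute, so a long silence stands out. The function names, the 5-minute threshold, and the default year are my own assumptions for illustration, not anything Proxmox provides. It also assumes the classic "Mar 17 16:46:00" stamp format and an English locale for month names.

```python
from datetime import datetime, timedelta


def parse_timestamp(line, year=2021):
    """Parse the leading 'Mar 17 16:46:00' stamp of a classic syslog line.

    Classic syslog stamps omit the year, so `year` is an assumption
    supplied by the caller. Returns None if the line has no valid stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=5), year=2021):
    """Return (last_seen, first_after) pairs where the log went silent.

    A gap is reported when two consecutive timestamped lines are more
    than `threshold` apart. The threshold is an arbitrary choice.
    """
    gaps = []
    previous = None
    for line in lines:
        current = parse_timestamp(line, year)
        if current is None:
            continue  # skip lines without a parsable timestamp
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


if __name__ == "__main__":
    # Two lines taken from the excerpt above; in practice, pass the
    # lines of /var/log/syslog or /var/log/syslog.1 instead.
    sample = [
        "Mar 17 16:46:00 pve systemd[1]: Started Proxmox VE replication runner.",
        "Mar 17 16:58:53 pve kernel: [    0.000000] Linux version 5.4.103-1-pve",
    ]
    for last_seen, first_after in find_gaps(sample):
        print(f"silent from {last_seen} to {first_after}")
```

Note that a gap only brackets the hang: the freeze happened some time after the last entry (16:46:00 here), and the end of the gap is simply when the machine was powered back on.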
 
I also have Telegraf logging to InfluxDB, and I don't see any signs of resource issues right before it hangs. Across the 3 instances, CPU load was between 40-60%, CPU temperatures were between 50-60C, and memory utilization was about 80%.
 
