PVE Freezing & ACPI BIOS Errors on Bootup

wltd

New Member
Jan 20, 2023
Receiving the following errors when booting up my Proxmox host. Every 2-3 days I have to reboot my Proxmox host as it stops responding to SSH and WebUI won't work. I am always able to ping it by hostname and IP though.
Any assistance would be much appreciated!
 
I have got similar freezing behavior with similar errors and the latest BIOS:

Code:
Jun 02 14:45:35 pve kernel: snd_hda_intel 0000:00:1f.3: no codecs found!
Jun 02 14:44:34 pve kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20210730/psobject-220)
Jun 02 14:44:34 pve kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.TXHC.RHUB.SS04], AE_NOT_FOUND (20210730/dswload2-162)
Jun 02 14:44:34 pve kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20210730/psobject-220)
Jun 02 14:44:34 pve kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.TXHC.RHUB.SS03], AE_NOT_FOUND (20210730/dswload2-162)
Jun 02 14:44:34 pve kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20210730/psobject-220)
Jun 02 14:44:34 pve kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.TXHC.RHUB.SS02], AE_NOT_FOUND (20210730/dswload2-162)
Jun 02 14:44:34 pve kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20210730/psobject-220)
Jun 02 14:44:34 pve kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.TXHC.RHUB.SS01], AE_NOT_FOUND (20210730/dswload2-162)
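
For anyone comparing boot logs between machines, a small shell sketch like the one below might help boil the ACPI noise down to the list of symbols the firmware failed to resolve. It assumes the default journalctl line format shown above; the function name `acpi_unresolved_symbols` is made up for illustration.

```shell
# Print each unresolved ACPI symbol once, reading kernel log text from stdin.
acpi_unresolved_symbols() {
    grep -o 'Could not resolve symbol \[[^]]*]' \
        | sed 's/^Could not resolve symbol \[//; s/]$//' \
        | sort -u
}

# Possible use on the host (kernel messages of the current boot):
# journalctl -k -b --no-pager | acpi_unresolved_symbols
```

If two boxes that freeze report the same symbols, that at least hints at a shared firmware base, though it does not show that these messages cause the freezes.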

I also tried the opt-in pve-kernel-6.2 kernel, and it crashed with:

Code:
Jun 02 03:21:15 pve kernel: unwind stack type:0 next_sp:0000000000000000 mask:0x2 graph_idx:0
Jun 02 03:21:15 pve kernel: WARNING: kernel stack frame pointer at 00000000805038c2 in CPU 1/KVM:1032 has bad value 00000000fe396124
Jun 02 03:21:15 pve kernel: Call Trace:
Jun 02 03:21:15 pve kernel: PKRU: 55555554
Jun 02 03:21:15 pve kernel: CR2: 0000000000000001 CR3: 000000010a4f2005 CR4: 0000000000772ee0
Jun 02 03:21:15 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 02 03:21:15 pve kernel: FS:  00007f249bdff700(0000) GS:ffff9783af8c0000(0000) knlGS:0000000000000000
Jun 02 03:21:15 pve kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jun 02 03:21:15 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jun 02 03:21:15 pve kernel: RBP: ffffbf998575fbe6 R08: 0000000000000000 R09: 0000000000000000
Jun 02 03:21:15 pve kernel: RDX: 73496b4337ef94e9 RSI: 0000000000000000 RDI: ffff97744a67c800
Jun 02 03:21:15 pve kernel: RAX: 00000000cfddc470 RBX: 0000000000000000 RCX: ffffffff9f6011f7
Jun 02 03:21:15 pve kernel: RSP: 0018:ffffbf998575fc38 EFLAGS: 00010082
Jun 02 03:21:15 pve kernel: Code: e9 f1 0a 00 00 90 0f 01 ca fc e8 97 09 00 00 48 89 c4 48 8d 6c 24 01 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff ff e8>
Jun 02 03:21:15 pve kernel: RIP: 0010:asm_exc_page_fault+0x24/0x30
Jun 02 03:21:15 pve kernel: Hardware name: Default string Default string/Default string, BIOS 1744NP12V10R003S 03/16/2023
Jun 02 03:21:15 pve kernel: CPU: 3 PID: 1032 Comm: CPU 1/KVM Tainted: P           O       6.2.11-2-pve #1
Jun 02 03:21:15 pve kernel: general protection fault, maybe for address 0xcfddc470: 0000 [#1] PREEMPT SMP NOPTI

My spec:

Bash:
# uname -a
Linux pve 5.15.107-2-pve #1 SMP PVE 5.15.107-2 (2023-05-10T09:10Z) x86_64 GNU/Linux

# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-6.2: 7.4-3
pve-kernel-5.15: 7.4-3
pve-kernel-6.2.11-2-pve: 6.2.11-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.0
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-2
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
Currently, with the 5.15.107-2-pve kernel and CPU C-states disabled in the BIOS, my machine has stopped freezing. However, the 6.2 kernel is still buggy and hits the bad pointer error above.

Btw, my CPU is an Intel i7-1265U.
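
Since the BIOS C-state switch comes up repeatedly in this thread, it may be worth checking what the kernel actually exposes after changing it. A minimal sketch, assuming the standard cpuidle sysfs layout; the function name and its optional directory argument are hypothetical and exist only to make the snippet easy to try against a test directory.

```shell
# List the idle states the kernel exposes for one CPU.
# $1 optionally overrides the CPU's sysfs directory (hypothetical, for testing).
list_cstates() {
    cpu_dir=${1:-/sys/devices/system/cpu/cpu0}
    for f in "$cpu_dir"/cpuidle/state*/name; do
        # If the glob matched nothing, $f is the literal pattern and is skipped.
        [ -r "$f" ] && cat "$f"
    done
    return 0
}

# On the host:
# list_cstates
```

An empty or very short list would suggest the setting took effect; a long list would suggest the kernel's idle driver is still using deeper states.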
 
Just jumping in on this thread as I have the exact same issue with the same CPU as you, @guiguan, and have been pulling my hair out trying to figure out what's causing it. However, I am running PVE 8.0.2 with kernel 6.2.16-14-pve.

I've dropped back to 6.2.16-3-pve and disabled C-states, but I've only given it a couple of hours of uptime, so I'm yet to see how stable it is. @guiguan, did you ever update on this CPU/system, or are you still running 5.15.107-2-pve?
 
So far so good. It's past the 24-hour mark, but I'd want to give it at least a week before saying 'I think it's OK'. This specific machine is a 'mini PC' with 4-port Ethernet, which I'm using as a soft router (OPNsense) with NIC passthrough (only 3 of the ports).

In the event it's not, I'd like to run 5.15.107-2-pve. How did you install/get this version, if you don't mind me asking?
 
I think I followed this https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_kernel_pin
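
The pinning step from that wiki section can be sketched roughly as below. `proxmox-boot-tool kernel list` and `proxmox-boot-tool kernel pin` are the subcommands the page describes; the wrapper function and the `PBT` override variable are hypothetical, added only so the check can be dry-run without touching the bootloader. It assumes the wanted kernel package is already installed.

```shell
# Pin a kernel only if the boot tool lists it as installed.
# PBT is a hypothetical override so the logic can be exercised with a stand-in command.
pin_kernel_if_installed() {
    version=$1
    tool=${PBT:-proxmox-boot-tool}
    if "$tool" kernel list | grep -qF -- "$version"; then
        "$tool" kernel pin "$version"
    else
        echo "kernel $version is not installed" >&2
        return 1
    fi
}

# On the host, as root:
# pin_kernel_if_installed 5.15.107-2-pve
# To undo the pin later:
# proxmox-boot-tool kernel unpin
```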
 
I believe I am running into similar issues with my device. I have an Intel 8505 which is freezing every 3 days or so. It becomes completely unresponsive and I have to hard reset it.

I also have very similar messages in my BIOS.

@guiguan are you running Proxmox VE 8, with the 5.X kernel?

Also in general, are there any issues with running PVE 8 with the 5.X kernel?
 
Thanks @guiguan, I'll drop back to that version if I run into any issues, but for the moment it's all running smoothly. @natedogg, have you tried disabling C-states in the BIOS, or maybe updating the BIOS firmware? What version are you running, and what are the specs, type of machine, VMs running, passthroughs, etc.? There are a few things to try.

Currently I have a cluster of 4 nodes with 13th-gen, 12th-gen and Xeon CPUs. Three of them are running the very latest updates (kernel 6.2.16-14-pve) with no problems, and the problematic one is running 6.2.16-3-pve.

However, one thing worth noting (which may or may not be related) concerns the 13th-gen Intel i7-13700 machine. Yesterday I changed the MTU value on its network adapter, which completely froze the machine after about 20 seconds. I then could not boot back into it, as it would freeze again at startup until I removed the changed MTU value in recovery mode. That adapter is the same as the one in my problematic machine, an 'Intel Corporation Ethernet Controller I226-V'. Does yours have that adapter by any chance? I know 12th/13th-gen Intel boards often have these 2.5 Gbps NICs onboard. I mention this because when I was setting up the other machine with OPNsense and passthrough, this adapter completely froze it several times (a VM crash, not PVE, though).
 
@gs800uk, I have two machines running, both on Alder Lake.

One is an N95 with a Realtek NIC. I am running 6.2.16-3 there because of some package dependencies. It seems to be fine with no crashes.

The one I'm having issues with is a 4x I226-V mini router with an Intel 8505. It is running 2x16 GB of 3200 MHz Team Force RAM. I ran memtest for multiple hours and ran stress tests with s-tui, and had no stability issues or errors.

I am running OPNsense, and it is using Linux bridges (no passthrough).

I have not disabled C-states yet. I'm running the latest BIOS, but I think it's kind of old (February of this year?).

What BIOS settings do you have on your machine?
 
Ahh, well, I've just come on here to report that mine has now crashed on 6.2.16-3-pve, unfortunately (after 48 hours of uptime or so), and a forced power-off was needed. So even with C-states disabled and that kernel version, it's still doing it. All other BIOS settings are defaults. I did reach out to the AliExpress seller about newer BIOS firmware, but apparently this is the latest. Not sure I believe them, but who knows. It appears to be an untouched AMI BIOS, which could well have a parameter set that isn't supposed to be enabled for this specific motherboard design/hardware.

Surprisingly enough, by the sounds of it I am running the exact same machine, with 4x I226-V (with passthrough) and OPNsense. The only differences are that I have 64 GB of RAM (I also ran memtest multiple times for over 4 hours) and the Intel i7-1265U. Also worth noting: I use a SATA SSD for the OS and the OPNsense VM, and an NVMe drive for future VMs.

Since I have a subscription I could try the latest stable release, 6.2.16-12, but I'm not sure that will help. Otherwise I revert to PVE 7.4 with the 5.15.107-2-pve kernel. Or maybe I scrap running OPNsense as a VM on Proxmox and run it bare metal. Maybe the VM is crashing the host here, so I will look into the logs a bit deeper.

What settings do you have exactly on your OPNsense VM? Here are mine:

Code:
agent: 1,fstrim_cloned_disks=1
balloon: 0
boot: order=scsi0
cores: 8
cpu: host,flags=+aes
hostpci0: 0000:02:00
hostpci1: 0000:03:00
hostpci2: 0000:04:00
machine: q35
memory: 8096
meta: creation-qemu=8.0.2,ctime=1694698266
name: OPNsense
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: local-lvm:vm-100-disk-0,iothread=1,size=100G
scsihw: virtio-scsi-single
smbios1: uuid=6e96d9ea-5a4c-47fa-b3ad-e83d7e960d0c
sockets: 1
startup: order=1
vmgenid: 0b7f1519-fed3-4c4c-83f8-cf8369d91abe

Maybe I should emulate a different CPU, as I read that FreeBSD is always lagging behind with driver support? That is, of course, if the VM is to blame for the host freezing.

As I am writing this I am going through the journal anyway, and I get this right up until I believe it completely froze:

Code:
Oct 04 20:07:46 softrouter pvedaemon[1048]: <root@pam> starting task UPID:softrouter:000AEEC5:00C993E4:651DB802:vncproxy:450:root@pam:
Oct 04 20:07:46 softrouter pvedaemon[716485]: starting vnc proxy UPID:softrouter:000AEEC5:00C993E4:651DB802:vncproxy:450:root@pam:
Oct 04 20:09:14 softrouter pvedaemon[1048]: <root@pam> end task UPID:softrouter:000AEEC5:00C993E4:651DB802:vncproxy:450:root@pam: OK
Oct 04 20:09:14 softrouter pvedaemon[716968]: starting vnc proxy UPID:softrouter:000AF0A8:00C9B65A:651DB85A:vncproxy:401:root@pam:
Oct 04 20:09:14 softrouter pvedaemon[1047]: <root@pam> starting task UPID:softrouter:000AF0A8:00C9B65A:651DB85A:vncproxy:401:root@pam:
Oct 04 20:09:16 softrouter pvedaemon[1047]: <root@pam> end task UPID:softrouter:000AF0A8:00C9B65A:651DB85A:vncproxy:401:root@pam: OK
Oct 04 20:16:25 softrouter pvedaemon[1049]: <root@pam> successful auth for user 'root@pam'
Oct 04 20:17:01 softrouter CRON[719489]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 04 20:17:01 softrouter CRON[719490]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 04 20:17:01 softrouter CRON[719489]: pam_unix(cron:session): session closed for user root
Oct 04 20:26:11 softrouter pmxcfs[915]: [dcdb] notice: data verification successful
Oct 04 20:56:10 softrouter smartd[697]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 58 to 56
Oct 04 20:56:10 softrouter smartd[697]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 58 to 56
Oct 04 21:17:01 softrouter CRON[738938]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 04 21:17:01 softrouter CRON[738939]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 04 21:17:01 softrouter CRON[738938]: pam_unix(cron:session): session closed for user root
Oct 04 21:18:13 softrouter pvedaemon[1047]: <root@pam> successful auth for user 'root@pam'
Oct 04 21:23:28 softrouter systemd[1]: Starting apt-daily.service - Daily apt download activities...
Oct 04 21:23:28 softrouter systemd[1]: apt-daily.service: Deactivated successfully.
Oct 04 21:23:28 softrouter systemd[1]: Finished apt-daily.service - Daily apt download activities.
Oct 04 21:26:11 softrouter pmxcfs[915]: [dcdb] notice: data verification successful
Oct 04 21:56:10 softrouter smartd[697]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 56 to 58
Oct 04 21:56:10 softrouter smartd[697]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 56 to 58
Oct 04 22:17:01 softrouter CRON[758446]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 04 22:17:01 softrouter CRON[758447]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 04 22:17:01 softrouter CRON[758446]: pam_unix(cron:session): session closed for user root
Oct 04 22:26:10 softrouter smartd[697]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 58 to 59
Oct 04 22:26:10 softrouter smartd[697]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 58 to 59
Oct 04 22:26:11 softrouter pmxcfs[915]: [dcdb] notice: data verification successful
Oct 04 22:30:00 softrouter pmxcfs[915]: [status] notice: received log
Oct 04 22:30:02 softrouter pvescheduler[762671]: <root@pam> starting task UPID:softrouter:000BA330:00D69A7C:651DD95A:vzdump:100:root@pam:
Oct 04 22:30:02 softrouter pvescheduler[762672]: INFO: starting new backup job: vzdump --notes-template '{{guestname}}' --prune-backups 'keep-last=14' --mode snapshot --quiet 1 --storage PBSLOCAL --mailnotific>
Oct 04 22:30:02 softrouter pvescheduler[762672]: INFO: Starting Backup of VM 100 (qemu)
Oct 04 22:30:04 softrouter pmxcfs[915]: [status] notice: received log
Oct 04 22:30:06 softrouter pmxcfs[915]: [status] notice: received log
Oct 04 22:30:11 softrouter pvestatd[1018]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - got timeout
Oct 04 22:30:12 softrouter pvestatd[1018]: status update time (8.826 seconds)
Oct 04 22:30:20 softrouter pvescheduler[762672]: INFO: Finished Backup of VM 100 (00:00:18)
Oct 04 22:30:21 softrouter pvescheduler[762672]: INFO: Backup job finished successfully
Oct 04 23:17:01 softrouter CRON[777996]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 04 23:17:01 softrouter CRON[777997]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 04 23:17:01 softrouter CRON[777996]: pam_unix(cron:session): session closed for user root
Oct 04 23:26:11 softrouter pmxcfs[915]: [dcdb] notice: data verification successful
Oct 05 00:00:28 softrouter systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
Oct 05 00:00:28 softrouter systemd[1]: Starting logrotate.service - Rotate log files...
Oct 05 00:00:28 softrouter systemd[1]: dpkg-db-backup.service: Deactivated successfully.
Oct 05 00:00:28 softrouter systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
Oct 05 00:00:28 softrouter systemd[1]: Reloading pveproxy.service - PVE API Proxy Server...
Oct 05 00:00:28 softrouter pveproxy[792089]: send HUP to 1055
Oct 05 00:00:28 softrouter pveproxy[1055]: received signal HUP
Oct 05 00:00:28 softrouter pveproxy[1055]: server closing
Oct 05 00:00:28 softrouter pveproxy[1055]: server shutdown (restart)

Now, the last few times this happened I did not see 'server shutdown'. I'm not sure what other logs or reports to look at to investigate this further.
 
Actually, there was more to the journal output above, so the shutdown isn't a thing, and it crashed later on, around 01:17:01:

Code:
Oct 05 00:00:28 softrouter systemd[1]: Reloaded spiceproxy.service - PVE SPICE Proxy Serve>
Oct 05 00:00:29 softrouter pvefw-logger[321129]: received terminate request (signal)
Oct 05 00:00:29 softrouter systemd[1]: Stopping pvefw-logger.service - Proxmox VE firewall>
Oct 05 00:00:29 softrouter pvefw-logger[321129]: stopping pvefw logger
Oct 05 00:00:29 softrouter spiceproxy[1062]: restarting server
Oct 05 00:00:29 softrouter spiceproxy[1062]: starting 1 worker(s)
Oct 05 00:00:29 softrouter spiceproxy[1062]: worker 792099 started
Oct 05 00:00:29 softrouter pveproxy[1055]: restarting server
Oct 05 00:00:29 softrouter pveproxy[1055]: starting 3 worker(s)
Oct 05 00:00:29 softrouter pveproxy[1055]: worker 792100 started
Oct 05 00:00:29 softrouter pveproxy[1055]: worker 792101 started
Oct 05 00:00:29 softrouter pveproxy[1055]: worker 792102 started
Oct 05 00:00:29 softrouter systemd[1]: pvefw-logger.service: Deactivated successfully.
Oct 05 00:00:29 softrouter systemd[1]: Stopped pvefw-logger.service - Proxmox VE firewall >
Oct 05 00:00:29 softrouter systemd[1]: pvefw-logger.service: Consumed 15.920s CPU time.
Oct 05 00:00:29 softrouter systemd[1]: Starting pvefw-logger.service - Proxmox VE firewall>
Oct 05 00:00:29 softrouter pvefw-logger[792105]: starting pvefw logger
Oct 05 00:00:29 softrouter systemd[1]: Started pvefw-logger.service - Proxmox VE firewall >
Oct 05 00:00:29 softrouter systemd[1]: logrotate.service: Deactivated successfully.
Oct 05 00:00:29 softrouter systemd[1]: Finished logrotate.service - Rotate log files.
Oct 05 00:00:34 softrouter spiceproxy[321123]: worker exit
Oct 05 00:00:34 softrouter spiceproxy[1062]: worker 321123 finished
Oct 05 00:00:34 softrouter pveproxy[627663]: worker exit
Oct 05 00:00:34 softrouter pveproxy[623461]: worker exit
Oct 05 00:00:34 softrouter pveproxy[646508]: worker exit
Oct 05 00:00:34 softrouter pveproxy[1055]: worker 627663 finished
Oct 05 00:00:34 softrouter pveproxy[1055]: worker 623461 finished
Oct 05 00:00:34 softrouter pveproxy[1055]: worker 646508 finished
Oct 05 00:17:01 softrouter CRON[797502]: pam_unix(cron:session): session opened for user r>
Oct 05 00:17:01 softrouter CRON[797503]: (root) CMD (cd / && run-parts --report /etc/cron.>
Oct 05 00:17:01 softrouter CRON[797502]: pam_unix(cron:session): session closed for user r>
Oct 05 00:24:01 softrouter CRON[799776]: pam_unix(cron:session): session opened for user r>
Oct 05 00:24:01 softrouter CRON[799777]: (root) CMD (if [ $(date +%w) -eq 0 ] && [ -x /usr>
Oct 05 00:24:01 softrouter CRON[799776]: pam_unix(cron:session): session closed for user r>
Oct 05 00:26:11 softrouter pmxcfs[915]: [dcdb] notice: data verification successful
Oct 05 01:17:01 softrouter CRON[816993]: pam_unix(cron:session): session opened for user r>
Oct 05 01:17:01 softrouter CRON[816994]: (root) CMD (cd / && run-parts --report /etc/cron.>
Oct 05 01:17:01 softrouter CRON[816993]: pam_unix(cron:session): session closed for user r>
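
To see whether the freezes cluster around the same time of day, one option is to pull the timestamp of the last journal entry from each boot that ended in a hang. A rough sketch, assuming the default journalctl line format shown above and a persistent journal (otherwise earlier boots are not kept); the function name is made up.

```shell
# Print the time-of-day field of the last non-empty log line read from stdin.
last_log_time() {
    awk 'NF { t = $3 } END { if (t != "") print t }'
}

# On the host, for the boot before the current one:
# journalctl -b -1 -q --no-pager | last_log_time
# journalctl --list-boots shows which boot offsets are available.
```

Note that the last entry only gives a lower bound: the hang happened at some point after it, before the next expected log line.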
 
Your crash seems to be exactly the same as mine. No error messages and the system just hangs. I might try going down to PVE 7.4 and the 5.X Kernel soon. My wife is starting to get mad since the internet keeps going out lol.

I also bought new RAM, that gets here tomorrow. Although I feel like my current ram is fine since it passed all of the tests.

I have it unplugged from the WAN now and waiting to see if it still freezes. Then I'll try shutting down the opnsense VM and see if it still freezes then. Not sure if it is opnsense or proxmox causing the freezes right now.
 
Ok well I just reinstalled PVE 7.4-17 and I'm running kernel 5.15.116-1-pve (latest updates for 7.4 and 5.15 kernel).

I also updated my microcode to 2023_0808 which is the latest for the 8505.
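
Installing a microcode package does not by itself show that the new revision was loaded, so it may be worth reading back what the kernel reports. A minimal sketch, assuming the usual `microcode : 0x...` line in /proc/cpuinfo on x86; the function name and its optional file argument are hypothetical.

```shell
# Print the microcode revision of the first CPU in cpuinfo-formatted text.
# $1 optionally names a file to read instead of /proc/cpuinfo (hypothetical, for testing).
microcode_revision() {
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# On the host, before and after the update plus a reboot:
# microcode_revision
```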

Just finished getting everything set back up, let's see if it is stable now. I also left mitigations on for now.
 
Well that only lasted 8 hours...

Just dropped down to 5.15.107-1-pve, disabled ballooning, and changed the CPU type from host to kvm64.

I'm wondering if having it set to host is causing it to crash the host as well.

The other thing I'm unsure of is whether it will crash without the OPNsense VM running. I guess I could try letting it run for a few days like that.
 
OK @natedogg, I've done more testing myself and made a fair number of changes: mainly BIOS changes, VM settings, and removing the node from the cluster I had running (turns out I won't benefit from clustering after all anyway). I'm running the latest Enterprise repo, but I don't think the problem is down to that; it's more about configuration. I've been running stable for nearly 3 days now, whereas before it was always crashing around 1:17am (no coincidence). It's possible it's not the BIOS at all, but I've left the following disabled/enabled:

CPU Power Management Control
>Race to Halt (RTH) = OFF
>Turbo Mode = OFF
>Turbo settings
>Energy Efficient P-states = OFF
>Package Power Limit MSR = OFF
>Energy Efficient Turbo = OFF
>Config TDP Configuration
>cTDP BIOS Control = Enabled

>C States = OFF
Energy Performance Gain = OFF (I think)
CFG Lock bit = ON
ACPI > Disable
Resize BAR = Disabled
CSM = ON

Also make sure Proxmox is booting in UEFI mode and not legacy.
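
A quick way to check the boot mode from a running host is to look for the EFI directory in sysfs, which the kernel only creates when it was started via UEFI. A minimal sketch; the function name and its optional sysfs-root argument are hypothetical and only there so it can be tried against a test directory.

```shell
# Report whether the running system was booted via UEFI or legacy BIOS.
# $1 optionally overrides the sysfs root (hypothetical, for testing).
boot_mode() {
    if [ -d "${1:-/sys}/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "legacy BIOS"
    fi
}

# On the host:
# boot_mode
```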

My OPNsense VM config (I think you should match this like for like, as it's working for me):

Code:
agent: 1,fstrim_cloned_disks=1
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 6
cpu: x86-64-v2-AES,flags=+aes
efidisk0: local-lvm:vm-100-disk-5,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:02:00
hostpci1: 0000:03:00
hostpci2: 0000:04:00
machine: q35
memory: 8096
meta: creation-qemu=8.0.2,ctime=1694698266
name: OPNsense
numa: 0
onboot: 1
ostype: other
protection: 1
scsi0: local-lvm:vm-100-disk-0,iothread=1,size=100G
scsihw: virtio-scsi-single
smbios1: uuid=6e96d9ea-5a4c-47fa-b3ad-e83d7e960d0c
sockets: 1
startup: order=1
vmgenid: 0b7f1519-fed3-4c4c-83f8-cf8369d91abe

Also, if it's in a cluster, remove it, along with any shared drives (SMB/CIFS)! Just have it isolated from anything else while you test for stability.
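
With several things changed at once, it can help to see exactly which VM settings differ between a crashing and a stable setup. A rough sketch that compares two saved copies of a VM config; the function name and the file names in the usage comment are hypothetical.

```shell
# Show which lines differ between two saved VM configs.
# "- " marks lines only in the old file, "+ " lines only in the new one.
vm_config_changes() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$1" > "$old_sorted"
    LC_ALL=C sort "$2" > "$new_sorted"
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/- /'
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+ /'
    rm -f "$old_sorted" "$new_sorted"
}

# Hypothetical usage, with copies saved before and after the changes
# (on the host the live file is /etc/pve/qemu-server/<vmid>.conf):
# vm_config_changes opnsense-before.conf opnsense-after.conf
```

Comparing the two OPNsense configs posted in this thread by eye gives the same picture: the only differences are `bios`, `cores`, `cpu`, `efidisk0` and `ostype`.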

Let me know how you get on.
 
Really crippling a lot of the energy management / performance of the CPU there just to get stable. I had made a lot of those changes, but I did C0/C1 instead of entirely disabled, and I haven't disabled Turbo mode yet. Will give it a shot for now until my new box shows up.

I opened a return request with the seller, I'm going to get a nicer box from CWWK instead since this one just doesn't seem to be working.

I crashed again on my downgraded versions this morning. My BIOS is pretty locked down too, I would try running the memory a little lower (even though I'm not getting errors) but I can't find any settings to do that.
 
I have the same thought as you regarding energy consumption, but to rule out any BIOS parameter being the culprit I disabled the lot and will come back and re-enable them. If it means anything, I did test the consumption at the wall on my unit, and it pulls 16 W even with these changes in place.

Having said that, I don't believe it's the BIOS settings anyway, as now that I've made the VM config changes and removed it from the cluster etc., it seems to be working fine (if it survives one more night, I think I'm out of the woods with it all). But always crashing at around 1am points to some service running at that time, perhaps in cluster mode, or some instruction in the CPU emulation/passthrough in the VM config (it was on host for me before) causing the host to hang. I changed too many things, so I'm not sure what worked.

Those CWWK boxes look like they sport the exact same board/12th-gen CPU etc. as the unit I got from 'Kingnovy' on AliExpress, only mine has a fan on the heatsink fins. They have the same i7-1265U CPU I have, but CWWK charge nearly £200 more, though they have 2 more ports than my 4. It could well be that they have customised the BIOS firmware for theirs, though.
 
