Proxmox freezes: general protection fault

edgeswe

New Member
Aug 31, 2022
Hi.
I have PVE 7.2 running on an Intel NUC, and the system keeps freezing completely, with no way to interact with the server.
My gut feeling is that it is hardware related, so I ran memtest a couple of times, but it reported no errors.
Could someone take a look at the syslog below and maybe point me in the right direction?
The crash in the log occurred on Aug 28 at 02:45:53.

Thanks!

Code:
Aug 27 21:34:13 pve kernel: perf: interrupt took too long (3372 > 3277), lowering kernel.perf_event_max_sample_rate to 59250
Aug 27 22:17:01 pve CRON[292875]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 27 22:17:01 pve CRON[292876]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 27 22:17:01 pve CRON[292875]: pam_unix(cron:session): session closed for user root
Aug 27 23:17:01 pve CRON[301228]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 27 23:17:01 pve CRON[301229]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 27 23:17:01 pve CRON[301228]: pam_unix(cron:session): session closed for user root
Aug 28 00:00:47 pve systemd[1]: Starting Rotate log files...
Aug 28 00:00:47 pve systemd[1]: Starting Daily man-db regeneration...
Aug 28 00:00:47 pve systemd[1]: Reloading PVE API Proxy Server.
Aug 28 00:00:47 pve systemd[1]: man-db.service: Succeeded.
Aug 28 00:00:47 pve systemd[1]: Finished Daily man-db regeneration.
Aug 28 00:00:47 pve pveproxy[307284]: send HUP to 908
Aug 28 00:00:47 pve pveproxy[908]: received signal HUP
Aug 28 00:00:47 pve pveproxy[908]: server closing
Aug 28 00:00:47 pve pveproxy[908]: server shutdown (restart)
Aug 28 00:00:47 pve systemd[1]: Reloaded PVE API Proxy Server.
Aug 28 00:00:47 pve systemd[1]: Reloading PVE SPICE Proxy Server.
Aug 28 00:00:48 pve spiceproxy[307305]: send HUP to 914
Aug 28 00:00:48 pve spiceproxy[914]: received signal HUP
Aug 28 00:00:48 pve spiceproxy[914]: server closing
Aug 28 00:00:48 pve spiceproxy[914]: server shutdown (restart)
Aug 28 00:00:48 pve systemd[1]: Reloaded PVE SPICE Proxy Server.
Aug 28 00:00:48 pve systemd[1]: Stopping Proxmox VE firewall logger...
Aug 28 00:00:48 pve pvefw-logger[547]: received terminate request (signal)
Aug 28 00:00:48 pve pvefw-logger[547]: stopping pvefw logger
Aug 28 00:00:48 pve systemd[1]: pvefw-logger.service: Succeeded.
Aug 28 00:00:48 pve systemd[1]: Stopped Proxmox VE firewall logger.
Aug 28 00:00:48 pve systemd[1]: pvefw-logger.service: Consumed 9.666s CPU time.
Aug 28 00:00:48 pve systemd[1]: Starting Proxmox VE firewall logger...
Aug 28 00:00:48 pve systemd[1]: Started Proxmox VE firewall logger.
Aug 28 00:00:48 pve pvefw-logger[307314]: starting pvefw logger
Aug 28 00:00:48 pve systemd[1]: rsyslog.service: Sent signal SIGHUP to main process 24797 (rsyslogd) on client request.
Aug 28 00:00:48 pve systemd[1]: logrotate.service: Succeeded.
Aug 28 00:00:48 pve systemd[1]: Finished Rotate log files.
Aug 28 00:00:48 pve spiceproxy[914]: restarting server
Aug 28 00:00:48 pve spiceproxy[914]: starting 1 worker(s)
Aug 28 00:00:48 pve spiceproxy[914]: worker 307323 started
Aug 28 00:00:48 pve pveproxy[908]: restarting server
Aug 28 00:00:48 pve pveproxy[908]: starting 3 worker(s)
Aug 28 00:00:48 pve pveproxy[908]: worker 307324 started
Aug 28 00:00:48 pve pveproxy[908]: worker 307325 started
Aug 28 00:00:48 pve pveproxy[908]: worker 307326 started
Aug 28 00:00:53 pve spiceproxy[915]: worker exit
Aug 28 00:00:53 pve spiceproxy[914]: worker 915 finished
Aug 28 00:00:53 pve pveproxy[250579]: worker exit
Aug 28 00:00:53 pve pveproxy[249594]: worker exit
Aug 28 00:00:53 pve pveproxy[276568]: worker exit
Aug 28 00:00:53 pve pveproxy[908]: worker 250579 finished
Aug 28 00:00:53 pve pveproxy[908]: worker 276568 finished
Aug 28 00:00:53 pve pveproxy[908]: worker 249594 finished
Aug 28 00:10:48 pve rsyslogd[24797]: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="24797" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Aug 28 00:17:01 pve CRON[309588]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 28 00:17:01 pve CRON[309589]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 28 00:17:01 pve CRON[309588]: pam_unix(cron:session): session closed for user root
Aug 28 00:34:58 pve IPCC.xs[248115]: pam_unix(proxmox-ve-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=  user=root
Aug 28 00:35:00 pve pvedaemon[248115]: authentication failure; rhost=::ffff:192.168.50.165 user=root@pam msg=Authentication failure
Aug 28 01:17:01 pve CRON[317925]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 28 01:17:01 pve CRON[317926]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 28 01:17:01 pve CRON[317925]: pam_unix(cron:session): session closed for user root
Aug 28 02:17:01 pve CRON[326274]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 28 02:17:01 pve CRON[326275]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 28 02:17:01 pve CRON[326274]: pam_unix(cron:session): session closed for user root
Aug 28 02:45:53 pve kernel: general protection fault, probably for non-canonical address 0x6964227b203a224b: 0000 [#1] SMP PTI
Aug 28 02:45:53 pve kernel: CPU: 2 PID: 186398 Comm: kvm Tainted: P           O      5.15.30-2-pve #1
Aug 28 02:45:53 pve kernel: Hardware name: Intel Corporation NUC7i5BNH/NUC7i5BNB, BIOS BNKBL357.86A.0085.2021.0901.1844 09/01/2021
Aug 28 02:45:53 pve kernel: RIP: 0010:add_wait_queue+0x48/0x80
Aug 28 02:45:53 pve kernel: Code: 08 49 8d 7c 24 08 49 89 c0 48 39 cf 74 47 48 8d 51 e8 48 89 fe eb 13 48 8b 42 18 48 89 ce 48 8d 50 e8 48 39 c7 74 08 48 89 c1 <f6> 02 20 75 e8 48 8b 16 48 8d 43 18 4c 89 e7 48 89 42 08 48 89 73
Aug 28 02:45:53 pve kernel: RSP: 0018:ffffbe8ec18e3990 EFLAGS: 00010092
Aug 28 02:45:53 pve kernel: RAX: 6964227b203a2263 RBX: ffff96fe2db140e0 RCX: 6964227b203a2263
Aug 28 02:45:53 pve kernel: RDX: 6964227b203a224b RSI: ffff96fd568900f8 RDI: ffff96fd4c7e0d10
Aug 28 02:45:53 pve kernel: RBP: ffffbe8ec18e39a0 R08: 0000000000000246 R09: ffff96fe14750130
Aug 28 02:45:53 pve kernel: R10: ffff96feb6d2d988 R11: ffff96feb6d37590 R12: ffff96fd4c7e0d08
Aug 28 02:45:53 pve kernel: R13: ffff96fd4c7e0d08 R14: ffff96fe2db14000 R15: ffffbe8ec18e3acc
Aug 28 02:45:53 pve kernel: FS:  00007f757202d200(0000) GS:ffff96feb6d00000(0000) knlGS:0000000000000000
Aug 28 02:45:53 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 28 02:45:53 pve kernel: CR2: 00007fd61c000010 CR3: 0000000103f72002 CR4: 00000000003726e0
Aug 28 02:45:53 pve kernel: Call Trace:
Aug 28 02:45:53 pve kernel:  <TASK>
Aug 28 02:45:53 pve kernel:  __pollwait+0x7d/0xd0
Aug 28 02:45:53 pve kernel:  eventfd_poll+0x2f/0x70
Aug 28 02:45:53 pve kernel:  do_sys_poll+0x2ba/0x680
Aug 28 02:45:53 pve kernel:  ? poll_initwait+0x40/0x40
Aug 28 02:45:53 pve kernel:  ? __pollwait+0xd0/0xd0
Aug 28 02:45:53 pve kernel:  ? __pollwait+0xd0/0xd0
Aug 28 02:45:53 pve kernel:  ? __pollwait+0xd0/0xd0
Aug 28 02:45:53 pve kernel:  ? __pollwait+0xd0/0xd0
Aug 28 02:45:53 pve kernel:  ? __pollwait+0xd0/0xd0
Aug 28 02:45:53 pve kernel:  ? __pollwait+0xd0/0xd0
Aug 28 02:45:53 pve kernel:  ? __pollwait+0xd0/0xd0
Aug 28 02:45:53 pve kernel:  ? __pollwait+0xd0/0xd0
Aug 28 02:45:53 pve kernel:  ? __pollwait+0xd0/0xd0
Aug 28 02:45:53 pve kernel:  ? recalibrate_cpu_khz+0x10/0x10
Aug 28 02:45:53 pve kernel:  __x64_sys_ppoll+0xbc/0x150
Aug 28 02:45:53 pve kernel:  do_syscall_64+0x5c/0xc0
Aug 28 02:45:53 pve kernel:  ? exit_to_user_mode_prepare+0x37/0x1b0
Aug 28 02:45:53 pve kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Aug 28 02:45:53 pve kernel:  ? __x64_sys_read+0x1a/0x20
Aug 28 02:45:53 pve kernel:  ? do_syscall_64+0x69/0xc0
Aug 28 02:45:53 pve kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Aug 28 02:45:53 pve kernel:  ? __x64_sys_write+0x1a/0x20
Aug 28 02:45:53 pve kernel:  ? do_syscall_64+0x69/0xc0
Aug 28 02:45:53 pve kernel:  ? do_syscall_64+0x69/0xc0
Aug 28 02:45:53 pve kernel:  ? do_syscall_64+0x69/0xc0
Aug 28 02:45:53 pve kernel:  ? asm_sysvec_apic_timer_interrupt+0xa/0x20
Aug 28 02:45:53 pve kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Aug 28 02:45:53 pve kernel: RIP: 0033:0x7f757c9fc4f6
Aug 28 02:45:53 pve kernel: Code: 7c 24 08 e8 ac 18 f9 ff 4c 8b 54 24 18 48 8b 74 24 10 41 b8 08 00 00 00 41 89 c1 48 8b 7c 24 08 4c 89 e2 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2a 44 89 cf 89 44 24 08 e8 d6 18 f9 ff 8b 44
Aug 28 02:45:53 pve kernel: RSP: 002b:00007ffc8f3a0330 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
Aug 28 02:45:53 pve kernel: RAX: ffffffffffffffda RBX: 000055af138a3d30 RCX: 00007f757c9fc4f6
Aug 28 02:45:53 pve kernel: RDX: 00007ffc8f3a0350 RSI: 0000000000000050 RDI: 000055af14734400
Aug 28 02:45:53 pve kernel: RBP: 00007ffc8f3a03bc R08: 0000000000000008 R09: 0000000000000000
Aug 28 02:45:53 pve kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffc8f3a0350
Aug 28 02:45:53 pve kernel: R13: 000055af138a3d30 R14: 00007ffc8f3a03c0 R15: 0000000000000000
Aug 28 02:45:53 pve kernel:  </TASK>
Aug 28 02:45:53 pve kernel: Modules linked in: veth tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_rapl_msr snd_soc_skl intel_rapl_common snd_soc_hdac_hda intel_tcc_cooling x86_pkg_temp_thermal snd_hda_ext_core intel_powerclamp snd_soc_sst_ipc coretemp snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core kvm_intel snd_compress ac97_bus snd_pcm_dmaengine kvm i915 irqbypass snd_hda_intel crct10dif_pclmul snd_intel_dspcfg mei_hdcp ghash_clmulni_intel iwlmvm ttm snd_intel_sdw_acpi mac80211 btusb btrtl snd_hda_codec libarc4 aesni_intel btbcm snd_hda_core crypto_simd snd_hwdep cryptd btintel snd_pcm drm_kms_helper rapl snd_timer iwlwifi intel_cstate bluetooth cec intel_wmi_thunderbolt rc_core snd mei_me pcspkr wmi_bmof efi_pstore ecdh_generic i2c_algo_bit mei
Aug 28 02:45:53 pve kernel:  intel_xhci_usb_role_switch ecc soundcore fb_sys_fops ee1004 cfg80211 syscopyarea sysfillrect sysimgblt intel_pch_thermal mac_hid acpi_pad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi drm scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c rtsx_pci_sdmmc crc32_pclmul i2c_i801 e1000e rtsx_pci i2c_smbus xhci_pci xhci_pci_renesas xhci_hcd ahci libahci wmi video
Aug 28 02:45:53 pve kernel: ---[ end trace ee4436259ec8b61f ]---
Aug 28 02:45:53 pve kernel: RIP: 0010:add_wait_queue+0x48/0x80
Aug 28 02:45:53 pve kernel: Code: 08 49 8d 7c 24 08 49 89 c0 48 39 cf 74 47 48 8d 51 e8 48 89 fe eb 13 48 8b 42 18 48 89 ce 48 8d 50 e8 48 39 c7 74 08 48 89 c1 <f6> 02 20 75 e8 48 8b 16 48 8d 43 18 4c 89 e7 48 89 42 08 48 89 73
Aug 28 02:45:53 pve kernel: RSP: 0018:ffffbe8ec18e3990 EFLAGS: 00010092
Aug 28 02:45:53 pve kernel: RAX: 6964227b203a2263 RBX: ffff96fe2db140e0 RCX: 6964227b203a2263
Aug 28 02:45:53 pve kernel: RDX: 6964227b203a224b RSI: ffff96fd568900f8 RDI: ffff96fd4c7e0d10
Aug 28 02:45:53 pve kernel: RBP: ffffbe8ec18e39a0 R08: 0000000000000246 R09: ffff96fe14750130
Aug 28 02:45:53 pve kernel: R10: ffff96feb6d2d988 R11: ffff96feb6d37590 R12: ffff96fd4c7e0d08
Aug 28 02:45:53 pve kernel: R13: ffff96fd4c7e0d08 R14: ffff96fe2db14000 R15: ffffbe8ec18e3acc
Aug 28 02:45:53 pve kernel: FS:  00007f757202d200(0000) GS:ffff96feb6d00000(0000) knlGS:0000000000000000
Aug 28 02:45:53 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 28 02:45:53 pve kernel: CR2: 00007fd61c000010 CR3: 0000000103f72002 CR4: 00000000003726e0
Aug 28 02:45:55 pve kernel: BUG: unable to handle page fault for address: ffffbe8ec18e3b78
Aug 28 02:45:55 pve kernel: #PF: supervisor read access in kernel mode
Aug 28 02:45:55 pve kernel: #PF: error_code(0x0000) - not-present page
-- Reboot --
Aug 28 07:13:14 pve kernel: Linux version 5.15.30-2-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.30-3 (Fri, 22 Apr 2022 18:08:27 +0200) ()
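
One detail in the trace above is worth decoding. The faulting address (RDX) and the RAX/RCX values are not kernel pointers; read as little-endian bytes they are printable ASCII. The following is a minimal Python sketch of that check. The helper names `decode_register` and `is_canonical` are made up for illustration, and 48-bit virtual addressing (4-level paging) is assumed.

```python
import struct


def decode_register(value: int) -> str:
    """Show a 64-bit register value as the 8 bytes it occupies in memory.

    x86-64 is little-endian, so the lowest byte comes first.
    Non-printable bytes are shown as '.'.
    """
    raw = struct.pack("<Q", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


def is_canonical(value: int) -> bool:
    """Assumes 48-bit virtual addresses: bits 63..47 must all be equal."""
    top = value >> 47
    return top == 0 or top == 0x1FFFF


# Values copied from the register dump in the log above.
registers = {
    "RDX (faulting address)": 0x6964227B203A224B,
    "RAX": 0x6964227B203A2263,
    "RBX (for comparison)": 0xFFFF96FE2DB140E0,
}

for name, value in registers.items():
    print(f"{name}: canonical={is_canonical(value)} bytes={decode_register(value)!r}")
```

RDX decodes to `K": {"di` and RAX to `c": {"di` (RDX is simply RAX minus 0x18), which reads like a fragment of JSON text. That would be consistent with a list pointer in the wait queue having been overwritten with string data. It only narrows the problem down to memory corruption; on its own it cannot tell faulty hardware from a software bug.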
 
Some basics I would do:
  1. Update the BIOS/UEFI (yours is out of date)
  2. Update the SSD firmware, if available
  3. Update the PVE host (yours is out of date; see [1] if you do not have a subscription)
  4. Install the Intel microcode package [2]

[1] https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_no_subscription_repo
[2] https://wiki.debian.org/Microcode

I think I have already updated the BIOS to the latest version, and I am also running the latest PVE. How can you see that it is out of date?

I will try to update the SSD firmware and install the Intel microcode package.
Thank you so much for your help!
 
Your BIOS version: BNKBL357.86A.0085
Recent BIOS version: BNKBL357.86A.0088

Your PVE kernel version: 5.15.30-2-pve
Recent PVE kernel version: 5.15.39-4-pve

As I said, if you do not have an active subscription, you need to disable/comment out the pve-enterprise repository [1] (see the note there) and add the pve-no-subscription repository [2].
Once this is done, update [3] the PVE host and reboot it afterwards.

[1] https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_enterprise_repo
[2] https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_no_subscription_repo
[3] https://pve.proxmox.com/wiki/System_Software_Updates
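
Both versions quoted above can be read straight from the crash log: the BIOS string is in the `Hardware name:` line and the kernel release is in the `CPU: ... Tainted:` line. A rough Python sketch of extracting and comparing them follows. The function names and regular expressions are illustrative assumptions, not part of any Proxmox tool, and the "recent" values are simply the ones named in this post.

```python
import re

# Two lines copied from the syslog excerpt earlier in the thread.
HW_LINE = ("Aug 28 02:45:53 pve kernel: Hardware name: Intel Corporation "
           "NUC7i5BNH/NUC7i5BNB, BIOS BNKBL357.86A.0085.2021.0901.1844 09/01/2021")
CPU_LINE = ("Aug 28 02:45:53 pve kernel: CPU: 2 PID: 186398 Comm: kvm "
            "Tainted: P           O      5.15.30-2-pve #1")

BIOS_RE = re.compile(r"BIOS ([A-Z0-9]+\.[A-Z0-9]+\.(\d+))\.")
KERNEL_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)-pve")


def bios_release(line):
    """Return (version string, numeric release) from a 'Hardware name' line."""
    m = BIOS_RE.search(line)
    return (m.group(1), int(m.group(2))) if m else None


def kernel_tuple(text):
    """Return the PVE kernel release as a tuple of ints for comparison."""
    m = KERNEL_RE.search(text)
    return tuple(int(g) for g in m.groups()) if m else None


installed_bios = bios_release(HW_LINE)
installed_kernel = kernel_tuple(CPU_LINE)

# "Recent" versions as stated in the post above.
recent_bios = 88
recent_kernel = kernel_tuple("5.15.39-4-pve")

print("BIOS:", installed_bios[0], "outdated:", installed_bios[1] < recent_bios)
print("Kernel:", installed_kernel, "outdated:", installed_kernel < recent_kernel)
```

Comparing the kernel release as a tuple of integers avoids the pitfalls of plain string comparison.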
 
OK. I actually did have the latest BIOS, but I updated it after the log was taken (and I still experience freezes with the latest BIOS as well).
I have now updated to the latest PVE kernel and installed the microcode package.
I have some trouble upgrading the SSD firmware, so I will hold off on that and try the other changes first.
I also disabled the onboard Bluetooth and Wi-Fi adapters to reduce possible hardware-related issues.
Thank you so much!
 
New crash tonight, this time without any entries in the syslog :(
Are there any other logs on the system that I can check to figure out the root cause?
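
When a freeze leaves little behind, a first step is to pull out whatever was logged last before each restart. Here is a small Python sketch of that, assuming the log text contains the `-- Reboot --` marker lines seen in the excerpts in this thread. The function name `tail_before_reboots` is hypothetical.

```python
def tail_before_reboots(lines, context=3, marker="-- Reboot --"):
    """For each reboot marker, return the last `context` lines logged before it."""
    tails = []
    for i, line in enumerate(lines):
        if line.strip() == marker:
            tails.append(lines[max(0, i - context):i])
    return tails


# Sample taken from the end of the first log excerpt in this thread.
sample = """\
Aug 28 02:45:53 pve kernel: CR2: 00007fd61c000010 CR3: 0000000103f72002 CR4: 00000000003726e0
Aug 28 02:45:55 pve kernel: BUG: unable to handle page fault for address: ffffbe8ec18e3b78
Aug 28 02:45:55 pve kernel: #PF: supervisor read access in kernel mode
Aug 28 02:45:55 pve kernel: #PF: error_code(0x0000) - not-present page
-- Reboot --
Aug 28 07:13:14 pve kernel: Linux version 5.15.30-2-pve
""".splitlines()

for tail in tail_before_reboots(sample):
    print("\n".join(tail))
    print("---")
```

If the tail shows only routine messages, the kernel most likely stopped before it could write anything to disk, and the log files alone will not reveal the cause.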
 
  • When the freezes/crashes happen, is anything load-producing going on? Scheduled backups, or something inside the VMs?
  • How much RAM does the system have? Do you maybe overcommit it?
  • Which filesystem(s) do you use on the host?
  • Are all the temperatures okay in any load condition?

Otherwise I personally would start the usual process of elimination now:
  • Try (a) different Linux distro(s)
  • Try a Windows installation!
  • Stress test all components in the system with those

Have you run this system before without problems? If yes, with which OS?
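
For the temperature question, reading the kernel's thermal zones gives numbers that can be logged over time rather than checked once. This is a minimal Python sketch, assuming the standard Linux sysfs layout under `/sys/class/thermal`, where values are reported in millidegrees Celsius. The function name `read_zone_temps` is made up here, and which zones exist depends on the hardware.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone name and type: temperature in degrees Celsius}."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            milli = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones cannot be read; skip them instead of failing.
            continue
        temps[f"{zone.name}:{kind}"] = milli / 1000.0
    return temps


if __name__ == "__main__":
    for name, celsius in read_zone_temps().items():
        print(f"{name}: {celsius:.1f} C")
```

Running this periodically (for example from cron) and keeping the output would show whether temperatures climb shortly before a freeze.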
 
There is basically no load on the system. It only runs two VMs (Home Assistant OS and Ubuntu Server), with only a few percent CPU load and not much memory used (about 60% free). All temperatures look fine.

In the past I ran Home Assistant OS (I think it is based on Alpine), and I had exactly the same behaviour with crashes. At that point I thought it was software related, which is why I started using Proxmox (I thought that in that case maybe only the VM would freeze).
The NUC has in the past run Windows without any problems.
 
To be sure, you could test Windows again; if the system runs stably with it, this looks like a compatibility issue with the Linux kernel and/or a driver, or something else Linux does differently from Windows on your system.
But I have absolutely no clue how to troubleshoot this, sorry. :(

Hopefully someone else has an idea.
 
I had just started to believe that the issue was gone, but then I got another system freeze, with this in the syslog:

Sep 09 18:02:49 pve kernel: BUG: unable to handle page fault for address: ffffffb146fa6838
Sep 09 18:02:49 pve kernel: #PF: supervisor write access in kernel mode
Sep 09 18:02:49 pve kernel: #PF: error_code(0x0002) - not-present page
-- Reboot --
Sep 09 19:08:22 pve kernel: Linux version 5.15.53-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.53-1 (Fri, 26 Aug 2022 16:53:52 +0200) ()

Any ideas?
 
