enable/install crash dump

xokia

Member
Apr 8, 2023
I have a 13900 CPU system that crashes fairly regularly. If I disable C-states, the crashes go away. With C-states enabled, it crashes about once a day.

I'd like to capture why the system is crashing or hanging, but I can't seem to install linux-crashdump:

root@HOME-SERVER:~# sudo apt-get install linux-crashdump
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package linux-crashdump

Does Proxmox log crashes or hangs? Does it have any crash-logging facility?

Any help on how to capture a crash and debug it would be appreciated.

Under Windows 10 and Windows 11 this system is stable, i.e. a non-Proxmox installation can stay up for weeks with no issues.
I have run a memory stress test and found no memory issues.
GPU is the Intel iGPU.
Motherboard is an ASUS B760-I.
Audio has been disabled in the BIOS.
Wi-Fi has been disabled in the BIOS.
LAN is Intel.
No PCIe cards.

Fairly basic setup.
 
Proxmox crashes even with nothing else running if C-states are not disabled :(
There seems to be an issue here, but I don't know what it is.
 
I don't know, sorry, but maybe there are error messages or clues in journalctl (Syslog in GUI)?
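A minimal sketch of that journalctl check, for anyone following along. The helper name kernel_faults and its marker list are illustrative assumptions, not a Proxmox tool. It keeps only the journal lines that typically mark a kernel-reported fault, and it assumes the journal is persistent so that the boot before the crash is still available.

```shell
#!/usr/bin/env bash
# Keep only kernel-logged lines that usually mark a fault: oopses, bad page
# state, soft lockups, protection faults. Reads journal text on stdin.
# The marker list is an assumption and is not exhaustive.
kernel_faults() {
    grep -E 'kernel: .*(BUG:|Oops:|general protection fault|soft lockup)'
}

# Usage on the host, for the boot before the current one:
#   journalctl -b -1 --no-pager | kernel_faults
```

Note that this also matches userland protection faults that the kernel reports under "traps:", so the first hit is not necessarily a kernel bug.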
That was useful, thank you. It looks like some cron job starts while the CPU is probably in a C-state, and then the system thinks the CPU is stuck and crashes. That is just my guess.

May 23 08:09:43 HOME-SERVER pvescheduler[1913]: starting server
May 23 08:09:43 HOME-SERVER systemd[1]: Started Proxmox VE scheduler.
May 23 08:09:43 HOME-SERVER systemd[1]: Reached target Multi-User System.
May 23 08:09:43 HOME-SERVER systemd[1]: Reached target Graphical Interface.
May 23 08:09:43 HOME-SERVER systemd[1]: Starting Update UTMP about System Runlevel Changes...
May 23 08:09:43 HOME-SERVER systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
May 23 08:09:43 HOME-SERVER systemd[1]: Finished Update UTMP about System Runlevel Changes.
May 23 08:09:43 HOME-SERVER systemd[1]: Startup finished in 46.208s (firmware) + 5.454s (loader) + 3.333s (kernel) + 52.603s (userspace) = 1min 47.599s.
May 23 08:17:01 HOME-SERVER CRON[3402]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 23 08:17:01 HOME-SERVER CRON[3403]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 23 08:17:01 HOME-SERVER CRON[3402]: pam_unix(cron:session): session closed for user root
May 23 08:23:52 HOME-SERVER systemd[1]: Starting Cleanup of Temporary Directories...
May 23 08:23:52 HOME-SERVER systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
May 23 08:23:52 HOME-SERVER systemd[1]: Finished Cleanup of Temporary Directories.
May 23 09:17:01 HOME-SERVER CRON[12955]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 23 09:17:01 HOME-SERVER CRON[12956]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 23 09:17:01 HOME-SERVER CRON[12955]: pam_unix(cron:session): session closed for user root
May 23 10:17:01 HOME-SERVER CRON[22448]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 23 10:17:01 HOME-SERVER CRON[22449]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 23 10:17:01 HOME-SERVER CRON[22448]: pam_unix(cron:session): session closed for user root
May 23 10:37:01 HOME-SERVER kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
May 23 10:37:01 HOME-SERVER kernel: #PF: supervisor write access in kernel mode
May 23 10:37:01 HOME-SERVER kernel: #PF: error_code(0x0002) - not-present page
May 23 10:37:01 HOME-SERVER kernel: PGD 0 P4D 0
May 23 10:37:01 HOME-SERVER kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
May 23 10:37:01 HOME-SERVER kernel: CPU: 6 PID: 25595 Comm: pvescheduler Tainted: P O 6.2.11-2-pve #1
May 23 10:37:01 HOME-SERVER kernel: Hardware name: ASUS System Product Name/ROG STRIX B760-I GAMING WIFI, BIOS 1003 04/14/2023
May 23 10:37:01 HOME-SERVER kernel: RIP: 0010:get_page_from_freelist+0x1d8/0x10e0
May 23 10:37:01 HOME-SERVER kernel: Code: 48 8b 43 18 49 39 c6 0f 84 4a 0a 00 00 48 8b 43 18 48 8b 08 48 8b 50 08 48 8d 70 f8 48 89 75 c0 48 be 00 01 00 00 00 00 ad de <48> 89 51 08 48 89 0a 48 89 30 48 83 c6 22 48 89 70 0>
May 23 10:37:01 HOME-SERVER kernel: RSP: 0000:ffffa70d07327a38 EFLAGS: 00010287
May 23 10:37:01 HOME-SERVER kernel: RAX: fffff86e44283208 RBX: ffff8feb7f1b7b90 RCX: 0000000000000000
May 23 10:37:01 HOME-SERVER kernel: RDX: ffff8feb7f1b7ba8 RSI: dead000000000100 RDI: ffff8feb7f1b7b80
May 23 10:37:01 HOME-SERVER kernel: RBP: ffffa70d07327b50 R08: 0000000000000000 R09: 0000000000000000
May 23 10:37:01 HOME-SERVER kernel: R10: ffff8fdc414f13b0 R11: ffff8fdc414f13b0 R12: ffff8feb7f1b7b80
May 23 10:37:01 HOME-SERVER kernel: R13: 0000000000000000 R14: ffff8feb7f1b7ba8 R15: ffff8febbf7d5c00
May 23 10:37:01 HOME-SERVER kernel: FS: 00007fbd86cd2280(0000) GS:ffff8feb7f180000(0000) knlGS:0000000000000000
May 23 10:37:01 HOME-SERVER kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 23 10:37:01 HOME-SERVER kernel: CR2: 0000000000000008 CR3: 0000000183928005 CR4: 0000000000770ee0
May 23 10:37:01 HOME-SERVER kernel: PKRU: 55555554
May 23 10:37:01 HOME-SERVER kernel: Call Trace:
May 23 10:37:01 HOME-SERVER kernel: <TASK>
May 23 10:37:01 HOME-SERVER kernel: ? post_alloc_hook+0xd5/0x120
May 23 10:37:01 HOME-SERVER kernel: ? get_page_from_freelist+0x73e/0x10e0
May 23 10:37:01 HOME-SERVER kernel: ? drain_stock+0x6b/0xb0
May 23 10:37:01 HOME-SERVER kernel: __alloc_pages+0x1c8/0x1230
May 23 10:37:01 HOME-SERVER kernel: ? mod_memcg_state+0x19/0x30
May 23 10:37:01 HOME-SERVER kernel: ? memcg_account_kmem+0x1d/0x50
May 23 10:37:01 HOME-SERVER kernel: ? __memcg_kmem_charge_page+0x1ab/0x280
May 23 10:37:01 HOME-SERVER kernel: ? __mod_memcg_lruvec_state+0x67/0xf0
May 23 10:37:01 HOME-SERVER kernel: __folio_alloc+0x1b/0x50
May 23 10:37:01 HOME-SERVER kernel: ? policy_node+0x5b/0x80
May 23 10:37:01 HOME-SERVER kernel: vma_alloc_folio+0xab/0x400
May 23 10:37:01 HOME-SERVER kernel: ? page_remove_rmap+0x131/0x4b0
May 23 10:37:01 HOME-SERVER kernel: do_wp_page+0x28a/0xab0
May 23 10:37:01 HOME-SERVER kernel: __handle_mm_fault+0xa46/0x1140
May 23 10:37:01 HOME-SERVER kernel: handle_mm_fault+0x110/0x330
May 23 10:37:01 HOME-SERVER kernel: do_user_addr_fault+0x1be/0x710
May 23 10:37:01 HOME-SERVER kernel: ? exit_to_user_mode_prepare+0x37/0x180
May 23 10:37:01 HOME-SERVER kernel: exc_page_fault+0x76/0x180
May 23 10:37:01 HOME-SERVER kernel: asm_exc_page_fault+0x27/0x30
May 23 10:37:01 HOME-SERVER kernel: RIP: 0033:0x55701aea2a21
May 23 10:37:01 HOME-SERVER kernel: Code: fe 48 8b 50 20 e8 3f ff fc ff 48 8d 3d 98 2b 2e 00 e8 83 15 fa ff 85 c0 0f 85 f7 0d 00 00 49 8b 06 48 8b 40 30 48 85 c0 74 0c <48> 83 40 18 01 49 8b 06 48 8b 40 30 49 8b 17 48 8d 3>
May 23 10:37:01 HOME-SERVER kernel: RSP: 002b:00007ffd8d02b3c0 EFLAGS: 00010206
May 23 10:37:01 HOME-SERVER kernel: RAX: 000055701d9e7cb8 RBX: 000055701ce552a0 RCX: 0000000000000000
May 23 10:37:01 HOME-SERVER kernel: RDX: 0000000000000000 RSI: 000055701d578508 RDI: 000055701b1855a0
May 23 10:37:01 HOME-SERVER kernel: RBP: 000055701ce552a0 R08: 000055701ce889e0 R09: 000055701b1265e0
May 23 10:37:01 HOME-SERVER kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 000055701d9fe190
May 23 10:37:01 HOME-SERVER kernel: R13: 0000000000000000 R14: 000055701d9fe190 R15: 0000557022e766d0
May 23 10:37:01 HOME-SERVER kernel: </TASK>
May 23 10:37:01 HOME-SERVER kernel: Modules linked in: cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nf>
May 23 10:37:01 HOME-SERVER kernel: hwmon_vid coretemp vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb hid_generic usbhid hi>
May 23 10:37:01 HOME-SERVER kernel: CR2: 0000000000000008
May 23 10:37:01 HOME-SERVER kernel: ---[ end trace 0000000000000000 ]---
May 23 10:37:01 HOME-SERVER kernel: pstore: backend (efi_pstore) writing error (-5)
May 23 10:37:01 HOME-SERVER kernel: RIP: 0010:get_page_from_freelist+0x1d8/0x10e0
May 23 10:37:01 HOME-SERVER kernel: Code: 48 8b 43 18 49 39 c6 0f 84 4a 0a 00 00 48 8b 43 18 48 8b 08 48 8b 50 08 48 8d 70 f8 48 89 75 c0 48 be 00 01 00 00 00 00 ad de <48> 89 51 08 48 89 0a 48 89 30 48 83 c6 22 48 89 70 0>
May 23 10:37:01 HOME-SERVER kernel: RSP: 0000:ffffa70d07327a38 EFLAGS: 00010287
May 23 10:37:01 HOME-SERVER kernel: RAX: fffff86e44283208 RBX: ffff8feb7f1b7b90 RCX: 0000000000000000
May 23 10:37:01 HOME-SERVER kernel: RDX: ffff8feb7f1b7ba8 RSI: dead000000000100 RDI: ffff8feb7f1b7b80
May 23 10:37:01 HOME-SERVER kernel: RBP: ffffa70d07327b50 R08: 0000000000000000 R09: 0000000000000000
May 23 10:37:01 HOME-SERVER kernel: R10: ffff8fdc414f13b0 R11: ffff8fdc414f13b0 R12: ffff8feb7f1b7b80
May 23 10:37:01 HOME-SERVER kernel: R13: 0000000000000000 R14: ffff8feb7f1b7ba8 R15: ffff8febbf7d5c00
May 23 10:37:01 HOME-SERVER kernel: FS: 00007fbd86cd2280(0000) GS:ffff8feb7f180000(0000) knlGS:0000000000000000
May 23 10:37:01 HOME-SERVER kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 23 10:37:01 HOME-SERVER kernel: CR2: 0000000000000008 CR3: 0000000183928005 CR4: 0000000000770ee0
May 23 10:37:01 HOME-SERVER kernel: PKRU: 55555554
May 23 10:37:01 HOME-SERVER kernel: note: pvescheduler[25595] exited with irqs disabled
May 23 10:37:01 HOME-SERVER kernel: note: pvescheduler[25595] exited with preempt_count 2
May 23 10:39:35 HOME-SERVER pvestatd[1471]: auth key pair too old, rotating..
May 23 10:42:14 HOME-SERVER kernel: BUG: Bad page state in process lvs pfn:1237c4
May 23 10:42:14 HOME-SERVER kernel: page:00000000976686e6 refcount:0 mapcount:0 mapping:00000000c865f3d6 index:0x1 pfn:0x1237c4
May 23 10:42:14 HOME-SERVER kernel: memcg:ff00000048000000
May 23 10:42:14 HOME-SERVER kernel: invalid mapping:ff00000055000000
May 23 10:42:14 HOME-SERVER kernel: flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
May 23 10:42:14 HOME-SERVER kernel: raw: 0017ffffc0000000 dead000000000100 dead000000000122 ff00000055000000
May 23 10:42:14 HOME-SERVER kernel: raw: 0000000000000001 0000000000000000 00000000ffffffff ff00000048000000
May 23 10:42:14 HOME-SERVER kernel: page dumped because: page still charged to cgroup
May 23 10:42:14 HOME-SERVER kernel: Modules linked in: cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nf>
May 23 10:42:14 HOME-SERVER kernel: hwmon_vid coretemp vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb hid_generic usbhid hi>
May 23 10:42:14 HOME-SERVER kernel: CPU: 12 PID: 26411 Comm: lvs Tainted: P D O 6.2.11-2-pve #1
May 23 10:42:14 HOME-SERVER kernel: Hardware name: ASUS System Product Name/ROG STRIX B760-I GAMING WIFI, BIOS 1003 04/14/2023
May 23 10:42:14 HOME-SERVER kernel: Call Trace:
 
May 23 10:42:14 HOME-SERVER kernel: <TASK>
May 23 10:42:14 HOME-SERVER kernel: dump_stack_lvl+0x48/0x70
May 23 10:42:14 HOME-SERVER kernel: dump_stack+0x10/0x20
May 23 10:42:14 HOME-SERVER kernel: bad_page+0x72/0x100
May 23 10:42:14 HOME-SERVER kernel: check_new_pages+0xd2/0x110
May 23 10:42:14 HOME-SERVER kernel: rmqueue_bulk+0x243/0x7b0
May 23 10:42:14 HOME-SERVER kernel: ? post_alloc_hook+0xd5/0x120
May 23 10:42:14 HOME-SERVER kernel: get_page_from_freelist+0xc42/0x10e0
May 23 10:42:14 HOME-SERVER kernel: __alloc_pages+0x1c8/0x1230
May 23 10:42:14 HOME-SERVER kernel: ? xas_load+0x1f/0xf0
May 23 10:42:14 HOME-SERVER kernel: ? aa_file_perm+0x15e/0x5e0
May 23 10:42:14 HOME-SERVER kernel: ? current_time+0x2b/0x100
May 23 10:42:14 HOME-SERVER kernel: ? shmem_file_read_iter+0x320/0x380
May 23 10:42:14 HOME-SERVER kernel: __folio_alloc+0x1b/0x50
May 23 10:42:14 HOME-SERVER kernel: ? policy_node+0x5b/0x80
May 23 10:42:14 HOME-SERVER kernel: vma_alloc_folio+0xab/0x400
May 23 10:42:14 HOME-SERVER kernel: ? vfs_read+0x20c/0x2e0
May 23 10:42:14 HOME-SERVER kernel: __handle_mm_fault+0x9c5/0x1140
May 23 10:42:14 HOME-SERVER kernel: ? kmem_cache_free+0x1e/0x3b0
May 23 10:42:14 HOME-SERVER kernel: handle_mm_fault+0x110/0x330
May 23 10:42:14 HOME-SERVER kernel: do_user_addr_fault+0x1be/0x710
May 23 10:42:14 HOME-SERVER kernel: ? do_syscall_64+0x69/0x90
May 23 10:42:14 HOME-SERVER kernel: exc_page_fault+0x76/0x180
May 23 10:42:14 HOME-SERVER kernel: asm_exc_page_fault+0x27/0x30
May 23 10:42:14 HOME-SERVER kernel: RIP: 0033:0x7fc345be0e45
May 23 10:42:14 HOME-SERVER kernel: Code: 49 8d 0c 2f 48 8b 5c 24 28 49 39 d4 49 89 4c 24 60 0f 95 c2 48 83 c8 01 0f b6 d2 48 c1 e2 02 48 09 ea 48 83 ca 01 49 89 57 08 <48> 89 41 08 49 83 c7 10 eb b2 48 8d 3d 92 5e 11 00 e>
May 23 10:42:14 HOME-SERVER kernel: RSP: 002b:00007ffedeafd130 EFLAGS: 00010206
May 23 10:42:14 HOME-SERVER kernel: RAX: 0000000000008971 RBX: 0000000000000801 RCX: 0000559bc86f5690
May 23 10:42:14 HOME-SERVER kernel: RDX: 0000000000000811 RSI: 0000000000000000 RDI: 0000000000000004
May 23 10:42:14 HOME-SERVER kernel: RBP: 0000000000000810 R08: 0000000000000003 R09: 00007fc345d2abe0
May 23 10:42:14 HOME-SERVER kernel: R10: 000000000000006e R11: 0000000000000000 R12: 00007fc345d2ab80
May 23 10:42:14 HOME-SERVER kernel: </TASK>
May 23 10:53:32 HOME-SERVER kernel: watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [sed:28130]
May 23 10:53:32 HOME-SERVER kernel: Modules linked in: cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nf>
May 23 10:53:32 HOME-SERVER kernel: hwmon_vid coretemp vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb hid_generic usbhid hi>
May 23 10:53:32 HOME-SERVER kernel: CPU: 8 PID: 28130 Comm: sed Tainted: P B D O 6.2.11-2-pve #1
May 23 10:53:32 HOME-SERVER kernel: Hardware name: ASUS System Product Name/ROG STRIX B760-I GAMING WIFI, BIOS 1003 04/14/2023
May 23 10:53:32 HOME-SERVER kernel: RIP: 0010:filemap_fault+0x5c2/0xa40
May 23 10:53:32 HOME-SERVER kernel: Code: 49 89 85 98 00 00 00 48 89 45 b0 e8 d8 cf 00 00 4d 89 e2 41 bd 04 00 00 00 e9 87 fb ff ff f0 41 80 24 24 fe 0f 88 03 01 00 00 <f0> 41 ff 4c 24 34 0f 84 d3 00 00 00 45 84 db 0f 85 7>
May 23 10:53:32 HOME-SERVER kernel: RSP: 0018:ffffa70d0f3bf980 EFLAGS: 00000206
May 23 10:53:32 HOME-SERVER kernel: RAX: fffff86e449ad480 RBX: 000000000000001d RCX: 0000000000000000
May 23 10:53:32 HOME-SERVER kernel: RDX: 0000000000000000 RSI: 000000000000001d RDI: ffff8fdc4a9b9980
May 23 10:53:32 HOME-SERVER kernel: RBP: ffffa70d0f3bfa18 R08: ffff8feb7f200000 R09: fffff86e449ad4b4
May 23 10:53:32 HOME-SERVER kernel: R10: 0000000000000000 R11: 0000000000000001 R12: fffff86e449ad480
May 23 10:53:32 HOME-SERVER kernel: R13: 0000000000000000 R14: ffffa70d0f3bfab0 R15: ffff8fdc5456c878
May 23 10:53:32 HOME-SERVER kernel: FS: 0000000000000000(0000) GS:ffff8feb7f200000(0000) knlGS:0000000000000000
May 23 10:53:32 HOME-SERVER kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 23 10:53:32 HOME-SERVER kernel: CR2: 000055be892b34ec CR3: 0000000182d82004 CR4: 0000000000770ee0
May 23 10:53:32 HOME-SERVER kernel: PKRU: 55555554
May 23 10:53:32 HOME-SERVER kernel: Call Trace:
May 23 10:53:32 HOME-SERVER kernel: <TASK>
May 23 10:53:32 HOME-SERVER kernel: ? __mod_lruvec_state+0x37/0x50
May 23 10:53:32 HOME-SERVER kernel: __do_fault+0x36/0x140
May 23 10:53:32 HOME-SERVER kernel: do_fault+0x31c/0x410
May 23 10:53:32 HOME-SERVER kernel: __handle_mm_fault+0x699/0x1140
May 23 10:53:32 HOME-SERVER kernel: ? vma_set_page_prot+0x69/0xb0
May 23 10:53:32 HOME-SERVER kernel: handle_mm_fault+0x110/0x330
May 23 10:53:32 HOME-SERVER kernel: do_user_addr_fault+0x1be/0x710
May 23 10:53:32 HOME-SERVER kernel: exc_page_fault+0x76/0x180
May 23 10:53:32 HOME-SERVER kernel: asm_exc_page_fault+0x27/0x30
May 23 10:53:32 HOME-SERVER kernel: RIP: 0010:padzero+0x3c/0x70
May 23 10:53:32 HOME-SERVER kernel: Code: 00 00 48 29 c1 48 ba 00 f0 ff ff ff 7f 00 00 48 39 d1 77 32 48 8d 84 10 00 f0 ff ff 48 39 c7 77 25 55 48 89 e5 0f 01 cb 31 c0 <f3> aa 0f 1f 00 0f 01 ca 48 85 c9 75 19 31 c0 5d c3 c>
May 23 10:53:32 HOME-SERVER kernel: RSP: 0018:ffffa70d0f3bfcd0 EFLAGS: 00050246
May 23 10:53:32 HOME-SERVER kernel: RAX: 0000000000000000 RBX: 000055be892b34ec RCX: 0000000000000b14
May 23 10:53:32 HOME-SERVER kernel: RDX: 00007ffffffff000 RSI: 000055be892b3e88 RDI: 000055be892b34ec
May 23 10:53:32 HOME-SERVER kernel: RBP: ffffa70d0f3bfcd0 R08: 000055be892b2000 R09: 0000000000000000
May 23 10:53:32 HOME-SERVER kernel: R10: ffffa70d0f3bfc40 R11: ffff8fdc4d694b40 R12: ffff8fdc490f5e00
May 23 10:53:32 HOME-SERVER kernel: R13: ffff8fdcbfa26c00 R14: 0000000000000001 R15: 0000000000000000
May 23 10:53:32 HOME-SERVER kernel: load_elf_binary+0x739/0x1740
May 23 10:53:32 HOME-SERVER kernel: bprm_execve+0x280/0x680
May 23 10:53:32 HOME-SERVER kernel: do_execveat_common+0x1b3/0x260
May 23 10:53:32 HOME-SERVER kernel: ? getname_flags.part.0+0x4c/0x1b0
May 23 10:53:32 HOME-SERVER kernel: __x64_sys_execve+0x39/0x50
May 23 10:53:32 HOME-SERVER kernel: do_syscall_64+0x59/0x90
May 23 10:53:32 HOME-SERVER kernel: ? exit_to_user_mode_prepare+0x37/0x180
May 23 10:53:32 HOME-SERVER kernel: ? irqentry_exit_to_user_mode+0x9/0x20
May 23 10:53:32 HOME-SERVER kernel: ? irqentry_exit+0x3b/0x50
May 23 10:53:32 HOME-SERVER kernel: ? exc_page_fault+0x87/0x180
May 23 10:53:32 HOME-SERVER kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
May 23 10:53:32 HOME-SERVER kernel: RIP: 0033:0x7fbe44a98c07
May 23 10:53:32 HOME-SERVER kernel: Code: Unable to access opcode bytes at 0x7fbe44a98bdd.
May 23 10:53:32 HOME-SERVER kernel: RSP: 002b:00007ffdf47c5448 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
May 23 10:53:32 HOME-SERVER kernel: RAX: ffffffffffffffda RBX: 000056350f4535d8 RCX: 00007fbe44a98c07
May 23 10:53:32 HOME-SERVER kernel: RDX: 000056350f4535f0 RSI: 000056350f4535d8 RDI: 000056350f453640
May 23 10:53:32 HOME-SERVER kernel: RBP: 000056350f32446e R08: 000056350f324470 R09: 000056350f32447b
May 23 10:53:32 HOME-SERVER kernel: R10: 000000000000000d R11: 0000000000000246 R12: 000056350f4535f0
 
This one took down my last boot.
I can't find the file it's looking for:
root@HOME-SERVER:~# ldd /lib/security/pam_unix.so
ldd: /lib/security/pam_unix.so: No such file or directory

May 23 20:07:41 HOME-SERVER smartd[1093]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 69
May 23 20:07:41 HOME-SERVER smartd[1093]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 31
May 23 20:07:46 HOME-SERVER smartd[1093]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 66 to 69
May 23 20:07:46 HOME-SERVER smartd[1093]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 34 to 31
May 23 20:07:46 HOME-SERVER smartd[1093]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 171 to 185
May 23 20:17:01 HOME-SERVER CRON[29618]: PAM unable to dlopen(pam_unix.so): /lib/security/pam_unix.so: cannot open shared object file: No such file or directory
May 23 20:17:01 HOME-SERVER CRON[29618]: PAM adding faulty module: pam_unix.so
May 23 20:17:01 HOME-SERVER cron[29618]: Authentication failure
May 23 20:17:01 HOME-SERVER CRON[29618]: Authentication failure
May 23 20:37:47 HOME-SERVER smartd[1093]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 185 to 180
May 23 21:07:47 HOME-SERVER smartd[1093]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 180 to 185
May 23 21:17:01 HOME-SERVER CRON[39181]: PAM unable to dlopen(pam_unix.so): /lib/security/pam_unix.so: cannot open shared object file: No such file or directory
May 23 21:17:01 HOME-SERVER CRON[39181]: PAM adding faulty module: pam_unix.so
May 23 21:17:01 HOME-SERVER cron[39181]: Authentication failure
May 23 21:17:01 HOME-SERVER CRON[39181]: Authentication failure
 
I'd like to capture why the system is crashing or hanging, but I can't seem to install linux-crashdump
Why did you add the "linux" there? The packages are kdump-tools and crash. Are you familiar with crashdump? If not, here is a howto for Debian. It is almost the same for PVE.

May 23 20:17:01 HOME-SERVER CRON[29618]: PAM unable to dlopen(pam_unix.so): /lib/security/pam_unix.so: cannot open shared object file: No such file or directory
That has nothing to do with your kernel crash. It's userland and the file is normally not there.
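A rough sketch of how the setup could be checked after installing those two packages. The function name kdump_ready is hypothetical, and the check only covers two preconditions: a crashkernel= reservation on the kernel command line and a loaded crash kernel. It is not a replacement for the howto.

```shell
#!/usr/bin/env bash
# Install as root on the PVE host, then reboot so a crashkernel= reservation
# on the kernel command line can take effect:
#   apt install kdump-tools crash

# Report whether kdump looks armed. Both paths are parameters only so the
# check can be pointed at test files; the defaults are the real kernel files.
kdump_ready() {
    local cmdline="${1:-/proc/cmdline}"
    local loaded="${2:-/sys/kernel/kexec_crash_loaded}"
    if ! grep -q 'crashkernel=' "$cmdline"; then
        echo "no crashkernel= on the kernel command line"
        return 1
    fi
    if [ "$(cat "$loaded" 2>/dev/null)" != "1" ]; then
        echo "crash kernel not loaded"
        return 1
    fi
    echo "kdump armed"
}

# Usage after the reboot:
#   kdump_ready && kdump-config show
```

With the Debian defaults, dumps should then land under /var/crash after the next kernel panic. An oops that does not panic the machine will not trigger a dump.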
 
I'll take a look at the crashdump info, thank you. When I posted I assumed it was linux, then saw that Proxmox is using Debian; I didn't modify the post.
All I have been able to make out so far is that this always happens after the system has been sitting idle for a long period of time, so I assume the cores are in a deep C-state. It then seems to happen when cron jobs are called: I always see CRON before the crash.
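One way to test the "CRON before the crash" observation is to print the few journal lines that come immediately before the first kernel fault of a boot and compare timestamps. The sketch below does that; the function name and the marker list are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Print the N lines (default 5) that precede the first kernel fault line,
# followed by the fault line itself. Reads journal text on stdin.
lines_before_first_fault() {
    local n="${1:-5}"
    grep -m1 -B "$n" -E 'kernel: .*(BUG:|Oops:|general protection fault|soft lockup)'
}

# Usage on the host:
#   journalctl -b -1 --no-pager | lines_before_first_fault 10
```

In the first log posted above, the last cron lines are stamped 10:17:01 and the first kernel BUG is at 10:37:01 in pvescheduler, so the cron run may simply be the last thing an idle system logged rather than the trigger.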

Otherwise the system runs fine.

Right after that message this repeated a ton of times
May 23 21:17:01 HOME-SERVER CRON[39181]: PAM unable to dlopen(pam_unix.so): /lib/security/pam_unix.so: cannot open shared object file: No such file or directory
May 23 21:17:01 HOME-SERVER CRON[39181]: PAM adding faulty module: pam_unix.so
May 23 21:17:01 HOME-SERVER cron[39181]: Authentication failure
May 23 21:17:01 HOME-SERVER CRON[39181]: Authentication failure
May 23 21:27:39 HOME-SERVER postfix/master[1459]: warning: process /usr/lib/postfix/sbin/pickup pid 40880 exit status 127
May 23 21:27:39 HOME-SERVER postfix/master[1459]: warning: /usr/lib/postfix/sbin/pickup: bad command startup -- throttling
May 23 21:28:39 HOME-SERVER postfix/master[1459]: warning: process /usr/lib/postfix/sbin/pickup pid 41042 exit status 127

then this
May 24 03:10:36 HOME-SERVER kernel: traps: pvestatd[1475] general protection fault ip:5598ea867237 sp:7ffd7bef4a70 error:0 in perl[5598ea808000+185000]
May 24 03:10:36 HOME-SERVER systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
May 24 03:10:36 HOME-SERVER systemd[1]: pvestatd.service: Failed with result 'signal'.
May 24 03:10:36 HOME-SERVER systemd[1]: pvestatd.service: Consumed 1min 56.457s CPU time.


May 24 01:02:40 HOME-SERVER audit[75523]: AVC apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-100_</var/lib/lxc>" name="/run/systemd/unit-r>
May 24 01:02:40 HOME-SERVER kernel: kauditd_printk_skb: 8 callbacks suppressed
May 24 01:02:40 HOME-SERVER kernel: audit: type=1400 audit(1684915360.070:39): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-100_</var/li>
May 24 01:02:40 HOME-SERVER kernel: audit: type=1400 audit(1684915360.070:40): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-100_</var/li>
May 24 01:02:40 HOME-SERVER audit[75525]: AVC apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-100_</var/lib/lxc>" name="/run/systemd/unit-r>
May 24 01:02:40 HOME-SERVER kernel: audit: type=1400 audit(1684915360.078:41): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-100_</var/li>


eventually dead
May 24 03:55:05 HOME-SERVER pve-firewall[1474]: status update error: command 'ipset save' failed: got signal 11
May 24 03:55:05 HOME-SERVER kernel: ipset[100817]: segfault at 7f4925681084 ip 00007f4924f30c2a sp 00007ffc38a261a8 error 6 in libc-2.31.so[7f4924e90000+159000] likely on CPU 0 (core 0, sock>
May 24 03:55:05 HOME-SERVER kernel: Code: 74 df 48 d3 1d 67 b2 25 00 89 ef dc 44 8b 2b a0 cc d6 02 03 64 44 89 1b eb 9b 66 8c 1f 44 00 8d 41 54 55 45 81 ec a8 f3 00 00 <64> 08 8b 04 25 82 00>
May 24 03:55:06 HOME-SERVER systemd[1]: lxcfs.service: Main process exited, code=killed, status=11/SEGV
May 24 03:55:06 HOME-SERVER kernel: traps: lxcfs[1104] general protection fault ip:7f6f8298ac2a sp:7f6f8286d818 error:0 in libc-2.31.so[7f6f828ea000+159000]
May 24 03:55:06 HOME-SERVER systemd[1]: var-lib-lxcfs.mount: Succeeded.
May 24 03:55:06 HOME-SERVER kernel: traps: systemd[1] general protection fault ip:7f02888eec2a sp:7fffe357aed8 error:0 in libc-2.31.so[7f028884e000+159000]
May 24 03:55:06 HOME-SERVER systemd[1]: Caught <SEGV>, dumped core as pid 100821.
May 24 03:55:06 HOME-SERVER systemd[1]: Freezing execution.
May 24 03:55:15 HOME-SERVER pve-firewall[1474]: status update error: command 'ipset save' failed: got signal 11
May 24 03:55:15 HOME-SERVER kernel: ipset[100832]: segfault at 7f1b8ca61084 ip 00007f1b8c310c2a sp 00007ffde5bc44c8 error 6 in libc-2.31.so[7f1b8c270000+159000] likely on CPU 0 (core 0, sock>
May 24 03:55:15 HOME-SERVER kernel: Code: 74 df 48 d3 1d 67 b2 25 00 89 ef dc 44 8b 2b a0 cc d6 02 03 64 44 89 1b eb 9b 66 8c 1f 44 00 8d 41 54 55 45 81 ec a8 f3 00 00 <64> 08 8b 04 25 82 00>
May 24 03:55:25 HOME-SERVER pve-firewall[1474]: status update error: command 'ipset save' failed: got signal 11
May 24 03:55:25 HOME-SERVER kernel: ipset[100841]: segfault at 7f0634f17084 ip 00007f06347c6c2a sp 00007ffdad985ac8 error 6 in libc-2.31.so[7f0634726000+159000] likely on CPU 0 (core 0, sock>
May 24 03:55:25 HOME-SERVER kernel: Code: 74 df 48 d3 1d 67 b2 25 00 89 ef dc 44 8b 2b a0 cc d6 02 03 64 44 89 1b eb 9b 66 8c 1f 44 00 8d 41 54 55 45 81 ec a8 f3 00 00 <64> 08 8b 04 25 82 00>
 
eventually dead
That really does not look good. A lot of programs failing with segfaults and protection faults implies some problem with execution (bad data on the disk, bad reads, bad RAM, bad CPU, too hot?). I don't see anything you can do besides checking the system with e.g. memtest and a stress test, and also trying a reinstall on another disk.
 
I have run memtest and it passes. The CPU is set to never exceed 90 in the BIOS. It runs for weeks under Windows with zero problems. All components are newly purchased. I've stress-tested the CPU under Windows with zero issues.

I have installed Proxmox multiple times and the end result is always the same. Proxmox is installed on an NVMe drive. I disabled APM and spin-down on all drives. The only thing that fixes it for me is to disable C-states in the BIOS, which takes idle power from 40 W to 120 W.
 
The only thing that fixes it for me is to disable C-states in the BIOS, which takes idle power from 40 W to 120 W.
I'm unfamiliar with how C-states work under the hood, but have you tried disabling them, installing cpufreqd, and configuring it to step down the CPU? Maybe that can counteract it somehow.
 
C-states put idle CPU cores into lower-power states, from C0 (fully active) down to deep states such as C6, where power is removed from the core. The deeper the C-state, the longer it takes the core to wake up and service new requests. That is why idle power goes from 40 W to 120 W when I disable C-states.

The BIOS allows me to play with the C-states. I believe Intel says you get shorter CPU life with C-states disabled. I can check the BIOS to see whether I can limit C-states to a shallower level. That doesn't solve my problem, but it might be an additional clue.
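For reference, the states the kernel actually exposes can be read from sysfs without going into the BIOS. The sketch below lists them for one CPU; the function name list_cstates is made up for illustration, and the directory is a parameter only so the function can be pointed at a test tree.

```shell
#!/usr/bin/env bash
# List the idle states the kernel exposes for one CPU: name, exit latency
# in microseconds, and whether the state is currently disabled.
list_cstates() {
    local base="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    local d
    for d in "$base"/state*; do
        [ -r "$d/name" ] || continue
        printf '%s\t%s us\tdisabled=%s\n' \
            "$(cat "$d/name")" "$(cat "$d/latency")" "$(cat "$d/disable")"
    done
}

# Usage on the host:
#   list_cstates
# A cap on the deepest state can also be tried from the kernel side with the
# boot parameter intel_idle.max_cstate=<n>. Whether any particular cap avoids
# these crashes is untested here.
```

Capping the depth step by step, rather than disabling C-states entirely, might show which state the crashes start at while keeping some of the idle power savings.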
 
@LnxBil I do not want to jinx it, but it appears to have survived a full day, which it never has before. I'll report back if it stays up, and then I'll probably want to understand what that is actually doing to keep it from crashing.
 
Damn... improved, but it ultimately failed.

May 26 06:15:07 HOME-SERVER systemd[1]: Finished Daily apt upgrade and clean activities.
May 26 08:33:28 HOME-SERVER smartd[1083]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 69 to 70
May 26 08:33:28 HOME-SERVER smartd[1083]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 31 to 30
May 26 08:33:34 HOME-SERVER smartd[1083]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 69 to 70
May 26 08:33:34 HOME-SERVER smartd[1083]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 31 to 30
May 26 09:00:19 HOME-SERVER kernel: general protection fault, probably for non-canonical address 0xff8f4800924e08: 0000 [#1] PREEMPT SMP NOPTI
May 26 09:00:19 HOME-SERVER kernel: CPU: 12 PID: 429776 Comm: lxc-info Tainted: P O 6.2.11-2-pve #1
May 26 09:00:19 HOME-SERVER kernel: Hardware name: ASUS System Product Name/ROG STRIX B760-I GAMING WIFI, BIOS 1003 04/14/2023
May 26 09:00:19 HOME-SERVER kernel: RIP: 0010:vma_interval_tree_insert+0x37/0xc0
May 26 09:00:19 HOME-SERVER kernel: Code: 4c 8d 5f 28 48 8b 47 08 48 2b 07 48 c1 e8 0c 49 8d 74 00 ff 49 8b 02 48 89 e5 48 85 c0 74 66 41 b9 01 00 00 00 eb 03 48 89 d0 <48> 39 70 18 73 04 48 89 70 18 48>
May 26 09:00:19 HOME-SERVER kernel: RSP: 0018:ffffa8cf474579d0 EFLAGS: 00010206
May 26 09:00:19 HOME-SERVER kernel: RAX: 00ff8f4800924df0 RBX: ffff8f4849e1e7e0 RCX: ffff8f484f4d5c68
May 26 09:00:19 HOME-SERVER kernel: RDX: 00ff8f4800924df0 RSI: 0000000000000019 RDI: ffff8f4849e1e7e0
May 26 09:00:19 HOME-SERVER kernel: RBP: ffffa8cf474579d0 R08: 0000000000000014 R09: 0000000000000000
May 26 09:00:19 HOME-SERVER kernel: R10: ffff8f4843415210 R11: ffff8f4849e1e808 R12: ffff8f4849e1ec78
May 26 09:00:19 HOME-SERVER kernel: R13: ffff8f48567c6880 R14: 00007fe89f936000 R15: 00007fe89f94d000
May 26 09:00:19 HOME-SERVER kernel: FS: 0000000000000000(0000) GS:ffff8f577f300000(0000) knlGS:0000000000000000
May 26 09:00:19 HOME-SERVER kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 26 09:00:19 HOME-SERVER kernel: CR2: 00007fe89f94d028 CR3: 0000000106482006 CR4: 0000000000770ee0


May 26 09:00:19 HOME-SERVER kernel: </TASK>
May 26 09:00:19 HOME-SERVER kernel: Modules linked in: tcp_diag inet_diag cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tabl>
May 26 09:00:19 HOME-SERVER kernel: nct6775_core hwmon_vid coretemp vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_>
May 26 09:00:19 HOME-SERVER kernel: ---[ end trace 0000000000000000 ]---
May 26 09:00:19 HOME-SERVER kernel: pstore: backend (efi_pstore) writing error (-5)
May 26 09:00:19 HOME-SERVER kernel: RIP: 0010:vma_interval_tree_insert+0x37/0xc0
 
