Error crashing kernel

ricardolanes

Hello, I have a problem. My PVE crashes randomly: everything is working, and then at some point, without my touching anything, this error message appears on the display and nothing else responds (GUI, shell, ...). The VMs/CTs freeze, and I have to force the server off with the power button.

Virtual Environment 8.2.6
Linux pve 6.8.12-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-2 (2024-09-05T10:03Z) x86_64

System Log (PVE)

Sep 22 15:33:00 pve postfix/qmgr[1018]: 7757F28033F: from=<>, size=41663, nrcpt=1 (queue active)
Sep 22 15:33:00 pve postfix/qmgr[1018]: 4CE9D2802DC: from=<>, size=9643, nrcpt=1 (queue active)
Sep 22 15:33:00 pve postfix/qmgr[1018]: BE2E4280372: from=<>, size=29077, nrcpt=1 (queue active)
Sep 22 15:33:00 pve postfix/local[72470]: error: open database /etc/aliases.db: No such file or directory
Sep 22 15:33:00 pve postfix/local[72470]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Sep 22 15:33:00 pve postfix/local[72470]: warning: hash:/etc/aliases: lookup of 'root' failed
Sep 22 15:33:00 pve postfix/local[72471]: error: open database /etc/aliases.db: No such file or directory
Sep 22 15:33:00 pve postfix/local[72471]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Sep 22 15:33:00 pve postfix/local[72471]: warning: hash:/etc/aliases: lookup of 'root' failed
Sep 22 15:33:00 pve postfix/local[72470]: 7757F28033F: to=<root@pve.intra>, relay=local, delay=46598, delays=46598/0.01/0/0.01, dsn=4.3.0, status=deferred (alias database unavailable)
Sep 22 15:33:00 pve postfix/local[72470]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Sep 22 15:33:00 pve postfix/local[72470]: warning: hash:/etc/aliases: lookup of 'root' failed
Sep 22 15:33:00 pve postfix/local[72471]: 4CE9D2802DC: to=<root@pve.intra>, relay=local, delay=63372, delays=63372/0.01/0/0, dsn=4.3.0, status=deferred (alias database unavailable)
Sep 22 15:33:00 pve postfix/local[72470]: BE2E4280372: to=<root@pve.intra>, relay=local, delay=538, delays=538/0.01/0/0.01, dsn=4.3.0, status=deferred (alias database unavailable)
Sep 22 15:43:00 pve postfix/qmgr[1018]: BE2E4280372: from=<>, size=29077, nrcpt=1 (queue active)
Sep 22 15:43:00 pve postfix/local[78706]: error: open database /etc/aliases.db: No such file or directory
Sep 22 15:43:00 pve postfix/local[78706]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Sep 22 15:43:00 pve postfix/local[78706]: warning: hash:/etc/aliases: lookup of 'root' failed
Sep 22 15:43:00 pve postfix/local[78706]: BE2E4280372: to=<root@pve.intra>, relay=local, delay=1139, delays=1139/0.01/0/0.01, dsn=4.3.0, status=deferred (alias database unavailable)
Sep 22 15:49:38 pve kernel: perf: interrupt took too long (3972 > 3913), lowering kernel.perf_event_max_sample_rate to 50000
Sep 22 15:51:50 pve kernel: general protection fault, probably for non-canonical address 0xffd09d6bc1e6ce08: 0000 [#1] PREEMPT SMP NOPTI
Sep 22 15:51:50 pve kernel: CPU: 1 PID: 76488 Comm: kworker/u8:5 Tainted: P O 6.8.12-2-pve #1
Sep 22 15:51:50 pve kernel: Hardware name: Default string Default string/Default string, BIOS GF1264NP126LV11R003 11/16/2023
Sep 22 15:51:50 pve kernel: Workqueue: dm-thin do_worker [dm_thin_pool]
Sep 22 15:51:50 pve kernel: RIP: 0010:do_worker+0x815/0xd60 [dm_thin_pool]
Sep 22 15:51:50 pve kernel: Code: 0e fb 4d 85 e4 75 e4 e8 e9 11 b5 fa 4d 8b 65 00 4c 39 a5 48 ff ff ff 0f 84 ad fe ff ff 49 8d bc 24 88 00 00 00 b8 01 00 00 00 <f0> 41 0f c1 84 24 88 00 00 00 85 c0 0f 84 8b 03 00 00 8d 50 01 09
Sep 22 15:51:50 pve kernel: RSP: 0018:ffffaf4a0fbd7d68 EFLAGS: 00010202
Sep 22 15:51:50 pve kernel: RAX: 0000000000000001 RBX: ffffaf4a0fbd7de8 RCX: 0000000000000000
Sep 22 15:51:50 pve kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffd09d6bc1e6ce08
Sep 22 15:51:50 pve kernel: RBP: ffffaf4a0fbd7e40 R08: 0000000000000000 R09: 0000000000000000
Sep 22 15:51:50 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffd09d6bc1e6cd80
Sep 22 15:51:50 pve kernel: R13: ffff9d6bc3192c00 R14: ffff9d6bd1d75a05 R15: ffff9d6bc3192c4c
Sep 22 15:51:50 pve kernel: FS: 0000000000000000(0000) GS:ffff9d6f2fa80000(0000) knlGS:0000000000000000
Sep 22 15:51:50 pve kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 22 15:51:50 pve kernel: CR2: 00002cac02b20000 CR3: 000000031e636002 CR4: 0000000000f72ef0
Sep 22 15:51:50 pve kernel: PKRU: 55555554
Sep 22 15:51:50 pve kernel: Call Trace:
Sep 22 15:51:50 pve kernel: <TASK>
Sep 22 15:51:50 pve kernel: ? show_regs+0x6d/0x80
Sep 22 15:51:50 pve kernel: ? die_addr+0x37/0xa0
Sep 22 15:51:50 pve kernel: ? exc_general_protection+0x1db/0x480
Sep 22 15:51:50 pve kernel: ? asm_exc_general_protection+0x27/0x30
Sep 22 15:51:50 pve kernel: ? do_worker+0x815/0xd60 [dm_thin_pool]
Sep 22 15:51:50 pve kernel: ? do_worker+0x7f7/0xd60 [dm_thin_pool]
Sep 22 15:51:50 pve kernel: ? finish_task_switch.isra.0+0x8c/0x310
Sep 22 15:51:50 pve kernel: process_one_work+0x16a/0x350
Sep 22 15:51:50 pve kernel: worker_thread+0x306/0x440
Sep 22 15:51:50 pve kernel: ? __pfx_worker_thread+0x10/0x10
Sep 22 15:51:50 pve kernel: kthread+0xef/0x120
Sep 22 15:51:50 pve kernel: ? __pfx_kthread+0x10/0x10
Sep 22 15:51:50 pve kernel: ret_from_fork+0x44/0x70
Sep 22 15:51:50 pve kernel: ? __pfx_kthread+0x10/0x10
Sep 22 15:51:50 pve kernel: ret_from_fork_asm+0x1b/0x30
Sep 22 15:51:50 pve kernel: </TASK>
Sep 22 15:51:50 pve kernel: Modules linked in: tcp_diag inet_diag nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat ip_set_hash_net bluetooth ecdh_generic ecc cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables bonding tls softdog sunrpc nfnetlink_log binfmt_misc nfnetlink xe drm_gpuvm drm_exec gpu_sched drm_suballoc_helper drm_ttm_helper snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_sof_pci_intel_tgl snd_sof_intel_hda_common kvm_intel soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda kvm snd_sof_pci snd_sof_xtensa_dsp snd_sof irqbypass snd_sof_utils snd_soc_hdac_hda crct10dif_pclmul polyval_clmulni snd_hda_ext_core polyval_generic ghash_clmulni_intel snd_soc_acpi_intel_match sha256_ssse3 snd_soc_acpi sha1_ssse3 soundwire_generic_allocation aesni_intel soundwire_bus crypto_simd snd_soc_core cryptd snd_compress ac97_bus
Sep 22 15:51:50 pve kernel: snd_pcm_dmaengine i915 snd_hda_intel snd_intel_dspcfg rapl snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm drm_buddy intel_cstate cmdlinepart ttm snd_timer pcspkr wmi_bmof spi_nor snd drm_display_helper mtd cec soundcore rc_core i2c_algo_bit igen6_edac serial_multi_instantiate intel_pmc_core intel_vsec pmt_telemetry pmt_class acpi_pad acpi_tad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c uas usb_storage nvme spi_intel_pci i2c_i801 xhci_pci xhci_pci_renesas ahci i2c_smbus spi_intel libahci crc32_pclmul sdhci_pci xhci_hcd nvme_core cqhci nvme_auth sdhci igc video wmi
Sep 22 15:51:50 pve kernel: ---[ end trace 0000000000000000 ]---
Sep 22 15:51:52 pve kernel: pstore: backend (efi_pstore) writing error (-5)
Sep 22 15:51:52 pve kernel: RIP: 0010:do_worker+0x815/0xd60 [dm_thin_pool]
Sep 22 15:51:52 pve kernel: Code: 0e fb 4d 85 e4 75 e4 e8 e9 11 b5 fa 4d 8b 65 00 4c 39 a5 48 ff ff ff 0f 84 ad fe ff ff 49 8d bc 24 88 00 00 00 b8 01 00 00 00 <f0> 41 0f c1 84 24 88 00 00 00 85 c0 0f 84 8b 03 00 00 8d 50 01 09
Sep 22 15:51:52 pve kernel: RSP: 0018:ffffaf4a0fbd7d68 EFLAGS: 00010202
Sep 22 15:51:52 pve kernel: RAX: 0000000000000001 RBX: ffffaf4a0fbd7de8 RCX: 0000000000000000
Sep 22 15:51:52 pve kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffd09d6bc1e6ce08
Sep 22 15:51:52 pve kernel: RBP: ffffaf4a0fbd7e40 R08: 0000000000000000 R09: 0000000000000000
Sep 22 15:51:52 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffd09d6bc1e6cd80
Sep 22 15:51:52 pve kernel: R13: ffff9d6bc3192c00 R14: ffff9d6bd1d75a05 R15: ffff9d6bc3192c4c
Sep 22 15:51:52 pve kernel: FS: 0000000000000000(0000) GS:ffff9d6f2fa80000(0000) knlGS:0000000000000000
Sep 22 15:51:52 pve kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 22 15:51:52 pve kernel: CR2: 00002cac02b20000 CR3: 0000000325112005 CR4: 0000000000f72ef0
Sep 22 15:51:52 pve kernel: PKRU: 55555554


Can someone help me find out what it is and how to fix it?

Thanks!
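
To compare crashes across boots, it can help to filter the journal down to the lines that mark a kernel fault. The following is only a minimal sketch: `crash_lines` is a hypothetical helper name, the pattern list is taken from the messages quoted in this thread and is not exhaustive, and reading the previous boot with `journalctl -k -b -1` assumes the journal is stored persistently.

```shell
# Hypothetical helper: read log text on stdin and keep only the lines that
# look like the kernel fault messages quoted in this thread.
crash_lines() {
    grep -E 'general protection fault|BUG: Bad (page map|rss-counter)|Bad swap offset'
}

# On the affected host (assumes a persistent journal, so that the boot
# before the forced power-off is still available):
#   journalctl -k -b -1 --no-pager | crash_lines
```

The same function can be applied to a saved copy of the system log, which makes it easier to see whether two kernel versions fail with the same message.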
 
Absolutely the same behavior here. I have been investigating this for a week and, for now, I have decided to pin kernel 6.5.13-6-pve.

This thread may help you too: https://forum.proxmox.com/threads/proxmox-freeze-nach-kernel-update-to-6-8-4-2-pve.145920/page-4
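
Pinning a kernel on Proxmox VE is normally done with `proxmox-boot-tool`. The sketch below assumes the 6.5.13-6-pve kernel package is still installed on the host; the `run` wrapper and the `RUN` variable are hypothetical additions that only print the commands unless `RUN=1` is set, so nothing changes by accident.

```shell
# Hypothetical dry-run wrapper: prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

pin_kernel() {
    # Show the kernels the boot tool knows about, then pin the requested one.
    run proxmox-boot-tool kernel list
    run proxmox-boot-tool kernel pin "$1"
}

pin_kernel 6.5.13-6-pve
# After a reboot, `uname -r` should report the pinned version.
# `proxmox-boot-tool kernel unpin` removes the pin again.
```

Run it once as shown to review the commands, then again with `RUN=1` as root to apply them.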

Hey my friend, thank you very much for the thread. I went back just one version, to 6.8.12-1-pve. I'll watch how it behaves, and if it's still bad, I'll go back further.

Thank you very much!

Sorry for my English, it comes from Google Translate :p
 
Kernel 6.8.12-1-pve is crashing too:

Sep 22 22:36:36 pve kernel: get_swap_device: Bad swap offset entry 3bfffffffffff
Sep 22 22:36:36 pve kernel: BUG: Bad page map in process pvestatd pte:80000000000000 pmd:30430e067
Sep 22 22:36:36 pve kernel: addr:00007e1d0cbdc000 vm_flags:08000071 anon_vma:0000000000000000 mapping:ffff98f8022b99c0 index:1dc
Sep 22 22:36:36 pve kernel: file:locale-archive fault:filemap_fault mmap:ext4_file_mmap read_folio:ext4_read_folio
Sep 22 22:36:36 pve kernel: CPU: 2 PID: 82267 Comm: pvestatd Tainted: P O 6.8.12-1-pve #1
Sep 22 22:36:36 pve kernel: Hardware name: Default string Default string/Default string, BIOS GF1264NP126LV11R003 11/16/2023
Sep 22 22:36:36 pve kernel: Call Trace:
Sep 22 22:36:36 pve kernel: <TASK>
Sep 22 22:36:36 pve kernel: dump_stack_lvl+0x76/0xa0
Sep 22 22:36:36 pve kernel: dump_stack+0x10/0x20
Sep 22 22:36:36 pve kernel: print_bad_pte+0x1b2/0x270
Sep 22 22:36:36 pve kernel: unmap_page_range+0xfa4/0x1180
Sep 22 22:36:36 pve kernel: unmap_single_vma+0x89/0xf0
Sep 22 22:36:36 pve kernel: unmap_vmas+0xb5/0x190
Sep 22 22:36:36 pve kernel: exit_mmap+0x10a/0x3f0
Sep 22 22:36:36 pve kernel: __mmput+0x41/0x140
Sep 22 22:36:36 pve kernel: mmput+0x31/0x40
Sep 22 22:36:36 pve kernel: do_exit+0x324/0xae0
Sep 22 22:36:36 pve kernel: ? __count_memcg_events+0x6f/0xe0
Sep 22 22:36:36 pve kernel: do_group_exit+0x35/0x90
Sep 22 22:36:36 pve kernel: __x64_sys_exit_group+0x18/0x20
Sep 22 22:36:36 pve kernel: x64_sys_call+0x1822/0x24b0
Sep 22 22:36:36 pve kernel: do_syscall_64+0x81/0x170
Sep 22 22:36:36 pve kernel: ? irqentry_exit+0x43/0x50
Sep 22 22:36:36 pve kernel: ? exc_page_fault+0x94/0x1b0
Sep 22 22:36:36 pve kernel: entry_SYSCALL_64_after_hwframe+0x78/0x80
Sep 22 22:36:36 pve kernel: RIP: 0033:0x7e1d0ce62349
Sep 22 22:36:36 pve kernel: Code: Unable to access opcode bytes at 0x7e1d0ce6231f.
Sep 22 22:36:36 pve kernel: RSP: 002b:00007ffc2d9ec0b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
Sep 22 22:36:36 pve kernel: RAX: ffffffffffffffda RBX: 000059a0bd26f2a0 RCX: 00007e1d0ce62349
Sep 22 22:36:36 pve kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
Sep 22 22:36:36 pve kernel: RBP: 0000000000000001 R08: ffffffffffffff78 R09: 0000000000000000
Sep 22 22:36:36 pve kernel: R10: 00007e1d0cda0200 R11: 0000000000000206 R12: 0000000000000009
Sep 22 22:36:36 pve kernel: R13: 0000000000000000 R14: 000059a0c2e80f70 R15: 000059a0bd7ca7d8
Sep 22 22:36:36 pve kernel: </TASK>
Sep 22 22:36:36 pve kernel: BUG: Bad rss-counter state mm:0000000052279baa type:MM_SWAPENTS val:-1
 
It also crashed on 6.5 for me. I contacted my server provider's support and asked for a BIOS update. So far it has been working. Before the update, I could trigger the error with WireGuard running in an LXC container. As in the thread I referenced, the BIOS update really seems to help.
 
Hello my friends, yesterday (09/24) I made some changes to Proxmox that seem to have solved the problem so far.

1. I turned off a crontab job that I used to read the temperature sensors.

2. I removed some VMs (Ubuntu Desktop and Kali shell), both of which used the host CPU type.

3. I enabled VLAN aware on the LAN bridge; without it, I was getting these messages in the system log:
..entered blocking state
..entered forwarding state
..Link is down

I believe this could be a compatibility issue with a VM/CT. If possible, turn some of them off, or if you are using the host CPU type, switch to a virtual CPU type. It is just a suggestion.

Well, these were the changes I made. I have been 12 hours without a crash (I was not online for 4 to 5 of them). It seems to be solved; I will report back within the next few hours.

We are trying..
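
The suggestion to move a VM from the host CPU type to a virtual one can be sketched with the `qm` command-line tool. This is an illustration under assumptions, not a confirmed fix: the VM ID 100 and the target model `x86-64-v2-AES` are placeholders, and the `run` wrapper is a hypothetical helper that only prints the commands unless `RUN=1` is set.

```shell
# Hypothetical dry-run wrapper: prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

VMID=100   # assumption: replace with the ID of the affected VM

# Show the VM configuration; look for a "cpu:" line to see the current type.
run qm config "$VMID"

# Switch to a generic virtual CPU model instead of "host".
run qm set "$VMID" --cpu x86-64-v2-AES
```

A changed CPU type should only apply after the VM has been fully stopped and started again, so test one VM at a time.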
 
Thank you for reporting back. Indeed, I had the same messages in the journal regarding the network interfaces.
I can confirm that making the bridges VLAN aware made these messages vanish, but it did not have any other effect.
I am not sure about the cron job you mentioned, since it seems useful to me... Let's see if the VLAN setting has already fixed it.
 
It does not work for me. :(

I have replied in the post below with some images and a description of how this error can be reproduced:
https://forum.proxmox.com/threads/proxmox-kernel-6-8-12-2-freezes-again.154875/post-706206

Hi, I saw that you still have the problem. I'm working hard to find its cause. Yesterday I turned my crontab job back on, and less than 10 minutes later the problem started again.
So I looked for anything related to the lm-sensors package I was using, and I found this:
https://community.frame.work/t/resp...ror-causes-missing-sensors-on-linux-6-7/47767

Now I have turned the crontab job off again and my PVE is back to normal; I have 12 hours of uptime again.

This could be one of several causes, or it could be the only one.

Check whether you have this package (lm-sensors) installed; if so, remove it.

apt remove lm-sensors
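
Before removing anything, it may be worth checking whether the package is installed at all and whether a cron job calls it. A minimal sketch, assuming a Debian-based host: `pkg_state` is a hypothetical helper name, and the crontab check only looks at the current user's crontab, so a job under /etc/cron.d would not be found this way.

```shell
# Hypothetical helper: print whether a Debian package is fully installed.
pkg_state() {
    if dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'; then
        echo "installed"
    else
        echo "not installed"
    fi
}

pkg_state lm-sensors

# List entries in the current user's crontab that mention "sensors", if any.
crontab -l 2>/dev/null | grep -n 'sensors' || echo "no sensors entry in this crontab"
```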
 
I don't have any crontab entries, and I can reproduce this error just by clicking 'Refresh' in the PVE web interface. ^^

I have even stopped cron completely, and it still happens.
 
Me neither. Does your system crash completely? After a BIOS update, my system has been running for 3 days without freezing completely. However, I still get random segmentation faults in pvestatd.
A simple
Code:
service pvestatd restart
makes the web interface available again.
 
The problem shows up when I use the PVE web manager. If I don't touch anything, everything works fine, but the error is still there, and crashes can occur at any time.

As for "Does your system completely crash?": if you look at the images in the other post, you can see that sometimes it crashes completely, and sometimes only the error appears while the kernel keeps running.
 
I understand that, but do you get CPU soft or hard lockups that force you to restart the machine, or is it sufficient to restart the service?
 
Not always, but in most cases I have to restart the entire server manually.

At this point I am going to reinstall Proxmox on the same hardware, on different disks, and without upgrading any Proxmox packages, to see what happens.

The server had been running for more than 7 months without any problems, so it seems very strange to me that this is happening now.
 
But it started happening after an upgrade, right? If so, I doubt it is your hardware. Do you have direct access to the server (e.g. a home server), and can you perform a BIOS update?
 
