[SOLVED] Proxmox constantly crashes after few mins

bossmania · New Member · Aug 10, 2024
Edit: My problem was faulty RAM sticks.

So I recently installed the latest version of Proxmox with ZFS RAID 1 on a new machine with the following specs:
  • Mobo: MSI MAG B550 Tomahawk Max Wifi
  • CPU: AMD Ryzen 7 5700X
  • CPU fan: Noctua NH-D9L
  • RAM: HyperX Fury 32GB (4x8GB) DDR4 2400MHz
  • GPU: Nvidia GT 1030
  • Storage: SanDisk SSD PLUS 240GB
  • 2nd storage: Western Digital 240GB WD Green Internal SSD
  • PSU: EVGA 650 BP, 80+ BRONZE 650W
However, every time I boot up Proxmox on this machine, it crashes within a few minutes without fail. It also crashes immediately when I access the web UI in Chrome or Firefox.

I've attached the output from when that happened. I apologize for the poor quality of the pics; I can't think of another way to capture the output without using my phone camera.
 

Attachments

  • Compress_20240810_122223_3614.jpg (132.2 KB)
  • Compress_20240810_122223_3497.jpg (124.8 KB)
  • Compress_20240810_122223_3378.jpg (147.3 KB)
  • Compress_20240810_122223_3260.jpg (61.4 KB)
Upon (the involuntary) restart, can you log into the console and provide output of journalctl -k -b -1 | tail -100?

Can you connect to the machine via SSH while it's running?
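If SSH does work, a minimal sketch of pulling that output to another machine (root@pve is a placeholder for your actual node address):

Bash:
# Grab the last 100 kernel log lines from the previous boot over SSH
# and save them locally (user/hostname are assumptions, adjust to yours)
ssh root@pve 'journalctl -k -b -1 | tail -100' > pve-prevboot-kernel.log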
 
Upon (the involuntary) restart, can you log into the console and provide output of journalctl -k -b -1 | tail -100?

Can you connect to the machine via SSH while it's running?
This is the output of that command:
Bash:
Aug 10 11:03:54 pve kernel: Call Trace:
Aug 10 11:03:54 pve kernel:  <TASK>
Aug 10 11:03:54 pve kernel:  dump_stack_lvl+0x48/0x70
Aug 10 11:03:54 pve kernel:  dump_stack+0x10/0x20
Aug 10 11:03:54 pve kernel:  bad_page+0x76/0x120
Aug 10 11:03:54 pve kernel:  __rmqueue_pcplist+0x218/0x8c0
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  get_page_from_freelist+0x674/0x1200
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? __mod_memcg_state+0x71/0x130
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? memcg_account_kmem+0x1e/0x60
Aug 10 11:03:54 pve kernel:  __alloc_pages+0x251/0x1320
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? __alloc_pages+0x286/0x1320
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? release_pages+0x152/0x4c0
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? policy_nodemask+0xe1/0x150
Aug 10 11:03:54 pve kernel:  alloc_pages_mpol+0x101/0x1f0
Aug 10 11:03:54 pve kernel:  vma_alloc_folio+0x64/0xe0
Aug 10 11:03:54 pve kernel:  do_huge_pmd_anonymous_page+0xbb/0x740
Aug 10 11:03:54 pve kernel:  __handle_mm_fault+0xbe1/0xef0
Aug 10 11:03:54 pve kernel:  ? native_smp_send_reschedule+0x1f/0x50
Aug 10 11:03:54 pve kernel:  handle_mm_fault+0x18d/0x380
Aug 10 11:03:54 pve kernel:  __get_user_pages+0x149/0x6c0
Aug 10 11:03:54 pve kernel:  get_user_pages_unlocked+0xe8/0x370
Aug 10 11:03:54 pve kernel:  hva_to_pfn+0xb6/0x540 [kvm]
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? clockevents_program_event+0xb6/0x140
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  __gfn_to_pfn_memslot+0xb5/0x150 [kvm]
Aug 10 11:03:54 pve kernel:  kvm_faultin_pfn+0x123/0x670 [kvm]
Aug 10 11:03:54 pve kernel:  kvm_tdp_page_fault+0x11c/0x170 [kvm]
Aug 10 11:03:54 pve kernel:  kvm_mmu_do_page_fault+0x1b4/0x1f0 [kvm]
Aug 10 11:03:54 pve kernel:  kvm_mmu_page_fault+0x90/0x700 [kvm]
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? kvm_pmu_trigger_event+0x56/0x160 [kvm]
Aug 10 11:03:54 pve kernel:  ? svm_set_msr+0x53b/0x7e0 [kvm_amd]
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? kvm_complete_insn_gp+0x75/0x90 [kvm]
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  npf_interception+0x47/0xc0 [kvm_amd]
Aug 10 11:03:54 pve kernel:  svm_invoke_exit_handler+0x183/0x1b0 [kvm_amd]
Aug 10 11:03:54 pve kernel:  svm_handle_exit+0xa2/0x200 [kvm_amd]
Aug 10 11:03:54 pve kernel:  ? svm_vcpu_run+0x2cb/0x830 [kvm_amd]
Aug 10 11:03:54 pve kernel:  kvm_arch_vcpu_ioctl_run+0xd5b/0x1760 [kvm]
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? io_interception+0xf5/0x120 [kvm_amd]
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? kvm_arch_vcpu_put+0x1a6/0x200 [kvm]
Aug 10 11:03:54 pve kernel:  kvm_vcpu_ioctl+0x297/0x800 [kvm]
Aug 10 11:03:54 pve kernel:  ? kvm_arch_vcpu_ioctl_run+0x471/0x1760 [kvm]
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? kvm_vcpu_ioctl+0x30e/0x800 [kvm]
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  __x64_sys_ioctl+0xa3/0xf0
Aug 10 11:03:54 pve kernel:  do_syscall_64+0x87/0x180
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? fire_user_return_notifiers+0x3a/0x80
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? syscall_exit_to_user_mode+0x86/0x260
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? kvm_on_user_return+0x78/0xd0 [kvm]
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? fire_user_return_notifiers+0x3a/0x80
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? syscall_exit_to_user_mode+0x86/0x260
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? do_syscall_64+0x93/0x180
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? do_syscall_64+0x93/0x180
Aug 10 11:03:54 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 10 11:03:54 pve kernel:  ? do_syscall_64+0x93/0x180
Aug 10 11:03:54 pve kernel:  ? do_syscall_64+0x93/0x180
Aug 10 11:03:54 pve kernel:  ? do_syscall_64+0x93/0x180
Aug 10 11:03:54 pve kernel:  ? exc_page_fault+0x94/0x1b0
Aug 10 11:03:54 pve kernel:  entry_SYSCALL_64_after_hwframe+0x73/0x7b
Aug 10 11:03:54 pve kernel: RIP: 0033:0x7625cb276c5b
Aug 10 11:03:54 pve kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Aug 10 11:03:54 pve kernel: RSP: 002b:00007625c59faf30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 10 11:03:54 pve kernel: RAX: ffffffffffffffda RBX: 0000654f8fc76740 RCX: 00007625cb276c5b
Aug 10 11:03:54 pve kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000020
Aug 10 11:03:54 pve kernel: RBP: 000000000000ae80 R08: 0000654f8ef01c90 R09: 0000000000000000
Aug 10 11:03:54 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Aug 10 11:03:54 pve kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
Aug 10 11:03:54 pve kernel:  </TASK>
Aug 10 11:03:59 pve kernel: show_signal_msg: 16 callbacks suppressed
Aug 10 11:03:59 pve kernel: pveproxy worker[1512]: segfault at 58d05811c430 ip 00005890559acf23 sp 00007fffce2a9da0 error 4 in perl[5890558dc000+195000] likely on CPU 1 (core 1, socket 0)
Aug 10 11:03:59 pve kernel: Code: 24 08 48 8b 34 24 48 89 c5 eb 84 66 2e 0f 1f 84 00 00 00 00 00 66 90 53 48 8b 47 08 48 89 fb 0f 1f 84 00 00 00 00 00 48 89 df <ff> 50 10 48 89 43 08 48 85 c0 75 f1 8b 83 24 06 00 00 85 c0 75 0f
Aug 10 11:04:04 pve kernel: traps: pveproxy worker[2026] general protection fault ip:5890559c61f7 sp:7fffce2a9d40 error:0 in perl[5890558dc000+195000]
Aug 10 11:04:09 pve kernel: traps: pveproxy worker[2032] general protection fault ip:5890559c61f7 sp:7fffce2a9d40 error:0 in perl[5890558dc000+195000]
Aug 10 11:04:14 pve kernel: traps: pveproxy worker[2063] general protection fault ip:5890559c61f7 sp:7fffce2a9d40 error:0 in perl[5890558dc000+195000]
Aug 10 11:04:19 pve kernel: traps: pveproxy worker[2066] general protection fault ip:5890559c61f7 sp:7fffce2a9d40 error:0 in perl[5890558dc000+195000]
Aug 10 11:04:24 pve kernel: traps: pveproxy worker[2089] general protection fault ip:5890559c61f7 sp:7fffce2a9d40 error:0 in perl[5890558dc000+195000]
Aug 10 11:04:29 pve kernel: traps: pveproxy worker[2093] general protection fault ip:5890559c61f7 sp:7fffce2a9d40 error:0 in perl[5890558dc000+195000]
Aug 10 11:04:34 pve kernel: traps: pveproxy worker[2116] general protection fault ip:5890559c61f7 sp:7fffce2a9d40 error:0 in perl[5890558dc000+195000]
 
So for some reason, when I booted the server after 24 hours, it was working just fine. I was able to create and boot a VM and run a CPU benchmark without any issues. After 30 minutes of runtime, I decided to reboot the PC to see if the issue was fully fixed. The PC froze during the reboot, and I had to hold the power button to turn it off and on. I then accessed the web UI and turned on a VM. However, the vncproxy failed to connect to the VM, and then the PC crashed.

The log messages:
Bash:
Aug 11 21:06:12 pve kernel:  entry_SYSCALL_64_after_hwframe+0x73/0x7b
Aug 11 21:06:12 pve kernel: RIP: 0033:0x77de7cfd2c5b
Aug 11 21:06:12 pve kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Aug 11 21:06:12 pve kernel: RSP: 002b:000077de721faf30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 11 21:06:12 pve kernel: RAX: ffffffffffffffda RBX: 00006464d04eee30 RCX: 000077de7cfd2c5b
Aug 11 21:06:12 pve kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000022
Aug 11 21:06:12 pve kernel: RBP: 000000000000ae80 R08: 00006464ce4cec90 R09: 0000000000000000
Aug 11 21:06:12 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Aug 11 21:06:12 pve kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
Aug 11 21:06:12 pve kernel:  </TASK>
Aug 11 21:06:12 pve kernel: BUG: Bad page state in process CPU 0/KVM  pfn:276d23
Aug 11 21:06:12 pve kernel: page:00000000fb5aca39 refcount:0 mapcount:0 mapping:00000000e235f79b index:0x0 pfn:0x276d23
Aug 11 21:06:12 pve kernel: invalid mapping:2000000000000000
Aug 11 21:06:12 pve kernel: flags: 0x17ffffe0000000(node=0|zone=2|lastcpupid=0x3fffff)
Aug 11 21:06:12 pve kernel: page_type: 0xffffffff()
Aug 11 21:06:12 pve kernel: raw: 0017ffffe0000000 ffffe06b89db48c8 ffffe06b89db48c8 2000000000000000
Aug 11 21:06:12 pve kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
Aug 11 21:06:12 pve kernel: page dumped because: non-NULL mapping
Aug 11 21:06:12 pve kernel: Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables sunrpc bonding tls softdog nfnetlink_log nfnetlink binfmt_misc intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd nouveau kvm snd_hda_codec_realtek mt7921e snd_hda_codec_generic mt7921_common snd_hda_codec_hdmi mt792x_lib mxm_wmi irqbypass drm_gpuvm mt76_connac_lib drm_exec crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_intel gpu_sched mt76 drm_ttm_helper snd_intel_dspcfg snd_intel_sdw_acpi ghash_clmulni_intel snd_hda_codec ttm sha256_ssse3 sha1_ssse3 mac80211 drm_display_helper aesni_intel snd_hda_core snd_hwdep crypto_simd snd_pcm cryptd cec cfg80211 snd_timer rc_core snd i2c_algo_bit video soundcore ccp libarc4 rapl pcspkr wmi_bmof k10temp mac_hid vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c crc32_pclmul r8169 ahci i2c_piix4 realtek libahci wmi
Aug 11 21:06:12 pve kernel:  gpio_amdpt
Aug 11 21:06:12 pve kernel: CPU: 13 PID: 2264 Comm: CPU 0/KVM Tainted: P    B      O       6.8.4-2-pve #1
Aug 11 21:06:12 pve kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C91/MAG B550 TOMAHAWK MAX WIFI (MS-7C91), BIOS 2.60 10/10/2023
Aug 11 21:06:12 pve kernel: Call Trace:
Aug 11 21:06:12 pve kernel:  <TASK>
Aug 11 21:06:12 pve kernel:  dump_stack_lvl+0x48/0x70
Aug 11 21:06:12 pve kernel:  dump_stack+0x10/0x20
Aug 11 21:06:12 pve kernel:  bad_page+0x76/0x120
Aug 11 21:06:12 pve kernel:  __rmqueue_pcplist+0x218/0x8c0
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  get_page_from_freelist+0x674/0x1200
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? __mod_memcg_state+0x71/0x130
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? memcg_account_kmem+0x1e/0x60
Aug 11 21:06:12 pve kernel:  __alloc_pages+0x251/0x1320
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? __alloc_pages+0x286/0x1320
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? release_pages+0x152/0x4c0
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? policy_nodemask+0xe1/0x150
Aug 11 21:06:12 pve kernel:  alloc_pages_mpol+0x101/0x1f0
Aug 11 21:06:12 pve kernel:  vma_alloc_folio+0x64/0xe0
Aug 11 21:06:12 pve kernel:  do_huge_pmd_anonymous_page+0xbb/0x740
Aug 11 21:06:12 pve kernel:  __handle_mm_fault+0xbe1/0xef0
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  handle_mm_fault+0x18d/0x380
Aug 11 21:06:12 pve kernel:  __get_user_pages+0x149/0x6c0
Aug 11 21:06:12 pve kernel:  get_user_pages_unlocked+0xe8/0x370
Aug 11 21:06:12 pve kernel:  hva_to_pfn+0xb6/0x540 [kvm]
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  __gfn_to_pfn_memslot+0xb5/0x150 [kvm]
Aug 11 21:06:12 pve kernel:  kvm_faultin_pfn+0x123/0x670 [kvm]
Aug 11 21:06:12 pve kernel:  kvm_tdp_page_fault+0x11c/0x170 [kvm]
Aug 11 21:06:12 pve kernel:  kvm_mmu_do_page_fault+0x1b4/0x1f0 [kvm]
Aug 11 21:06:12 pve kernel:  kvm_mmu_page_fault+0x90/0x700 [kvm]
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? kvm_pmu_trigger_event+0x56/0x160 [kvm]
Aug 11 21:06:12 pve kernel:  npf_interception+0x47/0xc0 [kvm_amd]
Aug 11 21:06:12 pve kernel:  svm_invoke_exit_handler+0x183/0x1b0 [kvm_amd]
Aug 11 21:06:12 pve kernel:  svm_handle_exit+0xa2/0x200 [kvm_amd]
Aug 11 21:06:12 pve kernel:  ? svm_vcpu_run+0x2cb/0x830 [kvm_amd]
Aug 11 21:06:12 pve kernel:  kvm_arch_vcpu_ioctl_run+0xd5b/0x1760 [kvm]
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  kvm_vcpu_ioctl+0x297/0x800 [kvm]
Aug 11 21:06:12 pve kernel:  ? kvm_vcpu_ioctl+0x30e/0x800 [kvm]
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? kvm_on_user_return+0x78/0xd0 [kvm]
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? kvm_on_user_return+0x78/0xd0 [kvm]
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? fire_user_return_notifiers+0x3a/0x80
Aug 11 21:06:12 pve kernel:  __x64_sys_ioctl+0xa3/0xf0
Aug 11 21:06:12 pve kernel:  do_syscall_64+0x87/0x180
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? kvm_on_user_return+0x78/0xd0 [kvm]
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? fire_user_return_notifiers+0x3a/0x80
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? syscall_exit_to_user_mode+0x86/0x260
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  ? do_syscall_64+0x93/0x180
Aug 11 21:06:12 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 11 21:06:12 pve kernel:  entry_SYSCALL_64_after_hwframe+0x73/0x7b
Aug 11 21:06:12 pve kernel: RIP: 0033:0x77de7cfd2c5b
Aug 11 21:06:12 pve kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Aug 11 21:06:12 pve kernel: RSP: 002b:000077de733faf30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 11 21:06:12 pve kernel: RAX: ffffffffffffffda RBX: 00006464d04be750 RCX: 000077de7cfd2c5b
Aug 11 21:06:12 pve kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000020
Aug 11 21:06:12 pve kernel: RBP: 000000000000ae80 R08: 00006464ce4cec90 R09: 0000000000000000
Aug 11 21:06:12 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Aug 11 21:06:12 pve kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Aug 11 21:06:12 pve kernel:  </TASK>
Aug 11 21:06:12 pve kernel: BUG: Bad page state in process CPU 0/KVM  pfn:277882
 
The PC froze during the reboot, and I had to hold the power button to turn it off and on. I then accessed the web UI and turned on a VM. However, the vncproxy failed to connect to the VM, and then the PC crashed.

I believe you should be able to at least access SSH, since that works for you (and gives you proper text). Do you mind simply posting the whole "faulty boot" log: journalctl -b -1 > out.log

This gets you the whole log, from boot-up until the crash, of the previous boot (if it was the boot before that, use -2, etc.). If you lose track after multiple reboots, you can check the list with: journalctl --list-boots
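A minimal sketch of the log collection described above (which boot index you need depends on your reboot history):

Bash:
# List all recorded boots with their indices and time ranges
journalctl --list-boots
# Export the complete journal of the previous (crashed) boot to a file
journalctl -b -1 > out.log
# If the crash was two boots ago, use -2 instead
journalctl -b -2 > out.log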

In the meantime, I can only guess, but ...

Bash:
Aug 10 11:03:59 pve kernel: pveproxy worker[1512]: segfault at 58d05811c430 ip 00005890559acf23 sp 00007fffce2a9da0 error 4 in perl[5890558dc000+195000] likely on CPU 1 (core 1, socket 0)

What version of PVE are you running, how long ago did you install it, and how did you update it (apt full-upgrade, dist-upgrade, or via the GUI)?

Someone else coming around would probably also suggest running a full memtest [1] on the RAM, as it's low-hanging fruit (even if it keeps you waiting through it ;)). I personally would start by removing all but one RAM module (if you can reproduce the crash that easily) and give it a shot. If it still fails, swap in another module and try again. You get the idea ...

[1] https://www.memtest.org/
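As a rough complement to the bootable memtest (not a replacement for it), an in-OS pass with the memtester package can sometimes flag a bad module while the system is still up; this sketch assumes the package is installed and that about 4 GB of RAM is free to lock:

Bash:
# Install the userspace memory tester (Debian/PVE)
apt install memtester
# Lock and test 4 GB of RAM for 3 passes; any errors are printed per test
memtester 4G 3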
 
I believe you should be able to at least access SSH, since that works for you (and gives you proper text). Do you mind simply posting the whole "faulty boot" log: journalctl -b -1 > out.log

This gets you the whole log, from boot-up until the crash, of the previous boot (if it was the boot before that, use -2, etc.). If you lose track after multiple reboots, you can check the list with: journalctl --list-boots

In the meantime, I can only guess, but ...

What version of PVE are you running, how long ago did you install it, and how did you update it (apt full-upgrade, dist-upgrade, or via the GUI)?

Someone else coming around would probably also suggest running a full memtest [1] on the RAM, as it's low-hanging fruit (even if it keeps you waiting through it ;)). I personally would start by removing all but one RAM module (if you can reproduce the crash that easily) and give it a shot. If it still fails, swap in another module and try again. You get the idea ...

[1] https://www.memtest.org/

Now the system won't boot at all and is stuck on this screen, so I'm unable to SSH into the machine. I never ran apt update because I had reinstalled the image so many times (to rule out a corrupt install) that I forgot to update it. However, I was able to get memtest86+ to run, and I accidentally left it running for almost the entire day.
 

Attachments

  • Compress_20240814_124247_7299.jpg (912.3 KB)
  • Compress_20240814_124247_7963.jpg (877.7 KB)
Now the system won't boot at all and is stuck on this screen

I would install it on plain ext4 at first, on one of the SSDs. Pick the one with less "history" and go from there. At least you can check the SMART values on the drives once you have a running system.
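A quick sketch of reading those SMART values once a system is running; the device names /dev/sda and /dev/sdb are assumptions, adjust them to your drives:

Bash:
# Install smartmontools if it is not already present
apt install smartmontools
# Dump SMART health, attributes and error logs for each SSD
smartctl -a /dev/sda
smartctl -a /dev/sdb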
 
Someone else coming around would probably also suggest running a full memtest [1] on the RAM, as it's low-hanging fruit (even if it keeps you waiting through it ;)). I personally would start by removing all but one RAM module (if you can reproduce the crash that easily) and give it a shot. If it still fails, swap in another module and try again. You get the idea ...

[1] https://www.memtest.org/

So I noticed that memtest reported the RAM with a different name than what I bought. I bought a new RAM stick, and the machine hasn't crashed in 24 hours. So my problem was faulty RAM. Thanks for the help though.
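For anyone landing here later: a hedged way to cross-check what the board actually reports for the installed DIMMs against what you bought (needs root; output format varies by BIOS):

Bash:
# Show size, speed, manufacturer and part number of each populated DIMM slot
dmidecode -t memory | grep -E 'Size|Speed|Manufacturer|Part Number'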
 