Bulk shutdown after start - Hardware change

CitizenKanedk

Aug 15, 2024
Been running Proxmox without a hitch for half a year, then changed the mobo, PSU and CPU. After changing the adapter name everything seems to run smoothly until I start a VM. After about 30 seconds, Proxmox initiates a bulk shutdown and powers off. Does anyone know what this is about?

cheers guys!
 
This time I turned off my USB keyboard and received this:

Aug 15 21:37:53 Server systemd-logind[717]: Power key pressed short.
Aug 15 21:37:53 Server systemd-logind[717]: Powering off...
Aug 15 21:37:53 Server systemd-logind[717]: System is powering down.
Aug 15 21:37:53 Server systemd[1]: 100.scope: Deactivated successfully.
Aug 15 21:37:53 Server systemd[1]: Stopped 100.scope.
Aug 15 21:37:53 Server systemd[1]: 100.scope: Consumed 1min 19.201s CPU time.
Aug 15 21:37:53 Server systemd[1]: Stopping session-1.scope - Session 1 of User root...
Aug 15 21:37:53 Server systemd[1]: Removed slice qemu.slice - Slice /qemu.
Aug 15 21:37:53 Server systemd[1]: qemu.slice: Consumed 1min 19.202s CPU time.
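The log above suggests the shutdown was requested through a power key event seen by systemd-logind, not by Proxmox itself. A minimal sketch for pulling such events out of a saved journal dump follows; the function name and regex are my own (hypothetical), and it assumes the short syslog-style line format shown above. How logind reacts to the key is governed by the HandlePowerKey setting in logind.conf, so that may be worth checking too.

```python
import re

# Matches lines like:
#   Aug 15 21:37:53 Server systemd-logind[717]: Power key pressed short.
# Assumption: journal lines use the short syslog-style format shown in this thread.
POWER_KEY = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ "
    r"systemd-logind\[\d+\]: (?P<msg>Power key pressed.*)$"
)


def find_power_key_events(journal_text):
    """Return (timestamp, message) tuples for logind power key events."""
    events = []
    for line in journal_text.splitlines():
        match = POWER_KEY.match(line.strip())
        if match:
            events.append((match["ts"], match["msg"]))
    return events


# Two lines copied from the log in this post.
SAMPLE_JOURNAL = """\
Aug 15 21:37:53 Server systemd-logind[717]: Power key pressed short.
Aug 15 21:37:53 Server systemd-logind[717]: Powering off...
"""

if __name__ == "__main__":
    for ts, msg in find_power_key_events(SAMPLE_JOURNAL):
        print(ts, msg)
```

If every unexpected shutdown lines up with one of these events, the power button wiring or front panel header on the new board would be a reasonable first suspect.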

and I get this error continuously:
Aug 15 21:45:38 Server pmxcfs[914]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/102: -1
Aug 15 21:45:38 Server pmxcfs[914]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/102: /var/lib/rrdcached/db/pve2-vm/102: illegal attempt to update using time 1723751138 when last update time is 1723751139 (minimum one second step)
Aug 15 21:45:38 Server pmxcfs[914]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/Server/local-btrfs: -1
Aug 15 21:45:38 Server pmxcfs[914]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/Server/local-btrfs: /var/lib/rrdcached/db/pve2-storage/Server/local-btrfs: illegal attempt to update using time 1723751138 when last update time is 1723751139 (minimum one second step)
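To make the RRD message easier to read: the two numbers are Unix timestamps, and the rejected update is one second older than the last stored one. The small sketch below just does that arithmetic; the function name is hypothetical, and the reading that the journal's 21:45:38 is local time two hours ahead of UTC is my assumption.

```python
from datetime import datetime, timezone


def describe_rrd_step(update_ts, last_ts):
    """Return the update time as an ISO UTC string and its offset from the last update."""
    as_utc = datetime.fromtimestamp(update_ts, tz=timezone.utc).isoformat()
    delta_seconds = update_ts - last_ts
    return as_utc, delta_seconds


if __name__ == "__main__":
    # Values taken from the pmxcfs error message above.
    print(describe_rrd_step(1723751138, 1723751139))
```

A negative offset means the update was stamped earlier than the data already in the file, which is what the "minimum one second step" complaint is about.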
 
Now 30 secs of this:

Aug 15 22:06:42 Server pvedaemon[1050]: Bad symbol for filehandle at /usr/lib/x86_64-linux-gnu/perl5/5.36/AnyEvent.pm line 1384.
Aug 15 22:06:42 Server pvedaemon[1052]: Bad symbol for filehandle at /usr/lib/x86_64-linux-gnu/perl5/5.36/AnyEvent.pm line 1384.
Aug 15 22:06:42 Server pvedaemon[1051]: Bad symbol for filehandle at /usr/lib/x86_64-linux-gnu/perl5/5.36/AnyEvent.pm line 1384.
Aug 15 22:06:42 Server pvedaemon[1050]: Bad symbol for filehandle at /usr/lib/x86_64-linux-gnu/perl5/5.36/AnyEvent.pm line 1384.
Aug 15 22:07:10 Server systemd-journald[391]: Suppressed 1493274 messages from pvedaemon.service
Aug 15 22:07:10 Server pvedaemon[1050]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1051]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1052]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1051]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1050]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1052]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1052]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1050]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1051]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1050]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1052]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server systemd-journald[391]: Suppressed 727892 messages from pvedaemon.service
 
And why are my SSDs reporting crazy temperatures that can't be real?

Aug 15 22:09:55 Server smartd[700]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 70
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 70
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 69 to 64
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 59
 
Okay, I'll just do the Windows fix and reinstall it, since it just blew up my VM.

Aug 15 22:37:57 Server kernel: BUG: unable to handle page fault for address: ffffffffc18466b0
Aug 15 22:37:57 Server kernel: #PF: supervisor write access in kernel mode
Aug 15 22:37:57 Server kernel: #PF: error_code(0x0003) - permissions violation
Aug 15 22:37:57 Server kernel: PGD 8ea23b067 P4D 8ea23b067 PUD 8ea23d067 PMD 10f932067 PTE 107368121
Aug 15 22:37:57 Server kernel: Oops: 0003 [#6] PREEMPT SMP NOPTI
Aug 15 22:37:57 Server kernel: CPU: 3 PID: 18981 Comm: sh Tainted: P D O 6.8.8-2-pve #1
Aug 15 22:37:57 Server kernel: Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS P3.40 01/18/2024
Aug 15 22:37:57 Server kernel: RIP: 0010:vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: Code: 41 b8 11 00 00 00 48 89 df e8 1c 9e e0 ff e9 6d f1 ff ff 44 89 f2 81 c6 b0 02 00 00 45 89 e9 31 c9 81 ca 00 31 20 00 41 b8 11 <00> 00 00 48 89 df e8 f5 9d e0 ff e9 e3 e4 ff ff 81 c6 b0 02 00 00
Aug 15 22:37:57 Server kernel: RSP: 0018:ffffbae45306bee8 EFLAGS: 00010086
Aug 15 22:37:57 Server kernel: RAX: ffffffffc18466b0 RBX: 0000000000450000 RCX: 0000000000000000
Aug 15 22:37:57 Server kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffdae43ef81fc0
Aug 15 22:37:57 Server kernel: RBP: ffffbae45306bef8 R08: 0000000000000000 R09: 0000000000000000
Aug 15 22:37:57 Server kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffbae45306bf58
Aug 15 22:37:57 Server kernel: R13: ffff970d13292900 R14: 0000000000000000 R15: 0000000000000000
Aug 15 22:37:57 Server kernel: FS: 000075e523d35740(0000) GS:ffff971bfdb80000(0000) knlGS:0000000000000000
Aug 15 22:37:57 Server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 15 22:37:57 Server kernel: CR2: ffffffffc18466b0 CR3: 0000000cc8774000 CR4: 0000000000f50ef0
Aug 15 22:37:57 Server kernel: PKRU: 55555554
Aug 15 22:37:57 Server kernel: Call Trace:
Aug 15 22:37:57 Server kernel: <TASK>
Aug 15 22:37:57 Server kernel: ? show_regs+0x6d/0x80
Aug 15 22:37:57 Server kernel: ? __die+0x24/0x80
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? page_fault_oops+0x176/0x500
Aug 15 22:37:57 Server kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Aug 15 22:37:57 Server kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? kernelmode_fixup_or_oops+0xb2/0x140
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? __bad_area_nosemaphore+0x1a5/0x270
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? bad_area_nosemaphore+0x16/0x30
Aug 15 22:37:57 Server kernel: ? do_kern_addr_fault+0x7b/0xa0
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? exc_page_fault+0x10d/0x1b0
Aug 15 22:37:57 Server kernel: ? asm_exc_page_fault+0x27/0x30
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? fire_user_return_notifiers+0x3a/0x80
Aug 15 22:37:57 Server kernel: syscall_exit_to_user_mode+0x18b/0x260
Aug 15 22:37:57 Server kernel: ret_from_fork+0x29/0x70
Aug 15 22:37:57 Server kernel: ret_from_fork_asm+0x1b/0x30
Aug 15 22:37:57 Server kernel: RIP: 0033:0x75e523e0c293
Aug 15 22:37:57 Server kernel: Code: Unable to access opcode bytes at 0x75e523e0c269.
Aug 15 22:37:57 Server kernel: RSP: 002b:00007ffd3ccddf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Aug 15 22:37:57 Server kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000075e523e0c293
Aug 15 22:37:57 Server kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Aug 15 22:37:57 Server kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
Aug 15 22:37:57 Server kernel: R10: 000075e523d35a10 R11: 0000000000000246 R12: 0000000000000001
Aug 15 22:37:57 Server kernel: R13: 000056746fe9d8d0 R14: 000056746fe9d8d0 R15: 00007ffd3ccde090
Aug 15 22:37:57 Server kernel: </TASK>
Aug 15 22:37:57 Server kernel: Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables bonding tls softdog sunrpc nfnetlink_log nfnetlink binfmt_misc intel_rapl_msr intel_rapl_common btusb btrtl edac_mce_amd btintel btbcm joydev kvm_amd btmtk bluetooth amdgpu ecdh_generic kvm ecc hid_generic snd_hda_codec_hdmi irqbypass crct10dif_pclmul polyval_clmulni snd_hda_intel amdxcp polyval_generic ghash_clmulni_intel drm_exec snd_intel_dspcfg sha256_ssse3 gpu_sched snd_intel_sdw_acpi sha1_ssse3 aesni_intel drm_buddy snd_hda_codec drm_suballoc_helper crypto_simd drm_ttm_helper cryptd snd_hda_core ttm snd_hwdep snd_pcm drm_display_helper snd_timer cec rapl snd rc_core wmi_bmof i2c_algo_bit pcspkr soundcore k10temp ccp usbhid hid mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c xhci_pci xhci_pci_renesas crc32_pclmul r8169
Aug 15 22:37:57 Server kernel: ahci i2c_piix4 realtek xhci_hcd libahci video wmi gpio_amdpt
Aug 15 22:37:57 Server kernel: CR2: ffffffffc18466b0
Aug 15 22:37:57 Server kernel: ---[ end trace 0000000000000000 ]---
Aug 15 22:37:57 Server kernel: RIP: 0010:vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: Code: 41 b8 11 00 00 00 48 89 df e8 1c 9e e0 ff e9 6d f1 ff ff 44 89 f2 81 c6 b0 02 00 00 45 89 e9 31 c9 81 ca 00 31 20 00 41 b8 11 <00> 00 00 48 89 df e8 f5 9d e0 ff e9 e3 e4 ff ff 81 c6 b0 02 00 00
Aug 15 22:37:57 Server kernel: RSP: 0018:ffffbae453043b88 EFLAGS: 00010086
Aug 15 22:37:57 Server kernel: RAX: ffffffffc18466b0 RBX: 0000000000450000 RCX: 0000000000000000
Aug 15 22:37:57 Server kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffdae43ef81fc0
Aug 15 22:37:57 Server kernel: RBP: ffffbae453043b98 R08: 0000000000000000 R09: 0000000000000000
Aug 15 22:37:57 Server kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffbae453043f58
Aug 15 22:37:57 Server kernel: R13: ffff970d1138d200 R14: 0000000000000000 R15: 0000000000000000
Aug 15 22:37:57 Server kernel: FS: 000075e523d35740(0000) GS:ffff971bfdb80000(0000) knlGS:0000000000000000
Aug 15 22:37:57 Server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 15 22:37:57 Server kernel: CR2: 000075e523e0c269 CR3: 0000000cc8774000 CR4: 0000000000f50ef0
Aug 15 22:37:57 Server kernel: PKRU: 55555554
Aug 15 22:37:57 Server kernel: note: sh[18981] exited with irqs disabled
Aug 15 22:37:57 Server kernel: note: sh[18981] exited with preempt_count 1
Aug 15 22:37:57 Server kernel: BUG: unable to handle page fault for address: ffffffffc18466b0
Aug 15 22:37:57 Server kernel: #PF: supervisor write access in kernel mode
Aug 15 22:37:57 Server kernel: #PF: error_code(0x0003) - permissions violation
Aug 15 22:37:57 Server kernel: PGD 8ea23b067 P4D 8ea23b067 PUD 8ea23d067 PMD 10f932067 PTE 107368121
Aug 15 22:37:57 Server kernel: Oops: 0003 [#7] PREEMPT SMP NOPTI
Aug 15 22:37:57 Server kernel: CPU: 3 PID: 18993 Comm: ksmtuned Tainted: P D O 6.8.8-2-pve #1
Aug 15 22:37:57 Server kernel: Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS P3.40 01/18/2024
 
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 70
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 70
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 69 to 64
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 59
Those values probably aren't temperatures in degrees Celsius but the normalized "health" value for that attribute, going from 0 to 100 or the reverse. See the "smartctl -a ..." output and have a look at the disk manufacturer's datasheet, in case there is one that explains how to interpret the SMART attributes.
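To compare the two numbers side by side, here is a minimal sketch that reads the attribute table printed by smartctl and returns both the normalized VALUE column and the RAW_VALUE column. The function name and the sample row are hypothetical (made up for illustration, not taken from these disks); only the column layout follows the usual smartctl attribute table.

```python
def parse_smart_attributes(text):
    """Return {attribute_id: (name, normalized_value, raw_value)} from a smartctl attribute table."""
    attributes = {}
    for line in text.splitlines():
        # Split into at most 10 columns so RAW_VALUE keeps any embedded spaces.
        parts = line.split(None, 9)
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(parts) == 10 and parts[0].isdigit():
            attributes[int(parts[0])] = (parts[1], int(parts[3]), parts[9].strip())
    return attributes


# Hypothetical sample row, for illustration only.
SAMPLE_TABLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   070   051   000    Old_age   Always       -       30 (Min/Max 20/49)
"""

if __name__ == "__main__":
    for attr_id, (name, normalized, raw) in parse_smart_attributes(SAMPLE_TABLE).items():
        print(attr_id, name, "normalized:", normalized, "raw:", raw)
```

If the raw column on the real disks shows something plausible while the normalized value sits around 70, the smartd messages are most likely reporting the normalized figure.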
 