Bulk shutdown after start - Hardware change

CitizenKanedk

New Member
Aug 15, 2024
I've been running Proxmox without a hitch for half a year, then changed the mobo, PSU and CPU. After changing the adapter name everything seems to run smoothly until I start a VM. After about 30 seconds Proxmox initiates a bulk shutdown and powers off. Does anyone know what this is about?

Cheers guys!
 
This time I turned off my USB keyboard and received this:

Aug 15 21:37:53 Server systemd-logind[717]: Power key pressed short.
Aug 15 21:37:53 Server systemd-logind[717]: Powering off...
Aug 15 21:37:53 Server systemd-logind[717]: System is powering down.
Aug 15 21:37:53 Server systemd[1]: 100.scope: Deactivated successfully.
Aug 15 21:37:53 Server systemd[1]: Stopped 100.scope.
Aug 15 21:37:53 Server systemd[1]: 100.scope: Consumed 1min 19.201s CPU time.
Aug 15 21:37:53 Server systemd[1]: Stopping session-1.scope - Session 1 of User root...
Aug 15 21:37:53 Server systemd[1]: Removed slice qemu.slice - Slice /qemu.
Aug 15 21:37:53 Server systemd[1]: qemu.slice: Consumed 1min 19.202s CPU time.
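
The first three lines show systemd-logind registering a short press of the power key and then starting the shutdown itself, so a power-button event (or something the board reports as one) looks worth ruling out. A minimal Python sketch for pulling such events out of saved journal text; the function name and the sample list are illustrative only, not part of any Proxmox tool:

```python
import re

# Matches journal lines such as:
# "Aug 15 21:37:53 Server systemd-logind[717]: Power key pressed short."
POWER_KEY = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ systemd-logind\[\d+\]: "
    r"Power key pressed (?P<kind>\w+)\."
)

def find_power_key_events(lines):
    """Return (timestamp, press kind) for every power-key line found."""
    events = []
    for line in lines:
        m = POWER_KEY.match(line.strip())
        if m:
            events.append((m.group("stamp"), m.group("kind")))
    return events

# Two lines copied from the log above.
sample = [
    "Aug 15 21:37:53 Server systemd-logind[717]: Power key pressed short.",
    "Aug 15 21:37:53 Server systemd-logind[717]: Powering off...",
]
print(find_power_key_events(sample))
```

If the button or the front-panel header does turn out to be the trigger, the HandlePowerKey setting in /etc/systemd/logind.conf controls how logind reacts to such a press.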

And I get this error continuously:
Aug 15 21:45:38 Server pmxcfs[914]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/102: -1
Aug 15 21:45:38 Server pmxcfs[914]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/102: /var/lib/rrdcached/db/pve2-vm/102: illegal attempt to update using time 1723751138 when last update time is 1723751139 (minimum one second step)
Aug 15 21:45:38 Server pmxcfs[914]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/Server/local-btrfs: -1
Aug 15 21:45:38 Server pmxcfs[914]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/Server/local-btrfs: /var/lib/rrdcached/db/pve2-storage/Server/local-btrfs: illegal attempt to update using time 1723751138 when last update time is 1723751139 (minimum one second step)
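
The two numbers in the RRD message are Unix timestamps. Decoding them shows that the rejected update is stamped one second earlier than the last stored one, which is exactly what the "minimum one second step" complaint is about. A small sketch (the helper name is made up for illustration):

```python
from datetime import datetime, timezone

def describe_rrd_times(update_ts, last_ts):
    """Convert both epoch values to UTC and report how far the update lags."""
    update = datetime.fromtimestamp(update_ts, tz=timezone.utc)
    last = datetime.fromtimestamp(last_ts, tz=timezone.utc)
    return update.isoformat(), last.isoformat(), last_ts - update_ts

# Values copied from the log line above.
print(describe_rrd_times(1723751138, 1723751139))
# → ('2024-08-15T19:45:38+00:00', '2024-08-15T19:45:39+00:00', 1)
```

19:45:38 UTC lines up with the 21:45:38 in the journal if the host runs on UTC+2.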
 
Now 30 secs of this:

Aug 15 22:06:42 Server pvedaemon[1050]: Bad symbol for filehandle at /usr/lib/x86_64-linux-gnu/perl5/5.36/AnyEvent.pm line 1384.
Aug 15 22:06:42 Server pvedaemon[1052]: Bad symbol for filehandle at /usr/lib/x86_64-linux-gnu/perl5/5.36/AnyEvent.pm line 1384.
Aug 15 22:06:42 Server pvedaemon[1051]: Bad symbol for filehandle at /usr/lib/x86_64-linux-gnu/perl5/5.36/AnyEvent.pm line 1384.
Aug 15 22:06:42 Server pvedaemon[1050]: Bad symbol for filehandle at /usr/lib/x86_64-linux-gnu/perl5/5.36/AnyEvent.pm line 1384.
Aug 15 22:07:10 Server systemd-journald[391]: Suppressed 1493274 messages from pvedaemon.service
Aug 15 22:07:10 Server pvedaemon[1050]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1051]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1052]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1051]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1050]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1052]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1052]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1050]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1051]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server pvedaemon[1050]: Can't locate object method "new" via package "SelectSaver" (perhaps you forgot to load "SelectSaver"?) at /usr/lib/x86_64-linux-gnu/perl-base/IO/Handle.pm line 214.
Aug 15 22:07:10 Server pvedaemon[1052]: Can't use an undefined value as a symbol reference at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1879.
Aug 15 22:07:10 Server systemd-journald[391]: Suppressed 727892 messages from pvedaemon.service
 
And why are my SSDs reporting crazy temperatures that can't be real?

Aug 15 22:09:55 Server smartd[700]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 70
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 70
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 69 to 64
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 59
 
Okay, I'll just do the Windows fix and reinstall it, since it just blew up my VM:

Aug 15 22:37:57 Server kernel: BUG: unable to handle page fault for address: ffffffffc18466b0
Aug 15 22:37:57 Server kernel: #PF: supervisor write access in kernel mode
Aug 15 22:37:57 Server kernel: #PF: error_code(0x0003) - permissions violation
Aug 15 22:37:57 Server kernel: PGD 8ea23b067 P4D 8ea23b067 PUD 8ea23d067 PMD 10f932067 PTE 107368121
Aug 15 22:37:57 Server kernel: Oops: 0003 [#6] PREEMPT SMP NOPTI
Aug 15 22:37:57 Server kernel: CPU: 3 PID: 18981 Comm: sh Tainted: P D O 6.8.8-2-pve #1
Aug 15 22:37:57 Server kernel: Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS P3.40 01/18/2024
Aug 15 22:37:57 Server kernel: RIP: 0010:vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: Code: 41 b8 11 00 00 00 48 89 df e8 1c 9e e0 ff e9 6d f1 ff ff 44 89 f2 81 c6 b0 02 00 00 45 89 e9 31 c9 81 ca 00 31 20 00 41 b8 11 <00> 00 00 48 89 df e8 f5 9d e0 ff e9 e3 e4 ff ff 81 c6 b0 02 00 00
Aug 15 22:37:57 Server kernel: RSP: 0018:ffffbae45306bee8 EFLAGS: 00010086
Aug 15 22:37:57 Server kernel: RAX: ffffffffc18466b0 RBX: 0000000000450000 RCX: 0000000000000000
Aug 15 22:37:57 Server kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffdae43ef81fc0
Aug 15 22:37:57 Server kernel: RBP: ffffbae45306bef8 R08: 0000000000000000 R09: 0000000000000000
Aug 15 22:37:57 Server kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffbae45306bf58
Aug 15 22:37:57 Server kernel: R13: ffff970d13292900 R14: 0000000000000000 R15: 0000000000000000
Aug 15 22:37:57 Server kernel: FS: 000075e523d35740(0000) GS:ffff971bfdb80000(0000) knlGS:0000000000000000
Aug 15 22:37:57 Server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 15 22:37:57 Server kernel: CR2: ffffffffc18466b0 CR3: 0000000cc8774000 CR4: 0000000000f50ef0
Aug 15 22:37:57 Server kernel: PKRU: 55555554
Aug 15 22:37:57 Server kernel: Call Trace:
Aug 15 22:37:57 Server kernel: <TASK>
Aug 15 22:37:57 Server kernel: ? show_regs+0x6d/0x80
Aug 15 22:37:57 Server kernel: ? __die+0x24/0x80
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? page_fault_oops+0x176/0x500
Aug 15 22:37:57 Server kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Aug 15 22:37:57 Server kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? kernelmode_fixup_or_oops+0xb2/0x140
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? __bad_area_nosemaphore+0x1a5/0x270
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? bad_area_nosemaphore+0x16/0x30
Aug 15 22:37:57 Server kernel: ? do_kern_addr_fault+0x7b/0xa0
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? exc_page_fault+0x10d/0x1b0
Aug 15 22:37:57 Server kernel: ? asm_exc_page_fault+0x27/0x30
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: ? fire_user_return_notifiers+0x3a/0x80
Aug 15 22:37:57 Server kernel: syscall_exit_to_user_mode+0x18b/0x260
Aug 15 22:37:57 Server kernel: ret_from_fork+0x29/0x70
Aug 15 22:37:57 Server kernel: ret_from_fork_asm+0x1b/0x30
Aug 15 22:37:57 Server kernel: RIP: 0033:0x75e523e0c293
Aug 15 22:37:57 Server kernel: Code: Unable to access opcode bytes at 0x75e523e0c269.
Aug 15 22:37:57 Server kernel: RSP: 002b:00007ffd3ccddf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Aug 15 22:37:57 Server kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000075e523e0c293
Aug 15 22:37:57 Server kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Aug 15 22:37:57 Server kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
Aug 15 22:37:57 Server kernel: R10: 000075e523d35a10 R11: 0000000000000246 R12: 0000000000000001
Aug 15 22:37:57 Server kernel: R13: 000056746fe9d8d0 R14: 000056746fe9d8d0 R15: 00007ffd3ccde090
Aug 15 22:37:57 Server kernel: </TASK>
Aug 15 22:37:57 Server kernel: Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables bonding tls softdog sunrpc nfnetlink_log nfnetlink binfmt_misc intel_rapl_msr intel_rapl_common btusb btrtl edac_mce_amd btintel btbcm joydev kvm_amd btmtk bluetooth amdgpu ecdh_generic kvm ecc hid_generic snd_hda_codec_hdmi irqbypass crct10dif_pclmul polyval_clmulni snd_hda_intel amdxcp polyval_generic ghash_clmulni_intel drm_exec snd_intel_dspcfg sha256_ssse3 gpu_sched snd_intel_sdw_acpi sha1_ssse3 aesni_intel drm_buddy snd_hda_codec drm_suballoc_helper crypto_simd drm_ttm_helper cryptd snd_hda_core ttm snd_hwdep snd_pcm drm_display_helper snd_timer cec rapl snd rc_core wmi_bmof i2c_algo_bit pcspkr soundcore k10temp ccp usbhid hid mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c xhci_pci xhci_pci_renesas crc32_pclmul r8169
Aug 15 22:37:57 Server kernel: ahci i2c_piix4 realtek xhci_hcd libahci video wmi gpio_amdpt
Aug 15 22:37:57 Server kernel: CR2: ffffffffc18466b0
Aug 15 22:37:57 Server kernel: ---[ end trace 0000000000000000 ]---
Aug 15 22:37:57 Server kernel: RIP: 0010:vcn_v4_0_5_set_powergating_state+0x2190/0x2420 [amdgpu]
Aug 15 22:37:57 Server kernel: Code: 41 b8 11 00 00 00 48 89 df e8 1c 9e e0 ff e9 6d f1 ff ff 44 89 f2 81 c6 b0 02 00 00 45 89 e9 31 c9 81 ca 00 31 20 00 41 b8 11 <00> 00 00 48 89 df e8 f5 9d e0 ff e9 e3 e4 ff ff 81 c6 b0 02 00 00
Aug 15 22:37:57 Server kernel: RSP: 0018:ffffbae453043b88 EFLAGS: 00010086
Aug 15 22:37:57 Server kernel: RAX: ffffffffc18466b0 RBX: 0000000000450000 RCX: 0000000000000000
Aug 15 22:37:57 Server kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffdae43ef81fc0
Aug 15 22:37:57 Server kernel: RBP: ffffbae453043b98 R08: 0000000000000000 R09: 0000000000000000
Aug 15 22:37:57 Server kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffbae453043f58
Aug 15 22:37:57 Server kernel: R13: ffff970d1138d200 R14: 0000000000000000 R15: 0000000000000000
Aug 15 22:37:57 Server kernel: FS: 000075e523d35740(0000) GS:ffff971bfdb80000(0000) knlGS:0000000000000000
Aug 15 22:37:57 Server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 15 22:37:57 Server kernel: CR2: 000075e523e0c269 CR3: 0000000cc8774000 CR4: 0000000000f50ef0
Aug 15 22:37:57 Server kernel: PKRU: 55555554
Aug 15 22:37:57 Server kernel: note: sh[18981] exited with irqs disabled
Aug 15 22:37:57 Server kernel: note: sh[18981] exited with preempt_count 1
Aug 15 22:37:57 Server kernel: BUG: unable to handle page fault for address: ffffffffc18466b0
Aug 15 22:37:57 Server kernel: #PF: supervisor write access in kernel mode
Aug 15 22:37:57 Server kernel: #PF: error_code(0x0003) - permissions violation
Aug 15 22:37:57 Server kernel: PGD 8ea23b067 P4D 8ea23b067 PUD 8ea23d067 PMD 10f932067 PTE 107368121
Aug 15 22:37:57 Server kernel: Oops: 0003 [#7] PREEMPT SMP NOPTI
Aug 15 22:37:57 Server kernel: CPU: 3 PID: 18993 Comm: ksmtuned Tainted: P D O 6.8.8-2-pve #1
Aug 15 22:37:57 Server kernel: Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS P3.40 01/18/2024
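
For reading the oops: the faulting instruction pointer sits in the amdgpu module (vcn_v4_0_5_set_powergating_state), and the error_code value says what kind of access failed. A rough Python sketch that decodes the low bits of an x86 page-fault error code; the function name is hypothetical and only the five commonly documented bits are handled:

```python
def decode_page_fault_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # bit 0: set = protection violation, clear = page not present
        "cause": "permissions violation" if code & 0x1 else "not-present page",
        # bit 1: set = write access, clear = read access
        "access": "write" if code & 0x2 else "read",
        # bit 2: set = fault raised in user mode, clear = supervisor mode
        "mode": "user" if code & 0x4 else "supervisor",
        # bit 3: a reserved bit was set in a page-table entry
        "reserved_bit": bool(code & 0x8),
        # bit 4: the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }

# Value copied from "#PF: error_code(0x0003)" in the trace above.
print(decode_page_fault_error(0x0003))
```

For 0x0003 this gives a supervisor-mode write that hit a permissions violation, which agrees with the two "#PF:" lines in the trace.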
 
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 70
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 70
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 69 to 64
Aug 15 22:09:55 Server smartd[700]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 59
Those values probably aren't temperatures in degrees Celsius but the normalized "health" value for that attribute, which runs from 0 to 100 or the reverse. Check the "smartctl -a ..." output, and have a look at the disk manufacturer's datasheet in case there is one that explains how to interpret the SMART attributes.
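
To illustrate the point: as far as I know smartd logs changes in the normalized attribute value unless told to include raw values, and some drives encode temperature as 100 minus degrees Celsius. Whether these particular disks do that is an assumption, so treat the sketch below as a guess to verify against the raw value in the smartctl output. Function names are made up for the example:

```python
import re

# Matches the smartd "changed from X to Y" lines quoted above.
LINE = re.compile(
    r"Device: (?P<dev>\S+) .*Attribute: (?P<id>\d+) (?P<name>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)"
)

def parse_smartd_change(line):
    """Return (device, attribute id, name, old value, new value) or None."""
    m = LINE.search(line)
    if not m:
        return None
    return (m.group("dev"), int(m.group("id")), m.group("name"),
            int(m.group("old")), int(m.group("new")))

def guess_celsius(normalized):
    # ASSUMPTION: the drive reports normalized = 100 - temperature in Celsius.
    # Not every vendor does this; compare with the raw value from smartctl.
    return 100 - normalized

line = ("Aug 15 22:09:55 Server smartd[700]: Device: /dev/sda [SAT], SMART Usage "
        "Attribute: 194 Temperature_Celsius changed from 71 to 70")
dev, attr_id, name, old, new = parse_smartd_change(line)
print(dev, attr_id, name, guess_celsius(old), guess_celsius(new))
```

Under that assumption, "changed from 71 to 70" would mean the disk went from 29 to 30 degrees Celsius, which is a far more plausible reading than 70 degrees.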
 