My server runs 24/7 on hardware that is 5 to 6 years old (aside from the drives), and I'm starting to see outages/crashes roughly every 9 days. It just happened again: the VMs slowly started locking up, and once I got them powered down I was getting systemd timeouts and couldn't power them back up until I power cycled the server.

Before the power cycle I saw a stack trace from a kernel Oops on CPU 9, with the kernel flagged as tainted. I'd like some help digging into this to work out whether it's time to look into new hardware. Everything in this box is desktop-grade, since I don't want the power draw or noise of server-grade components.
Server specs:
Ryzen 7 1700
32GB of Gskill memory
Asrock x370 Taichi
1 NVME
2 Spinning disk
1 SATA SSD
Let me know what other info you need from me and I'll be happy to provide it. I appreciate any assistance.
Code:
root@proxmox:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-helper: 7.3-6
pve-kernel-5.15: 7.3-2
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.39-3-pve: 5.15.39-3
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-6
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-3
pve-qemu-kvm: 7.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
Here's the issue I mentioned with CPU 9
Code:
Mar 16 16:34:15 proxmox kernel: [842475.449242] PGD 0 P4D 0
Mar 16 16:34:15 proxmox kernel: [842475.451512] Oops: 0000 [#2] SMP NOPTI
Mar 16 16:34:15 proxmox kernel: [842475.453755] CPU: 9 PID: 315308 Comm: iou-wrk-9392 Tainted: P D W O 5.15.83-1-pve #1
Mar 16 16:34:15 proxmox kernel: [842475.455979] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P5.10 12/17/2018
Mar 16 16:34:15 proxmox kernel: [842475.458185] RIP: 0010:dm_io_dec_pending+0x158/0x250
Mar 16 16:34:15 proxmox kernel: [842475.460363] Code: d0 00 00 00 e8 69 5b 6e ff 48 8b 44 24 10 65 48 2b 04 25 28 00 00 00 0f 85 f1 00 00 00 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f <5d> e9 32 88 60 00 45 84 f6 74 05 45 88 74 24 1a 4c 89 e7 e8 f0 d0
Mar 16 16:34:15 proxmox kernel: [842475.466872] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff927fe12722c0
Mar 16 16:34:15 proxmox kernel: [842475.473082] R13: 0000000000000000 R14: ffff927ac3e30578 R15: ffff927a5f8be418
Mar 16 16:34:15 proxmox kernel: [842475.484758] ? dm_submit_bio+0x1bb/0x390
Mar 16 16:34:15 proxmox kernel: [842475.493650] ? submit_bio+0x4d/0x140
Mar 16 16:34:15 proxmox kernel: [842475.501774] ? blkdev_write_iter+0xb3/0x160
Mar 16 16:34:15 proxmox kernel: [842475.509133] ? io_issue_sqe+0x401/0x1fd0
Mar 16 16:34:15 proxmox kernel: [842475.515719] ? finish_task_switch.isra.0+0x7e/0x2b0
Mar 16 16:34:15 proxmox kernel: [842475.602329] Code: 14 49 0f ba b4 24 b8 01 00 00 11 49 0f ba b4 24 bc 01 00 00 11 49 8b 84 24 68 21 00 00 83 78 70 52 0f 84 d8 03 00 00 4c 89 e7 <e8> 3b d2 ff ff 41 8b 84 24 08 02 00 00 83 e0 20 0f 85 eb 01 00 00
Mar 16 16:34:15 proxmox kernel: [842475.603389] RSP: 0018:ffffb859495d7d08 EFLAGS: 00010012
Mar 16 16:34:15 proxmox kernel: [842475.604413] RAX: ffff927ae696a000 RBX: 0000000000000001 RCX: 0000000000000010
Mar 16 16:34:15 proxmox kernel: [842475.605404] RDX: ffff927a10d78000 RSI: 0000000000000010 RDI: ffff927a11c94300
Mar 16 16:34:15 proxmox kernel: [842475.606366] RBP: ffffb859495d7d50 R08: 0000000000000000 R09: 0000000000000000
Mar 16 16:34:15 proxmox kernel: [842475.607301] R10: 0000000000000000 R11: 0000000000000000 R12: ffff927ac37b26f0
Mar 16 16:34:15 proxmox kernel: [842475.608199] R13: ffffb8594932d000 R14: ffffb8594932d000 R15: 0000000000000009
Mar 16 16:34:15 proxmox kernel: [842475.609072] FS: 00007f2557dff700(0000) GS:ffff9280ff040000(0000) knlGS:0000000000000000
Mar 16 16:34:15 proxmox kernel: [842475.609920] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 16 16:34:15 proxmox kernel: [842475.610742] CR2: 0000000000000000 CR3: 000000017842a000 CR4: 00000000003506e0
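For anyone reading the trace, the letters after "Tainted:" are kernel taint flags rather than something specific to CPU 9, and the `[#2]` on the Oops line means this was the second Oops since boot. Below is a minimal sketch that decodes only the four letters seen above (`P`, `D`, `W`, `O`). The function name `decode_taint` is made up for illustration, and the kernel defines more flags than are mapped here.

```shell
#!/bin/sh
# Decode kernel taint letters copied from an Oops "Tainted:" field.
# Only the four letters from the trace above are mapped; this is a sketch,
# not the full table from the kernel documentation.
decode_taint() {
    # Strip spaces, then emit one letter per line so each can be matched.
    printf '%s\n' "$1" | tr -d ' ' | fold -w 1 | while read -r letter; do
        case "$letter" in
            "") continue ;;
            P) echo "P: proprietary module loaded" ;;
            D) echo "D: kernel died recently (an earlier Oops or BUG)" ;;
            W) echo "W: kernel issued a warning earlier" ;;
            O) echo "O: out-of-tree module loaded" ;;
            *) echo "$letter: not mapped in this sketch" ;;
        esac
    done
}

decode_taint "P D W O"
```

The `D` flag fits with this being the second Oops of that boot. On a live system the same information is available as a numeric bitmask in `/proc/sys/kernel/tainted`. My guess is that `P` and `O` come from the ZFS module, which is a common cause on Proxmox, but that is an assumption and not something the trace itself shows.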