Assistance With Possible Hardware Problems

ensall

Member
Dec 28, 2020
My server runs on hardware that's 5-6 years old (apart from the drives) and has been running 24/7, and I'm starting to see outages/crashes roughly every 9 days. One just happened: the VMs slowly started locking up, and once I got them powered down I was getting systemd timeouts and couldn't power them back up until I power-cycled the server. Before the power cycle I saw a stack trace from CPU 9 reporting a tainted kernel, so I'd like some help digging into this to see whether it's time to look at new hardware. Everything is desktop-grade, since I don't want the power draw or noise of server-grade components.

Code:
root@proxmox:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-helper: 7.3-6
pve-kernel-5.15: 7.3-2
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.39-3-pve: 5.15.39-3
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-6
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-3
pve-qemu-kvm: 7.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

Here's the stack trace I mentioned from CPU 9:
Code:
Mar 16 16:34:15 proxmox kernel: [842475.449242] PGD 0 P4D 0
Mar 16 16:34:15 proxmox kernel: [842475.451512] Oops: 0000 [#2] SMP NOPTI
Mar 16 16:34:15 proxmox kernel: [842475.453755] CPU: 9 PID: 315308 Comm: iou-wrk-9392 Tainted: P      D W  O      5.15.83-1-pve #1
Mar 16 16:34:15 proxmox kernel: [842475.455979] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P5.10 12/17/2018
Mar 16 16:34:15 proxmox kernel: [842475.458185] RIP: 0010:dm_io_dec_pending+0x158/0x250
Mar 16 16:34:15 proxmox kernel: [842475.460363] Code: d0 00 00 00 e8 69 5b 6e ff 48 8b 44 24 10 65 48 2b 04 25 28 00 00 00 0f 85 f1 00 00 00 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f <5d> e9 32 88 60 00 45 84 f6 74 05 45 88 74 24 1a 4c 89 e7 e8 f0 d0
Mar 16 16:34:15 proxmox kernel: [842475.466872] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff927fe12722c0
Mar 16 16:34:15 proxmox kernel: [842475.473082] R13: 0000000000000000 R14: ffff927ac3e30578 R15: ffff927a5f8be418
Mar 16 16:34:15 proxmox kernel: [842475.484758]  ? dm_submit_bio+0x1bb/0x390
Mar 16 16:34:15 proxmox kernel: [842475.493650]  ? submit_bio+0x4d/0x140
Mar 16 16:34:15 proxmox kernel: [842475.501774]  ? blkdev_write_iter+0xb3/0x160
Mar 16 16:34:15 proxmox kernel: [842475.509133]  ? io_issue_sqe+0x401/0x1fd0
Mar 16 16:34:15 proxmox kernel: [842475.515719]  ? finish_task_switch.isra.0+0x7e/0x2b0
Mar 16 16:34:15 proxmox kernel: [842475.602329] Code: 14 49 0f ba b4 24 b8 01 00 00 11 49 0f ba b4 24 bc 01 00 00 11 49 8b 84 24 68 21 00 00 83 78 70 52 0f 84 d8 03 00 00 4c 89 e7 <e8> 3b d2 ff ff 41 8b 84 24 08 02 00 00 83 e0 20 0f 85 eb 01 00 00
Mar 16 16:34:15 proxmox kernel: [842475.603389] RSP: 0018:ffffb859495d7d08 EFLAGS: 00010012
Mar 16 16:34:15 proxmox kernel: [842475.604413] RAX: ffff927ae696a000 RBX: 0000000000000001 RCX: 0000000000000010
Mar 16 16:34:15 proxmox kernel: [842475.605404] RDX: ffff927a10d78000 RSI: 0000000000000010 RDI: ffff927a11c94300
Mar 16 16:34:15 proxmox kernel: [842475.606366] RBP: ffffb859495d7d50 R08: 0000000000000000 R09: 0000000000000000
Mar 16 16:34:15 proxmox kernel: [842475.607301] R10: 0000000000000000 R11: 0000000000000000 R12: ffff927ac37b26f0
Mar 16 16:34:15 proxmox kernel: [842475.608199] R13: ffffb8594932d000 R14: ffffb8594932d000 R15: 0000000000000009
Mar 16 16:34:15 proxmox kernel: [842475.609072] FS:  00007f2557dff700(0000) GS:ffff9280ff040000(0000) knlGS:0000000000000000
Mar 16 16:34:15 proxmox kernel: [842475.609920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 16 16:34:15 proxmox kernel: [842475.610742] CR2: 0000000000000000 CR3: 000000017842a000 CR4: 00000000003506e0
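The bracketed number at the start of each kernel message is the printk timestamp, i.e. roughly the number of seconds since boot. A small Python sketch (the helper name `uptime_days` is hypothetical) converts it to days, which makes it easy to compare with the roughly 9-day crash interval mentioned above:

```python
import re

SECONDS_PER_DAY = 86_400


def uptime_days(log_line: str) -> float:
    """Return days since boot from a kernel log line's printk timestamp.

    Assumes the usual "[<seconds>.<microseconds>]" prefix the kernel adds
    when printk timestamps are enabled (small values are space-padded).
    """
    match = re.search(r"\[\s*(\d+\.\d+)\]", log_line)
    if match is None:
        raise ValueError("no printk timestamp found in line")
    return float(match.group(1)) / SECONDS_PER_DAY


line = "Mar 16 16:34:15 proxmox kernel: [842475.451512] Oops: 0000 [#2] SMP NOPTI"
print(f"{uptime_days(line):.2f} days since boot")
```

For the oops above this works out to about 9.75 days of uptime, which lines up with the interval between crashes; that is a correlation worth keeping in mind rather than a diagnosis.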

Server specs:
Ryzen 7 1700
32GB of G.Skill memory
ASRock X370 Taichi
1 NVMe drive
2 spinning disks
1 SATA SSD

Let me know what other info you need and I'll be happy to provide it. I appreciate any assistance.
 
The kernel is tainted because of ZFS, which is not a problem. It looks like a drive issue, because of the blkdev_write_iter in the trace. Hope you have backups. Check the SMART values and run a long self-test (on all drives).
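A minimal Python sketch of the SMART check suggested here, assuming smartmontools 7.x, which can emit JSON via `smartctl --json`. The function names and the choice of attributes to flag are illustrative assumptions: reallocated, pending, and offline-uncorrectable sector counts for SATA drives, and media errors or critical warnings for NVMe. The long self-test itself is started with `smartctl -t long /dev/<disk>`, and its result is listed by `smartctl -l selftest /dev/<disk>` once it completes.

```python
import json
import subprocess

# ATA attributes whose non-zero raw value commonly indicates failing media.
# This selection is an assumption for illustration, not an exhaustive check.
ATA_WATCH = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def smart_warnings(report: dict) -> list[str]:
    """Return human-readable warnings from a parsed `smartctl --json -a` report."""
    warnings = []
    if report.get("smart_status", {}).get("passed") is False:
        warnings.append("overall SMART health check FAILED")

    # SATA drives: check the raw values of the selected attributes.
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        attr_id = attr.get("id")
        raw = attr.get("raw", {}).get("value", 0)
        if attr_id in ATA_WATCH and raw > 0:
            name = attr.get("name", ATA_WATCH[attr_id])
            warnings.append(f"{name} raw value is {raw}")

    # NVMe drives report health in a separate log section.
    nvme = report.get("nvme_smart_health_information_log", {})
    if nvme.get("media_errors", 0) > 0:
        warnings.append(f"NVMe media errors: {nvme['media_errors']}")
    if nvme.get("critical_warning", 0) != 0:
        warnings.append(f"NVMe critical warning bits: {nvme['critical_warning']}")
    return warnings


def check_drive(device: str) -> list[str]:
    """Run smartctl on one device (needs root) and return its warnings."""
    # smartctl's exit status is a bit mask that is non-zero for many
    # non-fatal conditions, so a non-zero code is not treated as an error.
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return smart_warnings(json.loads(result.stdout))
```

On the host this would be called once per disk, e.g. `check_drive("/dev/sda")` or `check_drive("/dev/nvme0")` (example device paths); an empty list only means none of the selected indicators are set, not that the drive is healthy.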
 
@leesteken I should mention that I'm only using LVM, not ZFS. Would that change anything about what I should look for, or should I just continue with the long SMART tests?
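On the ZFS question: the `P` and `O` taint flags depend on which modules are loaded, not on which storage the VMs live on, while `D` and `W` record that an earlier oops or warning had already happened (the log also shows `Oops: 0000 [#2]`). A minimal Python sketch, assuming the standard sysfs layout where each loaded module exposes its taint letters in `/sys/module/<name>/taint` (empty if it doesn't taint the kernel). The flag meanings follow the kernel's "Tainted kernels" documentation for 5.15-era kernels; the function names are hypothetical.

```python
import re
from pathlib import Path

# Single-letter taint flags as described in the kernel's admin-guide
# "Tainted kernels" page (5.15-era flag set).
TAINT_FLAGS = {
    "G": "all loaded modules are GPL-compatible",
    "P": "proprietary (non-GPL) module loaded",
    "F": "module was force-loaded",
    "S": "running on an out-of-specification system",
    "R": "module was force-unloaded",
    "M": "processor reported a machine check exception",
    "B": "bad page referenced or unexpected page flags",
    "U": "taint requested by userspace",
    "D": "kernel died recently (OOPS or BUG)",
    "A": "ACPI table overridden by user",
    "W": "kernel issued a warning",
    "C": "staging driver loaded",
    "I": "workaround for platform firmware bug applied",
    "O": "externally-built (out-of-tree) module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
    "K": "kernel has been live patched",
    "X": "auxiliary taint (distribution-defined)",
    "T": "kernel built with the struct randomization plugin",
}


def decode_taint(oops_line: str) -> dict[str, str]:
    """Map each taint letter in a 'Tainted: ...' oops header to its meaning."""
    match = re.search(r"Tainted: ([A-Z ]+)", oops_line)
    if match is None:
        return {}  # e.g. the header says "Not tainted"
    letters = match.group(1).split()
    return {flag: TAINT_FLAGS.get(flag, "unknown flag") for flag in letters}


def tainting_modules(sys_module: Path = Path("/sys/module")) -> dict[str, str]:
    """Return {module: taint letters} for loaded modules that taint the kernel."""
    found = {}
    for taint_file in sorted(sys_module.glob("*/taint")):
        letters = taint_file.read_text().strip()
        if letters:
            found[taint_file.parent.name] = letters
    return found


header = "CPU: 9 PID: 315308 Comm: iou-wrk-9392 Tainted: P      D W  O      5.15.83-1-pve #1"
for flag, meaning in decode_taint(header).items():
    print(f"{flag}: {meaning}")
```

Running `tainting_modules()` on the host lists the modules responsible for `P`/`O`; if `zfs` appears there, it accounts for those two flags even though the VMs sit on LVM.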
 
Though I'm wondering if it's the external SSD I've passed through to a VM, which I'm using as a cache drive for the moment. I may need to stop doing that, since I know the drive is old and has had tons of data go through it over the past 8 years.
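To put a number on how much has been written to that SSD, many SATA SSDs expose a cumulative host-writes counter in SMART (shown by `smartctl -A`), often attribute 241 `Total_LBAs_Written`; a USB enclosure may need smartctl's `-d sat` option to expose it. A rough Python conversion to terabytes written follows. It assumes the counter is in 512-byte units, which is common but vendor-specific (some drives, for example, report `Host_Writes_32MiB` instead), so check the drive's documentation. The function name and the raw value in the example are hypothetical; the result can be compared against the drive's rated endurance (TBW).

```python
BYTES_PER_TB = 10**12  # vendors quote TBW ratings in decimal terabytes


def terabytes_written(raw_value: int, unit_bytes: int = 512) -> float:
    """Convert a cumulative SMART host-writes counter to decimal terabytes.

    unit_bytes is an assumption: 512 for a typical Total_LBAs_Written
    counter, 32 * 1024**2 for drives that report Host_Writes_32MiB.
    """
    if raw_value < 0 or unit_bytes <= 0:
        raise ValueError("counter must be non-negative and unit size positive")
    return raw_value * unit_bytes / BYTES_PER_TB


# Hypothetical raw counter value, for illustration only.
print(f"{terabytes_written(100_000_000_000):.1f} TB written")
```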
 
