NVMe problems on HP EliteDesk 800 G2

Hi,

I have a node on an HP EliteDesk 800 G2 SFF with a Lexar NQ710 NVMe M.2 disk, using ZFS.

When I stress the disk (a local backup, for example, but also some operations inside a 150MB VM), IO delay and disk temperature spike until the system temporarily loses the connection to the NVMe. It recovers as soon as I stop the workload:

1732802061023.png

It works flawlessly if I do the backup to an external NFS storage, or through PBS.

Is there a way to narrow down where the problem is? The candidates I can think of:
- NVMe incompatible with the (old) mainboard
- faulty (old) mainboard or Linux driver
- faulty NVMe (quite new, though not a high-performance model)
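Since the disk temperature climbs before the drive drops out, logging it during the workload might help tell a thermal problem apart from the other candidates. Below is a minimal sketch, assuming the kernel exposes the drive through the hwmon interface as a device named "nvme" with a `temp1_input` file in millidegrees Celsius; the function name `read_nvme_temps` is made up for illustration and the sysfs layout may differ on other kernels.

```python
from pathlib import Path


def read_nvme_temps(hwmon_root="/sys/class/hwmon"):
    """Return {hwmon directory name: temperature in degrees C} for hwmon devices named 'nvme'.

    Assumption: each NVMe controller appears as <hwmon_root>/hwmonN with a
    'name' file containing 'nvme' and a 'temp1_input' file in millidegrees C.
    """
    temps = {}
    for dev in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = dev / "name"
        temp_file = dev / "temp1_input"
        # Skip entries that do not look like a temperature sensor at all.
        if not (name_file.is_file() and temp_file.is_file()):
            continue
        # Skip sensors that belong to other hardware (CPU, chipset, ...).
        if name_file.read_text().strip() != "nvme":
            continue
        # Convert millidegrees Celsius to degrees Celsius.
        temps[dev.name] = int(temp_file.read_text().strip()) / 1000.0
    return temps
```

Calling `read_nvme_temps()` every few seconds while the local backup runs, and noting the last value before the drive disappears, would show whether the dropouts always happen near the same temperature.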

I do not see much in the syslog or anywhere else during the spikes, only the system going down and coming back up when it loses control of the disk:

Code:
Nov 27 20:34:37 pve2 pmxcfs[2922139]: [dcdb] notice: data verification successful
Nov 27 20:49:51 pve2 pve-firewall[1296]: firewall update time (15.416 seconds)
Nov 27 20:49:53 pve2 pvestatd[1308]: status update time (11.030 seconds)
Nov 27 20:49:53 pve2 pmxcfs[2922139]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve2/local-zfs: -1
Nov 27 20:49:53 pve2 pmxcfs[2922139]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve2/Asustornas1: -1
Nov 27 20:49:53 pve2 pmxcfs[2922139]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/pve2/Asustornas1: /var/lib/rrdcached/db/pve2-storage/pve2/Asustornas1: illegal attempt to update using time 1732736991 when last update time is 1732736991 (minimum one second step)
Nov 27 20:49:53 pve2 pmxcfs[2922139]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve2/local: -1
Nov 27 20:49:53 pve2 pmxcfs[2922139]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve2/Backups: -1
Nov 27 20:49:53 pve2 pmxcfs[2922139]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/pve2/Backups: /var/lib/rrdcached/db/pve2-storage/pve2/Backups: illegal attempt to update using time 1732736991 when last update time is 1732736991 (minimum one second step)
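To line the stalls up with the backup runs, the slow "update time" entries can be pulled out of a longer log. This is only a sketch: the helper name `find_stalls` and the 5-second threshold are my own choices, and the pattern is written against the line format shown in the excerpt above, so it may need adjusting for other daemons.

```python
import re

# Matches lines such as:
#   "Nov 27 20:49:53 pve2 pvestatd[1308]: status update time (11.030 seconds)"
SLOW_UPDATE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"(?P<daemon>[\w-]+)\[\d+\]:\s+.*update time \((?P<secs>\d+(?:\.\d+)?) seconds\)"
)


def find_stalls(lines, threshold=5.0):
    """Return (timestamp, daemon, seconds) for each update at or above threshold seconds."""
    stalls = []
    for line in lines:
        match = SLOW_UPDATE.match(line)
        if match and float(match.group("secs")) >= threshold:
            stalls.append(
                (match.group("stamp"), match.group("daemon"), float(match.group("secs")))
            )
    return stalls


# Three lines copied from the log excerpt above.
sample = [
    "Nov 27 20:34:37 pve2 pmxcfs[2922139]: [dcdb] notice: data verification successful",
    "Nov 27 20:49:51 pve2 pve-firewall[1296]: firewall update time (15.416 seconds)",
    "Nov 27 20:49:53 pve2 pvestatd[1308]: status update time (11.030 seconds)",
]

for stamp, daemon, secs in find_stalls(sample):
    print(f"{stamp}  {daemon:<14} {secs:.3f}s")
```

Feeding it a saved syslog file line by line should give a list of timestamps to compare against the moments the IO delay spikes in the graph.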

I know I can update the BIOS, but I would like to know if there is anything else I can do to debug this and plan a solution.

Thank you.
 
