Proxmox + NVME Pool read Errors

Pito2317

New Member
Feb 8, 2024
Hi, proxmox community.

OS: Proxmox 8.4.1
NVME Pool (for VMs): 3x Intel P3500 4TB + 1x Intel P4500 4TB NVME

Virtual machines on Proxmox with a ZFS pool (4 NVMe drives in RAID-Z1 configuration) are experiencing slowdowns, freezes, and read/write errors. Guest systems report disk access issues, and database operations end with unexpected errors. System logs indicate ZFS pool synchronization problems and potential irregularities in one of the NVMe drives:


[1516596.685261] zio_wait+0x13a/0x2c0 [zfs]
[1516596.685561] dsl_scan_sync+0xdf1/0x14a0 [zfs]
[1516596.686127] spa_sync+0x5f3/0x1030 [zfs]


[2.606601] nvme nvme0: Ignoring bogus Namespace Identifiers

Code:
zpool status -v:

  pool: NVME
 state: ONLINE
  scan: scrub repaired 0B in 00:57:03 with 0 errors on Sun May 11 01:21:06 2025
config:

        NAME                                                                                                     STATE     READ WRITE CKSUM
        NVME                                                                                                     ONLINE       0     0     0
          raidz1-0                                                                                               ONLINE       0     0     0
            nvme-INTEL_SSDPE2KX020T7_PHLF813201MM2P0HGN                                                          ONLINE       0     0     0
            nvme-nvme.8086-43565044363438323030423332503034474e-494e54454c205353445045324d583032305434-00000001  ONLINE       0     0     0
            nvme-INTEL_SSDPE2MX020T4_CVPD648200952P04GN                                                          ONLINE       0     0     0
            nvme-INTEL_SSDPE2MX020T4_CVPD6482006V2P04GN                                                          ONLINE       0     0     0

errors: No known data errors
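
One detail in the output above: the second raidz1 member shows up under a long `nvme-nvme.8086-...` name instead of the usual `nvme-INTEL_...` form. As far as I understand, the Linux NVMe driver builds this kind of ID from the PCI vendor ID plus the hex-encoded serial, model and namespace ID when it has no usable namespace identifier, which may be related to the "Ignoring bogus Namespace Identifiers" message (that link is my assumption; I have not confirmed that nvme0 is this drive). A minimal Python sketch to decode the name, where the function name `decode_nvme_fallback_id` is made up for this post:

```python
def decode_nvme_fallback_id(name: str) -> dict:
    """Split an ID of the form
    nvme-nvme.<vendor>-<serial hex>-<model hex>-<nsid hex>
    into human-readable fields."""
    prefix = "nvme-nvme."
    if not name.startswith(prefix):
        raise ValueError("not a fallback-style NVMe ID: " + name)
    # The four fields are separated by '-'; serial and model are ASCII in hex.
    vendor, serial_hex, model_hex, nsid_hex = name[len(prefix):].split("-")
    return {
        "vendor_id": vendor,  # PCI vendor ID, hexadecimal
        "serial": bytes.fromhex(serial_hex).decode("ascii").strip(),
        "model": bytes.fromhex(model_hex).decode("ascii").strip(),
        "namespace_id": int(nsid_hex, 16),
    }


if __name__ == "__main__":
    # Device name copied from the zpool status output above.
    dev = ("nvme-nvme.8086-43565044363438323030423332503034474e-"
           "494e54454c205353445045324d583032305434-00000001")
    for key, value in decode_nvme_fallback_id(dev).items():
        print(f"{key}: {value}")
```

For this pool the name decodes to model INTEL SSDPE2MX020T4 with serial CVPD648200B32P04GN, so it is the same model as the last two members and only the naming differs.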


What diagnostic steps and solutions do you recommend for ZFS on NVMe drives when random slowdowns and errors occur without clear indications in the logs? How can I distinguish hardware issues from ZFS configuration errors in such cases?