Hi, Proxmox community.
OS: Proxmox 8.4.1
NVMe pool (for VMs): 3x Intel P3500 4TB + 1x Intel P4500 4TB NVMe
Virtual machines on Proxmox with a ZFS pool (4 NVMe drives in RAID-Z1 configuration) are experiencing slowdowns, freezes, and read/write errors. Guest systems report disk access issues, and database operations end with unexpected errors. System logs indicate ZFS pool synchronization problems and potential irregularities in one of the NVMe drives:
[1516596.685261] zio_wait+0x13a/0x2c0 [zfs]
[1516596.685561] dsl_scan_sync+0xdf1/0x14a0 [zfs]
[1516596.686127] spa_sync+0x5f3/0x1030 [zfs]
[2.606601] nvme nvme0: Ignoring bogus Namespace Identifiers
Code:
zpool status -v:
  pool: NVME
 state: ONLINE
  scan: scrub repaired 0B in 00:57:03 with 0 errors on Sun May 11 01:21:06 2025
config:

    NAME  STATE  READ  WRITE  CKSUM
    NVME  ONLINE  0  0  0
      raidz1-0  ONLINE  0  0  0
        nvme-INTEL_SSDPE2KX020T7_PHLF813201MM2P0HGN  ONLINE  0  0  0
        nvme-nvme.8086-43565044363438323030423332503034474e-494e54454c205353445045324d583032305434-00000001  ONLINE  0  0  0
        nvme-INTEL_SSDPE2MX020T4_CVPD648200952P04GN  ONLINE  0  0  0
        nvme-INTEL_SSDPE2MX020T4_CVPD6482006V2P04GN  ONLINE  0  0  0

errors: No known data errors
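In case it helps anyone repeat the check, here is a minimal Python sketch of how the per-device READ/WRITE/CKSUM counters in the output above could be read programmatically. The function names `parse_error_counters` and `devices_with_errors` are hypothetical, the list of device states is an assumption about what `zpool status` prints, and the script only parses text that was already captured; it does not run any ZFS command itself.

```python
# Device states that can appear in the STATE column (assumed list).
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_error_counters(status_text):
    """Return {device_name: (read, write, cksum)} for each device row.

    Counters are kept as the raw strings printed by zpool, because large
    values may be abbreviated and are not converted here.
    """
    counters = {}
    for line in status_text.splitlines():
        fields = line.split()
        # A device row has at least: name, state, READ, WRITE, CKSUM.
        if len(fields) >= 5 and fields[1] in STATES:
            counters[fields[0]] = tuple(fields[2:5])
    return counters


def devices_with_errors(counters):
    """Return the names of devices whose counters are not all '0'."""
    return [name for name, counts in counters.items()
            if any(c != "0" for c in counts)]
```

With the output pasted above, `devices_with_errors` returns an empty list, since every counter is 0. That matches what I see: the pool itself reports no errors even though the guests do.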
What diagnostic steps and solutions do you recommend for ZFS on NVMe drives when random slowdowns and errors occur without clear indications in the logs? How can I distinguish hardware issues from ZFS configuration errors in such cases?