ZFS suspended

stefferle

New Member
Mar 30, 2024
Hi,

My Proxmox server's ZFS pool suddenly went into SUSPENDED state. I've got two Samsung 990 Pro M.2 SSDs; at first one of them was in REMOVED status and the pool was simply in DEGRADED mode.

Here is my zpool status:
pool: data
state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:

NAME                                                              STATE     READ WRITE CKSUM
data                                                              DEGRADED     0     0     0
  mirror-0                                                        DEGRADED    28    18     0
    nvme-Samsung_SSD_990_PRO_with_Heatsink_4TB_S7HRNJ0WC04330B_1  REMOVED      0     0     0
    nvme-Samsung_SSD_990_PRO_with_Heatsink_4TB_S7HRNJ0WC04333J_1  ONLINE      56    19     0

errors: 144 data errors, use '-v' for a list

pool: rpool
state: ONLINE
scan: scrub repaired 0B in 00:02:06 with 0 errors on Sun Mar 10 00:26:07 2024
config:

NAME                                               STATE     READ WRITE CKSUM
rpool                                              ONLINE       0     0     0
  nvme-eui.000000000000000100a075234516b8dc-part3  ONLINE       0     0     0

errors: No known data errors

Any advice?

Steff
 
Try to find out what is wrong from the error messages in journalctl (use the arrow keys to scroll) and fix them.
The removed drive could point to an M.2 connector or CPU PCIe lane issue. Maybe it just needs to be plugged in again?
It could also be a memory problem; maybe run a memtest first. But that's unlikely, because there are no checksum errors, only read and write errors, and the latter especially indicate a drive issue.
If the drives are the problem, replace them and restore VM/CTs from backups.
This is not Proxmox specific, so any generic NVMe troubleshooting or ZFS guide might help you.
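
To narrow down the journalctl output, something like the following might help. It's only a sketch: storage_errors is a made-up helper name and the keyword list is a guess, so adjust it to whatever your logs actually contain. Note that -b -1 only works if the journal is kept across reboots.

```shell
# Keep only log lines that mention NVMe, PCIe or ZFS trouble.
# The keyword list is an assumption; widen or narrow it as needed.
storage_errors() {
    grep -iE 'nvme|pcie|zfs|zio|i/o error'
}

# Kernel messages from the current boot, filtered (run as root):
#   journalctl -k -b 0 --no-pager | storage_errors
# If the host has already been rebooted, look at the previous boot instead:
#   journalctl -k -b -1 --no-pager | storage_errors
```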
 
Once you have a SUSPENDED ZFS pool, you're going to have to reboot. Shut down the regular way as far as possible; if it hangs, you will have to power down hard.
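
Once the machine is back up and both drives are detected, a rough sequence for checking and clearing the pool could look like this. The pool name data is taken from the status output above; pool_states is a hypothetical helper, not a ZFS command.

```shell
# Print "<pool> <state>" for every pool in `zpool status` output read from stdin.
pool_states() {
    awk '$1 == "pool:" {name = $2} $1 == "state:" {print name, $2}'
}

# After the reboot, with both drives seated (run as root):
#   zpool status | pool_states   # quick overview of each pool's state
#   zpool status -v data         # list the files affected by the data errors
#   zpool clear data             # the action suggested by the status output
#   zpool scrub data             # re-verify the data once both mirror halves are ONLINE
```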

Try reseating the drives, but also look into a firmware update for the 990s. I've seen several posts saying they die early and to avoid them. You may want to replace the "bad" 990 with an enterprise SSD/NVMe.

https://www.youtube.com/watch?v=D7XgEfxPGuo
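
Before deciding on a firmware update, it may help to see what the drives currently report. A minimal sketch, assuming smartmontools and nvme-cli are installed and that the drive shows up as /dev/nvme0 (check with nvme list); firmware_version is a hypothetical helper that just pulls one line out of the smartctl output.

```shell
# Extract the value of the "Firmware Version:" line from `smartctl -a` output on stdin.
firmware_version() {
    awk -F': *' '/^Firmware Version:/ {print $2}'
}

# Run as root; /dev/nvme0 is an assumed device path:
#   nvme list                                  # shows which NVMe devices are detected
#   smartctl -a /dev/nvme0 | firmware_version  # firmware currently on the drive
#   smartctl -a /dev/nvme0                     # full health report, including media errors
#   nvme smart-log /dev/nvme0                  # wear and error counters as seen by nvme-cli
```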
 
Thank you for your answers. I will try shutting down the regular way, then open the computer and check for a hardware issue first.