ZFS suspended

stefferle

Mar 30, 2024
Hi,

My Proxmox server's ZFS pool suddenly went into SUSPENDED state. The pool is a mirror of two Samsung 990 Pro M.2 SSDs. At first one of them was in REMOVED status and the pool was simply DEGRADED.

Here is my zpool status output:
  pool: data
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:

        NAME                                                              STATE     READ WRITE CKSUM
        data                                                              DEGRADED     0     0     0
          mirror-0                                                        DEGRADED    28    18     0
            nvme-Samsung_SSD_990_PRO_with_Heatsink_4TB_S7HRNJ0WC04330B_1  REMOVED      0     0     0
            nvme-Samsung_SSD_990_PRO_with_Heatsink_4TB_S7HRNJ0WC04333J_1  ONLINE      56    19     0

errors: 144 data errors, use '-v' for a list

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:02:06 with 0 errors on Sun Mar 10 00:26:07 2024
config:

        NAME                                               STATE     READ WRITE CKSUM
        rpool                                              ONLINE       0     0     0
          nvme-eui.000000000000000100a075234516b8dc-part3  ONLINE       0     0     0

errors: No known data errors

Any advice?

Steff
 
Try to find out what went wrong from the error messages in journalctl (use the arrow keys to scroll) and fix the underlying cause.
The removed drive could point to a problem with the M.2 connector or with the PCIe lanes/connection to the CPU. Maybe it just needs to be reseated?
It could also be a memory problem, so maybe run a memtest first. That's unlikely, though: there are no checksum errors, only read and write errors, and write errors in particular indicate a drive issue.
If the drives are the problem, replace them and restore your VMs/CTs from backups.
This is not Proxmox specific, so any generic NVMe troubleshooting or ZFS guide might help you.
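The counter reasoning above can be sketched in Python. This is a hypothetical helper, not a ZFS or Proxmox tool: it parses the device table of zpool status output (the sample is the table from the first post), flags every vdev that is not ONLINE, and labels non-zero counters using the rule of thumb from this reply (READ/WRITE errors point at the drive or its connection; CKSUM-only errors point more toward RAM or corruption). It assumes the default NAME/STATE/READ/WRITE/CKSUM columns and plain integer counts; the names Vdev, parse_vdevs and diagnose are made up for illustration.

```python
"""Minimal sketch: flag suspicious vdevs in `zpool status` output.

Assumptions (not from ZFS itself): the default NAME/STATE/READ/WRITE/CKSUM
columns and plain integer counters. zpool can abbreviate large counts
(e.g. "1.2K"); such rows are simply skipped here. The hints in diagnose()
are hypothetical rules of thumb restating the forum advice.
"""
from dataclasses import dataclass
from typing import List, Optional

# Device table copied from the first post. Spacing does not matter,
# because each row is split on whitespace.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
data DEGRADED 0 0 0
mirror-0 DEGRADED 28 18 0
nvme-Samsung_SSD_990_PRO_with_Heatsink_4TB_S7HRNJ0WC04330B_1 REMOVED 0 0 0
nvme-Samsung_SSD_990_PRO_with_Heatsink_4TB_S7HRNJ0WC04333J_1 ONLINE 56 19 0

errors: 144 data errors, use '-v' for a list
"""


@dataclass
class Vdev:
    name: str
    state: str
    read: int
    write: int
    cksum: int


def parse_vdevs(status_text: str) -> List[Vdev]:
    """Return one Vdev per row of every NAME/STATE/... table in the text."""
    vdevs: List[Vdev] = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True  # header row; device rows follow directly
            continue
        if not fields:
            in_table = False  # a blank line ends the table
            continue
        if in_table and len(fields) >= 5:
            name, state, *counts = fields[:5]
            if all(c.isdigit() for c in counts):
                read, write, cksum = (int(c) for c in counts)
                vdevs.append(Vdev(name, state, read, write, cksum))
    return vdevs


def diagnose(v: Vdev) -> Optional[str]:
    """Return a heuristic hint for one vdev, or None if it looks healthy."""
    if v.state != "ONLINE":
        return f"state {v.state}: check seating/cabling and the kernel log (journalctl)"
    if v.read or v.write:
        return "read/write I/O errors: suspect the drive or its connection"
    if v.cksum:
        return "checksum errors only: data corruption, possibly RAM (run a memtest)"
    return None


if __name__ == "__main__":
    for vdev in parse_vdevs(SAMPLE):
        hint = diagnose(vdev)
        if hint:
            print(f"{vdev.name}: {hint}")
```

On a live system you could pass it the real output instead of SAMPLE, e.g. subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout. It only reads the status text; it does not change anything on the pool.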
 
Once a ZFS pool is suspended, you're going to have to reboot. Shut down the regular way if at all possible; if that hangs, you will have to do a hard power-off.

Try reseating the drives, but also look into a firmware update for the 990s. I've seen several posts saying they die early and recommending to avoid them. You may want to replace the "bad" 990 with an enterprise SSD/NVMe drive.

https://www.youtube.com/watch?v=D7XgEfxPGuo
 
Thank you for your answers. I will try shutting down the regular way, then open up the computer and check for a hardware issue first.
 
