[SOLVED] Meaning of DEGRADED state on ZFS pool

liamlows

Hey y'all,

So I recently set up Proxmox on an R720xd and run it as a secondary node in my Proxmox cluster. It has ten 1.2TB SAS drives (ST1200MM0108) that I have been running in raidz2, and I use a separate SSD for the host (which runs an NFS share for the ZFS pool) and for the single VM on the R720, which is set up as a torrent seedbox.

Recently I noticed that the ZFS pool status reports the pool as degraded; a screenshot of the status is below. I was a little confused, since it says I have multiple faults and degraded drives, yet I have not noticed much of an issue with my pool and everything still works pretty well. The SMART values for all the drives seem fine, and there is no indication there of a failed drive. Does anyone know what I should do here? Are my drives actually dying, and if so, is the only option to replace them with new ones?

I am hoping that either the status is incorrect or there is something I can do to fix them. I bought these drives off eBay and through Facebook Marketplace; no issues were mentioned, and I found none in my own testing. It would be a bummer to have to replace the whole array or hunt down suitable replacements.

Thanks in advance!

[Screenshot: zpool status output showing the degraded ZFS1 pool]
 
Degraded means the pool is still working (so no complete loss of all data) but has problems such as too many errors or a failed drive.
You have some checksum errors, which means data is corrupted, and a lot of read errors. For the checksum errors you should run a scrub (zpool scrub ZFS1) and see whether ZFS has enough parity data to repair the corrupted data.
So either 6 of your 10 disks are slowly failing, or there is another problem that could affect the disks too, like a bad RAM module (you could check that with memtest86+), a bad PSU, bad cables, a bad backplane, or a bad disk controller/HBA.
And 2 disks are shown as faulted, which should mean that 2 of your disks are nonfunctional, so you have no parity left and ZFS probably won't be able to fix any corrupted data, as there is no parity data available anymore.
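A minimal sketch of the commands involved, assuming the pool name ZFS1 from your screenshot:
Code:
# Start a scrub, then review the per-device read/write/checksum counters
zpool scrub ZFS1
zpool status -v ZFS1
# After fixing the cause, the counters can be reset to see if new errors appear
zpool clear ZFS1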
 
@Dunuin Thanks for the prompt response! I'll go ahead and start by checking the memory with memtest86+, since that seems like an easy place to start. I also ran `zpool scrub ZFS1` and `zpool status -v ZFS1`, and it appears the scrub managed to repair some of the data corruption, but a lot of disks are still in a degraded state. If you don't mind, let me know if this provides any more information that can help narrow down the core issue (see below for a screenshot).

NOTE: the PERC H310 Mini in the server has been reflashed to HBA/IT mode for drive pass-through, so the server itself cannot see the controller or drives at all.

Regarding the two faulted drives, I assume there is nothing more I can do with them other than replace them, correct? I'll try to narrow down which physical drives they are so I can remove them from the array.

As for all the other potential causes you listed that aren't the drives themselves, I will test those last, since they will likely be harder to track down. I have looked inside the R720xd at the cables, backplane, and HBA and didn't see anything out of the ordinary. I have also checked iDRAC and see no issues with any of those components either (the PSU specifically, but no other components are reporting red flags).
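To find the physical drives, something like this should map the pool members to serial numbers that can be matched against the drive labels (the single device below is just an example):
Code:
# Show the real /dev/sdX path behind each vdev in the pool
zpool status -L ZFS1
# List serial numbers and models to match against the caddies/labels
lsblk -o NAME,SIZE,SERIAL,MODEL
# Or query one drive directly
smartctl -i /dev/sdd | grep -i serial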

[Screenshot: zpool status -v ZFS1 output after the scrub, showing degraded and faulted disks]
 
Although the history of the drives is unknown, I find the many read errors suspicious. Are they maybe not designed for 10 drives in a single chassis? Or is there additional vibration that might cause read issues? Or is the controller too slow, so ZFS reports errors because of timeouts? Check the system logs for possible clues to the cause of the errors.
And of course check all the possible things already mentioned in post #2.
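For the log check, something along these lines should surface kernel-level disk errors (a rough sketch, adjust the filters to taste):
Code:
# Recent kernel messages mentioning disks, I/O errors or resets
dmesg -T | grep -iE "sd[a-z]|i/o error|reset" | tail -n 50
# The same from the journal, limited to the last day
journalctl -k --since yesterday | grep -iE "i/o error|blk_update_request|timeout"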
 
oh boy, I didn't know zpool had suffixes for errors in the output. I haven't seen output like this before.
PVE too... reporting "2.3M" errors as "2.3" errors in the web UI. ;)
Maybe we should add a feature request for that.
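In the meantime, the exact counts can be pulled with the parsable output flag:
Code:
# -p prints exact numbers instead of abbreviated suffixes like "2.3M"
zpool status -p ZFS1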
 
oh boy, I didn't know zpool had suffixes for errors in the output. I haven't seen output like this before.

Hi,

I agree with you, but I have seen very many errors (though nothing like Mxxx) in one case, where the temperature of the server room was about 54 °C! The hddtemp report was about 60-62 °C. Maybe this could be another case like that!?

Good luck / Bafta !
 
Hi,

I agree with you, but I have seen very many errors (though nothing like Mxxx) in one case, where the temperature of the server room was about 54 °C! The hddtemp report was about 60-62 °C. Maybe this could be another case like that!?

Good luck / Bafta !
Currently reading temps on the CPU of ~48 °C and on the exhaust ~40 °C; drive temps are reading ~40 °C, and the TRIP temp is set at 60 °C. Unfortunately I don't think this is what is causing the problems.

I am currently trying to determine the physical locations of the faulted drives so I can try swapping them to different ports on the backplane, and I'm also going to try some disk utilities to see if I can find any other information about the drives and what might be going wrong (like iostat, fmadm, and fmdump).

I'm definitely a bit of a noob when it comes to working with drives at this level, so any additional advice is greatly appreciated!
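For the temperatures and per-disk behaviour, a rough check could look like this (the device range is just an example for the ten drives):
Code:
# Print each drive's temperature lines (field names vary by vendor/firmware)
for d in /dev/sd{a..j}; do
    echo "== $d =="
    smartctl -A "$d" | grep -i temp
done
# Watch per-vdev operations and bandwidth every 5 seconds while the pool is busy
zpool iostat -v ZFS1 5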
 
Although the history of the drives is unknown, I find the many read errors suspicious. Are they maybe not designed for 10 drives in a single chassis? Or is there additional vibration that might cause read issues? Or is the controller too slow, so ZFS reports errors because of timeouts? Check the system logs for possible clues to the cause of the errors.
And of course check all the possible things already mentioned in post #2.
Not sure, but I'd assume they can be used in a group of 10 drives? I wasn't aware this was a limitation on some drives. They are Seagate enterprise SAS drives with SED encryption (ST1200MM0108 - Data Sheet). Do you think that their being SED-encrypted and previously used could be causing the anomalous behavior?

Also, for everyone helping, thanks a ton! I wanted to bring this post (https://forum.proxmox.com/threads/issues-with-removing-sas-disks-from-zfs-configuration.97424/) to y'all's attention too, since I have had strange issues with some of these drives in the past. Not sure if it provides much more context, but one day the issues in that post simply resolved themselves.
 
Alright, to provide a little more information, here is another screenshot of all the disks listed in Proxmox, cross-referenced with the zpool status and other useful information about their state.

Also, as a side note, it looks like I have an extra disk that is not in the array which I thought was in the array, so I'm not sure how that happened or when it showed up as an available drive. Furthermore, when I try to initialize that disk with GPT, that fails as well.

Legend:
  • The green boxes signify that the second partition on the disk is "ZFS reserved"; these disks could be initialized with GPT (using the GUI)
  • The red boxes signify that the second partition on the disk is "No", i.e. not "ZFS reserved"; these disks could not be initialized with GPT (using the GUI)
  • "FAULTED" means the disks in that box are reported as faulted by zpool status
  • "DEGRADED" means the disks in that box are reported as degraded by zpool status
  • "OK" means the disks are reporting as 100% healthy, no issues
[Screenshot: Proxmox disk list annotated with zpool status states (FAULTED / DEGRADED / OK)]

Looks like I have a bit of a cluster-f*** going on here; I'm starting to lean towards these being bad disks. Should I plan on destroying the pool and trying to get these disks to behave a bit more normally?
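For the disks that refuse GPT initialization, a read-only look at their current layout might show what is actually on them (using /dev/sdd as an example):
Code:
# Partition layout and detected signatures, without writing anything
lsblk -o NAME,SIZE,FSTYPE,PARTTYPENAME /dev/sdd
blkid /dev/sdd*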
 
You could do a long SMART test on all of them (the disk needs to be offline for that, so only with a rescue image)
I'll try to do that when I have the next opportunity.

I also noticed that in the screenshot with the colors and text in post #10, disks /dev/sdd, /dev/sde, and /dev/sdf all show the following error when running fsck /dev/sdX. The only one that differs is /dev/sdf, which does not report finding a GPT partition table. That is weird, because the GUI shows that /dev/sdd and /dev/sde do not have one, but fsck finds one.

[Screenshot: fsck output for /dev/sdd, /dev/sde, and /dev/sdf]
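For reference, the long self-test suggested above could be started and reviewed roughly like this (example device, run per drive):
Code:
# Start a long self-test on one drive
smartctl -t long /dev/sdd
# Once it has had time to finish, review the self-test log and overall health
smartctl -l selftest /dev/sdd
smartctl -H /dev/sdd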
 
Alright, I have found the solution to my problem. With some help from Reddit, I was able to narrow it down to the drives themselves and the fact that they are SED drives (SED FIPS 140-2, which is essentially a self-encryption standard for drives, often used in the DoD). They had something called Type 2 PI Protection enabled, which has known issues with ZFS (see this video: https://youtu.be/kxw7O436iZw), and all I had to do was use a utility called sg_format (https://docs.oracle.com/cd/E88353_01/html/E72487/sg-format-8.html) to turn off the protection feature. You also need to ensure that locking is not enabled on the drive, using sedutil-cli (https://github.com/Drive-Trust-Alliance/sedutil). Lastly, SED drives sometimes come formatted with 520-byte sectors; to fix this you can use setblocksize (https://github.com/ahouston/setblocksize). All of this is covered in comments on my Reddit post, but I'll summarize what had to be done here in case someone stumbles upon this in the future.

How to prepare SED drives for ZFS:
1. First determine whether the drive has Type 2 PI Protection enabled. To do this, run smartctl -x /dev/sd? and near the top of the output you should see:
Code:
Formatted with type 2 protection <--- THIS IS IT
8 bytes of protection information per logical block
LU is fully provisioned
which indicates the protection is active.
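For a quick scan across several drives, a loop like this could be used (the device range is an example, adjust it to your own disks):
Code:
# Print the protection/provisioning lines for each drive
for d in /dev/sd{a..j}; do
    echo "== $d =="
    smartctl -x "$d" | grep -iE "protection|provisioned"
done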

2. Install sg3-utils in order to use sg_format: sudo apt-get install -y sg3_utils. I believe this utility may only work with Seagate drives; if it doesn't work for you, move to step 5 and continue from there. Once installed, run sg_format --format --size=512 --fmtpinfo=0 /dev/sd? (where `size` is the logical block size of the disk) to remove the PI protection and format the disk. I recommend running these as background jobs by adding an & at the end of each command (e.g. cmd & cmd & cmd &), as each drive takes a while; see the sketch below. Make sure it completes successfully and doesn't report errors at the end.
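The batch run could look roughly like this (device names are placeholders, double-check them before running, since this wipes the drives):
Code:
# Remove type 2 PI and reformat to 512-byte logical blocks, in parallel
sg_format --format --size=512 --fmtpinfo=0 /dev/sdb &
sg_format --format --size=512 --fmtpinfo=0 /dev/sdc &
sg_format --format --size=512 --fmtpinfo=0 /dev/sdd &
wait   # block until all background formats have finished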

3. After the format is complete, reboot the machine, preferably a cold boot.

4. Verify that Type 2 protection is disabled via smartctl -x /dev/sd?, as in step 1. It should now just say:
Code:
LU is fully provisioned

5. Download sedutil-cli and compile it, or grab the binary. Once downloaded, find the binary and run sedutil-cli --query /dev/sd? and check whether it reports "Locked = N". If it does, you are likely done and will not have issues with ZFS moving forward. If it says "Locked = Y", continue.

After this step you may be done if you're using Seagate SED drives, but it doesn't hurt to continue (although it is likely unnecessary). I'm fairly confident the steps below will work with any SED drive. From here on we unlock the drive using the PSID and set the block size to 512.

6. Locate the physical drive and write down the PSID, which should be on the label. This is the "backup" key and the only way to disable the encryption. Make a note of the serial number as well, so you know which vdev corresponds to which PSID.

7. Run sedutil-cli --yesIreallywanttoERASEALLmydatausingthePSID <PSIDALLCAPSNODASHES> /dev/sd? as per https://github.com/Drive-Trust-Alliance/sedutil/blob/master/linux/PSIDRevert_LINUX.txt.

8. Reboot again, preferably cold.

9. Download setblocksize, navigate to the source directory, and compile it using make. Ensure sg3_utils is installed and run sg_map to get the /dev/sg? device corresponding to each /dev/sd?.

10. Run yes | ./setblocksize -b512 /dev/sg? to begin the format. I recommend running these as background jobs by adding an & at the end of each command (e.g. cmd & cmd & cmd &), as each drive takes a while; see the sketch below.
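Roughly, that could look like this (the sg numbers are placeholders, check the sg_map output for yours):
Code:
# Find which /dev/sg? belongs to which /dev/sd?
sg_map
# Reformat each drive to 512-byte blocks, in parallel (this also wipes the drives)
yes | ./setblocksize -b512 /dev/sg2 &
yes | ./setblocksize -b512 /dev/sg3 &
wait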

11. Cold reboot again and you should be done!

Hope this helps! Some of it is probably unnecessary, so if anyone knows more, feel free to correct me. In the end I got my drives working and didn't need to replace any components.

Links:

My Reddit post: https://www.reddit.com/r/homelab/comments/xz4rd4/strange_issues_with_zfs_pool_on_r720xd_wh710p/
sedutil-cli and setblocksize reddit post: https://www.reddit.com/r/homelab/comments/wdvf2j/psa_working_with_selfencrypting_drives_with/
 
Hi there, I had to set up my Proxmox Backup installation again with 2 disks in ZFS mode from the old system. The storage tab says "ZFS reserved". The "Initialize with GPT" button is grayed out. There are 2 virtual drives on each physical disk: the smaller one says "ZFS reserved", the larger one says "ZFS". Is there any way to rebuild this ZFS datastore?

Thanks

Olaf
 
Hi there, I had to set up my Proxmox Backup installation again with 2 disks in ZFS mode from the old system. The storage tab says "ZFS reserved". The "Initialize with GPT" button is grayed out. There are 2 virtual drives on each physical disk: the smaller one says "ZFS reserved", the larger one says "ZFS". Is there any way to rebuild this ZFS datastore?
Please create a new thread and post as much information as possible (command output in CODE tags).
 
