Another zfs degraded thread...sorry.

generalproxuser

This is my first real go-around with using a ZFS pool as my storage backend.

I recently installed Proxmox on a Dell R420 because I got it for free, literally. I flashed the HBA (H710P Mini) to IT mode and created a ZFS pool with 8 disks (2 TB SSDs) in RAID10. All disks were purchased new in box (NIB). I have Proxmox installed on a separate 256 GB SSD.
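For reference, an 8-disk RAID10-style pool of striped mirrors can be created from the command line roughly like this (device and pool names are placeholders, not my exact commands; I may well have used the GUI wizard, but the layout is the same):

[CODE]
# Sketch only: "tank" and sdb..sdi are placeholders. Using /dev/disk/by-id/
# paths is generally preferred so device names survive reboots.
zpool create -o ashift=12 tank \
  mirror /dev/sdb /dev/sdc \
  mirror /dev/sdd /dev/sde \
  mirror /dev/sdf /dev/sdg \
  mirror /dev/sdh /dev/sdi
zpool status tank    # verify the striped-mirror (RAID10) layout
[/CODE]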

Setup went fine and I chugged along. Now the pool status is showing degraded and 2 of the disks are not recognized. The data is still intact, but I was wondering if I can hot-swap the two disks in question?

I don't believe it is the disks themselves, so I was hoping I could somehow remove the disks in question, clean them (delete partitions) on another system (Linux live environment), and then re-introduce them into the pool.
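Roughly the sequence I had in mind (device and pool names are placeholders; wipefs is just one way to clear old labels):

[CODE]
# Sketch of the "remove, clean, re-introduce" idea; sdX and "tank" are placeholders.
zpool status tank              # identify the UNAVAIL/FAULTED disks
zpool offline tank /dev/sdX    # take the suspect disk out of the pool
wipefs -a /dev/sdX             # clear old partition tables/labels
                               # (here, or on a Linux live system)
zpool replace tank /dev/sdX    # resilver the cleaned disk back in
zpool status tank              # watch the resilver progress
[/CODE]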

Otherwise I believe I may need to recreate the whole pool altogether with a few more features enabled/disabled to improve the longevity of the SSDs.
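The kind of options I had in mind are roughly these, though I am not sure which of them actually matter here (the pool name is a placeholder):

[CODE]
# Settings often suggested for SSD pools; "tank" is a placeholder.
zpool set autotrim=on tank     # let ZFS issue TRIM to the SSDs
zfs set compression=lz4 tank   # cheap compression reduces writes
zfs set atime=off tank         # avoid metadata writes on every read
# ashift=12 (4K sectors) can only be set at pool creation time
[/CODE]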

Thanks for any insight.
 
Check your logs to see if the disks are failing.
I had a lot of trouble when I started with ZFS because my SAS expander did not work as expected. Do you have any idea if your backplane contains an expander?
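A few starting points for the logs (exact messages vary a lot by controller and driver, so treat this only as a sketch):

[CODE]
# Places to look for disk/controller errors; output varies by system.
dmesg -T | grep -iE 'sd[a-z]|i/o error|reset'     # kernel messages for the disks
journalctl -k --since "-2 days" | grep -i error   # recent kernel log entries
smartctl -a /dev/sdX                              # SMART health of one disk (smartmontools)
zpool status -v                                   # ZFS's own read/write/cksum error counters
[/CODE]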
 
@tburger

Thanks for chiming in. To this point I have always used hardware "as is". Flashing the H710 to IT mode was definitely a new experience for me, and it was a necessity to use ZFS. All that to say, I am unsure if the R420 server I acquired has a SAS expander.

Also, which logs (and where) do I look to check if the disks are failing?
 
Interesting... the logs didn't show the disks failing. I went ahead and did some hot swaps and the drives were recognized, but showed unavailable. I wiped the disks through the GUI and then used the command line to replace them. Even after replacing them they still showed as unavailable, so I went ahead and destroyed the pool and re-created it. Now I am wondering if there is some procedure out there to give ZFS longevity and reliability, especially with SSDs.

I am also wondering if I should be looking at tuning the Proxmox host with the 256 GB SSD OS install. Right now I am seeing some significant web GUI delay in both Chrome and Firefox; SSH is fine.
 
longevity and reliability, especially with SSDs.
That is totally up to the SSDs. Best to use enterprise-grade hardware.

Please install lsscsi and post (in code tags) the output of lsscsi --size. It is best to analyze a failed system instead of a working one, so can you try to reproduce what you did last time that broke your system? Maybe just installing things and loading the disks full with data.
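On Proxmox/Debian that is something like:

[CODE]
apt install lsscsi
lsscsi --size    # lists every SCSI/SAS/SATA device with its capacity
[/CODE]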
 
If (or when) this pool fails I will run the lsscsi commands. @LnxBil says it's best to analyze a failed system (and I agree), so I will wait until then. Right now what I was using the pool for is back underway in full swing, and so far it is still showing green on all disks.
 
Do you know the uptime of your system before the issue happened? I am still chasing an issue which strikes me exactly every 25 days :rolleyes::rolleyes:
 
I do not know the uptime when the pool failed, but it was definitely less than 25 days (less than 5 days, even), and I was constantly working on data file transfers.
 
Well, the 25 days is just my "magic" number. Yours might be different. My point is: keep an eye on uptime as well. Sometimes issues only manifest themselves over time.
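Something as simple as logging uptime and pool state periodically can help correlate a later failure with runtime (path and interval are just an example):

[CODE]
# Example cron entry (crontab -e): once an hour, record uptime and pool health
# so a later failure can be matched against how long the box had been running.
0 * * * * { date; uptime; zpool status -x; } >> /var/log/zfs-uptime-check.log 2>&1
[/CODE]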
 
