Help fix my ZFS Pool

MazerRackham101

New Member
Nov 10, 2022
tldr: "zpool upgrade" failed and now I can't get to the data in my main ZFS pool.



Against my better judgement, I ran "zpool upgrade". "zpool status" kept telling me that I needed to run "zpool upgrade" on my main ZFS pool. I did so, and immediately started getting some pretty major errors.

I've been fighting this for a few hours now, and I'm still terrified I lost all of my data.

So far:

I can boot just fine into Proxmox if the 5 drives of the ZFS pool aren't physically inserted.

Inserting the drives causes no errors. "zpool import ZFSPool1" fails, hangs, and causes issues in Proxmox. It starts throwing "task zpool:126913 blocked for more than XXX seconds" errors every 2 minutes. "ps aux" doesn't find a task with ID 126913.
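The "blocked for more than XXX seconds" lines come from the kernel's hung-task detector, and the number after the colon is the ID of the stuck task. That ID can belong to a thread rather than a process, which plain "ps aux" does not list. A rough sketch for pulling those entries out of the kernel log is below; `hung_tasks` is a made-up helper name, and the message format is assumed to match the one quoted above.

```bash
#!/usr/bin/env bash
# hung_tasks (hypothetical helper): read kernel log text on stdin and print
# "name id seconds" for every hung-task warning it finds.
hung_tasks() {
  sed -nE 's/.*task (.+):([0-9]+) blocked for more than ([0-9]+) seconds.*/\1 \2 \3/p'
}

# Typical use on the affected host (reading the kernel log may need root):
#   dmesg | hung_tasks | sort -u
# Then look for the ID among threads as well as processes, for example:
#   ps -eL -o pid,lwp,stat,wchan:32,comm | grep -w 126913
```

A task in state D (uninterruptible sleep) in the `ps` output is usually waiting on I/O, which would fit an import that hangs rather than fails outright.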

"zpool import -N ZFSPool1" works perfectly, but doesn't mount any datasets (what the -N option does) - but it DOES show the datasets.

Running "zpool import -nF ZFSPool1" to check whether the pool can be repaired (-n makes -F a dry run) ALSO starts throwing those same "task zpool:126913 blocked for more than XXX seconds" or "task txg_sync" errors.

I'm terrified I've lost my data again. It's all replaceable (everything irreplaceable is safely backed up) but would take a LOT of time to regather everything.

The main glimmer of hope I have is that importing with the "-N" option shows the pool, the data, and the data usage. I just can't touch it from Proxmox.

Proxmox "sees" my big ZFS Pool (ZFSPool1)
1698928784490.png

Errors coming up on the server's console
1698928795842.png
 
Hey man, it's been quite a while since you posted this, but I was curious to see what you ended up doing?
I ran into the same issue on my Proxmox server yesterday and I'm going crazy trying to get everything back up.
Thank you in advance!
 
So there was decent thread engagement when I posted it on Level1Techs.

Basically, I ended up just force-mounting the dataset (using -N) and waited an INSANELY long time for it to mount (like >3 hours) and I got access to the data again.

The ZFS Pool was still borked to oblivion, but I could access it.

I put a bunch of drives in a TempPool and moved all the data to it. I then destroyed OriginalPool, and then I created NewPool on the same old disks as OriginalPool. I moved the data all back from TempPool to the NewPool.
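The post doesn't say how the data was moved. One common way to do this kind of pool-to-pool copy is a recursive snapshot followed by `zfs send | zfs receive`, so here is a rough sketch under that assumption. It only prints the commands so they can be reviewed first. The pool names come from the post; the snapshot name `migrate` and the helper `print_migration_plan` are made up for illustration. Taking the snapshot needs the source pool imported read-write.

```bash
#!/usr/bin/env bash
# print_migration_plan (hypothetical helper): print, but do not run, the
# commands for copying every dataset from one pool to another.
# Arguments: source pool, destination pool, snapshot name (default: migrate).
print_migration_plan() {
  local src="$1" dst="$2" snap="${3:-migrate}"
  # 1. Recursive snapshot of the source pool (needs a read-write import).
  echo "zfs snapshot -r ${src}@${snap}"
  # 2. Replicate the whole tree; -u leaves the received copies unmounted.
  echo "zfs send -R ${src}@${snap} | zfs receive -u -F ${dst}"
  # 3. Compare the two trees before destroying anything.
  echo "zfs list -r -o name,used ${src} ${dst}"
}

print_migration_plan OriginalPool TempPool
```

The same plan with the arguments swapped to the new pool would cover the trip back. If the damaged pool can only be imported read-only, the snapshot step won't work and a file-level copy (rsync or cp) is the fallback.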

Unfortunately, there wasn't a great solution.
 
Thanks for your swift reply!

Noob question though: how are you able to run commands while your pool is in error?
When I have my drives connected, I can't boot into proxmox at all due to the following message at boot:
"Failed to start zfs-import@tank.service - Import ZFS pool tank"

When I have my drives disconnected, I can run zpool commands but Proxmox can't find any pools whatsoever. So I won't even be able to run -N.

When I boot into recovery while my drives are connected, I can run zpool commands.
However, I just keep getting this in a loop.
Waited about an hour.
Unless it's not a loop and I do indeed have to wait longer as you mentioned?
 

Attachment: Prox.jpg
Haha, so, I'm not SUGGESTING this, but I just pulled all the drives out, then booted, then plugged the drives back in.

It made it really fun to troubleshoot because I got to sit on the floor next to the rack the whole time as I had to do it like 100x while fighting with it.

Word to the wise: go ahead and set all VMs and LXCs to not start at boot. It'll make the boot cycling much faster. I know I had to reboot a bunch while fighting it.
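Turning off start-at-boot for every guest can be scripted instead of clicked through. The sketch below turns the output of `qm list` or `pct list` into `qm set <id> --onboot 0` / `pct set <id> --onboot 0` commands and prints them for review. It assumes the first column of both listings is the numeric guest ID; `onboot_off_cmds` is a made-up helper name.

```bash
#!/usr/bin/env bash
# onboot_off_cmds (hypothetical helper): read `qm list` or `pct list` output on
# stdin and print one "<tool> set <id> --onboot 0" command per guest.
# Argument: the tool name, qm (VMs) or pct (containers).
onboot_off_cmds() {
  local tool="$1"
  # Keep only rows whose first column is a numeric ID; this skips the header.
  awk -v tool="$tool" '$1 ~ /^[0-9]+$/ { print tool " set " $1 " --onboot 0" }'
}

# Review the generated commands first, then pipe them to sh to apply them:
#   qm list  | onboot_off_cmds qm
#   pct list | onboot_off_cmds pct
#   qm list  | onboot_off_cmds qm | sh
```

Printing first and applying second keeps a record of which guests were changed, which helps when turning autostart back on afterwards.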
 
Lmao yeah the ol' "sitting down on the floor in a cramped area like a peasant while troubleshooting a homelab" scenario XD

Unfortunately, I already tried removing the drives, booting Proxmox, and then plugging the drives back in. But Proxmox doesn't detect any of them.

I had also removed auto boot from all my LXC and VMs, but the issue still persists :(
 
Will it boot without the drives in but fails to boot with your drives in?

Can you boot it up, plug in the drives and then run lsblk and see those drives?

If so, you should be able to "zpool import -N <ZFSPoolName>" and get it to start importing?
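A quick way to check whether the pool's disks are visible at all is to ask lsblk for the filesystem type, since partitions carrying ZFS labels are reported as `zfs_member`. A minimal sketch, where `zfs_members` is a made-up helper name:

```bash
#!/usr/bin/env bash
# zfs_members (hypothetical helper): read "NAME FSTYPE" pairs on stdin, as
# printed by `lsblk -rno NAME,FSTYPE`, and print the names of ZFS members.
zfs_members() {
  awk '$2 == "zfs_member" { print $1 }'
}

# On the host, after plugging the drives in:
#   lsblk -rno NAME,FSTYPE | zfs_members
# If that prints partitions, list the pools ZFS can find without importing:
#   zpool import
# and then try the import without mounting:
#   zpool import -N ZFSPoolName
```

If nothing is printed, the problem is below ZFS (cabling, controller, hot-plug support), and no zpool command will help until the kernel sees the disks.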
 
Precisely.
Boots without drives.
Doesn't boot with drives.

Once I boot without the drives and then plug them in, I run lsblk and can't see the drives.

Maybe I'm supposed to run a command to mount them in order for them to show up?
 
Unfortunately, I'm not sure how you could go about getting them to show up on lsblk. Maybe try lsblk -a?

Best I know, any plugged-in drive should show up.

Maybe you have to unmount the zfs pool you're trying to regain access to, make sure there are no files or directories at the mount point, and then try mounting again?

Try something like:
Bash:
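# Force-unmount the pool's root dataset (the force flag is a lowercase -f)
zfs unmount -f ZFSPoolName

# Check for anything left behind at the mount point, hidden files included
ls -a /ZFSPoolName/

# If there's anything there, then, with the drives physically removed:
rm -rf /ZFSPoolName/*

# With the drives back in, import without mounting (-N belongs to zpool import), then mount
zpool import -N ZFSPoolName
zfs mount ZFSPoolName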

Don't take my word for it, by the way. That's a pretty aggressive route.

All it's doing above is making sure the OS unmounts your dataset, checking whether any leftover files at the mount point are blocking the mount, deleting them if so, and then importing the pool with -N and mounting the dataset again.

One comment here: I had more luck mounting single datasets vs the whole pool. So, after importing with -N, "zfs mount ZFSPoolName/backups" and "zfs mount ZFSPoolName/Pictures" went way better than mounting everything at once.
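Mounting one dataset at a time also makes it obvious which dataset is the one that hangs or fails. A rough sketch of that loop is below. `mount_each` is a made-up helper, and `ZFS_CMD` is an assumed override variable that exists only so the loop can be dry-run with `echo`.

```bash
#!/usr/bin/env bash
# mount_each (hypothetical helper): mount the given datasets one at a time and
# report which ones fail, instead of mounting everything in one go.
# ZFS_CMD is an assumed override; set ZFS_CMD=echo to only print the commands.
mount_each() {
  local zfs_cmd="${ZFS_CMD:-zfs}" ds failed=0
  for ds in "$@"; do
    if "$zfs_cmd" mount "$ds"; then
      echo "ok: $ds"
    else
      echo "FAILED: $ds" >&2
      failed=$((failed + 1))
    fi
  done
  # Exit status is the number of datasets that failed to mount.
  return "$failed"
}

# After `zpool import -N ZFSPoolName`:
#   mount_each ZFSPoolName/backups ZFSPoolName/Pictures
# Dry run:
#   ZFS_CMD=echo mount_each ZFSPoolName/backups ZFSPoolName/Pictures
```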
 
Alright I'll give that a shot later on today and keep you posted.
I greatly appreciate the assistance!
 
When I try to do a zpool import -N tank, I get the following error:
WARNING: Pool 'tank' has encountered an uncorrectable I/O failure and has been suspended.
Does that mean my pool is entirely unsalvageable?
 
Once I had a problem / unknown error with ZFS.

It started when I wanted to clean up old data and old snapshots. ZFS started to hang. Nothing helped.

I tried to import the pool with -N, still the same. I did not know what ZFS was trying to do.

Only importing the pool in read-only mode let me see the pool contents. After I copied all the files to a backup pool, I had to recreate the pool.
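For reference, a read-only import is done by setting the `readonly=on` property at import time. The sketch below combines it with -N (don't mount anything yet) and -R (an alternate root, so nothing mounts over live paths). It prints the command by default. `ro_import` is a made-up helper, `/mnt/recovery` is an assumed path, and `tank` is just the pool name used earlier in the thread.

```bash
#!/usr/bin/env bash
# ro_import (hypothetical helper): import a pool read-only under an alternate
# root so nothing is written to it.
# Arguments: pool name, altroot directory (default: /mnt/recovery, an assumed path).
# Prints the command by default; set RUN=1 to actually execute it.
ro_import() {
  local pool="$1" altroot="${2:-/mnt/recovery}"
  if [ "${RUN:-0}" = "1" ]; then
    zpool import -o readonly=on -N -R "$altroot" "$pool"
  else
    echo "zpool import -o readonly=on -N -R $altroot $pool"
  fi
}

ro_import tank
# After a real import, mount datasets individually and copy the data off.
# The dataset and backup paths here are placeholders:
#   zfs mount tank/somedataset
#   rsync -a /mnt/recovery/tank/somedataset/ /path/to/backup/
```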
 
Have you tried live-booting a plain old Linux OS from USB and then running lsblk? Do the drives show up? If they don't, you've got other disk/hardware-related problems.
 
Seems like that's my only option.
I managed to import my pool in read only while in recovery mode.
I see my data.
However, it seems a lot of my volumes are corrupt, since I'm not able to copy them to the external drive that's connected to it :/
Oh well. I guess it's better than nothing.

I'm doing an e2fsck as a last resort. Then I'll start prepping new drives.
 
Damn. You're right. I should have researched more before attempting anything out of desperation :(

As for my drives, yes they seem to be "functional" but I won't be reusing these after I swap them out.

I get the following when running a zpool status -v
 
