Restoring partitions on a replaced ZFS-mirrored SSD

prioritize all of it. the most important would be replacing an rpool member, cloning the partition table from a good drive and handling the proxmox-boot-tool portion before executing the zpool attach.
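
a rough sketch of the manual sequence, assuming /dev/sdX is the surviving healthy member and /dev/sdY is the new blank disk (both placeholders - on a default UEFI install the ESP is partition 2 and the zfs partition is 3):

Code:
# clone the partition table from the healthy disk onto the new disk, then randomize its GUIDs
sgdisk /dev/sdX -R /dev/sdY
sgdisk -G /dev/sdY

# recreate and register the ESP so the node stays bootable from either disk
proxmox-boot-tool format /dev/sdY2
proxmox-boot-tool init /dev/sdY2

# only then attach the zfs partition to the mirror and let it resilver
zpool attach rpool /dev/sdX3 /dev/sdY3
zpool status rpool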

I think that's an operation that almost all noobs will struggle with because it's easy to take for granted that the redundant zfs data is just one partition. I scripted it years ago for a narrow use case and I still have to refer back to it for the steps because I don't do it enough to remember it.

zfs is quickly becoming the "premier" local storage for proxmox, if it isn't already. the PVE GUI should have the complete set of tools.

replace/detach/attach is not asking much and the boot tool is such a PVE-specific thing it absolutely must not be taken for granted.
 
Not that I disagree, but why not file Bugzilla reports for each as feature requests and track them through here? I noticed previously that even on actual bugs people rarely do more than a post in the forum. They do not even +1 themselves on existing reports. Some of what you asked for might even be filed already.

(BTW I was asking the other day about the boot-tool, whether it can e.g. smartly keep a hot spare and auto-create the ESP, etc. Similarly, if one replaces a failed drive, it would be nice if all it took was to hot-swap it and forget about it. But it's not implemented, and then I thought about it - I don't think it's even a priority, because the kind of users that need that use a RAID controller for the OS and are well-versed with zfs. Personally the last thing I would want is a ZFS GUI feature that called an API that went wrong ... if you zfs destroy your pool in your terminal, that's easier to brush off here than to handle when you clicked the same thing in the GUI. :))
 
I'm sure stuff like this is already filed. I have another feature request with tons of +1's going on 5 years old.

it took less than a day to implement in my feasibility testing. it's never getting done, I'm not wasting any more time on obvious feature enrichment. everyone knows the zfs gui is incomplete.
 
I'm sure stuff like this is already filed
Yes: https://bugzilla.proxmox.com/show_bug.cgi?id=3289

I also think this should be added to the webUI. I've stopped counting the cases exactly like this one where I needed to explain to people how to revert a wrong "zpool replace" and do it the proper way according to the wiki, cloning the partitions and the bootloader. And they often even fail to follow the wiki article, using the wrong disks or partitions for the placeholders.
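
For the record, a minimal sketch of undoing a whole-disk "zpool replace" that is still resilvering, with placeholder device names - always check "zpool status" first:

Code:
# the wrongly added disk shows up under a "replacing-N" vdev here
zpool status rpool

# detaching that new device cancels the in-progress replace
zpool detach rpool /dev/sdY

# then follow the wiki: clone the partition table, set up the bootloader
# with proxmox-boot-tool and attach only the ZFS partition (usually partition 3)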
 
everyone knows the zfs gui is incomplete.


And it's not an "important feature for PVE" - I mean, at least it's answered directly, so no need to wonder.

I also think this should be added to the webUI. I've stopped counting the cases exactly like this one where I needed to explain to people how to revert a wrong "zpool replace" and do it the proper way according to the wiki, cloning the partitions and the bootloader.

But are they PVE subscribers? There are outstanding bugs that would have higher priority than this.

And they often even fail to follow the wiki article, using the wrong disks or partitions for the placeholders.

That's what the support is for. :D
 
don't ask people to file feature requests then. they aren't getting done.

those who can, just make their own.
 
I'm sure stuff like this is already filed. I have another feature request with tons of +1's going on 5 years old.

Care to share?

it took less than a day to implement in my feasibility testing. it's never getting done, I'm not wasting any more time on obvious feature enrichment. everyone knows the zfs gui is incomplete.

It might be a strategy; I think I get told all the time that something is not worth doing, out of habit, to see how much it riles one up. :) Still, the +1s would have helped to get an idea; I rarely see them in PVE's BZ.
 
But are they PVE subscribers?
Sometimes yes... paying for a subscription doesn't guarantee that the person is experienced in working with partitions, ZFS or the Linux CLI in general.
There is a lot that could end very badly (a disk of a mirror fails, then someone puts a factory-preformatted new disk in and clones the new disk's GPT onto the remaining existing disk, wiping it, and so on), so I would really like to see such an important feature in the webUI to reduce human errors.
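
A small sanity check before cloning helps here (placeholder paths below); sgdisk replicates from the disk given first onto the disk given after -R, so swapping the two is exactly the scenario above:

Code:
# identify both disks unambiguously by serial and size before touching anything
lsblk -o NAME,SIZE,SERIAL,MODEL

# source (healthy disk) comes first, target (new disk) comes after -R
sgdisk /dev/disk/by-id/<healthy-disk> -R /dev/disk/by-id/<new-disk>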
 
don't ask people to file feature requests then. they aren't getting done.

those who can, just make their own.

I don't work for Proxmox; I just did the same to see how many would +1. Even when they came to the forum to ask for support, I would point them to the existing reports and ask them, since they had run into the same issue, whether they could add a +1 to the bug report. No, nobody. :)
 
Sometimes yes...
Then it's a cost/benefit question. I suspect the few times this reaches support, the load is so small (it's also easy to dispense a link to the wiki) that it's literally less costly than implementing a whole new interface for one particular storage type in what is not a storage appliance at all. I mean, last time I pointed out how the HA stack was lacking with ZFS replication, Dietmar told me I was using it wrong (as in, not shared storage), and Aaron said it's actually all good. Sometimes I wonder what the workflow is, i.e. who picks up the forum / BZ, decides priority, puts things onto the roadmap ... it feels like nothing like that is in place; I imagine it's more driven by support cases - the best clue is when one sees kernel modules that were added due to "user request", strange things like JFS support, etc. That's where the time goes then. Not homelabs.
 
You might want to have a look at:
Code:
journalctl -u zfs-zed

# it's not like they know what your /dev/... was though so maybe you want to show it with
lsblk -o+SERIAL

# better yet have a look at SMART output
smartctl -a /dev/...

# or for nvmes even better yet
apt install nvme-cli
nvme error-log -e 255 /dev/nvme...

It sounds mysterious, but the only proof of the incident remains the system email:

Code:
ZFS has detected that a device was removed.

 impact: Fault tolerance of the pool may be compromised.
    eid: 6
  class: statechange
  state: REMOVED
   host: bs
   time: 2024-02-17 07:51:44+0500
  vpath: /dev/disk/by-id/nvme-nvme.126f-4d4e32333039353132473030383333-51323030304d4e2035313247-00000001-part3
  vguid: 0x65FD66A420471F69
   pool: rpool (0x4010E6931FD3CBB7)

smartctl -a /dev/... shows Available Spare: 93%, so it was enough to return the bad SSD to the shop.

Thanks to all the good people for the help!
 
