[SOLVED] ZFS: How to Restart/Reload Without Rebooting

May 18, 2019
In an up-to-date Proxmox install, I have root on RAID1. There is another partition on ZFS, which holds two VMs.

There is some free space in a deleted partition adjacent to partition 5, the ZFS partition (which spans sda/sdb/sdc/sde), where partition 4 used to be. I am trying to get partition 5 to expand onto this unused space, but that is not the question in this thread at this time.

In order to do that (have 5 grow over where 4 was) I need to reload ZFS. How do I do that with Proxmox? When I unmount the ZFS pool, it gets remounted right away automatically (maybe because the VMs are still running on it?).

How do I reload ZFS (even if it requires stopping the VMs)?
 
When I unmount the ZFS pool, it gets remounted right away automatically (maybe because the VMs are still running on it?).
You need to disable the ZFSPool storage at Datacenter->Storage in order to prevent PVE from auto-importing a pool within a second after exporting it... and of course, you shouldn't export a pool that is in use with guests running on it in the first place...
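Roughly, from the CLI, that sequence looks like this (the storage ID is a placeholder and the pool name is taken from later in this thread; adjust both to your setup, and stop or migrate the guests first):

pvesm set <zfs-storage-id> --disable 1   # keep PVE from auto-importing the pool again
zpool export zfs-storage-sdx5            # refuses while datasets are still busy
# ...do the maintenance, then bring everything back:
zpool import zfs-storage-sdx5
pvesm set <zfs-storage-id> --disable 0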
 
What he's after can be done without ever interrupting availability (hint: in the same way as replacing individual disks to increase pool capacity).
 

Pardon my unfamiliarity with ZFS: does export/import need a separate location to clone the exported filesystem contents to?

From what I gather in https://docs.oracle.com/cd/E19253-01/819-5461/gbchy/index.html, this is NOT the case.

Also, to accomplish my goal, some guides suggest an export/import step, others do not.
 
would you mind elaborating on that?
Glad to :) It would just be a lot more difficult without your particulars. Having seen your setup, however, I'd advise NOT following the steps below and instead reinstalling proxmox-on-zfs properly, as this configuration is cumbersome and easy to foul up.

But for educational purposes, here are the steps to follow. Repeat for each disk in the pool:
1. Offline the disk:
zpool offline zfs-storage-sdx5 sd[x]5
2. Use parted (or whatever partition editor) to extend partition 5.
3. Online it back:
zpool online zfs-storage-sdx5 sd[x]5
4. Wait for the resilver to complete.

Needless to say, only offline ONE DISK AT A TIME. When all 4 disks have been gone through, the extra capacity will appear as if by magic. This will also be a good opportunity to switch your disk names to something more disk-specific (sd[x] names are dynamic, generated at boot, and may not always point at the same disk). Luckily this is pretty simple to do:
zpool export zfs-storage-sdx5
zpool import -d /dev/disk/by-id zfs-storage-sdx5
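One caveat: whether the extra capacity actually shows up on its own after the last resilver depends on the pool's autoexpand property. If it stays hidden, something along these lines (same hypothetical names as above) should surface it:

zpool status zfs-storage-sdx5                 # confirm the resilver is done
zpool get autoexpand zfs-storage-sdx5         # check whether automatic expansion is enabled
zpool set autoexpand=on zfs-storage-sdx5      # turn it on for this and future grows
zpool online -e zfs-storage-sdx5 sd[x]5       # or expand each (already online) device explicitly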
 
I will give that a try if it comes down to it.

But I can't do ZFS for the whole system. LXC's fastest IO is via the dir driver directly on disk (1), and my applications are extremely IO intensive (on NVMe). The 2 VMs on ZFS are the only ones that are not IO intensive. I initially used ZFS (instead of also mdadm) because I used to need deduplication, but that is no longer the case.

On the next Proxmox install, I might try ZFS for the root mount. I haven't so far because mdadm is simpler to deal with.

1) Reference: [attached screenshot: 1664244300392.png]
(my use case does use multiple containers, but performance trumps all other considerations)
 
LXC's fastest IO is via the dir driver directly on disk (1), and my applications are extremely IO intensive (on NVME).
Then use dedicated drives. I don't see the point in partitioning stuff; your I/O will then be unpredictable.
Also, if there is a PostgreSQL database on the LXC, you will get way better performance on ZFS if you tune the database to take advantage of the features ZFS provides (atomic writes, disabling the block cache, compression, ...).
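As a rough sketch of what that tuning can look like (the dataset name tank/pgdata and the values are illustrative, not a recipe):

# dedicated dataset for PostgreSQL; 8K recordsize matches its default page size
zfs create -o recordsize=8K -o compression=lz4 -o atime=off -o logbias=throughput tank/pgdata
# and in postgresql.conf, since ZFS copy-on-write already gives atomic page writes:
#   full_page_writes = off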

PS: zpool status doesn't make this clear, but this is supposed to be a mirrored striped set (RAID10, to use standard RAID parlance).
Yes, it states clearly that it's a pool of two mirrored vdevs.

There is some free space in a deleted partition adjacent to partition 5, the ZFS partition (which spans sda/sdb/sdc/sde), where partition 4 used to be. I am trying to get partition 5 to expand onto this unused space, but that is not the question in this thread at this time.
How much space is that? According to your lsblk output, there is (visually) no space left. It can only be a couple of GB per disk, isn't it?
 

The drives for the CTs with high IO demand are dedicated NVMEs. There are no Postgres DBs.

Each sd[x]4 is 100GB, so 400GB in total that I'd like to reclaim.
 
could this be made even easier if I reboot instead?
A reboot isn't necessary at any step. The process will simply stop and then resume when the system is back up. Also, your storage will not be offline unless you fuck up and remove multiple disks from the pool.

To @LnxBil's point, you're kinda doing it wrong. Sharing your drives with other IO consumers will render your performance unpredictable regardless of file system. The REASON you want to use ZFS is not because it may be the best for all use cases, but because it is the best COMPROMISE for all the features and capabilities it exposes.

Yes, containers on ZFS create pretty heavy write amplification: SO DON'T DO THAT. Create a pool for your primary VM disks, and use the disks for the stores that need the IO performance directly. If I were designing this, I'd probably do something like:

2x small SSDs for OS
4x SSDs in a striped mirror for my VM primary storage
1 or 2x NVMe for high-IO-demand applications; you can MD-raid them if you wish, but doing anything like that will likely just hurt your IOPS.

Caveat: if the system supports U.2 NVMe, make it all NVMe. Otherwise I'd stay away from M.2 NVMe for everything except where absolutely necessary, since they're not hot swappable.
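For illustration, the striped mirror ("RAID10") part would be created roughly like this; the pool and device names are placeholders:

# two mirror vdevs striped together; ashift=12 assumes 4K-sector SSDs
zpool create -o ashift=12 vmpool \
  mirror /dev/disk/by-id/ata-SSD_A /dev/disk/by-id/ata-SSD_B \
  mirror /dev/disk/by-id/ata-SSD_C /dev/disk/by-id/ata-SSD_D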
 
I get the point. But NVMe space is costly, and my applications chew through them very fast (1 year, 1.5 years at most. Yes, 2500 TBW and up drives).

I have multiple containers on each large NVMe (there are 2 and they're M.2, but the applications have redundant/failover setups in 2 other datacenters). The load is fairly steady, so there are no surprises. When there is a problem, it's related to my own activity and I get alerts at second resolution. Usually this only happens when downloading something from a very fast connection, or unpacking a several-hundred-GB file, and ionice/bwlimit/throttling the download (aria2c) does the trick. It's completely manageable.
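Roughly what that throttling looks like, with an illustrative limit and URL:

# run the download in the idle I/O class and cap its bandwidth
ionice -c 3 aria2c --max-download-limit=50M https://example.com/big-archive.tar.gz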

The OS does fine with RAID 1 (exaggeratedly mirrored across 4 drives).
The VMs do fine on spinning rust with mirror+stripes, they have very low IO demands.
The backups of the containers on the NVMe disks go into the ZFS pool (keep in mind each CT has 2 other identical ones at different DCs on standby).

If I were building this today, I'd have 4x SSDs instead of spinning rust (for faster backups, since I can't use ZFS) and possibly use ZFS for root and those 2 VMs. I'd not use U.2; those disks are more expensive, and replacing the NVMe disks is already the main cost.

Overall, everything is safe (mirrored/backed up/redundant) and the performance is acceptable for each demand level. Further performance gains would yield zero benefit.
 
/shrug. Assuming you are responsible for operation, only you can be the judge of that. Others (myself included) are giving you the benefit of our knowledge and experience on the assumption that's what you're asking for. You owe me no explanation for why you made the choices you did; it doesn't make any difference to me.
 
I appreciate you taking the time to share your thoughts, Alex. I enjoy the discussion, and I am not trying to convince anyone, only to show the rationale used. It's never one size fits all, and our use case is quite unusual.
 
Found it faster to just move the items on ZFS over to the NVMe, re-create the ZFS pool, and move them back. Much faster than dropping one disk at a time and waiting for the resilvering to finish. Also fixed the disk references while at it. The live disk move feature of Proxmox kicks ass; I didn't even have to power off the VMs that run on ZFS :)

thanks!


[attached screenshot: 1664946395280.png]
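For anyone finding this later: the live move can also be done from the CLI. A rough sketch, with placeholder VM ID, disk slot, and storage names (newer PVE versions spell the command "qm disk move"):

# move a running VM's disk off the ZFS storage without powering the VM off
qm move_disk 101 scsi0 nvme-dir --delete 1
# ...destroy and re-create the pool, re-add it in Datacenter->Storage, then move the disk back:
qm move_disk 101 scsi0 zfs-storage --delete 1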
 
