[SOLVED] ZFS: How to Restart/Reload Without Rebooting

May 18, 2019
In an up-to-date Proxmox install, I have root on RAID1. There is another partition on ZFS, which holds two VMs.

There is some free space in a deleted partition adjacent to partition 5, the ZFS partition (which spans sda/sdb/sdc/sde), where partition 4 used to be. I am trying to get partition 5 to expand onto this unused space, but that is not the question in this thread at this time.

In order to do that (have 5 grow over where 4 was) I need to reload ZFS. How do I do that with Proxmox? When I unmount the ZFS pool, it gets remounted right away automatically (maybe because the VMs are still running on it?).

How do I reload ZFS (even if it requires stopping the VMs)?
 
When I unmount the ZFS pool, it gets remounted right away automatically (maybe because the VMs are still running on it?).
You need to disable the ZFSPool storage at Datacenter->Storage in order to prevent PVE from auto-importing a pool within a second after exporting it... and of course, you shouldn't export a pool that is in use with guests running on it in the first place...
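Roughly, from the CLI, that sequence looks like this (the storage ID is a placeholder and the pool name is taken from later in this thread; adjust both to your setup, and stop or migrate the guests first):

pvesm set <zfs-storage-id> --disable 1   # keep PVE from auto-importing the pool again
zpool export zfs-storage-sdx5            # refuses while datasets are still busy
# ...do the maintenance, then bring everything back:
zpool import zfs-storage-sdx5
pvesm set <zfs-storage-id> --disable 0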
 
What he's after can be done without ever interrupting availability (hint: in the same way as replacing individual disks to increase pool capacity).
 

Pardon my unfamiliarity with ZFS: does export/import need a separate location to clone the exported filesystem contents to?

From what I gather in https://docs.oracle.com/cd/E19253-01/819-5461/gbchy/index.html, this is NOT the case.

Also, to accomplish my goal, some guides suggest an export/import step, others do not.
 
would you mind elaborating on that?
Glad to :) It would just be a lot more difficult without your particulars. Having seen your setup, however, I'd advise NOT following the steps below and instead reinstalling proxmox-on-zfs properly, as this configuration is cumbersome and easy to foul up.

But for educational purposes, here are the steps to follow. Repeat for each disk in the pool:
1. Offline the disk:
zpool offline zfs-storage-sdx5 sd[x]5
2. Use parted (or whatever partition editor) to extend partition 5.
3. Online it back:
zpool online zfs-storage-sdx5 sd[x]5
4. Wait for the resilver to complete.

Needless to say, only offline ONE DISK AT A TIME. When all 4 disks have been gone through, the extra capacity will appear as if by magic. This will also be a good opportunity to switch your disk names to something more disk-specific (sd[x] names are dynamic, generated at boot, and may not always point at the same disk). Luckily this is pretty simple to do:
zpool export zfs-storage-sdx5
zpool import -d /dev/disk/by-id zfs-storage-sdx5
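One caveat: whether the extra capacity actually shows up on its own after the last resilver depends on the pool's autoexpand property. If it stays hidden, something along these lines (same hypothetical names as above) should surface it:

zpool status zfs-storage-sdx5                 # confirm the resilver is done
zpool get autoexpand zfs-storage-sdx5         # check whether automatic expansion is enabled
zpool set autoexpand=on zfs-storage-sdx5      # turn it on for this and future grows
zpool online -e zfs-storage-sdx5 sd[x]5       # or expand each (already online) device explicitly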
 
I will give that a try if it comes down to it.

But I can't do ZFS for the whole system. LXC's fastest IO is via the dir driver directly on disk (1), and my applications are extremely IO intensive (on NVMe). The 2 VMs on ZFS are the only ones that are not IO intensive. I initially used ZFS (instead of also mdadm) because I used to need deduplication, but that is no longer the case.

On the next Proxmox install, I might try ZFS for the root mount. I haven't so far because mdadm is simpler to deal with.

1) Reference: [attached screenshot: 1664244300392.png]
(my use case does use multiple containers, but performance trumps all other considerations)
 
LXC's fastest IO is via the dir driver directly on disk (1), and my applications are extremely IO intensive (on NVME).
Then use dedicated drives. I don't see the point in partitioning stuff; your I/O will then be unpredictable.
Also, if there is a PostgreSQL database on the LXC, you will get way better performance on ZFS if you tune the database to take advantage of the features ZFS provides (atomic writes, disabling the block cache, compression, ...).
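As a rough sketch of what that tuning can look like (the dataset name tank/pgdata and the values are illustrative, not a recipe):

# dedicated dataset for PostgreSQL; 8K recordsize matches its default page size
zfs create -o recordsize=8K -o compression=lz4 -o atime=off -o logbias=throughput tank/pgdata
# and in postgresql.conf, since ZFS copy-on-write already gives atomic page writes:
#   full_page_writes = off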

PS: zpool status doesn't make this clear, but this is supposed to be a mirrored striped set (RAID10, to use standard RAID parlance).
Yes, it states clearly that it's a pool of two mirrored vdevs.

There is some free space in a deleted partition adjacent to partition 5, the ZFS partition (which spans sda/sdb/sdc/sde), where partition 4 used to be. I am trying to get partition 5 to expand onto this unused space, but that is not the question in this thread at this time.
How much space is that? According to your lsblk output, there is (visually) no space left. It can only be a couple of GB per disk, isn't it?
 

The drives for the CTs with high IO demand are dedicated NVMEs. There are no Postgres DBs.

Each sd[x]4 is 100GB, so 400GB in total that I'd like to reclaim.
 
could this be made even easier if I reboot instead?
A reboot isn't necessary at any step. The process will simply stop and then resume when the system is back up. Also, your storage will not be offline unless you fuck up and remove multiple disks from the pool.

To @LnxBil's point, you're kinda doing it wrong. Sharing your drives with other IO consumers will render your performance unpredictable regardless of file system. The REASON you want to use ZFS is not because it may be the best for all use cases, but because it is the best COMPROMISE for all the features and capabilities it exposes.

Yes, containers on ZFS create pretty heavy write amplification: SO DON'T DO THAT. Create a pool for your primary VM disks, and use the disks for the stores that need the IO performance directly. If I were designing this, I'd probably do something like:

2x small SSDs for OS
4x SSDs in a striped mirror for my VM primary storage
1 or 2x NVMe for high-IO-demand applications; you can MD-raid them if you wish, but doing anything like that will likely just hurt your IOPS.

Caveat: if the system supports U.2 NVMe, make it all NVMe. Otherwise I'd stay away from M.2 NVMe for everything except where absolutely necessary, since they're not hot swappable.
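For illustration, the striped mirror ("RAID10") part would be created roughly like this; the pool and device names are placeholders:

# two mirror vdevs striped together; ashift=12 assumes 4K-sector SSDs
zpool create -o ashift=12 vmpool \
  mirror /dev/disk/by-id/ata-SSD_A /dev/disk/by-id/ata-SSD_B \
  mirror /dev/disk/by-id/ata-SSD_C /dev/disk/by-id/ata-SSD_D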
 
I get the point. But NVMe space is costly, and my applications chew through them very fast (1 year, 1.5 years at most. Yes, 2500 TBW and up drives).

I have multiple containers on each large NVMe (there are 2 and they're M.2, but the applications have redundant/failover setups in 2 other datacenters). The load is fairly steady, so there are no surprises. When there is a problem, it's related to my own activity and I get alerts at second resolution. Usually this only happens when downloading something from a very fast connection, or unpacking a several-hundred-GB file, and ionice/bwlimit/throttling the download (aria2c) does the trick. It's completely manageable.
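Roughly what that throttling looks like, with an illustrative limit and URL:

# run the download in the idle I/O class and cap its bandwidth
ionice -c 3 aria2c --max-download-limit=50M https://example.com/big-archive.tar.gz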

The OS does fine with RAID 1 (exaggeratedly mirrored across 4 drives).
The VMs do fine on spinning rust with mirror+stripes, they have very low IO demands.
The backups of the containers on the NVMe disks go into the ZFS pool (keep in mind each CT has 2 other identical ones at different DCs on standby).

If I were building this today, I'd have 4x SSDs instead of spinning rust (for faster backups, since I can't use ZFS) and possibly use ZFS for root and those 2 VMs. I'd not use U.2; those disks are more expensive, and replacing the NVMe disks is already the main cost.

Overall, everything is safe (mirrored/backed up/redundant) and the performance is acceptable for each demand level. Further performance gains would yield zero benefit.
 
/shrug. Assuming you are responsible for operation, only you can be the judge of that. Others (myself included) are giving you the benefit of our knowledge and experience on the assumption that's what you're asking for. You owe me no explanation for why you made the choices you did; it doesn't make any difference to me.
 
I appreciate you taking the time to share your thoughts, Alex. I enjoy the discussion, and I am not trying to convince anyone, only to show the rationale used. It's never one size fits all, and our use case is quite unusual.
 
Found it faster to just move the items on ZFS over to the NVMe, re-create the ZFS pool, and move them back. Much faster than dropping one disk at a time and waiting for the resilvering to finish. Also fixed the disk references while at it. The live disk move feature of Proxmox kicks ass; I didn't even have to power off the VMs that run on ZFS :)

thanks!


[attached screenshot: 1664946395280.png]
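For anyone finding this later: the live move can also be done from the CLI. A rough sketch, with placeholder VM ID, disk slot, and storage names (newer PVE versions spell the command "qm disk move"):

# move a running VM's disk off the ZFS storage without powering the VM off
qm move_disk 101 scsi0 nvme-dir --delete 1
# ...destroy and re-create the pool, re-add it in Datacenter->Storage, then move the disk back:
qm move_disk 101 scsi0 zfs-storage --delete 1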
 
