My VM doesn't boot anymore. The VM disk is on a ZFS pool. SMART passed. Any ideas?

I just noticed something here ... VM 101 is your Nextcloud, but ... it has no drive on your NVMe pool ... so now I get it: you use that 6-drive pool as BOTH the storage and the boot drive of the Nextcloud VM, right?
Yes, that's correct. Both drives (there are just two) related to the Nextcloud VM 101 are on that ZFS pool.

 
Ok I understand it now.

Do you mind trying this before proceeding any further?

But that said, do you mind changing the refreservation value?

Something like:
Code:
zfs set refreserv=3T Nextcloud/vm-101-disk-1

As a side note - and it is NOT the cause of your issue - I would suggest to e.g. have that Nextcloud VM keep its boot drive on the NVMe (much smaller) and have the storage-only space mounted into it. That way you separate troubleshooting the VM itself from backing up "just" the data.
 
Do you have access to the full logs (not just the current boot)? I.e. SSH in and pull them with journalctl using --since/--until. I really would like to see the command PVE used to create the RAIDZ2 pool. I just have a weird hunch there's something off with the full refreservation, but I can't quite be sure - and I myself hate trying this and that as a matter of troubleshooting.
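(For reference, pulling the full journal over SSH could look something like this - the timestamps are only placeholders, adjust the window to cover the failed boots:)

Code:
# illustrative only - adjust the time window as needed
journalctl --since "2025-07-17 00:00" --until "2025-07-18 12:00" > /tmp/pve-journal.log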

But that said, do you mind changing the refreservation value?

Something like:
Code:
zfs set refreserv=3T Nextcloud/vm-101-disk-1

I'll try the zfs set refreserv=3T Nextcloud/vm-101-disk-1 command when the restoration is complete

I think this is the command that PVE used to create the RAIDZ2 pool:

Code:
# /sbin/zpool create -o ashift=12 Nextcloud raidz2 /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456A /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456B /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456C /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456D /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456E /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456F
# /sbin/zfs set compression=on Nextcloud
# systemctl enable zfs-import@Nextcloud.service
Created symlink /etc/systemd/system/zfs-import.target.wants/zfs-import@Nextcloud.service -> /lib/systemd/system/zfs-import@.service.
TASK OK
 
Ok I understand it now.

Do you mind trying this before proceeding any further?



As a side note - and it is NOT the cause of your issue - I would suggest to e.g. have that Nextcloud VM keep its boot drive on the NVMe (much smaller) and have the storage-only space mounted into it. That way you separate troubleshooting the VM itself from backing up "just" the data.

I don't know how I would be able to separate them. I'll look into it.
For me it was easier to put both on the same pool for redundancy, but it makes sense to keep only the data there.

The restoration is still in progress. It takes a while.
After that I'll try the zfs set refreserv=3T Nextcloud/vm-101-disk-1 command
 
I don't know how I would be able to separate them. I'll look into it.

We can discuss it here later; it's not that difficult. For now I'll leave it out, as the problem at hand is not related to this. It just makes a lot of things simpler.

For me it was easier to put both on the same pool for redundancy, but it makes sense to keep only the data there.

For me it would make sense to have the VM as a zvol, but to have the data as a normal dataset. At this point I do not know e.g. how you use that entire space within the VM (i.e. what lsblk shows inside the VM; in fact, we do not even know what OS is in that VM :))
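(Just to sketch the idea very roughly - not something to act on now, and "nvme-pool" below is a made-up name for your NVMe pool: the VM keeps a small zvol for its OS on the fast pool, and the bulk data lives in a plain dataset on the RAIDZ2 pool. How that dataset then gets into the VM - NFS/SMB, virtiofs, or a separate virtual disk - is a topic of its own.)

Code:
# rough illustration only - pool/dataset names are made up
zfs create -V 32G nvme-pool/vm-101-os        # small boot/OS zvol for the VM
zfs create Nextcloud/data                    # plain dataset for the bulk data
zfs set mountpoint=/srv/nextcloud-data Nextcloud/data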

The restoration is still in progress. It takes a while.
After that I'll try the zfs set refreserv=3T Nextcloud/vm-101-disk-1 command

No worries. If you could also post the well-known zfs list -o space output after you have run it, to see if it has done its thing, that would be nice.
 
We can discuss it here later; it's not that difficult. For now I'll leave it out, as the problem at hand is not related to this. It just makes a lot of things simpler.



For me it would make sense to have the VM as a zvol, but to have the data as a normal dataset. At this point I do not know e.g. how you use that entire space within the VM (i.e. what lsblk shows inside the VM; in fact, we do not even know what OS is in that VM :))



No worries. If you could also post the well-known zfs list -o space output after you have run it, to see if it has done its thing, that would be nice.

Oh! That VM is running Ubuntu Server 24.04 LTS.
I installed that, and the only thing inside it is the Nextcloud AIO.
I'll post the lsblk from within the VM once the restoration is complete.

So just to confirm, after the restoration is complete I have a few pending commands you want me to run.
Do you mind telling me the order in which you want me to run them? No rush.

1) lsblk (from inside the VM)

2) zfs set refreserv=3T Nextcloud/vm-101-disk-1

3) zfs list -o space

4)
mkdir /mnt/testmp
mount /dev/zvol/Nextcloud/vm-101-disk-1 /mnt/testmp
ls /mnt/testmp
umount /mnt/testmp
dmesg -e
 
I think this is the command that PVE used to create the RAIDZ2 pool:

Code:
# /sbin/zpool create -o ashift=12 Nextcloud raidz2 /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456A /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456B /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456C /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456D /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456E /dev/disk/by-id/ata-WDC_WD1002FBYS-0XXXX0_WD-WMATV123456F
# /sbin/zfs set compression=on Nextcloud
# systemctl enable zfs-import@Nextcloud.service
Created symlink /etc/systemd/system/zfs-import.target.wants/zfs-import@Nextcloud.service -> /lib/systemd/system/zfs-import@.service.
TASK OK

And yes, just so I don't ghost you on this one: it is indeed the command assembling the pool. My bad - I was actually interested in how the zvols were created too, but you would only see that the first time they were created, and now I understand you are restoring them.
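(If you ever want to double-check what the restored pool looks like, these show the vdev layout and sizes - just for reference:)

Code:
zpool status Nextcloud
zpool list -v Nextcloud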

Side note ... ashift=12 might not be the best choice for RAIDZ2 in terms of overhead, but again, it is not - at least not directly - responsible for your issues now.
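(The properties that drive that overhead are mainly the pool's ashift and the zvol's volblocksize; if you are curious, they can be checked like this - illustrative only:)

Code:
zpool get ashift Nextcloud
zfs get volblocksize Nextcloud/vm-101-disk-1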
 
Oh! That VM is running Ubuntu Server 24.04 LTS.
I installed that, and the only thing inside it is the Nextcloud AIO.
I'll post the lsblk from within the VM once the restoration is complete.

So just to confirm, after the restoration is complete I have a few pending commands you want me to run.
Do you mind telling me the order in which you want me to run them? No rush.

So I would be interested in this order (see the extra notes below):

2) zfs set refreserv=3T Nextcloud/vm-101-disk-1

3) zfs list -o space

These two before you even attempt to start the VM.

You can then try starting the VM and observe whether you get anything in dmesg (see the sketch at the end of this post for watching it live).

If it starts, you can use that opportunity to provide the:

1) lsblk (from inside the VM)

In fact you can do lsblk -o+FSTYPE

And if you cannot boot it at all, or you end up in the same situation you started this thread with, you can basically do the following (so you do not have to keep restoring all the time):

4)
mkdir /mnt/testmp
mount /dev/zvol/Nextcloud/vm-101-disk-1 /mnt/testmp
ls /mnt/testmp

And at that point you can still e.g. echo "testing only" > /mnt/testmp/test.txt and watch dmesg. Afterwards you can unmount it (be sure not to be inside the mounted directory at the time):

umount /mnt/testmp
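To watch kernel messages live while you are testing, either of these works (stop with Ctrl+C):

Code:
dmesg -wT
# or equivalently
journalctl -k -f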
 
The restore is complete and it automatically started the VM.
Same errors as always, but before shutting it down I ran lsblk (from inside the VM), so here it is:


[screenshot: lsblk output from inside the VM]
 
Code:
NAME                     AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
Nextcloud                 531G  3.00T        0B    192K             0B      3.00T
Nextcloud/vm-101-disk-0   531G     3M        0B    160K          2.84M         0B
Nextcloud/vm-101-disk-1  3.39T     3T        0B    137G          2.87T         0B
 
As you said, I started the VM after these two commands:

zfs set refreserv=3T Nextcloud/vm-101-disk-1
zfs list -o space

No errors so far.
Here is the only output I got when I ran dmesg -e

Code:
[Jul18 15:54] tap101i0: entered promiscuous mode
[  +0.044919] vmbr0: port 5(fwpr101p0) entered blocking state
[  +0.000004] vmbr0: port 5(fwpr101p0) entered disabled state
[  +0.000014] fwpr101p0: entered allmulticast mode
[  +0.000038] fwpr101p0: entered promiscuous mode
[  +0.000020] vmbr0: port 5(fwpr101p0) entered blocking state
[  +0.000002] vmbr0: port 5(fwpr101p0) entered forwarding state
[  +0.008783] fwbr101i0: port 1(fwln101i0) entered blocking state
[  +0.000004] fwbr101i0: port 1(fwln101i0) entered disabled state
[  +0.000014] fwln101i0: entered allmulticast mode
[  +0.000035] fwln101i0: entered promiscuous mode
[  +0.000030] fwbr101i0: port 1(fwln101i0) entered blocking state
[  +0.000002] fwbr101i0: port 1(fwln101i0) entered forwarding state
[  +0.008831] fwbr101i0: port 2(tap101i0) entered blocking state
[  +0.000005] fwbr101i0: port 2(tap101i0) entered disabled state
[  +0.000010] tap101i0: entered allmulticast mode
[  +0.000054] fwbr101i0: port 2(tap101i0) entered blocking state
[  +0.000002] fwbr101i0: port 2(tap101i0) entered forwarding state
 
Keep testing! :) Would be funny if all it took was my hunch in this case.

As for the lsblk in the VM ... you have the entire 3T+ as ext4 ... I think it's a big pity to use it this way, i.e. to have one zvol on the host, present it as ext4 into the VM, and use it for both OS and data. But that would be for another thread.
 
Bro, you're a wizard.
I will let this run for 24 hours to test before I try that zvol suggestion.

Can you explain the command?

zfs set refreserv=3T Nextcloud/vm-101-disk-1
 
Bro, you're a wizard.
I will let this run for 24 hours to test before I try that zvol suggestion.

Can you explain the command?

zfs set refreserv=3T Nextcloud/vm-101-disk-1

Well, this is like asking me to open a can of worms. :) Especially if my hunch was right. The thing is, the moment you posted your first screenshot showing AVAIL as 0 on the pool, I wrote it off as not enough space.

Then I realised: oh, but wait, PVE uses REFRESERV (it's an extra option) to "save" people from overfilling. Basically it's a topic best answered with "have a look at a deep-dive ZFS guide first" (just to align on the terminology). There's the pool, then there are datasets (the normal kind), and a special kind of dataset is a zvol (which makes the space appear as a block device - something you really have to do when you present it to a VM).

Because datasets can have children, and there are snapshots and all the other features like compression, deduplication, clones, etc., it's a very different concept of what constitutes "FREE" space. On top of that, you are running RAIDZ2, which is yet another layer of complexity: what is actually "AVAIL" is what you called "usable capacity" (you probably calculate it differently, but trust that AVAIL number). You have some overhead because any data stored requires parity to be stored alongside it, and then a different amount of space is wasted depending on how you set the ashift (especially for RAIDZ2). It's all quite counterintuitive.

To cut it short (if we are on the same page with the terminology): REFRESERV is basically reserved space for a dataset, because otherwise ZFS is thinly provisioned. The difference between REFRESERVATION and RESERVATION is that REFRESERVATION does not include e.g. snapshots - you are really making sure that the space will be there for user data, i.e. you would rather have snapshot creation fail than not have that (previously seemingly available) space. On the flip side, this reservation bites away from the parent, i.e. the pool then has no way to let that space be used by e.g. another dataset. But I find this case extreme in that it left 0 AVAIL for the pool itself.
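(To see this on your own system, something like this shows the two properties side by side and how the reservation eats into the parent's AVAIL - illustrative only; USEDREFRESERV on disk-1 should come out as roughly the refreservation minus the ~137G actually referenced, which matches the zfs list output you posted:)

Code:
# compare reservation vs refreservation on the zvol
zfs get reservation,refreservation,volsize Nextcloud/vm-101-disk-1
# and see how much of each dataset's USED is just the reservation
zfs list -o name,avail,used,usedbyrefreservation -r Nextcloud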

I do not quite know why a write to your zvol is causing failures, because you clearly have lots of AVAIL space in that dataset, but my guess is that it is trying to write something somewhere that requires space which does not count towards the dataset per se (but towards something else in the pool). For all I know it might even be some sort of bug related to RAIDZ2, which is notorious for showing confusing numbers when it comes to FREE/AVAIL/USEDDS etc.

Let's wait if it really resolved your problem.

NB: Be aware this does not really take away any space from you; it just no longer guarantees the zvol the entire space, but "only" 3TB. I suspect something needs to be written somewhere else in that pool, which was left with 0 usable space with the original value set. I reckon that value was set courtesy of the PVE GUI. It would be interesting to nail this down further, because the guys developing PVE should be interested in this and might want to change their defaults.
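(If you want to see where such a value actually comes from - i.e. which datasets have refreservation set locally rather than inherited or defaulted - something like this should show it:)

Code:
zfs get -r -s local refreservation Nextcloud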

Again, this is all in case my hunch was correct. If the above was more confusing than explanatory, my apologies, but you would really need to grab a whole book on ZFS, and then the RAIDZ2 chapter on top. :D
 
Oh and btw, you might want to edit the title of your thread to something like:

ZFS RAIDZ2 "lost async page write" - max REFRESERV ?

Maybe there's a ZFS guru amongst us who would reply off the bat, and that would attract them.
 
One more thing ...

Code:
NAME                     AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
Nextcloud                 531G  3.00T        0B    192K             0B      3.00T
Nextcloud/vm-101-disk-0   531G     3M        0B    160K          2.84M         0B
Nextcloud/vm-101-disk-1  3.39T     3T        0B    137G          2.87T         0B

It's only now that I realised that disk-0 is actually the EFI disk, and it only got a REFRESERV of ~3MB.


Previously (before you lowered your disk-1 REFRESERV, when the pool itself had 0 AVAIL), this meant your 1GB partition could not really take anything beyond those ~3MB.

While it looks like there's never been more than 160KB used on it, I have to wonder now ...

Same errors as always

Can you check in the log (after the last restore) which exact device (e.g. zd16) the errors were reported on? And check whether it is indeed the one symlinked from disk 1 or disk 0 by running ls -l /dev/zvol/Nextcloud/ ?

The remaining question then is how the GUI even allows you to create a thick-provisioned zvol of size 1GB with a REFRESERV of 3MB ...
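(Something along these lines should answer both - illustrative only, the grep pattern is just an example:)

Code:
# which zd device did the kernel complain about?
dmesg -T | grep -E 'zd[0-9]+'
# which zvol maps to which zd device?
ls -l /dev/zvol/Nextcloud/
# what is actually set on the EFI zvol, and where did the value come from?
zfs get -o name,property,value,source volsize,refreservation Nextcloud/vm-101-disk-0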
 
Code:
vm-101-disk-0 -> ../../zd0
vm-101-disk-1 -> ../../zd16
vm-101-disk-1-part1 -> ../../zd16p1
vm-101-disk-1-part2 -> ../../zd16p2

So far still no errors.
The errors that showed before the fix were for both zd0 and zd16.

Does that mean I need to run another command for drive 0 (EFI)?
Also, do you know if the available space on Nextcloud would be reduced after the fix?
 