Any ideas?
Nothing interesting in the zvol properties; they are identical except for dates and IDs.
Could you also share the log of the successful migration for comparison? I haven't been able to reproduce the issue locally yet.
smartctl -a /dev/XYZ
But maybe the IO error is network-related. I'm not aware of any other user reporting this issue and wasn't able to reproduce it myself.
It shouldn't, but it's better to follow every lead.
I did originally have it on local ZFS, then moved it to NFS, then back to local ZFS.
How would this have affected it?
It's not clear why the IO error happens. It's unfortunate that the logs don't contain any hint about that either.
Is there a way to fix it?
Code:
root@pve1:~# zfs get all rpool/data/vm-101-disk-0
NAME                      PROPERTY              VALUE                     SOURCE
rpool/data/vm-101-disk-0  type                  volume                    -
rpool/data/vm-101-disk-0  creation              Sat Jul 20 17:04 2024     -
rpool/data/vm-101-disk-0  used                  56K                       -
rpool/data/vm-101-disk-0  available             874G                      -
rpool/data/vm-101-disk-0  referenced            56K                       -
rpool/data/vm-101-disk-0  compressratio         1.00x                     -
rpool/data/vm-101-disk-0  reservation           none                      default
rpool/data/vm-101-disk-0  volsize               1M                        local
rpool/data/vm-101-disk-0  volblocksize          16K                       default
rpool/data/vm-101-disk-0  checksum              on                        default
rpool/data/vm-101-disk-0  compression           on                        inherited from rpool
rpool/data/vm-101-disk-0  readonly              off                       default
rpool/data/vm-101-disk-0  createtxg             327                       -
rpool/data/vm-101-disk-0  copies                1                         default
rpool/data/vm-101-disk-0  refreservation        none                      default
rpool/data/vm-101-disk-0  guid                  12720012326769879953      -
rpool/data/vm-101-disk-0  primarycache          all                       default
rpool/data/vm-101-disk-0  secondarycache        all                       default
rpool/data/vm-101-disk-0  usedbysnapshots       0B                        -
rpool/data/vm-101-disk-0  usedbydataset         56K                       -
rpool/data/vm-101-disk-0  usedbychildren        0B                        -
rpool/data/vm-101-disk-0  usedbyrefreservation  0B                        -
rpool/data/vm-101-disk-0  logbias               latency                   default
rpool/data/vm-101-disk-0  objsetid              141                       -
rpool/data/vm-101-disk-0  dedup                 off                       default
rpool/data/vm-101-disk-0  mlslabel              none                      default
rpool/data/vm-101-disk-0  sync                  standard                  inherited from rpool
rpool/data/vm-101-disk-0  refcompressratio      1.00x                     -
rpool/data/vm-101-disk-0  written               0                         -
rpool/data/vm-101-disk-0  logicalused           28K                       -
rpool/data/vm-101-disk-0  logicalreferenced     28K                       -
rpool/data/vm-101-disk-0  volmode               default                   default
rpool/data/vm-101-disk-0  snapshot_limit        none                      default
rpool/data/vm-101-disk-0  snapshot_count        none                      default
rpool/data/vm-101-disk-0  snapdev               hidden                    default
rpool/data/vm-101-disk-0  context               none                      default
rpool/data/vm-101-disk-0  fscontext             none                      default
rpool/data/vm-101-disk-0  defcontext            none                      default
rpool/data/vm-101-disk-0  rootcontext           none                      default
rpool/data/vm-101-disk-0  redundant_metadata    all                       default
rpool/data/vm-101-disk-0  encryption            off                       default
rpool/data/vm-101-disk-0  keylocation           none                      default
rpool/data/vm-101-disk-0  keyformat             none                      default
rpool/data/vm-101-disk-0  pbkdf2iters           0                         default
rpool/data/vm-101-disk-0  snapshots_changed     Thu Jul 25 10:30:10 2024  -
rpool/data/vm-101-disk-0  prefetch              all                       default
Code:
root@pve2:~# zfs get all rpool/data/vm-101-disk-0
NAME                      PROPERTY              VALUE                     SOURCE
rpool/data/vm-101-disk-0  type                  volume                    -
rpool/data/vm-101-disk-0  creation              Sat Jul 20 19:54 2024     -
rpool/data/vm-101-disk-0  used                  56K                       -
rpool/data/vm-101-disk-0  available             878G                      -
rpool/data/vm-101-disk-0  referenced            56K                       -
rpool/data/vm-101-disk-0  compressratio         1.00x                     -
rpool/data/vm-101-disk-0  reservation           none                      default
rpool/data/vm-101-disk-0  volsize               1M                        local
rpool/data/vm-101-disk-0  volblocksize          16K                       default
rpool/data/vm-101-disk-0  checksum              on                        default
rpool/data/vm-101-disk-0  compression           on                        inherited from rpool
rpool/data/vm-101-disk-0  readonly              off                       default
rpool/data/vm-101-disk-0  createtxg             1555                      -
rpool/data/vm-101-disk-0  copies                1                         default
rpool/data/vm-101-disk-0  refreservation        none                      default
rpool/data/vm-101-disk-0  guid                  9213793783124796618       -
rpool/data/vm-101-disk-0  primarycache          all                       default
rpool/data/vm-101-disk-0  secondarycache        all                       default
rpool/data/vm-101-disk-0  usedbysnapshots       0B                        -
rpool/data/vm-101-disk-0  usedbydataset         56K                       -
rpool/data/vm-101-disk-0  usedbychildren        0B                        -
rpool/data/vm-101-disk-0  usedbyrefreservation  0B                        -
rpool/data/vm-101-disk-0  logbias               latency                   default
rpool/data/vm-101-disk-0  objsetid              369                       -
rpool/data/vm-101-disk-0  dedup                 off                       default
rpool/data/vm-101-disk-0  mlslabel              none                      default
rpool/data/vm-101-disk-0  sync                  standard                  inherited from rpool
rpool/data/vm-101-disk-0  refcompressratio      1.00x                     -
rpool/data/vm-101-disk-0  written               0                         -
rpool/data/vm-101-disk-0  logicalused           28K                       -
rpool/data/vm-101-disk-0  logicalreferenced     28K                       -
rpool/data/vm-101-disk-0  volmode               default                   default
rpool/data/vm-101-disk-0  snapshot_limit        none                      default
rpool/data/vm-101-disk-0  snapshot_count        none                      default
rpool/data/vm-101-disk-0  snapdev               hidden                    default
rpool/data/vm-101-disk-0  context               none                      default
rpool/data/vm-101-disk-0  fscontext             none                      default
rpool/data/vm-101-disk-0  defcontext            none                      default
rpool/data/vm-101-disk-0  rootcontext           none                      default
rpool/data/vm-101-disk-0  redundant_metadata    all                       default
rpool/data/vm-101-disk-0  encryption            off                       default
rpool/data/vm-101-disk-0  keylocation           none                      default
rpool/data/vm-101-disk-0  keyformat             none                      default
rpool/data/vm-101-disk-0  pbkdf2iters           0                         default
rpool/data/vm-101-disk-0  snapshots_changed     Thu Jul 25 10:30:11 2024  -
rpool/data/vm-101-disk-0  prefetch              all                       default
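For a side-by-side comparison (just a sketch, assuming root SSH access between the nodes and a bash shell for the process substitution), the scripted output of zfs get can also be diffed directly; creation, createtxg, guid, objsetid and snapshots_changed will of course always differ:
Code:
diff <(ssh root@pve1 zfs get -H -o property,value,source all rpool/data/vm-101-disk-0) \
     <(ssh root@pve2 zfs get -H -o property,value,source all rpool/data/vm-101-disk-0)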
A bit of a shot in the dark, but can you try:
zfs set refreservation=10M rpool/data/vm-101-disk-0
This seemed to help: I pulled the plug on pve1 and let it fail over, then brought it back online and it failed back. It has done this before successfully (rarely), so I am not 100% sure, but it looks like it might have worked. I wish I had checked what the setting was before I executed it. Thanks!
Will keep an eye on it and see if it continues to work.
I wish I had checked what the setting was before I executed it.
Code:
rpool/data/vm-101-disk-0  refreservation  none  default
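For future reference, the active value can be read back at any time (on either node) with a plain zfs get, e.g.:
Code:
zfs get refreservation,volsize rpool/data/vm-101-disk-0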
@fiona I would not want to pretend I know all the intricacies of PVE, especially with migrations. Any idea how the OP could have ended up with a sparse efidisk just by moving it around within the PVE environment (on and off NFS)? The default for zfs create -V is thick unless -s is specified; I don't think the PVE code does that anywhere?
The storage configuration for ZFS has a sparse setting. I could not reproduce the issue with that either though.
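For illustration, this is roughly what that option looks like in /etc/pve/storage.cfg (an example entry, names assumed, not taken from this thread); with sparse set, PVE creates its zvols thin-provisioned, i.e. without a refreservation:
Code:
zfspool: local-zfs
        pool rpool/data
        sparse 1
        content images,rootdir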
I don't think sparse zvols are a good idea in general.
These effects can also occur when the volume size is changed while it is in use (particularly when shrinking the size). Extreme care should be used when adjusting the volume size.
So that I'm not saying this with nothing backing me up whatsoever:
https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops.7.html#volsize
Without the reservation, the volume could run out of space, resulting in undefined behavior or data corruption, depending on how the volume is used. These effects can also occur when the volume size is changed while it is in use (particularly when shrinking the size). Extreme care should be used when adjusting the volume size.
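As a quick illustration of what that reservation means in practice (a sketch with made-up test dataset names): a zvol created without -s gets a refreservation roughly matching its volsize, while one created with -s gets none:
Code:
zfs create -V 4M rpool/data/thick-test
zfs create -s -V 4M rpool/data/thin-test
zfs get -o name,property,value refreservation rpool/data/thick-test rpool/data/thin-test
zfs destroy rpool/data/thick-test rpool/data/thin-test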
Of course, when you run out of space, thin provisioning has issues. That's the big downside of thin provisioning in general.
Code:
Without the reservation, the volume could run out of space, resulting in undefined behavior or data corruption, depending on how the volume is used. These effects can also occur when the volume size is changed while it is in use (particularly when shrinking the size). Extreme care should be used when adjusting the volume size.
@ballybob I suppose you did not run out of space, or?
EDIT: and for completeness, shrinking volumes via the PVE UI/API/CLI is not even allowed, exactly because it's very dangerous.
No, it was a tiny fraction of the available space.
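For anyone wanting to double-check the same thing on their own system, the remaining pool space can be read with, for example:
Code:
zfs list -o space rpool
zpool list rpool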
I noticed that your EFI disk has volsize 1M instead of the 4M that the script creates it with. Did you move the EFI disk to another storage and back at some point? Did you originally create the VM on ZFS?
How could that happen in PVE?
The EFI disk is special, because it has a fixed amount of data. Depending on the storage, the volume that data is stored on can have a different size. When moving the storage, the volume is allocated and then the data is copied over. That is not the same as reducing the volsize in ZFS. If you move it away from ZFS and back, it will be a new ZFS volume.
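If it helps to compare what actually got allocated for the EFI disk on each node (assuming VM 101 and the dataset name from this thread), something like the following can be run on both hosts:
Code:
qm config 101 | grep efidisk
zfs get volsize,refreservation,volblocksize rpool/data/vm-101-disk-0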
@ballybob Do you mind also posting arc_summary from your system?
https://pastebin.com/HcJKiAiY
Code:
------------------------------------------------------------------------
ZFS Subsystem Report                            Wed Aug 07 11:08:34 2024
Linux 6.8.8-4-pve                                              2.2.4-pve1
Machine: pve1 (x86_64)                                         2.2.4-pve1
...
Tunables:
...
        zvol_blk_mq_blocks_per_thread                                  8
        zvol_blk_mq_queue_depth                                      128
        zvol_enforce_quotas                                            1
        zvol_inhibit_dev                                               0
        zvol_major                                                   230
        zvol_max_discard_blocks                                    16384
        zvol_num_taskqs                                                0
        zvol_open_timeout_ms                                        1000
        zvol_prefetch_bytes                                       131072
        zvol_request_sync                                              0
        zvol_threads                                                   0
        zvol_use_blk_mq                                                0
        zvol_volmode                                                   1
...