ZFS over iSCSI - Increase disk size

sga-ag

New Member
Apr 28, 2025
Hello everyone,

We have a Proxmox node that leverages ZFS over iSCSI to a TrueNAS Scale storage server. We are using the project below to connect Proxmox to TrueNAS via the TrueNAS API.

https://github.com/TheGrandWazoo/freenas-proxmox

My question is regarding expanding a virtual disk that is hosted on a ZFS over iSCSI storage backend. The expansion itself works fine; the issue we are having is at the guest VM level.

On 2 out of the 4 VMs that have a virtual disk on that ZFS over iSCSI storage, operations (reads and writes) to the virtual disks halted. The virtual disks still appeared to be mounted (lsblk, df -h), but listing their contents returned an I/O error. Trying to grow the filesystem also resulted in an error. I tried remounting the drives with mount -a, but it said they were already mounted, even though they were essentially inaccessible.

A reboot did fix the issue, but I am wondering if there's a better way to do this as I would rather not have to reboot the VMs when increasing the size of the virtual disk.


Thank you!
 
Hi @sga-ag, welcome to the forum.

The essence of the ZFS/iSCSI scheme is that PVE can SSH into the storage box to create a volume, which is then exported as iSCSI. At a high level, it doesn't matter whether the underlying filesystem is ZFS, XFS, or something else.

Disk expansion can be as simple as natural ZFS expansion, or there might be a NAS-specific sequence involved. You would need to consult the plugin code to see whether it uses the native TrueNAS API to perform the expansion or goes behind the scenes using direct ZFS commands.
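
If you want to rule out the NAS side first, you can check the zvol size directly on TrueNAS. A minimal check, assuming shell access to the TrueNAS box - the pool/zvol name below is just a placeholder, use whatever the plugin created for your VM disk:

ssh root@truenas zfs get -H -o value volsize tank/vm-100-disk-0   # pool/zvol name is an example

If volsize already reflects the new size, the expansion itself worked and the problem is further down the stack.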

Assuming the disk expansion on the NAS is handled properly, the next layer to consider is iSCSI. There are native iSCSI/SCSI commands that inform the host that the disk capacity has changed. You can usually find related messages in "dmesg" or "journalctl." The kernel then reacts to this and attempts to propagate the change up the stack.
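
For example, inside an affected guest the kernel normally logs the capacity change when it arrives - device name below is a placeholder:

dmesg | grep -i 'capacity change'   # expect something like "sdb: detected capacity change from X to Y"
lsblk /dev/sdb                      # sdb is an example; check what size the guest actually sees

If neither shows the new size, the change never made it through the iSCSI/virtualization layers to the guest.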

As you can see, there are many potential failure points along the way. We don't yet know at which point the first failure occurred - once a failure happens, everything else downstream is doomed.

You'll need to trace the full sequence: NAS, Plugin, iSCSI, Kernel, Hypervisor. The initial breaking point will direct you to the appropriate support channel: TrueNAS support, the plugin developer, or PVE support.

All that said: things are complicated. You'll need to perform more testing and troubleshooting, gather that data, and perhaps someone will be able to assist - or at the very least, reproduce the issue.

Cheers,



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
The safest way to expand a disk is to unmount it, then do the resize and the expansion of the filesystem. After this, simply mount it again. It is all explained here: https://pve.proxmox.com/wiki/Resize_disks
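
For reference, the resize itself on the PVE side is done with qm resize; the VM ID, disk name and size below are just examples:

qm resize 100 scsi1 +10G   # grows the virtual disk by 10 GiB; adjust vmid/disk/size to your setup
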
If the disk resizing is not automatically discovered by the VM you will have to initiate a manual rescan:
  1. su -
  2. echo "- - -" > /sys/class/scsi_host/host<n>/scan
Remember, it is the host bus you must rescan. So, given:
# ls /sys/class/scsi_host/
host0 host1 host2 host3 host4 host5 host6
and that your disk is under host0
echo "- - -" > /sys/class/scsi_host/host0/scan
 
The safest way to expand a disk is to unmount
Based on the fact that the OP experienced a system hang and then I/O errors, I am reasonably sure that the issue happened at the hypervisor level, or on the NAS, rather than inside the VM. If I am correct, then the OP's approach should be to shut down the VM rather than unmount the disk inside the VM.

But this is just a guess without concrete evidence. There is likely something interesting in the hypervisor's dmesg/journalctl at the time of expansion.
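
For example, something along these lines on the PVE node, scoped to the time of the expansion - the timestamps are placeholders:

journalctl --since "2025-04-28 10:00" --until "2025-04-28 10:30"   # adjust to the resize window
dmesg -T | grep -iE 'i/o error|iscsi|reset|abort'                  # look for errors around that time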



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Based on the fact that the OP experienced a system hang and then I/O errors, I am reasonably sure that the issue happened at the hypervisor level, or on the NAS, rather than inside the VM. If I am correct, then the OP's approach should be to shut down the VM rather than unmount the disk inside the VM.
An unmount before resizing would have fixed the problem. A mounted disk can cause all kinds of errors. It could easily have been caused by an automatic rescan of the host bus that interfered with a filesystem that still had data waiting to be persisted to disk.
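
If you go the unmount-first route, the full sequence would look roughly like this; VM ID, disk, device and mountpoint are placeholders, and the example assumes ext4 directly on the disk:

umount /mnt/data                              # inside the guest, before resizing
qm resize 100 scsi1 +10G                      # on the PVE node
echo 1 > /sys/class/block/sdb/device/rescan   # inside the guest, pick up the new size
resize2fs /dev/sdb                            # grow the filesystem; add growpart first if partitioned
mount /mnt/data                               # remount via the fstab entry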
 
An unmount would have fixed the problem. A mounted disk can cause all kinds of errors. It could easily have been caused by an automatic rescan of the host bus that interfered with a filesystem that still had data waiting to be persisted to disk.
I will agree that your theory has merit. Absent any hard proof in the way of logs - your guess is as good as mine.

Cheers


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks to both of you! I will try unmounting the drive before expanding it to see if that prevents I/O from locking up. I am unsure if the application that is using the disks will just wait for the disks to come back online or simply crash.

Thank you for the great explanation!