ZFS over iSCSI volumes repeatedly hang Debian VM OS

jslanier

Jan 19, 2019
I had a Plex VM running on Debian 10 (starting with Proxmox 6) for quite a while with no notable crashes. Recently, however, I have noticed that the VM will not run for more than a day or two without becoming completely unusable. When the problem occurs, the VM will not unmount the ZFS over iSCSI storage inside the guest on shutdown, and I have to manually "stop" the VM from Proxmox.

To eliminate other possible causes, yesterday I built a brand-new Debian 12 VM, attached the same ZFS over iSCSI disks to it, and migrated my Plex server config over. After letting it run for a few hours, I installed Radarr, and sure enough the problem came right back. The VM ran for a few more hours before becoming unusable, and I had to manually kill it.

For more info, I am currently running the latest version of PVE as of today (8.2.4) as well as the latest version of Debian 12. I usually notice something has gone wrong when Plex stops working and the VM console shows several hung_task_timeout messages.
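In case it helps narrow down which processes are getting stuck, here is a minimal sketch for tallying the hung-task reports by task name. It assumes the messages follow the kernel's usual "INFO: task NAME:PID blocked for more than N seconds." wording. The function name count_hung_tasks is just something made up for this post, not an existing tool.

```python
import re
from collections import Counter

# Assumed message shape (the usual kernel hung-task report):
#   INFO: task <name>:<pid> blocked for more than <n> seconds.
# Task names can themselves contain colons (e.g. kworker/u8:2), so the
# greedy name group backtracks to the last ":<digits>" before " blocked".
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(log_text: str) -> Counter:
    """Count hung-task reports per task name in a block of kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts
```

The input would be kernel log text saved from inside the guest (for example the output of dmesg or journalctl -k). If most of the reports name filesystem or writeback threads for the iSCSI-backed disks, that would at least point toward the storage path and away from Plex or Radarr themselves.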

Here is my storage.cfg and my vm.conf:
dir: local
        path /var/lib/vz
        content backup,iso,vztmpl

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        sparse 1

zfs: datastor1
        blocksize 128k
        iscsiprovider comstar
        pool goliath
        portal 10.0.0.5
        target iqn.2010-08.org.illumos:02:8a4ca3a6-a226-6980-a6a1-ae0e913b46a5
        content images
        nowritecache 0
        sparse 1

zfs: kylefiber
        blocksize 128k
        iscsiprovider comstar
        pool ringo
        portal 10.0.0.6
        target iqn.2010-08.org.illumos:02:1352e462-fabe-cc35-986f-876c8d241428
        content images
        nowritecache 0
        sparse 1

agent: 1
boot: order=scsi0;ide2;net0
cores: 14
cpu: host
ide2: none,media=cdrom
memory: 32768
meta: creation-qemu=9.0.0,ctime=1721053033
name: plex-v3
net0: virtio=BC:24:11:EB:D2:41,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: vmstor:vm-100-disk-0,discard=on,iothread=1,size=32G,ssd=1
scsi1: datastor1:vm-102-disk-0,backup=0,discard=on,size=80T
scsi2: kylefiber:vm-102-disk-0,backup=0,discard=on,size=55T
scsi3: vmstor:vm-102-disk-0,cache=writeback,discard=on,size=750G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=f7558573-a3db-4a81-9c96-bd1503afd2d1
sockets: 2
vmgenid: 1860f258-6c3b-4208-92c4-e2a59ba8b747

Anyone have any recommendations? This is extremely frustrating, and it is very difficult to troubleshoot.
 
I just turned on iothread for each ZFS over iSCSI disk. Does anyone think that will make a difference? I noticed this option was introduced in 7.3.
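To be explicit about what that change looks like in the VM config, here is a small sketch that takes a disk line and returns it with iothread=1 set. The helper name enable_iothread is hypothetical, and the options are simply sorted alphabetically after the volume so the result resembles the scsi0 line above; I am not assuming that is how Proxmox itself orders them.

```python
def enable_iothread(disk_line: str) -> str:
    """Return a VM config disk line with iothread=1 set.

    Illustration only: this edits a string, it does not touch any
    Proxmox config file or call any Proxmox tool.
    """
    # "scsi1: datastor1:vm-102-disk-0,backup=0,..." -> key and value
    key, sep, value = disk_line.partition(": ")
    # The first comma-separated field is the volume, the rest are options.
    volume, *options = value.split(",")
    # Drop any existing iothread setting, then add the enabled one.
    options = [opt for opt in options if not opt.startswith("iothread=")]
    options.append("iothread=1")
    return f"{key}{sep}{volume},{','.join(sorted(options))}"
```

Applied to the scsi1 line above, this gives scsi1: datastor1:vm-102-disk-0,backup=0,discard=on,iothread=1,size=80T, and scsi2 changes the same way. The controller was already set to virtio-scsi-single, which as far as I know is what iothread on SCSI disks needs.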
 
OK, update: this did not work. I am still getting hung-task timeouts after making that change.