ZFS over iSCSI volumes hang Debian VM OS repeatedly

jslanier

I had a Plex VM running on Debian 10 (starting with Proxmox 6) for quite a while with no notable crashing issues. Recently, however, the VM will not run for more than a day or two before becoming completely unusable. When the problem occurs, the VM will not unmount the ZFS over iSCSI storage inside the guest on shutdown, and I have to manually "stop" the VM from Proxmox.
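
To be precise about what "stop" means here, this is roughly the sequence from the PVE host (a sketch; I'm assuming VMID 102 to match the disk names below, adjust as needed):

# graceful shutdown first; this is what hangs when the iSCSI-backed disks are wedged
qm shutdown 102 --timeout 120
# hard stop from the host, which is the only thing that works at that point
qm stop 102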

To eliminate possible causes, yesterday I made a brand-new Debian 12 VM, attached those ZFS over iSCSI disks to it, and migrated my Plex server config over. After letting it run for a few hours, I installed Radarr, and sure enough the problem came right back: it ran for a few hours before becoming unusable, and I had to manually kill it.

For more info, I am currently running the latest version of PVE as of today (8.2.4) as well as the latest version of Debian 12. I usually notice something has gone wrong when Plex stops working and the VM console shows several hung_task_timeout messages.
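
The messages are the usual kernel hung-task reports. For reference, a rough way to pull them out of the guest's kernel log (run as root inside the Debian 12 guest):

# kernel ring buffer with human-readable timestamps
dmesg -T | grep -iE 'hung_task|blocked for more than'
# same thing via the journal, current boot only
journalctl -k -b | grep -i 'blocked for more than'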

Here is my storage.cfg and my vm.conf:
dir: local
    path /var/lib/vz
    content backup,iso,vztmpl

zfspool: local-zfs
    pool rpool/data
    content images,rootdir
    sparse 1

zfs: datastor1
    blocksize 128k
    iscsiprovider comstar
    pool goliath
    portal 10.0.0.5
    target iqn.2010-08.org.illumos:02:8a4ca3a6-a226-6980-a6a1-ae0e913b46a5
    content images
    nowritecache 0
    sparse 1

zfs: kylefiber
    blocksize 128k
    iscsiprovider comstar
    pool ringo
    portal 10.0.0.6
    target iqn.2010-08.org.illumos:02:1352e462-fabe-cc35-986f-876c8d241428
    content images
    nowritecache 0
    sparse 1

agent: 1
boot: order=scsi0;ide2;net0
cores: 14
cpu: host
ide2: none,media=cdrom
memory: 32768
meta: creation-qemu=9.0.0,ctime=1721053033
name: plex-v3
net0: virtio=BC:24:11:EB:D2:41,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: vmstor:vm-100-disk-0,discard=on,iothread=1,size=32G,ssd=1
scsi1: datastor1:vm-102-disk-0,backup=0,discard=on,size=80T
scsi2: kylefiber:vm-102-disk-0,backup=0,discard=on,size=55T
scsi3: vmstor:vm-102-disk-0,cache=writeback,discard=on,size=750G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=f7558573-a3db-4a81-9c96-bd1503afd2d1
sockets: 2
vmgenid: 1860f258-6c3b-4208-92c4-e2a59ba8b747
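
For completeness, here is a quick sketch of how the two ZFS over iSCSI storages can be checked from the PVE host (storage names taken from the storage.cfg above):

# both datastor1 and kylefiber should show up as active
pvesm status
# enumerate the volumes on each iSCSI-backed pool
pvesm list datastor1
pvesm list kylefiber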

Anyone have any recommendations? This is extremely frustrating, and it is very difficult to troubleshoot.
 
I just turned on iothread for each ZFS over iSCSI disk. Does anyone think that will make a difference? I noticed this option was introduced in 7.3.
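
Concretely, the two disk lines now look roughly like this (a sketch of the resulting config, not copied verbatim):

scsi1: datastor1:vm-102-disk-0,backup=0,discard=on,iothread=1,size=80T
scsi2: kylefiber:vm-102-disk-0,backup=0,discard=on,iothread=1,size=55T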
 
Update: this did not work. I am still getting hung task timeouts after making that change.
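
If stack traces would help anyone, this is a minimal sketch of how I can capture the blocked-task state from inside the guest the next time it hangs (assumes the stock Debian kernel, which has SysRq built in):

# hung-task threshold, normally 120 seconds
cat /proc/sys/kernel/hung_task_timeout_secs
# write the stacks of all blocked (D-state) tasks to the kernel log
echo w > /proc/sysrq-trigger
# read the dump back out
dmesg -T | tail -n 200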
 
