[SOLVED] Disk IO throttles and "dies" inside VM, passthrough of drives (encrypted).

tig3r · New Member · Jan 29, 2023
I don't understand what's going wrong.
I'm running an Ubuntu Server VM, mainly used for Plex, with a couple of drives passed through to it.
All drives are LUKS encrypted; unlocking/mounting them via crypttab works fine.
Streaming works fine, but when I try to copy or move files between the drives, the speed starts out really well at 250+ MB/s (at least between the enterprise drives).
It then starts throttling after roughly 10-15 GB and slowly drops further, at least judging by the graphs (summary view).
Inside the VM the transfer freezes, I can't poll any more data from the "writing" disk, and when this happens it is also impossible to shut down the VM: the Ubuntu shutdown hangs on a failed filesystem unmount and fails to kill the PID of the process transferring the files.
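For context, the drives are unlocked inside the guest with plain crypttab/fstab entries; a minimal sketch of that setup (not my actual entries, the UUIDs, mapper names and keyfile paths are placeholders) would be:
Code:
# /etc/crypttab inside the Ubuntu guest
media1  UUID=<uuid-of-luks-partition>  /root/keys/media1.key  luks
media2  UUID=<uuid-of-luks-partition>  /root/keys/media2.key  luks

# /etc/fstab, mounting the opened mappers
/dev/mapper/media1  /mnt/media1  ext4  defaults  0  2
/dev/mapper/media2  /mnt/media2  ext4  defaults  0  2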

I do see the VM's RAM topping out, but after a hard reboot of the whole server I can see that about 10-15 GB were written to the destination drive.

I'm really starting to enjoy PVE and use it on two more servers as well, but this is so frustrating; my Plex server can't have these issues. Do I need to move back to bare-metal Ubuntu?

My VM
Code:
agent: 1
boot: order=scsi0;ide2;net0
cores: 11
cpu: host
ide2: none,media=cdrom
memory: 25000
meta: creation-qemu=7.1.0,ctime=1676379437
name: Ubuntu-Plex2
net0: virtio=*:*:*:*:*,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
sata0: /dev/disk/by-id/ata-Samsung_SSD_850_E*,backup=0,size=488386584K,ssd=1
sata1: /dev/disk/by-id/ata-KINGSTON_SUV500*,backup=0,size=468851544K,ssd=1
scsi0: local-lvm:vm-102-disk-0,iothread=1,size=180G
scsi10: /dev/disk/by-id/ata-TOSHIBA_HDWQ140*,backup=0,size=3907018584K
scsi11: /dev/disk/by-id/ata-WDC_WD30*,backup=0,size=2930266584K
scsi12: /dev/disk/by-id/ata-WDC_WD30*,backup=0,size=2930266584K
scsi13: /dev/disk/by-id/ata-WDC_WD30*,backup=0,size=2930266584K
scsi14: /dev/disk/by-id/ata-WDC_WD30*,backup=0,size=2930266584K
scsi2: /dev/disk/by-id/nvme-Force_MP510*,backup=0,size=937692504K,ssd=1
scsi5: /dev/disk/by-id/ata-TOSHIBA_MG07ACA14TE*,backup=0,size=13351934M
scsi6: /dev/disk/by-id/ata-TOSHIBA_MG07ACA14TE*,backup=0,size=13351934M
scsi7: /dev/disk/by-id/ata-TOSHIBA_MG07ACA14TE*,backup=0,size=13039G
scsi8: /dev/disk/by-id/ata-TOSHIBA_MG07ACA14TE*,backup=0,size=13351934M
scsi9: /dev/disk/by-id/ata-TOSHIBA_MG07ACA14TE*,backup=0,size=13039G
scsihw: virtio-scsi-single
smbios1: uuid=0*
sockets: 1
usb0: host=1058:0748
usb1: host=0bc2:ab2e
vmgenid: *

The attached file shows how the disk IO speed drops until it gets stuck and I have to shut down the whole system.
 

Attachment: Screenshot from 2023-02-18 23-04-17.png (disk IO graph)
Please try enabling iothread=1 for all disks.
What are the specs of your host system?
And the output of pveversion -v?
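For reference, one way to do that from the host CLI would be roughly the following (VMID 102 taken from the config above; <by-id-path> stands for the full device path, and any existing options like backup=0 have to be repeated):
Code:
# add iothread=1 to a passed-through disk, e.g. scsi5
qm set 102 --scsi5 /dev/disk/by-id/<by-id-path>,backup=0,iothread=1
# iothread only takes effect with scsihw virtio-scsi-single (already set here) or virtio-blk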
 
I'll link the version below, but first, my solution.
It took some time to get my post verified for posting, so I spent the whole Sunday trying everything I could think of that might change the behavior.
iothread=1 did not help at all; the result was still locked-up PIDs inside the VM.
The PIDs got stuck so hard that shutting down the VM took ages while it tried to kill them.
I did try upgrading the kernel to 6.1.10.*; by itself it made no difference, but I'm still on that kernel.
Since slow transfers didn't seem to cause the same obvious lockups, and RAM usage peaked quickly when using cp/rsync/mv, I figured my data was getting stuck or cached somewhere. So I read up on the different disk write-cache modes and tried changing them; the setting that works for me in the end is cache=directsync. I still want data safety to be as high as possible.
The speed is perhaps a tiny bit slower, but it stays high, doesn't drop, and no PIDs lock up. I have tried multiple simultaneous moves and copies between drives. I don't use iothread=1 at the moment either.

So the solution in my case was to use cache=directsync. Very pleased to have found something that works.
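For anyone who wants to try the same, the change is just the cache option on each passthrough disk; roughly like this (the by-id path must be the full one, sizes and other options stay as they are):
Code:
# from the host CLI, per disk, e.g. scsi5:
qm set 102 --scsi5 /dev/disk/by-id/<by-id-path>,backup=0,cache=directsync

# or edit /etc/pve/qemu-server/102.conf and add cache=directsync to each disk line:
scsi5: /dev/disk/by-id/ata-TOSHIBA_MG07ACA14TE*,backup=0,cache=directsync,size=13351934M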

edit:
The system is an X470 motherboard (ASUS consumer board) with 32 GB RAM, an AMD 3900X CPU, 2x NVMe, 2 SSDs and 10 HDDs.

pveversion:
Code:
proxmox-ve: 7.3-1 (running kernel: 6.1.10-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-6.1: 7.3-4
pve-kernel-helper: 7.3-4
pve-kernel-5.15: 7.3-2
pve-kernel-6.1.10-1-pve: 6.1.10-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 
I also noticed the problem yesterday. I was on kernel version 6.1.2 for a very long time, then updated to 6.1.10 a few days ago, and noticed the problem yesterday. I then tested kernel 6.1.14, but the problem is there too.

Is this just a bug, or will the Proxmox default config change to use only directsync on hard disks?

With the default disk cache setting, hard disks no longer work properly from kernel 6.1.10 onward; the disk hangs until the VM is restarted.
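To check whether it really is the kernel, booting the previous kernel again should be enough. Assuming a recent enough pve-kernel-helper, something along these lines should work (the version string is just an example; take one from the list output):
Code:
# list installed kernels, pin a known-good one and reboot into it
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.1.2-1-pve
reboot
# later, to go back to booting the newest installed kernel:
proxmox-boot-tool kernel unpin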
 
I thought I had found a solution as stated above, but it turned out to cause problems on a daily basis. SMB shares from the server would "hang" the disks when used. Speed between disks dropped to about 10% of normal after a while, and I was unable to shut down the VM; 9 out of 10 times Ubuntu got stuck on "stop jobs" that were in one way or another tied to disk I/O, and I was forced to "STOP" the VM. mergerfs used some sort of cache for the 4 drives behind it, which was very unstable and often crashed the I/O.

I think my main problem was migrating data drives from an existing system via disk passthrough instead of starting fresh. It would probably have worked out better to build one big LV from all the disks in Proxmox and present it to the VM as a single disk, as sketched below.
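Purely for illustration (I never actually did this), that layout would look roughly like the following on the host; the device names, VG/storage names and the size are made up:
Code:
# turn the data disks into one volume group (hypothetical device names)
pvcreate /dev/sdb /dev/sdc /dev/sdd
vgcreate mediavg /dev/sdb /dev/sdc /dev/sdd

# register the VG as a Proxmox LVM storage
pvesm add lvm media-lvm --vgname mediavg

# allocate one large virtual disk on it for VM 102 (size in GB)
qm set 102 --scsi20 media-lvm:8000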
After a few more days (and nights) of resetting, rebuilding and trying every option I could think of, I gave up and made that server bare-metal Ubuntu again, as it was from the beginning; no more problems related to disks or encryption. The other VMs I was running (Unifi controller, databases, etc.) now run as Docker containers instead.
It's a bit of a shame, since I really like Proxmox and run it on 2 other servers where it works flawlessly, but those were Proxmox from the start, with no encrypted disks or passthrough at all.
 
For me the opt-in kernels are an optional, unstable choice, so I see the problem not so much with Proxmox as with the expectations of users running that kernel; it is not the main kernel Proxmox ships with.

With my previous post I'm hoping for a response from the Proxmox team, to understand whether kernel 6.1 will eventually be usable for this case.