ZFS over iSCSI (High Load on Move Disk)

Hi

We have a setup of ZFS over iSCSI using LIO on Ubuntu 18, and we have an issue with high IO load once we move disks that are bigger than 100 GB.

Once the move starts, the load is low until about half of the transfer is done, and then it gets crazy high.

Our setup is very high end, and the load is very unreasonable. The pool is a raidz1-0 of 8 x 8 TB NVMe disks,

with 512 GB of RAM and dual Xeon Gold 3.3 GHz, and that is just for the storage; atime for ZFS is disabled.

Do you have any clues about the reason for the high load on large disk moves?

By the way, setting a bandwidth limit in the Proxmox cluster options doesn't help at all.
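For context, the limit I mean is the datacenter-wide bandwidth limit (Datacenter -> Options), roughly something like this in /etc/pve/datacenter.cfg (example values only, in KiB/s):

Code:
# example only - limit move/clone/restore traffic to ~100 MiB/s
bwlimit: move=102400,clone=102400,restore=102400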

Thanks.
 
Hi,

what PVE version do you use?
Code:
pveversion -v
 
The latest on all servers; the minimal version in our cluster is 6.1-3

proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
pve-zsync: 2.0-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 
Hi,

Thanks a lot for the help, though how can I do it? Do I change it using the command "zfs set checksum..." ?

Can the change you are suggesting be done on a live production ZFS volume?

Thanks so much.
 
You can change this at runtime.

Code:
echo ssse3 >> /sys/module/zfs/parameters/zfs_vdev_raidz_impl
zfs set checksum=sha256 <pool>
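
To verify the changes took effect, you can check the active raidz implementation (shown in brackets) and the checksum property, for example:

Code:
cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
zfs get checksum <pool>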
 
Unfortunately it didn't help at all. For some reason, during any major operation like a disk move, clone, or restore, it's the same story: a very high load, apparently from too many process threads being opened, causing major IO delay on the system. Anything else you can suggest?

We are truly lost with this situation. We have multiple VMs relying on that storage, and they could all crash on any disk move, even though the hardware is really top of the line, as mentioned above.

Thanks.
 
Please send me the output of these commands.

Code:
arc_summary
lsblk
swapon
zpool get all
zfs get all
 
Thanks, the data for each command is attached in the files below.
 

Attachments

  • getall.zip
  • arc_summ_lsblk_swapon_zpoolget_zfsget.txt
Hi Wolfgang,

Do you have any idea why this could happen?
Do you have enough info from my side?

Thanks.
 
This looks like an NVMe problem.
It looks like the disks are too fast ;-)
Many HW vendors use PCIe switches to extend the PCIe lanes for more devices, and that can cause serious problems.

Please send me the output of these commands so we can verify how your NVMe devices are connected.

Code:
lspci -tv
lspci

Meanwhile, you can try the following to increase performance.

Set <Cores> to the real core count, without HT:
Code:
echo <Cores> > /sys/module/nvme/parameters/poll_queues

Check that the CPU frequency governor is set to performance:
Code:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Enable hybrid polling (0 = hybrid; a value greater than 0 sets a fixed delay in microseconds):
Code:
echo 0 > /sys/block/nvme0n1/queue/io_poll_delay
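
If you have several NVMe devices, something like this would apply the settings to all of them in one go (a sketch only; adjust the core count and the device glob to your system):

Code:
CORES=16                                    # example: physical core count, without HT
echo "$CORES" > /sys/module/nvme/parameters/poll_queues
for q in /sys/block/nvme*n1/queue/io_poll_delay; do
    echo 0 > "$q"                           # 0 = hybrid polling
done
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"                 # disable on-demand frequency scaling
done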
 
Hi,

Thank you, I have attached the output of both commands.

Currently this file does not exist: /sys/module/nvme/parameters/poll_queues

Should I just create it with the echo command, and use the output of the nproc command?

output of scaling_governor:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

ondemand (repeated 32 times, once per logical CPU)

output of io_poll_delay:

cat /sys/block/nvme0n1/queue/io_poll_delay

-1

Should I make all the changes you suggested live, on the active ZFS server?

Thanks for the help
 

Attachments

  • lspci.txt
  • lspci-v.txt
Please check your BIOS for a setting that puts the CPU into performance mode, and disable all power-saving settings.
Currently this file does not exist: /sys/module/nvme/parameters/poll_queues
This is not a regular file, it is a sysfs parameter.
It should exist with the current PVE kernel.
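
If the parameter still does not show up, or you want the value to survive a reboot, one option (assuming the nvme driver is built as a module, as in the stock PVE kernel) is to set it as a module option, e.g. in /etc/modprobe.d/nvme.conf, and rebuild the initramfs:

Code:
echo "options nvme poll_queues=16" > /etc/modprobe.d/nvme.conf   # example value, use your real core count
update-initramfs -u
# after a reboot, verify:
cat /sys/module/nvme/parameters/poll_queues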
Should I make all the changes you suggested live, on the active ZFS server?
This is normally no problem.

But as your lspci report shows, the NVMe devices are not balanced.
You have 4 bridges:
Bridge 3B:00.0 has 7 NVMe devices
Bridge 18:00.0 has 1 NVMe device
Bridge 86:00.0 has 2 NVMe devices
Bridge af:00.0 has 1 NVMe device

I would make sure the NVMe devices are balanced across the bridges.
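
To see how the devices are distributed and whether each one negotiated its full link width and speed, something like this should show it (0108 is the PCI class code for NVMe controllers):

Code:
lspci -tv                                   # tree view: NVMe devices appear under their bridge ports
for dev in $(lspci -d ::0108 | awk '{print $1}'); do
    echo "== $dev =="
    lspci -s "$dev" -vv | grep -E 'LnkCap:|LnkSta:'   # capability vs. negotiated link
done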
 
Hi Wolfgang, thank you.

I should mention that NFS sharing on the same ZFS server, with the same NVMe disks, is much, much faster.

It's only when we use ZVOLs on that server and move or clone disks that we get a very high load.

So I'm not sure about the bridges solution.

What do you think?
 

Hello,
I'm observing a very similar, if not the same, issue on our setup. Things appear to work fine, but when trying to migrate a VM between two hosts, or even migrate a disk from one storage to another, everything comes to a complete halt: it is incredibly slow, VMs become unresponsive, and CPU usage goes up.
This happens even when migrating from NVMe storage on one host to NVMe storage on the other.
Did you ever find a solution?

P.S.
I was unable to find how to update the polling setting mentioned.
echo 1 > /sys/block/nvme0n1/queue/io_poll didn't work; it gave the error "write error: Invalid argument". This also happened when using a text editor like nano.
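
If I understand the kernel behaviour correctly, the Invalid argument error usually means the nvme driver has no poll queues allocated, so the poll_queues parameter mentioned above has to be set first. These read-only checks should show the current state:

Code:
cat /sys/module/nvme/parameters/poll_queues   # 0 means no polled queues are allocated
cat /sys/block/nvme0n1/queue/io_poll          # whether polling is currently enabled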
 
