ZFS over iSCSI (High Load on Move Disk)

ran

Member
Hi

We have a ZFS over iSCSI setup using LIO on Ubuntu 18, and we have an issue with high IO load whenever we move disks that are bigger than 100GB.

Once the move starts, the load stays low until about half of the transfer is done, and then it gets crazy high.

Our setup is very high end and the load is very unreasonable: the pool is a raidz1-0 of 8 x 8TB NVMe disks, with 512GB of RAM and dual 3.3GHz Xeon Gold CPUs, and that is just for the storage. atime is disabled for ZFS.

Do you have any clues as to the reason for the high load on large disk moves?

BTW, setting a limit in the Proxmox cluster options doesn't help at all.
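For anyone else debugging this, a rough sketch of commands that can be used to watch where the load comes from while a move runs (the pool name "tank" is only a placeholder for ours):

Code:
zpool iostat -v tank 5             # per-vdev throughput and IOPS
iostat -x 5                        # per-device utilisation and latency (sysstat package)
ps -eo state,pid,cmd | grep '^D'   # processes stuck in uninterruptible IO wait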

Thanks.
 

wolfgang

Proxmox Staff Member
Hi,

what PVE version do you use?
Code:
pveversion -v
 

ran

Member
Latest on all servers; the minimal version in our cluster is 6.1-3.

Code:
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
pve-zsync: 2.0-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 

wolfgang

Proxmox Staff Member
  • Like
Reactions: fireon

ran

Member
Hi,

Thanks a lot for the help. How do I do it, though? Do I change it using the command "zfs set checksum..."?

Can the change you are suggesting be done on a live production ZFS volume?

Thanks so much.
 

wolfgang

Proxmox Staff Member
You can change this at runtime.

Code:
echo ssse3 >> /sys/module/zfs/parameters/zfs_vdev_raidz_impl
zfs set checksum=sha256 <pool>
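To confirm the change took effect, something like this should work (<pool> is a placeholder):

Code:
cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl   # active implementation is shown in brackets
zfs get checksum <pool>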
 
  • Like
Reactions: fireon

ran

Member
Unfortunately it didn't help at all. For some reason, during any major operation like a disk move, clone, or restore, it's the same story: a very high load, which appears to be too many process threads opening and causing major IO delay on the system. Anything else you can suggest?

We are truly lost with this situation. We have multiple VMs relying on that storage, and they can all crash on any disk move, even though the hardware is really top of the line, as mentioned above.

Thanks.
 

wolfgang

Proxmox Staff Member
Please send me the output of these commands.

Code:
arc_summary
lsblk
swapon
zpool get all
zfs get all
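If it is easier, roughly something like this should collect everything into a single file for attaching (the file name is only an example):

Code:
{ arc_summary; lsblk; swapon; zpool get all; zfs get all; } > storage-info.txt 2>&1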
 

ran

Member
Thanks, the output of each command is attached in the files below.
 

Attachments

  • getall.zip
    511.5 KB · Views: 1
  • arc_summ_lsblk_swapon_zpoolget_zfsget.txt
    47.9 KB · Views: 1

ran

Member
Hi Wolfgang ,

do you have any idea why it can happen?
do you have enough info from my side?

Thanks.
 

wolfgang

Proxmox Staff Member
This is a general NVMe problem.
It looks like a problem where the disks are too fast ;-)
Many HW vendors use PCIe switches to extend the PCIe lanes to more devices, and that causes serious problems.

Please send me the output of these commands so I can verify how your NVMe drives are connected.

Code:
lspci -tv
lspci

Meanwhile, you can try the following to increase performance.

For <Cores>, use the real core count without HT:
echo <Cores> > /sys/module/nvme/parameters/poll_queues

Check that the governor is set to performance:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Enable hybrid polling (0 = hybrid; values greater than 0 set a fixed sleep time before polling):
echo 0 > /sys/block/nvme0n1/queue/io_poll_delay
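Put together, the three steps look roughly like this (the core count of 16 and the device name nvme0n1 are only examples, adjust them to your hardware):

Code:
echo 16 > /sys/module/nvme/parameters/poll_queues     # real core count without HT; may need a controller reset/reboot to apply
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 > /sys/block/nvme0n1/queue/io_poll_delay       # 0 = hybrid polling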
 

ran

Member
Hi,

Thank you, I have attached the outputs of both commands.

Currently this file does not exist: "/sys/module/nvme/parameters/poll_queues"

Should I just create it with the echo command, and use the output of the nproc command?

Output of scaling_governor:

Code:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand

Output of io_poll_delay:

Code:
cat /sys/block/nvme0n1/queue/io_poll_delay
-1

Should I apply all the changes you suggested live, on the active ZFS server?

Thanks for the help
 

Attachments

  • lspci.txt
    26 KB · Views: 2
  • lspci-v.txt
    20.7 KB · Views: 3

wolfgang

Proxmox Staff Member
Please check your BIOS to see if there is a setting to put the CPU into performance mode, and disable all power-saving options.
currently this file does not exist " /sys/module/nvme/parameters/poll_queues "
This is not a regular file, it is a sysfs entry. It should exist with the current PVE kernel.
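As a rough sketch, if the entry is still missing on the running kernel, the parameter can usually be set at driver load time instead (the value 16 and the file name are only examples):

Code:
echo "options nvme poll_queues=16" > /etc/modprobe.d/nvme.conf
update-initramfs -u -k all          # then reboot
# if the nvme driver is built into the kernel, set it on the kernel command line instead:
#   nvme.poll_queues=16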
should i do all the changes you suggested live on active zfs server?
This is normally no problem.

But as your lspci report shows, the NVMe devices are not balanced.
You have 4 bridges:
Bridge 3B:00.0 has 7 NVMe devices
Bridge 18:00.0 has 1 NVMe device
Bridge 86:00.0 has 2 NVMe devices
Bridge af:00.0 has 1 NVMe device

I would make sure the NVMe devices are balanced across the bridges.
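As a rough way to check this, the sysfs path of every NVMe controller shows the bridge chain it sits behind, for example:

Code:
# print the full PCI path (including bridges) for every NVMe controller
for c in /sys/class/nvme/nvme*; do echo "$c -> $(readlink -f "$c")"; done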
 

ran

Member
Hi Wolfgang, thank you.

I should mention that NFS sharing on the same ZFS server, with the same NVMe disks, is much, much faster.

It's only when we use ZVOLs on that server and move or clone disks that we get a very high load.

So I'm not sure about the bridges solution.

What do you think?
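For what it's worth, since the NFS share lives on a normal dataset while the iSCSI LUNs are zvols, comparing their block size and sync settings might be worth a look (the dataset and zvol names below are only placeholders):

Code:
zfs get recordsize,compression,sync tank/nfs-share
zfs get volblocksize,compression,sync tank/vm-100-disk-0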
 

JamesT

New Member

Hello,
I'm observing a very similar, if not the same, issue on our setup. Things appear to work fine, but when trying to migrate a VM between two hosts, or even migrate a disk from one storage to another, everything comes to a complete halt: it is incredibly slow, VMs become unresponsive, and CPU usage goes up.
This happens even when migrating from NVMe storage on one host to NVMe storage on the other.
Did you ever find a solution?

P.S.
I was unable to find out how to update the polling setting mentioned above.
echo 1 > /sys/block/nvme0n1/queue/io_poll didn't work; it gave the error "write error: Invalid argument". The same happened when using a text editor like nano.
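From what I can tell (rough sketch, not verified on this exact setup), writing to io_poll fails with Invalid argument when the nvme driver was loaded without poll queues, which can be checked with:

Code:
cat /sys/module/nvme/parameters/poll_queues   # must be greater than 0 for polling
cat /sys/block/nvme0n1/queue/io_poll          # 1 = polling enabled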
 
