High Average Load (ZFS)

hvbert

Hello

My load average is constantly above 3.5, although I have plenty of free CPU and RAM. I can't find the reason, but it seems to be related to KVM.
Hardware:
Dell 910
CPU: Intel Xeon X7560, 64 cores
RAM: 256 GB ECC
H200 flashed to IT mode:
techmattr.wordpress.com/2016/04/11/updated-sas-hba-crossflashing-or-flashing-to-it-mode-dell-perc-h200-and-h310/
SSD: Corsair Force LS 120GB (sda for Proxmox, sda4 for ZFS log, sda5 for ZFS cache), split according to this guide (see the sketch after this list):
forum.level1techs.com/t/proxmox-zfs-with-ssd-caching-setup-guide/97663
3 x HGST 3TB in RAIDZ1 /zfs-prod--pool/ (sdc, sdd, sdf)
4 x HP 300GB in RAIDZ2 /zfs-test-pool/ (sdb, sde, sdg, sdi)
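For context, a minimal sketch of how log and cache partitions are typically attached to an existing pool (device names are taken from the layout above; the linked guide may use slightly different steps):

# Sketch only: attach the SSD partitions as SLOG and L2ARC to the existing pool
zpool add zfs-prod--pool log /dev/sda4     # separate intent log (SLOG)
zpool add zfs-prod--pool cache /dev/sda5   # L2ARC read cache
zpool status zfs-prod--pool                # verify the new vdevs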
root@proxmox:~# zpool status
  pool: zfs-prod--pool
 state: ONLINE
  scan: scrub repaired 0B in 1h47m with 0 errors on Sun Oct 14 02:11:45 2018
config:

        NAME            STATE     READ WRITE CKSUM
        zfs-prod--pool  ONLINE       0     0     0
          raidz1-0      ONLINE       0     0     0
            sdc         ONLINE       0     0     0
            sdd         ONLINE       0     0     0
            sdf         ONLINE       0     0     0
        logs
          sda4          ONLINE       0     0     0
        cache
          sda5          ONLINE       0     0     0

errors: No known data errors

  pool: zfs-test-pool
 state: ONLINE
  scan: scrub repaired 0B in 0h0m with 0 errors on Sun Oct 14 00:24:04 2018
config:

        NAME           STATE     READ WRITE CKSUM
        zfs-test-pool  ONLINE       0     0     0
          raidz2-0     ONLINE       0     0     0
            sdb        ONLINE       0     0     0
            sde        ONLINE       0     0     0
            sdg        ONLINE       0     0     0
            sdi        ONLINE       0     0     0

errors: No known data errors

root@proxmox:~# pveperf /zfs-prod--pool/
CPU BOGOMIPS:      289400.48
REGEX/SECOND:      1170710
HD SIZE:           4763.84 GB (zfs-prod--pool)
FSYNCS/SECOND:     441.79
DNS EXT:           37.74 ms
DNS INT:           24.50 ms (garets.pl)



root@proxmox:~# lsblk
NAME                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                    8:0    0 111.8G  0 disk
├─sda1                 8:1    0     1M  0 part
├─sda2                 8:2    0   256M  0 part
├─sda3                 8:3    0  71.6G  0 part
│ ├─pve-swap         253:0    0    16G  0 lvm  [SWAP]
│ ├─pve-root         253:1    0  27.8G  0 lvm  /
│ ├─pve-data_tmeta   253:2    0     1G  0 lvm
│ │ └─pve-data       253:4    0    25G  0 lvm
│ └─pve-data_tdata   253:3    0    25G  0 lvm
│   └─pve-data       253:4    0    25G  0 lvm
├─sda4                 8:4    0     8G  0 part
└─sda5                 8:5    0    32G  0 part
sdb                    8:16   0 279.4G  0 disk
├─sdb1                 8:17   0 279.4G  0 part
└─sdb9                 8:25   0     8M  0 part
sdc                    8:32   0   2.7T  0 disk
├─sdc1                 8:33   0   2.7T  0 part
└─sdc9                 8:41   0     8M  0 part
sdd                    8:48   0   2.7T  0 disk
├─sdd1                 8:49   0   2.7T  0 part
└─sdd9                 8:57   0     8M  0 part
sde                    8:64   0 279.4G  0 disk
├─sde1                 8:65   0 279.4G  0 part
└─sde9                 8:73   0     8M  0 part
sdf                    8:80   0   2.7T  0 disk
├─sdf1                 8:81   0   2.7T  0 part
└─sdf9                 8:89   0     8M  0 part
sdg                    8:96   0 279.4G  0 disk
├─sdg1                 8:97   0 279.4G  0 part
└─sdg9                 8:105  0     8M  0 part
sdh                    8:112  0   1.8T  0 disk
└─sdh1                 8:113  0   1.8T  0 part /mnt/sdh1
sdi                    8:128  0 279.4G  0 disk
├─sdi1                 8:129  0 279.4G  0 part
└─sdi9                 8:137  0     8M  0 part
sr0                   11:0    1  1024M  0 rom
zd0                  230:0    0   100G  0 disk
└─zd0p1              230:1    0   100G  0 part
zd16                 230:16   0   100G  0 disk
├─zd16p1             230:17   0   500M  0 part
├─zd16p2             230:18   0  99.1G  0 part
└─zd16p3             230:19   0   467M  0 part
zd32                 230:32   0   100G  0 disk
├─zd32p1             230:33   0   500M  0 part
├─zd32p2             230:34   0     2G  0 part
└─zd32p3             230:35   0  97.5G  0 part
zd48                 230:48   0   200G  0 disk
├─zd48p1             230:49   0   100M  0 part
└─zd48p2             230:50   0 199.9G  0 part

root@proxmox:~# pveversion --verbose
proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-10
pve-kernel-4.15.18-7-pve: 4.15.18-26
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-28
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-36
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1



root@proxmox:~# iostat
Linux 4.15.18-7-pve (proxmox)   10/16/2018   _x86_64_   (64 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.12    0.00    1.82    0.17    0.00   95.89

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              10.82        15.26       484.29    8750203  277782375
sdb               0.00         0.01         0.02       8032       9228
sde               0.00         0.01         0.02       7108       9220
sdg               0.00         0.01         0.02       7048       9224
sdi               0.00         0.01         0.02       7100       9288
sdc              60.12       718.50       737.17  412118732  422828368
sdd              60.48       718.00       737.15  411833736  422817264
sdf              60.34       718.48       737.12  412111852  422798424
sdh               0.18         2.26       154.07    1297024   88372696
dm-0              0.00         0.01         0.00       3240          4
dm-1              1.87         1.24        14.37     708473    8245288
dm-2              0.00         0.00         0.00        124          4
zd0              25.10        68.90        37.91   39517607   21743308
zd16             14.81       204.31        43.97  117187128   25220016
zd32              1.82         2.13        12.49    1221390    7162116
zd48             32.06       444.79        62.04  255127880   35586385

root@proxmox:~# zpool iostat -v
                   capacity     operations     bandwidth
pool             alloc   free   read  write   read  write
--------------   -----  -----  -----  -----  -----  -----
zfs-prod--pool    923G  7.22T    106     75  2.10M  2.30M
  raidz1          923G  7.22T    106     73  2.10M  2.16M
    sdc              -      -     35     24   718K   737K
    sdd              -      -     35     24   718K   737K
    sdf              -      -     35     24   718K   737K
logs                 -      -      -      -      -      -
  sda4           6.96M  7.93G      0      2      3   144K
cache                -      -      -      -      -      -
  sda5           26.6G  5.44G      1      4  14.1K   326K
--------------   -----  -----  -----  -----  -----  -----
zfs-test-pool    1.34M  1.09T      0      0     15     65
  raidz2         1.34M  1.09T      0      0     15     65
    sdb              -      -      0      0      4     16
    sde              -      -      0      0      3     16
    sdg              -      -      0      0      3     16
    sdi              -      -      0      0      3     16
--------------   -----  -----  -----  -----  -----  -----
"top" command shows that 201% CPU consume process: 2091 which is releated with VM 201 according /proc/2091/cgroup
VM 201 it is Windows Server 2008 VirtO and via web interface it show only 16% CPU
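For reference, a rough sketch of how that PID-to-VM mapping can be checked (PID 2091 and VMID 201 are taken from the output above; the exact cgroup path format can differ between PVE versions):

# Sketch: map a kvm process to its Proxmox VMID via its cgroup
cat /proc/2091/cgroup   # the cgroup paths typically contain the VMID, e.g. .../qemu.slice/201.scope
qm config 201           # then show the configuration of that VM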

I have 3 VMs: Windows Server 2008, Windows 7, and Windows 10. When I shut down all 3 VMs, the load average drops to 0.5, so the problem is related only to the VMs, not the CTs. Can somebody help me fix it?

root@proxmox:~# top
top - 10:16:49 up 6 days, 15:34, 1 user, load average: 3.65, 3.68, 3.57
Tasks: 1153 total, 1 running, 826 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.2 us, 1.9 sy, 0.0 ni, 94.6 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 26412582+total, 20868484+free, 53617396 used, 1823584 buff/cache
KiB Swap: 16777212 total, 16776956 free, 256 used. 20891097+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2091 root 20 0 17.008g 0.015t 12684 S 201.0 6.2 31:25.13 kvm
51524 root 20 0 9384668 8.032g 12452 S 69.4 3.2 17:07.58 kvm
62760 48 20 0 51016 25856 6780 S 32.5 0.0 1:01.35 httpd
7569 48 20 0 55108 29620 6772 S 24.2 0.0 0:29.13 httpd
52166 48 20 0 51288 25992 6604 S 23.6 0.0 1:37.43 httpd
18020 48 20 0 50764 25352 6572 S 15.9 0.0 0:06.35 httpd
55023 48 20 0 55704 30644 6828 S 15.6 0.0 1:23.04 httpd
27770 root 20 0 9446472 7.869g 12628 S 14.6 3.1 57:08.45 kvm
35468 27 20 0 136488 34604 7156 S 14.3 0.0 8:02.05 mysqld
47282 root 20 0 3141612 1.087g 12420 S 6.1 0.4 23:15.40 kvm
19878 999 20 0 11.563g 1.563g 21388 S 4.8 0.6 154:08.32 java
20181 root 20 0 46040 4648 3012 R 4.5 0.0 0:00.76 top
24916 root 20 0 551140 116676 11844 S 1.0 0.0 0:16.14 pvedaemon worke
35454 www-data 20 0 565908 124424 13272 S 1.0 0.0 0:11.06 pveproxy worker
2 root 20 0 0 0 0 S 0.3 0.0 49:58.94 kthreadd
9 root 20 0 0 0 0 I 0.3 0.0 13:13.17 rcu_sched
539 root 20 0 0 0 0 S 0.3 0.0 1:57.68 scsi_eh_0
900 root 0 -20 0 0 0 S 0.3 0.0 20:27.19 spl_dynamic_tas
1207 root 20 0 0 0 0 S 0.3 0.0 41:15.13 l2arc_feed
1676 root 0 -20 0 0 0 S 0.3 0.0 3:38.63 z_null_iss
1691 root 1 -19 0 0 0 S 0.3 0.0 4:16.88 z_wr_iss
1707 root 1 -19 0 0 0 S 0.3 0.0 4:16.46 z_wr_iss
1711 root 1 -19 0 0 0 S 0.3 0.0 4:17.16 z_wr_iss
1720 root 1 -19 0 0 0 S 0.3 0.0 4:16.29 z_wr_iss
1722 root 1 -19 0 0 0 S 0.3 0.0 4:16.63 z_wr_iss
1725 root 1 -19 0 0 0 S 0.3 0.0 4:17.00 z_wr_iss
1727 root 1 -19 0 0 0 S 0.3 0.0 4:16.65 z_wr_iss
1737 root 0 -20 0 0 0 S 0.3 0.0 9:13.10 z_wr_int_1
1742 root 0 -20 0 0 0 S 0.3 0.0 9:13.80 z_wr_int_6
1743 root 0 -20 0 0 0 S 0.3 0.0 9:12.94 z_wr_int_7
1776 root 39 19 0 0 0 S 0.3 0.0 1:03.97 dp_sync_taskq
2026 root 20 0 0 0 0 S 0.3 0.0 36:58.46 txg_sync
3950 root 20 0 686356 61692 48132 S 0.3 0.0 11:44.12 pmxcfs
18824 999 20 0 1817720 1.014g 12204 S 0.3 0.4 28:29.37 mysqld
20806 999 20 0 127580 7372 3720 S 0.3 0.0 6:12.55 amavis-services
28227 110 20 0 648084 72156 9028 S 0.3 0.0 6:17.28 mysqld
39153 www-data 20 0 558336 118344 12500 S 0.3 0.0 0:02.37 pveproxy worker
61660 root 20 0 0 0 0 I 0.3 0.0 0:00.20 kworker/u130:2
63200 root 20 0 0 0 0 I 0.3 0.0 0:00.20 kworker/36:2
1 root 20 0 57728 7520 5328 S 0.0 0.0 1:54.16 systemd
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
7 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
 
16% of 16 cores is roughly 2 cores; 2 cores * 100% shows up as ~200% in top, so this is OK and consistent.

A load of 4 on a system with 64 threads is fine, and if your virtual cores are 100% loaded, this is expected.
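A quick way to put the load number into context (just a sketch; both commands are standard on a PVE host):

# Compare the 1/5/15-minute load averages against the number of CPU threads
nproc               # logical CPUs available to the scheduler (64 on this box)
cat /proc/loadavg   # load only indicates saturation when it approaches the nproc value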
 
1. OK, thank you for the quick reply. I'm going to add more VMs and CTs. So will overall CPU / RAM / IO delay be more important indicators than load average?
2. FSYNCS/SECOND: 441.79. Could this be related to the quality of the SSD? If I replace the SSD with an enterprise-grade (Intel) SSD, will FSYNCS/SECOND improve? Or, with my hardware configuration, would it be a better idea to use RAM (I have a lot of it free) to increase ZFS performance?
 
1. OK, thank you for the quick reply. I'm going to add more VMs and CTs. So will overall CPU / RAM / IO delay be more important indicators than load average?
Load can be as high as your core/thread count without causing overload; a load value of 1 means that one core/thread is fully loaded.

2. FSYNCS/SECOND: 441.79. Could this be related to the quality of the SSD? If I replace the SSD with an enterprise-grade (Intel) SSD, will FSYNCS/SECOND improve? Or, with my hardware configuration, would it be a better idea to use RAM (I have a lot of it free) to increase ZFS performance?
Of course enterprise SSDs have better performance, but do you have actual performance problems?
Since 2 of your cores are 100% utilized (according to your message), my guess is that the application is not well tuned for multiple cores and needs high single-thread performance.
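If you want to check whether the consumer SSD limits sync-write (SLOG) performance before buying new hardware, something along these lines can give a rough number (the fio parameters here are only an illustrative sketch, not a tuned benchmark):

# Rough sync-write test on the pool; every write is followed by an fsync,
# which is roughly what pveperf's FSYNCS/SECOND exercises
fio --name=fsync-test --directory=/zfs-prod--pool \
    --rw=write --bs=4k --size=256M --ioengine=psync --fsync=1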
 
