drop_caches doesn't finish during pveperf

udo

Hi,
On one server I see the effect that drop_caches doesn't finish during a pveperf run.
Code:
ps aux | grep echo
root      835293 99.9  0.0   4292   708 pts/3    R+   Jul20 1140:50 sh -c echo 3 > /proc/sys/vm/drop_caches
A VM with I/O is running there, but the load is quite low, and the underlying storage is a ZFS striped mirror of 4 Samsung enterprise disks.
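To see where the writer hangs in the kernel, one can dump the stack of the stuck process (the PID is the one from the ps output above) and the list of blocked tasks; this is just a generic sketch using standard Linux interfaces:
Code:
# kernel stack of the hung "echo 3 > /proc/sys/vm/drop_caches" (PID from ps above)
cat /proc/835293/stack
# dump all blocked (D-state) tasks into the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -50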

There is no I/O wait:
Code:
top
top - 07:59:21 up 10 days, 22:03,  6 users,  load average: 4.29, 4.50, 4.93
Tasks: 694 total,   3 running, 469 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.5 us,  4.3 sy,  0.0 ni, 88.0 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 16495880+total, 53446384 free, 10989219+used,  1620224 buff/cache
KiB Swap:  8388604 total,  4656968 free,  3731636 used. 53238628 avail Mem
...
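To double-check that no task sits in uninterruptible sleep despite the 0.0 wa, a quick way is to list D-state processes (they also count towards the load average):
Code:
# list tasks in uninterruptible sleep (D state) and what they are waiting on
ps -eo state,pid,wchan:32,comm | awk '$1 ~ /^D/'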
I use fairly small values for zfs_arc:
Code:
cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_min=4294967296
options zfs zfs_arc_max=8589934592
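To verify that these limits are actually active (with ZFS on root, changes in /etc/modprobe.d normally only take effect after an update-initramfs -u and a reboot, or when written to the module parameters at runtime), the live ARC settings and size can be read back:
Code:
# currently active ARC limits (bytes) and the live ARC size
cat /sys/module/zfs/parameters/zfs_arc_min /sys/module/zfs/parameters/zfs_arc_max
awk '$1=="size" || $1=="c_min" || $1=="c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats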
Versions:
Code:
pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.17-3-pve)
pve-manager: 5.2-5 (running version: 5.2-5/eb24855a)
pve-kernel-4.15: 5.2-4
pve-kernel-4.15.18-1-pve: 4.15.18-15
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-35
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-1
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-28
pve-container: 2.0-24
pve-docs: 5.2-4
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-29
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
Any hint?

Udo
 
Info about IO:
Code:
zpool iostat 1
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool        473G   415G     50    229   325K  7.33M
rpool        473G   415G    113  1.09K   779K  20.7M
rpool        473G   415G     40    823   164K  15.7M
rpool        473G   415G     89  1.74K   364K  55.2M
rpool        473G   415G     23    889  95.9K  12.4M
rpool        473G   415G     28    980   128K  13.7M
rpool        473G   415G     23    962  95.9K  13.5M
rpool        473G   415G     36    919   148K  12.7M
rpool        473G   415G     35  1.65K   184K  45.3M
rpool        473G   415G     31    871   176K  11.9M
rpool        473G   415G     28    939   120K  12.9M
rpool        473G   415G     28    921   120K  13.7M
rpool        473G   415G     24    933   104K  12.9M
rpool        473G   415G     56  1.69K   280K  47.5M
rpool        473G   415G    527    955  2.07M  20.4M
rpool        473G   415G     35    949   148K  12.8M
rpool        473G   415G     26    849   108K  11.9M
rpool        473G   415G     30    873   132K  12.1M
rpool        473G   415G     28  1.63K   140K  43.2M
rpool        473G   415G     42  1.09K   172K  21.6M
rpool        473G   415G     26    915   116K  15.9M
rpool        473G   415G     39  1.02K   160K  13.6M
rpool        473G   415G     35    859   240K  11.6M
rpool        473G   415G     33  1.69K   148K  48.1M
rpool        473G   415G    198    871  4.67M  12.3M
rpool        473G   415G     31    803   188K  11.1M
rpool        473G   415G     35    943   196K  13.0M
rpool        473G   415G     25  1.03K   108K  14.0M
rpool        473G   415G     28    809   124K  10.9M
^C
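A per-vdev view would show whether a single disk of the striped mirror is the slow one; roughly like this (the -l latency columns should be available with the zfsutils 0.7.x shipped here, and iostat comes from the sysstat package):
Code:
# per-disk breakdown of the same numbers, with latency columns
zpool iostat -v -l rpool 1
# classic per-device view incl. await/%util
iostat -x 1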
Udo
 
Hi,
after migrating the most I/O-active VMs away, drop_caches finished and pveperf works again (but it still takes a (too) long time):
Code:
pveperf
CPU BOGOMIPS:      192050.40
REGEX/SECOND:      2483783
HD SIZE:           671.28 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2403.32
DNS EXT:           14.87 ms
DNS INT:           0.86 ms
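As far as I understand, the FSYNCS/SECOND value is basically a write()+fsync() loop on the root filesystem, so a rough cross-check with synchronous writes is possible; the path below is just an example, and dd's dsync writes are only an approximation of pveperf's fsync loop:
Code:
# ~2000 synchronous 4k writes on rpool/ROOT/pve-1; compare the rate with FSYNCS/SECOND
dd if=/dev/zero of=/root/fsync-test.bin bs=4k count=2000 oflag=dsync
rm /root/fsync-test.bin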
But IMHO this should not be a problem; something must be wrong with ZoL.

#edit Perhaps it is kernel related, because it doesn't happen on the cluster member where the VMs are now running, and a node with the same kernel and no load (but HDDs, not SSDs) also takes a long time:
Code:
root@pve07:~# time pveperf
CPU BOGOMIPS:      120011.40
REGEX/SECOND:      1625253
HD SIZE:           1303.44 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     162.51
DNS EXT:           15.25 ms
DNS INT:           0.99 ms (lv3.metaways.net)

real    1m50.659s
user    0m3.180s
sys     1m42.751s

Linux pve07 4.15.17-3-pve #1 SMP PVE 4.15.17-13 (Mon, 18 Jun 2018 17:15:04 +0200) x86_64 GNU/Linux
top - 04:17:33 up 22 days, 12:02,  2 users,  load average: 0.96, 1.31, 1.31
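With nearly the whole runtime spent in sys, a profile of the next run would show which kernel functions burn the time; a sketch, assuming a perf binary matching the running kernel is installed (on Debian/PVE it comes from the linux-perf package):
Code:
# profile a complete pveperf run and show the hottest kernel symbols
perf record -g -- pveperf
perf report --stdio --sort symbol | head -40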

Udo
 
