Hi folks,
I have noticed that three Windows Server VMs (2008 R2 and 2016), each with a large 4 TB second drive, show very poor performance when handling small files on that big drive. The clients see the issue as well, but I want to leave them out for now because that keeps troubleshooting simpler.
drive c: 140 GB, NTFS, MBR, 1 partition, VirtIO, driver 0.141, 50% free
drive f: 4 TB, NTFS, MBR, 1 partition, VirtIO, driver 0.141, 30% free
from f: to f:
- copying 500 MB in 2,500 files to a new directory: 800 KB/s to 5 MB/s, sometimes stalling at 0 MB/s
- applying ACLs to files is extremely slow
- copying one 500 MB file: 20-200 MB/s
- deleting 1 GB in 5,000 files on f: takes ~5 minutes; Windows reports 10-20 items per second!
from f: to c:
- copying 500 MB in 2,500 files to a new directory: 17-50 MB/s
- 2nd run of the same copy: 1-5 MB/s
- copying one 500 MB file: ~1 second (presumably cached)
from c: to f:
- copying 1 GB in 5,000 files: 3-5 MB/s, sometimes 0 KB/s
- copying one 500 MB file: ~1 second (presumably cached)
- deleting 1 GB in 5,000 files on c: takes under 1 minute; Windows reports 250-1,300 items per second!
(A small script to reproduce these small-file timings follows below.)
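If anyone wants to repeat the file-based tests without Explorer's own overhead, here is a rough Python sketch that can be run inside the guest; the F:\ paths, file count and file size are placeholders I chose to roughly match the 500 MB / 2,500 files case, not exactly what Explorer copied:

[CODE]
import os
import shutil
import time

# Placeholder paths -- adjust to wherever the test data should live.
SRC = r"F:\smalltest_src"
DST = r"F:\smalltest_dst"

def make_small_files(path, count=2500, size=200 * 1024):
    """Create `count` files of `size` bytes each as test payload (~500 MB total)."""
    os.makedirs(path, exist_ok=True)
    data = os.urandom(size)
    for i in range(count):
        with open(os.path.join(path, f"file_{i:05d}.bin"), "wb") as f:
            f.write(data)

def timed_copy(src, dst):
    """Copy the whole tree and report MB/s and files/s."""
    start = time.time()
    shutil.copytree(src, dst)
    elapsed = time.time() - start
    sizes = [os.path.getsize(os.path.join(dp, f))
             for dp, _, files in os.walk(dst) for f in files]
    print(f"copy {src} -> {dst}: "
          f"{sum(sizes) / 1e6 / elapsed:.1f} MB/s, {len(sizes) / elapsed:.0f} files/s")

def timed_delete(path):
    """Delete the tree and report files/s (comparable to Explorer's 'items per second')."""
    count = sum(len(files) for _, _, files in os.walk(path))
    start = time.time()
    shutil.rmtree(path)
    print(f"delete {path}: {count / (time.time() - start):.0f} files/s")

if __name__ == "__main__":
    make_small_files(SRC)
    timed_copy(SRC, DST)   # f: -> f: case; point DST at a C:\ path for the cross-drive case
    timed_delete(DST)
    timed_delete(SRC)
[/CODE]

Pointing SRC and DST at the different drives should reproduce the f:-to-f:, f:-to-c: and c:-to-f: numbers above in a comparable way.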
No virus scanner installed.
On the Ceph backend I have seen continuous rebuilds running at 500-700 MB/s, with peaks of 1,300 MB/s.
A quick check with iperf gives me about 9,300 Mbit/s.
None of these results matches the very poor performance inside the virtual machines.
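To separate per-operation latency from raw throughput, I could also run a small synchronous-write test on both drives inside the guest. This is only a sketch (the directory names are placeholders): it writes 4 KB files one by one and flushes each to disk, so it mostly measures per-file latency, which is where small-file workloads usually hurt:

[CODE]
import os
import time

def small_write_latency(path, count=500, size=4096):
    """Write `count` small files one by one, fsync each, and report avg latency per file."""
    os.makedirs(path, exist_ok=True)
    data = os.urandom(size)
    start = time.time()
    for i in range(count):
        with open(os.path.join(path, f"lat_{i:04d}.bin"), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the write through to the virtual disk
    elapsed = time.time() - start
    print(f"{path}: {elapsed / count * 1000:.1f} ms per {size} B file "
          f"({count / elapsed:.0f} files/s)")

if __name__ == "__main__":
    # Placeholder directories, one on each drive.
    small_write_latency(r"C:\lat_test")
    small_write_latency(r"F:\lat_test")
[/CODE]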
I am running the latest versions of PVE and Ceph on 5 nodes, each a well-equipped system:
Supermicro X10DRI-T, 2x Intel E5-2667 v4, 515 GB DDR4 ECC, Areca 1883IX-12 controller with 8 GB cache and BBU,
2x Intel 240 GB SSDs for the OS, 8x HGST 10 TB SAS drives for data, 2x dual-port 10 GbE BCM57840 NICs and 1x dual-port Intel X540-AT2
Network layout:
- 1x 1 Gbit copper for management
- 1 bond (802.3ad) of 2x 10 GbE for VM traffic
- 1 bond (802.3ad) of 4x 10 GbE for Ceph (dedicated HP switch)
Versions:
proxmox-ve: 5.1-42 (running kernel: 4.13.16-1-pve)
pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
pve-kernel-4.13: 5.1-43
pve-kernel-4.13.16-1-pve: 4.13.16-43
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.4-1-pve: 4.13.4-26
ceph: 12.2.4-pve1
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-common-perl: 5.0-28
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-3
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-11
pve-cluster: 5.0-20
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.9.1-9
pve-xtermjs: 1.0-2
qemu-server: 5.0-22
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.6-pve1~bpo9
Any ideas where to look?