High IO, slow performance

Manny Vazquez

Hi,
I have a new cluster, set up according to what I understood were best practices: each node has 2 x 6-core CPUs, 4 x 1 TB SAS disks in RAIDZ2, 128 GB RAM, a separate bonded network for corosync, and so on, everything I have learned and put together over the last year.

[Screenshot: Screen Shot 2019-02-05 at 4.01.04 PM.png]

But this new cluster is VERY SLOW, much slower than the one I am trying to replace. I noticed a high IO readout:

[Screenshot: Screen Shot 2019-02-05 at 4.00.52 PM.png]

Restoring a VM is painfully slow, and the web GUI seems to lose connection frequently.

Where or how should I start troubleshooting this slowness?

root@pve2:~# pveperf
CPU BOGOMIPS: 134060.88
REGEX/SECOND: 1654004
HD SIZE: 1624.56 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 1.49
DNS EXT: 56.93 ms
DNS INT: 16.04 ms (medstar.local)
root@pve2:~#

root@pve2:~# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            sda3      ONLINE       0     0     0
            sdb3      ONLINE       0     0     0
            sdd3      ONLINE       0     0     0
            sdc3      ONLINE       0     0     0

errors: No known data errors


root@pve2:~# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool  3.62T   345G  3.29T         -     0%     9%  1.00x  ONLINE  -


root@pve2:~# df -h
Filesystem                    Size  Used Avail Use% Mounted on
udev                           63G     0   63G   0% /dev
tmpfs                          13G   26M   13G   1% /run
rpool/ROOT/pve-1              1.6T   59G  1.6T   4% /
tmpfs                          63G   69M   63G   1% /dev/shm
tmpfs                         5.0M     0  5.0M   0% /run/lock
tmpfs                          63G     0   63G   0% /sys/fs/cgroup
rpool                         1.6T  128K  1.6T   1% /rpool
rpool/ROOT                    1.6T  128K  1.6T   1% /rpool/ROOT
rpool/data                    1.6T  128K  1.6T   1% /rpool/data
rpool/data/subvol-101-disk-0   20G  476M   20G   3% /rpool/data/subvol-101-disk-0
/dev/fuse                      30M   28K   30M   1% /etc/pve
//172.21.82.100/WinShare       95G   14G   82G  14% /mnt/pve/FreeNas1
tmpfs                          13G     0   13G   0% /run/user/0

Please help.
 
Your bottleneck is ZFS, namely IOPS and fsyncs, as shown by pveperf.
You could change from RAIDZ2 to RAID10 to increase IOPS.
You could add a ZIL/SLOG to speed up synchronous writes, and possibly an L2ARC (be careful: the L2ARC index also consumes RAM).
But I guess performance will really only be good enough if you switch to two or more enterprise-grade SSDs.

Or you could switch to LVM or to unsupported mdadm RAID. Or you could buy a cheap HW RAID controller and set up LVM on it.
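For illustration only, adding a SLOG (and optionally an L2ARC) to an existing pool would look roughly like this; sdX, sdY and sdZ are placeholders for your SSD devices, and in practice you would use /dev/disk/by-id paths:

# add a mirrored SLOG for synchronous writes (small, power-loss-protected SSDs)
zpool add rpool log mirror /dev/sdX /dev/sdY
# optionally add an L2ARC read cache (it holds no unique data, so a single device is fine)
zpool add rpool cache /dev/sdZ
# check the new vdev layout
zpool status rpool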
 
Your bottleneck is ZFS, namely IOPS and fsyncs, as shown by pveperf.

I am not sure what that number tells me; where can I read about it?

You could change from RAIDZ2 to RAID10 to increase IOPS.
That would mean losing what I thought was the advantage of RAIDZ2: the best of both worlds, passing the drives directly to Proxmox and letting ZFS deal with them instead of using a hardware RAID, which I was told many times is not good since ZFS does not SEE the drives.

You could add a ZIL/SLOG to speed up synchronous writes, and possibly an L2ARC (be careful: the L2ARC index also consumes RAM).
No idea about this, I will research it.

But I guess performance will really only be good enough if you switch to two or more enterprise-grade SSDs.
I have 2 HD slots free on each server; can I just add the SSDs (live) to the server and mount them? I have looked for info on this but I cannot find anything. I could bring the nodes down, but the problem still persists: how do I mount the extra 2 drives? (Ideally as mirrors, but WHERE, in the GUI or the terminal?)

Or you could switch to LVM or to unsupported mdadm RAID. Or you could buy a cheap HW RAID controller and set up LVM on it.
I actually changed the RAID controller that was originally in the servers for a "cheaper" one (from an H700 to an H200) so I could skip the HW RAID and pass the drives to Proxmox, since I was told multiple times that was the BEST option, and to make it RAIDZ2.
I wish there was a definitive answer.
 
I had a small cluster with Proxmox 3 in production for several years. It was configured with LVM. I upgraded to Proxmox 5 to use the real-time replication features, for which I had to install ZFS. Seriously, the performance of my nodes has declined a lot, mainly because of what was mentioned previously: the bottleneck is ZFS.

I have 2 HD slots free on each server; can I just add the SSDs (live) to the server and mount them? I have looked for info on this but I cannot find anything. I could bring the nodes down, but the problem still persists: how do I mount the extra 2 drives? (Ideally as mirrors, but WHERE, in the GUI or the terminal?)

It will depend on whether your server supports hot-swapping of drives. Can you shut it down to add them?
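If the controller and backplane do support hot-plug, a rescan sketch so the new disks show up without a reboot (hostN under /sys/class/scsi_host is whatever number the HBA registered as):

# ask every SCSI host to rescan its bus for new devices
for h in /sys/class/scsi_host/host*/scan; do echo "- - -" > "$h"; done
# the new SSDs should now appear here
lsblk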
 
It will depend on whether your server supports hot-swapping of drives. Can you shut it down to add them?

Yes, I can bring the servers down one at a time without impact, but I still do not see HOW to add the other drives as a different pool, and maybe use a different type of ZFS RAID (like a mirror) to make it faster.
 
Creating a new zpool is really easy, and adding it via the GUI is even easier.

I am busy at the moment, but if/when I get the time I will address all your questions, including creating another mirror.

But even with two disks in a ZFS mirror (and a possible external ZIL or L2ARC cache), things will be too slow for you, based on my experience, unless the mirror is two enterprise-grade SSDs.

Right now I am about to test how many HDDs are needed to get "acceptable" speed with ZFS for my use cases. I just built the hardware (picture attached :) and am about to configure the switch and install the test cluster if no customer interrupts me >:-/ . I will test multiple configurations with up to 12 disks using different types of ZFS RAID, with and without external caching, different ashift values as well as zvol block sizes, including replication over 10 GbE... lots of work, but it should be fun. :)
 

[Attachment: IMG_20190207_133348.jpg]
I just remembered: the latest PVE GUI also allows creating ZFS pools, so there is no need to do it via the command line.
[Screenshot: upload_2019-2-7_13-44-10.png]
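For anyone who still prefers the command line, a rough sketch; sdX/sdY and the names "ssdpool" / "ssd-mirror" are placeholders, and /dev/disk/by-id paths are preferable in practice:

# create a mirrored pool on the two new SSDs (ashift=12 assumes 4K-sector drives)
zpool create -o ashift=12 ssdpool mirror /dev/sdX /dev/sdY
# register it as a Proxmox storage so it shows up in the GUI
pvesm add zfspool ssd-mirror --pool ssdpool --content images,rootdir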
 
Hi to all,

Yes, ZFS can be a bottleneck if you do not care about the details. But if you take care about what load you put on it, it is almost as fast as non-ZFS setups like LVM. In many cases, if you have the proper devices (RAM is the most important), it is even faster.

Now to the subject:
- your pveperf results are very bad, especially the fsync value (it is so bad that I am sure the problem is outside the Proxmox/ZFS setup, maybe a HW RAID controller); see the sketch at the end of this post
- even so, as you can see, the problem seems to be your NAS read speed, but you can test your NAS + network speed:

dd if=/dev/urandom of={some-nas-share}/sample.txt bs=1G count=1

- you can also test your ZFS dataset read speed (copy an ISO file):
cp /zfs-pool-name/a-dataset/some-iso /dev/null
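Given how low the fsync number from pveperf is, a sync-write test directly on the pool might also be informative. This is only a sketch, assuming fio is installed (apt install fio) and using the /rpool/data mountpoint from the df output above:

# 4k random writes with an fsync after every write, capped at 30 seconds
fio --name=synctest --directory=/rpool/data --rw=randwrite --bs=4k \
    --size=256M --fsync=1 --runtime=30 --time_based --group_reporting
# the reported write IOPS should be in the same ballpark as pveperf's FSYNCS/SECOND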
 
Hi,

guletz, ZFS will always be twice as slow as "normal" filesystems by design, as it needs twice as many IOs for writes.

Anyway, I just finished the tests.

I tested on 10 x enterprise 7200 RPM HDDs, with and without a 2 x S3500 8 GB SLOG and a 12.5 GB cache (which was actually pretty empty), and with ARC max around 11 GB. I tested with ashift 12 and 13, as well as zvol block sizes of 8k and 16k.

Performance is really poor compared to my MDADM RAID with 4 HDDs and is really not suitable for VMs which use a GUI (e.g. Windows with RDP). Also, the disk performance of a single VM must be capped if you do not want to take down the whole system by doing simple stuff with files inside the VM.
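For what it's worth, Proxmox can cap bandwidth and IOPS per virtual disk; a sketch with made-up limits, a hypothetical VM 100 and a hypothetical volume name (the existing volume reference has to be repeated, and the same settings are reachable in the GUI under the disk's advanced options):

# limit scsi0 of VM 100 to 150 MB/s and 500 IOPS in each direction
qm set 100 --scsi0 local-zfs:vm-100-disk-0,mbps_rd=150,mbps_wr=150,iops_rd=500,iops_wr=500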

In my experience, one needs SSDs for ZFS to be performant enough, where previously one could use HDDs with MDADM.

I will post the main results in another thread about ZFS performance right now.
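In case someone wants to check the same parameters on their own setup, a quick sketch (the zvol name is just an example):

# ashift actually used by the pool's vdevs
zdb -C rpool | grep ashift
# block size of a particular VM zvol
zfs get volblocksize rpool/data/vm-100-disk-0
# the default block size for new disks is set per ZFS storage ("Block Size" in Datacenter -> Storage)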
 
Performance is really poor compared to my MDADM RAID with 4 HDDs and is really not suitable for VMs which use a GUI (e.g. Windows with RDP). Also, the disk performance of a single VM must be capped if you do not want to take down the whole system by doing simple stuff with files inside the VM.

I also use a Windows 2012 R2 machine over RDP daily as a desktop (browsing, files, and a small MSSQL) and it is very usable. In this case I have a single ZFS mirror with 2 x HDDs (7200 rpm, non-enterprise, 4Kn) and no SLOG. On another node I have the same OS running as a print server (many thousands of labels per 24h). So I can only guess that I am lucky with ZFS!
 
