High IO, slow performance

Manny Vazquez

Hi,
I have a new cluster, set up to what I understood were best practices: each node has 2x 6-core CPUs, 4x 1 TB SAS disks in RAIDZ2, 128 GB RAM, a separate bonded network for corosync, etc. Everything I have learned and put together over the last year.

[Attached screenshot: Screen Shot 2019-02-05 at 4.01.04 PM.png]

But this new cluster is VERY SLOW, much slower than the one I am trying to replace. I noticed a high IO readout.

[Attached screenshot: Screen Shot 2019-02-05 at 4.00.52 PM.png]

Restoring a vm is painfully slow and the web gui seems to lose connection frequently.

Where or how would I start to troubleshoot this slowness?

root@pve2:~# pveperf
CPU BOGOMIPS: 134060.88
REGEX/SECOND: 1654004
HD SIZE: 1624.56 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 1.49
DNS EXT: 56.93 ms
DNS INT: 16.04 ms (medstar.local)
root@pve2:~#

root@pve2:~# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0
            sdd3    ONLINE       0     0     0
            sdc3    ONLINE       0     0     0

errors: No known data errors


root@pve2:~# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool  3.62T   345G  3.29T         -     0%     9%  1.00x  ONLINE  -


root@pve2:~# df -h
Filesystem                    Size  Used Avail Use% Mounted on
udev                           63G     0   63G   0% /dev
tmpfs                          13G   26M   13G   1% /run
rpool/ROOT/pve-1              1.6T   59G  1.6T   4% /
tmpfs                          63G   69M   63G   1% /dev/shm
tmpfs                         5.0M     0  5.0M   0% /run/lock
tmpfs                          63G     0   63G   0% /sys/fs/cgroup
rpool                         1.6T  128K  1.6T   1% /rpool
rpool/ROOT                    1.6T  128K  1.6T   1% /rpool/ROOT
rpool/data                    1.6T  128K  1.6T   1% /rpool/data
rpool/data/subvol-101-disk-0   20G  476M   20G   3% /rpool/data/subvol-101-disk-0
/dev/fuse                      30M   28K   30M   1% /etc/pve
//172.21.82.100/WinShare       95G   14G   82G  14% /mnt/pve/FreeNas1
tmpfs                          13G     0   13G   0% /run/user/0

Please help.
 
Your bottleneck is ZFS, namely the IOPS or FSYNCS shown by pveperf.
You could try changing from RAIDZ2 to RAID 10 (striped mirrors) to increase IOPS.
You could try adding a ZIL/SLOG device to speed up synchronous writes, and possibly an L2ARC (take care, because the L2ARC index also eats RAM); example commands are at the end of this post.
But I guess the performance will only really be good enough if you switch to two or more enterprise-grade SSDs.

Or you could switch to LVM or (unsupported) mdadm RAID. Or you could buy a cheap HW RAID controller and set up LVM on it.
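
For reference, adding a SLOG and/or L2ARC to the existing pool looks roughly like the following; the device paths are placeholders for whichever SSDs or partitions you actually use:

# mirrored SLOG for synchronous writes (small, power-loss-protected SSDs or partitions)
zpool add rpool log mirror /dev/disk/by-id/<ssd1-part1> /dev/disk/by-id/<ssd2-part1>
# optional L2ARC read cache (its index also consumes RAM)
zpool add rpool cache /dev/disk/by-id/<ssd1-part2>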
 
Your bottleneck is ZFS, namely the IOPS or FSYNCS shown by pveperf.

Not sure what that number tells me; where can I read about it?

You could try changing from RAIDZ2 to RAID 10 (striped mirrors) to increase IOPS.
That would mean losing what I thought was the advantage of RAIDZ2: the best of both worlds, passing the drives directly to Proxmox and letting ZFS deal with them, instead of using a hardware RAID, which I was told many times is not good since ZFS does not SEE the drives.

You could try adding a ZIL/SLOG device to speed up synchronous writes, and possibly an L2ARC (take care, because the L2ARC index also eats RAM).
No idea about this, I will research it.

But I guess the performance will only really be good enough if you switch to two or more enterprise-grade SSDs.
I have 2 free HD slots on each server; can I just add the SSDs (live) to the server and mount them? I have looked for info on this but I cannot find anything. I could bring the nodes down, but the problem still persists: how do I mount the extra 2 drives? (Ideally as mirrors, but WHERE, in the GUI or the terminal?)

Or you could switch to LVM or (unsupported) mdadm RAID. Or you could buy a cheap HW RAID controller and set up LVM on it.
I actually changed from the RAID controller that originally came in the servers to a "cheaper" one (from an H700 to an H200) so I could skip the HW RAID and pass the drives to Proxmox, since I was told multiple times that was the BEST way, and to make it RAIDZ2.
I wish there was a definitive answer.
 
I had a small cluster on Proxmox 3, in production for several years, configured with LVM. I upgraded to Proxmox 5 to use the real-time replication feature, for which I had to install ZFS. The performance of my nodes has seriously declined, mainly for the reason already mentioned: the bottleneck is ZFS.

I have 2 free HD slots on each server; can I just add the SSDs (live) to the server and mount them? I have looked for info on this but I cannot find anything. I could bring the nodes down, but the problem still persists: how do I mount the extra 2 drives? (Ideally as mirrors, but WHERE, in the GUI or the terminal?)
It will depend on whether your server supports hot-plugging. Can you power it off to add them?
 
It will depend on whether your server supports hot-plugging. Can you power it off to add them?

Yes, I can bring the servers down one at a time without impact, but I still do not see HOW to add the other drives as a separate pool, and maybe use a different type of ZFS RAID (like a mirror) to make it faster.
 
Creating a new zpool is really easy, and adding it via the GUI is even easier.
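
A sketch of what that could look like on each node; the device IDs and the pool/storage name "ssdpool" are placeholders:

# create a mirrored pool from the two new SSDs (ashift=12 for 4K sectors)
zpool create -o ashift=12 ssdpool mirror /dev/disk/by-id/<ssd1> /dev/disk/by-id/<ssd2>
# register it as VM/container storage in Proxmox (or do the same under Datacenter -> Storage in the GUI)
pvesm add zfspool ssdpool -pool ssdpool -content images,rootdir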

I am busy at the moment, but if/when I get the time I will address all your questions, including creating another mirror.

But even with two disks in a ZFS mirror (and a possible external ZIL or L2ARC cache), things will be too slow for you, based on my experience, unless the mirror is made of two enterprise-grade SSDs.

Right now I am about to test how many HDDs are needed to get "acceptable" speed with ZFS for my use cases. I just built the hardware (picture attached :) and am about to configure the switch and install a test cluster, if no customer interrupts me >:-/ . I will test multiple configurations with up to 12 disks using different types of ZFS RAID, with and without external caching, different ashift values as well as zvol block sizes, including replication over 10 GbE .. lots of work, but it should be fun. :)
 

Attachments

  • IMG_20190207_133348.jpg (420.1 KB)
I just remembered: the latest Proxmox VE GUI also allows creating ZFS pools, so there is no need to do it via the command line.
[Attached screenshot: upload_2019-2-7_13-44-10.png]
 
Hi to all,

Yes, ZFS can be a bottleneck if you do not pay attention to the details. But if you take care about what load you put on it, it is nearly as fast as a non-ZFS setup like LVM. In many cases, if you have the proper devices (RAM is the most important), it is even faster.

Now to the subject:
- your pveperf results are very bad, especially the fsync value (it is so bad that I am sure the problem is outside the Proxmox/ZFS setup, maybe a HW RAID controller); a fio sync-write test is sketched below, after the two quick tests
- even so, as you can see, the problem seems to be your NAS read speed, but you can test your NAS + network speed:

dd if=/dev/urandom of={some-nas-share}/sample.txt bs=1G count=1

- also, you can test your ZFS dataset read speed (copy an ISO file):
cp /zfs-pool-name/a-dataset/some-iso /dev/null
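
In addition to the two quick tests above, you can measure synchronous write performance directly with fio, which is roughly what pveperf's FSYNCS number reflects. A minimal sketch, assuming fio is installed (apt install fio) and /rpool is the pool mountpoint:

# 4k synchronous writes, fsync after every write, for 30 seconds
fio --name=synctest --directory=/rpool --size=1G --bs=4k --rw=write --ioengine=sync --fsync=1 --runtime=30 --time_based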
 
Hi,

guletz, ZFS will always be twice as slow as "normal" filesystems by design, as it needs twice as many IOs for writes.

Anyway, I just finished the tests.

I tested on 10 x enterprise 7200 RPM HDDs, with and without a 2 x S3500 8 GB SLOG and a 12.5 GB cache (which was actually pretty empty), with ARC max around 11 GB. I tested with ashift 12 and 13, as well as zvol block sizes of 8k and 16k.
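
For reference, those two knobs are set roughly like this; the pool, storage, and device names are placeholders:

# ashift is fixed at pool creation time, e.g. ashift=12 for 4K sectors
zpool create -o ashift=12 testpool raidz2 /dev/disk/by-id/<disk1> /dev/disk/by-id/<disk2> /dev/disk/by-id/<disk3> /dev/disk/by-id/<disk4>
# the zvol block size for newly created VM disks comes from the storage's blocksize setting
pvesm set <zfs-storage-id> -blocksize 16k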

Performance is really poor compared to my MDADM RAID with 4 HDDs, and it is really not suitable for VMs that use a GUI (e.g. Windows with RDP). Also, the disk performance of a single VM must be capped if you do not want to take down the whole system by doing simple stuff with files inside the VM.

In my experience, one needs SSDs for ZFS to be performant enough, where previously one could use HDDs with MDADM.

I will post the main results in another thread about ZFS performance right now.
 
Performance is really poor compared to my MDADM RAID with 4 HDDs, and it is really not suitable for VMs that use a GUI (e.g. Windows with RDP). Also, the disk performance of a single VM must be capped if you do not want to take down the whole system by doing simple stuff with files inside the VM.

I also use RDP daily on a Win2012r2 desktop (browsing, files, and a small MSSQL instance) and it is very usable. In this case I have a single ZFS mirror with 2x HDD (7200 rpm, non-enterprise, 4Kn) and no SLOG. On another node I have the same OS as a print server (many thousands of labels per 24h). So I guess I have just been lucky with ZFS!
 
