ZFS dedicated SLOG

Trigve

New Member
Oct 15, 2016
Hi,
I'm trying to find out whether my Proxmox system (with ZFS) would benefit from adding a dedicated M.2 SSD as a SLOG. Can I somehow profile my system and find the number of sync writes (O_SYNC), or better, find out whether my sync writes are a bottleneck? I don't want to benchmark; I want to get this information from my currently running system.

In my node summary dashboard, IO delay is always about the CPU usage.

For now I'm using 3 VMs:
- Windows Server 2012 R2 as app server (small traffic, small disk/cpu usage)
- Debian file server/PostgreSQL server (moderate file server traffic; little PostgreSQL traffic for now, but that will grow to a lot of traffic in the next year)
- Debian mail server, a lot of log traffic

I have 24 GB of RAM, with the ARC limited to 8 GB max. Currently I have 2 pools: 1 rpool for the system (mirror) and 1 pool for data (mirror, 2 x 1 TB). The data pool is on 7200 rpm SATA disks, and that is the pool I want to add the SLOG to.

Thank You
 
If you want a "buy guide", just look here:

http://www.sebastien-han.fr/blog/20...-if-your-ssd-is-suitable-as-a-journal-device/

I don't know what you think benchmarking is, but it is a way to get information about your currently running system, which is exactly what you asked for, and your sync writes will be slow. Have you tried pveperf?

Why haven't you used one pool instead of two? You would have almost doubled your sync writes instantly and the overall performance would be better.
 
Thanks for the reply.
I have already read the linked page, but thank you anyway for providing it.

Regarding the benchmarking: I've run benchmark tools, but I wanted to know what my system is actually doing. You know, a benchmark could run some "sync" test, but if sync writes make up only 20% of what my system does, then that benchmark isn't important to me. What I would like to have is some kind of I/O profiler, just like the profilers you have for code.
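One way to approximate such a profiler without benchmarking is to sample the ZIL counters that ZFS on Linux exposes and watch how fast they grow under the real workload. The sketch below assumes the counters live in /proc/spl/kstat/zfs/zil in the usual "name type data" kstat layout and that a counter such as zil_commit_count is present; both should be checked on the actual system, and as far as I know these counters are global rather than per pool. The function names are made up for illustration.

```python
import time
from pathlib import Path

# Assumed location of the global ZIL counters on ZFS on Linux.
ZIL_KSTAT = Path("/proc/spl/kstat/zfs/zil")


def parse_kstat(text):
    """Parse 'name type data' rows of a kstat file into {name: int}."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields ending in an integer;
        # the two header lines do not match and are skipped.
        if len(parts) == 3 and parts[2].isdigit():
            counters[parts[0]] = int(parts[2])
    return counters


def counter_delta(before, after):
    """Return how much each counter grew between two samples."""
    return {name: after[name] - before.get(name, 0) for name in after}


def sample_rates(interval_s=5.0):
    """Sample the kstat file twice and return per-second rates of change."""
    before = parse_kstat(ZIL_KSTAT.read_text())
    time.sleep(interval_s)
    after = parse_kstat(ZIL_KSTAT.read_text())
    return {name: diff / interval_s
            for name, diff in counter_delta(before, after).items() if diff}


if __name__ == "__main__":
    if ZIL_KSTAT.exists():
        for name, rate in sorted(sample_rates().items()):
            print(f"{name:40s} {rate:12.1f} /s")
    else:
        print(f"{ZIL_KSTAT} not found; is the ZFS module loaded?")
```

If the commit counter barely moves during normal operation, sync writes are unlikely to be the bottleneck and a SLOG would probably not change much; if it climbs steadily during busy periods, the SLOG is more likely to pay off.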

Here are my pveperf results:
Code:
# pveperf /pve_data
CPU BOGOMIPS:      19179.60
REGEX/SECOND:      1260617
HD SIZE:           579.39 GB (pve_data)
FSYNCS/SECOND:     90.55
DNS EXT:           107.22 ms
DNS INT:           55.07 ms

I created 2 pools to get a clean separation of the OS and the data. It's a tradeoff in my case. Because my server only has room for 4 LFF disks in total (all of which are already populated), the only other option is an NVMe (M.2) SSD.
 
Seems reasonable. What is your tradeoff? ZFS separates the data perfectly, and normally PVE does not write a lot (apart from a bug that causes the config database to be written too often).

You're throwing away 50% of your possible IOPS with this tradeoff. I'd only do that if the disks are not of the same speed. If they're all the same type, just combine them. Is NVMe normally only one SSD, or is there the possibility to add two NVMe SSDs?
 
Thanks for the reply.
Both sets of disks are SATA HDDs, but the ones used for the root are only 200 GB. I want to have the root on a separate device for simpler bare-metal backup and recovery. This is my top priority: to have a simple backup/restore strategy. I also have the option of not using ZFS on root if circumstances require it. I know that it could also work with one pool and multiple datasets, but I'm much "happier" with the current setup. I just want to keep a backdoor open in case something goes wrong (as I'm not yet experienced with ZFS).

As you probably know, NVMe comes in different setups, such as M.2 or a "direct" PCI card. I have an adapter that can take 3 M.2 SSDs (2 SATA and 1 PCIe) on 1 PCI card (and I want to test it).

Anyway, what is the average FSYNCS/SECOND figure for enterprise 7200 rpm SATA HDDs? Does it vary a lot from disk to disk?
 
Thanks for the reply.
Both sets of disks are SATA HDDs, but the ones used for the root are only 200 GB. I want to have the root on a separate device for simpler bare-metal backup and recovery. This is my top priority: to have a simple backup/restore strategy.

With ZFS, that's much easier, but ...

I also have the option of not using ZFS on root if circumstances require it.

... yes, you should have tested those cases a couple of times and have a live system with ZFS ready for that. I can strongly recommend climbing the steep learning curve of ZFS; it pays off. A lot! I'm running it everywhere at the moment. I also built my own RPi-ZFS system and have the best backup/recovery strategy available, using ZFS's send/receive to transfer the differences asynchronously and incrementally without much overhead. Snapshots and compression are also really nice! You're going to love it.

I know that it could also work with one pool and multiple datasets, but I'm much "happier" with the current setup. I just want to keep a backdoor open in case something goes wrong (as I'm not yet experienced with ZFS).

That's understandable.

As you probably know, NVMe comes in different setups, such as M.2 or a "direct" PCI card. I have an adapter that can take 3 M.2 SSDs (2 SATA and 1 PCIe) on 1 PCI card (and I want to test it).

Ah yes. I almost forgot about the adapter cards. I'm running mine directly on the mainboard. There are mainboards available with up to 3 dedicated NVMe slots (4 PCIe lanes each).

Anyway, what is the average FSYNCS/SECOND figure for enterprise 7200 rpm SATA HDDs? Does it vary a lot from disk to disk?

The important term here is IOPS: https://en.wikipedia.org/wiki/IOPS#Mechanical_hard_drives and the rpm determines the maximum random IOPS.
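As a rough worked example of how rpm bounds random IOPS, the usual estimate is one operation per (average seek time + average rotational latency), where the average rotational latency is half a revolution. The sketch below is only an illustration: the 8.5 ms seek time is an assumed figure, not a measured one (check the drive's datasheet), and the same seek time is applied to all three spindle speeds for simplicity, even though faster drives usually seek faster too. The only value taken from this thread is the 90.55 FSYNCS/SECOND reported by pveperf above.

```python
ASSUMED_SEEK_MS = 8.5  # assumption for illustration; check the datasheet


def rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution."""
    return 0.5 * 60_000.0 / rpm


def estimated_random_iops(rpm, avg_seek_ms=ASSUMED_SEEK_MS):
    """Rough upper bound on random IOPS for a single spinning disk."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


def ms_per_operation(ops_per_second):
    """Convert an operations-per-second figure into milliseconds each."""
    return 1000.0 / ops_per_second


if __name__ == "__main__":
    for rpm in (7200, 10_000, 15_000):
        print(f"{rpm:>6} rpm: rotational latency "
              f"{rotational_latency_ms(rpm):.2f} ms, "
              f"about {estimated_random_iops(rpm):.0f} random IOPS")
    # FSYNCS/SECOND value reported by pveperf earlier in this thread.
    print(f"90.55 fsyncs/s is {ms_per_operation(90.55):.2f} ms per fsync")
```

At roughly 11 ms per fsync, the pveperf result is in the range one would expect from a 7200 rpm disk that has to wait for the platter on every sync write.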

"Enterprise" normally implies a different firmware and a "better time to bit error". A 7.2k drive is classically not an "enterprise" grade device; those normally come with 10k or 15k rpm and are therefore faster (in terms of IOPS).

But that does not necessarily mean anything for your application. If you do not have a lot of I/O, you're probably fine. You have to test, or better, monitor your system with respect to I/O and I/O wait time. I would not want to have a database or mail server on that hardware (for my workloads), but it could be OK for you.
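For the monitoring part, a minimal sketch of measuring I/O wait on a running Linux host is to read the aggregate "cpu" line of /proc/stat twice and compare the iowait column with the total. The helper names are made up for illustration, and whether this matches the "IO delay" value on the Proxmox dashboard exactly is an assumption I have not verified.

```python
import time
from pathlib import Path

PROC_STAT = Path("/proc/stat")


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [a - b for a, b in zip(after, before)]
    # Columns: user nice system idle iowait irq softirq steal [guest ...].
    # Guest time is already counted in user/nice, so sum only the first 8.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


if __name__ == "__main__":
    if PROC_STAT.exists():
        first = parse_cpu_line(PROC_STAT.read_text())
        time.sleep(1.0)
        second = parse_cpu_line(PROC_STAT.read_text())
        print(f"iowait over the last second: "
              f"{iowait_percent(first, second):.1f}%")
    else:
        print("/proc/stat not found; this sketch only works on Linux")
```

Logging this value over a day or two of real use, rather than a single sample, gives a better picture of whether the disks are actually holding the VMs back.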
 
Ah yes. I almost forgot about the adapter cards. I'm running mine directly on the mainboard. There are mainboards available with up to 3 dedicated NVMe slots (4 PCIe lanes each).

Just curious, what MB do you use?

The important term here is IOPS: https://en.wikipedia.org/wiki/IOPS#Mechanical_hard_drives and the rpm determines the maximum random IOPS.

"Enterprise" normally implies a different firmware and a "better time to bit error". A 7.2k drive is classically not an "enterprise" grade device; those normally come with 10k or 15k rpm and are therefore faster (in terms of IOPS).

But that does not necessarily mean anything for your application. If you do not have a lot of I/O, you're probably fine. You have to test, or better, monitor your system with respect to I/O and I/O wait time. I would not want to have a database or mail server on that hardware (for my workloads), but it could be OK for you.

Yes, there is still the option to replace the hard drives, even with SAS if needed. Time will tell. But in my tests, I didn't notice very bad performance. I'm not saying that the performance is superb, but it is OK for our situation. And the current situation is that there is no budget for new disks.

Anyway, thanks for all the information.
 
Just curious, what MB do you use?

I'm using the NVMe on an Asus Maximus VIII Hero (desktop machine with Proxmox VE on one disk).


Yes, there is still the option to replace the hard drives, even with SAS if needed. Time will tell. But in my tests, I didn't notice very bad performance. I'm not saying that the performance is superb, but it is OK for our situation. And the current situation is that there is no budget for new disks.

That's good. Your plan sounds reasonable.
 