IO delay

Oct 9, 2024
I'm using Proxmox VE 8.4.1 with 2 nodes: one with a 24 x Intel(R) Xeon(R) Silver 4510 CPU and 128G of RAM, the other with a 20 x Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz and 256G of RAM. Both have 4 x 2TB HDD disks in RAID5 ZFS. On the node with the Xeon Silver 4510 and 128G of RAM I experience a lot of IO delay. On both nodes I only have one VM, which I migrate between one node and the other, but on the node with the more performant CPU the IO delay is very high, and during migration the delay increases a lot. What could it be?
 
4 x 2TB HDD disks in RAID5 ZFS
From that sentence it is not clear whether you have a hardware Raid5 and use ZFS on top of it, or whether you have a RaidZ1, which is similar to a Raid5.

Whichever it is - that does not work well. No way!

My recommendation, as mentioned multiple times here in the forum:
  • use an HBA, not hardware Raid - make sure PVE sees all physical disks
  • during installation build a single ZFS pool from mirrored vdevs (similar to Raid10) - and use all HDDs for this; each and every disk will be bootable at the end, which is really nice, isn't it?
  • after initial setup add a fast "Special Device", at least mirrored. This is crucial! Use two NVMe if possible, or two SATA SSDs. Use "Enterprise class" devices with PLP for this. This must be done on the CLI, afaik - see the sketch below for the basic commands. You'll find more detailed tutorials if you search for it...
This is the only approach I know which possibly(!) might get you acceptable performance for generic use. Note that "HDD only" just doesn't cut it nowadays...
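For the "Special Device" step, a minimal sketch of what the CLI side could look like; the pool name rpool and the /dev/disk/by-id/... paths are placeholders you would replace with your actual pool and SSDs:

```
# Add a mirrored special vdev (metadata) to an existing pool.
# Placeholder device paths - pick the real ones from: ls -l /dev/disk/by-id/
zpool add rpool special mirror /dev/disk/by-id/nvme-SSD_A /dev/disk/by-id/nvme-SSD_B

# Optionally let small data blocks (here: up to 16K) go to the special vdev as well:
zfs set special_small_blocks=16K rpool
```

Afterwards check with zpool status that the special vdev really shows up as a mirror.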
 
Before using ZFS I was using Ceph, but after a month I had serious IO problems. I would like to go back to Ceph and use 4 x 16TB HDDs with the addition of a 1TB SSD for DB/WAL - what do you think?
 
both have 4 x 2TB HDD disks

  • Are all eight the exact same model?
  • Are the controllers, to which those are connected, identical?
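Both points can be checked quickly from the shell with standard tools; this is just an illustration, nothing PVE-specific:

```
# Model, size and rotational flag of every physical disk:
lsblk -d -o NAME,MODEL,SIZE,ROTA

# Which SATA/SAS/RAID controllers are present:
lspci | grep -iE 'sata|sas|raid'
```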


Before using ZFS I was using Ceph, but after a month I had serious IO problems. I would like to go back to Ceph and use 4 x 16TB HDDs with the addition of a 1TB SSD for DB/WAL - what do you think?

Did you consider the fact that HDDs may simply not be sufficient for your workload?
 
I would like to go back to Ceph and use 4 x 16TB HDDs
Well..., no! You need several nodes and multiple OSDs per node for a good experience, besides other things like a fast (>=10GBit/s) and redundant network. I used Ceph for over a year in my "productive Homelab(!)" - starting as small as possible. Some notes:

 
I read that for Ceph they recommend using a few larger-capacity HDDs instead of many smaller ones, and they talk about a minimum of at least 3 nodes. Honestly, I find conflicting advice on the web.
 
I read that for Ceph they recommend using a few larger-capacity HDDs instead of many smaller ones,

Under which specific circumstances? To me this sounds plainly wrong. Especially with HDDs you want a zillion independent ones, not only a few.

Disclaimer, as already noted: I have dropped Ceph.
 
so it seems to me that it is better to use ZFS in RAID10, and not Ceph?

Yes.

Disclaimer: I am a well-known ZFS-fanboy... ;-)

 
I would like to make a ZFS RAID10 pool with 4 x 16TB HDDs
Okay.
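Just to illustrate the layout, a sketch of how such a mirrored-vdev ("Raid10"-like) pool could be created on the CLI - pool name and disk IDs are made up, and for the PVE boot pool you would normally let the installer build it for you:

```
# Hypothetical disk IDs - use the real ones from /dev/disk/by-id/
zpool create -o ashift=12 tank \
  mirror /dev/disk/by-id/ata-HDD_1 /dev/disk/by-id/ata-HDD_2 \
  mirror /dev/disk/by-id/ata-HDD_3 /dev/disk/by-id/ata-HDD_4

# Two mirror vdevs; writes are striped across them:
zpool status tank
```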

and then add a 1TB SSD for DB/WAL,

That's the wrong terminology. In ZFS there is an optional CACHE (L2ARC), which is a "read-only" caching device. And there is an SLOG, a "Separate LOG" for the ZIL (the ZFS Intent Log).

Both are usually NOT recommended, as they work differently than expected - most of the time. A Cache (L2ARC) is a second-level extension to the ARC (adaptive replacement cache), which always lives in RAM. When you add a secondary cache, it needs RAM of its own to work. This RAM is taken away from the system --> less RAM left for the normal ARC. Adding a large cache may therefore slow down your system. The recommendation is: upgrade your RAM to the absolute maximum possible. Only then re-evaluate (learn to read the output of arc_summary) the usefulness of a second-level cache.
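A quick sketch of how one might look at the ARC before deciding on an L2ARC (arc_summary ships with OpenZFS on PVE; the awk line is just one way to pull a few raw counters):

```
# Human-readable ARC report (size, target, hit/miss ratios):
arc_summary

# Or read a few raw counters straight from the kernel stats:
awk '$1=="size" || $1=="c_max" || $1=="hits" || $1=="misses"' /proc/spl/kstat/zfs/arcstats
```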

An SLOG is often understood as a write-cache, which it is not. The SLOG is essentially write-only; data is never read from it, with the one exception of a power failure where data was written to the SLOG but not yet to the data disks. In that case its data is read during the next boot when the pool is imported. Another aspect is that an SLOG accelerates SYNC writes only. "Normal" writes are asynchronous, and the SLOG has nothing to do with them.

It is worth noting that a ZIL exists with and without an SLOG. Without a dedicated SLOG the ZIL lives on the data disks. That's the main reason why SYNC writes are slow, and why a separate SLOG helps with exactly that.
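If it turns out that the workload really is heavy on SYNC writes (databases, NFS, ...), adding a mirrored SLOG is a one-liner on the CLI - the device paths below are placeholders:

```
# Placeholder device paths - use stable /dev/disk/by-id/ names in practice.
zpool add tank log mirror /dev/disk/by-id/nvme-SSD_A /dev/disk/by-id/nvme-SSD_B

# Watch per-vdev traffic; only sync writes should hit the log vdev:
zpool iostat -v tank 5
```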


In #3 I already mentioned a "Special Device". That's my recommendation. It must be good quality (mirrored and w/ PLP), as losing this one means losing the complete pool.
 
When I create the ZFS pool I can't find the option to add a disk as cache (L2ARC) or log (SLOG) - why?
Only staff can tell you.

A lot of things only work on the CLI... and with knowledge not presented in the PVE documentation...
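For completeness, a sketch of the CLI equivalents the GUI does not expose - pool and device names are placeholders, and as discussed above you should think twice before adding either:

```
# Add an L2ARC (cache) device:
zpool add rpool cache /dev/disk/by-id/nvme-SSD_C

# Add a log (SLOG) device - ideally mirrored, see the earlier example:
zpool add rpool log /dev/disk/by-id/nvme-SSD_D

# Both can be removed again later if they do not help:
zpool remove rpool /dev/disk/by-id/nvme-SSD_C
```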