cluster performance degradation

Ceph won't be faster than local storage for I/O from a single VM/LXC, but migration will be. Ceph is a distributed filesystem, and even though it scales with the number of nodes and OSDs, the performance win only shows up with many I/O clients (= VM/LXC machines).
Ceph is by design much more failure-resistant than a single local node, but with its complexity many more failure modes exist, and several can hit at the same time. Small Ceph installations are therefore tricky: they can only survive smaller "incidents" than a bigger installation, which is often overlooked.
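To see why "many clients" matters, here is a minimal sketch (just an illustration, not a tuning guide) that drives a pool with increasing concurrency via rados bench. The pool name "testbench", the duration and the block size are assumptions; you need a scratch pool and admin access from the node you run this on:

```python
#!/usr/bin/env python3
"""Rough illustration: Ceph aggregate write throughput vs. concurrency.

Runs `rados bench` with an increasing number of concurrent operations to
mimic fewer/more parallel clients. Assumes a throwaway pool named
'testbench' exists and that this node has admin access to the cluster.
"""
import subprocess

POOL = "testbench"   # assumption: a scratch pool created just for testing
SECONDS = 30         # duration of each write benchmark run

for concurrency in (1, 4, 16, 64):
    print(f"\n=== rados bench write, {concurrency} concurrent ops ===")
    # 4 MiB objects; the benchmark objects are removed again afterwards
    # unless you pass --no-cleanup.
    subprocess.run(
        ["rados", "bench", "-p", POOL, str(SECONDS), "write",
         "-t", str(concurrency), "-b", str(4 * 1024 * 1024)],
        check=True,
    )
```

Aggregate bandwidth usually grows with concurrency up to the point where the OSDs or the network saturate; a single VM doing its own I/O never sees that aggregate number, which is the effect described above.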
 
In your experience with Ceph, if I have latency problems on one node only, because of the HDDs or the controller, will I also have problems on the other 2 nodes? Will the whole cluster be affected?
 
toto, don't think about Ceph until you are able to deploy at least 5 nodes with enterprise NVMe and 25G or faster networking. For now just focus on your ZFS.
 
I have a doubt then: in my opinion one of the 3 nodes has something wrong, because now with ZFS I see that the IO delay on one is very high compared to the other while I'm migrating. They both have the same HDDs, but the controller is different.
[screenshots: IO delay graphs of the two nodes]

There is something that doesn't add up for me: this node is where the delay is, and also where I previously had the pool problem, where the HDDs disappear and then I have to restart it.
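A quick way to compare the disks of the two nodes is to sample the kernel counters in /proc/diskstats and look at the average time per I/O; a minimal sketch follows (the 10-second interval and the sd*/nvme* filter are just placeholders, adapt them to your devices):

```python
#!/usr/bin/env python3
"""Quick per-disk latency check based on /proc/diskstats.

Samples the per-device counters twice and prints the average time an I/O
spent on each disk in between, to spot a disk/controller path that is
much slower than the others on a node.
"""
import time

def read_diskstats():
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            dev = parts[2]
            # fields after the device name (1-based):
            # 1 reads completed, 4 ms spent reading,
            # 5 writes completed, 8 ms spent writing
            reads, read_ms = int(parts[3]), int(parts[6])
            writes, write_ms = int(parts[7]), int(parts[10])
            stats[dev] = (reads + writes, read_ms + write_ms)
    return stats

before = read_diskstats()
time.sleep(10)                    # sampling interval, adjust as needed
after = read_diskstats()

for dev, (ios_now, ms_now) in sorted(after.items()):
    # crude filter: whole sd*/nvme* disks and their partitions; narrow it down if needed
    if dev not in before or not dev.startswith(("sd", "nvme")):
        continue
    ios_old, ms_old = before[dev]
    ios, ms = ios_now - ios_old, ms_now - ms_old
    if ios > 0:
        print(f"{dev:10s} {ios:8d} IOs, avg {ms / ios:6.1f} ms per IO")
```

If the same HDD models show clearly different averages on the two nodes, that points at the controller or cabling rather than the disks.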
 
When you migrate something, it's normal that the "migrate from" host has more to do than the "migrate to" host, and you cannot expect the same load on both: reading is much harder for I/O than writing - on HDDs; it's the opposite on SSDs - and even compressing is harder than uncompressing.
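The compress/uncompress asymmetry is easy to see on its own; the small illustration below uses zlib from the Python standard library (the replication/migration stream may use a different codec such as zstd, so treat this as a rough illustration only):

```python
#!/usr/bin/env python3
"""Illustration: compressing usually costs more CPU than decompressing.

Uses zlib from the standard library; Proxmox replication/migration may
use a different codec, but the asymmetry is similar.
"""
import os
import time
import zlib

# partly repetitive, partly random payload (~23 MiB), so it is compressible
data = (b"moderately compressible test payload " * 64 + os.urandom(512)) * 8192

t0 = time.perf_counter()
compressed = zlib.compress(data, level=6)
t1 = time.perf_counter()
zlib.decompress(compressed)
t2 = time.perf_counter()

print(f"compress:   {t1 - t0:6.2f} s")
print(f"decompress: {t2 - t1:6.2f} s")
print(f"ratio:      {len(compressed) / len(data):.2f}")
```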
 
pve1: 40 x Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz, 256G of RAM
pve2: 24 x Intel(R) Xeon(R) Silver 4510, 128G of RAM

As you can see, pve1 is much more powerful than pve2, but it is the one with much more delay.
 
What is your motherboard brand?

I think your RAM is a bit small compared with the total ZFS data, but you need to check this. I don't use ZFS for VM data.
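One quick thing to check on each node is how big the ARC is allowed to grow relative to RAM; here is a minimal sketch reading the standard ZFS-on-Linux kstat file (the paths are the usual ones on PVE, verify them on your system):

```python
#!/usr/bin/env python3
"""Compare the ZFS ARC size and its limit with the total RAM of this node.

Reads /proc/spl/kstat/zfs/arcstats and /proc/meminfo (standard paths for
ZFS on Linux); run it on each PVE node.
"""

def arcstats():
    vals = {}
    with open("/proc/spl/kstat/zfs/arcstats") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3 and parts[1].isdigit():
                vals[parts[0]] = int(parts[2])
    return vals

def mem_total_kib():
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])

arc = arcstats()
ram_gib = mem_total_kib() / 1024 ** 2
size_gib = arc["size"] / 1024 ** 3
limit_gib = arc["c_max"] / 1024 ** 3

print(f"RAM total : {ram_gib:7.1f} GiB")
print(f"ARC in use: {size_gib:7.1f} GiB")
# c_max defaults to roughly half the RAM when zfs_arc_max is not set
print(f"ARC limit : {limit_gib:7.1f} GiB")
```

If the ARC limit plus the memory of the running VMs gets close to the physical RAM, that alone can cause IO pressure on the busier node.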
 
When you migrate something, it's normal that the "migrate from" host has more to do than the "migrate to" host, and you cannot expect the same load on both: reading is much harder for I/O than writing, and even compressing is harder than uncompressing.
The host where I have more delay is the one I'm writing to, the one doing the writing: I'm doing a replication from pve2 to pve1, and pve1 is the one with a lot of IO delay.
 
