cluster performance degradation

Ceph won't be faster than local storage for I/O from a single VM/LXC, but migration will be. Ceph is a distributed filesystem, and even though it scales with the number of nodes and OSDs, the performance win only shows up with many I/O clients (= VM/LXC machines).
Ceph is by design much more failure-resistant than a single local node, but with its complexity many more failure modes exist, and several can hit at the same time. Small Ceph installations are therefore tricky: they can only survive smaller "incidents" than a bigger installation, which is often overlooked.
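To see why "many clients" matters, here is a minimal sketch (just an illustration, not a tuning guide) that drives a pool with increasing concurrency via rados bench. The pool name "testbench", the duration and the block size are assumptions; you need a scratch pool and admin access from the node you run this on:

```python
#!/usr/bin/env python3
"""Rough illustration: Ceph aggregate write throughput vs. concurrency.

Runs `rados bench` with an increasing number of concurrent operations to
mimic fewer/more parallel clients. Assumes a throwaway pool named
'testbench' exists and that this node has admin access to the cluster.
"""
import subprocess

POOL = "testbench"   # assumption: a scratch pool created just for testing
SECONDS = 30         # duration of each write benchmark run

for concurrency in (1, 4, 16, 64):
    print(f"\n=== rados bench write, {concurrency} concurrent ops ===")
    # 4 MiB objects; the benchmark objects are removed again afterwards
    # unless you pass --no-cleanup.
    subprocess.run(
        ["rados", "bench", "-p", POOL, str(SECONDS), "write",
         "-t", str(concurrency), "-b", str(4 * 1024 * 1024)],
        check=True,
    )
```

Aggregate bandwidth usually grows with concurrency up to the point where the OSDs or the network saturate; a single VM doing its own I/O never sees that aggregate number, which is the effect described above.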
 
In your experience with Ceph, if I have latency problems on one node only, because of the HDDs or the controller, will I also have problems on the other 2 nodes? Will the whole cluster be affected?
 
toto, don't think about Ceph until you are able to deploy at least 5 nodes with enterprise NVMe and 25G or faster networking. For now just focus on your ZFS.
 
I have a doubt then: in my opinion one of the 3 nodes has something wrong, because now with ZFS I see that the IO delay on one is very high compared to the other while I'm migrating. They both have the same HDDs, but the controller is different.
[screenshots: IO delay graphs of the two nodes]

There is something that doesn't add up for me: this node is where the delay is, and also where I previously had the pool problem, where the HDDs disappear and then I have to restart it.
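A quick way to compare the disks of the two nodes is to sample the kernel counters in /proc/diskstats and look at the average time per I/O; a minimal sketch follows (the 10-second interval and the sd*/nvme* filter are just placeholders, adapt them to your devices):

```python
#!/usr/bin/env python3
"""Quick per-disk latency check based on /proc/diskstats.

Samples the per-device counters twice and prints the average time an I/O
spent on each disk in between, to spot a disk/controller path that is
much slower than the others on a node.
"""
import time

def read_diskstats():
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            dev = parts[2]
            # fields after the device name (1-based):
            # 1 reads completed, 4 ms spent reading,
            # 5 writes completed, 8 ms spent writing
            reads, read_ms = int(parts[3]), int(parts[6])
            writes, write_ms = int(parts[7]), int(parts[10])
            stats[dev] = (reads + writes, read_ms + write_ms)
    return stats

before = read_diskstats()
time.sleep(10)                    # sampling interval, adjust as needed
after = read_diskstats()

for dev, (ios_now, ms_now) in sorted(after.items()):
    # crude filter: whole sd*/nvme* disks and their partitions; narrow it down if needed
    if dev not in before or not dev.startswith(("sd", "nvme")):
        continue
    ios_old, ms_old = before[dev]
    ios, ms = ios_now - ios_old, ms_now - ms_old
    if ios > 0:
        print(f"{dev:10s} {ios:8d} IOs, avg {ms / ios:6.1f} ms per IO")
```

If the same HDD models show clearly different averages on the two nodes, that points at the controller or cabling rather than the disks.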
 
When you migrate something, it's normal that the "migrate from" host has more to do than the "migrate to" host, and you cannot expect the same load on both: reading is much harder for I/O than writing - on HDDs; it's the opposite on SSDs - and even compressing is harder than uncompressing.
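The compress/uncompress asymmetry is easy to see on its own; the small illustration below uses zlib from the Python standard library (the replication/migration stream may use a different codec such as zstd, so treat this as a rough illustration only):

```python
#!/usr/bin/env python3
"""Illustration: compressing usually costs more CPU than decompressing.

Uses zlib from the standard library; Proxmox replication/migration may
use a different codec, but the asymmetry is similar.
"""
import os
import time
import zlib

# partly repetitive, partly random payload (~23 MiB), so it is compressible
data = (b"moderately compressible test payload " * 64 + os.urandom(512)) * 8192

t0 = time.perf_counter()
compressed = zlib.compress(data, level=6)
t1 = time.perf_counter()
zlib.decompress(compressed)
t2 = time.perf_counter()

print(f"compress:   {t1 - t0:6.2f} s")
print(f"decompress: {t2 - t1:6.2f} s")
print(f"ratio:      {len(compressed) / len(data):.2f}")
```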
 
pve1: 40 x Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz, 256G of RAM
pve2: 24 x Intel(R) Xeon(R) Silver 4510, 128G of RAM

As you can see, pve1 is much more powerful than pve2, but it is the one with much more delay.
 
What is your motherboard brand?

I think your RAM is a bit small compared with the total ZFS data, but you need to check this. I don't use ZFS for VM data.
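One quick thing to check on each node is how big the ARC is allowed to grow relative to RAM; here is a minimal sketch reading the standard ZFS-on-Linux kstat file (the paths are the usual ones on PVE, verify them on your system):

```python
#!/usr/bin/env python3
"""Compare the ZFS ARC size and its limit with the total RAM of this node.

Reads /proc/spl/kstat/zfs/arcstats and /proc/meminfo (standard paths for
ZFS on Linux); run it on each PVE node.
"""

def arcstats():
    vals = {}
    with open("/proc/spl/kstat/zfs/arcstats") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3 and parts[1].isdigit():
                vals[parts[0]] = int(parts[2])
    return vals

def mem_total_kib():
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])

arc = arcstats()
ram_gib = mem_total_kib() / 1024 ** 2
size_gib = arc["size"] / 1024 ** 3
limit_gib = arc["c_max"] / 1024 ** 3

print(f"RAM total : {ram_gib:7.1f} GiB")
print(f"ARC in use: {size_gib:7.1f} GiB")
# c_max defaults to roughly half the RAM when zfs_arc_max is not set
print(f"ARC limit : {limit_gib:7.1f} GiB")
```

If the ARC limit plus the memory of the running VMs gets close to the physical RAM, that alone can cause IO pressure on the busier node.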
 
When you migrate something, it's normal that the "migrate from" host has more to do than the "migrate to" host, and you cannot expect the same load on both: reading is much harder for I/O than writing, and even compressing is harder than uncompressing.
The host where I have more delay is the one I'm writing to, the one doing the writing: I'm doing a replication from pve2 to pve1, and pve1 is the one with a lot of IO delay.
 
