Hi,
I have a three node PVE cluster with identical nodes. Each node has an SSD that is part of a Ceph pool (I know, I should have more SSDs in the pool). And each node also has two HDDs that are part of another Ceph pool.
I replaced the three enterprise-grade SSDs with three other, larger enterprise-grade SSDs. Two nodes show low iodelay (2%) while one node shows very high iodelay (25%). The three nodes are basically identical (make, model, CPU, memory), and both the old and the new enterprise-grade SSDs are identical across nodes (make, model, size). The only difference is the CPU and memory load, which is actually lower (!) on the node with the high iodelay than on the other two. I even migrated away / shut down all VMs on the offending node, but the iodelay is still there.
IIRC, the high iodelay was not there before I replaced the SSDs (though I admittedly did not check at the time). So it could somehow be the one new SSD in the offending node, or something else I'm missing. How can I figure out where this is coming from?
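In case it helps, this is roughly what I was planning to run on the offending node to compare per-device and per-OSD latency; I'm not sure these are the right tools, so corrections are welcome (/dev/sdX below is just a placeholder for the new SSD's device):

    # per-device utilization and await times (from the sysstat package)
    iostat -x 1 10

    # per-OSD latency as reported by Ceph, to compare the OSDs across nodes
    ceph osd perf

    # SMART health / wear data of the new SSD
    smartctl -a /dev/sdX

    # Proxmox's own fsync/second benchmark on the node
    pveperf

Would comparing these numbers between the "good" nodes and the offending one be a sensible starting point, or is there a better way to pin down where the iodelay comes from?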
Thanks!