Hey there,
------------------------
Context:
Final planning phase for storage configuration on a proxmox+ceph cluster.
I've ordered 4x SuperMicro 2113S-WTRT (EPYC 7402P, 256GB RAM, 64GB SATADOM for boot, X710-T4 for 10G Ethernet). I'll be adding 1-2 more nodes when we move to a new facility sometime next year. For now, the cluster is in a development/learning phase: spinning up server/security functions and performing various failure testing.
I posted a previous thread on this build ( https://forum.proxmox.com/threads/n...ues-problems-few-questions.59186/#post-273646 ). This post comes down to a specific set of questions about sync-write performance. Wolfgang raised concerns about this metric for a db/wal device, and on further research those concerns are certainly valid. My storage plans have changed a bit since then, though, so I'm not sure whether I still need to worry about it.
With the price of SSD space rapidly falling, I'm leaning towards not building an external "slow" pool of storage at all: by the time we figure in SAS cards, a JBOD enclosure, large enterprise-grade NVMe SSDs with full PLP for block.db / block.wal, and the spinning disks themselves, the cost was adding up to almost as much as simply buying that much capacity in consumer-grade SSDs. It also occurs to me that the simpler I can make this thing, the better. If I can do everything in a single, simple SSD pool, there's value in that simplicity.
Assume for the moment that I'm planning to skip the spinning-disk arrays and just add 4TB SSDs to the cluster as needed (and just add more nodes, possibly configured with fewer cores and less RAM as storage-only nodes, when/if we need more space). Also assume we're going to use consumer-grade SSDs with no full PLP implementation (and therefore poor sync-write performance) for bulk storage.
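For reference, once the drives arrive I plan to quantify that sync-write penalty with the usual single-threaded 4k sync-write fio test. The device path below is just a placeholder, and this writes straight to the raw device, so it's only for an empty/test disk:
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=sync-write-test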
-------------------------
Questions:
Does sync-write performance impact ALL write workloads in Ceph, or is it tied more specifically to block.db, block.wal, or both?
How important is sync-write performance going to be, assuming each consumer-grade SSD hosts its own block.db and block.wal?
Would offloading just the WAL for all the consumer SSDs to something like an M.2 Optane drive be worthwhile? If so, how would I calculate the space requirements for just the WAL device? (A rough sketch of what I have in mind is below.)
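For context on that last question, here's roughly what I'm picturing; the device paths and the Optane partition are assumptions, not something I've tested. If I understand the tooling, an OSD created with only a data device keeps db and wal on that same device, while the WAL could be split out with something like:
pveceph osd create /dev/sdb --wal_dev /dev/nvme0n1
or the equivalent ceph-volume form:
ceph-volume lvm create --data /dev/sdb --block.wal /dev/nvme0n1p1
On sizing, my current understanding is that the WAL only needs a few GB per OSD (the RocksDB write buffers it backs are on the order of 1GB total by default), and that actual usage on a running OSD should show up under something like "ceph daemon osd.0 perf dump bluefs" (the wal_used_bytes field), but I'd appreciate confirmation on both points.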
--------------------
Thanks!
-Eric