Hello All,
We purchased three servers, each with two Gold 6526Y CPUs, 512GB of memory, 25G networking on ConnectX-5 cards, and two directly attached 6.4TB enterprise NVMe disks.
The machines are VERY fast locally, and the plan was to install Ceph. But what is natively well north of 100k IOPS (4k random writes with cache disabled) turns into about 1k IOPS in a VM. We went through all of the standard Ceph tuning, a real pat-your-head-and-rub-your-belly exercise, checking the BIOS performance modes and turning every knob imaginable, and found that the performance isn't consistent enough to trust.
We have a lot of ZFS elsewhere in the network and are very comfortable with it, so I wonder if there is a way to deploy ZFS with redundancy across nodes on this hardware. Is installing TrueNAS SCALE in a base VM with elevated priority an option? What about RSF-1?
I'm aware of the 1-minute ZFS replication that many use, and while that sounds like it could work, we have some database and email workloads where I wouldn't want to lose a minute of data in a failure. I suppose we could cluster those services at the application level, but that would take a fair bit of effort and leave us with more cluster solutions to manage.
I'd like to get some insight from others who have slain these dragons before me, so I know which road to go down. Thanks!