Hi
We are in the process of migrating services from servers running a local ZFS mirror on 4TB NVMe SSDs. The new servers have the same hardware and specifications as the current ones, but instead of spreading services over two nodes (a 3rd node runs some internal services and is part of the same cluster), we have deployed a 4-node Ceph cluster to enable failover/HA for the services.
We have successfully migrated most services over, but have started to receive reports of longer than usual response times. This comes both from users of the services and from monitoring data in Zabbix.
We run only VMs, no LXC containers. All VMs are configured with disk cache set to "writeback" (not "writeback (unsafe)"), as I read somewhere on this forum that writeback gives the best possible performance on ZFS. Does the same apply when running on Ceph, or is it best to leave it at "default"?
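For reference, this is what a typical disk line looks like in our VM configs (the VM ID and storage name here are just examples, not our actual values):

```
# /etc/pve/qemu-server/100.conf (excerpt) -- VM ID 100 and storage "ceph-vm" are examples
scsi0: ceph-vm:vm-100-disk-0,cache=writeback,discard=on,size=100G
```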
The Ceph configuration is fairly simple: the WAL and DB are located on the same NVMe disks as the OSD data that backs the VMs.
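For context, the OSDs were created without separate DB/WAL devices, roughly like this (the device path is an example), which colocates the RocksDB and WAL on the same NVMe as the data:

```
# No --db_dev/--wal_dev given, so DB and WAL stay on the same device as the data
pveceph osd create /dev/nvme0n1
```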
The network is a 10GbE link per host, currently running with MTU=1500. From what I understand, since Ceph waits until at least two copies of the data have been written to disk before it acknowledges the write to the client, larger frames (MTU=9000) would let me move more data per packet, which should improve the situation here. But would I then also need to change the network configuration of all VMs to MTU=9000?
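In case it helps, this is roughly how I would set jumbo frames on the Ceph-facing bridge and verify them end to end (the bridge name, addresses, and peer IP are assumptions for illustration, not our actual values):

```
# /etc/network/interfaces (excerpt) -- vmbr1 and the addresses are examples
auto vmbr1
iface vmbr1 inet static
    address 10.10.10.11/24
    mtu 9000

# Verify after ALL hosts and switches are set: 8972 = 9000 - 20 (IP) - 8 (ICMP),
# and -M do forbids fragmentation, so this only succeeds if jumbo frames work end to end
ping -M do -s 8972 -c 3 10.10.10.12
```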