Hi,
Like many other posters in this forum, I'm struggling with "very slow" disk I/O from guest VMs in Proxmox.
For reference, I'm on buster, which I think makes it Proxmox VE 6.4(?). One of the threads I read suggested moving from Ceph Nautilus to Ceph Octopus, which I did yesterday, and that has helped (write speed went up from 70MB/s to 100MB/s). Something I don't recall seeing before was the option to choose "VirtIO" as the disk bus type when adding a new disk to a VM.
In my case the OSDs are all RAID groups of SSDs, with the raw devices ("/dev/sd*") presented to Proxmox, not LVM volumes. The RAID volumes for OSDs are distinct from the volume Proxmox itself runs on. According to /sys, the RAID disks present with a physical block size of 4096 bytes and a logical block size of 512 bytes. Which size will Ceph be using? How do I verify whether Ceph is using 4K, and make it use 4K if it isn't?
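(For my own reference, these are the checks I'm planning to use to answer that; "sdb" and "osd.0" below are placeholders for whichever device/OSD is being inspected, and the daemon queries have to be run on the node hosting that OSD:)

  # sector sizes the kernel reports for the RAID volume
  cat /sys/block/sdb/queue/logical_block_size
  cat /sys/block/sdb/queue/physical_block_size

  # what the OSD itself is configured with
  ceph daemon osd.0 config get bdev_block_size
  ceph daemon osd.0 config get bluestore_min_alloc_size_ssd

My understanding is that bluestore_min_alloc_size is baked in when an OSD is created, so if it turns out to be wrong the fix would be redeploying the OSD rather than just changing the config.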
In addition to having the "Metrics Server" (InfluxDB) configured, I've got performance metrics from the OS going into InfluxDB, which I can then query with Grafana.
What I see is about 40% disk load generating 100MB/s at around 400 IOPS. When the system is busy with other workloads, I can see the same disks doing > 10,000 IOPS (i.e. the RAID controller isn't the bottleneck).
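(If anyone wants to sanity-check the controller's ceiling independent of Ceph, a raw-device test roughly along these lines is what I have in mind; the device name is a placeholder, and a write test like this is destructive, so only point it at a disk that holds no data:)

  fio --name=rawtest --filename=/dev/sdX --ioengine=libaio --direct=1 \
      --rw=randwrite --bs=4k --iodepth=32 --runtime=30 --time_based \
      --group_reporting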
The cluster I'm running has a "front end" network that's 1G and a back end network that's 10G. When I'm doing disk performance tests, the 1G network is running at 100% - suggesting there's Ceph traffic that could be moved to the 10G side.
In ceph.conf, global.cluster_network is the 10G /24 and global.mon_host has 10G IPs, but all of the mds.*.host entries point to a hostname that resolves to the 1G IP, and the mon.*.public_addr entries point to addresses on the public (1G) network. Can all of these be pushed to use the 10G network?
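(To make the question concrete, the shape of what I'm asking about is roughly this; the subnet and addresses below are illustrative, not my real ranges:)

  [global]
      cluster_network = 10.10.10.0/24     # 10G back end
      public_network  = 10.10.10.0/24     # currently this is the 1G subnet
      mon_host        = 10.10.10.11,10.10.10.12,10.10.10.13

i.e. can the public side (mons, MDS, and therefore client traffic) live on the 10G subnet as well, or does something have to stay on the 1G network?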