Hi,
Today we changed storage for a (Debian Wheezy) VM to SCSI with VirtIO SCSI and rebooted. It came up fine. Storage is on Ceph (three-node cluster, 10G network, with a total of 12 OSDs).
Then we issued "fstrim -v /" on the VM, and some trouble appeared. In the Wheezy guest we received:
end_request: I/O error, dev sda, sector 185410448
end_request: I/O error, dev sda, sector 187507600
and some more lines like this. On the Ceph cluster, the status changed to HEALTH_WARN with blocked requests > 32 sec.
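For reference, whether discard actually reaches the virtual disk can be checked from inside the guest via sysfs (a minimal check; sda is the device from the errors above, and non-zero values mean the virtio-scsi disk accepts UNMAP):

# Discard (UNMAP) support as the guest kernel sees it
cat /sys/block/sda/queue/discard_granularity
cat /sys/block/sda/queue/discard_max_bytes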
Some example lines from the Ceph log:
2016-09-12 12:20:47.417408 osd.8 10.10.89.3:6808/2979 2080 : cluster [WRN] slow request 30.417220 seconds old, received at 2016-09-12 12:20:17.000120: osd_op(client.1905008.0:16775 rbd_data.5712f238e1f29.000000000000f8ca [delete] 2.3a5aa8c4 snapc 1b=[1b,6] ack+ondisk+write+known_if_redirected e1895) currently commit_sent
2016-09-12 12:20:49.615040 osd.6 10.10.89.2:6808/2960 2277 : cluster [WRN] slow request 32.608357 seconds old, received at 2016-09-12 12:20:17.006174: osd_op(client.1905008.0:17130 rbd_data.5712f238e1f29.000000000000fa2c [delete] 2.d3536127 snapc 1b=[1b,6] ack+ondisk+write+known_if_redirected e1895) currently waiting for subops from 2,10
2016-09-12 12:20:48.283468 osd.1 10.10.89.1:6808/3039 1063 : cluster [WRN] slow request 31.274059 seconds old, received at 2016-09-12 12:20:17.009346: osd_op(client.1905008.0:17174 rbd_data.5712f238e1f29.000000000000fa58 [delete] 2.e5a9c48d snapc 1b=[1b,6] ack+ondisk+write+known_if_redirected e1895) currently no flag points reached
These slow requests affected various OSDs and were not tied to a specific Proxmox node.
Otherwise the cluster has been and is running perfectly, with nothing but HEALTH_OK over the last weeks.
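While the warning was active, the blocked requests could be watched like this (osd.8 below is just one of the affected OSDs from the log; the second command has to run on the host carrying that OSD):

# Which requests are blocked, and on which OSDs
ceph health detail

# Dump the ops currently in flight on an affected OSD, via its admin socket
ceph daemon osd.8 dump_ops_in_flight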
The above HEALTH_WARN also disappeared again automatically, and we're back to the usual HEALTH_OK now.
My question: how is it possible for a client-side "fstrim" to push the Ceph cluster into HEALTH_WARN? The status came back to HEALTH_OK automatically, but fstrim did not free any space (the Ceph used/available figures did not change).
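For what it's worth, here is how usage can be compared before and after the trim ("rbd" below stands in for our pool name, and the rbd_data prefix is taken from the log lines above):

# Cluster- and pool-level usage; space freed by the deletes should show up here
ceph df
rados df

# Count the objects backing this image (can be slow on large pools);
# a successful trim should shrink this number
rados -p rbd ls | grep rbd_data.5712f238e1f29 | wc -l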
Proxmox 4.2-17/e1400248, three hosts, 4 OSDs per host.
Any ideas?
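PS: for completeness, the relevant lines from the VM config (VMID 100 and the storage name are placeholders; without discard=on on the disk, the guest's fstrim would not reach Ceph at all):

# /etc/pve/qemu-server/100.conf
scsihw: virtio-scsi-pci
scsi0: ceph-storage:vm-100-disk-1,discard=on,size=100G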