VM IO freezes for 15 seconds when a Ceph node reboots gracefully

Kelvin Kam

Aug 8, 2017
Hi, I have built a hyper-converged 3-node cluster with Proxmox, using Ceph for shared storage.
However, I observed that VM IO hangs for about 15 seconds when one Ceph node performs a graceful reboot. I have already tried the reboot procedure from Red Hat, setting the noout and norebalance flags, but no luck.
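For reference, the flag handling around the reboot was essentially the standard procedure:

ceph osd set noout
ceph osd set norebalance
# reboot the node, wait for its OSDs to rejoin, then:
ceph osd unset noout
ceph osd unset norebalance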

Could anyone tell me if the freeze period can be decreased? Thank you very much.
 
How many monitors, managers and OSDs do you have?
3 monitors and 3 managers.
7 OSDs in total, but separated into two pools with CRUSH rules; size/min_size is configured as 2/1.
Node 1: 2 OSDs (1 in the affected pool)
Node 2: 3 OSDs (2 in the affected pool)
Node 3: 3 OSDs (2 in the affected pool)
*affected pool means the pool that stores the testing VM
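If it helps, the layout above can be double-checked with the standard Ceph commands (output omitted):

ceph osd tree                # OSDs grouped by host
ceph osd pool ls detail      # size/min_size and CRUSH rule per pool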
 
7 OSDs in total, but separated into two pools with CRUSH rules; size/min_size is configured as 2/1.
A min_size of 1 is dangerous. In-flight data might not be written out to a PG, leaving no copy available.

However, I observed that VM IO hangs for about 15 seconds when one Ceph node performs a graceful reboot.
There will always be some time until a new primary OSD is selected. If you haven't already, try setting the disk cache to writeback. This might help minimize the effect.
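In Proxmox this is a per-disk option; a sketch, with storage and disk names as placeholders:

qm set <vmid> --virtio0 <storage>:vm-<vmid>-disk-0,cache=writeback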
 
A min_size of 1 is dangerous. In-flight data might not be written out to a PG, leaving no copy available.
Thank you for your reply. So is 3/2 recommended even for a small setup? Would 2/2 provide better performance?

There will always be some time until a new primary OSD is selected. If you haven't already, try setting the disk cache to writeback. This might help minimize the effect.
I will try this later. But does that mean the freeze time should be shorter if the rebooting node does not host the primary OSD?
 
Thank you for your reply. So is 3/2 recommended even for a small setup? Would 2/2 provide better performance?
Yes, 3/2 is recommended. But with 2/2, on node/OSD failure the pool will go into read-only mode until all replicas have been recovered.
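For reference, changing an existing pool to 3/2 can be done with (pool name is a placeholder):

ceph osd pool set <pool> size 3
ceph osd pool set <pool> min_size 2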

I will try this later. But does that mean the freeze time should be shorter if the rebooting node does not host the primary OSD?
It should be, but since Ceph has no notion of locality, the primary OSDs are distributed across the cluster.
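If you want to see where the primary for a given object lives, ceph osd map prints the acting set; the OSD after the "p" is the primary (pool and object names are placeholders):

ceph osd map <pool> <object>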
 
There will always be some time until a new primary OSD is selected. If you haven't already, try setting the disk cache to writeback. This might help minimize the effect.
Hi Alwin, I think it only reduces the write freeze by 2-3 seconds; reads still freeze when the node initiates the reboot...
 
I don't think you can get rid of it completely, especially in a small cluster. Can you post the config of your VM, qm config <id>?
 
I don't think you can get rid of it completely, especially in a small cluster. Can you post the config of your VM, qm config <id>?
Please find the VM config below. Currently it is a 3-node cluster, and we may expand to 5 nodes in the future; I am just afraid we will face the same issue after expanding the cluster...

ceph-dcssd is the system OS disk; it is a Ceph pool (3/2) using Samsung SM863a 240G SSDs.

root@PVE01:~# qm config 100
bootdisk: virtio0
cores: 4
ide2: none,media=cdrom
memory: 4096
name: WS2016
net0: virtio=4E:7E:30:05:60:CD,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=bfb0b6b1-5128-4241-9e18-2aa1030db4e0
sockets: 1
virtio0: ceph-dcssd:vm-100-disk-2,cache=writeback,size=60G
virtio1: ceph-crssd:vm-100-disk-0,cache=writeback,size=16G
virtio2: local-lvm:vm-100-disk-0,size=32G
virtio3: ceph-crssd:vm-100-disk-1,size=32G
virtio4: ceph-dcssd:vm-100-disk-1,size=32G
vmgenid: 936817f8-e536-4609-b164-ba8495fcb85c
 
Try SCSI disks instead of virtio, and add discard so a TRIM inside the VM is passed on to Ceph. Less data for Ceph to read.

Another option is iothread, which allows QEMU to open a thread per disk.
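As a sketch of what that could look like for the first disk above (assuming the virtio0 disk is detached first and the bootdisk setting is updated to scsi0; as far as I know, the virtio-scsi-single controller is needed for iothread to take effect per disk):

qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 ceph-dcssd:vm-100-disk-2,cache=writeback,discard=on,iothread=1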
 
Try SCSI disks instead of virtio, and add discard so a TRIM inside the VM is passed on to Ceph. Less data for Ceph to read.

Another option is iothread, which allows QEMU to open a thread per disk.
I gave SCSI and iothread a try; SCSI did provide better performance than virtio, but neither resolved the 15-second freeze. I am wondering if there are any timeout settings in Ceph that might help with it.
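The closest candidates I have found so far are the OSD heartbeat settings in ceph.conf, though I have not verified whether they matter for a graceful reboot (values shown are the defaults):

[osd]
# how often OSDs ping their peers
osd heartbeat interval = 6
# how long a peer may stay silent before it is reported down
osd heartbeat grace = 20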
 
I had the same issue, but after changing sysctl values in the VM and applying the settings recommended by Alwin, I am not facing the issue now.
 
