3.4: All VMs stopped under heavy Ceph reorg

stefws

Renowned Member
Jan 29, 2015
Denmark
siimnet.dk
I had two Ceph pools for RBD virtual disks: vm_images (boot HDD images) and rbd_data (extra HDD images).


Then, while I was adding pools for a RADOS Gateway (.rgw.*), ceph health suddenly reported that my vm_images pool had too few PGs, so I ran:


ceph osd pool set vm_images pg_num <larger_number>
ceph osd pool set vm_images pgp_num <larger_number>
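

For reference, this is roughly what I will probably do next time before touching pg_num, assuming the usual osd_max_backfills / osd_recovery_max_active options apply on this release, to soften the I/O hit of the reshuffle:

# Check the current placement group counts first
ceph osd pool get vm_images pg_num
ceph osd pool get vm_images pgp_num

# Throttle backfill/recovery at runtime so the reshuffle competes less with client I/O
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'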


The pg_num/pgp_num bump kicked off about 20 minutes of rebalancing with a lot of I/O in the Ceph cluster. Eventually the cluster was healthy again, but almost all my PVE VMs had ended up in the stopped state. Wondering why, a watchdog thingy maybe...
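
In case it helps with diagnosing, this is roughly what I plan to check on the nodes (the log location is just where I would expect it on PVE 3.4, so treat that as an assumption):

# List VM states on this node
qm list

# Look for clues around the rebalance window: killed KVM processes,
# storage/QMP timeouts, or watchdog activity
grep -iE 'kvm|qmp|watchdog' /var/log/syslog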

/Steffen

PS! I admit my Ceph public and cluster networks share the same physical 2-3 Gbps LACP load-balanced network (some nodes with 2x1 Gbps NICs, some with 3x1 Gbps NICs), since my only other physical network is a slow 100 Mbps public network.
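
For completeness: once I get a second fast network, splitting the two should just be a matter of separate subnets in ceph.conf; the subnets below are made up for illustration:

[global]
    # client/monitor traffic
    public network = 192.168.10.0/24
    # OSD replication/backfill traffic on its own network
    cluster network = 192.168.20.0/24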