[code]
# pveversion -v
proxmox-ve: 4.4-79 (running kernel: 4.4.35-2-pve)
pve-manager: 4.4-12 (running version: 4.4-12/e71b7a74)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.35-2-pve: 4.4.35-79
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-108
pve-firmware: 1.1-10
libpve-common-perl: 4.0-91
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-73
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-1
pve-container: 1.0-93
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-1
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve14~bpo80
ceph: 10.2.5-1~bpo80+1
[/code]
We have a Ceph cluster.
A minute after restarting one node, at least 3 key KVM guests panicked.
Screenshot attached.
The KVM guests are on 2 different nodes.
[code]
boot: cn
bootdisk: scsi0
cores: 2
memory: 1024
name: fbcadmin
net0: virtio=DE:60:C3:F6:55:23,bridge=vmbr1
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: ceph-kvm3:vm-100-disk-1,discard=on,size=8G
smbios1: uuid=195cf837-ebaa-49c2-95e9-5ba7a0869cb0
sockets: 1
...
[/code]
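For reference, a config dump like the one above can be produced on the hosting node with `qm config`; VMID 100 is assumed here from the vm-100-disk-1 volume name:
[code]
# print the configuration of VM 100 on the node that hosts it
qm config 100
[/code]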
Hi Rob, after some research: since I have 8 nodes, I'll try using 5 for OSDs and 3 for VMs. I am not sure yet where to place the 3 mons.
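Whichever nodes end up hosting the monitors, a minimal sketch of creating one mon per chosen node with the PVE 4.x tooling (assuming pveceph is already initialized on those nodes) would be:
[code]
# run once on each node that should host a Ceph monitor
pveceph createmon
[/code]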
Answering myself: there is an issue where the OSD stop is not recognized early enough, because the mon also dies too fast. If you restart a node after shutting down the ceph-osd services first, the VMs see approx. 20 s less IO stall (see the sketch below).
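A minimal sketch of that reboot procedure, assuming the systemd units shipped with Ceph Jewel on PVE 4.4; the noout flag is an extra precaution not mentioned above:
[code]
# keep CRUSH from rebalancing while the node is down (optional precaution)
ceph osd set noout
# stop all OSDs on this node first, so their peers mark them down immediately
systemctl stop ceph-osd.target
reboot
# after the node is back up and its OSDs have rejoined
ceph osd unset noout
[/code]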
Make sure that you use the VirtIO SCSI controller (not LSI); see the VM options. I remember some panics when using LSI recently, but I did not debug them further, as a modern OS should use virtio-scsi anyway.
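A minimal sketch of switching an existing VM to the VirtIO SCSI controller (VMID 100 assumed; the change takes effect on the next VM start):
[code]
# switch the SCSI controller model for VM 100 to VirtIO SCSI
qm set 100 --scsihw virtio-scsi-pci
[/code]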
Hi Rob,
I'm not sure if this helps with this issue, but I had a separate Ceph cluster (8 nodes) where the mons ran on the PVE nodes.
So I would run the mons on the VM nodes.
Was the restarted node an OSD+mon node? There is an issue where the OSD stop is not recognized early enough, because the mon also dies too fast. If you restart a node and shut down the ceph-osd services first, the VMs have approx. 20 s less IO stall.
Normally the VMs should handle short IO stalls without trouble, but perhaps not?! (I don't know whether discard is also a problem in this case.)
Udo
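One guest-side mitigation for short IO stalls, as a sketch under the assumption of a Linux guest whose disk shows up as /dev/sda (not something suggested above): raising the SCSI command timeout so the guest tolerates a longer stall before erroring out.
[code]
# inside the guest: raise the SCSI timeout for sda from the default 30 s to 180 s
echo 180 > /sys/block/sda/device/timeout
[/code]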