[SOLVED] PVE 5.1 Ceph - HA VM start timeout after relocation

Saiki

New Member
Oct 19, 2017
Hi everyone,

I have set up my test cluster with 4 PVE 5.1 nodes, with VM storage on a Ceph SSD pool.

I followed the wiki in order to set up HA.
I created an HA group named cluster, including the 4 nodes with the same priority.
I then enabled HA on a test VM named test-ubuntu with the following configuration (a CLI equivalent is sketched after the list):
  • max restart: 4
  • max relocate: 2
  • group: cluster
  • request state: started
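If it helps, this is roughly how the same resource could be added from the CLI; I actually configured it through the GUI, so treat this as a sketch with the values listed above:

Code:
# add VM 114 as an HA resource (sketch, values as listed above)
ha-manager add vm:114 --state started --group cluster --max_restart 4 --max_relocate 2
# verify that the resource is now managed by HA
ha-manager status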
In order to test this HA configuration, I powered off the node on which the test-ubuntu VM was running.
The Ceph SSD pool is fine; its size and replication rule can tolerate the loss of one node.
I can see that the VM is correctly relocated to another node of the HA group cluster.
However, the test-ubuntu VM cannot be started there due to a timeout.

Code:
task started by HA resource agent
TASK ERROR: start failed: command '/usr/bin/kvm -id 114 -chardev 'socket,id=qmp,path=/var/run/qemu-server/114.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/114.pid -daemonize -smbios 'type=1,uuid=08daf2dc-0689-4210-ae75-8021cac53e50' -name test-ubuntu -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/114.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 4096 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -chardev 'socket,path=/var/run/qemu-server/114.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e3e7c3b43fe4' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=rbd:rbd-ssd/vm-114-disk-1:mon_host=192.168.2.1;192.168.2.2;192.168.2.3;192.168.2.4:auth_supported=cephx:id=admin:keyring=/etc/pve/priv/ceph/ceph-rbd-ssd.keyring,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap114i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=6E:33:C2:AD:0E:FD,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: got timeout

I do not know where this issue could come from.
Can you please give me some clues about this?

Thank you in advance for your help.

Best regards,
Saiki
 
Is your storage properly configured on all nodes (eg. can you access it)? Is your network configured the same on all nodes, same bridge available? Does a manual migration work?

Please use [CODE][/CODE] tags when posting configs/log messages, it keeps the formatting. :)
 
Hi Alwin,

As requested, I edited my previous message with code formatting :)

Is your storage properly configured on all nodes (eg. can you access it)? Is your network configured the same on all nodes, same bridge available? Does a manual migration work?

Yes to all of your questions. The storage and network are properly configured, and I can manually migrate the test-ubuntu VM to every node without any issue.
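For reference, the manual migrations I tested were along these lines (the target node name here is just an example):

Code:
# live-migrate VM 114 to another node of the cluster
qm migrate 114 node3 --online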

Best regards,
Saiki
 
Is this reproducible on all your nodes? The VM's disks are connected to different storages, are all of them accessible? What does your '/etc/pve/storage.cfg' look like?
 
This is my storage.cfg file:

Code:
zfspool: local-zfs
        pool rpool
        content rootdir,images
        nodes anonymized
        sparse 0

dir: local
        path /var/lib/vz
        content images,rootdir,vztmpl,iso
        maxfiles 0

rbd: ceph-rbd-ssd
        content images
        krbd 0
        monhost 192.168.2.1;192.168.2.2;192.168.2.3;192.168.2.4
        pool rbd-ssd
        username admin

rbd: ceph-rbd-hdd
        content images
        krbd 0
        monhost 192.168.2.1;192.168.2.2;192.168.2.3;192.168.2.4
        pool rbd-hdd
        username admin

These are the IP addresses of my PVE Ceph nodes: 192.168.2.1, 192.168.2.2, 192.168.2.3, 192.168.2.4.

The Ceph pools are accessible from every node.
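For reference, this is roughly how I verified it on each node (pool name as in the config above):

Code:
# overall cluster health
ceph -s
# list the RBD images in the SSD pool
rbd -p rbd-ssd ls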
 
Can you reproduce it? Do you see something in the kernel.log/syslog?
 
Hi Alwin,

Around 12:46, I powered off the node on which the test-ubuntu VM (ID 114) was running, and it was relocated.
Please find the relevant log excerpts below.

kernel.log
Code:
Nov 27 12:48:40 node2 pve-ha-lrm[21325]: <root@pam> starting task UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam:
Nov 27 12:48:41 node2 kernel: [1648249.707860] device tap114i0 entered promiscuous mode
Nov 27 12:48:41 node2 kernel: [1648249.723039] vmbr4v30: port 2(tap114i0) entered blocking state
Nov 27 12:48:41 node2 kernel: [1648249.723041] vmbr4v30: port 2(tap114i0) entered disabled state
Nov 27 12:48:41 node2 kernel: [1648249.723163] vmbr4v30: port 2(tap114i0) entered blocking state
Nov 27 12:48:41 node2 kernel: [1648249.723165] vmbr4v30: port 2(tap114i0) entered forwarding state
Nov 27 12:49:10 node2 pve-ha-lrm[21325]: <root@pam> end task UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam: start failed: command '/usr/bin/kvm -id 114 -chardev 'socket,id=qmp,path=/var/run/qemu-server/114.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/114.pid -daemonize -smbios 'type=1,uuid=08daf2dc-0689-4210-ae75-8021cac53e50' -name test-ubuntu -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/114.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 4096 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -chardev 'socket,path=/var/run/qemu-server/114.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e3e7c3b43fe4' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=rbd:rbd-ssd/vm-114-disk-1:mon_host=192.168.2.1;192.168.2.2;192.168.2.3;192.168.2.4:auth_supported=cephx:id=admin:keyring=/etc/pve/priv/ceph/ceph-rbd-ssd.keyring,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap114i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=6E:33:C2:AD:0E:FD,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: got timeout

syslog
Code:
Nov 27 12:46:25 node2 corosync[2528]: notice  [TOTEM ] A processor failed, forming new configuration.
Nov 27 12:46:25 node2 corosync[2528]:  [TOTEM ] A processor failed, forming new configuration.
Nov 27 12:46:26 node2 corosync[2528]: notice  [TOTEM ] A new membership (192.168.1.1:316) was formed. Members left: 2
Nov 27 12:46:26 node2 corosync[2528]: notice  [TOTEM ] Failed to receive the leave message. failed: 2
Nov 27 12:46:26 node2 corosync[2528]:  [TOTEM ] A new membership (192.168.1.1:316) was formed. Members left: 2
Nov 27 12:46:26 node2 corosync[2528]:  [TOTEM ] Failed to receive the leave message. failed: 2
Nov 27 12:46:26 node2 pmxcfs[2418]: [dcdb] notice: members: 1/3578, 3/2418, 4/2476
Nov 27 12:46:26 node2 pmxcfs[2418]: [dcdb] notice: starting data syncronisation
Nov 27 12:46:26 node2 pmxcfs[2418]: [status] notice: members: 1/3578, 3/2418, 4/2476
Nov 27 12:46:26 node2 pmxcfs[2418]: [status] notice: starting data syncronisation
Nov 27 12:46:26 node2 corosync[2528]: notice  [QUORUM] Members[3]: 1 3 4
Nov 27 12:46:26 node2 corosync[2528]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 12:46:26 node2 corosync[2528]:  [QUORUM] Members[3]: 1 3 4
Nov 27 12:46:26 node2 corosync[2528]:  [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 12:46:26 node2 pmxcfs[2418]: [dcdb] notice: received sync request (epoch 1/3578/00000015)
Nov 27 12:46:26 node2 pmxcfs[2418]: [status] notice: received sync request (epoch 1/3578/00000015)
Nov 27 12:46:26 node2 pmxcfs[2418]: [dcdb] notice: received all states
Nov 27 12:46:26 node2 pmxcfs[2418]: [dcdb] notice: leader is 1/3578
Nov 27 12:46:26 node2 pmxcfs[2418]: [dcdb] notice: synced members: 1/3578, 3/2418, 4/2476
Nov 27 12:46:26 node2 pmxcfs[2418]: [dcdb] notice: all data is up to date
Nov 27 12:46:26 node2 pmxcfs[2418]: [dcdb] notice: dfsm_deliver_queue: queue length 6
Nov 27 12:46:26 node2 pmxcfs[2418]: [status] notice: received all states
Nov 27 12:46:26 node2 pmxcfs[2418]: [status] notice: all data is up to date
Nov 27 12:46:26 node2 pmxcfs[2418]: [status] notice: dfsm_deliver_queue: queue length 23
Nov 27 12:46:28 node2 pmxcfs[2418]: [status] notice: received log
Nov 27 12:46:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:46:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:46:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:46:39 node2 pvestatd[2664]: status update time (5.719 seconds)
Nov 27 12:46:41 node2 ceph-osd[2763]: 2017-11-27 12:46:41.340705 7f2457bb6700 -1 osd.4 332 heartbeat_check: no reply from 192.168.2.3:6815 osd.6 since back 2017-11-27 12:46:21.046948 front 2017-11-27 12:46:21.046948 (cutoff 2017-11-27 12:46:21.340704)
Nov 27 12:46:41 node2 ceph-osd[2763]: 2017-11-27 12:46:41.340721 7f2457bb6700 -1 osd.4 332 heartbeat_check: no reply from 192.168.2.3:6811 osd.7 since back 2017-11-27 12:46:21.046948 front 2017-11-27 12:46:21.046948 (cutoff 2017-11-27 12:46:21.340704)
Nov 27 12:46:41 node2 ceph-osd[2763]: 2017-11-27 12:46:41.340725 7f2457bb6700 -1 osd.4 332 heartbeat_check: no reply from 192.168.2.3:6807 osd.8 since back 2017-11-27 12:46:21.046948 front 2017-11-27 12:46:21.046948 (cutoff 2017-11-27 12:46:21.340704)
Nov 27 12:47:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Nov 27 12:47:00 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:47:00 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:47:00 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:47:01 node2 systemd[1]: Started Proxmox VE replication runner.
Nov 27 12:47:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:47:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:47:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:48:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Nov 27 12:48:00 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:48:00 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:48:00 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:48:01 node2 systemd[1]: Started Proxmox VE replication runner.
Nov 27 12:48:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:48:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:48:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:48:40 node2 pve-ha-lrm[21325]: starting service vm:114
Nov 27 12:48:40 node2 pve-ha-lrm[21325]: <root@pam> starting task UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam:
Nov 27 12:48:40 node2 pve-ha-lrm[21326]: start VM 114: UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam:
Nov 27 12:48:40 node2 systemd[1]: Started 114.scope.
Nov 27 12:48:40 node2 systemd-udevd[21335]: Could not generate persistent MAC address for tap114i0: No such file or directory
Nov 27 12:48:41 node2 kernel: [1648249.707860] device tap114i0 entered promiscuous mode
Nov 27 12:48:41 node2 kernel: [1648249.723039] vmbr4v30: port 2(tap114i0) entered blocking state
Nov 27 12:48:41 node2 kernel: [1648249.723041] vmbr4v30: port 2(tap114i0) entered disabled state
Nov 27 12:48:41 node2 kernel: [1648249.723163] vmbr4v30: port 2(tap114i0) entered blocking state
Nov 27 12:48:41 node2 kernel: [1648249.723165] vmbr4v30: port 2(tap114i0) entered forwarding state
Nov 27 12:48:45 node2 pve-ha-lrm[21325]: Task 'UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam:' still active, waiting
Nov 27 12:48:50 node2 pve-ha-lrm[21325]: Task 'UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam:' still active, waiting
Nov 27 12:48:55 node2 pve-ha-lrm[21325]: Task 'UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam:' still active, waiting
Nov 27 12:49:00 node2 pve-ha-lrm[21325]: Task 'UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam:' still active, waiting
Nov 27 12:49:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Nov 27 12:49:00 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:49:00 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:49:00 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:49:01 node2 systemd[1]: Started Proxmox VE replication runner.
Nov 27 12:49:05 node2 pve-ha-lrm[21325]: Task 'UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam:' still active, waiting
Nov 27 12:49:10 node2 pve-ha-lrm[21325]: Task 'UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam:' still active, waiting
Nov 27 12:49:10 node2 pve-ha-lrm[21326]: start failed: command '/usr/bin/kvm -id 114 -chardev 'socket,id=qmp,path=/var/run/qemu-server/114.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/114.pid -daemonize -smbios 'type=1,uuid=08daf2dc-0689-4210-ae75-8021cac53e50' -name test-ubuntu -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/114.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 4096 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -chardev 'socket,path=/var/run/qemu-server/114.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e3e7c3b43fe4' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=rbd:rbd-ssd/vm-114-disk-1:mon_host=192.168.2.1;192.168.2.2;192.168.2.3;192.168.2.4:auth_supported=cephx:id=admin:keyring=/etc/pve/priv/ceph/ceph-rbd-ssd.keyring,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap114i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=6E:33:C2:AD:0E:FD,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: got timeout
Nov 27 12:49:10 node2 pve-ha-lrm[21325]: <root@pam> end task UPID:node2:0000534E:09D30DB1:5A1BFB98:qmstart:114:root@pam: start failed: command '/usr/bin/kvm -id 114 -chardev 'socket,id=qmp,path=/var/run/qemu-server/114.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/114.pid -daemonize -smbios 'type=1,uuid=08daf2dc-0689-4210-ae75-8021cac53e50' -name test-ubuntu -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/114.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 4096 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -chardev 'socket,path=/var/run/qemu-server/114.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e3e7c3b43fe4' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=rbd:rbd-ssd/vm-114-disk-1:mon_host=192.168.2.1;192.168.2.2;192.168.2.3;192.168.2.4:auth_supported=cephx:id=admin:keyring=/etc/pve/priv/ceph/ceph-rbd-ssd.keyring,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap114i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=6E:33:C2:AD:0E:FD,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: got timeout
Nov 27 12:49:10 node2 pve-ha-lrm[21325]: service status vm:114 started
Nov 27 12:49:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:49:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:49:30 node2 snmpd[15382]: error on subcontainer 'ia_addr' insert (-1)
Nov 27 12:50:00 node2 systemd[1]: Starting Proxmox VE replication runner...

ceph.log
Code:
2017-11-27 12:46:33.519034 mon.node2 mon.1 192.168.2.2:6789/0 3825 : cluster [INF] mon.node2 calling new monitor election
2017-11-27 12:46:33.541339 mon.node4 mon.3 192.168.2.4:6789/0 1179 : cluster [INF] mon.node4 calling new monitor election
2017-11-27 12:46:33.602046 mon.node1 mon.0 192.168.2.1:6789/0 26181 : cluster [INF] mon.node1 calling new monitor election
2017-11-27 12:46:38.664869 mon.node1 mon.0 192.168.2.1:6789/0 26182 : cluster [INF] mon.node1@0 won leader election with quorum 0,1,3
2017-11-27 12:46:38.672979 mon.node1 mon.0 192.168.2.1:6789/0 26183 : cluster [INF] overall HEALTH_OK
2017-11-27 12:46:39.403792 mon.node1 mon.0 192.168.2.1:6789/0 26184 : cluster [INF] mon.3 192.168.2.4:6789/0
2017-11-27 12:46:39.453081 mon.node1 mon.0 192.168.2.1:6789/0 26185 : cluster [INF] mon.1 192.168.2.2:6789/0
2017-11-27 12:46:39.509138 mon.node1 mon.0 192.168.2.1:6789/0 26186 : cluster [INF] monmap e4: 4 mons at {node1=192.168.2.1:6789/0,node3=192.168.2.3:6789/0,node2=192.168.2.2:6789/0,node4=192.168.2.4:6789/0}
2017-11-27 12:46:40.417497 mon.node1 mon.0 192.168.2.1:6789/0 26194 : cluster [WRN] Health check failed: 1/4 mons down, quorum node1,node2,node4 (MON_DOWN)
2017-11-27 12:46:41.051740 mon.node1 mon.0 192.168.2.1:6789/0 26207 : cluster [INF] osd.6 failed (root=default,datacenter=sbg,host=node3) (2 reporters from different host after 20.000134 >= grace 20.000000)
2017-11-27 12:46:41.051920 mon.node1 mon.0 192.168.2.1:6789/0 26209 : cluster [INF] osd.7 failed (root=default,datacenter=sbg,host=node3) (2 reporters from different host after 20.000321 >= grace 20.000000)
2017-11-27 12:46:41.052043 mon.node1 mon.0 192.168.2.1:6789/0 26211 : cluster [INF] osd.8 failed (root=default,datacenter=sbg,host=node3) (2 reporters from different host after 20.000418 >= grace 20.000000)
2017-11-27 12:46:41.052195 mon.node1 mon.0 192.168.2.1:6789/0 26213 : cluster [INF] osd.14 failed (root=default,datacenter=sbg,host=node3) (2 reporters from different host after 20.000490 >= grace 20.000000)
2017-11-27 12:46:41.647680 mon.node1 mon.0 192.168.2.1:6789/0 26223 : cluster [WRN] Health check failed: 4 osds down (OSD_DOWN)
2017-11-27 12:46:41.647722 mon.node1 mon.0 192.168.2.1:6789/0 26224 : cluster [WRN] Health check failed: 1 host (4 osds) down (OSD_HOST_DOWN)
2017-11-27 12:46:43.921063 mon.node1 mon.0 192.168.2.1:6789/0 26247 : cluster [WRN] Health check failed: Reduced data availability: 28 pgs peering (PG_AVAILABILITY)
2017-11-27 12:46:43.921092 mon.node1 mon.0 192.168.2.1:6789/0 26248 : cluster [WRN] Health check failed: Degraded data redundancy: 28 pgs unclean (PG_DEGRADED)
2017-11-27 12:46:49.217957 mon.node1 mon.0 192.168.2.1:6789/0 26249 : cluster [WRN] Health check update: Reduced data availability: 28 pgs inactive (PG_AVAILABILITY)

I have a feeling the issue comes from the storage. Indeed, after I gave Ceph some time to recover, the test-ubuntu VM could be started. This is strange, because I set up the pool with a replication rule at the datacenter level: the ceph-rbd-ssd pool has a size of 2 and uses the replicated-ssd rule (see the CRUSH map below).
The other VMs on that pool are stuck too.

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host node1 {
    id -3        # do not change unnecessarily
    id -4 class ssd        # do not change unnecessarily
    id -15 class hdd        # do not change unnecessarily
    # weight 8.077
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 0.873
    item osd.1 weight 0.873
    item osd.2 weight 0.873
    item osd.12 weight 5.458
}
host node2 {
    id -5        # do not change unnecessarily
    id -6 class ssd        # do not change unnecessarily
    id -16 class hdd        # do not change unnecessarily
    # weight 8.077
    alg straw2
    hash 0    # rjenkins1
    item osd.3 weight 0.873
    item osd.4 weight 0.873
    item osd.5 weight 0.873
    item osd.13 weight 5.458
}
datacenter rbx {
    id -11        # do not change unnecessarily
    id -14 class ssd        # do not change unnecessarily
    id -17 class hdd        # do not change unnecessarily
    # weight 16.155
    alg straw2
    hash 0    # rjenkins1
    item node1 weight 8.077
    item node2 weight 8.077
}
host node3 {
    id -7        # do not change unnecessarily
    id -8 class ssd        # do not change unnecessarily
    id -18 class hdd        # do not change unnecessarily
    # weight 8.077
    alg straw2
    hash 0    # rjenkins1
    item osd.6 weight 0.873
    item osd.7 weight 0.873
    item osd.8 weight 0.873
    item osd.14 weight 5.458
}
host node4 {
    id -9        # do not change unnecessarily
    id -10 class ssd        # do not change unnecessarily
    id -19 class hdd        # do not change unnecessarily
    # weight 8.077
    alg straw2
    hash 0    # rjenkins1
    item osd.9 weight 0.873
    item osd.10 weight 0.873
    item osd.11 weight 0.873
    item osd.15 weight 5.458
}
datacenter sbg {
    id -12        # do not change unnecessarily
    id -13 class ssd        # do not change unnecessarily
    id -20 class hdd        # do not change unnecessarily
    # weight 16.155
    alg straw2
    hash 0    # rjenkins1
    item node3 weight 8.077
    item node4 weight 8.077
}
root default {
    id -1        # do not change unnecessarily
    id -2 class ssd        # do not change unnecessarily
    id -21 class hdd        # do not change unnecessarily
    # weight 32.310
    alg straw2
    hash 0    # rjenkins1
    item rbx weight 16.155
    item sbg weight 16.155
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule replicated-ssd {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type datacenter
    step emit
}
rule replicated-hdd {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type datacenter
    step emit
}

# end crush map

Best regards,
Saiki
 
ceph-rbd-ssd pool has a size of 2
And what is the min_size? It could be that the cluster hits the min_size for the pool.

Is that datacenter level only logical, or did you create the cluster spanning two different datacenters?
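You can check the pool settings and the current health with something along these lines (replace the pool name with yours):

Code:
# current replication settings of the pool
ceph osd pool get rbd-ssd size
ceph osd pool get rbd-ssd min_size
# which PGs/pools are currently affected
ceph health detail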
 
Hello Alwin,

Thank you very much for your help!

Is that datacenter level only logical, or did you create the cluster spanning two different datacenters?
No, the datacenter level is not just logical, it reflects the physical locations.
The addition of another PVE node in a third datacenter is planned, in order to always have 3 running PVE nodes in case of a datacenter disaster ;)

The issue was indeed the min_size of the pool, which is 2 by default. With size 2 and one replica per datacenter, powering off a node left some PGs with only one available copy, which is below min_size, so I/O on the pool was blocked and the VM could not start.
I have now set min_size to 1, and VM relocation after a datacenter failure works perfectly fine :)
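For reference, the change was along these lines (pool name as in my storage.cfg):

Code:
# allow client I/O as long as at least one replica is available
ceph osd pool set rbd-ssd min_size 1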

Best regards,
Saiki
 
