Recent content by Le PAH

  1. [SOLVED] Incoherent VM status following HA failover (bug?)

    Hello, the flapping interface issue has been resolved. It was a misconfiguration of the interfaces, together with a ring-redundancy problem in corosync, both now fixed. The HA problem persists though, and the syslog of the node that tries to start the HA VM shows possible problems on the...
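The ring-redundancy fix mentioned here would live in the totem section of corosync.conf. A minimal sketch follows, assuming a hypothetical two-ring layout; the addresses and the rrp_mode value are illustrative, not taken from this cluster:

```
# /etc/corosync/corosync.conf (excerpt) -- illustrative two-ring setup
totem {
  version: 2
  cluster_name: pve-cluster
  # passive redundant-ring mode: ring 1 takes over if ring 0 fails
  rrp_mode: passive
  interface {
    ringnumber: 0
    bindnetaddr: 10.0.0.0
  }
  interface {
    ringnumber: 1
    bindnetaddr: 192.168.1.0
  }
}
```

With corosync 2.x (as shipped in PVE 5), `corosync-cfgtool -s` reports the status of each ring, which helps spot a flapping interface like the one marked FAULTY in the logs below.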
  2. [SOLVED] Incoherent VM status following HA failover (bug?)

    Here is the full syslog between the moment when the initial host is stopped and when it is back on: Mar 27 11:10:32 srv-pve3 corosync[2197]: error [TOTEM ] Marking ringid 1 interface 192.168.1.103 FAULTY Mar 27 11:10:32 srv-pve3 corosync[2197]: [TOTEM ] Marking ringid 1 interface...
  3. [SOLVED] Incoherent VM status following HA failover (bug?)

    Hello, You might have found the culprit. On all nodes, one of the interfaces flaps: Mar 27 10:46:16 srv-pve2 corosync[2333]: error [TOTEM ] Marking ringid 1 interface 192.168.1.102 FAULTY Mar 27 10:46:16 srv-pve2 corosync[2333]: [TOTEM ] Marking ringid 1 interface 192.168.1.102 FAULTY Mar...
  4. [SOLVED] Incoherent VM status following HA failover (bug?)

    That is indeed what I do. Regarding the syslog, I don't see anything suspicious.
  5. [SOLVED] Incoherent VM status following HA failover (bug?)

    For more information: the cluster sometimes fails to start a VM when migrating it between two nodes: () task started by HA resource agent 2019-03-26 10:45:25 use dedicated network address for sending migration traffic (10.0.0.102) 2019-03-26 10:45:26 starting migration of VM 106 to node 'srv-pve2'...
  6. [SOLVED] Incoherent VM status following HA failover (bug?)

    Here you go: root@srv-pve3:~# cat /etc/pve/qemu-server/106.conf #Serveur utilisé pour stocker une image des dépôts Debian. agent: 1 balloon: 1024 bootdisk: scsi0 cores: 1 ide2: none,media=cdrom memory: 2048 name: SRV-APT-REPO net0: virtio=56:B4:5A:55:79:08,bridge=vmbr0,rate=1.4...
  7. [SOLVED] Incoherent VM status following HA failover (bug?)

    Thanks for your reply :) The qm monitor is not successful: root@srv-pve3:~# qm monitor 106 Entering Qemu Monitor for VM 106 - type 'help' for help qm> stop ERROR: VM 106 qmp command 'human-monitor-command' failed - unable to connect to VM 106 qmp socket - timeout after 31 retries Trying to...
  8. [SOLVED] Incoherent VM status following HA failover (bug?)

    I've got the exact same issue for a different VM on a different HA Group for a different node: () task started by HA resource agent TASK ERROR: start failed: command '/usr/bin/kvm -id 106 -name SRV-APT-REPO -chardev 'socket,id=qmp,path=/var/run/qemu-server/106.qmp,server,nowait' -mon...
  9. [SOLVED] Incoherent VM status following HA failover (bug?)

    Hello all, I'm experimenting with HA clustering and I ran into a possible bug. A VM is configured with an HA group so that it preferably runs on node 3: root@srv-pve1:~# cat /etc/pve/ha/groups.cfg group: PVE3_First comment PVE3 is prefered for this group nodes srv-pve3 nofailback 0...
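The groups.cfg quoted here is truncated. For context, a complete HA group entry in that file generally looks like the sketch below; the values and the optional node priorities are illustrative, not the original file:

```
# /etc/pve/ha/groups.cfg -- illustrative example, not the original file
group: PVE3_First
        comment Preferred node for this group is srv-pve3
        nodes srv-pve3:2,srv-pve2:1
        nofailback 0
        restricted 0
```

With nofailback 0, the resource migrates back to the highest-priority node once it comes online again, which matters when interpreting the failover behaviour discussed in this thread.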
  10. Live migration of VM fails (sometimes…)

    Here you go: root@srv-pve1:~# pveversion -v proxmox-ve: 5.3-1 (running kernel: 4.15.18-12-pve) pve-manager: 5.3-11 (running version: 5.3-11/d4907f84) pve-kernel-4.15: 5.3-3 pve-kernel-4.15.18-12-pve: 4.15.18-35 pve-kernel-4.15.18-11-pve: 4.15.18-34 pve-kernel-4.15.18-10-pve: 4.15.18-32...
  11. Live migration of VM fails (sometimes…)

    Hi all, VM migration between nodes fails intermittently: 2019-03-19 13:59:14 use dedicated network address for sending migration traffic (10.0.0.102) 2019-03-19 13:59:14 starting migration of VM 108 to node 'srv-pve2' (10.0.0.102) 2019-03-19 13:59:14 copying disk images 2019-03-19...
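The "dedicated network address for sending migration traffic" line in this log corresponds to the migration setting in /etc/pve/datacenter.cfg. A minimal sketch, assuming the 10.0.0.0/24 network seen in the log is the intended migration network:

```
# /etc/pve/datacenter.cfg (excerpt) -- hypothetical migration settings
# route migration traffic over the dedicated 10.0.0.0/24 network, encrypted
migration: secure,network=10.0.0.0/24
```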
  12. [SOLVED] VM Migration on a cluster running Ceph

    Thank you for the feedback. I'll try HA in the next few days.
  13. [SOLVED] VM Migration on a cluster running Ceph

    Okay, great, that makes sense! I've got another question, related to the previous subject. How should I configure Proxmox if I want a VM to stay available at all times, even if the underlying node goes down? Replication on the host doesn't seem to work with Ceph storage (no replicable...