Proxmox VE 4.0 beta2 with glusterfs: can't live-migrate VM's

mircsicz

Hi all,

I set this up a few days ago, so I used the beta1 ISO. I've had this issue ever since I set up the machines.

I have two NUCs as my test environment, both with the same setup and hardware specs:

root@pve01:~# cat /proc/meminfo
Code:
MemTotal:       16345196 kB
MemFree:        14776328 kB
MemAvailable:   14821328 kB

root@pve01:~# cat /proc/cpuinfo
Code:
processor    : 3
vendor_id    : GenuineIntel
cpu family    : 6
model        : 58
model name    : Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz
stepping    : 9
microcode    : 0x1b
cpu MHz        : 2398.558
cache size    : 3072 KB
physical id    : 0
siblings    : 4
core id        : 1
cpu cores    : 2
apicid        : 3
initial apicid    : 3
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bugs        :
bogomips    : 4589.63
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 48 bits virtual
power management:

root@pve02:~# pveversion -v
Code:
proxmox-ve: 4.0-10 (running kernel: 4.2.0-1-pve)
pve-manager: 4.0-36 (running version: 4.0-36/9815097f)
pve-kernel-4.2.0-1-pve: 4.2.0-10
lvm2: 2.02.116-pve1
corosync-pve: 2.3.4-2
libqb0: 0.17.1-3
pve-cluster: 4.0-17
qemu-server: 4.0-23
pve-firmware: 1.1-7
libpve-common-perl: 4.0-20
libpve-access-control: 4.0-8
libpve-storage-perl: 4.0-21
pve-libspice-server1: 0.12.5-1
vncterm: 1.2-1
pve-qemu-kvm: 2.4-5
pve-container: 0.9-18
pve-firewall: 2.0-11
pve-ha-manager: 1.0-5
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.3-1
lxcfs: 0.9-pve2
cgmanager: 0.37-pve2
criu: 1.6.0-1
zfsutils: 0.6.4-pve3~jessie

The error I see when trying to live-migrate a VM is:
Code:
Sep 10 18:34:29 starting migration of VM 101 to node 'pve01' (10.10.10.31)
Sep 10 18:34:29 copying disk images
Sep 10 18:34:29 starting VM 101 on remote node 'pve01'
Sep 10 18:34:31 starting ssh migration tunnel
Sep 10 18:34:31 starting online/live migration on localhost:60000
Sep 10 18:34:31 migrate_set_speed: 8589934592
Sep 10 18:34:31 migrate_set_downtime: 0.1
Sep 10 18:34:33 migration status: active (transferred 244662773, remaining 798330880), total 1082793984)
Sep 10 18:34:35 migration status: active (transferred 477656869, remaining 499875840), total 1082793984)
Sep 10 18:34:37 ERROR: online migrate failure - aborting
Sep 10 18:34:37 aborting phase 2 - cleanup resources
Sep 10 18:34:37 migrate_cancel
Sep 10 18:34:38 ERROR: migration finished with problems (duration 00:00:09)
TASK ERROR: migration problems

While running "qm migrate 101 pve01 --online" I get these errors in the syslog:
Code:
Sep 10 19:00:42 pve02 qm[6386]: <root@pam> starting task UPID:pve02:000018F3:0002CECD:55F1B73A:qmigrate:101:root@pam:
Sep 10 19:00:43 pve02 pmxcfs[960]: [status] notice: received log
Sep 10 19:00:43 pve02 pmxcfs[960]: [status] notice: received log
Sep 10 19:00:50 pve02 pmxcfs[960]: [status] notice: received log
Sep 10 19:00:50 pve02 pmxcfs[960]: [status] notice: received log
Sep 10 19:00:50 pve02 qm[6387]: migration problems
Sep 10 19:00:50 pve02 qm[6386]: <root@pam> end task UPID:pve02:000018F3:0002CECD:55F1B73A:qmigrate:101:root@pam: migration problems
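
If anyone wants the raw task log, it should be retrievable from disk with something like this (assuming I understand the layout right: PVE keeps one log file per UPID under /var/log/pve/tasks):
Code:
# dump the full log of the failed qmigrate task for VM 101
find /var/log/pve/tasks -name '*qmigrate:101*' -exec cat {} \;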

Cluster status seems to be fine:
root@pve01:~# pvecm status
Code:
Quorum information
------------------
Date:             Thu Sep 10 18:50:49 2015
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          60
Quorate:          Yes


Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate


Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.10.31 (local)
0x00000002          1 10.10.10.32

root@pve02:~# qm config 101
Code:
balloon: 512
bootdisk: virtio0
cores: 2
ide2: isos:iso/pmagic_2014_09_29.iso,media=cdrom
memory: 1024
name: tst
net0: virtio=E6:C5:9D:74:81:CB,bridge=vmbr0
numa: 0
ostype: l26
smbios1: uuid=c4caba6e-e35c-4508-bf79-a0bd17c2a29f
sockets: 1
virtio0: disks:101/vm-101-disk-1.raw,size=6G

The behaviour is the same with another VM using a qcow2 image...

root@pve02:~# cat /etc/pve/storage.cfg
Code:
dir: local
    path /var/lib/vz
    maxfiles 0
    shared
    content vztmpl


glusterfs: isos
    path /mnt/pve/isos
    volume isos
    content iso
    maxfiles 1
    server pve01.one.lan
    server2 pve02.one.lan


glusterfs: disks
    path /mnt/pve/disks
    volume disks
    content rootdir,images
    maxfiles 1
    server pve01.one.lan
    server2 pve02.one.lan

ZFS is in good condition too:

root@pve02:~# zpool list
Code:
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool  74.5G  2.52G  72.0G         -     1%     3%  1.00x  ONLINE  -

root@pve02:~# zfs list
Code:
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             12.3G  59.8G   470M  /rpool
rpool/ROOT        1.60G  59.8G    96K  /rpool/ROOT
rpool/ROOT/pve-1  1.60G  59.8G  1.60G  /
rpool/disks        720K  45.0G   720K  /rpool/disks
rpool/isos         470M  14.5G   470M  /rpool/isos
rpool/swap        9.83G  69.7G    64K  -

I can't see a problem. Is this a known bug, or is my setup faulty?
 
No known bug, AFAIK, and I have successfully live-migrated a VM located on GlusterFS storage.
But note, I have the GlusterFS on two different servers and no ZFS on any machine, FYI.

I think gluster shouldn't be the (single) cause of the problem.
What OS is the guest?
 
Hi Thomas,

thanks for your reply!

The guest is only running a live CD, Parted Magic ATM... I'll set up a Jessie VM and report how it goes!

Best
Mirco
 
While installing Jessie 8.2 from a freshly downloaded netinst ISO I got this:

[Screenshot: Bildschirmfoto 2015-09-11 um 13.23.43.png]

root@pve02:~# qm config 100
Code:
balloon: 512
bootdisk: virtio0
cores: 2
ide2: isos:iso/debian-8.2.0-amd64-netinst.iso,media=cdrom,size=247M
memory: 1024
name: mon
net0: virtio=4A:00:B5:3E:44:D4,bridge=vmbr0
numa: 0
ostype: l26
smbios1: uuid=09598e03-61e1-4a03-8fe8-a80929b9ceda
sockets: 1
virtio0: disks:100/vm-100-disk-1.qcow2,size=6G
 
Hmm, can you write to the 'disks' export from your host?
What happens when you try to dd a file?

Code:
dd if=/dev/zero of=/gluster/mount/path/sometestfile.zero bs=1024 count=100k
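
It might also be worth checking the volume health on one of the nodes, roughly like this (a sketch; volume name taken from your storage.cfg):
Code:
gluster volume info disks       # brick layout and volume options
gluster volume status disks     # are all bricks online?
gluster volume heal disks info  # any files pending self-heal?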
 
That looks as expected:

Code:
root@pve02:~# dd if=/dev/zero of=/mnt/pve/disks/images/100/test.zero bs=1024 count=100k
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 3.80664 s, 27.5 MB/s

Code:
root@pve02:~# mount|grep gluster
pve01.one.lan:disks on /mnt/pve/disks type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
pve01.one.lan:isos on /mnt/pve/isos type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
 
After setting up a CentOS VM and testing, I realized it was my fault; it seems it's not possible to migrate a VM running from a live CD...

I've successfully done online migrations with a CentOS VM now! But if I add that VM to the HA group I created, the online migration fails!

How can I debug that?
 
Live migration with a live ISO works for me, as long as all components of the VM (ISO, any drives) are on shared storage.
I did it with a Fedora live CD and NFS shared storage (like you: one share for the ISOs and one for the drive). I still have to test it with GlusterFS, but at the moment I see no reason for it to fail.
[Screenshot: pve_live-distro_live-migration1.jpg]

Did you manage to install CentOS on the VM? What was the problem, out of interest?

Note: I saw that you 'only' have two nodes. HA with two nodes is not possible! You need at least three nodes for it to work; this has technical reasons which CANNOT be circumvented. Real HA with fewer than three nodes is NOT possible (no matter if some marketing people from other hypervisors sell it as such; that's 'only' failover) :) Look up split brain and Byzantine faults if you're interested in why it needs three or more.
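
The vote arithmetic makes this easy to see (just a sketch of the math):
Code:
# majority quorum needs floor(n/2) + 1 votes
# n = 2: floor(2/2) + 1 = 2 -> both nodes must be up; one failure = no quorum
# n = 3: floor(3/2) + 1 = 2 -> any single node may fail and the cluster stays quorate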

To see an overview of the HA states, go to Datacenter -> HA and select the Status tab at the bottom. When you start an action which affects an HA service, like stop, migrate, or start, our HA manager executes that action.
In the task log you see something like 'HA Migrate 100'; that is normally uninteresting, but a little later a task should spawn with 'VM Migrate 100', where you can look for log output and problems.
Also check the syslog on whichever node is the current master of your setup; it has additional info. Interesting entries are from pve-ha-lrm (local resource manager) and pve-ha-crm (cluster resource manager).
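
For example, something like this should pull the interesting entries out of the syslog (default log location assumed):
Code:
grep -E 'pve-ha-(lrm|crm)' /var/log/syslog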

HA live migration works in general; as a matter of fact, the Fedora live CD I live-migrated (see screenshot) was under HA. :)

Please post any suspicious/erroneous output from the log.
 
Thanks again for your reply, especially as it's Saturday!

HA states seem to be fine:
[Screenshot: Bildschirmfoto 2015-09-12 um 14.47.31.png]

Even after re-adding a VM it seems to be fine:
[Screenshot: Bildschirmfoto 2015-09-12 um 14.54.56.png]

I wonder why I can enable HA if it's not going to work; I'd expect it to tell me that I need at least three nodes in the cluster to enable HA...

When trying to migrate that HA enabled VM I see the following in my syslog:
Code:
Sep 12 14:56:14 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:56:24 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:56:34 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:56:44 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:56:54 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:56:59 pve01 pmxcfs[969]: [status] notice: received log
Sep 12 14:57:04 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:57:04 pve01 pve-ha-crm[1549]: got crm command: migrate vm:101 pve01
Sep 12 14:57:04 pve01 pve-ha-crm[1549]: migrate service 'vm:101' to node 'pve01' (running)
Sep 12 14:57:04 pve01 pve-ha-crm[1549]: service 'vm:101': state changed from 'started' to 'migrate'  (node = pve02, target = pve01)
Sep 12 14:57:12 pve01 pmxcfs[969]: [status] notice: received log
Sep 12 14:57:13 pve01 qm[6965]: <root@pam> starting task UPID:pve01:00001B36:00F44D9D:55F42129:qmstart:101:root@pam:
Sep 12 14:57:13 pve01 qm[6966]: start VM 101: UPID:pve01:00001B36:00F44D9D:55F42129:qmstart:101:root@pam:
Sep 12 14:57:13 pve01 kernel: [160022.339526] device tap101i0 entered promiscuous mode
Sep 12 14:57:13 pve01 kernel: [160022.344943] vmbr0: port 2(tap101i0) entered forwarding state
Sep 12 14:57:13 pve01 kernel: [160022.344951] vmbr0: port 2(tap101i0) entered forwarding state
Sep 12 14:57:14 pve01 qm[6965]: <root@pam> end task UPID:pve01:00001B36:00F44D9D:55F42129:qmstart:101:root@pam: OK
Sep 12 14:57:14 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:57:24 pve01 qm[7041]: <root@pam> starting task UPID:pve01:00001B8C:00F451D5:55F42134:qmresume:101:root@pam:
Sep 12 14:57:24 pve01 qm[7052]: resume VM 101: UPID:pve01:00001B8C:00F451D5:55F42134:qmresume:101:root@pam:
Sep 12 14:57:24 pve01 qm[7041]: <root@pam> end task UPID:pve01:00001B8C:00F451D5:55F42134:qmresume:101:root@pam: OK
Sep 12 14:57:24 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:57:26 pve01 pmxcfs[969]: [status] notice: received log
Sep 12 14:57:34 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:57:34 pve01 pve-ha-crm[1549]: service 'vm:101' - migration failed (exit code 255)
Sep 12 14:57:34 pve01 pve-ha-crm[1549]: service 'vm:101': state changed from 'migrate' to 'started'  (node = pve02)
Sep 12 14:57:44 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:57:44 pve01 pve-ha-crm[1549]: migrate service 'vm:101' to node 'pve01' (running)
Sep 12 14:57:44 pve01 pve-ha-crm[1549]: service 'vm:101': state changed from 'started' to 'migrate'  (node = pve02, target = pve01)
Sep 12 14:57:54 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:57:54 pve01 pve-ha-crm[1549]: service 'vm:101' - migration failed (exit code 255)
Sep 12 14:57:54 pve01 pve-ha-crm[1549]: service 'vm:101': state changed from 'migrate' to 'started'  (node = pve02)
Sep 12 14:58:04 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:58:04 pve01 pve-ha-crm[1549]: migrate service 'vm:101' to node 'pve01' (running)
Sep 12 14:58:04 pve01 pve-ha-crm[1549]: service 'vm:101': state changed from 'started' to 'migrate'  (node = pve02, target = pve01)
Sep 12 14:58:13 pve01 pvedaemon[2726]: <root@pam> successful auth for user 'root@pam'
Sep 12 14:58:14 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:58:14 pve01 pve-ha-crm[1549]: service 'vm:101' - migration failed (exit code 255)
Sep 12 14:58:14 pve01 pve-ha-crm[1549]: service 'vm:101': state changed from 'migrate' to 'started'  (node = pve02)
Sep 12 14:58:24 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 14:58:24 pve01 pve-ha-crm[1549]: migrate service 'vm:101' to node 'pve01' (running)
Sep 12 14:58:24 pve01 pve-ha-crm[1549]: service 'vm:101': state changed from 'started' to 'migrate'  (node = pve02, target = pve01)
And it seems to loop trying to migrate... Removing the VM from HA stopped the loop:
Code:
Sep 12 15:00:34 pve01 pve-ha-crm[1549]: removing stale service 'vm:101' (no config)
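
(I removed it via the GUI, BTW; presumably the CLI equivalent would be something like this, going by the ha-manager tool:)
Code:
ha-manager remove vm:101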

But that is very suspicious:
Code:
Sep 12 15:00:54 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 15:01:04 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe

BTW: I can't install Debian (tried prior to CentOS); it fails with the following error:
[Screenshot: Bildschirmfoto 2015-09-12 um 14.31.55.png]
But I only just re-downloaded the ISO, so maybe a bad download is the cause...

Edit: damn, the MD5s are correct; now trying the full ISO instead of the netinst!
 
No, the HA state doesn't look fine; the local resource manager on the pve01 node is dead. Is the cluster still quorate?
What is in the syslog of the pve01 node at the last time it reported to the CRM? (12.09.2015 - 07:35:34)
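
Something like this should extract the window around that timestamp (adjust the pattern as needed):
Code:
grep 'Sep 12 07:3[4-6]' /var/log/syslog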

Could you try to restart it with
Code:
systemctl restart pve-ha-lrm.service
Then, if nothing has changed after about 2 minutes, restart the master
Code:
systemctl restart pve-ha-crm.service

Code:
Sep 12 15:00:54 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 15:01:04 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Yes, the watchdog fails to update; restart the services first, then it should hopefully work again... Did you alter anything manually on your system?
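
A few read-only checks on the watchdog side could also be interesting (nothing destructive):
Code:
systemctl status watchdog-mux.service  # is the watchdog multiplexer running?
ls -l /dev/watchdog                    # does the device node exist?
lsmod | grep -iE 'wdt|watchdog'        # which watchdog driver is loaded?
dmesg | grep -i watchdog               # any driver errors at boot?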

And for the ISO: can you try it on local storage, if you have enough space for a small installation? That would finally rule out whether it's your Gluster setup or not.
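
Moving the test VM's disk over could look roughly like this (a sketch; note that your 'local' storage would first need 'images' in its content list, which per your storage.cfg it currently doesn't have):
Code:
# allow disk images on the local directory storage first
pvesm set local --content images,vztmpl
# then move the disk off the gluster storage
qm move_disk 100 virtio0 local --format qcow2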
 
Damn, so here's the log from around that time:
Code:
Sep 12 07:34:42 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:34:52 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:34:58 pve01 pvedaemon[11249]: command '/bin/nc6 -l -p 5900 -w 10 -e '/usr/bin/ssh -T -o BatchMode=yes 10.10.10.32 /usr/sbin/qm vncproxy 101 2>/dev/null'' failed: exit code 1
Sep 12 07:35:02 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:35:03 pve01 pmxcfs[969]: [status] notice: received log
Sep 12 07:35:12 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:35:12 pve01 pve-ha-crm[1549]: got crm command: migrate vm:101 pve01
Sep 12 07:35:12 pve01 pve-ha-crm[1549]: migrate service 'vm:101' to node 'pve01' (stopped)
Sep 12 07:35:22 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:35:24 pve01 pvedaemon[11776]: starting vnc proxy UPID:pve01:00002E00:00CBDA5B:55F3B99B:vncproxy:101:root@pam:
Sep 12 07:35:24 pve01 pvedaemon[17872]: <root@pam> starting task UPID:pve01:00002E00:00CBDA5B:55F3B99B:vncproxy:101:root@pam:
Sep 12 07:35:24 pve01 qm[11779]: VM 101 qmp command failed - VM 101 not running
Sep 12 07:35:26 pve01 pvedaemon[14679]: <root@pam> starting task UPID:pve01:00002E0B:00CBDB82:55F3B99E:hastart:101:root@pam:
Sep 12 07:35:27 pve01 pvedaemon[11776]: command '/bin/nc6 -l -p 5900 -w 10 -e '/usr/sbin/qm vncproxy 101 2>/dev/null'' failed: exit code 1
Sep 12 07:35:28 pve01 pvedaemon[14679]: <root@pam> starting task UPID:pve01:00002E0D:00CBDBF6:55F3B9A0:vncproxy:101:root@pam:
Sep 12 07:35:28 pve01 pvedaemon[11789]: starting vnc proxy UPID:pve01:00002E0D:00CBDBF6:55F3B9A0:vncproxy:101:root@pam:
Sep 12 07:35:28 pve01 qm[11792]: VM 101 qmp command failed - VM 101 not running
Sep 12 07:35:32 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:35:32 pve01 pve-ha-crm[1549]: service 'vm:101': state changed from 'stopped' to 'started'  (node = pve01)
Sep 12 07:35:34 pve01 pve-ha-lrm[1551]: successfully aquired lock 'ha_agent_pve01_lock'
Sep 12 07:35:34 pve01 pve-ha-lrm[1551]: ERROR: unable to open watchdog socket - Connection refused
Sep 12 07:35:34 pve01 pve-ha-lrm[1551]: server stopped
Sep 12 07:35:34 pve01 systemd[1]: pve-ha-lrm.service: main process exited, code=exited, status=255/n/a
Sep 12 07:35:35 pve01 systemd[1]: Unit pve-ha-lrm.service entered failed state.
Sep 12 07:35:42 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:35:42 pve01 pve-ha-crm[1549]: service 'vm:101': state changed from 'started' to 'freeze'
Sep 12 07:35:52 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:36:01 pve01 pvedaemon[11789]: command '/bin/nc6 -l -p 5900 -w 10 -e '/usr/sbin/qm vncproxy 101 2>/dev/null'' failed: exit code 1
Sep 12 07:36:02 pve01 pveproxy[9758]: worker exit
Sep 12 07:36:02 pve01 pveproxy[3260]: worker 9758 finished
Sep 12 07:36:02 pve01 pveproxy[3260]: starting 1 worker(s)
Sep 12 07:36:02 pve01 pveproxy[3260]: worker 11865 started
Sep 12 07:36:02 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:36:02 pve01 pvedaemon[17872]: <root@pam> starting task UPID:pve01:00002E5A:00CBE997:55F3B9C2:vncproxy:101:root@pam:
Sep 12 07:36:02 pve01 pvedaemon[11866]: starting vnc proxy UPID:pve01:00002E5A:00CBE997:55F3B9C2:vncproxy:101:root@pam:
Sep 12 07:36:03 pve01 qm[11869]: VM 101 qmp command failed - VM 101 not running
Sep 12 07:36:12 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 12 07:36:16 pve01 pvedaemon[14679]: <root@pam> starting task UPID:pve01:00002E77:00CBEEAC:55F3B9CF:hastart:101:root@pam:
Sep 12 07:36:17 pve01 pvedaemon[11866]: command '/bin/nc6 -l -p 5900 -w 10 -e '/usr/sbin/qm vncproxy 101 2>/dev/null'' failed: exit code 1
Sep 12 07:36:17 pve01 pvedaemon[8934]: <root@pam> starting task UPID:pve01:00002E87:00CBEF31:55F3B9D1:vncproxy:101:root@pam:
Sep 12 07:36:17 pve01 pvedaemon[11911]: starting vnc proxy UPID:pve01:00002E87:00CBEF31:55F3B9D1:vncproxy:101:root@pam:
Sep 12 07:36:17 pve01 qm[11914]: VM 101 qmp command failed - VM 101 not running

This is syslog on pve01 while running: "systemctl restart pve-ha-lrm.service"
Code:
Sep 14 08:41:35 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 14 08:41:45 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 14 08:41:53 pve01 watchdog-mux[7692]: watchdog set timeout: Invalid argument
Sep 14 08:41:53 pve01 systemd[1]: watchdog-mux.service: main process exited, code=exited, status=1/FAILURE
Sep 14 08:41:53 pve01 systemd[1]: Unit watchdog-mux.service entered failed state.
Sep 14 08:41:54 pve01 pve-ha-lrm[7697]: starting server
Sep 14 08:41:54 pve01 pve-ha-lrm[7697]: status change startup => wait_for_agent_lock
Sep 14 08:41:55 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 14 08:42:05 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe

Followed by "systemctl restart pve-ha-crm.service" on pve01
Code:
Sep 14 08:44:54 pve01 pve-ha-crm[1549]: received signal TERM
Sep 14 08:44:54 pve01 pve-ha-crm[1549]: server received shutdown request
Sep 14 08:44:55 pve01 pve-ha-crm[1549]: watchdog update failed - Broken pipe
Sep 14 08:44:55 pve01 pve-ha-crm[1549]: watchdog closed (disabled)
Sep 14 08:44:55 pve01 pve-ha-crm[1549]: server stopped
Sep 14 08:44:56 pve01 watchdog-mux[8331]: watchdog set timeout: Invalid argument
Sep 14 08:44:56 pve01 systemd[1]: watchdog-mux.service: main process exited, code=exited, status=1/FAILURE
Sep 14 08:44:56 pve01 systemd[1]: Unit watchdog-mux.service entered failed state.
Sep 14 08:44:56 pve01 pve-ha-crm[8336]: starting server
Sep 14 08:44:56 pve01 pve-ha-crm[8336]: status change startup => wait_for_quorum
Sep 14 08:45:01 pve01 pve-ha-crm[8336]: status change wait_for_quorum => slave
Sep 14 08:46:48 pve01 pvedaemon[16223]: <root@pam> successful auth for user 'root@pam'
Sep 14 08:46:57 pve01 pve-ha-crm[8336]: successfully aquired lock 'ha_manager_lock'
Sep 14 08:46:57 pve01 pve-ha-crm[8336]: ERROR: unable to open watchdog socket - Connection refused
Sep 14 08:46:57 pve01 pve-ha-crm[8336]: server received shutdown request
Sep 14 08:46:57 pve01 pve-ha-crm[8336]: server stopped
Sep 14 08:46:57 pve01 systemd[1]: pve-ha-crm.service: main process exited, code=exited, status=255/n/a
Sep 14 08:46:57 pve01 systemd[1]: Unit pve-ha-crm.service entered failed state.

which leaves me with an obviously bad state:
[Screenshot: Bildschirmfoto 2015-09-14 um 08.54.29.png]


I did not alter any config; my setup steps were as follows:
  1. install Proxmox (with ZFS)
  2. create the cluster according to the wiki: http://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster
  3. set up GlusterFS according to this blog post: http://www.jamescoyle.net/how-to/435-setup-glusterfs-with-a-replicated-volume-over-2-nodes
  4. mount GlusterFS using the web GUI (see the sketch after this list)
  5. create VMs for testing
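
For completeness, step 4 via the CLI would have looked something like this (I actually used the web GUI; sketch reconstructed from the resulting storage.cfg above):
Code:
pvesm add glusterfs disks --server pve01.one.lan --server2 pve02.one.lan --volume disks --content images,rootdir
pvesm add glusterfs isos --server pve01.one.lan --server2 pve02.one.lan --volume isos --content iso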

Regarding the Debian setup issues: yesterday I set up a Win8.1 VM and could do an online migration of that too! So it seems to be a Debian-only issue, and I've tried a testing netinst ISO in the meantime too... But I'll give it another shot later today.
 