4.0 HA manager spurious errors or not?

stefws

I have a 4.0 test cluster which started as 4.0 beta1, then went to beta2 and GA (by switching the apt source from pvetest to pve-no-subscription and upgrading).
We see the following error at intervals, but only ever on the HA master, and we are wondering why. Any ideas?

root@n4:~# egrep "ERROR|[eE]+rror|PANIC|[pP]+anic|FATAL|[fF]+atal|[wW]arning|WARNING" /var/log/syslog
Oct 11 11:38:07 n4 pve-ha-crm[2591]: got unexpected error - can't open '/etc/pve/nodes/n4/lrm_status' - No such file or directory
Oct 11 15:32:27 n4 pve-ha-crm[2591]: got unexpected error - can't open '/etc/pve/nodes/n4/lrm_status' - No such file or directory
Oct 11 15:32:47 n4 pve-ha-crm[2591]: got unexpected error - can't open '/etc/pve/nodes/n4/lrm_status' - No such file or directory
Oct 11 19:53:17 n4 pve-ha-crm[2591]: got unexpected error - can't open '/etc/pve/nodes/n4/lrm_status' - No such file or directory
root@n4:~# ls -l /etc/pve/nodes/n4/lrm_status
-rw-r----- 1 root www-data 53 Oct 12 08:46 /etc/pve/nodes/n4/lrm_status
root@n4:~# ha-manager status
quorum OK
master n4 (active, Mon Oct 12 08:46:11 2015)
lrm n1 (active, Mon Oct 12 08:46:13 2015)
lrm n2 (active, Mon Oct 12 08:46:15 2015)
lrm n3 (active, Mon Oct 12 08:46:12 2015)
lrm n4 (active, Mon Oct 12 08:46:13 2015)
lrm n5 (active, Mon Oct 12 08:46:15 2015)
lrm n6 (active, Mon Oct 12 08:46:15 2015)
lrm n7 (active, Mon Oct 12 08:46:16 2015)


root@n4:~# dpkg -l | grep pve-
ii libpve-access-control 4.0-9 amd64 Proxmox VE access control library
ii libpve-common-perl 4.0-29 all Proxmox VE base library
ii libpve-storage-perl 4.0-25 all Proxmox VE storage management library
ii pve-cluster 4.0-22 amd64 Cluster Infrastructure for Proxmox Virtual Environment
ii pve-container 1.0-6 all Proxmox VE Container management tool
ii pve-firewall 2.0-12 amd64 Proxmox VE Firewall
ii pve-firmware 1.1-7 all Binary firmware code for the pve-kernel
ii pve-ha-manager 1.0-9 amd64 Proxmox VE HA Manager
ii pve-kernel-4.2.2-1-pve 4.2.2-16 amd64 The Proxmox PVE Kernel Image
ii pve-libspice-server1 0.12.5-1 amd64 SPICE remote display system server library
ii pve-manager 4.0-48 amd64 The Proxmox Virtual Environment
ii pve-qemu-kvm 2.4-9 amd64 Full virtualization on x86 hardware
 
OK, this looks strange, but I had one of these error messages on my test cluster too, so I will investigate what happens. The error is not critical; the ha-manager is designed so that it can cope with such "glitches".
Please open a bug report with this information: https://bugzilla.proxmox.com/

btw. the -i option makes the search case-insensitive, which simplifies your regex:
# egrep -i "ERROR|PANIC|FATAL|WARNING" /var/log/syslog
also:
# pveversion -v
output is always nice :)
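
For the bug report, the two suggestions above could be combined roughly like this. It is only a minimal sketch: `collect_ha_report` is a hypothetical helper name (not a Proxmox VE command), and the assumption is that the interesting lines are the ones logged by `pve-ha-crm`, as in the syslog excerpt earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of Proxmox VE: prints the pve-ha-crm
# lines from a log file that mention an error, panic, fatal or warning.
collect_ha_report() {
    # Default to the syslog path used in this thread when no file is given.
    log_file="${1:-/var/log/syslog}"
    # -i: case-insensitive match, -E: extended regex (same as egrep).
    grep -iE 'error|panic|fatal|warning' "$log_file" | grep 'pve-ha-crm'
}
```

Something like `collect_ha_report /var/log/syslog > ha-report.txt; pveversion -v >> ha-report.txt` would then give one file to attach to the report.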
 
OK, this looks strange, but I had one of these error messages on my test cluster too, so I will investigate what happens. The error is not critical; the ha-manager is designed so that it can cope with such "glitches".
Please open a bug report with this information: https://bugzilla.proxmox.com/
Will try to do...

btw. the -i option makes the search case-insensitive, which simplifies your regex:
# egrep -i "ERROR|PANIC|FATAL|WARNING" /var/log/syslog
I know :)

also:
# pveversion -v
output is always nice :)
root@n1:~# pveversion -v
proxmox-ve: 4.0-16 (running kernel: 4.2.2-1-pve)
pve-manager: 4.0-48 (running version: 4.0-48/0d8559d0)
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-22
qemu-server: 4.0-30
pve-firmware: 1.1-7
libpve-common-perl: 4.0-29
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-25
pve-libspice-server1: 0.12.5-1
vncterm: 1.2-1
pve-qemu-kvm: 2.4-9
pve-container: 1.0-6
pve-firewall: 2.0-12
pve-ha-manager: 1.0-9
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.3-1
lxcfs: 0.9-pve2
cgmanager: 0.37-pve2
criu: 1.6.0-1
zfsutils: 0.6.5-pve4~jessie
openvswitch-switch: 2.3.2-1