Update crashes node

iprigger

Hi All,

I just updated my nodes (5-node cluster) and one of the servers rebooted on its own...

I have searched for logs, but it looks like the machine just dropped and came back...

The server is a dual-CPU system (E5520 CPUs) with ~70 GB of memory.
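(In case someone else is hunting for the same traces: a minimal sketch of the usual places to check after an unexplained reboot, assuming the stock journald/rsyslog setup on a Proxmox VE 4.x / Debian Jessie node; nothing conclusive showed up here.)

# messages from the previous boot; only works if the journal is persistent
journalctl -b -1

# make the journal persistent so a future crash leaves something behind
mkdir -p /var/log/journal
systemctl restart systemd-journald

# the classic syslog files are also worth a look
less /var/log/kern.log
less /var/log/syslog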

proxmox-ve: 4.2-52 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-11 (running version: 4.2-11/2c626aa1)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-2.6.32-43-pve: 2.6.32-166
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-2.6.32-39-pve: 2.6.32-157
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-40
qemu-server: 4.0-79
pve-firmware: 1.1-8
libpve-common-perl: 4.0-67
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-51
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-67
pve-firewall: 2.0-29
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie

Any idea? The node is behaving normally otherwise. Since this is the second time this has happened (with this node), I'm becoming a bit curious...

Tobias
 
I just did a deb update on a non-cluster system and ended up with an empty /etc/pve/.

Can you check if /etc/pve/ is empty on that node?
 

Hi,
/etc/pve is not empty:
root@kevlavik:~# ls -al /etc/pve/
total 17
drwxr-xr-x 2 root www-data 0 Jan 1 1970 .
drwxr-xr-x 102 root root 12288 Jun 14 16:43 ..
-r--r----- 1 root www-data 9667 Jan 1 1970 .clusterlog
-rw-r----- 1 root www-data 2 Jan 1 1970 .debug
-r--r----- 1 root www-data 434 Jan 1 1970 .members
-r--r----- 1 root www-data 12393 Jan 1 1970 .rrd
-r--r----- 1 root www-data 529 Jan 1 1970 .version
-r--r----- 1 root www-data 3686 Jan 1 1970 .vmlist
-rw-r----- 1 root www-data 451 Sep 20 2015 authkey.pub
-rw-r----- 1 root www-data 340 Sep 20 2015 cluster.conf.old
-rw-r----- 1 root www-data 733 May 18 21:49 corosync.conf
-rw-r----- 1 root www-data 16 Sep 20 2015 datacenter.cfg
drwxr-xr-x 2 root www-data 0 Dec 25 14:54 ha
lrwxr-xr-x 1 root www-data 0 Jan 1 1970 local -> nodes/kevlavik
lrwxr-xr-x 1 root www-data 0 Jan 1 1970 lxc -> nodes/kevlavik/lxc
drwxr-xr-x 2 root www-data 0 Sep 20 2015 nodes
lrwxr-xr-x 1 root www-data 0 Jan 1 1970 openvz -> nodes/kevlavik/openvz
drwx------ 2 root www-data 0 Sep 20 2015 priv
-rw-r----- 1 root www-data 1350 Sep 20 2015 pve-root-ca.pem
-rw-r----- 1 root www-data 1675 Sep 20 2015 pve-www.key
lrwxr-xr-x 1 root www-data 0 Jan 1 1970 qemu-server -> nodes/kevlavik/qemu-server
-rw-r----- 1 root www-data 1145 May 21 15:29 storage.cfg
-rw-r----- 1 root www-data 1266 Jun 5 16:16 user.cfg
-rw-r----- 1 root www-data 270 Jun 3 15:01 vzdump.cron

I think that looks OK as it is (it's a clustered system with 5 nodes).

Membership information
----------------------
    Nodeid      Votes Name
         1          1 reykjavik
         3          1 kevlavik (local)
         4          1 olafsvik
         2          1 straumsvik
         5          1 grindavik
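(For completeness, the membership listing above is the kind of output the standard cluster tools print; a minimal sketch, assuming stock Proxmox VE 4.x with corosync 2.x:)

pvecm status             # quorum state and membership as Proxmox sees it
corosync-quorumtool -l   # the same member list straight from corosync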

The pity is that I don't have any logs... the system apparently just died and booted back up. I only noticed because some hosts went down...
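(For the next time this happens, a sketch of one way to catch kernel messages on another box even when nothing reaches the local disk; netconsole is a stock kernel module, but the IP, port and listener below are placeholders, not my actual setup.)

# on the crashing node: stream kernel messages via UDP to a collector
# (syntax: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@[tgt-ip]/[tgt-mac])
modprobe netconsole netconsole=@/,6666@192.168.0.20/

# on the collector: any UDP listener will do, e.g. netcat
nc -u -l -p 6666         # or the equivalent flags for your netcat variant

kdump-tools would be the heavier alternative if a full crash dump is needed.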

CU
Tobias