Hi there,
One of my 5 nodes reboots every 5-10 minutes, and it's torture for my Ceph storage! The cluster ran for 3 months without any problem, but now it's falling apart. :-(
Is there a timer (a watchdog?) or something else that could cause this?
I can't find anything in syslog or kern.log; it's a hard reset, not a normal shutdown!
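Here is roughly what I was planning to check next; I'm not sure these are the right places to look (the IPMI one assumes ipmitool and a BMC are available, and the journal one only helps if persistent journaling is enabled):

journalctl -b -1 -e                        # end of the journal from the boot before the reset
ipmitool sel list                          # BMC event log, in case a hardware reset/power event was recorded
dmesg -T | grep -i -e mce -e watchdog      # machine-check or watchdog traces after the node comes back
lsmod | grep -e softdog -e ipmi_watchdog   # is a kernel watchdog module loaded at all?
systemctl status watchdog-mux              # the Proxmox watchdog multiplexer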
I don't use the cluster manager because I don't trust it.
Can somebody point me in the right direction to find the cause?
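To rule out HA fencing, this is how I would confirm that no HA resources are configured on this cluster (just my understanding of the tooling, happy to be corrected):

ha-manager status                  # should report no services/resources if HA is really unused
cat /etc/pve/ha/resources.cfg      # should be empty or missing when nothing is managed by HA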
proxmox-ve: 4.2-51 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-5 (running version: 4.2-5/7cf09667)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.8-1-pve: 4.4.8-51
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-39
qemu-server: 4.0-75
pve-firmware: 1.1-8
libpve-common-perl: 4.0-62
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-50
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-17
pve-container: 1.0-64
pve-firewall: 2.0-27
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
ceph: 0.94.7-1~bpo80+1
Quorum information
------------------
Date: Mon Aug 22 19:11:56 2016
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000002
Ring ID: 4684
Quorate: Yes
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 4
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.11.12.1
0x00000002 1 10.11.12.2 (local)
0x00000004 1 10.11.12.4
0x00000005 1 10.11.12.5
Membership information
----------------------
Nodeid Votes Name
1 1 node1pv
2 1 node2pv (local)
4 1 node4pv
5 1 node5pv
2016-08-22 19:14:32.194370 mon.0 [INF] pgmap v7526378: 1664 pgs: 892 active+clean, 9 active+undersized+degraded+remapped+backfilling, 763 active+undersized+degraded+remapped+wait_backfill; 3097 GB data, 7777 GB used, 81603 GB / 89380 GB avail; 163 kB/s wr, 37 op/s; 394883/2743666 objects degraded (14.393%); 1114915/2743666 objects misplaced (40.636%); 559 MB/s, 140 objects/s recovering
:-(
regards