Node reboot loop

proxtest

Active Member
Mar 19, 2014
Hi there,

One of my 5 nodes reboots every 5 - 10 minutes, which is torture for my Ceph storage! The cluster ran for 3 months without any problem, but now this. :-(

Is there a timer (watchdog?) or something that could make this happen?
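In case it matters, here is how I looked for a watchdog on the node (the module and unit names are my guesses; pve-ha-manager normally arms softdog via watchdog-mux, but other watchdog drivers exist):

# is any watchdog module loaded?
lsmod | grep -E 'softdog|iTCO_wdt|ipmi_watchdog'
# is the watchdog multiplexer running, and has it logged anything?
systemctl status watchdog-mux
journalctl -u watchdog-mux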

I can't find anything in syslog or kern.log; it's a hard reset, not a normal shutdown!
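Since a hard reset kills the local logs before they hit disk, one idea (just a sketch; the IPs and interface name are examples from my cluster network) is to stream kernel messages to another node with netconsole:

# on the rebooting node: send kernel messages to node1 (10.11.12.1) via UDP
modprobe netconsole netconsole=6666@10.11.12.2/eth0,514@10.11.12.1/
# on the receiving node: capture them (flag syntax depends on your netcat flavor)
nc -l -u -p 514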

I don't use the cluster manager because I don't trust it. :)

Can somebody point me in the direction where I'll find the reason?

proxmox-ve: 4.2-51 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-5 (running version: 4.2-5/7cf09667)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.8-1-pve: 4.4.8-51
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-39
qemu-server: 4.0-75
pve-firmware: 1.1-8
libpve-common-perl: 4.0-62
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-50
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-17
pve-container: 1.0-64
pve-firewall: 2.0-27
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
ceph: 0.94.7-1~bpo80+1

Quorum information
------------------
Date: Mon Aug 22 19:11:56 2016
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000002
Ring ID: 4684
Quorate: Yes

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 4
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.11.12.1
0x00000002 1 10.11.12.2 (local)
0x00000004 1 10.11.12.4
0x00000005 1 10.11.12.5

Membership information
----------------------
Nodeid Votes Name
1 1 node1pv
2 1 node2pv (local)
4 1 node4pv
5 1 node5pv

2016-08-22 19:14:32.194370 mon.0 [INF] pgmap v7526378: 1664 pgs: 892 active+clean, 9 active+undersized+degraded+remapped+backfilling, 763 active+undersized+degraded+remapped+wait_backfill; 3097 GB data, 7777 GB used, 81603 GB / 89380 GB avail; 163 kB/s wr, 37 op/s; 394883/2743666 objects degraded (14.393%); 1114915/2743666 objects misplaced (40.636%); 559 MB/s, 140 objects/s recovering
:-(
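To see where the flapping comes from on the Ceph side (just the standard status commands, nothing fancy), I keep an eye on which OSDs drop out whenever the node resets:

# which PGs/OSDs are degraded and why
ceph health detail
# are the OSDs of the rebooting node marked down?
ceph osd tree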


regards
 
One of my nodes started doing the same as 'proxtest' after updating the node today. Any word on this issue?
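In case it helps others: since the journal on Debian Jessie is volatile by default, I enabled a persistent journal first (a sketch; it only captures resets that happen after it is enabled) so I could read the tail of the previous boot:

# keep the journal across reboots
mkdir -p /var/log/journal
systemctl restart systemd-journald
# after the next reset: last lines of the previous boot
journalctl -b -1 -n 50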
 
