Random reboots

Ben S

Aug 16, 2018
Hello dear Proxmox community :) ,

We have a 5-node cluster with Ceph and HA enabled, and we're experiencing random, unwanted reboots.

These happen mainly (but not only) when:
- We migrate a VM/CT between nodes
- We manually reboot a node (some other nodes randomly reboot as well)

When we check the syslog, we don't see any relevant information (see the attached screenshot).

Kernel Version Linux 4.15.18-1-pve #1 SMP PVE 4.15.18-17 (Mon, 30 Jul 2018 12:53:35 +0200)
PVE Manager Version pve-manager/5.2-6/bcd5f008

Does anyone have an idea of a probable cause for this issue?

Thanks

Ben
 

Attachments

  • photo_2018-08-14_15-39-50.jpg
Are you running the cluster network on the same NIC as the storage network and/or the migration network?

These 3 networks need to be separated.
 
We use Ceph storage, and the disks are in each server. All servers are on the same network.

Do you mean we need 3 NICs per server?
 
Yes,
  1. 1 for the public network
  2. 1 for the private LAN - cross-VM connections
  3. 1 for Ceph - 10 Gbps would be better
  4. 1 for the cluster network - this must be dedicated to the ring0 network, and you should add a ring1 network. You may want to take a look here: https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network
Your problem may be caused by the storage network being saturated, so that the ring network stopped working. Hence, Proxmox rebooted the servers because the cluster went down.
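As a rough way to check whether the corosync ring address sits in the same subnet as the storage or public traffic, something like the following sketch could be used. The addresses and the `shares_network` helper are hypothetical examples, not values from this thread, and sharing a subnet usually, but not always, means sharing the same physical link.

```python
import ipaddress

def shares_network(corosync_addr, other_cidrs):
    """Return the CIDRs from other_cidrs that contain corosync_addr."""
    addr = ipaddress.ip_address(corosync_addr)
    return [c for c in other_cidrs if addr in ipaddress.ip_network(c)]

# Hypothetical example values, not taken from the thread.
ring0_addr = "10.10.10.11"
networks = {"ceph": "10.10.10.0/24", "public": "192.0.2.0/24"}

overlaps = shares_network(ring0_addr, networks.values())
print(overlaps)  # ['10.10.10.0/24']
```

A non-empty result suggests the ring traffic competes with that network and is a candidate for moving to a dedicated NIC.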
 
We had the same problem yesterday. Out of our 5-node cluster, 3 nodes suddenly rebooted while we were trying to migrate a VM from one node to another. Our cluster uses one network for Ceph and one network for cross-VM connections and the public network. We have been running stably like this for many years, with lots of migrations and no problems at all. Our current Proxmox version is Virtual Environment 4.4-22/2728f613. Now this has occurred twice within one day, and I really have no idea what to look for in the syslog. Here is the syslog from when the second reboot happened. I wonder whether the first entry contains the right clue, but I have no idea what it means.


Aug 28 11:33:51 Bucky corosync[1816]: [TOTEM ] A processor failed, forming new configuration.
Aug 28 11:34:01 Bucky corosync[1816]: [TOTEM ] A new membership (192.168.X.XXX:2708) was formed. Members left: 5
Aug 28 11:34:01 Bucky corosync[1816]: [TOTEM ] Failed to receive the leave message. failed: 5
Aug 28 11:34:01 Bucky corosync[1816]: [TOTEM ] Retransmit List: 1
Aug 28 11:34:01 Bucky pmxcfs[1702]: [dcdb] notice: members: 1/22496, 2/28978, 3/1717, 4/1702
Aug 28 11:34:01 Bucky pmxcfs[1702]: [dcdb] notice: starting data syncronisation
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: members: 1/22496, 2/28978, 3/1717, 4/1702
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: starting data syncronisation
Aug 28 11:34:01 Bucky corosync[1816]: [QUORUM] Members[4]: 3 4 1 2
Aug 28 11:34:01 Bucky corosync[1816]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 28 11:34:01 Bucky pmxcfs[1702]: [dcdb] notice: received sync request (epoch 1/22496/0000000C)
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: received sync request (epoch 1/22496/00000008)
Aug 28 11:34:01 Bucky pmxcfs[1702]: [dcdb] notice: received all states
Aug 28 11:34:01 Bucky pmxcfs[1702]: [dcdb] notice: leader is 1/22496
Aug 28 11:34:01 Bucky pmxcfs[1702]: [dcdb] notice: synced members: 1/22496, 2/28978, 3/1717, 4/1702
Aug 28 11:34:01 Bucky pmxcfs[1702]: [dcdb] notice: all data is up to date
Aug 28 11:34:01 Bucky pmxcfs[1702]: [dcdb] notice: dfsm_deliver_queue: queue length 11
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: received all states
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: all data is up to date
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: dfsm_deliver_queue: queue length 111
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: received log
Aug 28 11:34:01 Bucky pmxcfs[1702]: [main] notice: ignore duplicate
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: received log
Aug 28 11:34:01 Bucky pmxcfs[1702]: [main] notice: ignore duplicate
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: received log
Aug 28 11:34:01 Bucky pmxcfs[1702]: [main] notice: ignore duplicate
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: received log
Aug 28 11:34:01 Bucky pmxcfs[1702]: [main] notice: ignore duplicate
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: received log
Aug 28 11:34:01 Bucky pmxcfs[1702]: [main] notice: ignore duplicate
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: received log
Aug 28 11:34:01 Bucky pmxcfs[1702]: [main] notice: ignore duplicate
Aug 28 11:34:01 Bucky pmxcfs[1702]: [status] notice: received log
Aug 28 11:34:01 Bucky pmxcfs[1702]: [main] notice: ignore duplicate
Aug 28 11:34:07 Bucky corosync[1816]: [TOTEM ] A new membership (192.168.X.XXX:2712) was formed. Members joined: 5
Aug 28 11:34:07 Bucky corosync[1816]: [TOTEM ] Retransmit List: 1
Aug 28 11:34:07 Bucky pmxcfs[1702]: [dcdb] notice: members: 1/22496, 2/28978, 3/1717, 4/1702, 5/1688
Aug 28 11:34:07 Bucky pmxcfs[1702]: [dcdb] notice: starting data syncronisation
Aug 28 11:34:07 Bucky pmxcfs[1702]: [status] notice: members: 1/22496, 2/28978, 3/1717, 4/1702, 5/1688
Aug 28 11:34:07 Bucky pmxcfs[1702]: [status] notice: starting data syncronisation
Aug 28 11:34:07 Bucky corosync[1816]: [QUORUM] Members[5]: 3 4 5 1 2
Aug 28 11:34:07 Bucky corosync[1816]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 28 11:34:07 Bucky pmxcfs[1702]: [dcdb] notice: received sync request (epoch 1/22496/0000000D)
Aug 28 11:34:07 Bucky pmxcfs[1702]: [status] notice: received sync request (epoch 1/22496/00000009)
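To pick the membership changes out of a long syslog like the one above, a small filter can help. This is only a sketch: the regular expression is an assumption based on the line format shown in this post, and the `membership_events` helper is hypothetical, not part of any Proxmox or corosync tooling. It handles lines that report either "Members left" or "Members joined" and ignores everything else.

```python
import re

# Matches corosync TOTEM membership lines in the format shown in this post, e.g.
# "... corosync[1816]: [TOTEM ] A new membership (...) was formed. Members left: 5"
MEMBERSHIP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[TOTEM\s*\]\s+A new membership \(.*?\) was formed\.\s*"
    r"(?:Members (?P<kind>left|joined):\s*(?P<ids>[\d\s]+))?"
)

def membership_events(lines):
    """Return (timestamp, host, 'left' or 'joined', [node ids]) tuples."""
    events = []
    for line in lines:
        m = MEMBERSHIP_RE.match(line)
        if m and m.group("kind"):
            ids = [int(n) for n in m.group("ids").split()]
            events.append((m.group("ts"), m.group("host"), m.group("kind"), ids))
    return events

# Three lines copied from the syslog excerpt above.
sample = [
    "Aug 28 11:33:51 Bucky corosync[1816]: [TOTEM ] A processor failed, forming new configuration.",
    "Aug 28 11:34:01 Bucky corosync[1816]: [TOTEM ] A new membership (192.168.X.XXX:2708) was formed. Members left: 5",
    "Aug 28 11:34:07 Bucky corosync[1816]: [TOTEM ] A new membership (192.168.X.XXX:2712) was formed. Members joined: 5",
]

for event in membership_events(sample):
    print(event)
```

On the excerpt above this reports node 5 leaving at 11:34:01 and rejoining at 11:34:07, which narrows down which node and which time window to inspect on the other hosts.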
 