Proxmox 2.0: Node1 CPU Failed, Rebooted Automatically


Chris Rivera

Guest
This has taken me some time to track down, and I still don't know exactly what happened.

It seems that Proxmox (on my production cloud) detected that one CPU had failed, shut down all VMs, and rebooted itself.


Sep 4 16:44:21 proxmox1 corosync[648411]: [TOTEM ] A processor failed, forming new configuration.
Sep 4 16:47:04 proxmox1 shutdown[649841]: shutting down for system reboot

When it came back online, I checked the logs again and saw:

Sep 4 17:10:44 proxmox1 corosync[1482]: [TOTEM ] A processor failed, forming new configuration.
Sep 4 17:11:53 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
Sep 4 17:11:53 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 31 32
Sep 4 17:11:53 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] CLM CONFIGURATION CHANGE
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] New Configuration:
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.154)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.155)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.156)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.157)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.158)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.159)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.160)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.161)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] Members Left:
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] Members Joined:
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] CLM CONFIGURATION CHANGE
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] New Configuration:
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.154)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.155)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.156)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.157)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.158)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.159)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.160)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.161)
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] Members Left:
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] Members Joined:
Sep 4 17:11:53 proxmox1 corosync[1482]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 4 17:11:53 proxmox1 corosync[1482]: [CPG ] chosen downlist: sender r(0) ip(63.217.249.158) ; members(old:8 left:0)
Sep 4 17:11:53 proxmox1 corosync[1482]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 4 17:12:19 proxmox1 corosync[1482]: [TOTEM ] A processor failed, forming new configuration.
Sep 4 17:13:30 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
Sep 4 17:13:44 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 31 32
Sep 4 17:13:44 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50
Sep 4 17:13:44 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 70 71 72 73 75 76 77 78 79 7a 7b
Sep 4 17:13:44 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 9a 9b 9c 9d 9e 9f a0 a1 a2 a3 a4
Sep 4 17:13:44 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: c7 c8 c9 ca cb cc cd ce cf d0
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] CLM CONFIGURATION CHANGE
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] New Configuration:
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.154)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.155)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.156)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.157)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.158)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.159)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.160)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.161)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] Members Left:
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] Members Joined:
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] CLM CONFIGURATION CHANGE
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] New Configuration:
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.154)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.155)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.156)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.157)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.158)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.159)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.160)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.161)
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] Members Left:
Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] Members Joined:
Sep 4 17:13:44 proxmox1 corosync[1482]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 4 17:13:44 proxmox1 corosync[1482]: [CPG ] chosen downlist: sender r(0) ip(63.217.249.158) ; members(old:8 left:0)
Sep 4 17:13:44 proxmox1 corosync[1482]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 4 17:26:01 proxmox1 corosync[1482]: [TOTEM ] A processor failed, forming new configuration.
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] CLM CONFIGURATION CHANGE
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] New Configuration:
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.154)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.155)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.156)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.157)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.158)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.159)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.160)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.161)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] Members Left:
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] Members Joined:
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] CLM CONFIGURATION CHANGE
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] New Configuration:
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.154)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.155)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.156)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.157)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.158)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.159)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.160)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.161)
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] Members Left:
Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] Members Joined:
Sep 4 17:29:02 proxmox1 corosync[1482]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 4 17:29:12 proxmox1 corosync[1482]: [TOTEM ] A processor failed, forming new configuration.




What is going on here?
 
Originally Posted by Chris Rivera
It seems that proxmox (on my production cloud) detected 1 cpu failed

Why do you think a CPU has failed? The logs you sent just indicate a problem with cluster communication.

Sep 4 16:44:21 proxmox1 corosync[648411]: [TOTEM ] A processor failed, forming new configuration.


This indicates that this node can't communicate with another node in the cluster. (In corosync's Totem protocol, a "processor" is a cluster member, not a CPU.)
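The rule behind that message can be sketched roughly as follows: each node expects the Totem token to come back around the ring within a token timeout, and when it does not, it logs "A processor failed, forming new configuration." and starts forming a new membership. The Python below is a simplified model of that rule, not corosync's code. token_loss_events is a hypothetical helper, and the 1000 ms timeout is an illustrative assumption (the default documented for plain corosync 1.x; cman-based setups such as Proxmox 2.0 may use a different value).

```python
# Simplified model of Totem token-loss detection; NOT corosync's actual code.
# Assumption: an illustrative token timeout of 1000 ms; the value in effect
# on a Proxmox 2.0 (cman) cluster may be configured differently.
TOKEN_TIMEOUT_MS = 1000


def token_loss_events(arrivals_ms, timeout_ms=TOKEN_TIMEOUT_MS):
    """Return the times (ms) at which this node would declare token loss.

    arrivals_ms: sorted times at which this node received the token.
    Loss is declared when the next token has not arrived within timeout_ms
    of the previous one; a gap of exactly timeout_ms is not a loss here.
    """
    events = []
    for prev, nxt in zip(arrivals_ms, arrivals_ms[1:]):
        if nxt - prev > timeout_ms:
            # Roughly where corosync logs
            # "A processor failed, forming new configuration."
            events.append(prev + timeout_ms)
    return events


if __name__ == "__main__":
    # Token seen every 200 ms, then a 2.5 s stall on the cluster network.
    print(token_loss_events([0, 200, 400, 600, 3100, 3300]))  # [1600]
```

In this model a short network stall is enough to trigger the message even though every node's hardware is healthy, which is why the log line alone does not point at a CPU.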


Sep 4 16:47:04 proxmox1 shutdown[649841]: shutting down for system reboot


Did someone manually restart the node?


Sep 4 17:10:44 proxmox1 corosync[1482]: [TOTEM ] A processor failed, forming new configuration.
Sep 4 17:11:53 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
Sep 4 17:11:53 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 31 32
Sep 4 17:11:53 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] CLM CONFIGURATION CHANGE
Sep 4 17:11:53 proxmox1 corosync[1482]: [CLM ] New Configuration:

This is the log from the corosync cluster membership protocol (Totem).
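Reading long Totem logs by eye is tedious, so a small script can condense them. The sketch below is a hypothetical helper (summarize, LINE_RE and MEMBER_RE are illustrative names, not part of Proxmox or corosync) that assumes the exact syslog format posted in this thread. It counts "A processor failed" messages, counts message ids in each "Retransmit List", and diffs consecutive "New Configuration" member lists to show which IPs left or joined.

```python
import re
from collections import Counter

# Syslog lines as posted in this thread, e.g.
# "Sep 4 17:11:53 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 31 32"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+corosync\[\d+\]:\s+"
    r"\[(?P<service>\w+)\s*\]\s+(?P<msg>.*)$"
)
MEMBER_RE = re.compile(r"ip\((?P<ip>[\d.]+)\)")


def summarize(lines):
    """Summarize corosync syslog lines.

    Returns (failures, retransmits, configs, changes):
      failures    - number of "A processor failed" messages
      retransmits - Counter mapping timestamp -> message ids in Retransmit Lists
      configs     - one set of member IPs per "New Configuration:" block
      changes     - members left/joined between consecutive configurations
    """
    failures = 0
    retransmits = Counter()
    configs = []
    collecting = False
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a corosync line in the expected format
        stamp, service, msg = m["stamp"], m["service"], m["msg"]
        if service == "TOTEM" and msg.startswith("A processor failed"):
            failures += 1
        elif service == "TOTEM" and msg.startswith("Retransmit List:"):
            retransmits[stamp] += len(msg.split(":", 1)[1].split())
        elif service == "CLM" and msg.startswith("New Configuration:"):
            configs.append(set())
            collecting = True
        elif service == "CLM" and collecting and MEMBER_RE.search(msg):
            configs[-1].add(MEMBER_RE.search(msg)["ip"])
        elif service == "CLM":
            # "Members Left:", "Members Joined:" etc. end the member list
            collecting = False
    changes = [
        {"left": sorted(old - new), "joined": sorted(new - old)}
        for old, new in zip(configs, configs[1:])
    ]
    return failures, retransmits, configs, changes


if __name__ == "__main__":
    # Shortened, hypothetical excerpt in the thread's format; the second
    # configuration deliberately drops .155 to show a member leaving.
    sample = [
        "Sep 4 17:12:19 proxmox1 corosync[1482]: [TOTEM ] A processor failed, forming new configuration.",
        "Sep 4 17:13:44 proxmox1 corosync[1482]: [TOTEM ] Retransmit List: 31 32",
        "Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] New Configuration:",
        "Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.154)",
        "Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.155)",
        "Sep 4 17:13:44 proxmox1 corosync[1482]: [CLM ] Members Left:",
        "Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] New Configuration:",
        "Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] #011r(0) ip(63.217.249.154)",
        "Sep 4 17:29:02 proxmox1 corosync[1482]: [CLM ] Members Left:",
    ]
    failures, retransmits, configs, changes = summarize(sample)
    print(failures, dict(retransmits), changes)
    # 1 {'Sep 4 17:13:44': 2} [{'left': ['63.217.249.155'], 'joined': []}]
```

Run against the full logs above, every configuration lists the same eight IPs and "Members Left:" is empty each time, so a diff like this would report no member leaving, while the retransmit counts show where messages had to be resent.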
 
Originally Posted by Chris Rivera
Sep 4 16:44:21 proxmox1 corosync[648411]: [TOTEM ] A processor failed, forming new configuration.


This indicates that this node can't communicate to another node in the cluster.

Good to know. I replaced the processors before this post just to clear up any issues.



Originally Posted by Chris Rivera
Sep 4 16:44:21 proxmox1 corosync[648411]: [TOTEM ] A processor failed, forming new configuration.

This indicates that this node can't communicate to another node in the cluster

Which node is in question? I do not see any hostname or IP associated with the message. If I had to guess, I'd bet it's node 8, which does seem to have issues with quorum. I had to run pvecm e 1 just to be able to SSH into the box, since something is wrong.
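pvecm e 1 lowers the cluster's expected votes to 1, and simple-majority arithmetic shows why that lets a single node regain quorum. The sketch below assumes one vote per node and quorum = floor(expected / 2) + 1; quorum and is_quorate are hypothetical helpers, and cman's real rules (two-node mode, per-node vote weights) are not modeled.

```python
def quorum(expected_votes: int) -> int:
    """Simple-majority quorum: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(present_votes: int, expected_votes: int) -> bool:
    """True if the votes this node can see reach the quorum threshold."""
    return present_votes >= quorum(expected_votes)


if __name__ == "__main__":
    # Eight nodes with one vote each, as in the membership lists above.
    print(quorum(8))             # 5
    # A node that can reach only 3 peers sees 4 votes: not quorate.
    print(is_quorate(4, 8))      # False
    # After "pvecm e 1", expected votes is 1, so one node alone is quorate.
    print(quorum(1), is_quorate(1, 1))  # 1 True
```

This also suggests why lowering expected votes is best treated as a temporary workaround: with expected votes at 1, that node no longer needs agreement from any peer.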



Originally Posted by Chris Rivera
Sep 4 16:47:04 proxmox1 shutdown[649841]: shutting down for system reboot

Someone manually restarted the node?

This node was not manually rebooted. There was only one session logged in to the server, which was mine, and I did not issue a reboot command. The shell history is still on the server and was not cleared, and no history entries show a reboot or any related command. It rebooted by itself, or some process issued the command.

How can we track down which process, task, or application issued the reboot command? I need to find out what caused this node to reboot so I can make sure it does not happen again.
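One way to start narrowing this down (a sketch, not a definitive method): the "shutdown[649841]" tag records the PID of the shutdown program itself, not of whatever invoked it, so it helps to list everything logged in the minutes before that line and look for the daemon or job that acted just before it. The helpers below (parse_stamp, lines_before_shutdown) are hypothetical; the log path /var/log/syslog, the 5-minute window, and the year 2012 (syslog timestamps carry no year) are all assumptions.

```python
import sys
from datetime import datetime, timedelta

# Assumption: syslog lines look like "Sep 4 16:47:04 host prog[pid]: msg"
# and carry no year, so one is supplied; it only matters for leap days.
YEAR = 2012


def parse_stamp(line: str, year: int = YEAR):
    """Return the datetime of a syslog line, or None if it has no timestamp."""
    parts = line.split()
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(f"{year} {parts[0]} {parts[1]} {parts[2]}",
                                 "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_before_shutdown(lines, minutes=5, year=YEAR):
    """For each 'shutdown[...]' line, return it with the lines logged in the
    preceding window (inclusive of the shutdown line's own timestamp)."""
    parsed = [(parse_stamp(l, year), l.rstrip("\n")) for l in lines]
    results = []
    for stamp, line in parsed:
        if stamp is None or " shutdown[" not in line:
            continue
        start = stamp - timedelta(minutes=minutes)
        window = [l for s, l in parsed if s is not None and start <= s <= stamp]
        results.append((line, window))
    return results


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    try:
        with open(path, errors="replace") as fh:
            results = lines_before_shutdown(fh)
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        results = []
    for shutdown_line, window in results:
        print("==", shutdown_line)
        for l in window:
            print("  ", l)
```

Rotated logs (for example syslog.1) would need to be passed in separately; the window size is a judgment call and can be widened if nothing obvious appears.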


 
