Hello,
We have a two-node (master/slave) cluster using DRBD for the image share. This morning I was greeted by frantic employees saying the websites were down. I tried to SSH into our VMs to see what was wrong, but I couldn't reach the images. I then tried to SSH into the master node, and the session was denied. I could ping the cluster nodes but could not open an SSH session to them.
After driving to the data center, I was able to log in to the cluster nodes via crash cart, but there was no activity on them. A reboot and a restart of the images got everything back online, but I am now trying to figure out exactly what happened.
Checking the syslog, there were normal entries, and then the log just stopped and picked up again with the boot messages from the reboot.
I can't find any kind of Proxmox error log, so if anybody has ideas on what I can use to troubleshoot this event and prevent it from happening again, it would be greatly appreciated!
Edit: Using bare metal installations of PVE 1.7
These are the last syslog entries before the hang, followed by the first entry after the reboot:
Mar 29 21:35:03 server1 proxwww[32401]: Starting new child 32401
Mar 29 21:35:04 server1 pvemirror[2977]: starting cluster syncronization
Mar 29 21:35:04 server1 pvemirror[2977]: syncing templates
Mar 29 21:35:04 server1 pvemirror[2977]: cluster syncronization finished (0.08 seconds (files 0.00, config 0.00))
Mar 29 21:35:18 server1 proxwww[32403]: Starting new child 32403
Mar 29 21:35:33 server1 proxwww[32410]: Starting new child 32410
Mar 30 07:38:07 server1 kernel: imklog 3.18.6, log source = /proc/kmsg started.
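To pin down exactly when a node went silent, it can help to scan the syslog for large gaps between consecutive entries. The script below is a minimal sketch, not a Proxmox tool: it assumes the classic syslog timestamp format shown above ("Mar 29 21:35:33 host ..."), and because syslog does not record the year, the year (2011 here) and the 30-minute threshold are hypothetical defaults you would adjust. It needs Python 3, so you may want to copy the log to another machine before running it.

```python
import re
import sys
from datetime import datetime, timedelta

# Classic syslog prefix: "Mon DD HH:MM:SS host ...". The year is NOT logged,
# so it has to be supplied (assumption: all lines fall in the same year).
SYSLOG_TS = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def find_gaps(lines, year=2011, threshold=timedelta(minutes=30)):
    """Return (line_before, line_after, gap) for every pair of consecutive
    timestamped lines that are further apart than `threshold`."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        m = SYSLOG_TS.match(line)
        if not m:
            continue  # skip continuation lines or anything without a timestamp
        ts = datetime.strptime("%d %s" % (year, m.group(1)), "%Y %b %d %H:%M:%S")
        if prev_ts is not None and ts - prev_ts > threshold:
            gaps.append((prev_line, line, ts - prev_ts))
        prev_ts, prev_line = ts, line
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_gaps.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as f:
        for before, after, gap in find_gaps(f.read().splitlines()):
            print("gap of %s" % gap)
            print("  last before: %s" % before)
            print("  first after: %s" % after)
```

Run against the excerpt above, it should report a single gap of about ten hours between the last proxwww entry on Mar 29 and the kernel's imklog startup message on Mar 30. Knowing the last timestamp before the silence lets you line it up with other sources, such as DRBD messages in the kernel log on the other node.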