swordfishman
Background:
Three servers in a cluster with an iSCSI target. Multicast is enabled and verified working. The cluster was fine until recently, when the nightly backup jobs grew from ~250GB to ~500GB. Backups are sent over NFS to another server on an entirely different network. The guests are all still running, but management of the guests and servers appears to be completely broken.
From the look of things, the backups lock the files and then time out during the transfer at some point overnight. The next morning I log in to find all three servers listed as offline (red instead of green). I can no longer see the names of the guests, but I can see that they are still there: VMIDs 100 through 140.
pvecm n
Node  Sts  Inc  Joined               Name
   1   M   524  2012-06-07 08:01:07  svr1
   2   M   528  2012-06-07 08:01:08  svr2
   3   M   528  2012-06-07 08:01:08  svr3
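For anyone who wants to check this membership table from a script rather than by eye, here is a minimal Python sketch. It assumes the column layout shown above (node ID, status, incarnation, optional join timestamp, name) and that a status of M means the node is a joined member. The function names parse_nodes and non_members are made up for illustration and are not part of any Proxmox tool.

```python
# Sample text copied from the pvecm output in this post.
SAMPLE = """\
Node Sts Inc Joined Name
1 M 524 2012-06-07 08:01:07 svr1
2 M 528 2012-06-07 08:01:08 svr2
3 M 528 2012-06-07 08:01:08 svr3
"""


def parse_nodes(text):
    """Parse membership rows into dicts; header and blank lines are skipped."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        # A data row starts with a numeric node ID and has at least
        # ID, status, incarnation and name.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        nodes.append({
            "id": int(parts[0]),
            "status": parts[1],
            "incarnation": parts[2],
            # Assumption: the join timestamp may be missing, so take
            # whatever sits between the incarnation and the name.
            "joined": " ".join(parts[3:-1]),
            "name": parts[-1],
        })
    return nodes


def non_members(nodes):
    """Return the names of nodes whose status is not 'M' (assumed: member)."""
    return [n["name"] for n in nodes if n["status"] != "M"]


if __name__ == "__main__":
    nodes = parse_nodes(SAMPLE)
    print("nodes found:", [n["name"] for n in nodes])
    print("not joined:", non_members(nodes))
```

With the output above, every node reports M, so the cluster membership itself looks intact even though the web interface shows all three servers as offline.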
pveversion
pve-manager/2.1/f9b0f63a
/etc/init.d/cman stop
/etc/init.d/cman start - no errors!
/etc/init.d/pve-cluster restart - no errors!
On one of the servers I ran ps aux | grep vzdump and killed all of the matching processes. Now the web panel won't let me log in; it reports a failed login for root. Restarting Apache did nothing.
I blanked out the server names, but the guests are showing up as in the picture, so I cannot tell which is which!
Besides rebooting, what's the best way to get management back?