Hi,
We're making heavy use of PVE in our research institute, and this week we had a bad surprise with the pvetunnel and pvemirror daemons.
Two nodes in our large cluster are having hardware problems and were shut down. pvemirror and pvetunnel started writing dozens of error messages per second to our logs, like the ones described in http://www.proxmox.com/forum/showthread.php?t=1422:
Sep 11 01:16:47 blade101 pvetunnel[3783]: trying to finish tunnel 23158
Over the weekend, the decently sized disks on every host in the cluster filled up and some services started to fail.
I know that the solution seems to be to restart the tunnel after removing the node from the cluster, but I wonder if the daemons could be made a bit smarter wrt logging, so that we don't hit this sort of problem the next time a node goes down.
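To illustrate the kind of behaviour I have in mind, here is a rough sketch in Python of suppressing repeated log messages. This is not PVE code and I don't know how the daemons log internally; the class name `RepeatSuppressor`, its `filter` method and the 60-second interval are all made up for the example. The idea is simply to let a given message through once per interval and count the repeats in between.

```python
import time


class RepeatSuppressor:
    """Hypothetical helper: pass each distinct message at most once per interval."""

    def __init__(self, interval=60.0, clock=time.monotonic):
        self.interval = interval      # seconds between two emissions of one message
        self.clock = clock            # injectable clock, handy for testing
        self._state = {}              # message -> (last_emit_time, suppressed_count)

    def filter(self, message):
        """Return the line to write to the log, or None if it should be dropped."""
        now = self.clock()
        entry = self._state.get(message)
        if entry is None:
            # First time we see this message: always log it.
            self._state[message] = (now, 0)
            return message
        last, suppressed = entry
        if now - last < self.interval:
            # Still inside the quiet period: count it and drop it.
            self._state[message] = (last, suppressed + 1)
            return None
        # Quiet period is over: log again and report how many were dropped.
        self._state[message] = (now, 0)
        if suppressed:
            return f"{message} (repeated {suppressed} more times)"
        return message
```

A daemon would pass every line through `filter` before handing it to syslog and skip the ones that come back as `None`. Messages that embed a changing number (such as the tunnel ID in the line above) would need that part stripped before being used as the key, otherwise every line looks new.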