I have a 4-node cluster hooked up to a SAN.
Today I got a notification email from node1 (master node):
/etc/cron.daily/logrotate:
Job for pveproxy.service failed. See 'systemctl status pveproxy.service' and 'journalctl -xn' for details.
Job for spiceproxy.service failed. See 'systemctl status spiceproxy.service' and 'journalctl -xn' for details.
error: error running shared postrotate script for '/var/log/pveproxy/access.log '
run-parts: /etc/cron.daily/logrotate exited with return code 1
So when I got in, I was unable to access the GUI through the master node. The GUI is accessible from the other 3 nodes, but it's unresponsive. VMs are still running and accessible (SSH), but I can't migrate them online to do a host reboot.
pveversion
pve-manager/4.2-4/2660193c (running kernel: 4.4.8-1-pve)
dmesg message
INFO: task pveproxy:9610 blocked for more than 120 seconds.
[1931640.280328] Tainted: P O 4.4.8-1-pve #1
[1931640.280359] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
I can't kill -9 the tasks, and any qm commands never finish (no error). What would be the safest way of fixing this? Log in to each VM, do a graceful shutdown from within the VM, and then reboot the host?
Please advise
Thanks