[SOLVED] reboots hang with "watchdog did not stop"

sirsean12

Thanks for the input. I will file a bug report as soon as I get some downtime. I have updated to the latest version through the no-subscription repo and still have the same issue. In your experience, is it just better to do a full reinstall?

I really do think Proxmox has something good going on here, but the little random bugs make it a hard sell to my peers. I am more comfortable in a Linux shell; these guys are mostly Windows people. My major problems are below; I will open a separate post about each.

noVNC: horrible latency on slower networks (I know this is a general VNC issue). The workaround is to use SPICE, but if you use noVNC with the SPICE drivers over a remote VPN to the VM, there is no mouse cursor. This is a problem for deployments at customer sites. If it were not the case, we could set the video driver to SPICE every single time regardless of connection speed, use noVNC for convenience, and fall back to SPICE on latency problems without powering down the VM. As it stands, we have to decide this ahead of time, and when a connection turns out to be slow, for example due to ISP problems, we need to power down the VM and switch to SPICE. noVNC is too convenient not to be able to click and use, then switch to SPICE if need be. Also, it seems the SPICE people have a fix that adds Server 2016 to the installer, but they have not added it to the installer executable on the download site. This is just laziness.

Bonding problems: LACP is selected and the host was rebooted for it to take effect. The web GUI shows LACP, but cat /proc/net/bonding/bond0 shows "round robin", and I am getting errors on the switch and in Proxmox. I have no idea how to fix this without a reinstall.

Snapshots over iSCSI LVM: this needs to be implemented if possible.

Removing disks on iSCSI LVM without having to use the shell. (I don't have a problem with this personally, as it makes it harder to do something stupid if you are a noob, but others who like point and click in professional/enterprise environments do want such functionality.)

Also, I just want to thank everyone for the hard work and for helping me with my issues. I hope my suggestions help.

Hopefully I can get this all cleared up by Friday. I can live with everything for now except this round-robin issue.

Thank you!

-S
 

RobFantini

Re LACP: I did not see another thread, so I am replying here. What kind of switch do you use?

And post your /etc/network/interfaces file.

We use bond_xmit_hash_policy layer2+3 and Netgear layer 3 managed switches.

Code:
iface enp4s0f1 inet manual
iface enp4s0f0 inet manual
auto bond0
iface bond0 inet manual
       slaves enp4s0f0 enp4s0f1
       bond_miimon 100
       bond_mode 802.3ad
       bond_xmit_hash_policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 10.1.10.3
        netmask 255.255.255.0
        gateway 10.1.10.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
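
To check whether a configuration like the one above actually came up as LACP, the mode the kernel reports in /proc/net/bonding/bond0 can be compared against what was configured. The following is only a minimal sketch: the function names bond_mode and is_lacp are made up for illustration, and bond0 is assumed to be the bond interface name.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Proxmox).

# Print the bonding mode the kernel reports for a bond.
# Takes the path to the bonding status file; defaults to bond0.
bond_mode() {
    # The kernel writes a line such as
    # "Bonding Mode: IEEE 802.3ad Dynamic link aggregation"
    sed -n 's/^Bonding Mode: //p' "${1:-/proc/net/bonding/bond0}"
}

# Succeed only if the reported mode is 802.3ad (LACP).
is_lacp() {
    bond_mode "$1" | grep -q '802\.3ad'
}

# Only run the check on hosts that really have a bond0.
if [ -r /proc/net/bonding/bond0 ]; then
    if is_lacp; then
        echo "bond0 is running 802.3ad (LACP)"
    else
        echo "bond0 is NOT running LACP, kernel reports: $(bond_mode)"
    fi
fi
```

If this prints a round-robin mode while the interfaces file says 802.3ad, the bond options were most likely not applied when the interface was brought up.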
 

sirsean12

Hey Rob,
Thanks for the reply. I made a new thread with the info here:

https://forum.proxmox.com/threads/b...ed-to-take-effect-stuck-in-round-robin.38054/

Thanks!
 
Nov 20, 2017
23
0
1
37
Canada
Since the last Proxmox upgrade I have the same watchdog problem,
but only on one node of my cluster.
How can I force the watchdog to stop?

pveversion -v
proxmox-ve: 5.1-42 (running kernel: 4.13.13-6-pve)
pve-manager: 5.1-47 (running version: 5.1-47/97a08ab2)
pve-kernel-4.13: 5.1-42
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.4-1-pve: 4.13.4-26
ceph: 12.2.4-pve1
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-28
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-3
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-12
pve-cluster: 5.0-21
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-3
pve-xtermjs: 1.0-2
qemu-server: 5.0-23
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.6-pve1~bpo9
 

emmanuel

I experienced this on all 5 of our J1900 appliances. This did not happen before; I have been working around it since June. I tried downgrading from 5.2 to 5.0 to see if this was version specific, but it is still the same thing.
 

rordonez

Wanted to share our findings on this issue.
In our case: an HP DL160 G8 with a P420 RAID card.

The last message displayed was "watchdog did not stop".

It was the shutdown of the RAID card that caused the hang.
We had to:
1. Update the P420 firmware to 8.x.
2. Destroy the array that was on the disks.
3. Recreate the array.
4. Reinstall Proxmox.

After that, the shutdown/reboot hang went away.

pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.17-1-pve)
pve-manager: 5.2-1 (running version: 5.2-1/0fcd7879)
pve-kernel-4.15: 5.2-1
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-16
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-18
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-3
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-5
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-26
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9


hth

Rodrigo O
Xnet
 

goseph

@emmanuel: This has to do with a BIOS setting.
I changed the "OS" option in the BIOS from Windows 7 to Android, Linux, or Windows 8x, and the reboot and shutdown issues were gone. I was able to reproduce it.

Still having problems?
"Play" with these values:
nano /etc/systemd/system.conf
RuntimeWatchdogSec=
ShutdownWatchdogSec=
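
Before changing these values it can help to see which of them are actually set. In the stock file the lines are normally commented out, and a commented-out line only documents the built-in default. The following is a minimal sketch; the function name watchdog_settings is made up for illustration, and a reboot is assumed to be the way the changed file takes effect.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative).

# Print the watchdog settings that are really set, i.e. not commented out,
# in a systemd manager configuration file.
# Takes the path to the file; defaults to /etc/systemd/system.conf.
watchdog_settings() {
    grep -E '^[[:space:]]*(RuntimeWatchdogSec|ShutdownWatchdogSec)=' \
        "${1:-/etc/systemd/system.conf}"
}

# Only run the check on hosts that have the file.
if [ -r /etc/systemd/system.conf ]; then
    watchdog_settings || echo "no explicit watchdog settings, defaults apply"
fi
```

If nothing is printed, both values are still at their defaults and editing them is a real change rather than a no-op.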
 

leshch

Server HP DL360p Gen8

pveversion -v

proxmox-ve: 5.2-2 (running kernel: 4.15.17-1-pve)
pve-manager: 5.2-6 (running version: 5.2-6/bcd5f008)
pve-kernel-4.15: 5.2-1
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-37
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-29
pve-container: 2.0-24
pve-docs: 5.2-5
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-30
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9


When I restart, the machine does not actually restart. Instead it stops at a message saying "watchdog watchdog0: watchdog did not stop!" and simply sits there, doing nothing.

I have tried changing GRUB_CMDLINE_LINUX="" to GRUB_CMDLINE_LINUX="reboot=bios" and to GRUB_CMDLINE_LINUX="reboot=acpi".
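
One thing worth ruling out with a GRUB change like this is that the parameter never reached the running kernel. On Debian-based systems an edit to /etc/default/grub only takes effect after update-grub and a reboot. The following is a minimal sketch for checking that; the function name reboot_param is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative).

# Print the value of the reboot= parameter from a kernel command line.
# Takes the path to a file holding the command line; defaults to /proc/cmdline.
# Prints nothing if the parameter is absent.
reboot_param() {
    # Put each parameter on its own line, then keep only the reboot= value.
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^reboot=//p'
}

# Report what the currently running kernel was booted with.
if [ -r /proc/cmdline ]; then
    current=$(reboot_param)
    echo "reboot parameter in effect: ${current:-none}"
fi
```

If this reports "none" after the edit, the new command line was not written to the boot configuration, so the reboot= setting has not really been tested yet.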

I have also set RuntimeWatchdogSec and ShutdownWatchdogSec to 0 and to 20s.

I have also enabled and disabled ASR (Automatic Server Recovery) in the BIOS.

I also tried systemctl reboot, shutdown -r now, init 6 and various other suggestions I found online.

But no luck. The behavior does not change: the computer does not restart, it just stays there showing "watchdog watchdog0: watchdog did not stop!"

How can I disable the watchdog completely?
 

goseph

Try:
nano /etc/systemd/system.conf
RuntimeWatchdogSec=0
ShutdownWatchdogSec=0

But ask here if this is bad or not
 

leshch

In my case the system is stuck without any messages.
The value “0” should be the default, so it would not change anything anyway.
 

goseph

How about giving it a try?
And can you choose the right "OS" in the BIOS (Linux, Android, Windows 8x)? The "Windows 7" setting caused those problems for me.
 

leshch

I already tried it, no change.
What does "choose the right OS inside BIOS" mean?
I don't have such an option.
 

goseph

Sometimes you can choose an operating system in the BIOS, like Linux or Windows.
Please also do a BIOS and firmware update for all components, including RAID controllers.
 

leshch

Already done.
All I want is to simply turn off the watchdog or remove it.
 
