Unmounting and watchdog problem

Jan 18, 2017
Hello, I have 3 servers in a cluster.
The configuration of these 3 servers (firewall, NFS shares to my Synology, etc.) is the same.
When I reboot server 1 or server 2, there are no problems.
When I reboot server 3, it reports a problem unmounting the shares (see attachment) and hangs on the watchdog error. I have to do a hard reset to reboot.

Does anyone have an idea what could help?
Thanks in advance.
Bart
 

Attachment: unmounting failes.png
Yes, I have read a lot about it.
I know the watchdog line itself is not a problem, but no reboot after 20 minutes is ;)

Every server has the same software configuration, so how can I investigate why it didn't restart?
And do you have an idea why the unmounting doesn't work?
 
I unmounted all NFS storage on server 3 and rebooted.
Again it reported the unmounting problem (even though I hadn't mounted these shares...).
That's strange, isn't it?
 

Attachment: storageunmounting.png
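One way to check the "but I didn't mount these shares" observation is to list what is actually mounted right before rebooting. Proxmox normally activates (and so re-mounts) every enabled NFS storage from /etc/pve/storage.cfg on its own, so shares can reappear after a manual umount. A minimal sketch, assuming a standard /proc/mounts; the function name list_nfs_mounts is made up for illustration:

```shell
#!/bin/sh
# Sketch: print the mount points of all NFS filesystems.
# The optional argument is a /proc/mounts-style file (default: /proc/mounts),
# which makes it easy to try the function against a saved copy.
list_nfs_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Field 3 is the filesystem type, field 2 the mount point.
    # Match nfs and nfs4 exactly so the unrelated "nfsd" pseudo-fs is skipped.
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "$mounts_file"
}

# Usage on the node, just before rebooting:
#   list_nfs_mounts
```

If paths under /mnt/pve/ show up here even after a manual umount, the storage layer has mounted them again in the meantime.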
Please provide:
  • pveversion -v
  • /etc/pve/storage.cfg
  • /etc/fstab
and enable the persistent journal ("mkdir /var/log/journal; systemctl restart systemd-journald"). After rebooting, you can collect the log of the last shutdown with "journalctl -b-1 --since 'TIMEOFSHUTDOWN'" and include it as well.
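The persistent-journal step above can be sketched as a small script. This is only an illustration: with journald's default Storage=auto setting, logs are kept across reboots once /var/log/journal exists. The function name and its optional directory argument are assumptions added so the sketch can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: enable a persistent systemd journal. Run as root on the node.
enable_persistent_journal() {
    # Default to the standard location; any other path is only for dry runs.
    journal_dir="${1:-/var/log/journal}"
    mkdir -p "$journal_dir" || return 1
    # Restart journald only when the real journal directory was touched.
    if [ "$journal_dir" = "/var/log/journal" ]; then
        systemctl restart systemd-journald
    fi
}

# Usage on the node:
#   enable_persistent_journal
# After the next reboot, "journalctl --list-boots" should list more than one boot.
```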
 
root@server3:~# pveversion -v
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-25
pve-kernel-4.10.17-2-pve: 4.10.17-20
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
openvswitch-switch: 2.7.0-2


# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pve/root / ext4 errors=remount-ro 0 1
UUID=2969-9E6F /boot/efi vfat defaults 0 1
/dev/pve/swap none swap sw 0 0
proc /proc proc defaults 0 0



dir: local
path /var/lib/vz
content vztmpl,backup,iso

dir: extra-disk
path /mnt/2tbschijf
content iso,images,backup
maxfiles 3
nodes server111,server136
shared 0

nfs: NAS2-Eikelkamp
export /volume2/dcbackup
path /mnt/pve/NAS2-XXX
server XXXX
content iso,images,backup
maxfiles 5
nodes server3,server111,server136
options vers=3

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

nfs: ServersSATA
export /volume1/ServersSATA
path /mnt/pve/ServersSATA
server 172.16.1.252
content images
maxfiles 1
nodes server136,server111,server3
options vers=3

nfs: ServersSSD
export /volume2/ServersSSD
path /mnt/pve/ServersSSD
server 172.16.1.252
content images
maxfiles 1
nodes server136,server111,server3
options vers=3

nfs: BackupSATA
export /volume1/BackupSATA
path /mnt/pve/BackupSATA
server 172.16.1.252
content iso,backup,images
maxfiles 3
nodes server111,server3,server136
options vers=3

nfs: ExtraDisk-Server136
disable
export /mnt/2tbschijf
path /mnt/pve/ExtraDisk-Server136
server 172.16.1.2
content images,backup,iso
maxfiles 3
nodes server111
options vers=3

nfs: ExtraDisk-Server111
disable
export /mnt/2tbschijf
path /mnt/pve/ExtraDisk-Server111
server 172.16.1.1
content iso,images,backup
maxfiles 3
nodes server136
options vers=3

I ran: mkdir /var/log/journal; systemctl restart systemd-journald
Then I rebooted.
3 mounts OK this time.
1 mount: unmounting FAILED (1 min 40 s)
Watchdog for about 2 minutes, I think, then it rebooted automatically.

Rebooted again:
2 mounts OK
2 mounts: unmounting FAILED (1 min 40 s)
Watchdog for about 5 minutes, I think, then it rebooted automatically.

When I run the command:
root@server3:~# journalctl -b-1 --since 'TIMEOFSHUTDOWN
I get this prompt:
>

I have 2 files in the journal directory, do you want them?

16 MB root:systemd-journal 0640 2017/10/27 - 11:50:32
8 MB root:systemd-journal 0640 2017/10/27 - 11:41:27
 
You need to replace "TIMEOFSHUTDOWN" with the actual time when you issued the reboot/shutdown ;)
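As a side note, the ">" prompt shown earlier is just the shell waiting for the closing single quote. A rough sketch of the collection step follows; the function name is made up, and the timestamp in the usage comment is a hypothetical value, not the real shutdown time:

```shell
#!/bin/sh
# Sketch: dump the journal of the previous boot (-b -1) starting at the
# time the reboot was issued. Run as root on the node.
collect_shutdown_log() {
    since="$1"
    case "$since" in
        ""|TIMEOFSHUTDOWN)
            # Catch the literal placeholder or a missing argument.
            echo "usage: collect_shutdown_log 'YYYY-MM-DD HH:MM:SS'" >&2
            return 2
            ;;
    esac
    journalctl -b -1 --since "$since" --no-pager
}

# Usage (hypothetical time, replace with your own):
#   collect_shutdown_log "2017-10-27 11:40:00" > shutdown.log
```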
 
Do you use openvswitch? Maybe it gets killed too early in the shutdown, and then unmounting NFS is no longer possible?
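One possible way to look into this ordering question is to ask systemd what the mount unit of a share is ordered after. Units are stopped in the reverse of their start order, so a mount that is ordered After= the network service is unmounted before that service stops. This is only a sketch: the function names are made up, and the service name openvswitch-switch.service is an assumption based on the package name in the pveversion output.

```shell
#!/bin/sh
# Sketch: show the ordering dependencies of the systemd mount unit
# that belongs to a mount point such as /mnt/pve/ServersSATA.

# Translate a path into its mount unit name using systemd's own escaping.
mount_unit_name() {
    systemd-escape -p --suffix=mount "$1"
}

# Print the After= and Before= lists of that mount unit.
show_mount_ordering() {
    unit=$(mount_unit_name "$1") || return 1
    echo "== $unit"
    systemctl show -p After -p Before "$unit"
}

# Usage on the node, while the share is mounted:
#   show_mount_ordering /mnt/pve/ServersSATA
# Then check whether openvswitch-switch.service (assumed name) or a
# network target appears in the After= line.
```

Comparing this output between server 3 and one of the servers that reboots cleanly might show whether the ordering differs.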
 
Yes, I use openvswitch, on all three servers. Two of them have no problems.
The strange thing is that sometimes it shows the error "unmounting..." and 2 seconds later OK.
So if openvswitch were killed too early, it couldn't unmount 2 seconds later either, right?