Unmounting and watchdog problem

Jan 18, 2017
Hello, I have 3 servers in a cluster.
The configuration of these 3 servers (firewall, NFS shares to my Synology, etc.) is the same.
When I reboot servers 1 and 2, there are no problems.
When I reboot server 3, it reports a problem unmounting the shares (see attachment) and hangs at the watchdog error. I have to do a hard reset to get it to reboot.

Does anyone have an idea what could help?
Thanks in advance,
Bart
 

Attachments

  • unmounting failes.png (32.6 KB)
Yes, I have read a lot about it.
I know the watchdog line itself is not a problem, but not rebooting after 20 minutes is ;)

Every server has the same software configuration, so how can I investigate why it didn't restart?
And do you have an idea why unmounting doesn't work?
 
I unmounted all the NFS storage for server 3 and rebooted.
Again it showed the unmounting problem (even though I hadn't mounted these shares...).
So that's strange, isn't it?
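To confirm whether any NFS share really is still mounted at the moment of the reboot, a minimal sketch that reads /proc/mounts, where the third field is the filesystem type ("nfs" for the vers=3 mounts used here, "nfs4" for NFSv4). The function name `mounted_nfs` is hypothetical; `findmnt -t nfs,nfs4` from util-linux gives the same view.

```shell
# Hypothetical helper: print the mount point of every NFS filesystem that is
# currently mounted. /proc/mounts has one line per mount:
#   device mountpoint fstype options dump pass
# NFSv3 mounts (options vers=3) show up as "nfs", NFSv4 mounts as "nfs4".
# Another file in the same format can be passed instead, e.g. for testing.
mounted_nfs() {
  awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}
```

Running `mounted_nfs` right before `reboot` shows which shares are mounted at that point; empty output means none are.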
 

Attachments

  • storageunmounting.png (11.6 KB)
Hmm, I rebooted again (without mounting) and it rebooted cleanly and quickly.
So the 'hang' must be caused by the mounting problem.
I will test the mounts one by one to find out which one is the problem :)
 
Please provide:
  • pveversion -v
  • /etc/pve/storage.cfg
  • /etc/fstab
and enable the persistent journal ("mkdir /var/log/journal; systemctl restart systemd-journald"). After rebooting, you can collect the log of the last shutdown with "journalctl -b-1 --since 'TIMEOFSHUTDOWN'" and include it as well.
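The mkdir step works because journald's default setting, Storage=auto, writes logs to disk only when /var/log/journal exists; otherwise they are kept under /run/log/journal and lost at reboot, so there would be no previous boot ("-b-1") to look at. A minimal sketch of that rule, assuming Storage= has not been changed in /etc/systemd/journald.conf (`journal_mode` is a hypothetical name):

```shell
# Hypothetical helper illustrating journald's default Storage=auto rule:
# logs are written to disk only if the journal directory exists, otherwise
# they stay in /run/log/journal and disappear at reboot.
# Assumes Storage= has not been overridden in /etc/systemd/journald.conf.
journal_mode() {
  dir=${1:-/var/log/journal}
  if [ -d "$dir" ]; then
    echo persistent
  else
    echo volatile
  fi
}
```

After the mkdir/restart and one reboot, `journal_mode` should print `persistent`, and `journalctl --list-boots` should list the earlier boot with offset -1.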
 
root@server3:~# pveversion -v
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-25
pve-kernel-4.10.17-2-pve: 4.10.17-20
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
openvswitch-switch: 2.7.0-2


/etc/fstab:
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pve/root / ext4 errors=remount-ro 0 1
UUID=2969-9E6F /boot/efi vfat defaults 0 1
/dev/pve/swap none swap sw 0 0
proc /proc proc defaults 0 0



/etc/pve/storage.cfg:

dir: local
	path /var/lib/vz
	content vztmpl,backup,iso

dir: extra-disk
	path /mnt/2tbschijf
	content iso,images,backup
	maxfiles 3
	nodes server111,server136
	shared 0

nfs: NAS2-Eikelkamp
	export /volume2/dcbackup
	path /mnt/pve/NAS2-XXX
	server XXXX
	content iso,images,backup
	maxfiles 5
	nodes server3,server111,server136
	options vers=3

lvmthin: local-lvm
	thinpool data
	vgname pve
	content images,rootdir

nfs: ServersSATA
	export /volume1/ServersSATA
	path /mnt/pve/ServersSATA
	server 172.16.1.252
	content images
	maxfiles 1
	nodes server136,server111,server3
	options vers=3

nfs: ServersSSD
	export /volume2/ServersSSD
	path /mnt/pve/ServersSSD
	server 172.16.1.252
	content images
	maxfiles 1
	nodes server136,server111,server3
	options vers=3

nfs: BackupSATA
	export /volume1/BackupSATA
	path /mnt/pve/BackupSATA
	server 172.16.1.252
	content iso,backup,images
	maxfiles 3
	nodes server111,server3,server136
	options vers=3

nfs: ExtraDisk-Server136
	disable
	export /mnt/2tbschijf
	path /mnt/pve/ExtraDisk-Server136
	server 172.16.1.2
	content images,backup,iso
	maxfiles 3
	nodes server111
	options vers=3

nfs: ExtraDisk-Server111
	disable
	export /mnt/2tbschijf
	path /mnt/pve/ExtraDisk-Server111
	server 172.16.1.1
	content iso,images,backup
	maxfiles 3
	nodes server136
	options vers=3
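To test the mounts one by one, it helps to know exactly which NFS storages server 3 is supposed to mount. A rough sketch that reads a storage.cfg-style file and prints the enabled NFS storages whose `nodes` list includes a given node (a storage without a `nodes` line applies to all nodes). `nfs_storages_for_node` is a hypothetical helper that only understands the keys used in this config; `pvesm status` on the node remains the authoritative view.

```shell
# Hypothetical helper: print "name path" for every enabled NFS storage in a
# storage.cfg-style file that applies to the given node. Only the keys used in
# this thread are handled (nodes, path, disable); no "nodes" line means all nodes.
nfs_storages_for_node() {
  awk -v node="$1" '
    # Emit the section collected so far if it matches.
    function flush() {
      if (type == "nfs" && !disabled &&
          (nodes == "" || index("," nodes ",", "," node ",") > 0))
        print name, path
    }
    # A header such as "nfs: ServersSATA" starts a new storage section.
    /^[a-z]+:[ \t]/ {
      flush()
      type = $1; sub(/:$/, "", type)
      name = $2; nodes = ""; path = ""; disabled = 0
      next
    }
    $1 == "nodes"   { nodes = $2 }
    $1 == "path"    { path = $2 }
    $1 == "disable" { disabled = 1 }
    END { flush() }
  ' "${2:-/etc/pve/storage.cfg}"
}
```

Against the configuration above, `nfs_storages_for_node server3` should list NAS2-Eikelkamp, ServersSATA, ServersSSD and BackupSATA (the two ExtraDisk entries are disabled and not assigned to server3). That is four NFS mounts, which fits the 3 + 1 and 2 + 2 counts from the shutdown reports.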

I did: mkdir /var/log/journal; systemctl restart systemd-journald
Then I rebooted.
3 mounts OK this time.
1 mount failed to unmount (1 min 40 s).
Watchdog for about 2 minutes, I think, then it rebooted automatically.

Rebooted again:
2 mounts OK.
2 mounts failed to unmount (1 min 40 s).
Watchdog for about 5 minutes, I think, then it rebooted automatically.
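Once the previous boot's journal is available, the same tally can be taken from the log instead of from the console. A minimal sketch, assuming systemd's usual mount-unit messages ("Unmounted /path.", "Failed unmounting /path.", "Timed out unmounting /path.") appear in the journal for the /mnt/pve mounts; `unmount_summary` is a hypothetical name.

```shell
# Hypothetical helper: tally unmount results for the /mnt/pve storages from a
# shutdown log read on stdin. Assumes systemd's mount-unit messages
# "Unmounted /path.", "Failed unmounting /path." and "Timed out unmounting /path.".
unmount_summary() {
  awk '
    /Unmounted \/mnt\/pve\// { ok++ }
    /(Failed|Timed out) unmounting \/mnt\/pve\// {
      failed++
      mp = $NF; sub(/\.$/, "", mp)   # drop the trailing "."
      print "failed: " mp
    }
    END { printf "ok=%d failed=%d\n", ok, failed }
  '
}
```

Usage: `journalctl -b -1 | unmount_summary` lists each failed mount point, followed by the counts.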

When I run the command:
root@server3:~# journalctl -b-1 --since 'TIMEOFSHUTDOWN
I get this prompt:
>

I have 2 files in the journal directory; do you want them?

16 MB root:systemd-journal 0640 2017/10/27 - 11:50:32
8 MB root:systemd-journal 0640 2017/10/27 - 11:41:27
 
You need to replace "TIMEOFSHUTDOWN" with the actual time at which you issued the reboot/shutdown ;)
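The ">" in the earlier post is the shell's continuation prompt: the closing quote after TIMEOFSHUTDOWN is missing, so the shell keeps waiting for the rest of the string. A small wrapper that takes the time as one quoted argument; `prev_boot_log` is a hypothetical name, and the timestamp in the usage line is made up.

```shell
# Hypothetical wrapper: show the previous boot's journal from a given time on.
# The time must reach journalctl as ONE argument, so it is quoted here; an
# unterminated quote on the command line makes the shell print ">" and wait.
prev_boot_log() {
  if [ $# -ne 1 ]; then
    echo "usage: prev_boot_log 'YYYY-MM-DD HH:MM:SS'" >&2
    return 2
  fi
  journalctl -b -1 --since "$1"
}
```

Usage: `prev_boot_log '2017-10-27 11:40:00'` (replace with the moment the reboot was issued). If the exact time is not known, `journalctl -b -1 -e` opens the previous boot's log at its end, where the shutdown messages are.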
 
Do you use openvswitch? Maybe it gets killed too early during shutdown, and then unmounting NFS is no longer possible?
 
Yes, I use openvswitch, on all three servers; 2 of them have no problems.
The strange thing is that sometimes it shows the error "unmounting..." and 2 seconds later OK.
So if it killed openvswitch too early, it couldn't unmount 2 seconds later either, right?
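One way to check the openvswitch theory is the order of events in the previous boot's log: if Open vSwitch was stopped before the first failed NFS unmount, the theory fits; if the unmount had already failed while it was still running, that points away from it. A rough sketch that reports which came first. `ovs_vs_unmount` is hypothetical, and matching "Stopped ... Open vSwitch" or "openvswitch" is an assumption about how the unit's stop message is worded on this system.

```shell
# Hypothetical helper: read a shutdown log on stdin and report whether an
# Open vSwitch "Stopped ..." line comes before the first failed unmount.
# The wording matched for the Open vSwitch unit is an assumption.
ovs_vs_unmount() {
  awk '
    tolower($0) ~ /stopped .*(open vswitch|openvswitch)/ && !ovs { ovs = NR }
    /(Failed|Timed out) unmounting/ && !umnt { umnt = NR }
    END {
      if (!ovs || !umnt) { print "not found"; exit }
      if (ovs < umnt) print "ovs-first"; else print "unmount-first"
    }
  '
}
```

Usage: `journalctl -b -1 | ovs_vs_unmount`. The result only shows ordering, not cause, so it is worth reading the surrounding lines as well.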
 
