Constant "Stop job running for LSB" on reboot

kroem

Well-Known Member
Jul 12, 2016
I have some issues where I cannot perform a graceful reboot of my PVE 4.3 node.

Sometimes I get "Stop job running for PVE Manager", but that is because of stuck VMs. What is more annoying is the "stop job running for LSB", which has no time limit. I have to hard-reboot the node via IPMI to get around it.

I get some log messages if I leave it hanging, see screengrab. I would greatly appreciate any help; I'll gather whatever information is needed if I get some pointers :)

0KCCiRi.png
 
It looks to me like the system is unmounting the NFS mount before the VMs are stopped.
Are you using other network shares on the machine besides NFS?

To see whether the problem comes from NFS being stopped before the VMs, try the following:
shut down all the VMs on the host by calling

service pve-manager stop

Then try to reboot and see if the host hangs again.
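
The ordering hypothesis can also be checked before rebooting. On shutdown, systemd stops units in the reverse of their start order, so pve-manager.service is only guaranteed to stop before the NFS mount goes away if it is ordered after that mount (or after remote-fs.target). The sketch below is a rough check, not a definitive diagnosis. The helper name `is_ordered_after` is made up for illustration, and it only assumes that `systemctl show -p After` prints a single `After=` line of space-separated unit names.

```shell
#!/bin/sh
# is_ordered_after (hypothetical helper): succeed if UNIT is listed in an
# "After=..." line as printed by `systemctl show -p After <service>`.
# Only direct ordering is checked, not ordering inherited through other units.
is_ordered_after() {
    after_line=$1   # e.g. "After=network.target remote-fs.target"
    unit=$2         # e.g. "remote-fs.target"
    # Strip the "After=" prefix, then compare each unit name exactly.
    for name in ${after_line#After=}; do
        [ "$name" = "$unit" ] && return 0
    done
    return 1
}

# Example call on the PVE host:
#   is_ordered_after "$(systemctl show -p After pve-manager.service)" \
#       remote-fs.target && echo "ordered after remote-fs.target"
```

If the check fails, that does not prove the ordering is wrong, since the dependency may come in indirectly, but it is a hint worth following up.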
 
No, all of the VMs are currently hosted over NFS, shared from the same host actually.

So, I stopped pve-manager and rebooted. First it got a little stuck on PVE API Proxy Server, but not for the full timer. Then it went to NFS like this, waited out the full timer, and moved past it:

r9qAegk.png


and then "Reached target shutdown"

B0jZnpD.png



(These seem like "new" issues, but in reality the process it gets stuck on has differed a little before too: LSB / the unmounting of NFS)

Thank you in advance...
 
OK, usually NFS will not shut down properly because processes are still trying to access files on the mount point.

Try the following (if your customers can tolerate the downtime :)

* stop the pve-manager service
when it has finished:
* stop the nfs service

If NFS hangs, try to see what is using it:

fuser -uvm /path_to_my_nfs_mount
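
If it is not obvious which mount point to check, the same fuser call can be run over every NFS mount on the host. A minimal sketch, assuming the usual Linux /proc/mounts layout (mount point in field 2, filesystem type in field 3); the function name `list_nfs_mounts` is invented for this example:

```shell
#!/bin/sh
# list_nfs_mounts (hypothetical helper): print the mount point of every
# entry whose filesystem type is exactly "nfs" or "nfs4". The exact match
# skips the "nfsd" pseudo-filesystem that an NFS server host also mounts.
# Reads /proc/mounts by default; another file can be passed for testing.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Example (run as root so fuser can see all processes):
#   list_nfs_mounts | while read -r mp; do fuser -uvm "$mp"; done
```

Mount points containing spaces show up escaped in /proc/mounts, so this simple loop is only meant for plain paths.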
 
This is not in production. Well, yes, Sonos does not work at home when it's down, but I can live with that ;)

root@cat:~# service pve-manager status
* pve-manager.service - PVE VM Manager
Loaded: loaded (/lib/systemd/system/pve-manager.service; enabled)
Active: inactive (dead) since Thu 2016-10-13 16:40:55 CEST; 5min ago
Process: 8040 ExecStop=/usr/bin/pvesh --nooutput create /nodes/localhost/stopall (code=exited, status=0/SUCCESS)
Process: 8036 ExecStop=/usr/bin/vzdump -stop (code=exited, status=0/SUCCESS)
Process: 4697 ExecStart=/usr/bin/pvesh --nooutput create /nodes/localhost/startall (code=exited, status=0/SUCCESS)
Main PID: 4697 (code=exited, status=0/SUCCESS)

Oct 13 16:24:30 cat pve-manager[4699]: <root@pam> starting task UPID:cat:000013E5:000012D3:57FF991E:qmstart:112:root@pam:
Oct 13 16:24:30 cat pve-manager[5093]: start VM 112: UPID:cat:000013E5:000012D3:57FF991E:qmstart:112:root@pam:
Oct 13 16:24:35 cat pve-manager[4699]: <root@pam> starting task UPID:cat:0000142A:000014C9:57FF9923:vzstart:115:root@pam:
Oct 13 16:24:35 cat pve-manager[5162]: starting CT 115: UPID:cat:0000142A:000014C9:57FF9923:vzstart:115:root@pam:
Oct 13 16:24:39 cat pve-manager[4697]: <root@pam> end task UPID:cat:0000125B:00000584:57FF98FC:startall::root@pam: OK
Oct 13 16:24:40 cat systemd[1]: Started PVE VM Manager.
Oct 13 16:40:54 cat systemd[1]: Stopping PVE VM Manager...
Oct 13 16:40:55 cat pve-manager[8040]: <root@pam> starting task UPID:cat:00001F6A:00019387:57FF9CF7:stopall::root@pam:
Oct 13 16:40:55 cat pve-manager[8040]: <root@pam> end task UPID:cat:00001F6A:00019387:57FF9CF7:stopall::root@pam: OK
Oct 13 16:40:55 cat systemd[1]: Stopped PVE VM Manager.

A whole lot still seem to be accessing it?

root@cat:~# fuser -uvm /mnt/pve/nfs_vol1
nfs_vol1/ nfs_vol1_VM/ nfs_vol1_backup/

The VM storage has nothing:
root@cat:~# fuser -uvm /mnt/pve/nfs_vol1_VM/
 USER PID ACCESS COMMAND
/mnt/pve/nfs_vol1_VM:
 root kernel mount (root)/mnt/pve/nfs_vol1_VM

(The others are too large, had to put em on Git)

 
Any ideas? :)
 
