[SOLVED] PVE Shutdown/Reboot takes more than 10 minutes

fpdragon · Feb 5, 2024

I am not sure where to start.
The system runs on my new refurbished HPE DL380 Gen9.

I hope you can help. Or is this just normal on server hardware?
Judging by the output, with these timeouts and watchdogs, it seems kind of strange.
Thanks.

UdoB · Feb 5, 2024

To find out which service takes a long time to start, you may run ~# systemd-analyze blame. See https://www.freedesktop.org/software/systemd/man/latest/systemd-analyze.html for other commands like "...time", "...critical-chain", ...

Logged errors since last boot should be shown by ~# journalctl -b 0 -p 3
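
The error list can get long, so it may help to count the lines per logging source first. A minimal sketch, assuming journalctl's default short output format (month, day, time, host, then "name[pid]:"); summarize_by_source is a hypothetical helper name, not part of systemd:

```shell
# Hypothetical helper: count journal lines per logging source.
# Assumes journalctl's default "short" format, where field 5 is
# "name[pid]:" or "kernel:".
summarize_by_source() {
  awk 'NF >= 5 {
         src = $5
         sub(/\[[0-9]+\]:$|:$/, "", src)   # strip "[pid]:" or trailing ":"
         count[src]++
       }
       END { for (s in count) print count[s], s }' | sort -rn
}

# Usage (run as root on the PVE host):
#   journalctl -b 0 -p 3 --no-pager | summarize_by_source
```

For a slow shutdown, the relevant messages were logged during the previous boot, so -b -1 instead of -b 0 may be more telling. That only works if the journal is stored persistently.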

fpdragon · Feb 5, 2024

Thanks Udo.

Code:

~# journalctl -b 0 -p 3
Feb 05 11:20:06 ProxHpDL380 kernel: [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
Feb 05 11:20:08 ProxHpDL380 kernel: i8042: Can't read CTR while initializing i8042
Feb 05 11:20:06 ProxHpDL380 systemd-modules-load[1293]: Failed to find module 'vfio_virqfd #not needed if on kernel 6.2 or newer'
Feb 05 11:20:14 ProxHpDL380 pmxcfs[2740]: [quorum] crit: quorum_initialize failed: 2
Feb 05 11:20:14 ProxHpDL380 pmxcfs[2740]: [quorum] crit: can't initialize service
Feb 05 11:20:14 ProxHpDL380 pmxcfs[2740]: [confdb] crit: cmap_initialize failed: 2
Feb 05 11:20:14 ProxHpDL380 pmxcfs[2740]: [confdb] crit: can't initialize service
Feb 05 11:20:14 ProxHpDL380 pmxcfs[2740]: [dcdb] crit: cpg_initialize failed: 2
Feb 05 11:20:14 ProxHpDL380 pmxcfs[2740]: [dcdb] crit: can't initialize service
Feb 05 11:20:14 ProxHpDL380 pmxcfs[2740]: [status] crit: cpg_initialize failed: 2
Feb 05 11:20:14 ProxHpDL380 pmxcfs[2740]: [status] crit: can't initialize service
Feb 05 11:20:18 ProxHpDL380 smartd[2420]: Device: /dev/sdf [SAT], 341 Currently unreadable (pending) sectors
Feb 05 11:20:18 ProxHpDL380 smartd[2420]: Device: /dev/sdf [SAT], 340 Offline uncorrectable sectors
Feb 05 11:20:28 ProxHpDL380 pve-guests[2994]: CT is locked (disk)

It seems that there are several things going on.

The firmware bug: from what I have read, this is hardware-specific and has to do with Linux wanting to take over some sensors. I guess it is non-critical.

The quorum thing... before I had this machine running standalone and the shutdown took the same time.

/dev/sdf: that was new to me. This is one of several disks that are passed through to a VM. The SMART data looks OK and the disk was working fine; at least I have not found an issue. Is the disk broken? Could this disk cause the long shutdown delay, and maybe other problems as well?

What do you say?

ubu · Feb 5, 2024

It seems like your disk has problems. This can certainly lead to long startup and shutdown times, so try replacing the disk /dev/sdf.
Unfortunately, not every error is caught by SMART, but try a SMART long test:
smartctl -t long /dev/sdf
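
The two smartd messages above usually correspond to ATA attributes 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable), so it can be worth reading those raw values directly. A rough sketch, assuming the usual ATA attribute table printed by smartctl -A with the raw value in column 10; bad_sector_counts is a hypothetical helper name, and some drives use different attribute IDs:

```shell
# Hypothetical helper: print name=raw_value for attributes 197 and 198.
# Assumes the ATA attribute table layout of `smartctl -A`:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
bad_sector_counts() {
  awk '$1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Usage (run as root on the PVE host):
#   smartctl -A /dev/sdf | bad_sector_counts
# After the long test has finished, its result is listed by:
#   smartctl -l selftest /dev/sdf
```

Non-zero raw values here are a common sign of a failing disk, even if the overall SMART health status still reports the drive as passing.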

fpdragon · Feb 5, 2024

I set /dev/sdf to "retired", rebuilt my storage pool, and removed the disk.
Now the shutdown takes only seconds.

Thanks a lot for the help.

Code:

~# journalctl -b 0 -p 3

was the solution

ubu · Feb 5, 2024
