[SOLVED] [Feature Req] - Hooks should print their name (and intent) recursively

verbunk

Member
Jun 14, 2023
Hi @Support,

Posting here for consideration. I was prepping to upgrade to PVE 9 by getting all the updates for 8 squared away and had a node stuck at roughly 'processing post install hooks for pve-manager'. Essentially the manager process was trying to bounce pve-ha-lrm, which hung. The culprit turned out to be `/usr/bin/perl /usr/share/lxc/hooks/lxc-pve-prestart-hook`, which was stuck and kept being re-triggered by one CT's HA settings (disabling them fixed it immediately), but in the console running apt upgrade I only saw a stuck job.

My feature request is for hook processing to be a bit more verbose. If there were output like:

Code:
* Processing hooks for pve-manager (4.0.7)
* * Checking condition A [ok]
* * Checking for executable [ok]
* * Restarting pve-ha-lrm [...]

Timeout `Restarting pve-ha-lrm` log written to /tmp/out.log

This removes the magic from the upgrade and gives a clear indication of what should happen, plus a pointer to a log when it doesn't. A slow march to be sure, but as the hooks get updated over time it would be a welcome change.
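
A minimal sketch of what such step-level reporting could look like, written in Python rather than the shell that real maintainer scripts use. The names `run_step` and `run_hooks`, the log file naming, and the default timeout are all hypothetical; nothing here reflects how pve-manager's hooks are actually implemented.

```python
import os
import subprocess
import tempfile


def run_step(label, cmd, timeout_s, depth=2):
    """Run one step, print an indented status line, return True on success.

    Hypothetical helper: the command's output goes to a temp log file whose
    path is printed only when the step times out or fails.
    """
    prefix = "* " * depth
    # delete=False keeps the log on disk so it can be inspected afterwards.
    with tempfile.NamedTemporaryFile(
        "w", prefix="hook-", suffix=".log", delete=False
    ) as log:
        try:
            proc = subprocess.run(
                cmd, stdout=log, stderr=subprocess.STDOUT, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child at this point.
            print(f"{prefix}{label} [timeout after {timeout_s}s]")
            print(f"Timeout `{label}` log written to {log.name}")
            return False
    if proc.returncode != 0:
        print(f"{prefix}{label} [failed, exit {proc.returncode}]")
        print(f"Error `{label}` log written to {log.name}")
        return False
    os.unlink(log.name)  # nothing went wrong, so the log is not needed
    print(f"{prefix}{label} [ok]")
    return True


def run_hooks(package, version, steps, timeout_s=60):
    """Print a header for the package, then run (label, cmd) steps in order."""
    print(f"* Processing hooks for {package} ({version})")
    for label, cmd in steps:
        if not run_step(label, cmd, timeout_s):
            return False  # stop at the first step that fails or times out
    return True
```

Calling `run_hooks("pve-manager", "8.4.11", [("Restarting pve-ha-lrm", ["systemctl", "restart", "pve-ha-lrm"])])` would print the header line, then either an `[ok]` line or a timeout line that names the step and the log path. The point of the design is that a successful run stays at one line per step, while a failure always names the step that hung.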
 
Are you talking about "Processing triggers for ..."? Usually all that is done there is reloading the corresponding services (for PVE packages), so you'd only get an extra line that gives you no extra information.
 
In the case I had Monday, pve-manager was stuck and I had to manually trace through `ps auxf` and then read the hierarchy of scripts calling scripts to realize it was waiting for pve-ha-lrm to reload. Then I had to diagnose why `systemctl restart pve-ha-lrm` was hanging by reading logs, eventually finding that one CT had a stuck mount into Ceph and was never going to complete.
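
The manual `ps auxf` trace described above can be partly automated. The following is a rough sketch, assuming a Linux host with procps `ps`; the function names `parse_ps`, `descendants`, and `format_tree` are made up for illustration. It lists everything running underneath a given PID (for example the hanging `apt` or `dpkg` process) together with each process's elapsed time, which makes the step that is blocking easier to spot.

```python
import subprocess
from collections import defaultdict


def parse_ps(text):
    """Parse lines of 'PID PPID ELAPSED ARGS' into (pid, ppid, etime, args)."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        args = parts[3] if len(parts) > 3 else ""
        rows.append((int(parts[0]), int(parts[1]), parts[2], args))
    return rows


def descendants(rows, root_pid):
    """Return (depth, row) pairs for every process below root_pid, depth-first."""
    children = defaultdict(list)
    for row in rows:
        children[row[1]].append(row)
    out = []

    def walk(pid, depth):
        for row in sorted(children.get(pid, [])):
            if row[0] == pid:
                continue  # guard against a process listed as its own parent
            out.append((depth, row))
            walk(row[0], depth + 1)

    walk(root_pid, 1)
    return out


def format_tree(rows, root_pid):
    """Render the descendants of root_pid as indented text lines."""
    return [
        f"{'  ' * depth}{pid} [{etime}] {args}"
        for depth, (pid, _ppid, etime, args) in descendants(rows, root_pid)
    ]


def snapshot():
    """Collect the live process table; empty headers suppress the header line."""
    cmd = ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "etime=", "-o", "args="]
    return parse_ps(subprocess.run(cmd, capture_output=True, text=True).stdout)
```

Usage would be along the lines of `print("\n".join(format_tree(snapshot(), pid)))`, where `pid` is the process ID of the stuck upgrade. The deepest line with a long elapsed time is usually the command that everything else is waiting on.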

`gives you no extra information`: well, it's the support / documentation team's task to make sure the info is relevant. :D I'm not asking for fluff, just an indented line that says e.g. 'restarting pve-ha-lrm' under pve-manager, and ideally a line that gives clues on a timeout (or error) about which resource is causing the problem.

This request isn't for anything tangible; it's more about placing greater importance on empowering admins to diagnose problems through targeted messaging.
 
Yes, but the issue is that most package upgrades do all sorts of things in their maintainer scripts (and/or triggers). If they all started logging all of that, the resulting output would be overwhelming. For that reason, the convention is to only print warnings and errors, not informational output.
 
I get it. It's tough to know when it's too much. I'd still argue that it's too little at the moment. Case in point, I'm upgrading now and

Code:
Setting up pve-manager (8.4.11) ...

Progress: [ 76%] #####...

has been stuck for 5+ minutes on two nodes. I have no idea what it's trying to do (there's no mention of which script is doing the 'setting up') or what dependency is failing.

No need to continue the conversation. It was just an observation and a suggestion for a more organized approach to debugging.

Thanks for any consideration.