[SOLVED] PBS stops responding on update

May 16, 2020
Antwerp, Belgium
commandline.be
I'm running the latest PBS as a stand-alone system and have run into an issue where the machine becomes unavailable.
Both console and network stop responding. Sometimes the console is sluggish for a while before it stops entirely.

My impression is that this happens after proxmox-backup-daily-update has run.
To test that, I disabled all of the timers below and then re-enabled them one at a time, 24 hours apart.

proxmox-backup-daily-update.timer
apt-daily-upgrade.timer
man-db.timer
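
For anyone who wants to repeat the test, here is a minimal sketch of stopping and disabling several timers in one go. The helper name `disable_timers` is made up for illustration; it only wraps `systemctl disable --now`, which stops a unit and keeps it from starting again at boot.

```shell
# Hypothetical helper: stop and disable every timer unit passed as an argument.
disable_timers() {
    for unit in "$@"; do
        # --now stops the unit immediately in addition to disabling it
        systemctl disable --now "$unit" || echo "could not disable $unit" >&2
    done
}

# Usage (as root), with the timers listed above:
# disable_timers proxmox-backup-daily-update.timer apt-daily-upgrade.timer man-db.timer
```

Re-enabling one timer later is the reverse: `systemctl enable --now <unit>`.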

Only proxmox-backup-daily-update.timer now remains disabled, and the system no longer appears to hang.
Since I'm only trying out PBS, I have not yet enabled a license for it.

Is it possible the hang is related, or did something else happen?

JL
 
Hi!
Could you post the syslog from the time of the issue (using journalctl)? Does the machine completely reboot/panic, or does it just lag for a few minutes?
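
A minimal sketch of how the relevant log window could be pulled with journalctl. The helper names `journal_window` and `last_before_hang` are hypothetical, and the second one assumes the journal is persistent (i.e. /var/log/journal exists); otherwise logs from before the reset are gone.

```shell
# Hypothetical helper: print journal entries between two timestamps.
# Any format journalctl accepts works, e.g. "03:00" for today at 03:00.
journal_window() {
    journalctl --since "$1" --until "$2" --no-pager
}

# Hypothetical helper: show the last lines logged during the previous boot,
# i.e. whatever was written just before the machine hung and was reset.
# Assumes a persistent journal; defaults to 50 lines.
last_before_hang() {
    journalctl --boot=-1 --lines="${1:-50}" --no-pager
}

# Usage:
# journal_window "03:00" "03:20"
# last_before_hang 100
```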
 

Hey,

Although it seemed related to the update timers, the system was found hung again this morning. It failed at 3:10, more or less the same time as before.

The machine simply stops responding to anything. The keyboard Num Lock LED still responds, but the console no longer displays anything, network traffic stops, and Ctrl-Alt-Del has no effect.

I've checked and triple-checked: there is nothing relevant in /var/log, whether in syslog, daemon.log, kern.log or error. Disabling the update-related timers did result in longer uptime.

The only current anomaly is e2scrub in /etc/cron.d, which does not make sense to run on a ZFS filesystem. I have now disabled the cron job for e2scrub as well as its service and timer. I've also disabled the fstrim service and timer.
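
A quick way to double-check that those units are really off is sketched below. The helper `unit_states` is hypothetical, and the unit names in the usage line (`e2scrub_all.timer`, `fstrim.timer`) are my assumption of what a Debian-based install ships; adjust them to whatever `systemctl list-timers --all` shows on your system.

```shell
# Hypothetical helper: print the enablement state of each unit passed in.
unit_states() {
    for unit in "$@"; do
        # is-enabled exits non-zero for disabled units, so ignore the status
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        echo "$unit: ${state:-unknown}"
    done
}

# Usage (unit names are assumptions, see above):
# unit_states e2scrub_all.timer fstrim.timer
```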

For context, the system stayed up for multiple days when a firewall rule goof left it unreachable. Otherwise it tends to fail about every 24 hours.

Although this seemingly invalidates the observation below, I'm trying out leaving e2scrub and fstrim disabled.

I now notice a repeating pattern: a backup job runs and finishes at 2:50 AM, and the system "reliably fails" at 3:10 AM. The only anomalous activity at that time seems to be e2scrub-related. There is no activity in any log after 3:10 AM.

I've now added

systemctl enable zfs-trim-weekly@zpool.timer --now
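
To confirm the timer is actually scheduled and to see the pool's trim state, something like the sketch below should do. `trim_status` is a hypothetical helper name, and "zpool" is simply the pool name taken from the command above; substitute your own.

```shell
# Hypothetical helper: show when the weekly trim timer fires next
# and the per-device trim status of the pool.
trim_status() {
    pool="${1:-zpool}"   # pool name; defaults to the one used above
    systemctl list-timers "zfs-trim-weekly@${pool}.timer" --no-pager
    zpool status -t "$pool"
}

# Usage:
# trim_status zpool
```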

Update: eventually I found that the last syslog entry was at 3:14 AM, and it was the proxmox-backup /api logging a 200 for a ticket URL.

br,

JL
 