Proxmox locks up after a few hours

sij4153

New Member
Jun 30, 2022
Machine specs:
Dual Xeon X5670
24 GB of ECC RAM
ASUS Z8NA-D6 motherboard



syslog:
Jul 1 00:37:02 Odin pveproxy[1381]: worker 98206 finished
Jul 1 00:37:02 Odin pveproxy[1381]: starting 1 worker(s)
Jul 1 00:37:02 Odin pveproxy[1381]: worker 107472 started
Jul 1 00:50:22 Odin pveproxy[98205]: worker exit
Jul 1 00:50:22 Odin pveproxy[1381]: worker 98205 finished
Jul 1 00:50:22 Odin pveproxy[1381]: starting 1 worker(s)
Jul 1 00:50:22 Odin pveproxy[1381]: worker 110211 started
Jul 1 00:51:30 Odin pveproxy[98204]: worker exit
Jul 1 00:51:30 Odin pveproxy[1381]: worker 98204 finished
Jul 1 00:51:30 Odin pveproxy[1381]: starting 1 worker(s)
Jul 1 00:51:30 Odin pveproxy[1381]: worker 110478 started
Jul 1 00:52:41 Odin pvedaemon[1375]: <root@pam> successful auth for user 'root@pam'
Jul 1 01:17:01 Odin CRON[114943]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 1 02:17:01 Odin CRON[125806]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 1 02:18:29 Odin smartd[1014]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 63
Jul 1 02:18:29 Odin smartd[1014]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 37
Jul 1 02:34:55 Odin pvedaemon[1375]: <root@pam> successful auth for user 'root@pam'
Jul 1 02:48:29 Odin smartd[1014]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 63 to 64
Jul 1 02:48:29 Odin smartd[1014]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 37 to 36
Jul 1 03:05:57 Odin systemd[1]: Starting Daily apt download activities...
Jul 1 03:05:58 Odin systemd[1]: apt-daily.service: Succeeded.
Jul 1 03:05:58 Odin systemd[1]: Finished Daily apt download activities.
Jul 1 03:08:24 Odin systemd[1]: session-1.scope: Succeeded.
Jul 1 03:08:34 Odin systemd[1]: Stopping User Manager for UID 0...
Jul 1 03:08:34 Odin systemd[2181]: Stopped target Main User Target.
Jul 1 03:08:34 Odin systemd[2181]: Stopped target Basic System.
Jul 1 03:08:34 Odin systemd[2181]: Stopped target Paths.
Jul 1 03:08:34 Odin systemd[2181]: Stopped target Sockets.
Jul 1 03:08:34 Odin systemd[2181]: Stopped target Timers.
Jul 1 03:08:34 Odin systemd[2181]: dirmngr.socket: Succeeded.
Jul 1 03:08:34 Odin systemd[2181]: Closed GnuPG network certificate management daemon.
Jul 1 03:08:34 Odin systemd[2181]: gpg-agent-browser.socket: Succeeded.
Jul 1 03:08:34 Odin systemd[2181]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jul 1 03:08:34 Odin systemd[2181]: gpg-agent-extra.socket: Succeeded.
Jul 1 03:08:34 Odin systemd[2181]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Jul 1 03:08:34 Odin systemd[2181]: gpg-agent-ssh.socket: Succeeded.
Jul 1 03:08:34 Odin systemd[2181]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Jul 1 03:08:34 Odin systemd[2181]: gpg-agent.socket: Succeeded.
Jul 1 03:08:34 Odin systemd[2181]: Closed GnuPG cryptographic agent and passphrase cache.
Jul 1 03:08:34 Odin systemd[2181]: Removed slice User Application Slice.
Jul 1 03:08:34 Odin systemd[2181]: Reached target Shutdown.
Jul 1 03:08:34 Odin systemd[2181]: systemd-exit.service: Succeeded.
Jul 1 03:08:34 Odin systemd[2181]: Finished Exit the Session.
Jul 1 03:08:34 Odin systemd[2181]: Reached target Exit the Session.
Jul 1 03:08:34 Odin systemd[1]: user@0.service: Succeeded.
Jul 1 03:08:34 Odin systemd[1]: Stopped User Manager for UID 0.
Jul 1 03:08:34 Odin systemd[1]: Stopping User Runtime Directory /run/user/0...
Jul 1 03:08:34 Odin systemd[1]: run-user-0.mount: Succeeded.
Jul 1 03:08:34 Odin systemd[1]: user-runtime-dir@0.service: Succeeded.
Jul 1 03:08:34 Odin systemd[1]: Stopped User Runtime Directory /run/user/0.
Jul 1 03:08:34 Odin systemd[1]: Removed slice User Slice of UID 0.
Jul 1 03:10:01 Odin CRON[136618]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Jul 1 03:17:01 Odin CRON[137857]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
 
I had a similar issue a few hours ago:

Jul 29 07:14:00 node2 systemd[1]: Started Proxmox VE replication runner.
Jul 29 07:15:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Jul 29 07:15:00 node2 systemd[1]: Started Proxmox VE replication runner.
Jul 29 07:15:01 node2 CRON[382262]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 29 07:16:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Jul 29 07:16:00 node2 systemd[1]: Started Proxmox VE replication runner.
Jul 29 07:17:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Jul 29 07:17:00 node2 systemd[1]: Started Proxmox VE replication runner.
Jul 29 07:17:01 node2 CRON[382676]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@$

Notice the same minute and second in the CRON log line.
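
To compare what each node logged last before it froze, something like the following Python sketch might help. It assumes the freeze shows up in the syslog as a run of NUL bytes (the ^@ characters above) and returns the last non-empty line before that run. The function name and the sample path are my own choices for illustration, not part of any Proxmox tooling.

```python
from typing import Optional

NUL = "\x00"  # shown as ^@ by many pagers and editors


def last_line_before_nuls(log_text: str) -> Optional[str]:
    """Return the last non-empty line before the first line containing NUL bytes.

    Returns None if the text contains no NUL bytes, or if no non-empty
    line precedes them.
    """
    previous = None
    for line in log_text.splitlines():
        if NUL in line:
            return previous
        if line.strip():
            previous = line
    return None


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever the syslog copy is stored.
    with open("/var/log/syslog", "r", errors="replace") as handle:
        print(last_line_before_nuls(handle.read()))
```

Running it against a copy of the syslog from each node gives one line per node, which makes it easier to check whether they really stopped at the same entry.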

In my case I have 3 nodes that ran for 5 years without problems, and this morning all 3 froze, with the same line in the log.
PVE v5.4; updates were applied a few months ago, so there have been no configuration changes in the last few days.

cron.hourly is empty, and when I manually run "cd / && run-parts --report /etc/cron.hourly", nothing is returned and no error is displayed.

Any ideas?