[SOLVED] DL360 Reboot every once in a while

b345

Feb 6, 2024
I have a cluster of 3 nodes: two DL360p servers and one Dell OptiPlex 9020 PC.
Every now and then, roughly twice a day, one of the DL360s just reboots. I'm not sure why, and I don't see anything useful in the logs. The drives and power supplies all look fine. Has anyone else experienced this issue?

PS: backupserver1 is my Proxmox Backup Server, which I don't keep turned on all the time; I only power it on when I'm doing a backup.

See the attached logs.
 


One of the nodes rebooted again. See the logs below, from just before the host initiated the reboot:



Feb 06 05:09:39 server2 corosync[1689]: [KNET ] link: host: 3 link: 0 is down
Feb 06 05:09:39 server2 corosync[1689]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 06 05:09:39 server2 corosync[1689]: [KNET ] host: host: 3 has no active links
Feb 06 05:09:39 server2 corosync[1689]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 06 05:09:39 server2 corosync[1689]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 06 05:09:39 server2 corosync[1689]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 06 05:17:01 server2 CRON[363953]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 06 05:17:01 server2 CRON[363954]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 06 05:17:01 server2 CRON[363953]: pam_unix(cron:session): session closed for user root
Feb 06 05:18:46 server2 corosync[1689]: [KNET ] link: host: 3 link: 0 is down
Feb 06 05:18:46 server2 corosync[1689]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 06 05:18:46 server2 corosync[1689]: [KNET ] host: host: 3 has no active links
Feb 06 05:18:48 server2 corosync[1689]: [KNET ] rx: host: 3 link: 0 is up
Feb 06 05:18:48 server2 corosync[1689]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 06 05:18:48 server2 corosync[1689]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 06 05:18:48 server2 corosync[1689]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 06 05:24:43 server2 pmxcfs[1606]: [dcdb] notice: data verification successful
Feb 06 05:27:56 server2 corosync[1689]: [TOTEM ] Retransmit List: 2849d
Feb 06 05:28:22 server2 corosync[1689]: [TOTEM ] Retransmit List: 2855f
Feb 06 05:31:40 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:31:46 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:31:46 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:31:50 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:31:50 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:31:50 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:31:51 server2 sshd[376951]: Accepted publickey for root from 10.10.12.2 port 36916 ssh2: RSA SHA256:xxxxxxxxxxxxxxxx
Feb 06 05:31:51 server2 sshd[376951]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Feb 06 05:31:51 server2 systemd-logind[1206]: New session 43 of user root.
Feb 06 05:31:51 server2 systemd[1]: Created slice user-0.slice - User Slice of UID 0.
Feb 06 05:31:51 server2 systemd[1]: Starting user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Feb 06 05:31:51 server2 systemd[1]: Finished user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Feb 06 05:31:51 server2 systemd[1]: Starting user@0.service - User Manager for UID 0...
Feb 06 05:31:51 server2 (systemd)[376955]: pam_unix(systemd-user:session): session opened for user root(uid=0) by (uid=0)
Feb 06 05:31:51 server2 systemd[376955]: Queued start job for default target default.target.
Feb 06 05:31:51 server2 systemd[376955]: Created slice app.slice - User Application Slice.
Feb 06 05:31:51 server2 systemd[376955]: Reached target paths.target - Paths.
Feb 06 05:31:51 server2 systemd[376955]: Reached target timers.target - Timers.
Feb 06 05:31:51 server2 systemd[376955]: Listening on dirmngr.socket - GnuPG network certificate management daemon.
Feb 06 05:31:51 server2 systemd[376955]: Listening on gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Feb 06 05:31:51 server2 systemd[376955]: Listening on gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Feb 06 05:31:51 server2 systemd[376955]: Listening on gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Feb 06 05:31:51 server2 systemd[376955]: Listening on gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Feb 06 05:31:51 server2 systemd[376955]: Reached target sockets.target - Sockets.
Feb 06 05:31:51 server2 systemd[376955]: Reached target basic.target - Basic System.
Feb 06 05:31:51 server2 systemd[376955]: Reached target default.target - Main User Target.
Feb 06 05:31:51 server2 systemd[376955]: Startup finished in 432ms.
Feb 06 05:31:51 server2 systemd[1]: Started user@0.service - User Manager for UID 0.
Feb 06 05:31:51 server2 systemd[1]: Started session-43.scope - Session 43 of User root.
Feb 06 05:31:51 server2 sshd[376951]: pam_env(sshd:session): deprecated reading of user environment enabled
Feb 06 05:31:51 server2 login[376975]: pam_unix(login:session): session opened for user root(uid=0) by root(uid=0)
Feb 06 05:31:51 server2 login[376980]: ROOT LOGIN on '/dev/pts/0' from '10.10.12.2'
Feb 06 05:32:05 server2 sshd[376951]: Received disconnect from 10.10.12.2 port 36916:11: disconnected by user
Feb 06 05:32:05 server2 sshd[376951]: Disconnected from user root 10.10.12.2 port 36916
Feb 06 05:32:05 server2 sshd[376951]: pam_unix(sshd:session): session closed for user root
Feb 06 05:32:05 server2 systemd-logind[1206]: Session 43 logged out. Waiting for processes to exit.
Feb 06 05:32:05 server2 systemd[1]: session-43.scope: Deactivated successfully.
Feb 06 05:32:05 server2 systemd-logind[1206]: Removed session 43.
Feb 06 05:32:05 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:32:05 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:32:05 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:32:09 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:32:15 server2 systemd[1]: Stopping user@0.service - User Manager for UID 0...
Feb 06 05:32:15 server2 systemd[376955]: Activating special unit exit.target...
Feb 06 05:32:15 server2 systemd[376955]: Stopped target default.target - Main User Target.
Feb 06 05:32:15 server2 systemd[376955]: Stopped target basic.target - Basic System.
Feb 06 05:32:15 server2 systemd[376955]: Stopped target paths.target - Paths.
Feb 06 05:32:15 server2 systemd[376955]: Stopped target sockets.target - Sockets.
Feb 06 05:32:15 server2 systemd[376955]: Stopped target timers.target - Timers.
Feb 06 05:32:15 server2 systemd[376955]: Closed dirmngr.socket - GnuPG network certificate management daemon.
Feb 06 05:32:15 server2 systemd[376955]: Closed gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Feb 06 05:32:15 server2 systemd[376955]: Closed gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Feb 06 05:32:15 server2 systemd[376955]: Closed gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Feb 06 05:32:15 server2 systemd[376955]: Closed gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Feb 06 05:32:15 server2 systemd[376955]: Removed slice app.slice - User Application Slice.
Feb 06 05:32:15 server2 systemd[376955]: Reached target shutdown.target - Shutdown.
Feb 06 05:32:15 server2 systemd[376955]: Finished systemd-exit.service - Exit the Session.
Feb 06 05:32:15 server2 systemd[376955]: Reached target exit.target - Exit the Session.
Feb 06 05:32:15 server2 systemd[1]: user@0.service: Deactivated successfully.
Feb 06 05:32:15 server2 systemd[1]: Stopped user@0.service - User Manager for UID 0.
Feb 06 05:32:15 server2 systemd[1]: Stopping user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Feb 06 05:32:15 server2 systemd[1]: run-user-0.mount: Deactivated successfully.
Feb 06 05:32:15 server2 systemd[1]: user-runtime-dir@0.service: Deactivated successfully.
Feb 06 05:32:15 server2 systemd[1]: Stopped user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Feb 06 05:32:15 server2 systemd[1]: Removed slice user-0.slice - User Slice of UID 0.
Feb 06 05:33:07 server2 postfix/qmgr[1673]: 8D8AC247A8: from=<root@server1.test.com>, size=32993, nrcpt=1 (queue active)
Feb 06 05:33:08 server2 postfix/smtp[378042]: 8D8AC247A8: replace: header From: vzdump backup tool <root@server1.test.com>: From: server2 info@test.com
Feb 06 05:33:08 server2 postfix/smtp[378042]: 8D8AC247A8: to=<info@test.com>, relay=smtp.gmail.com[142.250.105.108]:587, delay=4692, delays=4691/0.05/0.79/0.83, dsn=2.0.0, status=sent (250 2.0.0 OK 1707215588 l191-20020a8157c8000000b006040a5496adsm234439ywb.145 - gsmtp)
Feb 06 05:33:08 server2 postfix/qmgr[1673]: 8D8AC247A8: removed
Feb 06 05:34:01 server2 corosync[1689]: [TOTEM ] Retransmit List: 28e31
Feb 06 05:38:29 server2 kernel: perf: interrupt took too long (5026 > 4992), lowering kernel.perf_event_max_sample_rate to 39750
Feb 06 05:40:51 server2 corosync[1689]: [TOTEM ] Retransmit List: 298ce
Feb 06 05:46:06 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 05:55:02 server2 corosync[1689]: [TOTEM ] Retransmit List: 2aede
Feb 06 05:59:14 server2 corosync[1689]: [TOTEM ] Retransmit List: 2b56e
Feb 06 06:01:06 server2 pmxcfs[1606]: [status] notice: received log
Feb 06 06:03:47 server2 corosync[1689]: [TOTEM ] Retransmit List: 2bc87
Feb 06 06:07:36 server2 corosync[1689]: [TOTEM ] Retransmit List: 2c28e
Feb 06 06:09:26 server2 corosync[1689]: [TOTEM ] Retransmit List: 2c56f

-- Reboot --
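
For anyone reading a similar excerpt: the only recurring warnings before the reboot marker are corosync link flaps to host 3 and TOTEM retransmits, and no system shutdown sequence is logged before the marker. Below is a rough sketch, not an official Proxmox tool, that tallies those messages per boot from saved journal text. The function name `summarize` and the sample lines are just for illustration; the regular expressions assume the message wording shown in the excerpt above and may need adjusting for other corosync versions.

```python
import re
from collections import Counter

# Patterns based on the message wording in the excerpt above (an assumption:
# other corosync versions may format these lines slightly differently).
PATTERNS = {
    "link_down": re.compile(r"corosync\[\d+\]: \[KNET\s*\] link: host: \d+ link: \d+ is down"),
    "link_up": re.compile(r"corosync\[\d+\]: \[KNET\s*\] rx: host: \d+ link: \d+ is up"),
    "retransmit": re.compile(r"corosync\[\d+\]: \[TOTEM\s*\] Retransmit List:"),
}
REBOOT_MARKER = "-- Reboot --"


def summarize(lines):
    """Return one dict of event counts for each boot that ends in a reboot marker."""
    boots = []
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        if line == REBOOT_MARKER:
            # Close out the boot that just ended and start counting afresh.
            boots.append(dict(counts))
            counts = Counter()
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return boots


if __name__ == "__main__":
    # Four lines copied from the excerpt above, used as a tiny demo input.
    sample = [
        "Feb 06 05:18:46 server2 corosync[1689]: [KNET ] link: host: 3 link: 0 is down",
        "Feb 06 05:18:48 server2 corosync[1689]: [KNET ] rx: host: 3 link: 0 is up",
        "Feb 06 05:27:56 server2 corosync[1689]: [TOTEM ] Retransmit List: 2849d",
        "-- Reboot --",
    ]
    for number, boot in enumerate(summarize(sample), start=1):
        print(f"boot {number}: {boot}")
```

To run it on real data, save the journal output to a text file and pass `open(path)` to `summarize` instead of the sample list. A rising count of link-down and retransmit events before each reboot would point at the cluster network rather than at the drives or power supplies, though the counts alone do not prove the cause.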
 
It seems the issue was related to Ceph. I'm not sure what the actual fix was, but once I reverted to plain ZFS storage without Ceph, I have not had a single reboot in weeks.
 
