Hi everyone,
Maybe someone here can help me: I have a Proxmox cluster with 3 nodes, and one of them (node02) has now rebooted unexpectedly for the second time in 3 months:
Code:
Jan 30 03:01:21 node02 sshd[2651334]: Accepted publickey for root from 192.168.100.10 port 59442 ssh2: RSA SHA256:htNllc9TyDIY0JCn3OAsmE7vIsgw/hwb1vD3DQGSMM4
Jan 30 03:01:21 node02 sshd[2651334]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Jan 30 03:01:21 node02 systemd-logind[1809]: New session 42818 of user root.
Jan 30 03:01:21 node02 systemd[1]: Started session-42818.scope - Session 42818 of User root.
Jan 30 03:01:21 node02 sshd[2651334]: pam_env(sshd:session): deprecated reading of user environment enabled
Jan 30 03:01:23 node02 sshd[2651334]: Received disconnect from 192.168.100.10 port 59442:11: disconnected by user
Jan 30 03:01:23 node02 sshd[2651334]: Disconnected from user root 192.168.100.10 port 59442
Jan 30 03:01:23 node02 sshd[2651334]: pam_unix(sshd:session): session closed for user root
Jan 30 03:01:23 node02 systemd[1]: session-42818.scope: Deactivated successfully.
Jan 30 03:01:23 node02 systemd-logind[1809]: Session 42818 logged out. Waiting for processes to exit.
Jan 30 03:01:23 node02 systemd-logind[1809]: Removed session 42818.
Jan 30 03:01:23 node02 sshd[2651394]: Accepted publickey for root from 192.168.100.10 port 59458 ssh2: RSA SHA256:htNllc9TyDIY0JCn3OAsmE7vIsgw/hwb1vD3DQGSMM4
Jan 30 03:01:23 node02 sshd[2651394]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Jan 30 03:01:23 node02 systemd-logind[1809]: New session 42819 of user root.
Jan 30 03:01:23 node02 systemd[1]: Started session-42819.scope - Session 42819 of User root.
Jan 30 03:01:23 node02 sshd[2651394]: pam_env(sshd:session): deprecated reading of user environment enabled
Jan 30 03:01:25 node02 sshd[2651394]: Received disconnect from 192.168.100.10 port 59458:11: disconnected by user
Jan 30 03:01:25 node02 sshd[2651394]: Disconnected from user root 192.168.100.10 port 59458
Jan 30 03:01:25 node02 sshd[2651394]: pam_unix(sshd:session): session closed for user root
Jan 30 03:01:25 node02 systemd[1]: session-42819.scope: Deactivated successfully.
Jan 30 03:01:25 node02 systemd[1]: session-42819.scope: Consumed 1.189s CPU time.
Jan 30 03:01:25 node02 systemd-logind[1809]: Session 42819 logged out. Waiting for processes to exit.
Jan 30 03:01:25 node02 systemd-logind[1809]: Removed session 42819.
Jan 30 03:01:35 node02 systemd[1]: Stopping user@0.service - User Manager for UID 0...
Jan 30 03:01:35 node02 systemd[2651017]: Activating special unit exit.target...
Jan 30 03:01:35 node02 systemd[2651017]: Stopped target default.target - Main User Target.
Jan 30 03:01:35 node02 systemd[2651017]: Stopped target basic.target - Basic System.
Jan 30 03:01:35 node02 systemd[2651017]: Stopped target paths.target - Paths.
Jan 30 03:01:35 node02 systemd[2651017]: Stopped target sockets.target - Sockets.
Jan 30 03:01:35 node02 systemd[2651017]: Stopped target timers.target - Timers.
Jan 30 03:01:35 node02 systemd[2651017]: Closed dirmngr.socket - GnuPG network certificate management daemon.
Jan 30 03:01:35 node02 systemd[2651017]: Closed gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jan 30 03:01:35 node02 systemd[2651017]: Closed gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Jan 30 03:01:35 node02 systemd[2651017]: Closed gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Jan 30 03:01:35 node02 systemd[2651017]: Closed gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Jan 30 03:01:35 node02 systemd[2651017]: Removed slice app.slice - User Application Slice.
Jan 30 03:01:35 node02 systemd[2651017]: Reached target shutdown.target - Shutdown.
Jan 30 03:01:35 node02 systemd[2651017]: Finished systemd-exit.service - Exit the Session.
Jan 30 03:01:35 node02 systemd[2651017]: Reached target exit.target - Exit the Session.
Jan 30 03:01:35 node02 systemd[1]: user@0.service: Deactivated successfully.
Jan 30 03:01:35 node02 systemd[1]: Stopped user@0.service - User Manager for UID 0.
Jan 30 03:01:35 node02 systemd[1]: Stopping user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Jan 30 03:01:35 node02 systemd[1]: run-user-0.mount: Deactivated successfully.
Jan 30 03:01:35 node02 systemd[1]: user-runtime-dir@0.service: Deactivated successfully.
Jan 30 03:01:35 node02 systemd[1]: Stopped user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Jan 30 03:01:35 node02 systemd[1]: Removed slice user-0.slice - User Slice of UID 0.
Jan 30 03:01:35 node02 systemd[1]: user-0.slice: Consumed 11.111s CPU time.
Jan 30 03:02:02 node02 kernel: ixgbe 0000:05:00.1 enp5s0f1: NIC Link is Down
Jan 30 03:02:02 node02 kernel: vmbr1: port 1(enp5s0f1) entered disabled state
Jan 30 03:02:03 node02 corosync[2263]: [KNET ] link: host: 3 link: 0 is down
Jan 30 03:02:03 node02 corosync[2263]: [KNET ] link: host: 1 link: 0 is down
Jan 30 03:02:03 node02 corosync[2263]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jan 30 03:02:03 node02 corosync[2263]: [KNET ] host: host: 3 has no active links
Jan 30 03:02:03 node02 corosync[2263]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Jan 30 03:02:03 node02 corosync[2263]: [KNET ] host: host: 1 has no active links
Jan 30 03:02:04 node02 corosync[2263]: [TOTEM ] Token has not been received in 2737 ms
Jan 30 03:02:05 node02 corosync[2263]: [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Jan 30 03:02:09 node02 kernel: ixgbe 0000:05:00.1 enp5s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jan 30 03:02:09 node02 kernel: vmbr1: port 1(enp5s0f1) entered blocking state
Jan 30 03:02:09 node02 kernel: vmbr1: port 1(enp5s0f1) entered forwarding state
Jan 30 03:02:10 node02 corosync[2263]: [QUORUM] Sync members[1]: 2
Jan 30 03:02:10 node02 corosync[2263]: [QUORUM] Sync left[2]: 1 3
Jan 30 03:02:10 node02 corosync[2263]: [TOTEM ] A new membership (2.166) was formed. Members left: 1 3
Jan 30 03:02:10 node02 corosync[2263]: [TOTEM ] Failed to receive the leave message. failed: 1 3
Jan 30 03:02:10 node02 corosync[2263]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jan 30 03:02:10 node02 corosync[2263]: [QUORUM] Members[1]: 2
Jan 30 03:02:10 node02 pmxcfs[2167]: [dcdb] notice: members: 2/2167
Jan 30 03:02:10 node02 corosync[2263]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 30 03:02:10 node02 pmxcfs[2167]: [status] notice: node lost quorum
Jan 30 03:02:10 node02 pmxcfs[2167]: [status] notice: members: 2/2167
Jan 30 03:02:10 node02 pmxcfs[2167]: [dcdb] crit: received write while not quorate - trigger resync
Jan 30 03:02:10 node02 pmxcfs[2167]: [dcdb] crit: leaving CPG group
Jan 30 03:02:10 node02 pve-ha-lrm[2330]: lost lock 'ha_agent_node02_lock - cfs lock update failed - Operation not permitted
Jan 30 03:02:10 node02 pve-ha-lrm[2330]: status change active => lost_agent_lock
Jan 30 03:02:10 node02 pmxcfs[2167]: [dcdb] notice: start cluster connection
Jan 30 03:02:10 node02 pmxcfs[2167]: [dcdb] crit: cpg_join failed: 14
Jan 30 03:02:10 node02 pmxcfs[2167]: [dcdb] crit: can't initialize service
Jan 30 03:02:10 node02 pve-ha-crm[2318]: lost lock 'ha_manager_lock - cfs lock update failed - Device or resource busy
Jan 30 03:02:10 node02 pve-ha-crm[2318]: status change master => lost_manager_lock
Jan 30 03:02:10 node02 pve-ha-crm[2318]: watchdog closed (disabled)
Jan 30 03:02:10 node02 pve-ha-crm[2318]: status change lost_manager_lock => wait_for_quorum
Jan 30 03:02:16 node02 pmxcfs[2167]: [dcdb] notice: members: 2/2167
Jan 30 03:02:16 node02 pmxcfs[2167]: [dcdb] notice: all data is up to date
Jan 30 03:02:17 node02 kernel: ixgbe 0000:05:00.1 enp5s0f1: NIC Link is Down
Jan 30 03:02:17 node02 kernel: vmbr1: port 1(enp5s0f1) entered disabled state
Jan 30 03:02:30 node02 pvestatd[2283]: pbs01-node02: error fetching datastores - 500 Can't connect to +:8007 (Temporary failure in name resolution)
Jan 30 03:02:31 node02 pvestatd[2283]: status update time (20.278 seconds)
Jan 30 03:02:42 node02 kernel: ixgbe 0000:05:00.1 enp5s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jan 30 03:02:42 node02 kernel: vmbr1: port 1(enp5s0f1) entered blocking state
Jan 30 03:02:42 node02 kernel: vmbr1: port 1(enp5s0f1) entered forwarding state
Jan 30 03:02:51 node02 pvestatd[2283]: pbs01-node02: error fetching datastores - 500 Can't connect to +:8007 (Temporary failure in name resolution)
Jan 30 03:02:51 node02 pvestatd[2283]: status update time (20.282 seconds)
Jan 30 03:02:51 node02 corosync[2263]: [KNET ] rx: host: 3 link: 0 is up
Jan 30 03:02:51 node02 corosync[2263]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Jan 30 03:02:51 node02 corosync[2263]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jan 30 03:02:51 node02 corosync[2263]: [KNET ] pmtud: Global data MTU changed to: 1397
Jan 30 03:02:51 node02 corosync[2263]: [KNET ] link: Resetting MTU for link 0 because host 1 joined
Jan 30 03:02:51 node02 corosync[2263]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Jan 30 03:02:51 node02 corosync[2263]: [KNET ] pmtud: Global data MTU changed to: 1397
Jan 30 03:02:51 node02 corosync[2263]: [QUORUM] Sync members[3]: 1 2 3
Jan 30 03:02:51 node02 corosync[2263]: [QUORUM] Sync joined[2]: 1 3
Jan 30 03:02:51 node02 corosync[2263]: [TOTEM ] A new membership (1.16e) was formed. Members joined: 1 3
Jan 30 03:02:51 node02 pmxcfs[2167]: [dcdb] notice: members: 1/2502, 2/2167, 3/931
Jan 30 03:02:51 node02 pmxcfs[2167]: [dcdb] notice: starting data syncronisation
Jan 30 03:02:51 node02 pmxcfs[2167]: [status] notice: members: 1/2502, 2/2167, 3/931
Jan 30 03:02:51 node02 pmxcfs[2167]: [status] notice: starting data syncronisation
Jan 30 03:02:51 node02 corosync[2263]: [QUORUM] This node is within the primary component and will provide service.
Jan 30 03:02:51 node02 corosync[2263]: [QUORUM] Members[3]: 1 2 3
Jan 30 03:02:51 node02 pmxcfs[2167]: [status] notice: node has quorum
Jan 30 03:02:51 node02 corosync[2263]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 30 03:02:52 node02 pmxcfs[2167]: [dcdb] notice: received sync request (epoch 1/2502/00000006)
Jan 30 03:02:52 node02 pmxcfs[2167]: [status] notice: received sync request (epoch 1/2502/00000004)
Jan 30 03:02:52 node02 pmxcfs[2167]: [dcdb] notice: received all states
Jan 30 03:02:52 node02 pmxcfs[2167]: [dcdb] notice: leader is 1/2502
Jan 30 03:02:52 node02 pmxcfs[2167]: [dcdb] notice: synced members: 1/2502, 3/931
Jan 30 03:02:52 node02 pmxcfs[2167]: [dcdb] notice: waiting for updates from leader
Jan 30 03:02:52 node02 pmxcfs[2167]: [dcdb] notice: dfsm_deliver_queue: queue length 4
Jan 30 03:02:52 node02 pmxcfs[2167]: [status] notice: received all states
Jan 30 03:02:52 node02 pmxcfs[2167]: [status] notice: all data is up to date
Jan 30 03:02:52 node02 pmxcfs[2167]: [dcdb] notice: update complete - trying to commit (got 4 inode updates)
Jan 30 03:02:52 node02 pmxcfs[2167]: [dcdb] notice: all data is up to date
Jan 30 03:02:52 node02 pmxcfs[2167]: [dcdb] notice: dfsm_deliver_sync_queue: queue length 4
Jan 30 03:02:53 node02 watchdog-mux[1810]: client watchdog expired - disable watchdog updates
Jan 30 03:02:55 node02 pve-ha-lrm[2330]: successfully acquired lock 'ha_agent_node02_lock'
Jan 30 03:02:55 node02 pve-ha-lrm[2330]: status change lost_agent_lock => active
Jan 30 03:02:55 node02 watchdog-mux[1810]: exit watchdog-mux with active connections
Jan 30 03:02:55 node02 systemd[1]: watchdog-mux.service: Deactivated successfully.
Jan 30 03:02:55 node02 systemd-journald[622]: Received client request to sync journal.
Jan 30 03:02:55 node02 kernel: watchdog: watchdog0: watchdog did not stop!
-- Reboot --
Does anyone have an idea what could be causing this?