Hey yall,
my 2 Node Cluster started to do weird things I cant explain - at least for now.
All nodes run on the latest 7.3.4 release.
Serv1 seems to be offline? It can be pinged, and I can open terminals (even from VMs), view stats etc.
It is marked with a question mark from both webinterfaces. It also marks all VMs with a question mark.
No statistics are currently gathered (diagrams at stats are empty, even though CPU/RAM Readout bars are working)
"pvecm status" outputs following (executed on serv1):
It outputs the same on serv2.
However, these are the entries on serv1 (journalctl -xef):
It seems to have a network-issue where it disconnects or the protocol does weird stuff?
I cannot figure out what proxmox currently does, especially since the HA-Tab says "active" on both servers.
my 2 Node Cluster started to do weird things I cant explain - at least for now.
All nodes run on the latest 7.3.4 release.
Serv1 seems to be offline? It can be pinged, and I can open terminals (even from VMs), view stats etc.
It is marked with a question mark from both webinterfaces. It also marks all VMs with a question mark.
No statistics are currently gathered (diagrams at stats are empty, even though CPU/RAM Readout bars are working)
"pvecm status" outputs following (executed on serv1):
Code:
Cluster information
-------------------
Name: KOSMOS
Config Version: 8
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Fri Dec 30 15:27:22 2022
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1.169
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.5.11 (local)
0x00000002 1 192.168.5.12
It outputs the same on serv2.
However, these are the entries on serv1 (journalctl -xef):
Code:
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] IPC credentials authenticated (/dev/shm/qb-1832-102925-33-KsnQTJ/qb)
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] connecting to client [102925]
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] connection created
Dec 30 15:27:22 serv1 corosync[1832]: [CMAP ] lib_init_fn: conn=0x55b86e65f3b0
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] IPC credentials authenticated (/dev/shm/qb-1832-102925-34-Qcej7I/qb)
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] connecting to client [102925]
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] connection created
Dec 30 15:27:22 serv1 corosync[1832]: [QUORUM] lib_init_fn: conn=0x55b86e642810
Dec 30 15:27:22 serv1 corosync[1832]: [QUORUM] got quorum_type request on 0x55b86e642810
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] IPC credentials authenticated (/dev/shm/qb-1832-102925-35-UwcbFI/qb)
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] connecting to client [102925]
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] connection created
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] IPC credentials authenticated (/dev/shm/qb-1832-102925-36-MmGwjH/qb)
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] connecting to client [102925]
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] connection created
Dec 30 15:27:22 serv1 corosync[1832]: [QUORUM] got quorate request on 0x55b86e642810
Dec 30 15:27:22 serv1 corosync[1832]: [QUORUM] got trackstart request on 0x55b86e642810
Dec 30 15:27:22 serv1 corosync[1832]: [QUORUM] sending initial status to 0x55b86e642810
Dec 30 15:27:22 serv1 corosync[1832]: [QUORUM] sending nodelist notification to 0x55b86e642810, length = 72
Dec 30 15:27:22 serv1 corosync[1832]: [QUORUM] sending quorum notification to 0x55b86e642810, length = 56/64
Dec 30 15:27:22 serv1 corosync[1832]: [QUORUM] got trackstop request on 0x55b86e642810
Dec 30 15:27:22 serv1 corosync[1832]: [VOTEQ ] sending initial status to 0x55b86e64c030
Dec 30 15:27:22 serv1 corosync[1832]: [VOTEQ ] Sending nodelist callback. ring_id = 1.169
Dec 30 15:27:22 serv1 corosync[1832]: [VOTEQ ] Sending quorum callback, quorate = 1
Dec 30 15:27:22 serv1 corosync[1832]: [VOTEQ ] got getinfo request on 0x55b86e64c030 for node 1
Dec 30 15:27:22 serv1 corosync[1832]: [VOTEQ ] getinfo response error: 1
Dec 30 15:27:22 serv1 corosync[1832]: [VOTEQ ] got getinfo request on 0x55b86e64c030 for node 1
Dec 30 15:27:22 serv1 corosync[1832]: [VOTEQ ] getinfo response error: 1
Dec 30 15:27:22 serv1 corosync[1832]: [VOTEQ ] got getinfo request on 0x55b86e64c030 for node 2
Dec 30 15:27:22 serv1 corosync[1832]: [VOTEQ ] getinfo response error: 1
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] HUP conn (/dev/shm/qb-1832-102925-33-KsnQTJ/qb)
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] qb_ipcs_disconnect(/dev/shm/qb-1832-102925-33-KsnQTJ/qb) state:2
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] cs_ipcs_connection_closed()
Dec 30 15:27:22 serv1 corosync[1832]: [CMAP ] exit_fn for conn=0x55b86e65f3b0
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] cs_ipcs_connection_destroyed()
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-33-KsnQTJ/qb-response-cmap-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-33-KsnQTJ/qb-event-cmap-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-33-KsnQTJ/qb-request-cmap-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] HUP conn (/dev/shm/qb-1832-102925-34-Qcej7I/qb)
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] qb_ipcs_disconnect(/dev/shm/qb-1832-102925-34-Qcej7I/qb) state:2
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] cs_ipcs_connection_closed()
Dec 30 15:27:22 serv1 corosync[1832]: [QUORUM] lib_exit_fn: conn=0x55b86e642810
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] cs_ipcs_connection_destroyed()
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-34-Qcej7I/qb-response-quorum-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-34-Qcej7I/qb-event-quorum-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-34-Qcej7I/qb-request-quorum-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] HUP conn (/dev/shm/qb-1832-102925-35-UwcbFI/qb)
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] qb_ipcs_disconnect(/dev/shm/qb-1832-102925-35-UwcbFI/qb) state:2
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] cs_ipcs_connection_closed()
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] cs_ipcs_connection_destroyed()
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-35-UwcbFI/qb-response-cfg-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-35-UwcbFI/qb-event-cfg-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-35-UwcbFI/qb-request-cfg-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] HUP conn (/dev/shm/qb-1832-102925-36-MmGwjH/qb)
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] qb_ipcs_disconnect(/dev/shm/qb-1832-102925-36-MmGwjH/qb) state:2
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] cs_ipcs_connection_closed()
Dec 30 15:27:22 serv1 corosync[1832]: [MAIN ] cs_ipcs_connection_destroyed()
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-36-MmGwjH/qb-response-votequorum-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-36-MmGwjH/qb-event-votequorum-header
Dec 30 15:27:22 serv1 corosync[1832]: [QB ] Free'ing ringbuffer: /dev/shm/qb-1832-102925-36-MmGwjH/qb-request-votequorum-header
Dec 30 15:27:23 serv1 kernel: nfs: server 192.168.1.24 not responding, still trying
Dec 30 15:27:29 serv1 corosync[1832]: [KNET ] pmtud: Starting PMTUD for host: 2 link: 0
Dec 30 15:27:29 serv1 corosync[1832]: [KNET ] udp: detected kernel MTU: 1500
Dec 30 15:27:29 serv1 corosync[1832]: [KNET ] pmtud: PMTUD completed for host: 2 link: 0 current link mtu: 1397
Dec 30 15:27:59 serv1 corosync[1832]: [KNET ] pmtud: Starting PMTUD for host: 2 link: 0
Dec 30 15:27:59 serv1 corosync[1832]: [KNET ] udp: detected kernel MTU: 1500
Dec 30 15:27:59 serv1 corosync[1832]: [KNET ] pmtud: PMTUD completed for host: 2 link: 0 current link mtu: 1397
It seems to have a network-issue where it disconnects or the protocol does weird stuff?
I cannot figure out what proxmox currently does, especially since the HA-Tab says "active" on both servers.