SOLVED: I am not a smart person. I gave node 4 the same IP address as another, much older VM and didn't notice because I use FQDNs within my network.
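For anyone who lands here with the same symptoms: a quick way to spot this kind of clash when you only ever use FQDNs is to resolve every name you know about and see which ones end up on the same address. Below is a minimal sketch of that idea. The function name `find_shared_ips` and the example hostnames are made up for illustration, and it only helps if both machines actually have DNS records (a forgotten VM with a static IP and no record would not show up).

```python
import socket
from collections import defaultdict


def find_shared_ips(hostnames, resolve=socket.gethostbyname):
    """Group hostnames by resolved IPv4 address and return only the
    addresses that more than one hostname resolves to."""
    by_ip = defaultdict(list)
    for name in hostnames:
        try:
            by_ip[resolve(name)].append(name)
        except OSError:
            # Name did not resolve (socket.gaierror is an OSError); skip it.
            continue
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical names; replace with the FQDNs used on your own network.
    names = ["pmx1.example.lan", "pmx4.example.lan", "oldvm.example.lan"]
    for ip, shared in find_shared_ips(names).items():
        print(f"{ip} is shared by: {', '.join(shared)}")
```

The `resolve` parameter is only there so the lookup can be swapped out (for a static table, say) without touching DNS.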
This is a homelab.
After I upgraded all four servers in my pool to Proxmox VE 7, I have had a couple of issues.
One is described here -> https://forum.proxmox.com/threads/kernel-panic-whole-server-crashes-about-every-day.91803/
The other is I will randomly lose SSH connectivity with the node, but all the VMs on the node don't lose network connectivity.
I have replaced the ethernet cable with a known good cable, and tried different ports on the switch.
I just caught this happening, and the syslog output is below.
Code:
Jul 30 00:32:01 pmx4 systemd[1]: Finished Proxmox VE replication runner.
Jul 30 00:32:13 pmx4 pvestatd[5587]: status update time (10.286 seconds)
Jul 30 00:32:18 pmx4 pvestatd[5587]: status update time (5.282 seconds)
Jul 30 00:32:21 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:32:23 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:32:36 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:32:36 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:32:37 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:32:41 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:32:42 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:32:50 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:32:51 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:33:00 pmx4 systemd[1]: Starting Proxmox VE replication runner...
Jul 30 00:33:01 pmx4 corosync[5559]: [KNET ] link: host: 3 link: 0 is down
Jul 30 00:33:01 pmx4 corosync[5559]: [KNET ] link: host: 2 link: 0 is down
Jul 30 00:33:01 pmx4 corosync[5559]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jul 30 00:33:01 pmx4 corosync[5559]: [KNET ] host: host: 3 has no active links
Jul 30 00:33:01 pmx4 corosync[5559]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 30 00:33:01 pmx4 corosync[5559]: [KNET ] host: host: 2 has no active links
Jul 30 00:33:02 pmx4 corosync[5559]: [TOTEM ] Token has not been received in 3225 ms
Jul 30 00:33:03 pmx4 corosync[5559]: [KNET ] rx: host: 2 link: 0 is up
Jul 30 00:33:03 pmx4 corosync[5559]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 30 00:33:04 pmx4 corosync[5559]: [TOTEM ] A processor failed, forming new configuration: token timed out (4300ms), waiting 5160ms for consensus.
Jul 30 00:33:10 pmx4 corosync[5559]: [KNET ] link: host: 2 link: 0 is down
Jul 30 00:33:10 pmx4 corosync[5559]: [KNET ] link: host: 1 link: 0 is down
Jul 30 00:33:10 pmx4 corosync[5559]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 30 00:33:10 pmx4 corosync[5559]: [KNET ] host: host: 2 has no active links
Jul 30 00:33:10 pmx4 corosync[5559]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Jul 30 00:33:10 pmx4 corosync[5559]: [KNET ] host: host: 1 has no active links
Jul 30 00:33:14 pmx4 corosync[5559]: [QUORUM] Sync members[1]: 4
Jul 30 00:33:14 pmx4 corosync[5559]: [QUORUM] Sync left[3]: 1 2 3
Jul 30 00:33:14 pmx4 corosync[5559]: [TOTEM ] A new membership (4.7d92) was formed. Members left: 1 2 3
Jul 30 00:33:14 pmx4 corosync[5559]: [TOTEM ] Failed to receive the leave message. failed: 1 2 3
Jul 30 00:33:14 pmx4 pmxcfs[3544]: [dcdb] notice: members: 4/3544
Jul 30 00:33:14 pmx4 corosync[5559]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jul 30 00:33:14 pmx4 pmxcfs[3544]: [status] notice: members: 4/3544
Jul 30 00:33:14 pmx4 corosync[5559]: [QUORUM] Members[1]: 4
Jul 30 00:33:14 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:33:14 pmx4 pmxcfs[3544]: [status] notice: node lost quorum
Jul 30 00:33:14 pmx4 pmxcfs[3544]: [dcdb] crit: received write while not quorate - trigger resync
Jul 30 00:33:14 pmx4 pmxcfs[3544]: [dcdb] crit: leaving CPG group
Jul 30 00:33:14 pmx4 pve-ha-lrm[6663]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pmx4/lrm_status.tmp.6663' - Permission denied
Jul 30 00:33:14 pmx4 pvesr[224330]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:33:15 pmx4 pmxcfs[3544]: [dcdb] notice: start cluster connection
Jul 30 00:33:15 pmx4 pmxcfs[3544]: [dcdb] crit: cpg_join failed: 14
Jul 30 00:33:15 pmx4 pmxcfs[3544]: [dcdb] crit: can't initialize service
Jul 30 00:33:15 pmx4 pvesr[224330]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:33:16 pmx4 pvesr[224330]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:33:17 pmx4 pvesr[224330]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:33:18 pmx4 pvesr[224330]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:33:19 pmx4 pvesr[224330]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:33:20 pmx4 pvesr[224330]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:33:21 pmx4 pmxcfs[3544]: [dcdb] notice: members: 4/3544
Jul 30 00:33:21 pmx4 pmxcfs[3544]: [dcdb] notice: all data is up to date
Jul 30 00:33:21 pmx4 pvesr[224330]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:33:22 pmx4 pvesr[224330]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:33:23 pmx4 pvesr[224330]: cfs-lock 'file-replication_cfg' error: no quorum!
Jul 30 00:33:23 pmx4 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Jul 30 00:33:23 pmx4 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jul 30 00:33:23 pmx4 systemd[1]: Failed to start Proxmox VE replication runner.
Jul 30 00:33:24 pmx4 pvestatd[5587]: pbs1: error fetching datastores - 500 Can't connect to pbs.thisisafakehostname.com:8007 (Connection timed out)
Jul 30 00:33:25 pmx4 pvestatd[5587]: status update time (21.209 seconds)
Jul 30 00:33:32 pmx4 pvestatd[5587]: pbs1: error fetching datastores - 500 Can't connect to pbs.thisisafakehostname.com:8007 (Connection timed out)
Jul 30 00:33:32 pmx4 pvestatd[5587]: status update time (7.227 seconds)
Jul 30 00:33:42 pmx4 pvestatd[5587]: pbs1: error fetching datastores - 500 Can't connect to pbs.thisisafakehostname.com:8007 (Connection timed out)
Jul 30 00:33:42 pmx4 pvestatd[5587]: status update time (7.216 seconds)
Jul 30 00:33:46 pmx4 corosync[5559]: [KNET ] rx: host: 3 link: 0 is up
Jul 30 00:33:46 pmx4 corosync[5559]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Jul 30 00:33:52 pmx4 corosync[5559]: [QUORUM] Sync members[1]: 4
Jul 30 00:33:52 pmx4 corosync[5559]: [TOTEM ] A new membership (4.7d9a) was formed. Members
Jul 30 00:33:52 pmx4 corosync[5559]: [QUORUM] Members[1]: 4
Jul 30 00:33:52 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:33:52 pmx4 corosync[5559]: [KNET ] rx: host: 2 link: 0 is up
Jul 30 00:33:52 pmx4 corosync[5559]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jul 30 00:33:57 pmx4 corosync[5559]: [QUORUM] Sync members[1]: 4
Jul 30 00:33:57 pmx4 corosync[5559]: [TOTEM ] A new membership (4.7d9e) was formed. Members
Jul 30 00:33:57 pmx4 corosync[5559]: [QUORUM] Members[1]: 4
Jul 30 00:33:57 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:34:00 pmx4 systemd[1]: Starting Proxmox VE replication runner...
Jul 30 00:34:01 pmx4 pvesr[224528]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:34:02 pmx4 pvesr[224528]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:34:03 pmx4 corosync[5559]: [QUORUM] Sync members[1]: 4
Jul 30 00:34:03 pmx4 corosync[5559]: [TOTEM ] A new membership (4.7da2) was formed. Members
Jul 30 00:34:03 pmx4 corosync[5559]: [QUORUM] Members[1]: 4
Jul 30 00:34:03 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:34:03 pmx4 pvesr[224528]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:34:04 pmx4 pvesr[224528]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:34:05 pmx4 pvesr[224528]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:34:06 pmx4 pvesr[224528]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:34:07 pmx4 pvesr[224528]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:34:08 pmx4 corosync[5559]: [QUORUM] Sync members[1]: 4
Jul 30 00:34:08 pmx4 corosync[5559]: [TOTEM ] A new membership (4.7da6) was formed. Members
Jul 30 00:34:08 pmx4 corosync[5559]: [QUORUM] Members[1]: 4
Jul 30 00:34:08 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:34:08 pmx4 pvesr[224528]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:34:09 pmx4 pvesr[224528]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 30 00:34:10 pmx4 pvesr[224528]: cfs-lock 'file-replication_cfg' error: no quorum!
Jul 30 00:34:10 pmx4 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Jul 30 00:34:10 pmx4 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jul 30 00:34:10 pmx4 systemd[1]: Failed to start Proxmox VE replication runner.
Jul 30 00:34:13 pmx4 corosync[5559]: [QUORUM] Sync members[1]: 4
Jul 30 00:34:13 pmx4 corosync[5559]: [TOTEM ] A new membership (4.7daa) was formed. Members
Jul 30 00:34:13 pmx4 corosync[5559]: [QUORUM] Members[1]: 4
Jul 30 00:34:13 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:34:18 pmx4 corosync[5559]: [QUORUM] Sync members[1]: 4
Jul 30 00:34:18 pmx4 corosync[5559]: [TOTEM ] A new membership (4.7dae) was formed. Members
Jul 30 00:34:18 pmx4 corosync[5559]: [QUORUM] Members[1]: 4
Jul 30 00:34:18 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:34:23 pmx4 corosync[5559]: [QUORUM] Sync members[1]: 4
Jul 30 00:34:23 pmx4 corosync[5559]: [TOTEM ] A new membership (4.7db2) was formed. Members
Jul 30 00:34:23 pmx4 corosync[5559]: [QUORUM] Members[1]: 4
Jul 30 00:34:23 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:34:28 pmx4 corosync[5559]: [QUORUM] Sync members[1]: 4
Jul 30 00:34:28 pmx4 corosync[5559]: [TOTEM ] A new membership (4.7db6) was formed. Members
Jul 30 00:34:28 pmx4 corosync[5559]: [QUORUM] Members[1]: 4
Jul 30 00:34:28 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:34:30 pmx4 corosync[5559]: [KNET ] rx: host: 1 link: 0 is up
Jul 30 00:34:30 pmx4 corosync[5559]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Jul 30 00:34:30 pmx4 corosync[5559]: [QUORUM] Sync members[4]: 1 2 3 4
Jul 30 00:34:30 pmx4 corosync[5559]: [QUORUM] Sync joined[3]: 1 2 3
Jul 30 00:34:30 pmx4 corosync[5559]: [TOTEM ] A new membership (1.7dba) was formed. Members joined: 1 2 3
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: members: 1/871, 2/3222, 3/980, 4/3544
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: starting data syncronisation
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: members: 1/871, 2/3222, 3/980, 4/3544
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: starting data syncronisation
Jul 30 00:34:30 pmx4 corosync[5559]: [QUORUM] This node is within the primary component and will provide service.
Jul 30 00:34:30 pmx4 corosync[5559]: [QUORUM] Members[4]: 1 2 3 4
Jul 30 00:34:30 pmx4 corosync[5559]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: node has quorum
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: received sync request (epoch 1/871/00000004)
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: received sync request (epoch 1/871/00000004)
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: received all states
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: leader is 1/871
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: synced members: 1/871, 2/3222, 3/980
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: waiting for updates from leader
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: dfsm_deliver_queue: queue length 3
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: received all states
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: all data is up to date
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: dfsm_deliver_queue: queue length 32
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [main] notice: ignore duplicate
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: update complete - trying to commit (got 4 inode updates)
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: all data is up to date
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [dcdb] notice: dfsm_deliver_sync_queue: queue length 3
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pmx3/pbs1: -1
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pmx/pbs1: -1
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pmx/local: -1
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pmx/local-lvm: -1
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pmx3/local-lvm: -1
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pmx3/local: -1
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pmx2/local-lvm: -1
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pmx2/local: -1
Jul 30 00:34:30 pmx4 pmxcfs[3544]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pmx2/pbs1: -1
Jul 30 00:34:32 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:34:33 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:34:48 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:34:50 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:35:00 pmx4 pmxcfs[3544]: [status] notice: received log
Jul 30 00:35:00 pmx4 systemd[1]: Starting Proxmox VE replication runner...
Jul 30 00:35:01 pmx4 systemd[1]: pvesr.service: Succeeded.
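A log like this is easier to read once it is boiled down to the link flaps and quorum changes. Here is a rough sketch that pulls just those events out of syslog lines in the format shown above. The function name `quorum_timeline` and the regexes are my own and are based only on the message shapes in this paste, so other corosync or pmxcfs versions may word things differently.

```python
import re

# "Jul 30 00:33:01 pmx4 corosync[5559]: message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ ([\w-]+)\[\d+\]: (.*)$")
# Matches "link: host: 3 link: 0 is down" and "rx: host: 2 link: 0 is up"
LINK_RE = re.compile(r"host: (\d+) link: (\d+) is (up|down)")


def quorum_timeline(lines):
    """Return (timestamp, event) tuples for KNET link changes and
    pmxcfs quorum notices, in the order they appear."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, proc, msg = m.groups()
        if proc == "corosync" and "[KNET" in msg:
            link = LINK_RE.search(msg)
            if link:
                host, num, state = link.groups()
                events.append((ts, f"host {host} link {num} {state}"))
        elif proc == "pmxcfs" and msg.endswith("node lost quorum"):
            events.append((ts, "node lost quorum"))
        elif proc == "pmxcfs" and msg.endswith("node has quorum"):
            events.append((ts, "node has quorum"))
    return events


if __name__ == "__main__":
    import sys
    # Usage: python timeline.py < syslog-excerpt.txt
    for ts, event in quorum_timeline(sys.stdin):
        print(ts, event)
```

Run against the paste above, this should show all three peer links dropping within about ten seconds of each other, which points at this node's own network path rather than at the other nodes.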