Hi all, long-time reader, first-time poster. Thanks for all you do.
I've got a two-node homelab cluster (nodes named pxhpdesky and pve). I recently switched ISPs and had to move from PPPoE to DHCP, which meant scrapping my router configuration. Starting from scratch, I've done little more on the router than assign my two Proxmox nodes static IPs matching their previous addresses. That didn't seem to satisfy the cluster, and I've had all manner of issues since: at times the web UI is inaccessible, VMs and LXCs won't start at boot, and when I can reach the UI the nodes often show grey question marks or a red X even though they're powered up. Most CLI commands hang. Worst of all, so much is going wrong that I can't pin down a single error to search on. The most common entry I see in journalctl is:
Code:
pve corosync[899]: [TOTEM ] Retransmit List: 14 15 1b 1f 21 22 27 29
If I try to reboot I get
Code:
Failed to set wall message, ignoring: Transport endpoint is not connected
Call to Reboot failed: Transport endpoint is not connected
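If I need the node down at that point, I assume I'd have to fall back to something like the command below (a last resort as I understand it, since it skips a clean shutdown), but I'd rather fix the root cause:
Code:
# Double --force: reboot immediately without cleanly stopping services
# or unmounting filesystems (risk of data loss)
systemctl reboot -ff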
I can ping one node from the other, both nodes can ping the router, and both resolve google.com. I've tried shutting everything down on each node, but the problems persist even with no VMs or LXCs running. I haven't had a reason to dig into logs on non-Proxmox devices to check for connectivity issues, since nothing else in the house has shown any problems.
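For reference, these are roughly the checks I ran from pxhpdesky (the same thing from pve just targets .187 instead of .192):
Code:
ping -c 3 192.168.1.192     # the other node
ping -c 3 192.168.1.1       # the router/gateway
getent hosts google.com     # name resolution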
Here is the output of pveversion (same on both nodes):
Code:
pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.8.12-11-pve)
Here are the /etc/hosts files from pxhpdesky (main) and pve:
Code:
127.0.0.1 localhost.localdomain localhost
192.168.1.187 pxhpdesky.apra pxhpdesky
192.168.1.192 pve pve.beelink.s12
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
Code:
127.0.0.1 localhost.localdomain localhost
192.168.1.192 pve.beelink.s12 pve
192.168.1.187 pxhpdesky.apra pxhpdesky
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
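If it's useful, I can double-check that name resolution on each node actually matches those files with something like:
Code:
# Run on both nodes; each name should come back as its 192.168.1.x address
getent hosts pxhpdesky
getent hosts pve
hostname --ip-address   # I believe this should print the node's LAN IP, not 127.0.0.1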
Here are the results of ip address for pxhpdesky and pve:
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
link/ether 80:e8:2c:d1:b7:47 brd ff:ff:ff:ff:ff:ff
3: wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 1c:bf:c0:85:de:69 brd ff:ff:ff:ff:ff:ff
4: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 80:e8:2c:d1:b7:47 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.187/24 scope global vmbr0
valid_lft forever preferred_lft forever
5: vmbr0.10@vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 80:e8:2c:d1:b7:47 brd ff:ff:ff:ff:ff:ff
inet 10.0.10.0/24 scope global vmbr0.10
valid_lft forever preferred_lft forever
6: tap102i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr102i0 state UNKNOWN group default qlen 1000
link/ether 06:74:02:9a:cb:4b brd ff:ff:ff:ff:ff:ff
7: fwbr102i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether a2:6d:4a:da:f7:a1 brd ff:ff:ff:ff:ff:ff
8: fwpr102p0@fwln102i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
link/ether 8a:5a:11:9c:58:eb brd ff:ff:ff:ff:ff:ff
9: fwln102i0@fwpr102p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr102i0 state UP group default qlen 1000
link/ether a2:6d:4a:da:f7:a1 brd ff:ff:ff:ff:ff:ff
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
link/ether e8:ff:1e:d8:fc:7c brd ff:ff:ff:ff:ff:ff
3: wlo1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether e8:62:be:2f:ae:e5 brd ff:ff:ff:ff:ff:ff
altname wlp0s20f3
4: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether e8:ff:1e:d8:fc:7c brd ff:ff:ff:ff:ff:ff
inet 192.168.1.192/24 scope global vmbr0
valid_lft forever preferred_lft forever
inet6 fe80::eaff:1eff:fed8:fc7c/64 scope link
valid_lft forever preferred_lft forever
9: veth105i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr105i0 state UP group default qlen 1000
link/ether fe:7c:9a:de:70:4e brd ff:ff:ff:ff:ff:ff link-netnsid 0
10: fwbr105i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether aa:9e:f6:db:4e:e6 brd ff:ff:ff:ff:ff:ff
....
Here are the /etc/network/interfaces files for pxhpdesky and pve:
Code:
auto lo
iface lo inet loopback

iface enp2s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.187/24
        gateway 192.168.1.1
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0

iface wlp3s0 inet manual

auto vmbr0.10
iface vmbr0.10 inet static
        address 10.0.10.0/24
#edgerouter IoT vlan 10
#post-up iptables-restore < /etc/network/save-iptables
Code:
auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.192/24
        gateway 192.168.1.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0

iface wlo1 inet manual
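If it's relevant, my understanding is that edits to these files can be applied and checked with something like this (assuming ifupdown2, which I believe is the default on PVE 8):
Code:
ifreload -a          # re-apply /etc/network/interfaces without a reboot
ip -br addr show     # quick check that vmbr0 has the expected address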
Here are abridged journalctl outputs for pxhpdesky and pve:
Code:
Jun 12 06:18:10 pxhpdesky systemd[1]: Failed to start pveproxy.service - PVE API Proxy Server.
....
Jun 12 07:21:07 pxhpdesky corosync[1153]: [KNET ] link: host: 2 link: 0 is down
Jun 12 07:21:07 pxhpdesky corosync[1153]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 12 07:21:07 pxhpdesky corosync[1153]: [KNET ] host: host: 2 has no active links
Jun 12 07:21:08 pxhpdesky corosync[1153]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Jun 12 07:21:08 pxhpdesky corosync[1153]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 12 07:21:08 pxhpdesky corosync[1153]: [KNET ] pmtud: Global data MTU changed to: 1397
Jun 12 07:21:34 pxhpdesky corosync[1153]: [TOTEM ] FAILED TO RECEIVE
Jun 12 07:21:34 pxhpdesky corosync[1153]: [QUORUM] Sync members[1]: 1
Jun 12 07:21:34 pxhpdesky corosync[1153]: [QUORUM] Sync left[1]: 2
Jun 12 07:21:34 pxhpdesky corosync[1153]: [TOTEM ] A new membership (1.1613) was formed. Members left: 2
Jun 12 07:21:34 pxhpdesky corosync[1153]: [TOTEM ] Failed to receive the leave message. failed: 2
Jun 12 07:21:34 pxhpdesky corosync[1153]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jun 12 07:21:34 pxhpdesky corosync[1153]: [QUORUM] Members[1]: 1
Jun 12 07:21:34 pxhpdesky corosync[1153]: [MAIN ] Completed service synchronization, ready to provide service.
....
Jun 12 07:21:37 pxhpdesky pmxcfs[1069]: [dcdb] crit: cpg_send_message failed: 9
....
Jun 12 07:29:05 pxhpdesky corosync[1153]: [KNET ] link: host: 2 link: 0 is down
Jun 12 07:29:05 pxhpdesky corosync[1153]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 12 07:29:05 pxhpdesky corosync[1153]: [KNET ] host: host: 2 has no active links
Jun 12 07:29:06 pxhpdesky corosync[1153]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Jun 12 07:29:06 pxhpdesky corosync[1153]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 12 07:29:06 pxhpdesky corosync[1153]: [KNET ] pmtud: Global data MTU changed to: 1397
Code:
Jun 12 07:36:24 pve corosync[899]: [KNET ] link: host: 1 link: 0 is down
Jun 12 07:36:24 pve corosync[899]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Jun 12 07:36:24 pve corosync[899]: [KNET ] host: host: 1 has no active links
Jun 12 07:36:24 pve corosync[899]: [KNET ] link: Resetting MTU for link 0 because host 1 joined
Jun 12 07:36:24 pve corosync[899]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Jun 12 07:36:24 pve corosync[899]: [KNET ] pmtud: Global data MTU changed to: 1397
....
Jun 12 07:37:14 pve corosync[899]: [TOTEM ] Retransmit List: 14 15 1b 1f 21 22 27 29
Jun 12 07:37:15 pve corosync[899]: [TOTEM ] Retransmit List: 14 15 1b 1f 21 22 27 29
Jun 12 07:37:15 pve corosync[899]: [TOTEM ] Retransmit List: 14 15 1b 1f 21 22 27 29
Jun 12 07:37:15 pve corosync[899]: [TOTEM ] Retransmit List: 14 15 1b 1f 21 22 27 29
Jun 12 07:37:16 pve corosync[899]: [TOTEM ] Retransmit List: 14 15 1b 1f 21 22 27 29
Jun 12 07:37:16 pve corosync[899]: [TOTEM ] Retransmit List: 14 15 1b 1f 21 22 27 29
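Happy to post more output if it helps. Off the top of my head, I could grab the following from each node (assuming the commands don't just hang):
Code:
pvecm status                                     # quorum / membership view
corosync-cfgtool -s                              # knet link status to the other node
systemctl status corosync pve-cluster pveproxy   # service state
cat /etc/pve/corosync.conf                       # cluster config as pmxcfs sees it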
Can anyone help me troubleshoot this?