Hi all,
I've run into a problem with my Proxmox server and am at a loss for what to do. It's been up and running in its current form on this hardware for a couple of years now. Yesterday I powered it down to add more memory. Either the motherboard is finicky or the RAM is bad, and I was only able to add one additional stick, so I have 3 of 4 slots full until I can figure out whether it's the mobo or the RAM. When I powered it back up I couldn't access the node: no :8006, no SSH, and no ping from either my workstation or my firewall. All the VMs and containers inside seem fine and I can reach them on the LAN without issue.
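For completeness, this is the kind of sanity check I can run from the local console next (I haven't captured this output yet, and the gateway IP is just what my LAN uses):

Code:
# confirm the web GUI and SSH daemons are actually running on the node itself
systemctl status pveproxy sshd --no-pager
# see whether the node can reach the LAN gateway at all
ping -c 3 192.168.1.1
# quick one-line view of link state on every interface
ip -br link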
When I moved to this hardware I had joined it in a cluster with my old hardware so I could move the VMs over. When I rebooted yesterday I got a message that the VMs couldn't start initially because they had lost quorum. Fine; I'd been planning on removing that older server anyway, so I did, and they're now two independent Proxmox servers. No more quorum errors, and across the reboots I've done since yesterday, all of the VMs and containers start right up each time.
I usually connect to 192.168.1.202. That's one of the ports on a dual-port 10G NIC. You'll also see .203 below; that's the ethernet port on the motherboard, which I've tested in the past but usually leave disconnected.
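To double-check which physical port maps to which bridge and address, this is roughly what I'd run from the console (interface names as they appear on my box):

Code:
# one-line summary of the addresses on the two bridges
ip -br addr show vmbr0 vmbr2
# confirm carrier on the 10G port behind .202 and the onboard port behind .203
ethtool enp1s0f0 | grep -i 'link detected'
ethtool enp6s0 | grep -i 'link detected'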
It looks like there's a problem with zfs-import-scan? Would that prevent the node from being reachable to/from anything on the network? It seems like a ZFS issue shouldn't take down the entire network for a node, but in searching for a solution I do see a lot of people with seemingly unrelated problems taking out their GUI. I also assume that having an odd number of RAM sticks installed wouldn't break connectivity to the node while the VMs/containers stay reachable.
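For what it's worth, these are the standard systemd/ZFS status commands I'm planning to run the next time I'm at the console to dig into that failed unit (I haven't posted this output yet):

Code:
# full status and the boot-time log for the failed import unit
systemctl status zfs-import-scan.service --no-pager
journalctl -b -u zfs-import-scan.service --no-pager
# state of the pools that did import, plus anything still waiting to be imported
zpool status
zpool import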
I wrote the results of what seem to be the usual suggestions and requests for information out to a thumb drive so I could post them here. I hit the character limit trying to add them all, so I'll put some here and some in a reply. I have two ZFS pools: one that's just two NVMe drives and one that's a bunch of hard disks. The NVMe one doesn't currently contain anything, because I had a syslog VM set up on it that ate through the TBW limits of the drives pretty quickly; they now show SMART errors at almost 200% TBW usage. At some point I'll destroy that pool or replace the drives, but as far as I remember there's nothing saved on them.
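If the pools or the worn NVMe drives turn out to be relevant, I can post more detail with something like the following (the /dev/nvme* names are guesses; mine may enumerate differently):

Code:
# pool names, capacity and health at a glance
zpool list -o name,size,alloc,free,health
# NVMe wear/health - 'Percentage Used' is the counter that's nearly 200% on mine
smartctl -a /dev/nvme0n1 | grep -i -E 'percentage used|critical warning|media errors'
smartctl -a /dev/nvme1n1 | grep -i -E 'percentage used|critical warning|media errors'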
pveversion -v
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-6 (running version: 7.1-6/4e61e21c)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-3
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
systemctl list-units --failed
Code:
UNIT LOAD ACTIVE SUB DESCRIPTION
● zfs-import-scan.service loaded failed failed Import ZFS pools by device scanning
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
ip addr
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp6s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
link/ether 7c:10:c9:41:02:37 brd ff:ff:ff:ff:ff:ff
3: enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr2 state UP group default qlen 1000
link/ether a0:36:9f:21:ff:ac brd ff:ff:ff:ff:ff:ff
4: enp1s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr1 state UP group default qlen 1000
link/ether a0:36:9f:21:ff:ae brd ff:ff:ff:ff:ff:ff
5: wlp5s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether b4:0e:de:7f:f7:29 brd ff:ff:ff:ff:ff:ff
6: vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 7c:10:c9:41:02:37 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.203/24 scope global vmbr0
valid_lft forever preferred_lft forever
7: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether a0:36:9f:21:ff:ae brd ff:ff:ff:ff:ff:ff
inet6 fe80::a236:9fff:fe21:ffae/64 scope link
valid_lft forever preferred_lft forever
8: vmbr2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether a0:36:9f:21:ff:ac brd ff:ff:ff:ff:ff:ff
inet 192.168.1.202/24 scope global vmbr2
valid_lft forever preferred_lft forever
inet6 fe80::a236:9fff:fe21:ffac/64 scope link
valid_lft forever preferred_lft forever
9: veth107i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP group default qlen 1000
link/ether fe:91:31:e5:55:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
10: veth104i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP group default qlen 1000
link/ether fe:4d:03:66:fe:25 brd ff:ff:ff:ff:ff:ff link-netnsid 1
11: tap106i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr2 state UNKNOWN group default qlen 1000
link/ether de:a4:8a:a4:21:3f brd ff:ff:ff:ff:ff:ff
12: tap106i1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr1 state UNKNOWN group default qlen 1000
link/ether 5e:53:02:1f:f7:90 brd ff:ff:ff:ff:ff:ff
13: veth102i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP group default qlen 1000
link/ether fe:ef:d3:7b:70:68 brd ff:ff:ff:ff:ff:ff link-netnsid 2
14: tap105i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr2 state UNKNOWN group default qlen 1000
link/ether 32:c6:06:90:e2:26 brd ff:ff:ff:ff:ff:ff
cat /etc/network/interfaces
Code:
auto lo
iface lo inet loopback
iface enp6s0 inet manual
#Mobo port
iface enp1s0f0 inet manual
#Inside 10g port
iface enp1s0f1 inet manual
#Outside 10g port
auto vmbr0
iface vmbr0 inet static
address 192.168.1.203/24
bridge-ports enp6s0
bridge-stp off
bridge-fd 0
iface wlp5s0 inet manual
auto vmbr1
iface vmbr1 inet manual
bridge-ports enp1s0f1
bridge-stp off
bridge-fd 0
auto vmbr2
iface vmbr2 inet static
address 192.168.1.202/24
gateway 192.168.1.1
bridge-ports enp1s0f0
bridge-stp off
bridge-fd 0
cat /etc/hosts
Code:
127.0.0.1 localhost.localdomain localhost
192.168.1.202 proxmox.local proxmox
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
hostname -A
Code:
proxmox.local
hostname -I
Code:
192.168.1.203 192.168.1.202
ip -c a
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp6s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 7c:10:c9:41:02:37 brd ff:ff:ff:ff:ff:ff
3: enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr2 state UP group default qlen 1000
    link/ether a0:36:9f:21:ff:ac brd ff:ff:ff:ff:ff:ff
4: enp1s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr1 state UP group default qlen 1000
    link/ether a0:36:9f:21:ff:ae brd ff:ff:ff:ff:ff:ff
5: wlp5s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b4:0e:de:7f:f7:29 brd ff:ff:ff:ff:ff:ff
6: vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 7c:10:c9:41:02:37 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.203/24 scope global vmbr0
       valid_lft forever preferred_lft forever
7: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a0:36:9f:21:ff:ae brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a236:9fff:fe21:ffae/64 scope link
       valid_lft forever preferred_lft forever
8: vmbr2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a0:36:9f:21:ff:ac brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.202/24 scope global vmbr2
       valid_lft forever preferred_lft forever
    inet6 fe80::a236:9fff:fe21:ffac/64 scope link
       valid_lft forever preferred_lft forever
9: veth107i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP group default qlen 1000
    link/ether fe:91:31:e5:55:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
10: veth104i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP group default qlen 1000
    link/ether fe:4d:03:66:fe:25 brd ff:ff:ff:ff:ff:ff link-netnsid 1
11: tap106i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr2 state UNKNOWN group default qlen 1000
    link/ether de:a4:8a:a4:21:3f brd ff:ff:ff:ff:ff:ff
12: tap106i1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr1 state UNKNOWN group default qlen 1000
    link/ether 5e:53:02:1f:f7:90 brd ff:ff:ff:ff:ff:ff
13: veth102i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP group default qlen 1000
    link/ether fe:ef:d3:7b:70:68 brd ff:ff:ff:ff:ff:ff link-netnsid 2
14: tap105i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr2 state UNKNOWN group default qlen 1000
    link/ether 32:c6:06:90:e2:26 brd ff:ff:ff:ff:ff:ff
systemctl is-enabled networking
Code:
enabled
ip route
Code:
default via 192.168.1.1 dev vmbr2 proto kernel onlink
192.168.1.0/24 dev vmbr0 proto kernel scope link src 192.168.1.203 linkdown
192.168.1.0/24 dev vmbr2 proto kernel scope link src 192.168.1.202
ip link
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp6s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN mode DEFAULT group default qlen 1000
link/ether 7c:10:c9:41:02:37 brd ff:ff:ff:ff:ff:ff
3: enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr2 state UP mode DEFAULT group default qlen 1000
link/ether a0:36:9f:21:ff:ac brd ff:ff:ff:ff:ff:ff
4: enp1s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr1 state UP mode DEFAULT group default qlen 1000
link/ether a0:36:9f:21:ff:ae brd ff:ff:ff:ff:ff:ff
5: wlp5s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether b4:0e:de:7f:f7:29 brd ff:ff:ff:ff:ff:ff
6: vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/ether 7c:10:c9:41:02:37 brd ff:ff:ff:ff:ff:ff
7: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether a0:36:9f:21:ff:ae brd ff:ff:ff:ff:ff:ff
8: vmbr2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether a0:36:9f:21:ff:ac brd ff:ff:ff:ff:ff:ff
9: veth107i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP mode DEFAULT group default qlen 1000
link/ether fe:91:31:e5:55:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
10: veth104i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP mode DEFAULT group default qlen 1000
link/ether fe:4d:03:66:fe:25 brd ff:ff:ff:ff:ff:ff link-netnsid 1
11: tap106i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr2 state UNKNOWN mode DEFAULT group default qlen 1000
link/ether de:a4:8a:a4:21:3f brd ff:ff:ff:ff:ff:ff
12: tap106i1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr1 state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 5e:53:02:1f:f7:90 brd ff:ff:ff:ff:ff:ff
13: veth102i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP mode DEFAULT group default qlen 1000
link/ether fe:ef:d3:7b:70:68 brd ff:ff:ff:ff:ff:ff link-netnsid 2
14: tap105i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr2 state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 32:c6:06:90:e2:26 brd ff:ff:ff:ff:ff:ff