Proxmox GUI + SSH unreachable

anthony.m

Dec 1, 2024
Hello,

I somehow broke my Proxmox install.
It became unresponsive, so I restarted the appliance, and after that it got worse :(

The web GUI page loads intermittently, but its content never does (I get connection resets).
If I try to connect over SSH, I get:

$ ssh root@192.168.1.25
kex_exchange_identification: read: Software caused connection abort
banner exchange: Connection to 192.168.1.25 port 22: Software caused connection abort

Sometimes it works again if I restart the sshd service.

Here is the output of ip a:
Code:
root@pve:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether d8:9e:f3:86:86:eb brd ff:ff:ff:ff:ff:ff
4: tap104i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 12:64:3f:28:85:f0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1064:3fff:fe28:85f0/64 scope link
       valid_lft forever preferred_lft forever
7: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d8:9e:f3:86:86:eb brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.25/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::da9e:f3ff:fe86:86eb/64 scope link
       valid_lft forever preferred_lft forever

The vmbr0 address matches the IP in /etc/hosts:
Code:
root@pve:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.25 pve.home.internal pve

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

DNS is set to Google's resolvers, ping works, and I have Internet access.

I looked at the journals for pve-cluster, pveproxy and pvedaemon without seeing much.
Managing the cluster from the CLI seems fine: I successfully deleted an LXC container with pct.

I'm at a bit of a loss about what to investigate next and would appreciate any suggestions. (This is a homelab for learning the technology and running my home labs, hence no subscription.)
 
Hello,

Code:
root@pve:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   16G     0   16G   0% /dev
tmpfs                 3.2G  2.3M  3.2G   1% /run
/dev/mapper/pve-root   39G  6.3G   31G  17% /
tmpfs                  16G   31M   16G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
efivarfs              256K  128K  124K  51% /sys/firmware/efi/efivars
/dev/sda2            1022M   12M 1011M   2% /boot/efi
tmpfs                 3.2G     0  3.2G   0% /run/user/0
/dev/fuse             128M   20K  128M   1% /etc/pve

That looks OK to me.
Another interesting detail: the VMs running on this Proxmox host are also unreachable :|
So I really feel like something broke in the network stack.
 
Well, did you check the obvious things?
  • Did you make any firewall changes?
  • Update anything?
  • Install any new software (like for example fail2ban)?
  • Any changes to /etc/network/interfaces?
  • Changes to your gateway router configuration?
  • Swapped Ethernet cables?
ETA: Also, logging in with verbose SSH output (ssh -vv) might provide clues. Don't forget to check whether you changed any of the above things on the client machine you are using, and check the journal (journalctl -b) around the time of the login attempt.
 
I checked what I thought was obvious. The only thing I can't do is swap the NIC, because the machine has only one and I don't have an Ethernet dongle handy.
  • Did you make any firewall changes?
    • No, and I never even enabled the firewall.
  • Yes, I updated the distribution.
  • I installed htop and vim, but that's it.
  • I set a new static IP in /etc/network/interfaces, matched it in /etc/hosts and rebooted Proxmox, without success.
  • I didn't change anything on my gateway router.
  • I tried swapping Ethernet cables and router ports, without success.
    • Both ports work with other devices.
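For reference on the static-IP bullet above: on a default Proxmox VE install the host address usually sits on the vmbr0 bridge, not on the physical NIC, which matches the ip a output earlier (enp1s0 is "master vmbr0" and has no address of its own). The fragment below is only a sketch of what /etc/network/interfaces typically looks like for that layout; the gateway 192.168.1.1 is an assumption and not taken from this thread, and the address has to agree with the pve line in /etc/hosts.

```
auto lo
iface lo inet loopback

# Physical NIC: no address here, it is only a port of the bridge
iface enp1s0 inet manual

# Assumption: 192.168.1.1 is a placeholder for the real router address
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.25/24
        gateway 192.168.1.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0
```

On ifupdown2 (the default on recent Proxmox VE releases), ifreload -a should apply edits without a full reboot.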
 
Code:
root@pve:~# ls -la .ssh
total 20
drwx------ 2 root root 4096 Mar 22  2024 .
drwx------ 5 root root 4096 Dec  1 21:42 ..
lrwxrwxrwx 1 root root   29 Mar 22  2024 authorized_keys -> /etc/pve/priv/authorized_keys
-rw-r----- 1 root root  117 Mar 22  2024 config
-rw------- 1 root root 1811 Mar 22  2024 id_rsa
-rw-r--r-- 1 root root  390 Mar 22  2024 id_rsa.pub

The permissions are not exactly that. Should I adjust them?
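One observation: the failure above happens at the banner exchange, before any key is offered, so the key files in /root/.ssh are unlikely to be the cause. The modes listed also look like what OpenSSH usually expects (700 on the directory, 600 on the private key, config not writable by group or others). A small sketch to print the modes for comparison, assuming GNU stat as on Debian-based Proxmox VE; ssh_modes is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print "<octal mode> <path>" for an SSH directory and the regular files in it.
# Assumes GNU coreutils stat (-c format option).
ssh_modes() {
  local dir="${1:-$HOME/.ssh}" f
  stat -c '%a %n' "$dir"
  for f in "$dir"/*; do
    [ -L "$f" ] && continue   # skip symlinks, e.g. authorized_keys -> /etc/pve/priv/authorized_keys
    [ -e "$f" ] || continue   # empty directory: the glob stays unexpanded
    stat -c '%a %n' "$f"
  done
}
```

Usage: ssh_modes /root/.ssh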
 
Try "ssh localhost" to log into the machine from itself. If that works, try logging in from your regular client using "ssh -vv" to see if there are any clues in the debug output. Likewise, you can look at the journal since the last boot (journalctl -b) to see if there are any ssh-related errors or warnings.
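When reading ssh -vv (or -vvv) output, the last debug line before the error shows how far the client got. A minimal sketch of a hypothetical helper, last_ssh_step, that extracts it; ssh writes its debug output to stderr, hence the 2>&1 in the usage line.

```bash
#!/usr/bin/env bash
# Print the last "debugN:" line from ssh verbose output read on stdin,
# i.e. the last step the client logged before stopping.
last_ssh_step() {
  awk '/^debug[0-9]: / { last = $0 } END { if (last != "") print last }'
}
```

Usage: ssh -vvv -o BatchMode=yes root@192.168.1.25 true 2>&1 | last_ssh_step

If the last line is "Local version string", the client sent its own identification and never received the server's SSH banner.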
 
Code:
$ ssh -vv root@192.168.1.25
OpenSSH_9.3p1, OpenSSL 3.1.1 30 May 2023
debug1: Reading configuration data /c/Users/me/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug2: resolve_canonicalize: hostname 192.168.1.25 is address
debug1: Connecting to 192.168.1.25 [192.168.1.25] port 22.
debug1: Connection established.
debug1: identity file /c/Users/me/.ssh/id_rsa type -1
debug1: identity file /c/Users/me/.ssh/id_rsa-cert type -1
debug1: identity file /c/Users/me/.ssh/id_ecdsa type -1
debug1: identity file /c/Users/me/.ssh/id_ecdsa-cert type -1
debug1: identity file /c/Users/me/.ssh/id_ecdsa_sk type -1
debug1: identity file /c/Users/me/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /c/Users/me/.ssh/id_ed25519 type 3
debug1: identity file /c/Users/me/.ssh/id_ed25519-cert type -1
debug1: identity file /c/Users/me/.ssh/id_ed25519_sk type -1
debug1: identity file /c/Users/me/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /c/Users/me/.ssh/id_xmss type -1
debug1: identity file /c/Users/me/.ssh/id_xmss-cert type -1
debug1: identity file /c/Users/me/.ssh/id_dsa type -1
debug1: identity file /c/Users/me/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.3
kex_exchange_identification: read: Software caused connection abort
banner exchange: Connection to 192.168.1.25 port 22: Software caused connection abort

No useful information :|
 
I just retried with more -v flags and it worked.

In journalctl, the only thing I see is
Code:
Dec 01 20:48:11 pve dmeventd[403]: WARNING: Thin pool pve-data-tpool data is now 81.45% full.
But I don't believe it's an issue.

I also see a lot of these:
Code:
Dec 01 21:09:40 pve pveproxy[1027]: detected empty handle
 
As a precaution, I tried changing the static IP. I also double-checked my router and didn't see anything.
 
When I try to use the UI, I get timeouts pretty much everywhere, and my journal is spammed with:

Code:
Dec 01 23:04:22 pve pveproxy[6202]: detected empty handle
Dec 01 23:04:17 pve pveproxy[6203]: detected empty handle
Dec 01 23:04:17 pve pveproxy[6202]: detected empty handle
Dec 01 23:04:12 pve pveproxy[6203]: detected empty handle
Dec 01 23:04:12 pve pveproxy[6202]: detected empty handle
Dec 01 23:04:07 pve pveproxy[6203]: detected empty handle
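In the excerpt above, each of the two pveproxy workers logs the message roughly every five seconds. A sketch for tallying these messages per worker PID over a whole boot, to see whether they come from a few workers or all of them; count_empty_handles is a hypothetical helper that reads journal text on stdin.

```bash
#!/usr/bin/env bash
# Count "detected empty handle" lines per pveproxy PID; prints "<pid> <count>".
count_empty_handles() {
  awk '/pveproxy\[[0-9]+\]: detected empty handle/ {
         match($0, /pveproxy\[[0-9]+\]/)
         pid = substr($0, RSTART + 9, RLENGTH - 10)   # strip "pveproxy[" and "]"
         n[pid]++
       }
       END { for (p in n) print p, n[p] }' | sort -n
}
```

Usage: journalctl -b -u pveproxy --no-pager | count_empty_handles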

Is there a way to check whether pveproxy and pvedaemon are talking to each other?
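A partial check, assuming the default ports (pveproxy on TCP 8006, pvedaemon on 127.0.0.1:85): confirm that both are listening, alongside systemctl status pveproxy pvedaemon. Listening sockets only show that the daemons are up, not that requests between them succeed. check_pve_listeners is a hypothetical helper that reads ss -tln output on stdin, so it can also be run against saved output.

```bash
#!/usr/bin/env bash
# Report whether something is listening on the (assumed default) PVE API ports.
check_pve_listeners() {
  local listing port
  listing=$(cat)
  for port in 8006 85; do
    # Column 4 of `ss -tln` is "address:port"; compare the part after the last ":".
    if printf '%s\n' "$listing" | awk -v p="$port" '
         $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
         END { exit !found }'; then
      echo "port $port: listening"
    else
      echo "port $port: NOT listening"
    fi
  done
}
```

Usage: ss -tln | check_pve_listeners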
 
I keep digging without success.
Thanks a lot for your help @BobhWasatch, I learned quite a few things in the process.

I will now use this opportunity of a broken proxmox to test my backup strategy :)