Proxmox GUI + SSH unreachable

anthony.m · Dec 1, 2024

Hello,

I somehow broke my Proxmox.
It was unresponsive, so I restarted the appliance, and then it got worse.

The web GUI loads intermittently but the content never does. (I get connection resets)
If I try to connect using ssh, I get:

$ ssh root@192.168.1.25
kex_exchange_identification: read: Software caused connection abort
banner exchange: Connection to 192.168.1.25 port 22: Software caused connection abort

Sometimes, if I restart the sshd service, it works.

If I run ip a, I get this:

Code:

root@pve:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether d8:9e:f3:86:86:eb brd ff:ff:ff:ff:ff:ff
4: tap104i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 12:64:3f:28:85:f0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1064:3fff:fe28:85f0/64 scope link
       valid_lft forever preferred_lft forever
7: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d8:9e:f3:86:86:eb brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.25/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::da9e:f3ff:fe86:86eb/64 scope link
       valid_lft forever preferred_lft forever

The vmbr0 address matches the IP in /etc/hosts:

Code:

root@pve:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.25 pve.home.internal pve

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

I have Google DNS configured, ping works, and the host has Internet access.

I looked into the journals for pvecluster, pveproxy and pvedaemon without seeing much.
Managing the cluster from the CLI seems fine: I deleted an LXC container using pct successfully.

I'm at a bit of a loss on what to investigate next, so I would appreciate any suggestions. (This is a homelab I use to learn the technology and run my home services, hence the lack of subscription.)

BobhWasatch · Dec 1, 2024

Maybe your rootfs is full. Check the output of "df".
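
A quick way to scan the df output for nearly full filesystems is sketched below. The function name flag_full and the default threshold of 90% are assumptions made for illustration, not anything Proxmox ships.

```shell
# Hypothetical helper: reads `df -P` (or `df -h`) output on stdin and prints
# every filesystem whose usage is at or above the given percentage.
flag_full() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5                 # fifth column is the Use% / Capacity field
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6 " is " $5 " full"
    }'
}
```

It could be run as "df -P | flag_full 80". Running "df -Pi | flag_full 80" does the same for inodes, which can also run out while free space remains.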

anthony.m · Dec 1, 2024

Hello,

Code:

root@pve:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   16G     0   16G   0% /dev
tmpfs                 3.2G  2.3M  3.2G   1% /run
/dev/mapper/pve-root   39G  6.3G   31G  17% /
tmpfs                  16G   31M   16G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
efivarfs              256K  128K  124K  51% /sys/firmware/efi/efivars
/dev/sda2            1022M   12M 1011M   2% /boot/efi
tmpfs                 3.2G     0  3.2G   0% /run/user/0
/dev/fuse             128M   20K  128M   1% /etc/pve

It seems OK to me.
A new interesting element is that the VMs running on the Proxmox host are also unreachable :|
So I really feel like something exploded within the network stack.

BobhWasatch · Dec 1, 2024

Well, did you check the obvious things?

Did you make any firewall changes?
Update anything?
Install any new software (like for example fail2ban)?
Any changes to /etc/network/interfaces?
Changes to your gateway router configuration?
Swapped Ethernet cables?

ETA: Also, logging in with ssh verbose logging (ssh -vv) might provide clues. And don't forget to check if you changed any of the above things on the client machine you are using. And check the journal (journalctl -b) around the time of logging in.

anthony.m · Dec 1, 2024

I tried everything I thought was obvious. The only thing I can't do is switch the NIC, because the machine only has one and I don't have an Ethernet dongle handy.

Did you make any firewall changes?
- No, and I didn't even enable the firewall
Update anything?
- Yes, I updated the distribution
Install any new software?
- I installed htop and vim, but that's it
Any changes to /etc/network/interfaces?
- I tried setting a new static IP in /etc/network/interfaces, matched it in /etc/hosts and rebooted the Proxmox host, but without success
Changes to your gateway router configuration?
- I didn't change anything on my gateway router
Swapped Ethernet cables?
- I tried swapping Ethernet cables and switching ports on the router, without success
- Both ports work with other devices

BobhWasatch · Dec 1, 2024

Also check permissions on root's (presumably you are logging in as root) .ssh directory and the files inside. The directory should be at most rwxr-xr-x (not writable by group or others) and the private key rw-------.
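
A minimal sketch of that check, assuming GNU stat (as found on Debian/Proxmox). The function name is made up for illustration; it only tests the group-write and other-write bits, which is what matters for the directory.

```shell
# Hypothetical helper: succeeds if the given path is NOT writable by
# group or others; returns 2 if the path cannot be inspected.
not_group_world_writable() {
    mode="$(stat -c '%a' "$1")" || return 2
    # Octal mode: bit 020 is group-write, bit 002 is other-write.
    [ $(( 0$mode & 022 )) -eq 0 ]
}
```

For example: not_group_world_writable /root/.ssh && echo "permissions look fine"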

anthony.m · Dec 1, 2024

Code:

root@pve:~# ls -la .ssh
total 20
drwx------ 2 root root 4096 Mar 22  2024 .
drwx------ 5 root root 4096 Dec  1 21:42 ..
lrwxrwxrwx 1 root root   29 Mar 22  2024 authorized_keys -> /etc/pve/priv/authorized_keys
-rw-r----- 1 root root  117 Mar 22  2024 config
-rw------- 1 root root 1811 Mar 22  2024 id_rsa
-rw-r--r-- 1 root root  390 Mar 22  2024 id_rsa.pub

The permissions are not exactly that. Should I adjust them?

BobhWasatch · Dec 1, 2024

That is a bit more strict than required but should be OK. Seems odd that you don't have a known_hosts file. Can you create/delete files in .ssh (touch file; rm file)?

anthony.m · Dec 1, 2024

Yes it works

BobhWasatch · Dec 1, 2024

Try "ssh localhost" to log into the machine from itself. If that works, try logging in from your regular client using "ssh -vv" to see if there are any clues in the debug output. Likewise, you can look at the journal since the last boot (journalctl -b) to see if there are any ssh related errors or warnings.
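
Since the failure is intermittent, a single attempt may not tell much. The sketch below repeats an arbitrary command and reports how often it failed; count_failures is a hypothetical helper name, and the ssh options in the usage line (BatchMode, ConnectTimeout) are standard OpenSSH client options.

```shell
# Hypothetical helper: run a command N times and report how many runs failed.
# Usage: count_failures N command [args...]
count_failures() {
    attempts="$1"; shift
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard output; only the exit status matters here.
        "$@" </dev/null >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$attempts attempts failed"
}
```

On the server: count_failures 20 ssh -o BatchMode=yes -o ConnectTimeout=5 root@localhost true
From the client, the same line with root@192.168.1.25. If localhost never fails but the LAN address does, that would point at the network path rather than at sshd itself.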

anthony.m · Dec 1, 2024

Code:

$ ssh -vv root@192.168.1.25
OpenSSH_9.3p1, OpenSSL 3.1.1 30 May 2023
debug1: Reading configuration data /c/Users/me/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug2: resolve_canonicalize: hostname 192.168.1.25 is address
debug1: Connecting to 192.168.1.25 [192.168.1.25] port 22.
debug1: Connection established.
debug1: identity file /c/Users/me/.ssh/id_rsa type -1
debug1: identity file /c/Users/me/.ssh/id_rsa-cert type -1
debug1: identity file /c/Users/me/.ssh/id_ecdsa type -1
debug1: identity file /c/Users/me/.ssh/id_ecdsa-cert type -1
debug1: identity file /c/Users/me/.ssh/id_ecdsa_sk type -1
debug1: identity file /c/Users/me/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /c/Users/me/.ssh/id_ed25519 type 3
debug1: identity file /c/Users/me/.ssh/id_ed25519-cert type -1
debug1: identity file /c/Users/me/.ssh/id_ed25519_sk type -1
debug1: identity file /c/Users/me/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /c/Users/me/.ssh/id_xmss type -1
debug1: identity file /c/Users/me/.ssh/id_xmss-cert type -1
debug1: identity file /c/Users/me/.ssh/id_dsa type -1
debug1: identity file /c/Users/me/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.3
kex_exchange_identification: read: Software caused connection abort
banner exchange: Connection to 192.168.1.25 port 22: Software caused connection abort

No interesting information :|

BobhWasatch · Dec 1, 2024

Please check the journal (journalctl -f) while trying to log in. Any errors there?

anthony.m · Dec 1, 2024

I just retried with more -v flags and it worked.

In journalctl, the only thing I see is

Code:

Dec 01 20:48:11 pve dmeventd[403]: WARNING: Thin pool pve-data-tpool data is now 81.45% full.

But I don't believe that's the issue.

I also have a bunch of

Code:

Dec 01 21:09:40 pve pveproxy[1027]: detected empty handle

BobhWasatch · Dec 1, 2024

The only other thing I can think of right now is that maybe there is another device on your network that has the same IP. You could check by directly connecting the client to your server.
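
If a direct connection is inconvenient, another option is to compare the MAC address a Linux client has resolved for the server's IP with the MAC of vmbr0 shown in the "ip a" output above (d8:9e:f3:86:86:eb). This is only a sketch: mac_matches is a made-up helper, and it assumes the client has iproute2 installed.

```shell
# Hypothetical helper: reads `ip neigh show <ip>` output on stdin and succeeds
# only if at least one entry was resolved and every resolved MAC equals the
# expected one (compared case-insensitively).
mac_matches() {
    expected="$(printf '%s' "$1" | tr 'A-F' 'a-f')"
    awk -v want="$expected" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "lladdr") {
                    seen = 1
                    if (tolower($(i + 1)) != want) bad = 1
                }
        }
        END { if (seen && !bad) exit 0; exit 1 }'
}
```

Usage, after a "ping -c 1 192.168.1.25" to populate the neighbour table: ip neigh show 192.168.1.25 | mac_matches d8:9e:f3:86:86:eb && echo "same MAC" || echo "different or unresolved MAC". A different MAC would suggest another device is answering for that address.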

anthony.m · Dec 1, 2024

I changed the static IP as a precaution. I also double-checked on my router and didn't see anything.

anthony.m · Dec 1, 2024

When I try to use the UI, I get timeouts pretty much everywhere and my journal is spammed with:

Code:

Dec 01 23:04:22 pve pveproxy[6202]: detected empty handle
Dec 01 23:04:17 pve pveproxy[6203]: detected empty handle
Dec 01 23:04:17 pve pveproxy[6202]: detected empty handle
Dec 01 23:04:12 pve pveproxy[6203]: detected empty handle
Dec 01 23:04:12 pve pveproxy[6202]: detected empty handle
Dec 01 23:04:07 pve pveproxy[6203]: detected empty handle

Is there a way to check if pveproxy and pvedaemon are talking to each other?
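
One partial check, as far as I understand the defaults (pveproxy listening on TCP 8006 and pvedaemon on 127.0.0.1:85), is to confirm that both sockets are actually listening. The helper below is only a sketch with a made-up name; it parses "ss -tln" output and says nothing about whether requests between the two succeed.

```shell
# Hypothetical helper: reads `ss -tln` output on stdin and succeeds if some
# socket in LISTEN state is bound to the given TCP port.
listening_on() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }'
}
```

For example: ss -tln | listening_on 8006 && echo "pveproxy is listening", then the same with port 85 for pvedaemon. "systemctl status pveproxy pvedaemon" shows whether both services are running at all.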

anthony.m · Dec 2, 2024

I keep digging without success.
Thanks a lot for your help @BobhWasatch, I learned quite a few things in the process.

I will now use this opportunity of a broken proxmox to test my backup strategy
