[SOLVED] Connection failed (Error 500: hostname lookup failed) Out of space.

osrius

New Member
Aug 1, 2024
I've run into an issue where I can't connect to a node through the Proxmox interface because it's out of space. My plan was to delete a test VM to free up space and then reconfigure the backups to point at my NAS instead of the small drive this node runs on. The problem is that I can't free up space without being able to access the node / backups in the Proxmox management portal. I can SSH in, but I may have borked the ZFS pool to the point that I can't access or remove files from the CLI.

I'm hoping someone can help me find a way to get back into the management portal so I can reconfigure my backups, and potentially nuke the node and rebuild the ZFS pool so it's set up correctly. The good news is that I have backups; the bad news is that I hadn't set them up to live on a different node (or on multiple nodes). Ideally I'd move the one VM I care about off this node, or pull its backup over to a different node and restore it there while I rework this one.

Any suggestions would be very welcome.

I've checked the following resources and below are the troubleshooting steps I've taken.
https://forum.proxmox.com/threads/e...key-key_file-or-key-at-usr-share-perl5.48943/
https://forum.proxmox.com/threads/required-command-to-remove-the-zfs-snapshot-via-cli.111704/
https://technotes.seastrom.com/asse...-a-ZFS-Filesystem-that-is-100percent-Full.pdf
https://forum.proxmox.com/threads/no-space-left-on-device.77411/


The UI is throwing:
hostname lookup 'zodiac' failed - failed to get address info for: zodiac: Name or service not known (500)

I checked this thread
https://forum.proxmox.com/threads/e...key-key_file-or-key-at-usr-share-perl5.48943/

The hostname is resolving correctly:

Bash:
root@zodiac:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.10.40.187 zodiac.lab.astrolab.dev zodiac

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Bash:
root@zodiac:~# cat /etc/hostname
zodiac

Bash:
root@zodiac:~# ping $(uname -n)
PING zodiac.lab.astrolab.dev (10.10.40.187) 56(84) bytes of data.
64 bytes from zodiac.lab.astrolab.dev (10.10.40.187): icmp_seq=1 ttl=64 time=0.027 ms
64 bytes from zodiac.lab.astrolab.dev (10.10.40.187): icmp_seq=2 ttl=64 time=0.022 ms
...
64 bytes from zodiac.lab.astrolab.dev (10.10.40.187): icmp_seq=7 ttl=64 time=0.037 ms
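
For good measure, the libc resolver path (what getaddrinfo-style lookups use) can be double-checked with getent; I'd expect it to print the 10.10.40.187 line straight out of /etc/hosts:
Bash:
# Confirm the resolver sees the /etc/hosts entry for this node
getent hosts zodiac
getent hosts "$(hostname)"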

Checked systemctl; the system was running, but with degraded Proxmox services.
Bash:
root@zodiac:~# systemctl --failed
  UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
● corosync.service            loaded failed failed Corosync Cluster Engine
● postfix@-.service           loaded failed failed Postfix Mail Transport Agent (instance -)
● pve-cluster.service         loaded failed failed The Proxmox VE cluster filesystem
● pve-firewall.service        loaded failed failed Proxmox VE firewall
● pve-guests.service          loaded failed failed PVE guests
● pve-ha-crm.service          loaded failed failed PVE Cluster HA Resource Manager Daemon
● pve-ha-lrm.service          loaded failed failed PVE Local HA Resource Manager Daemon
● pvescheduler.service        loaded failed failed Proxmox VE scheduler
● pvestatd.service            loaded failed failed PVE Status Daemon
● systemd-hostnamed.service   loaded failed failed Hostname Service
● systemd-random-seed.service loaded failed failed Load/Save Random Seed
● systemd-update-utmp.service loaded failed failed Record System Boot/Shutdown in UTMP

All of the failures came down to these two errors:
zodiac pveproxy[1321]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 2025.
proxmox_firewall: error updating firewall rules: failed to read guest map from /etc/pve/.vmlist
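
Both of those errors look like symptoms rather than the cause: pve-ssl.key and .vmlist live on the /etc/pve FUSE mount provided by pmxcfs, so when pve-cluster is down they simply aren't there. Something like this should confirm whether /etc/pve is mounted at all:
Bash:
# If pve-cluster (pmxcfs) is down, /etc/pve is not mounted and shows up empty
findmnt /etc/pve
ls -la /etc/pve/local/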

systemctl reset-failed cleared the failed state, and systemctl status no longer reported any failed units:
Bash:
root@zodiac:~# systemctl reset-failed
root@zodiac:~# systemctl status
● zodiac
    State: running
    Units: 390 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 0 units
    Since: Wed 2024-07-31 13:52:19 HDT; 26min ago
  systemd: 252.26-1~deb12u2

Checked the pve-cluster service; it had exited with an exception:
Bash:
root@zodiac:~# systemctl status -l pve-cluster
○ pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
     Active: inactive (dead) since Wed 2024-07-31 13:52:24 HDT; 30min ago
    Process: 1112 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)
        CPU: 7ms

Jul 31 13:52:24 zodiac systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Jul 31 13:52:24 zodiac systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 31 13:52:24 zodiac systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jul 31 13:52:24 zodiac systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jul 31 13:52:24 zodiac systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.

The journal shows the device is out of space:
Bash:
Jul 31 13:52:24 zodiac systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 31 13:52:24 zodiac systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jul 31 13:52:24 zodiac pmxcfs[1112]: [main] notice: resolved node name 'zodiac' to '10.10.40.187' for default node IP address
Jul 31 13:52:24 zodiac pmxcfs[1112]: [main] notice: resolved node name 'zodiac' to '10.10.40.187' for default node IP address
Jul 31 13:52:24 zodiac pmxcfs[1112]: [database] crit: chmod failed: No space left on device
Jul 31 13:52:24 zodiac pmxcfs[1112]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Jul 31 13:52:24 zodiac pmxcfs[1112]: [main] notice: exit proxmox configuration filesystem (-1)
Jul 31 13:52:24 zodiac pmxcfs[1112]: [database] crit: chmod failed: No space left on device
Jul 31 13:52:24 zodiac pmxcfs[1112]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Jul 31 13:52:24 zodiac pmxcfs[1112]: [main] notice: exit proxmox configuration filesystem (-1)
Jul 31 13:52:24 zodiac systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jul 31 13:52:24 zodiac systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jul 31 13:52:24 zodiac systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 31 13:52:24 zodiac systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Jul 31 13:52:24 zodiac systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 31 13:52:24 zodiac systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jul 31 13:52:24 zodiac systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jul 31 13:52:24 zodiac systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
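
(The entries above can be narrowed down to just this unit with something like the following, which saves scrolling through the full journal:)
Bash:
# Show only the pve-cluster messages from the current boot
journalctl -b -u pve-cluster.service --no-pager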

That leads me to believe I have a poorly configured ZFS pool. I checked a couple of threads on freeing up space:
https://forum.proxmox.com/threads/required-command-to-remove-the-zfs-snapshot-via-cli.111704/
https://technotes.seastrom.com/asse...-a-ZFS-Filesystem-that-is-100percent-Full.pdf

Trying to make space by deleting snapshots got me nowhere; there are no snapshots or volumes to remove, and the whole pool shows 0B available:
Bash:
root@zodiac:~# zfs list -t snapshot
no datasets available

root@zodiac:~# zfs list -t volume
no datasets available

root@zodiac:~# zfs list rpool/dump
cannot open 'rpool/dump': dataset does not exist

root@zodiac:~# zfs list rpool
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool   450G     0B   104K  /rpool

root@zodiac:~# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool              450G     0B   104K  /rpool
rpool/ROOT         450G     0B    96K  /rpool/ROOT
rpool/ROOT/pve-1   450G     0B   450G  /
rpool/data          96K     0B    96K  /rpool/data
rpool/var-lib-vz   104K     0B   104K  /var/lib/vz
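
With every dataset showing 0B AVAIL and no snapshots to drop, the approach in the PDF linked above is, as I understand it, to truncate a large expendable file in place before deleting it, since on a completely full copy-on-write filesystem even rm can fail. Roughly (the path here is just a placeholder for whatever you can afford to lose):
Bash:
# On a 100%-full ZFS dataset a plain rm can fail, because the unlink
# itself needs to allocate new metadata. Truncating first releases the
# file's data blocks without an unlink.
# /path/to/expendable-file is a placeholder - pick something safe to lose.
: > /path/to/expendable-file
rm /path/to/expendable-file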

Checked the Samba socket issue outlined in the thread below. Guess I'm not using Samba.
https://forum.proxmox.com/threads/no-space-left-on-device.77411/
Bash:
root@zodiac:~# cd /var/lib/samba/private/msg.sock
-bash: cd: /var/lib/samba/private/msg.sock: No such file or directory
root@zodiac:~# cd /var/lib/samba/private/
root@zodiac:/var/lib/samba/private# ls -a
.  ..
root@zodiac:/var/lib/samba/private#

I tried to qm destroy VM 106, since it was a testing VM, but couldn't get a connection:
Bash:
root@zodiac:~# qm destroy 106
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
root@zodiac:~#
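
The ipcc_send_rec errors fit the same picture: qm talks to pmxcfs over its IPC socket, so with pve-cluster down every qm/pct/pvesh call gets refused. This should confirm it:
Bash:
# "Connection refused" from qm just means the pve-cluster service is down
systemctl is-active pve-cluster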
 
Managed to work this out. zfs list wasn't showing me the right information to find my vm-raid mount point, so I sorted directories by size instead and found where the vm-raid pool was actually mounted. From there I was able to navigate through the file structure and delete enough to get things running again.

Bash:
root@zodiac:/rpool# du -hx --max-depth=1 / | sort -rh | head -20
451G    /
449G    /vm-raid
1.5G    /usr
450M    /var
151M    /boot
3.7M    /etc
42K     /root
25K     /tmp
512     /srv
512     /opt
512     /mnt
512     /media
512     /home
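
For anyone who lands here in the same state, here is a rough sketch of the recovery steps. /vm-raid was a plain directory storage sitting on the root dataset in my case, and the vm-106 disk path below is only a placeholder, so double-check the VMID and file name before deleting anything:
Bash:
# 1. Drill further down to find the biggest files under the directory storage
du -hx --max-depth=2 /vm-raid | sort -rh | head -20

# 2. Truncate, then remove, an expendable disk image to free space
#    (placeholder path for the test VM's disk - verify before deleting)
: > /vm-raid/images/106/vm-106-disk-0.raw
rm /vm-raid/images/106/vm-106-disk-0.raw

# 3. With space available again, bring the PVE stack back up
systemctl start pve-cluster pvedaemon pveproxy pvestatd

# 4. Then remove the now-orphaned test VM cleanly, config and all
qm destroy 106 --purge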
 
