[SOLVED] No webUI but SSH works

Por12 · Member · Mar 6, 2023
Hi

I woke up today and found that one of my Proxmox servers suddenly has no web UI. If I go to https://ip:8006 the browser is stuck loading until it times out. If I log on to the other server in the cluster, I see the name of the first server with an X.

I can still access it via SSH and have tried restarting the GUI (service pveproxy restart) and rebooting the server, but with no success. I then tried updating the system, but there seems to be a no-space problem that wasn't there last week.

Code:
Ign:1 http://ftp.debian.org/debian bookworm InRelease
Ign:2 http://security.debian.org/debian-security bookworm-security InRelease
Ign:3 http://ftp.debian.org/debian bookworm-updates InRelease
Err:4 http://security.debian.org/debian-security bookworm-security Release
  Could not open file /var/lib/apt/lists/partial/security.debian.org_debian-security_dists_bookworm-security_Release - open (28: No space left on device) [IP: 151.101.134.132 80]
Err:5 http://ftp.debian.org/debian bookworm Release
  Could not open file /var/lib/apt/lists/partial/ftp.debian.org_debian_dists_bookworm_Release - open (28: No space left on device) [IP: 151.101.134.132 80]

If I run df -kh I see:

Code:
Filesystem                     Size  Used Avail Use% Mounted on
udev                            32G     0   32G   0% /dev
tmpfs                          6.3G   39M  6.3G   1% /run
rpool/ROOT/pve-1               458G  458G     0 100% /
tmpfs                           32G     0   32G   0% /dev/shm
tmpfs                          5.0M     0  5.0M   0% /run/lock
rpool                          128K  128K     0 100% /rpool
rpool/ROOT                     128K  128K     0 100% /rpool/ROOT
rpool/data                     128K  128K     0 100% /rpool/data
rust-ferraz                    3.0T  128K  3.0T   1% /rust-ferraz
rust-ferraz/subvol-509-disk-0  2.0T  591G  1.4T  30% /rust-ferraz/subvol-509-disk-0
tmpfs                          6.3G     0  6.3G   0% /run/user/0

So it seems my pve-1 root filesystem is suddenly full? I don't really understand how, as I have not done anything on the server in the past few days, and it was working fine before with only a few GB used.

If I run df -i I see:

Code:
Filesystem                        Inodes IUsed      IFree IUse% Mounted on
udev                             8179457   502    8178955    1% /dev
tmpfs                            8188037   893    8187144    1% /run
rpool/ROOT/pve-1                   54828 54828          0  100% /
tmpfs                            8188037     1    8188036    1% /dev/shm
tmpfs                            8188037     9    8188028    1% /run/lock
rpool                                  8     8          0  100% /rpool
rpool/ROOT                             7     7          0  100% /rpool/ROOT
rpool/data                             6     6          0  100% /rpool/data
rust-ferraz                   6303678623     7 6303678616    1% /rust-ferraz
rust-ferraz/subvol-509-disk-0 2955585082  1290 2955583792    1% /rust-ferraz/subvol-509-disk-0
tmpfs                            1637607    18    1637589    1% /run/user/0
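Worth noting: on ZFS the free-inode count that df reports scales with the remaining free space, so a dataset at 100% space usage will also show 0 free inodes; the inode column here is a symptom of the full disk, not a second problem. GNU df can show both views side by side (the `--output` column names below are standard coreutils fields):

```shell
# Show space and inode usage for / in one line (GNU coreutils df):
df -h --output=source,size,used,pcent,iused,ipcent /
```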

I'm running version 8.0.
 
Hi, can you identify which files are taking up most of the storage space? You could try installing `ncdu` in order to scan for large files. Of course, you will have to clean up some space first in order to do that. Also, please check the journal for errors; maybe that gives a hint on what is acting up.
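To free just enough space for ncdu to install, a few generic Debian cleanup steps usually help (a sketch, run as root; the partial-lists path is the one from the apt error above, and the 100M journal cap is just an example value):

```shell
# Reclaim a little space on a full root (generic Debian steps, run as root):
apt-get clean                          # remove cached .deb packages from /var/cache/apt
rm -rf /var/lib/apt/lists/partial/*    # half-downloaded index files, as in the apt error
if command -v journalctl >/dev/null; then
    journalctl --vacuum-size=100M      # shrink archived systemd journals
fi
du -sh /var/cache/apt /var/lib/apt/lists 2>/dev/null
```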
 
Thanks!

The first lines in the journalctl output are:

Code:
Jul 21 18:55:51 apollo systemd-journald[1339]: Data hash table of /var/log/journal/4d8a5caa03b54b14b8080470a2297c34/system.journal has a fill level at 75.1 (1537 of 2047 items, 524288 file size, 341 bytes per hash table item), suggesting rotation.
Jul 21 18:55:51 apollo systemd-journald[1339]: /var/log/journal/4d8a5caa03b54b14b8080470a2297c34/system.journal: Journal header limits reached or header out-of-date, rotating.

Then it reports mount errors on all the NFS mounts.

I'm unsure how to find files to delete. If I run du -h --max-depth=1 / | sort -h to try to find large files, all I get is that almost all the space is on /, which is not very helpful:

Code:
0       /dev
0       /proc
0       /sys
512     /home
512     /media
512     /opt
512     /srv
2.0K    /rpool
12K     /mnt
25K     /tmp
62K     /root
3.8M    /etc
47M     /run
76M     /boot
1.2G    /usr
457G    /var
591G    /rust-ferraz
1.1T    /
 
Why? It tells you that /var contains 457G of data, so I suggest looking further there. /rust-ferraz is a mountpoint, so not of interest here.
 
591G /rust-ferraz
The command that you ran, `du -h --max-depth=1 / | sort -h`, means:
du - calculate the space occupied by directories and files
-h - use human-readable values
--max-depth=1 - don't descend below the top-level subdirectories of the given path
/ - the path to report on (the root filesystem)

The output shows how much each folder on your root disk is taking up, with the last line showing the total of all the lines above it.
The obvious suspects for the folders that took up all the space:
457G /var
591G /rust-ferraz

/var usually contains your VM disk images and container data. The second location is something you created. You can inspect the offenders further with:
du -h --max-depth=1 -x /var
du -h --max-depth=1 -x /rust-ferraz

Once you run out of space, the behavior of the entire system becomes unpredictable. Cleaning up space is the only solution.


PS: the "-x" option tells "du" NOT to cross into other mounted filesystems, which gives you a more accurate view of the space used on a specific physical/logical device.
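If you want to try the du | sort pipeline safely first, you can build a small throwaway tree (the directory names are made up for the demo) and run it there:

```shell
# Demo of the same pipeline on a throwaway tree (safe to run anywhere):
tmp=$(mktemp -d)
mkdir -p "$tmp/big" "$tmp/small"
dd if=/dev/zero of="$tmp/big/blob"   bs=1M count=8 status=none   # ~8 MiB file
dd if=/dev/zero of="$tmp/small/blob" bs=1K count=8 status=none   # ~8 KiB file
du -h --max-depth=1 "$tmp" | sort -h    # one line per subdirectory, smallest first
rm -rf "$tmp"
```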


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks for the help and the lesson. It seems that the space used is in /var/tmp/vzdumptmp350587_509//mnt...

I guess this is related to container 509, but what are vzdump files used for? Off the top of my head, I think it could be a misconfigured scheduled backup that tried to copy the data on rust-ferraz and filled up the drive. Does that make sense?

Code:
root@apollo:~# du -h --max-depth=1 -x /var
296K    /var/backups
41M     /var/cache
1.8M    /var/spool
41G     /var/lib
416G    /var/tmp
512     /var/opt
512     /var/local
3.0M    /var/log
3.0M    /var/hdd.log
512     /var/mail
457G    /var
root@apollo:~# du -h --max-depth=1 -x /var/tmp
512     /var/tmp/espmounts
416G    /var/tmp/vzdumptmp350587_509
416G    /var/tmp
root@apollo:~#
 
vzdump is the backup tool. The 416G directory has "tmp" in its name: it is temporary data that should have been deleted, but may have been left behind by the process when the system ran out of space. Hard to say without deeper investigation. You can go ahead and delete it.
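One cautious way to do that cleanup (a sketch: the mktemp directory below is a dummy standing in for the real /var/tmp/vzdumptmp path, and the pgrep guard just makes sure no backup is still writing there before deleting):

```shell
# Sketch: delete a leftover vzdump temp directory only when no backup is
# running. "$leftover" is a dummy path standing in for the real one.
leftover=$(mktemp -d)                       # pretend this is the 416G directory
if ! pgrep -x vzdump >/dev/null 2>&1; then  # no vzdump process running
    rm -rf "$leftover"
fi
```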


 
