[SOLVED] LXC File system issues. Proxmox showing >1000% disk utilisation. df -h showing negative usage. Cannot figure out how to fix this.

trotski94

New Member
Jun 9, 2020
First post - apologies if I'm in the wrong place or doing something wrong. I'm starting to pull my hair out over this one.
This is my first time doing anything with LXCs and I'm feeling way out of my depth right now. The summary page in Proxmox for my LXC is showing "Bootdisk size 1120.99% (16.15 TiB of 1.44 TiB)", which is obviously wrong. I tried to run fsck, but I must be doing something wrong and no amount of google-fu is putting me on the right track. I've tried using "shutdown -Fr now", but that seems to have done nothing. I've also tried to mount root as read-only and run fsck on it, but "mount -o remount,ro /" tells me it cannot mount /dev/loop1 read-only. Doing "umount /" tells me /: not mounted, and "umount /dev/loop1" gives the same message.
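
For reference, the steps described above boil down to roughly the following (a sketch reconstructed from the description; the results are paraphrased from the post):

Code:
# inside the container: request a filesystem check on the next boot, then reboot
shutdown -Fr now                 # seemed to have no effect

# try to get the root filesystem read-only so fsck can be run against it
mount -o remount,ro /            # fails: cannot mount /dev/loop1 read-only
umount /                         # fails: /: not mounted
umount /dev/loop1                # fails with the same message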

Ultimately I know the cause of this - I only had a single storage set up in Proxmox, and it was used for everything, including backups. A weekly backup ran and used up all the available disk space, starving all the LXCs and VMs of any ability to write to disk and screwing up a bunch of them. I restored most of them from backups, but unfortunately this is the only LXC that wasn't backed up (because I don't have the space).

I'd just rebuild it from scratch, but it has 1.0TB of files on it and my total server space is only 1.8TB, so I can't have two instances side by side... Any advice or guidance would be appreciated, because I'm completely lost here.
 
Please post the config of the container (`pct config $VMID`) and your storage configuration (`cat /etc/pve/storage.cfg`) - is there anything in the journal from the time the disk was full?

I'd just rebuild it from scratch, but it has 1.0TB of files on it and my total server space is only 1.8TB, so I can't have two instances side by side... Any advice or guidance would be appreciated, because I'm completely lost here.
any chance to get some additional diskspace there (maybe even in form of an external USB-hdd?) - it's always quite scary to have to work on the last available copy of data
 
any chance to get some additional diskspace there (maybe even in form of an external USB-hdd?) - it's always quite scary to have to work on the last available copy of data

Afraid not, it's rented hardware in another country. Worst comes to worst, I can pull the data off, rebuild, and push it back up... but it's going to take days to weeks with my connection.


pct config output:
Code:
arch: amd64
cores: 2
hostname: Storage
memory: 1024
net0: name=eth0,bridge=vmbr0,hwaddr=6E:7D:8E:25:C8:C2,ip=dhcp,ip6=dhcp,type=veth
ostype: debian
rootfs: local:105/vm-105-disk-0.raw,size=1500G
swap: 1024
unprivileged: 1

storage.cfg:
Code:
dir: local
        path /var/lib/vz
        content snippets,images,vztmpl,iso,backup,rootdir
        maxfiles 0
        shared 0
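
Worth noting: `maxfiles 0` on that storage means there is no limit on the number of backups kept per guest, which is how a weekly backup job can eventually fill the only volume. One possible mitigation would be to cap retention on that storage - a sketch only; the value of 2 is an example, not something from the thread:

Code:
dir: local
        path /var/lib/vz
        content snippets,images,vztmpl,iso,backup,rootdir
        maxfiles 2
        shared 0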

Nothing I can see in the journal... although honestly, this issue happened over a week ago and I thought I'd gotten away with just restoring from backups (some VMs/LXCs wouldn't even boot; I thought I was lucky that this one seemed unaffected). But now, over a week later, I've noticed this.
 
Hmm - I would make a backup in any case, if at all possible, if you value the data...
If you stop the container you can run fsck using `pct fsck VMID`.
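
A minimal sketch of that sequence, assuming the container ID 105 from the config posted above:

Code:
pct stop 105     # the container must be stopped first
pct fsck 105     # runs fsck against the container's root filesystem
pct start 105    # start it again once the check comes back clean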
 
pct fsck fails o_O

Code:
fsck from util-linux 2.29.2
/var/lib/vz/images/105/vm-105-disk-0.raw: Group descriptor 8112 has invalid unused inodes count 8190.  FIXED.
/var/lib/vz/images/105/vm-105-disk-0.raw: Group descriptor 8289 has invalid unused inodes count 8189.  FIXED.
/var/lib/vz/images/105/vm-105-disk-0.raw: Inode 65011796 extent tree (at level 2) could be narrower.  IGNORED.
[REMOVED SIMILAR IGNORED MESSAGES FOR POST LENGTH]
/var/lib/vz/images/105/vm-105-disk-0.raw: Inode 65014591 extent tree (at level 2) could be narrower.  IGNORED.
/var/lib/vz/images/105/vm-105-disk-0.raw: Inode 65014619 extent tree (at level 2) could be narrower.  IGNORED.
/var/lib/vz/images/105/vm-105-disk-0.raw: Inode 65015472 extent tree (at level 2) could be narrower.  IGNORED.
/var/lib/vz/images/105/vm-105-disk-0.raw: Deleted inode 66846724 has zero dtime.  FIXED.
/var/lib/vz/images/105/vm-105-disk-0.raw: Inodes that were part of a corrupted orphan linked list found.

/var/lib/vz/images/105/vm-105-disk-0.raw: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)
command 'fsck -a -l /var/lib/vz/images/105/vm-105-disk-0.raw' failed: exit code 4


Currently running it manually as it suggests
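
The manual run it asks for would be essentially the same command pct invoked, just without -a, so that fsck prompts before each repair (a sketch based on the failing command shown above):

Code:
fsck -l /var/lib/vz/images/105/vm-105-disk-0.raw
# or add -y to answer yes to every repair prompt automatically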
 
Great to hear :)

And yes - the last time I ran into a broken disk was also the time I really started making backups! :)
Please mark the thread as 'SOLVED'.
 
