Proxmox acting up... can't log in via web, can't reach services...

Mattias Hedman

Hello all!
My current main Proxmox server has been running for well over a year now, with all VMs and LXCs just chugging along, so much so that I felt like, hey, let's go on vacation!
Two days in, one of my Nextcloud users messaged me asking what was up with Nextcloud. Bewildered and with no access to the home environment, I calmed myself with the thought that one report means nothing...
The day after, the next Nextcloud user messaged me... now there had to be something wrong. I tried to reach Nextcloud and got a strange error saying that the sysadmin has to be contacted... that's me...
I tried a few other services, and all of them gave an error. I tried to forget about it and enjoy my vacation...
Now I'm home.
For some reason I am not able to log in to the web UI; I have even tried changing the password. The VMs are up and running, but all of them have a read-only filesystem...

I can SSH into the VMs, but I cannot do anything there. I have tried to restart them via the CLI, and that succeeds, but only one has come back to life so far. It takes forever to write to the filesystem.
All write tasks take forever. At the moment it is too late to start digging through the logs... after a 15-hour travel day I am not up to that task.

So I'm throwing it out here to see if I have any answers tomorrow when I'm fit for fight.
 
After a good night's sleep I am at it again. For no reason I can see other than time having passed, I can now log in to the web GUI, so it all starts with a win.
This could mean something...
rpool                          14M  128K   14M    1%  /rpool
rpool/data                     14M  128K   14M    1%  /rpool/data
rpool/ROOT                     14M  128K   14M    1%  /rpool/ROOT
rpool/data/subvol-100-disk-0  887M  873M   14M   99%  /rpool/data/subvol-100-disk-0
rpool/data/subvol-110-disk-0  431M  418M   14M   97%  /rpool/data/subvol-110-disk-0
rpool/data/subvol-101-disk-0  1.7G  1.7G   14M  100%  /rpool/data/subvol-101-disk-0
My LXCs have filled their root disks... that can explain the disk activity and the sluggishness of the system as a whole.
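One detail in that output is worth a closer look: every dataset reports the same 14M available, rpool itself included. ZFS datasets draw from the pool's shared free space, which suggests the pool as a whole is nearly exhausted and that the 1% shown for rpool is misleading. Below is a minimal Python sketch of that check, assuming the rows are pasted in as text. The function names and the 100 MiB threshold are illustrative assumptions, not anything Proxmox or ZFS provides.

```python
# Rows copied from the df output quoted above (no header line, as in the post).
DF_ROWS = """\
rpool 14M 128K 14M 1% /rpool
rpool/data 14M 128K 14M 1% /rpool/data
rpool/ROOT 14M 128K 14M 1% /rpool/ROOT
rpool/data/subvol-100-disk-0 887M 873M 14M 99% /rpool/data/subvol-100-disk-0
rpool/data/subvol-110-disk-0 431M 418M 14M 97% /rpool/data/subvol-110-disk-0
rpool/data/subvol-101-disk-0 1.7G 1.7G 14M 100% /rpool/data/subvol-101-disk-0
"""

# Binary multipliers for the human-readable size suffixes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert a human-readable size such as '14M' or '1.7G' to bytes."""
    if size[-1] in UNITS:
        return int(float(size[:-1]) * UNITS[size[-1]])
    return int(float(size))


def parse_df(text):
    """Parse six-column df rows into dictionaries, skipping anything else."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[4].endswith("%"):
            continue  # header, blank line, or unexpected layout
        filesystem, size, used, avail, use_pct, mount = fields
        rows.append({
            "filesystem": filesystem,
            "size": to_bytes(size),
            "used": to_bytes(used),
            "avail": to_bytes(avail),
            "use_pct": int(use_pct.rstrip("%")),
            "mount": mount,
        })
    return rows


def low_on_space(rows, min_avail_bytes=100 * 1024**2):
    """Return filesystems whose available space is below the threshold.

    The threshold (100 MiB by default) is an arbitrary illustrative value.
    """
    return [r["filesystem"] for r in rows if r["avail"] < min_avail_bytes]


if __name__ == "__main__":
    for name in low_on_space(parse_df(DF_ROWS)):
        print("low on space:", name)
```

Filtering on available bytes rather than on the percentage column is deliberate here: a percentage filter would pass over rpool at 1% even though it has no more room than the containers do.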
The question now is how do I move the root disks? When I try to use the built-in tool I get a timeout error.
I do have backups of all the VMs and LXCs, so now I will try to restore the last known point when it was all working and redirect the root disk elsewhere. No... "Connection error 596: Connection timed out". Bugger.
 
So it for sure seems like this is the issue:
No space left on device (500)
Now how to fix it... first up I need to stop the LXCs from starting at boot. I get an "Error writing 100.conf: Input/output error" when trying to change that via the web UI or the CLI.
 
So now I have deleted all the LXCs, restored them from backup, and pointed them to another storage. So far it seems to be working out.
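For anyone landing here with the same problem, the delete-and-restore step could look roughly like the sketch below. It only prints the `pct` commands and does not run them. The post does not say whether the web UI or the CLI was used, so this is one possible route and not a record of what was actually done. The container ID matches one from the df output, but the archive path and storage name are hypothetical placeholders.

```python
import shlex


def recovery_commands(vmid, archive, storage):
    """Build the pct commands to remove a container and restore it elsewhere.

    `archive` and `storage` are placeholders supplied by the caller; the
    container must already be stopped before `pct destroy` will succeed.
    """
    return [
        ["pct", "destroy", str(vmid)],
        ["pct", "restore", str(vmid), archive, "--storage", storage],
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for cmd in recovery_commands(100, "/path/to/backup-archive", "other-storage"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything destructive on a host that is already struggling.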