issue rebooting PVE host (kind of serious)

Vorl

Member
Jan 7, 2023
So this morning I had an issue: my ZFS root filesystem filled up. It was my fault - I configured backups and missed a step, so they went to the local ZFS filestore instead of the backup server - but it led to several issues.

1. I couldn't delete the backups through the GUI, because it wanted to make another file in order to delete a file (?)
2. The backup job was still running, and I couldn't kill it through the GUI or CLI, even after I manually went in and deleted some old backups.
3. The VMs were in a strange state where PVE thought they were running, but they weren't, and I couldn't restart them.
4. I couldn't reboot the PVE host from the GUI or the CLI either. I had to walk over to the host and pull the power. Thankfully it came back up fast and everything seems to be fine.

This is a kind of serious issue, though. What prevented me from issuing a reboot/shutdown command, especially from the CLI?

Also, just curious: what is the minimum recommended filesystem size for Proxmox?

I am sure proper partitioning would have resolved this, but for some reason Proxmox wants to grab all the disk space on the device instead of just what it needs. Granted, I am a newbie to Proxmox but not to Linux; I do know better and shouldn't have let everything get sucked into Proxmox's host volume, but this was also my first host.

It would be nice if the installer defaulted to the "recommended" size for root and left the rest for me, to do something like create a second ZFS volume for non-host processes.
 
yes, once a filesystem (especially the root one) is full, recovery can be tricky - especially with storage that is copy-on-write, like ZFS, because even a delete has to write new metadata before any space is freed. with ZFS, you can set quotas and reservations to ensure there is always some reserve for such scenarios - but which datasets should get what quota is up to you, and depends on your exact system and setup.
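
for example, something like this (just a sketch - the dataset name matches a default ZFS install, the size is arbitrary):

# reserve space for the root dataset, so guest volumes and backups
# can never fill the pool completely (the 8G value is only an example)
zfs set reservation=8G rpool/ROOT/pve-1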
 
yeah, that goes back to my point - maybe Proxmox should only partition the recommended size for root (or at least start there and give people the option to use the whole drive, or more of it, if they want), and leave the rest empty instead of claiming the entire volume by default. It's generally a best practice to have one filesystem just for system things and do everything else on a second filesystem.
 
yeah, that goes back to my point - maybe Proxmox should only partition the recommended size for root (or at least start there and give people the option to use the whole drive, or more of it, if they want), and leave the rest empty instead of claiming the entire volume by default. It's generally a best practice to have one filesystem just for system things and do everything else on a second filesystem.
well, I'd rather say it's best practice to split system disks (holding /) from data disks entirely, not to do that at the partition level on shared disks ;) but the default setup does split root from guest volume storage even on a single disk/zpool: either by virtue of / being a regular LV and the storage part being a thin pool, or by having two ZFS datasets for / and the storage part. you can customize that however you see fit after the install, including adding more separation or quotas, or removing things you don't want. note that there is an option in the installer to leave part of the disk free (and another one to leave part of the "pve" VG free in case of LVM).
 
I am not sure how ZFS works, but I know that when backups were accidentally configured to store data on the local ZFS volume and the backup storage location filled up, I couldn't do anything with the system, and it crashed the VMs.

NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                      108G  6.11G    96K  /rpool
rpool/ROOT                76.5G  6.11G    96K  /rpool/ROOT
rpool/ROOT/pve-1          76.5G  6.11G  76.5G  /
rpool/data                31.7G  6.11G    96K  /rpool/data
rpool/data/vm-100-disk-0  20.0G  6.11G  20.0G  -
rpool/data/vm-101-disk-0   148K  6.11G   148K  -
rpool/data/vm-101-disk-1  11.6G  6.11G  11.6G  -
rpool/data/vm-101-disk-2    68K  6.11G    68K  -

This is what my install looks like by default in zfs list. To me it looks like everything sits on rpool, and there doesn't appear to be any separation into a secondary volume to hold extra things like VMs.

Like I said, I don't understand ZFS. I am just going by the anecdotal evidence that when I delete data, suddenly the available space goes up for what looks like everything.

What am I missing? How is it separated by default?
 
the datasets are a logical separation - to also split up (or rather, reserve) the space, you need to set quotas and/or reservations (which can be changed later on; it's much more flexible than allocating a fixed amount of space up front like with LVM or partitions).
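
for example (again just a sketch, the sizes are arbitrary):

# cap what the guest-volume dataset may use
zfs set quota=80G rpool/data
# quotas are not fixed allocations - they can be loosened later at any time
zfs set quota=100G rpool/data
# inspect the current settings across the pool
zfs get quota,reservation -r rpool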
 
That goes back to my point about true separation of the PVE OS from everything else - it's not really separated. Why wouldn't you just give the OS what it needs, or at least start off with a recommendation, instead of claiming the whole disk? Can you put quotas in? Based on what you are saying, sure! But how many people know that unless they know ZFS? I picked ZFS because, from what I read, it handles power outages better without getting corrupted. While I have UPSs on my systems, things happen. If I don't have time to become a ZFS expert, does that mean I shouldn't use ZFS?

Is there somewhere in the GUI where you set up quotas? Is there a tip somewhere telling people that for system stability you should set up quotas? I didn't see anything about that during install, but maybe I missed it.

Even when trying to read info about requirements, there is info missing. Here is the link to your page that talks about hardware requirements; nowhere on that page is there anything about disk sizes.

https://www.proxmox.com/en/proxmox-ve/requirements

What are the minimum and recommended disk space allotments for PVE?

Can it be installed and run happily on an SD card?

Also, while we are talking about it, where are the tuning parameters for ZFS/PVE in the GUI? I was reading that by default PVE will use half the RAM in a system for caching and that it needs to be manually tuned to use less.
 
So here we are, 4 months later, and the wiki wasn't updated and none of my questions were answered.

The current default sets people up to fail, especially people that are installing it in home labs, or labs in general, to test and learn. If they run into issues, do you think those people are more likely to push to use this in their companies?
 
What are the minimum and recommended disk space allotments for PVE?
the minimum is 8 GB if using the baremetal installer IIRC, but that won't give you a system usable for production for obvious reasons. that number is rather meaningless outside of testing the installer in a VM - and there, if the disk is too small, you get a message, increase the disk size, and reboot (takes about 30s). PVE is meant for installation on server hardware, where the disks being big enough for the system itself is not an issue.
Can it be installed and run happily on an SD card?
no. as the system requirements say, at least a hard drive.

the recommendations list:
  • Fast and redundant storage, best results are achieved with SSDs.
  • OS storage: Use a hardware RAID with battery protected write cache (“BBU”) or non-RAID with ZFS (optional SSD for ZIL).
  • VM storage:
    • For local storage, use either a hardware RAID with battery backed write cache (BBU) or non-RAID for ZFS and Ceph. Neither ZFS nor Ceph are compatible with a hardware RAID controller.
    • Shared and distributed storage is possible.
Also, while we are talking about it, where are the tuning parameters for ZFS/PVE in the GUI? I was reading that by default PVE will use half the RAM in a system for caching and that it needs to be manually tuned to use less.
it doesn't need to, but it is likely a good idea unless you have a lot of memory or little memory pressure from/usage by guests. this is not PVE specific, it's how ZFS on Linux works. it's also documented, including how to change it: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_limit_memory_usage
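
as a concrete sketch of what the linked chapter describes (the 8 GiB value is only an example):

# cap the ARC at 8 GiB on the running system (example value)
echo "$[8 * 1024*1024*1024]" > /sys/module/zfs/parameters/zfs_arc_max

# make the limit persistent across reboots
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u    # needed when the root filesystem is on ZFS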

basic Linux sysadmin skills and knowledge are required to administrate PVE systems. that includes being able to look up stuff in documentation, and areas that are in no way specific to PVE, but apply to pretty much any production Linux server.


So here we are, 4 months later, and the wiki wasn't updated and none of my questions were answered.
this forum is pretty high volume. sometimes stuff slips through (and please keep in mind that the staff that answers you here also has other duties, including implementing the features you want or fixing the bugs you find :)). if you want guaranteed response times, take a look at our enterprise support offerings ;)
 
"the minimum is 8 GB if using the baremetal installer IIRC, but that won't give you a system usable for production for obvious reasons. that number is rather meaningless outside of testing the installer in a VM"

Why is it so hard to get a straight answer? Seriously, if 8GB is too small for anything except testing installs, what is the real answer, and why isn't it documented in your wiki or your install documentation? Heck, lots of install documentation from vendors includes scenarios like 8GB for testing the installer, 15GB for running a test lab of up to XX machines, 40GB of RAID storage for production installs, etc.

Lots of Proxmox installs are for labs/homelabs (like mine), so real documentation is important, even for companies deciding if they want to test Proxmox.

"sure from/usage by guests. this is not PVE specific, it's how ZFS on Linux works. it's also documented,"

It's in no way related to PVE, except that you include it in your product. That makes it part of your ecosystem and something you should consider making easier to manage. Saying "it's not ours" is just an excuse to make it someone else's issue.

"basic Linux sysadmin skills and knowledge are required to administrate PVE systems. that includes being able to look up stuff in documentation, and areas that are in no way specific to PVE, but apply to pretty much any production Linux server."

I have been working with Linux since 1995 and held an RHCE (it expired). I am well aware of how to look stuff up, but you seem to think (and so do some of your co-workers, based on posts in various threads) that we have nothing better to do with our time than become experts in your one application. Very few organizations give their admins time to become experts in any one application, let alone multiple apps. Most admins have tons of apps and are spread very thin. This is especially true of small to mid-sized companies, where admins wear multiple hats.

If you want to endear yourself to your clients, then make your app easier to work with and don't force everyone to learn every nuance of every system. Heck, even when people know they need to change something like the memory ZFS uses, being able to change it in the GUI, or better yet set cluster-wide policies, would be amazing. Right now so much is "go into this host's shell and change xyz setting, then do it on every host you have via the CLI". I say this as someone that loves to script and prefers the command line. It's just not that easy anymore. Too many things need attention.



"this forum is pretty high volume. sometimes stuff slips through (and please keep in mind that the staff that answers you here also has other duties, including implementing the features you want or fixing the bugs you find :)). if you want guaranteed response times, take a look at our enterprise support offerings ;)"

This is true. I apologize for being short.
 
Why is it so hard to get a straight answer? Seriously, if 8GB is too small for anything except testing installs, what is the real answer, and why isn't it documented in your wiki or your install documentation? Heck, lots of install documentation from vendors includes scenarios like 8GB for testing the installer, 15GB for running a test lab of up to XX machines, 40GB of RAID storage for production installs, etc.
because there is no one-size-fits-all answer/solution.

space usage is really heavily correlated with how you use PVE. the "base" system (excluding all guest-related space usage) won't require more than 20-30G of space, unless you do some sort of customization that requires a lot of space (e.g., if you also run an APT mirror directly on your host, or something like that ;)). but that is pretty much meaningless - no production PVE system has that little disk space anyway (especially not if you follow the recommendations, and use a fast/modern disk/SSD).

once you add guest-related usage, a minimum doesn't make sense anymore either - because obviously, the bulk of the usage will be by guests (directly, or via backups/snapshots/..), and that amount is not determined by us or our software.
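
if you want to see where the space actually goes on an installed system, plain ZFS tooling is enough (nothing PVE-specific about it):

zfs list -o space -r rpool    # per-dataset breakdown of used/available space
df -h /                       # what the root filesystem itself reports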
 
Thank you for that answer. Something else to consider: as you stated above, you wouldn't add guest-related things to the OS disk/partition.
I think, based on the use of Proxmox for home labs, it's more likely for users to use a single disk for both OS and data. Having a default partition of 30G for the OS, regardless of which filesystem is used, would be huge. Now, I know you wouldn't make changes just for home users, but it's a reality.

Also, a default of 30G works even for production environments: generally you would have a partition just for OS stuff and a separate partition for anything that grows, to safeguard the OS. Add to that the fact that you would never do backups to the OS partition and wouldn't use it for snapshots - it's just for the OS - and it seems pretty simple to give an OS partition size recommendation. It also really seems like that should be the default, not just taking the whole disk. Taking the whole disk is the easy answer, just not the right one for pretty much any situation.
 
both default setups (LVM and ZFS) already split guest storage (the pve/data LVM thin pool or the rpool/data dataset) from system storage (the pve/root LV or the rpool/ROOT/pve-1 dataset) - the only things that end up on the latter are ISOs/templates (small) and backups (which shouldn't be stored on the same disk anyhow, but on a different disk or an external host).

for LVM you can configure the space distribution in the installer (since reconfiguring it is not easily done afterwards); for ZFS you can change the reserved space or space limits at any time anyway. you can always add another LV or dataset for backing /var/lib/vz (the default dir storage called "local") - it's documented in our admin guide for LVM (3.7.4), and the same approach also applies to ZFS if you desire it. for ZFS it is planned to create a separate dataset at install time: https://bugzilla.proxmox.com/show_bug.cgi?id=1410
 
That is awesome! While I don't look forward to redoing my hosts, I will definitely reinstall for that change.

Thank you for the information.
 
there shouldn't be a need to reinstall for that, you can do the same change and move the data on an installed system as well :)
 
OK, that's just crazy. Do you happen to have documentation on that process? I wouldn't even know where to start looking.
 
the steps you need to take before creating the dataset depend on what you currently use your 'local' storage for (e.g., if you only use it for backups, you just need to ensure you are not doing any backups or restores at the moment; if you use it for ISO images or guest volumes, you need to stop any guests using any of those files, ...). the actual commands are sketched after the list.

1. create a dataset using zfs create ..
2. disable the local storage
3. mv the contents of /var/lib/vz to the mountpoint of the dataset from step 1
4. change the mountpoint of the dataset from 1. to /var/lib/vz and ensure it is mounted there
5. set is_mountpoint = 1 on the local storage
6. re-enable the local storage
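
roughly, as commands (just a sketch - the dataset name rpool/var-lib-vz is an example, adapt it to your pool layout):

# 1. create the dataset (mounted at /rpool/var-lib-vz by default)
zfs create rpool/var-lib-vz
# 2. disable the "local" directory storage while moving data
pvesm set local --disable 1
# 3. move the existing contents onto the new dataset
mv /var/lib/vz/* /rpool/var-lib-vz/
# 4. remount the dataset at /var/lib/vz
zfs set mountpoint=/var/lib/vz rpool/var-lib-vz
# 5. mark the storage path as a mountpoint, so PVE only activates it
#    when the dataset is actually mounted there
pvesm set local --is_mountpoint 1
# 6. re-enable the storage
pvesm set local --disable 0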
 
thank you.

I didn't know if there was anything from the PVE side that had to be done.
 
