pveupload temp file in /var/tmp causes the OS disk to fill up and the system to crash

Tekuno-Kage

Renowned Member
Jun 1, 2016
Today, we faced an issue where the OS disk got full. Even though we were uploading files to a Proxmox Storage outside the OS partition (a CEPH-FS storage), we found that a file was being generated in the /var/tmp directory. Initially this was just an assumption based on the procedure used, so I replicated the behavior and, sadly, succeeded in reproducing the undesired situation. To confirm it, I deleted the hidden file, which caused the following error and proves the theory:
Error 500: temporary file '/var/tmp/pveupload-edac042ecf93413e26ad85a04506305a' does not exist


Code:
Package Versions:
proxmox-ve: 7.4-1 (running kernel: 5.15.131-2-pve)
pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
pve-kernel-5.15: 7.4-9
pve-kernel-5.15.131-2-pve: 5.15.131-3
pve-kernel-5.15.126-1-pve: 5.15.126-1
pve-kernel-5.15.39-3-pve: 5.15.39-3
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph: 16.2.13-pve1
ceph-fuse: 16.2.13-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.4-1
proxmox-backup-file-restore: 2.4.4-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-6
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.14-pve1

I strongly believe that writing to a location the user should not have access to, simply for the purpose of uploading a file, is not appropriate behavior. It is vital that we keep the system secure.
I'm counting on your help to keep our system running smoothly and without issues.

Thanks
 
It's unfortunate, but uploads are spooled to a local dir and then moved to your destination.

If your root storage is not sufficient, which is often the case, do not use the GUI to upload content, use SSH/SCP directly to your desired CephFS directory.
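For example (the storage name and mount path below are just illustrative; a CephFS storage with the ID "cephfs" is normally mounted under /mnt/pve/cephfs):

Code:
scp local-image.iso root@your-pve-host:/mnt/pve/cephfs/template/iso/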
 
I wish Proxmox used /tmp instead of /var/tmp, since uploads don't need to survive a reboot. But /var/tmp is usually bigger than /tmp, which makes sense for large uploads. How small is your root filesystem (or /var/) that you run into issues with uploads of several GB?
 
> I wish Proxmox used /tmp instead of /var/tmp, since uploads don't need to survive a reboot. But /var/tmp is usually bigger than /tmp, which makes sense for large uploads. How small is your root filesystem (or /var/) that you run into issues with uploads of several GB?

I have been using a small root partition of about 16GB for my deployments since the 5.x versions. Typically the OS occupies around 3.5GB to 6.5GB of space. Recently, though, the system was consuming more space than expected, most likely because I had left some old ZFS snapshots around that were taking up space. I did not realize that during an upload the system uses the OS disk, which is meant only for OS functions, not user data. The uploaded file was around 6GB, and since the OS was already occupying 6.5GB, the system ran out of space and crashed. This was an unexpected situation for me, as I had never seen it before. I understand now that my mistake was leaving a very old snapshot behind.
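For anyone hitting the same thing, this is roughly how I would hunt down stale snapshots (the dataset name in the destroy example is just an illustration from a default install):

Code:
# List ZFS snapshots sorted by space used to spot old ones worth removing
zfs list -t snapshot -o name,used,creation -s used
# After double-checking, destroy a specific stale snapshot, e.g.:
# zfs destroy rpool/ROOT/pve-1@old-snapshot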

It's important to note that the current implementation can lead to serious issues. No matter how much space we have, a malicious actor could start multiple uploads and quickly fill up the OS partition, causing a range of problems that affect the system's stability and performance. It's crucial to address this concern proactively to keep the system healthy.


@tom @dietmar
 
And please add some mechanism that removes orphaned pveupload files from failed uploads, especially considering the large number of complaints about failed uploads.
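Until something like that exists upstream, a cron-based cleanup could look roughly like this (an untested sketch; the schedule and age threshold are assumptions, and it must not fire while a legitimate upload is still in progress):

Code:
# /etc/cron.d/pveupload-cleanup (sketch): remove pveupload spool files
# in /var/tmp older than one day, which are almost certainly orphaned
0 3 * * * root find /var/tmp -maxdepth 1 -name 'pveupload-*' -mmin +1440 -delete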
 
Hey @t.lamprecht, hope you're doing well! While you're evaluating Bugzilla bug 5254, I wanted to draw your attention to a related issue that concerns image downloads; addressing it would go a long way toward improving the overall user experience. Would you mind taking a look when you have a moment? Thanks so much!
 
> I have been using a small root partition of about 16GB for my deployments since the 5.x versions

Yep, I did the same thing on a recent install of 8.1.4. Decided to bite the bullet and reinstall with LVM/ext4 root at 50GB. A workaround would be to redirect /var/tmp to a compressed zfs dataset with a soft symlink
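Roughly like this (the pool/dataset names and the quota are examples; adapt them to your own layout). The quota also caps how much a runaway upload can consume:

Code:
zfs create -o compression=zstd -o quota=20G rpool/vartmp
chmod 1777 /rpool/vartmp      # /var/tmp needs the sticky bit
mv /var/tmp /var/tmp.orig
ln -s /rpool/vartmp /var/tmp

Setting mountpoint=/var/tmp on the dataset instead of symlinking would work too.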
 
> Yep, I did the same thing on a recent install of 8.1.4. Decided to bite the bullet and reinstall with LVM/ext4 root at 50GB.

My genuine concern is not redeploying; my installation is ZFS, which makes that quite simple (grow the partition and let ZFS grow, or use the boot disk to "install" with the new size and then roll back a saved snapshot).
My concern is a malicious attack; someone could exploit that behavior and crash the server very easily, like a time bomb: start multiple downloads simultaneously and leave. A few minutes or hours later the disk is full and the system crashes. And because the spool is not in /tmp, the consumed space remains, and the system does not boot correctly until someone manually gets in and cleans up the storage. It is almost impossible to do that from the console, because log messages flood the screen and don't let you see what you are typing; the only option is to check whether network services and SSH are up and get in over SSH.
That is why I am bringing this issue to the community's and developers' attention. I believe that leaving the implementation of downloads like this leaves the system vulnerable.

As an example of an option:
> A workaround would be to redirect /var/tmp to a compressed zfs dataset with a soft symlink

If the long-term decision is not to change the behavior, I will most probably end up doing exactly that. But it doesn't look clean to me; it is hardening that could be avoided, and it would require preparing documentation and procedures to mitigate the issue.

I want to open a respectful discussion on whether this implementation/behavior should be optimized, or left as-is with the known risk that the disk can fill up.

Regards,
 
It seems there is still no solution for this. I have just come back to PVE for my homelab, and I am hating the fact that I can't upload files via the GUI due to a folder size limitation.

Looking at the df results, I can't work out where the limitation is. I take it that it's the tmpfs location?

Code:
root@pve:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  7.5G     0  7.5G   0% /dev
tmpfs                 1.6G  1.2M  1.6G   1% /run
/dev/mapper/pve-root   67G  3.7G   60G   6% /
tmpfs                 7.6G   16M  7.6G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
efivarfs              122K   56K   62K  48% /sys/firmware/efi/efivars
/dev/sdb2            1022M   12M 1011M   2% /boot/efi
/dev/fuse             128M   16K  128M   1% /etc/pve
tmpfs                 1.6G     0  1.6G   0% /run/user/0
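If I check the spool path directly (assuming uploads really do go through /var/tmp as described earlier in the thread), it just resolves to the root filesystem, not a tmpfs:

Code:
root@pve:~# df -h /var/tmp
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/pve-root   67G  3.7G   60G   6% /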
 
This is just a dead issue. Novices simply expect to be able to freely upload files via the GUI, with no situational awareness or any regard for how much space is available versus the size of the payload.

Numerous solutions have been provided in this thread. I don't expect the PVE team to provide a configurable temp dir or to evaluate free space prior to upload.
 
@Kingneutron

I am trying to upload an OPNsense ISO (2.1GB) and I get an "Error 0" error - see screenshot below.


[screenshot: upload error]

When I try to SCP the same file to the relevant directory, it also crashes after uploading around a gig of data. I have been able to add smaller ISOs

[screenshot: smaller ISO uploaded successfully]

and they show up in the ISO Images folder in the GUI.

Are there any particular logs I should be looking at, or any keywords I can search for?

Thanks for the help.


@alyarb,

Why wouldn't a novice expect to upload an ISO without an issue? It would be the first thing you do to create a VM; it is what the whole system is designed for. As an example, I would expect it to cope with Windows Server and most Linux DVD installs, where the ISOs are between 2 and 6GB. I had installed a 1TB drive; it should be able to cope with an ISO of this size off the bat without issue. Maybe the installer could offer an option to set the size of the tmp folder manually, say to 8GB (if enough space is available at the point of installation), and then it would just work.
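Something along those lines can even be approximated manually today, e.g. with a dedicated fixed-size volume for the spool directory (a rough sketch; the volume group name "pve" and the 8GB size are assumptions based on a default LVM install):

Code:
# Give /var/tmp its own capped volume so a runaway upload can only
# fill this volume, never the root filesystem
lvcreate -L 8G -n vartmp pve
mkfs.ext4 /dev/pve/vartmp
echo '/dev/pve/vartmp /var/tmp ext4 defaults 0 2' >> /etc/fstab
mount /var/tmp && chmod 1777 /var/tmp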

I am no newbie; having been in IT for over 23 years, I know my way around most systems and have used Proxmox in the past, and I can't remember this being an issue before.

I have been trying the methods mentioned here, including SCPing the files directly to the disk (the /template/iso folder), and it still fails and crashes the server. I have tried to find anything in the logs, but my logging skills on Linux are a bit lacking (I will concede that is where my knowledge falls short). I am unable to
 
It sounds like you may have installed a 1 TB drive for VM hosting, but that is not where PVE is installed or where your ISO repository is configured. Nothing in your df -h output is anywhere near that size.

If you cannot write a reasonably sized file directly to the directory via scp or ssh, I would consider the possibility of bad blocks on the storage.
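For example (the device name is a placeholder; point it at whatever disk backs your ISO storage):

Code:
# SMART health and attributes (smartmontools ships with PVE)
smartctl -a /dev/sdb
# Non-destructive read-only surface scan if SMART looks suspicious
badblocks -sv /dev/sdb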
 
