PVE 8.1 Excessive Writes to Boot SSD

ligistx

I know this has been discussed in multiple different places, many many times over, but I am trying to get a sense of whether my experience is way beyond the norm.

I have been running Proxmox for 2 years now in a homelab setting. I have about a dozen or so VMs consisting of pfSense, Home Assistant, TrueNAS, Ubuntu Server, some LXCs, a dozen Docker containers - a typical homelab type use case. I am running a pair of 980s (non-Pro), 500 GB, as my boot drives in a ZFS mirror for Proxmox, and I store the VMs on the boot drives as well, so all of the Proxmox logs + VMs are writing to the same pair of 980s.

That said, if I am reading this correctly, I have done 135 TB of writes in 2 years - this seems unfathomable. Am I reading this correctly? I am also not sure why I am seeing some data integrity errors; I should probably look into replacing this drive soon (or monitor to see if the number goes up, and if not, maybe just let it be?).

After seeing this, I went looking for solutions, and this morning I set the Storage=volatile option in /etc/systemd/journald.conf, hoping that will help going forward, but I wasn't comfortable modifying anything beyond that, as the other options seemed slightly more invasive/less benign. I did come across a GitHub page where someone has a mod to write all logs to RAM and then dump them to disk at a set interval, which seems like a plausible option, but I wanted to check in here first to see if this seems normal or just totally crazy. 135 TB in 2 years seems crazy to me... Honestly, even 12.4 TB of reads seems impossible.
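For reference, the journald change is just this one line plus a daemon restart (with volatile storage the journal lives in RAM only and is lost on reboot):

```
# /etc/systemd/journald.conf -- keep the systemd journal in RAM only
[Journal]
Storage=volatile

# apply the change without rebooting
systemctl restart systemd-journald
```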

[screenshot: SMART data for one of the 980s - 135 TB written, 12.4 TB read, with some data integrity errors]
 
So assuming this is your drive, it is rated at 300 TBW over 5 years, which works out to 120 TB for 2 years. So currently you're over spec. Not the best.

But concerning your setup:
I have about a dozen or so VMs consisting of pfSense, Home Assistant, TrueNAS, Ubuntu Server, some LXCs, a dozen Docker containers - a typical homelab type use case... I have done 135 TB of writes in 2 years - this seems unfathomable.
I'll be honest with what I think: with all of that stuff on a 500 GB NVMe HV boot drive + ZFS mirroring, I don't find the writes too excessive for 2 years - on the contrary, I find the reads to be extremely low!

Something maybe to consider: in my experience a (non-enterprise) NVMe that's fairly full (80%?) starts to behave differently, & will potentially have increased reads/writes. ZFS has its own concerns regarding pool capacity. (I think I read somewhere it's between 70-80%.)
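If you want to check where your pool stands, a quick look at the CAP column is enough (pool names below are whatever yours happen to be):

```
# ZFS performance tends to degrade as capacity climbs past ~70-80%
zpool list -o name,size,alloc,free,capacity
```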

Also, I don't know what your RAM/swap situation looks like - but this will also be a factor.

Another point: in your highly nested setup - an FS in an FS in an FS, etc. (you can go work out which the worst contenders are) - your write amplification is going to be phenomenal. (The reads should probably amplify too, but not as much.) Also, in an (almost) parallel-writing environment with many contenders (you have a large number of VMs + LXCs on a 500 GB NVMe) + constant PVE logging, your amplification is going to increase.
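A rough way to put a number on that amplification (a sketch, not an exact method; "rpool" & /dev/nvme0 are placeholders) is to compare what the pool writes against what the NAND actually absorbs over the same window:

```
# what the pool thinks it writes (second sample of a 60-second window)
zpool iostat rpool 60 2

# what the device actually wrote: "Data Units Written" counts units of
# 512,000 bytes per the NVMe spec
smartctl -a /dev/nvme0 | grep -i 'data units written'

# device bytes written / pool bytes written ~= host-side write amplification
```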

I'm not going to berate you for the choice of non-enterprise HW - but you know the deal.

Thanks for the GitHub tip on using a RAM drive for PVE logging - looks interesting, but one likely has to consider the added potential for log loss/corruption. Maybe I'll look into it - and learn some more on the way.


Edit: Just one other thing that could be of interest: you show SMART values for one of the NVMes - how does that compare with the other NVMe?
 
Have a look at your write amplification. Running TrueNAS or pfSense with ZFS on top of ZFS, for example, would be a bad idea. In the case of pfSense I would use UFS, and in the case of TrueNAS I would buy dedicated disks and pass them through. Or use a lighter NAS like OMV with ext4/xfs.
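Passing whole disks through is a one-liner per disk on the PVE side; VM ID and disk ID here are placeholders:

```
# attach a physical disk directly to VM 100 as its second SCSI device
qm set 100 -scsi1 /dev/disk/by-id/ata-WDC_WD40EFRX_WD-XXXXXXXX
```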

And lots of writes is normal when using ZFS. That's why it's recommended everywhere to buy proper enterprise/datacenter-grade SSDs that have a higher TBW/DWPD rating and power-loss protection to counter the high write amplification.

PS: Here in my homelab, TrueNAS, OPNsense, Home Assistant, Docker containers, ... write those 135 TB in a few months - and that's just from logging.
So in my opinion that's not much.
135 TB over 2 years means you are writing at an average of about 2.3 MB/s (roughly 1.4 x 10^14 bytes spread over roughly 6.3 x 10^7 seconds).
That isn't much for a NAS and some dozens of VMs/containers.
The problem is more that those SSDs aren't great for server workloads, being rated for only about 2 MB/s continuous over 5 years.
SSDs are consumables: you damage them with each write, and once they fail you replace them with a new one.
So in the long run it's cheaper to buy based on price-per-TB-of-TBW and not price-per-TB-of-capacity.
Then you realize that those enterprise/datacenter SSDs are actually way cheaper, not more expensive, than those consumer SSDs.
 
and in case of TrueNAS I would buy dedicated disks and pass them through.
I'm pretty sure the OP's TrueNAS storage pools aren't located on the same 500 GB boot disk. Apart from the size constraints, if they had in fact been running actively used storage pools 24/7 for 2 years with all the rest of their stuff on them, both NVMes would be toast by now. If they are not toast, I want to purchase both from him!
 
So assuming this is your drive, it is rated at 300 TBW over 5 years, which works out to 120 TB for 2 years. So currently you're over spec. Not the best.
Correct. One of them is already showing what seem to be errors... but Samsung RMA is not easy to work with (at least, I have not had success finding a stupid "I want to RMA a product" button; I need to spend a little more time looking, I guess).

I'll be honest with what I think: with all of that stuff on a 500 GB NVMe HV boot drive + ZFS mirroring, I don't find the writes too excessive for 2 years - on the contrary, I find the reads to be extremely low!
Well, that is good to know, I suppose. I have 9 VMs, 1 LXC, and about as many Docker containers all running. They don't do much, but they are always on.

The boot drives are only about 120 GB full, so nowhere near the 80% mark. My VMs are pretty small, so thankfully I am not using a large percentage of the storage, which should give garbage collection a much easier job.

Another point: in your highly nested setup - an FS in an FS in an FS, etc. (you can go work out which the worst contenders are) - your write amplification is going to be phenomenal. (The reads should probably amplify too, but not as much.) Also, in an (almost) parallel-writing environment with many contenders (you have a large number of VMs + LXCs on a 500 GB NVMe) + constant PVE logging, your amplification is going to increase.
I agree, running nested ZFS isn't a good idea, but the only ZFS OS under Proxmox that is doing anything write-wise is pfSense. TrueNAS almost never writes to its boot drive, thankfully, and none of the other VMs are running ZFS.

My TrueNAS pool runs off a passed-through HBA, and that array is 10 x 4 TB spinning-rust drives, so no issues there.

For more information, this system has 128 GB of ECC RAM, so I am not worried about RAM usage (I have plenty), nor am I worried about corruption in RAM. Obviously anything can happen, but ECC should be there to help save me.


And lots of writes is normal when using ZFS. That's why it's recommended everywhere to buy proper enterprise/datacenter-grade SSDs that have a higher TBW/DWPD rating and power-loss protection to counter the high write amplification.
I was able to turn off some extraneous logging in pfSense, which cut it down from ~1.5 MB/s of writes (according to the Proxmox web UI) to 200 kB/s - an order of magnitude reduction. I also used the script I found to store Proxmox logs in RAM, which I am sure is helping as well, but I don't have a before-and-after to compare...
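For anyone curious, the core of the RAM-log idea is something like the tmpfs mount below (the script I used additionally syncs back to disk at an interval; a bare tmpfs like this simply loses the logs at reboot, and the size is an example):

```
# /etc/fstab -- keep /var/log in RAM
tmpfs  /var/log  tmpfs  defaults,noatime,mode=0755,size=256m  0  0
```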

I will check back in a month or so to see what the total writes via SMART look like, but I am hopeful these changes were enough to give the drives a lot longer to live. I need to RMA one of them anyway, as I believe the integrity errors in the original screenshot are indicative of issues, although I will admit SSD SMART is not quite as easy to understand as HDD SMART.
 
Also, to add some context: with the changes I have made, I am seeing an average of 1.25 MB/s being written to the drives:

[screenshot: Proxmox disk I/O graph showing ~1.25 MB/s average writes]


Again, I don't have a before number to compare to, but I am fairly certain this is substantially lower. If my math is correct, this gets me down to about 40 TB written per year (let's round up a bit and say 50 TB a year), which, while certainly a lot, is substantially less than before.

Looking at my SMART data again, it looks like I was doing about 3.8 MB/s previously (135 TB written over 9,900 hours of power-on time). Not as much of a reduction as I had hoped, but it is under half as many writes per second.
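For anyone who wants to redo this math from SMART, roughly (device name is an example):

```
# "Data Units Written" is in 512,000-byte units per the NVMe spec
smartctl -a /dev/nvme0 | grep -Ei 'data units written|power on hours'

# average write rate = units * 512000 / (power-on hours * 3600)
# e.g. 135 TB over 9,900 hours: 135e12 / (9900 * 3600) ~= 3.8 MB/s
```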
 
Happy you've improved your situation.
Just one final reminder: HAVE RESTORABLE BACKUPS OF EVERYTHING (including your working Proxmox install/config/settings etc.).

Good luck.
 
Happy you've improved your situation.
Just one final reminder: HAVE RESTORABLE BACKUPS OF EVERYTHING (including your working Proxmox install/config/settings etc.).

Good luck.
I do use Proxmox Backup Server to back up the VMs themselves, but how do you go about backing up the Proxmox host? I still don't have a great solution for this...

I run TrueNAS virtually, but I have a pretty simple way to bring it up bare-metal if I ever need to grab data off it while Proxmox is down, so I can "easily" back up config files etc. from Proxmox to TrueNAS. I have just never seen a great rundown of "how" and "what" I should be backing up from Proxmox. Is there a config folder, for example, that I could back up and then restore on a fresh Proxmox install so that all of my settings and VMs would work as they did previously?

Any suggestions here would be greatly appreciated.
 
Good idea to have a backup of your /var/lib/vz, /etc, /etc/pve and /var/lib/pve-cluster/config.db.

But restoring isn't as easy as replacing a whole folder. The only official docs on this: https://pve.proxmox.com/wiki/Proxmox_Cluster_File_System_(pmxcfs)#_recovery
But there are some scripts on github for backing up and restoring most common PVE config files. In the end only you can know what you edited and what needs to be restored. Keep in mind that PVE is a full Linux with thousands of packages and config files and not just a simple appliance.
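A minimal sketch of the file-level part, using the paths mentioned above (note /etc/pve is a FUSE view of config.db, so this grabs the cluster config twice - redundant but cheap):

```
# archive the key PVE config locations to a dated tarball
tar czf /root/pve-host-config-$(date +%F).tar.gz \
    /etc \
    /var/lib/pve-cluster/config.db \
    /var/lib/vz
```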
 
Any suggestions here would be greatly appreciated.
Run a cluster, and when a (few) node(s) go down, replace them with fresh nodes (the cluster configuration will be applied automatically). You'll have to install Proxmox and set up some network and storage, but you'll already have practiced (and documented) that when adding the first couple of nodes.
 
Run a cluster, and when a (few) node(s) go down, replace them with fresh nodes (the cluster configuration will be applied automatically). You'll have to install Proxmox and set up some network and storage, but you'll already have practiced (and documented) that when adding the first couple of nodes.
Being a homelab, I don't really have the ability to add additional nodes, unfortunately.
 
One more update: I was able to get pfSense to write its logs to RAM as well, so it looks like I am now down to under 1 MB/s of writes total. I think that is pretty acceptable :)
 
Any suggestions here would be greatly appreciated.
PVE host backup is one of the larger topics of discussion on this forum, & in my opinion one of the biggest system-failure problems that administrators/homelab owners face.
But there are solutions out there, & Proxmox themselves have it as the first item on their roadmap. Unfortunately it's been there uncrossed for a long time.
Personally, this is what I do.

Good luck.
 
PVE host backup is one of the larger topics of discussion on this forum, & in my opinion one of the biggest system-failure problems that administrators/homelab owners face.
But there are solutions out there, & Proxmox themselves have it as the first item on their roadmap. Unfortunately it's been there uncrossed for a long time.
Personally, this is what I do.

Good luck.
Yea... PVE backup seems so ridiculous to me. Losing your host and not having a robust backup solution seems so archaic. That's why I went with a mirrored array even though it's just a homelab.

I will give that a look, thanks!
 
Yea... PVE backup seems so ridiculous to me. Losing your host and not having a robust backup solution seems so archaic. That's why I went with a mirrored array even though it's just a homelab.
I know exactly how you feel!

However I will tell you this: your mirrored array is in no way to be considered a PVE host backup. This is a common misconception among many users. Anything on one drive is just mirrored on the other: mess up PVE on one drive and you've got yourself 2 copies of the same messed-up PVE. Any mirror/array solution (from a protection perspective) may help to mitigate local disk HW problems (sometimes, but not always) - but will never offer any type of host backup solution.

The only main pointers I'll give you are these:

1. Make as few changes as possible to your main PVE host. Wherever possible, all services/changes should run in LXCs/VMs only.
2. Document ALL changes you make to PVE. (You'll later find this invaluable & thank yourself.)
3. Make a copy of the main PVE config, /etc, etc. (maybe use a script as I linked to above). You can at least use that as a reference for how you set things up.
4. From time to time, make a full (zipped) disk image of your PVE host drive (see the sketch after this list).
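A minimal sketch of point 4, assuming the boot disk is /dev/nvme0n1 (a placeholder) & ideally run from a live USB so nothing is writing to the disk meanwhile:

```
# stream a compressed image of the whole boot disk to external storage
dd if=/dev/nvme0n1 bs=1M status=progress | gzip > /mnt/backup/pve-host.img.gz
```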

Good luck.
 
However I will tell you this: your mirrored array is in no way to be considered a PVE host backup.
Ya, unfortunately I know - the first three rules of RAID are "it's not a backup". I just mean, losing the host due to a drive failure and not having an easy way to restore a config seems crazy to me.

TrueNAS and pfSense are so incredibly simple to restore, but… they are appliances, unlike Proxmox, which is just Debian, KVM, and some Proxmox-y things sprinkled on top.

I will investigate how best to back my system up. Actually… can I use PBS to back it up? I have a test bench that is physically separate from my Proxmox host, and I have (not jokingly) too many small SSDs and random hard drives sitting around that I don't actually have a use for. Would it make sense to install PBS on one of them and use it as a way to back up Proxmox itself? Is that a viable solution?
 
You can use the proxmox-backup-client to back up any file/folder/block device to the PBS. I use that both for block-level backups of whole system disks and for file-level config backups.

But you will have to code your own backup/restore scripts.
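A sketch of both flavours, assuming a datastore named "backup" on a PBS at 192.168.1.10 (repository string, datastore name, and device are all placeholders):

```
# point the client at the PBS instance
export PBS_REPOSITORY='root@pam@192.168.1.10:backup'

# file-level backup of the host config directory (.pxar = directory archive)
proxmox-backup-client backup etc.pxar:/etc --backup-type host

# block-level backup of the whole boot disk (.img = block-device image)
proxmox-backup-client backup bootdisk.img:/dev/nvme0n1 --backup-type host
```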
 
Yea... PVE backup seems so ridiculous to me. Losing your host and not having a robust backup solution seems so archaic. That's why I went with a mirrored array even though it's just a homelab.

I will give that a look, thanks!

https://www.youtube.com/watch?v=g9J-mmoCLTs
^ Bare-metal backup and restore of a PVE host with the Veeam backup agent for Linux -- I have tested this in a VM, and it will restore the LVM structure for you (but NOT LVM-thin!) and the ext4 rootfs. Caveat: if you have other XFS/ext4 mounted disks and try a complete system backup, it throws those in as well, so I back up my rootfs with just the volume-level backup.

I also have homegrown scripts that work for me - please feel free to test them out and provide feedback:

https://github.com/kneutron/ansitest/tree/master/proxmox
 
https://www.youtube.com/watch?v=g9J-mmoCLTs
^ Bare-metal backup and restore of a PVE host with the Veeam backup agent for Linux -- I have tested this in a VM, and it will restore the LVM structure for you (but NOT LVM-thin!) and the ext4 rootfs. Caveat: if you have other XFS/ext4 mounted disks and try a complete system backup, it throws those in as well, so I back up my rootfs with just the volume-level backup.

I also have homegrown scripts that work for me - please feel free to test them out and provide feedback:

https://github.com/kneutron/ansitest/tree/master/proxmox
Hmm, how do I check if I am using LVM-thin? I want to say I am, but I honestly can't remember. And I am not seeing anywhere in the settings of my VMs where it would say this.
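Two quick ways to check that on the PVE host itself (for reference, a default PVE install creates an LVM-thin storage called "local-lvm"):

```
# thin pools show an attribute string starting with "t" in lvs output
lvs

# storage entries of type "lvmthin" are thin-provisioned
grep -A3 '^lvmthin:' /etc/pve/storage.cfg
```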
 
