ZFS Single Disk Setup - Best Practice for longevity and efficiency

gabrimox

Hi all,
I have a working PVE home lab setup with ZFS on a single disk.
I've read tons of forum and Reddit threads and I'm trying to figure out whether I can configure a few things to improve the longevity and efficiency of my single disk.
Asking my big friend ChatGPT, it suggests:


zfs set atime=off rpool
zfs set compression=lz4 rpool
zpool set autotrim=on rpool
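
For reference, a quick way to check that those settings actually took effect (rpool is assumed to be the pool name the PVE installer created):

# show the current values and where they were set
zfs get atime,compression rpool
zpool get autotrim rpool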


Any suggestions/feedback?
Thank you!
 
Any suggestions/feedback?

Sure:
  1. Prepare daily backups onto separate/independent hardware (to PBS if possible).
  2. Set up good monitoring, you might need it -- is it an "Enterprise Class" solid state device?
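
A minimal sketch of what point 2 can look like from the shell (the device paths below are only examples, adjust to your hardware):

# pool health and error counters
zpool status -x
# SSD wear/health counters
nvme smart-log /dev/nvme0
# or, for SATA devices
smartctl -a /dev/sda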
 
Sure:
  1. Prepare daily backups onto separate/independent hardware (to PBS if possible).
  2. Set up good monitoring, you might need it -- is it an "Enterprise Class" solid state device?
Hey, thank you for your feedback.
Anyway, I already have backups configured: ZFS replication (send/recv) plus /etc for the PVE host, and PBS for the VMs.
Monitoring is also in place.

My question is different: I'm trying to understand whether I can adopt some configuration to improve the health, efficiency and longevity of the single ZFS disk the PVE host is installed on... I know a lot of parameters are already set sensibly by default in ZFS, but without redundancy the constant writes can wear out the SSD over time.
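
(For anyone reading along, a minimal sketch of what such single-disk replication boils down to; the dataset, snapshot and host names below are made up:)

# full send of a snapshot to another machine
zfs snapshot rpool/data@daily-1
zfs send rpool/data@daily-1 | ssh backuphost zfs receive -F tank/pve-backup
# later runs only send the delta between two snapshots
zfs snapshot rpool/data@daily-2
zfs send -i @daily-1 rpool/data@daily-2 | ssh backuphost zfs receive tank/pve-backup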
 
There is no need to install on ZFS, and if you read up on what ZFS is: it is precisely not made for a single drive. ext4 is all good; adjust things for Ceph stuff if needed. Best is to kill most of the logs that keep flooding whenever any error appears. How to edit the Proxmox files to set a timer for that... not sure.
 
I disable the pve-ha-crm and pve-ha-lrm services on single nodes (which reduces logging noise) but I'm not sure if it's a good idea for every single node.
I haven't (yet) come across a single-node setup where disabling these is a bad thing.
If anyone knows of one, please let us know why....
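
For reference, on a standalone node that is simply (standard service names; re-enable them if the node ever joins a cluster):

systemctl disable --now pve-ha-crm.service pve-ha-lrm.service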

Perhaps this could be a question during initial setup, for example
"Do you plan to add this node to a cluster?"
"Yes / Maybe": (default choice) then leave these services enabled by default.
"No": then disable these services during initial setup


I have also seen third-party scripts which configure various write-heavy log paths to be RAM disks, backed by "proper" storage which gets synced every X minutes.
Perhaps having this as a standard feature, rather than an add-on hack, would be a good idea?
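
(As an illustration only, the core of what those scripts do is something like this fstab entry; keep in mind that anything not synced back to disk is gone after a crash or reboot:)

# keep /var/log in RAM -- size and options are just example values
tmpfs  /var/log  tmpfs  defaults,noatime,mode=0755,size=256M  0  0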
 
There is no need to install on ZFS, and if you read up on what ZFS is: it is precisely not made for a single drive. ext4 is all good; adjust things for Ceph stuff if needed. Best is to kill most of the logs that keep flooding whenever any error appears. How to edit the Proxmox files to set a timer for that... not sure.
Snapshots and replication are already valid reasons to use ZFS on a single disk too, in my opinion...
 
I've run iotop-c -cPa for about 30 minutes according to my guide here. One picture shows a production node in a cluster using only ZFS. The other is a virtual PVE that has no guests running using the default EXT4/LVM(-Thin) install.
I did not disable any default PVE services or atime. The outlined pve-v VM you see is the virtual PVE from the other picture. I censored some VM names and processes for reasons but nothing PVE related was hidden.
I recommend you do the same test and then calculate your SSD life with this. Write amplification is a thing but you have to judge that for your own setup.

ZFS based node in a cluster: [screenshot pve_iotopa.png]
EXT4 based virtual PVE: [screenshot pve-v_iotopa.png]
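
(To make the "calculate your SSD life" step concrete, a rough back-of-the-envelope example with made-up numbers -- substitute the accumulated write total from your own 30-minute run and the TBW rating from your SSD's datasheet:)

# e.g. 1.5 GB written in 30 minutes -> 48 such intervals per day
echo "1.5 * 48 * 365 / 1000" | bc -l    # ~26 TB written per year
# against a hypothetical 200 TBW drive that would be roughly 7-8 years of rated endurance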
 
There is no need to install on ZFS, and if you read up on what ZFS is: it is precisely not made for a single drive.
Yes, there is no need to use ZFS and ZFS really shines with multiple drives.

But a) ZFS replication b) transparent compression c) technically cheap snapshots d) data integrity checks / bit-rot-detection(!) and some more features (e.g. encryption, not used in PVE) do work on a single device too.

If there is no actual technical reason (like cheap consumer Solid State or forced hardware Raid) I will always prefer ZFS over the classic filesystems.
 
I've run iotop-c -cPa for about 30 minutes according to my guide here. One picture shows a production node in a cluster using only ZFS. The other is a virtual PVE that has no guests running using the default EXT4/LVM(-Thin) install.
I did not disable any default PVE services or atime. The outlined pve-v VM you see is the virtual PVE from the other picture. I censored some VM names and processes for reasons but nothing PVE related was hidden.
I recommend you do the same test and then calculate your SSD life with this. Write amplification is a thing but you have to judge that for your own setup.

ZFS based node in a cluster: [attachment 90249]
EXT4 based virtual PVE: [attachment 90250]


I have monitoring already in place:

last 24h: [screenshot 1757069552357.png]
Patriot P400L 500GB
root@pve:~# nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0
temperature : 54°C (327 Kelvin)
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 52%
endurance group critical warning summary: 0
Data Units Read : 2,081,465 (1.07 TB)
Data Units Written : 9,457,646 (4.84 TB)
host_read_commands : 33,395,921
host_write_commands : 121,206,546
controller_busy_time : 1,560
power_cycles : 27
power_on_hours : 871
unsafe_shutdowns : 8
media_errors : 0
num_err_log_entries : 0
Warning Temperature Time : 0
Critical Composite Temperature Time : 0


52% used sounds like a bug to me, what do you think?
Anyway, what about my current I/O?
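
(Side note on where the TB figures come from: per the NVMe spec one "Data Unit" is 1000 * 512 bytes, so the counters above convert like this:)

echo "9457646 * 512000 / 10^12" | bc -l    # ~4.84 TB written, matching the value in parentheses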
 
I got a great result after tuning the journal inside the TrueNAS VM (which is ZFS... on a ZFS host...):

[screenshot 1757069734726.png]

What I did:

# edit the journald configuration; the options below belong in its [Journal] section
nano /etc/systemd/journald.conf

[Journal]
MaxLevelStore=warning
MaxLevelSyslog=warning
Storage=volatile
ForwardToSyslog=no

# apply the change
systemctl restart systemd-journald.service
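
(To check the effect: with Storage=volatile the journal lives under /run/log/journal, i.e. in RAM, so this should report only a small footprint:)

journalctl --disk-usage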
 
I haven't (yet) come across a single-node setup where disabling these is a bad thing.
If anyone knows of one, please let us know why....

It's not supported, and most writes come from the logging of the metrics for the dashboards and from updating the configuration database behind the Proxmox cluster file system RAM disk mounted at /etc/pve. These writes still occur with those services disabled, and you need the cluster file system even on a single node to run Proxmox VE.
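
(If you want to see those writes yourself: on a standard install the backing database sits at /var/lib/pve-cluster/config.db, and the kernel keeps per-process I/O totals for the pmxcfs process:)

ls -lh /var/lib/pve-cluster/config.db*
cat /proc/$(pidof pmxcfs)/io    # write_bytes = bytes this process caused to be written to storage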

Perhaps this could be a question during initial setup, for example
"Do you plan to add this node to a cluster?"
"Yes / Maybe": (default choice) then leave these services enabled by default.
"No": then disable these services during initial setup

I don't think so. In my book it's a good thing that people need to actively do something to disable them, so they have to think about what they actually want to achieve. Sure, there will always be people who are willing to do a curl http://easy_pve_setup.homelab|bash, but even this stupidity needs a manual step.

I like that the installer gives you a working system which fits the usage in a professionally run environment (aka datacenter-grade hardware, external monitoring system, professional sysadmins doing the operating); imho it's a bad idea to make it easier for people to run their system in a non-supported way.

If you actually use DC-grade storage media you never ever need to think about which services to disable, and since you can get them quite cheap on eBay I don't see any problem with the default behaviour even in a homelab. And Proxmox VE is not developed for homelabs, although you can also use it for that (like I do myself).
 
I have also seen third-party scripts which configure various write-heavy log paths to be RAM disks, backed by "proper" storage which gets synced every X minutes.
Perhaps having this as a standard feature, rather than an add-on hack, would be a good idea?

Nope, it wouldn't. Imagine a situation where somebody has an issue leading to reboots. Now he wants to find out what went wrong; too bad the logs were lost with the reboot too. I have no issue with people doing such a thing if they are aware of the potential problems and know how to revert it for troubleshooting. But in my book it's really not a good idea to make it easier to make support more difficult. It's bad enough with the third-party scripts (due to people who run everything without understanding what it actually does); it would be even worse if this were a standard feature.
 
This is not the same command I used above. Run it, sort by writes and you can see which process is doing the most writes. Isn't that what you cared about?
OK, your earlier command wasn't the complete one for showing cumulative writes....

Anyway, I found the VM with the highest write volume, and I'm investigating how to reduce it (the *arr suite...),
but the problem is that I can see a 4x ratio reflected on PVE:
if the whole VM writes 25M, I see 100M on PVE!

This makes a lot of sense when I look at:

root@pve:~# nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0
temperature : 54°C (327 Kelvin)
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 54%
endurance group critical warning summary: 0
Data Units Read : 2,278,780 (1.17 TB)
Data Units Written : 9,640,927 (4.94 TB)
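
(For the record, the delta between this reading and the smart-log output I posted earlier works out to:)

echo "(9640927 - 9457646) * 512000 / 10^9" | bc -l    # ~94 GB written between the two readings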


How can I fix this?