Quick wear-out on RAID1 - looking for suggestions

en4ble

Member
Feb 24, 2023
This is my first deployment using RAID1 with WD Red SSDs (not the best tier, but I would never have expected such a quick wear-out).

The server has been up for 75 days, with wear-out already at 30% (almost 0.5% per day).

The purpose of these drives was primarily the OS (only), but they also hold several (11) virtual security appliances (pfSense) - this could potentially be the reason for the high wear, but I haven't tested that theory.

iostat for the raid:

Code:
 zpool iostat -v
                                          capacity     operations     bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
rpool                                    171G   293G     24    214   351K  10.1M
  mirror-0                               171G   293G     24    214   351K  10.1M
    ata-WDC_WDS500G1R0A-68A4W0_24070L800900-part3      -      -     12    107   174K  5.03M
    ata-WDC_WDS500G1R0A-68A4W0_24070L800864-part3      -      -     12    107   176K  5.03M
--------------------------------------  -----  -----  -----  -----  -----  -----

And they look write-heavy based on SMART - 32044 GiB written vs. 1091 GiB read. Over ~75 days that works out to roughly 427 GiB of writes per day, i.e. about 5 MiB/s, which lines up with the ~5 MB/s write bandwidth per disk in the iostat above:
241 Host_Writes_GiB 0x0030 253 253 --- Old_age Offline - 32044
242 Host_Reads_GiB 0x0030 253 253 --- Old_age Offline - 1091


I would appreciate some opinions on the best approach to minimize the wear-out - is there something I may not be aware of with RAID1 that we could potentially turn off?

Options and my ask:

1) Migrate the vFWs to other pools so they don't use this mirror, and monitor whether the % slows down.
2) Migrate to enterprise-grade SSDs - perhaps via hot swap. I would love opinions on the best SSDs for this use case, given the read/write it is doing now.
3) Are there any options with RAID1 that could be looked at to improve drive longevity?
4) Any other suggestions?

Thank you in advance for any replies!
 
IIRC it helps to disable logging for services that you are not using.
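
Not quite service-by-service, but a complementary way to cut log writes is to cap what systemd-journald persists; a minimal sketch, with example values only:

Code:
# /etc/systemd/journald.conf (values are only examples, tune to taste)
[Journal]
SystemMaxUse=64M        # cap the on-disk journal size
MaxLevelStore=notice    # don't persist debug/info chatter
# then apply with: systemctl restart systemd-journald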
 
A used pair of cheap Intel S3700 / S3710 400GB drives makes a great Proxmox ZFS mirrored boot setup, IMHO.
 
2) Migrate to enterprise-grade SSDs - perhaps via hot swap. I would love opinions on the best SSDs for this use case, given the read/write it is doing now.

You don't really have many meaningful choices price/value-wise; perhaps the Kingston DC600M for a SATA SSD. For M.2 NVMe you are limited to Micron's 7300 or 7450, especially if you need the 2280 size.
 
A used pair of cheap Intel S3700 / S3710 400GB drives makes a great Proxmox ZFS mirrored boot setup, IMHO.
Could you go to smaller drives when performing the hot swap? I was under the impression that I need the same size or bigger when upgrading? Those are 500GB WD Reds.
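
For reference, ZFS itself will refuse a replacement smaller than the existing vdev, so same-size-or-bigger is the rule. A rough sketch of the pool side of a swap, with placeholder device names (on a Proxmox rpool the boot/EFI partitions also have to be recreated, e.g. with proxmox-boot-tool, which this skips):

Code:
# Replace one leg of the mirror with an equal-or-larger partition (names are placeholders)
zpool replace rpool ata-OLD_SSD-part3 /dev/disk/by-id/ata-NEW_SSD-part3
zpool status rpool    # wait for the resilver to finish before touching the second disk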
 
I use consumer NVMe SSDs in my Proxmox machines without a problem. I do disable the corosync, pve-ha-crm, and pve-ha-lrm services to minimize drive writes (no clusters here). These drives are just about a year old and have 0% wear-out, so I would say something is not right with your setup. I also really don't store any data on my Proxmox nodes: all application/persistence data, Docker volumes, VM/CT backups, etc. reside on my Synology NAS. This particular machine has the OS and the VMs all on the same mirrored ZFS drives.
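
A minimal sketch of that service change, assuming a standalone (non-clustered) node - these would need to be re-enabled before ever joining a cluster:

Code:
# Stop and disable the cluster/HA services that periodically write state to disk
systemctl disable --now pve-ha-lrm pve-ha-crm corosync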

 
I use consumer NVMe SSDs in my Proxmox machines without a problem. I do disable the corosync, pve-ha-crm, and pve-ha-lrm services to minimize drive writes (no clusters here). These drives are just about a year old and have 0% wear-out, so I would say something is not right with your setup. I also really don't store any data on my Proxmox nodes: all application/persistence data, Docker volumes, VM/CT backups, etc. reside on my Synology NAS. This particular machine has the OS and the VMs all on the same mirrored ZFS drives.
Thanks for the info, man. I was shocked to see the WD Reds going down like that. Again, I do run 11 virtual pfSense appliances that are utilized heavily; other than that it's a vanilla Proxmox deployment.

I should be able to verify whether it's in fact pfSense in about a day. I'll move those off the RAID1, and I think I'll run the post-install script that was mentioned and remove the "cluster" services since it's also running solo.
 
I use consumer NVMe SSDs in my Proxmox machines without a problem. I do disable the corosync, pve-ha-crm, and pve-ha-lrm services to minimize drive writes (no clusters here). These drives are just about a year old and have 0% wear-out, so I would say something is not right with your setup. I also really don't store any data on my Proxmox nodes: all application/persistence data, Docker volumes, VM/CT backups, etc. reside on my Synology NAS. This particular machine has the OS and the VMs all on the same mirrored ZFS drives.
@louie1961 what does zpool iostat look like for your mirror? Could you please share?
 
Personally, I'll never use a WD Red SSD again. I had a 1TB in my mostly-off-except-weekends ZFS server and it died right after the warranty expired.

I would also not recommend trying to boot/run Proxmox on a sub-1TB SSD if you can help it; larger sizes typically have higher TBW ratings.

You can get by on a budget with a 256GB SSD for the OS and a small lvm-thin, but you have to turn off cluster services, turn off atime everywhere (including in-guests), set swappiness to 0 -- and look into zram and log2ram. And have enough RAM in the server so things don't swap.
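
A minimal sketch of the swappiness part, assuming a standard sysctl.d layout (zram and log2ram are separate packages with their own setup):

Code:
# Keep the kernel from swapping unless absolutely necessary, persistently
echo 'vm.swappiness = 0' > /etc/sysctl.d/99-swappiness.conf
sysctl --system    # reload sysctl settings now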

The no-name ~238GB NVMe that came with my Qotom firewall appliance is only at 1% wear (mostly 24/7 operation since February) with the above mitigations in place. I may upgrade it to a 1TB when it gets to ~50% wear, but for now it's no worries.


Disk model: YSO256GTLCW-E3C-2

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 1%
Data Units Read: 23,287,249 [11.9 TB]
Data Units Written: 5,732,131 [2.93 TB]
Host Read Commands: 159,202,847
Host Write Commands: 267,908,696
Controller Busy Time: 1,327
Power Cycles: 62
Power On Hours: 4,614
Unsafe Shutdowns: 47
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 56 Celsius

I put in a 2nd Lexar NM790 1TB (with heat sink) to run the VMs on, and it's just recently crossed over to 1% wear.

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 15,424,533 [7.89 TB]
Data Units Written: 10,967,567 [5.61 TB]
Host Read Commands: 65,840,472
Host Write Commands: 239,758,570
Controller Busy Time: 481
Power Cycles: 24
Power On Hours: 4,519
Unsafe Shutdowns: 16
Media and Data Integrity Errors: 0
Error Information Log Entries: 1
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 40 Celsius
Temperature Sensor 2: 37 Celsius

EDIT:
2024.05.18: enabled RAM disk logging for fewer writes to disk
[[
By default, pfSense writes logs very often, so you can turn on a RAM disk option so it writes to RAM and flushes to the SSD much less often.

In pfSense: "System" -> "Advanced" -> "Miscellaneous" -> "RAM Disk Settings"
]]

If you did the default install of pfSense to ZFS boot/root, you should probably move the VM disk storage to lvm-thin so you're not doing COW-on-COW write amplification. And enable logging to RAM in-VM as described.
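
A minimal sketch of such a move, with hypothetical VM ID, disk name, and target storage name (recent PVE versions also accept `qm disk move`):

Code:
# Move VM 101's disk scsi0 to an lvm-thin storage named "local-lvm" and drop the old copy
qm move-disk 101 scsi0 local-lvm --delete 1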
 
Personally, I'll never use a WD Red SSD again. [...] If you did the default install of pfSense to ZFS boot/root, you should probably move the VM disk storage to lvm-thin so you're not doing COW-on-COW write amplification. And enable logging to RAM in-VM as described.
@Kingneutron I really appreciate your feedback.

Yes, I know about the RAM disk in pfSense. This will be part of my migration the day after tomorrow. But it's good info in case someone did not know and can improve on that.

I will be moving those virtual pfSense appliances onto lvm-thin as you described.

So I know about the cluster services (someone mentioned the post-install script - I will run it), but I'm not sure what "turn off atime" means - never heard of that one. Could you please elaborate on this aspect? Thank you. EDIT: I think I found some info: https://www.unixtutorial.org/zfs-performance-basics-disable-atime/

But I would still appreciate some guidance on how to disable it globally - I can see a lot of entries for "atime" (see the sketch after the mount output below):


Code:
mount | grep "atime"

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)

proc on /proc type proc (rw,relatime)

udev on /dev type devtmpfs (rw,nosuid,relatime,size=528274036k,nr_inodes=132068509,mode=755,inode64)

devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)

tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=105661512k,mode=755,inode64)

rpool/ROOT/pve-1 on / type zfs (rw,relatime,xattr,posixacl,casesensitive)

securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)

tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)

cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)

pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)

efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)

systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=232193)

hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)

mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)

debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)

tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)

fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)

configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)

ramfs on /run/credentials/systemd-sysctl.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)

ramfs on /run/credentials/systemd-sysusers.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)

ramfs on /run/credentials/systemd-tmpfiles-setup-dev.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)

rpool on /rpool type zfs (rw,relatime,xattr,noacl,casesensitive)

rpool/var-lib-vz on /var/lib/vz type zfs (rw,relatime,xattr,noacl,casesensitive)

rpool/ROOT on /rpool/ROOT type zfs (rw,relatime,xattr,noacl,casesensitive)

rpool/data on /rpool/data type zfs (rw,relatime,xattr,noacl,casesensitive)

ramfs on /run/credentials/systemd-tmpfiles-setup.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)

binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)

lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)

sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)

/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)

tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=105661508k,nr_inodes=26415377,mode=700,inode64)
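
For the ZFS side, a minimal sketch: datasets inherit properties, so setting atime=off on the pool root covers rpool/ROOT, rpool/data, etc. (the non-ZFS pseudo-filesystems above already mount with relatime and can be left alone):

Code:
zfs set atime=off rpool       # children inherit unless they override it
zfs get -r atime rpool        # verify it is now off everywhere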

Swap is off and I have plenty of RAM, so no worries there.

Worst case, if I'm not able to stop the degradation, I think I'll just need to figure out the hot-swap route to get some heavy-duty enterprise SSDs, but I'm really hoping I can slow it down by migrating off the RAID1. This is my first experience with ZFS and probably my last :D

Thank you again!
 
