Samsung SSDs: right ashift size for a ZFS pool?

Vassterak

Member
Jun 3, 2020
Czech Republic
Hello,
I'm creating a new ZFS pool (RAID 1),
and I have 2x Samsung 860 EVO 512 GB.

I know that Proxmox likes to write a lot of stuff to the disks (my setup averages 45 GB per 24 hours, all VMs idling, none of them writing to their disks).

I didn't find any satisfying answer about ashift for SSDs.
Is it right to set it to 9 if SMART reports the sector size as 512 bytes?

Or should I go higher, like 12 or 13, for ashift?

Also, if I create a pool with a higher ashift, wouldn't that cause write amplification?

Thank you for your answers!
 
Also, if I create a pool with a higher ashift, wouldn't that cause write amplification?

If your ashift value is lower than the (mostly unknown) internal blocksize of your SSD, you will get write amplification. An SSD will have at least 4K (= ashift 12); some Samsung enterprise SSDs have 8K (= 13). Most cheap SSDs like your EVO, or even the PRO, have higher internal blocksizes and are not well suited for ZFS and/or PVE. They will wear out quickly compared to enterprise SSDs - and we're talking about complete wearout within half a year for an SSD that is supposed to have a lifespan of 5 years.
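
For reference, ashift is fixed per vdev at pool creation and cannot be changed afterwards. A minimal sketch of creating such a mirror with an explicit ashift (the pool name and the by-id paths are placeholders, adjust them to your own disks):

Code:
# create the mirror with an explicit ashift (12 = 4K, 13 = 8K)
zpool create -o ashift=12 tank mirror \
    /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERIAL1 \
    /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERIAL2

# verify what the pool actually uses
zpool get ashift tank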
 
OK, but why does SMART report 512 bytes? Is it a bug, or is it on purpose?
I found some threads on other forums suggesting it's on purpose, for compatibility reasons,
but it wasn't confirmed.
 
OK, but why does SMART report 512 bytes? Is it a bug, or is it on purpose?
I found some threads on other forums suggesting it's on purpose, for compatibility reasons,
but it wasn't confirmed.

For decades, the default block size was 512 bytes, so most devices report this even if they do not implement it internally. The emulated mode is often referred to as 512e. AFAIK all devices over 2 TB use internal 4K sectors, and those devices are then either 4K native (4Kn) or 512e for backwards compatibility. You should always use the native size because of possible write amplification otherwise.
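
You can check what a drive advertises with smartctl or lsblk, keeping in mind that 512e drives (and most consumer SSDs, the 860 EVO included) will still show 512 bytes here even though the NAND works with much larger pages internally:

Code:
# logical vs. physical sector size as reported by the drive
smartctl -i /dev/sda | grep -i 'sector size'
lsblk -o NAME,MODEL,LOG-SEC,PHY-SEC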
 
I personally find that using ZFS wears out your drives extremely quickly. I'm testing now with one drive in a homelab, of the same type you're using, and I've written 3.260 TB in less than half a year. The drive is supposed to last 4600 TBW. Check "Total_LBAs_Written" from day to day. It is exactly as LnxBil suggests.
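
On a Samsung SATA SSD the SMART attribute 241 (Total_LBAs_Written) counts 512-byte LBAs, so the raw value times 512 gives the bytes written by the host. A quick check, assuming the disk is /dev/sda:

Code:
smartctl -A /dev/sda | grep Total_LBAs_Written
# raw value * 512 bytes, expressed in TB
echo "$(smartctl -A /dev/sda | awk '/Total_LBAs_Written/ {print $10}') * 512 / 10^12" | bc -l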

On topic: I'd go for an ashift of 13; that seems to work best for most people using this drive type. I'm using 12.

Not to hijack this thread, but what's a good recommendation for an enterprise SSD for use with ZFS? Type/model/size for running Proxmox, and type/model/size for data storage? Any real-world experience?
 
Not to hijack this thread, but what's a good recommendation for an enterprise SSD for use with ZFS? Type/model/size for running Proxmox, and type/model/size for data storage? Any real-world experience?

I've never had a bad enterprise SSD. With every server you buy from a server vendor, you get enterprise-grade SSDs. If you want to buy them yourself, you can look at this link. Personally, I use used PM863s in my home setup; at work I have been using them for over 7 years with minimal wearout.
 
Hi, thanks to all of you for the answers.
I'm running 2 Windows VMs and one Ubuntu VM.
Since last week I have been measuring total LBAs written, and it's around 50 GB per day.

If the drives are rated for 300 TBW and I write, let's say, 80 GB per day, the drives would still last approximately 10 years, right?
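
Quick back-of-the-envelope check (host writes only, ignoring write amplification inside the SSD):

Code:
# 300 TB TBW at ~80 GB/day
echo "300 * 1000 / 80 / 365" | bc -l     # ~10.3 years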

But I know these drives aren't enterprise grade, so within a year I will buy proper drives.
 
Hi, thanks to all of you for the answers.
I'm running 2 Windows VMs and one Ubuntu VM.
Since last week I have been measuring total LBAs written, and it's around 50 GB per day.

If the drives are rated for 300 TBW and I write, let's say, 80 GB per day, the drives would still last approximately 10 years, right?

But I know these drives aren't enterprise grade, so within a year I will buy proper drives.

Hi,

Sorry to jump into this thread, but since I'm planning on using similar disks, your experience after some time may help me decide how to set up my system.

Which ashift did you end up setting?
How is the wearout of those Samsung 860 EVO 512 GB disks?

You're using them for VMs, which I believe will give them more usage; in my case I'm planning on using 2x Samsung 860 EVO 250 GB as a ZFS RAID1 for the Proxmox OS installation only. But I'm concerned they will be killed in a very short time (1 year or less).

Thank you.
zecas
 
If you just use them for the OS, wear should be fine and the disks should last for several years. Just keep in mind that those consumer SSDs have no powerloss protection, so on a power failure/outage you might lose your complete pool, even if it is RAID1, because both SSDs will lose data at the same time. But for a VM storage you really want to use enterprise SSDs.
 
If you just use them for the OS, wear should be fine and the disks should last for several years. Just keep in mind that those consumer SSDs have no powerloss protection, so on a power failure/outage you might lose your complete pool, even if it is RAID1, because both SSDs will lose data at the same time. But for a VM storage you really want to use enterprise SSDs.

Thank you for the reply. The lack of power loss protection is another concern; I'm planning to get a UPS to help deal with that, but the ideal solution would be to have both a UPS and power loss protection on the SSDs :).

I also started to wonder whether it would be a better approach to just use a single SSD with ext4 for the OS, losing redundancy but maybe increasing disk longevity, and I believe the ext4 filesystem would also help prevent loss of data in a power failure scenario. Am I right?

Thanks
 
If you just use them for the OS, wear should be fine and the disks should last for several years. Just keep in mind that those consumer SSDs have no powerloss protection, so on a power failure/outage you might lose your complete pool, even if it is RAID1, because both SSDs will lose data at the same time. But for a VM storage you really want to use enterprise SSDs.
- even for the OS it is not a good idea, because rrdcached will write a lot of data
- and if you want to use a Samsung .... EVO Pro for cache, then you will see that after 2-3 weeks you will have a wear level of about 5%

Good luck / Bafta !
 
- even for the OS it is not a good idea, because rrdcached will write a lot of data
- and if you want to use a Samsung .... EVO Pro for cache, then you will see that after 2-3 weeks you will have a wear level of about 5%
Full ack, this is what I wanted to write. In addition to rrdcached, the /etc/pve filesystem, or more precisely the SQLite database behind it, also writes a lot.

It is totally useless to use an SSD for the PVE system alone (excluding swap). The system normally boots once and runs for weeks, maybe months, if you apply all kernel updates directly (and reboot), and everything you need will be in the cache. So the benefit of having an SSD is negligible.
If, on the other hand, you have swap and use it a lot, you will have even higher wearout and not a fast system, because of the low-end SSD devices.

2x Samsung 860 EVO 250 GB as a ZFS RAID1

If it has to be two SSDs for the OS, just buy two used 120 GB Samsung enterprise SSDs from eBay, which are cheaper and more reliable than the EVOs.
 
- even for the OS it is not a good idea, because rrdcached will write a lot of data
- and if you want to use a Samsung .... EVO Pro for cache, then you will see that after 2-3 weeks you will have a wear level of about 5%

Good luck / Bafta !
The main problem is write amplification, which is usually worse on consumer SSDs, especially if there is no powerloss protection so sync writes can't be cached/optimized. PVE actually isn't writing that much. Let's assume it's some hundred MBs per day of real data. But it's doing a lot of small writes (metrics, logs, writing the cluster configs each minute, ...), and if a 4 KB IO causes, for example, a 128 KB write to the NAND, these hundreds of MB per day easily get amplified to something like dozens of GBs per day. But even then that shouldn't be such a big problem. The SSDs are rated for 150 TB TBW over 5 years, so (150*1000 GB TBW)/(5*365 days) = 82 GB per day. So as long as there isn't more than 82 GB written to the SSDs' NAND per day, your SSDs should survive their 5-year warranty.
And by default there is no swap when using ZFS. But of course, if you add swap to those OS disks, that will greatly increase the wear.

Just monitor your SSDs' SMART stats and make sure your writes are less than 82 GB per day.
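
One way to track that, a sketch assuming a Samsung SATA SSD at /dev/sda (attribute 241 is in 512-byte LBAs and counts host writes; the internal NAND writes after amplification will be higher): log the counter once a day, e.g. from cron, and diff the last two readings:

Code:
# append today's reading
echo "$(date +%F) $(smartctl -A /dev/sda | awk '/Total_LBAs_Written/ {print $10}')" \
    >> /var/log/lbas-written.log

# GB written between the last two readings
tail -n 2 /var/log/lbas-written.log | awk '{v[NR]=$2} END {print (v[2]-v[1])*512/1e9, "GB"}'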
 
Full ack, this is what I wanted to write. In addition to rrdcached, the /etc/pve filesystem, or more precisely the SQLite database behind it, also writes a lot.

It is totally useless to use an SSD for the PVE system alone (excluding swap). The system normally boots once and runs for weeks, maybe months, if you apply all kernel updates directly (and reboot), and everything you need will be in the cache. So the benefit of having an SSD is negligible.
If, on the other hand, you have swap and use it a lot, you will have even higher wearout and not a fast system, because of the low-end SSD devices.

If it has to be two SSDs for the OS, just buy two used 120 GB Samsung enterprise SSDs from eBay, which are cheaper and more reliable than the EVOs.

The main problem is write amplification, which is usually worse on consumer SSDs, especially if there is no powerloss protection so sync writes can't be cached/optimized. PVE actually isn't writing that much. Let's assume it's some hundred MBs per day of real data. But it's doing a lot of small writes (metrics, logs, writing the cluster configs each minute, ...), and if a 4 KB IO causes, for example, a 128 KB write to the NAND, these hundreds of MB per day easily get amplified to something like dozens of GBs per day. But even then that shouldn't be such a big problem. The SSDs are rated for 150 TB TBW over 5 years, so (150*1000 GB TBW)/(5*365 days) = 82 GB per day. So as long as there isn't more than 82 GB written to the SSDs' NAND per day, your SSDs should survive their 5-year warranty.
And by default there is no swap when using ZFS. But of course, if you add swap to those OS disks, that will greatly increase the wear.

Just monitor your SSDs' SMART stats and make sure your writes are less than 82 GB per day.

Well, these SSDs are consumer grade, so no powerloss protection indeed, and the prospect of corrupting the OS disks in case of a power failure would be a real pain. Maybe I should go for a pair of SAS HGST 600 GB drives for the job? I know ... 600 GB would be overkill, but in this case I could put ISO storage there.

I'm just looking at the OS part, since the VMs will run from mechanical disks for the time being. When time and budget allow, I will create a new ZFS pool on enterprise SSDs and move the VMs there. If it were today, I would be thinking about some new PM897 480 GB disks.

I'm also taking the chance to look into SSDs on eBay. I don't know about your experience, but I have none buying SSDs online, and I was holding back just because you never know what usage they went through.

Anyway, at the moment I have found 2 products that seem like a possibility; can anyone give me an opinion? I'll learn something from it, that's for sure :)

- Samsung PM883 MZ-7LH2400 240GB
Buying 2x plus import and postage charges would set me back around 80€

- Samsung SM863 MZ-7KM1200 120GB
Buying 2x plus import and postage charges would set me back around 115€

From what I can see, they both have power loss protection. The SM863 is a generation older than the PM883, but also has higher endurance:
- SM863 - 3.5 DWPD / 5 years
- PM883 - 1.3 DWPD / 3 years

Even though the SM863 should be better on paper than the PM883, the SM863 is older and has thus (potentially) been in service longer, and it has already passed its 5-year warranty.

No more information besides some specs about the disks, which can be found around the web. I could ask the seller, but I don't know if anyone would be willing to run some tests or provide some SMART data.

Thank you.
 
I'm also taking the chance to look into SSDs on eBay. I don't know about your experience, but I have none buying SSDs online, and I was holding back just because you never know what usage they went through.
You can ask the seller to show you the SMART stats before buying. I bought 19 second-hand enterprise SSDs from different sellers on eBay and co. and always got a picture of the SMART stats, so I could verify before buying how much data had been written to the SSDs and how much life they had left. None of the 19 SSDs I bought had more than 4% wear. You can get some great deals there, like 2x 100 GB enterprise SSDs for less than 20$ including shipping - a great deal for a system-only striped mirror for PVE.
Even though the SM863 should be better on paper than the PM883, the SM863 is older and has thus (potentially) been in service longer, and it has already passed its 5-year warranty.
Have a close look at the warranty. For example, Intel's SSD warranty only covers the initial buyer. If you got them second hand, there is no warranty you could make use of anyway.
 
Yes, the SM863 and PM863 are running great. We have different versions, we have been using them for almost 10 years now, and we have neither reached wearout (with ZFS running all the time) nor had a single disk failure. We use them as single mirrors, striped mirrors and special vdevs for huge ZFS pools - all without any problems.
 
You can ask the seller to show you the SMART stats before buying. I bought 19 second-hand enterprise SSDs from different sellers on eBay and co. and always got a picture of the SMART stats, so I could verify before buying how much data had been written to the SSDs and how much life they had left. None of the 19 SSDs I bought had more than 4% wear. You can get some great deals there, like 2x 100 GB enterprise SSDs for less than 20$ including shipping - a great deal for a system-only striped mirror for PVE.

Have a close look at the warranty. For example, Intel's SSD warranty only covers the initial buyer. If you got them second hand, there is no warranty you could make use of anyway.

Yes, the SM863 and PM863 are running great. We have different versions, we have been using them for almost 10 years now, and we have neither reached wearout (with ZFS running all the time) nor had a single disk failure. We use them as single mirrors, striped mirrors and special vdevs for huge ZFS pools - all without any problems.

I think you convinced me to look on eBay for a good deal. The first buy is the most difficult one ... same thing when I bought my first refurbished server :)

I don't have much experience interpreting SMART data. For instance, I found a deal where the seller posted CrystalDiskInfo screenshots. The software basically summarizes the health status as a percentage, which looks very good (both > 90%), but the host reads/writes and power-on count/hours seem rather high ... though probably not for a 24/7 system. As for the other attributes, I have to assume they are all OK:

Samsung PM863 SATA SSD
120 GB 22€
240 GB 35€
30€ postage

The pictures put them at a manufacture year of 2015/2016, so they have passed the 3-year warranty, and even though they are not the latest model, they are the previous one (one generation older).



Shouldn't a tool like smartctl give more detailed info and disk parameters? Or are the parameters displayed in CrystalDiskInfo all that are available for the disks tested? (Well, some parameters are not displayed; the screenshots show a scroll bar.)
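
I guess something like this would dump everything the drive exposes (with /dev/sdX as a placeholder for the actual device):

Code:
# all SMART info and vendor attributes
smartctl -a /dev/sdX
# even more detail (device statistics, error logs) on SATA drives
smartctl -x /dev/sdX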


Thank you.
 
