SSD and layout recommendation?

TotallyInOverMyHead
Dec 12, 2015
Hi there,
I have a very peculiar problem at work.


We have been running a Proxmox server for testing purposes for two months now, using KVM virtual machines that host Linux-based log servers, network monitoring solutions, flow graphing and the like. When we set this server up I installed a single, brand new Samsung 850 EVO 250GB.

Now I have been told to make this single server production-ready (which means a 10x increase in monitored devices), so I upgraded the server to a beefier model, installed an 8 TB SATA-3 drive for storage and, by chance, decided to do a readout of the SMART values on the SSD before plugging it into the new machine. 72 TBW, and that disk is rated for 75 TBW. That is no good.

I read up on the issue; it seems to come down to write amplification on SSDs, and I found a couple of threads on this forum that seem to echo the problem.
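For reference, a readout like that can be done with smartctl - a rough sketch, assuming the SSD is /dev/sda and reports SMART attribute 241 (Total_LBAs_Written) in 512-byte units like the 850 EVO does:
Code:
# dump all SMART values of the SSD (assuming it shows up as /dev/sda)
smartctl -a /dev/sda
# convert attribute 241 (Total_LBAs_Written) to TB written, assuming 512-byte LBAs
smartctl -A /dev/sda | awk '/Total_LBAs_Written/ { printf "%.1f TB written\n", $10 * 512 / 1e12 }'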


Now I have three questions:

Can anyone recommend a good SSD model that will last 2-3 years? Preferably something that is 250GB+ in size, enterprise grade, with good write IOPS and maybe even power loss protection?

What is the best way to set up the disks for this machine?
I was thinking of sticking the Proxmox OS and VM disks on the SSD and the VM data (including log folders, etc.) on the HDD, in order to lessen the effect of the SSD being hammered by loads of small writes from our VMs. Backups go to an NFS share on a NAS. I am, however, unsure which storage type to use, and then which caching method to use for virtio-based vDisks.

Is there anything else I need to configure in Proxmox when using SSDs?

Any input is gladly appreciated.
 
1) For the SSD part, use disks that are good as Ceph journal devices. A "go-buy" list is here:

http://www.sebastien-han.fr/blog/20...-if-your-ssd-is-suitable-as-a-journal-device/

2) see 3)

3) You're using RAID, right? If not, do so. Put Proxmox VE on the spinning disk, not on the SSD, because the current implementation of the config backend flushes a lot of data and is said to be the cause of the extreme wearout. I also suffered from that "bug".
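If you want to collect your own numbers first, a rough sketch (assuming the SSD is sda; field 10 of /proc/diskstats counts sectors written):
Code:
# sectors written to the SSD since boot; sample it now and again 24h later,
# the delta times 512 is bytes written per day
awk '$3 == "sda" { printf "%.1f GB written since boot\n", $10 * 512 / 1e9 }' /proc/diskstats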
 
I still have to check with the distributor, but I'm probably going with an Intel DC S3700 200GB for 190€ + VAT; 3700 TBW is far, far ahead of a Samsung 850 Pro 512GB's 300 TBW, let alone the 850 EVO's 75 TBW.
The DC P3700 is a bit too rich for my company's blood at almost 602€, although it provides a lot better performance according to the Ceph list.

I find it quite hilarious how huge the TBW differences are relative to the price; see the quick calculation after the list.

Intel DC S3700 200GB - 0.05€/TBW
Intel DC P3700 - 0.08€/TBW
Samsung SM863 240GB - 0.09€/TBW
Samsung 850 PRO 512GB - 0.63€/TBW
Samsung 850 EVO 512GB - 0.91€/TBW
Samsung 850 EVO 250GB - 1.05€/TBW
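For transparency, the €/TBW numbers are simply list price divided by rated endurance. A quick sketch of the arithmetic - the 190€ / 3700 TBW pair is quoted above, the 850 EVO price is an assumed street price:
Code:
# EUR per TBW = list price / rated endurance
awk 'BEGIN {
    printf "Intel DC S3700 200GB:  %.2f EUR/TBW\n", 190 / 3700   # price quoted above
    printf "Samsung 850 EVO 250GB: %.2f EUR/TBW\n",  79 /   75   # assumed street price
}'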



Thanks again; that Ceph list and my €/TBW comparison have been quite eye-opening.
 
TBW for the 850 EVO is really bogus. The internal minimal block size for one write is so huge that you'll reach that number very fast. What still matters for speed is sync write performance, so the list is really good.
 
Went with 2x 500GB SAS drives in ZFS RAID-1 via the installer.

Now the only burning question remaining is which storage type to use for the single SSD (Intel DC S3700 200GB) so I can get the most performance out of it.

As far as I can see, my options are:
  • LVM
  • LVM Thin
  • Directory
I'm not sure if you can make a local ZFS pool out of one SSD, and whether that would even make any sense compared to the above options. As a matter of fact, I do not even know what advantages/disadvantages/gotchas I'd be facing with the above options.

The data is mostly going to be KVM disks for Debian/Ubuntu using virtio drivers, and the occasional container.

Any pointers?
 
Yes, you can do a single-disk zpool - but you obviously won't have redundancy in case your single disk fails ;) which means you are missing a big chunk of the features people usually use ZFS for. You would still benefit from snapshots and linked clones, zfs send/receive, checksumming (but no recovery, because there is no second copy), compression, ...
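If you go that route, a minimal sketch of creating such a single-disk pool and registering it as a storage (the device path is a placeholder - use your own /dev/disk/by-id link, and the exact pvesm syntax may differ between versions):
Code:
# single-disk pool on the SSD
zpool create -o ashift=12 ssdpool /dev/disk/by-id/ata-INTEL_SSDSC2BA200G3_XXXXXXXX
zfs set compression=lz4 ssdpool
# register it in Proxmox VE as a ZFS storage for VM and container volumes
pvesm add zfspool ssd-zfs --pool ssdpool --content images,rootdir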
 
3) You're using RAID, right? If not, do so. Put Proxmox VE on the spinning disk, not on the SSD, because the current implementation of the config backend flushes a lot of data and is said to be the cause of the extreme wearout. I also suffered from that "bug".

Do you have any numbers? So it is currently not recommended to install Proxmox on an SSD?

....
As far as I can see, my options are:
  • LVM
  • LVM Thin
  • Directory
I'm not sure if you can make a local ZFS pool out of one SSD, and whether that would even make any sense compared to the above options. As a matter of fact, I do not even know what advantages/disadvantages/gotchas I'd be facing with the above options.
The data is mostly going to be KVM disks for Debian/Ubuntu using virtio drivers, and the occasional container.
Any pointers?

I think ZFS is the best option, even for a single-disk setup (fabian mentioned some features). Correct me if I am wrong.
Two possible setups came to mind; which would be the best?

(1) Install Proxmox directly on ZFS during installation. That way storage->local and local-lvm would already be on ZFS and I wouldn't need to add an extra ZFS storage. (Disadvantage: the installation setup and the VMs are not separated.)

(2) Install Proxmox on an extra small HDD/SSD, with or without root ZFS, and configure the Intel DC S3700 200GB like this:
- Storage->ZFS for disk images and containers (a new dataset for every VM: /ssdpool/vm-<01>)
- Storage->Dir for container templates, ISO images and VZDUMP files, on the same ZFS pool but in a different dataset (/ssdpool/vm-iso-backup)

I would prefer option 2, if it is possible to use Storage->ZFS and Storage->Dir simultaneously on the same SSD?!
(I asked a similar question here.)
 
I would prefer option 2, if it is possible to use Storage->ZFS and Storage->Dir simultaneously on the same SSD?!
(I asked a similar question here.)

Yes. The ZFS storage plugin supports using an arbitrary dataset as its base, and the dir storage supports using an arbitrary directory as its base.

You can, for example, create the following datasets:
Code:
myzpool/guestvolumes => ZFS storage, container and VM images
myzpool/templates => dir storage, templates and isos
myzpool/backup => dir storage, vzdump backups

PVE will then automatically allocate zvols (for VMs) or datasets (for containers) under myzpool/guestvolumes/, and will happily use the mounted directories for the dir storages. Just make sure to set "is_mountpoint" on the directory storages so that they only get activated when mounted, not before. You can of course also use one directory storage for both templates and backups, but separate snapshotting etc. is very convenient!
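Roughly, that could look like this on the CLI (just a sketch; the storage IDs are example names and the exact option syntax may vary between PVE versions):
Code:
# create the datasets
zfs create myzpool/guestvolumes
zfs create myzpool/templates
zfs create myzpool/backup
# ZFS storage for VM and container volumes
pvesm add zfspool guestvolumes --pool myzpool/guestvolumes --content images,rootdir
# dir storages on the mounted datasets, only activated once the dataset is mounted
pvesm add dir templates --path /myzpool/templates --content iso,vztmpl --is_mountpoint 1
pvesm add dir backup --path /myzpool/backup --content backup --is_mountpoint 1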

I would not recommend using a "directory on ZFS" storage for VM or CT images - you'd lose most or all of the nice features and incur a performance penalty.
 
Thanks @fabian - I'll do the VM and CT storage on ZFS then; it seems like the easiest option (I just wasn't sure about the performance of single-disk ZFS vs. single-disk LVM).


@bogo22 :
We went with:
2x 500GB SAS in ZFS RAID-1 during install (templates/ISOs/short-term backups)
1x Intel DC S3700 (reason see above) via the ZFS plugin (VM and CT storage)
1x NFS (redundant NAS) for long-term backups

Still contemplating whether we want to use an 8TB spinner for real-time monitoring metrics and back those up to a NAS, or write them directly to a NAS, but that's beyond the scope of this topic.
 
I would not recommend using a "directory on ZFS" storage for VM or CT images - you'd lose most or all of the nice features and incur a performance penalty.

There are cases in which this setup is good (for technical reasons, not performance):

You want to have the possibility of incremental backups (ZFS snapshots & send/receive) and need to run a VM with non-linear snapshots (qcow2), e.g. you want to switch easily between different snapshots of the VM (organized in a tree, not the linear timeline ZFS gives you). I need such a setup for a VM with a different version of the software installed in each snapshot.
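For illustration, those tree-style snapshots live inside the qcow2 file itself and can be handled with qemu-img (or the PVE snapshot GUI) - a minimal sketch, assuming a dir storage on the pool and an example image path:
Code:
# internal qcow2 snapshots form a tree; use qemu-img while the VM is shut down
qemu-img snapshot -l /ssdpool/images/100/vm-100-disk-1.qcow2                  # list snapshots
qemu-img snapshot -c before-upgrade /ssdpool/images/100/vm-100-disk-1.qcow2   # create one
qemu-img snapshot -a before-upgrade /ssdpool/images/100/vm-100-disk-1.qcow2   # roll back to it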
 
We went with:
2x 500GB SAS in ZFS RAID-1 during install (templates/ISOs/short-term backups)
1x Intel DC S3700 (reason see above) via the ZFS plugin (VM and CT storage)
1x NFS (redundant NAS) for long-term backups

You put your templates, ISOs and backups on a RAID, but your data is still "vulnerable"? That makes no sense to me.

Just buy another SSD and enhance your single-disk ZFS pool to a mirror.
 
You put your templates, ISOs and backups on a RAID, but your data is still "vulnerable"? That makes no sense to me.

Just buy another SSD and enhance your single-disk ZFS pool to a mirror.

Short answer:
It's a limitation of the hardware used (thanks to our shitty work policy of using every piece of old hardware unless it is broken or >=10 years old at the time of setup) for anything that is considered non-critical (like, e.g., our billing system :rolleyes:).


Long answer:
The case has 3x internal and 1x external hot-swappable mounting options (I already cannibalized the CD drive bay for this).
So to illustrate my thinking:
Should a SAS drive fail, I can just replace it and do not have to reconfigure the Proxmox server (which, even with my self-written shorthand guides, might still take me 60+ minutes of sitting in front of the screen). I still haven't found a Linux equivalent to something like ShadowProtect for Windows.
Should the SSD (which is in the hot-swap bay) fail (and I'm guessing this is more likely than both SAS drives failing, given the experience above that triggered this topic), I still have local backups (on the SAS drives) and the remote backups. So I can just change the SSD and restore from backups. (By the way, I am aware there is a 4th mounting option available to me, but I'll probably need that for an HDD that stores historical monitoring data, so I do not kill my SSD within a couple of months again.)

Losing a day's worth of data will not kill me; having to set this all up again might.
 
I still don't get it.

From an ROI point of view, adding redundancy to the OS and VM data (which, as I understand it, is on your SSD) is much better than a restore. In case of failure of one SSD or disk, order a new one, shut down the server during off-office hours, replace the disk, boot up and resilver your ZFS pool. You have practically zero downtime and no time spent restoring and such. It's important to have enterprise-grade disks, because enterprise-grade firmware returns an error faster instead of endlessly trying to fix things itself; this greatly improves disk response times in the event some sectors fail. All managers I know respond well to ROI calculations like this, so they'll provide money for it. The SSD cannot cost more than a few days of work for you, so it should be clear that an additional SSD is much cheaper.

Also, ZFS snapshots are similar to ShadowProtect (as far as I understand what that software does). You can snapshot your VMs and your OS at the same time in the same pool. A backup is then a send/receive operation of the data to an off-site ZFS pool, and a restore is a simple send/receive back. Incremental, compressed and potentially deduplicated (needs really beefy hardware) backup streams are the norm with ZFS.
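A minimal sketch of that send/receive flow, assuming a pool named rpool and a backup host reachable over SSH (all names here are placeholders):
Code:
# recursive snapshot of the whole pool (OS + guests)
zfs snapshot -r rpool@daily-2016-06-01
# initial full copy to the backup box
zfs send -R rpool@daily-2016-06-01 | ssh backuphost zfs receive -F backup/pve-node1
# next day: snapshot again and only ship the delta
zfs snapshot -r rpool@daily-2016-06-02
zfs send -R -i daily-2016-06-01 rpool@daily-2016-06-02 | ssh backuphost zfs receive backup/pve-node1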
 
[...]
All managers I know respond well to ROI calculations like this, so they'll provide money for it. [...]

Without rehashing the myriad discussions I have with people over this on an almost daily basis (when I am at that job, and for the two days afterwards):
  • The problem is that we are an owner-operated business.
  • Who needs managers anyway?
  • The fish starts to smell at the head first.
PS: I know this. I do operate this way at my primary job (it has managers), but it is still a fight to even get broken hardware replaced when the one writing the checks turns over every cent four times, gets confused when his 1+3 equals 6 because he forgot to add 2, thinks he is an expert because he reads a personal computer magazine on paper once a month, yet regularly throws away tens of thousands of euros because he forgets to cancel his leasing contracts :mad:


Back to topic:
[...]
Also, ZFS snapshots are similar to ShadowProtect (as far as I understand what that software does). You can snapshot your VMs and your OS at the same time in the same pool. A backup is then a send/receive operation of the data to an off-site ZFS pool, and a restore is a simple send/receive back. Incremental, compressed and potentially deduplicated (needs really beefy hardware) backup streams are the norm with ZFS.

This I did not know.

The question is:
If I have a single OS drive on ZFS, and said drive is protected remotely via ZFS snapshots (of some form), and said OS drive fails, how do I get an initial working system going in order to restore? Replace the hard drive, install base Proxmox, then restore?

As I understand it, you need a remote ZFS target for this to work. I can probably use our redundant TrueNAS for this.
 
The question is:
If I have a single OS drive on ZFS, and said drive is protected remotely via ZFS snapshots (of some form), and said OS drive fails, how do I get an initial working system going in order to restore? Replace the hard drive, install base Proxmox, then restore?

As I understand it, you need a remote ZFS target for this to work. I can probably use our redundant TrueNAS for this.

The easiest thing would be to replicate to a bootable replica and physically swap the disks - a real 1:1 copy. Otherwise you have the problem of getting the system up to the point where you can start a restore. That can be done with the Proxmox VE rescue system or some self-made live Linux distribution plus manually fixing the boot sector, but it will work. The receiving end also needs ZFS and can be Proxmox VE as well, so you can also test your backup inside your other Proxmox VE. That can be a laptop or some "low end" device with a lot of disk space (a simple USB disk is sufficient; speed does not matter for testing).
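Very roughly, pulling everything back onto a fresh disk from such a rescue/live system with ZFS support might look like this (pool, dataset and disk names are placeholders, and the bootloader repair depends on how the system boots):
Code:
# from the rescue system: recreate the pool on the new disk
zpool create -f rpool /dev/disk/by-id/ata-NEW_DISK_XXXXXXXX
# pull the latest replica back from the backup host
ssh backuphost zfs send -R backup/pve-node1@daily-2016-06-02 | zfs receive -F rpool
# then chroot into the restored system and reinstall the bootloader (e.g. grub-install on the new disk)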

Again: you normally want to avoid this by using more than one OS disk to reduce the odds of this scenario.

For my clients we often have a multi-tier setup: local RAID (hardware failures), local snapshots (software or layer-8 failures), and off-site backups for disaster recovery with auto-rebuild scripts. My backup creates the restore scripts automatically, such that you can boot the self-made Linux distro we use (Debian-based, with our configuration stuff etc.) and restore back to disk if all disks fail. If you need to change the hardware as well, you obviously need to change additional stuff.

Some customers also want more and use a 3-disk RAID1 to have more time until the replacement disk arrives.
 
